Query S3 data via Hive on a local box
Dec 28, 2018
In the last post we discussed how to generate synthetic data. Here we will look at how to query data stored in S3 via Hive.
Provide AWS configuration to Hadoop and Hive
We need to add the AWS (S3) configuration to the following Hadoop and Hive config files.
hive-site.xml
You can find hive-site.xml in $HIVE_HOME/conf.
You can find all of the following files in $HADOOP_HOME/etc/hadoop.
core-site.xml
mapred-site.xml
hdfs-site.xml
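A minimal sketch of the S3A properties this kind of setup typically relies on, assuming Hadoop's hadoop-aws (S3A) connector. fs.s3a.impl, fs.s3a.access.key and fs.s3a.secret.key are standard S3A property names; reading the keys from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY and the helper name s3a_properties are assumptions for illustration. The printed <property> entries go inside the <configuration> element of core-site.xml (and of hive-site.xml, if you follow the layout above). This sketch does not cover whatever the post put into mapred-site.xml or hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints S3A <property> entries to paste inside
# <configuration>...</configuration>. Assumes the credentials are exported
# as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; fails loudly if they are not.
s3a_properties() {
  cat <<EOF
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID}</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY}</value>
</property>
EOF
}
```

Since the keys end up in plain-text config files, keep those files out of version control.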
Add Hadoop environment variables
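A sketch of the kind of ~/.profile additions this step usually involves; the install paths /usr/local/hadoop and /usr/local/hive are assumptions, so adjust them to your machine. The key line for S3 is HADOOP_CLASSPATH: in Hadoop 2.8+/3.x the S3A connector jars (hadoop-aws and the AWS SDK) ship under $HADOOP_HOME/share/hadoop/tools/lib but are not on the default classpath.

```shell
# Assumed install locations; only used if the variables are not already set.
export HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
export HIVE_HOME="${HIVE_HOME:-/usr/local/hive}"

# Make the hadoop/hive binaries and the start/stop scripts available on PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin"

# Append the S3A connector jars; the literal * is a classpath wildcard
# that Hadoop expands itself, so it is kept quoted here.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HADOOP_HOME/share/hadoop/tools/lib/*"
```

After editing, run source ~/.profile (as in the next step) so the current shell picks up the variables.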
Run Hive
First we will run Hive on the local system from the console.
$ source ~/.profile
$ hstart
$ hive
While running Hive, make sure Hadoop is running in the background.
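hstart is not a command that ships with Hadoop; it is presumably a shell alias defined in ~/.profile, which would explain why the profile is sourced first. The sketch below shows one plausible definition (the alias bodies and the hadoop_running helper are hypothetical) built on Hadoop's own start-dfs.sh / start-yarn.sh scripts, plus a quick check, via the JDK's jps tool, that a NameNode process is up before launching hive.

```shell
# Hypothetical aliases; single quotes defer $HADOOP_HOME expansion to run time.
alias hstart='"$HADOOP_HOME"/sbin/start-dfs.sh && "$HADOOP_HOME"/sbin/start-yarn.sh'
alias hstop='"$HADOOP_HOME"/sbin/stop-yarn.sh && "$HADOOP_HOME"/sbin/stop-dfs.sh'

# Succeeds if jps lists a NameNode JVM, i.e. HDFS appears to be running.
hadoop_running() {
  jps 2>/dev/null | grep -q NameNode
}

# Warn (without aborting) if the daemons do not seem to be up yet.
if ! hadoop_running; then
  echo "Hadoop does not appear to be running; run hstart first." >&2
fi
```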
Create Hive Table
Here we will create a Hive table over data stored in an S3 bucket.
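A minimal sketch of such a table, assuming the S3A connector is configured so Hive can read s3a:// paths, and that the data is comma-separated text. The table name users, its columns, the bucket and the prefix are all hypothetical placeholders; the post's actual schema is not shown here. An EXTERNAL table with a LOCATION clause lets Hive read the files in place without copying them into its warehouse, and dropping the table leaves the S3 objects untouched.

```shell
# Hypothetical helper: prints a CREATE EXTERNAL TABLE statement for CSV files
# under s3a://<bucket>/<prefix>/. Columns are placeholders; match them to your data.
users_table_ddl() {
  local bucket="$1" prefix="$2"
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS users (
  id INT,
  name STRING,
  email STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://${bucket}/${prefix}/';
EOF
}
```

Usage, with placeholder names: hive -e "$(users_table_ddl my-bucket synthetic-data/users)" creates the table, and hive -e 'SELECT * FROM users LIMIT 10;' then queries the S3 data directly.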
Originally published at confusedcoders.com on December 28, 2018.