Query Kaggle data via Apache Spark and Zeppelin via EMR cluster

Oct 29, 2018

This is a three-post blog series on querying Kaggle data on an Amazon EMR cluster. I will use Apache Zeppelin for data exploration, with Apache Spark executing the queries behind the scenes.

Most of the complexity will be hidden from us, because Amazon EMR takes care of it.

Here are the 3 posts for our task:

Part 1: How to copy Kaggle data to Amazon S3
Part 2: How to create EMR cluster with Apache Spark and Apache Zeppelin
Part 3: Query Kaggle data via Apache Zeppelin
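
As a rough sketch of how the three parts fit together (not the exact commands used in the posts), the Python snippet below assembles an AWS CLI upload command, an `aws emr create-cluster` command that requests Spark and Zeppelin, and the text of a Zeppelin PySpark paragraph that reads the uploaded CSV. The bucket name, file name, cluster name, release label, instance type, instance count, and view name are all hypothetical placeholders, and the `%pyspark` interpreter prefix assumes the cluster's default Zeppelin configuration. The snippet only builds and prints the commands; it does not call AWS.

```python
import shlex
from typing import List

# Hypothetical placeholders: replace with your own bucket and dataset file.
BUCKET = "my-kaggle-bucket"
DATASET_FILE = "train.csv"


def s3_uri(bucket: str, key: str) -> str:
    """Build the S3 URI shared by the upload step and the Spark read."""
    return f"s3://{bucket}/{key}"


def copy_to_s3_command(local_path: str, bucket: str, key: str) -> List[str]:
    """Part 1: AWS CLI command that uploads a local Kaggle file to S3."""
    return ["aws", "s3", "cp", local_path, s3_uri(bucket, key)]


def create_cluster_command(name: str, release_label: str = "emr-5.17.0") -> List[str]:
    """Part 2: AWS CLI command that starts an EMR cluster with Spark and Zeppelin.

    The release label, instance type, and instance count are assumptions.
    """
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        "--applications", "Name=Spark", "Name=Zeppelin",
        "--instance-type", "m4.large",
        "--instance-count", "3",
        "--use-default-roles",
    ]


def zeppelin_paragraph(bucket: str, key: str, view: str = "kaggle") -> str:
    """Part 3: text of a Zeppelin PySpark paragraph that loads and queries the CSV."""
    return "\n".join([
        "%pyspark",
        f'df = spark.read.csv("{s3_uri(bucket, key)}", header=True, inferSchema=True)',
        f'df.createOrReplaceTempView("{view}")',
        f'spark.sql("SELECT COUNT(*) AS row_count FROM {view}").show()',
    ])


def as_shell(command: List[str]) -> str:
    """Render an argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    key = "kaggle/" + DATASET_FILE
    print(as_shell(copy_to_s3_command(DATASET_FILE, BUCKET, key)))
    print(as_shell(create_cluster_command("kaggle-spark-zeppelin")))
    print(zeppelin_paragraph(BUCKET, key))
```

Keeping the S3 URI in one helper means the upload in Part 1 and the read in Part 3 cannot drift apart, which is an easy mistake to make when the steps are run days apart.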

I have provided examples and a complete walkthrough of the steps involved. I hope the posts are helpful.

Cheers

Originally published at confusedcoders.com on October 29, 2018.


Written by Nikita Sharma
