A Real-World Big Data Project with Apache Spark and AWS EMR
Get hands-on experience with a Big Data processing pipeline through this real-life use case

Hey readers, I have been learning Data Engineering for the last few months, and I thought I would share what I have learned with you all. I recently built a project around a real-life application of a Big Data pipeline, and I am sharing it here so that anyone interested can work through it and practice on their own.
Use Case
Nikita is a new Data Engineer at a great startup. The company receives a large volume of Apache/application logs, which are copied to Amazon S3 frequently. Nikita’s team has been tasked with processing all of these high-volume logs and exposing the data to the organisation's analysts and scientists.
The analysts and scientists prefer consuming the data via SQL or Python. The primary objective for Nikita’s team is to make access to the data easier and more efficient.
Project Details and Components
In this section I describe only the problem statements, not the solutions, because at this point I don’t want to bias your thinking with my own approach. Once the problems are clear, we will discuss how to implement the solutions.