How to work on Kaggle data on your local Jupyter Notebook

Working with Kaggle data on your local system using Jupyter Notebook

Nov 18, 2018

My last post was part of a three-post blog series on working with Kaggle data on EMR and Apache Spark. In this post, we will learn how to use Kaggle data in a Jupyter Notebook on your local machine.

Environment details:

Ubuntu
Python 3.6.3

Steps

Our task takes three steps:

Download the file from Kaggle to your local machine.
Unzip the Zip file.
Read the file in your Jupyter Notebook.

Download the dataset from Kaggle

I am downloading the PUBG Finish Placement Prediction dataset from Kaggle. Refer to this post to learn how to download a Kaggle dataset.

Unzip the Zip file

The downloaded Kaggle dataset comes as a Zip file, so we have to unzip it before we can read the data.

$ unzip <file name>
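If you would rather stay inside Python, the standard-library zipfile module can do the same extraction from a notebook cell. This is a minimal sketch; the function name unzip_dataset and the paths in the usage comment are hypothetical, so adjust them to wherever your download landed.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # namelist() also lists directory entries (ending in "/"); keep only files
        return [dest / name for name in zf.namelist() if not name.endswith("/")]


# Hypothetical paths, for illustration only:
# files = unzip_dataset("/home/bond/train_V2.csv.zip", "/home/bond/pubg")
# print(files)
```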

Read data in your local Jupyter Notebook

After unzipping the file, we are ready to use our data in Jupyter Notebook. Open Jupyter Notebook on your system.

Note: if the dataset is too large for your local system, we can create a smaller file that contains only the first N lines.

$ head -n <number_of_lines> ~/old_file_name > ~/new_file_name
$ head -n 20000 ~/train_V2.csv > ~/train_V4.csv
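The same truncation can be sketched in plain Python, which may help if you are on a system without head. The helper name head_file is hypothetical. Like head, it counts the CSV header as one of the lines, so 20000 lines means the header plus 19999 data rows. Alternatively, pandas' read_csv accepts an nrows parameter, e.g. pd.read_csv(path, nrows=20000), which reads only the first 20000 data rows without writing a new file at all.

```python
from itertools import islice


def head_file(src, dst, n_lines):
    """Copy the first n_lines lines of src into dst, like `head -n n_lines src > dst`."""
    with open(src, "r", encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        # islice stops after n_lines, so the rest of a large file is never read
        fout.writelines(islice(fin, n_lines))


# Hypothetical usage mirroring the shell command above:
# head_file("/home/bond/train_V2.csv", "/home/bond/train_V4.csv", 20000)
```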

Now we are ready to load the data in Jupyter:

import pandas as pd
# General form: pd.read_csv('/file_path/file_name', engine='python')
data = pd.read_csv('/home/bond/train_V4.csv', engine='python')
data.head(10)
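As an aside, when a Kaggle Zip archive holds exactly one CSV file, pandas can read it directly and decompress it on the fly, which skips the separate unzip step. This is a sketch under that assumption: pandas raises a ValueError if the archive contains more than one file, so a multi-file competition download still needs unzipping first. The helper name read_zipped_csv and the path in the usage comment are hypothetical.

```python
import pandas as pd


def read_zipped_csv(zip_path, n_rows=None):
    """Read a CSV that is the only file inside a .zip archive.

    compression="zip" tells pandas to decompress on the fly; n_rows is passed
    to nrows so large files can be sampled without loading everything.
    """
    return pd.read_csv(zip_path, compression="zip", nrows=n_rows)


# Hypothetical usage, assuming a single-file Zip download:
# data = read_zipped_csv("/home/bond/train_V2.csv.zip", n_rows=20000)
# data.head(10)
```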

When we run this, we might get the error: Permission denied
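A "Permission denied" error generally means the notebook process cannot read the file. Before changing anything, a quick standard-library check like the following sketch can show whether the file exists, whether the current user can read it, and what its permission bits look like. The helper name describe_access is hypothetical.

```python
import os
import stat


def describe_access(path):
    """Report whether path exists, is readable by the current user, and its mode/owner."""
    if not os.path.exists(path):
        return {"exists": False, "readable": False, "mode": None, "owner_uid": None}
    st = os.stat(path)
    return {
        "exists": True,
        "readable": os.access(path, os.R_OK),  # can this process open it for reading?
        "mode": stat.filemode(st.st_mode),  # permission string such as '-rw-r--r--'
        "owner_uid": st.st_uid,  # numeric user id of the file's owner
    }


# Hypothetical usage with the path from above:
# print(describe_access("/home/bond/train_V4.csv"))
```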