How to work on Kaggle data on your local Jupyter Notebook
Working with Kaggle data on your local system using Jupyter notebooks

My previous posts were a three-part series on working with Kaggle data on EMR and Apache Spark. In this post we will learn how to use Kaggle data in a Jupyter Notebook running on your local machine.
Env details:
- Ubuntu
- Python 3.6.3
Steps
The task has three steps:
- Download the dataset from Kaggle to your local machine.
- Unzip the Zip file.
- Read the file from your Jupyter Notebook.
Download dataset from Kaggle
I am downloading the PUBG Finish Placement Prediction dataset from Kaggle. Refer to this post for how to download a Kaggle dataset.

Unzip the Zip file
The downloaded Kaggle dataset is a Zip archive, so we have to unzip it before we can read the data.
$ unzip <file name>
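If you would rather stay inside the notebook, the same step can be sketched with Python's standard `zipfile` module. This is a minimal sketch, not part of the original workflow; the helper name `extract_zip` and its arguments are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Calling `extract_zip(zip_path, dest_dir)` with the path of the downloaded archive and a target folder returns the names of the extracted files, which is a quick way to see what the archive contained.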

Read Data from local Jupyter Notebook
After unzipping the file, we are ready to use the data in Jupyter Notebook. Open Jupyter Notebook on your system.
Note: If the data is too large to work with locally, we can create a smaller file that contains only the first lines of the original. The header row counts as one of those lines.
$ head -n <number of lines> ~/old_file_name > ~/new_file_name
$ head -n 20000 ~/train_V2.csv > ~/train_V4.csv
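The `head` step can also be approximated from Python, which may be handy on a system without the command. The sketch below uses only the standard library; `write_head` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the file.

```python
from itertools import islice


def write_head(src_path, dst_path, n_lines):
    """Copy the first n_lines lines of src_path to dst_path; return lines written."""
    written = 0
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # islice stops after n_lines, so the rest of a large file is never read.
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

Like `head`, this counts physical lines, so a CSV with line breaks inside quoted fields could be cut in the middle of a record.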
Now we are ready to load the data in Jupyter:
import pandas as pd
# General form:
pd.read_csv('/file_path/file_name', engine='python')
# For example:
data = pd.read_csv('/home/bond/train_V4.csv', engine='python')
data.head(10)
When we run this, we might get a "Permission denied" error.

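To fix this, we have to run the following command in a terminal. Mode 600 gives read and write permission to the file's owner only, so it helps when the file belongs to your user: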
$ sudo chmod 600 <file path>
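To confirm what the permissions look like from the notebook's point of view, a small check with the standard `os` and `stat` modules can help. This is only a sketch; `describe_access` is a hypothetical helper, not something the original workflow requires.

```python
import os
import stat


def describe_access(path):
    """Return (mode_string, readable) for path as seen by the current process."""
    mode = os.stat(path).st_mode
    # stat.filemode renders the mode bits the way `ls -l` does.
    # os.access reports whether the current user may read the file.
    return stat.filemode(mode), os.access(path, os.R_OK)
```

If the second value is `False` even after `chmod`, the file is probably owned by a different user than the one running Jupyter.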

Now we can run the code again.

Now, we are ready to play with our data.
That’s all for this post, hope it was helpful. Cheers!
Originally published at confusedcoders.com on November 18, 2018.