- Kaggle API Package: https://github.com/Kaggle/kaggle-api
- Official API Credential Instructions
  - Follow the steps below to download and use Kaggle data within Python. (Original source: https://github.com/Kaggle/kaggle-api#api-credentials)
- Log in to www.kaggle.com.
- Go to your Account page (click your profile picture in the top right corner of the website and select Account).
- Scroll down to the API section and:
  - Click Expire API Token to remove previous tokens.
  - Click Create New API Token.
    - This downloads a kaggle.json file to your machine, which needs to be moved to a special "~/.kaggle" folder.
- Make the ~/.kaggle/ folder.
  - Short relative filepath version:
    mkdir ~/.kaggle/
  - Full absolute filepath version, which will be something like:
    mkdir "/Users/YOUR-USERNAME/.kaggle/"
    or:
    mkdir "/c/Users/YOUR-USERNAME/.kaggle"
- Copy the kaggle.json file to the new .kaggle folder.
  - See the "API Credentials" section of the kaggle-api README for details.
  - Example shell command:
    cp ~/Downloads/kaggle.json ~/.kaggle/
- Change the access permissions so that only your user account can read and write the file:
  chmod 600 ~/.kaggle/kaggle.json
- Confirm that the API credentials work.
  - Run the command to list datasets:
    kaggle datasets list
- Remove the original kaggle.json from the Downloads folder:
  rm ~/Downloads/kaggle.json
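The setup steps above (create the folder, copy the token, restrict its permissions) can also be carried out from Python. The following is a minimal sketch rather than part of the official instructions: `install_kaggle_token` and its parameters are hypothetical names, and it assumes the token file is the JSON file with `username` and `key` fields that Kaggle downloads.

```python
import json
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src, kaggle_dir=None):
    """Copy a downloaded kaggle.json into the .kaggle folder with 600 permissions.

    Hypothetical helper for illustration; it is not part of the kaggle package.
    """
    src = Path(src).expanduser()
    kaggle_dir = Path(kaggle_dir).expanduser() if kaggle_dir else Path.home() / ".kaggle"

    # Fail early if the file does not look like an API token
    # (assumption: the token holds "username" and "key").
    with open(src) as f:
        token = json.load(f)
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"{src} is missing keys: {sorted(missing)}")

    kaggle_dir.mkdir(parents=True, exist_ok=True)  # same as: mkdir ~/.kaggle/
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(src, dest)                     # same as: cp <src> ~/.kaggle/
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)    # same as: chmod 600
    return dest
```

A call such as `install_kaggle_token("~/Downloads/kaggle.json")` would mirror the shell commands above; the download location is an assumption and may differ on your machine.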
- On any dataset listing on Kaggle, click the ... menu and select "Copy API Command".
- Paste the command into a cell and add a ! to the beginning, e.g.:
  !kaggle datasets download -d jillanisofttech/fake-or-real-news
- Then run !unzip on the name of the downloaded archive (the last part of the API command), e.g.:
  !unzip fake-or-real-news.zip
!kaggle datasets download -d jillanisofttech/fake-or-real-news
!unzip fake-or-real-news
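As an alternative to the `!unzip` shell command, the downloaded archive can be extracted with Python's standard `zipfile` module, which does not depend on an `unzip` binary being installed. This is a minimal sketch; `extract_zip` is a hypothetical helper name, and the paths in the usage note are illustrative.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For the example above, `extract_zip("fake-or-real-news.zip", "Data/fake-or-real-news")` would unpack the archive into its own folder, assuming the download step has already written the zip file to the working directory.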
## Using the kaggle.api
import kaggle.api as kaggle
kaggle.authenticate()
## Folder for downloaded dataset
import os, glob
path = "Data/"
os.makedirs(path, exist_ok=True)
- On any dataset listing on Kaggle, click the ... menu and select "Copy API Command".
## Paste in API command from kaggle
API_COMMAND = "kaggle datasets download -d catherinerasgaitis/mxmh-survey-results"
## Get just the dataset name
dataset_name = API_COMMAND.split(' -d ')[1]
print(f"Kaggle Dataset name = '{dataset_name}'")
data_fpath = f"{path}{dataset_name.split('/')[-1]}/"
os.makedirs(data_fpath, exist_ok=True)
data_fpath
Kaggle Dataset name = 'catherinerasgaitis/mxmh-survey-results'
'Data/mxmh-survey-results/'
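The `split(' -d ')` approach above raises an `IndexError` if the pasted command does not contain ` -d `. A slightly more defensive parse is sketched below. `parse_dataset_slug` is a hypothetical helper, and the fallback branch assumes that the only argument containing a single `/` is the dataset name (so it would misfire on a command that also passes a path).

```python
import shlex


def parse_dataset_slug(api_command):
    """Return the 'owner/dataset-name' slug from a copied Kaggle API command."""
    tokens = shlex.split(api_command)

    # Preferred case: the slug is the value that follows the -d flag.
    if "-d" in tokens:
        idx = tokens.index("-d")
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
        raise ValueError("No dataset name after '-d'")

    # Fallback (assumption): the first non-flag token shaped like owner/name.
    for token in tokens:
        if token.count("/") == 1 and not token.startswith("-"):
            return token
    raise ValueError(f"No dataset slug found in: {api_command!r}")
```

With the command used above, `parse_dataset_slug(API_COMMAND)` returns `'catherinerasgaitis/mxmh-survey-results'`, and `.split('/')[-1]` on that result gives the folder name as before.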
Signature:
kaggle.dataset_download_files(
dataset,
path=None,
force=False,
quiet=True,
unzip=False,
)
Docstring:
download all files for a dataset
Parameters
==========
dataset: the string identified of the dataset
should be in format [owner]/[dataset-name]
path: the path to download the dataset to
force: force the download if the file already exists (default False)
quiet: suppress verbose output (default is True)
unzip: if True, unzip files upon download (default is False)
## Download dataset
kaggle.dataset_download_files(dataset_name,
path=data_fpath,
force=True,
unzip=True)
## Get list of downloaded files
print(f"[i] Files from {dataset_name}:")
dl_files = glob.glob(data_fpath + "*")
for file in dl_files:
    print(f" - {file}")
[i] Files from catherinerasgaitis/mxmh-survey-results:
- Data/mxmh-survey-results/mxmh_survey_results.csv
import pandas as pd
df = pd.read_csv(dl_files[0])
df.head(3)
| | Timestamp | Age | Primary streaming service | Hours per day | While working | Instrumentalist | Composer | Fav genre | Exploratory | Foreign languages | ... | Frequency [R&B] | Frequency [Rap] | Frequency [Rock] | Frequency [Video game music] | Anxiety | Depression | Insomnia | OCD | Music effects | Permissions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8/27/2022 19:29:02 | 18.0 | Spotify | 3.0 | Yes | Yes | Yes | Latin | Yes | Yes | ... | Sometimes | Very frequently | Never | Sometimes | 3.0 | 0.0 | 1.0 | 0.0 | NaN | I understand. |
| 1 | 8/27/2022 19:57:31 | 63.0 | Pandora | 1.5 | Yes | No | No | Rock | Yes | No | ... | Sometimes | Rarely | Very frequently | Rarely | 7.0 | 2.0 | 2.0 | 1.0 | NaN | I understand. |
| 2 | 8/27/2022 21:28:18 | 18.0 | Spotify | 4.0 | No | No | No | Video game music | No | Yes | ... | Never | Rarely | Rarely | Very frequently | 7.0 | 7.0 | 10.0 | 2.0 | No effect | I understand. |
3 rows × 33 columns
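Reading `dl_files[0]` works here because the dataset contains a single CSV file. In general `glob.glob` returns matches in arbitrary order and a dataset folder may also hold non-CSV files, so it can be safer to select CSV files explicitly. A minimal sketch, with `list_csv_files` as a hypothetical helper name:

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the CSV files directly inside folder, sorted by name for a stable order."""
    return sorted(Path(folder).glob("*.csv"))
```

For this dataset, `pd.read_csv(list_csv_files(data_fpath)[0])` should load the same file as the cell above.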
- Follow the steps below to download and use Kaggle data within Google Colab (instructions adapted from "Using the Kaggle API on Colab"):
1. Go to your account, scroll to the API section, and click Expire API Token to remove previous tokens.
2. Click Create New API Token. This downloads a kaggle.json file to your machine.
3. Go to your Google Colab project file and run the following commands:
   1. Install the kaggle package:
      !pip install -q kaggle
   2. Upload the token, choosing the kaggle.json file that you downloaded:
      from google.colab import files
      files.upload();
   3. Make a directory named .kaggle and copy the kaggle.json file there:
      !mkdir ~/.kaggle
      !cp kaggle.json ~/.kaggle/
   4. Change the permissions of the file:
      !chmod 600 ~/.kaggle/kaggle.json
   5. Test the installation:
      !kaggle datasets list
   6. Remove the uploaded kaggle.json:
      !rm kaggle.json
# ## Install kaggle api and upload kaggle.json
# ! pip install kaggle
# from google.colab import files
# files.upload();
# ## Make .kaggle directory and copy json file
# ! mkdir ~/.kaggle
# ! cp kaggle.json ~/.kaggle/
# ## Change the permissions on the kaggle.json file
# !chmod 600 ~/.kaggle/kaggle.json
# ## Remove original uploaded copy of kaggle.json
# !rm kaggle.json
# ## Check list of available datasets
# !kaggle datasets list
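The Colab steps can also be sketched in plain Python. `files.upload()` returns a dictionary that maps each uploaded filename to its contents as bytes, so the token can be written to the .kaggle folder directly from that dictionary. `save_uploaded_token` is a hypothetical helper, and the sketch assumes the uploaded file is named exactly kaggle.json.

```python
import os
import stat
from pathlib import Path


def save_uploaded_token(uploaded, kaggle_dir=None):
    """Write uploaded kaggle.json bytes into the .kaggle folder with 600 permissions.

    `uploaded` is a dict of filename -> bytes, the shape returned by
    google.colab.files.upload(). Hypothetical helper for illustration.
    """
    if "kaggle.json" not in uploaded:
        raise KeyError("Expected a file named kaggle.json in the upload")
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    dest = kaggle_dir / "kaggle.json"
    dest.write_bytes(uploaded["kaggle.json"])
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # same as: chmod 600
    return dest


# Usage in Colab (not run here):
# from google.colab import files
# save_uploaded_token(files.upload())
```

The uploaded copy in the working directory would still need to be removed afterwards, as in the last step above.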