Kaggle Career Con 2019

Predict surface type for robot navigation in Kaggle Career Con 2019 competition.

Prerequisites

API credentials

To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (https://www.kaggle.com/<username>/account) and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials. Place this file in the location ~/.kaggle/kaggle.json (on Windows in the location C:\Users\<Windows-username>\.kaggle\kaggle.json).

For your security, ensure that other users of your computer do not have read access to your credentials. On Unix-based systems you can do this with the following command:

chmod 600 ~/.kaggle/kaggle.json
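With the credentials in place, the Kaggle CLI can fetch the competition data directly (assuming the career-con-2019 competition slug; adjust the target folder as needed):

kaggle competitions download -c career-con-2019 -p data/raw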

Installation guide

Set up conda environment

Using conda:

conda env create -f environment.yml
conda activate kaggle_career_con_2019

The packages necessary to run the project are now installed inside the conda environment.

Note: The following sections assume the conda environment is activated.

Set up project's module

To move beyond notebook prototyping, all reusable code should go into the src/ package. Install the project's module in editable mode, so you can edit files under src/ and immediately use them inside your notebooks:

pip install --editable .
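An editable install requires a packaging file at the project root. In the cookiecutter data science layout this is typically a minimal setup.py along the following lines (a sketch, not necessarily the exact file shipped in this repo):

from setuptools import find_packages, setup

# Declare src/ as an installable package so `pip install --editable .` works
setup(
    name='src',
    packages=find_packages(),
    version='0.1.0',
)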

To use the module inside your notebooks, enable the autoreload magic at the top of each notebook:

%load_ext autoreload
%autoreload 2

Example of module usage:

from src.data.make_dataset import generate
generate(10)
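For illustration only, a generate function of that shape could look like the sketch below; the real implementation lives in src/data/make_dataset.py and will differ:

import pandas as pd

def generate(n):
    # Hypothetical stand-in: return a DataFrame of n dummy rows
    return pd.DataFrame({'measurement': range(n)})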

Set up Git diff for notebooks and lab

We use nbdime for diffing and merging Jupyter notebooks.

To configure it for this git project:

nbdime config-git --enable

To enable the notebook extension:

nbdime extensions --enable --sys-prefix

Or, if you prefer full control, you can run the individual steps:

jupyter serverextension enable --py nbdime --sys-prefix

jupyter nbextension install --py nbdime --sys-prefix
jupyter nbextension enable --py nbdime --sys-prefix

jupyter labextension install nbdime-jupyterlab

You may need to rebuild JupyterLab afterwards: jupyter lab build
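Once configured, git diff renders cell-level diffs for .ipynb files. nbdime also provides standalone commands, e.g. (with placeholder notebook names):

nbdiff notebooks/before.ipynb notebooks/after.ipynb
nbdiff-web notebooks/before.ipynb notebooks/after.ipynb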

Set up Plotly for JupyterLab

Plotly works out of the box in the classic notebook, but JupyterLab needs additional extensions:

  • @jupyter-widgets/jupyterlab-manager # Jupyter widgets support
  • plotlywidget # FigureWidget support
  • @jupyterlab/plotly-extension # offline iplot support

These extensions have interdependent version requirements, so check the latest Plotly README to make sure you install compatible versions.

jupyter labextension install @jupyter-widgets/jupyterlab-manager@0.36 --no-build
jupyter labextension install plotlywidget@0.2.1  --no-build
jupyter labextension install @jupyterlab/plotly-extension@0.16  --no-build
jupyter lab build
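After the build, a quick smoke test in a JupyterLab notebook confirms the setup; a minimal FigureWidget example, assuming a Plotly 3.x install:

import plotly.graph_objs as go

# Renders inline only if the JupyterLab extensions above are installed correctly
fig = go.FigureWidget(data=[go.Scatter(x=[1, 2, 3], y=[4, 2, 5])])
fig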

Invoke command

We use Invoke to provide a single entry point for all project tasks.

To list all project tasks:

$ invoke -l

Available tasks:

  lab     Launch Jupyter lab

For help on a particular task:

$ invoke --help lab
Usage: inv[oke] [--core-opts] lab [--options] [other tasks here ...]

Docstring:
  Launch Jupyter lab

Options:
  -i STRING, --ip=STRING   IP to listen on, defaults to *
  -p INT, --port=INT       Port to listen on, defaults to 8888

You will find the definition of each task inside the tasks.py file, so you can add your own.
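As a starting point, the lab task shown above could be defined along these lines (a sketch of an Invoke task matching the help output, not necessarily the exact code in tasks.py):

from invoke import task

@task(help={'ip': 'IP to listen on, defaults to *',
            'port': 'Port to listen on, defaults to 8888'})
def lab(c, ip='*', port=8888):
    """Launch Jupyter lab"""
    c.run(f'jupyter lab --ip={ip} --port={port}')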

PS: we don't use a Makefile because some contributors work on Windows workstations, where installing make is cumbersome.

Project organization

├── tasks.py           <- Invoke tasks, e.g. `lab`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── environment.yml    <- The requirements file for reproducing the analysis environment
│
└── src                <- Source code for use in this project.
    ├── __init__.py    <- Makes src a Python module
    │
    ├── data           <- Scripts to download or generate data
    │   └── make_dataset.py
    │
    ├── features       <- Scripts to turn raw data into features for modeling
    │   └── build_features.py
    │
    ├── models         <- Scripts to train models and then use trained models to make
    │   │                 predictions
    │   ├── predict_model.py
    │   └── train_model.py
    │
    └── visualization  <- Scripts to create exploratory and results oriented visualizations
        └── visualize.py

Project based on the cookiecutter Kaggle project template. #cookiecutterdatascience