/kaggle-gearbox-analysis

Jupyter notebooks running from Kaggle configuration

Primary language: Jupyter Notebook. License: MIT.

Jupyter based on Scipy

The command below sets a series of environment variables that provide these features:

  • The user in the container is mapped to your user on the host machine
  • -w /home/jupyter and -e HOME=/home/jupyter are required so that notebooks are placed in the home path
docker run --restart always -p 8008:8888 --name jupyter-scipy  --user root \
-e NB_USER=$(whoami) -e NB_GROUP=RnD -e NB_UID=$(id -u) -e NB_GID=$(cut -d: -f3 < <(getent group RnD)) -e JUPYTER_ENABLE_LAB=yes \
-e HOME=/home/jupyter -e CHOWN_HOME_OPTS=-R -e CHOWN_HOME=yes -e GRANT_SUDO=yes  -e NB_UMASK=022 \
-w /home/jupyter -v $(pwd):/home/jupyter  jupyter/scipy-notebook

NOTE: in this example, $(whoami) = pablo

Jupyter based on Kaggle image

Source: Utilizing the Kaggle Python Docker Container image

0. Create data folder

The Docker container will mount this folder as its working directory.

mkdir data

1. Run the container based on the kaggle/python image:

docker run --restart always -v ${PWD}/data:/tmp/working -w=/tmp/working -p 8800:8888 --name kaggle \
   -d kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root

2. Check the container log to get the token for accessing Jupyter:

docker logs kaggle

CURRENT TOKEN:

40119a2f87c125c72f7603945ca6b1561e0fb9ed45929234

For example:

http://640b804c545b:8888/?token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93&token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93

  • Replace 640b804c545b with localhost or the IP of the machine where the Kaggle container is running.
  • Replace port 8888 (container) with 8800 (host)
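The two replacements above can also be done programmatically. The following is a minimal sketch, not part of this repository: the function name host_url and its default arguments are hypothetical, and it assumes the logged URL always carries a token query parameter (possibly repeated, as in the example above).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def host_url(logged_url, host="localhost", host_port=8800):
    """Rewrite a Jupyter URL from the container log so it works from the host.

    host_url is a hypothetical helper; host and host_port default to the
    values used in the docker run command above (-p 8800:8888).
    """
    parts = urlsplit(logged_url)
    # The log may repeat the token parameter; keep only the first value.
    token = parse_qs(parts.query)["token"][0]
    netloc = f"{host}:{host_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, urlencode({"token": token}), ""))


logged = (
    "http://640b804c545b:8888/"
    "?token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93"
    "&token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93"
)
print(host_url(logged))
```

Pass host="<machine IP>" instead of the default when Jupyter runs on a remote machine.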

All of the steps above can be run with the Bash script ./kaggle.sh

Using the Jupyter token

In the http line above:

token=40119a2f87c125c72f7603945ca6b1561e0fb9ed45929234

It is unclear why the following procedure does not set the password.

If you want to set a password for accessing Jupyter, after launching the container go to http://localhost:8800 (the host port mapped in step 1).

Enter your token and change the password.

3. Open a shell in the container

docker exec -it kaggle bash

4. GEARBOX FAULT ANALYSIS

4.1 Gearbox Fault logistic regression

  • Using the raw time series: AUC = 0.514
  • Using the standard deviation over sets of consecutive data points (AUC):
    • stdev every 10 data points: 0.717
    • stdev every 100 data points: 0.911
    • stdev every 1000 data points: 1.000
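To illustrate why aggregating consecutive points can raise the AUC, here is a minimal sketch on synthetic data. It is not the notebook's code: it assumes non-overlapping windows and the population standard deviation, uses two made-up Gaussian signals that differ only in spread, and scores each sample by the feature itself instead of fitting a logistic regression. The names windowed_stdev and auc are hypothetical.

```python
import random
import statistics


def windowed_stdev(signal, window):
    """Standard deviation of each non-overlapping block of `window` points."""
    n_blocks = len(signal) // window  # an incomplete trailing block is dropped
    return [
        statistics.pstdev(signal[i * window:(i + 1) * window])
        for i in range(n_blocks)
    ]


def auc(neg_scores, pos_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Synthetic stand-ins (assumption): same mean, different spread.
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(1000)]
faulty = [rng.gauss(0.0, 1.5) for _ in range(1000)]

print("raw points:", round(auc(healthy, faulty), 3))
for window in (10, 100):
    score = auc(windowed_stdev(healthy, window), windowed_stdev(faulty, window))
    print(f"stdev every {window} points:", round(score, 3))
```

When two signals share the same mean, individual raw values barely separate the classes, while the spread measured over a window does, and longer windows estimate that spread more reliably.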

4.2 Gearbox Fault ROC curve

  • Replicated from the ROC analysis of the PIMA dataset. ROC curve explained HERE
    • Interactive ROC plot that varies the threshold applied to the predicted probability distribution, for both:
      • Logistic regression
      • Random forest
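Each threshold on the predicted probabilities gives one point on the ROC curve. The following is a minimal sketch of that relationship, with made-up labels and probabilities; roc_point is a hypothetical helper, not a function from the notebooks, and it assumes both classes are present in the labels.

```python
def roc_point(y_true, y_prob, threshold):
    """Return (FPR, TPR) when predicting positive for prob >= threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Illustrative values only (assumption), not results from the gearbox data.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

for threshold in (0.0, 0.3, 0.5, 0.9):
    fpr, tpr = roc_point(y_true, y_prob, threshold)
    print(f"threshold={threshold}: FPR={fpr}, TPR={tpr}")
```

Sweeping the threshold from 0 to 1 moves the point from (1, 1) to (0, 0), which is what the interactive plot traces for each classifier.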

5. UBER LUDWIG EXAMPLE

Based on the Titanic dataset, copied into this one in my Kaggle profile

Pending tests from the command line:

  • ludwig experiment
  • ludwig visualize

More advanced examples with this dataset are available among the Uber Ludwig examples in its official repository.