Python Data Science Workspace

This repository contains my workspace for doing data science in Python.

Requirements

Anaconda or Miniconda
Apache Spark (with Hadoop)

Installation and setup

How to set up the workspace for the first time

If it does not already exist, create a conda environment:
```
 conda create -n data_science python=3.7
```
Activate the environment:
```
 source activate data_science
```

Set up the workspace:

 pip install -U pip numpy
 pip install -r requirements.txt
 python -m ipykernel install --user
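
Once the installation has finished, a quick check can confirm that the packages resolve inside the active environment. The following is a minimal sketch and not part of this repository: the helper name `missing_modules` is hypothetical, and the module list is an assumption based on the commands above (`numpy` and `ipykernel`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module list; extend it with entries from requirements.txt.
    missing = missing_modules(["numpy", "ipykernel"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Run it with the `data_science` environment activated, so that the check uses the interpreter of that environment.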

Set up Jupyter notebooks:

 jupyter contrib nbextension install --user
 jupyter nbextensions_configurator enable --user
 jupyter nbextension install https://github.com/drillan/jupyter-black/archive/master.zip --user
 jupyter nbextension enable jupyter-black-master/jupyter-black

Set up JupyterLab:

 jupyter labextension install jupyter-leaflet
 jupyter labextension install @jupyter-widgets/jupyterlab-manager
 jupyter labextension install @krassowski/jupyterlab_go_to_definition
 jupyter labextension install jupyterlab_bokeh
 jupyter labextension install ipysheet
 jupyter labextension install jupyterlab-drawio
 jupyter labextension install @jupyterlab/toc
 jupyter labextension install jupyterlab_vim

 jupyter labextension install @jupyterlab/git
 pip install jupyterlab-git
 jupyter serverextension enable --py jupyterlab_git

 jupyter labextension install @ryantam626/jupyterlab_code_formatter
 pip install jupyterlab_code_formatter
 jupyter serverextension enable --py jupyterlab_code_formatter

Reactivate the environment:

 source deactivate
 source activate data_science

Load the submodules:

 git submodule init
 git submodule update

How to use the workspace

Activate the environment (if not already activated in this session):
```
 source activate data_science
```

Set Spark environment variables:

 export SPARK_HOME=/opt/spark
 export PATH=$SPARK_HOME/bin:$PATH
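
If Spark does not start from a notebook, it can help to verify that these variables are visible to Python. The sketch below is illustrative only: the function name `spark_env_problems` is hypothetical, and it assumes the layout shown above, with the Spark executables under `$SPARK_HOME/bin`.

```python
import os


def spark_env_problems(env):
    """Return a list of human-readable problems with the Spark settings in `env`."""
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems
    # The export above puts $SPARK_HOME/bin on PATH, so look for that entry.
    spark_bin = os.path.join(spark_home, "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    if spark_bin not in path_entries:
        problems.append(f"{spark_bin} is not on PATH")
    return problems


if __name__ == "__main__":
    # os.environ reflects the variables exported in the current shell session.
    for problem in spark_env_problems(os.environ):
        print(problem)
```

The function takes the environment as a plain mapping, so it can be checked against `os.environ` or against a hand-written dictionary.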

Start Jupyter Notebook:

 jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000

How to update the workspace (after an upstream update)

Get the latest changes from upstream:
```
 git pull
```
Activate the environment (if not already activated in this session):
```
 source activate data_science
```
Update the dependencies:
```
 pip install -r requirements.txt
```

Reactivate the environment:

 source deactivate
 source activate data_science

Update submodules:

 git submodule init
 git submodule update

How to upgrade the workspace (upgrading Python packages)

Activate the environment (if not already activated in this session):
```
 source activate data_science
```

Upgrade the dependencies:

 pip-compile --upgrade
 pip install -r requirements.txt

Reactivate the environment:

 source deactivate
 source activate data_science

Facets

Facets is a tool for the visual exploration of datasets. It can be installed as follows:

 jupyter nbextension install facets/facets-dist/ --user

Then start Jupyter Notebook with an additional command-line option:

 --NotebookApp.iopub_data_rate_limit=10000000

The visualization can then be loaded as explained in the demo notebook.

Troubleshooting

GPU support for Jupyter

On Linux computers with NVIDIA Optimus, the Jupyter kernel has to be launched through "optirun" to be able to use GPU acceleration. To set this up, go to the following folder:

    cd ~/.local/share/jupyter/kernels/

Then edit the file python3/kernel.json to add "optirun" as the first entry in the argv array:

    {
        "language": "python",
        "display_name": "Python 3",
        "argv": [
            "optirun",
            "/home/fabien/.conda/envs/data_science/bin/python",
            "-m",
            "ipykernel",
            "-f",
            "{connection_file}"
        ]
    }
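
The same edit can be scripted instead of done by hand. This is a rough sketch under the assumption that the kernel spec is a plain JSON file with an `argv` list, as in the example above; the function names `with_optirun` and `patch_kernel_file` are hypothetical.

```python
import json
from pathlib import Path


def with_optirun(spec):
    """Return a copy of a kernel spec dict whose argv starts with "optirun"."""
    patched = dict(spec)
    argv = list(spec.get("argv", []))
    # Only prepend when missing, so applying the patch twice is harmless.
    if not argv or argv[0] != "optirun":
        argv.insert(0, "optirun")
    patched["argv"] = argv
    return patched


def patch_kernel_file(path):
    """Rewrite the kernel.json at `path` in place with the patched spec."""
    path = Path(path)
    spec = json.loads(path.read_text())
    path.write_text(json.dumps(with_optirun(spec), indent=4) + "\n")
```

Calling `patch_kernel_file` with the path to python3/kernel.json inside the folder named above would apply the change. Back up the file first, since it is rewritten in place.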

Interesting notebook extensions

I recommend installing the following notebook extensions:

Code prettify
Codefolding
Collapsible Headings
contrib_nbextensions_help_item
Execute time
Initialization cells
Jupyter Black
Nbextensions dashboard tab
Nbextensions edit menu item
Notify
Python Markdown
Runtools
ScrollDown
Skip-Traceback
spellchecker
table_beautifier
Table of Contents (2)
Tree Filter
VIM binding

StreakyCobra/data-science-workspace
