/python_kickstart

How to kickstart your Python code


KICKSTART YOUR MACHINE LEARNING PROJECT IN PYTHON

Part 1: Cookiecutter, git and virtualenv

Cookiecutter

  1. Install cookiecutter in your system/user python profile (not a virtual environment).

    $ pip install --user cookiecutter
  2. Navigate the file system to your code folder (e.g. path/to/repos_folder). This is the parent folder of your code. NB: cookiecutter will create a new folder project_name with everything inside, so your actual code will live in a subfolder, i.e. path/to/repos_folder/project_name/src/. Then run cookiecutter with the link to the data-science template and answer the questions it asks you:

    $ cd Documents/xxx/xxx/Code/
    $ cookiecutter https://github.com/drivendata/cookiecutter-data-science
    # answer the prompts, using project name: kickstart_python
    # once it is finished, cd into the folder kickstart_python
    $ cd kickstart_python

    From now on, our current directory will be path/to/kickstart_python/ unless specified otherwise.
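
    For orientation, the template creates roughly the following layout (the exact folders depend on the cookiecutter-data-science version you get, so treat this as an approximation):

    kickstart_python/
    ├── data/              # raw, interim and processed data
    ├── docs/
    ├── models/            # trained / serialized models
    ├── notebooks/
    ├── reports/
    ├── requirements.txt   # pip dependencies of the project
    ├── src/               # your source code (data, features, models, visualization)
    └── ...                # plus README, Makefile, setup.py, .gitignore, etc.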

Virtual Environment pt1

  1. Set up a virtual environment named venv, specifying the python version. venv is a common convention; you could call it remi if you want. This command creates a folder named venv containing the interpreter and a local copy of all the packages you will pip-install from now on.

    $ virtualenv venv -p python3
  2. Edit the .gitignore by adding the virtualenv's folder with your favorite text editor, or just run the following command:

    $ echo venv >> .gitignore

GIT

  1. Set up git and link it to a new GitHub/GitLab repository:

    1. On github.com or Inria's GitLab create an empty repository online (that means no README and no license; if you do so, the site will display useful setup commands).
    2. Start git locally and sync it with the following commands:
    $ git init
    # check we are not 'saving' weird files
    $ git status
    # if everything looks fine, commit
    $ git add .
    $ git commit -m "first commit"
    
    # If github
    $ git remote add origin https://github.com/USER/python_kickstart.git
    # If inria gitlab
    $ git config --global user.name "your_name"
    $ git config --global user.email "your_email@inria.fr"
    $ git remote add origin git@gitlab.inria.fr:USER/python_kickstart.git
    
    $ git push -u origin master
    # avoid typing login and password every time in the future
    $ git config credential.helper store

Virtual Environment pt2

  1. Activate the virtualenv

    [user@localhost] project_name/ $ source venv/bin/activate
    # check that it is activated: you should have (venv) at the beginning of your command line
    (venv) [user@localhost] project_name/ $
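
    A quick optional sanity check: once the venv is active, the Python interpreter in use lives inside the venv folder.

    (venv) $ python -c "import sys; print(sys.prefix)"
    # should print a path ending in .../kickstart_python/venv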
  2. Install the basic dependencies of the cookiecutter template (if you want). Notice that doing so also installs the src package by default. Then install your everyday favorite packages: numpy, matplotlib, jupyter

    (venv) $ pip install -r requirements.txt
    (venv) $ pip install numpy matplotlib jupyter
  3. Freeze the requirements ('>' overwrites, '>>' appends)

    (venv) $ pip freeze >> requirements.txt
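
    The frozen file pins exact versions for every installed package, something like the excerpt below (the version numbers are only an illustration, yours will differ):

    matplotlib==3.7.1
    numpy==1.24.2
    jupyter==1.0.0
    ...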
  4. Install the package for a toy example

    (venv) $ pip install scikit-learn
  5. In src/ create the main.py file and paste the following code:

    from numpy.random import permutation
    from sklearn import svm, datasets
    
    C = 1.0
    gamma = 0.7
    iris = datasets.load_iris()
    perm = permutation(iris.target.size)
    iris.data = iris.data[perm]
    iris.target = iris.target[perm]
    model = svm.SVC(C=C, kernel='rbf', gamma=gamma)
    model.fit(iris.data[:90],
            iris.target[:90])
    print(model.score(iris.data[90:],
                    iris.target[90:]))
  6. commit the changes

    $ git add .
    $ git commit -m 'toy svm'
  7. Edit the models/train_model.py and models/predict_model.py files. In both of these files (actually python modules) create a new function, as follows. In ./src/models/train_model.py:

    from sklearn import svm
    def train(data, target, C, gamma):
        clf = svm.SVC(C=C, kernel='rbf', gamma=gamma)
        clf.fit(data, target)
        return clf

    In ./src/models/predict_model.py:

    def predict(clf, data, target):
        return clf.score(data, target)
  8. Update the main file to use the new functions. In src/main.py add the following imports:

    from models.predict_model import predict
    from models.train_model import train

    Now the main code should look like:

    # std imports
    from numpy.random import permutation
    from sklearn import datasets
    # my imports
    from models.predict_model import predict
    from models.train_model import train
    
    C = 1.0
    gamma = 0.7
    iris = datasets.load_iris()
    per = permutation(iris.target.size)
    iris.data = iris.data[per]
    iris.target = iris.target[per]
    model = train(iris.data[:90], iris.target[:90], C, gamma)
    score = predict(model, iris.data[90:], iris.target[90:])
    print(score)
  9. Run and debug

    (venv) $ python src/main.py

Sacred

  1. PIP-install Sacred for tracking experiments

    (venv) $ pip install sacred pymongo
  2. Create a new function for the parameters C and gamma and add the decorators for Sacred (a full sketch of the resulting main.py follows the snippet below):

    # ... keep the previous imports and add the following
    from sacred import Experiment
    ex = Experiment('iris_svm')  # name of the experiment
    
    @ex.config
    def cfg():
        C = 1.0
        gamma = 0.7
    
    @ex.automain
    def run(C, gamma):
        # ...
        # ... paste here the main
        #...
        return score
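
    Putting the pieces together, src/main.py should now look roughly like the sketch below (assembled from the snippets above; Sacred injects C and gamma from the config into run):

    # std imports
    from numpy.random import permutation
    from sklearn import datasets
    # my imports
    from models.predict_model import predict
    from models.train_model import train
    # sacred imports
    from sacred import Experiment

    ex = Experiment('iris_svm')  # name of the experiment

    @ex.config
    def cfg():
        C = 1.0
        gamma = 0.7

    @ex.automain
    def run(C, gamma):
        # load and shuffle the data
        iris = datasets.load_iris()
        per = permutation(iris.target.size)
        iris.data = iris.data[per]
        iris.target = iris.target[per]
        # train on the first 90 samples, score on the remaining 60
        model = train(iris.data[:90], iris.target[:90], C, gamma)
        score = predict(model, iris.data[90:], iris.target[90:])
        print(score)
        return score

    As a bonus, Sacred lets you override the config values from the command line, e.g. (venv) $ python src/main.py with C=10.0 gamma=0.2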
  3. run it from the project's root directory

    (venv) $ python src/main.py

MongoDB and Omniboard

  1. Install MongoDB on your system. In a new terminal:

    $ sudo dnf install mongodb mongodb-server
    # start service
    $ sudo service mongod start
    # verify it is working
    $ mongo  # it will start the mongo-db-shell
  2. Run and re-run the code as many times as you want with the database flag:

    (venv) $ python src/main.py -m MY_IRIS_EXP

    Notice how the ID value increases at each run.

  3. In a mongo shell (just run mongo in the command line) check that the MY_IRIS_EXP database exists:

    $ mongo
    # after in the mongo shell
    > show dbs
    # look for MY_IRIS_EXP entry
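
    You can also peek at the stored runs directly from Python with pymongo (installed earlier). A minimal sketch, assuming MongoDB listens on the default localhost:27017 and that Sacred's MongoObserver uses its default 'runs' collection:

    from pymongo import MongoClient

    client = MongoClient('localhost', 27017)
    db = client['MY_IRIS_EXP']
    # one document per run: _id, config, status, result, ...
    for run in db.runs.find():
        print(run['_id'], run.get('config'), run.get('result'))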
  4. Download and install Omniboard, the Sacred+MongoDB frontend:

    # in a new terminal
    $ sudo npm install -g omniboard
  5. In the same terminal, run the Omniboard server:

    $ omniboard -m localhost:27017:MY_IRIS_EXP
  6. Go to http://localhost:9000 to access the Omniboard frontend.

  7. play with it

Experiment metrics and omniboard visualization

  1. Add a metric. In the main.py file add:

    @ex.automain
    def run(C, gamma):
        ... # the code before
        ex.log_scalar("val.score", score)
        return score
  2. And what about a typical loss function in a for loop? For instance, add the following lines. We need to pass the special _run object to the main function (and import numpy as np for the toy loss):

    @ex.automain
    def run(_run, C, gamma):
        ... # the code before
        my_loss = 0
        for i in range(20):
            # log the value with an explicit step counter (0, 1, 2, ...),
            # incremented at each call
            _run.log_scalar("training.loss", my_loss, i)
            my_loss += 1.5 * i + np.random.random()
        return score
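
    The logged series also ends up in the database. A small sketch of reading it back with pymongo (assuming Sacred's default 'metrics' collection, where each document stores one metric series per run):

    from pymongo import MongoClient

    client = MongoClient('localhost', 27017)
    db = client['MY_IRIS_EXP']
    for metric in db.metrics.find({'name': 'training.loss'}):
        # each document carries parallel lists of steps and values
        print(metric['run_id'], list(zip(metric['steps'], metric['values'])))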
  3. run some experiments

  4. play in omniboard