
Deep Learning Workshop : Including a VirtualBox VM with pre-configured Jupyter, Theano, Tensorflow, models and data

Primary language : Jupyter Notebook. License : MIT.

Deep Learning Workshop

This repo includes all scripts required to build a VirtualBox 'Appliance' (an easy-to-install pre-configured VM) that can be used by Deep Learning Workshop participants.

This workshop consists of an introduction to deep learning : starting from single-layer networks in the browser, then using the VM/Jupyter setup to train networks using both Theano (+ Lasagne for model components) and TensorFlow (+ some sugar layers). The modules also include pretrained state-of-the-art networks, such as GoogLeNet, in various applications. Versions of the workshop have been presented at :

  • FOSSASIA 2016 : Deep Learning Workshop (2 hours)

    • Application : Generative art (~style transfer)
    • Application : Classifying unknown classes of images (~transfer learning)
    • Slides for the talk are here, with an accompanying blog post
  • PyCon-SG 2016 : Deep Learning Workshop (1.5 hours)

    • Unfortunately, due to 'demand' for speaker slots, PyCon has only scheduled 1h30 for the workshop, rather than the 3h00 they originally suggested...
    • Application : Reinforcement Learning
    • Slides for the talk are here, with an accompanying blog post, which includes a video link
  • DataScienceSG MeetUp : 'Hardcore' session about Deep Learning (2.5 hours)

    • Application : Anomaly Detection (mis-shaped MNIST digits)
    • Application : Classifying unknown classes of images (~transfer learning)
    • Slides for the talk are here, with an accompanying blog post, which includes a video link
  • Fifth Elephant, India : Deep Learning Workshop (6 hours : 4x 1.5hr classes in one day)

    • Application : Classifying unknown classes of images (~transfer learning)
    • Application : Generative art (~style transfer)
    • Application : RNN Tagger
    • Application : RNN Fun (work-in-progress)
    • Application : Anomaly Detection (mis-shaped MNIST digits)
    • Application : Reinforcement Learning
    • Slides for the talk are here, with an accompanying blog post
  • PyDataSG MeetUp : Talk on RNNs and NLP (1.5 hours)

    • Application : RNN Tagger (cleaned up a little)
    • Slides for the talk are here, with an accompanying blog post, which includes a video link
  • TensorFlow & Deep Learning MeetUp : Talk on transfer learning (0.5 hours)

    • Application : Classifying unknown classes of images (~transfer learning) in TensorFlow
    • Slides for the talk are here, with an accompanying blog post, which includes a video link
  • FOSSASIA 2017 : Deep Learning Workshop (1 hour)

    • Application : Speech Recognition using a CNN
    • Slides for the talk are here, with an accompanying blog post, which includes a video link
  • TensorFlow & Deep Learning MeetUp : Talk on CNNs (0.5 hours)

  • TensorFlow & Deep Learning MeetUp : Generative Art : Style-Transfer (0.5 hours)

    • Application : Generative Art (Style-Transfer)
    • Slides for the talk are here
  • APAC Machine Learning & Data Science Community Summit : In the news : AlphaGo and Reinforcement Learning (0.75 hours)

    • Application : Bubble-Breaker in TensorFlow / Keras
    • Slides for the talk are here with an accompanying blog post
  • TensorFlow & Deep Learning MeetUp : Text : Embeddings, RNNs and NER (~1 hour)

    • Application : BiDirectional RNNs for Case-Insensitive NER
    • Slides for the talk (including a more general introduction to NLP) are here with an accompanying blog post, which includes a video link
  • TensorFlow & Deep Learning MeetUp : Advanced Text and Language (0.75 hours)

    • Application : Image Captioning (Flickr30k)
    • Slides for the talk are here with an accompanying blog post, which includes a video link

NB : Ensure that the conference workshop announcement / blurb includes a VirtualBox warning label

  • Also : for the Art (and potentially other image-focussed) modules, having a few 'personal' images available might be entertaining

The VM itself includes :

  • Jupyter (IPython's successor)
    • Running as a server available to the host machine's browser
  • Data
    • MNIST training and test sets
    • Trained models from two of the 'big' ImageNet winners
    • Test images for the recognition, 'e-commerce' and style-transfer modules
    • Corpora and pretrained GloVe embeddings for the language examples
  • Tool chain (Python-oriented)
    • Theano / Lasagne
    • TensorFlow is included, and will become more prominent going forward
      • (In the past ?) TensorFlow demanded far more RAM than Theano, and we can't assume that the VM will be allocated more than 2 GB
      • Keras, plus other 'sugars', are also installed

And this repo can itself be run in 'local mode', using scripts in ./local/ to :

  • Set up the virtual environment correctly
  • Run Jupyter with the right flags, paths, etc.

Status : Workshop WORKS!

Currently working well

  • Scripts to create a working Fedora 25 installation inside the VM

    • Has working Python3.x virtualenv with Jupyter and TensorFlow / TensorBoard
  • Script to transform the VM into a VirtualBox appliance

    • Exposing Jupyter, TensorBoard and ssh to host machine
  • Locally hosted Convnet.js for :

    • Demonstration of gradient descent ('painting')
  • Locally hosted TensorFlow Playground for :

    • Visualising hidden layer, and effect of features, etc
  • Locally hosted cnn demo for :

    • Demonstration of how a single CNN 3x3 filter works
  • Existing workshop notebooks :

    • Basics
    • MNIST
    • MNIST CNN
    • ImageNet : GoogLeNet
    • ImageNet : Inception 3
    • CNN for simple Voice Recognition
    • 'Anomaly Detection' - identifying mis-shaped MNIST digits
    • 'Commerce' - repurpose a trained network to classify our stuff
    • 'Art' - Style transfer with Lasagne, but using GoogLeNet features for speed
    • 'Reinforcement Learning' - learning to play "Bubble Breaker"
    • 'RNN-Tagger' - Processing text, and learning to do case-less Named Entity Recognition
  • Notebook Extras

    • U - VM Upgrade tool
    • X - BLAS configuration fiddle tool
    • Z - GPU chooser (needs Python's BeautifulSoup)
  • Create rsync-able image containing :

    • VirtualBox appliance image
      • including data sets and pre-trained models
    • VirtualBox binaries for several likely platforms
    • Write to thumb-drives for actual workshop
      • and/or upload to DropBox
  • Workshop presentation materials
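
As a rough illustration of what the locally hosted CNN demo listed above is meant to convey, the following sketch applies a single 3x3 filter to a tiny image in plain Python. It is not code from this repo : the function name `apply_3x3_filter`, the example kernel and the example image are all made up for illustration.

```python
# Hypothetical illustration only; not taken from the workshop notebooks.

def apply_3x3_filter(image, kernel):
    """Slide a 3x3 kernel over a 2D image ('valid' region only, stride 1).

    Like most deep-learning frameworks, this computes a cross-correlation :
    the kernel is applied as-is, without being flipped.
    """
    height, width = len(image), len(image[0])
    output = []
    for row in range(height - 2):
        output_row = []
        for col in range(width - 2):
            # Weighted sum of the 3x3 patch whose top-left corner is (row, col)
            total = 0
            for i in range(3):
                for j in range(3):
                    total += image[row + i][col + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output


# A vertical-edge detector : responds where brightness increases left-to-right
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# 4x5 image : dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

feature_map = apply_3x3_filter(image, vertical_edge)
print(feature_map)  # prints [[0, 3, 3], [0, 3, 3]]
```

The output is zero over the flat dark region and non-zero wherever the 3x3 window straddles the dark-to-bright boundary, which is the behaviour a single edge-detecting filter in a CNN layer exhibits.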

Still Work-in-Progress

  • Create sync-to-latest-workbooks script to update existing (taken-home) VMs

  • Create additional 'applications' modules (see 'ideas.md')

  • Monitor TensorFlow - to see whether it reduces its memory footprint enough to switch from Theano...

  • 'RNN-Fun' - Discriminative and Generative RNNs

Notes

Running the environment locally

See the local/README file.

Git-friendly iPython Notebooks

Using the code from : http://pascalbugnion.net/blog/ipython-notebooks-and-git.html (and https://gist.github.com/pbugnion/ea2797393033b54674af ), you can have git strip the outputs from notebooks as they are committed. This can be enabled for just this one repository, rather than installed globally, as follows...

Within the repository, run :

# Set the permissions for execution :
chmod 754 ./bin/ipynb_optional_output_filter.py

git config filter.dropoutput_ipynb.smudge cat
git config filter.dropoutput_ipynb.clean ./bin/ipynb_optional_output_filter.py

This will add suitable entries to ./.git/config.

Alternatively, create the entries manually by ensuring that your .git/config includes the lines :

[filter "dropoutput_ipynb"]
	smudge = cat
	clean = ./bin/ipynb_optional_output_filter.py

Note also that this repo includes a <REPO>/.gitattributes file containing the following:

*.ipynb    filter=dropoutput_ipynb

Doing this causes git to run ipynb_optional_output_filter.py in the <REPO>/bin directory on every *.ipynb file as it is staged. The script needs only Python's built-in json module to parse the notebook files, and so can be executed as a plain script.

To disable the output cleansing on a per-notebook basis, simply add the following to the notebook's metadata (Edit-Metadata) as a first-level entry (true is the default) :

  "git" : { "suppress_outputs" : false },
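
To make the behaviour described above concrete, here is a minimal sketch of how such a clean filter could work. It is not the actual ./bin/ipynb_optional_output_filter.py : the function names `clean_notebook` and `run_filter` are hypothetical, and the sketch assumes notebooks in the nbformat 4 layout (a top-level "cells" list).

```python
import io
import json


def clean_notebook(notebook):
    """Return the notebook dict with code-cell outputs removed.

    Outputs are left untouched when the notebook metadata contains
    "git": {"suppress_outputs": false}; suppression is the default.
    """
    git_options = notebook.get("metadata", {}).get("git", {})
    if not git_options.get("suppress_outputs", True):
        return notebook

    # Assumption : nbformat 4, where cells live in a top-level list
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    return notebook


def run_filter(in_stream, out_stream):
    """Read notebook JSON from in_stream and write the cleaned JSON to out_stream."""
    notebook = json.load(in_stream)
    json.dump(clean_notebook(notebook), out_stream, indent=1, sort_keys=True)
    out_stream.write("\n")


# Demonstration on an in-memory notebook.
# When used as a git filter, the streams would be sys.stdin and sys.stdout.
example = {
    "metadata": {},
    "nbformat": 4,
    "cells": [
        {"cell_type": "code", "source": ["1 + 1"], "execution_count": 7,
         "outputs": [{"output_type": "execute_result"}]},
    ],
}
cleaned_text = io.StringIO()
run_filter(io.StringIO(json.dumps(example)), cleaned_text)
cleaned = json.loads(cleaned_text.getvalue())
print(cleaned["cells"][0]["outputs"])  # prints []
```

Because git pipes the staged file through the filter's standard input and stores whatever the filter writes to standard output, the working copy keeps its outputs while the committed version does not.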

Useful resources