MachineLearningInJulia2020

Resources for a 3.5 hour workshop on machine learning using the MLJ toolbox


Machine Learning in Julia using MLJ, JuliaCon2020

Now updated for MLJ version 0.16 and Julia 1.6. However, the Binder notebook will not work until this Binder issue is resolved.

Interactive tutorials for a workshop introducing the machine learning toolbox MLJ (v0.14.4)


These tutorials were prepared for use in a 3 1/2 hour online workshop at JuliaCon2020, recorded here. Their main aim is to introduce the MLJ machine learning toolbox to data scientists.

Differences from the original resources are minor (main difference: @load now returns a type instead of an instance). However, if you wish to access resources precisely matching those used in the video, switch to the JuliaCon2020 branch by clicking here.
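
To make the `@load` change concrete, here is a minimal sketch of the newer behaviour. It assumes MLJ 0.16 or later, with the DecisionTree model interface available in the active environment. The choice of `DecisionTreeClassifier` and the variable names `Tree` and `tree` are illustrative only and are not taken from the tutorials.

```julia
using MLJ

# In current MLJ versions, `@load` returns the model *type*.
# (In the version used in the video it returned a ready-made instance.)
Tree = @load DecisionTreeClassifier pkg=DecisionTree

# An instance is therefore constructed explicitly,
# optionally passing hyper-parameters as keyword arguments:
tree = Tree(max_depth=3)
```

If you follow the video with a newer MLJ, adding the explicit constructor call (`Tree()` here) after each `@load` is usually the only adjustment needed.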

Future revisions of these tutorials will appear here.

Topics covered

Basic

  • Part 1 - Data Representation

  • Part 2 - Selecting, Training and Evaluating Models

  • Part 3 - Transformers and Pipelines

Advanced

  • Part 4 - Tuning hyper-parameters

  • Part 5 - Advanced model composition (as time permits)

The tutorials include links to external resources and exercises with solutions.

Options for running the tutorials

1. Plug-and-play

Only recommended for users with little Julia experience or users having problems with the other options.

Use this option if you have never run a Julia/Jupyter notebook on your local machine and have never used a Julia IDE to run a Julia script.

Pros

One click. No need to install anything on your local machine.

Cons

  • The (automatic) setup can take a little while, sometimes over 15 minutes (but you do get a static version of the notebook while it loads).

  • You will have to start over if:

    • The notebook drops your connection for some reason.
    • You are inactive for ten minutes.

Instructions

Click this button: Binder

2. Clone the repo and choose your preferred interface

Assumes that you have a working installation of Julia 1.3 or higher and that either:

  • You can run Julia/Jupyter notebooks on your local machine without problems; or

  • You are comfortable running Julia scripts from an IDE, such as Juno or Emacs (see here for a complete list).

Pros

More stable option

Cons

You need to meet the above requirements.

Instructions

  • Clone this repository

  • Change to your local repo directory "MachineLearningInJulia2020/"

  • Either run the Jupyter notebook called "tutorials.ipynb" from that directory (corresponding to this file on GitHub) or open "tutorials.jl" from that directory in your favourite IDE (corresponding to this file on GitHub). You cannot download these files individually; you need the whole directory.

  • Immediately evaluate the first two lines of code to activate the package environment and pre-load the packages, as this can take a few minutes.
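
The tutorial files contain their own setup lines, and those are the ones to run. For orientation only, activating a Julia package environment generally looks like the sketch below. It assumes the directory contains a `Project.toml` file. The helper name `setup_environment` is hypothetical and is not part of the tutorials.

```julia
using Pkg

# Hypothetical helper: activate the package environment stored in `dir`
# and install the package versions it records.
function setup_environment(dir::AbstractString = pwd())
    # Assumption: the environment is described by a Project.toml in `dir`.
    isfile(joinpath(dir, "Project.toml")) ||
        error("No Project.toml found in $dir; change to the repo directory first.")
    Pkg.activate(dir)    # make this the active environment
    Pkg.instantiate()    # download and precompile its dependencies (can take a few minutes)
    return dir
end
```

Calling `setup_environment()` from inside your local "MachineLearningInJulia2020/" directory would then prepare the environment before any tutorial code is evaluated, which is why the setup step is best started immediately.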

More about the tutorials

  • The tutorials focus on the machine learning part of the data science workflow, and less on exploratory data analysis and other conventional "data analytics" methodology

  • Here "machine learning" is meant in a broad sense, and is not restricted to so-called deep learning (neural networks)

  • The tutorials are crafted to rapidly familiarize the user with what MLJ can do and how to do it, and are not a substitute for a course on machine learning fundamentals. Examples do not necessarily represent best practice or the best solution to a problem.

Binder notebook for stacking demo used in video

Binder