This three-hour lab was designed for NASSLLI 2018 and the 2017 Jelinek Summer School at Carnegie Mellon University. Students learn how to implement online learning (i.e., stochastic gradient descent) efficiently through a choose-your-own-adventure style exercise. The task is language identification, and two datasets are provided. The lab starts from a very basic implementation of streaming multinomial logistic regression and improves it to near state-of-the-art performance.
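For orientation, here is a minimal sketch of what streaming multinomial logistic regression can look like for language identification. It is not the lab's starter code: the character-bigram features, the class and function names (`char_ngrams`, `StreamingLogReg`, `train`), the learning rate, and the toy sentences are all assumptions made up for illustration. The model sees one example at a time and takes one gradient step per example.

```python
import math


def char_ngrams(text, n=2):
    """Hypothetical feature extractor: character n-grams of the padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class StreamingLogReg:
    """Multinomial logistic regression trained one example at a time."""

    def __init__(self, labels, lr=0.1):
        self.labels = list(labels)
        self.lr = lr  # assumed constant learning rate
        # One sparse weight vector per label, stored as feature -> weight.
        self.w = {y: {} for y in self.labels}

    def probs(self, feats):
        # Score each label by summing the weights of the active features.
        scores = {y: sum(self.w[y].get(f, 0.0) for f in feats)
                  for y in self.labels}
        # Subtract the max score before exponentiating for numerical stability.
        m = max(scores.values())
        exps = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def update(self, feats, gold):
        """Take one SGD step on a single example and return its log loss."""
        p = self.probs(feats)
        for y in self.labels:
            # Gradient of the log-likelihood w.r.t. the score of label y.
            g = (1.0 if y == gold else 0.0) - p[y]
            for f in feats:  # repeated n-grams contribute once per occurrence
                self.w[y][f] = self.w[y].get(f, 0.0) + self.lr * g
        return -math.log(p[gold])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)


def train(data, epochs=20):
    """Stream over the data `epochs` times; return the model and mean loss per epoch."""
    model = StreamingLogReg(sorted({label for _, label in data}))
    losses = []
    for _ in range(epochs):
        total = sum(model.update(char_ngrams(text), label)
                    for text, label in data)
        losses.append(total / len(data))
    return model, losses


# Toy data invented for this sketch; the lab ships its own datasets.
TOY_DATA = [
    ("the cat sat on the mat", "en"),
    ("where is the library", "en"),
    ("el gato está en la casa", "es"),
    ("dónde está la biblioteca", "es"),
]

model, losses = train(TOY_DATA)
print(model.predict(char_ngrams("the cat sat on the mat")))
```

Storing the weights in dictionaries keeps each update proportional to the number of features in the current example rather than to the size of the vocabulary, which is one reason a streaming formulation can be made efficient.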
See the PDF writeup for detailed instructions.
To download the data and install the dependencies, run:
cd py
make setup