ml-bigdata-fp

Links to useful ML, Big Data, Functional programming resources

Machine Learning

Andrew Ng's classic ML course - probably the best default course for beginners

Udacity Machine Learning

Understanding Machine Learning: From Theory to Algorithms - introductory book on ML. Very clear on theory, with exercise sections for each subsection. A free copy is available online.

Foundations of Data Science - advanced, covers the mathematical aspects of ML, especially ML in Big Data settings. The first chapter covers the theory behind the 'curse of dimensionality'.

Bayesian Reasoning and Machine Learning - ML from a Bayesian inference standpoint. The derivation of regularization in this framework is particularly valuable. Also covers structured models and variational inference.
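
As a rough illustration of why the Bayesian view of regularization is worth reading, here is the standard textbook argument in compact form. The symbols (weights w, data D, prior precision α) are chosen for this sketch and are not necessarily the book's notation. Maximizing the posterior with a zero-mean Gaussian prior on the weights gives the same objective as L2-regularized maximum likelihood:

```latex
\begin{aligned}
\hat{w}_{\mathrm{MAP}}
  &= \arg\max_{w} \; p(w \mid D)
   = \arg\max_{w} \; \bigl[ \log p(D \mid w) + \log p(w) \bigr], \\
p(w) &= \mathcal{N}\!\left(w \mid 0, \alpha^{-1} I\right)
  \;\Longrightarrow\;
  \log p(w) = -\frac{\alpha}{2} \lVert w \rVert_2^2 + \mathrm{const}, \\
\hat{w}_{\mathrm{MAP}}
  &= \arg\min_{w} \; \Bigl[ -\log p(D \mid w) + \frac{\alpha}{2} \lVert w \rVert_2^2 \Bigr].
\end{aligned}
```

Under these assumptions the regularization strength is simply the precision of the prior: a tighter prior (larger α) penalizes large weights more heavily.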

Deep Learning

Hinton's Neural Networks for Machine Learning - Coursera. A really comprehensive introductory course on Deep Learning and one of the most theory-oriented MOOCs. The only drawback is that the course hasn't been updated since its first iteration in 2013. Contains some programming exercises using Matlab/Octave. See here for code setup.

Andrew Ng's Deep Learning Coursera Specialization - really thorough course series, covers both theoretical and practical aspects. Video lectures and Coursera-hosted Jupyter Notebooks for this course are available for free.

Udacity Deep Learning Nanodegree Notebooks - cover a whole range of topics, from basics to GANs. Uses TensorFlow.

MXNet - the straight dope - notebook-style deep learning course (covers both theory and practice). Uses MXNet, a pretty interesting framework that supports both static computation graphs and dynamic computations. It also covers Gluon, a high-level library for writing neural networks (comparable to Keras, with an API similar to scikit-learn's).

Kadenze Creative Applications of Deep Learning with TensorFlow - covers interesting applications of DL like image painting.

Deep Learning Implementations and Frameworks - comparison of DL frameworks. Covers Torch.nn, Theano, Caffe, Chainer, MXNet, TensorFlow and PyTorch. It is particularly useful for someone who knows one of these frameworks and wants to learn another.

NLP/Information retrieval

Speech and Language Processing (3rd ed draft) - introductory material on NLP. The 3rd edition has much larger sections on machine learning and neural network methods. Theory-oriented, and one of the few textbooks covering word embeddings.

Natural Language Processing with Python (NLTK book) - introductory book using NLTK. Doesn't go as deep as S&LP, but contains lots of programming exercises and is more beginner-friendly.

Introduction to Information Retrieval - classic IR book. Contains information ranging from index definition and construction to machine learning in search.

Hands-on Text Mining and Analytics - pretty introductory, but covers useful Java libraries and provides reference articles. See here for a Maven project template.

Big Data

Data Science at Scale - basic Big Data concepts. Contains interesting exercises, for example writing MapReduce programs.

Introduction to Big Data & Scalable Machine Learning - great introductory courses on Apache Spark. They use Databricks Cloud, a notebook environment with Spark already set up. Lots of exercises using PySpark.

Big Data University's courses - not as good as those on Coursera or edX, and the courses are Hadoop-centered.
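
For readers who have not seen the MapReduce style of exercise mentioned under Data Science at Scale, here is a minimal single-machine sketch of the idea in plain Python. It is not taken from any of the courses; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data at scale"]
    print(word_count(docs))
```

The point of the split is that `map_phase` and `reduce_phase` only ever see one document or one key at a time, which is what lets a framework run them in parallel.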

Functional Programming

Scala Specialization on Coursera - a remake of the classic Coursera Scala courses.

Scala by example by Martin Odersky - really concise book on Scala and some FP. I found it really handy as a supplementary resource for the Coursera Scala courses.

Introduction to Type Theory lecture notes