kaggle-berlin

Material of the Kaggle Berlin meetup group!

Collection of Sources

If you want a comprehensive introduction to the field, you will find decent advice [here]. Note that this is a guide for AI safety, yet the areas it outlines, along with the books and sources it lists, are fairly decent.

Here is a small, but growing, collection of sources that we have been discussing in our hack sessions.

Star ratings range from ⭐ to ⭐⭐⭐⭐⭐ and are subject to discussion in the Kaggle group.

Tutorials

[0] Nicolas P. Rougier, Python & Numpy [link] (Outstanding Numpy introduction for scientists and optimizers) ⭐⭐⭐⭐⭐

[1] Sebastian Ruder, gradient descent methods [link] (If you are wondering what stochastic gradient descent, Nesterov momentum, Adam, ... are all about) ⭐⭐⭐

[2] Scikit-learn documentation [link] (Absolutely great read to start learning about specific topics. Tons of superb example code. When I am bored I spend time here!) ⭐⭐⭐⭐⭐

[3] Donne Martin, "Data Science iPython notebooks." [Github repository] (Some useful examples to learn from.) ⭐⭐
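
As a taste of what entry [1] covers, here is a minimal sketch comparing plain gradient descent with the classical momentum update. The toy objective f(x) = (x - 3)^2, the function names, and the hyperparameters are assumptions chosen for illustration; they are not taken from the article.

```python
def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2 (an assumed example)."""
    return 2.0 * (x - 3.0)


def plain_gd(x0, lr=0.1, steps=300):
    """Plain gradient descent: step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def momentum_gd(x0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum: keep a decaying sum of past steps."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + lr * grad(x)  # velocity accumulates past gradients
        x -= v
    return x


if __name__ == "__main__":
    # Both runs should end up close to the minimum at x = 3.
    print(plain_gd(0.0), momentum_gd(0.0))
```

On a simple quadratic like this, momentum brings no benefit; its advantage shows up on ill-conditioned or noisy objectives, which is where the article picks up.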

Toying and Fun (but still learning)

[0] Andrej Karpathy: CNNs in the browser [link] (Great to gain some intuition.) ⭐⭐⭐

[1] Loss Function tumblr [link] (If you do not suffer from PTSD from neural network training already ;)) ⭐⭐⭐⭐

[2] TensorFlow in the browser [link] (Start with this when you learn about NNs!) ⭐⭐⭐⭐

[3] Narayanan, Arvind; Shmatikov, Vitaly: Robust De-anonymization of Large Sparse Datasets [paper] (Ridiculous example of de-anonymization - this should make you very afraid! Anonymous identities in the Netflix challenge data set are discovered via publicly available data on IMDB.)

[4] IBM personality insights [link] (Maps text to big five personality traits with Twitter or free text input. Supports English, Spanish, Arabic, and Japanese.) ⭐⭐⭐⭐

[5] Visualizing the DBSCAN algorithm [link] (Underrated clustering algorithm; K-means and DBSCAN are the only useful bread-and-butter clustering algorithms.) ⭐⭐⭐⭐
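
For readers who want to see the idea behind entry [5] in code, the following is a rough, brute-force sketch of DBSCAN in plain Python. The parameter names `eps` and `min_pts` follow common convention, and the sample points are hypothetical; for real work a library implementation is the better choice.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force search: all points within eps of point i (including i).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point: keep growing
    return labels


# Hypothetical data: two tight groups and one far-away outlier.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
print(sample_labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means, the number of clusters is not given up front, and isolated points are reported as noise rather than forced into a cluster.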

Practical Tips

[0] Aarshay Jain: Complete Guide to Parameter Tuning in XGBoost (with codes in Python) [link] (XGBoost has won many Kaggle competitions and belongs to the family of gradient-boosted tree models.)

[1] HJ van Veen: Feature Engineering [slideshare] (Read this to understand the basics of preprocessing and feature engineering!) ⭐⭐⭐⭐

[2] hat y: Kaggle Ensembling Guide [link] (You must learn how to combine several submission files and stack several models together if you want to score highly in contests.) ⭐⭐⭐⭐

[3] Megan Risdal: Communicating Data Science [kaggle blog] (Communicating your results is one of the major skills you have to learn - and you can practice it in our group! It is a good summary of communication, presentation, and visualization.) ⭐⭐⭐

[4] Tim Dettmers: Which GPU(s) to Get for Deep Learning. [article] (Excellent guide on how to build your GPU machine, what to look for, and why cloud is too expensive) ⭐⭐⭐⭐
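
To make the idea of combining submission files from entry [2] concrete, here is a minimal sketch of the simplest form of ensembling: a (weighted) average of per-model predictions. The function name `average_ensemble` and the prediction values are hypothetical, and real submissions would be read from CSV files rather than hard-coded lists.

```python
def average_ensemble(predictions, weights=None):
    """Blend per-model prediction lists into one list by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights by default
    total = sum(weights)
    # zip(*predictions) groups the predictions of all models for each test row.
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]


# Hypothetical predicted probabilities from three models for four test rows.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.1, 0.7, 0.5]
model_c = [0.7, 0.3, 0.5, 0.3]

blended = average_ensemble([model_a, model_b, model_c])
weighted = average_ensemble([model_a, model_b, model_c], weights=[2, 1, 1])
print(blended)
print(weighted)
```

Averaging tends to help most when the individual models make different kinds of errors; stacking, which the guide also mentions, trains a second model on top of such predictions instead of using fixed weights.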

Books

[0] Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. "Deep learning." An MIT Press book. (2015). [pdf] (Good theory book to get started, modern! Then go to papers.) ⭐⭐⭐⭐

[1] Murphy, Kevin. "Machine Learning" An MIT Press book. (2012) [link] (Not a good starter book, comprehensive and mathematics heavy. I use this as a reference manual.) ⭐⭐⭐

[2] Bishop, Christopher. "Pattern Recognition and Machine Learning" Springer. (2008) [link] (Written like a typical CS book, a bit outdated but solid introduction.) ⭐⭐⭐

[3] Abu-Mostafa, Yaser "Learning From Data" AMLBook (2012) [class site] (Good if you have only two months to learn ML; it also has an accompanying class at Caltech.) ⭐⭐⭐