Data Science Resources

A list of data science resources.

Language specific

Learning R

Resource Comments
R for Data Science Introduction to modern R programming using the tidyverse
Text Mining with R Guide to a modern approach to text mining in R
The caret package Guide to performing machine learning in R using the caret package
An Introduction to Statistical Learning A book on machine learning with examples in R
Shiny Learn how to use the Shiny package to create interactive dashboards
Advanced R Advanced guide to R; the style guide is particularly good
R packages Guide to writing packages
The reticulate package Website of the reticulate package, which allows you to use Python functions within R. For example, see this blog where it is used to embed a Python model within a Shiny web app. (Similarly, look at the feather package for passing data frames between R and Python.)
R-bloggers Blogs about the use of R in analytics
knitr in a knutshell A short introduction to the knitr package for reproducible research

Learning D3

Resource Comments
bl.ocks Website showing popular D3 examples
Observable A notebook approach to writing D3 and JavaScript
Search the Bl.ocks Search D3 examples produced by others (great for inspiration!)
D3 Tips and Tricks A good book about D3
D3 in Depth A good introduction to writing D3
D3 tutorial list A list of D3 tutorials from the D3 website
A better way to structure D3 code Interesting blog post on how to structure D3 code
Eloquent JavaScript Knowing a bit of JavaScript is a prerequisite for mastering D3 and this book is a good introduction
A Tour Through the Visualization Zoo A good introduction to a wide range of visualisations you could do in D3 (though here they have been done in a precursor to D3)
dc.js A library that combines D3 and crossfilter to make it easier to create interactive dashboards

Python resources

Resource Comments
scikit-learn sklearn is the go-to Python package for machine learning, and the documentation is a wealth of information, not only on usage but also on the techniques themselves
Modern Pandas A guide to using pandas dataframes
imbalanced-learn A package to deal with classifying imbalanced data with excellent documentation
Natural Language Processing with Python This is a book on NLP in Python from the team behind the NLTK package. For text mining you should also look into spaCy and gensim (for topic modelling)
Requests: HTTP for Humans Library for making HTTP requests from Python, great way of making API calls
Flask Python framework for creating web apps
Seven Strategies for Optimizing Numerical Code Slides on different approaches to speeding up Python code
Comparing Python Clustering Algorithms Does what it says on the tin!
Style Guide for Python Code This is PEP 8, the official style guide for Python. One incentive for following its guidance is that your code will integrate better with IDEs
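
As a small illustration of the PEP 8 conventions mentioned above, here is a minimal sketch of a module that follows the most common naming and layout rules. The names (`DEFAULT_PRECISION`, `SampleSummary`, `summarise_samples`) are hypothetical and exist only for this example; PEP 8 itself covers far more than is shown here.

```python
"""Tiny module illustrating common PEP 8 conventions (hypothetical example)."""

import statistics

# Module-level constants use UPPER_SNAKE_CASE.
DEFAULT_PRECISION = 2


class SampleSummary:
    """Class names use CapWords; methods and attributes use snake_case."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self, precision=DEFAULT_PRECISION):
        """Return the mean rounded to `precision` decimal places."""
        return round(statistics.mean(self.values), precision)


def summarise_samples(values):
    """Function names use snake_case; indentation is four spaces."""
    summary = SampleSummary(values)
    return {"n": len(summary.values), "mean": summary.mean()}
```

Calling `summarise_samples([1, 2, 3, 4])` returns a dictionary with the sample count and the rounded mean. A linter such as the one built into most IDEs can flag departures from these conventions automatically.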

Technique specific

Guides to aspects of machine learning

Resource Comments
Feature Engineering and Selection Guide to feature engineering and feature selection
Kaggle Ensembling Guide A guide to combining models to improve performance
Elements of Statistical Learning The classic text on machine learning
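
To give a flavour of what "combining models" means in the ensembling guide listed above, here is a minimal sketch of two of the simplest schemes: majority voting over predicted labels and averaging of predicted scores. It uses only the Python standard library; the function names and the model outputs are hypothetical and are not taken from the guide, which covers more sophisticated approaches.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine class labels from several models by majority vote.

    `predictions` is a list of per-model label lists, all the same length.
    Ties go to the label that appears first among the models for that sample.
    """
    combined = []
    for sample_labels in zip(*predictions):
        combined.append(Counter(sample_labels).most_common(1)[0][0])
    return combined


def average_scores(scores):
    """Combine per-model predicted scores by simple (unweighted) averaging."""
    return [sum(sample) / len(sample) for sample in zip(*scores)]


# Hypothetical outputs from three models on four samples.
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "spam", "ham"]
model_c = ["ham", "ham", "spam", "spam"]

ensemble_labels = majority_vote([model_a, model_b, model_c])
```

Each model here is wrong on a different sample, so the vote can recover the label that two of the three agree on. In practice, a library implementation (for example the voting and stacking estimators in scikit-learn) would normally be preferable to hand-rolled code.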

Neural Networks

Resource Comments
Neural Networks and Deep Learning Simple introduction to neural networks
Convolutional Neural Networks for Visual Recognition Stanford course on convolutional neural networks
Understanding Convolutional Neural Networks for NLP Article explaining CNNs in the context of NLP
On word embeddings Introduction to word embeddings
fast.ai Online AI course

Other

Resource Comments
Towards Data Science Interesting articles about data science
Data Science Weekly Weekly data science newsletter that aggregates articles on data science
Why Use Make Thoughts from Mike Bostock on using make for reproducible research
Statistical Modeling: The Two Cultures Leo Breiman's article on the difference between statistical models and algorithmic models