Data-Science-Resources

A collection of useful books, links and resources for Data Science

I will try to keep this up to date with any new information, links and thoughts.

Books

Titles followed by a star rating (out of 5) are books I have personally read; each of these is followed by a brief review.

General

  • Doing Data Science, Cathy O'Neil and Rachel Schutt 4*

Nice overview, covers a range of interesting topics. Not so technical and an easy read.

Math

  • Linear Algebra and Its Applications, Gilbert Strang

  • Convex Optimization, Stephen Boyd and Lieven Vandenberghe

  • A First Course in Probability (Pearson) and Introduction to Probability Models (Academic Press), Sheldon Ross

  • Mathematics: Its Content, Methods and Meaning, A.D. Aleksandrov, A.N. Kolmogorov, M.A. Lavrent'ev 4*

Excellent reference. Wide range of topics written in a clean and simple fashion with good examples.

Coding

  • R in a Nutshell, Joseph Adler

  • The Hitchhiker's Guide to Python: Best Practices for Development, Kenneth Reitz, Tanya Schlusser 4*

Excellent overview of all aspects of the Python programming language. A good starting point for beginners, and useful up to intermediate level. More a reference guide on style and an introduction to the available packages. Use it as a springboard to other branches of Python. Nice references at the end. Highly recommended.

  • Learning Python, Mark Lutz and David Ascher

  • R for Everyone: Advanced Analytics and Graphics, Jared Lander

  • The Art of R Programming: A Tour of Statistical Software Design, Norman Matloff

  • Python for Data Analysis by Wes McKinney 5*

By the author and creator of pandas. A must-read for anyone working in science, engineering, statistics, data science and machine learning. Covers all the feature engineering where people spend most of their time. Also an excellent reference for looking things up later. Clearly written, with lots of practical examples and accompanying Jupyter notebooks.

Web Scraping

  • Web Scraping with Python: A Comprehensive Guide to Data Collection Solutions, Ryan Mitchell 4*

Good introduction to web scraping giving you all the tools and relevant libraries you need depending on your application.

Data Analysis and Statistical Inference

  • Statistical Inference, George Casella and Roger L. Berger

  • Bayesian Data Analysis, Andrew Gelman, et al.

  • Data Analysis Using Regression and Multilevel/Hierarchical Models, Andrew Gelman and Jennifer Hill

  • Advanced Data Analysis from an Elementary Point of View, Cosma Shalizi

  • The Elements of Statistical Learning: Data Mining, Inference and Prediction, Trevor Hastie, Robert Tibshirani, and Jerome Friedman

  • An Introduction to Statistical Learning: With Applications in R, Gareth James, Trevor Hastie, Robert Tibshirani, Daniela Witten 5*

Phenomenal book. Though its examples are in R, they can easily be translated to Python or Matlab. More importantly, the descriptions, diagrams and examples for the various statistical learning techniques are the best I have seen anywhere. Very clear, concise and to the point. This book is the simpler, more accessible version of The Elements of Statistical Learning. Indispensable.

  • Time Series Analysis and Its Applications, Robert H. Shumway, David S. Stoffer 1*

Too mathematical and abstract, with weak examples. Definitions and symbols in the equations are unclear, and are left unexplained later on simply because they were defined in chapter 1. Very theoretical despite the R examples with datasets. The theory is not explained well at all.

  • Think Stats: Exploratory Data Analysis by Allen Downey 2*

Not much detail. Good simple explanations, but overall too simplistic and lacking in depth. Also, many of the functions the author uses are ones he wrote himself. It’s perhaps better to stick to established libraries such as pandas and statsmodels for similar work.

  • Statistical Modeling: A Fresh Approach by Danny Kaplan

  • Applied Predictive Modeling by Max Kuhn and Kjell Johnson

Artificial Intelligence and Machine Learning

  • Pattern Recognition and Machine Learning by Christopher Bishop

  • Bayesian Reasoning and Machine Learning, David Barber

  • Programming Collective Intelligence, Toby Segaran

  • Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig

  • Foundations of Machine Learning, Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar

  • Introduction to Machine Learning (Adaptive Computation and Machine Learning), Ethem Alpaydin

  • Image Processing, Analysis, and Machine Vision, Milan Sonka, Roger Boyle, Václav Hlaváč 3*

  • Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Aurélien Géron 4*

5* for the first half of the book (scikit-learn), 3* for the second half (TensorFlow). Nice examples with Jupyter notebooks and a good mix of the practical and the theoretical. The scikit-learn section is a great reference, with detailed explanations and good references for further reading. The TensorFlow part is weaker as the examples become more complex. Chollet’s book Deep Learning with Python, which uses Keras, is much stronger: the examples are easier to understand because Keras is a simple layer over TensorFlow that eases its use. Chollet also explains the concepts better and annotates his code nicely. Buy this book for scikit-learn and for overall best practice in machine learning and data science. Buy Chollet’s Deep Learning with Python for practical deep learning itself. Overall, still a practical book with supplementary Jupyter notebooks.

  • Deep Learning with Python, François Chollet 5*

Absolutely phenomenal book. A very practical, to-the-point book on deep learning techniques in Python by the guru who created the Keras library, so all the examples in the book are in Keras. I highly recommend that anyone who wants to get into the field start with this book, write their own code and tinker, and then move on to the more theoretical books such as Deep Learning by Goodfellow et al., which is very theoretical, broad and academic.

  • Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville 3*

Very theoretical, with a steep learning curve. Would be much better if it had code and practical examples as well as exercises. It is perhaps better to get Deep Learning with Python by Chollet or the O’Reilly book by Géron, which has Jupyter Notebook examples and exercises.

Experimental Design

  • Field Experiments, Alan S. Gerber and Donald P. Green

  • Statistics for Experimenters: Design, Innovation, and Discovery, George E. P. Box, et al.

Visualization

  • The Elements of Graphing Data, William Cleveland

  • Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, Nathan Yau

  • The Visual Display of Quantitative Information, Edward R. Tufte 5*

Excellent book. A must-read for anyone in the public AND private sector on how to design for demand, eliminate failure demand and boost value demand. Lots of analogies to TPS (Toyota) and critiques of ABC costing methods. Plenty of examples of how waste gets created despite well-intentioned top-down targets. Great arguments against command-and-control style management, with more emphasis on localism and initiative given to workers.

  • Beautiful Evidence, Edward R. Tufte 5*

Standard Tufte. Superb!

  • Envisioning Information, Edward R. Tufte 4*

The book after Visual Display of Quantitative Information. Not as good as the first, but still excellent. A more general book dealing with maps, fonts, typography, 3D charts and illustration.