# List of Data Science/Big Data Resources
This list contains free learning resources for data science and big-data-related concepts, techniques, and applications. It was inspired by Free Programming Books.
Each entry indicates the expected audience for the resource (Beginner, Intermediate, or Veteran). The rating is subjective, but it gives a rough idea of how difficult each resource is.
### How To Contribute
- Fork this repository
- Edit the list and add your recommendations (for beginner, intermediate, or veteran readers)
- Send a pull request
### Index
- [Data Science Introduction](#data-science-introduction)
- [Data Processing](#data-processing)
- [Data Analysis](#data-analysis)
  - [Fundamentals](#fundamentals)
  - [Network Analysis](#network-analysis)
  - [Statistics](#statistics)
  - [Data Mining](#data-mining)
  - [Machine Learning](#machine-learning)
- [Data Science Application](#data-science-application)
  - [Information Retrieval](#information-retrieval)
  - [Data Visualization](#data-visualization)
- [Uncategorized](#uncategorized)
- [MOOCs about Data Science](#moocs-about-data-science)
### Data Science Introduction
- [Data Science: An Introduction](http://en.wikibooks.org/wiki/Data_Science:_An_Introduction) - Wikibooks - Beginner
- [Disruptive Possibilities: How Big Data Changes Everything](http://www.amazon.com/Disruptive-Possibilities-Data-Changes-Everything-ebook/dp/B00CLH387W) - Jeffrey Needham - Beginner
- Introduction to Data Science - Jeffrey Stanton - Beginner
- [Real-Time Big Data Analytics: Emerging Architecture](http://www.amazon.com/Real-Time-Big-Data-Analytics-Architecture-ebook/dp/B00DO33RSW) - Mike Barlow - Beginner
- [The Evolution of Data Products](http://www.amazon.com/The-Evolution-Data-Products-ebook/dp/B005QEKQUY) - Mike Loukides - Beginner
- [The Promise and Peril of Big Data](http://www.aspeninstitute.org/sites/default/files/content/docs/pubs/The_Promise_and_Peril_of_Big_Data.pdf) - David Bollier - Beginner
### Data Processing
- [Data-Intensive Text Processing with MapReduce](http://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf) - Jimmy Lin and Chris Dyer - Intermediate
### Data Analysis
#### Fundamentals
- [Fundamental Numerical Methods and Data Analysis](http://ads.harvard.edu/books/1990fnmd.book/) - George W. Collins - Beginner
- [Introduction to Metadata](http://www.getty.edu/research/publications/electronic_publications/intrometadata/index.html) - Murtha Baca - Beginner
- [Introduction to R - Notes on R: A Programming Environment for Data Analysis and Graphics](http://cran.r-project.org/doc/manuals/R-intro.pdf) - W. N. Venables, D. M. Smith, and the R Core Team - Beginner
- [Modeling with Data: Tools and Techniques for Scientific Computing](http://modelingwithdata.org/about_the_book.html) - Ben Klemens - Beginner
#### Network Analysis
- [Introduction to Social Network Methods](http://faculty.ucr.edu/~hanneman/nettext/) - Robert A. Hanneman and Mark Riddle - Intermediate
- [Networks, Crowds, and Markets: Reasoning About a Highly Connected World](http://www.cs.cornell.edu/home/kleinber/networks-book/) - David Easley and Jon Kleinberg - Intermediate
- [Network Science](http://barabasilab.neu.edu/networksciencebook/downlPDF.html) - Albert-László Barabási - Beginner
- [The Wealth of Networks](http://www.benkler.org/Benkler_Wealth_Of_Networks.pdf) - Yochai Benkler - Beginner
#### Statistics
- [Advanced Data Analysis from an Elementary Point of View](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf) - Cosma Rohilla Shalizi - Veteran
- [An Introduction to R](http://cran.r-project.org/doc/manuals/R-intro.pdf) - W. N. Venables, D. M. Smith, and the R Core Team - Beginner
- [Analyzing Linguistic Data: A Practical Introduction to Statistics](http://www.ualberta.ca/~baayen/publications/baayenCUPstats.pdf) - R. H. Baayen - Beginner
- [Applied Data Science](http://columbia-applied-data-science.github.io/appdatasci.pdf) - Ian Langmore and Daniel Krasner - Intermediate
- [Concepts and Applications of Inferential Statistics](http://vassarstats.net/textbook/) - Richard Lowry - Beginner
- [Forecasting: Principles and Practice](https://www.otexts.org/fpp/) - Rob J. Hyndman and George Athanasopoulos - Intermediate
- [Introduction to Probability](http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/pdf.html) - Charles M. Grinstead and J. Laurie Snell - Beginner
- [Introduction to Statistical Thought](http://www.math.umass.edu/~lavine/Book/book.pdf) - Michael Lavine - Beginner
- [OpenIntro Statistics - Second Edition](http://www.openintro.org/stat/textbook.php) - David M. Diez, Christopher D. Barr, and Mine Cetinkaya-Rundel - Beginner
- [simpleR - Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf) - John Verzani - Beginner
- [Statistics](http://upload.wikimedia.org/wikipedia/commons/8/82/Statistics.pdf) - Beginner
- [Think Stats: Probability and Statistics for Programmers](http://www.greenteapress.com/thinkstats/thinkstats.pdf) - Allen B. Downey - Beginner
#### Data Mining
- [Data Mining and Analysis: Fundamental Concepts and Algorithms](http://www2.dcc.ufmg.br/livros/miningalgorithms/files/pdf/dmafca.pdf) - Mohammed J. Zaki and Wagner Meira Jr. - Intermediate
- [Data Mining and Knowledge Discovery in Real Life Applications](http://www.intechopen.com/books/data_mining_and_knowledge_discovery_in_real_life_applications) - Julio Ponce and Adem Karahoca - Beginner
- [Data Mining for Social Network Data](http://link.springer.com/book/10.1007%2F978-1-4419-6287-4) - Springer - Veteran
- [Mining of Massive Datasets](http://infolab.stanford.edu/~ullman/mmds/book.pdf) - Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman - Intermediate
- [Knowledge-Oriented Applications in Data Mining](http://www.intechopen.com/books/knowledge-oriented-applications-in-data-mining) - Kimito Funatsu - Intermediate
- [New Fundamental Technologies in Data Mining](http://www.intechopen.com/books/new-fundamental-technologies-in-data-mining) - Kimito Funatsu - Intermediate
- [R and Data Mining: Examples and Case Studies](http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf) - Yanchang Zhao - Beginner
- [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) - Trevor Hastie, Robert Tibshirani, and Jerome Friedman - Intermediate
- [Theory and Applications for Advanced Text Mining](http://www.intechopen.com/books/theory-and-applications-for-advanced-text-mining) - Shigeaki Sakurai - Intermediate
#### Machine Learning
- [A Course in Machine Learning](http://ciml.info/) - Hal Daumé III - Beginner
- [A First Encounter with Machine Learning](https://www.ics.uci.edu/~welling/teaching/273ASpring10/IntroMLBook.pdf) - Max Welling - Beginner
- [Bayesian Reasoning and Machine Learning](http://web4.cs.ucl.ac.uk/staff/D.Barber/textbook/031013.pdf) - David Barber - Veteran
- [Gaussian Processes for Machine Learning](http://www.gaussianprocess.org/gpml/chapters/) - Carl Edward Rasmussen and Christopher K. I. Williams - Veteran
- [Introduction to Machine Learning](http://alex.smola.org/drafts/thebook.pdf) - Alex Smola and S.V.N. Vishwanathan - Intermediate
- [Probabilistic Programming & Bayesian Methods for Hackers](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/) - Cam Davidson-Pilon (main author) - Intermediate
- [The LION Way: Machine Learning plus Intelligent Optimization](http://www.lionsolver.com/LIONbook/) - Roberto Battiti and Mauro Brunato - Intermediate
- [Think Bayes](http://www.greenteapress.com/thinkbayes/) - Allen B. Downey - Beginner
- [Sklearn Basics](http://nbviewer.ipython.org/github/jakevdp/sklearn_scipy2013/tree/master/notebooks/) - Beginner
### Data Science Application
#### Information Retrieval
- [Introduction to Information Retrieval](http://nlp.stanford.edu/IR-book/) - Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze - Intermediate
#### Data Visualization
- [Interactive Data Visualization for the Web](http://chimera.labs.oreilly.com/books/1230000000345/index.html) - Scott Murray - Beginner
- [Plotting and Visualization in Python](http://nbviewer.ipython.org/urls/gist.github.com/fonnesbeck/5850463/raw/a29d9ffb863bfab09ff6c1fc853e1d5bf69fe3e4/3.+Plotting+and+Visualization.ipynb) - Beginner
### Uncategorized
- [Data Journalism Handbook](http://datajournalismhandbook.org/1.0/en/) - Jonathan Gray, Liliana Bounegru, and Lucy Chambers - Beginner
- [Building Data Science Teams](http://assets.en.oreilly.com/1/eventseries/23/Building-Data-Science-Teams.pdf) - DJ Patil - Beginner
- [Information Theory, Inference, and Learning Algorithms](http://www.inference.phy.cam.ac.uk/itprnn/book.html) - David MacKay - Intermediate
- [Mathematics for Computer Science](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/readings/MIT6_042JF10_notes.pdf) - Eric Lehman, Thomas Leighton, and Albert R. Meyer - Beginner
- [The Field Guide to Data Science](http://www.boozallen.com/media/file/The-Field-Guide-to-Data-Science.pdf) - Beginner
### MOOCs about Data Science
- [Data Mining with Weka](http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/) - Ian H. Witten - Intermediate
- [Introduction to Data Science](https://class.coursera.org/datasci-001/class) - Bill Howe (Coursera) - Beginner
- [Introduction to Hadoop and MapReduce](https://www.udacity.com/course/ud617) - Udacity - Beginner
- [Machine Learning](https://class.coursera.org/ml-003/class) - Andrew Ng (Coursera) - Beginner
- [Machine Learning Foundations (taught in Chinese)](https://class.coursera.org/ntumlone-001) - Hsuan-Tien Lin - Beginner
- [Machine Learning Video Library](http://work.caltech.edu/library/) - Yaser Abu-Mostafa - Intermediate
- [Natural Language Processing](https://class.coursera.org/nlp/lecture/preview) - Dan Jurafsky and Christopher Manning (Coursera) - Intermediate
- [Social and Economic Networks: Models and Analysis](https://class.coursera.org/networksonline-001/class) - Matthew O. Jackson (Coursera) - Intermediate
- [Social Network Analysis](https://class.coursera.org/sna-003/class) - Lada Adamic (Coursera) - Intermediate