
code for Data Science From Scratch book

Primary language: Python. License: The Unlicense.

Data Science from Scratch

Here's all the code and examples from my book Data Science from Scratch. The code directory contains Python 2.7 versions, and the code-python3 directory contains the Python 3 equivalents. (I tested them in 3.5, but they should work in any 3.x.)

July 2018: I am currently working on the second edition. It will be based on Python 3.6, will have much cleaner code, and will contain expanded coverage of deep learning, NLP, and whatever else I feel like adding. Stay tuned.

Each file can be imported as a module, for example (after you cd into the /code directory):

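from linear_algebra import distance, vector_mean
# two example vectors, represented as plain lists
v = [1, 2, 3]
w = [4, 5, 6]
# parenthesized print works under both Python 2.7 and Python 3
print(distance(v, w))
print(vector_mean([v, w]))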
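
If you don't have the repository at hand, here is a minimal, self-contained sketch of what those two calls compute. It assumes that distance is the Euclidean distance and that vector_mean is the componentwise mean; the repository's own implementations may differ in detail. The sketch is written for Python 3.

```python
import math

def distance(v, w):
    """Euclidean distance between two equal-length vectors (assumed definition)."""
    return math.sqrt(sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w)))

def vector_mean(vectors):
    """Componentwise mean of a list of equal-length vectors (assumed definition)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

v = [1, 2, 3]
w = [4, 5, 6]
print(distance(v, w))        # sqrt(27), roughly 5.196
print(vector_mean([v, w]))   # [2.5, 3.5, 4.5]
```

Vectors are plain lists here, which matches the way v and w are written in the import example.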

Each file can also be run from the command line to get a demo of what it does (and to execute the examples from the book):

python recommender_systems.py
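
The usual Python idiom for a file that works both as an importable module and as a runnable demo is the `if __name__ == "__main__":` guard. The sketch below illustrates that pattern with a hypothetical file. The file name, function names, and data are made up for illustration, and whether every file in the repository is laid out exactly this way is an assumption.

```python
# demo_module.py (hypothetical file name, for illustration only)

def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)

def main():
    """Demo code that should run only when the file is executed directly."""
    sample = [1, 2, 3, 4]  # made-up data
    print("mean of", sample, "is", mean(sample))

if __name__ == "__main__":
    # True for `python demo_module.py`, False for `import demo_module`,
    # so importing the module gives you mean() without running the demo.
    main()
```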

Additionally, I've collected all the links from the book.

And, by popular demand, I made an index of functions defined in the book, by chapter and page number. The data is in a spreadsheet, and I also made a toy (experimental) searchable webapp.

Table of Contents

  1. Introduction
  2. A Crash Course in Python
  3. Visualizing Data
  4. Linear Algebra
  5. Statistics
  6. Probability
  7. Hypothesis and Inference
  8. Gradient Descent
  9. Getting Data
  10. Working With Data
  11. Machine Learning
  12. k-Nearest Neighbors
  13. Naive Bayes
  14. Simple Linear Regression
  15. Multiple Regression
  16. Logistic Regression
  17. Decision Trees
  18. Neural Networks
  19. Clustering
  20. Natural Language Processing
  21. Network Analysis
  22. Recommender Systems
  23. Databases and SQL
  24. MapReduce
  25. Go Forth And Do Data Science