data_science_python_tools

This repository collects the most important ML & DS tools in annotated Jupyter notebooks.

Primary language: Jupyter Notebook

Data Science Tools in Python

This project contains notebooks and notes related to the most important concepts and tools necessary for machine learning and data science.

I started collecting most of the notebooks and notes while following several web tutorials and Udemy courses, such as:

Unfortunately, I could not always find a repository to fork, so attribution is given in this README.md instead.

The aforementioned courses are very practical and do not focus much on theory; for the theory, I used:

  • "An Introduction to Statistical Learning with Applications in R", by James et al. A repository with Python notebooks is available at https://github.com/JWarmenhoven/ISLR-python.
  • "Reinforcement Learning" by Sutton & Barto.
  • "Pattern Recognition and Machine Learning" by Bishop. A repository with Python notebooks is available at https://github.com/ctgk/PRML.

In some cases, I simply followed the official documentation of the packages used.

For my own tracking, these are related how-to files of mine (not public):

  • ~/Dropbox/Learning/PythonLab/python_manual.txt
  • ~/Dropbox/Documentation/howtos/sklearn_scipy_sympy_stat_guide.txt
  • ~/Dropbox/Documentation/howtos/keras_tensorflow_guide.txt
  • ~/Dropbox/Documentation/howtos/pybullet_openai_guide.txt
  • ~/Dropbox/Documentation/howtos/python_reinforcement_learning_openai.txt

To run the notebooks locally, first install an environment manager (e.g., conda), then create an environment and install the required dependencies:

# Create your env
conda create --name ds pip python=3.8
conda activate ds

# Install all necessary packages
# FIXME: Many packages can be removed
pip install -r requirements.txt
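After installing, a quick check can confirm that the interpreter and a few key packages are available before opening any notebook. The script below is a minimal sketch, not part of this repository: the package list in `REQUIRED` is a hypothetical example (adjust it to whatever requirements.txt actually contains), the minimum version mirrors the `python=3.8` used above, and it only tests that each top-level module can be found, not which version is installed.

```python
import importlib.util
import sys

# Hypothetical subset of import names; adjust to match requirements.txt.
# Import names can differ from pip names (e.g., scikit-learn -> sklearn).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the top-level modules in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(required=REQUIRED, min_version=(3, 8)):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    # Compare only (major, minor) against the minimum expected version.
    if sys.version_info[:2] < tuple(min_version):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {found} found, at least {wanted} expected")
    for name in missing_packages(required):
        problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    print("Environment looks OK" if not issues else "Environment has issues")
```

Running it with `python` inside the activated `ds` environment prints any missing module; checking for module presence only (rather than pinned versions) keeps the sketch independent of the exact contents of requirements.txt.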

Then, open the notebooks; if I were a beginner, I'd go through them sequentially.
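To go through the notebooks in order, it helps to list them sorted the way a human reads numbered filenames, so that `2_...` comes before `10_...`. The sketch below assumes the notebooks carry a numeric prefix in their filenames, which is an assumption rather than a documented convention of this repository; `natural_key` and `notebooks_in_order` are hypothetical helper names.

```python
import re
from pathlib import Path


def natural_key(text):
    """Split text into str/int chunks so '2_x' sorts before '10_x'."""
    # re.split with a capturing group keeps the digit runs; they alternate
    # with non-digit chunks, so keys of different paths stay comparable.
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", text)]


def notebooks_in_order(root="."):
    """Return all .ipynb files under root, in natural (numeric-aware) order."""
    paths = [
        path for path in Path(root).rglob("*.ipynb")
        # Skip the autosave copies Jupyter keeps in .ipynb_checkpoints/.
        if ".ipynb_checkpoints" not in path.parts
    ]
    return sorted(paths, key=lambda path: natural_key(path.as_posix()))
```

For example, `for nb in notebooks_in_order(): print(nb)` run from the repository root prints the notebook paths in reading order. A plain `sorted()` would place `10_...` before `2_...`, which is why the numeric-aware key is used.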

See also:

  • An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection — eda_fe_summary.
  • My notes and the code of the IBM Machine Learning Professional Certificate offered by IBM & Coursera — machine_learning_ibm.

Mikel Sagardia, 2018.
No guarantees.