ML-foundations

This repository provides detailed explanations of the Python libraries that are essential for Machine Learning.
Each notebook covers the most commonly used functions and practices needed to assess data sets.
Exploratory Data Analysis (EDA) gives hands-on experience of how data is pre-processed before it is used to train an ML model.
Probability concepts that are an integral part of the math behind Machine Learning are also covered.

Build your concepts 📜

All these libraries form the ground upon which everything in Machine Learning is built. Follow this order to build the foundation step by step:

1. NumPy 🔢

NumPy is a Python library for working with arrays. It provides multi-dimensional array and matrix data structures, and supports mathematical operations on them such as trigonometric, statistical, and algebraic routines.
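As a minimal sketch of the three kinds of operations mentioned above, the snippet below builds a small 2-D array and applies one trigonometric, one statistical, and one algebraic routine to it. The array values and variable names are illustrative assumptions, not taken from the notebooks.

```python
import numpy as np

# A 2-D array (matrix) built from nested Python lists (illustrative values).
a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Trigonometric: element-wise sine of every entry.
sines = np.sin(a)

# Statistical: mean over all elements, then mean of each column.
overall_mean = a.mean()
column_means = a.mean(axis=0)

# Algebraic: matrix product and determinant.
product = a @ a
determinant = np.linalg.det(a)

print(overall_mean)   # 2.5
print(column_means)   # [2. 3.]
print(product)        # [[ 7. 10.] [15. 22.]]
```

Note that `a @ a` is a matrix product, while `a * a` would multiply element by element.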

2. Pandas 🐼

Pandas is mainly used for data analysis and is built on top of NumPy. It can import data from various formats such as comma-separated values (CSV), JSON, SQL, and Microsoft Excel. It supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.
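The operations listed above can be sketched on a tiny in-memory example. The CSV text, column names, and values here are made up for illustration; a real notebook would typically pass a file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Importing: read CSV data (an in-memory string stands in for a file).
csv_text = "id,name,score\n1,Ana,90\n2,Ben,\n3,Cy,75\n"
scores = pd.read_csv(io.StringIO(csv_text))

# A second, hypothetical table to merge with.
cities = pd.DataFrame({"id": [1, 2, 3], "city": ["Pune", "Oslo", "Lima"]})

# Merging: join the two tables on their shared "id" column.
merged = scores.merge(cities, on="id", how="left")

# Cleaning: fill the missing score with the mean of the known scores.
merged["score"] = merged["score"].fillna(merged["score"].mean())

# Selecting: keep only the rows with a score above 80.
high = merged[merged["score"] > 80]

# Reshaping: turn the wide table into a long (id, variable, value) table.
long_form = merged.melt(id_vars="id", value_vars=["name", "city"])

print(merged)
```

Filling with the column mean is only one possible cleaning strategy; dropping the row with `dropna` is an equally common choice.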

3. Matplotlib 📊

Matplotlib is a plotting library primarily used for data visualization. It is built on NumPy and is used to draw graphs such as line plots, bar graphs, pie charts, and other figures.
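A short sketch of the three plot types named above, drawn side by side. The data and labels are placeholders chosen for illustration, and the `Agg` backend line is an assumption that makes the script run without a display; it can be dropped inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove when working in a notebook

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
labels = ["A", "B", "C"]   # placeholder categories
values = [3, 7, 5]         # placeholder counts

# One figure with three axes, one per plot type.
fig, (ax_line, ax_bar, ax_pie) = plt.subplots(1, 3, figsize=(12, 3.5))

ax_line.plot(x, np.sin(x))
ax_line.set_title("Line plot")

ax_bar.bar(labels, values)
ax_bar.set_title("Bar graph")

ax_pie.pie(values, labels=labels)
ax_pie.set_title("Pie chart")

fig.tight_layout()
```

In a notebook the figure is shown automatically; in a script, `fig.savefig(...)` or `plt.show()` would be needed to see it.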

4. Seaborn 📈

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Matplotlib mostly offers basic plots, whereas Seaborn provides a wider variety of visualization patterns. It needs less code and comes with appealing default themes. It covers more advanced plots such as heatmaps, box plots, histograms, scatter plots, and many more.

5. Scikit-learn 💻

Scikit-learn is a machine learning library for Python that provides ready-made implementations of algorithms such as support vector machines, random forests, and k-nearest neighbours.
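As a rough sketch of what "pre-built" means in practice, the three algorithms named above can be trained and scored through the same `fit` / `score` interface. The choice of the bundled iris data set, the 75/25 split, and the hyperparameters are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Small data set that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three estimators share the same fit/score interface.
models = {
    "support vector machine": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # train on the training split
    scores[name] = model.score(X_test, y_test)    # accuracy on the held-out split
    print(f"{name}: {scores[name]:.2f}")
```

Because the interface is shared, swapping one algorithm for another usually means changing a single line.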

6. Data Analysis (EDA) 📝

All of the libraries above are used in this phase.
Exploratory Data Analysis refers to analyzing data sets to summarize their main characteristics, often with visual methods. It is a critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, find outliers, look for trends, and check assumptions with the help of summary statistics and graphical representations.
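A minimal first pass over a data set might look like the sketch below: summary statistics, a missing-value count, an outlier check, and correlations. The data is synthetic with one outlier and one missing value planted on purpose, and the 1.5 × IQR rule is just one common convention for flagging outliers, not necessarily the one used in the notebooks.

```python
import numpy as np
import pandas as pd

# Synthetic data with hypothetical column names.
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "height_cm": rng.normal(170, 8, 200),
        "weight_kg": rng.normal(70, 10, 200),
    }
)
df.loc[0, "weight_kg"] = 250.0    # planted outlier
df.loc[1, "height_cm"] = np.nan   # planted missing value

# Summary statistics: count, mean, std, min, quartiles, max per column.
summary = df.describe()

# Missing values per column.
missing = df.isna().sum()

# Outliers: rows whose weight lies more than 1.5 * IQR outside the quartiles.
q1, q3 = df["weight_kg"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["weight_kg"] < q1 - 1.5 * iqr) | (df["weight_kg"] > q3 + 1.5 * iqr)]

# Pairwise correlations between numeric columns.
correlations = df.corr()

print(summary)
print(missing)
print(outliers)
```

The graphical side of EDA would then plot the same columns, for example as histograms or box plots.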

7. Probability (Extra Content) ⏳

Probability underpins much of Machine Learning. It is used to judge how likely outcomes are, perform hypothesis testing, and categorize data on the basis of their mathematical distributions.
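Two of these uses can be sketched with the standard library alone. The example below categorizes a value by comparing its density under two normal distributions, then runs a two-sided z-test. All numbers (means, standard deviations, sample size) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import NormalDist

# Categorizing by distribution: two hypothetical height classes (in cm).
short = NormalDist(mu=160, sigma=6)
tall = NormalDist(mu=180, sigma=6)


def classify(height):
    """Return the class whose normal density is higher at this height."""
    return "tall" if tall.pdf(height) > short.pdf(height) else "short"


print(classify(165))  # short
print(classify(175))  # tall

# Hypothesis test: is a sample mean of 103 (n = 36) consistent with a
# population mean of 100 and a known standard deviation of 9?
sample_mean, population_mean, sigma, n = 103, 100, 9, 36
z = (sample_mean - population_mean) / (sigma / n ** 0.5)   # 3 / 1.5 = 2.0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))               # two-sided p-value

print(z)                  # 2.0
print(round(p_value, 4))  # 0.0455
```

With a 5% significance level, a p-value of about 0.0455 would lead to rejecting the hypothesis that the population mean is 100.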

References:

https://numpy.org/devdocs/reference/index.html
https://pandas.pydata.org/docs/reference/index.html#api
https://matplotlib.org/3.3.1/tutorials/index.html
https://www.datacamp.com/community/data-science-cheatsheets
https://elitedatascience.com/python-seaborn-tutorial
https://www.youtube.com/watch?v=vmEHCJofslg&list=PLFCB5Dp81iNVmuoGIqcT5oF4K-7kTI5vp
https://www.youtube.com/watch?v=Pkvdc2Z6eBg

thenorthkun/ML-foundations