A repo for all my Data Science notebooks (.ipynb). Helpful for anyone who wants to learn the basics of Data Science (Stats, ML, DL).
Some datasets (*.csv) are provided by 365datascience, while others were scraped by me.
Imports used:
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import statsmodels.api as sm
+ import seaborn as sns
+ from sklearn.linear_model import LinearRegression
+ from sklearn.feature_selection import f_regression
+ from sklearn.model_selection import train_test_split
+ from sklearn.preprocessing import StandardScaler
+ from statsmodels.stats.outliers_influence import variance_inflation_factor
+ from sklearn.cluster import KMeans
+ from sklearn import preprocessing
+ from mpl_toolkits.mplot3d import Axes3D
+ import tensorflow as tf
+ import tensorflow_datasets as tfds
0 Percentile Rank
Contains percentile-rank statistics for my classmates.
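Percentile rank has several definitions. A common one is the percentage of scores strictly below a given score. Here is a minimal NumPy sketch that assumes this definition. The notebook's exact definition and the class data are not shown here, so `percentile_rank` and the scores are hypothetical:

```python
import numpy as np

def percentile_rank(scores, value):
    """Percentage of scores strictly below `value` (one common definition)."""
    scores = np.asarray(scores)
    return 100.0 * np.count_nonzero(scores < value) / scores.size

# Hypothetical class scores, for illustration only
scores = [55, 62, 70, 70, 81, 90]
for s in sorted(set(scores)):
    print(s, round(percentile_rank(scores, s), 1))
```

SciPy's `scipy.stats.percentileofscore` offers this "strictly below" variant as `kind="strict"`, alongside other ways of handling ties.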
1 First Regression Model
SAT-GPA score prediction using sklearn and statsmodels, with an Ordinary Least Squares (OLS) model.
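statsmodels' OLS and sklearn's LinearRegression both estimate the same least-squares line. With a single predictor, the estimates have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). Below is a NumPy sketch on made-up numbers (not the notebook's SAT-GPA dataset). The helper name `fit_simple_ols` is hypothetical:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Synthetic data lying exactly on y = 1 + 0.002 * x
sat = np.array([1600, 1800, 2000, 2200])
gpa = 1 + 0.002 * sat
b0, b1 = fit_simple_ols(sat, gpa)
print(b0, b1)
```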
2 First Multiple Regression
What checks should you perform before running a multiple regression? (This applies to any regression!)
3 Multiple Regression w Categorical Data
Multiple regression with categorical data using dummy variables, plus predictions made with statsmodels.api.
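A dummy variable turns a Yes/No category into a 0/1 column, so it can enter the regression like any other predictor. Here is a NumPy sketch that fits the model by least squares. The data and the `category` column are hypothetical, and the constant column plays the role of `sm.add_constant`:

```python
import numpy as np

# Hypothetical data: SAT score, a Yes/No category, and GPA
sat = np.array([1700.0, 1800.0, 1900.0, 1700.0, 1800.0, 1900.0])
category = np.array(["No", "No", "No", "Yes", "Yes", "Yes"])
gpa = np.array([3.0, 3.2, 3.4, 3.3, 3.5, 3.7])

# Dummy variable: 1 for "Yes", 0 for "No"
dummy = (category == "Yes").astype(float)

# Design matrix: constant, SAT, dummy
X = np.column_stack([np.ones_like(sat), sat, dummy])
b0, b_sat, b_dummy = np.linalg.lstsq(X, gpa, rcond=None)[0]

# Predict GPA for a new student with SAT 1750 in the "Yes" group
prediction = b0 + b_sat * 1750 + b_dummy * 1.0
print(b0, b_sat, b_dummy, prediction)
```

The dummy's coefficient is the shift in predicted GPA between the two groups at the same SAT score.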
(Nothing ML here. Just basic estimations!)
Basic ML starts from here...
4 Simple Linear Regression w sklearn
This notebook shows the differences between statsmodels and sklearn: in sklearn, R-squared, the intercept and the coefficients have to be pulled out separately to build a statsmodels-like summary.
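In sklearn, these pieces come from `reg.intercept_`, `reg.coef_` and `reg.score(X, y)`. The NumPy sketch below computes the same three quantities by hand and collects them into a small summary. The helper `ols_summary` and the data are hypothetical:

```python
import numpy as np

def ols_summary(X, y):
    """Fit OLS with an intercept; return intercept, coefficients and R-squared."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend the constant column
    params = np.linalg.lstsq(A, y, rcond=None)[0]
    y_hat = A @ params
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    return {"intercept": params[0], "coef": params[1:], "r_squared": 1 - ss_res / ss_tot}

summary = ols_summary([[1], [2], [3], [4]], [2, 4, 5, 8])
print(summary)
```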
5 Multiple Regression w sklearn
This notebook covers the regression, R-squared, adjusted R-squared, feature selection, standardization, regression with scaled inputs and prediction, getting to know sklearn a little deeper!
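sklearn's `LinearRegression.score` returns R-squared but not the adjusted version, so the adjusted value is usually computed by hand. The standard formula is: adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch (the function name is hypothetical):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted in p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R-squared, more predictors -> lower adjusted R-squared
print(adjusted_r2(0.40, n=84, p=1))
print(adjusted_r2(0.40, n=84, p=3))
```

The penalty for extra predictors is why adding a useless feature can lower adjusted R-squared even though plain R-squared never decreases.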
6 Train-Test in Sklearn
Splitting data into training and test sets with the train_test_split() function from sklearn.model_selection.
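At its core, `train_test_split` shuffles the rows and holds some of them out, keeping X and y aligned. Here is a plain NumPy sketch of that idea. It is not sklearn's implementation, and its rounding of the test size may differ. The helper name `split_train_test` and the seed are hypothetical, and the return order mirrors `X_train, X_test, y_train, y_test`:

```python
import numpy as np

def split_train_test(X, y, test_size=0.2, seed=365):
    """Shuffle row indices and hold out `test_size` of the rows as a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)   # row i is [2i, 2i + 1]
y = np.arange(10)                  # target i belongs to row i
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)
```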
7 Real-Life Example: Car Data
Prediction on a real-life dataset, from preprocessing (removing missing values, exploring probability distribution functions, dealing with outliers, checking the OLS assumptions and multicollinearity, creating dummy variables) to performing the regression (scaling, splitting, weights & biases) and then testing. All in sklearn.
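Multicollinearity is commonly checked with the variance inflation factor (VIF). To get VIF_j, regress feature j on the other features and compute 1 / (1 - R²_j). Here is a NumPy sketch that includes an intercept in each auxiliary regression. The `vif` helper and the synthetic data are hypothetical. statsmodels' `variance_inflation_factor` works on whatever exog matrix you pass it, so its numbers depend on whether a constant column is included:

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ beta
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)   # almost a copy of `a`: strong multicollinearity
v = vif(np.column_stack([a, b, c]))
print(v.round(1))
```

A frequently quoted rule of thumb flags VIF values above roughly 5 to 10. Here `a` and `c` stand out, while `b` stays near 1.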
8 Simple Logistic Regression
Logistic regression, with plotting and the summary table from statsmodels.
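statsmodels' `sm.Logit(y, x).fit()` estimates the coefficients by maximum likelihood. Newton-Raphson is one standard way to maximize that likelihood. Here is a NumPy sketch on simulated data whose true coefficients are (-1, 2). The helpers `sigmoid` and `fit_logit` and the data are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (intercept added)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ beta)
        grad = A.T @ (y - p)                          # gradient of the log-likelihood
        info = A.T @ (A * (p * (1 - p))[:, None])     # negative Hessian (information matrix)
        beta += np.linalg.solve(info, grad)           # Newton step
    return beta

# Simulated data: P(y = 1) = sigmoid(-1 + 2x)
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = (rng.uniform(size=2000) < sigmoid(-1 + 2 * x)).astype(float)
beta = fit_logit(x, y)
print(beta)
```

This only works when the classes are not perfectly separable. With perfect separation, the maximum-likelihood estimates do not exist and the coefficients diverge.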
9 Logistic Regression Binary Predictors
Logistic regression with binary predictors in statsmodels: fitting the regression, building the confusion matrix, and testing on test data.
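A confusion matrix compares predicted classes with actual ones after the predicted probabilities are thresholded, here at 0.5. statsmodels' fitted Logit results provide a similar table through `pred_table()`. Here is a NumPy sketch with hypothetical labels and probabilities; the helper name is also hypothetical:

```python
import numpy as np

def confusion_matrix_binary(y_true, p_pred, threshold=0.5):
    """2x2 table: rows = actual class (0, 1), columns = predicted class (0, 1)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(p_pred) >= threshold).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 1, 0]
p_pred = [0.2, 0.6, 0.7, 0.4, 0.9, 0.1]
cm = confusion_matrix_binary(y_true, p_pred)
accuracy = np.trace(cm) / cm.sum()   # correct predictions sit on the diagonal
print(cm)
print(accuracy)
```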
10 Cluster Analysis
Using KMeans clustering to group countries (each data point is a country) into clusters that resemble continents.
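KMeans alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This NumPy sketch uses plain random initialization rather than sklearn's default k-means++. The `kmeans` helper and the toy points are hypothetical:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with randomly chosen initial centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster (keep the old centroid if a cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups of points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```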
11 Cluster Analysis w Categorical Data
Using KMeans clustering with categorical features: language and continent are encoded as dummy variables.
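KMeans needs numbers, so each category becomes its own 0/1 column. In pandas this is what `pd.get_dummies` produces. Here is a NumPy sketch with a hypothetical `one_hot` helper and made-up language values:

```python
import numpy as np

def one_hot(values):
    """Return (sorted categories, 0/1 matrix with one column per category)."""
    categories = sorted(set(values))
    matrix = np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])
    return categories, matrix

languages = ["English", "French", "English", "German"]  # hypothetical values
cats, dummies = one_hot(languages)
print(cats)
print(dummies)
```

The dummy columns can then be stacked next to any numeric features before clustering.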
12 Clusters with WCSS, Elbow method
Within-Cluster Sum of Squares (WCSS), and how the elbow method can be used to choose the number of clusters. (Also: sklearn's KMeans uses k-means++ initialization by default.)
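WCSS is the sum of squared distances from each point to its cluster's centroid. The elbow method plots WCSS against k and looks for the point where the curve stops dropping steeply. Here is a NumPy sketch with k-means++ seeding: each new centroid is drawn with probability proportional to its squared distance from the nearest existing centroid. It keeps the best of several runs, similar in spirit to sklearn's `n_init`. All helper names and the three-blob data are hypothetical:

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the closest centroid (squared Euclidean distance) for every row of X."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: sample new centroids with probability proportional to D(x)^2."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans_wcss(X, k, n_init=10, n_iter=100, seed=0):
    """Lowest WCSS over n_init k-means runs, each seeded with k-means++."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _run in range(n_init):
        centroids = kmeans_pp_init(X, k, rng)
        for _step in range(n_iter):
            labels = nearest_centroid(X, centroids)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        labels = nearest_centroid(X, centroids)
        best = min(best, float(((X - centroids[labels]) ** 2).sum()))
    return best

# Three hypothetical, well-separated blobs of 30 points each
rng = np.random.default_rng(365)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(30, 2)) for c in centers])

wcss = [kmeans_wcss(X, k) for k in range(1, 7)]
for k, w in zip(range(1, 7), wcss):
    print(k, round(w, 1))
```

With three blobs, the large drops stop at k = 3. That bend in the curve is the "elbow".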
13 Cluster Analysis Market Segmentation
Applying standardization to real-life data to see how the clusters change.
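Clusters change after standardization because distance-based methods like KMeans are dominated by whichever feature has the largest scale. Standardizing (z-scoring) each column puts the features on equal footing. This is the same transformation as sklearn's `preprocessing.scale`, which uses the population standard deviation. Here is a NumPy sketch with hypothetical features and helpers, showing that a point's nearest neighbour can change:

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean, divide by its population standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def nearest_other(M, i):
    """Index of the row closest to row i (excluding i itself)."""
    d = np.linalg.norm(M - M[i], axis=1)
    d[i] = np.inf
    return int(d.argmin())

# Hypothetical features on very different scales: satisfaction (1-10) and spending
X = np.array([[2, 100.0], [9, 120.0], [3, 5000.0], [8, 5100.0]])
Z = standardize(X)

print(nearest_other(X, 0))  # raw data: spending dominates the distance
print(nearest_other(Z, 0))  # standardized: both features contribute
```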
14 Heatmaps & Dendrogram
Seaborn magic!
Basics of Neural Nets & Deep Learning start from here...
- Before jumping to the next section, learn the basics of NumPy arrays, tensors, matrices and matrix operations (addition, subtraction, transpose, dot product).
- In terms of programming, a tensor is essentially a multi-dimensional array, much like an ndarray.
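The matrix operations listed above map directly onto NumPy. A tensor of higher rank is just an ndarray with more axes. A minimal sketch with small made-up matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

added = A + B        # element-wise addition
diff = A - B         # element-wise subtraction
At = A.T             # transpose
prod = A @ B         # matrix (dot) product, same as np.dot(A, B)

# A rank-3 tensor: two 2x2 matrices stacked along a new first axis
T = np.stack([A, B])
print(prod)
print(T.shape)
```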
15 Simple Neural Network (Numpy)
A Simple Neural Network made with numpy.
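The simplest such network is a single linear layer trained by gradient descent on a squared-error loss. This NumPy sketch assumes a synthetic target of 2x - 3z + 5 plus uniform noise. The notebook's data, learning rate and epoch count may differ:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
inputs = rng.uniform(-10, 10, size=(n, 2))                 # columns: x, z
noise = rng.uniform(-1, 1, size=(n, 1))
targets = 2 * inputs[:, [0]] - 3 * inputs[:, [1]] + 5 + noise

# Small random initial weights and bias
weights = rng.uniform(-0.1, 0.1, size=(2, 1))
bias = rng.uniform(-0.1, 0.1, size=1)
learning_rate = 0.02

for epoch in range(500):
    outputs = inputs @ weights + bias
    deltas = outputs - targets
    loss = np.sum(deltas ** 2) / 2 / n                     # half mean squared error
    deltas_scaled = deltas / n                             # average the gradient over observations
    weights -= learning_rate * inputs.T @ deltas_scaled    # gradient step for the weights
    bias -= learning_rate * np.sum(deltas_scaled)          # gradient step for the bias
    if epoch % 100 == 0:
        print(epoch, round(loss, 4))

print(weights.ravel(), bias)
```

The learned weights should approach (2, -3) and the bias about 5. The bias converges more slowly because its gradient is not scaled up by the input magnitudes.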
- From now on, the notebooks use TensorFlow.
- sklearn offers only basic multilayer perceptrons (MLPClassifier, MLPRegressor), not a deep-learning toolkit.
- sklearn remains useful for preprocessing (e.g. scaling, train-test splitting) and classical ML (e.g. clustering, random forests).
- The theory stays the same; only the syntax changes a little.
- Keras was integrated into TensorFlow in 2017.
- Within TensorFlow, Keras (tf.keras) is the high-level interface rather than a separate library.
- TensorFlow 2 adopted Keras as its default high-level API, so TF2 model code is largely Keras syntax.
16 Simple Neural Network (Tensorflow2)
The same Simple Neural Network, but now with Tensorflow2.
- Before jumping to the next section, learn the basics of layers, activation functions, backpropagation, overfitting, validation, cross-validation, early stopping, gradient descent, momentum, learning-rate schedules, standardization, and binary & one-hot encoding.
17 Deep Learning with MNIST dataset
The "Hello World" of deep learning and image recognition (TensorFlow 2).
- Preprocess the data and split it into training, validation and test sets.
- Outline the model and choose the activation functions.
- Set the optimizer and the loss function.
- Train the model.
- Test the model's accuracy.
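The preprocessing and splitting step can be sketched in plain NumPy. The sketch scales pixels to [0, 1], holds out part of the training data for validation and one-hot encodes the labels. Random arrays stand in for MNIST's 28x28 uint8 images. The 10% validation share and the `one_hot` helper are assumptions, and the notebook itself loads MNIST through TensorFlow:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MNIST training data: 28x28 grayscale images (uint8 pixels), labels 0-9
images = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
labels = rng.integers(0, 10, size=100)

# 1) Scale pixel values from [0, 255] to [0, 1]
x = images.astype(np.float32) / 255.0

# 2) Shuffle, then hold out 10% of the training data for validation
idx = rng.permutation(len(x))
n_val = len(x) // 10
val_idx, train_idx = idx[:n_val], idx[n_val:]
x_train, y_train = x[train_idx], labels[train_idx]
x_val, y_val = x[val_idx], labels[val_idx]

# 3) One-hot encode labels (Keras' "categorical_crossentropy" expects this;
#    "sparse_categorical_crossentropy" takes the integer labels directly)
def one_hot(y, num_classes=10):
    return np.eye(num_classes, dtype=np.float32)[y]

y_train_oh = one_hot(y_train)
print(x_train.shape, x_val.shape, y_train_oh.shape)
```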