This repo contains introductions to some of the most important machine learning and data-analysis techniques.
190818: PCA_Muller.py, Principal component analysis example with the breast cancer data-set.
270918: RidgeandLin.py, LassoandLin.py, Lasso and Ridge regression examples.
081018: bank.csv, data set on a Portuguese company selling products to random customers over phone call(s). Data-set description is available here.
161018: gender_purchase.csv, data-set of two columns describing customers buying a product depending on gender.
111118: winequality-red.csv, red wine data set, where the output is the quality column which ranges from 0 to 10.
121118: pipelineWine.py, A simple example of applying Pipeline and GridSearchCV together using the red wine data.
24112018: lagmult.py, This program demonstrates a simple constrained optimization problem using figures.
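A minimal sketch of the Pipeline + GridSearchCV pattern mentioned above. It does not reproduce pipelineWine.py: the built-in `load_wine` data stands in for winequality-red.csv, and the `SVC` estimator, step names and parameter grid are assumptions chosen for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's built-in wine data, not winequality-red.csv.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline, so it is re-fit on each CV training fold only.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

# Grid keys use the "<step name>__<parameter>" convention.
param_grid = {"clf__C": [0.1, 1, 10], "clf__gamma": [0.01, 0.1]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Putting the scaler inside the pipeline keeps the held-out fold from leaking into the scaling statistics during the grid search.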
11122018: Consumer_Complaints_short.csv, 3 columns describing the complaints, product_label and category. Complete file can be obtained from Govt.data.
13122018: Text-classification_compain_suvo.py, Classifies the consumer complaints data described above.
1912018: SVMdemo.py*, this program shows the effect of using an RBF kernel to map from 2d space to 3d space. Animation requires ffmpeg on Unix systems.
05032019: IBM_Python_Web_Scrapping.ipynb, Deals with basic web scraping, string handling, image manipulation.
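The 2d-to-3d idea can be sketched without any plotting. This is not the code in SVMdemo.py: the helper names `lift_to_3d` and `ring`, the ring radii and `gamma` are all hypothetical, and the explicit height z = exp(-gamma * ||p||^2) is only an illustrative RBF-style feature, not the full (infinite-dimensional) RBF feature map.

```python
import math

def lift_to_3d(point, gamma=1.0):
    """Append an RBF-style height z = exp(-gamma * ||p||^2) to a 2D point."""
    x, y = point
    return (x, y, math.exp(-gamma * (x * x + y * y)))

def ring(radius, n=8):
    """Return n points evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

# Two concentric rings: no straight line separates them in 2D.
inner = [lift_to_3d(p) for p in ring(0.5)]  # class 0
outer = [lift_to_3d(p) for p in ring(2.0)]  # class 1

# After the lift, a horizontal plane between the two heights separates the classes.
min_inner_z = min(p[2] for p in inner)
max_outer_z = max(p[2] for p in outer)
separable = min_inner_z > max_outer_z
print(separable)
```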
06042019: datacleaning, Folder containing files and images related to data cleaning with pandas.
08062010: DBSCAN_Complete, Folder containing files and images related to application of DBSCAN algorithm to cluster Weather Stations in Canada.
13072019: SVM_Decision_Boundary, Pipeline + GridSearchCV are used to find the best-fit parameters for an SVM, and the decision-function contours of the SVM classifier for binary classification are then plotted.
28122019: DecsTree, Folder contains a notebook using a decision tree classifier on the Bank Marketing Data-Set.
07032020: Conjugate Prior, Folder contains a notebook where the concept of conjugate priors is discussed, including an introduction to PyMC3.
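As a small illustration of a conjugate prior, the sketch below uses the Beta prior with a Binomial likelihood, where the posterior is again a Beta distribution. The choice of this pair, the function names and the numbers are assumptions; the notebook may use a different example, and PyMC3 is not needed for this closed-form update.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Binomial data under a Beta prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (2.0, 2.0)  # hypothetical prior belief about a success probability
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print(posterior, beta_mean(*posterior))
```

Because the prior and posterior share the same family, the update is just addition of counts, with no sampling or numerical integration.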
29052020: ExMax_Algo, Folder contains a notebook completely explaining the Expectation Maximization algorithm.
11092020: AdaptiveLoss.ipynb, File contains a description and a simple implementation of a robust and adaptive loss function. Original Paper by J. Barron. More details on TDS.
31092020: pima_diabetes.ipynb, File describes data preparation and choosing the best machine learning algorithm for a binary classification task. A few more details are in the Kaggle kernel.
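A sketch of the general robust loss from Barron's paper, written in plain Python rather than taken from the notebook. The function name and default scale `c` are assumptions; the "adaptive" part, where the shape parameter alpha is learned during training, is not shown here.

```python
import math

def general_robust_loss(x, alpha, c=1.0):
    """General robust loss of a residual x with shape alpha and scale c.

    alpha = 2 gives a scaled L2 loss, alpha = 1 a smoothed L1 (Charbonnier) loss,
    alpha = 0 a Cauchy (log) loss; lower alpha down-weights outliers more strongly.
    """
    z = (x / c) ** 2
    # The general formula is singular at these values, so use their limits.
    if alpha == 2:
        return 0.5 * z
    if alpha == 0:
        return math.log(0.5 * z + 1.0)
    if alpha == -math.inf:
        return 1.0 - math.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)

# The same large residual is penalised less as alpha decreases.
for a in (2, 1, 0, -2):
    print(a, general_robust_loss(3.0, a))
```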