This repo contains introductions to some of the most important machine learning and data-analysis techniques.
190818: PCA_Muller.py, Principal component analysis example with the breast cancer data-set.
270918: RidgeandLin.py, LassoandLin.py, Ridge and Lasso regression examples.
081018: bank.csv, data-set on a Portuguese company selling products to random customers over phone call(s). Data-set description is available here.
161018: gender_purchase.csv, data-set of two columns describing customers buying a product depending on gender.
111118: winequality-red.csv, red wine data set, where the output is the quality column which ranges from 0 to 10.
121118: pipelineWine.py, A simple example of applying Pipeline and GridSearchCV together using the red wine data.
24112018: lagmult.py, This program demonstrates a simple constrained optimization problem using figures.
11122018: Consumer_Complaints_short.csv, 3 columns describing the complaints, product_label and category. Complete file can be obtained from Govt.data.
13122018: Text-classification_compain_suvo.py, Classifies the consumer complaints data described above.
1912018: SVMdemo.py*, This program shows the effect of using an RBF kernel to map from 2D space to 3D space. The animation requires ffmpeg on a Unix system.
05032019: IBM_Python_Web_Scrapping.ipynb, Deals with basic web scraping, string handling, and image manipulation.
06042019: datacleaning, Folder containing files and images related to data cleaning with pandas.
08062010: DBSCAN_Complete, Folder containing files and images related to applying the DBSCAN algorithm to cluster weather stations in Canada.
13072019: SVM_Decision_Boundary, Pipeline + GridSearchCV are used to find the best-fit parameters for an SVM, and the decision function contours of the SVM classifier for binary classification are then plotted.
28122019: DecsTree, Folder contains a notebook using a decision tree classifier on the Bank Marketing Data-Set.
07032020: Conjugate Prior, Folder contains a notebook where the concept of a conjugate prior is discussed, including an introduction to PyMC3.
29052020: ExMax_Algo, Folder contains a notebook completely explaining the Expectation Maximization algorithm.
11092020: AdaptiveLoss.ipynb, File contains a description and a simple implementation of a robust and adaptive loss function. Original Paper by J. Barron. More details on TDS.
31092020: pima_diabetes.ipynb, File contains a description of data preparation and of choosing the best machine learning algorithm for a binary classification task. A few more details are in the Kaggle kernel.
15112020: terrorism_kaggle.ipynb, Notebook contains elaborate examples of how to think about problems and interpret large-scale data using the Global Terrorism Database. Apart from the Pandas Groupby and Crosstab methods, I have also used the Folium and Basemap libraries for visualizing a Leaflet map and 2D data on maps, respectively. More on The Startup.
15022021: FocalLoss_Ex.ipynb, Notebook contains a detailed explanation of how Focal Loss works. Please read the original Focal Loss paper. An example of implementing Focal Loss using TensorFlow is also shown. For more detail, check the post on TDS.
19062021: Augly_Try.ipynb, Notebook contains examples of image augmentation using Facebook's AugLy library. For more detail, check the notebook and the TDS post.
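
As a quick taste of the idea behind the FocalLoss_Ex.ipynb entry above, here is a minimal sketch of binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is written in plain Python rather than TensorFlow, so it is not the notebook's code. The function names and the defaults gamma=2.0 and alpha=0.25 are illustrative assumptions, so check the notebook and the original paper for the real thing.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Plain cross entropy for one binary prediction (hypothetical helper)."""
    # Clip the probability to avoid log(0).
    p = min(max(p_pred, eps), 1.0 - eps)
    # p_t is the probability the model assigns to the true class.
    p_t = p if y_true == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one binary prediction (illustrative sketch).

    y_true: 1 for the positive class, 0 for the negative class.
    p_pred: predicted probability of the positive class.
    gamma, alpha: assumed default values, chosen only for illustration.
    """
    p = min(max(p_pred, eps), 1.0 - eps)
    p_t = p if y_true == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare both losses for a hard, an uncertain, and an easy positive example.
    for p in (0.1, 0.5, 0.9):
        print(p, binary_cross_entropy(1, p), binary_focal_loss(1, p))
```

With gamma=0 and alpha=1 the focal loss of a positive example reduces to ordinary cross entropy. Larger gamma values push the loss of confident, correct predictions toward zero, so hard examples dominate the total.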