
Some fundamental machine learning and data-analysis techniques are explained through realistic examples.


Machine_Learning

This repo contains introductions to some of the most important machine learning and data-analysis techniques.

Each filename is preceded by the date it was added (DDMMYY or DDMMYYYY). For descriptions and more, check the Wiki page.

190818: PCA_Muller.py, Principal component analysis example with the breast cancer data set.

270918: RidgeandLin.py, LassoandLin.py: Lasso and Ridge regression examples.

081018: bank.csv, data set on a Portuguese company selling products to random customers over phone calls. Data-set description is available here.

161018: gender_purchase.csv, data-set of two columns describing customers buying a product depending on gender.

111118: winequality-red.csv, red wine data set, where the output is the quality column which ranges from 0 to 10.

121118: pipelineWine.py, A simple example of applying a pipeline and GridSearchCV together using the red wine data.

24112018: lagmult.py, This program demonstrates a simple constrained optimization problem using figures.
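As a rough illustration of the kind of problem a Lagrange-multiplier demo deals with, here is a minimal sketch. The specific problem (minimize x² + y² subject to x + y = 1) and all names are assumptions chosen for illustration; they are not taken from lagmult.py. The stationarity conditions 2x = λ, 2y = λ, x + y = 1 are solved by hand and then checked with a brute-force scan along the constraint.

```python
def f(x, y):
    """Hypothetical objective: squared distance from the origin."""
    return x ** 2 + y ** 2

# Analytic solution of the Lagrange conditions for the constraint x + y = 1:
#   2x = lam, 2y = lam, x + y = 1  ->  lam = 1, x = y = 1/2
lam = 1.0
x_star, y_star = lam / 2, lam / 2

# Numeric check: substitute y = 1 - x and scan x over a grid on [-1, 2].
candidates = [i / 1000 for i in range(-1000, 2001)]
x_best = min(candidates, key=lambda x: f(x, 1 - x))

print("analytic optimum:", (x_star, y_star), "f =", f(x_star, y_star))
print("grid-search optimum:", (x_best, 1 - x_best))
```

At the optimum the circle x² + y² = const is tangent to the constraint line, which is the picture such demos usually draw.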

11122018: Consumer_Complaints_short.csv, 3 columns describing the complaints, product_label and category. Complete file can be obtained from Govt.data.

13122018: Text-classification_compain_suvo.py, Classifies the consumer complaints data described above.

1912018: SVMdemo.py*, this program shows the effect of using an RBF kernel to map from 2D space to 3D space. The animation requires ffmpeg on Unix systems.
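The 2D-to-3D idea can be sketched without any plotting: a radial basis feature is appended as a third coordinate, so that classes arranged in concentric rings become separable by a plane. This is a minimal sketch and not the code in SVMdemo.py; the function name `rbf_lift`, the `gamma` value, and the sample points are assumptions made for illustration.

```python
import math

def rbf_lift(point, gamma=1.0, center=(0.0, 0.0)):
    """Map a 2D point to 3D by appending an RBF feature as the z coordinate."""
    x, y = point
    sq_dist = (x - center[0]) ** 2 + (y - center[1]) ** 2
    return (x, y, math.exp(-gamma * sq_dist))

# Hypothetical data: a cluster near the origin and a ring of radius 2 around it.
inner = [(0.1, 0.0), (0.0, -0.2), (-0.3, 0.1)]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in (0.0, 1.5, 3.0, 4.5)]

inner_z = [rbf_lift(p)[2] for p in inner]
outer_z = [rbf_lift(p)[2] for p in outer]

# No straight line separates the two classes in 2D, but in 3D any horizontal
# plane between the two z ranges does.
print("lowest inner z:", min(inner_z))
print("highest outer z:", max(outer_z))
```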

05032019: IBM_Python_Web_Scrapping.ipynb, Deals with basic web scraping, string handling, and image manipulation.

06042019: datacleaning, Folder containing files and images related to data cleaning with pandas.

08062010: DBSCAN_Complete, Folder containing files and images related to applying the DBSCAN algorithm to cluster weather stations in Canada.

13072019: SVM_Decision_Boundary, Pipeline and GridSearchCV are used to find the best-fit parameters for an SVM, and the decision function contours of the resulting binary classifier are then plotted.

28122019: DecsTree, Folder containing a notebook that applies a decision tree classifier to the Bank Marketing Data-Set.

07032020: Conjugate Prior, Folder containing a notebook that discusses the concept of conjugate priors and includes an introduction to PyMC3.
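A conjugate prior is one for which the posterior stays in the same family as the prior, so the update reduces to arithmetic on the parameters. The sketch below shows the standard Beta-Binomial case in closed form. It does not use PyMC3 and is not taken from the notebook; the function names, prior parameters, and data are assumptions for illustration.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return posterior Beta parameters after observing binomial data.

    A Beta(alpha, beta) prior combined with `successes` out of `trials`
    gives a Beta(alpha + successes, beta + failures) posterior.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical example: a weak Beta(2, 2) prior, then 7 successes in 10 trials.
prior = (2.0, 2.0)
posterior = beta_binomial_update(*prior, successes=7, trials=10)

print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```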

29052020: ExMax_Algo, Folder containing a notebook that explains the Expectation Maximization algorithm in full.
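For orientation, the E-step and M-step can be sketched for a two-component 1D Gaussian mixture using only the standard library. This is a minimal sketch and not the notebook's implementation; the function names, starting values, variance floor, and toy data are assumptions for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, means, n_iter=50):
    """Fit a two-component 1D Gaussian mixture with EM, starting from `means`."""
    means = list(means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsing component
    return weights, means, variances

# Hypothetical toy data: two well-separated clusters around -2 and 3.
data = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.8, 3.1, 2.9]
weights, means, variances = em_two_gaussians(data, means=(-1.0, 1.0))

print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

EM only finds a local optimum, so the result depends on the starting means; the well-separated toy data here makes the outcome insensitive to that choice.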

11092020: AdaptiveLoss.ipynb, File containing a description and a simple implementation of the robust and adaptive loss function. Original paper by J. Barron. More details on TDS.
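The general robust loss from Barron's paper can be written in a few lines for a fixed shape parameter. The sketch below is not the notebook's implementation and does not cover the adaptive part, in which the shape parameter is learned; the function name and the sample residual are assumptions for illustration.

```python
import math

def general_robust_loss(x, alpha, c=1.0):
    """General robust loss for residual x, shape alpha, and scale c > 0."""
    z = (x / c) ** 2
    if alpha == 2.0:            # limiting case: L2 loss
        return 0.5 * z
    if alpha == 0.0:            # limiting case: Cauchy (Lorentzian) loss
        return math.log(0.5 * z + 1.0)
    if alpha == -math.inf:      # limiting case: Welsch loss
        return 1.0 - math.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)

# Lower alpha penalizes a large residual less, which makes the loss more
# robust to outliers.
residual = 3.0
alphas = [2.0, 1.0, 0.0, -2.0, -math.inf]
losses = [general_robust_loss(residual, a) for a in alphas]

for a, loss in zip(alphas, losses):
    print(f"alpha={a}: loss={loss:.4f}")
```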

31092020: pima_diabetes.ipynb, File containing a description of data preparation and of choosing the best machine learning algorithm for a binary classification task. A few more details are in the Kaggle kernel.