A collection of code used in data mining and data science related fields, developed by me (Data Science, Indiana University):
Artificial-Intelligence:
This folder contains programs in Python, where I implemented KNN, Neural Nets, BFS, DFS, A*, Naive Bayes, HMM Viterbi,
and MCMC Gibbs sampling algorithms. The description of every program is written above the program itself.
Please check the "File to run" entry for each program:
- Image Classifier
  - File to run - orient.py
  - Models used - Neural Nets, KNN
  - Train_data - train-data.txt
  - Test_data - test-data.txt
- Maps
  - File to run - route.py
  - City Data - road-segments.txt
  - A* data - city-gps.txt
- Parts Of Speech tagger
  - File to run - pos_solver.py
  - Train_data - bc.train
  - Test_data - bc.test
- Zacate_Auto_Player
  - File to run - zacate.py
- Solver_16
  - File to run - solver16.py
  - input_matrix_data - input
Algorithms:
- Selection Sort - selectionsort.java
- Quick sort - quicksort.java
- Merge Sort - mergersort.java
- Longest Common Subsequence - LCS.java
- Huffman coding - Huffman.py
- Heap Sort - HeapSort.java
- Dijkstra path finding - dijkstra.py
- DFS - dfs.py (recursion)
- Binary Search Tree - BinarySearchTree.java
Data Mining:
- K-means - kmean_test.R (implementation of the K-means algorithm, with parameters k (the number of clusters), tow, and l, where l is the
number of points the data is to be allocated to)
  - Data - http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
- K-L distance - kl.R (calculates the KL distance)
- Data_mining/BUS_decoders/BUS_decoders/Code - has all the code related to the project, for cleaning and merging the data.
Please check Readme_Data.txt, Readme_code.txt and Report.pdf
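The KL distance listed above can be illustrated with a short Python sketch. The contents of kl.R are not shown here, so this follows the standard Kullback-Leibler definition for discrete distributions rather than the R implementation, and the function name kl_divergence is hypothetical.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    p and q are equal-length sequences of probabilities.
    (Hypothetical helper for illustration, not the code in kl.R.)
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total
```

Note that this quantity is not symmetric: kl_divergence(p, q) and kl_divergence(q, p) generally differ, so it is a "distance" only in a loose sense.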
Machine Learning (Self Implementations):
- Linear Regression - ml_assign_1.py
- Ridge regression - self_implement/rig_regression.py
- Lasso regression - lass.py
- Time series - predict_18april_2may.R
- Bagging and Boosting (AdaBoost) - mytree.py
- Decision Tree - mytree.py
The Practice folder is for coding that I do in my spare time.
Exploratory Data Analysis :- In-depth analysis before building a predictive model. To view an .html file, open it on GitHub and insert http://htmlpreview.github.com/? before its URL, for example http://htmlpreview.github.com/?https://github.com/dwipam/code/blob/master/EDA/s670-04.html
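The URL rewrite described above can be sketched in a few lines of Python. The helper name preview_url is hypothetical and only automates the manual step of adding the prefix.

```python
PREVIEW_PREFIX = "http://htmlpreview.github.com/?"


def preview_url(github_file_url):
    """Prefix a GitHub .html file URL so it opens through htmlpreview.

    (Hypothetical helper; it only prepends the prefix given in this README.)
    """
    if github_file_url.startswith(PREVIEW_PREFIX):
        return github_file_url  # already a preview link
    return PREVIEW_PREFIX + github_file_url


print(preview_url("https://github.com/dwipam/code/blob/master/EDA/s670-04.html"))
```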
Bayesian A/B test :- Farm and multi-armed bandit problem simulation
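As a rough illustration of a Bayesian multi-armed bandit simulation, here is a minimal Thompson sampling sketch with Beta-Bernoulli arms. The simulation in this repository is not shown here and may use a different method; the function name, the Beta(1, 1) prior, and the example conversion rates are all assumptions.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Thompson sampling on Bernoulli arms.

    true_rates: assumed success probability of each arm (illustrative values).
    Returns (pulls per arm, Beta parameters per arm).
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior per arm, stored as [alpha, beta].
    params = [[1, 1] for _ in true_rates]
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(a, b) for a, b in params]
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Play the chosen arm and observe a simulated Bernoulli reward.
        reward = rng.random() < true_rates[arm]
        # Success increments alpha, failure increments beta.
        params[arm][0 if reward else 1] += 1
        pulls[arm] += 1
    return pulls, params


pulls, params = thompson_sampling([0.1, 0.9], rounds=2000)
print(pulls)
```

Over many rounds the sampler should send most pulls to the arm with the higher success rate, which is the behaviour an A/B test framed as a bandit problem relies on.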
Distribution by Technologies:-
Python - Check the Artificial Intelligence folder, dijkstra.py, dfs.py and the practice folder
R - Check the Data Mining folder
Java - Check the Algorithm folder, Data Mining - BetterCode.java, and the practice folder
Challenges:-
Noctober - Check model.ipynb within the Noctober folder. Placed 3rd in an AnalyticsVidhya competition.
Telstra - Check Telstra.ipynb within the Telstra challenge folder.
Attribution - http://htmlpreview.github.io/?https://github.com/dwipam/code/blob/master/AttributionChallenge/Model.html
If anything in this README is unclear, write to:
ddkatari@iu.edu
dwipam.katariya@gmail.com