/ML-bucket

Short data science and big data scripts at your service

Primary language: Jupyter Notebook

Ml-bucket

Ever been in the middle of implementing a research paper and wished for a few small scripts that would have made your life easier? The aim of this repository is to bring the everyday accessories of ML (NLP, CV, etc.) into one place, so that you can reuse them with a little tweaking whenever you need them. I have added some basic scripts and will add more in due time.

  1. tf-idf.py implements the standard tf-idf (term frequency-inverse document frequency) algorithm using sklearn's TfidfVectorizer. For large corpora you can use HashingVectorizer instead for better speed and scalability, although on its own it produces hashed term counts without the idf weighting.

  2. SVM.py trains a Support Vector Machine on the data in train.csv. The code first removes the unnecessary features, then converts the categorical/nominal features to numerical ones using one-hot encoding, and finally trains the model using LibSVM.
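
As a rough illustration of the first script, the sketch below builds tf-idf features with sklearn and compares them with the hashing alternative. It is not the code from tf-idf.py: the toy corpus and the choice of 1024 hash buckets are assumptions made up for this example.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer

# Toy corpus, invented for illustration only (not data from this repository).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Standard tf-idf: learns a vocabulary and idf weights from the corpus.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)  # sparse matrix: one row per document

# Hashing alternative: stateless, so nothing has to be fitted or kept in memory.
# n_features=2**10 is an arbitrary bucket count chosen for this sketch.
hasher = HashingVectorizer(n_features=2**10, alternate_sign=False)
X_hash = hasher.transform(docs)  # hashed term counts, no idf weighting

print("tf-idf matrix shape:", X_tfidf.shape)
print("hashed matrix shape:", X_hash.shape)
```

The trade-off is that TfidfVectorizer keeps a readable vocabulary (`tfidf.vocabulary_`), while HashingVectorizer cannot map a column back to a word but has a fixed memory footprint regardless of corpus size.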

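The steps described for SVM.py can be sketched along the following lines. This is a minimal example under stated assumptions, not the script itself: the small DataFrame stands in for train.csv, and its column names (`id`, `colour`, `size`, `label`) are hypothetical. sklearn's `SVC` is used here because it is built on LIBSVM.

```python
import pandas as pd
from sklearn.svm import SVC

# Hypothetical stand-in for train.csv; the columns are invented for this sketch.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "colour": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.5, 1.2, 3.1, 2.7, 3.3],
    "label": [0, 1, 0, 1, 1, 1],
})

# Step 1: drop features that carry no signal (here the row id) and the target.
features = df.drop(columns=["id", "label"])
y = df["label"]

# Step 2: one-hot encode the categorical column into numerical indicator columns.
X = pd.get_dummies(features, columns=["colour"], dtype=float)

# Step 3: train the SVM; the kernel and C values are illustrative defaults.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

predictions = clf.predict(X)
print(predictions)
```

With a real train.csv you would replace the DataFrame with `pd.read_csv("train.csv")` and adjust the dropped and encoded columns to match the file.
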
Everyone is encouraged to contribute to this repository.