This repository contains my implementations of various data science techniques and algorithms. These implementations are not meant to be as efficient or complete as the versions found in well-known libraries such as scikit-learn or Gensim. The purpose is to convey the basic idea behind each algorithm by providing a minimal working version.
As an example,
- word2vec version in Gensim contains 2309 lines of code (https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py).
- original word2vec version contains 715 lines of code (https://github.com/tmikolov/word2vec/blob/master/word2vec.c).
- my implementation contains fewer than 100 lines of code (https://github.com/tevfikaytekin/data_science/blob/master/nlp/word2vec.ipynb).
Please be aware that I update this repository regularly: many notebooks are well established, but some are still in the early stages of development.
Any suggestions or corrections are welcome.
-Tevfik Aytekin