A machine learning approach to visualizing an English text corpus with the Word2Vec model from the Gensim NLP library. This project was done to test the accuracy of the Word2Vec model on an English corpus.
- Sklearn: Used for data preprocessing, model selection, classification, regression, and clustering.
- Matplotlib: Used for 2D and 3D plotting, such as histograms and bar charts.
- Gensim: Open-source library used for text analysis, including Word2Vec and Doc2Vec.
- The Melon Honey font is used, and the sample texts were collected from the Internet.
The Word2Vec model is used to produce word embeddings. Here I used the Gensim library for the model and Matplotlib's pyplot for 2D visualization of the corpus.
- First, I took an English corpus and applied a punctuation remover.
- Split the data and visualized the corpus.
- Repeated the process with a larger corpus.
- Google Colab/Jupyter Notebook
- Language: Python
- Word2Vec from Gensim
- Matplotlib | Pyplot
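
The procedure and tools listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the function names (`remove_punctuation`, `tokenize`, `plot_embeddings`), the hyperparameters, and the output file name are all hypothetical. It assumes Gensim 4.x (where the embedding size parameter is `vector_size`) and uses PCA from scikit-learn to project the vectors to 2D, which is an assumption, since the project does not state how the projection was done.

```python
import string


def remove_punctuation(text: str) -> str:
    """Strip ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text: str) -> list[list[str]]:
    """Split the corpus into lines, then each line into lowercase tokens."""
    sentences = []
    for line in text.splitlines():
        tokens = remove_punctuation(line).lower().split()
        if tokens:  # skip blank lines
            sentences.append(tokens)
    return sentences


def plot_embeddings(sentences, out_path="embeddings.png"):
    """Train Word2Vec on tokenized sentences and save a 2D scatter plot.

    Hyperparameters here are illustrative assumptions, not tuned values.
    """
    # Third-party imports are local so the helpers above work without them.
    from gensim.models import Word2Vec
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    model = Word2Vec(
        sentences=sentences,
        vector_size=50,  # embedding dimension (Gensim 4.x name)
        window=5,
        min_count=1,     # keep every word, since sample corpora are small
        workers=1,
        seed=42,
    )

    words = model.wv.index_to_key
    # Project the embeddings to 2D (PCA is an assumed choice).
    coords = PCA(n_components=2).fit_transform(model.wv[words])

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(coords[:, 0], coords[:, 1])
    for word, (x, y) in zip(words, coords):
        ax.annotate(word, (x, y))
    fig.savefig(out_path)
    return model
```

Calling `plot_embeddings(tokenize(corpus_text))` on a small corpus and then again on a larger one mirrors the "repeat with a larger corpus" step. With very small corpora the learned vectors are mostly noise, so the plot is only meaningful once the corpus grows.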
Prof. Sandipan Ganguly, HIT-K.
Rajdeep Das