Goal: automate real-time text classification so that, given user input, the category the content belongs to is identified.
1. Preparing Dataset
2. Text Processing
3. LDA topic building
4. LDA visualization
5. Clustering
6. Prediction
-
Importing Required Libraries: basic libraries, NLTK, Beautiful Soup
-
Web Scraping: preparing the required data sheet, e.g. Sports and Politics news
-
Data Visualization
Distribution of document word counts
Word cloud of the top N words
Sentence colouring of N sentences
- Plotting
-- The number of documents for each topic, assigning each document to the topic that has the most weight in it.
-- The number of documents for each topic, summing the actual weight each topic contributes to the respective documents.
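The difference between the two counts is easiest to see on a small example. The sketch below assumes a hypothetical document-topic weight matrix (one row per document, one column per topic, rows summing to 1); the values and function names are illustrative and do not come from the project.

```python
# Hypothetical document-topic weights, in the shape a topic model would
# produce: doc_topic_weights[d][t] is the weight of topic t in document d.
doc_topic_weights = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.55, 0.45],
]

def dominant_topic_counts(weights):
    """Count documents per topic, giving each document to its heaviest topic."""
    num_topics = len(weights[0])
    counts = [0] * num_topics
    for row in weights:
        dominant = max(range(num_topics), key=lambda t: row[t])
        counts[dominant] += 1
    return counts

def summed_topic_weights(weights):
    """Sum each topic's weight over all documents (a fractional document count)."""
    num_topics = len(weights[0])
    return [sum(row[t] for row in weights) for t in range(num_topics)]

by_dominant = dominant_topic_counts(doc_topic_weights)
by_weight = summed_topic_weights(doc_topic_weights)
```

With these weights the first method assigns three documents to topic 0 and one to topic 1, while the second gives roughly 2.25 and 1.75, because documents that are split almost evenly between topics contribute to both.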
- Building of Models
Bigram & Trigram models
LDA Model
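The idea behind the bigram and trigram models can be illustrated with a simplified, count-based stand-in. This is not the phrase model the project uses: the function `merge_frequent_bigrams`, the `min_count` parameter and the sample tokens are hypothetical, and a real phrase detector would normally score pairs rather than rely on raw counts alone.

```python
from collections import Counter

def merge_frequent_bigrams(tokenized_docs, min_count=2):
    """Join adjacent token pairs that occur at least min_count times overall."""
    pair_counts = Counter(
        pair for tokens in tokenized_docs for pair in zip(tokens, tokens[1:])
    )
    merged_docs = []
    for tokens in tokenized_docs:
        merged, i = [], 0
        while i < len(tokens):
            pair = tuple(tokens[i:i + 2])
            if len(pair) == 2 and pair_counts[pair] >= min_count:
                # Frequent pair: emit it as a single token and skip both words.
                merged.append("_".join(pair))
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs

# Hypothetical tokenized documents.
docs = [
    ["prime", "minister", "visits", "stadium"],
    ["prime", "minister", "wins", "vote"],
    ["world", "cup", "final"],
    ["world", "cup", "tickets"],
]
bigram_docs = merge_frequent_bigrams(docs)
# A second pass over the merged tokens can join a bigram with a neighbour,
# which is how trigrams arise in this scheme.
trigram_docs = merge_frequent_bigrams(bigram_docs)
```

The merged tokens (for example `prime_minister`) would then be passed to the LDA model in place of the separate words.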
-
Prediction of New Text
-
Predicting Topic
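As a rough illustration of the prediction step, the sketch below scores new text against per-topic word weights. It is a simplified stand-in, not LDA inference itself: the weights, the topic labels and the function `predict_topic` are hypothetical, and in practice a trained model would supply the word weights.

```python
# Hypothetical per-topic word weights, standing in for what a trained topic
# model would learn; the labels are assumed to be assigned by hand.
topic_word_weights = {
    "Sports": {"match": 0.3, "goal": 0.3, "team": 0.2, "player": 0.2},
    "Politics": {"election": 0.3, "minister": 0.3, "vote": 0.2, "policy": 0.2},
}

def predict_topic(text, weights):
    """Return (best_topic, normalized_scores) for a piece of new text.

    best_topic is None when no token of the text is known to any topic.
    """
    tokens = text.lower().split()
    scores = {
        topic: sum(words.get(token, 0.0) for token in tokens)
        for topic, words in weights.items()
    }
    total = sum(scores.values())
    if total == 0:
        return None, scores
    normalized = {topic: score / total for topic, score in scores.items()}
    best = max(normalized, key=normalized.get)
    return best, normalized

label, scores = predict_topic(
    "The minister called an early election", topic_word_weights
)
```

Returning `None` for text with no known words keeps the caller from treating an arbitrary topic as a confident prediction.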