Perform topic modelling on the transcripts of the TED Talks

Primary language: Python


People tend to label everything they encounter. The internet, for all its advantages, has also made data far more abundant and available. The larger a collection of data grows, the greater the need to divide it into smaller chunks so that it can be accessed and used more effectively. It might be a natural evolution to label content by continuously training machine learning models. The goal is to design a model that trains on a corpus of text files to generate a finite bag of words, which can then be used to predict a topic for an unknown or unlabelled text document. For this project, we designed and built a topic prediction model that predicts topic labels for TED Talks.

Methods used:

  1. LDA Model
  2. TF-IDF Weight Ranking Model
  3. k-NN Model
  4. Word2Vec Model
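
As a rough illustration of the TF-IDF weight ranking idea, here is a minimal sketch that scores each word in a document by its term frequency times its inverse document frequency and keeps the top-ranked words as candidate topic labels. It is not the project's actual implementation: the function names (`tokenize`, `top_tfidf_terms`), the simple tokenizer, and the three sample sentences are all assumptions made for this example, and the sample sentences are placeholders rather than real TED transcripts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words longer than two letters.

    This is a deliberately simple tokenizer (an assumption for this sketch);
    a real pipeline would likely also remove stop words and lemmatize.
    """
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]


def top_tfidf_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    ranked = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # TF-IDF = (term count / document length) * log(N / document frequency)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (highest first); break ties alphabetically.
        top = sorted(scores, key=lambda term: (-scores[term], term))[:k]
        ranked.append(top)
    return ranked


# Hypothetical stand-ins for transcripts, used only to exercise the function.
sample_docs = [
    "The ocean is warming and the ocean is rising as climate change accelerates",
    "Machine learning models learn patterns from data and improve with more data",
    "Music changes the brain and music shapes how the brain responds to emotion",
]

for doc_terms in top_tfidf_terms(sample_docs, k=2):
    print(doc_terms)
```

Words that occur in every document (such as "and" in the sample) receive an inverse document frequency of zero, so they never outrank words that are distinctive to a single document. The top-ranked terms per talk could then serve as features for the other methods listed above, though how the project actually combines them is not stated here.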