CASE: Consumers submit feedback on financial products and services. Our task is to extract the hidden themes (topics) from that feedback and assign each feedback document to one of those topics.
Solution: Train a natural language processing (NLP) machine learning model to extract the topics from the open-ended complaint text documents.
Data Source: The data was downloaded from Kaggle via this URL: consumer complaint data
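Topic modeling is an unsupervised machine learning technique for discovering the hidden (latent) thematic structure in a large corpus of text documents. Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) are two of the most widely used topic modeling techniques. LDA takes a probabilistic approach: it models each document as a mixture of topics and each topic as a distribution over words. NMF takes a matrix factorization approach: it approximates the document-term matrix as the product of two non-negative matrices, one holding document-topic weights and the other topic-term weights.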