/LDA_model

Sample code from a professional project involving Latent Dirichlet Allocation (topic modelling) time-series analysis on unstructured customer text data

Primary language: Python

This is a selection of code I built during a summer internship in Data Science. The objective of the project was to create tools for analyzing unstructured customer data in the form of feedback comments (similar to tweets). The final implementation analyzes the comments with the machine learning technique Latent Dirichlet Allocation (LDA) to find the 10 most important 'topics', and then tracks quantitative and qualitative changes in these topics over time. This repo contains a selection of the code that I built for the job, including the data pipelining, LDA, time-series analysis, and visualization. The LDA implementation itself is a Java package called MALLET.
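As a rough illustration of the time-series step, the sketch below averages per-comment topic weights by calendar month. It is not the code from this repo: the function name `topic_share_by_month`, the toy dates, and the three-topic weights are all hypothetical, and it assumes each comment already has a date and a list of topic weights (for example, parsed from the LDA output).

```python
from collections import defaultdict
from datetime import date


def topic_share_by_month(doc_dates, doc_topics):
    """Average each topic's weight over all comments from the same month.

    doc_dates  -- one datetime.date per comment (hypothetical input)
    doc_topics -- one list of topic weights per comment, same order
    Returns {(year, month): [mean weight of topic 0, topic 1, ...]}.
    """
    sums = {}
    counts = defaultdict(int)
    for d, weights in zip(doc_dates, doc_topics):
        key = (d.year, d.month)
        if key not in sums:
            sums[key] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[key][k] += w
        counts[key] += 1
    # Sort keys so the months come out in chronological order.
    return {key: [s / counts[key] for s in sums[key]] for key in sorted(sums)}


# Toy data (assumed for illustration): 3 topics instead of 10.
dates = [date(2016, 6, 3), date(2016, 6, 20), date(2016, 7, 5)]
topics = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.0, 0.0, 1.0]]

shares = topic_share_by_month(dates, topics)
for month, weights in shares.items():
    print(month, weights)
```

Plotting each topic's monthly mean as its own line is one simple way to see which topics grow or fade; the qualitative side (how the top words of a topic change) would need the per-period word lists as well.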