Topic_Modelling_on_News_Articles

Official Repo for Z Unlocked Challenge 2!

Primary language: Jupyter Notebook

Z by HP Unlocked Challenge 2

Topic Modeling challenge. Special thanks to Nick Wan and Hunter Kempf for helping create this challenge! Watch the video tutorial here: https://youtu.be/mkOIb0EulkE

The Task

Summarize the main topics presented in the articles by grouping them into the most relevant topic groups, using both text and visuals. This can be done with NLP tools, especially Latent Dirichlet Allocation (LDA), and gives insight into the content of the articles without having to read through all of them.

What is Unlocked?

Unlocked is an action-packed interactive film made for data scientists. Sharpen your skills and solve the data driven mystery here: https://www.hp.com/us-en/workstations/industries/data-science/unlocked-challenge.html

The Data

The data is straightforward: it consists of text files in the challenge2-articles folder. Each file is named challenge2-articleXXX.txt, where XXX is a number. There should be 144 articles in total to summarize.
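As a starting point, the files described above could be read into memory with a small helper like the one below. This is only a sketch: the function name `load_articles` is hypothetical, and UTF-8 encoding (with undecodable bytes replaced) is an assumption about the files, not something the challenge specifies.

```python
from pathlib import Path


def load_articles(folder="challenge2-articles"):
    """Read every challenge2-article*.txt file in `folder`.

    Returns a dict mapping file name -> article text, in sorted file-name order.
    """
    articles = {}
    for path in sorted(Path(folder).glob("challenge2-article*.txt")):
        # Assumption: files are UTF-8; replace undecodable bytes rather than fail.
        articles[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return articles
```

Calling `load_articles()` from the repository root and checking `len(...)` against the expected 144 is a quick way to confirm that nothing is missing.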

Where to Start

Feel free to follow along with the Jupyter notebook, or investigate and create your own topic model.

LDA Models/Visualizations

pyLDAvis

pyLDAvis Tutorials:

Research Papers:

Sklearn Implementation

Inspired by: https://nbviewer.org/github/bmabey/pyLDAvis/blob/master/notebooks/sklearn.ipynb