NOTE: Please don't copy assignments or cheat in any way. That is not the purpose of this repo, and it will get both you and me into trouble.


This was my course project for the master's course Computational Linguistics, taught by Prof. Koller. In this project I analyse the US news category dataset released in September 2021. I use NER and LDA for the analysis: NER (named entity recognition) to find which personalities are discussed in the data, and LDA (latent Dirichlet allocation) to see which topics are discussed.
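To give a rough idea of the NER part, here is a minimal sketch of how person names could be counted with spaCy. It is not the exact code from the notebook. The function names `extract_entities` and `count_people` are made up for this example, and the model name `en_core_web_trf` is an assumption (any installed spaCy English pipeline with an NER component should work the same way).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def extract_entities(
    texts: Iterable[str], model: str = "en_core_web_trf"
) -> List[Tuple[str, str]]:
    """Run spaCy NER over the texts and return (entity text, label) pairs.

    The default model name is an assumption; it has to be downloaded first.
    """
    import spacy  # imported lazily so count_people works without spaCy

    nlp = spacy.load(model)
    pairs: List[Tuple[str, str]] = []
    for doc in nlp.pipe(texts):
        pairs.extend((ent.text, ent.label_) for ent in doc.ents)
    return pairs


def count_people(
    entities: Iterable[Tuple[str, str]], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Count PERSON entities and return the top_n most frequent names."""
    names = Counter(
        text.strip() for text, label in entities if label == "PERSON"
    )
    return names.most_common(top_n)
```

With a list of headlines in a hypothetical variable `headlines`, `count_people(extract_entities(headlines))` would give the most frequently mentioned people. Splitting extraction from counting means the slow spaCy step only has to run once, and the counts can be recomputed cheaply.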

Directory Structure

project directory structure


nltk - 3.7
wordcloud
spacy - 3.5.1
spacy-transformers - 1.2.2
pyLDAvis - 3.4.0

You can use requirements.txt to create a conda environment.


It's a notebook, so every cell can be run separately. Cells that take a long time to run say so at the top of the cell.


