Text mining, Named Entity & Topic Model

News articles convey information largely through the named entities they mention. It is therefore interesting to extract the relationships among entities, words, and topics from a large collection of news articles using natural language processing (NLP). For the modeling I have used the popular Latent Dirichlet Allocation (LDA) algorithm, a generative probabilistic approach to topic modeling.
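
For reference, the generative process that LDA assumes is commonly written as below. The notation is the conventional one and is an assumption here, not something this article defines: K topics, D documents, Dirichlet hyperparameters \alpha and \beta, per-topic word distributions \phi_k, per-document topic mixtures \theta_d, and a topic assignment z_{d,n} for the n-th word w_{d,n} of document d.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

Fitting the model means inferring \theta and \phi from the observed words alone; the top terms of a topic are the words with the highest probability under its \phi_k.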

This exercise discovers the underlying thematic structure of a text corpus. The output is presented as a report of the top terms appearing in each topic.
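
To make the idea of a "top terms per topic" report concrete, here is a minimal sketch in plain Python. It is not the implementation used in this exercise: it fits LDA with a small collapsed Gibbs sampler written from scratch, and the toy corpus, the function names (lda_gibbs, top_terms), and the parameter values (two topics, alpha, beta, iteration count) are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (topic-word counts, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wid = word_id[w]
                # Remove the current assignment from the counts.
                k = assignments[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the word back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    return topic_word, vocab


def top_terms(topic_word, vocab, n=5):
    """Return the n most frequently assigned words for each topic."""
    report = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: counts[i], reverse=True)
        report.append([vocab[i] for i in ranked[:n]])
    return report


# Hypothetical, already tokenized toy corpus (not the news data used in this article).
docs = [
    "election vote party minister parliament vote".split(),
    "party minister election campaign vote".split(),
    "match goal team player league goal".split(),
    "team player coach match league".split(),
]

topic_word, vocab = lda_gibbs(docs, num_topics=2)
report = top_terms(topic_word, vocab, n=3)
for k, terms in enumerate(report):
    print(f"Topic {k}: {', '.join(terms)}")
```

On a real corpus one would normally tokenize, remove stop words, and use an established library rather than a hand-written sampler; the sketch only shows where the per-topic term ranking comes from.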

Let us explore automatic text processing and experiment with a topic model to identify potential topics.