LoveForData Interview Assessment Test
Objective 1: You are required to scrape a website to generate a dataset. Use https://www.dawn.com/ as the reference website, collecting news articles published from January 1, 2021 onward. The result should be plain parallel data of the form <news_details, news_category>; you may store individual feeds as JSON or CSV.
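As a rough illustration of the storage step in Objective 1, the sketch below filters items by date and serializes the <news_details, news_category> pairs to CSV and JSON Lines. It does not fetch or parse any page. The sample items, the `published` field, and the helper names are hypothetical and stand in for whatever a real scraper would produce.

```python
import csv
import io
import json
from datetime import date

FIELDS = ["news_details", "news_category"]
START = date(2021, 1, 1)  # timeline start stated in the objective

# Hypothetical scraped items; a real scraper would supply these.
raw_items = [
    {"published": date(2020, 12, 30),
     "news_details": "An older story outside the timeline.",
     "news_category": "world"},
    {"published": date(2021, 1, 1),
     "news_details": "The central bank kept its policy rate unchanged.",
     "news_category": "business"},
    {"published": date(2021, 3, 14),
     "news_details": "The home side won the opening match by six wickets.",
     "news_category": "sport"},
]

# Keep only items inside the timeline, projected onto the two target fields.
records = [
    {field: item[field] for field in FIELDS}
    for item in raw_items
    if item["published"] >= START
]


def to_csv(rows):
    """Serialize rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Serialize rows to JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


csv_text = to_csv(records)
jsonl_text = to_json_lines(records)
print(csv_text)
print(jsonl_text)
```

JSON Lines is used here only because it appends cleanly as new feeds arrive; a single JSON array or one file per feed would satisfy the objective equally well.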
Objective 2: Extract important topics from the corpus and create a map/graph linking topics to documents. Visualizing it would be a plus.
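One possible shape for the topic-to-document map in Objective 2 is sketched below. It is a deliberate simplification: "topics" are approximated by the content words shared by the most documents, not by a real topic model such as LDA. The sample documents, the stopword list, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "on", "for",
             "its", "by", "after"}

# Hypothetical corpus keyed by document id.
docs = {
    "doc1": "The central bank raised the interest rate to curb inflation.",
    "doc2": "Inflation eased after the central bank changed its interest rate policy.",
    "doc3": "The cricket team won the final match of the series.",
    "doc4": "The captain praised the cricket team after the match.",
}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def topic_document_map(corpus, k=3):
    """Map the k most widely shared terms to the documents containing them."""
    token_sets = {doc_id: set(tokenize(text)) for doc_id, text in corpus.items()}
    doc_freq = Counter(term for terms in token_sets.values() for term in terms)
    # Sort by descending document frequency, then alphabetically for stability.
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return {
        term: sorted(d for d, terms in token_sets.items() if term in terms)
        for term in ranked[:k]
    }


topic_map = topic_document_map(docs)
# Edge list form of the same bipartite topic/document graph.
edges = [(topic, doc_id) for topic, ids in topic_map.items() for doc_id in ids]
print(topic_map)
print(edges)
```

The edge list can be handed to any graph or plotting library for the optional visualization.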
Objective 3: Implement a function to extract named entities from the corpus.
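The function asked for in Objective 3 could have an interface like the one sketched below. The body is only a capitalization heuristic built on regular expressions, not a trained named-entity recognizer, and it returns untyped strings with no PERSON or LOCATION labels. A library model would normally replace it. The function name, the pattern, and the sample sentence are hypothetical.

```python
import re

# One or more consecutive capitalized words, e.g. "Sara Ahmed" or "Lahore".
CAP_SEQUENCE = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b")
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def extract_entities(text):
    """Return capitalized word sequences that look like names.

    Heuristic only: a single capitalized word at the start of a sentence
    is skipped, because it is usually capitalized for grammar, not as a name.
    """
    entities = []
    for sentence in SENTENCE_BREAK.split(text):
        for match in CAP_SEQUENCE.finditer(sentence):
            lone_sentence_opener = match.start() == 0 and " " not in match.group()
            if lone_sentence_opener:
                continue
            entities.append(match.group())
    return entities


sample = "The minister met Sara Ahmed in Lahore. Officials from Karachi attended."
print(extract_entities(sample))
```

The heuristic misses lowercase or all-caps names and single-word names that open a sentence, which is the main reason to swap in a statistical model for the real corpus.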
Objective 4: Model the data for classifying news_category. At least two classification algorithms are required for comparative analysis.
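To illustrate the comparative analysis in Objective 4, here is a minimal sketch that trains two simple classifiers on the same split and reports accuracy for each. The algorithms are chosen only for brevity: a multinomial Naive Bayes with add-one smoothing and a 1-nearest-neighbour classifier using Jaccard similarity over word sets. The toy training and test examples, class names, and helper names are hypothetical, and a real comparison would need far more data and a proper evaluation protocol.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())

        def log_posterior(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            return score

        return max(self.label_counts, key=log_posterior)


class NearestNeighbor:
    """1-nearest-neighbour using Jaccard similarity between word sets."""

    def fit(self, texts, labels):
        self.examples = [(set(tokenize(t)), l) for t, l in zip(texts, labels)]
        return self

    def predict(self, text):
        tokens = set(tokenize(text))

        def jaccard(example):
            words, _ = example
            union = tokens | words
            return len(tokens & words) / len(union) if union else 0.0

        return max(self.examples, key=jaccard)[1]


def accuracy(model, texts, labels):
    hits = sum(model.predict(t) == l for t, l in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy data standing in for the scraped corpus.
train = [
    ("stocks rallied as the central bank cut interest rates", "business"),
    ("the rupee gained against the dollar in currency markets", "business"),
    ("the batsman scored a century in the cricket match", "sport"),
    ("the hockey team won the final on penalties", "sport"),
]
test = [
    ("the central bank held interest rates steady", "business"),
    ("the cricket team won the match", "sport"),
]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

scores = {
    type(model).__name__: accuracy(model.fit(train_x, train_y), test_x, test_y)
    for model in (NaiveBayes(), NearestNeighbor())
}
print(scores)
```

Keeping both models behind the same `fit`/`predict` interface makes it straightforward to add further algorithms or metrics to the comparison.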