
Extracts data from semi-structured text files and preprocesses the text into numerical representations.

Primary language: Jupyter Notebook

Text-Preprocessing

This repository contains a dataset of COVID-19-related tweets covering 80+ days, from late March to mid-July 2020. The Excel file has 80+ sheets, each holding 2,000 tweets. The goal of the project is to preprocess these tweets and convert them into numerical representations suitable as input to recommender systems and information retrieval algorithms.
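The preprocessing described above can be illustrated with a small, standard-library-only sketch. It is not the notebook's actual code. It assumes the tweets have already been pulled out of the workbook as plain strings. With pandas, for example, `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames, one per sheet, but the name of the column holding the tweet text is not given here. The cleaning rules and the tiny `STOP_WORDS` set are hypothetical choices. TF-IDF is just one common way to produce the numerical representation and is used only as an example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would likely use a fuller list (e.g. NLTK or scikit-learn).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in",
              "for", "on", "with", "this", "that", "it"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def preprocess(tweet):
    """Lowercase a tweet, strip URLs, @mentions and punctuation, drop stop words."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Removes '#' and punctuation but keeps the hashtag word itself (e.g. covid19).
    text = NON_ALNUM_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]


def tfidf(docs):
    """Return (vocab, vectors): one dense TF-IDF vector per tokenized document.

    Uses raw term frequency normalized by document length and the unsmoothed
    idf = log(N / df); terms present in every document get weight 0.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            vec[index[term]] = tf * idf
        vectors.append(vec)
    return vocab, vectors


# Usage on a few made-up tweets (not taken from the dataset).
tweets = [
    "Stay home and stay safe! #COVID19 https://example.com",
    "@WHO new COVID19 guidance on masks",
    "Masks help slow the spread of covid19",
]
docs = [preprocess(t) for t in tweets]
vocab, vectors = tfidf(docs)
```

Here "covid19" occurs in every tweet, so its weight is 0. The resulting vectors can be compared with, for example, cosine similarity for retrieval. Libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed idf by default, so their weights will differ from this sketch.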