Precog Recruitment Tasks

Task 1

The PDF is in the corresponding folder, Task_1.

Task 2

tweets_dump.json contains the tweets (Hashtag: National Mathematics Day, 22nd Dec 2020, Delhi).

The corresponding ipynb and pdf files are also present.

Task 3

Part 1:

The code that populates the database and performs the general conversion is in convert.py. Set the path_to_file variable to test it.

The table version of the 4 pdfs can be seen in extract.ipynb.

Part 2:

The code to populate the database is in convert.py. The top tags analysis is in top.ipynb. This task is incomplete due to lack of time on my end. Posts.xml was too large to handle in one piece, so I had to split it into 33 smaller files to analyze the top tags (this can be found at the bottom of the ipynb file). The corresponding code to split Posts.xml is in split.sh.
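
One possible alternative to splitting the file, not what split.sh or top.ipynb actually does, is to stream Posts.xml and count tags one row at a time so the whole file never sits in memory. The sketch below assumes a Stack Exchange style dump in which each post is a `row` element with a `Tags` attribute such as `<python><pandas>`; the function name `count_tags` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: tags are stored as "<tag1><tag2>" inside the Tags attribute.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def count_tags(source):
    """Count tag occurrences in a Posts.xml-like file without loading it whole.

    `source` may be a file path or a binary file object.
    """
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    # so that already-processed rows can be discarded.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Rows without a Tags attribute (e.g. answers) contribute nothing.
            counts.update(TAG_PATTERN.findall(elem.get("Tags", "")))
            # Drop processed children so memory use stays roughly constant.
            root.clear()
    return counts
```

Calling `count_tags("Posts.xml").most_common(10)` would then list the ten most frequent tags, provided the file matches the assumed layout.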