The PDF is in the corresponding folder, Task_1.
tweets_dump.json contains tweets (hashtag: National Mathematics Day, 22nd Dec 2020, Delhi).
The corresponding ipynb and PDF files are also present.
The code to populate the database and perform the general conversion is in convert.py. One can set the path_to_file variable to test it.
The table version of the 4 PDFs can be seen in extract.ipynb.
The code to populate the database is in convert.py.
The top tags analysis is in top.ipynb.
This task is incomplete due to lack of time on my end.
I ran into issues with the large file size and had to split Posts.xml into 33 smaller files to analyze the top tags (can be found at the bottom of the ipynb file). The corresponding code to split Posts.xml is in split.sh.
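
For reference, a possible alternative to splitting the file is to stream it one element at a time. The sketch below is not the code in top.ipynb or split.sh. It assumes Posts.xml holds `<row>` elements whose `Tags` attribute looks like `<python><pandas>` (or `|python|pandas|`). The names `split_tags` and `count_top_tags` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def split_tags(raw):
    """Split '<python><pandas>' or '|python|pandas|' into a list of tag names."""
    cleaned = raw.replace("><", "|").strip("<>|")
    return [t for t in cleaned.split("|") if t]


def count_top_tags(source, n=10):
    """Stream <row> elements from a path or file object; return the n most common tags."""
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Assumed layout: tags live in the row's "Tags" attribute.
            counts.update(split_tags(elem.get("Tags", "")))
            # Drop rows already processed so they do not accumulate in memory.
            root.clear()
    return counts.most_common(n)


if __name__ == "__main__":
    # Tiny in-memory stand-in for Posts.xml (made-up rows, for illustration only).
    sample = (
        b'<posts>'
        b'<row Id="1" Tags="&lt;python&gt;&lt;pandas&gt;" />'
        b'<row Id="2" Tags="&lt;python&gt;" />'
        b'<row Id="3" />'
        b'</posts>'
    )
    print(count_top_tags(io.BytesIO(sample), n=2))
```

With a real file, the call would be `count_top_tags("Posts.xml")`. Whether this is fast enough for the full dump has not been checked here.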