feat (data): script to request tweets from twitter API
Objective: build the database and have data covering as many days as possible.
Describe what you want
Update the dataset script to request specific tweets from the Twitter API based on their date or ID.
The script MUST save ALL the tweets received into csv files in the data/raw/twitter directory, with the date and ID of the first and last tweet specified (in the filename?). Possible formats (a sketch of building such a filename follows the list):
data/raw/twitter/[candidat_name]_[startdate]_[enddate].csv
data/raw/twitter/[candidat_name]_[first_id_tweet]_[last_id_tweet].csv
data/raw/twitter/candidat_name/[startdate]_[enddate]_[first_id_tweet]_[last_id_tweet].csv
data/raw/twitter/week_#x/[candidat_name]_[startdate]_[enddate]_[first_id_tweet]_[last_id_tweet].csv
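A minimal sketch of building such a filename, assuming the first format above; build_output_path is a hypothetical helper, not part of the existing script:

from pathlib import Path

def build_output_path(candidate: str, start_date: str, end_date: str) -> Path:
    # Format: data/raw/twitter/[candidat_name]_[startdate]_[enddate].csv
    out_dir = Path("data/raw/twitter")
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{candidate}_{start_date}_{end_date}.csv"

# Example: build_output_path("Melenchon", "2022-03-18", "2022-03-18")
# -> data/raw/twitter/Melenchon_2022-03-18_2022-03-18.csv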
A particular point must be considered: the script should collect small chunks of results and save them little by little, to avoid issues related to cache memory, disk memory, etc. Create a tmp directory where the small portions are stored; after that, the script concatenates them into the final csv file.
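A minimal sketch of that chunked collection, assuming the Twitter API v2 recent-search endpoint and a bearer token in the TWITTER_BEARER_TOKEN environment variable; the function name, chunk file naming and tmp location are illustrative only, not the existing script:

import csv
import os
from pathlib import Path

import requests

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def collect_in_chunks(query, start_time, end_time, tmp_dir="data/raw/twitter/tmp"):
    # Request tweets page by page and write each page to its own chunk file,
    # so results already fetched survive an interrupted run.
    headers = {"Authorization": f"Bearer {os.environ['TWITTER_BEARER_TOKEN']}"}
    params = {"query": query, "start_time": start_time,
              "end_time": end_time, "max_results": 100}
    Path(tmp_dir).mkdir(parents=True, exist_ok=True)
    page = 0
    while True:
        resp = requests.get(SEARCH_URL, headers=headers, params=params)
        resp.raise_for_status()
        body = resp.json()
        tweets = body.get("data", [])
        if tweets:
            with open(Path(tmp_dir) / f"chunk_{page:05d}.csv", "w",
                      newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=["id", "text"])
                writer.writeheader()
                writer.writerows({"id": t["id"], "text": t["text"]} for t in tweets)
        next_token = body.get("meta", {}).get("next_token")
        if not next_token:
            break
        params["next_token"] = next_token
        page += 1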
This script should be designed to be launched periodically, every week (or every day?), and to collect a specified amount of tweets about each candidate. The amounts per day and per candidate are yet to be determined.
Definition of done
- a functioning script is written,
- a format for the filename is chosen,
- the script creates a tmp directory where it saves small chunks of the total results,
- the script concatenates all the chunks into a final csv file (see the sketch below).
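A minimal sketch of the concatenation step, assuming chunk files named as in the sketch above; concatenate_chunks is a hypothetical helper:

import csv
from pathlib import Path

def concatenate_chunks(tmp_dir, final_path):
    # Merge every chunk file from the tmp directory into one final csv,
    # writing the header row only once.
    writer = None
    with open(final_path, "w", newline="", encoding="utf-8") as out:
        for chunk in sorted(Path(tmp_dir).glob("chunk_*.csv")):
            with open(chunk, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)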
This script should FIRST and ONLY be tested with small amounts of tweets requested from the API, in order to preserve the quota of tweets we can request: for instance 1k tweets for 2 or 3 candidates. The person testing the script should be careful to check the above points.
This script will be used for larger amounts after the pull request is validated.
Actually, you do not need to write a new script, only to add a feature to the existing one.
The request:
poetry run python -m src data --download twitter --mention Melenchon --start_time '2022-03-18 8:00' --end_time '2022-03-18 22:00'
At this time, the part concerning:
"A particular point must be considered: the script should collect small chunks of results and save them little by little, to avoid issues related to cache memory, disk memory, etc. Create a tmp directory where the small portions are stored; after that, the script concatenates them."
is not implemented yet.