Periodically compares recent tweets against recent Reddit comments and computes their text similarity using Word2Vec embeddings. Posts and comments are fetched by searching for keywords related to COVID-19.
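To illustrate the idea, here is a minimal sketch of one common way to score two short texts with word embeddings: average the word vectors of each text and take the cosine similarity of the two averages. It is not necessarily how this project computes its scores. The `TOY_VECTORS` table and the function names are hypothetical; a real run would look words up in a trained Word2Vec model instead.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "covid": [0.9, 0.1, 0.0],
    "vaccine": [0.8, 0.2, 0.1],
    "lockdown": [0.7, 0.0, 0.3],
    "football": [0.0, 0.9, 0.2],
}


def average_vector(text, vectors):
    """Average the vectors of the known words in text; None if no word is known."""
    known = [vectors[word] for word in text.lower().split() if word in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(text_a, text_b, vectors=TOY_VECTORS):
    """Similarity of two texts based on their averaged word vectors."""
    vec_a = average_vector(text_a, vectors)
    vec_b = average_vector(text_b, vectors)
    if vec_a is None or vec_b is None:
        # No known words in at least one text, so nothing to compare.
        return 0.0
    return cosine_similarity(vec_a, vec_b)
```

With these toy vectors, `text_similarity("covid vaccine", "covid lockdown")` scores higher than `text_similarity("covid vaccine", "football")`, which is the behaviour one would hope for from real embeddings as well.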
Required environment variables:
TWITTER_BEARER_TOKEN
MONGODB_URI
AIRFLOW_HOME
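Before starting Airflow, it can help to confirm that all three variables are set. The following is a small sketch of such a check; the helper name `missing_env_vars` is hypothetical and not part of this repository.

```python
import os

# The three variables listed above.
REQUIRED_VARS = ("TWITTER_BEARER_TOKEN", "MONGODB_URI", "AIRFLOW_HOME")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it in the same shell in which you plan to launch Airflow, so that it sees the same environment.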
Set up a virtual environment and install Apache Airflow along with the other dependencies:
python3 -m venv airflowEnv
source airflowEnv/bin/activate
pip install -r airflow-scheduling/requirements.txt
Run these two commands, each in its own terminal with the virtual environment activated:
airflow webserver -p 8080
airflow scheduler
Visit localhost:8080 and start the DAG. It will then run periodically, and you're good to go!