Here, we've built an end-to-end ML pipeline that extracts data daily from the Twitter API and a news website. Initially, the pipeline stored data locally, but it was later moved to an Azure server, so you can store and process data either locally or in an Azure database. Just change the config template in the src/config folder.
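As a rough illustration of the local-vs-Azure toggle, a config-driven storage helper might look like the sketch below. The key names, container name, and `save_bytes` helper are assumptions for illustration only; the actual template in src/config may differ.

```python
# Hypothetical sketch of the local-vs-Azure storage toggle described above.
# Key names, container name, and this helper are illustrative assumptions,
# not the repo's actual src/config template.
import os
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

CONFIG = {
    "storage_backend": "local",        # switch to "azure" to use Azure Blob Storage
    "local_data_dir": "data/raw",
    "azure_connection_string": "<AZURE_STORAGE_CONNECTION_STRING>",
    "azure_container": "covid19-data",
}

def save_bytes(name: str, payload: bytes, config: dict = CONFIG) -> None:
    """Persist raw extracted data either locally or in Azure, based on the config."""
    if config["storage_backend"] == "local":
        os.makedirs(config["local_data_dir"], exist_ok=True)
        with open(os.path.join(config["local_data_dir"], name), "wb") as f:
            f.write(payload)
    else:
        service = BlobServiceClient.from_connection_string(config["azure_connection_string"])
        blob = service.get_blob_client(container=config["azure_container"], blob=name)
        blob.upload_blob(payload, overwrite=True)
```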
The COVID19 pipeline is easy to set up, either locally or on an Azure server.
- Run `sh run_pipeline.sh` in a terminal/cmd.
- Once it starts running, open localhost:8080 in any browser.
- Import the JSON file dags/covid19_dag/config/arguments_parsing.json into the Airflow environment (under the Variables section).
- Change the variables according to your requirements; the sketch after this list shows one way the DAG can read them.
- Trigger the DAG, and after some time you should see the results below.
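For reference, here is a minimal sketch of how a task in the DAG could read the variables imported from arguments_parsing.json. The variable key `covid19_arguments` and the field names below are assumptions, not necessarily what the repo uses.

```python
# Sketch only: reading the imported Airflow Variables inside a DAG/task.
# The variable key "covid19_arguments" and the field names are assumed.
from airflow.models import Variable

args = Variable.get("covid19_arguments", default_var={}, deserialize_json=True)

twitter_query = args.get("twitter_query", "covid19")
output_path = args.get("output_path", "data/raw")
print(f"Extracting tweets for '{twitter_query}' into {output_path}")
```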
- Airflow: Workflow management for ETL pipeline
- Azure Server: Azure Database Storage and Virtual Machine
- NRC: Emotion Lexicons for Sentiment Analysis (see the sketch after this list)
- Twitter API
- COVID cases, deaths, and recovered counts (The Atlantic)
- Actions taken by the DOD in response to COVID-19 (DOD)
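To illustrate the NRC-based sentiment step, the sketch below uses the NRCLex wrapper around the NRC emotion lexicon; the pipeline may instead read the lexicon files directly, so treat this as one possible approach, and the sample tweet is purely illustrative.

```python
# Sketch: scoring a tweet against the NRC emotion lexicon via the NRCLex wrapper.
# pip install nrclex  (the repo may load the raw NRC lexicon files instead)
from nrclex import NRCLex

tweet = "Hospitals are overwhelmed, but volunteers keep hope alive."
emotions = NRCLex(tweet)

print(emotions.affect_frequencies)  # relative frequencies for fear, trust, joy, ...
print(emotions.top_emotions)        # the dominant emotion(s) detected in the text
```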
| SNO | References |
| --- | --- |
| 1 | Apache Airflow Docker Image |
| 2 | Airflow Learning Resource |
| 3 | Azure Storage with Python |
Please feel free to try it out, and kindly raise an issue if you face any problems during execution.