pip install apache-airflow pandas numpy praw
mkdir config dags data etls logs pipelines tests utils
touch airflow.env docker-compose.yaml Dockerfile
pip freeze > requirements.txt
Fill in the Dockerfile, docker-compose.yaml, and airflow.env contents (a sketch follows)
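A minimal Dockerfile sketch, assuming the official apache/airflow base image (pin whichever Airflow version you installed locally):

```dockerfile
FROM apache/airflow:2.7.1

# Bake the project's Python dependencies into the image.
COPY requirements.txt /opt/airflow/requirements.txt
RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt
```

For docker-compose.yaml, the Airflow docs ship a reference compose file for the CeleryExecutor setup (Postgres + Redis + webserver/scheduler/worker) that can serve as a starting point; airflow.env then holds environment overrides such as AIRFLOW__CORE__EXECUTOR=CeleryExecutor.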
docker compose up -d --build
- Create an Airflow DAG that kicks off the Reddit pipeline
- Create a reddit_pipeline function that connects to a Reddit instance and performs the ETL
- Keep database settings, file paths, API keys, AWS access keys, and ETL settings in a config.conf file
- Load all the keys and secrets into constants.py
- Connect to Reddit using PRAW (sketches of the constants and PRAW modules follow this list)
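A sketch of constants.py reading config.conf with configparser; the section and key names here are assumptions and must mirror whatever you actually write into config.conf:

```python
# utils/constants.py -- a minimal sketch; section/key names are assumptions
# and must match config/config.conf, which might look like:
#
#   [api_keys]
#   reddit_client_id = <your client id>
#   reddit_secret_key = <your client secret>
#
#   [aws]
#   aws_access_key_id = <key>
#   aws_secret_access_key = <secret>
#   aws_bucket_name = <bucket>
#
#   [file_paths]
#   output_path = /opt/airflow/data/output
import configparser
import os

parser = configparser.ConfigParser()
parser.read(os.path.join(os.path.dirname(__file__), '../config/config.conf'))

CLIENT_ID = parser.get('api_keys', 'reddit_client_id')
SECRET = parser.get('api_keys', 'reddit_secret_key')

AWS_ACCESS_KEY_ID = parser.get('aws', 'aws_access_key_id')
AWS_SECRET_ACCESS_KEY = parser.get('aws', 'aws_secret_access_key')
AWS_BUCKET_NAME = parser.get('aws', 'aws_bucket_name')

OUTPUT_PATH = parser.get('file_paths', 'output_path')

# Submission attributes worth keeping during extraction.
POST_FIELDS = (
    'id', 'title', 'score', 'num_comments', 'author',
    'created_utc', 'url', 'over_18', 'edited', 'spoiler', 'stickied',
)
```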
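And a sketch of the PRAW connection plus extraction; connect_reddit and extract_posts are hypothetical helper names, while praw.Reddit and subreddit.top are the real PRAW calls:

```python
# etls/reddit_etl.py -- connection and extraction sketch.
import sys

import praw

from utils.constants import POST_FIELDS


def connect_reddit(client_id: str, client_secret: str, user_agent: str) -> praw.Reddit:
    """Create a read-only Reddit client from the credentials in constants.py."""
    try:
        reddit = praw.Reddit(
            client_id=client_id,
            client_secret=client_secret,
            user_agent=user_agent,
        )
        print('Connected to Reddit')
        return reddit
    except Exception as e:
        print(f'Failed to connect to Reddit: {e}')
        sys.exit(1)


def extract_posts(reddit: praw.Reddit, subreddit_name: str, time_filter: str, limit=None):
    """Pull top posts and keep only the fields listed in POST_FIELDS."""
    subreddit = reddit.subreddit(subreddit_name)
    posts = subreddit.top(time_filter=time_filter, limit=limit)
    # Note: post.author is a Redditor object; cast it to str before serializing.
    return [{field: getattr(post, field) for field in POST_FIELDS} for post in posts]
```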
Add all the code: create the DAG -> reddit pipeline (ETL) -> connect to Reddit using PRAW -> ETL posts from Reddit -> constants for storing usernames and passwords (a combined sketch follows)
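A sketch tying those pieces together; the file layout, schedule, and subreddit are assumptions. First, the pipeline function that the DAG will call:

```python
# pipelines/reddit_pipeline.py -- extract, lightly transform, load to CSV.
import pandas as pd

from etls.reddit_etl import connect_reddit, extract_posts
from utils.constants import CLIENT_ID, SECRET, OUTPUT_PATH


def reddit_pipeline(file_name: str, subreddit: str, time_filter='day', limit=None):
    reddit = connect_reddit(CLIENT_ID, SECRET, 'Airflow Agent')
    posts = extract_posts(reddit, subreddit, time_filter, limit)
    df = pd.DataFrame(posts)
    df['author'] = df['author'].astype(str)  # Redditor objects -> usernames
    file_path = f'{OUTPUT_PATH}/{file_name}.csv'
    df.to_csv(file_path, index=False)
    return file_path
```

Then the DAG, which wires the pipeline to a PythonOperator on a daily schedule:

```python
# dags/reddit_dag.py -- schedule the pipeline with a PythonOperator.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from pipelines.reddit_pipeline import reddit_pipeline

file_postfix = datetime.now().strftime('%Y%m%d')

with DAG(
    dag_id='etl_reddit_pipeline',
    default_args={'owner': 'airflow', 'start_date': datetime(2024, 1, 1)},
    schedule_interval='@daily',
    catchup=False,
    tags=['reddit', 'etl', 'pipeline'],
) as dag:
    extract = PythonOperator(
        task_id='reddit_extraction',
        python_callable=reddit_pipeline,
        op_kwargs={
            'file_name': f'reddit_{file_postfix}',
            'subreddit': 'dataengineering',
            'time_filter': 'day',
            'limit': 100,
        },
    )
```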
docker compose up -d --build
Open http://localhost:8080/ in a browser
airflow users create --username admin --firstname admin --lastname admin --role Admin --email airflow@airflow.com --password admin
This stack combines Reddit, Airflow, Celery, Postgres, S3, AWS Glue, Athena, and Redshift to create a seamless ETL process.
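Downstream of Airflow, the extracted CSV lands in S3 before Glue, Athena, and Redshift take over. A hedged sketch of that load step with boto3 (upload_to_s3 is a hypothetical helper; the constants come from the config step above):

```python
# etls/aws_etl.py -- load step: push the CSV into the raw S3 zone.
import boto3

from utils.constants import AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_BUCKET_NAME


def upload_to_s3(file_path: str, s3_key: str, bucket: str = AWS_BUCKET_NAME) -> None:
    """Upload a local file; Glue crawlers and Athena query it from here."""
    s3 = boto3.client(
        's3',
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    s3.upload_file(file_path, bucket, s3_key)


# Example: upload_to_s3('data/output/reddit_20240101.csv', 'raw/reddit_20240101.csv')
```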