- Extract data from the YouTube API, transform it with Pandas, and load it into Amazon S3 using Apache Airflow running on Amazon EC2
- Gain a basic understanding of how to build data pipelines
- Use Python to work through each step of the ETL process
- Python 3.9
- AWS EC2: Ubuntu (t2.micro)
- Apache Airflow 2.7.2
- Extracting
  - Extracted the `mostPopular` chart (trending videos) using the YouTube Data API
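A minimal sketch of the extract step, assuming the YouTube Data API v3 `videos.list` endpoint is called directly over HTTPS with an API key. The region code, page size, and the helper names `build_request_url`, `fetch_most_popular`, and `extract_rows` are illustrative assumptions, not the project's actual code.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the YouTube Data API v3 videos.list method.
VIDEOS_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_request_url(api_key: str, region_code: str = "US", max_results: int = 50) -> str:
    """Build a videos.list URL that requests the mostPopular (trending) chart."""
    params = {
        "part": "snippet,statistics",  # snippet: title/description/publishedAt; statistics: viewCount
        "chart": "mostPopular",
        "regionCode": region_code,     # assumption: the README does not state a region
        "maxResults": max_results,     # the API returns at most 50 items per page
        "key": api_key,
    }
    return f"{VIDEOS_URL}?{urllib.parse.urlencode(params)}"


def fetch_most_popular(api_key: str, region_code: str = "US") -> dict:
    """Call the API and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request_url(api_key, region_code)) as resp:
        return json.load(resp)


def extract_rows(response: dict) -> list[dict]:
    """Flatten each video item into a dict holding the fields used later on."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append({
            "title": snippet.get("title"),
            "description": snippet.get("description"),
            "publishedAt": snippet.get("publishedAt"),
            "viewCount": stats.get("viewCount"),  # the API returns counts as strings
        })
    return rows
```

For example, `extract_rows(fetch_most_popular(api_key))` returns one dict per trending video. Only the first page (up to 50 videos) is fetched here; collecting more would mean following the response's `nextPageToken`.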
- Transforming
  - Selected the columns of interest: `title`, `description`, `publishedAt`, `viewCount`
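The transform step could look like the following sketch, assuming the raw rows are dicts keyed by the API field names. The function name `transform` and the dtype conversions (UTC timestamps, nullable integer view counts) are assumptions added for illustration; the README only states which columns were kept.

```python
import pandas as pd

# Columns kept by the transform step, as listed in the README.
COLUMNS = ["title", "description", "publishedAt", "viewCount"]


def transform(rows: list[dict]) -> pd.DataFrame:
    """Keep only the columns of interest and give them usable dtypes."""
    df = pd.DataFrame(rows)
    # reindex keeps exactly these columns, in this order; a missing one becomes NaN.
    df = df.reindex(columns=COLUMNS)
    # Assumption: type conversion is an extra step not described in the README.
    df["publishedAt"] = pd.to_datetime(df["publishedAt"], utc=True)
    df["viewCount"] = pd.to_numeric(df["viewCount"], errors="coerce").astype("Int64")
    return df
```

Calling `transform(rows).to_csv(path, index=False)` would then produce the CSV file that the load step uploads.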
- Loading
  - Set up a Python virtual environment on an AWS EC2 instance to run Apache Airflow
  - Built a DAG in Apache Airflow to orchestrate the pipeline
  - Loaded the resulting CSV file into Amazon S3
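A minimal Airflow 2.x DAG for the load step might look like this, assuming the extract and transform steps have already written a CSV to a local path on the EC2 instance, and that the instance can reach S3 through an IAM role or configured credentials. The DAG id, schedule, file path, bucket name, `S3_BUCKET` environment variable, and object-key layout are all hypothetical. The file would go in Airflow's DAGs folder (by default `$AIRFLOW_HOME/dags`).

```python
import os
from datetime import datetime

try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # Airflow not installed (e.g. local unit tests): skip DAG creation
    DAG = None

# Hypothetical settings; the README does not name the file, bucket, or key layout.
LOCAL_CSV = "/tmp/youtube_most_popular.csv"
S3_BUCKET = os.environ.get("S3_BUCKET", "my-youtube-etl-bucket")


def s3_key_for(ds: str) -> str:
    """Hypothetical key layout: one CSV object per daily run (ds is YYYY-MM-DD)."""
    return f"youtube/most_popular/{ds}.csv"


def upload_csv_to_s3(ds: str, path: str = LOCAL_CSV, bucket: str = S3_BUCKET) -> str:
    """Upload the local CSV to S3 and return the object key that was used."""
    import boto3  # imported here so the module still loads where boto3 is absent

    key = s3_key_for(ds)
    # upload_file(Filename, Bucket, Key); credentials come from boto3's default
    # chain, e.g. an IAM role attached to the EC2 instance.
    boto3.client("s3").upload_file(path, bucket, key)
    return key


if DAG is not None:
    with DAG(
        dag_id="youtube_etl",             # hypothetical name
        start_date=datetime(2023, 10, 1),
        schedule="@daily",                # `schedule` is accepted by Airflow 2.4+
        catchup=False,
    ) as dag:
        # Airflow 2 passes context values such as `ds` to callables whose
        # signature names them, so no op_kwargs are needed here.
        load_to_s3 = PythonOperator(
            task_id="load_csv_to_s3",
            python_callable=upload_csv_to_s3,
        )
```

The `try`/`except ImportError` guard only exists so the helpers can be tested on a machine without Airflow; a DAG file that only ever runs inside Airflow can import `DAG` and `PythonOperator` unconditionally. In a fuller pipeline, extract and transform would usually be separate upstream tasks chained to this one with `>>`.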
- Gained hands-on experience with data pipelines and ETL processes
- Build and manage more complex data pipelines that include data modeling
- Build an end-to-end process, from the data pipeline through to data analysis