This project builds an ETL pipeline that loads data from S3 and writes the output in Parquet format.
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
- etl.py reads and processes files from song_data and log_data, then writes the results to S3 in Parquet format.
python etl.py
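
The processing step performed by etl.py can be illustrated with a minimal, standard-library-only sketch. It is not the actual code in etl.py: the field names (song_id, title, artist_id, year, duration), the helper names, and the partition layout are all assumptions made for illustration. The sketch only shows selecting columns, dropping duplicate songs, and building a partitioned output path; writing Parquet itself would need a library such as Spark or pyarrow and is left out.

```python
import json

# Assumed column names for a songs table; the real schema lives in etl.py.
SONG_FIELDS = ("song_id", "title", "artist_id", "year", "duration")


def extract_songs(json_lines):
    """Parse JSON lines and keep one row per song_id, restricted to SONG_FIELDS."""
    songs = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row = {field: record.get(field) for field in SONG_FIELDS}
        if row["song_id"] is not None:
            # setdefault keeps the first occurrence and ignores later duplicates
            songs.setdefault(row["song_id"], row)
    return list(songs.values())


def partition_path(base, row):
    """Build a hypothetical partition directory: base/year=<year>/artist_id=<id>."""
    return f"{base}/year={row['year']}/artist_id={row['artist_id']}"


if __name__ == "__main__":
    # Made-up sample records, not taken from the dataset.
    sample = [
        '{"song_id": "S1", "title": "First", "artist_id": "AR1", "year": 2000, "duration": 210.5}',
        '{"song_id": "S1", "title": "Duplicate", "artist_id": "AR1", "year": 2000, "duration": 210.5}',
    ]
    for row in extract_songs(sample):
        print(partition_path("s3a://example-bucket/songs", row), row["title"])
```

Partitioning the output by columns such as year and artist is a common choice for Parquet data because queries that filter on those columns can skip unrelated directories.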