The goal of this ETL process is to extract movie ratings data, process it, and load it into a new Google Cloud Storage bucket.
Check out my Medium Article at: https://medium.com/@bdadon50/data-engineering-project-movies-data-etl-using-python-gcp-33dcc076166
- Download ratings and movies data from https://files.grouplens.org/datasets/movielens.
- Create a Google Cloud Storage bucket.
- Process the Data and Export to CSV.
- Load CSV files into the Storage Bucket.
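The processing and export steps can be sketched roughly as follows. This is a minimal illustration, not the code in `movies_etl.py`: it uses only the standard library, assumes the `movies.csv` (`movieId,title,genres`) and `ratings.csv` (`userId,movieId,rating,timestamp`) layout of the MovieLens "latest" datasets, and the function names and the choice of an average-rating summary are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Output columns chosen for this sketch; the real script may export different ones.
OUTPUT_FIELDS = ["movieId", "title", "avg_rating", "num_ratings"]


def average_ratings(movies_file, ratings_file):
    """Join ratings to movie titles and compute the mean rating per movie.

    Both arguments are open text files (or any iterable of CSV lines)
    that start with a header row.
    """
    titles = {row["movieId"]: row["title"] for row in csv.DictReader(movies_file)}

    totals = defaultdict(lambda: [0.0, 0])  # movieId -> [sum of ratings, count]
    for row in csv.DictReader(ratings_file):
        entry = totals[row["movieId"]]
        entry[0] += float(row["rating"])
        entry[1] += 1

    return [
        {
            "movieId": movie_id,
            "title": titles.get(movie_id, ""),
            "avg_rating": round(rating_sum / count, 2),
            "num_ratings": count,
        }
        for movie_id, (rating_sum, count) in sorted(
            totals.items(), key=lambda item: int(item[0])
        )
    ]


def write_csv(rows, out_file):
    """Write the processed rows to an open text file as CSV with a header."""
    writer = csv.DictWriter(out_file, fieldnames=OUTPUT_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny in-memory sample so the sketch runs without downloading anything.
    movies = io.StringIO("movieId,title,genres\n1,Toy Story (1995),Animation\n")
    ratings = io.StringIO("userId,movieId,rating,timestamp\n1,1,4.0,0\n2,1,5.0,0\n")
    out = io.StringIO()
    write_csv(average_ratings(movies, ratings), out)
    print(out.getvalue())
```

With real data, the same functions would be called on files opened with `open(..., newline="", encoding="utf-8")`, and the resulting CSV would then be uploaded to the bucket.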
- To run the ETL process, you must enable the APIs for these services: Cloud Storage and the Cloud Storage JSON API.
- Create a service account, create a private key for it, and download the key as a JSON file. Name the file ServiceKey_GoogleCloud.json and put it in the project folder.
- You can find an example of this file in the repository (I won't show my actual private key details).
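As a rough sketch of how the key file might be used, the snippet below sanity-checks the downloaded JSON and then authenticates with the `google-cloud-storage` client library. The helper names (`validate_key_info`, `make_client`, `upload_csv`) are hypothetical and the repository's script may do this differently. The upload helper assumes the bucket already exists.

```python
import json
import os

# Fields that a service account key file downloaded from GCP contains
# (among others); checked here to catch a wrong or truncated file early.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_key_info(info):
    """Check a parsed key file and return its project ID."""
    missing = [field for field in REQUIRED_FIELDS if not info.get(field)]
    if missing:
        raise ValueError(f"service key is missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError("expected a key of type 'service_account'")
    return info["project_id"]


def make_client(key_path="ServiceKey_GoogleCloud.json"):
    """Validate the key file, then build an authenticated storage client."""
    with open(key_path, encoding="utf-8") as key_file:
        validate_key_info(json.load(key_file))
    # Imported lazily so the validation above works without the package
    # installed (pip install google-cloud-storage).
    from google.cloud import storage

    return storage.Client.from_service_account_json(key_path)


def upload_csv(client, bucket_name, local_path, blob_name=None):
    """Upload one local file to an existing bucket.

    A new bucket could be created beforehand with
    client.create_bucket(bucket_name).
    """
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name or os.path.basename(local_path))
    blob.upload_from_filename(local_path)
```

A typical call would be `upload_csv(make_client(), "my-bucket-name", "ratings_processed.csv")`, where both the bucket name and the file name are placeholders.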
- Download the data:
bash download_data.sh
- Run the ETL process:
python movies_etl.py