This project demonstrates the ETL (Extract, Transform, Load) process using Valve's Steam Games dataset from Kaggle (https://www.kaggle.com/datasets/fronkongames/steam-games-dataset?resource=download). The process extracts data from a zipped CSV file, transforms it in a Jupyter Notebook, and loads the processed data into a MongoDB database; a sketch of this flow appears after the directory overview below.
We have two primary directories: jupyterNotebook and resources.
jupyterNotebook contains test code as well as the primary notebook, SteamGamesData.ipynb, which performs all of the ETL steps.
resources contains a zip file and a sub-folder. The zip file, games_info_clean.csv.zip, holds the raw data; the cleanData sub-folder holds the cleaned data in a file called rated_games.csv.
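
For reference, here is a minimal sketch of the ETL flow described above, assuming a local MongoDB instance and that the code is run from the repository root. The specific transformation steps and the database/collection names are illustrative assumptions, not the notebook's exact code; the actual logic lives in SteamGamesData.ipynb.

```python
# Minimal ETL sketch: extract from the zipped CSV, apply an example
# transformation, and load the result into MongoDB.
import pandas as pd
from pymongo import MongoClient

# Extract: pandas can read the single CSV inside the zip directly.
raw_df = pd.read_csv("resources/games_info_clean.csv.zip", compression="zip")

# Transform: example cleaning steps (drop fully empty columns and duplicate rows);
# the real transformations are in jupyterNotebook/SteamGamesData.ipynb.
clean_df = raw_df.dropna(axis=1, how="all").drop_duplicates()
clean_df.to_csv("resources/cleanData/rated_games.csv", index=False)

# Load: insert the cleaned records into MongoDB
# (database and collection names here are assumptions).
client = MongoClient("mongodb://localhost:27017/")
collection = client["steam_games_db"]["rated_games"]
collection.insert_many(clean_df.to_dict("records"))
```

Reading the zip directly with pandas avoids unpacking games_info_clean.csv.zip by hand, and insert_many loads all cleaned rows in a single call.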
We had fun with this project. Because the scope was limited to ETL, we did not dig into the data for insights, but we were able to clean it. We hope this README is clear and concise.