- Python
- Spark
Spark is overkill for a dataset this small, but this is just a small practice example that runs on my local computer.
- Make sure you have a text editor such as Visual Studio Code installed.
- Have a working installation of Python 3.7
- Clone this repository (https://github.com/jarretjeter/Spark-Spotify-Data.git) from GitHub onto your local computer
- In your terminal create a virtual environment ('python3.7 -m venv venv'), activate it ('source venv/bin/activate') and then install the requirements ('pip install -r requirements.txt')
- In the root project directory, create a folder named "data". Then, in your terminal, run the command 'gsutil -m cp gs://data.datastack.academy/spotify/spotify_artists.csv ./data' from that same root directory to download the csv data
- Once the data is downloaded, open the csv file and give the unnamed index column in the header row a name (such as "row") to avoid errors
- With that all done you can run the notebook cells in the 'main.ipynb' file to see some simple examples of Spark
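
The header fix from the setup steps can also be scripted instead of done by hand. The following is a minimal sketch, not part of the repository: it assumes the index column is the first field of the header row and that it is empty in the downloaded file. The function name `name_index_column` is hypothetical.

```python
import csv
from pathlib import Path


def name_index_column(csv_path, column_name="row"):
    """Name the first header field if it is empty. Returns True if the file was changed."""
    path = Path(csv_path)

    # Read the whole file into memory; fine for a practice-sized dataset.
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

    # Nothing to do if the file is empty or the first header field already has a name.
    if not rows or not rows[0] or rows[0][0].strip():
        return False

    rows[0][0] = column_name

    # Rewrite the file in place with the updated header row.
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
    return True
```

From the root project directory, calling `name_index_column("data/spotify_artists.csv")` would apply the fix. Running it a second time leaves the file untouched, because the header field is no longer empty. Note that the file is rewritten through the `csv` module, so quoting in the output may differ cosmetically from the original.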
- No known bugs at this time
If you have any questions, please email me at jarretjeter@gmail.com
Copyright (c) 6/26/2022 Jarret Jeter