Spark Spotify Data

Technologies Used

Spark is overkill for a dataset this small, but this is just a small practice example that runs on my local computer.

Make sure you have a text editor such as Visual Studio Code installed.
Have a working installation of Python 3.7.
Clone this repository (https://github.com/jarretjeter/Spark-Spotify-Data.git) from GitHub onto your local computer.
In your terminal, create a virtual environment ('python3.7 -m venv venv'), activate it ('source venv/bin/activate'), and then install the requirements ('pip install -r requirements.txt').
In the root project directory, create a folder named "data". Then, from the root project directory in your terminal, run the command 'gsutil -m cp gs://data.datastack.academy/spotify/spotify_artists.csv ./data' to download the CSV data into that folder.
Once the data is downloaded, add a name (such as "row") to the index column in the CSV header to avoid errors.
With all of that done, you can run the notebook cells in the 'main.ipynb' file to see some simple examples of Spark.
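
If you would rather not edit the CSV header by hand, the renaming step can be scripted. The following is only a minimal sketch using Python's standard library. It assumes the index column is the first column and that its header cell is blank, which is not confirmed here. The function name 'name_index_column' is made up for illustration and is not part of this repository.

```python
import csv
from pathlib import Path


def name_index_column(csv_path, name="row"):
    """Name the first header cell if it is blank.

    Returns True if the file was rewritten, False if nothing changed.
    Assumption: the index column is the first column of the file.
    """
    path = Path(csv_path)

    # Read the whole file into memory; fine for a small practice dataset.
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

    # Leave the file alone if it is empty or the first header cell
    # already has a name.
    if not rows or not rows[0] or rows[0][0].strip():
        return False

    rows[0][0] = name

    # Write everything back with the updated header row.
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
    return True
```

Calling name_index_column("data/spotify_artists.csv") from the root project directory would apply it to the downloaded file. Running it a second time leaves the file unchanged, because the header cell is no longer blank.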

If you have any questions, please email me at jarretjeter@gmail.com