Spotify-Packet-Analysis

Analysis of Digital Audio Streaming Data

Final Report:

https://docs.google.com/document/d/14NxwLCIXSc70cwK8Ixc9oYui0Q0etwYuHvfFpOe2gqk/edit?usp=sharing

Objective:

Can we classify Spotify packets?

At a high level, we systematically collect network traffic from the Spotify application and a web browser, extract features from the captured packets, apply classification models, and evaluate them using precision and recall.

Data:

.pcap and .csv

.pcap files are processed using Python (pcapkit)
Raw data (direct features):
- headers of protocols
- source and destination IP
- payload data of the packets
Processed data (derived features):
- inter-arrival time between packets
- average packet inter-arrival time per destination (Spotify server)
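
As an illustration of the two derived features above, here is a minimal sketch that uses only the Python standard library. It assumes each packet has already been reduced to a `(timestamp, destination IP)` pair; the function names and the sample addresses are hypothetical and are not taken from the project's notebooks.

```python
from collections import defaultdict
from statistics import mean


def inter_arrival_times(timestamps):
    """Return the gaps (in seconds) between consecutive packet timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def mean_inter_arrival_by_destination(packets):
    """Map each destination IP to its mean inter-arrival time.

    `packets` is an iterable of (timestamp, dst_ip) pairs. Destinations
    with fewer than two packets have no gap and are left out.
    """
    by_destination = defaultdict(list)
    for timestamp, dst_ip in packets:
        by_destination[dst_ip].append(timestamp)

    result = {}
    for dst_ip, stamps in by_destination.items():
        gaps = inter_arrival_times(stamps)
        if gaps:
            result[dst_ip] = mean(gaps)
    return result


# Hypothetical sample: timestamps in seconds, documentation-range IPs.
packets = [
    (0.0, "192.0.2.10"),
    (0.5, "192.0.2.10"),
    (0.75, "192.0.2.20"),
    (1.5, "192.0.2.10"),
    (1.0, "192.0.2.20"),
]
print(mean_inter_arrival_by_destination(packets))
```

In practice the timestamps and destination addresses would come from the parsed .pcap files rather than a hard-coded list.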

Tasks:

What mechanisms are we using?

Collect network data using tcpdump and scapy
Feature engineering/extraction
Apply ML models to dataset
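
The collection step could be driven from Python roughly as follows. This is only a sketch under stated assumptions: the helper name, the interface `eth0`, and the output path are hypothetical, and the project's actual scripts may differ. It relies only on the standard tcpdump flags `-i` (interface), `-w` (write raw packets to a file), and `-c` (stop after a number of packets).

```python
import shlex


def build_tcpdump_command(interface, output_path, packet_count=None):
    """Build the argument list for a tcpdump capture written to a .pcap file."""
    command = ["tcpdump", "-i", interface, "-w", output_path]
    if packet_count is not None:
        # -c makes tcpdump exit after capturing this many packets.
        command += ["-c", str(packet_count)]
    return command


# Hypothetical interface and output path.
command = build_tcpdump_command("eth0", "data/spotify_capture.pcap", packet_count=1000)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`; capturing on a live interface usually requires elevated privileges.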

Evaluation:

How are we evaluating the ML models?

Compare Precision/Recall and RMSE
Confusion Matrix

Actual Class	Classified as non-Spotify	Classified as Spotify
non-Spotify data	True Negative	False Positive
Spotify data	False Negative	True Positive
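
To make the link between the confusion matrix and the metrics concrete, the sketch below counts the four cells from true and predicted labels and derives precision and recall from them. It assumes Spotify is the positive class (label 1); the function names and the sample labels are hypothetical, not project results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating label 1 (Spotify) as positive."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(tn, fp, fn, tp):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = Spotify, 0 = non-Spotify.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(precision_recall(*counts))
```

Precision penalizes non-Spotify packets classified as Spotify (false positives), while recall penalizes Spotify packets that were missed (false negatives).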

Deliverables:

Generated outputs

bash/py script (that performs automated data collection)
Raw dataset (.pcap)
Clean dataset
ML model (Decision Tree, Random Forest, XGBoost)
Final report

File Structure

.
├── script                  # .py and .sh to collect packets
├── data                    # .pcap and .csv
├── notebook                # .ipynb for feature engineering, visualization and modeling
├── visualization           # .png or .pdf (saved visualizations)
├── .env                    # to save API keys
├── .gitignore
├── requirements.txt
├── environment.yml
└── README.md
