/ml4ld

Mini Project A/B for 10-805: Machine Learning for Large Datasets


Machine Learning for Large Datasets

This repository contains two mini-projects:

  1. Song hotness prediction on the Million Song Dataset
  2. Image classification on the CIFAR-100 dataset

The structure of this repo is as follows:

├── deliverables # project proposal and report.
│   ├── ml4bd_proposal.pdf
│   ├── ml4bd_report.pdf
├── mpa  # mini-project-a for million song dataset.
│   ├── etl.py # Process the dataset.
│   ├── mpa-20.ipynb # Run experiments on a PySpark instance with 20 nodes.
├── mpb # mini-project-b for CIFAR-100 dataset.