Data mining project
This project is for a data mining class and analyzes the NBA dataset.
The order in which to run the scripts
In order to generate figures / data:
notebooks/Preprocessing.py
notebooks/Datacubing.py
notebooks/AnalysisBasicPlots.py
notebooks/AnalysisTDWeights.py
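The order above can also be automated. The following is a minimal sketch, not part of the repository: the helper name `run_pipeline` is hypothetical, and it assumes each script is meant to be started from inside the `notebooks` folder, as the `cd notebooks` commands further down suggest.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from the list above.
SCRIPTS = [
    "Preprocessing.py",
    "Datacubing.py",
    "AnalysisBasicPlots.py",
    "AnalysisTDWeights.py",
]


def run_pipeline(notebooks_dir="notebooks", scripts=SCRIPTS):
    """Run each script in order with the current interpreter.

    Hypothetical helper: stops at the first script that exits with an error.
    """
    notebooks = Path(notebooks_dir)
    for name in scripts:
        print(f"Running {name} ...")
        # cwd mirrors `cd notebooks`; check=True raises CalledProcessError
        # if the script returns a non-zero exit code.
        subprocess.run([sys.executable, name], cwd=notebooks, check=True)
```

Calling `run_pipeline()` from the project root would then run all four scripts in sequence.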
Then you can start building the reports. Some of the reports depend on figures and tables that are generated by the Python scripts. Please do not commit generated figures or tables to GitHub. The scripts are built so that, if we discover a mistake in the data or figures, simply rerunning everything automatically updates the report.
Setup
Go to https://www.kaggle.com/wyattowalsh/basketball and download the basketball dataset. It should be a file named archive.zip. If you extract it, there should be a folder called archive. Copy this folder into the root of this project to set it up.
archive/basketball.sqlite
archive/daily_execution_pipeline.yml
...
Having the archive in the right place is essential so that everybody uses the same file path. Do not commit the database to GitHub, as the file is too large (>50MB) to be uploaded.
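To confirm the archive is where the scripts expect it, a check along these lines can help. It is a sketch only: `find_database` is a hypothetical helper, and it assumes the check is run from the project root.

```python
from pathlib import Path


def find_database(project_root="."):
    """Return the path to archive/basketball.sqlite, or raise if it is missing.

    Hypothetical helper; the relative path is the one listed above.
    """
    db_path = Path(project_root) / "archive" / "basketball.sqlite"
    if not db_path.is_file():
        raise FileNotFoundError(
            f"Expected the Kaggle database at {db_path}; see the Setup section."
        )
    return db_path
```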
Required software
Make sure you have Python >3.5 and numpy, matplotlib, ipykernel, and pandas installed.
pip install -r requirements.txt
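A quick way to see whether the environment is ready is to ask Python which of the packages named above cannot be found. This is a sketch with a hypothetical helper name, `missing_packages`; it only checks that the packages are importable, not their versions.

```python
import importlib.util
import sys

# Package names taken from the list above.
REQUIRED = ["numpy", "matplotlib", "ipykernel", "pandas"]


def missing_packages(names=REQUIRED):
    """Return the top-level packages from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough():
    """True if the running interpreter is newer than Python 3.5."""
    return sys.version_info[:2] > (3, 5)
```

An empty list from `missing_packages()` means all four packages were found.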
Preprocessing
The preprocessing pipeline is in notebooks/Preprocessing.py. Running it with the VS Code Python extension or as a normal Python script will create notebooks/reducedDataset.sqlite.
cd notebooks
python3 Preprocessing.py
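After preprocessing, the result can be inspected with the standard `sqlite3` module. The sketch below lists the tables in the reduced database; `list_tables` is a hypothetical helper, and the table names themselves are not documented here, so it simply reports whatever it finds.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="notebooks/reducedDataset.sqlite"):
    """Return the sorted table names in the SQLite file at `db_path`."""
    # Open read-only so that a wrong path raises an error instead of
    # silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```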
Data cubing and figure generation
The project report depends on figures that are generated by the Datacubing notebook. Do not commit the figures to GitHub, as they are updated automatically in the project report if we decide to change anything in the datacubing script. Run
cd notebooks
python3 Datacubing.py
to generate the figures. It will also generate four files called cubeGames.csv, cubePlayers.csv, biometricCube.csv, and biometricCubeRaw.csv that are used in subsequent analysis notebooks.
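To verify that the cube files were produced, their header rows can be read with the standard `csv` module. This is a sketch under assumptions: `cube_headers` is a hypothetical helper, and the default location (`notebooks`) is a guess, since the output folder and column names are not stated here.

```python
import csv
from pathlib import Path

# File names taken from the paragraph above.
CUBE_FILES = [
    "cubeGames.csv",
    "cubePlayers.csv",
    "biometricCube.csv",
    "biometricCubeRaw.csv",
]


def cube_headers(directory="notebooks", names=CUBE_FILES):
    """Map each cube file name to its header row, or None if the file is missing.

    The default directory is an assumption; pass the real output folder.
    """
    headers = {}
    for name in names:
        path = Path(directory) / name
        if not path.is_file():
            headers[name] = None
            continue
        with path.open(newline="") as handle:
            # The first row of the CSV; an empty file yields an empty list.
            headers[name] = next(csv.reader(handle), [])
    return headers
```

Any entry that comes back as `None` points to a file that Datacubing.py has not written to that folder.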
Building Project reports manually
For Python code highlighting, a LaTeX package called minted is needed. This depends on Pygments. If it is not already installed, it can be installed using
pip install Pygments
The PDF file can then be compiled using
cd reports/1
pdflatex -shell-escape Group3_project1.tex
As a fallback, the report can be compiled on Overleaf, which has Pygments pre-installed.
Building Project reports with Docker
If you have Docker installed, run
./dockerBuild.sh
./dockerTex.sh
to compile the report. Building the Dockerfile can easily take an hour because of the texlive-full installation.