/project-data-mining

Predict which hotels users are most likely to book

Primary language: Jupyter Notebook. License: MIT.

Assignment 1

[this branch](https://github.com/voschezang/Data-Mining/tree/A1)

Assignment 2

Preprocess data with clean_data.py (fit + transform train/test data). Further process data with clean_data_2*.py (clustering and SVD).
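The fit + transform pattern mentioned above can be sketched roughly as follows. This is not the code in clean_data.py; `MeanImputer` and the toy rows are hypothetical and only illustrate the idea that statistics are learned from the training data and then reused unchanged on the test data.

```python
class MeanImputer:
    """Hypothetical example: fill missing values (None) with per-column means."""

    def fit(self, rows):
        # Learn one mean per column from the observed (non-missing) values.
        n_cols = len(rows[0])
        self.means = []
        for j in range(n_cols):
            observed = [row[j] for row in rows if row[j] is not None]
            self.means.append(sum(observed) / len(observed) if observed else 0.0)
        return self

    def transform(self, rows):
        # Replace missing values using the means learned in fit().
        return [
            [self.means[j] if value is None else value for j, value in enumerate(row)]
            for row in rows
        ]


# Toy data (made up for illustration).
train = [[1.0, None], [3.0, 4.0]]
test = [[None, None]]

imputer = MeanImputer().fit(train)      # fit on the training data only
train_clean = imputer.transform(train)  # transform the training data
test_clean = imputer.transform(test)    # reuse the same statistics on the test data
```

Fitting on the training data alone keeps information from the test set out of the preprocessing step, which is why the two calls are kept separate.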

The notebooks A2-***.ipynb show how to cross-validate machine learning models and generate result files. They should be self-explanatory.

Kaggle Expedia hotel recommendation challenge (link)

Performance is measured using Normalized Discounted Cumulative Gain at rank 5 (NDCG@5); see also https://en.wikipedia.org/wiki/Discounted_cumulative_gain
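As a rough illustration of the metric, here is a minimal sketch of NDCG@5 for a single search. It assumes the exponential-gain variant of DCG, (2^rel - 1) / log2(rank + 1); the linked article also describes a linear-gain variant, and the exact formula and relevance grades used by the challenge should be checked against its rules. The function names and the example grades are hypothetical.

```python
import math


def dcg_at_k(relevances, k=5):
    # relevances: relevance grades in the order the model ranked the items.
    # enumerate() is 0-based, so the 1-based rank i contributes log2(i + 1).
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=5):
    # Normalize by the DCG of the ideal ordering (grades sorted descending).
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items: define the score as 0
    return dcg_at_k(relevances, k) / ideal


# Made-up grades for one search, in predicted order.
perfect = ndcg_at_k([5, 1, 0, 0])  # already ideally ordered
swapped = ndcg_at_k([0, 5])        # the relevant item is ranked second
```

The overall score would then be the mean of `ndcg_at_k` over all searches in the evaluation set.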