[this branch](https://github.com/voschezang/Data-Mining/tree/A1)
Preprocess the data with clean_data.py
(fit on the train data, then transform both train and test data). Process the data further with clean_data_2*.py
(clustering and SVD).
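
The fit + transform pattern mentioned above can be sketched as follows. This is a minimal illustration and not the code in clean_data.py: the `Standardizer` class and the price values are hypothetical. The point is only that statistics are learned from the train data and then reused unchanged on the test data.

```python
from statistics import mean, pstdev


class Standardizer:
    """Hypothetical helper: scales a numeric column to zero mean and unit variance."""

    def fit(self, column):
        # Learn the statistics from the training column only.
        self.mean_ = mean(column)
        self.std_ = pstdev(column) or 1.0  # avoid division by zero for constant columns
        return self

    def transform(self, column):
        # Apply the statistics learned in fit() to any column (train or test).
        return [(x - self.mean_) / self.std_ for x in column]


# Made-up example values, for illustration only.
train_prices = [80.0, 120.0, 100.0, 100.0]
test_prices = [90.0, 140.0]

scaler = Standardizer().fit(train_prices)
train_scaled = scaler.transform(train_prices)
test_scaled = scaler.transform(test_prices)  # reuses the train mean and std
```

Fitting on the train data alone keeps information from the test data out of the preprocessing step.
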
The notebooks A2-***.ipynb
show how to cross-validate machine learning models and generate result files. They should be self-explanatory.
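
As a rough idea of what cross-validation involves, here is a minimal k-fold sketch in plain Python. It is not taken from the notebooks: `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical names, and the folds are contiguous and unshuffled to keep the example short.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, valid_idx) pairs; each sample is in exactly one validation fold."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


def cross_validate(n_samples, fit_and_score, n_folds=5):
    """Average the score returned by fit_and_score(train_idx, valid_idx) over all folds."""
    scores = [fit_and_score(train_idx, valid_idx)
              for train_idx, valid_idx in k_fold_indices(n_samples, n_folds)]
    return sum(scores) / len(scores)


# Dummy scorer (assumption): a real one would train a model on train_idx
# and evaluate it on valid_idx.
mean_score = cross_validate(10, lambda train_idx, valid_idx: len(valid_idx) / 10, n_folds=3)
```
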
Kaggle Expedia hotel recommendation challenge (link)
Performance is measured using Normalized Discounted Cumulative Gain at rank 5 (NDCG@5); see also https://en.wikipedia.org/wiki/Discounted_cumulative_gain.
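
To make the metric concrete, here is a minimal sketch of NDCG@5 in plain Python. It assumes the exponential gain variant, (2^rel - 1) / log2(rank + 1), which is one of the two common formulations described on the Wikipedia page; the challenge's exact scoring code may differ. The relevance grades in the example are made up.

```python
import math


def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the first k items of a ranked list."""
    # enumerate() starts at 0, so the 1-based rank is i + 1 and the discount is log2(i + 2).
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """DCG@k divided by the DCG@k of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of the items in one query, listed in predicted order.
ranked = [0, 5, 1, 0, 0, 1]
score = ndcg_at_k(ranked, k=5)
```

A perfectly ordered list scores 1, and the score drops as highly relevant items are pushed down the ranking.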