kaggle-base

  • Template directory for data science competitions.
  • Data is stored in PostgreSQL running in a Docker🐳 container, so the data is reproducible and reusable 😄🎉

Usage

Step0. Clone the repository

git clone https://github.com/kiccho1101/kaggle-base.git
cd kaggle-base

Step1. Pull/Build Docker image

Recommended:

make pull

or

make build

Step2. Start up jupyter notebook

make jupyter
  • Copy the token and open localhost:${JUPYTER_PORT} (default: 9000).

Step3. Start up DB

make start-db
  • You can then open localhost:${PGWEB_PORT} (default: 9002) to view the database.

Step4. Split the training data into K folds

make kfold CONFIG_NAME (default: lightgbm_0)
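
To illustrate what a K-fold split produces, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name `assign_folds` and the parameters `n_splits` and `seed` are assumptions made for this example. Each row gets a fold index, and the rows sharing an index form the validation set for that fold.

```python
import random


def assign_folds(n_rows, n_splits=5, seed=42):
    """Return a list where element i is the fold index of row i.

    Hypothetical helper for illustration; not part of kaggle-base.
    """
    if n_splits < 2 or n_splits > n_rows:
        raise ValueError("n_splits must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % n_splits  # deal shuffled rows out round-robin
    return folds


folds = assign_folds(n_rows=10, n_splits=5)
for k in range(5):
    valid_rows = [i for i, f in enumerate(folds) if f == k]
    print(f"fold {k}: validation rows {valid_rows}")
```

Storing the fold index next to each row, rather than re-splitting on every run, is what makes later cross-validation runs comparable with each other.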

Step5. Create Features

  • Create all features.
make feature
  • Create only the specified feature.
make feature FEATURE_NAME
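
The two commands above differ only in whether a feature name is given. One way such a selection could be organized is a registry keyed by feature name, sketched below. This is an assumption about a possible design, not the repository's code: `FEATURES`, `register`, `create_features`, and the example features `name_length` and `age_squared` are all hypothetical.

```python
FEATURES = {}  # hypothetical registry: feature name -> function


def register(func):
    """Decorator that records a feature function under its own name."""
    FEATURES[func.__name__] = func
    return func


@register
def name_length(rows):
    # Example feature: number of characters in the "name" column.
    return [len(row["name"]) for row in rows]


@register
def age_squared(rows):
    # Example feature: square of the "age" column.
    return [row["age"] ** 2 for row in rows]


def create_features(rows, feature_name=None):
    """Build every registered feature, or only `feature_name` if given."""
    if feature_name is None:
        names = sorted(FEATURES)
    elif feature_name in FEATURES:
        names = [feature_name]
    else:
        raise KeyError(f"unknown feature: {feature_name}")
    return {name: FEATURES[name](rows) for name in names}


rows = [{"name": "Ann", "age": 3}, {"name": "Bob", "age": 4}]
print(create_features(rows))                 # all features
print(create_features(rows, "age_squared"))  # a single feature
```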

Step6. Cross Validation

make cv CONFIG_NAME
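
Conceptually, cross validation trains on all folds but one and scores on the held-out fold, repeating this for every fold. The sketch below shows that loop in plain Python. It is not the repository's implementation: a mean-value baseline stands in for a real model, and the names `rmse` and `cross_validate` are assumptions made for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate(targets, folds):
    """Score a mean-value baseline on each fold; return the per-fold RMSE."""
    scores = []
    for k in sorted(set(folds)):
        train = [y for y, f in zip(targets, folds) if f != k]
        valid = [y for y, f in zip(targets, folds) if f == k]
        prediction = sum(train) / len(train)  # "training" here is just taking the mean
        scores.append(rmse(valid, [prediction] * len(valid)))
    return scores


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = [0, 1, 2, 0, 1, 2]  # fold index per row, e.g. from a K-fold split
scores = cross_validate(targets, folds)
print("per-fold RMSE:", [round(s, 3) for s in scores])
print("mean RMSE:", round(sum(scores) / len(scores), 3))
```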

Step7. Create stats for each table

make stats
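
Which statistics this step computes is not stated here. As a rough illustration of per-table stats, the sketch below collects the row count and min/max/mean of selected columns. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in so that it runs without a server, whereas the repository stores its data in PostgreSQL. The `table_stats` helper, the `train` table, and its columns are hypothetical.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name; identifiers cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def table_stats(conn, table, columns):
    """Return the row count plus min/max/mean for each listed numeric column."""
    quoted_table = quote_identifier(table)
    count = conn.execute(f"SELECT COUNT(*) FROM {quoted_table}").fetchone()[0]
    stats = {"rows": count}
    for col in columns:
        quoted_col = quote_identifier(col)
        lo, hi, mean = conn.execute(
            f"SELECT MIN({quoted_col}), MAX({quoted_col}), AVG({quoted_col}) "
            f"FROM {quoted_table}"
        ).fetchone()
        stats[col] = {"min": lo, "max": hi, "mean": mean}
    return stats


# Hypothetical table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (id INTEGER, price REAL)")
conn.executemany("INSERT INTO train VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 60.0)])
print(table_stats(conn, "train", ["price"]))
```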

Step8. Train and Predict

make train-and-predict CONFIG_NAME

Step9. Submit

  • Then submit your output file!🙆
./output/submission_xxx.csv

Commands

isort, black

make format

flake8, mypy

make check

Reset DB

make reset-db

Execute scripts

Recommended:

make shell
python xxx.py

or

make run python xxx.py

References

A slide deck I found just when I was worn out from managing features. It is very easy to understand.

The way the classes are written is a useful reference.