This datathon platform is fully developed in Python using Streamlit, with very few lines of code!
As the title suggests, it is designed for small datathons (but can easily scale), and the scripts are easy to understand.
Easy way => using docker hub:
docker pull spotep/mini_datathon:latest
Alternative way => clone the repo onto your server:
git clone <repo URL>; cd mini_datathon
You need 3 simple steps to set up your mini hackathon:
- Edit the admin password in users.csv, along with the logins & passwords for the participants
- Edit the config.py file:
  a) The presentation & the context of the challenge
  b) The data files (X_train, y_train, X_test & y_test), which you can upload to Google Drive and simply share the links to
  c) The evaluation metric & benchmark score
- Run the scripts:
  a) If you installed it the alternative way: streamlit run main.py
  b) If you pulled the Docker image, just build and run the container.
Please do not forget to notify the participants that the submission file needs to be a CSV ordered the same way as y_train.
P.S.: The admin user can pause the challenge at any time; while paused, participants won't be able to upload their submissions.
An example version of the code is deployed on Heroku here: web app
The deployed version uses the UCI SECOM imbalanced dataset (binary classification), evaluated with the PR-AUC score;
in the config.py file you would need to fill in the following parameters:
```python
GREATER_IS_BETTER = True
SKLEARN_SCORER = average_precision_score
SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'micro'}
```
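As a sketch of how these parameters fit together, the platform presumably applies the configured scorer to the true labels and the submitted predictions roughly like this (the `evaluate` helper is an assumption, not the platform's actual code):

```python
from sklearn.metrics import average_precision_score

GREATER_IS_BETTER = True
SKLEARN_SCORER = average_precision_score
SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'micro'}

def evaluate(y_true, y_pred):
    # Apply the configured scorer with its extra keyword arguments.
    return SKLEARN_SCORER(y_true, y_pred, **SKLEARN_ADDITIONAL_PARAMETERS)

# Tiny illustrative example: PR-AUC on four samples.
score = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```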
- upload the relevant data to your Google Drive & share the links.
The platform needs only 2 components to be saved:
The leaderboard is in fact a CSV file that is updated every time a user submits predictions. The CSV file contains 4 columns:
- id: the login of the team
- score: the best score of the team
- nb_submissions: the number of submissions the team has uploaded
- rank: the live rank of the team
There is only 1 row per team, since only the best score is saved.
By default, a benchmark score is pushed to the leaderboard:
id | score |
---|---|
benchmark | 0.6 |
For more details, please refer to the script leaderboard.
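The update logic described above (one row per team, best score kept, live ranks recomputed) could be sketched like this — the function name and column handling are assumptions based on the README, not the actual leaderboard script:

```python
import csv

def update_leaderboard(path, team, new_score, greater_is_better=True):
    """Keep one row per team with its best score, then recompute ranks."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    better = max if greater_is_better else min
    for row in rows:
        if row["id"] == team:
            # Existing team: keep the best score, count the submission.
            row["score"] = better(float(row["score"]), new_score)
            row["nb_submissions"] = int(row["nb_submissions"]) + 1
            break
    else:
        # First submission from this team: add a fresh row.
        rows.append({"id": team, "score": new_score,
                     "nb_submissions": 1, "rank": 0})
    # Recompute the live ranks from the sorted scores.
    rows.sort(key=lambda r: float(r["score"]), reverse=greater_is_better)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["id", "score", "nb_submissions", "rank"])
        writer.writeheader()
        writer.writerows(rows)
```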
Like the leaderboard, it is a CSV file. It is meant to be defined by the admin of the competition. It contains 2 columns:
- login
- password
A default user is created at first so you can start playing with the platform:
login | password |
---|---|
admin | password |
To add new participants, simply append rows to the users.csv file.
For more details, please refer to the script users.
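Appending a participant can be done by hand, or with a couple of lines like these (the helper name is an assumption; the login,password column order follows the README):

```python
import csv

def add_participant(path, login, password):
    # Append one participant row to the existing users.csv file.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([login, password])
```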
- allow a private and a public leaderboard, as done on kaggle.com
- allow connecting using OAuth
MIT License here.
We could not find an easy implementation for our yearly internal hackathon at Intel. The idea originally came from my dear DevOps coworker Elhay Efrat, and I took the responsibility of developing it.
If you like this project, let me know by buying me a coffee :)