This codebase allows you to jumpstart the INF473V challenge. The goal of this challenge is to create a cheese classifier without any real training data. You will need to create your own training data with generative tools such as Stable Diffusion, SD-XL, etc.
Clone the repo:
git clone git@github.com:nicolas-dufour/cheese_classification_challenge.git
cd cheese_classification_challenge
Install dependencies:
conda create -n cheese_challenge python=3.10
conda activate cheese_challenge
If CUDA >= 12.0:
pip install torch torchvision
If CUDA == 11.8:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Then install the rest of the requirements:
pip install -r requirements.txt
Download the data from Kaggle and copy it into the dataset folder.
The data should be organized as follows: dataset/val and dataset/test. The generated train sets will then go into dataset/train/your_new_train_set.
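Before generating or training, a quick check can confirm that the folders are where the scripts expect them. This is a minimal sketch with a hypothetical helper (check_dataset_layout, not part of the codebase); it only checks for the folder names listed above, relative to the repo root, and does not inspect what is inside val or test.

```python
from pathlib import Path


def check_dataset_layout(root: str = "dataset") -> list[str]:
    """Return a list of problems found in the expected layout (hypothetical helper)."""
    base = Path(root)
    problems = []
    # The Kaggle validation and test splits are expected directly under dataset/.
    for split in ("val", "test"):
        if not (base / split).is_dir():
            problems.append(f"missing {base / split}")
    # Generated train sets are folders in dataset/train/; none may exist yet.
    train = base / "train"
    if train.is_dir():
        for entry in sorted(train.iterdir()):
            if not entry.is_dir():
                problems.append(f"unexpected file {entry}")
    return problems


if __name__ == "__main__":
    issues = check_dataset_layout()
    print("layout OK" if not issues else "\n".join(issues))
```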
This codebase is centered around 2 components: generating your training data and training your model. Both rely on a config management library called Hydra, which keeps the code modular so you can easily swap methods, hyperparameters, etc.
To train your model, run:
python train.py
This will save a checkpoint in the checkpoints folder under the name of your experiment. Careful: if you reuse the same experiment name, the previous checkpoint will be overwritten.
To change the experiment name, run:
python train.py experiment_name=new_experiment_name
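Because reusing an experiment name overwrites the previous checkpoint, a small helper can pick a name that is not taken yet. This is a hypothetical sketch, not part of the codebase, and it assumes each checkpoint in checkpoints/ is a file or folder whose stem is the experiment name; adjust it if the training config names checkpoints differently.

```python
from pathlib import Path


def fresh_experiment_name(name: str, checkpoint_dir: str = "checkpoints") -> str:
    """Return `name`, or `name_1`, `name_2`, ... if a checkpoint with that stem exists.

    Assumption (hypothetical): checkpoints are stored as files or folders whose
    stem equals the experiment name.
    """
    ckpt_dir = Path(checkpoint_dir)
    existing = {p.stem for p in ckpt_dir.iterdir()} if ckpt_dir.is_dir() else set()
    candidate, i = name, 0
    # Append an increasing numeric suffix until the name is unused.
    while candidate in existing:
        i += 1
        candidate = f"{name}_{i}"
    return candidate


if __name__ == "__main__":
    # Prints an override to append to `python train.py`.
    print(f"experiment_name={fresh_experiment_name('baseline')}")
```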
You can generate datasets with the following command:
python generate.py
If you want to create a new dataset generator method, write a class that inherits from data.dataset_generators.base.DatasetGenerator and create a new config file for it in configs/generate/dataset_generator.
You can then run:
python generate.py dataset_generator=your_new_generator
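The actual interface of data.dataset_generators.base.DatasetGenerator is not described in this README, so the sketch below uses a hypothetical stand-in base class (StandInDatasetGenerator) with an assumed create_prompts hook; check the real base class for the methods it expects before adapting it. It illustrates one possible strategy, several prompt templates per cheese label, and only plans (prompt, output path) pairs instead of calling a diffusion model. The labels, templates, and the dataset/train/<train_set>/<label>/ layout are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StandInDatasetGenerator(ABC):
    """Hypothetical stand-in for data.dataset_generators.base.DatasetGenerator."""

    def __init__(self, labels: list[str], output_dir: str, num_images_per_label: int = 2):
        self.labels = labels
        self.output_dir = Path(output_dir)
        self.num_images_per_label = num_images_per_label

    @abstractmethod
    def create_prompts(self, label: str) -> list[str]:
        """Return the text prompts to render for one cheese label (assumed hook)."""

    def plan(self) -> list[tuple[str, Path]]:
        """Pair each prompt with an output path; image generation itself is omitted."""
        jobs = []
        for label in self.labels:
            prompts = self.create_prompts(label)
            for i in range(self.num_images_per_label):
                # Cycle through the prompts so every template gets used.
                prompt = prompts[i % len(prompts)]
                jobs.append((prompt, self.output_dir / label / f"{i:04d}.jpg"))
        return jobs


class StyleVariationGenerator(StandInDatasetGenerator):
    """Example subclass: several photographic styles per cheese (hypothetical)."""

    TEMPLATES = (
        "a close-up photo of {} cheese",
        "{} cheese on a wooden board",
        "a photo of {} cheese in a supermarket",
    )

    def create_prompts(self, label: str) -> list[str]:
        return [template.format(label) for template in self.TEMPLATES]


if __name__ == "__main__":
    gen = StyleVariationGenerator(["brie", "camembert"], "dataset/train/styles", 3)
    for prompt, path in gen.plan():
        print(path, "<-", prompt)
```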
If you run into VRAM issues, either use a smaller diffusion model (e.g. SD 1.5) or try CPU offloading (much slower). For example, for SDXL Lightning you can run:
python generate.py image_generator.use_cpu_offload=true
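Command-line overrides like the one above, or experiment_name=new_experiment_name for training, set a single (possibly nested) key of the composed Hydra config. The toy function below mimics that behaviour on a plain Python dict purely to show what a dotted override changes; it is not Hydra's implementation (Hydra has its own override grammar and type handling), and the config values in it are hypothetical.

```python
def apply_override(config: dict, override: str) -> dict:
    """Toy version of a Hydra-style `a.b.c=value` override on a nested dict."""
    key_path, raw = override.split("=", 1)
    *parents, leaf = key_path.split(".")
    node = config
    for key in parents:
        node = node[key]  # raises KeyError for an unknown config group
    # Mirrors Hydra refusing to override keys absent from the config
    # (in Hydra, new keys need a `+` prefix).
    if leaf not in node:
        raise KeyError(f"unknown key {key_path!r}")
    # Minimal value parsing: booleans and integers, otherwise keep the string.
    if raw.lower() in ("true", "false"):
        value = raw.lower() == "true"
    else:
        try:
            value = int(raw)
        except ValueError:
            value = raw
    node[leaf] = value
    return config


if __name__ == "__main__":
    # Hypothetical values; the real config lives in the repo's configs/ folder.
    cfg = {"experiment_name": "default", "image_generator": {"use_cpu_offload": False}}
    apply_override(cfg, "image_generator.use_cpu_offload=true")
    apply_override(cfg, "experiment_name=new_experiment_name")
    print(cfg)
```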
To create a submission file, run:
python create_submition.py experiment_name="name_of_the_exp_you_want_to_score" model=config_of_the_exp
Make sure experiment_name matches the checkpoint you want to score and that model points to the same model config used to train it.