apple/ml-cvnets

Crash if the number of classes differs between the `train/test` sets

mjamroz opened this issue · 6 comments

When training on a custom dataset, having a different number of classes in the train and val directories (at least for image classification) makes CUDA crash with a cryptic error:

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.         
2023-08-26 09:42:29 - LOGS    - Exception occurred that interrupted the training:  
[...]

To avoid this, make sure the train and test sets contain the same classes.
I check it with this script:

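import os
from glob import glob
from sys import argv

def cname(path):
    # The class name is the name of the class sub-directory
    return os.path.basename(path)

# Usage: python <this_script>.py DATASET_ROOT  (expects DATASET_ROOT/train and DATASET_ROOT/test)
test_classes = set(map(cname, glob(os.path.join(argv[1], "test", "*"))))
train_classes = set(map(cname, glob(os.path.join(argv[1], "train", "*"))))

# Classes present in test but absent from train
if missing := test_classes - train_classes:
    print("MISSING TRAIN CLASS(ES)", missing)

# Classes present in train but absent from test
if missing := train_classes - test_classes:
    print("MISSING TEST CLASS(ES)", missing)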
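A likely reason for the cryptic assertion, assuming the loader assigns label indices the way torchvision's `ImageFolder` does (sorting each split's class sub-directory names separately and numbering them from 0): an extra class in one split shifts the indices and can push some of them past the size of the classifier head. The toy sketch builds a hypothetical layout in a temporary directory and reproduces that mapping; it does not call ml-cvnets itself, and the class names are made up.

```python
import os
import tempfile

def class_to_idx(split_dir):
    """Map class-directory names to label indices like torchvision's ImageFolder:
    sort the sub-directory names and number them from 0."""
    classes = sorted(e.name for e in os.scandir(split_dir) if e.is_dir())
    return {name: i for i, name in enumerate(classes)}

# Hypothetical toy layout: train has 2 classes, test has one extra class.
root = tempfile.mkdtemp()
for split, names in {"train": ["cat", "dog"], "test": ["bird", "cat", "dog"]}.items():
    for name in names:
        os.makedirs(os.path.join(root, split, name))

train_map = class_to_idx(os.path.join(root, "train"))
test_map = class_to_idx(os.path.join(root, "test"))
num_classes = len(train_map)  # assume the classifier head is sized from train

print(train_map)  # {'cat': 0, 'dog': 1}
print(test_map)   # {'bird': 0, 'cat': 1, 'dog': 2}
# Label 2 is out of range for a 2-output head: the kind of index that can
# trigger an "index out of bounds" assertion in a CUDA gather/scatter kernel.
print(max(test_map.values()) >= num_classes)  # True
```

Note that even without an out-of-range index, "cat" is 0 in train but 1 in test here, so mismatched class sets can also silently mislabel data.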

It would be nice if ml-cvnets performed this check at loading time, like the existing empty-directory check (i.e. the script exits if there are no files for a particular class).
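A loading-time check could look roughly like this. This is a hypothetical standalone helper, not part of ml-cvnets; the function name `check_split_classes` and the `train`/`test` sub-directory layout are assumptions. It fails fast with a readable message instead of letting the mismatch surface later as a CUDA assertion.

```python
import os

def check_split_classes(root, splits=("train", "test")):
    """Hypothetical loading-time check (not an ml-cvnets API): raise ValueError
    if the class sub-directories differ between splits; otherwise return the
    sorted list of shared class names."""
    class_sets = {}
    for split in splits:
        split_dir = os.path.join(root, split)
        class_sets[split] = {e.name for e in os.scandir(split_dir) if e.is_dir()}

    reference = class_sets[splits[0]]
    problems = []
    for split in splits[1:]:
        # Classes in the reference split that this split lacks
        if missing := reference - class_sets[split]:
            problems.append(f"missing in {split}: {sorted(missing)}")
        # Classes in this split that the reference split lacks
        if extra := class_sets[split] - reference:
            problems.append(f"missing in {splits[0]}: {sorted(extra)}")

    if problems:
        raise ValueError("Class mismatch between splits; " + "; ".join(problems))
    return sorted(reference)
```

Calling it once before building the data loaders (e.g. `check_split_classes(DATASET_ROOT)`) is enough, since it only lists directories.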

BTW, if you want to split a dataset into train/test so that both splits share all classes, the `datasets` library is useful for that:

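from datasets import load_dataset

# Load every image under /PATH/ as a single split; "imagefolder" infers
# the "label" column from the class sub-directory names.
ds = load_dataset(
    "imagefolder",
    data_files={"train": "/PATH/**"},
    split="train",
)
# Stratified split: each class keeps roughly the same proportion in train and test
# (very small classes may still need checking).
ds = ds.train_test_split(
    test_size=0.05, stratify_by_column="label", shuffle=True
)
ds["test"].to_csv("test.csv")
ds["train"].to_csv("train.csv")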

Then link or copy the files into train/test directories according to {test,train}.csv.
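The link/copy step might be sketched as follows, assuming you have already read `(file_path, class_name)` pairs out of train.csv and test.csv. Parsing is not shown because the column layout depends on how `to_csv` serialized the image column. The helper name `materialize_split` is hypothetical. It recreates the `split/class_name/file` layout that the class-folder loaders expect.

```python
import os
import shutil

def materialize_split(rows, out_root, split, copy=False):
    """Hypothetical helper: for each (src_path, class_name) pair, create
    out_root/split/class_name/<file name> as a symlink (default) or a copy."""
    for src_path, class_name in rows:
        class_dir = os.path.join(out_root, split, class_name)
        os.makedirs(class_dir, exist_ok=True)
        dst = os.path.join(class_dir, os.path.basename(src_path))
        if copy:
            shutil.copy2(src_path, dst)  # real copy, keeps file metadata
        else:
            os.symlink(os.path.abspath(src_path), dst)  # cheap link to the original
```

Symlinks save disk space, but on Windows creating them may require extra privileges, so `copy=True` is the safer fallback there.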

@mjamroz
Hello! I'm trying to train a MobileViT model but ran into the following problem. Could you help?

File "C:\Users\72344\.conda\envs\MobileViTv2\Scripts\cvnets-train.exe\__main__.py", line 4, in <module>
ModuleNotFoundError: No module named 'main_train'

I tried to install this module, but got:
"ERROR: Could not find a version that satisfies the requirement main_train (from versions: none)
ERROR: No matching distribution found for main_train"

Can you tell me what I can do? Thank you very much!

@Tranbaber try running `python main_train.py` from within the ml-cvnets directory, for example: `python -W ignore main_train.py --common.config-file path_to_config_file.yaml`

@mjamroz
Hello! After following your tips, I get a new error:
AttributeError: 'NoneType' object has no attribute 'size'
Can you tell me what I should do?

I would recommend trying Hugging Face: it seems easier for beginners to use, and it implements MobileViT.

Regarding your error: it means some variable is `None` (for example, a value that was never set or loaded), but you skipped the most important part of the error message: which line and which variable.

Instead of your custom YAML file, try using one from the example directory.

@mjamroz
Thanks! I will try your recommendations later. Thanks again!