zhenyuw16/UniDetector

dataset


I have a question: do the datasets only need one of COCO or LVIS, or do both need to be used?
Additionally, when using dump_clip_features_manyprompt.py, do the annotations need to cover the val split, the train split, or both?

COCO is used for training and LVIS is used for inference. If the label spaces of the training set and the validation set are the same (as in COCO), you only need to run dump_clip_features_manyprompt.py once.
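For context, here is a minimal sketch of what a multi-prompt CLIP feature dump does conceptually: each class name is formatted with several prompt templates, encoded with CLIP's text encoder, and the normalized embeddings are averaged into one vector per class. The template list, class names, and output filename below are illustrative assumptions, not the exact ones used by the repo's script.

```python
import torch
import clip  # OpenAI CLIP package

# Assumed subset of prompt templates; the actual script uses many more.
templates = [
    "a photo of a {}.",
    "a photo of the {}.",
    "there is a {} in the scene.",
]

model, _ = clip.load("ViT-B/32", device="cpu")
class_names = ["person", "bicycle", "car"]  # e.g. the COCO/LVIS label space

with torch.no_grad():
    embeddings = []
    for name in class_names:
        # Encode every prompt variant of this class name.
        tokens = clip.tokenize([t.format(name) for t in templates])
        feats = model.encode_text(tokens)
        feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize
        embeddings.append(feats.mean(dim=0))              # average over prompts
    embeddings = torch.stack(embeddings)  # (num_classes, embed_dim)

torch.save(embeddings, "coco_clip_text_embeddings.pt")  # hypothetical output name
```

Since the embeddings depend only on the label space, a single dump can be reused whenever the training and validation sets share the same classes.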

I have another question: is it necessary to run both end-to-end training and decoupled training, or just one of them?

No, you only need to run one of them. Decoupled training performs better than end-to-end training.

Regarding the possible structures for utilizing images from heterogeneous label spaces during training:
the paper presents three possible structures, a, b, and c. Which one is used in your model? Or is it not reflected in the paper?

@largestcabbage There is meta info that records the dataset_id; the corresponding text embeddings are used to compute the loss.

We use the partitioned structure here. It is reflected in Table 1.
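To illustrate the idea described above, here is a minimal, hypothetical sketch of the partitioned structure: each sample carries a dataset_id in its meta info, and the classification logits are computed only against the text embeddings of that sample's own label space. The names (text_embeddings, region_feats, the temperature value) are assumptions for illustration, not the repo's actual implementation.

```python
import torch
import torch.nn.functional as F

# Per-dataset CLIP text embeddings, keyed by dataset_id,
# e.g. {0: COCO classes, 1: Objects365 classes}. Random placeholders here.
text_embeddings = {
    0: torch.randn(80, 512),
    1: torch.randn(365, 512),
}

def classification_loss(region_feats, labels, dataset_id, temperature=0.01):
    """Cosine-similarity logits against the embeddings of one label space only."""
    embeds = F.normalize(text_embeddings[dataset_id], dim=-1)
    feats = F.normalize(region_feats, dim=-1)
    logits = feats @ embeds.t() / temperature  # (num_regions, num_classes)
    return F.cross_entropy(logits, labels)

# Usage: each image is routed to its own label space via its dataset_id.
feats = torch.randn(4, 512)            # region features from the detector
labels = torch.tensor([1, 5, 5, 70])   # labels within the COCO label space
loss = classification_loss(feats, labels, dataset_id=0)
```

The key design point is that losses are never computed across label spaces: an image annotated under one dataset's taxonomy only competes against that dataset's class embeddings, which is what allows heterogeneous datasets to be mixed in training.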