OpenGVLab/LAMM

What are the differences among instruct_98K, instruct_140K, and instruct_186K?

peiliu0408 opened this issue · 1 comment

As mentioned in the title, the 2D_instruct folder (on opendatalab) contains three merged instruct files. What are the differences among these files?

Thanks for your attention. As mentioned in our paper, the 2D part of the LAMM dataset consists of four parts: daily dialogue, detailed description, factual knowledge dialogue, and visual task dialogue, with 49k, 49k, 42k, and 46k samples, respectively. Thus, instruct_98k consists of daily dialogue and detailed description, instruct_140k is instruct_98k plus factual knowledge dialogue, and instruct_186k is the full set of all four parts.
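
To make the arithmetic explicit, here is a small sketch that accumulates the four part sizes in the order given above. The dictionary keys and the helper name `cumulative_sizes` are illustrative assumptions, not identifiers from the LAMM codebase, and the counts are the rounded figures (in thousands) quoted in this reply.

```python
# Rounded sample counts (in thousands) quoted in the reply above.
# The keys are hypothetical labels, not actual LAMM file or field names.
part_sizes_k = {
    "daily_dialogue": 49,
    "detailed_description": 49,
    "factual_knowledge_dialogue": 42,
    "visual_task_dialogue": 46,
}


def cumulative_sizes(sizes):
    """Return the running total after each part, in insertion order."""
    totals = {}
    running = 0
    for name, count in sizes.items():
        running += count
        totals[name] = running
    return totals


totals = cumulative_sizes(part_sizes_k)
for name, total in totals.items():
    print(f"after adding {name}: {total}k samples")
```

The running totals after the second, third, and fourth parts correspond to the 98k, 140k, and 186k files, respectively.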