What are the differences among instruct_98K, instruct_140K, and instruct_186K?
peiliu0408 opened this issue · 1 comment
peiliu0408 commented
As mentioned in the title, the 2D_instruct folder (on opendatalab) contains three merged instruct files. What are the differences among these files?
wangjiongw commented
Thanks for your attention. As mentioned in our paper, the 2D part of the LAMM dataset includes 4 parts: daily dialogue, detailed description, factual knowledge dialogue, and visual task dialogue, with 49k, 49k, 42k, and 46k samples, respectively. Thus, instruct_98k consists of daily dialogue and detailed description (49k + 49k), instruct_140k is instruct_98k plus factual knowledge dialogue (98k + 42k), and instruct_186k is the whole 2D set of LAMM, i.e. instruct_140k plus visual task dialogue (140k + 46k).
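
For anyone who wants to double-check the arithmetic, here is a minimal sketch that rebuilds the three file sizes from the per-part counts quoted above. The counts are the rounded figures from the reply (in thousands), the `instruct_…K` labels follow the issue title, and the variable names are only illustrative; nothing here reads the actual files.

```python
from itertools import accumulate

# Rounded sample counts (in thousands), as quoted in the reply above.
parts = [
    ("daily dialogue", 49),
    ("detailed description", 49),
    ("factual knowledge dialogue", 42),
    ("visual task dialogue", 46),
]

# Running totals after adding each part in order: 49, 98, 140, 186.
totals = list(accumulate(count for _, count in parts))

# Map each merged file label to the parts it contains.
# The first running total (49k) is skipped because no merged file
# with only daily dialogue is mentioned in the thread.
merged = {
    f"instruct_{total}K": [name for name, _ in parts[: i + 1]]
    for i, total in enumerate(totals)
    if i >= 1
}

for label, names in merged.items():
    print(f"{label}: {' + '.join(names)}")
```

Each merged file is a superset of the previous one, so instruct_186K already contains everything in instruct_98K and instruct_140K.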