baaivision/CapsFusion

Waiting for the dataset

shipengai opened this issue · 12 comments

Great work!

Thanks for your interest! The dataset probably will not be released in the near term (the next 1-2 weeks). However, the CapsLLaMA model used to generate the data, together with the large-scale distributed inference code, will be released in about 1-2 weeks, so you can use those to run inference and build data in the meantime. Please stay tuned.

Thanks for the update. Is there a planned timeline of dataset release?

Hi there, the CapsFus-LLaMA model and distributed inference code have been released, please check it out and give me feedback on any problem you encounter.

Thank you.

Great, still looking forward to the public release of the dataset.

Looking for dataset, too. Please kindly @ me if the dataset is released! Thanks!

Waiting for the datasets to be released! 👀

@shipengai @cliangyu @Moonteresa @iamlockelightning
Hi there, we have released the CapsFusion-120M dataset, please check it out!

Hi! I downloaded the parquet files, but only the third one can be read correctly by pd.read_parquet; the other three fail with a Thrift data error. Is there another way to read these parquet files?

Hi @yqy2001! I'm facing the same problem as @Moonteresa and need your help reading the data.

@Moonteresa @TyRantLQlyf Thank you for your feedback. I will check it.

Hi @yqy2001, have you checked this dataset issue?

@TyRantLQlyf @Moonteresa

Hello! I've downloaded data directly from the HuggingFace repository. Upon testing, I successfully accessed the data using the following code:

[image: screenshot of the code used to read the data]

Can you share your error messages?

(Note that the pandas and pyarrow packages need to be installed; you can install them with pip install pandas pyarrow.)
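
For anyone hitting the Thrift error above, here is a minimal sketch of one way to narrow it down. It is not the code from the screenshot. It rests on the assumption that the failing files are incomplete or are not real Parquet files (for example, an interrupted download or a Git LFS pointer file), which is only one possible cause. A valid unencrypted Parquet file starts and ends with the 4-byte magic PAR1, so that can be checked before calling pd.read_parquet. The helper names `looks_like_parquet` and `read_parquet_dir` are hypothetical.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # magic bytes at the start and end of an unencrypted Parquet file


def looks_like_parquet(path):
    """Return True if the file begins and ends with the Parquet magic bytes."""
    path = Path(path)
    # Smallest possible layout: header magic + 4-byte footer length + footer magic.
    if path.stat().st_size < 12:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def read_parquet_dir(directory):
    """Read every *.parquet file in a directory, skipping files that fail the magic check."""
    import pandas as pd  # requires: pip install pandas pyarrow

    frames = {}
    for path in sorted(Path(directory).glob("*.parquet")):
        if not looks_like_parquet(path):
            # Likely truncated or not a Parquet file at all; re-downloading may help.
            print(f"skipping {path.name}: missing PAR1 magic bytes")
            continue
        frames[path.name] = pd.read_parquet(path, engine="pyarrow")
    return frames
```

Usage would look like `frames = read_parquet_dir("path/to/downloaded/parquets")`, where the directory path is a placeholder. If a file is reported as skipped, comparing its size on disk with the size listed in the HuggingFace repository is a reasonable next check.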