How many samples are in the dataset?
qiaogh97 opened this issue · 3 comments
qiaogh97 commented
Hi, @rom1504
I downloaded the 32 parquet files and counted the URLs. I find about 26,760,000 URLs in each parquet file, and 32 * 26,760,000 is about 856 million. But you said this dataset contains 400M samples?
So what is the difference?
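For reference, the count described above could be reproduced with something like the following. This is only a rough sketch: it assumes `pyarrow` is installed, that each parquet row holds exactly one URL, and that the files sit in a local directory (the `parquet_dir` argument is a hypothetical path, not something from the repository).

```python
from pathlib import Path


def estimate_total(num_files: int, rows_per_file: int) -> int:
    """Estimate total rows when every file holds roughly the same number of rows."""
    return num_files * rows_per_file


def count_rows(parquet_dir: str) -> int:
    """Sum exact row counts read from the parquet footers, without loading the data.

    Assumes pyarrow is installed and that parquet_dir (a hypothetical local path)
    contains the downloaded *.parquet metadata files, one URL per row.
    """
    import pyarrow.parquet as pq  # imported lazily so estimate_total works without it

    return sum(
        pq.ParquetFile(path).metadata.num_rows
        for path in sorted(Path(parquet_dir).glob("*.parquet"))
    )


if __name__ == "__main__":
    # Figures from the comment above: 32 files, about 26.76M URLs each.
    print(estimate_total(32, 26_760_000))  # prints 856320000
```

Reading only the footer metadata keeps this cheap, since no column data is loaded.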
rom1504 commented
Hi, where did you download the parquet from?
http://the-eye.eu/public/AI/cah/laion400m-met-release/laion400m-meta/ has laion400m
If you downloaded from 3080.rom1504.fr, you probably got a more recent version of the dataset, which is indeed much bigger (and not really released yet).
rom1504 commented
Ah yes, I see I left that 3080 link in the README. I need to fix it :)
qiao1025566574 commented
OK, I see.