rom1504/laion-prepro
Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them
Python
Issues
Define process and load_clip in data loader
#20 opened by EmilyWebber
Link to download annotated data
#22 opened by shubhamagarwal92
laion400m/download_csv is not available.
#19 opened by yj-yu
Consider ways to distribute the dataset
#9 opened by rom1504
Does https://github.com/rom1504/laion-prepro/blob/main/laion5B/safety/join.py work for non-en langs?
#17 opened by PranshuBansalDev
md5 check for `.parquet` files
#14 opened by vtddggg
How many about the dataset?
#13 opened by qiaogh97
add how to make clip embeddings out of it
#3 opened by rom1504
add how to make knn indices
#4 opened by rom1504
add command line calls for clip retrieval for cah
#11 opened by rom1504
consider shuffling the dataset
#2 opened by rom1504
make this more user friendly
#7 opened by rom1504
add how to get interesting subsets
#6 opened by rom1504
add what to train using that (clip, dalle)
#5 opened by rom1504
add content
#1 opened by rom1504