huggingface/OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Python, Apache-2.0
Issues
How to use LDA for topic modeling
#12 opened by jrryzh - 2
Releasing trained topic models?
#8 opened by vishaal27 - 1
Search engine over the training data
#5 opened by aleSuglia - 11
common_words.json download issue
#6 opened by jrryzh - 1
Training Details
#1 opened by vateye - 4
Metadata process
#4 opened by ellenxtan - 2
Which folder to use?
#2 opened by mckinziebrandon - 3
When will the trained model be released?
#3 opened by chenxshuo