A question for the regularization captions

Question

Opened this issue a year ago · 1 comment

Hi, thanks for your excellent work.
I have a question about the regularization captions in ./data/regularization_captions.txt. How were these captions obtained? The only description I could find in the paper is "∼ 1000 randomly sampled captions for regularization." Could you share more details about their origin or how they were collected?

Answer 1 · 2023-12-24T04:33:18.000Z

Hi, I randomly sampled some captions from the LAION-400M dataset using the webdataloader with shuffle on. Curating them carefully so that they consist of hard negatives might be useful, but I didn't analyze this in detail.
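
For anyone who wants to build a similar file, here is a minimal sketch of drawing about 1000 random captions from a caption stream. It is not the script used for this repo: it swaps the shuffled loader for plain reservoir sampling, so it works on any iterable of strings (for example, the caption field read from LAION-400M shards). The function name `sample_captions` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_captions(
    captions: Iterable[str], k: int = 1000, seed: Optional[int] = None
) -> List[str]:
    """Uniformly sample up to k non-empty captions from a stream.

    Uses reservoir sampling (Algorithm R), so the stream is read once
    and never has to fit in memory. This is an illustrative stand-in,
    not the shuffle-based loading described in the answer above.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    seen = 0  # number of non-empty captions read so far
    for caption in captions:
        caption = caption.strip()
        if not caption:
            continue  # skip blank captions
        seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k captions.
            reservoir.append(caption)
        else:
            # Keep the new caption with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = caption
    return reservoir
```

Writing the returned list to a text file, one caption per line, would give a file in the same spirit as ./data/regularization_captions.txt, although the exact captions will differ.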

Thanks.