Caution
Repository for personal use only. Feel free to reuse the scripts, but I won't be entertaining any feature requests, etc.
Scripts pair-coded with LLMs to format and prepare large-scale image datasets.
Usually, these scripts follow the order of download -> extract -> misc ops -> preparation (compatibility with datasets).
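As a rough illustration of the extract and preparation stages, here is a minimal standard-library sketch. It is not taken from the scripts in this repo: the function names `extract_archives` and `write_metadata`, the `.tar` archive format, and the `metadata.jsonl` layout (one `file_name` entry per image, the layout the image-folder loader in datasets can read) are all assumptions made for the example. The download and misc ops stages are left out because they differ per dataset.

```python
import json
import tarfile
from pathlib import Path

# Assumed set of image extensions; adjust per dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def extract_archives(download_dir: Path, extract_dir: Path) -> None:
    """Extract every .tar archive in download_dir into extract_dir (hypothetical helper)."""
    extract_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(download_dir.glob("*.tar")):
        with tarfile.open(archive) as tar:
            # Only extract archives you trust; tar members can carry unsafe paths.
            tar.extractall(extract_dir)


def write_metadata(extract_dir: Path) -> int:
    """Write metadata.jsonl listing each image by relative path; return the image count."""
    images = sorted(
        p for p in extract_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(extract_dir / "metadata.jsonl", "w", encoding="utf-8") as f:
        for path in images:
            row = {"file_name": path.relative_to(extract_dir).as_posix()}
            f.write(json.dumps(row) + "\n")
    return len(images)
```

A run would call `extract_archives(Path("downloads"), Path("extracted"))` followed by `write_metadata(Path("extracted"))`, with both directory names being placeholders. Real datasets usually need extra columns (for example an edit instruction per image), which would be added to each row in the preparation step.
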
Folder | Original Dataset | Simplified Dataset | Notes |
---|---|---|---|
imgedit | sysuyy/ImgEdit | diffusion-cot/imgedit-simpler | This repo uses this distribution because of simplicity. |
gpt-edit | UCSC-VLAA/GPT-Image-Edit-1.5M | UCSC-VLAA/gpt-edit-simpler | None |
echo-4o | Yejy53/Echo-4o-Image | diffusion-cot/echo-4o-instruction-following | We only use the instruction following subset. |