- Fine-grained annotations are available on 🤗 Hugging Face.
- The dataset is available on 🤗 Hugging Face and contains 219,437 image descriptions. Link to our paper: arXiv.
See detailed instructions in install.md.
- COCO: Download train2017 here.
- SAM: Download SAM here (sa_000000.tar through sa_000024.tar).
- VG: Download VG here.
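The SAM download spans 25 archives, so it is easy to miss one. The following is a minimal sketch, not part of the repository, that enumerates the archive names in the range given above and reports which ones are absent from a local folder. The function names and the `./downloads` folder are hypothetical; adjust them to your setup.

```python
from pathlib import Path


def sam_archive_names(first: int = 0, last: int = 24) -> list[str]:
    """Return archive names from sa_000000.tar to sa_000024.tar (inclusive)."""
    return [f"sa_{i:06d}.tar" for i in range(first, last + 1)]


def missing_archives(download_dir: str) -> list[str]:
    """List the expected SAM archives that are not present in download_dir."""
    root = Path(download_dir)
    return [name for name in sam_archive_names() if not (root / name).is_file()]


if __name__ == "__main__":
    # "./downloads" is an assumed location, not one prescribed by the project.
    for name in missing_archives("./downloads"):
        print("missing:", name)
```

Running the script before extraction gives a quick completeness check; an empty output means all 25 archives were found.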
After downloading, organize the image datasets in ./dataset/ as follows:
├── coco
│   └── train2017
├── sam
├── images
└── vg
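Before running anything on the data, it can help to confirm that the folders are where the layout expects them. The sketch below is an illustration only: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the paths are read literally from the listing, so edit the list if your copy nests the folders differently (for example, `images` inside `sam`).

```python
from pathlib import Path

# Assumption: relative paths taken literally from the directory listing.
EXPECTED_DIRS = ["coco/train2017", "sam", "images", "vg"]


def missing_dirs(root: str = "./dataset", expected=EXPECTED_DIRS) -> list[str]:
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories under ./dataset:")
        for rel in missing:
            print("  -", rel)
    else:
        print("Dataset layout looks complete.")
```

The check only tests for the presence of directories; it does not verify that the images inside them are complete.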
After installing all the requirements, follow use.md to generate descriptions for your own datasets.
If you find our work useful for your research or applications, please cite using this BibTeX:
@misc{pi2024image,
  title={Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions},
  author={Renjie Pi and Jianshu Zhang and Jipeng Zhang and Rui Pan and Zhekai Chen and Tong Zhang},
  year={2024},
  eprint={2406.07502},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}