Dataset Card for STAIR-Captions

Dataset Description

Homepage: http://captions.stair.center/
Repository: https://github.com/shunk031/huggingface-datasets_STAIR-Captions
Paper (Preprint): https://arxiv.org/abs/1705.00823
Paper (ACL'17): https://aclanthology.org/P17-2066/
Point of Contact: info_AT_stair.center

Dataset Summary

STAIR Captions is a large-scale dataset containing 820,310 Japanese captions. This dataset can be used for caption generation, multimodal retrieval, and image generation.
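For caption generation or retrieval, captions usually need to be grouped by the image they describe. The following is a minimal sketch of that grouping step. It assumes COCO-style annotation records with `image_id` and `caption` keys. That layout is an assumption, since this card does not document the data fields, and the records below are toy placeholders rather than entries from the dataset. `group_captions` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy placeholder records in an ASSUMED COCO-style layout.
# The real field names are not documented in this card.
annotations = [
    {"image_id": 1, "caption": "猫がソファの上で寝ている"},
    {"image_id": 1, "caption": "ソファで眠る猫"},
    {"image_id": 2, "caption": "駅に電車が止まっている"},
]


def group_captions(records):
    """Map each image_id to the list of its captions, preserving input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["image_id"]].append(record["caption"])
    return dict(grouped)


captions_by_image = group_captions(annotations)
for image_id, captions in captions_by_image.items():
    print(image_id, len(captions))
```

The resulting mapping can feed either task: sample one caption per image as a generation target, or treat all captions of an image as positives for retrieval.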

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The language data in STAIR Captions is in Japanese (BCP-47 ja-JP).

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

[More Information Needed]

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

[More Information Needed]

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

Creative Commons Attribution 4.0 License.

Citation Information

@inproceedings{yoshikawa2017stair,
  title={STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset},
  author={Yoshikawa, Yuya and Shigeto, Yutaro and Takeuchi, Akikazu},
  booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  pages={417--421},
  year={2017}
}

Contributions

Thanks to @yuyay for creating this dataset.
