MSVD-Indonesian (Paper: link) is derived from the MSVD dataset by translating its English sentences into Indonesian with the help of a machine translation service. The dataset can be used for multimodal video-text tasks, including text-to-video retrieval, video-to-text retrieval, and video captioning. Like the original English dataset, MSVD-Indonesian contains about 80k video-text pairs.
Indonesian (Bahasa Indonesia) sentences: link
Raw videos: link
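For retrieval or captioning experiments, the sentences usually need to be grouped by the video they describe. The following is a minimal sketch, not the official loader: it assumes each line of the sentence file has the form `<video_id> <caption>`, which may not match the released file, and the sample lines and the `group_captions` helper are hypothetical.

```python
from collections import defaultdict


def group_captions(lines):
    """Group caption lines by video id.

    Assumes each line looks like '<video_id> <caption>' (an assumed
    layout; adjust the parsing if the released file differs).
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only: id on the left, caption on the right.
        video_id, _, caption = line.partition(" ")
        if caption:
            captions[video_id].append(caption)
    return dict(captions)


# Hypothetical sample lines, for illustration only.
sample = [
    "vid1 seorang pria sedang bermain gitar",
    "vid1 seorang pria memainkan gitar",
    "vid2 seekor kucing sedang tidur",
]
pairs = group_captions(sample)
```

Each key of `pairs` is a video id and each value is the list of Indonesian sentences for that video, so one video can serve as the target of several text queries.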
If you find our work useful in your research, please cite:
@article{Hendria2023MSVDID,
title={{MSVD}-{I}ndonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian},
author={Willy Fitra Hendria},
journal={arXiv preprint arXiv:2306.11341},
year={2023}
}
Our experimental results were obtained using resources from X-CLIP and VNS-GRU. We thank the original authors for open-sourcing their work.