MSVD-Indonesian

MSVD-Indonesian (Paper: link) is derived from the MSVD dataset with the help of a machine translation service. The dataset can be used for multimodal video-text tasks, including text-to-video retrieval, video-to-text retrieval, and video captioning. Like the original English dataset, MSVD-Indonesian contains about 80k video-text pairs.

Data

Indonesian (Bahasa Indonesia) sentences: link

Raw videos: link
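
For the retrieval and captioning tasks, it is usually convenient to group the sentences by video. The snippet below is a minimal sketch of that step, not the repository's own loader. It assumes, hypothetically, that each line of the sentence file has the form "<video_id> <sentence>"; the function name `group_captions` and the sample lines are made up for illustration, so adjust the parsing to the actual file layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group captions by video ID.

    Assumption (not confirmed by the dataset files): one caption per line,
    written as "<video_id> <sentence>", separated by whitespace.
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) < 2:
            continue  # skip lines that have an ID but no sentence
        video_id, sentence = parts
        captions[video_id].append(sentence)
    return dict(captions)


# Made-up sample lines for illustration only, not taken from the dataset.
sample = [
    "vid_0001 seorang pria sedang bermain gitar",
    "vid_0001 seorang pria memainkan alat musik",
    "vid_0002 seekor kucing sedang tidur",
]
grouped = group_captions(sample)
num_pairs = sum(len(sentences) for sentences in grouped.values())
print(len(grouped), "videos,", num_pairs, "video-text pairs")
```

To run this on a downloaded file, pass an open file handle instead of the sample list, for example `group_captions(open(path, encoding="utf-8"))`, where `path` points to the sentence file.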

Qualitative Results

Text-to-Video Retrieval

Video-to-Text Retrieval

Video Captioning

Citation

If you find our work useful in your research, please cite:

@article{Hendria2023MSVDID,
  title={{MSVD}-{I}ndonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian},
  author={Willy Fitra Hendria},
  journal={arXiv preprint arXiv:2306.11341},
  year={2023}
}

Acknowledgments

Our experimental results were obtained using resources from X-CLIP and VNS-GRU. We thank the original authors for open-sourcing their work.

willyfh/msvd-indonesian