A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

PhoST is a high-quality, large-scale English-Vietnamese speech translation dataset with 508 hours of audio, consisting of 331K triplets, each containing a sentence-length audio clip, its English source transcript sentence, and the corresponding Vietnamese target subtitle sentence. Details of the dataset construction and experimental results can be found in our INTERSPEECH 2022 paper:

@inproceedings{PhoST,
  title     = {{A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation}},
  author    = {Linh The Nguyen and Nguyen Luong Tran and Long Doan and Manh Luong and Dat Quoc Nguyen},
  booktitle = {Proceedings of the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  year      = {2022}
}
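
As a rough illustration of the triplet structure described above, the following Python sketch models one example and derives the average clip length implied by the reported totals (508 hours, 331K triplets). The class name, field names, file path, and example sentences are hypothetical and chosen only for illustration; they do not reflect the released file layout, and since 331K is a rounded figure the result is approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechTranslationTriplet:
    """One PhoST-style example. Field names are hypothetical, not the released format."""
    audio_path: str   # path to a sentence-length audio clip (assumed layout)
    english: str      # English source transcript sentence
    vietnamese: str   # Vietnamese target subtitle sentence


def mean_clip_seconds(total_hours: float, num_triplets: int) -> float:
    """Average clip length in seconds implied by corpus-level totals."""
    if num_triplets <= 0:
        raise ValueError("num_triplets must be positive")
    return total_hours * 3600.0 / num_triplets


# A made-up example, not taken from the dataset.
example = SpeechTranslationTriplet(
    audio_path="clips/example_0001.wav",  # hypothetical file name
    english="Thank you very much.",
    vietnamese="Cảm ơn rất nhiều.",
)

# 508 hours and roughly 331,000 triplets, as stated in the description.
avg = mean_clip_seconds(508, 331_000)
print(f"{avg:.1f} seconds per triplet on average")
```

Under these totals, each triplet corresponds to roughly five and a half seconds of audio, which is consistent with sentence-length segments.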

Please follow this LINK to download the PhoST dataset. By downloading this dataset, the USER agrees:

to use the dataset for research or educational purposes only;
not to distribute the dataset, or any part of it, in original or modified form;
and to cite our INTERSPEECH 2022 paper "A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation" whenever the dataset is used to help produce published results.

THE DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE
DATA.

Copyright (c) 2022 VinAI