Dataset on the 🤗 Hugging Face Hub
Hey Masaya, Ryuichi, Yuma, Takuya and Kentaro,
Congratulations on the release of the LibriTTS-P dataset! It's a very valuable resource for building more expressive text-to-speech models and we can't wait to try it for the Parler-TTS project.
Would you be interested in integrating the LibriTTS-P dataset to the Hugging Face Hub? The Hugging Face Hub is paired with the Datasets Library to help reduce data loading and processing to just a couple of lines of code.
For example, the open-source community will be able to load and pre-process the dataset with just two lines of Python code:
from datasets import load_dataset
libritts_p = load_dataset("ly-corp/libritts-p")
A more in-depth example can be seen for the LibriSpeech dataset here. You can also see the Dataset Viewer feature, which allows users to quickly listen to samples without downloading the dataset locally.
Integrating the dataset to the Hub is quite straightforward:
- Download the audio files locally and define a list of their paths
- Load the metadata locally, matching the order of the audio files
- Convert to Datasets format using this guide
- Push the converted dataset to the Hub (using .push_to_hub)
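As a rough sketch of the steps above (not a definitive recipe): the metadata layout here is an assumption, namely a hypothetical `prompts_by_id` dict mapping each utterance ID to its style prompt, with the audio file name (minus extension) taken as the utterance ID. The helper names `build_columns` and `push_dataset` are made up for illustration; `Dataset.from_dict`, `cast_column`, `Audio` and `push_to_hub` are the Datasets calls involved.

```python
from pathlib import Path


def build_columns(audio_paths, prompts_by_id):
    """Pair each audio file with its metadata so both stay in the same order."""
    columns = {"audio": [], "utterance_id": [], "prompt": []}
    for path in sorted(audio_paths):
        # Assumption: the file name without extension is the utterance ID.
        utt_id = Path(path).stem
        if utt_id not in prompts_by_id:
            continue  # skip audio without metadata rather than misalign rows
        columns["audio"].append(str(path))
        columns["utterance_id"].append(utt_id)
        columns["prompt"].append(prompts_by_id[utt_id])
    return columns


def push_dataset(columns, repo_id):
    """Convert the columns to a Dataset with an Audio feature and upload it."""
    # Imported lazily; needs `pip install datasets` and a logged-in Hub account.
    from datasets import Audio, Dataset

    dataset = Dataset.from_dict(columns).cast_column("audio", Audio())
    dataset.push_to_hub(repo_id)
    return dataset
```

Usage would look something like `columns = build_columns(Path("LibriTTS").rglob("*.wav"), prompts_by_id)` followed by `push_dataset(columns, "ly-corp/libritts-p")`, where the local directory name is again just a placeholder.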
We can place the LibriTTS-P dataset under a new organisation (such as ly-corp) and add you all as admins to the org, such that you have full control over the dataset and how it's displayed.
Overall, we believe integrating the dataset to the Hub will both: i) promote your dataset and ii) make it easier for the community to use it! Happy to help with any steps in the process, feel free to drop questions here!
@sanchit-gandhi
Thank you for your interest in the LibriTTS-P dataset!
We appreciate the detailed guide on integrating the dataset with the Hugging Face Hub.
We will discuss this opportunity with our team and announce here if we upload it to the Hugging Face Hub.
Awesome, thanks @MasayaKawamura 🤗 As mentioned, happy to help if you have any questions; just let me know!