speech-corpus

There are 18 repositories under the speech-corpus topic.

  • clovaai/ClovaCall

    ClovaCall dataset and PyTorch LAS baseline code (Interspeech 2020)

    Language: Python
  • yc9701/pansori

    Tools for ASR Corpus Generation from Online Video

    Language: Python
  • lennes/spect

    SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/

    Language: HTML
  • kan-bayashi/LibriTTSLabel

    Alignment files of LibriTTS.

  • khiajohnson/SpiCE-Corpus

    An open-access corpus of conversational bilingual speech in Cantonese and English

    Language: JavaScript
  • AsoSoft/AsoSoft-Speech-Corpus

    AsoSoft Speech Corpus can be used for spoken language processing tasks in Central Kurdish such as speech recognition, speaker recognition, gender identification, and phonetic analysis.

  • dcavar/ELAN2split

    Split ELAN Annotation Files and corresponding speech files into a corpus format for common ASR and Forced Aligners

    Language: C++
  • kevobt/speech-to-text-voxforge

    Downloader for the voxforge corpus

    Language: Python
  • ubaleht/SiberianIngrianFinnish

    This project is devoted to the Siberian Ingrian Finnish language. Siberian Ingrian Finnish is a language (dialect) spoken by descendants of settlers who spoke Lower Luga Ingrian Finnish varieties and Lower Luga Ingrian (Izhorian). They have lived in Omsk oblast for more than 200 years, and previously lived in other regions of Siberia as well. The ancestors of its speakers came from the Lower Luga area in the early 19th century, specifically from the Rosona river area, a region also called Estonian Ingria. Siberian Ingrian Finnish (Russian: Сибирский ингерманландский идиом) is the term introduced by D. V. Sidorkevich.

    Language: C#
  • joneavila/DRAL

    Code for Dialogs Re-enacted Across Languages (DRAL)

    Language: Python
  • ubaleht/SiberianTatar

    This project is devoted to the dialects of the Siberian Tatars. Around 100,000 people speak these dialects. The language of the Siberian Tatars consists of three dialects: Tobolo-Irtysh, Tom and Baraba.

  • vectominist/Switchboard-WSJ-Utils

    Utilities for preprocessing the Switchboard and WSJ corpora in Python 3

    Language: Python
  • ina-foss/InaGVAD

    Voice activity detection and speaker gender segmentation audiovisual corpus

    Language: Jupyter Notebook
  • mllpresearch/Europarl-ASR

    A 1300-hour English speech and text corpus of parliamentary debates for streaming ASR training and benchmarking, speech data filtering and speech data verbatimization.

  • mbar0075/Speech-Technology

    Deliverables relating to the Speech Technology university unit (notes courtesy of Dr. Andrea De Marco)

    Language: Jupyter Notebook
  • mllpresearch/ESO-dataset

    ESO speech dataset: an English-language speech corpus of the oncology domain for ASR training and benchmarking and MT benchmarking.