language-data
There are 10 repositories under the language-data topic.
WeblateOrg/language-data
Language definitions used by Weblate
lexicalcomputing/hamod
A High Agreement Multi-lingual Outlier Detection dataset
dart-community/linguist_lang_info
A collection of language information tracked by the linguist project.
jonsafari/toy-data
Embeddable submodule of parallel/monolingual text data, for use in testing code and sanity checks
Aatlantise/k-snacs-ud
K-SNACS dataset for Universal Dependencies
dotWee/structured-stern-neon-articles
A collection of approximately 20K German texts from the 2010s: user-written personal stories, poems, poetry, articles, and opinion pieces pulled from the archives of the Stern NEON Community website.
eliyetres/lt2316-ht19-a1
Language identification with as few characters as possible
HughAndBecky/gel-flex-merger
Several flex databases to be merged
erhankilic/languagesSqlTable
Languages SQL Table
Madwesh-india/AudioCollector
An interactive Python tool for recording bilingual audio samples with PyAudio and ipywidgets. Designed for data-collection tasks such as building speech datasets, it provides a user-friendly interface for recording, saving, labeling, and managing audio files directly within a Jupyter Notebook.