Pinned Repositories
cldf
CLDF: Cross-Linguistic Data Formats - the specification
concepticon-data
The curation repository for the data behind Concepticon.
edictor
JavaScript program for interactive viewing, manipulating, and editing of wordlists, represented in the form of TSV files.
lingpy
LingPy: Python library for quantitative tasks in historical linguistics
algorithmlexicon
alterphono
Collection of scripts and data for computational phonology
dogon-data
Dogon languages, complete re-write of the old attempts.
histwords
Collection of tools for building diachronic/historical word vectors
SwadeshLists
This is simple table data that maps different Swadesh lists onto common concepts. At the moment, I'm not sure where it will lead in the end, but we start with simple concept-string mappings for all items in different Swadesh lists that may later be expanded by additional rankings and the like.
LinguList's Repositories
LinguList/DiaSim
Work in progress: a Java architecture, built on modern phonological theory in the Neogrammarian tradition, which can be used to simulate the realization of diachronic sound change, given the starting phonological forms of the lexemes and an ordered list of historical rules. Undergoing renovation, but the current form should run as of 12/22/2022.
LinguList/joeynmt
Minimalist NMT for educational purposes
LinguList/Latin-Reconstruction-NAACL
LinguList/lmdemo
Language Model Demo
LinguList/MorphyNet
MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)
LinguList/sBayes
MCMC algorithms to identify linguistic contact zones
LinguList/spaCy
Industrial-strength Natural Language Processing (NLP) in Python
LinguList/TILES
TILES: an algorithm for community discovery in dynamic social networks
LinguList/BANG_data_work
Code snippets for data work on the BANG project
LinguList/bibtex-tidy
Cleaner and Formatter for BibTeX files
LinguList/CognateTransformer
Code for EMNLP-2023 paper
LinguList/contacTrees
BEAST 2 package for inferring ancestral conversion graphs for language phylogenetics
LinguList/ety
EDoS - Online Etymological Dictionary of Spanish
LinguList/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
LinguList/fastText_multilingual
Multilingual word vectors in 78 languages
LinguList/Fleet
LinguList/glove-python
Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/
LinguList/iconicity-deep-learning
Code employed in the deep learning-based study of iconicity in language. Details can be found in the paper: de Varda & Strapparava (2022). A Cross-Modal and Cross-lingual Study of Iconicity in Language: Insights From Deep Learning. Cognitive Science.
LinguList/Language-velocity-field-estimation-for-language-dispersal-pattern-inference
R package and tutorial for the language velocity field estimation (LVF).
LinguList/lowerfungom-wordlists
LinguList/meloni-2021-reimplementation
PyTorch re-implementation of Meloni et al. 2021
LinguList/multipa
Universal multilingual automatic speech transcription into IPA
LinguList/nlp-paper-implementation
LinguList/node2vec
Implementation of the node2vec algorithm.
LinguList/phonechars
Python library for extracting phonological phylogenetic characters from aligned lexical data
LinguList/PILA
A historical-linguistic dataset for Proto-Italic and Latin, established in work by Bothwell et al. 2024.
LinguList/sequence_manipulation_suite
A collection of simple JavaScript programs for generating, formatting, and analyzing short DNA and protein sequences. The Sequence Manipulation Suite is commonly used by molecular biologists, for teaching purposes, and for program and algorithm testing.
LinguList/stable-diffusion
A latent text-to-image diffusion model
LinguList/styles
Official repository for Citation Style Language (CSL) citation styles.
LinguList/suffix-tree
A Generalized Suffix Tree for any Python iterable using Ukkonen's algorithm, with Lowest Common Ancestor retrieval.