
The World Phonotactics Database as a CLDF dataset


World phonotactics database


Cite the source dataset as:

Donohue, Mark, Rebecca Hetherington, James McElvenny and Virginia Dawson. 2013. World phonotactics database. Department of Linguistics, The Australian National University.

This dataset is licensed under the Creative Commons Attribution 4.0 International license (CC-BY-4.0) and is available online at https://doi.org/10.5281/zenodo.815506.

The data in this repository used to be available online, with a browsable web interface, at http://phonotactics.anu.edu.au/. As of 2020-01-17, that site has disappeared.

Luckily, the data has survived in the form of a CSV dump archived on Zenodo.

Thus, the CLDF dataset curated in this repository is derived from the data dump on Zenodo.

The dataset provides data on more than 3,500 languages, including a rich set of language metadata. The 183 language features for which it provides values are grouped into three datatypes (as specified in the datatype column):

  "datatype"

	Type of data:          Text
	Contains null values:  False
	Unique values:         3
	Longest value:         7 characters
	Most common values:    boolean (144x)
	                       integer (36x)
	                       number (3x)
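The datatype breakdown above can be reproduced from the parameter table with Python's standard library. The sketch below assumes the table lives at cldf/parameters.csv and has a header row containing a column literally named datatype; count_datatypes is a hypothetical helper for illustration, not part of this repository.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable

# Assumed location of the CLDF parameter table in a checkout of this repository.
PARAMETERS = Path("cldf/parameters.csv")


def count_datatypes(lines: Iterable[str]) -> Counter:
    """Tally the values of the "datatype" column of a CSV table.

    `lines` may be an open text file or any iterable of CSV lines
    whose first line is the header row.
    """
    return Counter(row["datatype"] for row in csv.DictReader(lines))


# Only run the tally if the (assumed) file is actually present.
if PARAMETERS.exists():
    with PARAMETERS.open(encoding="utf-8", newline="") as f:
        counts = count_datatypes(f)
    for datatype, n in counts.most_common():
        print(f"{datatype}: {n}")
    print(f"total: {sum(counts.values())}")
```

Printing the total next to the per-datatype counts makes it easy to cross-check the number of features quoted in the text against the table itself.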

Numeric features (datatypes integer and number) come with the minimum and maximum reported values, specified in cldf/parameters.csv.
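Since all values in a CSV file are text, a consumer has to convert each value according to its feature's datatype before comparing it with the reported minimum and maximum. The following is a minimal sketch under stated assumptions: the names of the minimum and maximum columns in cldf/parameters.csv are not given here, so the hypothetical helper within_reported_range takes the bounds as plain strings, and the boolean spellings accepted by the hypothetical parse_value are a guess, not a documented format.

```python
from typing import Optional, Union

Value = Union[bool, int, float]

# Assumption: the exact spelling of boolean values in the data is not
# documented here, so both "true"/"false" and "1"/"0" are accepted.
_TRUE = {"true", "1"}
_FALSE = {"false", "0"}


def parse_value(raw: str, datatype: str) -> Value:
    """Convert a raw CSV string according to the feature's datatype."""
    text = raw.strip()
    if datatype == "boolean":
        lowered = text.lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if datatype == "integer":
        return int(text)
    if datatype == "number":
        return float(text)
    raise ValueError(f"unknown datatype: {datatype!r}")


def within_reported_range(value: float, minimum: Optional[str], maximum: Optional[str]) -> bool:
    """Check a numeric value against bound strings read from the parameter table.

    A missing or empty bound is treated as unbounded on that side.
    """
    if minimum not in (None, "") and value < float(minimum):
        return False
    if maximum not in (None, "") and value > float(maximum):
        return False
    return True
```

Keeping the bounds as strings until the comparison mirrors how they arrive from csv.DictReader, so the helper can be fed row values directly once the actual column names are known.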