Hoot is an OWL Ontology in RDF/XML format that describes English phonetic information as a domain of knowledge. Each word of English is present as an instance of the OWL class Word
; each Word
has a unique identifier (IRI
), spelling (rdfs:label
), and phonetic transcription (hoot:asIPA
). There is also a Phone
class, which identifies each attested phone in the English language. The English words and their phonetic transcriptions come from the CMU Pronouncing Dictionary.
There are a number of implications for a graph-model-defined phonetic dictionary, and one of them is that we can leverage the SPARQL RDF query language to find interesting phonetic facts and relationships.
The following demonstrates just a few example SPARQL queries of Hoot:
- View all phonetic symbols that represent vowels
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT (strafter(str(?phone), str(hoot:)) as ?vowels) {
?phone hoot:hasPhoneType hoot:vowel.
}
vowels |
---|
æ |
aɪ |
aʊ |
ɑ |
e |
ə |
ər |
ɛ |
i |
ɪ |
oʊ |
ɔ |
ɔɪ |
u |
ʊ |
- Find a word, its IRI, and its phonetic transcription
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT ?word ?IRI ?ipa WHERE {
?IRI rdfs:label ?word ;
hoot:asIPA ?ipa .
FILTER regex(str(?word), "^hoot$")
}
word | IRI | ipa |
---|---|---|
hoot | http://mphilli.github.io/hoot#w055894 | hut |
- Find all words that end with the phonemes for "owl" (aʊl)
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT ?word ?ipa WHERE {
?id rdfs:label ?word ;
hoot:asIPA ?ipa .
FILTER regex(str(?ipa), "aʊl$")
}
word | ipa |
---|---|
afoul | əˈfaʊl |
foul | faʊl |
fowl | faʊl |
growl | graʊl |
howl | haʊl |
cowl | kaʊl |
peafowl | ˈpiˌfaʊl |
owl | aʊl |
jowl | ʤaʊl |
prowl | praʊl |
waterfowl | ˈwɔtərˌfaʊl |
scowl | skaʊl |
towel | taʊl |
- View words that end with -ology and start with a nasal
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT ?word {
?id rdfs:label ?word ;
hoot:asIPA ?d .
BIND(REPLACE(?d, "ˈ", "") as ?e) # remove primary stress marker
BIND(REPLACE(?e, "ˌ", "") as ?f) # remove secondary stress marker
BIND(IRI(concat(str(hoot:), substr(?f, 1, 1))) as ?phone)
FILTER regex(str(?word), "ology$")
?phone hoot:hasPhoneType hoot:nasal .
}
word |
---|
mixology |
meteorology |
methodology |
nanotechnology |
microbiology |
micropaleontology |
mycology |
morphology |
mythology |
neurology |
numerology |
necrology |
- Get words that rhyme with "maple"
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT ?word WHERE {
?source rdfs:label ?word ;
hoot:asIPA ?rhyme .
{SELECT (strafter(str(?phones), "m") as ?n) ?literal ?phones {
?w rdfs:label ?literal ;
hoot:asIPA ?phones .
FILTER regex(str(?literal), "^maple$")
}}
FILTER regex(str(?rhyme), concat(str(?n), "$"))
FILTER(str(?word) != str(?literal)) # prevent a word from rhyming itself (spelling-wise)
FILTER(str(?phones) != str(?rhyme)) # prevent a word from rhyming itself (phonetically)
}
word |
---|
capel |
caple |
papal |
yaple |
staple |
These are just a few examples of the things we can do with Hoot. What we could do with Hoot is ultimately a reflection of how we expand it. There are two primary ways two improve upon Hoot:
- Add more information to words at different linguistic levels beyond phonetics (e.g., syntax & semantics)
- This would enable us to find other, even more complex relations between words.
- Add more languages
- This would provide us with a knowledge base to discover interesting and complex relationships cross-linguistically, and allow us to perform all kinds of typological analysis in a semantically-informed way.
(Keep in mind that other linguistic ontologies exist, although the goals may be different).
Hoot exists in RDF/XML format as an .rdf
file. I recommend placing the file in a graph database (triple store) in order to interact with it. If you have your own way of doing this, then by all means. If you need a simple way, I would recommend Blazegraph. Simple installation instructions are described here:
-
Download the Blazegraph .jar file
-
With Java (7 or greater) installed on your machine, type
java -server -Xmx4g -jar blazegraph.jar
. This starts the Blazegraph server athttp://192.168.0.2:9999/blazegraph/
. Navigate there on your web browser. -
In Blazegraph, select the UPDATE tab, and select the hoot.rdf file by clicking the
Browse...
button. Then hitUpdate
. You should get a little report of how many triples are in the database, and how long it took to make. For example:Modified: 401574 Milliseconds: 7288
-
Now navigate to the QUERY tab; here you can query the database with SPARQL. Test out some of the queries described above to get started.
-
Also, you can treat the server URL
http://192.168.0.2:9999/blazegraph/sparql
as a SPARQL endpoint. That means you can query the database in other ways. Here's an example of retrieving all English fricatives usingSPARQLWrapper
in Python:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://localhost:9999/blazegraph/sparql")
sparql.setQuery("""
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT (strafter(str(?phone), str(hoot:)) as ?fricatives) {
?phone hoot:hasPhoneType hoot:fricative.
}""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
if results:
fricatives = [(r["fricatives"]["value"])
for r in results["results"]["bindings"]]
print(str(fricatives))
# prints: ['ð', 'f', 's', 'ʃ', 'v', 'z', 'ʒ', 'θ']