Hoot: A repository from mphilli

Hoot

Hoot is an OWL ontology in RDF/XML format that describes English phonetic information as a domain of knowledge. Each English word is represented as an instance of the OWL class Word; each Word has a unique identifier (IRI), a spelling (rdfs:label), and a phonetic transcription (hoot:asIPA). There is also a Phone class, which identifies each attested phone in the English language. The English words and their phonetic transcriptions come from the CMU Pronouncing Dictionary.
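To make the data model concrete, here is a minimal Python sketch that reads one Word individual from an RDF/XML fragment using only the standard library. The fragment is hypothetical: the IRI, label, and transcription are taken from the "hoot" query result in this README, but the exact serialization inside hoot.rdf may differ (for example, it may use rdf:type statements rather than a typed node). The helper name read_words is also an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used by the fragment below
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "hoot": "http://mphilli.github.io/hoot#",
}

# ASSUMPTION: an illustrative serialization, not copied from hoot.rdf
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:hoot="http://mphilli.github.io/hoot#">
  <hoot:Word rdf:about="http://mphilli.github.io/hoot#w055894">
    <rdfs:label>hoot</rdfs:label>
    <hoot:asIPA>hut</hoot:asIPA>
  </hoot:Word>
</rdf:RDF>"""


def read_words(rdf_xml):
    """Return one dict per hoot:Word element found at the top level."""
    root = ET.fromstring(rdf_xml)
    words = []
    for node in root.findall("hoot:Word", NS):
        words.append({
            # rdf:about carries the individual's IRI
            "iri": node.get("{%s}about" % NS["rdf"]),
            "label": node.findtext("rdfs:label", namespaces=NS),
            "ipa": node.findtext("hoot:asIPA", namespaces=NS),
        })
    return words


print(read_words(SAMPLE))
```

For real work, a triple store or an RDF library is a better fit than hand-parsing XML; this sketch only shows which three pieces of information each Word carries.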

Modeling a phonetic dictionary as a graph has a number of implications. One of them is that we can use SPARQL, the RDF query language, to find interesting phonetic facts and relationships.

Here are a few example SPARQL queries against Hoot:

View all phonetic symbols that represent vowels

PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT (strafter(str(?phone), str(hoot:)) as ?vowels) {
      ?phone hoot:hasPhoneType hoot:vowel.
  }

vowels
æ
aɪ
aʊ
ɑ
e
ə
ər
ɛ
i
ɪ
oʊ
ɔ
ɔɪ
u
ʊ

Find a word, its IRI, and its phonetic transcription

PREFIX hoot: <http://mphilli.github.io/hoot#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?word ?IRI ?ipa WHERE {
  ?IRI rdfs:label ?word ;
       hoot:asIPA ?ipa .
  FILTER regex(str(?word), "^hoot$")
}

word	IRI	ipa
hoot	http://mphilli.github.io/hoot#w055894	hut

Find all words that end with the phonemes for "owl" (aʊl)

PREFIX hoot: <http://mphilli.github.io/hoot#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?word ?ipa WHERE {
  ?id rdfs:label ?word ;
      hoot:asIPA ?ipa .
  FILTER regex(str(?ipa), "aʊl$")
}

word	ipa
afoul	əˈfaʊl
foul	faʊl
fowl	faʊl
growl	graʊl
howl	haʊl
cowl	kaʊl
peafowl	ˈpiˌfaʊl
owl	aʊl
jowl	ʤaʊl
prowl	praʊl
waterfowl	ˈwɔtərˌfaʊl
scowl	skaʊl
towel	taʊl

View words that end with -ology and start with a nasal

PREFIX hoot: <http://mphilli.github.io/hoot#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?word  {
  ?id rdfs:label ?word ;
      hoot:asIPA ?d .
  BIND(REPLACE(?d, "ˈ", "") as ?e) # remove primary stress marker
  BIND(REPLACE(?e, "ˌ", "") as ?f) # remove secondary stress marker
  BIND(IRI(concat(str(hoot:), substr(?f, 1, 1))) as ?phone) 
  FILTER regex(str(?word), "ology$")
  ?phone hoot:hasPhoneType hoot:nasal .
}

word
mixology
meteorology
methodology
nanotechnology
microbiology
micropaleontology
mycology
morphology
mythology
neurology
numerology
necrology
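The three BIND steps in this query are easy to misread, so here is a small Python sketch of the same string manipulation, applied to one transcription at a time. It is only an illustration of the logic, not part of Hoot; the function name first_phone_iri is hypothetical.

```python
HOOT_NS = "http://mphilli.github.io/hoot#"


def first_phone_iri(ipa: str) -> str:
    """Mirror the query's BIND steps for a single transcription."""
    # The two REPLACE calls: drop primary and secondary stress markers
    stripped = ipa.replace("ˈ", "").replace("ˌ", "")
    # substr(?f, 1, 1): keep only the first character
    first = stripped[:1]
    # IRI(concat(str(hoot:), ...)): append that character to the namespace
    return HOOT_NS + first


print(first_phone_iri("ˈpiˌfaʊl"))
```

The query then keeps a word only if the resulting IRI has hoot:hasPhoneType hoot:nasal. Because only one character is inspected, a word whose first phone is written with several characters (such as aɪ) is looked up by its first character alone.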

Get words that rhyme with "maple"

PREFIX hoot: <http://mphilli.github.io/hoot#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?word WHERE {
  ?source rdfs:label ?word ;
          hoot:asIPA ?rhyme .
  # subquery: look up the IPA of "maple" and keep everything after the onset "m"
  {
    SELECT (strafter(str(?phones), "m") as ?n) ?literal ?phones {
      ?w rdfs:label ?literal ;
         hoot:asIPA ?phones .
      FILTER regex(str(?literal), "^maple$")
    }
  }
  FILTER regex(str(?rhyme), concat(str(?n), "$"))
  FILTER(str(?word) != str(?literal)) # prevent a word from rhyming with itself (spelling-wise)
  FILTER(str(?phones) != str(?rhyme)) # prevent a word from rhyming with itself (phonetically)
}

word
capel
caple
papal
yaple
staple
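The rhyme query combines a subquery, strafter, and a suffix match, which can be hard to follow in SPARQL. The Python sketch below walks through the same idea on a tiny in-memory lexicon. The entries are copied from the "owl" query results earlier in this README; the function name rhymes and the onset parameter are assumptions made for illustration (the query hard-codes the onset "m" for "maple").

```python
# Spellings and transcriptions copied from the "owl" query results
LEXICON = {
    "afoul": "əˈfaʊl",
    "foul": "faʊl",
    "fowl": "faʊl",
    "growl": "graʊl",
    "howl": "haʊl",
    "cowl": "kaʊl",
    "owl": "aʊl",
}


def rhymes(word, onset, lexicon):
    """Return words whose transcription ends like `word` after its onset."""
    source_ipa = lexicon[word]
    # strafter(?phones, onset): everything after the first occurrence of onset
    tail = source_ipa.partition(onset)[2]
    if not tail:
        # An empty tail would match every word, so refuse to continue
        raise ValueError("onset %r not found in %r" % (onset, source_ipa))
    return [
        other
        for other, ipa in lexicon.items()
        if ipa.endswith(tail)       # suffix match, like the "$"-anchored regex
        and other != word           # not the same spelling
        and ipa != source_ipa       # not the same pronunciation
    ]


print(rhymes("howl", "h", LEXICON))
```

One difference from the query: the sketch compares the tail literally, while the query passes it to regex, so any regex metacharacter in a transcription would be interpreted as a pattern there.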

Goals

These are just a few examples of the things we can do with Hoot. What we can ultimately do with it depends on how we expand it. There are two primary ways to improve upon Hoot:

- Add more information to words at different linguistic levels beyond phonetics (e.g., syntax and semantics)
  - This would enable us to find other, even more complex relations between words.
- Add more languages
  - This would provide us with a knowledge base to discover interesting and complex relationships cross-linguistically, and allow us to perform all kinds of typological analysis in a semantically informed way.

(Keep in mind that other linguistic ontologies exist, although their goals may be different.)

Usage and installation

Hoot is distributed in RDF/XML format as an .rdf file. I recommend loading the file into a graph database (triple store) in order to interact with it. If you already have a preferred way of doing this, by all means use it. If you need a simple way, I would recommend Blazegraph. Simple installation instructions follow:

Download the Blazegraph .jar file
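With Java (7 or greater) installed on your machine, run java -server -Xmx4g -jar blazegraph.jar. This starts the Blazegraph server on port 9999. The address it reports depends on your machine: for example, http://localhost:9999/blazegraph/ or a local IP such as http://192.168.0.2:9999/blazegraph/. Navigate there in your web browser.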
In Blazegraph, select the UPDATE tab, and select the hoot.rdf file by clicking the Browse... button. Then hit Update. You should get a short report of how many triples were loaded into the database and how long the load took. For example:
```
Modified: 401574
Milliseconds: 7288
```
Now navigate to the QUERY tab; here you can query the database with SPARQL. Test out some of the queries described above to get started.
You can also treat the server URL with /sparql appended (for example, http://localhost:9999/blazegraph/sparql, or the same path on whatever address your server reported) as a SPARQL endpoint. That means you can query the database from other tools and languages. Here's an example of retrieving all English fricatives using SPARQLWrapper in Python:

from SPARQLWrapper import SPARQLWrapper, JSON

# Point the wrapper at the local Blazegraph SPARQL endpoint
sparql = SPARQLWrapper("http://localhost:9999/blazegraph/sparql")
sparql.setQuery("""
    PREFIX hoot: <http://mphilli.github.io/hoot#>
    SELECT (strafter(str(?phone), str(hoot:)) as ?fricatives) {
        ?phone hoot:hasPhoneType hoot:fricative .
    }""")
sparql.setReturnFormat(JSON)        # request SPARQL JSON results
results = sparql.query().convert()  # parsed into a Python dict
if results:
    # Each binding maps a variable name to a dict with a "value" key
    fricatives = [r["fricatives"]["value"]
                  for r in results["results"]["bindings"]]
    print(fricatives)
    # prints: ['ð', 'f', 's', 'ʃ', 'v', 'z', 'ʒ', 'θ']
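If SPARQLWrapper is not available, the same endpoint can likely be queried with the Python standard library alone. The sketch below assumes the endpoint follows the standard SPARQL protocol (a query parameter sent over GET) and can return SPARQL JSON results when asked via the Accept header. The endpoint URL is the local one used above, and the helper names build_request, run, and column are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: a Blazegraph server running locally on the default port
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

QUERY = """
PREFIX hoot: <http://mphilli.github.io/hoot#>
SELECT (strafter(str(?phone), str(hoot:)) as ?fricatives) {
    ?phone hoot:hasPhoneType hoot:fricative .
}"""


def build_request(endpoint, query):
    """Build a GET request carrying the query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run(endpoint, query):
    """Send the query and return the parsed SPARQL JSON results."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.load(response)


def column(results, name):
    """Extract the plain values bound to one variable."""
    return [
        binding[name]["value"]
        for binding in results["results"]["bindings"]
        if name in binding
    ]


# Usage (requires the server to be running):
# print(column(run(ENDPOINT, QUERY), "fricatives"))
```

Keeping request construction, transport, and result parsing in separate functions makes it possible to check the first and last without a running server.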
