Goal: Take a tag or search term matching a theme or place and enhance it by returning its synonyms, its translation, its broader and narrower concepts, and its related concepts. For example, a search for "disease" should also return "health" (broader), "patología" (translation), "medicine" (related), etc.
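As a minimal sketch of the expansion described in the goal, the Python below uses a tiny hand-written vocabulary to show the shape of the result for "disease". The dictionary contents and the `expand_term` name are illustrative assumptions, not part of the service.

```python
# Toy vocabulary: illustrative data only, not drawn from the hosted vocabularies.
VOCABULARY = {
    "disease": {
        "synonyms": ["illness"],
        "translations": ["patología"],
        "broader": ["health"],
        "narrower": ["infection"],
        "related": ["medicine"],
    },
}

RELATIONS = ("synonyms", "translations", "broader", "narrower", "related")


def expand_term(term, vocabulary=VOCABULARY):
    """Return every term connected to `term`, grouped by relation type."""
    entry = vocabulary.get(term.strip().lower(), {})
    return {relation: list(entry.get(relation, [])) for relation in RELATIONS}


expanded = expand_term("Disease")
for relation, terms in expanded.items():
    print(f"{relation}: {', '.join(terms)}")
```

A term missing from the vocabulary yields empty lists for every relation, so callers can treat "no expansion" and "expansion" uniformly.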
Motivation: Data users and contributors often use different terms to describe similar concepts, such as "health" and "wellness". Semantic search allows for exploration and discovery of relationships among concepts that were previously unknown. It also supports both query expansion and refinement.
Implementation: Datasets in ArcGIS Online have contributor-generated keywords. As dataset metadata are harvested, the keywords are looked up against this service. Matching keywords are added to the item description as "extended keywords". These are used to index items and are hidden from the user.
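The harvest-time step could look roughly like the following. This is a sketch under stated assumptions: the item is modeled as a plain dictionary, `extended_keywords` is a hypothetical field name, and `fake_lookup` stands in for the real vocabulary service.

```python
def extend_keywords(item, lookup):
    """Return a copy of `item` with an `extended_keywords` list added.

    `lookup` is any callable that maps a keyword to a list of expanded terms.
    """
    original = [k.strip().lower() for k in item.get("keywords", [])]
    extended = []
    for keyword in original:
        for term in lookup(keyword):
            # Skip terms the contributor already supplied and avoid duplicates.
            if term not in original and term not in extended:
                extended.append(term)
    return {**item, "extended_keywords": extended}


# Hypothetical lookup standing in for the vocabulary service.
def fake_lookup(keyword):
    table = {"health": ["wellness", "medicine"], "parks": ["recreation"]}
    return table.get(keyword, [])


item = {"title": "Clinic locations", "keywords": ["Health", "Wellness"]}
result = extend_keywords(item, fake_lookup)
print(result["extended_keywords"])
```

The contributor's own keywords are left untouched; the extended terms live in a separate field so they can be indexed without being displayed.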
The Esri-Hub vocabulary models ArcGIS Hub Items, Users, and Groups. The vocabulary maps existing ArcGIS Online entities (Content and Community - Users, Groups, Items) to their qualities (Tags - Place, Theme). The Category tags expand the current six Hub category themes (Healthy, Livable, Prosperous, Safe, Sustainable, Well-Run), which are mapped to Library of Congress Subject Heading concepts and [WordNet](http://wordnet-rdf.princeton.edu) synonyms. The Place tags expand a place hierarchy inherited from DBpedia. The hub:Item associates hub:Users and hub:Groups. hub:Items have a categories:ThemeContext and a places:PlaceContext. These are imported from separate vocabularies also found in this repository. The models are generated in Protégé software and exported as vocabularies in Turtle syntax, which captures relations among modeled concepts as triple statements. These vocabulary files are then loaded into a Fuseki triplestore, which is queried through SPARQL queries hosted at a public endpoint.
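To make the idea of triple statements concrete, here is a small Python sketch that stores statements as (subject, predicate, object) tuples and matches them against a pattern, loosely in the spirit of a SPARQL basic graph pattern. The class names come from the paragraph above; the `ex:` predicate names are hypothetical placeholders, since the actual property names are defined in the vocabulary files.

```python
# Each statement is a (subject, predicate, object) triple.
# Class names follow the text above; the ex: predicates are hypothetical.
TRIPLES = [
    ("hub:Item", "ex:hasTheme", "categories:ThemeContext"),
    ("hub:Item", "ex:hasPlace", "places:PlaceContext"),
    ("hub:Item", "ex:associatedWith", "hub:User"),
    ("hub:Item", "ex:associatedWith", "hub:Group"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


for _, _, target in match(TRIPLES, predicate="ex:associatedWith"):
    print(target)
```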
To view or edit the following vocabularies, open them with any text editor or load them into Protégé: 1) Esri-Hub base, 2) Categories import, 3) Places import, 4) the USGS Thesaurus, and 5) the DTIC Thesaurus. To load a vocabulary in Protégé, choose File > Open and select the file (e.g. "esri-hub.ttl"). Files can be loaded locally or from a URI if the vocabulary is published online.
- The Esri-Hub base vocabulary uses Classes and Properties to model entities in OWL/RDF. It imports the Categories and Places vocabularies as well as other standard vocabularies like SKOS.
- To import a term from an ontology contained in a document loaded on the web, go to Active Ontology tab > Import and follow the steps to point to the term's URI.
- Classes are the nodes of the graph in the data model.
- Object properties are the edges of the graph in the data model.
- Data properties are the data types that the node value takes.
- The Categories and Places vocabularies use Individuals to model concepts in SKOS. The attributes of the concepts are stored as annotations. Esri Hub concepts are asserted to be owl:sameAs equivalent concepts in Library of Congress Subject Headings or WordNet.
- To save the vocabulary or export it, go to File > Save As and choose the serialization. Fuseki accepts RDF/XML, TTL, and JSON-LD at present.
To query vocabularies using SPARQL, set up a local Fuseki server and upload the vocabulary of your choice, or visit the [Fuseki server endpoint](http://34.229.180.217:8080/fuseki/dataset.html?tab=upload&ds=/category), where the vocabularies are hosted in separate graphs. The "queries.txt" and "generic queries.txt" files contain example SPARQL queries to run against the vocabulary. The "categories queries.txt" and "places queries.txt" files contain more specific queries.
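For readers who prefer to query from a script, the sketch below builds a SPARQL query for a label, sends it over the standard SPARQL protocol, and parses the JSON results. Several things here are assumptions rather than facts about this repository: the local endpoint URL and dataset name, the use of `skos:prefLabel` for concept labels, and all function names. The example queries in the text files above remain the authoritative reference.

```python
import json
import urllib.parse
import urllib.request

# Assumed local Fuseki query URL; adjust host, port, and dataset name to your setup.
ENDPOINT = "http://localhost:3030/category/query"


def build_query(label):
    """Build a SPARQL query for concepts broader than, narrower than, or related to `label`."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?relation ?otherLabel WHERE {{
  ?concept skos:prefLabel ?label .
  FILTER (LCASE(STR(?label)) = LCASE("{escaped}"))
  ?concept ?relation ?other .
  FILTER (?relation IN (skos:broader, skos:narrower, skos:related))
  ?other skos:prefLabel ?otherLabel .
}}
"""


def parse_bindings(payload):
    """Extract (relation URI, label) pairs from SPARQL JSON results."""
    return [
        (row["relation"]["value"], row["otherLabel"]["value"])
        for row in payload["results"]["bindings"]
    ]


def run_query(label, endpoint=ENDPOINT):
    """POST the query to the endpoint and return parsed results (needs a running server)."""
    data = urllib.parse.urlencode({"query": build_query(label)}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=data, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

With a server running and a vocabulary loaded, `run_query("disease")` would return a list of pairs such as a relation URI and the label of the connected concept; nothing is sent over the network until `run_query` is called.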
To explore the vocabularies by entering a search term, visit the Term Expander Express App, created by Pranav Kulkarni. A set of suggested search terms can be found in "term expander.txt". Additional search terms related to the demo site GIS Nation Geoplatform are found in "use cases.txt".
When users are browsing for a topic, they can explore topic clusters instead of page-ranked results. Clusters can be defined by any edge relationship in the vocabulary, such as broader/narrower, related, or synonymous. Distinguishing search and discovery from retrieval tasks will improve search usability.
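One simple way to form such clusters is to group a term's neighbors by the relation that connects them. The sketch below assumes the edges have already been fetched as (source, relation, target) tuples; the sample edges and the `cluster_by_relation` name are illustrative only.

```python
from collections import defaultdict

# Illustrative edges: (source term, relation, target term).
# Not taken from the real vocabularies.
EDGES = [
    ("disease", "broader", "health"),
    ("disease", "related", "medicine"),
    ("disease", "narrower", "infection"),
    ("disease", "related", "epidemiology"),
]


def cluster_by_relation(term, edges):
    """Group the neighbors of `term` into one cluster per relation type."""
    clusters = defaultdict(list)
    for source, relation, target in edges:
        if source == term:
            clusters[relation].append(target)
    return dict(clusters)


for relation, members in cluster_by_relation("disease", EDGES).items():
    print(relation, members)
```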
When users contribute datasets, the words they use to describe them (e.g. title, description) can be used to recommend tags. Eventually, users (such as agencies) should be able to import their own controlled vocabularies into a graph and use their preferred terms in addition to the provided vocabularies.
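A rough sketch of tag recommendation: tokenize the title and description, then suggest every preferred tag that has at least one label among the tokens. The vocabulary below is a hypothetical stand-in for a controlled vocabulary, and exact word matching is a deliberate simplification (no stemming, so "hospitals" does not match "hospital").

```python
import re

# Hypothetical controlled vocabulary: preferred tag -> labels that trigger it.
VOCAB_LABELS = {
    "health": {"health", "wellness", "clinic", "hospital"},
    "transportation": {"transit", "bus", "road"},
}


def recommend_tags(title, description, vocab=VOCAB_LABELS):
    """Suggest tags whose labels appear as whole words in the title or description."""
    words = set(re.findall(r"[a-z]+", f"{title} {description}".lower()))
    return sorted(tag for tag, labels in vocab.items() if words & labels)


print(recommend_tags("Clinic locations", "Hospitals and bus stops in the county"))
```

An agency's own vocabulary could be passed in through the `vocab` argument, which mirrors the idea of letting users bring their preferred terms.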
Vocabulary terms for places can be generated automatically from the following RDF data dumps: GNIS-LD, GeoNames, and DBpedia. API lookups from the following services can also be used to generate thematic terms programmatically: Library of Congress and WordNet.