GitHub repository for the Web Post Ontology of Hooray Media. Contains a hierarchy of conceptual/topical "categories" for the purpose of organizing web posts centering on parenting and dating lifestyle content.
These category terms are intended to provide a re-usable, concrete basis for Hooray content authors to more reliably categorize their posts.
- Web Post Ontology (WPO)
- Table of Contents
- Current Version
- Repository Structure
- General Information
- Using the Category Hierarchy
- Version Updates
- Credit
v0.5.0-alpha - April 30th, 2024
- classifier contains content for the GPT post categorizer
- munging contains Python scripts for various tasks; mostly ignorable
- ontology contains folders for each released version of the WPO; each folder contains, at minimum:
  - changes.md contains a listing of everything modified from the last version
  - wpo_vx.x.x-xxx.ttl is the source representation of the ontology from Protégé
  - wpo_vx.x.x-xxx.html is a LODE-generated static HTML representation of the ontology
  - wpo_vx.x.x-xxx.png (sometimes a .svg) is a visual representation of the ontology generated by jsoncrack
  - class_hierarchy.txt contains a textual representation of the class hierarchy
- training contains guides for content authors on using the WPO effectively
- Issues is more or less a to-do list
Below are some general things to consider about the WPO:
- The taxonomy underlying this ontology is inspired by AOL's DMOZ project, which ran from 1998 to 2017. DMOZ was an early attempt by Internet denizens to organize the web, before machines did so. Much of its hierarchy manifests here, although the perspective is certainly more focused.
- To retain some level of linkage to outside sources, the terms herein are linked to Wikidata and/or DBPedia terms, with either (1) skos:example or (2) dct:source. For the former, the terms are roughly similar, but not exactly the same; for the latter, the terms are considered identical. E.g., in this taxonomy, the term "Animals" excludes humans, where the Wikidata and DBPedia terms include humans. This is because, as delineated earlier, the purpose of this taxonomy is to be amenable to lifestyle content writers publishing magazine-like articles, so for them, humans are not necessarily animals, as pets and wildlife are.
- Initially, I had used rdfs:seeAlso and owl:sameAs, but neither of these can be rendered with PyLODE (it only allows a few annotation properties to show), and using owl:sameAs causes Protégé to enforce some rules that ruin rendering, so I had to use skos:example and dct:source
- I may transition to using regular LODE, which may allow more annotations to be rendered, but it needs investigation as well
- I use the delimiter # instead of / because it allows you to jump around HTML or TTL pages opened on the web, as the # introduces a fragment designed to do so
- The base IRI of the ontology is https://hooray.media/ontology/wpo#
- Class definitions are an amalgamation of Wikidata and DBPedia, in addition to the "lifestyle content focused" knowledge particular to the task at hand
- Class definitions are of the general form "A superclass that xyz", where superclass is the immediate parent class, and xyz contains the specific information about the class in question
- Superclasses are not stated for the top-level classes
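As a small illustration of the # delimiter mentioned above, the sketch below splits a class IRI into the ontology document and the fragment that a browser would jump to. The base IRI is the one given in this README; the local name "Animals" is an assumption made for the example and may not match the actual identifier in the TTL file.

```python
from urllib.parse import urldefrag

# Base IRI as stated in this README.
BASE_IRI = "https://hooray.media/ontology/wpo#"

# Assumption: the local name of the "Animals" class is simply "Animals".
class_iri = BASE_IRI + "Animals"

# urldefrag separates the document part from the fragment after "#".
document, fragment = urldefrag(class_iri)

print(document)  # https://hooray.media/ontology/wpo
print(fragment)  # Animals
```

With a / delimiter there would be no fragment, so every class would look like a separate document rather than an anchor inside one page.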
This is a provisional (and necessarily philosophically "loose") representation of general topics/categories used to organize website content for posts pertaining to parenting and dating lifestyle. Inasmuch as the things represented here are not precisely tangible, the taxonomy is intentionally "fiat" (see Barry Smith's BFO). Therefore, the author maintains the stance that (1) the end-user view of the webpages themselves and (2) the ease of use of the website for content writers in categorizing new content are more important than the philosophical rigor of the taxonomy.
Moreover, whereas the semantics of "subClassOf" ordinarily imply inheritance and specialization, the definition of "subClassOf" is loosened within this taxonomy. For example, while "Fall Festival" is not, semantically, a subclass of "Fall", it is nonetheless placed under "Fall". This is because the purpose of the taxonomy is to help content writers organize website posts, and such a placement is intuitive to laymen.
The BFO principle of naming things singularly (non-plurally) is violated here. This is because, for web content categorization, it is common practice for post categories to be plural. E.g., "Holidays" is more appropriately displayed on a web page than "Holiday". This is not true for every term, however; e.g., "Family" is more appropriate than "Families". In short: the decision between singular and plural terms is subjective and based on what works best for web content writers and readers.
Some of the other BFO-esque principles are violated here: single inheritance, Aristotelian definitions and the use of roles.
Some terms are in the hierarchy simply for organizational purposes and should not be used by Hooray authors.
- U.S. State
- Use the U.S. state, like Florida, or a city under it
- Occupations and the “xyz Professional” terms, like Healthcare Professional
- Use a lower term of an actual occupation, like Nurse
- Schools by Governance, Schools by Life Stage
- Use a lower term, like Home School or Charter School
- Child Life Stages
- Use a lower term, like Elementary Years
- Aquatic Wildlife and Aquatic Birds
- Use a lower term for a real animal, like Dolphin or Duck
Below are some debatable reserved terms (use with discretion):
- Recreational Classes
- Team Sports
- Cultural Holidays, Holiday Months and Religious Holidays
- Keeping because I can see the SEO use of these terms
The category hierarchy must be understood if it is to be used well. It can be browsed and played around with in several forms:
There are some general rules to follow when categorizing:
- Ideally, select one (1) region category term, like Orlando
- Select no more than five (5) categories in total
- Select the most specific terms possible, while considering SEO
- Do not hesitate to use terms from more than one top-level term; e.g., a post about Zoos can use the Animals and Things To Do trees
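The rules above can be sketched as a small checker. This is only an illustration and is not part of the repository: the function name, the region set, and the reserved set are assumptions, with the term names taken from examples in this README (the real lists live in the ontology).

```python
# Assumption: a hand-picked sample of region terms; the real set comes from the WPO.
REGION_TERMS = {"Orlando", "Florida"}

# Organizational terms this README says authors should not use (partial list).
RESERVED_TERMS = {
    "U.S. State", "Occupations", "Schools by Governance",
    "Schools by Life Stage", "Child Life Stages",
    "Aquatic Wildlife", "Aquatic Birds",
}

MAX_CATEGORIES = 5


def check_categories(categories):
    """Return a list of warnings for the categories chosen for one post."""
    warnings = []
    if len(categories) > MAX_CATEGORIES:
        warnings.append(
            f"{len(categories)} categories selected; the limit is {MAX_CATEGORIES}"
        )
    for term in categories:
        if term in RESERVED_TERMS:
            warnings.append(f"'{term}' is reserved; use a lower term instead")
    regions = [term for term in categories if term in REGION_TERMS]
    if len(regions) != 1:
        warnings.append(f"expected one region term, found {len(regions)}")
    return warnings


print(check_categories(["Orlando", "Zoos", "Animals", "Things To Do"]))  # []
print(check_categories(["U.S. State", "Zoos"]))
```

The region rule is reported as a warning rather than an error because the guideline is phrased as "ideally"; specificity and SEO judgment are left to the author.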
To keep categorization as normalized as possible, a custom GPT is available that recommends 5 categories to authors when they paste in the text content of their post. The custom GPT can be accessed here.
All updates to the taxonomy are performed in Protégé. The general process after updating (anything, including term ordering, term names/annotations, addition/deletion of terms, etc.) is:
To keep the delimiter uniform, when editing in Protégé, make sure that in the File > Preferences > New entities tab, 'Followed by:' is set to # and not /.
- In Protégé, click Refactor > Change ontology IRI > Enter new version number (I don't use a "versioned" IRI; I build it into the IRI normally)
- No longer necessary after version 0.5.0-alpha, as we removed the version number from the IRI to make updates easier
- In the Ontology tab, under the Ontology Prefixes tab at the bottom, update the blank (default) namespace prefix IRI with the new version number and hit Enter
- Refactor > Rename multiple entities > Enter the old versioned IRI in the first line and the new versioned IRI in the second line > Rename
- In the Ontology pane, update the owl:priorVersion and owl:versionInfo statements, as well as the dcterms:modified date
- Any time you introduce a new class, you have to do the following:
- Give it an rdfs:label
- Give it a skos:definition
- Add an rdfs:isDefinedBy statement (automatable)
- Add outgoing links to DBPedia and Wikidata
- Search for the term here: https://dbpedia.org/page/xxx
- And here: https://www.wikidata.org/w/index.php?search=xxx
The rdfs:isDefinedBy statement is used so that if, by some chance, someone encounters only a class fragment on the Web, they have an explicit link back to the main ontology. The statement used to require a manual update with every release, but after version 0.5.0-alpha, with the removal of the version number from the base IRI, that manual update is no longer needed.
However, when adding new classes, the rdfs:isDefinedBy property and its value can be added automatically instead of by hand:
- Open munging/ontology_modification/isDefinedBy_Adder.py
- Put your new ontology file in the same folder with it
- Change the input_file and output_file variables
- Run and use your new ontology file as needed
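The idea behind this automation can be sketched as follows. This is not the repository's isDefinedBy_Adder.py, whose contents are not shown here; it is a minimal illustration that assumes the ontology's triples have already been loaded into plain (subject, predicate, object) string tuples, for instance by an RDF library. The function name, the example class IRI, and the choice of the ontology IRI as the rdfs:isDefinedBy value are all assumptions.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

# Assumption: the link target is the base IRI without the trailing "#".
ONTOLOGY_IRI = "https://hooray.media/ontology/wpo"


def add_is_defined_by(triples, ontology_iri=ONTOLOGY_IRI):
    """Return the triples plus rdfs:isDefinedBy for every class lacking one."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    already_linked = {s for s, p, o in triples if p == RDFS_IS_DEFINED_BY}
    additions = {
        (cls, RDFS_IS_DEFINED_BY, ontology_iri)
        for cls in classes - already_linked
    }
    return set(triples) | additions


# Hypothetical new class, using a term name mentioned in this README.
new_class = "https://hooray.media/ontology/wpo#Nurse"
triples = {(new_class, RDF_TYPE, OWL_CLASS)}

for triple in sorted(add_is_defined_by(triples)):
    print(triple)
```

Classes that already carry the statement are skipped, so running the function a second time on its own output changes nothing.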
Visualizations are then made with jsoncrack.com.
- Open munging/ontology_outputting/ttl_to_json.py
- Put the ontology file in the same folder as this script
- Change the ttl_file_path and output_file_path variables as desired
- Run the script
- Copy/paste the JSON content into jsoncrack's editor
- Top right, click the download arrow, then save as PNG or SVG
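For readers curious about what such a conversion involves, here is a rough sketch of turning subclass relations into nested JSON that a viewer like jsoncrack can display. It is not the repository's ttl_to_json.py; it assumes the (child, parent) pairs have already been extracted from the TTL file and that the hierarchy has no cycles. The sample edges are illustrative and may not match the real WPO structure.

```python
import json

# Illustrative (child, parent) pairs; the real ones come from rdfs:subClassOf.
edges = [
    ("Aquatic Wildlife", "Animals"),
    ("Aquatic Birds", "Animals"),
    ("Dolphin", "Aquatic Wildlife"),
    ("Duck", "Aquatic Birds"),
]

# Index children by parent so each lookup is a single dictionary access.
children = {}
for child, parent in edges:
    children.setdefault(parent, []).append(child)


def nest(name):
    """Return the subtree below `name` as nested dictionaries."""
    return {kid: nest(kid) for kid in sorted(children.get(name, []))}


tree = {"Animals": nest("Animals")}
print(json.dumps(tree, indent=2))
```

Because the WPO does not enforce single inheritance, a class with two parents would simply appear under both of them in the output.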
The static HTML representation of the ontology is then generated with PyLODE.
- Download the latest version of PyLODE here
- Put your ontology file in the bin/ folder
- Open a command prompt in the bin/ folder
- Run ./pyLODE.exe -o output.html input.ttl, replacing the input and output filenames with yours
- Refresh the folder and you should have a new static HTML file for the ontology
The textual class hierarchy is for authors and others who prefer a text representation of the hierarchy over an image.
- Open munging/ontology_outputting/print_wordpress_hierarchy.py
- Put the ontology file in the same folder as this script
- Change the ttl_file_path and output_file_path variables as desired
- Run the script
- Move the output class_hierarchy.txt file as needed
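A textual hierarchy of this kind can be produced with a simple depth-first walk. The sketch below is not the repository's print_wordpress_hierarchy.py; it assumes the parent-to-children mapping has already been read from the ontology, and the sample mapping is illustrative only.

```python
def format_hierarchy(children, roots, indent="  "):
    """Return the hierarchy as text, one class per line, indented by depth."""
    lines = []

    def walk(name, depth):
        lines.append(f"{indent * depth}{name}")
        for kid in sorted(children.get(name, [])):
            walk(kid, depth + 1)

    for root in sorted(roots):
        walk(root, 0)
    return "\n".join(lines)


# Illustrative mapping; the real placement of these terms may differ.
children = {"Schools by Governance": ["Home School", "Charter School"]}

text = format_hierarchy(children, roots=["Schools by Governance"])
print(text)
```

The returned string could then be written to class_hierarchy.txt with an ordinary file write.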
- John Byrne of Hooray Media for project inception
- Dani Meyering for feedback
- Meghan Roth for feedback
- Laura Byrne for feedback
- Tyler Procko, ontologist