/icasa

ICASA vocabulary

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

ICASA OWL Ontologies and RDF Variables/Units

This is an initial rendering of the ICASA Master Variable List in OWL. The primary goal of this project is a practical and faithful implementation of the ICASA Master Variable List that can be used in linked-data applications, covers the existing AgMIP JSON Data Objects, and can be combined with other linked-data contexts to support the TERRA-REF project, particularly information available through BETYdb.

This is an early and rough draft intended for community feedback.

What's been done:

See the Design Notes for more information on the basic requirements, recommendations, and design considerations.

The PURL http://purl.org/icasa currently redirects to this GitHub repository.

See related work by the DSSAT team: https://github.com/DSSAT/icasa-data-ontology

Management Info

Each dataset/subset/group is added as an OWL Class. Each variable/code is added as a datatype property with domain as the associated class (dataset/subset/group) and range xsd:string. For example:

<!-- http://purl.org/icasa/core#Experiment -->
<owl:Class rdf:about="http://purl.org/icasa/core#Experiment">
   <rdfs:label>Experiment</rdfs:label> 
   <rdfs:comment xml:lang="en">Complete description of management and initial conditions for a real or 
   synthetic experiment (or very closely linked set of experiments). Data measured during or at the end 
   of the experiment. The information presented should be sufficient to allow thorough interpretation or 
   analysis of the results and for simulation of the experiment</rdfs:comment>
   <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>

<!-- http://purl.org/icasa/core#data_source -->
<owl:DatatypeProperty rdf:about="http://purl.org/icasa/core#data_source">
    <rdfs:label>data_source</rdfs:label>
    <rdfs:comment xml:lang="en">Original format of data (DSSAT, APSIM, CIMMYT, field log, etc)</rdfs:comment>
    <rdfs:domain rdf:resource="http://purl.org/icasa/core#Experiment"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2002/07/owl#topDataProperty"/>
</owl:DatatypeProperty>

Class names are manually generated from the dataset/subset/group columns and from the descriptions in White et al. (2013).
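The mapping from a variable code to a datatype property can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code in icasa-mgmt-info.py; the helper name `datatype_property` is hypothetical, and the sketch omits the rdfs:subPropertyOf line shown in the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example above.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
XML = "http://www.w3.org/XML/1998/namespace"
CORE = "http://purl.org/icasa/core#"

# Ask ElementTree to serialize with the usual prefixes instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def datatype_property(code, class_name, comment):
    """Build an owl:DatatypeProperty element for one variable code.

    Hypothetical helper: the domain is the dataset/subset/group class and
    the range is always xsd:string, as described in the text.
    """
    prop = ET.Element(f"{{{OWL}}}DatatypeProperty", {f"{{{RDF}}}about": CORE + code})
    ET.SubElement(prop, f"{{{RDFS}}}label").text = code
    note = ET.SubElement(prop, f"{{{RDFS}}}comment", {f"{{{XML}}}lang": "en"})
    note.text = comment
    ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": CORE + class_name})
    ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": XSD + "string"})
    return prop


prop = datatype_property(
    "data_source",
    "Experiment",
    "Original format of data (DSSAT, APSIM, CIMMYT, field log, etc)",
)
xml_text = ET.tostring(prop, encoding="unicode")
print(xml_text)
```

Building the elements with a library rather than string concatenation means that characters such as & or < in a definition are escaped automatically.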

Files:

To run:

python icasa-mgmt-info.py > icasa-mgmt-info.owl

Measured Data

ICASA supports measured data through summary variables (recorded once for a treatment) and time series variables (measured at specific intervals throughout an experiment). Variables are grouped into categories, and each variable has a name, code, definition, units, and type.

In ICASA, summary data variables are divided into five categories: development, growth, water balance, soils, and environment. Overall, there are approximately 165 summary variables. Time series variables are divided into thirteen categories: plant growth, plant nitrogen, plant phosphorus, plant water balance, soil layers, soil nitrogen, soil organic matter, soil phosphorus, surface litter, soil-plant-atmosphere, management, floodwater, and pest population effects.

Other types of measured data are possible. ICASA assumes daily measurements, but other applications may use a different time series granularity. Likewise, ICASA assumes crop-level measurements, but this is not a requirement.

A different approach is taken for the Measured_Data sheet. A simple OWL ontology was manually created to describe the top-level concepts of variables and units. This will likely be replaced by another standard ontology or model, once a suitable candidate is found.

The Python script icasa-measured-data.py converts the Measured_Data sheet into a set of variable descriptions in RDF. We can imagine similar sets of variables for BETYdb, TERRA-REF, and other projects.

    <!-- http://purl.org/icasa/variables#irrd -->
    <rdf:Description rdf:about="http://purl.org/icasa/variables#irrd">
        <rdf:type rdf:resource="http://purl.org/icasa/vu#Variable"/>
        <vu:name>irrigation</vu:name>
        <vu:alternateName>irrd</vu:alternateName>
        <vu:definition>Irrigation amount per day</vu:definition>
        <vu:unit>mm/d</vu:unit>
        <vu:category>Management</vu:category>
    </rdf:Description>
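
A consumer could read such variable descriptions back with the Python standard library. The sketch below is illustrative only. It assumes the `vu` prefix is bound to http://purl.org/icasa/vu# (inferred from the rdf:type IRI above) and that the descriptions sit inside an rdf:RDF root element. The function name `read_variables` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VU = "http://purl.org/icasa/vu#"  # assumed binding for the vu: prefix

# The irrd description from above, wrapped in an rdf:RDF root so that it
# parses as a standalone document.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:vu="{VU}">
    <rdf:Description rdf:about="http://purl.org/icasa/variables#irrd">
        <rdf:type rdf:resource="http://purl.org/icasa/vu#Variable"/>
        <vu:name>irrigation</vu:name>
        <vu:alternateName>irrd</vu:alternateName>
        <vu:definition>Irrigation amount per day</vu:definition>
        <vu:unit>mm/d</vu:unit>
        <vu:category>Management</vu:category>
    </rdf:Description>
</rdf:RDF>"""


def read_variables(xml_text):
    """Return {variable IRI: {field name: value}} for each rdf:Description."""
    vu_tag_prefix = f"{{{VU}}}"
    variables = {}
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        iri = desc.get(f"{{{RDF}}}about")
        # Keep only child elements in the vu: namespace, keyed by local name.
        variables[iri] = {
            child.tag[len(vu_tag_prefix):]: child.text
            for child in desc
            if child.tag.startswith(vu_tag_prefix)
        }
    return variables


variables = read_variables(SAMPLE)
irrd = variables["http://purl.org/icasa/variables#irrd"]
print(irrd["name"], irrd["unit"])
```

For anything beyond a quick inspection, a full RDF library would be a better fit, since plain XML parsing only handles this one serialization style.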

Files:

Units

The ICASA master variable list contains a "Units_or_type" column with the units for each variable. While some of these units may already be addressed by other ontologies (e.g., Units of Measurement), it would be helpful to get specific definitions for those used by the ICASA community.

For the non-subject matter expert, this is helpful: http://www.fao.org/docrep/x0490e/x0490e0i.htm

Files:

  • icasa-units.csv: Mapping of unit to definition
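
A unit-to-definition mapping of this kind is straightforward to load and check against the units used by variables. The sketch below is an assumption-laden illustration: the column names `unit` and `definition` and the two sample rows are stand-ins, since the actual layout of icasa-units.csv may differ. The function names are hypothetical.

```python
import csv
import io

# Illustrative stand-in for icasa-units.csv. The header names and rows are
# assumptions made for this sketch, not the contents of the real file.
SAMPLE_CSV = """unit,definition
mm/d,millimeters per day
kg/ha,kilograms per hectare
"""


def load_units(handle):
    """Read a unit -> definition mapping from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return {row["unit"].strip(): row["definition"].strip() for row in reader}


def undefined_units(used_units, units):
    """Return the units that appear on variables but have no definition yet."""
    return sorted(set(used_units) - set(units))


units = load_units(io.StringIO(SAMPLE_CSV))
print(units["mm/d"])
print(undefined_units(["mm/d", "g/m2"], units))
```

With the real file, `load_units(open("icasa-units.csv", newline=""))` would replace the in-memory sample, provided the header names match.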

Notes

  • Object properties have not yet been added (relations)
  • Some classes are duplicated (Person/Institution/Document) for experiment, soil, weather station, etc. These can likely be consolidated to a single class.
  • Some of the terms in the original vocabulary are ID fields and relational keys intended for use in an RDBMS.
  • Some classes and variables are not included in the V 2.0 documentation (Suite, AgMIP variables, Dome simulation)
  • Some codes contain % or #
  • Some codes in the AgMIP JSON Objects documentation do not exist in the spreadsheet (people, tr_name, icrzno, icbl, elev)
  • AgMIP JSON Objects examples sometimes use variable name instead of code (crop_model_version versus model_ver)
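
The note on codes containing % or # matters when a code becomes part of an IRI, because # starts the fragment and % introduces a percent-encoded byte. One possible way to handle this, shown here only as a sketch and not as what the scripts in this repository do, is to percent-encode the code before appending it to the namespace. The codes `abc%` and `xyz#` are made up for illustration.

```python
from urllib.parse import quote, unquote

VARIABLES = "http://purl.org/icasa/variables#"


def code_to_iri(code, base=VARIABLES):
    """Append a percent-encoded code to the namespace IRI."""
    # safe="" forces every reserved character, including '/', to be encoded.
    return base + quote(code, safe="")


def iri_to_code(iri, base=VARIABLES):
    """Recover the original code from an IRI built by code_to_iri."""
    if not iri.startswith(base):
        raise ValueError(f"{iri!r} is not in namespace {base!r}")
    return unquote(iri[len(base):])


# Made-up codes; real ICASA codes with these characters would work the same way.
for code in ("irrd", "abc%", "xyz#"):
    iri = code_to_iri(code)
    print(code, "->", iri)
    assert iri_to_code(iri) == code
```

An alternative design is to replace the characters with words (for example `pct` or `num`), which gives more readable IRIs but needs a lookup table to stay reversible.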

TODO

  • Add object properties (relations)
  • Add support for measured data
  • Demonstrate use with AgMIP JSON Objects and JSON-LD
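
As a rough idea of what the JSON-LD item might look like, the sketch below attaches a JSON-LD `@context` with an `@vocab` entry to a JSON object keyed by ICASA codes, so that a bare key such as `data_source` resolves to an IRI in the core namespace. The object shown is a hypothetical fragment, not a real AgMIP JSON Data Object, and `expand_key` only mimics `@vocab` expansion for illustration. It is not a JSON-LD processor.

```python
import json

CORE = "http://purl.org/icasa/core#"

# Hypothetical fragment keyed by an ICASA code; the value is illustrative.
experiment = {"data_source": "DSSAT"}


def with_context(obj, vocab=CORE):
    """Return a copy of obj with a JSON-LD @context that sets @vocab."""
    doc = {"@context": {"@vocab": vocab}}
    doc.update(obj)
    return doc


def expand_key(key, context):
    """Mimic @vocab expansion: keywords stay as-is, terms get the prefix."""
    return key if key.startswith("@") else context["@vocab"] + key


doc = with_context(experiment)
print(json.dumps(doc, indent=2))
print(expand_key("data_source", doc["@context"]))
```

A real demonstration would likely need per-term mappings as well, since some AgMIP examples use variable names rather than codes.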