DO NOT USE THIS REPOSITORY
Use https://github.com/NextCenturyCorporation/AIDA-Interchange-Format . You will need to contact clark.dorman@nextcentury.com with your GitHub user ID to get access.
This repository contains resources to support the AIDA Interchange Format (AIF). It consists of:
-
a formal representation of the format in terms of an OWL ontology in
interchange-format.ttl
. This ontology can be validated using the SHACL constraints file insrc/main/resources/edu/isi/gaia/aida_ontology.shacl
. At the moment the OWL ontology has not been brought up-to-date with the latest changes. The SHACL file is a better guide. NCC will reconcile them in the next week. -
utilities to make it easier to work with this format. JVM utilities are in
src/main/java/edu/isi/gaia/AIFUtils.kt
. Although written in Kotlin, these can be used from any JVM language by adding a Maven dependency onedu.isi:gaia-interchange-kotlin:1.0.0-SNAPSHOT
. There is the beginning of a Python translation of these utilities inaida_interchange/aifutils.py
. Next Century will complete the Python translation of these utilities soon. -
examples of how to use AIF. These are given in Java in the unit tests under
src/text/java/edu/isi/gaia/ExamplesAndValidationTests
. There is the beginning of a Python translation of these examples intests/aida_interchange/Examples.py
. If you run either set of examples, the corresponding Turtle output will be dumped. -
code to translate from the TAC KBP Coldstart++ KB format into this format.
src/main/java/edu/isi/gaia/ColdStart2AidaInterchange.kt
. -
code to translate a simple format for entity and event mentions in images to AIF:
src/main/java/edu/isi/gaia/ImagesToAIF.kt
We recommend using Turtle format for AIF when working with single document files (for readability) but N-Triples for working with large KBs (for speed).
- To install the JVM code, do
mvn install
from the root of this repository using Apache Maven. Repeat themvn install
if you pull an updated version of the code. You can run the tests, which should output the examples, by doingmvn test
. - The Python code is not currently set up for installation; just add it to your
PYTHONPATH
.
To run the validator, run target/appassembler/bin/validateAIF
with a single argument, a parameter
file. The parameter file should have keys and values separated by :
. It should have either the
parameter kbToValidate
pointing to the single Turtle format KB to validate, or it should have
kbsToValidate
pointing to a file listing the paths of the Turtle format KBs to validate.
Additionally, it must have a parameter domainOntology
pointing to the OWL file for the domain
ontology to validate against. Beware that validating large KBs can take a long time. There is
a sample of a validator param file in sample_params/validate.common_corpus.single.params
To convert a ColdStart KB, run target/appassembler/bin/coldstart2AidaInterchange
. It takes a
single argument, a key-value parameter file where keys and values are separated by :
s. There
are four parameters which are always required:
inputKBFile
: the path to the ColdStart KB to convertbaseUri
: the URI path to use as the base for generated assertions URIs, entity URIs, etc. For examplehttp://www.isi.edu/aida
systemUri
: a URI path to identify the system which generated the ColdStart output. For examplehttp://www.rpi.edu/tinkerbell
mode
: must beFULL
orSHATTER
, as explained below.
If mode
is FULL
, then the entire ColdStartKB is converted into a single AIF RDF file in
n-triples format (n-triples is used for greater I/O speed). The following parameters then
apply:
outputAIFFile
will specify the path to write this file to.- cross document coreference information present in the ColdStartKB can be discarded by setting
the optional parameter
breakCrossDocCoref
totrue
(defaultfalse
). - The optional
restrictConfidencesToJustifications
parameter (defaultfalse
) controls whether confidence values are attached directly to the relevant entity/relation/event/sentiment assertion or only to their justifications. The former is how it should be in TA2/TA3, but the latter for messages from TA1 to TA2. Note this is somewhat imperfect because ColdStart lacks justifications for type and link assertions, so for these confidence information will simply be missing when restricting to justifications.
If mode
is SHATTER
, the data related to each document in the ColdStart KB is written to a
separate AIF file in Turtle format. In this case, the only other parameter is the directory
to write the output to (outputAIFDirectory
). The values of breakCrossDocCoref
,
useClustersForCoref
, and attachConfidencesToJustifications
are fixed at true
, false
,
and true
, respectively.
The following optional parameters are available in both modes:
useClustersForCoref
parameter (defaultfalse
) specifies whether to use the "clusters" provided by the AIF format for representing coreference. While in AIDA there can be uncertainty about coreference, making these clusters useful, in ColdStart coreference decisions were always "hard". We provide the user with the option of whether to encode these coref decisions in the way they would be encoded in AIDA if there were any uncertainty so that users can test these data structures.
There are sample shatter and single KB param files under sample_params/translate.*
There is an example program showing how to consume AIF data in edu.isi.gaia.MaxConfidenceEstimator
.
There is an example program/utility for converting a simple tab-separated format for images to AIF.
See class comment on src/main/java/edu/isi/gaia/ImagesToAIF.kt
for details. You can run this
program by running target/appassembler/bin/images2Aif
.
If you need to edit the Kotlin code, just import the POM for this project using IntelliJ IDEA and you should be ready to go.
AIF was designed by Ryan Gabbard (gabbard@isi.edu) and Pedro Szekely (pszekeley@isi.edu) of USC ISI. Gabbard also wrote the initial implementations of the associated tools. However, the tools are now supported and extended by Eddie Curley (eddie.curley@nextcentury.com) and Clark Dorman (clark.dorman@nextcentury.com) of Next Century.
This repository exists in two locations:
- as a private repository owned by Next Century. This is the canonical repository where development will happen.
- as a public repository at https://github.com/isi-vista/gaia-interchange/ . At appropriate times, Next Century will submit their private repository to DARPA for release review. Following approval, Next Century will merge to this public repository.
The open repository will support an open NIST evaluation. For questions related to this evaluation, please contact Hoa Dang (hoa.dang@nist.gov).
This material is based on research sponsored by DARPA under agreement number FA8750-18- 2-0014. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.