RDF annotation for PubMed and PMC using entity recognition tools such as NCBO Annotator and CMA.

biotea-annotation

Refactoring of the annotation code at https://github.com/alexgarciac/biotea. RDF annotation for PubMed and PMC using entity recognition tools such as the NCBO Annotator (http://www.bioontology.org/annotator-service) and CMA (http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/810/664). CMA is not a public service, so this documentation only covers annotation with the NCBO Annotator.

Dependencies

Most of the dependencies are configured with Maven. There are, however, a couple of local dependencies: biotea-utilities, biotea-ao, and one jar located in the lib directory provided with this project.

This project uses the NCBO Annotator, so annotations obtained for the same file at different times can vary because of changes in the ontologies and in the responses returned by the annotator.

How to run this project using the batch option

  • Clone biotea-utilities
  • Clone biotea-ao
  • Clone this repository
  • In your IDE, create a dependency from this project to biotea-utilities, to biotea-ao, and to the jars in the lib directory
  • Modify configuration files, i.e., config.properties, in biotea-utilities resources folder (path-to-biotea-utilities/src/main/resources/config.properties). If you are generating annotations for RDFized articles with biotea-rdfization, make sure you use the same configuration there. Most of the time you only need to change the following properties:
    • biotea.dataset.prefix: Either pmc or pubmed
    • biotea.dataset: For instance dataset/pmc or dataset/pubmed or bio2rdf_dataset:bio2rdf-pmc-vrX or bio2rdf_dataset:bio2rdf-pubmed-vrX. This will be used in the VOiD properties of the generated dataset.
    • biotea.base: For instance biotea.ws or bio2rdf.org. This will be used to generate the URIs of resources. Using bio2rdf.org generates URIs compatible with the Bio2RDF URI style.
    • ncbo.annotator.exclude: Aliases for those ontologies that should not be used by the NCBO Annotator. All the aliases are defined as properties at path-to-biotea-utilities/src/main/resources/ontologies.properties.
  • Specify a valid API-KEY to use the NCBO Annotator or the AgroPortal annotator at path-to-biotea-utilities/src/main/resources/apikey.properties
  • Make sure you include the biotea-utilities resources folder in your classpath
  • The main class is ws.biotea.ld2rdf.annotation.batch.BatchApplication. It accepts the following parameters:
    • -in --mandatory, should point to a directory with all the files to be annotated
    • -out --mandatory
    • -annotator --optional, use ncbo (default value) or agroportal.
    • -extension --mandatory, only files with this extension will be processed; we recommend either nxml or rdf
    • -inStyle --optional, either jats_file (default value) or rdf_file
    • -onto --optional, either ao for the Annotation Ontology or oa for Open Annotation; this defines the annotation ontology used to serialize the annotations
    • -format --optional, either XML (default value) or JSON-LD
    • -onlyTA --optional, if present, only the title and abstract will be annotated
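
The parameters above can be summarized as a small argument parser. This is only a minimal sketch, not the project's actual BatchApplication code: the class name BatchArgsSketch and its parse method are hypothetical, the paths in main are placeholders, and the default for -onto (ao) is an assumption inferred from the equivalent commands shown in the examples of this README rather than from the parameter list itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of how the batch parameters and their defaults fit together. */
final class BatchArgsSketch {

    // Parameters the README marks as mandatory.
    private static final List<String> MANDATORY = List.of("-in", "-out", "-extension");

    /** Parses "-flag value" pairs; -onlyTA is a switch that takes no value. */
    static Map<String, String> parse(String... args) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Defaults documented in the README; the -onto default is an assumption
        // taken from the example commands.
        opts.put("-annotator", "ncbo");
        opts.put("-inStyle", "jats_file");
        opts.put("-onto", "ao");
        opts.put("-format", "XML");
        opts.put("-onlyTA", "false");

        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (flag.equals("-onlyTA")) {
                opts.put(flag, "true");
            } else if (i + 1 < args.length) {
                opts.put(flag, args[++i]);
            } else {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
        }
        // Fail early when a mandatory parameter was not supplied.
        for (String flag : MANDATORY) {
            if (!opts.containsKey(flag)) {
                throw new IllegalArgumentException("Missing mandatory parameter " + flag);
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        // Placeholder directory names, for illustration only.
        System.out.println(parse("-in", "articles", "-out", "annotations", "-extension", "nxml"));
    }
}
```

Starting from a map of defaults and overwriting it with whatever the caller supplies makes it easy to see why a short command and its fully spelled-out form behave the same.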

Input

If jats_file is used as inStyle option:

If rdf_file is used as inStyle option:

Output

  • One RDF file per input file
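
The one-to-one relation between input and output files can be pictured with the sketch below. It is an illustration under stated assumptions, not the project's implementation: the class OutputPlanSketch, the plan method, the sample file names, and in particular the output naming scheme (base name plus "_annotations.rdf") are all hypothetical. Only the ideas that input files are selected by extension and that each produces one RDF file come from this README.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical sketch: select input files by extension and plan one output per input. */
final class OutputPlanSketch {

    static Map<Path, Path> plan(Path inDir, Path outDir, String extension) throws IOException {
        String suffix = "." + extension;
        Map<Path, Path> plan = new TreeMap<>();
        try (Stream<Path> files = Files.list(inDir)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(suffix))
                 .forEach(p -> {
                     String name = p.getFileName().toString();
                     String base = name.substring(0, name.length() - suffix.length());
                     // Assumed naming scheme, not confirmed by the project.
                     plan.put(p, outDir.resolve(base + "_annotations.rdf"));
                 });
        }
        return plan;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway input directory with one matching and one ignored file.
        Path in = Files.createTempDirectory("biotea-in");
        Files.createFile(in.resolve("PMC1.nxml"));
        Files.createFile(in.resolve("notes.txt"));
        plan(in, Path.of("out"), "nxml")
            .forEach((src, dst) -> System.out.println(src + " -> " + dst));
    }
}
```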

Examples

For instance, if you want to annotate PMC articles following the Bio2RDF URL model you need this configuration:

  • biotea.dataset.prefix=pmc
  • biotea.dataset=bio2rdf_dataset:bio2rdf-pmc-vr2
  • biotea.base=bio2rdf.org

Remember to specify a valid API KEY in apikey.properties.
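
Before launching a long batch run it can help to check that these properties are set as intended. The following is a minimal sketch using the standard java.util.Properties loader. The class ConfigCheckSketch and its load method are hypothetical and not part of biotea-utilities, and treating the three properties as required is an assumption made here for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

/** Hypothetical sanity check for the configuration shown in this README. */
final class ConfigCheckSketch {

    // Treated as required for this sketch; the project itself may handle them differently.
    private static final List<String> REQUIRED =
        List.of("biotea.dataset.prefix", "biotea.dataset", "biotea.base");

    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                throw new IllegalStateException("Missing property: " + key);
            }
        }
        // The README lists pmc and pubmed as the two accepted prefixes.
        String prefix = props.getProperty("biotea.dataset.prefix");
        if (!prefix.equals("pmc") && !prefix.equals("pubmed")) {
            throw new IllegalStateException("Unexpected biotea.dataset.prefix: " + prefix);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Inline copy of the example configuration; a real run would read config.properties.
        String config = String.join("\n",
            "biotea.dataset.prefix=pmc",
            "biotea.dataset=bio2rdf_dataset:bio2rdf-pmc-vr2",
            "biotea.base=bio2rdf.org");
        Properties props = load(new StringReader(config));
        System.out.println(props.getProperty("biotea.dataset"));
    }
}
```

In the properties format a key ends at the first `=`, so the colon inside the biotea.dataset value is read as part of the value.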

If you want to annotate JATS files with extension nxml and get RDF/XML files following the AO model, use:

  • java ws.biotea.ld2rdf.annotation.batch.BatchApplication -in -out -extension nxml (equivalent to the following command, which spells out every parameter with its default value)
  • java ws.biotea.ld2rdf.annotation.batch.BatchApplication -in -out -extension nxml -inStyle jats_file -annotator ncbo -onto ao -format XML

If you want to annotate RDF files with extension rdf and get RDF/XML files following the AO model, use:

  • java ws.biotea.ld2rdf.annotation.batch.BatchApplication -in -out -extension rdf -inStyle rdf_file (equivalent to the following command, which spells out every parameter with its default value)
  • java ws.biotea.ld2rdf.annotation.batch.BatchApplication -in -out -extension rdf -inStyle rdf_file -annotator ncbo -onto ao -format XML

If you want to annotate JATS files with extension nxml and get RDF/XML files following the OA model, use:

  • java ws.biotea.ld2rdf.annotation.batch.BatchApplication -in -out -extension nxml -onto OA (equivalent to the following command, which spells out every parameter with its default value)
  • java ws.biotea.ld2rdf.annotation.batch.BatchApplication -in -out -extension nxml -inStyle jats_file -annotator ncbo -onto OA -format XML