A Benchmark for Searching Top-K Ontology Resources in the Biomedical Domain

bioont-search-benchmark

This is a Maven project that contains source code in Java and ground truth data for a biomedical ontology benchmark.

More details in the research paper:

D. Oliveira, A. S. Butt, A. Haller, D. Rebholz-Schuhmann, and R. Sahay, “Where to search top-K biomedical ontologies?,” Brief Bioinform, vol. 20, no. 4, pp. 1477–1491, Jul. 2019, doi: 10.1093/bib/bby015.

To run this project you will need the following:

  • A Linux machine.

  • Virtuoso Jena Provider:

    1. Clone the virt-jena repository inside the benchmark directory.
    2. Inside the new virt-jena directory, run mvn clean install.
  • Virtuoso

    1. Create a directory in the root of the bioont repository to store the Virtuoso database, e.g. virt_database.
    2. Change the virtuoso.ini parameters according to your machine requirements and put the file in your Virtuoso database directory.
    3. Start the Virtuoso server in your database directory.
    4. Edit the scripts/bulk_load.sh script and set the first four parameters to your Virtuoso server port, user, password and the directory of the Virtuoso database (e.g. VIRT_DB=$PWD/virt_database).
    5. In the root directory of the repository, bulk load the ontologies into Virtuoso with scripts/bulk_load.sh.
    • To reset the Virtuoso store, stop Virtuoso and delete everything inside virt_database except for the virtuoso.ini file. Restart Virtuoso and run scripts/bulk_load.sh again.
  • Solr - using the OLS-SOLR Spring Boot application is advised for optimal compatibility (https://github.com/EBISPOT/OLS/tree/master/ols-apps/ols-solr-app). Follow these steps:

    1. Clone/download the OLS git repository into the bioont repository.
    2. Delete the contents of the resources directory.
    3. Copy all contents of the userinput/ontology_property_files directory into the resources directory.
    4. Build OLS by running mvn clean package in the root of the OLS repository.
    5. Download and extract Solr (only version 5.2.1 was tested) to the root of the bioont repository.
    6. Create a directory to store the Solr indexes in the root of the bioont repository, e.g. solr_index
    7. Start solr with:

    $ solr-5.2.1/bin/solr -Dsolr.solr.home=$PWD/OLS/ols-solr/src/main/solr-5-config -Dsolr.data.dir=$PWD/solr_index

    8. Build the Solr indexes from the root of the bioont repository with:

    $ scripts/index.sh

    • To reset the Solr indexes, stop Solr, delete everything inside the solr_index directory and repeat from step 7.
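
The two reset notes above (for the Virtuoso store and for the Solr indexes) both come down to emptying a data directory while the server is stopped. A minimal shell sketch of that step follows. The helper name empty_dir_except is hypothetical and not part of this repository, and it assumes find supports -mindepth and -maxdepth, as GNU find on Linux does.

```sh
# Hypothetical helper (not part of the repository): delete everything
# directly inside DIR, optionally keeping one top-level entry named KEEP.
# Stop the corresponding server before calling it.
empty_dir_except() {
  dir="$1"
  keep="${2:-}"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  if [ -n "$keep" ]; then
    # KEEP is matched as a file name at the top level of DIR only.
    find "$dir" -mindepth 1 -maxdepth 1 ! -name "$keep" -exec rm -rf {} +
  else
    find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
  fi
}
```

With the example directory names used in this README, empty_dir_except virt_database virtuoso.ini clears the Virtuoso store but keeps its configuration file, and empty_dir_except solr_index clears the Solr indexes. The directory itself is left in place in both cases.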

Running the benchmark

Keep Virtuoso and Solr running. Open the file userinput/config.properties and change the necessary parameters. Note that you will need to register with BioPortal to obtain an API key.
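
Before starting a run, it can help to confirm that the files and directories created during setup are where the later steps expect them. The following is a minimal sketch of such a check. The function name check_layout is hypothetical, and the function only tests that paths exist. It does not verify that Virtuoso or Solr are actually running.

```sh
# Hypothetical pre-flight check (not part of the repository): report every
# expected path that is missing under ROOT and return non-zero if any is absent.
check_layout() {
  root="$1"
  shift
  missing=0
  for rel in "$@"; do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From the root of the repository, and assuming the example directory names virt_database and solr_index, a call could look like: check_layout . userinput/config.properties virt_database/virtuoso.ini OLS solr-5.2.1/bin/solr solr_index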

To run the benchmark do the following:

  1. In the benchmark directory, build the project with mvn clean package.
  2. Run the benchmark with java -jar benchmark/target/bioont-1.0-SNAPSHOT-shaded.jar.
  3. View the results in the userinput/ranking_results and userinput/evaluation folders.
  • To restart the benchmark, delete userinput/ranking_models and run step 2 again.
  • To re-run the benchmark without reloading the ontology metadata or repeating the preprocessing for the algorithms, open the Test class and set the variables loadData and preprocessing to false. Then repeat steps 1 and 2.
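
The build and run steps above can be wrapped in two small shell functions, sketched below. The function names build_benchmark and run_benchmark and the optional clean argument are hypothetical. The commands and paths are the ones given in steps 1 and 2, and both functions are meant to be called from the root of the repository.

```sh
# Step 1: build inside the benchmark directory.
# The subshell keeps the caller's working directory unchanged.
build_benchmark() {
  (cd benchmark && mvn clean package)
}

# Step 2: run the shaded jar from the repository root.
# Pass "clean" as the first argument to delete userinput/ranking_models first.
run_benchmark() {
  if [ "${1:-}" = "clean" ]; then
    rm -rf userinput/ranking_models
  fi
  java -jar benchmark/target/bioont-1.0-SNAPSHOT-shaded.jar
}
```

A first run would then be build_benchmark && run_benchmark, and a restart would be run_benchmark clean. After editing the Test class, call build_benchmark again so that the change ends up in the jar.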

Customising input data

If you wish to use the benchmark with a different set of ontologies, you will need to create new ontology configuration files with the exact same structure and repeat the Solr steps starting from step 3. You will also need to add the acronyms of the new ontologies to userinput/acronyms.txt and their download URLs to userinput/uris.txt.

To change the query terms used in the benchmark, edit the file userinput/test_terms.txt and enter one query term per line.
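
As a small illustration of the one-term-per-line format, the sketch below writes a list of terms to a file. The helper name write_terms is hypothetical, and the terms in the usage note are made-up examples rather than the ones shipped with the benchmark.

```sh
# Hypothetical helper (not part of the repository): write the given query
# terms to FILE, one per line, replacing any previous contents.
write_terms() {
  file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "no query terms given" >&2
    return 1
  fi
  # printf repeats the format for each argument, so multi-word terms
  # stay on a single line as long as they are quoted by the caller.
  printf '%s\n' "$@" > "$file"
}
```

For example, write_terms userinput/test_terms.txt "heart" "breast cancer" "insulin" produces a file with three lines. Back up the original file first if you want to return to the default query terms later.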