F-UJI (FAIRsFAIR Research Data Object Assessment Service)

Developers: Robert Huber, Anusuriya Devaraju

Overview

F-UJI is a web service that programmatically assesses the FAIRness of research data objects based on metrics developed by the FAIRsFAIR project. The service will be applied to demonstrate the evaluation of objects in repositories selected for in-depth collaboration with the project.

The 'F' stands for FAIR (of course) and 'UJI' means 'Test' in Malay. So F-UJI is a FAIR testing tool.

Cite as

Devaraju, A. and Huber, R. (2021). An automated solution for measuring the progress toward FAIR research data. Patterns, vol 2(11), https://doi.org/10.1016/j.patter.2021.100370

Clients and User Interface

A web demo using F-UJI is available at https://www.f-uji.net

An R client package that was generated from the F-UJI OpenAPI definition is available from https://github.com/NFDI4Chem/rfuji.

An open source web client for F-UJI is available at https://github.com/MaastrichtU-IDS/fairificator

Assessment Scope, Constraint and Limitation

The service is in development and its assessment depends on several factors.

  • In the FAIR ecosystem, FAIR assessment must go beyond the object itself. FAIR enabling services and repositories are vital to ensure that research data objects remain FAIR over time. Importantly, machine-readable services (e.g., registries) and documents (e.g., policies) are required to enable automated tests.
  • In addition to repository and services requirements, automated testing depends on clear machine assessable criteria. Some aspects (rich, plurality, accurate, relevant) specified in FAIR principles still require human mediation and interpretation.
  • The tests must focus on generally applicable data/metadata characteristics until domain/community-driven criteria have been agreed (e.g., appropriate schemas and required elements for usage/access control, etc.). For example, for some of the metrics (i.e., on I and R principles), the automated tests we proposed only inspect the ‘surface’ of criteria to be evaluated. Therefore, tests are designed in consideration of generic cross-domain metadata standards such as Dublin Core, DCAT, DataCite, schema.org, etc.
  • FAIR assessment is performed based on aggregated metadata; this includes metadata embedded in the data (landing) page, metadata retrieved from a PID provider (e.g., DataCite content negotiation) and other services (e.g., re3data).
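
The idea of aggregating metadata from several sources can be pictured with a minimal sketch. This is not F-UJI's implementation: the function name merge_metadata, the source labels, the sample records and the "first non-empty value wins" precedence rule are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Metadata = Dict[str, str]


def merge_metadata(sources: List[Tuple[str, Metadata]]) -> Dict[str, Tuple[str, str]]:
    """Merge metadata records, keeping the first non-empty value per field.

    Returns a mapping of field -> (value, source label), so the origin of
    every aggregated value stays traceable.
    """
    merged: Dict[str, Tuple[str, str]] = {}
    for label, record in sources:
        for field, value in record.items():
            # Skip empty values and fields already filled by an earlier source.
            if value and field not in merged:
                merged[field] = (value, label)
    return merged


# Hypothetical records standing in for the sources named in the list above.
landing_page = {"title": "Sea surface temperature", "license": ""}
pid_provider = {
    "title": "SST dataset",
    "license": "CC-BY-4.0",
    "publisher": "Example Repository",
}

result = merge_metadata(
    [("landing_page", landing_page), ("pid_provider", pid_provider)]
)
```

Here the title is taken from the landing page because it is listed first, while the licence and publisher fall through to the PID provider record. A real assessment would need a more careful precedence and normalisation strategy.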

Requirements

Python 3.5.2+

308 redirects

To handle 308 redirects, the following patch has to be applied to urllib: https://github.com/python/cpython/pull/19588/commits

Google Dataset Search

  • Download the latest Dataset Search corpus file from: https://www.kaggle.com/googleai/dataset-search-metadata-for-datasets
  • Open the file fuji_server/helper/create_google_cache_db.py and set the variable 'google_file_location' to the location of the corpus file
  • Run create_google_cache_db.py, which creates a SQLite database in the data directory. From the root directory, run python3 -m fuji_server.helper.create_google_cache_db.

The service was generated by the swagger-codegen project. By using the OpenAPI-Spec from a remote server, you can easily generate a server stub.
The service uses the Connexion library on top of Flask.

Usage

Before running the service, please set user details in the configuration file, see config/users.py.

To install F-UJI, you may execute the following python-based or docker-based installation commands from the root directory:

Python module-based installation:

From the fuji source folder run

pip3 install .

or to install the last fixed dependencies

pip3 install .

The F-UJI server can now be started with:

python3 -m fuji_server -c fuji_server/config/server.ini

Docker-based installation:

docker run -d -p 1071:1071 ghcr.io/pangaea-data-publisher/fuji

To access the Swagger user interface, open the URL below in a browser:

http://localhost:1071/fuji/api/v1/ui/

Your Swagger definition lives here:

http://localhost:1071/fuji/api/v1/swagger.json
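
As a rough sketch of how the Swagger definition could be inspected from Python, the snippet below builds the URLs shown above and lists the paths declared in the definition. The helper names api_url and list_api_paths are hypothetical, the code assumes the service is running locally on port 1071, and it assumes the definition can be fetched without credentials, which may not hold for every deployment.

```python
import json
import urllib.request

# Base URL taken from the addresses listed in this section.
BASE_URL = "http://localhost:1071/fuji/api/v1"


def api_url(path: str, base: str = BASE_URL) -> str:
    """Join the API base URL and a relative path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def list_api_paths(timeout: float = 10.0) -> list:
    """Fetch swagger.json and return the sorted endpoint paths it declares.

    Hypothetical helper: requires a running F-UJI server at BASE_URL.
    """
    with urllib.request.urlopen(api_url("swagger.json"), timeout=timeout) as resp:
        spec = json.loads(resp.read().decode("utf-8"))
    return sorted(spec.get("paths", {}))
```

With the server running, calling list_api_paths() should return the endpoint paths declared by the definition; nothing is fetched until that function is called.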

You can provide a different server config file this way (the host side of the bind mount should be an absolute path):

docker run -d -p 1071:1071 -v "$(pwd)/server.ini":/usr/src/app/fuji_server/config/server.ini ghcr.io/pangaea-data-publisher/fuji

You can also build the docker image from the source code:

docker build -t <tag_name> .
docker run -d -p 1071:1071 <tag_name>

Notes

To avoid the Tika startup warning message, set the environment variable TIKA_LOG_PATH. For more information, see https://github.com/chrismattmann/tika-python
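
One possible way to set TIKA_LOG_PATH from Python, rather than from the shell, is sketched below. The directory name tika-logs is a hypothetical choice, and the sketch assumes the variable is read when the tika library is first imported, so it has to be set before that import.

```python
import os
import tempfile

# Hypothetical log directory; any writable directory would do.
log_dir = os.path.join(tempfile.gettempdir(), "tika-logs")
os.makedirs(log_dir, exist_ok=True)

# Keep a value already exported in the shell; otherwise use log_dir.
os.environ.setdefault("TIKA_LOG_PATH", log_dir)

# Import tika (or start the F-UJI server) only after this point.
```

Using setdefault means a path configured outside Python still takes precedence.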

If you receive the exception 'urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]' on macOS, run the install command shipped with Python: ./Install\ Certificates.command

License

This project is licensed under the MIT License; for more details, see the LICENSE file.

Acknowledgements

F-UJI is a result of the FAIRsFAIR “Fostering FAIR Data Practices In Europe” project which received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-2018-2020 (grant agreement 831558).

The project was also supported through our contributors by the Helmholtz Metadata Collaboration (HMC), an incubator-platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.