Semsimian is a package to provide fast semantic similarity calculations for ontologies. It is a Rust library with a Python interface.
This includes implementation of Jaccard and Resnik similarity of terms in an ontology, as well as a method to calculate the similarity of two sets of terms (so-called termset similarity). Other methods will be added in the future.
Semsimian is currently integrated into OAK and the Monarch app to provide fast semantic similarity calculations.
- Set up your virtual environment of choice.
- cd
semsimian
(home directory of this project) pip install maturin
maturin develop
python
Python 3.9.16 (main, Jan 11 2023, 10:02:19)
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from semsimian import Semsimian
>>> s = Semsimian([('banana', 'is_a', 'fruit'), ('cherry', 'is_a', 'fruit')])
>>> s.jaccard_similarity('banana', 'cherry')
This should yield a value of 1.0.
As of version 0.1.14, the semsimian source is released on GitHub, with a corresponding set of Python wheels released to Pypi.
To trigger a new set of builds, first update the version number in Cargo.toml
, then create a new release.
Wheels are prepared for the following environments and architectures:
OS | Architectures | Python Versions |
---|---|---|
Linux | x86_64, x86_64-unknown-linux-musl, aarch64-unknown-linux-gnu, aarch64-unknown-linux-musl | 3.7, 3.8, 3.9, 3.10, 3.11 |
MacOS | x86_64, universal2 | 3.7, 3.8, 3.9, 3.10, 3.11 |
Windows | x86_64 | 3.7, 3.8, 3.9, 3.10, 3.11 |
If a import semsimian
results in a ImportError
warning about incompatible architecture, try the following:
- Install
conda
. This guide may be helpful. - Set up a virtual environment with
conda
so that your Python build is aligned with your processor architecture (in this case, ARM). Try something like:
$ conda create -n myenv python=3.9
...setup happens...
$ conda activate myenv
and then proceed as above.
Build a docker image:
docker build -t my-rust-app .
Run your tests inside a Docker container and generate coverage:
docker run -v "$(pwd)":/usr/src/app -t my-rust-app bash -c "CARGO_INCREMENTAL=0 RUSTFLAGS='-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort' cargo test && grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing -o ./target/debug/coverage/"
Get Coverage report from:
open ./target/debug/coverage/index.html