mandiayba/CONTES

CONcept-TErm System

Python, Apache-2.0 license

CONTES

CONTES (CONcept-TErm System) is a method for normalizing multi-word terms with concepts from a domain-specific ontology (see paper).

The system is based on gensim and scikit-learn.
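Because the method places multi-word terms in a vector space built from word embeddings, it helps to see how a term vector can be composed from word vectors. The sketch below assumes, for illustration only, that a term vector is the element-wise mean of the vectors of its words, with out-of-vocabulary words skipped. The helper `term_vector` and the toy vectors are hypothetical and are not taken from the CONTES code.

```python
from typing import Dict, List, Optional


def term_vector(term: str, word_vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Hypothetical helper: average the vectors of the known words in a term."""
    # Lowercase and split on whitespace; CONTES's real tokenization may differ.
    known = [word_vectors[w] for w in term.lower().split() if w in word_vectors]
    if not known:
        return None  # no word of the term is in the vocabulary
    dim = len(known[0])
    # Element-wise mean over the word vectors of the term.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 2-dimensional vectors, made up for illustration.
vectors = {"lactic": [1.0, 0.0], "acid": [0.0, 1.0], "bacteria": [1.0, 1.0]}
print(term_vector("lactic acid bacteria", vectors))
```

Averaging keeps the term vector in the same space as the word vectors whatever the term length, which is what lets a single regression step handle terms of any number of words.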

Installation

Get CONTES from GitHub

$ git clone https://github.com/ArnaudFerre/CONTES.git

$ cd CONTES

Create the virtual environment (Anaconda must be installed)

$ conda env create -f contes-env.yml

Activate the virtual environment (on conda 4.4 or later, conda activate contesenv also works)

$ source activate contesenv

Tests

Check that each module runs by printing its help message

$ python module_word2vec/main_word2vec.py --help

$ python module_train/main_train.py --help

$ python module_predictor/main_predictor.py --help

Usage

Calculate word embeddings

$ python module_word2vec/main_word2vec.py \
--json word-vectors.json \
--min-count 0 \
--vector-size 100 \
--window-size 2 < test-data/corpus.txt
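The options above presumably map onto gensim's Word2Vec parameters: --vector-size sets the embedding dimension, --min-count 0 keeps every word regardless of frequency, and --window-size sets how many neighbours on each side count as context. The stdlib sketch below is not part of CONTES; its function name `context_pairs` is hypothetical. It only lists the (center, context) pairs that a window of 2 produces for one sentence. gensim's actual training also shrinks windows randomly and subsamples frequent words, so treat this as a conceptual illustration.

```python
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Hypothetical helper: (center, context) pairs within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sentence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "bacteria isolated from raw milk".split()
for center, context in context_pairs(sentence, window=2):
    print(center, "->", context)
```

With a window of 2, "bacteria" and "raw" never co-occur in this sentence because they are three positions apart; a larger window trades more topical similarity for less syntactic precision.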

Train a CONTES model

$ python module_train/main_train.py  \
--word-vectors test-data/embeddings/microbio_filtered_100/word-vectors.json.gz \
--terms test-data/input-corpus/terms_0.json \
--attributions test-data/input-corpus/attributions_0.json \
--regression-matrix test-data/models/bb \
--ontology test-data/OntoBiotope_BioNLP-ST-2016.obo
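Training takes an OBO ontology as input. One simple way to turn an ontology into regression targets, assumed here for illustration and not taken from the CONTES source, is to give every concept a binary vector over all concepts, with 1 for the concept itself and for each of its is_a ancestors. The sketch parses only [Term] stanzas with id: and is_a: tags. The function names and the three-concept ontology are hypothetical.

```python
from typing import Dict, List, Optional, Set


def parse_obo_parents(text: str) -> Dict[str, Set[str]]:
    """Hypothetical parser: map each [Term] id to its direct is_a parents."""
    parents: Dict[str, Set[str]] = {}
    current: Optional[str] = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents[current] = set()
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing "! name" comment and keep the parent id.
            parents[current].add(line[5:].split("!")[0].strip())
    return parents


def ancestors(concept: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All transitive is_a ancestors of a concept (iterative depth-first walk)."""
    seen: Set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def concept_vectors(parents: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """Binary vector per concept: 1 for the concept and each of its ancestors."""
    order = sorted(parents)
    index = {c: i for i, c in enumerate(order)}
    vectors = {}
    for c in order:
        vec = [0.0] * len(order)
        for a in ancestors(c, parents) | {c}:
            if a in index:  # ignore parents not declared in this file
                vec[index[a]] = 1.0
        vectors[c] = vec
    return vectors


# Hypothetical toy ontology: habitat > food > dairy product.
OBO = """
[Term]
id: X:1
name: habitat

[Term]
id: X:2
name: food
is_a: X:1 ! habitat

[Term]
id: X:3
name: dairy product
is_a: X:2 ! food
"""
vecs = concept_vectors(parse_obo_parents(OBO))
print(vecs["X:3"])
```

Under this encoding, concepts that share ancestors get overlapping vectors, so a regression from term vectors onto them (presumably fitted with scikit-learn, given the dependencies above) is rewarded for landing near the right branch even when it misses the exact concept.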

Predict with a CONTES model

$ python module_predictor/main_predictor.py \
--word-vectors test-data/embeddings/microbio_filtered_100/word-vectors.json.gz \
--terms test-data/input-corpus/terms_0.json \
--regression-matrix test-data/models/bb \
--ontology test-data/OntoBiotope_BioNLP-ST-2016.obo \
--output test-data/predictions/output.json
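A plausible reading of the --regression-matrix option is that prediction projects each term vector through a learned matrix into the concept space and then picks the closest concept. The sketch below assumes exactly that, uses cosine similarity as the closeness measure, and hard-codes a toy matrix `W` and toy concept vectors. None of these names come from the CONTES code, and the sketch does not reproduce the format of the file written by --output.

```python
import math
from typing import Dict, List, Tuple


def project(W: List[List[float]], x: List[float]) -> List[float]:
    """Compute W @ x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def predict(term_vec: List[float], W: List[List[float]],
            concepts: Dict[str, List[float]]) -> Tuple[str, float]:
    """Hypothetical predictor: nearest concept to the projected term vector."""
    projected = project(W, term_vec)
    return max(((c, cosine(projected, v)) for c, v in concepts.items()),
               key=lambda pair: pair[1])


# Toy matrix mapping a 2-d term space to a 3-d concept space (made up).
W = [[0.5, 0.5],
     [1.0, 0.0],
     [0.0, 1.0]]
concepts = {"X:1": [1.0, 0.0, 0.0], "X:2": [1.0, 1.0, 0.0], "X:3": [1.0, 1.0, 1.0]}
best, score = predict([1.0, 0.0], W, concepts)
print(best, round(score, 3))
```

Cosine similarity ignores vector length, so a projected term that points in the right direction but with the wrong magnitude still maps to the expected concept.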