
Text mining of natural language mutation mentions


☝️ We moved

This library is no longer maintained.

nala has moved to the text annotation tool tagtog:

tagtog, The Text Annotation Tool to Train AI





nala

Text mining method for extracting mentions of sequence variants (in genes or proteins) written either in standard (ST) format (e.g. "E6V") or in complex natural language (NL) (e.g. "glutamic acid was substituted by valine at residue 6").
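To make the ST/NL distinction concrete, here is a minimal sketch that recognizes only the simplest ST substitutions (such as "E6V" or "Asp8Asn") with a regular expression. This is not how nala works: it is an illustration, and the names `ST_SUBSTITUTION` and `find_st_substitutions` are hypothetical. A pattern like this finds nothing in the NL example, which is the kind of mention nala is designed to catch.

```python
import re

# Three-letter and one-letter codes for the 20 standard amino acids.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
ONE_LETTER = "[ACDEFGHIKLMNPQRSTVWY]"

# Simple ST substitution: <residue><position><residue>, e.g. "E6V" or "Asp8Asn".
# Illustrative only; deletions, insertions and DNA-level variants are not covered.
ST_SUBSTITUTION = re.compile(
    r"\b(?P<ref>{aa3}|{aa1})(?P<pos>[1-9]\d*)(?P<alt>{aa3}|{aa1})\b".format(
        aa3=THREE_LETTER, aa1=ONE_LETTER
    )
)


def find_st_substitutions(text):
    """Return (reference, position, alternate) tuples for simple ST substitutions."""
    return [
        (m.group("ref"), int(m.group("pos")), m.group("alt"))
        for m in ST_SUBSTITUTION.finditer(text)
    ]
```

Calling `find_st_substitutions` on the NL sentence quoted above returns an empty list, even though the sentence describes the same variant as "E6V".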

Publication: Cejuela et al., nala: text mining natural language mutation mentions, Bioinformatics, 2018

Install

Requires Python 3.6

From source

git clone https://github.com/Rostlab/nala.git
cd nala
poetry shell
poetry install
python3 -m nalaf.download_data

NOTE: if you prefer to install with pip instead of poetry, you need pip >= 19.0. Then run:

pip install -r requirements.txt
pip install .

Developing

Test

To run the unit tests (excluding the slow ones):

nosetests -a '!slow'
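The `-a '!slow'` option relies on nose's attribute plugin, which selects tests by attributes set on the test functions: `'!slow'` keeps the tests whose `slow` attribute is missing or false. The sketch below shows how a test could be tagged. The test names and the `slow` helper are hypothetical and not taken from nala's test suite. nose provides the decorator `nose.plugins.attrib.attr` for this purpose; here the attribute is set by hand so the example runs without nose installed.

```python
def slow(test_func):
    """Tag a test as slow; nose's attribute plugin reads this attribute."""
    test_func.slow = True
    return test_func


def test_quick_check():
    # Hypothetical fast test: still runs under `nosetests -a '!slow'`.
    assert "E6V".endswith("V")


@slow
def test_expensive_check():
    # Hypothetical slow test: excluded by `nosetests -a '!slow'`.
    assert sum(range(1000)) == 499500
```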

Troubleshooting on Windows

The python-crfsuite module (pycrfsuite) may fail to install on Windows. See the original python-crfsuite module for details.

Run Examples

  • Simple:

    • python3 nala_demo.py -p 15878741 12625412 # i.e. list of PMIDs to tag
    • python3 nala_demo.py -s "Standard (ST) examples: Asp8Asn or delPhe1388. Semi-standard (SST) examples: 3992-9g-->a mutation. Natural language (NL) examples: glycine was substituted by lysine at residue 18 (Gly18Lys)"
  • Programmatic access: see nala/learning/train.py

  • API annotation service via tagtog.net: https://www.tagtog.net/-corpora/IDP4+
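If you want to call the demo script from another Python program, one option is to build the command line as a list and pass it to `subprocess.run`. The following is a sketch under stated assumptions: it uses only the `-p` and `-s` options shown above, assumes it is run from the repository root, and the helper name `build_demo_command` is hypothetical (it is not part of nala).

```python
def build_demo_command(pmids=None, text=None, script="nala_demo.py"):
    """Build the argument list for one nala_demo.py call.

    Exactly one of `pmids` (iterable of PMIDs, for -p) or `text`
    (a string to tag, for -s) must be given.
    """
    if (pmids is None) == (text is None):
        raise ValueError("Pass exactly one of pmids or text")
    command = ["python3", script]
    if pmids is not None:
        # One argument per PMID, as in the -p example above.
        command += ["-p"] + [str(pmid) for pmid in pmids]
    else:
        # The whole text is a single argument, so no shell quoting is needed.
        command += ["-s", text]
    return command


# Usage (requires the nala checkout and its dependencies):
#   import subprocess
#   subprocess.run(build_demo_command(pmids=[15878741, 12625412]), check=True)
```

Passing a list instead of a shell string avoids quoting problems with input such as "3992-9g-->a", where `>` would otherwise be interpreted by the shell.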