Based on this repo - https://github.com/ncats/docker/tree/master/semrep
Added docker-compose to combine the metamap and semrep images/containers.
For more setup information, see the metamap and semrep README.md files.
The tar.bz2 files get uncompressed automatically during image creation
Get SemRep data - ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/SKR/SemRep/dist
docker-compose build
Test metamap
docker run --rm -t -p 18066:18066 semrep_metamap
Test the API
curl -X POST localhost:8067/ -H Content-Type:text/plain --data-binary "modafinil is a novel stimulant that is effective in the treatment of narcolepsy"
Test the container
docker exec -it semrep ./bin/semrep.v1.8 -L 2018 -Z 2018AA -F user_data/test.txt
MetaMap - https://metamap.nlm.nih.gov/Docs/MM_2016_Usage.pdf
--threshold 123
Restricts output to UMLS candidate concepts whose evaluation score equals or exceeds the specified threshold. Judicious use of this option can exclude false positives when some input text has no close matches in the Metathesaurus. An appropriate threshold can usually be determined simply by examining MetaMap output for typical text in a given application.
The API outputs the SemRep full fielded format
Modified from here - https://github.com/lhncbc/SemRep/blob/master/doc/SemRep_full_fielded_output.pdf
- section (title/abstract) - this is a custom field added in our version of SemRep
- SE (designates that the output is from SemRep)
- PMID (the ID of the document)
- subsection - populated if the utterance begins with one of a specified set of strings of uppercase letters followed by a colon
- section (ti if the utterance is from the title of the citation; ab if it is from the abstract; tx if from text)
- Sentence ID: an integer indicating the utterance’s position within the title/abstract.
- Output type: one of the atoms itemized in Section 1 of the SemRep document (text, entity, relation, or coreference)
- SubjectMaxDist: The number of potential arguments (i.e., NPs) from the indicator in the direction of the subject (3)
- SubjectDist: The number of potential arguments separating the subject from the indicator (1)
-
- CUI of the subject concept (C0027893)
-
- Preferred name of the subject concept (neuropeptide Y)
- Semantic Type(s) of the subject concept (aapp, gngm, nsba in the example above; gngm is an artificial semantic type)
- Subject Semantic Type used for the relation (aapp)
-
- Normalized gene ID(s) of the subject from EntrezGene; may contain multiple IDs delimited by comma or may be empty (4852)
-
- Normalized gene name(s) of the subject from EntrezGene; may contain multiple names delimited by comma or may be empty (NPY)
- Text that maps to the subject (neuropeptide y)
-
- Change term ( may appear as placeholder)
-
- Degree term ( may appear as placeholder)
-
- Negation term: 1 if the subject is negated, 0 if it’s not. (0)
- Confidence score (1000)
- First character position (in document) of text denoting subject entity (39)
- Last character position (in document) of text denoting subject entity (59)
- Indicator Type (VERB)
- Predicate (INHIBITS)
- negation if the relation (the immediately preceding field) is negative; empty otherwise
- First character position (in utterance) of text denoting relation (70)
- End position (in utterance) of text denoting relation (79)
- ObjectMaxDist: The number of potential arguments (i.e., NPs) from the indicator in the direction of the object (4)
- ObjectDist: The number of potential arguments separating the object from the indicator (2)
-
- CUI of the object concept (C0021753)
- Preferred name of the object concept (Interleukin-1 beta)
- Semantic Type(s) of the object concept (gngm, aapp, imft in the example above; gngm is an artificial semantic type)
- Object Semantic Type used for the relation (gngm)
-
- Normalized gene ID(s) of the object from EntrezGene; may contain multiple IDs delimited by comma or may be empty (3553)
-
- Normalized gene name(s) of the object from EntrezGene; may contain multiple names delimited by comma or may be empty (IL1B)
- Text that maps to the object (interleukin-1beta)
-
- Change term ( may appear as placeholder)
-
- Degree term ( may appear as placeholder)
-
- Negation term: 1 if the object is negated, 0 if it’s not. (0)
- Confidence score (1000)
- First character position (in document) of text denoting object entity (129)
- End position (in document) of text denoting object entity (136)
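To check a line of API output against the field list above, it can help to print each field next to its position. The following is a minimal sketch, assuming the output is pipe-delimited as in the standard SemRep full fielded output (not confirmed here for the custom API output). The function name `number_fields` and the sample line are hypothetical, not real SemRep output.

```shell
#!/bin/sh
# number_fields: read delimited lines on stdin and print every field
# with its 1-based position, one field per output line.
# Assumption: fields are separated by "|".
number_fields() {
    awk -F'|' '{
        for (i = 1; i <= NF; i++) {
            printf "%d\t%s\n", i, $i
        }
    }'
}

# Hypothetical, shortened sample line (illustration only).
printf '%s\n' 'title|SE|00000000||ti|1|relation' | number_fields
```

Empty fields are still printed with their position, so placeholder fields remain visible when counting.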
To allow connections from outside localhost
#semrep-rest-api/conf/application.conf
play.filters.hosts {
allowed = ["."]
}
The first tests gave this error
! System error
! 'dlopen("/metamap/public_semrep/bin/abgenemod.so") failed in load_foreign_resource/1: libpcre.so.1: cannot open shared object file: No such file or directory'
Fixed with:
cp lib/libpcre.so.0.0.1 lib/libpcre.so.1
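The same fix can be made repeatable, for example as a step run before SemRep starts. The following is a minimal sketch, assuming the base directory passed in contains the lib/ directory used in the cp command above. The function name `ensure_libpcre` is hypothetical.

```shell
#!/bin/sh
# ensure_libpcre: make lib/libpcre.so.1 available under the given base
# directory by copying lib/libpcre.so.0.0.1, as in the manual fix.
# Does nothing if libpcre.so.1 already exists.
ensure_libpcre() {
    base="${1:-.}"
    src="$base/lib/libpcre.so.0.0.1"
    dst="$base/lib/libpcre.so.1"

    # Already present: nothing to do.
    if [ -e "$dst" ]; then
        return 0
    fi
    # Source library missing: report and fail instead of guessing.
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    cp "$src" "$dst"
}
```

Calling `ensure_libpcre .` from the directory that contains lib/ matches the manual command. Which directory that is inside the container is not stated here; the error message suggests /metamap/public_semrep, but that is an assumption.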