MedTagger is a suite of programs developed by the Mayo Clinic NLP program in 2013. It includes three major components: MedTagger for dictionary-based indexing, MedTaggerIE for pattern-based information extraction, and MedTaggerML for machine learning-based named entity recognition.
The updated release includes a dictionary based on MedLex, a corpus-driven semantic lexicon, that maps to OMOP Concept identifiers. MedTagger for indexing is built upon a fast string matching algorithm leveraging lexical normalization. The contextual annotator detects the local context of each detected concept entry. For detailed information about the concept identifiers, please visit http://athena.ohdsi.org.
MedTagger IE pipelines use a custom ruleset format. An example ruleset for Coronavirus Disease 2019 (COVID-19) related symptoms (e.g., dry cough, fever, fatigue) can be found under the /src/main/resources/medtaggerieresources/covid19 directory. These resources tell MedTagger what to do and what to extract, and this directory is expected as input for the RULEDIR parameter.
Live demo for COVID-19 ruleset: https://ohnlp.github.io/ohnlptk/
Video demo: https://vimeo.com/392331446
- Download the latest release from https://github.com/OHNLP/MedTagger/releases
- Extract the zip file
- Modify the INPUTDIR, OUTPUTDIR, and RULEDIR variables in run_medtagger_win.bat or run_medtagger_unix_mac.sh, as appropriate
  - INPUTDIR: full directory path of the input folder
  - OUTPUTDIR: full directory path of the output folder
  - RULEDIR: full directory path of the 'Rule' folder
  - Example for Mac: INPUTDIR="$YOUR_INPUT_DIRECTORY" OUTPUTDIR="$YOUR_OUTPUT_DIRECTORY" RULEDIR="$YOUR_MEDTAGGER_HOME/medtaggerieresources/covid19"
  - Example for Windows: INPUTDIR="C:\$YOUR_INPUT_DIRECTORY\input" OUTPUTDIR="C:\$YOUR_OUTPUT_DIRECTORY\output" RULEDIR="C:\YOUR_MEDTAGGER_HOME\medtaggerieresources\covid19"
- Run the batch file
  - Mac/Linux: run_medtagger_unix_mac.sh
  - Windows: run_medtagger_win.bat
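Before launching the run script on Mac/Linux, it can help to confirm that the three configured directories actually exist. The following is a minimal sketch under stated assumptions: `check_medtagger_dirs` is a hypothetical helper that is not shipped with MedTagger, and the demo paths are throwaway directories created with `mktemp`, so substitute your own values.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of MedTagger): reports which of
# the three directories configured in the run script do not exist.
check_medtagger_dirs() {
    # $1 = INPUTDIR, $2 = OUTPUTDIR, $3 = RULEDIR
    status=0
    for pair in "INPUTDIR=$1" "OUTPUTDIR=$2" "RULEDIR=$3"; do
        name=${pair%%=*}   # text before the first '='
        path=${pair#*=}    # text after the first '='
        if [ ! -d "$path" ]; then
            echo "missing $name: $path"
            status=1
        fi
    done
    return $status
}

# Demo with throwaway directories; the rules directory is deliberately absent.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/input" "$demo_root/output"
check_result=$(check_medtagger_dirs "$demo_root/input" "$demo_root/output" "$demo_root/rules" || true)
echo "$check_result"
```

The helper returns a non-zero status when any directory is missing, so it could guard the call to run_medtagger_unix_mac.sh in a wrapper script.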
- Clone this repository
- You will need JDK 8 or above, Apache Maven, and Apache Ant installed
- To access GitHub package repositories for dependency resolution, you will need to generate a GitHub token with the read:packages permission, and edit settings.xml by replacing ${env.SECRET_ACTOR} with your GitHub username and ${env.SECRET_TOKEN} with the generated token
- When your modifications are complete, from the project root directory:
  - Run mvn clean install -s settings.xml
  - Run ant dist
  - A distribution zip will be created at MedTagger.zip in the root directory
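The build prerequisites listed above can be checked from a shell before starting. This is a rough sketch, not part of the MedTagger build: `missing_tools` is a hypothetical helper, and it assumes the JDK, Maven, and Ant expose the usual `javac`, `mvn`, and `ant` commands on your PATH. It does not verify the JDK version or the token in settings.xml.

```shell
#!/bin/sh
# Hypothetical helper (not part of MedTagger): prints each named command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands assumed to correspond to the prerequisites:
# JDK (javac), Apache Maven (mvn), Apache Ant (ant).
missing=$(missing_tools javac mvn ant)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on a single line.
    echo "Install before building:" $missing
else
    echo "All build tools found; from the project root, run:"
    echo "  mvn clean install -s settings.xml"
    echo "  ant dist"
fi
```

The script only prints the build commands rather than running them, so it is safe to execute outside the repository.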
Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, Ravikumar KE, Wu ST, Kullo IJ, Chute CG. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013;2013:149.
Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. npj Digital Medicine. 2019 Dec 17;2(1):1-7.