PRISM Search

ユーザが入力した医療文書に対し，PRISMアノテーションが施された医療文書群から類似するものを提示する文書検索システム

Web-based search engine to show similar clinical documents to a user-input clinical snippet

Requirements

Python 3.8 (could work with 3.6+ but not tested)
scikit-learn
mojimoji
MedNER-J
Flask

Installation

If you use poetry, just run poetry install. Otherwise, you can install the dependencies with pip (ver. 20.0.0+) by pip install -r requirements.txt. You may want to create a virtual environment first.

You need to prepare a PRISM-annotated document source for search. We prepared preprocess.py for this purpose. Please adapt the code for the data format of your document data. The script, prepro.py, is another example for PRISM's Q3 data.

After these setups completed, you should be able to run the server with python app.py in the Flask's development mode. Be aware that, by default, the app uses the PRISM Q3 data, which requires you to modify the DATA source in app.py for your preprocessed data.

The procedure to deploy this app to a production environment depends on the web-server's setting. Please consult with the administrators.

Usage

Submit a clinical document to find relevant text thereof at / (root)
You will see an NER result of your input and its top 3-ranked "similar" documents at /result
You can modify the similarity criteria:
- Options to calculate similarity among clinical docs
- Clinical NE tags to consider in similarity search

How it works

This app first apply PRISM-based clinical NER to your input document. The NER result is used for similarity calculation with a search-source documents, which are NER-ed in advance.

The current version's similarity calculation is simply based on what-is-called "bag of named entities" (BoNE). Like the "bag of words" (BoW), documents are vectorised into occurrence counts of the named entities appearing in the whole source. Then, the "similarity" among documents is calculated with the cosine-similarity measure.

This similarity calculation can be regarded as a baseline for this purpose. Further improvements could be implemented.

Development

Developed by Shuntaro Yada in Social Computing Lab. at NAIST.

Licence

To be announced.