Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

Python versions: 3.6.13, 3.8.13, 3.8.16, 3.10.12

We introduce Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle offers biomedical researchers and healthcare professionals an easy-to-use, all-in-one solution that requires minimal programming expertise.

Framework of Ascle

Ascle consists of three modules:

🌟 Generative Functions: For the first time, Ascle includes four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation;

Basic NLP Functions: Ascle includes 12 essential NLP functions such as word tokenization and sentence segmentation;

Query and Search Capabilities: Ascle provides user-friendly query and search functions on clinical databases.

⚙️ indicates that we provide our own fine-tuned models for this task.
⭐️ indicates that we conducted evaluations for this task.

Updates

17_05_2024 - We are currently updating Ascle. In the next version, Ascle will include the question-answering task based on the RAG framework and will support multiple languages for all tasks.
07_11_2023 - New Release v2.2: renamed the toolkit from EHRKit to Ascle and made it easier to use.
10_07_2023 - New Release v2.0: a large re-organization and improvement from v1.0.
24_05_2023 - New Release Pretrained Models for Machine Translation.
15_03_2022 - Merged the ehrkit folder to support off-the-shelf medical text processing.
10_03_2022 - Made all tests available in an ipynb file and updated to the most recent version.
17_12_2021 - Added a new folder, collated_tasks, containing the Fall 2021 functionalities.
11_05_2021 - Cleaned up the notebooks, fixed up the readme using depth=1.
04_05_2021 - Tests run-through added in tests.
22_04_2021 - Freezing development.
22_04_2021 - Completed the tutorials and readme.
20_04_2021 - Spring functionality finished -- mimic classification, summarization, and query extraction.

Setup

Download Repository

You can download Ascle as a git repository; simply clone it into a directory of your choice (add --depth 1 to git clone for a shallow copy that leaves out old versions and reduces the download size).

git clone https://github.com/Yale-LILY/Ascle.git

Environment

cd Ascle
python3 -m venv asclevir/
source asclevir/bin/activate
pip install -r requirements.txt

NOTE: if your Python version turns out to be incompatible with the scispacy installed above, you can install scispacy and its model separately with the following commands:

pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz

Then you are good to go!
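Before running the demo, it can help to confirm that the environment resolved correctly. The snippet below is a minimal sketch and not part of Ascle: `check_packages` is a hypothetical helper, and the package list is an assumption that covers only the two packages named in this section (`scispacy` and the `en_core_sci_sm` model). Extend it to match requirements.txt as needed.

```python
import importlib.util
import sys


def check_packages(names):
    """Map each top-level module name to True if it can be imported."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version.split()[0])
# Assumed list: only the packages mentioned in the setup notes above.
for name, found in check_packages(["scispacy", "en_core_sci_sm"]).items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it inside the activated `asclevir` environment; any package reported as MISSING can be installed with the pip commands above.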

Ascle Demo

We provide various generative functions and basic NLP functions. The quickest way to get started is to run demo.py:

cd Ascle
python demo.py

Note: this may take some time, as some packages will be downloaded.

Load Ascle

from Ascle import Ascle

# create Ascle 
med = Ascle()

Text Simplification

# Text Simplification
main_record = """
              The patient presents with symptoms of acute bronchitis,
              including cough, chest congestion, and mild fever.
              Auscultation reveals coarse breath sounds and occasional 
              wheezing. Based on the clinical examination, a diagnosis
              of acute bronchitis is made, and the patient is prescribed 
              a short course of bronchodilators and advised to rest and
              stay hydrated.
              """

# choose the model
layman_model = "ireneli1024/bart-large-elife-finetuned"

med.update_and_delete_main_record(main_record)

# call the text simplification function and print the output
print(med.get_layman_text(layman_model, min_length=20, max_length=70))

>> """
   The patient presents with symptoms of acute bronchitis including
   cough, chest congestion and mild fever. Auscultation reveals coarse 
   breath sounds and occasional wheezing. Based on these symptoms and 
   the patient's history of previous infections with the same condition, 
   the doctor decides that the patient is likely to have a cold or bronch.
   """

Machine Translation

main_record = """
              Myeloid derived suppressor cells (MDSC) are immature myeloid 
              cells with immunosuppressive activity. They accumulate in 
              tumor-bearing mice and humans with different types of cancer, 
              including hepatocellular carcinoma (HCC).
              """
              
med.update_and_delete_main_record(main_record)

# call the machine translation function and print the output
print(med.get_translation_mt5("French"))

>> """
   Les cellules suppressives dérivées de myéloïdes (MDSC) sont des
   cellules myéloïdes immatures ayant une activité immunosuppressive, 
   accumulées chez des souris et des humains ayant différents types de 
   cancer, y compris le carcinome hépatocellulaire (HCC).
   """

Fine-tuned Models

In Ascle, users can access any publicly available language model. Additionally, we provide users with 32 of our fine-tuned models which are suitable for multiple-choice QA, text simplification, and machine translation tasks.

Please feel free to download our fine-tuned models.

Get involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!

Contributors

This project started in 2018. Many people have participated and contributed:

Rui Yang*, Qingcheng Zeng*, Keen You*, Yujie Qiao*, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D.L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li

In memory of Prof. Dragomir Radev, who dedicated so much to this project.

Paper

Please find our paper at https://arxiv.org/abs/2311.16588.

Citation

@misc{yang2023ascle,
      title={Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation}, 
      author={Rui Yang and Qingcheng Zeng and Keen You and Yujie Qiao and Lucas Huang and Chia-Chun Hsieh and Benjamin Rosand and Jeremy Goldwasser and Amisha D Dave and Tiarnan D. L. Keenan and Emily Y Chew and Dragomir Radev and Zhiyong Lu and Hua Xu and Qingyu Chen and Irene Li},
      year={2023},
      eprint={2311.16588},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

We will continue to maintain and update this repository. If you have any questions, feel free to contact us.

Rui Yang: yang_rui@u.nus.edu
Dr. Irene Li: ireneli@ds.itc.u-tokyo.ac.jp