We introduce Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle offers biomedical researchers and healthcare professionals an easy-to-use, all-in-one solution that requires minimal programming expertise.
Ascle consists of three modules:
🌟 Generative Functions: For the first time, Ascle includes four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation;
Basic NLP Functions: Ascle includes 12 essential NLP functions, such as word tokenization and sentence segmentation;
Query and Search Capabilities: Ascle provides user-friendly query and search functions on clinical databases.
⚙️ indicates that we have our fine-tuned models for this particular task.
⭐️ indicates that we conducted evaluations for this particular task.
17_05_2024 - We are currently updating Ascle. In the next version, Ascle will include the question-answering task based on the RAG framework and will support multiple languages for all tasks.
07_11_2023 - New Release v2.2: we renamed the toolkit from EHRKit to Ascle, which is easier to use!
10_07_2023 - New Release v2.0: a large re-organization and improvement from v1.0.
24_05_2023 - New Release Pretrained Models for Machine Translation.
15_03_2022 - Merged the ehrkit folder to support off-the-shelf medical text processing.
10_03_2022 - Made all tests available in an ipynb file and updated the most recent version.
17_12_2021 - Added a new folder, collated_tasks, containing the Fall 2021 functionalities.
11_05_2021 - Cleaned up the notebooks and fixed up the readme using depth=1.
04_05_2021 - Added a test run-through in tests.
22_04_2021 - Freezing development.
22_04_2021 - Completed the tutorials and readme.
20_04_2021 - Spring functionality finished -- mimic classification, summarization, and query extraction.
You can download Ascle as a git repository; simply clone it into a directory of your choice. A shallow clone (for example, passing --depth 1 to git clone) leaves out the old versions and reduces the download size.
git clone https://github.com/Yale-LILY/Ascle.git
cd Ascle
python3 -m venv asclevir/
source asclevir/bin/activate
pip install -r requirements.txt
NOTE: your Python version may not be compatible with scispacy, in which case you can install it with the following commands:
pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz
Then you are good to go!
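To confirm that the packages installed above are visible to the active virtual environment, a small check along these lines can help. This is only a sketch: the helper name `check_environment` is hypothetical and not part of Ascle, and the module list assumes the import names `spacy`, `scispacy`, and `en_core_sci_sm` (the model package installed by the pip command above).

```python
import sys
from importlib.util import find_spec

# Assumed top-level import names; adjust the tuple if your setup differs.
REQUIRED_MODULES = ("spacy", "scispacy", "en_core_sci_sm")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be imported.

    Hypothetical helper, not part of Ascle. It only looks the module up
    and does not actually import it, so it is quick to run.
    """
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If any entry is reported as missing, re-running the two pip commands above inside the activated environment is a reasonable first step.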
We provide various generative functions and basic NLP functions. For a quick start, run demo.py:
cd Ascle
python demo.py
Note: this may take some time, as some packages will be downloaded.
from Ascle import Ascle
# create an Ascle instance
med = Ascle()
# Text Simplification
main_record = """
The patient presents with symptoms of acute bronchitis,
including cough, chest congestion, and mild fever.
Auscultation reveals coarse breath sounds and occasional
wheezing. Based on the clinical examination, a diagnosis
of acute bronchitis is made, and the patient is prescribed
a short course of bronchodilators and advised to rest and
stay hydrated.
"""
# choose the model
layman_model = "ireneli1024/bart-large-elife-finetuned"
med.update_and_delete_main_record(main_record)
# call the text simplification function and print the output
print(med.get_layman_text(layman_model, min_length=20, max_length=70))
>> """
The patient presents with symptoms of acute bronchitis including
cough, chest congestion and mild fever. Auscultation reveals coarse
breath sounds and occasional wheezing. Based on these symptoms and
the patient's history of previous infections with the same condition,
the doctor decides that the patient is likely to have a cold or bronch.
"""
# Machine Translation
main_record = """
Myeloid derived suppressor cells (MDSC) are immature myeloid
cells with immunosuppressive activity. They accumulate in
tumor-bearing mice and humans with different types of cancer,
including hepatocellular carcinoma (HCC).
"""
med.update_and_delete_main_record(main_record)
# call the machine translation function and print the output
print(med.get_translation_mt5("French"))
>> """
Les cellules suppressives dérivées de myéloïdes (MDSC) sont des
cellules myéloïdes immatures ayant une activité immunosuppressive,
accumulées chez des souris et des humains ayant différents types de
cancer, y compris le carcinome hépatocellulaire (HCC).
"""
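The examples above process one record at a time: set the main record, then call a generative function. When several notes need the same treatment, that pattern could be wrapped in a loop. The sketch below is illustrative only: `simplify_records` is a hypothetical helper, not part of Ascle, and it assumes nothing beyond the two method calls shown above (`update_and_delete_main_record` and `get_layman_text` with `min_length` and `max_length` keyword arguments).

```python
def simplify_records(med, records, model, min_length=20, max_length=70):
    """Run text simplification over several records, preserving input order.

    Hypothetical helper, not part of Ascle. `med` is expected to be an
    Ascle instance (or any object exposing the same two methods).
    """
    outputs = []
    for record in records:
        # Replace the current main record before each call,
        # mirroring the single-record example above.
        med.update_and_delete_main_record(record)
        outputs.append(
            med.get_layman_text(model, min_length=min_length, max_length=max_length)
        )
    return outputs


# Usage (requires Ascle to be installed and will download the model):
#   from Ascle import Ascle
#   notes = ["first clinical note ...", "second clinical note ..."]
#   results = simplify_records(
#       Ascle(), notes, "ireneli1024/bart-large-elife-finetuned"
#   )
```

Passing the Ascle object in as an argument, rather than creating it inside the helper, lets one instance be reused across calls.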
In Ascle, users can access any publicly available language model. Additionally, we provide users with 32 of our fine-tuned models which are suitable for multiple-choice QA, text simplification, and machine translation tasks.
Please feel free to download our fine-tuned models:
Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!
This project started in 2018. Many people have participated and contributed:
Rui Yang*, Qingcheng Zeng*, Keen You*, Yujie Qiao*, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D.L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li
In memory of Prof. Dragomir Radev, who dedicated so much to this project.
Please find our paper at https://arxiv.org/abs/2311.16588.
@misc{yang2023ascle,
title={Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation},
author={Rui Yang and Qingcheng Zeng and Keen You and Yujie Qiao and Lucas Huang and Chia-Chun Hsieh and Benjamin Rosand and Jeremy Goldwasser and Amisha D Dave and Tiarnan D. L. Keenan and Emily Y Chew and Dragomir Radev and Zhiyong Lu and Hua Xu and Qingyu Chen and Irene Li},
year={2023},
eprint={2311.16588},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
We will continue to maintain and update this repository. If you have any questions, feel free to contact us.
Rui Yang: yang_rui@u.nus.edu
Dr. Irene Li: ireneli@ds.itc.u-tokyo.ac.jp