MedEaseIne is a project that condenses medical texts and documents into patient-friendly summaries and answers questions grounded in a given medical context. It uses the T5 (Text-To-Text Transfer Transformer) model from the Hugging Face Transformers library for abstractive summarization and question answering on medical texts. The T5 model is fine-tuned on medical-domain data (the PubMed Subset Bulk and SumPubMed datasets) to generate concise summaries. The project also uses the Google Gemini API to extract information from texts or documents, summarize it in patient-friendly language, and support question answering.
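As a rough illustration of the T5 summarization step described above, the sketch below loads a fine-tuned checkpoint with the Hugging Face Transformers library and generates a summary. The model path (relative to the repository root), the `summarize: ` task prefix, and the generation settings are assumptions made for illustration; the project's actual configuration may differ.

```python
# Assumed location of the fine-tuned summarization model, relative to the repo root.
MODEL_DIR = "app/models/summarization/summarization_final_trained_model"


def build_summarization_input(text: str, prefix: str = "summarize: ") -> str:
    """Collapse whitespace and prepend the (assumed) T5 task prefix."""
    return prefix + " ".join(text.split())


def summarize(text: str, model_dir: str = MODEL_DIR, max_new_tokens: int = 150) -> str:
    """Generate an abstractive summary with a T5 checkpoint stored in model_dir."""
    # Imported lazily so the helper above can be used without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    inputs = tokenizer(
        build_summarization_input(text),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumed input limit
    )
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        num_beams=4,  # assumed beam width
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` requires the model files to be present in the directory named by `MODEL_DIR`.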
.
├── app/
│   ├── models/
│   │   ├── question_answering/
│   │   │   └── checkpoint-1500/
│   │   └── summarization/
│   │       └── summarization_final_trained_model
│   ├── static/
│   │   ├── css/
│   │   │   └── styles.css
│   │   ├── images/
│   │   └── js/
│   │       └── script.js
│   ├── templates/
│   │   ├── about_us.html
│   │   ├── contact_us.html
│   │   ├── home.html
│   │   ├── layout.html
│   │   ├── qna.html
│   │   ├── result.html
│   │   └── summarize.html
│   ├── utils/
│   │   ├── question_answering.py
│   │   └── summarization.py
│   ├── __init__.py
│   ├── forms.py
│   └── routes.py
├── scripts/
│   ├── dataset_creation_pubmed_subset.ipynb
│   ├── fine_tune_question_answer.ipynb
│   ├── finetuning_T5_for_summarization.ipynb
│   ├── subset_sumpubmed.py
│   └── sumpubmed_dataset_script.py
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── run.py
- All the datasets that were used to fine-tune our models can be found here.
- All the scripts that were used to train our models can be found here.
- This script was used to extract the abstracts and the full content from PubMed articles. The CHV (Consumer Health Vocabulary) dataset was then used to replace complex terms with consumer vocabulary, simplifying the abstracts. The simplified abstracts served as the target text for our T5-small model, and the full content served as the source.
- This script and this script were used to create a subset of the SumPubMed dataset.
- This script was used to fine-tune our T5-small model for the summarization task in a Google Colab environment.
- This script was used to fine-tune our T5-base model for question answering. The SQuAD dataset was used for the fine-tuning process.
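The term-replacement step used to simplify the abstracts can be sketched as a dictionary lookup. The three-entry mapping below is a hypothetical stand-in for the CHV dataset, and the matching strategy (case-insensitive, whole words, longest term first) is an assumption rather than the approach taken in the project's own script.

```python
import re

# Hypothetical sample mapping; the real CHV dataset is far larger.
# Keys must be lowercase because matches are lowercased before lookup.
CHV_SAMPLE = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
    "renal": "kidney",
}


def simplify_terms(text: str, vocabulary: dict) -> str:
    """Replace each known technical term with its consumer-friendly equivalent."""
    if not vocabulary:
        return text
    # Try longer terms first so multi-word phrases win over shorter overlaps.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda match: vocabulary[match.group(0).lower()], text)


print(simplify_terms("Hypertension raises the risk of myocardial infarction.", CHV_SAMPLE))
```

Word-boundary matching keeps unrelated words such as "adrenal" intact. Note that this sketch does not preserve the capitalization of the replaced term.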
- Medical Documents and Texts Summarization:
  - T5 for summarization of medical articles.
  - Gemini for summary generation in patient-friendly language.
  - Gemini for summarization of medical documents such as Discharge Summaries, Medical Histories, Diagnostic Reports, Clinical Notes, Treatment Plans, etc.
- Question Answering:
  - Answers medical questions based on a given medical context.
  - Gives users the flexibility to ask any question related to the context, or to ask common health queries.
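For the question-answering feature, a T5 model fine-tuned on SQuAD typically receives the question and its context as one text string. The sketch below assumes the common `question: ... context: ...` input format and the checkpoint path shown in the project tree; neither is confirmed as the format used by the project's own code.

```python
# Assumed location of the fine-tuned QA checkpoint, relative to the repo root.
QA_MODEL_DIR = "app/models/question_answering/checkpoint-1500"


def build_qa_input(question: str, context: str) -> str:
    """Join question and context into a single T5 input string (assumed format)."""
    clean_question = " ".join(question.split())
    clean_context = " ".join(context.split())
    return f"question: {clean_question} context: {clean_context}"


def answer_question(question: str, context: str, model_dir: str = QA_MODEL_DIR) -> str:
    """Generate an answer to the question from the given context."""
    # Imported lazily so the helper above can be used without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    inputs = tokenizer(
        build_qa_input(question, context),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumed input limit
    )
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=64,  # assumed answer length limit
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```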
- Python
- Flask
- HTML/CSS/JavaScript
- Pandas
- NumPy
- TensorFlow
- PyTorch
- Transformers
- Google Gemini API
- Google Colab
git clone https://github.com/Tangsang2003/Abstractive-Summarization-and-Question-Answering-of-Medical-Texts-using-T5.git
- For Windows:
python -m venv venv
- For Linux and MacOS:
python3 -m venv venv
- Activate the virtual environment. For Windows:
venv\Scripts\activate
- For Linux and MacOS:
source venv/bin/activate
pip install -r "requirements.txt"
- Go to app and create directories:
.
└── app/
    └── models/
        ├── question_answering/
        │   └── checkpoint-1500/
        └── summarization/
            └── summarization_final_trained_model
$ cd app
$ mkdir models
$ cd models
$ mkdir question_answering
$ cd question_answering
$ mkdir checkpoint-1500
$ cd ..
$ mkdir summarization
$ cd summarization
$ mkdir summarization_final_trained_model
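As a cross-platform alternative to the shell commands above, the same directories can be created from Python. This is a minimal sketch: the function name `create_model_dirs` is hypothetical, and it assumes the script is run from the repository root so that `app` resolves correctly.

```python
from pathlib import Path

# Directory layout taken from the tree above, relative to the app directory.
MODEL_DIRS = (
    "models/question_answering/checkpoint-1500",
    "models/summarization/summarization_final_trained_model",
)


def create_model_dirs(app_dir="app"):
    """Create the model directories under app_dir and return their paths."""
    created = []
    for relative in MODEL_DIRS:
        path = Path(app_dir) / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_model_dirs()` from the repository root produces the same layout as the `mkdir` sequence, and calling it again leaves existing directories untouched.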
- Download the T5 model for Summarization from here.
- Copy all the files to the summarization_final_trained_model directory.
- Download the T5 model for Question Answering from here.
- Copy all the files to the checkpoint-1500 directory.
- Obtain a GOOGLE_API_KEY from here.
- Set up the SECRET_KEY and GOOGLE_API_KEY in your system's Environment Variables.
- You can set your SECRET_KEY to be anything.
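To check that both variables are visible to Python before starting the app, a small helper along these lines can be used. The function name `load_settings` is hypothetical, and how the application itself reads these values is not shown in this README.

```python
import os

# Environment variable names taken from the setup steps above.
REQUIRED_VARS = ("SECRET_KEY", "GOOGLE_API_KEY")


def load_settings(environ=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no arguments reads from the real environment; passing a dictionary is convenient for testing.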
python run.py
If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch for your feature: git checkout -b feature-name
- Commit your changes: git commit -m 'Add some feature'
- Push to the branch: git push origin feature-name
- Submit a pull request.
- Development of a user feedback mechanism to improve our T5 model for summarization and question-answering.
- Creation of further-refined datasets.
- Deployment of the web application on AWS, Azure or Google Cloud.
- Explore partnerships with healthcare institutions, research organizations, or educational platforms to integrate MedEaseIne into clinical workflows, medical education, or research activities.