/medpodgpt

MedPodGPT: A multilingual audio-augmented large language model for medical research and education

Primary LanguagePythonGNU Affero General Public License v3.0AGPL-3.0

MedPodGPT

Benchmarking Multilingual Medical Large Language Models (LLMs)



CODE_LICENSE DATA_LICENSE Model Weight License Python 3.10

๐ŸŽ‰ Announcements

[2024.7.14] Our AI Platform MedPodGPT is publicly available. It is an online platform for deploying our latest multimodal foundation models for medical and clinical applications. Please try it out if you are interested!

[2024.7.12] Our preprint is available online! Please check it!

[2024.7.12] We are releasing a new benchmark encompassing the latest USMLE Step 1, Step 2, Step 3, and Ethics to further advance the filed. Check our database here.

[2024.7.11] We open-sourced the source codes of our MedPodGPT: medical LLMs in your pocket and benchmarking multilingual medical LLMs.

๐Ÿ“š Table of Contents

๐Ÿ’ป Installation

pip install -r requirements.txt

๐Ÿš€ Quick Start

๐Ÿฃ Train Lightweight Models

For lightweight models (2B, 7B, and 8B), we optimize the entire model. Please check and setup hyper-parameters in config_small.yml.

python main_small.py

๐Ÿฅ Train Heavy Models

For lager and heavy models (>8B), we optimize the Low-rank Adapter (LoRA). Please check and setup hyper-parameters in config_large.yml.

python main_large.py

๐Ÿค Train Quantized Large Models

We also provide support for quantizing larger models, e.g., LLaMA 3 70B model, using the GPTQ algorithm and then optimizing the LoRA. The large models can be deployed on consumer GPUs after quantization.

We can directly use the Hugging Face transformers package to conduct quantization.

python quantization_HF.py --repo "meta-llama/Meta-Llama-3-70B-Instruct" --bits 4 --group_size 128

Alternatively, we also provide a quantization script by using the Python AutoGPTQ package.

python quantization.py "meta-llama/Meta-Llama-3-70B-Instruct" "./gptq_model" "medical" --bits 4 --group_size 128 --desc_act 1 --dtype float16 --seqlen 2048 --damp 0.01

Then, we need to upload the model to Hugging Face,

python upload_quantized_model.py --repo "shuyuej/MedLLaMA3-70B-BASE-MODEL-QUANT" --folder_path "./gptq_model"

Lastly, we optimize the LoRA module,

python main_quantization.py

๐Ÿ“Š Performance Evaluation

All inferences are conducted using the vLLM engine. We use inference_pretrain.py and inference_single_model.py for larger models (>8B) and inference_sequential.py for smaller models (2B/7B/8B). Please check here for more information.

Note

Mistral 7B on Hindi MMLU Benchmarks:
Please un-comment this line.
To address the issue of repeated content in some responses, we applied a repetition_penalty during inference.

๐Ÿ“œ Prompt Format

We simply use Directly answer the best option: instead of Answer: to better guide LLMs to generate the best option and to easier extract the best option from the responses.
Please modify these lines if you wanna try other prompts.

Note

LLaMA 3 8B on Hindi MMLU Benchmarks:
Please modify these lines.
Because most responses are in mixed English-Hindi or English, we used เค•เฅƒเคชเคฏเคพ เคชเฅเคฐเคถเฅเคจ เค•เคพ เค‰เคคเฅเคคเคฐ เคนเคฟเค‚เคฆเฅ€ เคฎเฅ‡เค‚ เคฆเฅ‡เค‚ เค”เคฐ เคธเฅ€เคงเฅ‡ เคธเคฌเคธเฅ‡ เค…เคšเฅเค›เฅ‡ เคตเคฟเค•เคฒเฅเคช เค•เฅ‡ เคธเคพเคฅ เคœเคตเคพเคฌ เคฆเฅ‡เค‚: (Please answer the question in Hindi and directly answer the best option:) to guide the model.

english_prompt = "Directly answer the best option:"
english_prompt_pubmedqa = "Directly answer yes/no/maybe:"
hindi_prompt = "เคธเฅ€เคงเฅ‡ เคธเคฌเคธเฅ‡ เค…เคšเฅเค›เฅ‡ เคตเคฟเค•เคฒเฅเคช เค•เฅ‡ เคธเคพเคฅ เคœเคตเคพเคฌ เคฆเฅ‡เค‚:"
french_prompt = "Rรฉpondez directement avec la meilleure option:"
spanish_prompt = "Responde directamente con la mejor opciรณn:"
chinese_prompt = "็›ดๆŽฅๅ›ž็ญ”ๆœ€ไผ˜้€‰้กน:"

๐Ÿ”ง Single GPU For Lightweight Models

Important

Please note that if you wanna conduct model inference using multiple GPUs, the GPUs' memory cannot be successfully released. Please modify these lines and make use of this sh file.

inference_sequential.py

Sequentially evaluate the performance of multiple checkpoints (models).
Please note that we use --eval_pretrain to indicate whether to evaluate the original pre-trained model.

python inference_sequential.py --eval_pretrain True --id 35166 52749 70332 87915

๐Ÿ› ๏ธ Distributed GPUs For Heavy Models

Sequentially evaluate the performance of the original pre-trained model and all the checkpoints.
Special Notice: Please change the checkpoint IDs and CUDA_VISIBLE_DEVICES in the inference_large.sh file.

sh inference_large.sh

inference_pretrain.py

Only evaluate the performance of the original pre-trained model.

python inference_pretrain.py

inference_single_model.py

Only evaluate the performance of a single checkpoint (model).
Please note that --id is the checkpoint id.

python inference_single_model.py --id 35166

๐Ÿค– OpenAI ChatGPT Support

We also offer support for running OpenAI ChatGPT inference using API. Please enter your OpenAI API Key here.

Warning

Please note that OpenAI ChatGPT API is extremely expensive.
Please only use it if you have a budget for it!

python inference_chatgpt.py

๐Ÿ“š Dataset Description

For now, we released a demo dataset for you to run the codes. Please follow our instructions to transcribe your own podcasts and build your own dataset.

The podcasts data used for the continual pre-training of MedPodGPT:

๐Ÿ† Benchmarks and Results

Multilingual Benchmarks Description

We utilized a comprehensive set of medical benchmarks from the most widely spoken languages in the world, including English, Mandarin, French, Spanish, and Hindi.

Language Dataset # test examples # of choices Link Ref
English MedExpQA 125 5 Link Paper
MedQA 1273 4 Link Paper
MedMCQA 4183 4 Link Paper
PubMedQA 1000 3 Link Paper
MMLU - Anatomy 135 4 Link Paper
MMLU - Clinical Knowledge 265 4 Link Paper
MMLU - College Biology 144 4 Link Paper
MMLU - College Medicine 173 4 Link Paper
MMLU - Medical Genetics 100 4 Link Paper
MMLU - Professional Medicine 272 4 Link Paper
French MedExpQA 125 5 Link Paper
MedMCQA 622 5 Link Paper
MMLU - Anatomy 135 4 Link Paper
MMLU - Clinical Knowledge 265 4 Link Paper
MMLU - College Biology 144 4 Link Paper
MMLU - College Medicine 173 4 Link Paper
MMLU - Medical Genetics 100 4 Link Paper
MMLU - Professional Medicine 272 4 Link Paper
Spanish HEAD-QA 2742 4 Link Paper
MedExpQA 125 5 Link Paper
MMLU - Anatomy 135 4 Link Paper
MMLU - Clinical Knowledge 265 4 Link Paper
MMLU - College Biology 144 4 Link Paper
MMLU - College Medicine 173 4 Link Paper
MMLU - Medical Genetics 100 4 Link Paper
MMLU - Professional Medicine 272 4 Link Paper
Chinese MedQA-MCMLE 3426 4 Link Paper
CMMLU - Anatomy 148 4 Link Paper
CMMLU - Clinical Knowledge 237 4 Link Paper
CMMLU - College Medicine 273 4 Link Paper
CMMLU - Medical Genetics 176 4 Link Paper
CMMLU - Traditional Chinese Medicine 185 4 Link Paper
CMMLU - Virology 169 4 Link Paper
Hindi MMLU - Anatomy 135 4 Link Paper
MMLU - Clinical Knowledge 265 4 Link Paper
MMLU - College Biology 144 4 Link Paper
MMLU - College Medicine 173 4 Link Paper
MMLU - Medical Genetics 100 4 Link Paper
MMLU - Professional Medicine 272 4 Link Paper

Performance on In-domain Benchmarks

Zero-shot Cross-lingual Performance

๐Ÿ”ฅ Real-world Deployment

For real-world deployment, please refer to the vLLM Distributed Inference and Serving and OpenAI Compatible Server.

๐ŸŽฏ Automatic Speech Recognition

In the scripts folder, we provide Automatic Speech Recognition (ASR) service.

python audio2text.py

โš’๏ธ Dataset Builder

We used the following codes to pre-process our transcripts and generate training dataset. Please check these lines for different languages support.

python database_builder.py
python merge_database.py

๐Ÿ› ๏ธ Upload and Download Models

In the scripts folder, we offer support for both uploading and downloading models.

To upload your checkpoints to Hugging Face model repo,

python upload_model.py --repo "shuyuej/DrGemma2B" --id 35166 52749 70332 87915

To download your model or files from Hugging Face repo,

python download_model.py --repo "shuyuej/DrGemma2B" --repo_type "model" --save_dir "./save_folder"

๐Ÿ–ผ๏ธ Structure of the Code

At the root of the project, you will see:

โ”œโ”€โ”€ requirements.txt
โ”œโ”€โ”€ main_small.py
โ”œโ”€โ”€ main_large.py
โ”œโ”€โ”€ main_quantization.py
โ”œโ”€โ”€ config_small.yml
โ”œโ”€โ”€ config_large.yml
โ”œโ”€โ”€ config_quantization.yml
โ”œโ”€โ”€ config_chatgpt.yml
โ”œโ”€โ”€ lib
โ”‚   โ”œโ”€โ”€ data_manager.py
โ”‚   โ”œโ”€โ”€ model_loader_small.py
โ”‚   โ”œโ”€โ”€ model_loader_large.py
โ”‚   โ”œโ”€โ”€ model_loader_quantization.py
โ”‚   โ”œโ”€โ”€ evaluation_small.py
โ”‚   โ”œโ”€โ”€ evaluation_large.py
โ”‚   โ””โ”€โ”€ evaluation_chatgpt.py
โ”œโ”€โ”€ inference
โ”‚   โ”œโ”€โ”€ inference_large.sh
โ”‚   โ”œโ”€โ”€ inference_chatgpt.py
โ”‚   โ”œโ”€โ”€ inference_pretrain.py
โ”‚   โ”œโ”€โ”€ inference_sequential.py
โ”‚   โ””โ”€โ”€ inference_single_model.py
โ”œโ”€โ”€ download_files
โ”‚   โ”œโ”€โ”€ download_model_from_hf.py
โ”‚   โ””โ”€โ”€ download_model_to_local.py
โ”œโ”€โ”€ quantization
โ”‚   โ”œโ”€โ”€ quantization.py
โ”‚   โ””โ”€โ”€ upload_quantized_model.py
โ”œโ”€โ”€ scripts
โ”‚   โ”œโ”€โ”€ audio2text.py
โ”‚   โ”œโ”€โ”€ download_model.py
โ”‚   โ”œโ”€โ”€ upload_model.py
โ”‚   โ”œโ”€โ”€ database_builder.py
โ”‚   โ””โ”€โ”€ merge_database.py
โ”œโ”€โ”€ benchmark
โ”‚   โ”œโ”€โ”€ chinese_cmmlu
โ”‚   โ”œโ”€โ”€ chinese_mcmle
โ”‚   โ”œโ”€โ”€ english_medexpqa
โ”‚   โ”œโ”€โ”€ english_medmcqa
โ”‚   โ”œโ”€โ”€ english_medqa
โ”‚   โ”œโ”€โ”€ english_mmlu
โ”‚   โ”œโ”€โ”€ english_pubmedqa
โ”‚   โ”œโ”€โ”€ english_usmle
โ”‚   โ”œโ”€โ”€ french_medexpqa
โ”‚   โ”œโ”€โ”€ french_medmcqa
โ”‚   โ”œโ”€โ”€ french_mmlu
โ”‚   โ”œโ”€โ”€ hindi_mmlu
โ”‚   โ”œโ”€โ”€ spanish_headqa
โ”‚   โ”œโ”€โ”€ spanish_medexpqa
โ”‚   โ””โ”€โ”€ spanish_mmlu
โ””โ”€โ”€ utils
    โ”œโ”€โ”€ answer_utils.py
    โ”œโ”€โ”€ benchmark_utils.py
    โ”œโ”€โ”€ eval_chatgpt_utils.py
    โ”œโ”€โ”€ eval_large_utils.py
    โ”œโ”€โ”€ eval_small_utils.py
    โ”œโ”€โ”€ test_extraction_chinese.py
    โ”œโ”€โ”€ test_extraction_english.py
    โ”œโ”€โ”€ test_extraction_french.py
    โ”œโ”€โ”€ test_extraction_hindi.py
    โ”œโ”€โ”€ test_extraction_spanish.py
    โ””โ”€โ”€ utils.py

๐Ÿ™ Citation

If you find our work useful in your research, please consider citing it in your publications. We provide a BibTeX entry below.

@article {Jia2024medpodgpt,
	author       = {Jia, Shuyue and Bit, Subhrangshu and Searls, Edward and Claus, Lindsey and Fan, Pengrui and Jasodanand, Varuna H. and Lauber, Meagan V. and Veerapaneni, Divya and Wang, William M. and Au, Rhoda and Kolachalama, Vijaya B},
	title        = {{MedPodGPT}: A multilingual audio-augmented large language model for medical research and education},
	elocation-id = {2024.07.11.24310304},
	year         = {2024},
	doi          = {10.1101/2024.07.11.24310304},
	publisher    = {Cold Spring Harbor Laboratory Press},
	abstract     = {The proliferation of medical podcasts has generated an extensive repository of audio content, rich in specialized terminology, diverse medical topics, and expert dialogues. Here we introduce a computational framework designed to enhance large language models (LLMs) by leveraging the informational content of publicly accessible medical podcast data. This dataset, comprising over 4,300 hours of audio content, was transcribed to generate over 39 million text tokens. Our model, MedPodGPT, integrates the varied dialogue found in medical podcasts to improve understanding of natural language nuances, cultural contexts, and medical knowledge. Evaluated across multiple benchmarks, MedPodGPT demonstrated an average improvement of 2.31\% over standard open-source benchmarks and showcased an improvement of 2.58\% in its zero-shot multilingual transfer ability, effectively generalizing to different linguistic contexts. By harnessing the untapped potential of podcast content, MedPodGPT advances natural language processing, offering enhanced capabilities for various applications in medical research and education.Competing Interest StatementV.B.K. is on the scientific advisory board for Altoida Inc. and serves as a consultant to AstraZeneca. R.A. is a scientific advisor to Signant Health and NovoNordisk. The remaining authors declare no competing interests.Funding StatementNational Institutes of HealthAuthor DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).Yes I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesAll data produced are available online at https://github.com/vkola-lab/MedPodGPT.https://github.com/vkola-lab/MedPodGPT},
	URL          = {https://www.medrxiv.org/content/early/2024/07/12/2024.07.11.24310304},
	eprint       = {https://www.medrxiv.org/content/early/2024/07/12/2024.07.11.24310304.full.pdf},
	journal      = {medRxiv}
}

๐Ÿ“ง Contact

Core Contributor and Maintainer:

Database Contributor and Maintainer:

If you have any questions, please drop us an email at brucejia@bu.edu, sbit@bu.edu, and nsearls@bu.edu.

๐Ÿ”จ Contribution

We always welcome contributions to help make MedPodGPT Library better. If you would like to contribute, please submit a pull request.

๐Ÿ™Œ Acknowledgement

The MedPodGPT Library is created and maintained by the Kolachalama Laboratory.