
Multilingual Medicine: Model, Dataset, Benchmark, Code

Primary language: Python • License: Apache License 2.0 (Apache-2.0)


Covering English, Chinese, French, Hindi, Spanish, and Arabic so far.

Python 3.10 • PyTorch 2.1.2 • transformers • accelerate

📃 Paper • 🌐 Demo • 🤗 ApolloCorpus • 🤗 XMedBench


🌈 Update

  • [2024.04.25] MedJamba released; for its training and evaluation code, refer to its repo.
  • [2024.03.07] Paper released.
  • [2024.02.12] ApolloCorpus and XMedBench are published! 🎉
  • [2024.01.23] Apollo repo is published! 🎉

Results

🤗 Apollo-0.5B • 🤗 Apollo-1.8B • 🤗 Apollo-2B • 🤗 Apollo-6B • 🤗 Apollo-7B • 🤗 Apollo-34B • 🤗 Apollo-72B

🤗 MedJamba

🤗 Apollo-0.5B-GGUF • 🤗 Apollo-2B-GGUF • 🤗 Apollo-6B-GGUF • 🤗 Apollo-7B-GGUF


Usage Format

  • 0.5B, 1.8B, 2B, 6B, 7B: User:{query}\nAssistant:{response}<|endoftext|>
  • 34B, 72B: <|User|>:{query}\n<|Assistant|>:{response}<|endoftext|>
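As a minimal sketch of the two templates above, the Python helper below formats a single-turn exchange. The function name `build_prompt` and the size strings are hypothetical, not part of the Apollo codebase. It also assumes that at inference time the prompt ends right after the assistant tag, so the model generates the response itself and stops at `<|endoftext|>`.

```python
from typing import Optional

# Hypothetical helper, not part of the Apollo codebase.
# Templates follow the "Usage Format" list: small models use "User:"/"Assistant:",
# the 34B and 72B models use "<|User|>:"/"<|Assistant|>:".
SMALL_SIZES = {"0.5B", "1.8B", "2B", "6B", "7B"}
LARGE_SIZES = {"34B", "72B"}
EOS = "<|endoftext|>"


def build_prompt(query: str, model_size: str, response: Optional[str] = None) -> str:
    """Format one user/assistant exchange for an Apollo model.

    With response=None, return an inference prompt that stops after the
    assistant tag (assumption: the model then generates until EOS).
    With a response, return the full string terminated by EOS.
    """
    if model_size in SMALL_SIZES:
        user_tag, assistant_tag = "User:", "Assistant:"
    elif model_size in LARGE_SIZES:
        user_tag, assistant_tag = "<|User|>:", "<|Assistant|>:"
    else:
        raise ValueError(f"unknown model size: {model_size!r}")

    prompt = f"{user_tag}{query}\n{assistant_tag}"
    if response is None:
        return prompt
    return f"{prompt}{response}{EOS}"
```

For example, `build_prompt("What causes anemia?", "7B")` yields `User:What causes anemia?\nAssistant:` (with a real newline), while passing a response as well appends it followed by `<|endoftext|>`.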

Dataset & Evaluation

  • Dataset 🤗 ApolloCorpus


    • Zip File
    • Data category
      • Pretrain:
        • data item:
          • json_name: {data_source}_{language}_{data_type}.json
          • data_source: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
          • language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
          • data_type: text (raw text) or qa (QA pairs generated from text)
          • data_type==text: a list of strings
            [
              "string1",
              "string2",
              ...
            ]
            
          • data_type==qa: a list of conversations, each a flat list of strings alternating questions and answers
            [
              [
                "q1",
                "a1",
                "q2",
                "a2",
                ...
              ],
              ...
            ]
            
      • SFT:
        • json_name: {data_source}_{language}.json
        • data_source: code, general, math, medicalExam, medicalPatient
        • data item: a list of conversations, each a flat list of strings alternating questions and answers
            [
              [
                "q1",
                "a1",
                "q2",
                "a2",
                ...
              ],
              ...
            ]
          
  • Evaluation 🤗 XMedBench

    • EN:

      • MedQA-USMLE
      • MedMCQA
      • PubMedQA: not used in the paper because its results fluctuated too much.
      • MMLU-Medical
        • Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
    • ZH:

      • MedQA-MCMLE
      • CMB-single: Not used in the paper
        • 2,000 randomly sampled single-answer multiple-choice questions.
      • CMMLU-Medical
        • Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
      • CExam: Not used in the paper
        • 2,000 randomly sampled multiple-choice questions.
    • ES: Head_qa

    • FR: Frenchmedmcqa

    • HI: MMLU_HI

      • Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
    • AR: MMLU_Ara

      • Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
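The ApolloCorpus file layouts described above can be read with a short Python sketch. It assumes file names follow `{data_source}_{language}_{data_type}.json` (pretraining) and `{data_source}_{language}.json` (SFT), and that no field contains an underscore; `parse_json_name`, `to_qa_pairs`, and `load_items` are hypothetical helpers, not the repo's own loaders.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical helpers for ApolloCorpus files (not the repo's own loaders).
# Assumption: names are {data_source}_{language}_{data_type}.json (pretrain)
# or {data_source}_{language}.json (SFT), with no "_" inside any field.


def parse_json_name(filename: str) -> dict:
    """Split a corpus file name into its naming-convention fields."""
    stem = filename[: -len(".json")] if filename.endswith(".json") else filename
    parts = stem.split("_")
    if len(parts) == 3:  # pretraining file: source, language, data type
        source, language, data_type = parts
    elif len(parts) == 2:  # SFT file: source and language only
        source, language = parts
        data_type = None
    else:
        raise ValueError(f"unexpected file name: {filename!r}")
    return {"data_source": source, "language": language, "data_type": data_type}


def to_qa_pairs(conversation: List[str]) -> List[Tuple[str, str]]:
    """Turn a flat [q1, a1, q2, a2, ...] list into (question, answer) tuples."""
    if len(conversation) % 2 != 0:
        raise ValueError("expected alternating questions and answers")
    return list(zip(conversation[0::2], conversation[1::2]))


def load_items(raw_json: str, data_type: Optional[str]) -> list:
    """Decode one file: text files hold a list of strings, while qa and
    SFT files hold a list of flat QA conversations."""
    data = json.loads(raw_json)
    if data_type == "text":
        return data
    return [to_qa_pairs(conversation) for conversation in data]
```

For instance, `parse_json_name("medicalExam_zh.json")` returns `{"data_source": "medicalExam", "language": "zh", "data_type": None}`, and the resulting `data_type` can be passed straight to `load_items`.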

Results reproduction


We use Gemma-2B as an example.

  1. Download the datasets for the project:

    bash 0.download_data.sh
    
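  2. Prepare the test and dev sets for a specific model:

    • Create test data with the model's special tokens; you can use ./util/check.ipynb to check a model's special tokens. Quote the script name, since an unquoted & would make bash split the command.
    bash "1.data_process_test&dev.sh"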
    
  3. Prepare the training data for a specific model (tokenize the data in advance):

    • You can adjust the training data order and the number of training epochs in this step.
    bash 2.data_process_train.sh
    
  4. Train the model

    • To train on multiple nodes, refer to ./scripts/multi_node_train_*.sh
    bash 3.single_node_train_gemma.sh
    
  5. (Optional) Proxy-tuning: improve model capabilities directly, without fine-tuning the model

    bash src/proxy-tuning/scripts/eval/proxy_tuning.sh
    
  6. Evaluate your model: generate benchmark scores

    bash 4.eval.sh
    
  7. Evaluate your model: chat with your checkpoints in the terminal

    python ./src/evaluate/cli_demo.py --model_name='./ckpts/your/path/tfmr'
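Proxy-tuning, as generally described in the literature, is a decoding-time method: at each step, a large base model's next-token logits are shifted by the difference between a small tuned "expert" model and the same small model before tuning (the "anti-expert"), then renormalized. The sketch below shows that general technique on plain Python lists of logits; it is not taken from src/proxy-tuning, and `proxy_tuned_distribution` is a hypothetical name.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (subtracting the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_distribution(
    base_logits: List[float],
    expert_logits: List[float],
    antiexpert_logits: List[float],
) -> List[float]:
    """Next-token distribution under proxy-tuning (general technique sketch).

    base: large untuned model; expert: small tuned model;
    anti-expert: the small model before tuning. All share one vocabulary.
    """
    if not (len(base_logits) == len(expert_logits) == len(antiexpert_logits)):
        raise ValueError("all three models must share the same vocabulary")
    # Add the tuning "delta" learned by the small model to the large model's logits.
    shifted = [
        b + (e - a) for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)
    ]
    return softmax(shifted)
```

Because only logits are combined, the large model's weights are never updated; the trade-off is running three forward passes per generated token, and all three models must use the same tokenizer.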
    

Acknowledgment

Citation

Please use the following citation if you intend to use our dataset for training or evaluation:

@misc{wang2024apollo,
   title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
   author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
   year={2024},
   eprint={2403.03640},
   archivePrefix={arXiv},
   primaryClass={cs.CL}
}
