- Clone this repository and navigate to the DrugAssist folder

```bash
git clone https://github.com/blazerye/DrugAssist.git
cd DrugAssist
```

- Install packages

```bash
conda create -n drugassist python=3.8 -y
conda activate drugassist
pip install -r requirements.txt
```
We release a sample dataset on Hugging Face at blazerye/MolOpt-Instructions, which you can use for training; the full dataset is still in progress.
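As a minimal sketch (not part of the official scripts), the sample dataset can be inspected with the Hugging Face `datasets` library; the `train` split name is an assumption, so check the dataset card for the actual splits:

```python
from datasets import load_dataset

# Load the MolOpt-Instructions sample dataset from the Hugging Face Hub.
# The "train" split name is an assumption; see the dataset card for details.
dataset = load_dataset("blazerye/MolOpt-Instructions", split="train")
print(dataset[0])  # inspect one instruction record
```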
You can use LoRA to fine-tune the Llama2-7B-Chat model on the MolOpt-Instructions dataset with the following command:

```bash
sh run_sft_lora.sh
```
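The script wraps standard LoRA fine-tuning. As a rough sketch of what such a setup looks like with the `peft` library (the rank, alpha, and target modules below are illustrative assumptions, not the values used by `run_sft_lora.sh`):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative LoRA setup; r, lora_alpha, and target_modules are assumptions,
# not the settings used in run_sft_lora.sh.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```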
You can merge the LoRA weights to generate full model weights using the following command:

```bash
python merge_model.py \
    --base_model $BASE_MODEL_PATH \
    --lora_model $LORA_MODEL_PATH \
    --output_dir $OUTPUT_DIR \
    --output_type huggingface \
    --verbose
```
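Merging adapters is the same operation that `peft` exposes directly; a minimal sketch under assumed placeholder paths (this is not the logic of `merge_model.py` itself):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths; substitute your own checkpoints.
base_model_path = "meta-llama/Llama-2-7b-chat-hf"
lora_model_path = "./lora_checkpoint"
output_dir = "./drugassist_merged"

base = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base, lora_model_path)

# Fold the LoRA deltas into the base weights and save a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(output_dir)
```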
You can use Gradio to launch the web demo by running the following command:

```bash
python gradio_service.py \
    --base_model $FULL_MODEL_PATH \
    --ip $IP \
    --port $PORT
```
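A stripped-down sketch of what such a service looks like (the prompt handling, model path, and generation settings below are assumptions, not the contents of `gradio_service.py`):

```python
import gradio as gr
from transformers import pipeline

# Placeholder path to the merged full model; generation settings are illustrative.
generator = pipeline("text-generation", model="./drugassist_merged")

def respond(message, history):
    # history is unused in this minimal sketch; a real service would build a chat prompt.
    output = generator(message, max_new_tokens=256, do_sample=True)
    return output[0]["generated_text"]

demo = gr.ChatInterface(fn=respond)
demo.launch(server_name="0.0.0.0", server_port=7860)
```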
If you find DrugAssist useful for your research and applications, please cite it using this BibTeX:

```bibtex
@misc{ye2024drugassist,
  title={DrugAssist: A Large Language Model for Molecule Optimization},
  author={Ye, Geyan and Cai, Xibao and Lai, Houtim and Wang, Xing and Huang, Junhong and Wang, Longyue and Liu, Wei and Zeng, Xiangxiang},
  publisher={arXiv:2401.10334},
  year={2024},
}
```
We appreciate LLaMA, Chinese-LLaMA-Alpaca-2, Alpaca, iDrug, and many other related projects for their open-source contributions.