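This project collects experiments on Embedding models, covering Embedding model evaluation, ReRank model fine-tuning, Embedding model fine-tuning, and Embedding model quantization.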
Scripts: src/baseline_eval
Contents:
- bge_base_zh_eval.py: evaluates the BGE-base-zh-v1.5 model; its scores serve as the baseline.
For the evaluation results, see docs/model_evaluation.md.
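The metrics reported in docs/model_evaluation.md are not reproduced here. The sketch below assumes a retrieval-style evaluation built on two metrics: hit rate@k, which the LlamaIndex guide in reference 4 reports, and MRR. It shows the core computation over precomputed embeddings. The dict layout (queries, corpus and relevant doc IDs keyed by ID) mirrors LlamaIndex's EmbeddingQAFinetuneDataset. The function names `cosine` and `evaluate_retrieval` are hypothetical. In a real run the vectors would come from encoding text with BGE-base-zh-v1.5.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def evaluate_retrieval(
    query_embs: Dict[str, List[float]],
    corpus_embs: Dict[str, List[float]],
    relevant_docs: Dict[str, List[str]],
    top_k: int = 5,
) -> Dict[str, float]:
    """Hit rate@top_k and MRR@top_k averaged over all queries (hypothetical helper)."""
    if not query_embs:
        return {"hit_rate": 0.0, "mrr": 0.0}
    hits, reciprocal_rank_sum = 0, 0.0
    for qid, q_emb in query_embs.items():
        # Rank all corpus documents by cosine similarity to the query, keep the top k.
        ranked = sorted(
            corpus_embs,
            key=lambda doc_id: cosine(q_emb, corpus_embs[doc_id]),
            reverse=True,
        )[:top_k]
        relevant = set(relevant_docs.get(qid, []))
        # Only the first relevant document found counts toward the reciprocal rank.
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                hits += 1
                reciprocal_rank_sum += 1.0 / rank
                break
    n = len(query_embs)
    return {"hit_rate": hits / n, "mrr": reciprocal_rank_sum / n}


if __name__ == "__main__":
    # Toy 2-D vectors standing in for BGE-base-zh-v1.5 embeddings.
    queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
    corpus = {"d1": [0.9, 0.1], "d2": [0.1, 0.9], "d3": [0.7, 0.7]}
    relevant = {"q1": ["d1"], "q2": ["d3"]}
    print(evaluate_retrieval(queries, corpus, relevant, top_k=2))
    # prints {'hit_rate': 1.0, 'mrr': 0.75}
```

Brute-force scoring like this is only practical for small corpora. For larger evaluations a vectorized or index-based search would replace the inner `sorted` call.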
Embedding model fine-tuning:
- Using sentence-transformers v3:
python src/finetune/ft_embedding.py
- Using AutoTrain:
cd ./src/finetune
CUDA_VISIBLE_DEVICES=0 autotrain --config config.yml
- Using LlamaIndex Finetune Embeddings: see reference 5.
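The training objective inside src/finetune/ft_embedding.py is not shown in this README. The sentence-transformers guides in references 1 and 2 fine-tune with MultipleNegativesRankingLoss, optionally combined with Matryoshka training (reference 3). That loss uses in-batch negatives: each anchor's own positive is the correct class, and every other positive in the batch acts as a negative. The pure-Python sketch below shows what the loss computes, assuming (anchor, positive) pairs, cosine similarity and a scale of 20. Cosine similarity and scale 20 are assumed here to match the library defaults. The function names are hypothetical, and a real run would use the library's loss on torch tensors.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_batch_negatives_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy in which anchor i must pick positive i out of the batch.

    Every other positive in the batch serves as a negative for anchor i.
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty anchor/positive batches")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Logits: scaled similarity of this anchor to every positive in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp, then -log softmax(logits)[i].
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    aligned = in_batch_negatives_loss(anchors, positives)
    swapped = in_batch_negatives_loss(anchors, positives[::-1])
    print(aligned < swapped)  # True: correctly paired batches get a lower loss
```

Because the negatives come for free from the rest of the batch, larger batch sizes generally give this loss more negatives per anchor. That is why the referenced guides pair it with data made only of (query, positive passage) pairs.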
References:
1. Training and Finetuning Embedding Models with Sentence Transformers v3: https://huggingface.co/blog/train-sentence-transformers
2. Fine-tune Embedding models for Retrieval Augmented Generation (RAG): https://www.philschmid.de/fine-tune-embedding-model-for-rag
3. Overview of Matryoshka Embedding Models (in Chinese): https://huggingface.co/blog/zh/matryoshka
4. Finetune Embeddings: https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding/
5. NLP (86): Fine-tuning the Embedding Model for the Retrieve Stage of a RAG Framework (in Chinese): https://mp.weixin.qq.com/s?__biz=MzU2NTYyMDk5MQ==&mid=2247486333&idx=1&sn=29d00d472647bc5d6e336bec22c88139&chksm=fcb9b2edcbce3bfb42ea149d96fb1296b10a79a60db7ad2da01b85ab223394191205426bc025&token=1376257911&lang=zh_CN#rd
6. How to Fine-Tune Custom Embedding Models Using AutoTrain: https://huggingface.co/blog/abhishek/finetune-custom-embeddings-autotrain
7. Upload a dataset to the Hub: https://huggingface.co/docs/datasets/v1.16.0/upload_dataset.html