Pinned Repositories
RLCD
Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
LM
mutil-gpu-inference
Reinforcement-learning
This repository provides the classic DQN algorithm and tests it on the 'CartPole-v0' environment in Gym.
LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
fastllm
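A pure C++, cross-platform LLM acceleration library that is callable from Python. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.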
LanShanPi's Repositories
LanShanPi/Reinforcement-learning
This repository provides the classic DQN algorithm and tests it on the 'CartPole-v0' environment in Gym.
LanShanPi/LM
LanShanPi/mutil-gpu-inference