paraGONG's Stars
Zjh-819/LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
OpenLMLab/MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
yfzhang114/Generalization-Causality
Reading notes on a wide range of research on domain generalization, domain adaptation, causality, robustness, prompt, optimization, and generative models
princeton-nlp/LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
RLHFlow/RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
XayahSuSuSu/Latex-HNUThesisTemplate
LaTeX template for Hunan University undergraduate theses (broad science category)
louieworth/awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
zepingyu0512/awesome-llm-understanding-mechanism
Awesome papers in LLM interpretability
openai/transformer-debugger
wangshusen/DRL
Deep Reinforcement Learning
openai/weak-to-strong
Improbable-AI/curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizXgXU)
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
thunlp/UltraChat
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
OFA-Sys/InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
ZigeW/data_management_LLM
Collection of training data management explorations for large language models
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
ydyjya/Awesome-LLM-Safety
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.
project-baize/baize-chatbot
Let ChatGPT teach your own chatbot in hours with a single GPU!
PAIR-code/lit
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
ridiculouz/LLMaAA
The official repository for paper "LLMaAA: Making Large Language Models as Active Annotators"
openai/openai-python
The official Python library for the OpenAI API
dsdanielpark/Bard-API
The unofficial Python package that returns responses from Google Bard using a cookie value.
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology