Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
This repository collects papers and other resources on verifier engineering. It accompanies the paper Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering. We update the paper and this repo regularly, and we welcome suggestions of any kind.
Note
🌟 Feel free to submit pull requests to share your work and insights from the perspective of verifier engineering - your contributions are always welcome!
Verifier Type | Verification Form | Verify Granularity | Verifier Source | Extra Training |
---|---|---|---|---|
Golden Annotation | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
Rule-based | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
Code Interpreter | Binary/Score/Text | Token/Thought Step/Full Trajectory | Program Based | No |
ORM | Binary/Score/Rank/Text | Full Trajectory | Model Based | Yes |
Language Model | Binary/Score/Rank/Text | Thought Step/Full Trajectory | Model Based | Yes |
Tool | Binary/Score/Rank/Text | Token/Thought Step/Full Trajectory | Program Based | No |
Search Engine | Text | Thought Step/Full Trajectory | Program Based | No |
PRM | Score | Token/Thought Step | Model Based | Yes |
Knowledge Graph | Text | Thought Step/Full Trajectory | Program Based | No |
Method | Search | Verify | Feedback | Task |
---|---|---|---|---|
STaR, RFT, WizardMath | Linear | Golden Annotation | Imitation Learning | Math |
CAG | Linear | Golden Annotation | Imitation Learning | RAG |
Self-Instruct | Linear | Rule-based | Imitation Learning | General |
Code Alpaca, WizardCoder | Linear | Rule-based | Imitation Learning | Code |
ILF-Code | Linear | Code Interpreter, Human | Imitation Learning | Code |
RAFT, RRHF | Linear | ORM | Imitation Learning | General |
SSO | Linear | Rule-based | Preference Learning | Alignment |
CodeUltraFeedback | Linear | Language Model | Preference Learning | Code |
Self-Rewarding | Linear | Language Model | Preference Learning | Alignment |
StructRAG | Linear | Language Model | Preference Learning | RAG |
MCTS-DPO | Tree | Language Model | Preference Learning | Math |
Chain of Preference Optimization | Tree | Language Model | Preference Learning | Reasoning |
LLAMA-BERRY | Tree | ORM | Preference Learning | Reasoning |
Math-Shepherd | Linear | Golden Annotation, Rule-based | Reinforcement Learning | Math |
RLTF, PPOCoder | Linear | Code Interpreter | Reinforcement Learning | Code |
RLAIF | Linear | Language Model | Reinforcement Learning | General |
SIRLC | Linear | Language Model | Reinforcement Learning | Reasoning |
RLFH | Linear | Language Model | Reinforcement Learning | Knowledge |
RLHF | Linear | ORM | Reinforcement Learning | Alignment |
Quark | Linear | Tool | Reinforcement Learning | Alignment |
ReST-MCTS | Tree | Language Model | Reinforcement Learning | Math |
CRITIC | Linear | Code Interpreter, Tool, Search Engine | Verifier-Aware | Math, Code, Knowledge, General |
Self-Debug | Linear | Code Interpreter | Verifier-Aware | Code |
Self-Refine | Linear | Language Model | Verifier-Aware | Alignment |
ReAct | Linear | Search Engine | Verifier-Aware | Knowledge |
Contrastive Decoding | Linear | Language Model | Verifier-Guided | General |
Chain-of-Verification | Linear | Language Model | Verifier-Guided | Knowledge |
Inverse Value Learning | Linear | Language Model | Verifier-Guided | General |
PRM | Linear | PRM | Verifier-Guided | Math |
KGR | Linear | Knowledge Graph | Verifier-Guided | Knowledge |
UoT | Tree | Language Model | Verifier-Guided | General |
ToT | Tree | Language Model | Verifier-Guided | Reasoning |
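
To make the table columns concrete, here is a minimal sketch of one combination: linear search, a rule-based verifier with binary output over the full trajectory, and verifier-guided selection. It is an illustration only and not taken from any listed paper. The names `Verdict`, `rule_based_verifier`, and `linear_search`, as well as the `Answer: ...` output convention, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Binary verification form: the candidate either passes or fails."""
    passed: bool


def rule_based_verifier(expected: str) -> Callable[[str], Verdict]:
    """Build a program-based verifier that needs no extra training.

    Assumption for this sketch: a trajectory ends with a line of the
    form 'Answer: <value>'.
    """
    def verify(trajectory: str) -> Verdict:
        # Full-trajectory granularity: only the final line is checked.
        lines = trajectory.strip().splitlines()
        final = lines[-1].strip() if lines else ""
        return Verdict(passed=(final == f"Answer: {expected}"))
    return verify


def linear_search(candidates: List[str],
                  verify: Callable[[str], Verdict]) -> Optional[str]:
    """Verifier-guided selection: return the first candidate that passes."""
    for candidate in candidates:
        if verify(candidate).passed:
            return candidate
    return None


candidates = [
    "2 + 2 = 5\nAnswer: 5",
    "2 + 2 = 4\nAnswer: 4",
]
selected = linear_search(candidates, rule_based_verifier("4"))
```

In this sketch the candidates are hard-coded; in practice they would be sampled from a model, and the verifier's output could instead drive imitation, preference, or reinforcement learning as in the Feedback column.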
If you find our repo useful in your research, please consider citing:
@article{VerifierEngineering,
title={Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering},
author={Xinyan Guan and Yanjiang Liu and Xinyu Lu and Boxi Cao and Ben He and Xianpei Han and Le Sun and Jie Lou and Bowen Yu and Yaojie Lu and Hongyu Lin},
journal={arXiv preprint arXiv:2411.11504},
url={https://arxiv.org/abs/2411.11504},
year={2024}
}