Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Overview

This repository collects papers and other resources on verifier engineering. It accompanies the paper Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering. We update the paper and this repository regularly, and we welcome suggestions of any kind.

Note

🌟 Feel free to submit pull requests to share your work and insights from the perspective of verifier engineering - your contributions are always welcome!

Overview of Common Verifiers

| Verifier Type | Verification Form | Verify Granularity | Verifier Source | Extra Training |
| --- | --- | --- | --- | --- |
| Golden Annotation | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
| Rule-based | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
| Code Interpreter | Binary/Score/Text | Token/Thought Step/Full Trajectory | Program Based | No |
| ORM | Binary/Score/Rank/Text | Full Trajectory | Model Based | Yes |
| Language Model | Binary/Score/Rank/Text | Thought Step/Full Trajectory | Model Based | Yes |
| Tool | Binary/Score/Rank/Text | Token/Thought Step/Full Trajectory | Program Based | No |
| Search Engine | Text | Thought Step/Full Trajectory | Program Based | No |
| PRM | Score | Token/Thought Step | Model Based | Yes |
| Knowledge Graph | Text | Thought Step/Full Trajectory | Program Based | No |
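
One way to read the table above is as a record type whose fields are the table's columns. The sketch below is only an illustration under that assumption: the `Verifier` class, its field names, and the `exact_match` helper are hypothetical and do not come from the paper or from any library. It models the Golden Annotation row as a program-based verifier that returns a binary verdict and needs no extra training.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

# Hypothetical record type: one field per column of the verifier table.
@dataclass(frozen=True)
class Verifier:
    name: str                     # "Verifier Type" column
    forms: Tuple[str, ...]        # "Verification Form" column
    granularity: Tuple[str, ...]  # "Verify Granularity" column
    source: str                   # "Verifier Source" column
    extra_training: bool          # "Extra Training" column
    # Callable that checks a response against a reference (illustrative only).
    verify: Callable[[str, str], Union[bool, float, str]]


def exact_match(response: str, reference: str) -> bool:
    """Binary verdict: does the response equal the annotated answer?"""
    return response.strip() == reference.strip()


# The Golden Annotation row, expressed with the hypothetical record type.
golden_annotation = Verifier(
    name="Golden Annotation",
    forms=("Binary", "Text"),
    granularity=("Thought Step", "Full Trajectory"),
    source="Program Based",
    extra_training=False,
    verify=exact_match,
)
```

Calling `golden_annotation.verify(" 42 ", "42")` compares the stripped strings and returns a binary verdict. A model-based verifier such as an ORM would instead wrap a trained scorer in `verify` and set `extra_training=True`.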

A Verifier Engineering Perspective on Post-training Methods

| Method | Search | Verify | Feedback | Task |
| --- | --- | --- | --- | --- |
| STaR, RFT, WizardMath | Linear | Golden Annotation | Imitation Learning | Math |
| CAG | Linear | Golden Annotation | Imitation Learning | RAG |
| Self-Instruct | Linear | Rule-based | Imitation Learning | General |
| Code Alpaca, WizardCoder | Linear | Rule-based | Imitation Learning | Code |
| ILF-Code | Linear | Code Interpreter, Human | Imitation Learning | Code |
| RAFT, RRHF | Linear | ORM | Imitation Learning | General |
| SSO | Linear | Rule-based | Preference Learning | Alignment |
| CodeUltraFeedback | Linear | Language Model | Preference Learning | Code |
| Self-Rewarding | Linear | Language Model | Preference Learning | Alignment |
| StructRAG | Linear | Language Model | Preference Learning | RAG |
| MCTS-DPO | Tree | Language Model | Preference Learning | Math |
| Chain of Preference Optimization | Tree | Language Model | Preference Learning | Reasoning |
| LLAMA-BERRY | Tree | ORM | Preference Learning | Reasoning |
| Math-Shepherd | Linear | Golden Annotation, Rule-based | Reinforcement Learning | Math |
| RLTF, PPOCoder | Linear | Code Interpreter | Reinforcement Learning | Code |
| RLAIF | Linear | Language Model | Reinforcement Learning | General |
| SIRLC | Linear | Language Model | Reinforcement Learning | Reasoning |
| RLFH | Linear | Language Model | Reinforcement Learning | Knowledge |
| RLHF | Linear | ORM | Reinforcement Learning | Alignment |
| Quark | Linear | Tool | Reinforcement Learning | Alignment |
| ReST-MCTS | Tree | Language Model | Reinforcement Learning | Math |
| CRITIC | Linear | Code Interpreter, Tool, Search Engine | Verifier-Aware | Math, Code, Knowledge, General |
| Self-Debug | Linear | Code Interpreter | Verifier-Aware | Code |
| Self-Refine | Linear | Language Model | Verifier-Aware | Alignment |
| ReAct | Linear | Search Engine | Verifier-Aware | Knowledge |
| Contrastive Decoding | Linear | Language Model | Verifier-Guided | General |
| Chain-of-Verification | Linear | Language Model | Verifier-Guided | Knowledge |
| Inverse Value Learning | Linear | Language Model | Verifier-Guided | General |
| PRM | Linear | PRM | Verifier-Guided | Math |
| KGR | Linear | Knowledge Graph | Verifier-Guided | Knowledge |
| UoT | Tree | Language Model | Verifier-Guided | General |
| ToT | Tree | Language Model | Verifier-Guided | Reasoning |
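
To make the three stages concrete, the sketch below walks through one combination from the table: Linear search, Golden Annotation verification, and Imitation Learning feedback. It is a simplified illustration, not the procedure of any listed method. All function names are hypothetical, the "model" is a toy stand-in, and verification is reduced to an exact string match on the whole response.

```python
import itertools
from typing import Callable, List, Tuple


def linear_search(generate: Callable[[str], str], prompt: str, n: int) -> List[str]:
    """Search: sample n candidates, each produced as a single unbranched response."""
    return [generate(prompt) for _ in range(n)]


def verify_golden(candidate: str, gold: str) -> bool:
    """Verify: binary check against the annotated answer (assumed exact match)."""
    return candidate.strip() == gold.strip()


def collect_imitation_data(
    generate: Callable[[str], str],
    dataset: List[Tuple[str, str]],
    n: int = 4,
) -> List[Tuple[str, str]]:
    """Feedback: keep only verified (prompt, response) pairs for supervised fine-tuning."""
    kept = []
    for prompt, gold in dataset:
        for candidate in linear_search(generate, prompt, n):
            if verify_golden(candidate, gold):
                kept.append((prompt, candidate))
    return kept


# Toy stand-in for a model: alternates between a right and a wrong answer.
_answers = itertools.cycle(["4", "5"])

def toy_generate(prompt: str) -> str:
    return next(_answers)


dataset = [("2 + 2 = ?", "4")]
sft_pairs = collect_imitation_data(toy_generate, dataset, n=4)
```

With the alternating toy generator, two of the four sampled candidates pass verification, so `sft_pairs` holds two copies of the correct pair. Swapping `verify_golden` for a scoring function and keeping ranked pairs instead of filtered ones would move the same loop toward the Preference Learning rows.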

Citation

If you find our repo useful in your research, please consider citing:

@article{VerifierEngineering,
    title={Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering},
    author={Xinyan Guan and Yanjiang Liu and Xinyu Lu and Boxi Cao and Ben He and Xianpei Han and Le Sun and Jie Lou and Bowen Yu and Yaojie Lu and Hongyu Lin},
    journal={arXiv preprint arXiv:2411.11504},
    url={https://arxiv.org/abs/2411.11504},
    year={2024}
}
