Pinned Repositories
abel
SOTA Open-Source Math LLM
anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
auto-j
Generative Judge for Evaluating Alignment
factool
FacTool: Factuality Detection in Generative AI
MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
O1-Journey
O1 Replication Journey: A Strategic Progress Report – Part I
OlympicArena
This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
OpenResearcher
OpenResearcher, an advanced Scientific Research Assistant
ProX
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
ReAlign
Reformatted Alignment
Generative Artificial Intelligence Research Lab (GAIR)'s Repositories
GAIR-NLP/O1-Journey
O1 Replication Journey: A Strategic Progress Report – Part I
GAIR-NLP/factool
FacTool: Factuality Detection in Generative AI
GAIR-NLP/anole
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
GAIR-NLP/OpenResearcher
OpenResearcher, an advanced Scientific Research Assistant
GAIR-NLP/MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
GAIR-NLP/abel
SOTA Open-Source Math LLM
GAIR-NLP/auto-j
Generative Judge for Evaluating Alignment
GAIR-NLP/ProX
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
GAIR-NLP/ReAlign
Reformatted Alignment
GAIR-NLP/OlympicArena
This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
GAIR-NLP/Entropy-ABF
Official implementation for 'Extending LLMs’ Context Window with 100 Samples'
GAIR-NLP/alignment-for-honesty
GAIR-NLP/weak-to-strong-reasoning
GAIR-NLP/OPO
GAIR-NLP/benbench
Benchmarking Benchmark Leakage in Large Language Models
GAIR-NLP/scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
GAIR-NLP/ReasonEval
Evaluating Mathematical Reasoning Beyond Accuracy
GAIR-NLP/MetaCritique
Evaluate the Quality of Critique
GAIR-NLP/MoPS
[ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"
GAIR-NLP/BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
GAIR-NLP/Preference-Dissection
GAIR-NLP/cs2916
GAIR-NLP/SimulateBench
GPT as Human
GAIR-NLP/Safety-J
Safety-J: Evaluating Safety with Critique
GAIR-NLP/walnut-plan
The Walnut Plan
GAIR-NLP/self-improvement-reversal
GAIR-NLP/ChineseFactEval
GAIR-NLP/math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨