Official implementation of REFINER: Reasoning Feedback on Intermediate Representations. An accompanying blog post is also available.
This repo presents REFINER, an interaction-based framework for natural language reasoning tasks. REFINER refines LMs' reasoning capabilities through feedback. Our work is the first to investigate how interacting with fine-grained reasoning feedback on intermediate reasoning steps impacts the performance of LMs on reasoning tasks.
We propose to solve these tasks by forcing the model to generate intermediate hypotheses (z) and improving them via structured feedback. We introduce an interactive framework named REFINER, made of two separate models: (a) a CRITIC model trained to provide structured feedback on intermediate reasoning steps and (b) a GENERATOR model trained to solve the reasoning task by first generating intermediate reasoning steps. The core idea of REFINER is to exploit the interaction between the generator model and the critic model, where the generator's intermediate reasoning steps are improved via structured feedback from the critic.
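The generator-critic interaction described above can be sketched as a simple loop. This is a minimal illustration, not the repository's code: the `refine` function, the callable signatures, and the toy generator and critic are all hypothetical, and `max_turns=4` only assumes that the `--number_turn 4` flag used in the training commands bounds the number of feedback turns.

```python
from typing import Callable, Optional

# Hypothetical interfaces (assumptions for illustration, not the repo's API):
#   generator(x, previous_z, feedback) -> new intermediate hypothesis z
#   critic(x, z) -> structured feedback, or None when no error is found
Generator = Callable[[str, Optional[str], Optional[str]], str]
Critic = Callable[[str, str], Optional[str]]


def refine(x: str, generator: Generator, critic: Critic, max_turns: int = 4) -> str:
    """Iteratively improve the intermediate hypothesis z using critic feedback."""
    z = generator(x, None, None)          # first attempt, no feedback yet
    for _ in range(max_turns):
        feedback = critic(x, z)
        if feedback is None:              # critic is satisfied, stop early
            break
        z = generator(x, z, feedback)     # regenerate conditioned on feedback
    return z


# Toy stand-ins for the two models, used only to exercise the loop.
def toy_generator(x: str, previous_z: Optional[str], feedback: Optional[str]) -> str:
    if feedback is None:
        return "8 - 3"                    # deliberately wrong first hypothesis
    return previous_z.replace("-", "+")   # "repair" the operator after feedback


def toy_critic(x: str, z: str) -> Optional[str]:
    return None if "+" in z else "The operator in the equation is incorrect."


question = "Tom has 8 marbles and finds 3 more. How many marbles does he have?"
print(refine(question, toy_generator, toy_critic))
```

In the toy run, the first hypothesis is rejected once and the loop stops as soon as the critic returns no feedback.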
- Compatible with Python 3.8.
- Dependencies can be installed using requirements.txt.
- The codebase is built around the Hugging Face ecosystem and wandb (for monitoring and experiment management).
Start by cloning the repository:
git clone git@github.com:debjitpaul/refiner.git
Install VirtualEnv using the following (optional):
$ [sudo] pip install virtualenv
Create and activate your virtual environment (optional):
$ virtualenv -p python3 venv
$ source venv/bin/activate
Install all the required packages:
$ pip install -r requirements.txt
Data | Output | Description |
---|---|---|
Math Word Problem | Math Equations (z) and Answers (y) | Generate an equation given a math word problem. |
Synthetic Natural Language Reasoning | Reasoning steps (z) and Conclusion (y) | This task requires the model to perform deductive reasoning and generate intermediate reasoning steps z and conclusions y using closed-world rules and facts. |
Moral Stories | Moral Norm (z) and Moral Action (y) | Given a context x consisting of a situation, an intention, and an immoral action, the model needs to generate the moral norm z and the moral action y. |
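To make the (x, z, y) notation concrete, here is a small sketch for the math word problem task: the intermediate hypothesis z is an equation, and the answer y follows from evaluating it. The `ReasoningExample` class, the `evaluate_equation` helper, and the sample problem are hypothetical and do not reflect the repository's actual data format.

```python
import ast
import operator
from dataclasses import dataclass


@dataclass
class ReasoningExample:
    """Hypothetical container for one task instance."""
    x: str  # input context (e.g. the word problem)
    z: str  # intermediate hypothesis (e.g. the equation)
    y: str  # final output (e.g. the answer)


# Only the four basic arithmetic operators are supported in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_equation(equation: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression: " + equation)

    return _eval(ast.parse(equation, mode="eval").body)


# Made-up example, for illustration only.
example = ReasoningExample(
    x="A baker has 12 rolls and sells 5. How many rolls are left?",
    z="12 - 5",
    y="7",
)
print(evaluate_equation(example.z) == float(example.y))
```

A check like this is one way to see why feedback on z is useful: an error in the equation propagates directly to the answer.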
- Train a baseline model using PPO.
- Train a Generator model without the Critic in the loop (Warm Start).
- Train the warm-start Generator model with the Critic in the loop. For training, we used an oracle critic.
- Train REFINER with Low-Rank Adaptation of Large Language Models (LoRA).
python3 src/scripts/finetune.py --training-file path_train_data --validation-file path_val_data --language-model google/flan-t5-base --model-dir flan_t5_large_model --epochs 10 --batch-size 8
python3 src/scripts/finetune.py --training-file path_train_data --validation-file path_val_data --language-model google/flan-t5-base --model-dir flan_t5_large_model --epochs 10 --batch-size 8
python3 src/scripts/train_refiner.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique --epochs 10 --batch-size 8 --number_turn 4
python3 src/scripts/test_predict.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique --epochs 10 --batch-size 8 --number_turn 4
python3 src/scripts/test_predict.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique --lora True --epochs 10 --batch-size 8 --number_turn 4
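For readers unfamiliar with LoRA, the following dependency-free sketch shows the general idea behind the low-rank update: a frozen weight matrix W is adapted as W + (alpha / r) * B A, where only the small matrices A and B are trained. This is the generic LoRA formulation, not the repository's implementation; the function names and all numeric values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A.

    w: d_out x d_in frozen weight, b: d_out x r, a: r x d_in.
    """
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def lora_param_count(d_out, d_in, r):
    """Number of trainable parameters in A and B combined."""
    return r * (d_in + d_out)


# Tiny made-up example with rank r = 1.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[0.5, 0.0]]
print(lora_weight(w, a, b, alpha=2, r=1))

# For a hypothetical 768 x 768 layer with r = 8, compare trainable parameters.
print(lora_param_count(768, 768, 8), "vs", 768 * 768)
```

The parameter comparison illustrates why LoRA makes fine-tuning cheaper: the adapter matrices are far smaller than the full weight matrix they modify.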
@misc{paul2023refiner,
title={REFINER: Reasoning Feedback on Intermediate Representations},
author={Paul, Debjit and Ismayilzada, Mete and Peyrard, Maxime and Borges, Beatriz and Bosselut, Antoine and West, Robert and Faltings, Boi},
eprint={2304.01904},
journal={arXiv preprint arXiv:2304.01904},
url={https://arxiv.org/pdf/2304.01904.pdf},
year={2023}
}