/dev-diff_history

repo_for_quick_fixes

Primary language: Python | License: MIT

Disclaimer

This is a PM copy for developing evals only. It should not be used unless you really know what you are doing.

You should run nethack_experiments/fulltext_history_rollout_pm.py --model_name_or_path=/tmp/convert_chkpt_mp/BIS-466/BIS-466/streaming_params_3000 --observation_save_dir=/tmp

You can download checkpoints from gs://focused-llama/bison-pl/BIS-466/pytorch

Athena setup is here: mrunner --config mrunner_configs.yaml --context athena_eval_sf run configs/config-eval-base.py

diff History for Neural Language Agents

This is the official code release accompanying the paper diff History for Neural Language Agents by Piterbarg, Pinto, and Fergus (arXiv preprint, 2024).


TL;DR: diff history is a method for improving the quality of LM generations in decision-making settings through low-resource instruction tuning. We show that small LMs can be data-efficiently tuned into highly competitive neural agents just by: (1) treating the LM as a policy; (2) extending model context lengths; (3) increasing the length of the history used to train/tune and prompt models; and (4) preprocessing observations in history with the Unix diff command.
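
As a rough illustration of step (4), the sketch below replaces every observation after the first with its diff against the previous one. It is not the code in utils.py: it uses Python's difflib as a stand-in for the Unix diff command, and the function names and the "action: ..." line format are hypothetical.

```python
import difflib


def diff_observation(prev_obs: str, curr_obs: str) -> str:
    """Return a unified diff (no context lines) between two text observations."""
    diff_lines = difflib.unified_diff(
        prev_obs.splitlines(),
        curr_obs.splitlines(),
        n=0,          # emit changed lines only, no surrounding context
        lineterm="",  # inputs carry no trailing newlines
    )
    # Drop the two file-header lines ("---" and "+++") and keep the hunks.
    return "\n".join(list(diff_lines)[2:])


def build_diff_history(observations, actions):
    """Interleave the first full observation with actions and observation diffs.

    The "action: ..." prefix is a hypothetical format chosen for this sketch.
    """
    history = [observations[0]]
    for prev_obs, curr_obs, action in zip(observations, observations[1:], actions):
        history.append(f"action: {action}")
        history.append(diff_observation(prev_obs, curr_obs))
    return "\n".join(history)


observations = [
    "HP: 10\nYou see a door.",
    "HP: 9\nYou see a door.",
]
print(build_diff_history(observations, ["north"]))
```

Unchanged lines such as "You see a door." appear only once in the resulting history, which is what lets a longer interaction history fit into the same context length.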

πŸ–‡οΈ Project Page πŸ–‡οΈ | πŸ’‘ Abstract πŸ’‘ | πŸ“ Paper PDF πŸ“ | πŸ“₯ Dataset Coming Soon! πŸ“₯


Installation 🔌

Start by cloning our repo recursively.

git clone --recursive git@github.com:upiterbarg/diff_history.git

Install core dependencies

conda env create --file=conda_env.yaml
conda activate test

Install external dependencies

BabyAI-Text

cd external/Grounding_LLMs_with_online_RL/babyai-text/babyai; pip install -e .; cd ..
cd gym-minigrid; pip install -e .; cd ..
pip install -e .; cd ../../..

NLE (with seed changes)

sed -i '344,349d' external/nle/nle/env/tasks.py
sed -i '365,366d' external/nle/nle/env/tasks.py
cd external/nle; python setup.py build; python setup.py install; cd ../..
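
Note that the two sed commands run in sequence, so the second line range is counted against tasks.py as it stands after the first deletion. The sketch below mimics this on a hypothetical 400-line stand-in file (not the real tasks.py); the helper delete_line_range is an illustrative name, not part of this repo.

```python
def delete_line_range(lines, first, last):
    """Drop 1-indexed lines first..last (inclusive), like sed 'first,lastd'."""
    return lines[: first - 1] + lines[last:]


# Stand-in for a source file: each entry records its original line number.
original = [f"line {i}" for i in range(1, 401)]

after_first = delete_line_range(original, 344, 349)
after_second = delete_line_range(after_first, 365, 366)

# Recover which original lines were removed by the two deletions combined.
removed = sorted(
    set(original) - set(after_second),
    key=lambda s: int(s.split()[1]),
)
print(removed)
# prints lines 344-349 and lines 371-372 of the original numbering
```

Because the edits are in place and position-based, they should be applied once to a fresh checkout of external/nle; running them a second time would delete further lines.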

NetHack language wrapper

cd external/nle-language-wrapper; python setup.py build; python setup.py install; python -m setup develop; cd ../..

Vision NLE utilities

pip install git+ssh://git@github.com/facebookresearch/moolib
cd external/dungeonsdata-neurips2022/experiment_code
pip install -e . && cd ../../..

Navigating the Repo 🗺️

--> conda_env.yaml         # Conda config.
--> finetune.py            # Finetuning script. Copies https://github.com/allenai/open-instruct/open_instruct/finetune.py, with token additions + masking.
--> action_textmap.py      # Interaction history tokens.
--> gpt2_resize.py         # Resize GPT-2 context length.
--> utils.py               # Various utilities: configuring custom stop generation, computing diffs, setting seeds everywhere.
--> scripts                # Bash scripts
----- / launch.sh                    # Sample instruction tuning launch script
--> ds_configs             # Distributed training configs
----- / stage3_offloading_accelerate.conf    #  ZeRO Stage 3
--> nethack_experiments    # NetHack experiment code
----- / diff_history_rollout.py       # Test LMs with diff history in NetHack.
----- / fulltext_history_rollout.py   # Test LMs with full text history in NetHack.
----- / generate_aa_dataset.py        # Generate a dataset with full games using AutoAscend.
----- / format_interaction_histories.py      # Format interaction histories
----- / wrappers.py                   # Define pixel and LM interaction history wrappers for NetHack.
--> babyaitext_experiments
----- / diff_history_rollout.py       # Test LMs with diff history in BabyAI-Text
----- / fulltext_history_rollout.py   # Test LMs with full text history in BabyAI-Text
----- / generate_babyai_dataset.py    # Generate BabyAI-Text dataset with full games from the BabyAI bot
----- / format_interaction_histories.py      # Format interaction histories
----- / babyai_text_bot.py            # BabyAI Text bot.
----- / lm_wrappers.py                # Wrapper over BabyAI-Text for formatting interaction histories.
--> external               # Various external dependencies