RCI Agent for MiniWoB++

Welcome to the codebase for our paper, "Language Models can Solve Computer Tasks". In this codebase, you will find the implementation of our RCI agent, which uses a pre-trained language model to execute computer tasks in MiniWoB++ benchmark guided by natural language. The agent employs a simple RCI prompting scheme that allows it to improve its outputs.

[Website] [Arxiv Paper] [PDF]

Dependencies

The RCI agent is implemented in Python 3.9 and requires the following dependencies:

gym
openai
selenium
Pillow
regex

pip install -r requirements.txt

Note: MiniWoB++ is not officially supported on Windows. Please refer to this issue.

Usage

Setup

To run the code, you must first install MiniWoB++ and configure your OpenAI API key. MiniWoB++ is integrated with the OpenAI Gym environment. Navigate to the computergym directory and execute the following command to install it:

cd computergym
pip install -e .

Once that's done, you need to write your OpenAI API key in the example_config.json file, then rename the file to config.json

Run

To run the code, simply execute the following command:

python main.py --env [TASK NAME] --llm [LLM NAME] --num-episodes [NUM EPISODES] --erci [NUM Explicit RCI] --irci [NUM Implicit RCI] --sgrounding

Here are the arguments you need to specify:

--env: Name of the MiniWoB++ task you want to run. You can see the list of available tasks in available_tasks.txt
--llm: Name of the language model you want to use. The model name and corresponding API name are specified below:
- chatgpt: "gpt-3.5-turbo"
- davinci: "text-davinci-003"
- ada: "ada"
- babbage: "babbage"
- curie: "curie"
- davinci1: "davinci"
- davinci2: "text-davinci-002"
--num-episodes: Number of episodes to run the task
--erci: The number of explicit RCI loop for an action plan. -1 will remove the action plan sampling.
--irci: The number of implicit RCI loop for the agent grounding.
--sgrounding: If this is True, then the state grounding update is enabled.
--headless: If this is True, then the MiniWoB++ environment will run in headless mode.

Consider running the following command to verify if everything is functioning correctly:

python main.py --env choose-list --llm chatgpt --num-episodes 1 --irci 1 --sgrounding

Evaluation

Our project's approach has yielded impressive results, with our agent achieving the second-highest score out of all tested models. We have observed that our agent outperforms the baselines, with the exception of CC-Net (SL + RL), which uses dictionary-based typing actions.

What sets our RCI agent apart is that it accomplished this feat using 120 times fewer samples than WebN-T5-3B and 11,000 times fewer samples than CC-Net. Obtaining expert demonstrations and defining reward functions for computer tasks can be a daunting challenge, but our research highlights the potential of using LLMs to overcome these obstacles and achieve success in general computer tasks.

Check out our paper!

Our paper is available on Arxiv. If you use this code in your research, we kindly ask that you cite our paper.

@article{kim2023language,
      title={Language Models can Solve Computer Tasks}, 
      author={Geunwoo Kim and Pierre Baldi and Stephen McAleer},
      journal={arXiv preprint arXiv:2303.17491},
      year={2023},
}

posgnu/rci-agent