The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents in acing any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

Primary language: Python. License: MIT.

Cradle: Towards General Computer Control

[Website] [Arxiv] [PDF]

Notice

We are still working on further cleaning up the code and constantly updating it. We are also extending Cradle to more games and software. Feel free to reach out!

Project Setup

Please set up your environment as follows:

conda create --name cradle-dev python=3.10
conda activate cradle-dev
pip3 install -r requirements.txt

To install GroundingDino:

Download its weights to the cache directory:

mkdir cache
cd cache
curl -L -C - -O https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
cd ..

Note: A CUDA environment is required, so make sure the CUDA dependencies are properly installed first. On Linux, you can check for the CUDA compiler with the following command:

nvcc -V

Alternatively, look for the CUDA_HOME or CUDA_PATH environment variable. On Windows it should be something like "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" and on Linux like "/usr/local/cuda".
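
Both checks can be combined into a small script. This is a minimal sketch using only the standard library; the function name `find_cuda` is hypothetical and not part of Cradle.

```python
import os
import shutil


def find_cuda(environ=None):
    """Report where CUDA appears to be installed, if anywhere.

    Looks at the CUDA_HOME and CUDA_PATH environment variables and
    searches PATH for the nvcc compiler. Returns a dict of findings.
    """
    env = os.environ if environ is None else environ
    return {
        "CUDA_HOME": env.get("CUDA_HOME"),
        "CUDA_PATH": env.get("CUDA_PATH"),
        # shutil.which returns None when nvcc is not on PATH
        "nvcc": shutil.which("nvcc", path=env.get("PATH")),
    }


if __name__ == "__main__":
    for name, value in find_cuda().items():
        print(f"{name}: {value or 'not found'}")
```

If all three entries print "not found", CUDA is most likely not installed or not visible to your shell.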

If no CUDA version is reported, download and install cudatoolkit and cuDNN first (version 11.8 is recommended).

If CUDA is not installed correctly, the code will fail after GroundingDino is installed with:

NameError: name '_C' is not defined

If this happens, set up CUDA and PyTorch again, re-clone the repository, and repeat all installation steps.

On Windows, install CUDA from https://developer.nvidia.com/cuda-11-8-0-download-archive (Linux packages are also available).

Make sure PyTorch is installed with the matching CUDA dependencies.

conda install pytorch torchvision cudatoolkit=11.8 -c nvidia -c pytorch

If this doesn't work, or you prefer the pip way, you can try something like:

pip3 install --upgrade torch==2.1.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torchvision==0.16.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
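
After either install route, you can confirm that the PyTorch build actually targets CUDA. The sketch below is not part of Cradle; `torch_cuda_report` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_cuda_report():
    """Summarize the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the check also runs without PyTorch

    return {
        "installed": True,
        "torch_version": torch.__version__,    # CUDA 11.8 wheels carry a "+cu118" suffix
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

A CPU-only build (`built_for_cuda` is None) is the usual cause of the `_C` error mentioned earlier.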

Now install the pre-compiled GroundingDino together with the project dependencies. You can use the package in our repo (built for Python 3.10 on 64-bit Windows, as its file name indicates) with the following commands:

cd deps
pip install groundingdino-0.1.0-cp310-cp310-win_amd64.whl
cd ..
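
The wheel file name encodes the Python version and platform it was built for, so a mismatch can be spotted before running pip. The following is a simplified sketch, assuming a single Python tag and a single platform tag in the name; pip's real compatibility logic is more permissive. `wheel_matches` is a hypothetical helper, not part of Cradle or pip.

```python
import sys
import sysconfig


def wheel_matches(filename, python_tag=None, platform_tag=None):
    """Return True if a wheel's tags match the given (or current) interpreter."""
    if python_tag is None:
        python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    if platform_tag is None:
        # e.g. "win-amd64" becomes "win_amd64"
        platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    # Wheel names end with: <python tag>-<abi tag>-<platform tag>.whl
    stem = filename[: -len(".whl")]
    wheel_python, _abi, wheel_platform = stem.split("-")[-3:]
    return wheel_python == python_tag and wheel_platform == platform_tag


if __name__ == "__main__":
    name = "groundingdino-0.1.0-cp310-cp310-win_amd64.whl"
    print(name, "matches this interpreter:", wheel_matches(name))
```

If this prints False (for example on Linux or with a different Python version), the pre-compiled package will not install and you will need the manual build described further down.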

Once it is installed, we need to pre-download some required model files and set some environment variables.

# Define the necessary environment variable. It can also be set in the .env file in the /cradle directory.
# Exporting it in the shell makes it available to the commands below.
export HUGGINGFACE_HUB_CACHE="./cache/hf" # This can be the full path too, if the relative one doesn't work

# Pre-download the Hugging Face files needed by GroundingDino
# This step may require a VPN connection
# Windows users need to run it in git bash
mkdir -p "$HUGGINGFACE_HUB_CACHE"
huggingface-cli download bert-base-uncased config.json tokenizer.json vocab.txt tokenizer_config.json model.safetensors --cache-dir "$HUGGINGFACE_HUB_CACHE"

# Define the last necessary environment variable. It can also be set in the .env file in the /cradle directory.
# This avoids needing a VPN at run time
export TRANSFORMERS_OFFLINE="TRUE"
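
Before running offline, it can help to verify that the variables are set and the downloaded files are present. This is a minimal sketch, not Cradle code: `check_offline_setup` is a hypothetical name, and the file search is recursive so it makes no assumption about the cache directory layout.

```python
import os
from pathlib import Path

# The files requested in the huggingface-cli command above
BERT_FILES = (
    "config.json",
    "tokenizer.json",
    "vocab.txt",
    "tokenizer_config.json",
    "model.safetensors",
)


def check_offline_setup(environ=None):
    """Return a list of problems; an empty list means the setup looks fine."""
    env = os.environ if environ is None else environ
    problems = []
    cache_dir = env.get("HUGGINGFACE_HUB_CACHE")
    if not cache_dir:
        problems.append("HUGGINGFACE_HUB_CACHE is not set")
    else:
        for name in BERT_FILES:
            # rglob matches the exact file name anywhere below cache_dir
            if not any(Path(cache_dir).rglob(name)):
                problems.append(f"{name} not found under {cache_dir}")
    if env.get("TRANSFORMERS_OFFLINE", "").upper() != "TRUE":
        problems.append("TRANSFORMERS_OFFLINE is not set to TRUE")
    return problems


if __name__ == "__main__":
    for problem in check_offline_setup() or ["No problems found"]:
        print(problem)
```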

If you run into an incompatibility while installing or running GroundingDino, we recommend recreating your environment.

Only if really necessary, you can try to clone and compile/install GroundingDino yourself.

# Clone
cd ..
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO

# Build and install it
pip3 install -r requirements.txt
pip3 install .
cd ../Cradle

It should install without errors, after which it is available to any project using the same conda environment (cradle-dev).

To build the C++ code on Windows, you may need to install build tools.

Download them from https://visualstudio.microsoft.com/visual-cpp-build-tools/. Make sure to select "Desktop development with C++" and include the first three optional packages:

  • MSVC v141 or higher
  • Windows SDK for your OS version
  • CMake tools

To install the videosubfinder for the gather information module

Download the videosubfinder from https://sourceforge.net/projects/videosubfinder/ and extract the files into the res/tool/subfinder folder. We have already created the folder for you and included a test.srt, which is a required dummy file that will not affect results.

The file structure should be like this:

  • res
    • tool
      • subfinder
        • VideoSubFinderWXW.exe
        • test.srt
        • ...
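
The layout above can be checked with a short script. This is a sketch under the assumption that it is run from the repository root; `missing_subfinder_files` is a hypothetical helper, not part of Cradle.

```python
from pathlib import Path

# Files named in the structure above
EXPECTED_FILES = ("VideoSubFinderWXW.exe", "test.srt")


def missing_subfinder_files(root="."):
    """Return the expected files that are absent from res/tool/subfinder."""
    folder = Path(root) / "res" / "tool" / "subfinder"
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_subfinder_files()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```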

Tuning videosubfinder

Overwrite the res/tool/subfinder/settings/general.cfg file with res/tool/general.clg. The values in it have already been tuned to the game scenario and environment setup, so only modify them if absolutely necessary. If you do need better extraction results, you can tune the subfinder by changing the parameters in settings/general.cfg; the README in the Docs folder has more information about the parameters.
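
The overwrite step can be scripted as below. This is a sketch that assumes it is run from the repository root and uses the file names exactly as given above. Keeping a `.bak` copy of the previous settings is an addition of this example, not something Cradle requires, and `apply_tuned_settings` is a hypothetical name.

```python
import shutil
from pathlib import Path


def apply_tuned_settings(root="."):
    """Copy the tuned settings over the subfinder's general.cfg."""
    tool_dir = Path(root) / "res" / "tool"
    source = tool_dir / "general.clg"  # file name as given above
    target = tool_dir / "subfinder" / "settings" / "general.cfg"
    if target.exists():
        # Keep the previous settings so they can be restored later
        shutil.copyfile(target, target.with_name(target.name + ".bak"))
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    print("Wrote", apply_tuned_settings())
```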

To install the OCR tools

1. Option 1
# Download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_lg

or

# pip install .tar.gz archive or .whl from path or URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1.tar.gz

2. Option 2
# Copy this url https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1.tar.gz
# Paste it in the browser and download the file to res/spacy/data
cd res/spacy/data
pip install en_core_web_lg-3.7.1.tar.gz
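
Whichever option you use, the model ends up installed as a regular Python package, so its presence can be checked without loading it. A minimal sketch follows; `spacy_model_installed` is a hypothetical helper name.

```python
import importlib.util


def spacy_model_installed(name="en_core_web_lg"):
    """Return True if the model package can be found by Python."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if spacy_model_installed():
        import spacy

        nlp = spacy.load("en_core_web_lg")  # loads the installed pipeline
        print("Loaded pipeline with components:", nlp.pipe_names)
    else:
        print("en_core_web_lg is not installed")
```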

General guidelines

Always, always, ALWAYS get the latest /main branch.

Any file with text content in the project's resources directory (./res) should be UTF-8 encoded. Use cradle.utils to open and save files.
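
For illustration, this is what explicit UTF-8 handling looks like with the standard library alone. It is a stand-in sketch, not the cradle.utils API; `save_text` and `load_text` are hypothetical names. Passing the encoding explicitly matters because the platform default on Windows is often not UTF-8.

```python
from pathlib import Path


def save_text(path, text):
    """Write text to a file, always as UTF-8."""
    Path(path).write_text(text, encoding="utf-8")


def load_text(path):
    """Read a UTF-8 encoded text file."""
    return Path(path).read_text(encoding="utf-8")
```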

Infra code

1. OpenAI provider

The OpenAI provider can now expose embeddings and LLMs from both OpenAI and Azure. Users only need to create one instance of each and pass the appropriate configuration.

Example configurations are in /conf. To avoid exposing sensitive details, keys and other private info should be defined in environment variables.

The suggested way to do it is to create a .env file in the root of the repository (never push this file to GitHub) where variables can be defined, and then mention the variable names in the configs.

Please check the examples below.

Sample .env file containing private info that should never be on git/GitHub:

OA_OPENAI_KEY = "abc123abc123abc123abc123abc123ab"
AZ_OPENAI_KEY = "123abc123abc123abc123abc123abc12"
AZ_BASE_URL = "https://abc123.openai.azure.com/"

Sample config for an OpenAI provider:

{
	"key_var" : "OA_OPENAI_KEY",
	"emb_model": "text-embedding-ada-002",
	"comp_model": "gpt-4-vision-preview",
	"is_azure": false
}
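
The indirection between config and .env can be sketched as follows: the config stores only the name of the variable, and the secret is looked up in the environment at load time. This is an illustrative sketch, not Cradle's actual loader; `load_provider_config` is a hypothetical name.

```python
import json
import os


def load_provider_config(config_text, environ=None):
    """Parse a provider config and resolve its API key from the environment."""
    env = os.environ if environ is None else environ
    config = json.loads(config_text)
    key_var = config["key_var"]  # name of the variable, not the key itself
    if key_var not in env:
        raise KeyError(f"Environment variable {key_var} is not set")
    # Return the secret separately so it is never written back into the config
    return config, env[key_var]


if __name__ == "__main__":
    sample = """{
        "key_var": "OA_OPENAI_KEY",
        "emb_model": "text-embedding-ada-002",
        "comp_model": "gpt-4-vision-preview",
        "is_azure": false
    }"""
    config, key = load_provider_config(sample, {"OA_OPENAI_KEY": "abc123"})
    print(config["comp_model"], "key loaded:", bool(key))
```

Because the config file holds only variable names, it can be committed safely while the .env file stays local.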

RDR2 Install

Cradle currently focuses on the RDR2 game. You can get it from any PC platform you prefer. However, the current codebase has only been tested on MS Windows.

Game Settings

1. Change settings before running the code.

1.1 Mouse mode

Change the mouse mode in the control settings to DirectInput.

1.2 Control

Turn both 'Tap and Hold Speed Control' options on, so that pressing W twice makes the character run without the need to press Shift. Also set 'Aiming Mode' to 'Hold To Aim', so that the right mouse button must be held down while aiming.

1.3 Game screen

The recommended default resolution is 1920x1080, but it can vary as long as the 16:9 aspect ratio is preserved. Other resolutions are not fully tested. DO NOT change the aspect ratio. Also, remember to set the game Screen Type to Windowed Borderless.

SETTING -> GRAPHICS -> Resolution = 1920x1080 and Screen Type = Windowed Borderless
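
If you pick a resolution other than 1920x1080, the 16:9 requirement is easy to verify with integer arithmetic. A minimal sketch; `is_16_by_9` is a hypothetical helper, not part of Cradle.

```python
def is_16_by_9(width, height):
    """Return True if width:height is exactly 16:9."""
    # Cross-multiplying avoids floating-point comparison issues
    return width * 9 == height * 16


if __name__ == "__main__":
    for width, height in [(1920, 1080), (2560, 1440), (1920, 1200)]:
        print(f"{width}x{height}: {is_16_by_9(width, height)}")
```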

1.4 Mini-map

Remember to enlarge the icons so that the program works well: SETTING -> DISPLAY -> Radar Blip Size = Large, SETTING -> DISPLAY -> Map Blip Size = Large, and SETTING -> DISPLAY -> Radar = Expanded (or press Alt + X).

1.5 Subtitles

Enable showing the speaker's name in the subtitles.

Getting Started

To run the agent, follow these steps:

1- Launch the RDR2 game

2- To start from the beginning of Chapter #1, after you launch the game, get past all introductory videos

3- Pause the game

4- Launch the framework agent with the command:

python prototype_runner.py 

Citation

If you find our work useful, please consider citing us!

@article{weihao2024cradle,
  title     = {{Towards General Computer Control: A Multimodal Agent For Red Dead Redemption II As A Case Study}},
  author    = {Weihao Tan and Ziluo Ding and Wentao Zhang and Boyu Li and Bohan Zhou and Junpeng Yue and Haochong Xia and Jiechuan Jiang and Longtao Zheng and Xinrun Xu and Yifei Bi and Pengjie Gu and Xinrun Wang and Börje F. Karlsson and Bo An and Zongqing Lu},
  journal   = {arXiv:2403.03186},
  month     = {March},
  year      = {2024},
  primaryClass={cs.AI}
}