gym-codecraft

A gym-like environment with a Docker container sandbox for the agent to learn to code.

Main Idea

There are two seperate fields: Education and Learning.

Education is about creating an environment that facilitate whoever in that environment to learn. While Learning is about organizing the algorithms of an agent to solve novel problems.

In the context of Reinforcement Learning, we can imagine two systems like this:

Requirements

I'm using Windows, so Docker Desktop is needed. Also the Python library docker is needed, which can automatically get client from the environment.

The Python library gymnasium, which is the successor of the famous gym, is needed.

We'll mostly use the clean and lightweight docker version Alpine Linux for the tasks, but it'll be pulled automatically, no need to install in advance.

What to be expected?

Running a task involves (1) pulling a specified docker Image, (2) creating a Volume as the agent's workspace, (3) running a Container in the background, (4) sending commands to the Container.

Ending a task involves (1) stopping the Container, (2) leaving the Volume and the Image as it is. Overtime, Images, Volumes, Containers might be accumulated in Docker, please clean them as needed.

An interesting review was written by gpt-4-0613, explaining the structure of this project. (This is one of the task in the curriculum.)

Running the demo

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .
python demo.py

expected output:

(venv) gym-codecraft>python demo.py
({'obs': '\n'}, {})
{'action': 'test'}
{'obs': "No running container: please use `{'action':'reset', 'task_id':'?'}` to choose a task."}
Reward: -1
{'action': 'reset', 'task_id': '1'}
Pulling from library/python
Pulling fs layer
Waiting
Downloading: [============================>                      ]  360.4kB/622.3kB
Download complete
Downloading: [==============================>                    ]  7.588MB/12.44MB
Pull complete
Extracting: [=====================>                             ]  262.1kB/622.3kBB
Verifying Checksum
Download complete
Extracting: [==================================================>]  622.3kB/622.3kB
Pull complete
Downloading: [==================================================>]     243B/243BMB
Verifying Checksum
Download complete
Extracting: [====================================>              ]  9.175MB/12.44MB
Verifying Checksum
Download complete
Extracting: [==================================================>]  12.44MB/12.44MB
Pull complete
Extracting: [==================================================>]   3.09MB/3.09MB
Digest: sha256:25df32b602118dab046b58f0fe920e3301da0727b5b07430c8bcd4b139627fdc
Status: Downloaded newer image for python:alpine3.18
{'obs': 'Task 1:\n {\'category\': \'Python\', \'docker\': \'python:alpine3.18\', \'shell\': \'/bin/sh\', \'working_dir\': \'/tmp\', \'title\': \'Hello World\', \'description\': "Write a Python file `hello.py` printing the string \'Hello, World!\'", \'test\': "Entering the command `python hello.py` should print the string \'Hello, World!\'"}\n'}
{'action': 'lol'}
{'obs': 'Unknown action: lol'}
Reward: -1
{'action': 'command', 'command': 'pwd'}
{'obs': '/tmp\n'}
{'action': 'write_file', 'path': 'hello.py', 'content': 'print("Hello, world!")'}
{'obs': 'File hello.py written.'}
{'action': 'command', 'command': 'ls'}
{'obs': 'hello.py\n'}
{'action': 'command', 'command': 'cat hello.py'}
{'obs': 'print("Hello, world!")'}
{'action': 'command', 'command': 'python hello.py'}
{'obs': 'Hello, world!\n'}
{'action': 'submit'}
{'obs': 'Code submitted.'}
Reward: 1
{'action': 'reset', 'task_id': '2'}
Pulling from library/alpine
Already exists
Digest: sha256:82d1e9d7ed48a7523bdebc18cf6290bdb97b82302a8a9c27d4fe885949ea94d1
Status: Downloaded newer image for alpine:3.18
{'obs': "Task 2:\n {'category': 'Git', 'docker': 'alpine:3.18', 'shell': '/bin/sh', 'working_dir': '/tmp', 'title': 'Git Clone', 'description': 'Clone a repo from github, using the URL https://github.com/liusida/gym-codecraft.git', 'test': 'Check the gym-codecraft repo is in the working directory'}\n"}
{'action': 'command', 'command': 'apk update; apk add git'}
{'obs': 'fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz\nfetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz\nv3.18.2-58-gf4adaceb7ff [https://dl-cdn.alpinelinux.org/alpine/v3.18/main]\nv3.18.2-58-gf4adaceb7ff [https://dl-cdn.alpinelinux.org/alpine/v3.18/community]\nOK: 20062 distinct packages available\n(1/9) Installing ca-certificates (20230506-r0)\n(2/9) Installing brotli-libs (1.0.9-r14)\n(3/9) Installing libunistring (1.1-r1)\n(4/9) Installing libidn2 (2.3.4-r1)\n(5/9) Installing nghttp2-libs (1.53.0-r0)\n(6/9) Installing libcurl (8.1.2-r0)\n(7/9) Installing libexpat (2.5.0-r1)\n(8/9) Installing pcre2 (10.42-r1)\n(9/9) Installing git (2.40.1-r0)\nExecuting busybox-1.36.1-r0.trigger\nExecuting ca-certificates-20230506-r0.trigger\nOK: 18 MiB in 24 packages\n'}
{'action': 'command', 'command': 'git --help'}
{'obs': "usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]\n           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]\n           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]\n           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]\n           [--config-env=<name>=<envvar>] <command> [<args>]\n\nThese are common Git 
commands used in various situations:\n\nstart a working area (see also: git help tutorial)\n   clone     Clone a repository into a new directory\n   init      Create an empty Git repository or reinitialize an existing one\n\nwork on the current change (see also: git help everyday)\n   add       Add file contents to the index\n   mv        Move or rename a file, a directory, or a symlink\n   restore   Restore working tree files\n   rm        Remove files from the working tree and from the index\n\nexamine the history and state (see also: git help revisions)\n   bisect    Use binary search to find the commit that introduced a bug\n   diff      Show changes between commits, commit and working tree, etc\n   grep      Print lines matching a pattern\n   log       Show commit logs\n   show   
   Show various types of objects\n   status    Show the working tree status\n\ngrow, mark and tweak your common history\n   branch    List, create, or delete branches\n   commit    Record changes to 
the repository\n   merge     Join two or more development histories together\n   rebase    Reapply commits on top of another base tip\n   reset     Reset current HEAD to the specified state\n   switch    Switch branches\n   tag       Create, list, delete or verify a tag object signed with GPG\n\ncollaborate (see also: git help workflows)\n   fetch     Download objects and refs from another repository\n   pull      Fetch from and integrate with another repository or a local branch\n   push      Update remote refs along with associated objects\n\n'git help -a' and 'git help -g' list available subcommands and some\nconcept guides. See 'git help <command>' or 'git help <concept>'\nto read about a specific subcommand or concept.\nSee 'git help git' for an overview of the system.\n"}        
{'action': 'command', 'command': 'git clone https://github.com/liusida/gym-codecraft.git .'}
{'obs': "Cloning into '.'...\n"}
{'action': 'submit'}
{'obs': 'Code submitted.'}
Reward: 1

Running it with ChatGPT

After running the hand-written demo, you can copy/paste to run the demo with ChatGPT, see how ChatGPT works on these tasks.

python demos/demo-chatgpt.py

GPT-3.5 is good enough to solve all current tasks.

Running it with OpenAI API

If you have an OpenAI API Key, like 'sk-...', you can put it in .env_template and rename the file to .env. Then you can start the GPT Agent demo:

python demos/demo-gpt-agent.py

This agent is quite simple, without a good System 2, so it can only solve some easy problems.

I Want You

This project is currently developed by me alone. You are very welcome to join and contribute, please let me know! The easiest way to connect me, I guess, is through Twitter @liusida2007.

Here are some Todo's:

Write more nice tasks in curriculum.json. (This would be the meat of this project and is very important!)
Giving correct reward based on the "test" section. Should I use GPT-3.5-turbo to generate the testing code?
How to construct step-by-step rewards (e.g. Lightman, et al. 2023)
How to render or monitor the environment? Having a sense of how the agent is doing.
Make a LangChain-GPT-3.5-turbo-based agent that can play in this environment