CLIP-Guided RL

Blog post detailing our experiments and findings: https://medium.com/@vaiciu.donatas/clip-is-all-rl-needs-fdb30eedbab9

Installation

conda create --name ani python=3.8
conda install -c conda-forge jupyterlab
conda install -c anaconda ipywidgets
conda install -c anaconda scipy
conda install -c conda-forge swig
pip install Cython
pip install open_clip_torch
pip install stable-baselines3[extra]
pip install pyglet==1.5.27
pip install tensorboardX
pip install gym[all]

For Windows, you will also need to install Mujoco:

Install mjpro150 to C:\Users\<User>\.mujoco\<mjpro150>
Install license key to C:\Users\<User>\.mujoco\mjkey.txt
Add C:\Users\<User>\.mujoco\mjpro150\bin to PATH
Enable Win32 long paths
run pip install gym[all] again

Experiments

Models

CLIP
CLOOB

Environments

Colab

Gym Installation Issues

Sometimes, OpenAI Gym installation fails on Colab. Therefore, it is recommended to pip install gym first, before all other packages

!pip install gym[box2d]
!pip install stable-baselines3[extra]
!pip install open_clip_torch
!pip install pyglet==1.5.27
!pip install tensorboardX
!pip install ftfy

Gym Render Issues

In order to render OpenAI Gym in Colab, please follow these instructions

!sudo apt-get update
!sudo apt-get install x11-utils 
!sudo pip install pyglet 
!sudo apt-get install -y xvfb python-opengl
!pip install pyvirtualdisplay

from pyvirtualdisplay import Display
display = Display(visible=0, size=(400, 300))
display.start()

CLIP

CLIP requires additional Python modules to work.

!git clone https://github.com/DzvinkaYarish/tartu-ani-clip-guided-rl.git
!cp -r /content/tartu-ani-clip-guided-rl/utils.py /content

Weights can be found here. CLIP will download them automatically

CLOOB

CLOOB requires additional Python modules to work.

!git clone https://github.com/DzvinkaYarish/tartu-ani-clip-guided-rl.git
!cp -r /content/tartu-ani-clip-guided-rl/utils.py /content
!cp -r /content/tartu-ani-clip-guided-rl/cloob /content

Weights can be found here. You have to download them manually.

!mkdir checkpoints
!wget https://ml.jku.at/research/CLOOB/downloads/checkpoints/cloob_rn50_yfcc_epoch_28.pt
!mv /content/cloob_rn50_yfcc_epoch_28.pt /content/checkpoints/cloob_rn50_yfcc_epoch_28.pt

How to modify Gym environment by populating it with images

Gym Box2D environments are quite minimalistic, and it is unlikely CLIP has seen images like that during training. This minimalism might confuse CLIP which as a result will produce incorrect similarity scores. Generally, creating prompts for such environments might be tricky which led to the idea of modifying the environment by pasting images of more realistic objects that CLIP might have seen during training. For example, in Lunar Lander you can replace the spaceship with an image of the cat which hopefully will make the job easier for CLIP. However, in order to do that you have to modify the source code of the environment. Below we provide the instructions on how to do that.

Before we start modifying the environment, we need to make some minor changes inside rendering.py ("<venv>/envs/ani/Lib/site-packages/gym/envs/classic_control/rendering.py")
Find the Viewer class and add the following method

    def draw_image(self, filename, width, height):
        geom = Image(filename, width, height)
        self.add_onetime(geom)
        return geom

Open the desired environment code (e.g. "<venv>/envs/ani/Lib/site-packages/gym/envs/box2d/lunar_lander.py")
We only need to change the render method since that is what returns the image
Just before the return draw the necessary images with the following code. Here we draw the image and apply transformation t that moves the image to coordinates x and y and then rotates the image by angle (radians).

t = rendering.Transform(
    translation=(x, y),
    rotation=angle,
)

self.viewer.draw_image(
    "<image_path>",
    width, height
).add_attr(t)

A complete example for Lunar Lander that replaces the shape-ship with a cat can be found below. You need to replace the render method with the following code.

    def render(self, mode="human"):
        from gym.envs.classic_control import rendering

        if self.viewer is None:
            self.viewer = rendering.Viewer(VIEWPORT_W, VIEWPORT_H)
            self.viewer.set_bounds(0, VIEWPORT_W / SCALE, 0, VIEWPORT_H / SCALE)

        for obj in self.particles:
            obj.ttl -= 0.15
            obj.color1 = (
                max(0.2, 0.2 + obj.ttl),
                max(0.2, 0.5 * obj.ttl),
                max(0.2, 0.5 * obj.ttl),
            )
            obj.color2 = (
                max(0.2, 0.2 + obj.ttl),
                max(0.2, 0.5 * obj.ttl),
                max(0.2, 0.5 * obj.ttl),
            )

        self._clean_particles(False)

        for p in self.sky_polys:
            self.viewer.draw_polygon(p, color=(0, 0, 0))

        ship = None
        ship_trans = None
        for obj in self.particles + self.drawlist:
            for f in obj.fixtures:
                trans = f.body.transform
                if type(f.shape) is circleShape:
                    t = rendering.Transform(translation=trans * f.shape.pos)
                    self.viewer.draw_circle(
                        f.shape.radius, 20, color=obj.color1
                    ).add_attr(t)
                    self.viewer.draw_circle(
                        f.shape.radius, 20, color=obj.color2, filled=False, linewidth=2
                    ).add_attr(t)
                else:
                    path = [trans * v for v in f.shape.vertices]
                    if ship is None:
                        ship = path
                        ship_trans = trans
                    self.viewer.draw_polygon(path, color=obj.color1)
                    path.append(path[0])
                    self.viewer.draw_polyline(path, color=obj.color2, linewidth=2)

        for x in [self.helipad_x1, self.helipad_x2]:
            flagy1 = self.helipad_y
            flagy2 = flagy1 + 50 / SCALE
            self.viewer.draw_polyline([(x, flagy1), (x, flagy2)], color=(1, 1, 1))
            self.viewer.draw_polygon(
                [
                    (x, flagy2),
                    (x, flagy2 - 10 / SCALE),
                    (x + 25 / SCALE, flagy2 - 5 / SCALE),
                ],
                color=(0.8, 0.8, 0),
            )

        x, y = LunarLander.centroid(ship)
        height = 3.4
        width = 3.4

        t = rendering.Transform(
            translation=(x, y),
            rotation=ship_trans.angle + 0.34,
        )

        self.viewer.draw_image(
            "<image_path>",
            width, height
        ).add_attr(t)

        return self.viewer.render(return_rgb_array=mode == "rgb_array")

    @staticmethod
    def centroid(vertexes):
        _x_list = [vertex[0] for vertex in vertexes]
        _y_list = [vertex[1] for vertex in vertexes]
        _len = len(vertexes)
        _x = sum(_x_list) / _len
        _y = sum(_y_list) / _len
        return _x, _y

You can test that your code works by running the following script

import gym
import matplotlib.pyplot as plt

ENV_NAME = 'LunarLander-v2'
env = gym.make(ENV_NAME)
env.reset()
img = env.render(mode="rgb_array")

plt.imshow(img)
plt.show()
env.close()