Usage

python .\cliport\demos_balanced.py n=2880 task=place-obj-in-container mode=train disp=False save_data=True

To enable visualization, set disp=True. The command above generates 24 * 120 = 2880 demos, i.e. 120 demos for each of the 24 instructions.

A sample output log looks like:

python .\cliport\demos_balanced.py n=2880 task=place-obj-in-container mode=train disp=False save_data=True
...
...
Total Reward: 1.000 | Done: True | Goal: place yellow hexagon into green bowl
Done episode: 2878
Oracle demo: 2879/2880 | Seed: 126
Total Reward: 1.000 | Done: True | Goal: place yellow hexagon into green bowl
Done episode: 2879
Oracle demo: 2880/2880 | Seed: 126
Total Reward: 1.000 | Done: True | Goal: place yellow hexagon into green bowl
Done episode: 2880
Collected: {'place red block into green box', 'place red block into green bowl', 'place green hexagon into green box', 'place yellow block into red bowl', 'place yellow hexagon into red bowl', 'place yellow block into green bowl', 'place green block into red bowl', 'place green hexagon into green bowl', 'place red block into red bowl', 'place green block into green box', 'place red block into red box', 'place green block into green bowl', 'place green block into red box', 'place yellow block into red box', 'place red hexagon into red bowl', 'place red hexagon into green box', 'place green hexagon into red bowl', 'place yellow hexagon into green box', 'place yellow hexagon into green bowl', 'place green hexagon into red box', 'place yellow hexagon into red box', 'place yellow block into green box', 'place red hexagon into red box', 'place red hexagon into green bowl'}
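The 24 instructions in the Collected set are every combination of object color, object shape, container color, and container type. As a quick sanity check on the 24 * 120 = 2880 figure, here is a minimal sketch that enumerates them. The vocabulary lists are read off the log above, and the names `all_instructions` and `DEMOS_PER_INSTRUCTION` are made up for this illustration; they are not part of cliport.

```python
from itertools import product

# Vocabulary inferred from the "Collected" set in the sample log (assumption).
OBJ_COLORS = ["green", "red", "yellow"]
OBJ_SHAPES = ["block", "hexagon"]
CONTAINER_COLORS = ["green", "red"]
CONTAINER_TYPES = ["bowl", "box"]

DEMOS_PER_INSTRUCTION = 120  # hypothetical constant mirroring the text


def all_instructions():
    """Return every 'place <obj> into <container>' instruction string."""
    return [
        f"place {obj_color} {obj_shape} into {cont_color} {cont_type}"
        for obj_color, obj_shape, cont_color, cont_type in product(
            OBJ_COLORS, OBJ_SHAPES, CONTAINER_COLORS, CONTAINER_TYPES
        )
    ]


instructions = all_instructions()
total_demos = len(instructions) * DEMOS_PER_INSTRUCTION
print(len(instructions), total_demos)  # 24 2880
```

This is why n=2880 is passed on the command line: 3 * 2 * 2 * 2 = 24 instructions, times 120 demos each.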

Note:

Currently the implementation of the task is generic: there is a class in cliport/tasks/place-obj-in-container that generates data satisfying the requirement.

However, I am not sure whether cliport can generate a "balanced" dataset natively, since demos.py appears to sample randomly. To remedy this, I hardcoded the following mechanism.

Inside demos_balanced.py, I am essentially looping the following:
```
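while True:
    episode, total_reward = [], 0
    seed += 2  # advance to a new seed, and hence a new instruction draw

    collected = 0
    while collected < 120:
        # Reset the seeds so every demo in this inner loop draws the same instruction.
        np.random.seed(seed)
        random.seed(seed)
        # generate data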
```
where:
- collected ensures that 120 random samples are collected for each instruction (so 20 can be held out for testing)
- inside the while collected < 120 loop, np.random.seed(seed) keeps the instruction fixed for every demo generated under the current seed
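The loop structure can be sketched end to end as follows. This is only an illustration under stated assumptions, not the code in the script: `sample_instruction` is a hypothetical stand-in for the task's reset logic, and skipping seeds whose instruction has already been collected is an assumed detail.

```python
import random

import numpy as np


def sample_instruction():
    """Hypothetical stand-in for the task reset: draw a goal from the seeded RNG."""
    obj_color = np.random.choice(["green", "red", "yellow"])
    obj_shape = np.random.choice(["block", "hexagon"])
    cont_color = np.random.choice(["green", "red"])
    cont_type = np.random.choice(["bowl", "box"])
    return f"place {obj_color} {obj_shape} into {cont_color} {cont_type}"


def collect_balanced(num_instructions, per_instruction=120, start_seed=0):
    """Collect the same number of demos for each distinct instruction."""
    counts = {}
    seed = start_seed
    while len(counts) < num_instructions:
        seed += 2
        np.random.seed(seed)
        random.seed(seed)
        goal = sample_instruction()
        if goal in counts:
            continue  # assumption: a seed that repeats an instruction is skipped

        collected = 0
        while collected < per_instruction:
            # Re-seeding makes every demo in this loop draw the same instruction.
            np.random.seed(seed)
            random.seed(seed)
            assert sample_instruction() == goal
            collected += 1  # the real script would generate and save a demo here
        counts[goal] = collected
    return counts


counts = collect_balanced(num_instructions=24, per_instruction=3)
print(len(counts), set(counts.values()))
```

The point of the sketch is that the seed alone decides which instruction is drawn, so fixing the seed inside the inner loop is what makes the per-instruction count controllable.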

Then, to randomize the scene/table layout even within the same instruction, I added the np.random.seed() line in place_obj_container.py:

```
def reset(self, env):
    super().reset(env)
    # some code omitted

    # These draws depend on the seed, so the instruction stays fixed.
    container_color = np.random.choice(["green", "red"])
    container = np.random.choice(list(containers.keys()))

    obj_color = np.random.choice(["green", "red", "yellow"])
    obj = np.random.choice(list(objects.keys()))

    np.random.seed()  # randomize layout regardless of the seed

    # generate layout
```

This is arguably hardcoded, but given the limited time I think it should do.
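To see why the position of np.random.seed() matters, here is a small self-contained sketch. `reset_sketch` and the two-number "layout" are invented for illustration and only mimic the order of operations in reset: the instruction is drawn while the seed is still in effect, then the generator is reseeded from fresh entropy before the layout is drawn.

```python
import numpy as np


def reset_sketch(seed):
    """Mimic the order of draws in reset(); not the actual task code."""
    np.random.seed(seed)

    # Drawn under the fixed seed: identical on every call with the same seed.
    container_color = str(np.random.choice(["green", "red"]))
    obj_color = str(np.random.choice(["green", "red", "yellow"]))

    np.random.seed()  # no argument: reseed from fresh OS entropy

    # Hypothetical stand-in for the layout: an (x, y) position on the table.
    layout = np.random.uniform(0.0, 1.0, size=2)
    return (container_color, obj_color), layout


goal_a, layout_a = reset_sketch(4)
goal_b, layout_b = reset_sketch(4)
print(goal_a == goal_b)                      # same instruction for the same seed
print(np.array_equal(layout_a, layout_b))    # layouts are expected to differ
```

If the reseed were placed before the choices instead, the instruction itself would no longer be tied to the seed and the balancing loop would lose control over which instruction it is collecting.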

Therefore, the requirement of 24 instructions, each with 120 different demos (100 plus the 20 that could be used for testing), is met.

For example, in the generated dataset, three of the demos under the instruction "place red block into green bowl" have the following layout visualizations:

Demo1	Demo2	Demo3

where both distractor objects are randomized in shape, color, and position.

