allenai/ScienceWorld

Focusing on an object sometimes removes it from the list of interactable objects when building admissible commands.

MarcCote opened this issue · 8 comments

python examples/human.py --task-num 6 --var-num 0

1- look around
2- open door to greenhouse, score(8)
3- go to greenhouse , score(9)
4- look around
5- focus on pea plant in flower pot 3, score(50)
6- pick up flower pot 3, score(8)
7- go to hallway
8- open door to kitchen
9- go to kitchen, score(8)
10- focus on red box (losing focus)
11- move flower pot 3 to red box

12- focus on pea plant (not listed in admissible commands!)

Hmmm, interesting -- I was unable to replicate this one. I wonder what the issue might be, if it's not immediately repeatable:
(I had initially thought that the pea plant might have died from a lack of water on its journey to the kitchen, changing its name, but I think the flower pots in that task are "infinite watering" to prevent exactly that)

python examples/human.py --task-num 6 --var-num 0

...
> look 

This room is called the kitchen. In it, you see: 
        the agent
        a substance called air
        a chair. On the chair is: nothing.
        a counter. On the counter is: a bowl (containing a red apple, a banana, an orange, a potato), a drawer.
        a cupboard. The cupboard door is closed. 
        a freezer. The freezer door is closed. 
        a fridge. The fridge door is closed. 
        a glass jar (containing a substance called sodium chloride)
        a lighter
        a oven, which is turned off. The oven door is closed. 
        a painting
        a red box (containing nothing)
        a sink, which is turned off. In the sink is: nothing.
        a substance called soap
        a stopwatch, which is deactivated. 
        a stove, which is turned off. On the stove is: nothing.
        a table. On the table is: a glass cup (containing nothing).
        a thermometer, currently reading a temperature of 10 degrees celsius
You also see:
        A door to the bathroom (that is closed)
        A door to the hallway (that is open)
        A door to the outside (that is closed)
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> inventory

In your inventory, you see:
        a flower pot 5 (containing a pea plant in the reproducing stage with a tall height, soil)
        an orange
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> focus on red box

You focus on the red box.
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> move flower pot to red box

You move the flower pot 5 to the red box.
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> look in red box

Inside the red box is: 
        a flower pot 5 (containing a pea plant in the reproducing stage with a tall height, soil)
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> focus on pea plant

You focus on the pea plant.
Reward: 17
Score: 100
isCompleted: True
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> 

Is "focus on pea plant" actually listed in the admissible commands? E.g., if you tab-complete it?

Ah, I see -- my mistake: the action sequence works, but "focus on pea plant" is indeed not in the valid actions list under that referent. Since objects can have multiple referents, and enumerating all of them would make the valid actions list huge (especially when they include containers, e.g. "X", "X in Y", "X in Y on Z"), the input parser currently chooses a single referent for each object in the valid actions list. But it's not doing a good job here, since it picks "living thing" as the referent for the pea plant:

{ "action":"focus on living thing", "template_id":11, "obj_ids":[762], "type_ids":[67] }
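To illustrate how a "one referent per object" strategy can land on a surprising choice like "living thing", here's a minimal sketch. The function name, referent lists, and the claimed-referent set are all hypothetical, not the actual ScienceWorld internals:

```python
# Hypothetical sketch of a unique-referent picker: take the first referent
# not already claimed by another visible object, falling back to the most
# generic one. Referent lists here are illustrative, ordered from most
# specific to most generic.
def pick_unique_referent(referents, taken):
    for r in referents:
        if r not in taken:
            return r
    return referents[-1]  # fall back to the most generic referent

pea_plant_referents = ["pea plant", "plant", "living thing"]
# Suppose other visible objects already claimed the more specific referents:
taken = {"pea plant", "plant"}
print(pick_unique_referent(pea_plant_referents, taken))  # -> living thing
```

Under those (assumed) conditions, the agent's natural phrasing "focus on pea plant" never appears in the list, even though the parser would accept it.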

Possible fixes are:

  • We could enumerate them all. This would make the valid actions list huge but exhaustive, and likely wouldn't break anything (unless downstream agents choke on very long lists).
  • We could modify the return value so that "action" returns a list instead of a string. The list could be exhaustive, with only the first element (e.g. "focus on living thing" above) guaranteed to be the least ambiguous referent given the other visible objects. This is the most compact option, but it requires an API change, so it would break agents written before the change.
  • We could add some sort of "preferred referent" for each object. I had thought about that when designing the referent system, but it seemed difficult to track cleanly.

I've implemented the non-breaking Option #1 in this branch: https://github.com/allenai/scienceworld/tree/exhaustivevalidactions

The list of valid actions now returns a separate template for each valid string, and for actions that take multiple arguments it iterates over all possible combinations of objects and referents -- so the list can be much, much longer now.
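The combinatorics of that enumeration can be sketched with `itertools.product`. The template syntax and referent lists below are illustrative, not the actual generator code:

```python
# Sketch of the exhaustive enumeration in Option #1: for a template with
# multiple argument slots, emit one action string per combination of
# referents. Names and templates here are made up for illustration.
from itertools import product

def enumerate_actions(template, slot_referents):
    """template: e.g. "move {0} to {1}"; slot_referents: one referent list
    per slot. Yields every filled-in action string."""
    for combo in product(*slot_referents):
        yield template.format(*combo)

referents = [["flower pot 5", "flower pot", "pot"], ["red box", "box"]]
for action in enumerate_actions("move {0} to {1}", referents):
    print(action)
# 3 referents x 2 referents = 6 strings for a single (template, object-pair)
# entry -- this is why the exhaustive list grows so quickly.
```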

Here's an example:

{"validActions": [{ "action":"close door to greenhouse", "template_id":1, "obj_ids":[17577], "type_ids":[152] },{ "action":"close door", "template_id":1, "obj_ids":[17577], "type_ids":[152] },{ "action":"close greenhouse door", "template_id":1, "obj_ids":[17577], "type_ids":[152] },{ "action":"close door to kitchen", "template_id":1, "obj_ids":[17562], "type_ids":[152] },{ "action":"close door", "template_id":1, "obj_ids":[17562], "type_ids":[152] },{ "action":"close kitchen door", "template_id":1, "obj_ids":[17562], "type_ids":[152] }, ...

Here, the same action (closing the greenhouse door) now has all three versions the parser recognizes enumerated: "close door to greenhouse", "close door", and "close greenhouse door". These strings aren't guaranteed to be unique (and many are not), so if an agent uses one of the ambiguous ones the parser will go into ambiguity-resolution mode and ask which object was meant, which creates other complications for agents. But this probably solves more problems than it creates.
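The ambiguous strings are easy to detect mechanically: any action string that maps to more than one `obj_ids` value would trigger the parser's ambiguity-resolution mode. A small check over data shaped like the JSON excerpt above (values made up to match it):

```python
# Group entries by action string; any string pointing at more than one
# obj_ids tuple is ambiguous. The entries mirror the JSON excerpt above.
from collections import defaultdict

valid_actions = [
    {"action": "close door to greenhouse", "obj_ids": [17577]},
    {"action": "close door",               "obj_ids": [17577]},
    {"action": "close greenhouse door",    "obj_ids": [17577]},
    {"action": "close door to kitchen",    "obj_ids": [17562]},
    {"action": "close door",               "obj_ids": [17562]},
    {"action": "close kitchen door",       "obj_ids": [17562]},
]

targets = defaultdict(set)
for entry in valid_actions:
    targets[entry["action"]].add(tuple(entry["obj_ids"]))

ambiguous = [a for a, objs in targets.items() if len(objs) > 1]
print(ambiguous)  # -> ['close door']
```

An agent that filters out such strings (or always prefers the longest match) could sidestep the disambiguation prompt entirely.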

It's not exhaustively tested yet -- we'll likely want to run it a lot to make sure nothing breaks in an environment that has a ton of objects nested deep inside containers, each with many possible referents. The string enumeration is an iterator, so if anything breaks, it might be py4j choking on the (now) extremely long JSON string of all the valid actions it has to send and parse. Running the random agents across all the variations (and/or running the gold agents) might help find these issues.
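One way to gauge the py4j payload concern without running the simulator is to measure how the serialized JSON grows with the referent count. The entry shape below copies the `validActions` excerpt; the counts are synthetic:

```python
# Measure the serialized size of a synthetic validActions payload as the
# number of referents per object grows. Entry shape mirrors the JSON
# excerpts above; the object/referent counts are made up.
import json

def payload_bytes(n_objects, referents_per_object):
    actions = [
        {"action": f"focus on referent {o}-{r}", "template_id": 11,
         "obj_ids": [o], "type_ids": [67]}
        for o in range(n_objects)
        for r in range(referents_per_object)
    ]
    return len(json.dumps({"validActions": actions}))

print(payload_bytes(50, 1))   # one referent per object
print(payload_bytes(50, 10))  # ten referents per object -- roughly 10x larger
```

The payload scales linearly with the total referent count, so a room with many deeply nested containers could plausibly push the py4j string transfer into the megabyte range.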

I'm working on testing it -- it looks like it now returns options that aren't recognized by the parser. I'll have a look tomorrow and see if I can figure out where the issue is. :)

It looks like the valid action generation was not respecting closed containers, so it could generate valid actions involving items inside closed containers. I've changed it to use a function that only collects visible objects, and so far I'm not seeing any red flags.
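The visibility rule being described can be sketched as a recursion that stops at closed containers. The object dictionaries and field names here are hypothetical, not the simulator's actual data model:

```python
# Sketch of "visible objects only": yield an object and everything visibly
# inside it, but do not recurse into the contents of a closed container.
# Field names (is_container, is_open, contents) are illustrative.
def visible_objects(obj):
    yield obj["name"]
    if obj.get("is_container") and not obj.get("is_open", True):
        return  # the container itself is visible, but its contents are not
    for child in obj.get("contents", []):
        yield from visible_objects(child)

kitchen = {"name": "kitchen", "is_container": True, "is_open": True,
           "contents": [
               {"name": "red box", "is_container": True, "is_open": True,
                "contents": [{"name": "flower pot 5"}]},
               {"name": "cupboard", "is_container": True, "is_open": False,
                "contents": [{"name": "tin cup"}]},
           ]}
print(list(visible_objects(kitchen)))
# -> ['kitchen', 'red box', 'flower pot 5', 'cupboard']
```

Under this rule the cupboard still appears (so "open cupboard" stays admissible), but the tin cup inside it does not, matching the fix described above.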

@MarcCote possibly related to this, I just fixed a bug in exhaustivevalidactions where items in the inventory were not being enumerated in the valid actions list. I've made a bunch of changes to the branch, so I'm not sure whether this same bug is present in the original branch or not.

8813d39

But, we should probably plan to merge the changes from this branch into main soon and make a new major release. The benefit of this branch is that it enumerates essentially all the valid action possibilities (including action verb aliases and the different possible referents for each object), meaning that, e.g., for LMs that align their generated action to a valid action, performance will now be much better. The cost is that generating this fantastically large set of actions can take a while (sometimes up to seconds), so the simulator is much slower. That's my only reservation right now.

@PeterAJansen do you think now is a good time to move the changes to the main branch?