Task 5 data is missing in gold-paths-all.zip
yuchenlin opened this issue · 8 comments
In goldpaths-all.zip, the data for Task 5 is not included. Is this expected? Thanks!
Thanks @yuchenlin for this issue report. It looks like indeed the gold path generator didn't have Task 5 included in its list. During development I'd generate these in small batches for just a few tasks at a time (since the whole run for all tasks takes a long time), and it looks like when I added all the tasks in a big list together to run overnight I accidentally left Task 5 out.
I've just pushed a fix in the branch I've been working on, which will eventually get merged into the main branch:
val specificTasks = Array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29) // Do specific tasks
In the interim I've generated just the paths for Task 5 (which ran in just a few minutes), and added them to the goldpaths-all.zip file on that branch:
https://github.com/allenai/ScienceWorld/tree/exhaustivevalidactions/goldpaths
Thank you so much!
Hi Peter,
Sorry to bother you again. I found that the IDs in the filename and the IDs in the current SW env do not match. So the missing Task 5 is actually the current Task 21 --> Task 2-3 measure-melting-point-unknown-substance, and the one in your new Task 5 file is find-animal, which is Task 11 in the old mappings and was already included in the old JSON file.
Simply put, would you please generate the gold paths for the task "Task 2-3 measure-melting-point-unknown-substance"? Thank you very much!
Just in case you didn't know, there is a flag you can set when calling env.load(taskName, variationIdx, generateGoldPath=True) to ask ScienceWorld to generate the gold path for a given task and variation.
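A minimal sketch of how that flag might be used from Python. The `load` call mirrors the signature quoted above; the accessor `getGoldActionSequence()` and the helper name `gold_path_for` are assumptions here, so check them against the installed ScienceWorld version. The helper takes an already-constructed environment, so it assumes nothing about how the env is created.

```python
from typing import Any, List


def gold_path_for(env: Any, task_name: str, variation_idx: int) -> List[str]:
    """Load one task variation with gold-path generation on and return the path.

    `env` is expected to be a ScienceWorld environment object.
    `getGoldActionSequence()` is an assumed accessor, not confirmed in this thread.
    """
    # Call shape as quoted in the comment above.
    env.load(task_name, variation_idx, generateGoldPath=True)
    # Copy into a plain list so callers can modify it freely.
    return list(env.getGoldActionSequence())


# Hypothetical usage, given an environment created as described in the README:
# path = gold_path_for(env, "measure-melting-point-unknown-substance", 0)
```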
@yuchenlin Here are the regenerated paths for Task 21 (measure-melting-point-unknown-substance).
I think it's probably time to regenerate all the paths, so I've set those running; it just may take a few days.
Though -- I just realized that I'm operating on the exhaustivevalidactions branch, which has a number of small fixes (particularly with enumerating the full valid action space, with all the aliases for action verbs, and most of the possible referent names). This might affect the paths somewhat. I'd recommend using that new branch if you're able (since it will likely be merged into master under a new release shortly). But if you would like the Task 21 paths using the current release, just let me know and I can redo those fairly quickly.
Actually it only took a day -- here's the full set of gold paths, regenerated on the current exhaustivevalidactions branch.
goldsequences-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29.zip
The new goldpaths-all.zip has been committed to that branch: 6ab99ab
And (just for interest) here's the report; the gold agents solve 99.7% of the task variations:
Simplifications: noElectricalAction, openDoors, selfWateringFlowerPots, teleportAction
---------------------------------
Total number of variations tested: 7207
Total number of variations with errors in gold path: 24
---------------------------------
0: boil min: 0.78 max: 1.00 avg: 0.99
1: change-the-state-of-matter-of min: 1.00 max: 1.00 avg: 1.00
2: chemistry-mix min: 1.00 max: 1.00 avg: 1.00
3: chemistry-mix-paint-secondary-color min: 1.00 max: 1.00 avg: 1.00
4: chemistry-mix-paint-tertiary-color min: 1.00 max: 1.00 avg: 1.00
5: find-animal min: 1.00 max: 1.00 avg: 1.00
6: find-living-thing min: 1.00 max: 1.00 avg: 1.00
7: find-non-living-thing min: 1.00 max: 1.00 avg: 1.00
8: find-plant min: 1.00 max: 1.00 avg: 1.00
9: freeze min: 0.82 max: 1.00 avg: 0.99
10: grow-fruit min: 0.46 max: 1.00 avg: 0.98
11: grow-plant min: 1.00 max: 1.00 avg: 1.00
12: identify-life-stages-1 min: 1.00 max: 1.00 avg: 1.00
13: identify-life-stages-2 min: 1.00 max: 1.00 avg: 1.00
14: inclined-plane-determine-angle min: 1.00 max: 1.00 avg: 1.00
15: inclined-plane-friction-named-surfaces min: 1.00 max: 1.00 avg: 1.00
16: inclined-plane-friction-unnamed-surfaces min: 1.00 max: 1.00 avg: 1.00
17: lifespan-longest-lived min: 1.00 max: 1.00 avg: 1.00
18: lifespan-longest-lived-then-shortest-lived min: 1.00 max: 1.00 avg: 1.00
19: lifespan-shortest-lived min: 1.00 max: 1.00 avg: 1.00
20: measure-melting-point-known-substance min: -1.00 max: 1.00 avg: 1.00
21: measure-melting-point-unknown-substance min: -1.00 max: 1.00 avg: 0.99
22: melt min: 1.00 max: 1.00 avg: 1.00
23: mendelian-genetics-known-plant min: 1.00 max: 1.00 avg: 1.00
24: mendelian-genetics-unknown-plant min: 1.00 max: 1.00 avg: 1.00
25: power-component min: 1.00 max: 1.00 avg: 1.00
26: power-component-renewable-vs-nonrenewable-energy min: 1.00 max: 1.00 avg: 1.00
27: test-conductivity min: -1.00 max: 1.00 avg: 0.99
28: test-conductivity-of-unknown-substances min: 1.00 max: 1.00 avg: 1.00
29: use-thermometer min: -1.00 max: 1.00 avg: 0.99
---------------------------------
Exporting gold action sequences...
Exporting gold action sequences... (goldsequences-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29.json)
* Task 0 (variations: 30
* Task 1 (variations: 30
* Task 2 (variations: 32
* Task 3 (variations: 36
* Task 4 (variations: 36
* Task 5 (variations: 300
* Task 6 (variations: 300
* Task 7 (variations: 300
* Task 8 (variations: 300
* Task 9 (variations: 30
* Task 10 (variations: 126
* Task 11 (variations: 126
* Task 12 (variations: 14
* Task 13 (variations: 10
* Task 14 (variations: 168
* Task 15 (variations: 1386
* Task 16 (variations: 162
* Task 17 (variations: 125
* Task 18 (variations: 125
* Task 19 (variations: 125
* Task 20 (variations: 436
* Task 21 (variations: 300
* Task 22 (variations: 30
* Task 23 (variations: 120
* Task 24 (variations: 480
* Task 25 (variations: 20
* Task 26 (variations: 20
* Task 27 (variations: 900
* Task 28 (variations: 600
* Task 29 (variations: 540
Completed...
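As a quick sanity check on the report, the per-task variation counts in the export list should add up to the reported total, and the 99.7% figure should follow from the error count. A small sketch using only the numbers printed above (the variable names are illustrative):

```python
# Variation counts for Tasks 0-29, copied from the export list above.
variations = [
    30, 30, 32, 36, 36, 300, 300, 300, 300, 30,
    126, 126, 14, 10, 168, 1386, 162, 125, 125, 125,
    436, 300, 30, 120, 480, 20, 20, 900, 600, 540,
]

total_tested = 7207  # "Total number of variations tested"
with_errors = 24     # "Total number of variations with errors in gold path"

total = sum(variations)
success_rate = 1 - with_errors / total_tested

print(total)                  # 7207, matching the reported total
print(f"{success_rate:.1%}")  # 99.7%
```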
@PeterAJansen can you push the script used to generate that data? Also, I'm thinking we should use the new task ID (i.e. 3-1).
We definitely should modify it to use the new task IDs!
The code to generate the gold paths is already in the repo, just poorly named :-/ . The critical bit is the specificTasks line at the top, which is currently a list of all task numbers (0-29), but that we could change to a list of the task IDs (then make sure the call that loads the environment uses the string signature instead of the int one -- though I forget if that's done on the Python or Scala side).
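One way to ease the move from numeric indices to string identifiers is a small lookup that accepts either form. The sketch below is hypothetical and not part of ScienceWorld: the helper name `resolve_task_name` is made up for illustration, and the name list is copied from the report earlier in this thread, in report order. The "2-3"-style IDs are not listed in this thread, so they are left out.

```python
from typing import List, Union

# Task names in report order (index -> name), copied from the report above.
TASK_NAMES: List[str] = [
    "boil",
    "change-the-state-of-matter-of",
    "chemistry-mix",
    "chemistry-mix-paint-secondary-color",
    "chemistry-mix-paint-tertiary-color",
    "find-animal",
    "find-living-thing",
    "find-non-living-thing",
    "find-plant",
    "freeze",
    "grow-fruit",
    "grow-plant",
    "identify-life-stages-1",
    "identify-life-stages-2",
    "inclined-plane-determine-angle",
    "inclined-plane-friction-named-surfaces",
    "inclined-plane-friction-unnamed-surfaces",
    "lifespan-longest-lived",
    "lifespan-longest-lived-then-shortest-lived",
    "lifespan-shortest-lived",
    "measure-melting-point-known-substance",
    "measure-melting-point-unknown-substance",
    "melt",
    "mendelian-genetics-known-plant",
    "mendelian-genetics-unknown-plant",
    "power-component",
    "power-component-renewable-vs-nonrenewable-energy",
    "test-conductivity",
    "test-conductivity-of-unknown-substances",
    "use-thermometer",
]


def resolve_task_name(task: Union[int, str], names: List[str] = TASK_NAMES) -> str:
    """Return the task name for either a numeric index or a task name."""
    if isinstance(task, int):
        if not 0 <= task < len(names):
            raise ValueError(f"task index out of range: {task}")
        return names[task]
    if task not in names:
        raise ValueError(f"unknown task name: {task}")
    return task


print(resolve_task_name(21))      # measure-melting-point-unknown-substance
print(resolve_task_name("boil"))  # boil
```

Passing names rather than indices downstream would avoid the kind of index mismatch reported earlier in this thread.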