allenai/ScienceWorld

Task 5 data is missing in goldpaths-all.zip

yuchenlin opened this issue · 8 comments

The data for Task 5 is not included in goldpaths-all.zip. Is this expected? Thanks!

Thanks @yuchenlin for reporting this. It looks like the gold path generator indeed didn't include Task 5 in its list. During development I generated these in small batches of a few tasks at a time (since a full run over all tasks takes a long time), and when I combined all the tasks into one big list to run overnight, I accidentally left Task 5 out.

I've just pushed a fix to the branch I've been working on, which will eventually be merged into the main branch:

val specificTasks = Array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29)           // Do specific tasks
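Because the original bug was an index silently dropped from a hand-typed list, a small guard could catch it before an overnight run. This is a minimal sketch, assuming 30 tasks with indices 0-29 as in the line above; `CheckSpecificTasks`, `missingTasks`, and `numTasks` are hypothetical names, not part of ExampleGoldAgent.scala.

```scala
object CheckSpecificTasks {
  // Hypothetical: total number of tasks, matching the indices 0-29 used above.
  val numTasks = 30

  // Returns, in ascending order, every task index in [0, numTasks) absent from the list.
  def missingTasks(specificTasks: Array[Int], numTasks: Int): List[Int] = {
    val present = specificTasks.toSet
    (0 until numTasks).filterNot(present.contains).toList
  }

  def main(args: Array[String]): Unit = {
    // Generating the range instead of typing it out means no index can be skipped.
    val specificTasks = (0 until numTasks).toArray
    // Simulates the old hand-typed list, which accidentally omitted Task 5.
    val oldTasks = (0 until numTasks).filter(_ != 5).toArray

    println(missingTasks(specificTasks, numTasks)) // List()
    println(missingTasks(oldTasks, numTasks))      // List(5)
  }
}
```

In a real run one would probably turn a non-empty result into a `require` failure so the job stops immediately instead of producing an incomplete zip.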

In the meantime, I've generated the paths for Task 5 alone (which took only a few minutes) and added them to the goldpaths-all.zip file on that branch:

https://github.com/allenai/ScienceWorld/tree/exhaustivevalidactions/goldpaths

Thank you so much!

Hi Peter,

Sorry to bother you again. I found that the task IDs in the filename don't match the task IDs in the current ScienceWorld env. The missing Task 5 is actually the current Task 21 (Task 2-3, measure-melting-point-unknown-substance), whereas the Task 5 in your new file is find-animal, which was Task 11 in the old mapping and was already included in the old JSON file.

In short, could you please generate the gold paths for the task "Task 2-3 measure-melting-point-unknown-substance"? Thank you very much!

Just in case you didn't know, there is a flag you can set when calling env.load(taskName, variationIdx, generateGoldPath=True) to ask ScienceWorld to generate the gold path for a given task and variation.

goldsequences-21.zip

@yuchenlin Here are the regenerated paths for Task 21 (measure-melting-point-unknown-substance).

I think it's probably time to regenerate all the paths, so I've started that run; it may take a few days.

Though I just realized that I'm working on the exhaustivevalidactions branch, which has a number of small fixes (particularly to enumerating the full valid action space, including all the aliases for action verbs and most of the possible referent names). This might change the paths somewhat. I'd recommend using that new branch if you're able, since it will likely be merged into master as a new release shortly. But if you would like the Task 21 paths for the current release, just let me know and I can redo those fairly quickly.

Actually, it only took a day. Here's the full set of gold paths, regenerated on the current exhaustivevalidactions branch.

goldsequences-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29.zip

The new goldpaths-all.zip has been committed to that branch: 6ab99ab

And, just for interest, here's the report: the gold agents solve 99.7% of the task variations (7207 tested, 24 with errors in the gold path):

Simplifications: noElectricalAction, openDoors, selfWateringFlowerPots, teleportAction
---------------------------------
Total number of variations tested: 7207
Total number of variations with errors in gold path: 24
---------------------------------
  0:                                                         boil	min:  0.78      max:  1.00      avg:  0.99      
  1:                                change-the-state-of-matter-of	min:  1.00      max:  1.00      avg:  1.00      
  2:                                                chemistry-mix	min:  1.00      max:  1.00      avg:  1.00      
  3:                          chemistry-mix-paint-secondary-color	min:  1.00      max:  1.00      avg:  1.00      
  4:                           chemistry-mix-paint-tertiary-color	min:  1.00      max:  1.00      avg:  1.00      
  5:                                                  find-animal	min:  1.00      max:  1.00      avg:  1.00      
  6:                                            find-living-thing	min:  1.00      max:  1.00      avg:  1.00      
  7:                                        find-non-living-thing	min:  1.00      max:  1.00      avg:  1.00      
  8:                                                   find-plant	min:  1.00      max:  1.00      avg:  1.00      
  9:                                                       freeze	min:  0.82      max:  1.00      avg:  0.99      
 10:                                                   grow-fruit	min:  0.46      max:  1.00      avg:  0.98      
 11:                                                   grow-plant	min:  1.00      max:  1.00      avg:  1.00      
 12:                                       identify-life-stages-1	min:  1.00      max:  1.00      avg:  1.00      
 13:                                       identify-life-stages-2	min:  1.00      max:  1.00      avg:  1.00      
 14:                               inclined-plane-determine-angle	min:  1.00      max:  1.00      avg:  1.00      
 15:                       inclined-plane-friction-named-surfaces	min:  1.00      max:  1.00      avg:  1.00      
 16:                     inclined-plane-friction-unnamed-surfaces	min:  1.00      max:  1.00      avg:  1.00      
 17:                                       lifespan-longest-lived	min:  1.00      max:  1.00      avg:  1.00      
 18:                   lifespan-longest-lived-then-shortest-lived	min:  1.00      max:  1.00      avg:  1.00      
 19:                                      lifespan-shortest-lived	min:  1.00      max:  1.00      avg:  1.00      
 20:                        measure-melting-point-known-substance	min: -1.00      max:  1.00      avg:  1.00      
 21:                      measure-melting-point-unknown-substance	min: -1.00      max:  1.00      avg:  0.99      
 22:                                                         melt	min:  1.00      max:  1.00      avg:  1.00      
 23:                               mendelian-genetics-known-plant	min:  1.00      max:  1.00      avg:  1.00      
 24:                             mendelian-genetics-unknown-plant	min:  1.00      max:  1.00      avg:  1.00      
 25:                                              power-component	min:  1.00      max:  1.00      avg:  1.00      
 26:             power-component-renewable-vs-nonrenewable-energy	min:  1.00      max:  1.00      avg:  1.00      
 27:                                            test-conductivity	min: -1.00      max:  1.00      avg:  0.99      
 28:                      test-conductivity-of-unknown-substances	min:  1.00      max:  1.00      avg:  1.00      
 29:                                              use-thermometer	min: -1.00      max:  1.00      avg:  0.99      
---------------------------------
Exporting gold action sequences...
Exporting gold action sequences... (goldsequences-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29.json)
 * Task 0 (variations: 30
 * Task 1 (variations: 30
 * Task 2 (variations: 32
 * Task 3 (variations: 36
 * Task 4 (variations: 36
 * Task 5 (variations: 300
 * Task 6 (variations: 300
 * Task 7 (variations: 300
 * Task 8 (variations: 300
 * Task 9 (variations: 30
 * Task 10 (variations: 126
 * Task 11 (variations: 126
 * Task 12 (variations: 14
 * Task 13 (variations: 10
 * Task 14 (variations: 168
 * Task 15 (variations: 1386
 * Task 16 (variations: 162
 * Task 17 (variations: 125
 * Task 18 (variations: 125
 * Task 19 (variations: 125
 * Task 20 (variations: 436
 * Task 21 (variations: 300
 * Task 22 (variations: 30
 * Task 23 (variations: 120
 * Task 24 (variations: 480
 * Task 25 (variations: 20
 * Task 26 (variations: 20
 * Task 27 (variations: 900
 * Task 28 (variations: 600
 * Task 29 (variations: 540
Completed...
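As a sanity check on the report, the per-task variation counts in the export log sum to the 7207 total, and 7207 - 24 = 7183 solved variations gives 7183 / 7207 ≈ 99.7%. A minimal sketch of that arithmetic; the counts are copied from the log above, and `GoldPathReportCheck` and `successPct` are illustrative names only.

```scala
object GoldPathReportCheck {
  // Variations per task, indexed 0-29, copied from the export log above.
  val variations: Array[Int] = Array(
    30, 30, 32, 36, 36, 300, 300, 300, 300, 30,
    126, 126, 14, 10, 168, 1386, 162, 125, 125, 125,
    436, 300, 30, 120, 480, 20, 20, 900, 600, 540
  )
  // "Total number of variations with errors in gold path" from the report.
  val variationsWithErrors = 24

  // Percentage of variations solved, rounded to one decimal place.
  def successPct(total: Int, errors: Int): Double =
    math.round(1000.0 * (total - errors) / total) / 10.0

  def main(args: Array[String]): Unit = {
    val total = variations.sum
    val solved = total - variationsWithErrors
    println(s"total=$total solved=$solved pct=${successPct(total, variationsWithErrors)}%")
    // prints: total=7207 solved=7183 pct=99.7%
  }
}
```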

@PeterAJansen can you push the script used to generate that data? Also, I'm thinking we should use the new task IDs (e.g. 3-1).

We definitely should modify it to use the new task IDs!

The code to generate the gold paths is already in the repo, just poorly named :-/. The critical bit is the specificTasks line at the top, which is currently a list of all the task numbers (0-29). We could change that to a list of task IDs, and then make sure the call that loads the environment uses the string signature instead of the int one (though I forget whether that's done on the Python or the Scala side).
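One way to move specificTasks from integers to strings without silently dropping a task is to list tasks by name and validate every name up front, failing loudly on typos. This is a minimal sketch, assuming the task names and index order printed in the report above; `TaskSelection`, `taskNames`, and `resolveTasks` are hypothetical and not the current ExampleGoldAgent.scala code. It uses the report's task names rather than the new "3-1"-style IDs, whose full mapping isn't given in this thread.

```scala
object TaskSelection {
  // Hypothetical: task names in the index order printed by the gold-agent report above.
  val taskNames: Vector[String] = Vector(
    "boil", "change-the-state-of-matter-of", "chemistry-mix",
    "chemistry-mix-paint-secondary-color", "chemistry-mix-paint-tertiary-color",
    "find-animal", "find-living-thing", "find-non-living-thing", "find-plant",
    "freeze", "grow-fruit", "grow-plant",
    "identify-life-stages-1", "identify-life-stages-2",
    "inclined-plane-determine-angle", "inclined-plane-friction-named-surfaces",
    "inclined-plane-friction-unnamed-surfaces",
    "lifespan-longest-lived", "lifespan-longest-lived-then-shortest-lived",
    "lifespan-shortest-lived",
    "measure-melting-point-known-substance", "measure-melting-point-unknown-substance",
    "melt", "mendelian-genetics-known-plant", "mendelian-genetics-unknown-plant",
    "power-component", "power-component-renewable-vs-nonrenewable-energy",
    "test-conductivity", "test-conductivity-of-unknown-substances", "use-thermometer"
  )

  // Name -> index lookup built from the table above.
  private val indexOf: Map[String, Int] = taskNames.zipWithIndex.toMap

  // Resolves task names to indices; throws IllegalArgumentException on any unknown name.
  def resolveTasks(names: Seq[String]): Seq[Int] = {
    val unknown = names.filterNot(indexOf.contains)
    require(unknown.isEmpty, s"Unknown task names: ${unknown.mkString(", ")}")
    names.map(indexOf)
  }

  def main(args: Array[String]): Unit = {
    val specificTasks = resolveTasks(Seq("find-animal", "measure-melting-point-unknown-substance"))
    println(specificTasks) // List(5, 21)
  }
}
```

Keeping the name list in one place means a renumbering only requires updating that table, and a misspelled or missing name stops the run instead of producing an incomplete zip.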

https://github.com/allenai/ScienceWorld/blob/main/simulator/src/main/scala/scienceworld/goldagent/ExampleGoldAgent.scala