Alfworld GPT-3 Results
gautierdag opened this issue · 3 comments
gautierdag commented
Hi,
I wondered if you had more details or numbers from your GPT-3 results on Alfworld? For instance, do you have the accuracy broken down by subtask (as in Table 3 of the paper)?
I would try to reproduce it myself, but I estimate the total cost would exceed $100, so I'd like to avoid that if possible.
ysymyth commented
Hi, after all 134 instances, the six categories, keyed by these task-type prefixes,
prefixes = {
    'pick_and_place': 'put',
    'pick_clean_then_place': 'clean',
    'pick_heat_then_place': 'heat',
    'pick_cool_then_place': 'cool',
    'look_at_obj': 'examine',
    'pick_two_obj': 'puttwo'
}
give the following final result:
134 r 0 rs [19, 19, 7, 17, 16, 8] cnts [24, 31, 23, 21, 18, 17] sum(rs)/sum(cnts) 0.6417910447761194
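Here rs is the number of correct instances and cnts the number of instances in each category, in the same order as above, so e.g. put tasks are 19/24 correct.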
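As a quick check on how the per-subtask numbers relate to the overall figure, the sketch below recomputes them from the logged lists. It assumes, as the put example indicates, that rs holds per-category successes and cnts per-category instance counts, both in the order of the prefixes dict; the helper names CATEGORIES, per_category_accuracy and overall_accuracy are hypothetical and not part of the original code.

```python
# Hypothetical helper: category order assumed to follow the prefixes dict above.
CATEGORIES = ["put", "clean", "heat", "cool", "examine", "puttwo"]

# Values copied from the logged result line.
rs = [19, 19, 7, 17, 16, 8]        # correct instances per category
cnts = [24, 31, 23, 21, 18, 17]    # total instances per category


def per_category_accuracy(successes, counts):
    """Return {category: accuracy} for parallel success/count lists."""
    return {cat: s / n for cat, s, n in zip(CATEGORIES, successes, counts)}


def overall_accuracy(successes, counts):
    """Micro-averaged accuracy: total successes over total instances."""
    return sum(successes) / sum(counts)


if __name__ == "__main__":
    for cat, acc in per_category_accuracy(rs, cnts).items():
        print(f"{cat:8s} {acc:.3f}")
    print(f"overall  {overall_accuracy(rs, cnts):.4f}")
```

This gives put 0.792, clean 0.613, heat 0.304, cool 0.810, examine 0.889, puttwo 0.471 and overall 0.6418 (86/134). Note that the logged overall number is a micro average over instances; a macro average of the six per-category accuracies would come out slightly different (about 0.646).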
ysymyth commented
A more complete trajectory log is at https://gist.github.com/ysymyth/01045e5b65651eccd63a5a46964b8216
gautierdag commented
Thank you!