ysymyth/ReAct

Alfworld GPT-3 Results

gautierdag opened this issue · 3 comments

Hi,
I wondered if you had more details or numbers from your GPT-3 results on Alfworld? For instance, do you have the splits of accuracy across the different subtasks (as in Table 3 in the paper)?

I would try to reproduce it myself, but I reckon the total cost would be > $100, and I would like to avoid that if possible.

Hi, at the end of the 134 instances, the six categories

prefixes = {
    'pick_and_place': 'put',
    'pick_clean_then_place': 'clean',
    'pick_heat_then_place': 'heat',
    'pick_cool_then_place': 'cool',
    'look_at_obj': 'examine',
    'pick_two_obj': 'puttwo'
}

have the final result

134 r 0 rs [19, 19, 7, 17, 16, 8] cnts [24, 31, 23, 21, 18, 17] sum(rs)/sum(cnts) 0.6417910447761194

The rs (successes) and cnts (totals) lists follow the order of the categories above, e.g. put tasks are 19/24 correct.
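To read the per-category split off that line, here is a minimal sketch. It assumes the `rs` and `cnts` lists are in the same order as the `prefixes` dict (consistent with put being 19/24); `per_task_rates` is a hypothetical helper for illustration, not something from the repo.

```python
# Category prefixes, in the order used by the result line above.
prefixes = {
    'pick_and_place': 'put',
    'pick_clean_then_place': 'clean',
    'pick_heat_then_place': 'heat',
    'pick_cool_then_place': 'cool',
    'look_at_obj': 'examine',
    'pick_two_obj': 'puttwo',
}

# Values copied from the final result line.
rs = [19, 19, 7, 17, 16, 8]        # successes per category
cnts = [24, 31, 23, 21, 18, 17]    # instances per category


def per_task_rates(prefixes, rs, cnts):
    """Hypothetical helper: map each short task name to its success rate.

    Assumes rs and cnts are ordered like the prefixes dict.
    """
    return {name: r / c for name, r, c in zip(prefixes.values(), rs, cnts)}


rates = per_task_rates(prefixes, rs, cnts)
overall = sum(rs) / sum(cnts)

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
print(f"overall: {overall:.1%}")
```

The overall figure is computed the same way as in the log, `sum(rs)/sum(cnts)`, so it weights each category by its number of instances rather than averaging the six rates.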

Thank you!