This repository contains a leaderboard for the ALFRED benchmark, along with BibTeX entries for the listed methods (updated through Feb. 2024). SR denotes success rate and GC denotes goal-condition success rate, both in percent; numbers in parentheses are the corresponding path-length-weighted (PLW) scores.
## Low-level step-by-step instructions + High-level goal instructions
Method | Venue | Test unseen SR | Test unseen GC | Test seen SR | Test seen GC |
---|---|---|---|---|---|
Seq2Seq | CVPR'20 | 0.39 (0.08) | 7.03 (4.26) | 3.98 (2.02) | 9.42 (6.27) |
MOCA | ICCV'21 | 5.30 (2.72) | 14.28 (9.99) | 22.05 (15.10) | 28.29 (22.05) |
E.T. | ICCV'21 | 8.57 (4.10) | 18.56 (11.46) | 38.42 (27.78) | 45.44 (34.93) |
LWIT | IJCAI'21 | 8.37 (5.06) | 19.13 (14.81) | 29.16 (24.67) | 38.82 (34.85) |
LACMA | EMNLP'23 | 9.2 | 20.1 | 32.4 | 40.5 |
HiTUT | ACLF'21 | 13.87 (5.86) | 20.31 (11.5) | 21.27 (11.10) | 29.97 (17.41) |
ABP | CVPRW'21 | 15.43 (1.08) | 24.76 (2.22) | 44.55 (3.88) | 51.13 (4.92) |
M-Track | CVPR'22 | 16.29 (7.66) | 22.60 (13.18) | 24.79 (13.88) | 33.35 (19.48) |
LLM-Planner | ICCV'23 | 16.42 | 23.37 | 18.20 | 26.77 |
MCR-Agent | AAAI'23 | 17.04 (9.69) | 30.13 (21.19) | | |
MAT | ICPR'22 | 21.84 (6.13) | 32.41 (10.59) | 33.01 (9.42) | 43.65 (14.68) |
AMSLAM | IROS'22 | 23.48 (2.36) | 34.64 (4.63) | 29.48 (3.28) | 40.88 (5.56) |
Scene-LLM | arXiv'24 | 25.15 | 33.75 | 26.52 | 37.09 |
FILM* | ICLR'22 | 26.49 (10.55) | 36.37 (14.30) | 27.67 (11.23) | 38.51 (15.06) |
LGS-RPA | RAL'22 | 35.41 (22.76) | 45.24 (22.76) | 40.05 (21.28) | 48.66 (28.97) |
Prompter* | arXiv'22 | 45.32 (20.79) | 56.57 (25.80) | 51.17 (25.12) | 60.22 (30.21) |
CAPEAM | ICCV'23 | 46.11 (19.45) | 57.33 (24.06) | 51.79 (21.60) | 60.50 (25.88) |
CAPEAM* | ICCV'23 | 49.84 (22.61) | 61.10 (27.00) | 50.62 (22.61) | 59.40 (27.49) |
ThinkBot# | arXiv'24 | 57.82 (26.93) | 67.75 (30.73) | 62.69 (32.02) | 71.64 (37.01) |
Human | | 91.00 (85.80) | 94.50 (87.60) | | |
## High-level goal instructions only
Method | Venue | Test unseen SR | Test unseen GC | Test seen SR | Test seen GC |
---|---|---|---|---|---|
LAV | arXiv'21 | 6.38 (3.12) | 17.27 (10.47) | 13.35 (6.31) | 23.21 (13.18) |
EmBERT | arXiv'21 | 7.52 (3.58) | 16.33 (10.42) | 31.77 (23.41) | 39.27 (31.32) |
HiTUT | ACLF'21 | 11.12 (4.50) | 17.89 (9.77) | 13.63 (5.57) | 21.11 (11.00) |
HLSM | CoRL'22 | 20.27 (5.55) | 30.31 (9.99) | 29.94 (8.74) | 41.21 (14.58) |
FILM* | ICLR'22 | 24.46 (9.67) | 34.75 (13.13) | 25.77 (10.39) | 36.15 (14.17) |
LGS-RPA | RAL'22 | 27.80 (12.92) | 38.55 (20.01) | 33.01 (16.65) | 41.71 (24.49) |
EPA | CVPRW'22 | 36.07 (2.92) | 39.54 (3.91) | 39.96 (2.56) | 44.14 (3.47) |
Prompter* | arXiv'22 | 41.53 (18.84) | 53.69 (24.20) | 47.95 (23.29) | 56.98 (28.42) |
RoboGPT# | arXiv'23 | 42.97 | 53.80 | 45.66 | 54.99 |
RoboGPT#* | arXiv'23 | 44.57 | 55.01 | 49.45 | 58.39 |
CAPEAM | ICCV'23 | 43.69 (17.64) | 54.66 (22.76) | 47.36 (19.03) | 54.38 (23.78) |
CAPEAM* | ICCV'23 | 45.72 (20.15) | 57.25 (24.73) | 46.64 (20.81) | 55.29 (25.47) |
Human | | 91.00 (85.80) | 94.50 (87.60) | | |
Methods in each table are ranked in ascending order of SR on Test unseen.

\* marks results that adopt the templated actions used in FILM.

\# marks very recent work.
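
As a reference for reading the tables, here is a minimal Python sketch of how these metrics can be computed per episode, following the metric definitions from the ALFRED paper (CVPR'20). The `Episode` fields and function names below are hypothetical, chosen for illustration; they are not part of the official evaluation code.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Hypothetical per-episode result; field names are illustrative,
    not the official ALFRED evaluation API."""
    success: bool          # all goal conditions satisfied
    conditions_met: int    # goal conditions satisfied at episode end
    conditions_total: int  # goal conditions in the task
    agent_path_len: int    # number of actions the agent took
    expert_path_len: int   # length of the expert demonstration

def plw(score: float, agent_len: int, expert_len: int) -> float:
    # Path-length weighting: full credit only when the agent is at
    # least as efficient as the expert demonstration.
    return score * expert_len / max(agent_len, expert_len)

def metrics(episodes: list[Episode]) -> dict[str, float]:
    n = len(episodes)
    sr = sum(e.success for e in episodes) / n
    gc = sum(e.conditions_met / e.conditions_total for e in episodes) / n
    plw_sr = sum(plw(float(e.success), e.agent_path_len, e.expert_path_len)
                 for e in episodes) / n
    plw_gc = sum(plw(e.conditions_met / e.conditions_total,
                     e.agent_path_len, e.expert_path_len)
                 for e in episodes) / n
    # Percentages, with the PLW variants shown in parentheses in the tables.
    return {"SR": 100 * sr, "PLWSR": 100 * plw_sr,
            "GC": 100 * gc, "PLWGC": 100 * plw_gc}
```

Because the weight `expert_len / max(agent_len, expert_len)` is at most 1, the parenthesized PLW scores can never exceed their unweighted counterparts.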