Can't reproduce OVMM result
asfandasfo opened this issue · 9 comments
Hi, I followed these instructions (https://github.com/facebookresearch/home-robot/blob/main/projects/habitat_ovmm/README.md) to evaluate the rl_agent with GROUND_TRUTH_SEMANTICS: 0.
Here are the results:
Metrics:
episode_count: 1199.0
does_want_terminate: 0.05337781484570475
num_steps: 1217.8340283569642
find_object_phase_success: 0.195162635529608
pick_object_phase_success: 0.09841534612176814
find_recep_phase_success: 0.0567139282735613
overall_success: 0.0016680567139282735
partial_success: 0.08798999165971642
find_object_phase_success looks similar to the value reported in the paper, but the other results are very low. Am I missing something? Can you help? Thanks!
Hi, yes, the success numbers do appear to be lower. I will check whether there is an issue. It would be helpful if you could point me to the commit you are on.
This one: 4b3c1e1
Hi, I reran the evaluations with GT and Detic on 77746a7 (which is close to the version you are using; the changes between the two shouldn't affect the evaluations) and was able to reproduce the results. I will also confirm that the latest main has no issues.
Can you please confirm that you are using the right checkpoints: did you download them using the install_deps.sh script from here? Also, which habitat-lab commit (under src/third_party/habitat-lab) were you using?
Just to confirm, you used the same set of commits in both evals (GT semantics and Detic), right?
Actually, I would expect to see a higher find_object_phase_success with GT semantics too when using those commits (which are post-0.2.5 upgrade). These numbers do resemble the ones reported in the other issue (which was from before the 0.2.5 upgrade). It is possible that your environment has older versions pre-installed even though your code is on a recent commit. For example, can you double-check that the eval environment uses the right habitat version? Activate the conda environment and run python3 -c "import habitat; print(habitat.__version__)"
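As a more general sanity check along the same lines, one could verify installed package versions from within the conda environment before running evaluations. A minimal sketch (the package name "habitat-lab" and the use of importlib.metadata are assumptions; depending on how habitat-lab was installed, the distribution may be registered under a different name, in which case `import habitat; habitat.__version__` as suggested above is the more direct check):

```python
from importlib.metadata import version, PackageNotFoundError


def installed_version(pkg_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(pkg_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical check: warn if the environment does not have the
    # expected habitat-lab install (distribution name is an assumption).
    v = installed_version("habitat-lab")
    if v is None:
        print("habitat-lab not found in this environment; check your install")
    else:
        print("habitat-lab version:", v)
```

Running this inside the eval environment makes stale-install problems like the one above visible before a 1199-episode evaluation is launched.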
Yes, I'm using the same commits for both evals. And you are right: I had installed an older version of habitat. Thanks for pointing that out. That is probably the reason for the lower success numbers.
Got similar numbers when evaluating the release branch (home_robot_v0.1.2) with GT + RL:
episode_count: 1199.0
does_want_terminate: 0.3769808173477898
num_steps: 930.7372810675563
find_object_phase_success: 0.5629691409507923
pick_object_phase_success: 0.5262718932443703
find_recep_phase_success: 0.43119266055045874
overall_success: 0.0633861551292744
partial_success: 0.3959549624687239
I am assuming this issue is resolved. Please reopen if you are still facing issues.