openai/mle-bench

[Issue] Google Smartphone Decimeter Challenge 2022 - Hackable

Hello,

I believe I have found a problem (or rather, an agent found the problem) with the smartphone-decimeter-competition. There is an easy way to achieve a score of 0, which would place a solution far above the first-ranked human submission. This is due to data that is present in mle-bench's version of the public test folder for the smartphone competition. If you look at the official competition, https://www.kaggle.com/competitions/smartphone-decimeter-2022/, the test data available for prediction contains no files named 'span_log.nmea'. However, because the mle-bench test split is created from the training data, these files are present. If the agent is smart enough, it can use these files to achieve a perfect score of 0.

The solution to this issue is pretty simple: ensure the span_log.nmea files are removed from the test data folder, similar to how the ground_truth.csv files are removed.
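
As a rough illustration of that cleanup, here is a minimal Python sketch. It is not the actual mle-bench preparation code: the function name `remove_leaky_files`, the `dry_run` flag, and the idea of scanning the test folder recursively are all assumptions made for this example. The only name taken from the issue is the file name `span_log.nmea`.

```python
from pathlib import Path

# File names that should not ship with the public test data.
# Only "span_log.nmea" comes from this issue; extend the set as needed.
LEAKY_FILENAMES = {"span_log.nmea"}


def remove_leaky_files(test_root, leaky_names=LEAKY_FILENAMES, dry_run=False):
    """Delete every file under test_root whose name is in leaky_names.

    Hypothetical helper: no particular folder layout is assumed, the
    whole tree is scanned. With dry_run=True nothing is deleted and the
    matching paths are only reported.
    """
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(Path(test_root).rglob("*")):
        if path.is_file() and path.name in leaky_names:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Calling `remove_leaky_files(test_dir, dry_run=True)` first lists the files that would be removed, which is a cheap way to confirm the leak before changing anything.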

Thank you for flagging!!

We have catalogued this in the README in #94, as per #66.

TLDR:

  • We won't fix this now, to avoid invalidating the leaderboard. You should ignore the issue and treat this comp as usual.
  • We will fix this issue, batched with other fixes, when porting MLE-bench to openai/frontier-evals. Timelines TBD.

Thanks again for flagging! Apologies for not immediately fixing.