[Issue] Google Smartphone Decimeter Challenge 2022 - Hackable
Closed this issue · 1 comments
Hello,
I believe I have found a problem (or rather, an agent found it) with the smartphone-decimeter-competition. There is an easy way to achieve a score of 0, which would put a solution far ahead of any first-place human entry. This is due to data that is present in MLE-bench's version of the competition's public test folder. If you look at the official competition, https://www.kaggle.com/competitions/smartphone-decimeter-2022/, the test data made available for prediction contains no files named 'span_log.nmea'. However, because MLE-bench creates its test split from the training data, these files are present. If the agent is smart enough, it can use these files to achieve a perfect score of 0.
The solution to this issue is pretty simple: ensure the span_log.nmea files are removed from the test data folder, similar to how the ground_truth.csv files are removed.
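For illustration, here is a minimal sketch of that cleanup in Python. It is not the actual MLE-bench preparation code: the function names, the `dry_run` flag, and the assumption that the leaked files can sit at any depth under the prepared test folder are all hypothetical. Only the file name `span_log.nmea` comes from the report above.

```python
from pathlib import Path

# File name taken from the report above; everything else is an assumption.
LEAKED_NAMES = ("span_log.nmea",)


def find_leaked_files(test_root, names=LEAKED_NAMES):
    """Return every file under test_root whose name is in `names`, sorted."""
    root = Path(test_root)
    return sorted(
        path
        for name in names
        for path in root.rglob(name)  # recursive search at any depth
        if path.is_file()
    )


def remove_leaked_files(test_root, names=LEAKED_NAMES, dry_run=True):
    """Delete leaked files under test_root; with dry_run=True only list them."""
    leaked = find_leaked_files(test_root, names)
    if not dry_run:
        for path in leaked:
            path.unlink()
    return leaked
```

Calling `remove_leaked_files(test_dir)` with the default `dry_run=True` only reports what would be deleted, so the list can be checked before running again with `dry_run=False`. The same `find_leaked_files` helper could also serve as a sanity check that a prepared test folder is clean.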
Thank you for flagging!!
We have catalogued this in the readme in #94, as per #66
TLDR:
- we won't fix this now, to avoid invalidating the leaderboard; you should ignore the issue and treat this comp as usual.
- we will fix this issue, batched with other fixes when porting MLE-bench to openai/frontier-evals, timelines TBD.
Thanks again for flagging! Apologies for not immediately fixing.