https://arxiv.org/pdf/2409.12126
The latest version of the dataset can be downloaded here. It is distributed as a zip archive protected with the password `linguisticreasoning`. The data is available only in this form so that web crawlers cannot pick it up. Otherwise it could be accidentally included in the kind of web corpora often used to train LLMs and large-scale machine translation models, which would render it useless as a benchmark.
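As a minimal sketch, the archive can be unpacked with Python's standard `zipfile` module. The function name `extract_linguini` and the archive and output paths are hypothetical placeholders. The sketch assumes the archive uses traditional ZipCrypto encryption. `zipfile` cannot decrypt AES-encrypted zips, and in that case an external tool such as 7-Zip would be needed instead.

```python
import zipfile
from pathlib import Path

# Password given above; zipfile expects it as bytes.
PASSWORD = b"linguisticreasoning"


def extract_linguini(archive_path: str, out_dir: str) -> list[str]:
    """Extract a password-protected zip into out_dir and return its member names.

    Hypothetical helper. Assumes ZipCrypto encryption, since zipfile cannot
    decrypt AES-encrypted archives.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        # The password is applied to encrypted members; unencrypted ones ignore it.
        zf.extractall(path=out_dir, pwd=PASSWORD)
        return zf.namelist()
```

For example, `extract_linguini("linguini.zip", "linguini_data")` would unpack the archive into `linguini_data/`. Keep the extracted files out of any directory that is publicly served or crawled.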
- Please do not re-host this data as plain text in places where it might be picked up by web crawlers.
- If you are planning on evaluating your model with Linguini, you should ensure its contents are not in your training data.
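One common way to check for contamination is to look for long word n-grams shared between benchmark items and training documents. The sketch below is a minimal illustration of that general technique and is not a procedure prescribed by Linguini. The function names are hypothetical, and the default of 13-word n-grams is an assumption. Real training corpora are far too large for an in-memory `set`, so a hashed or Bloom-filter index would be needed at scale.

```python
from typing import Iterable


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated_items(benchmark_items: Iterable[str],
                       training_docs: Iterable[str],
                       n: int = 13) -> list[int]:
    """Return indices of benchmark items sharing at least one n-gram with training_docs.

    Hypothetical helper; n=13 is an assumed threshold, not a Linguini setting.
    """
    # Build one index of every n-gram seen in the training documents.
    train_ngrams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_ngrams |= word_ngrams(doc, n)
    # Flag any benchmark item whose n-grams intersect that index.
    return [i for i, item in enumerate(benchmark_items)
            if word_ngrams(item, n) & train_ngrams]
```

Any flagged item should be inspected by hand, because short, formulaic passages can overlap by chance.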
See the CONTRIBUTING file for how to help out.
Linguini is licensed under CC-BY-SA; see the LICENSE file for details.