/nocola

Official repository for the NoCoLA dataset

Primary language: Python

NoCoLA

This repository accompanies the paper "NoCoLA: The Norwegian Corpus of Linguistic Acceptability" by Matias Jentoft and David Samuel of the Language Technology Group at the University of Oslo. NoCoLA consists of two datasets: "class", which contains Norwegian sentences with binary acceptability judgements, and "zero", which contains pairs of unacceptable sentences and their acceptable counterparts.

Both linguistic acceptability datasets are published here. For the "class" version, we provide a pre-made 80/10/10 split for training purposes.
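The pre-made split should be used as published. For readers who want to see what an 80/10/10 partition looks like in practice, here is a minimal sketch on toy data. The function name `split_80_10_10`, the seed, and the train/dev/test naming of the three portions are assumptions for illustration, not the procedure used to build the published split.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items deterministically and cut them into 80/10/10 portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10  # first 80%
    n_dev = len(shuffled) // 10        # next 10%
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder, roughly 10%
    return train, dev, test


# Toy (sentence, label) pairs standing in for real data; not taken from NoCoLA.
examples = [(f"sentence {i}", i % 2) for i in range(100)]
train, dev, test = split_80_10_10(examples)
```

Fixing the seed keeps the partition reproducible across runs, which matters when results on the held-out portions are compared.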

If you wish to test a Norwegian language model for its competence in Norwegian grammar, all the necessary code is available in this repository.
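As a rough illustration of how sentence pairs like those in the "zero" dataset can be used, the sketch below counts how often a scoring function prefers the acceptable sentence over its unacceptable counterpart. This is a common way to evaluate on minimal pairs and is an assumption here, not a description of the repository's own evaluation code. The names `pairwise_accuracy`, `toy_scores`, and the placeholder sentence identifiers are all hypothetical; in practice `score` would wrap a language model's sentence-level score.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (unacceptable, acceptable) pairs where the acceptable
    sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    correct = sum(1 for bad, good in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical scores keyed by placeholder sentence ids; a real setup would
# compute a score per sentence with the language model under test.
toy_scores = {
    "bad-1": -12.0, "good-1": -9.5,
    "bad-2": -8.0, "good-2": -7.2,
    "bad-3": -6.1, "good-3": -6.4,  # the scorer prefers the wrong sentence here
    "bad-4": -15.3, "good-4": -11.0,
}
toy_pairs = [(f"bad-{i}", f"good-{i}") for i in range(1, 5)]

accuracy = pairwise_accuracy(toy_pairs, toy_scores.__getitem__)
print(accuracy)
```

Ties count as errors in this sketch, since a model that cannot separate the two sentences has not shown a preference for the acceptable one.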