zuco-benchmark

ZuCo Reading Task Classification Benchmark using EEG and Eye-Tracking Data


Welcome to the ZuCo Benchmark on Reading Task Classification!

🧭 Starting from here, you can:

📖 Read the manuscript.

🔗 Find more information at zuco-benchmark.com.

💻 Look at our code for creating the baseline results.

🏆 Create models or custom feature sets and participate in our challenge on EvalAI.

About ZuCo and the Reading Task Classification

The Zurich Cognitive Language Processing Corpus (ZuCo 2.0) combines EEG and eye-tracking recordings from subjects reading natural sentences. It serves as a resource for investigating the reading process of adult native English speakers.

The benchmark is a cross-subject classification task: distinguishing normal reading from task-specific information searching.

How Can I Use This Repository?

This repository gives you a starting point for participating in our challenge.
To run the code, follow these steps:

Dependencies

Version: Python 3.7.16
Using newer versions may lead to conflicts with h5py. Should you encounter any installation difficulties, please don't hesitate to open an issue.

  1. Install pip
  2. Create a virtual environment and activate it
  3. Run pip install -r requirements.txt

Data

Whole Dataset:

⚠️ Warning: the complete dataset contains about 70 GB of files.
You can also download individual files from the OSF repository.
To download the whole dataset, execute:
bash get_data.sh

Classification Features:

If you do not want to download the whole dataset, you can download the extracted features for each subject and feature set.
To do so, download features.zip, place the file under zuco-benchmark/src/ and unzip it.

Computing the Baseline Results

Change into the source directory and run the script to produce the baseline predictions with the SVM:
cd src
python benchmark_baseline.py

If you downloaded the whole dataset, the script will first extract the selected feature sets from the .mat files, which will take a while.
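
If you want to inspect the raw recordings yourself, the snippet below is a minimal sketch (not part of the repository). It assumes the ZuCo .mat files are MATLAB v7.3 containers, i.e. HDF5 files readable with h5py (the reason the h5py-compatible Python version is pinned above); the file path is only illustrative.

```python
# Minimal sketch: peek inside one ZuCo .mat file with h5py.
# Assumes MATLAB v7.3 (HDF5) files; the path below is only an example.
import h5py

with h5py.File("data/resultsZAB_NR.mat", "r") as f:
    # Print every group and dataset name to discover the file's structure.
    f.visit(print)
```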

Participation

You can use the code in benchmark_baseline.py as a starting point and:

  • Try different models (see the sketch after this list).
  • Use other feature sets or combinations. See config.py for the available feature sets.
  • Create your own feature combinations. To do that, take a look at the feature extraction code and add your own combination there.
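
As an illustration of the first bullet, here is a minimal sketch of swapping the SVM for another scikit-learn classifier. The feature matrices below are random placeholders; in practice you would reuse the feature-loading code from benchmark_baseline.py.

```python
# Hypothetical sketch: the baseline uses an SVM; any scikit-learn classifier
# with the same fit/predict interface can be dropped in instead.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature matrix and labels; in practice these come from the
# extracted ZuCo feature sets (see config.py and benchmark_baseline.py).
X_train = np.random.rand(100, 20)
y_train = np.random.randint(0, 2, size=100)   # 0 = normal reading, 1 = task-specific
X_test = np.random.rand(10, 20)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions)
```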

To experiment with different models or feature combinations, you can use validation.py, which evaluates your configuration with leave-one-out cross-validation on the training data.
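
The snippet below is a conceptual sketch of leave-one-out cross-validation with scikit-learn, using placeholder data; validation.py implements the benchmark's own procedure, so treat this only as an illustration of the idea.

```python
# Conceptual sketch of leave-one-out cross-validation with scikit-learn.
# Placeholder data; validation.py implements the benchmark's own procedure.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(30, 20)                 # placeholder feature matrix
y = np.random.randint(0, 2, size=30)       # placeholder binary labels

scores = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
print(f"Mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```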

Submission

If create_submission is enabled in the config, benchmark_baseline.py will automatically create a submission file in the correct format.
For the submission format, check out the example files.
Head to EvalAI, fill in the required information and upload your submission.json.