Welcome to the Gallant Lab's offering for the 2016 IEEE International Conference on Systems, Man, and Cybernetics Hackathon!
For this project, you'll have access to brain activity recorded from a human subject while they listened to hours of stories. Your goal is to design a method to decode their brain activity and predict what the subject heard. How you do it is completely up to you. We've provided easy starting points for NLP hackers wanting to construct custom features from the text data and for ML hackers who just want to dive in using our features.
Participants must read and agree to the conditions described in LICENSE.md.
Start by downloading this repository.
$ git clone https://github.com/gallantlab/ieee-hackathon.git
To help you get the lay of the land, we made a Jupyter notebook containing a simple decoding example. We recommend starting there to familiarize yourself with the data. GitHub provides a simple way to view the notebook, and just viewing it might be sufficient to get you started. Optionally, if you'd like to interact with it, you'll need to install Jupyter and several other dependencies. An easy way to get all the dependencies is by installing Anaconda.
After cloning this repository with the `git clone` command above, run the following commands:
$ cd ieee-hackathon/notebook
$ jupyter notebook
This should pop up a browser showing the notebook, where you can edit and run it.
Smaller files like the story text are included in this repository, and the larger files are stored in a publicly-accessible Box.
- `responses.h5`: The voxel responses for a single subject listening to eleven training stories and one test story. The responses to individual stories are stored in separate arrays.
- TextGrids (`ieee-hackathon/data/grids`): This folder contains files with the time-stamped text of the stories.
- TR files (`ieee-hackathon/data/trfiles`): This folder contains information you'll need to align the stories with the brain responses.
- `features.h5`: The pre-computed feature vectors (computed from a word2vec language model) for each of the stories in the `responses.h5` file (except the test story).
- `bsridge-directdecoding-alphas2to4-nboots10.h5`: Linear regression weights from the example decoding model, which predicts word embedding vectors from voxel activity.
At the very least you'll need to download the responses, which you can do by running the following script:
$ bash download_responses.sh
You might also want to download our precomputed stimulus features and the example decoding model:
$ bash download_features.sh
$ bash download_model.sh
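Once downloaded, the HDF5 files can be read with h5py. Here's a minimal loading sketch, assuming each story is stored as a separate array keyed by its name, with the test responses under `TEST` (the exact paths depend on where the download scripts put the files):

```python
import h5py
import numpy as np

# Minimal loading sketch; adjust the paths to wherever the download
# scripts placed the files. We assume each story is stored as a
# separate array keyed by its name, with test responses under "TEST".
with h5py.File("responses.h5", "r") as f:
    print(list(f.keys()))                    # available story names
    resp = np.array(f["avatar"])             # (time points, voxels)
    test_resp = np.array(f["TEST"])          # held-out test responses

with h5py.File("features.h5", "r") as f:
    feat = np.array(f["avatar"])             # (time points, embedding dims)
```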
The challenge is pretty simple. Design a decoder that considerably outperforms our decoder and you win the prize! Train your decoder using the stimuli and responses for the following stories (a baseline sketch follows the list):
alternateithicatom
avatar
howtodraw
legacy
life
myfirstdaywiththeyankees
naked
odetostepfather
souls
undertheinfluence
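As a starting point, a baseline in the spirit of our example model is a ridge regression from voxel responses to word2vec features. Here's a minimal sketch, assuming the HDF5 arrays are keyed by the story names above and that each story's response and feature arrays have matching row counts:

```python
import h5py
import numpy as np
from sklearn.linear_model import Ridge

TRAIN_STORIES = ["alternateithicatom", "avatar", "howtodraw", "legacy",
                 "life", "myfirstdaywiththeyankees", "naked",
                 "odetostepfather", "souls", "undertheinfluence"]

# Stack voxel responses (X) and word2vec features (Y) across the
# training stories; we assume matching row counts per story.
with h5py.File("responses.h5", "r") as resp, h5py.File("features.h5", "r") as feat:
    X = np.vstack([np.array(resp[s]) for s in TRAIN_STORIES])
    Y = np.vstack([np.array(feat[s]) for s in TRAIN_STORIES])
    X_test = np.array(resp["TEST"])

# Plain ridge regression as a stand-in for the bootstrapped ridge used
# in the example model; cross-validate alpha rather than hard-coding it.
decoder = Ridge(alpha=1000.0).fit(X, Y)
pred_embeddings = decoder.predict(X_test)    # (test time points, embedding dims)
```

Note that the predicted embeddings still need to be mapped back to words, for example via nearest neighbors in the word2vec vocabulary.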
Then decode the responses into a prediction for the held-out test story, which we'll use to evaluate how well your decoder performs:
TEST
Please only use the test story to generate your submission. Do not peek at it to build your decoder.
Run your decoder on the responses stored in the `TEST` field of `responses.h5`. Produce a space-delimited text file in which each line contains the word predictions for a single time point; that is, the text file contains one line per row in the `TEST` response array. Include a blank line if your decoder predicts no output for a given time point. For clarity, we've provided an example submission called `GLABEXAMPLE_decoding_results.txt`.
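As a concrete illustration, here's a minimal sketch of writing a submission file in the required format; `predicted_words` is a hypothetical stand-in for your decoder's output, one (possibly empty) word list per row of the `TEST` response array:

```python
# Hypothetical decoder output: one (possibly empty) list of words per
# TEST time point.
predicted_words = [["old", "man"], [], ["he", "walked", "home"]]

with open("MYTEAM_decoding_results.txt", "w") as f:
    for words in predicted_words:
        f.write(" ".join(words) + "\n")      # empty list -> blank line
```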
Send an email to robertg@berkeley.edu with the subject "`<team_name>` 2016 IEEE HACKATHON SUBMISSION". Include the following files as attachments:
- `<team_name>_decoding_results.txt`: the aforementioned space-delimited text file
- `README.txt` (or `.pdf`/`.md` if you want to include images or formatting): a document containing a brief description of your approach
We'll evaluate the quality of your decoding using one quantitative and one qualitative method.
We'll take the decoded words you submitted for the test story and transform them into a semantic word embedding space using this pretrained word2vec model. We'll sum the vectors for each time point and compute the correlation coefficient between the summed vectors and the semantic vectors from the actual story.
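If you want to track something like this metric during development, here's a rough sketch of one plausible reading of it, with a random stand-in embedding table in place of the pretrained model:

```python
import numpy as np

def summed_vectors(lines, w2v, dim=300):
    """Sum the embedding vectors of the words on each line (one line per time point)."""
    rows = []
    for line in lines:
        vecs = [w2v[w] for w in line.split() if w in w2v]
        rows.append(np.sum(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(rows)

def semantic_score(pred_lines, true_lines, w2v):
    """Correlation between predicted and actual summed vectors, flattened
    across time points (one plausible reading of the metric)."""
    pred = summed_vectors(pred_lines, w2v)
    true = summed_vectors(true_lines, w2v)
    return np.corrcoef(pred.ravel(), true.ravel())[0, 1]

# Toy usage with a random stand-in embedding table; the real evaluation
# uses the pretrained word2vec model (loadable with, e.g., gensim's
# KeyedVectors.load_word2vec_format).
w2v = {w: np.random.randn(300) for w in ["old", "man", "walked", "home"]}
print(semantic_score(["old man", "walked home"], ["man walked", "home"], w2v))
```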
We'll present human raters with the original passage and the output from several submissions, including our best, and have them judge which submission best matches the meaning of the original passage.
If you significantly beat our best in-lab decoder, we'll ask for your code to verify the results. If they hold, you win!
I plan to be available for the beginning of the hackathon to take questions, though note the time difference between Budapest and California. Good luck, and feel free to email any questions to robertg@berkeley.edu!