Code for decoding speech as text from neural data
This package contains Python code for the high-level aspects of decoding speech from neural data, including transfer learning across multiple subjects. It was used for all results in the paper "Machine translation of cortical activity to text with an encoder-decoder framework" (Makin et al., Nature Neuroscience, 2020). These high-level aspects include the structuring of the training, the organization by subjects, and the construction of TFRecords. The (low-level) training itself is done with the adjacent machine_learning package, which implements sequence-to-sequence networks in TensorFlow.
-
Install TensorFlow 1.15.5, the final version of TF1.x.
pip install tensorflow-gpu==1.15.5
If you don't have a GPU, install the CPU version instead:
pip install tensorflow==1.15.5
Please consult the TensorFlow installation documents. The most important facts to know are that TF1.15 requires CUDA 10.0, libcudnn7>=7.6.5.32-1+cuda10.0, and libnccl2>=2.6.4-1+cuda10.0. (I have only tested with up to, not beyond, the listed versions of these libraries.) Make sure the driver for your GPU is compatible with these versions of the cuDNN and NCCL libraries. Note also that the latest version of Python supported by TF1.15 is 3.7.
-
Install the three required packages:
git clone https://github.com/jgmakin/utils_jgm.git
pip install -e utils_jgm
git clone https://github.com/jgmakin/machine_learning.git
pip install -e machine_learning
git clone https://github.com/jgmakin/ecog2txt.git
pip install -e ecog2txt
Note that utils_jgm requires the user to set up a configuration file; please see the README for that package.
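The version constraints stated in the first installation step can be checked from Python before going further. The following is a minimal sketch and not part of this package: the helper names python_ok and tensorflow_ok are hypothetical, and the check encodes only the two facts given above (Python no newer than 3.7, TensorFlow 1.15.x), plus the assumption that you are running Python 3.

```python
import sys


def python_ok(version_info=None):
    """Return True if the interpreter is Python 3 and no newer than 3.7.

    Requiring Python 3 is an assumption of this sketch; the upper bound
    of 3.7 is the limit stated for TF1.15.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor <= 7


def tensorflow_ok(version_string):
    """Return True for any TensorFlow 1.15.x version string."""
    return version_string.split('.')[:2] == ['1', '15']


if __name__ == '__main__':
    print('Python version OK:', python_ok())
    try:
        import tensorflow as tf
    except ImportError:
        print('TensorFlow is not installed')
    else:
        print('TensorFlow version OK:', tensorflow_ok(tf.__version__))
```

Running this in the environment you intend to train in is a quick way to catch a mismatched interpreter before any TFRecords are written.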
In order to unify the vast set of parameters (paths, experimental block structure, neural-network hyperparameters, etc.), all experiments are organized with the help of two configuration files, block_breakdowns.json and YOUR_EXPERIMENT_manifest.yaml. Examples of each are included in this repository.
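For orientation, a block_breakdowns.json maps each subject ID to its blocks, and each block to a "type" and a "default_dataset". The sketch below builds such a structure in Python, checks it, and writes it to disk. The subject ID, block numbers, and block types are made up for illustration, and validate_breakdowns is a hypothetical helper, not a function from this package.

```python
import json
import os
import tempfile

ALLOWED_DATASETS = {'training', 'validation', 'testing'}

# Illustrative contents only: one subject with three blocks.
# JSON object keys are always strings, so IDs are written as strings here.
block_breakdowns = {
    '400': {
        '1': {'type': 'mocha-1', 'default_dataset': 'training'},
        '2': {'type': 'mocha-1', 'default_dataset': 'validation'},
        '3': {'type': 'mocha-3', 'default_dataset': 'training'},
    },
}


def validate_breakdowns(breakdowns):
    """Return a list of (subject, block, problem) tuples; empty if all is well."""
    problems = []
    for subject, blocks in breakdowns.items():
        for block, info in blocks.items():
            if 'type' not in info:
                problems.append((subject, block, 'missing "type"'))
            if info.get('default_dataset') not in ALLOWED_DATASETS:
                problems.append((subject, block, 'bad "default_dataset"'))
    return problems


# Write the file into a temporary directory (use your own directory in practice).
out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, 'block_breakdowns.json')
with open(json_path, 'w') as f:
    json.dump(block_breakdowns, f, indent=2)
```

Checking the file this way before the first run is cheap, since a misspelled dataset name would otherwise only show up as a block that is silently left out.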
-
Edit the block_breakdowns.json to match your use case. The entries are
SUBJECT_ID: {BLOCK: {"type": BLOCK_TYPE, "default_dataset": DEFAULT_DATASET_VALUE}}
where the DEFAULT_DATASET_VALUE is one of "training"/"validation"/"testing", and the BLOCK_TYPE is whatever descriptive title you want to give to your block (e.g., "mocha-3"). Assigning types to the blocks allows them to be filtered out of datasets, according to information provided in the manifest (see next item). Place your edited copy into a directory we will call json_dir.
-
Edit one of the .yaml manifest files to something sensible for your case. The most important thing to know is that many of the classes in this package (and machine_learning) load their default attributes from this manifest. That means that, even though the keyword arguments for their constructors (__init__() methods) may appear to default to None, this None actually instructs the class to default to the argument's value in the manifest.
You don't have to set all the values before your first run, but at the very least, you should:
- Fix the paths/dirs. For the most part they are for writing, not reading, so you can set them wherever you like. For the three reading paths:
  - json_dir must point to the location of your block_breakdowns.json file (see previous item).
  - bad_electrodes_path must point to a (possibly empty) plain-text file listing (one entry per line) any bad channels. Note that these are assumed to be 1-indexed (they will be converted internally to zero-indexing). Alternatively, you can provide the good_electrodes directly (either via the manifest or as an argument to the ECoGDataGenerator).
  - electrode_path: you can ignore this unless you plan to plot results on the cortical surface (in which case contact me).
- block_types: these set necessary conditions for membership in one of the datasets, training/validation/testing. For example, in the mochastar_word_sequence.yaml manifest file, the testing and validation sets are allowed to include only mocha-1, but the training set is allowed to include mocha-1, ..., mocha-9. So if a mocha-3 block has validation as its "default_dataset" in the block_breakdowns.json, it would be excluded altogether.
- grid_size: Set this to match the dimensions of your ECoG grid.
- text_sequence_vocab_file: You can provide a file with a list, one word per line, of all words to be targeted by the decoder. This key specifies just the name of the file; the file itself must live in the text_dir specified in __init__.py. If you set this key to None, the package will attempt to build a list of unique targets directly from the TFRecords. An example vocab_file, vocab.mocha-timit.1806, is included in this package.
- data_mapping: Use this to set which data to use as inputs and outputs for the sequence-to-sequence network--see _ecog_token_generator below.
- DataGenerator: In the manifest, this points to the ECoGDataGenerator in data_generators.py, but you will probably want to subclass this class and point to your new (sub)class instead--see next item.
You can probably get away with leaving the rest of the values in the .yaml at their default values, at least for your first run.
Finally, make sure YOUR_EXPERIMENT_manifest.yaml lives at the text_dir specified in __init__.py (you can change this as you like, but remember that the text_sequence_vocab_file must live in the same directory).
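The interaction between a block's "default_dataset" and the manifest's block_types can be sketched as follows. This illustrates the rule as stated in this README, not the package's actual implementation: assign_blocks is a hypothetical helper, and representing block_types as a dictionary of sets is an assumption.

```python
def assign_blocks(subject_blocks, block_types):
    """Sort blocks into datasets, dropping blocks whose type is not allowed.

    A block joins its default dataset only if its type is among the types
    permitted for that dataset; otherwise it is excluded altogether.
    """
    datasets = {name: [] for name in block_types}
    for block, info in sorted(subject_blocks.items()):
        dataset = info['default_dataset']
        if info['type'] in block_types.get(dataset, ()):
            datasets[dataset].append(block)
    return datasets


# Illustrative values mirroring the mocha example in the text.
block_types = {
    'training': {'mocha-%d' % i for i in range(1, 10)},
    'validation': {'mocha-1'},
    'testing': {'mocha-1'},
}
subject_blocks = {
    '1': {'type': 'mocha-1', 'default_dataset': 'validation'},
    '2': {'type': 'mocha-3', 'default_dataset': 'training'},
    '3': {'type': 'mocha-3', 'default_dataset': 'validation'},  # excluded
    '4': {'type': 'mocha-1', 'default_dataset': 'testing'},
}
datasets = assign_blocks(subject_blocks, block_types)
```

Here block '3' ends up in no dataset at all: its type, mocha-3, is not permitted in validation, and it is not moved to training just because training would accept it.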
-
ECoGDataGenerator, found in data_generators.py, is a shell class for generating data--more particularly for writing out the TFRecords that will be used for training and assessing your model--that plays nicely with the other classes. However, three of its (required!) methods are unspecified because they depend on how you store your data. (Dummy versions appear in ECoGDataGenerator; you can inspect their inputs and outputs there.) You should subclass ECoGDataGenerator and fill in these methods:
- _ecog_token_generator: a Python generator that yields data structures in the form of a dict, each entry of which corresponds to a set of inputs and outputs on a single trial. For example, the entries might be ecog_sequence, text_sequence, audio_sequence, and phoneme_sequence. The last two are not strictly necessary for speech decoding and can be left out--or you can add more. Just make sure that you return at least the data structures requested in the data_mapping specified in the manifest. So, e.g., if the data_mapping is
  data_mapping = {'decoder_targets': 'text_sequence', 'encoder_inputs': 'ecog_sequence'}
  then _ecog_token_generator must yield dictionaries containing at least (but not limited to) a text_sequence and an ecog_sequence. The entire dictionary will be written to a TFRecord (one for each block), so it's better to yield more rather than fewer data structures, in case you change your mind later about the data_mapping but don't want to have to rewrite all the TFRecords.
  And one more thing: the text_sequence_vocab_file key in the experiment manifest is linked to the text_sequence in this data mapping. So if you plan to call your decoder_targets something else, say my_words, then make sure to rename the key in the experiment manifest that points to a vocab file to my_words_vocab_file.
- _get_wav_data: should return the sampling_rate and audio signal for one (e.g.) block of audio data. This will allow you to make use of the built-in _get_MFCC_features in constructing your _ecog_token_generator. If you're never going to generate an audio_sequence, however, you can ignore it.
- _query: should return the total number of examples in a group of blocks. This will allow you to allocate memory efficiently when using the get method. However, the methods _query and get are not used elsewhere in the code; they are convenience functions for examining the data directly rather than through a TFRecord.
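To make the contract for _ecog_token_generator and _query concrete, here is a self-contained sketch. It does not import ecog2txt: BaseGenerator is a hypothetical stand-in for ECoGDataGenerator, the method signatures and the in-memory trials_by_block store are assumptions, and the real signatures should be taken from the dummy methods in data_generators.py. The helpers missing_entries and vocab_file_key are likewise hypothetical.

```python
class BaseGenerator:
    """Hypothetical stand-in for ECoGDataGenerator (not the real class)."""


class MyDataGenerator(BaseGenerator):
    def __init__(self, trials_by_block):
        # Assumed storage: {block: [trial dict, ...]}, already in memory.
        self.trials_by_block = trials_by_block

    def _ecog_token_generator(self, block):
        # Yield one dict per trial; extra entries are cheap to keep.
        for trial in self.trials_by_block[block]:
            yield {
                'ecog_sequence': trial['ecog'],
                'text_sequence': trial['words'],
                'phoneme_sequence': trial.get('phonemes', []),
            }

    def _query(self, blocks):
        # Total number of examples across a group of blocks.
        return sum(len(self.trials_by_block[b]) for b in blocks)


def missing_entries(example, data_mapping):
    """Names requested by data_mapping that the yielded dict lacks."""
    return sorted(set(data_mapping.values()) - set(example))


def vocab_file_key(data_mapping):
    """Manifest key for the vocab file, following the naming rule above."""
    return data_mapping['decoder_targets'] + '_vocab_file'


data_mapping = {
    'decoder_targets': 'text_sequence',
    'encoder_inputs': 'ecog_sequence',
}
generator = MyDataGenerator({
    1: [{'ecog': [[0.1, 0.2]], 'words': ['hello', 'world']}],
    2: [{'ecog': [[0.3, 0.4]], 'words': ['good', 'morning']},
        {'ecog': [[0.5, 0.6]], 'words': ['thank', 'you']}],
})
examples = list(generator._ecog_token_generator(2))
```

A check like missing_entries is worth running on the first yielded dictionary before writing any TFRecords, since a missing entry would otherwise force a rewrite of every block.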
-
The basic commands to train a model are as follows (you can e.g. run this in a Python notebook).
import ecog2txt.trainers as e2t_trainers
import ecog2txt.data_generators

# CREATE A NEW MODEL
trainer = e2t_trainers.MultiSubjectTrainer(
    experiment_manifest_name='YOUR_EXPERIMENT_manifest.yaml',  # file name, passed as a string
    subject_ids=[400, 401],
    SN_kwargs={
        'FF_dropout': 0.4,  # overwriting whatever is in the manifest
        'TEMPORALLY_CONVOLVE': True  # overwriting whatever is in the manifest
    },
    DG_kwargs={
        'REFERENCE_BIPOLAR': True,  # overwriting whatever is in the manifest
    },
    ES_kwargs={
        'data_mapping': {  # overwriting whatever is in the manifest
            'encoder_inputs': 'ecog_sequence',
            'decoder_targets': 'text_sequence',
        },
    },
)

# MAKE SURE ALL THE TFRECORDS ARE WRITTEN
for subject in trainer.ecog_subjects:
    subject.write_tf_records_maybe()
trainer.subject_to_table()

# TRAIN THE TWO SUBJECTS IN PARALLEL
assessments = trainer.parallel_transfer_learn()