Project investigating human and artificial neural representations of Python program comprehension and execution.
This pipeline supports two major functions:
- MVPA (multivariate pattern analysis) evaluates how well code properties or code model representations can be decoded from brain representations within a collection of canonical brain regions.
- PRDA (program representation decoding analysis) evaluates how well code properties can be decoded from code model representations.
To run all core experiments from the paper, the following commands will suffice after setup:
python braincode mvpa # runs all core MVPA analyses in parallel
python braincode prda # runs all supplemental PRDA analyses in parallel
To regenerate tables and figures from the paper, run the following after completing the analyses:
cd paper/scripts
source # pulls scores, runs stats, generates plots and tables
The analyses draw on the following representations, referenced by name in the -f/--feature and -t/--target arguments:
Brain Representations
  brain-MD              (Multiple Demand)
  brain-lang            (Language)
Code Properties
  task-code             (code vs. sentences)
  task-lang             (English vs. Japanese)
  task-content          (math vs. str)                         *datatype
  task-structure        (seq vs. for vs. if)                   *control flow
  task-tokens           (# of tokens in program)               *static analysis
  task-lines            (# of runtime steps during execution)  *dynamic analysis
  task-bytes            (# of bytecode ops executed)
  task-nodes            (# of nodes in AST)
  task-halstead         (function of tokens, operations, vocabulary)
  task-cyclomatic       (function of program control flow graph)
Code Models
  code-projection       (presence of tokens)
  code-bow              (token frequency)
  code-tfidf            (token and document frequency)
  code-seq2seq [1]      (sequence modeling)
  code-xlnet [2]        (autoregressive LM)
  code-gpt2 [4]         (autoregressive LM)
  code-bert [5]         (masked LM)
  code-roberta [6]      (masked LM)
  code-transformer [3]  (LM + structure learning)
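Several of the code properties above are standard static and dynamic program measures. As a rough illustration, the sketch below computes analogues of task-tokens, task-nodes, task-cyclomatic and task-lines for a Python source string. It uses only the standard library (`tokenize`, `ast`, `sys.settrace`). The helper names and the exact counting rules are assumptions made for illustration, including which token types are skipped and which AST nodes count as branch points. They may not match how the project computes these properties.

```python
"""Sketch: a few static and dynamic code properties via the standard library.

Hypothetical helpers; the counting rules are illustrative assumptions.
"""
import ast
import io
import sys
import tokenize

# Layout-only token types excluded from the token count (an assumption).
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
           tokenize.COMMENT, tokenize.ENDMARKER}

# AST node types treated as branch points for the cyclomatic estimate.
_BRANCHES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def count_tokens(src: str) -> int:
    """Static: number of lexical tokens, ignoring layout tokens and comments."""
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return sum(1 for tok in toks if tok.type not in _LAYOUT)


def count_ast_nodes(src: str) -> int:
    """Static: number of nodes in the abstract syntax tree."""
    return sum(1 for _ in ast.walk(ast.parse(src)))


def cyclomatic(src: str) -> int:
    """Static: McCabe complexity approximated as 1 + number of branch points."""
    return 1 + sum(isinstance(n, _BRANCHES) for n in ast.walk(ast.parse(src)))


def count_executed_lines(src: str) -> int:
    """Dynamic: number of 'line' trace events while executing src."""
    code = compile(src, "<program>", "exec")
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if frame.f_code.co_filename != "<program>":
            return None  # do not trace frames outside the analysed program
        if event == "line":
            count += 1
        return tracer  # keep receiving local events for this frame

    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)  # always remove the tracer, even if src raises
    return count


if __name__ == "__main__":
    program = "total = 0\nfor i in range(3):\n    total += i\n"
    print(count_tokens(program), count_ast_nodes(program),
          cyclomatic(program), count_executed_lines(program))
```

The line counter is dynamic: a loop body is counted once per iteration, so it grows with runtime behaviour rather than with program length. The static counts depend only on the source text.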
Requirements: Anaconda
conda create -n braincode python=3.7
source activate braincode
git clone --branch ICML2022 --depth 1
cd braincode
pip install . # use `pip install -e .` for development mode
cd setup
source # downloads 'large' files, e.g. datasets, models
usage: braincode [-h] [-f FEATURE] [-t TARGET] [-m METRIC] [-d CODE_MODEL_DIM]
                 [-p BASE_PATH] [-s] [-b]
                 {mvpa,prda}
run specified analysis type
positional arguments:
  {mvpa,prda}
optional arguments:
  -h, --help            show this help message and exit
  -f FEATURE, --feature FEATURE
  -t TARGET, --target TARGET
  -m METRIC, --metric METRIC
  -d CODE_MODEL_DIM, --code_model_dim CODE_MODEL_DIM
  -p BASE_PATH, --base_path BASE_PATH
  -s, --score_only
  -b, --debug
Note: if BASE_PATH was changed from its default location, it must be passed explicitly with -p/--base_path so that the paths match.
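To make the command-line interface concrete, here is a hypothetical `argparse` reconstruction of the help output above. The flag names and the `mvpa`/`prda` choices come from this README. Everything else is an assumption and not the package's actual parser code: the `build_parser` name, the `analysis` destination for the positional argument, and leaving every option as an untyped string with no default.

```python
"""Hypothetical argparse reconstruction of the braincode CLI shown in the help text."""
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the README's help output; types and defaults are NOT known,
    # so every option is left as an argparse string with default None.
    parser = argparse.ArgumentParser(prog="braincode",
                                     description="run specified analysis type")
    parser.add_argument("analysis", choices=["mvpa", "prda"])  # analysis type
    parser.add_argument("-f", "--feature")         # e.g. brain-MD or brain-lang+brain-MD
    parser.add_argument("-t", "--target")          # e.g. task-structure or code-projection
    parser.add_argument("-m", "--metric")          # e.g. SpearmanRho
    parser.add_argument("-d", "--code_model_dim")  # e.g. 64
    parser.add_argument("-p", "--base_path")
    parser.add_argument("-s", "--score_only", action="store_true")
    parser.add_argument("-b", "--debug", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["mvpa", "-f", "brain-MD", "-t", "task-structure"])
    print(args.analysis, args.feature, args.target, args.score_only)
```

Parsing `mvpa -f brain-MD -t task-structure` this way yields `analysis="mvpa"`, `feature="brain-MD"`, `target="task-structure"` and `score_only=False`.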
Sample calls
# basic examples
python braincode mvpa -f brain-MD -t task-structure # brain -> {task, model}
python braincode prda -f code-bert -t task-tokens # model -> task
# more complex example
python braincode mvpa -f brain-lang+brain-MD -t code-projection -d 64 -m SpearmanRho -p $BASE_PATH --score_only
# note how the `+` operator can be used to join multiple representations via concatenation
# additional metrics are available in the `` module
This package also provides an automated build using GNU Make. A single pipeline is provided that starts from an empty environment and proceeds through to the creation of the paper's tables and figures.
make paper # see 'make help' for more info
Build automation can also be containerized in Docker:
make docker
If you use this work, please cite XXX (under review)