BrainCode

Project investigating human and artificial neural representations of Python program comprehension and execution.

This pipeline supports two major analyses; a schematic decoding sketch follows the list below.

  • MVPA (multivariate pattern analysis) evaluates decoding of code properties or code model representations from their respective brain representations within a collection of canonical brain regions.
  • PRDA (program representation decoding analysis) evaluates decoding of code properties from code model representations.
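
Both analyses are decoding problems: fit a predictor that maps one representation onto another, and score it out of sample. The snippet below is only a rough illustration of that idea with synthetic data; the array shapes, classifier, and scoring choice are assumptions, not the pipeline's actual code.

# Minimal MVPA-style decoding sketch with synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(72, 500))    # hypothetical voxel patterns: 72 programs x 500 voxels
y = rng.integers(0, 2, size=72)   # hypothetical binary code property, e.g. math vs. str

# cross-validated decoding of the property from the brain representation
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean decoding accuracy: {scores.mean():.3f}")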

To run all core experiments from the paper, the following command will suffice after setup:

python braincode mvpa # runs all core MVPA analyses in parallel
python braincode prda # runs all supplemental PRDA analyses in parallel

To regenerate tables and figures from the paper, run the following after completing the analyses:

cd paper/scripts
source run.sh # pulls scores, runs stats, generates plots and tables

Supported Brain Regions

  • brain-MD (Multiple Demand)
  • brain-lang (Language)
  • brain-vis (Visual)
  • brain-aud (Auditory)

Supported Code Features

Code Properties

  • test-code (code vs. sentences)
  • test-lang (English vs. Japanese)
  • task-content (math vs. str) *datatype
  • task-structure (seq vs. for vs. if) *control flow
  • task-tokens (# of tokens in program) *static analysis
  • task-lines (# of runtime steps during execution) *dynamic analysis
  • task-bytes (# of bytecode ops executed)
  • task-nodes (# of nodes in AST)
  • task-halstead (function of tokens, operations, vocabulary)
  • task-cyclomatic (function of program control flow graph)
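
Several of the static properties above can be approximated directly from a program's source with the Python standard library. The sketch below illustrates token, AST-node, and bytecode counts; the definitions here are assumptions for illustration and need not match the pipeline's exact implementation (the dynamic properties additionally require tracing program execution, which is omitted).

# Illustrative computation of a few static code properties (definitions are assumptions,
# not necessarily the pipeline's; dynamic properties would require execution tracing).
import ast
import dis
import io
import tokenize

program = "x = 0\nfor i in range(3):\n    x += i\nprint(x)\n"

# task-tokens: number of tokens in the program
tokens = list(tokenize.generate_tokens(io.StringIO(program).readline))
print("tokens:", len(tokens))

# task-nodes: number of nodes in the abstract syntax tree
print("AST nodes:", sum(1 for _ in ast.walk(ast.parse(program))))

# task-bytes (static approximation): number of compiled bytecode instructions
print("bytecode ops:", sum(1 for _ in dis.get_instructions(compile(program, "<prog>", "exec"))))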

Code Models

  • code-projection (presence of tokens)
  • code-bow (token frequency)
  • code-tfidf (token and document frequency)
  • code-seq2seq 1 (sequence modeling)
  • code-xlnet 2 (autoregressive LM)
  • code-gpt2 4 (autoregressive LM)
  • code-bert 5 (masked LM)
  • code-roberta 6 (masked LM)
  • code-transformer 3 (LM + structure learning)
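
The first three entries are token-based baselines; the sketch below shows what code-projection, code-bow, and code-tfidf style representations look like when built with scikit-learn. The tokenization pattern and settings are assumptions for illustration, not the pipeline's configuration.

# Illustrative token-based program representations (settings are assumptions).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

programs = [
    "x = 1\ny = x + 2\nprint(y)",
    "s = 'ab'\nfor c in s:\n    print(c)",
]

projection = CountVectorizer(binary=True, token_pattern=r"\S+").fit_transform(programs)  # token presence
bow = CountVectorizer(token_pattern=r"\S+").fit_transform(programs)                      # token frequency
tfidf = TfidfVectorizer(token_pattern=r"\S+").fit_transform(programs)                    # token + document frequency
print(bow.shape, tfidf.shape)  # (n_programs, vocabulary_size)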

Installation

Requirements: Anaconda

conda create -n braincode python=3.7
source activate braincode
git clone --branch ICML2022 --depth 1 https://github.com/anonmyous-author/anonymous-code
cd braincode
pip install . # -e for development mode
cd setup
source setup.sh # downloads 'large' files, e.g. datasets, models

Run

usage: braincode [-h] [-f FEATURE] [-t TARGET] [-m METRIC] [-d CODE_MODEL_DIM]
                 [-p BASE_PATH] [-s] [-b]
                 {mvpa,prda}

run specified analysis type

positional arguments:
  {mvpa,prda}

optional arguments:
  -h, --help            show this help message and exit
  -f FEATURE, --feature FEATURE
  -t TARGET, --target TARGET
  -m METRIC, --metric METRIC
  -d CODE_MODEL_DIM, --code_model_dim CODE_MODEL_DIM
  -p BASE_PATH, --base_path BASE_PATH
  -s, --score_only
  -b, --debug

note: BASE_PATH must match the path used by setup.sh if it was changed from the default.

Sample calls

# basic examples
python braincode mvpa -f brain-MD -t task-structure # brain -> {task, model}
python braincode prda -f code-bert -t task-tokens # model -> task

# more complex example
python braincode mvpa -f brain-lang+brain-MD -t code-projection -d 64 -m SpearmanRho -p $BASE_PATH --score_only
# note that the `+` operator can be used to join multiple representations via concatenation
# additional metrics are available in the `metrics.py` module
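
As noted in the comments above, joining representations with `+` amounts to concatenating their feature dimensions, and metrics such as SpearmanRho are rank correlations. The standalone sketch below illustrates both ideas with hypothetical arrays; it does not use the pipeline's internals.

# Sketch of feature concatenation (the `+` join) and a Spearman rank-correlation score.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X_lang = rng.normal(size=(72, 300))    # hypothetical brain-lang voxels
X_md = rng.normal(size=(72, 400))      # hypothetical brain-MD voxels
X_joint = np.hstack([X_lang, X_md])    # brain-lang+brain-MD: concatenate along the feature axis

predicted = rng.normal(size=72)
observed = rng.normal(size=72)
rho, _ = spearmanr(predicted, observed)  # SpearmanRho-style score between predicted and observed values
print(X_joint.shape, round(rho, 3))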

Automation

Make

This package also provides an automated build using GNU Make. A single pipeline starts from an empty environment and proceeds through to the creation of the paper's tables and figures.

make paper # see 'make help' for more info

Docker

Build automation can also be containerized with Docker.

make docker

Citation

If you use this work, please cite XXX (under review).

License

License: MIT