Project Alpha
Statistics 159/259: Reproducible and Collaborative Statistical Data Science
UC Berkeley | Fall 2015
Build status: https://travis-ci.org/berkeley-stat159/project-alpha?branch=master
Coverage: https://coveralls.io/r/berkeley-stat159/project-alpha?branch=master
This repository stores the documentation of our analysis of the balloon analogue risk task (BART) data included in the OpenfMRI ds009 data set, [The generality of self-control](https://openfmri.org/dataset/ds000009/). The original analysis was conducted in 2009 by Jessica Cohen for her doctoral thesis at UCLA and focused largely on comparing different notions of self-control across several fMRI studies. Our initial aim was to reproduce that analysis. However, we were unable to do so for many of the methods used, either because they relied on pre-packaged software that was unavailable to us or because we could not justify the underlying theory for some approaches. We therefore chose to develop our own pre-processing pipeline and to pursue new directions for identifying activation regions in the BART study. Our ultimate goal is to compare our results with those of Cohen's paper.
Many thanks to Jarrod Millman, Matthew Brett, J-B Poline, and Ross Barnowski for their advice and encouragement.
Navigating the Repository
The Makefile contains recipes to perform all final analysis of the data, generate the project report, and test all user-defined functions.
Please note that to perform the analysis and/or run the function test files,
you will need to download the data for analysis and the testing data. Please
see the project-alpha/data
directory for more details.
- make all: Downloads and validates the data, performs all final analysis, and generates the final report, which is stored in the paper subdirectory. Please be aware that the raw data is 6 GB and that our analysis generates another 14 GB of data, so plan accordingly for disk space and download time. All images needed for the report are reproduced in the images directory.
- make data: Downloads and decompresses the .tgz archive of the data for our analysis from OpenfMRI.org. We are using the BART data from ds009, The generality of self-control. All downloaded files will be located in the data/ds009 directory. Note that the .tgz archive is about 6 GB, so users should plan accordingly for space and download time.
- make validate: Validates the ds009 data after it has been downloaded via make data. Checks that all files are present and have the correct hash, based on the hashes stored in the data/ds009_hashes.json file.
- make analysis: Generates final figures and results. Please note that generating our results produces 14 GB of data.
- make report: Generates the PDF of our report, including appendices.
- make eda: Generates figures from exploratory analysis and sends console output to eda.txt in the current directory. This output is best described as our supplementary data analysis that was not directly referenced in our paper.
- make clean: Removes all extra files generated when compiling code, recursively for all subdirectories.
- make test: Tests the functions located in the data and code directories, which are used to validate and analyze the data, respectively.
- make verbose: Performs the same actions as make test, but uses the verbose nosetests option.
- make coverage: Generates a coverage report for the functions located in the data and code directories.
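The check performed by make validate (every file present, every hash correct) can be sketched in Python as follows. This is a minimal illustration and not the repository's actual validation code: the function names file_hash and check_hashes are hypothetical, the hash algorithm is assumed to be MD5, and the JSON file is assumed to map relative file paths to hex digests.

```python
import hashlib
import json
import os


def file_hash(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    The MD5 default is an assumption made for this sketch.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_hashes(hash_file, root="."):
    """Compare files under `root` with the digests listed in `hash_file`.

    Returns (missing, mismatched): two sorted lists of relative paths.
    """
    with open(hash_file) as f:
        # Assumed layout: {"relative/path": "hexdigest", ...}
        expected = json.load(f)

    missing, mismatched = [], []
    for rel_path, digest in expected.items():
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path):
            missing.append(rel_path)
        elif file_hash(full_path) != digest:
            mismatched.append(rel_path)
    return sorted(missing), sorted(mismatched)
```

Under these assumptions, a call such as check_hashes("data/ds009_hashes.json", root="data") would return two empty lists when the download is complete and intact. Reading each file in chunks keeps memory use low, which matters for a data set of roughly 6 GB.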
Commands and documentation for generating additional and supplementary work can be found in various subdirectories. Please view their respective READMEs for more information.
- code: Code files for all user-defined functions used for analysis, the test files for those functions, and scripts to run intermediate analysis.
- data: Data used for our analysis and tests can be downloaded and validated here.
- final: Scripts to run the final analysis and generate all figures required for the report and slides. The make all recipe in the present directory essentially runs all the scripts in final.
- images: Stores all output figures generated by the analysis scripts. Figures required for the report and slides are cached for convenience, but may also be reproduced.
- paper: Files and instructions to generate the report.
- proposal: Files and instructions to generate the proposal.
- slides: Files and instructions to generate the progress report and final slides.
Contributors
- Kent Chen (kentschen)
- Rachel Lee (reychil)
- Benjamin LeRoy (benjaminleroy)
- Jane Liang (janewliang)
- Hiroto Udagawa (hiroto-udagawa)