This project contains all code to reproduce the analyses of the EEGManyPipelines project.
It is hosted on GitHub: https://github.com/NomisCiri/eeg_manypipes_arc
The archived code can be found on Zenodo: https://doi.org/10.5281/zenodo.6549049
This is a contribution by Stefan Appelhoff, Simon Ciranka, and Casper Kerrén from the Center of Adaptive Rationality (ARC).
Original documentation provided by the organizers can be found in the `organizer_documentation` directory.
The `sourcedata` and `derivatives` of this project are stored on GIN:
https://gin.g-node.org/sappelhoff/eeg_manypipes_arc
**UPDATE 2023-01-23:** At the request of the EEG Many Pipelines steering committee, we have made the data repository private.
The report for the analysis is in `REPORT.txt`.
To run the code, you need to install the required dependencies first. We recommend that you follow these steps (assumed to be run from the root of this repository):
- Download Miniconda for your system: https://docs.conda.io/en/latest/miniconda.html (this will provide you with the `conda` command)
- Use `conda` to install `mamba`: `conda install mamba -n base -c conda-forge` (for more information, see: https://github.com/mamba-org/mamba; NOTE: we recommend that you install `mamba` in your `base` environment)
- Use the `environment.yml` file in this repository to create the `emp` ("EEGManyPipelines") environment: `mamba env create -f environment.yml`
- Activate the environment as usual with `conda activate emp`
- After the first activation, run the following to activate the pre-commit hooks: `pre-commit install`
We recommend that you make use of the data hosted on GIN via Datalad.
If you followed the installation steps above, you should almost have a working installation of Datalad in your environment. The last step that is (probably) missing is to install `git-annex`.
Depending on your operating system, do it as follows:
- Ubuntu: `mamba install -c conda-forge git-annex`
- macOS: `brew install git-annex` (use Homebrew)
- Windows: `choco install git-annex` (use Chocolatey)
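To verify that the required command line tools ended up on your `PATH`, a small check like the following can help (this snippet is our own convenience sketch, not part of the repository):

```python
import shutil

def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

# Check the tools used in the steps above
for tool in ("conda", "mamba", "git-annex", "datalad"):
    print(f"{tool}: {'found' if tool_available(tool) else 'MISSING'}")
```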
Use the following steps to download the data:
1. Clone the dataset: `datalad clone https://gin.g-node.org/sappelhoff/eeg_manypipes_arc`
2. Go to the root of the cloned dataset: `cd eeg_manypipes_arc`
3. Get a specific piece of the data: `datalad get sourcedata/eeg_eeglab/EMP01.set`
4. ... or get all data: `datalad get *`
Note that if you do not `get` all the data (step 4 above), the data that you did not `get` is not actually present on your system; there is merely a symbolic link to a remote location (GIN).
Furthermore, the entire EEG data (even after `get`) is "read-only"; if you need to edit or overwrite the files (not recommended), you can run `datalad unlock *`.
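This symlink and read-only behavior can be illustrated with a small heuristic. The classification logic below is our own sketch (not how datalad itself decides), using only the Python standard library:

```python
import os
import stat

def annex_file_status(path: str) -> str:
    """Rough classification of a file in a git-annex/datalad dataset."""
    if os.path.islink(path) and not os.path.exists(path):
        # a dangling symlink: content not fetched yet -- run `datalad get`
        return "not fetched"
    if not os.path.exists(path):
        return "missing"
    if not (os.stat(path).st_mode & stat.S_IWUSR):
        # fetched annex content is locked -- run `datalad unlock` to edit
        return "read-only"
    return "writable"

print(annex_file_status("sourcedata/eeg_eeglab/EMP01.set"))
```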
Under `.github/workflows/run_analysis.yml` we have specified a test workflow that may be helpful for you to inspect.
Before running the code on your system you must:

1. Obtain the data (see above)
2. Edit `config.py` to include the path to your data (see the `FPATH_DS` variable)
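For illustration, that edit amounts to something like the following (the path shown is a made-up example; see the actual `config.py` for the real variable definition):

```python
from pathlib import Path

# Hypothetical example path -- replace with the location of your cloned dataset
FPATH_DS = Path("/home/user/data/eeg_manypipes_arc")

# Scripts can then build file paths relative to the dataset root, e.g.:
fpath_set = FPATH_DS / "sourcedata" / "eeg_eeglab" / "EMP01.set"
print(fpath_set)
```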
The following files are unrelated to the analysis:

- `LICENSE`, detailing how our work is licensed
- `README.md`, the information that you currently read
- `setup.cfg`, a file to configure different software tools to work well with each other (black, flake8, ...)
- `CITATION.cff`, metadata on how to cite this code
- `.gitignore`, which files not to track in the version control system
- `environment.yml`, needed to install software dependencies (see also "Installation" above)
- `.pre-commit-config.yaml`, configuration for "pre-commit hooks" that ease software development
- `organizer_documentation/*.pdf`, the original documentation provided by the EEG Many Pipelines project organizers
- `.github/workflows/run_analysis.yml`, a continuous integration workflow definition for GitHub Actions
All other files are related to the analysis.
- `REPORT.txt`, containing four short paragraphs on the analysis of the four hypotheses
- `report_sheets/EEGManyPipelines_results_h*.xlsx`, Excel files with information about the analysis. Note that `h*` stands for hypotheses 1, 2a, 3a, ..., 4b
- `config.py`, definitions of stable variables that are reused throughout other scripts, for example file paths
- `utils.py`, definitions of functions that are reused throughout other scripts
The Python scripts that do the heavy lifting have names prefixed with two integers: `00`, `01`, `02`, ...
This prefix indicates the order in which to run the scripts.
The `00` scripts are optional to run.
- `00_find_bad_subjs.py`, to find subjects to exclude from analysis based on behavioral performance (see the `BAD_SUBJS` variable in `config.py`)
- `00_inspect_raws.py`, to interactively inspect raw EEG data
- `00_prepare_handin.py`, only to prepare all files for handing in for the EEGManyPipelines submission
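Because every script carries a two-digit prefix, the intended run order can be recovered with a plain sort. A small sketch (the script names are from this repository, the sorting logic is our own):

```python
scripts = [
    "03_run_ica.py",
    "00_inspect_raws.py",
    "01_find_bads.py",
    "02_mark_bad_segments.py",
]

# Lexicographic sort yields the run order, thanks to the two-digit prefixes
run_order = sorted(scripts)

# The optional `00_*` scripts may be skipped
mandatory = [s for s in run_order if not s.startswith("00")]
print(mandatory)
```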
The preprocessing scripts are those from `01` to `06`.
These operate on single subjects.

- `01_find_bads.py`, finding bad channels using pyprep
- `02_mark_bad_segments.py`, marking bad temporal segments using MNE-Python automatic methods
- `03_run_ica.py`, running ICA, excluding previously found bad channels and segments
- `04_inspect_ica.py`, finding and excluding bad ICA components
- `05_make_epochs.py`, epoching the data
- `06_run_autoreject.py`, interpolating channels
- `06b_check_autoreject.py`, providing a summary of interpolated channels
Note that these scripts can easily be run from the command line, and that you can specify certain arguments there (see the scripts for more detail). This allows running several subjects from the command line, as below:
```shell
for i in {1..33}
do
    python -u 01_find_bads.py \
        --sub=$i \
        --overwrite=True
done
```
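The same loop can be driven from Python. The sketch below only builds and prints the commands; uncomment the `subprocess.run` call to actually execute them (which requires the data and environment from above):

```python
import subprocess  # used in the commented-out call below
import sys

# One command per subject, mirroring the shell loop above
commands = [
    [sys.executable, "-u", "01_find_bads.py", f"--sub={sub}", "--overwrite=True"]
    for sub in range(1, 34)
]

for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run
```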
Finally, there is one script for testing each of the four hypotheses.
- `07_test_h1.py`, for hypothesis 1
- `08_test_h2.py`, for hypothesis 2
- `09_test_h3.py`, for hypothesis 3
- `10_test_h4.py`, for hypothesis 4
All outputs of these analyses are stored on GIN.