neurosynth-data

This repository contains data files for use with the Neurosynth codebase. All data are released under the Open Database License (ODbL). To a first approximation, this means you can do whatever you want with these data (including sharing, using, adapting, and modifying the data) as long as you attribute any public use, share any derivative database under the same license, and keep any derivative data open. See the license file for further details.

Repository structure

The most recent versions of the Neurosynth data files are always stored in the root folder. Key files include:

database.txt: a plaintext file containing activation data and key metadata for all studies in the Neurosynth database. Each row represents a single activation from a single study. Mandatory columns include PMID, x/y/z coordinates, and stereotactic space. The remaining columns contain useful but non-essential metadata (e.g., author names and journal title) and can be deleted if desired.
features.txt: a plaintext file containing feature information for studies in database.txt. Each row represents a single study. The first column contains the PMID of the study and is used to map data in features.txt to the data in database.txt. Each remaining column contains the weights for a different feature. Titles for all features are provided in the first row. In contrast to previous datasets, the feature weights are now normalized and reflect tf-idf values rather than proportions.
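
As a rough illustration of how the two files relate, the sketch below parses a few made-up rows in the shape described above and joins them on PMID. It uses only the Python standard library and is not part of the Neurosynth codebase. The tab delimiter, the column names ("id", "x", "y", "z", "space", "pmid"), and the sample values are all assumptions; check the header row of the real files before relying on them.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for database.txt and features.txt. The tab delimiter and
# the column names are assumptions, not taken from the real files.
SAMPLE_DATABASE = (
    "id\tx\ty\tz\tspace\n"
    "1001\t-42\t20\t8\tMNI\n"
    "1001\t38\t-60\t44\tMNI\n"
    "1002\t0\t52\t-6\tTAL\n"
)
SAMPLE_FEATURES = (
    "pmid\tworking memory\temotion\n"
    "1001\t0.12\t0.0\n"
    "1002\t0.0\t0.31\n"
)


def load_activations(handle, id_column="id"):
    """Group activation rows by study: {pmid: [(x, y, z, space), ...]}."""
    activations = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        coords = (float(row["x"]), float(row["y"]), float(row["z"]))
        activations[row[id_column]].append(coords + (row["space"],))
    return dict(activations)


def load_features(handle):
    """Map each study to its weights: {pmid: {feature_name: weight}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)   # the first row holds the feature titles
    names = header[1:]      # the first column is the PMID
    return {
        row[0]: {name: float(value) for name, value in zip(names, row[1:])}
        for row in reader
        if row              # skip blank lines
    }


activations = load_activations(io.StringIO(SAMPLE_DATABASE))
features = load_features(io.StringIO(SAMPLE_FEATURES))

# Join the two tables on PMID: one line per study.
for pmid, peaks in activations.items():
    print(pmid, len(peaks), features.get(pmid, {}))
```

To run this against the real files, replace the io.StringIO objects with open("database.txt", newline="") and open("features.txt", newline=""), adjusting the column names to match the actual header.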

Current data

The latest data file is always stored in current_data.tar.gz in the root folder (and mirrored in the archive/ folder).
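
One possible way to unpack the archive with the Python standard library is sketched below. The helper names extract_data_files and make_demo_archive are hypothetical, and the demo builds a tiny stand-in archive with placeholder contents so that the example runs without downloading anything. Whether the real archive stores the files at its top level or inside a folder is not stated here, so the sketch matches on the base file name only.

```python
import io
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_data_files(archive_path, dest_dir,
                       wanted=("database.txt", "features.txt")):
    """Extract only the wanted files from the archive; return their paths."""
    dest = Path(dest_dir)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            name = Path(member.name).name   # ignore any leading folder
            if member.isfile() and name in wanted:
                target = dest / name
                # Stream the member to disk rather than loading it whole.
                with archive.extractfile(member) as source, \
                        open(target, "wb") as out:
                    shutil.copyfileobj(source, out)
                extracted.append(target)
    return sorted(extracted)


def make_demo_archive(path):
    """Build a tiny stand-in archive (placeholder contents, not real data)."""
    with tarfile.open(path, "w:gz") as archive:
        for name, text in (("database.txt", "id\tx\ty\tz\tspace\n"),
                           ("features.txt", "pmid\temotion\n")):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "current_data.tar.gz"
    make_demo_archive(demo)
    files = extract_data_files(demo, tmp)
    extracted_names = [f.name for f in files]
    print(extracted_names)
```

Writing each file under its base name, instead of calling extractall, keeps the extraction confined to the destination folder even if an archive contains unexpected paths.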

The current dataset is version 0.6, released in July 2015. The archive contains two files: database.txt and features.txt. The database.txt file contains activation data for 11,406 studies. The features.txt file contains weights for over 3,300 term-based features. Note that unlike earlier feature data releases (prior to v0.3), the current release contains term-based features derived only from the abstracts of articles in the Neurosynth database, and not from the full text. This change was introduced to improve the specificity and reproducibility of results.

Additionally, unlike previous releases, the present feature set includes not only single-word features but also n-gram features (e.g., "working memory" and "emotion regulation").

IMPORTANT NOTE: Beginning with version 0.3, studies in both the features and database files are identified by PubMed ID (rather than the DOI used previously). This means that if you generated a Dataset instance from an older database.txt file (i.e., before April 2014), you'll need to re-generate the Dataset before loading new features.
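
If you are unsure which identifier scheme an older file uses, a quick heuristic check along the following lines may help. This is a sketch, not a Neurosynth feature: the function names are hypothetical, the sample identifiers are invented, and the patterns only assume that PubMed IDs are plain integers and that DOIs begin with "10." followed by a registrant code and a slash.

```python
import re

# Heuristic patterns (assumptions, not an official validation scheme).
PMID_PATTERN = re.compile(r"^\d+$")             # PubMed IDs are plain integers
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # DOIs look like 10.<registrant>/<suffix>


def classify_id(study_id):
    """Return "pmid", "doi", or "unknown" for one study identifier."""
    value = study_id.strip()
    if PMID_PATTERN.match(value):
        return "pmid"
    if DOI_PATTERN.match(value):
        return "doi"
    return "unknown"


def ids_are_pmids(study_ids):
    """True only if every identifier looks like a PubMed ID."""
    return all(classify_id(study_id) == "pmid" for study_id in study_ids)


# Invented identifiers for illustration only.
new_style_ids = ["12345678", "23456789"]
old_style_ids = ["10.1000/example.001", "10.1000/example.002"]

print(ids_are_pmids(new_style_ids))
print(ids_are_pmids(old_style_ids))
```

Running a check like this over the first column of an existing database.txt gives a cheap signal of whether a Dataset built from it needs to be re-generated before new features are loaded.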
