/dmgroppe.github.io

Personal webpage for David M. Groppe

MIT License

David M. Groppe, PhD

Human Neuro-Data Scientist, Epilepsy Researcher, Software Developer

Data Sharing in Epilepsy Research

Historically, scientists have shared their data analyses but sharing the data itself was impractical. Now, however, it is quite feasible to share digital scientific data and there is increasing momentum to make data sharing a routine part of publicly funded research. This is because sharing data:

  • increases the reliability of scientific findings by enabling replication and increasing statistical power
  • increases efficiency by enabling data reuse
  • promotes collaboration
  • can be necessary for studying rare conditions
  • facilitates acquiring diverse samples that are representative of our diverse patient populations and clinical cultures
  • democratizes science by improving data access
  • promotes commercial innovation (if the data are licensed for commercial use)

Sharing data does pose some challenges, such as:

  • the need to protect the privacy of the individuals who provided the data
  • the cost of organizing and sharing the data
  • the need to reward or credit data generators for sharing
  • establishing standard data formats that facilitate data discovery and re-use
  • potentially enabling other researchers to “scoop” the data generators (i.e., other researchers might make discoveries using the data before the data generators have a chance to perform those analyses)

While the scientific community has been actively working to tackle those challenges, I think not sharing data is still by far the norm among researchers. To get a sense of the state and trajectory of data sharing in the epilepsy research community, I tried to catalog all available public datasets of intracranial electroencephalogram (iEEG) data. I also assembled this catalog to help other iEEG researchers find these data. There is not yet a PubMed or Google Scholar for scientific data, and since datasets are spread across multiple repositories in multiple formats, it can be challenging to find the data you need. Finally, I hope this survey helps researchers identify data repositories where they might share their own data or look for other types of data.

Note that I chose iEEG data because it is the data type I am most familiar with. If you work with other types of epilepsy data, please email me with links to public databases of such data. I am providing links to those resources at the bottom of this page.

iEEG Public Datasets

I searched the following repositories for iEEG data:

In addition, I was unable to obtain data from the following repositories because I could not figure out how to access the data, did not think that I could access them without an academic affiliation, or my one access request went unanswered. I am listing them here for completeness, and I am sure that with a bit more effort you could access the data for reasonable purposes:

Finally, the Open Science Framework and Dryad have a few iEEG datasets as well, but I have not had time to add them to the catalog.

Some of those repositories did not have any iEEG data and there were some DABI datasets I was unable to access because I was not granted permission. For each dataset, I tried to determine:

  • the amount of data
  • data format
  • presence of associated neuroimaging data
  • patient demographic data
  • whether a fee is required to access the data
  • data license

Those traits are summarized in this CSV file: ieegDatabases2023.csv. If I have made any errors, please let me know so I can correct them.
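
If you would like to explore the catalog programmatically, the following is a minimal sketch in Python using only the standard library. It assumes you have downloaded ieegDatabases2023.csv and that the file has a header row. The function names load_catalog and tally are illustrative, and the column you pass to tally must be one that actually appears in the file's header, since the column names are not listed on this page.

```python
import csv
from collections import Counter


def load_catalog(csv_file):
    """Read catalog rows from an open text file into a list of dicts.

    Each dict maps a header name to that row's cell value.
    """
    return list(csv.DictReader(csv_file))


def tally(rows, column):
    """Count how often each value appears in one column.

    Blank or missing cells are grouped under 'unknown'. The column
    name is supplied by the caller; it is not assumed here.
    """
    return Counter((row.get(column) or "").strip() or "unknown" for row in rows)
```

To use it, open the downloaded file with open("ieegDatabases2023.csv", newline="", encoding="utf-8"), pass the file object to load_catalog, inspect the keys of the first row to see the real column names, and then call tally on whichever column interests you (for example, the one holding the data format or license).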

Public Non-iEEG Epilepsy Datasets:

Below are links to epilepsy datasets with other types of data. If you know of any others, please email me with links.

Scalp EEG Data:

Imaging Data: