[No Code] Discover new datasets
alexandrebarachant opened this issue · 34 comments
We need people browsing the web to discover interesting datasets that could be added to MOABB.
You can comment on this issue.
But first, check your dataset is not already in the list
What kind of datasets
We are interested in any dataset of neural time series (EEG, MEG, ECoG, or fNIRS) with a minimum of 5 subjects that is available online and to which we can apply machine learning algorithms. It does not need to be a BCI dataset, but it must contain different conditions/tasks, labelled and tagged.
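As a quick way to screen a candidate before posting it, the criteria above can be written as a small checklist. This is only a sketch: the `Candidate` fields and the `unmet_criteria` helper are made up for illustration and are not part of MOABB, and reading "different conditions/tasks" as "at least two conditions" is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the issue text; field names are illustrative only.
ACCEPTED_MODALITIES = {"eeg", "meg", "ecog", "fnirs"}
MIN_SUBJECTS = 5


@dataclass
class Candidate:
    name: str
    modality: str
    n_subjects: int
    n_conditions: int      # number of distinct conditions/tasks
    labelled: bool         # trials come with condition labels
    available_online: bool


def unmet_criteria(candidate):
    """Return the criteria the candidate fails (empty list if it qualifies)."""
    problems = []
    if candidate.modality.lower() not in ACCEPTED_MODALITIES:
        problems.append("modality is not EEG, MEG, ECoG, or fNIRS")
    if candidate.n_subjects < MIN_SUBJECTS:
        problems.append(f"fewer than {MIN_SUBJECTS} subjects")
    # Assumption: "different conditions/tasks" means at least two.
    if candidate.n_conditions < 2:
        problems.append("needs at least two conditions/tasks")
    if not candidate.labelled:
        problems.append("conditions are not labelled")
    if not candidate.available_online:
        problems.append("data not available online")
    return problems


if __name__ == "__main__":
    # Made-up entry: everything fits except the subject count.
    example = Candidate("made-up example", "EEG", 4, 3, True, True)
    print(unmet_criteria(example))
```

An empty list means the dataset is worth reporting in a comment.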
How do I search for a new dataset?
Many of the datasets in the BNCI index have not been reported. You can start here.
Researchers are making more and more datasets available. Some databases exist and might contain interesting things:
Finally, Google is your friend.
How much time does it take?
Entering a new dataset should take you 2 minutes.
found one here: https://depositonce.tu-berlin.de/handle/11303/6271
Browsing PLOS ONE to find new motor imagery datasets:
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114853
- 10 subjects
- (left hand, right hand, feet) + (both hands, left hand combined with right foot, right hand combined with left foot)
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188293
- 12 subjects
- 9 different elbow tasks + rest
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0162546
- 13 subjects
- left in Execution / Imagination / Observation. can use rest as a second class
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0121262
- 30 subjects
- left versus rest
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0162657
- 4 subjects, 3 sessions each
- left-hand, right-hand, feet
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193722
- 14 subjects
- left hand, right hand, rest. pre-epoched data :(
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0143962 (EEG data not available)
- 18 subjects, 6 sessions
- 3 MI tasks
- can't find the data, but they are supposed to be available
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0174161
- 12 subjects
- Grasp, Elbow, rest
re: dataset 1, All the trials are pre-epoched :(
yeah. Actually i think the GigaDb dataset is already like this ...
ah you're right, you just concatenated all the trials. In that case we can do the same here :) good good
also regarding the second to last: Have you e-mailed Fabien?
I'm definitely not super happy about the concatenation of individual trials. In the case of GigaDB, the dataset was too large to ignore. In those cases, we can contact the authors to ask them about the raw data, but concatenating is a good starting point to see whether the dataset is really interesting or not.
Also, let's contact Fabien and Camille about the second-to-last dataset. I will do it today.
Regarding concatenation though: couldn't we just add a buffer of zeros before and after each trial to smooth out border effects? That would come after de-meaning the trials to eliminate the offset issue.
Yep, we could. I think the most problematic part is the non-zero mean, which creates huge edge artifacts.
We could also return MNE epochs in that case, but that is still not ideal from a filtering point of view.
In any case, we might want to add a warning?
warning is good, will add
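For reference, here is a rough sketch of the idea discussed above: de-mean each trial, put a zero buffer before and after it, concatenate, and emit a warning. It assumes the pre-epoched data is a NumPy array of shape (n_trials, n_channels, n_samples). The function name `concatenate_trials` and the `n_pad` parameter are made up for illustration, and this is not what MOABB actually does for any dataset.

```python
import warnings

import numpy as np


def concatenate_trials(trials, n_pad):
    """Concatenate pre-epoched trials into one continuous array.

    trials: array-like, shape (n_trials, n_channels, n_samples)
    n_pad:  number of zero samples inserted before and after each trial
    Returns (data, onsets), where data has shape
    (n_channels, n_trials * (n_samples + 2 * n_pad)) and onsets gives the
    sample index at which each trial starts in data.
    """
    trials = np.asarray(trials, dtype=float)
    n_trials, n_channels, n_samples = trials.shape

    # Remove the mean of every trial and channel to get rid of the offset.
    demeaned = trials - trials.mean(axis=2, keepdims=True)

    # Zero buffer on both sides of each trial, along the time axis only.
    padded = np.pad(demeaned, ((0, 0), (0, 0), (n_pad, n_pad)))

    # (n_trials, n_channels, length) -> (n_channels, n_trials * length)
    data = padded.transpose(1, 0, 2).reshape(n_channels, -1)

    # First real sample of each trial in the concatenated signal.
    onsets = n_pad + np.arange(n_trials) * (n_samples + 2 * n_pad)

    warnings.warn(
        "Data was built by concatenating pre-epoched trials; "
        "filtering may still produce artifacts at trial borders.",
        UserWarning,
    )
    return data, onsets
```

The returned `onsets` can be used to rebuild the event markers after concatenation. Padding reduces the jump between trials but does not make the signal truly continuous, hence the warning.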
The list is not synchronized with the documentation. Why? Can I help there?
We will use this issue and the associated wiki page to keep track of the datasets that we could add to MOABB. Please comment on this issue if you want to report a new dataset.
There is a nice SSVEP and ERP dataset here, recorded with EEG and ear-EEG while standing or moving. The data are available here.
These are 2 other interesting ones someone pointed out on the Slack channel -
Many datasets are listed here : https://www.researchgate.net/post/Are-there-any-public-EEG-data-sets-that-one-can-try-their-hands-on
This dataset is interesting for the age and size of its population. It is an SSVEP dataset with 100 participants aged over 50:
https://www.nature.com/articles/s41597-022-01372-9
This MI dataset, which includes information about the subjects, could be integrated into MOABB: https://zenodo.org/record/7554429
Ideas for EEG datasets: https://www.fieldtriptoolbox.org/faq/open_data/
- Motor Imagery dataset in acute stroke patients.
- Paper: An EEG motor imagery dataset for brain computer interface in acute stroke patients
- Data: Figshare Link
The "Inner Speech Dataset" was published in Nature and seems like a good candidate to support.
Paper: Thinking out loud, an open-access EEG-based BCI dataset for inner speech recognition
Data: OpenNeuro link
These are 2 other interesting ones someone pointed out on the Slack channel -
Hi @Div12345, I was interested in the second dataset, but unfortunately, I did not find it in the MOABB documentation.
Are there any plans related to adding the second dataset in the near future, or is the dataset already part of the library under some specific section or with a specific name?
All the datasets in this paper: https://arxiv.org/pdf/2402.08656.pdf
- An ERP dataset for home appliance control
https://www.frontiersin.org/articles/10.3389/fnhum.2024.1320457/full
- 84 subjects in total, distributed as follows:
- 60 subjects controlled three types of appliances (TV: 30, door lock: 15, and electric light: 15)
- 14 subjects controlled a Bluetooth speaker
- 10 subjects controlled an air conditioner
- Code and dataset: https://github.com/jml226/Home-Appliance-Control-Dataset
Is someone working on BEnchmark database Towards BCI Application (https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2020.00627/full)?
It is an SSVEP dataset with 70 subjects performing a 40-target cued-spelling task. I saw that it is mentioned in the Datasets to include section, but I found no issue related to it.
Hi @machinelatto!
It seems like no one has focused on this task, or if someone started, they didn't commit or create a PR. Would you be interested?
You would basically need to create two functions, as shown in this tutorial: https://neurotechx.github.io/moabb/auto_tutorials/4_adding_a_dataset.html#sphx-glr-auto-tutorials-4-adding-a-dataset-py
One downloads the dataset and the other loads it using MNE.
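To give an idea of the split, here is a bare-bones skeleton of those two steps. It deliberately does not import moabb or mne so it runs anywhere: `SketchDataset`, the placeholder URL, the `.mat` file naming, and the dict standing in for an MNE Raw object are all made up. The method names follow what the tutorial uses as far as I can tell, so treat the tutorial as the reference for the real base class and signatures.

```python
from pathlib import Path


class SketchDataset:
    """Stand-in for a dataset class; it does NOT inherit from MOABB."""

    BASE_URL = "https://example.org/data"  # hypothetical placeholder URL

    def __init__(self, subject_list, cache_dir):
        self.subject_list = list(subject_list)
        self.cache_dir = Path(cache_dir)

    def data_path(self, subject):
        """Step 1 (download): return the local file path for one subject."""
        if subject not in self.subject_list:
            raise ValueError(f"Invalid subject number: {subject}")
        # Hypothetical file naming; a real dataset defines its own layout.
        local_file = self.cache_dir / f"subject_{subject:02d}.mat"
        # A real implementation would fetch the file from BASE_URL here
        # if it is not cached yet. No download happens in this sketch.
        return local_file

    def _get_single_subject_data(self, subject):
        """Step 2 (load): return the recordings of one subject."""
        path = self.data_path(subject)
        # A real implementation would read the file and build an MNE Raw
        # object; the dict below is only a placeholder for it.
        raw_placeholder = {"source_file": str(path)}
        # Nested as session -> run -> recording.
        return {"0": {"0": raw_placeholder}}
```

Usage would look like `SketchDataset([1, 2], "~/data").data_path(1)`, with the loading function calling the download function so each subject's file is fetched on demand.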
Hi @bruAristimunha !
I'm probably going to use this dataset in my research, so I could try to create those functions in the next few weeks. If it goes well, I'll open a PR.