Import datasets overview
kousu opened this issue · 6 comments
There is a dataset overview created by @naga-karthik and @valosekj here on Google Sheets.
I believe this would fit much better somewhere under data/, so let's port it in.
Also, I assume this overview was motivated by the CLI to our private datasets being too abstract:
$ ssh git@data
PTY allocation request failed
hello user this is git@data running gitolite3 3.6.12-1 (Debian) on git 2.34.1
R C datasets/..*
R W datasets/basel-mp2rage
R W datasets/bavaria-quebec-spine-ms
R W datasets/beijing-tumor
R W datasets/canproco
R W datasets/data-single-subject_DO-NOT-USE
R W datasets/data_axondeepseg_bf_source
R W datasets/data_axondeepseg_bf_training
R W datasets/data_axondeepseg_tem
R W datasets/data_axondeepseg_users
R W datasets/data_axondeepseg_vcu
R W datasets/data_axondeepseg_wakehealth_source
R W datasets/data_axondeepseg_wakehealth_training
R W datasets/eeg-epilepsy
R W datasets/levin-stroke
R W datasets/lumbar-epfl
R W datasets/mni-bmpd
R W datasets/model_seg_exvivo_gm-wm_t2_unet2d-multichannel-softseg
R W datasets/msseg_challenge_2016
R W datasets/msseg_challenge_2021
R W datasets/philadelphia-pediatric
R W datasets/sci-colorado
R W datasets/sci-zurich
R W datasets/sct-testing-large
R W datasets/spine-generic-processed
R W datasets/template_dog_virginiatech
R W datasets/uk-biobank
R W datasets/uk-biobank-processed
R W datasets/umass-ms-ge-excite1.5
R W datasets/umass-ms-ge-hdxt1.5
R W datasets/umass-ms-ge-pioneer3
R W datasets/umass-ms-siemens-espree1.5
R W datasets/uqueensland_mouse
Please see https://github.com/neuropoly/data-management/blob/master/internal-server.md for more help
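Even though the listing above says little about each dataset, it is at least easy to process mechanically. Below is a minimal sketch, assuming each repository line is a run of permission letters (R, W, C) followed by the repository path, as in the output above. The helper name `parse_gitolite_info` is hypothetical and not part of gitolite or our tooling.

```python
# Hypothetical helper: turns the text printed by `ssh git@data` (gitolite's
# "info" listing) into (permissions, repository) pairs.
PERM_LETTERS = {"R", "W", "C"}


def parse_gitolite_info(text: str) -> list[tuple[str, str]]:
    """Return (permissions, repo) pairs, skipping wildcard rules like 'datasets/..*'."""
    repos = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        *perms, repo = tokens
        # Keep only lines whose leading tokens are all permission letters;
        # this drops the banner ("hello user ...") and the help pointer.
        if not all(p in PERM_LETTERS for p in perms):
            continue
        # A '*' marks a wildcard rule (where new repos may be created), not a dataset.
        if "*" in repo:
            continue
        repos.append(("".join(perms), repo))
    return repos
```

Run on the output above, this yields pairs such as `("RW", "datasets/canproco")`. It only recovers the names, though, which on their own still aren't a sufficient guide to what each dataset contains.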
The names of the datasets alone aren't enough of a guide.
But since I am planning (neuropoly/data-management#77 / https://github.com/neuropoly/computers/issues/167) to replace the CLI with a more GitHub-like interface, I wonder if this overview will rapidly become superfluous once people can just browse https://data.neuro.polymtl.ca/explore, the way the public currently can with our one big open-access dataset at https://github.com/spine-generic/.
Maybe the sharing policy should also be considered. The Google Sheets table is currently private (shared only within NeuroPoly), whereas the NeuroPoly intranet is publicly available.
Tagging @jcohenadad for his opinion.
The Google Sheets table is currently private (shared only within NeuroPoly
Is it? I don't think it is.
While logged out:
and while logged in, I can see that link sharing is turned on.
That's why I thought it was okay to port it in 🤔
When I get data.neuro.polymtl.ca upgraded to Gitea, everything there will stay private, but we could sidestep the issue by adding, say, a wiki at https://data.neuro.polymtl.ca/datasets/awesome-overview/wiki, or by making it a policy that people hashtag their datasets in the provided Description field, so that scanning down https://data.neuro.polymtl.ca/explore gives a good overview of what's available.
Is it? I don't think it is.
Indeed, it was not. Thank you for catching this, Nick. I would prefer that this list stay private, so I just changed it.
As for replacing this table with the future Gitea instance, this is an excellent idea. Having a (private) overview of the available datasets will indeed be very useful for students. It would be especially useful if we could display the different fields similarly to the current Google Sheet, instead of having the aggregated tags mixed between categories (e.g., #sci #T1w #brain #EPFL).
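One way to reconcile the hashtag policy with per-category columns is to map each tag to a category and split a Description into columns. Below is a minimal sketch, assuming a hypothetical tag-to-category lookup (`TAG_CATEGORIES`, whose category names and entries are illustrative only and do not reproduce the actual Google Sheet columns) and hypothetical helpers `tags_to_row` and `overview`. How the Description strings are fetched from Gitea is left out.

```python
import re
from collections import defaultdict

# Hypothetical tag -> category lookup; a real one would mirror the columns
# of the current Google Sheet.
TAG_CATEGORIES = {
    "sci": "pathology",
    "ms": "pathology",
    "T1w": "contrast",
    "T2w": "contrast",
    "brain": "region",
    "spine": "region",
    "EPFL": "site",
}

HASHTAG = re.compile(r"#([\w-]+)")


def tags_to_row(description: str) -> dict[str, list[str]]:
    """Split the hashtags in one repository Description into per-category lists."""
    row: dict[str, list[str]] = defaultdict(list)
    for tag in HASHTAG.findall(description):
        # Unknown tags go under "other" instead of being silently dropped.
        row[TAG_CATEGORIES.get(tag, "other")].append(tag)
    return dict(row)


def overview(descriptions: dict[str, str]) -> list[dict[str, str]]:
    """Build one table row per dataset, with one column per category."""
    columns = sorted(set(TAG_CATEGORIES.values()) | {"other"})
    rows = []
    for repo, description in sorted(descriptions.items()):
        cats = tags_to_row(description)
        rows.append({"dataset": repo, **{c: " ".join(cats.get(c, [])) for c in columns}})
    return rows
```

For the example tags above, `tags_to_row("#sci #T1w #brain #EPFL")` returns `{"pathology": ["sci"], "contrast": ["T1w"], "region": ["brain"], "site": ["EPFL"]}`, so each tag lands in its own column rather than in one mixed list.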
Great. I'll work hard to get Gitea going then.
Closing since this isn't meant to be public.