neuropoly/intranet.neuro.polymtl.ca

Import datasets overview

kousu opened this issue · 6 comments

kousu commented

There is a dataset overview created by @naga-karthik and @valosekj here on Google.

I believe this would fit much better somewhere under data/. So, port it in.

kousu commented

Also, I assume this overview was motivated by the CLI to our private datasets being too abstract:

$ ssh git@data
PTY allocation request failed
hello user this is git@data running gitolite3 3.6.12-1 (Debian) on git 2.34.1

 R   C  datasets/..*
 R W    datasets/basel-mp2rage
 R W    datasets/bavaria-quebec-spine-ms
 R W    datasets/beijing-tumor
 R W    datasets/canproco
 R W    datasets/data-single-subject_DO-NOT-USE
 R W    datasets/data_axondeepseg_bf_source
 R W    datasets/data_axondeepseg_bf_training
 R W    datasets/data_axondeepseg_tem
 R W    datasets/data_axondeepseg_users
 R W    datasets/data_axondeepseg_vcu
 R W    datasets/data_axondeepseg_wakehealth_source
 R W    datasets/data_axondeepseg_wakehealth_training
 R W    datasets/eeg-epilepsy
 R W    datasets/levin-stroke
 R W    datasets/lumbar-epfl
 R W    datasets/mni-bmpd
 R W    datasets/model_seg_exvivo_gm-wm_t2_unet2d-multichannel-softseg
 R W    datasets/msseg_challenge_2016
 R W    datasets/msseg_challenge_2021
 R W    datasets/philadelphia-pediatric
 R W    datasets/sci-colorado
 R W    datasets/sci-zurich
 R W    datasets/sct-testing-large
 R W    datasets/spine-generic-processed
 R W    datasets/template_dog_virginiatech
 R W    datasets/uk-biobank
 R W    datasets/uk-biobank-processed
 R W    datasets/umass-ms-ge-excite1.5
 R W    datasets/umass-ms-ge-hdxt1.5
 R W    datasets/umass-ms-ge-pioneer3
 R W    datasets/umass-ms-siemens-espree1.5
 R W    datasets/uqueensland_mouse

Please see https://github.com/neuropoly/data-management/blob/master/internal-server.md for more help

The names of the datasets alone aren't enough of a guide.

But since I am planning (neuropoly/data-management#77 / https://github.com/neuropoly/computers/issues/167) to replace the CLI with a more GitHub-like interface, I wonder if this overview will rapidly become superfluous once people can just click through https://data.neuro.polymtl.ca/explore, the way the public currently can with our one big open-access dataset at https://github.com/spine-generic/.
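In the meantime, the listing above can at least be reduced to a plain list of dataset names. The following is a minimal sketch, assuming the text printed by `ssh git@data` has been captured into a string; the function name `parse_listing` and the regular expression are illustrative, not part of gitolite.

```python
import re

# Matches permission lines such as " R W    datasets/canproco".
# Banner and help lines do not match and are ignored.
LINE_RE = re.compile(r"^\s*(?P<perms>[RWC ]+?)\s+(?P<repo>datasets/\S+)\s*$")


def parse_listing(text):
    """Return (permissions, dataset name) pairs from the captured listing."""
    datasets = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        repo = match.group("repo")
        # "datasets/..*" is a wildcard rule, not an actual repository.
        if "*" in repo:
            continue
        perms = "".join(match.group("perms").split())
        datasets.append((perms, repo.split("/", 1)[1]))
    return datasets


# Sample lines copied from the listing above.
sample = """hello user this is git@data running gitolite3 3.6.12-1 (Debian) on git 2.34.1

 R   C  datasets/..*
 R W    datasets/basel-mp2rage
 R W    datasets/canproco
"""

for perms, name in parse_listing(sample):
    print(perms, name)
```

This only recovers the names and permissions, so it does not address the underlying problem that the names say little about the contents.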

> There is a dataset overview created by @naga-karthik and @valosekj here on Google.
>
> I believe this would fit much better somewhere under data/. So, port it in.

Maybe the sharing policy should also be considered. The Google Sheets table is currently private (shared only within NeuroPoly). The NeuroPoly intranet, though, is publicly available.

Tagging @jcohenadad for his opinion.

kousu commented

> The Google Sheets table is currently private (shared only within NeuroPoly)

Is it? I don't think it is.

While logged out:

Screenshot_20221214_154228

and while logged in I can see that link sharing is turned on:

Screenshot_20221214_154300

that's why I thought it was okay to port it in 🤔

kousu commented

When I get data.neuro.polymtl.ca upgraded to Gitea, everything there will stay private, but we could sidestep the issue by adding, say, https://data.neuro.polymtl.ca/datasets/awesome-overview/wiki, or by making it a policy that people hashtag their datasets in the provided Description field, so that scanning down https://data.neuro.polymtl.ca/explore gives a good overview of what's available, like this:

Screenshot 2022-12-14 at 16-00-54 Neurogitea

You can even filter by tag with the search button!

Screenshot 2022-12-14 at 16-01-47 Neurogitea

> Is it? I don't think it is.

Indeed, it was not. Thank you for catching this, Nick. I would prefer that this list stay private, so I just changed the sharing setting.

As for replacing this table with the future Gitea, this is an excellent idea. A (private) overview of the available datasets will be very useful for students. It would be even more useful if the different fields could be displayed separately, as in the current Google Sheet, instead of having tags from different categories aggregated together (e.g. #sci #T1w #brain #EPFL).
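To illustrate what separating the fields could look like, here is a minimal sketch that splits a hashtagged description into per-category columns. It works on the description text only and makes no claim about what Gitea itself can display. The `TAG_CATEGORIES` mapping and its category names are hypothetical; the real columns would come from the current Google Sheet.

```python
import re

# Hypothetical mapping from (lowercased) tag to column name.
TAG_CATEGORIES = {
    "sci": "pathology",
    "t1w": "contrast",
    "brain": "region",
    "epfl": "site",
}


def split_tags(description):
    """Group the #tags found in a description string by category."""
    fields = {}
    for tag in re.findall(r"#([\w.-]+)", description):
        # Tags missing from the mapping are collected under "other".
        category = TAG_CATEGORIES.get(tag.lower(), "other")
        fields.setdefault(category, []).append(tag)
    return fields


row = split_tags("#sci #T1w #brain #EPFL")
for category, tags in row.items():
    print(f"{category}: {', '.join(tags)}")
```

A fixed mapping like this has to be maintained by hand, so it only helps if the set of tags stays small and agreed upon.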

kousu commented

Great. I'll work hard to get Gitea going then.

Closing since this isn't meant to be public.