Update BRCA clinical table in zenodo with 'subtype' column name

Question


envest opened this issue 3 years ago · 3 comments

Should we just make this improvement to the table itself (and have that change be scripted / done during download)?

Originally posted by @jaclyn-taroni in #47 (comment)

Answer 1 · 2021-09-10T19:42:34.000Z

Currently, the BRCAClin.tsv file is formatted this way:

Sample	Type	Siglust	PAM50
TCGA-AN-A0FL-01A-11R-A034-07	tumor	-13	Basal
TCGA-A1-A0SK-01A-12R-A084-07	tumor	-13	Basal
...

To make this raw data download consistent with the format of the other clinical data, we could rename the column header PAM50 to subtype, like this:

Sample	Type	Siglust	subtype
TCGA-AN-A0FL-01A-11R-A034-07	tumor	-13	Basal
TCGA-A1-A0SK-01A-12R-A084-07	tumor	-13	Basal
...

The way I handle this now is just by changing the column header after loading the file in the analysis scripts.
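
For illustration, a load-time rename of this kind might look like the sketch below. It is written in Python only to keep the example self-contained; it is not the code used in the analysis scripts, and `load_clinical` is a hypothetical helper name. The sample row is copied from the table above.

```python
import csv
import io


def load_clinical(tsv_text, rename=None):
    """Parse TSV text into a list of dicts, renaming columns on the way in.

    `rename` maps old column names to new ones; the default reflects the
    PAM50 -> subtype change discussed in this issue.
    """
    if rename is None:
        rename = {"PAM50": "subtype"}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Columns not listed in `rename` keep their original names.
    return [{rename.get(key, key): value for key, value in row.items()}
            for row in reader]


# Header and first row as shown for BRCAClin.tsv above.
sample = (
    "Sample\tType\tSiglust\tPAM50\n"
    "TCGA-AN-A0FL-01A-11R-A034-07\ttumor\t-13\tBasal\n"
)
rows = load_clinical(sample)
print(sorted(rows[0].keys()))
```

The drawback of this approach is that every script that reads the file has to repeat the rename, and the file on disk still carries the old header.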

BRCAClin.tsv comes from https://zenodo.org/record/58862/files/BRCAClin.tsv.
https://help.zenodo.org/ says DOI versioning is possible, which is nice because only one file is changing.

@jaclyn-taroni , I'm here to do whatever is necessary to be helpful. Please let me know what is needed!

Answer 2 · 2021-09-10T19:44:52.000Z

After re-re-reading the original comment... Jackie, do you think we should edit the raw file on Zenodo, or just edit the file after it is downloaded? I currently do the editing mid-stream during analysis steps.

Answer 3 · 2021-09-10T19:58:48.000Z

What I was going for with that comment was to edit the file after it's downloaded. If I recall correctly, you currently do the editing in the first analysis step. Why not do it as part of download_TCGA_data.sh instead, since that script already calls a few Rscripts with data preparation steps?
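
As a rough sketch of what such a post-download step could do, the function below rewrites only the header line of the downloaded file and leaves the data rows untouched. It is in Python purely for illustration (the actual preparation steps are Rscripts), `rename_header_column` is a hypothetical name, and it assumes the file is tab-separated with the header on the first line.

```python
from pathlib import Path


def rename_header_column(path, old="PAM50", new="subtype"):
    """Rename one column in the header line of a TSV file, in place.

    Returns True if the file was changed, False if `old` was not found
    (for example, because the step has already been run).
    """
    path = Path(path)
    text = path.read_text()
    # Split off the first line; `body` holds every data row unchanged.
    header, newline, body = text.partition("\n")
    columns = header.split("\t")
    if old not in columns:
        return False  # nothing to do; leave the file untouched
    columns = [new if name == old else name for name in columns]
    path.write_text("\t".join(columns) + newline + body)
    return True
```

Calling `rename_header_column("BRCAClin.tsv")` once, right after the download, would mean downstream scripts see the subtype column directly. Because the function returns False when PAM50 is absent, running the download script a second time would not alter the file again.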