greenelab/RNAseq_titration_results

Update BRCA clinical table in zenodo with 'subtype' column name

envest opened this issue · 3 comments

Should we just make this improvement to the table itself (and have that change be scripted / done during download)?

Originally posted by @jaclyn-taroni in #47 (comment)

Currently, the BRCAClin.tsv file is formatted this way:

Sample Type Siglust PAM50
TCGA-AN-A0FL-01A-11R-A034-07 tumor -13 Basal
TCGA-A1-A0SK-01A-12R-A084-07 tumor -13 Basal
...

To make this raw data download consistent in format with the other clinical data, we could rename the column header PAM50 to subtype, like so:

Sample Type Siglust subtype
TCGA-AN-A0FL-01A-11R-A034-07 tumor -13 Basal
TCGA-A1-A0SK-01A-12R-A084-07 tumor -13 Basal
...

The way I handle this now is just by changing the column header after loading the file in the analysis scripts.

BRCAClin.tsv comes from https://zenodo.org/record/58862/files/BRCAClin.tsv.
https://help.zenodo.org/ says DOI versioning is possible, which is nice because only one file is changing.

@jaclyn-taroni , I'm here to do whatever is necessary to be helpful. Please let me know what is needed!

After re-re-reading the original comment... Jackie, do you think we should edit the raw file on Zenodo, or just edit the file after it is downloaded? I currently do the editing mid-stream during analysis steps.

What I was going for with that comment was to edit the file after it's downloaded. Right now you do the editing in the first analysis step, if I recall correctly. Why not do that as part of download_TCGA_data.sh, since that calls a few Rscripts with data preparation steps?
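
For reference, here is a minimal sketch of how the header rename could happen at download time instead of mid-analysis. It assumes BRCAClin.tsv is tab-delimited with the header on its first line and no Windows line endings. The function name rename_pam50_header and the output file name in the usage comment are hypothetical; this is not the code in download_TCGA_data.sh, just one way the step might look.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): copy a TSV file,
# renaming the header field "PAM50" to "subtype" and leaving data rows alone.
# Assumes tab-delimited input with the header on line 1.
rename_pam50_header() {
  local infile="$1"
  local outfile="$2"
  awk 'BEGIN { FS = OFS = "\t" }
       # Only the header row is inspected; match the whole field, not a substring.
       NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PAM50") $i = "subtype" }
       { print }' "$infile" > "$outfile"
}

# Example usage after the download step (file names are assumptions):
#   rename_pam50_header BRCAClin.tsv BRCAClin_subtype.tsv
```

Matching the whole header field rather than running a plain text substitution means a sample ID or value that happens to contain "PAM50" is left untouched, and writing to a separate output file keeps the downloaded raw file unchanged.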