Update BRCA clinical table in zenodo with 'subtype' column name
envest opened this issue · 3 comments
Should we just make this improvement to the table itself (and have that change be scripted / done during download)?
Originally posted by @jaclyn-taroni in #47 (comment)
Currently, the `BRCAClin.tsv` file is formatted this way:
Sample | Type | Siglust | PAM50 |
---|---|---|---|
TCGA-AN-A0FL-01A-11R-A034-07 | tumor | -13 | Basal |
TCGA-A1-A0SK-01A-12R-A084-07 | tumor | -13 | Basal |
... |
To make this raw data download in a format consistent with other clinical data, we could modify the column header `PAM50` to be `subtype`, like:
Sample | Type | Siglust | subtype |
---|---|---|---|
TCGA-AN-A0FL-01A-11R-A034-07 | tumor | -13 | Basal |
TCGA-A1-A0SK-01A-12R-A084-07 | tumor | -13 | Basal |
... |
The way I handle this now is just by changing the column header after loading in the analysis scripts.
`BRCAClin.tsv` comes from https://zenodo.org/record/58862/files/BRCAClin.tsv.
https://help.zenodo.org/ says DOI versioning is possible, which is nice because only one file is changing.
@jaclyn-taroni, I'm here to do whatever is necessary to be helpful. Please let me know what is needed!
After re-re-reading the original comment... Jackie, do you think we should edit the raw file on Zenodo, or just edit the file after it is downloaded? I currently do the editing mid-stream during analysis steps.
What I was going for with that comment was to edit the file after it's downloaded. Right now you do the editing in the first analysis step, if I recall correctly. Why not do that as part of `download_TCGA_data.sh`, since that calls a few Rscripts with data preparation steps?
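For illustration, here is a minimal sketch of how the header rename could happen right after download. It assumes the file is truly tab-delimited with Unix line endings and that the header is the first line. The function name `rename_pam50_header` and the file paths in the usage comment are hypothetical, and this is not how `download_TCGA_data.sh` currently works.

```shell
# Hypothetical helper: copy a TSV, renaming the PAM50 header to subtype.
# Assumes tab-delimited input with the header on line 1 (an assumption,
# not verified against the actual Zenodo file).
rename_pam50_header() {
  local input_tsv="$1"
  local output_tsv="$2"
  awk 'BEGIN { FS = OFS = "\t" }
       # Only touch the header row; data rows pass through unchanged.
       NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PAM50") $i = "subtype" }
       { print }' "$input_tsv" > "$output_tsv"
}

# Possible usage inside the download script (paths are placeholders):
#   curl -L -o data/BRCAClin.raw.tsv \
#     https://zenodo.org/record/58862/files/BRCAClin.tsv
#   rename_pam50_header data/BRCAClin.raw.tsv data/BRCAClin.tsv
```

Writing to a separate output file keeps the original download untouched, so the raw Zenodo file can still be checked against its source. The same rename could instead live in one of the Rscripts the download script already calls.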