d3b-center/hope-cohort-analysis

Add two samples' genomic data back to HOPE cohort


Add 7316-1106 and 7316-3000 back to the HOPE subcohort, except for proteomics. Use the attached file from Pei:
master_11032023 (1).txt

I have added the two missing samples to the histology file (see attached). I can upload this file to s3://d3b-openaccess-us-east-1-prd-pbta/hope-aya/v3/Hope-GBM-histologies-base.tsv. I think this is the only file I need to create the merged matrices with this module: https://github.com/d3b-center/hope-cohort-analysis/tree/master/analyses/merge-files. GitHub does not allow TSV uploads, so I zipped it:
Hope-GBM-histologies-base.tsv.zip

This looks good. Yes, you can add it to the v3 folder. Thank you.

I have updated the following merged files and uploaded them to S3:

results
├── Hope-cnv-controlfreec-tumor-only.rds
├── Hope-cnv-controlfreec.rds
├── Hope-fusion-putative-oncogenic.rds
├── Hope-gene-counts-rsem-expected_count-collapsed.rds
├── Hope-gene-counts-rsem-expected_count.rds
├── Hope-gene-expression-rsem-tpm-collapsed.rds
├── Hope-gene-expression-rsem-tpm.rds
├── Hope-snv-consensus-plus-hotspots.maf.tsv.gz
├── Hope-tumor-only-snv-mutect2.maf.tsv.gz
└── md5sum.txt

In md5sum.txt, I have only updated the md5sums for the files above, which were generated by my merge script.
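For reference, a partial md5sum update like the one described could be sketched as follows. This is only an illustration, not the actual merge script: `update_md5sums` is a hypothetical helper, and it assumes md5sum.txt holds one `hash  filename` pair per line (the layout the `md5sum` command line tool writes), which the thread does not confirm. Entries for files that were not regenerated are left untouched.

```r
# Hypothetical helper: refresh md5sum.txt entries for the given files only.
# Assumes each line of md5sum.txt looks like "<md5>  <filename>".
update_md5sums <- function(files, md5_path) {
  new <- data.frame(
    md5 = unname(tools::md5sum(files)),  # md5 of each regenerated file
    file = basename(files),
    stringsAsFactors = FALSE
  )
  if (file.exists(md5_path)) {
    old_lines <- readLines(md5_path)
    old_lines <- old_lines[nzchar(old_lines)]
    old <- data.frame(
      md5 = sub("[[:space:]]+.*$", "", old_lines),
      file = sub("^[^[:space:]]+[[:space:]]+", "", old_lines),
      stringsAsFactors = FALSE
    )
    # Keep existing entries for files that were not regenerated
    old <- old[!old$file %in% new$file, , drop = FALSE]
    new <- rbind(old, new)
  }
  writeLines(paste(new$md5, new$file, sep = "  "), md5_path)
  invisible(new)
}

# Demo in a temporary directory with made-up file names and placeholder hashes
demo_dir <- tempfile("md5-demo-")
dir.create(demo_dir)
rds_path <- file.path(demo_dir, "example.rds")
saveRDS(data.frame(x = 1:3), rds_path)
md5_path <- file.path(demo_dir, "md5sum.txt")
writeLines(
  c("00000000000000000000000000000000  untouched.tsv",
    "ffffffffffffffffffffffffffffffff  example.rds"),
  md5_path
)
updated <- update_md5sums(rds_path, md5_path)
```

With the real module, `files` would be the paths of the regenerated files under `analyses/merge-files/results`.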

Here is a comparison of sample sizes between v2 and the merged files above (i.e. v3). Each file's sample size has increased by 2:

> # Counts
> counts_file = readRDS("data/Hope-gene-counts-rsem-expected_count-collapsed.rds")
> length(colnames(counts_file))
[1] 85

> counts_file = readRDS("analyses/merge-files/results/Hope-gene-counts-rsem-expected_count-collapsed.rds")
> length(colnames(counts_file))
[1] 87

> # TPM
> tpm_file = readRDS("data/Hope-gene-expression-rsem-tpm-collapsed.rds")
> length(colnames(tpm_file))
[1] 85

> tpm_file = readRDS("analyses/merge-files/results/Hope-gene-expression-rsem-tpm-collapsed.rds")
> length(colnames(tpm_file))
[1] 87

> # SNV
> snv_file <- data.table::fread("data/Hope-snv-consensus-plus-hotspots.maf.tsv.gz")
> length(unique(snv_file$Tumor_Sample_Barcode))
[1] 71

> snv_file <- data.table::fread("analyses/merge-files/results/Hope-snv-consensus-plus-hotspots.maf.tsv.gz")
> length(unique(snv_file$Tumor_Sample_Barcode))
[1] 73

> # SNV tumor-only 
> snv_tumor_only_file <- data.table::fread("data/Hope-tumor-only-snv-mutect2.maf.tsv.gz")
> length(unique(snv_tumor_only_file$Tumor_Sample_Barcode))
[1] 88

> snv_tumor_only_file <- data.table::fread("analyses/merge-files/results/Hope-tumor-only-snv-mutect2.maf.tsv.gz")
> length(unique(snv_tumor_only_file$Tumor_Sample_Barcode))
[1] 90

> # CNV
> cnv_file <- readRDS("data/Hope-cnv-controlfreec.rds")
> length(unique(cnv_file$Kids_First_Biospecimen_ID))
[1] 71

> cnv_file <- readRDS("analyses/merge-files/results/Hope-cnv-controlfreec.rds")
> length(unique(cnv_file$Kids_First_Biospecimen_ID))
[1] 73

> # CNV tumor-only
> cnv_tumor_only_file <- readRDS("data/Hope-cnv-controlfreec-tumor-only.rds")
> length(unique(cnv_tumor_only_file$Kids_First_Biospecimen_ID))
[1] 88

> cnv_tumor_only_file <- readRDS("analyses/merge-files/results/Hope-cnv-controlfreec-tumor-only.rds")
> length(unique(cnv_tumor_only_file$Kids_First_Biospecimen_ID))
[1] 90

> # Fusions
> fusion_file <- readRDS("data/Hope-fusion-putative-oncogenic.rds")
> length(unique(fusion_file$Sample))
[1] 85

> fusion_file <- readRDS("analyses/merge-files/results/Hope-fusion-putative-oncogenic.rds")
> length(unique(fusion_file$Sample))
[1] 87
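
The counts above show that each file gained two samples, but not which ones. A small helper along these lines could list the identifiers that were added or dropped between versions. This is a sketch only: `compare_samples` is a hypothetical function and the IDs in the demo are made up.

```r
# Hypothetical helper: compare the sample identifiers of two releases.
compare_samples <- function(old_ids, new_ids) {
  old_ids <- unique(old_ids)
  new_ids <- unique(new_ids)
  list(
    n_old = length(old_ids),
    n_new = length(new_ids),
    added = setdiff(new_ids, old_ids),    # present only in the new release
    removed = setdiff(old_ids, new_ids)   # present only in the old release
  )
}

# Toy identifiers, for illustration only
v2_ids <- c("BS_A", "BS_B", "BS_C")
v3_ids <- c("BS_A", "BS_B", "BS_C", "BS_D", "BS_E")
res <- compare_samples(v2_ids, v3_ids)
```

For the files in this thread, the inputs would be the same vectors used in the checks above, for example `colnames(counts_file)` for the expression matrices or `snv_file$Tumor_Sample_Barcode` for the MAF files, each read once from `data/` and once from `analyses/merge-files/results/`.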