mskcc/roslin-variant

Add more sample-level information for display in cBioPortal

ckandoth opened this issue · 2 comments

Add more columns to data_clinical.txt, using http://oncotree.mskcc.org/cdd/api/ to standardize column names if possible.

  • MOUSE_CONTAMINATION - For batches with at least one PDX sample, report the fraction of sequenced reads that align to the mouse genome.
  • PROJECT_CODE - IGO request ID, e.g. Proj_08392_B.
  • FACETS_PLOIDY, FACETS_PURITY, FACETS_WGD, FACETS_VERSION - From FACETS results, if available.
  • FRACTION_GENOME_ALTERED - Fraction of the genome with altered copy number, from FACETS results.
  • MUTATION_COUNT - Number of mutations in the analysis MAF.
  • NONSYNONYMOUS_MUTATION_COUNT - Number of mutations in the portal MAF.
  • DNA_INPUT - Value of input_ng from the sample_patient.txt file.
  • LIBRARY_YIELD - Value of Library_yield from the sample_patient.txt file.
  • PIPELINE_VERSION - Version number of the pipeline.
  • PIPELINE - Name of the analysis pipeline used.
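A rough sketch of how a few of these columns could be derived, assuming tab-separated inputs. This is not Roslin's code. The sample_patient.txt key column (`Sample_ID`), the helper names, and the 0.2 absolute log-ratio cutoff used for FRACTION_GENOME_ALTERED are all hypothetical. `Tumor_Sample_Barcode` is the standard MAF column. The readers accept any iterable of lines, so an open file handle works as well.

```python
import csv
from typing import Dict, Iterable, Tuple

# Hypothetical helpers for some proposed data_clinical.txt columns.
# ASSUMPTIONS: sample_patient.txt is keyed by "Sample_ID"; the 0.2 |log ratio|
# cutoff for "altered" copy number is illustrative, not the FACETS definition.


def mouse_contamination(mouse_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that aligned to the mouse genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mouse_reads / total_reads


def fraction_genome_altered(segments: Iterable[Tuple[int, int, float]],
                            threshold: float = 0.2) -> float:
    """Length-weighted fraction of segments whose |log ratio| exceeds threshold."""
    total = altered = 0
    for start, end, log_ratio in segments:
        length = end - start
        total += length
        if abs(log_ratio) > threshold:
            altered += length
    return altered / total if total else 0.0


def count_mutations(maf_lines: Iterable[str]) -> Dict[str, int]:
    """Count MAF records per Tumor_Sample_Barcode, skipping '#' comment lines."""
    rows = csv.DictReader((l for l in maf_lines if not l.startswith("#")),
                          delimiter="\t")
    counts: Dict[str, int] = {}
    for row in rows:
        barcode = row["Tumor_Sample_Barcode"]
        counts[barcode] = counts.get(barcode, 0) + 1
    return counts


def read_sample_patient(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Index sample_patient.txt rows by the (assumed) Sample_ID column."""
    return {row["Sample_ID"]: row for row in csv.DictReader(lines, delimiter="\t")}


def build_row(sample_id: str,
              sample_info: Dict[str, Dict[str, str]],
              analysis_counts: Dict[str, int],
              portal_counts: Dict[str, int]) -> Dict[str, str]:
    """Assemble a subset of the proposed columns for one sample."""
    info = sample_info.get(sample_id, {})
    return {
        "SAMPLE_ID": sample_id,
        "MUTATION_COUNT": str(analysis_counts.get(sample_id, 0)),
        "NONSYNONYMOUS_MUTATION_COUNT": str(portal_counts.get(sample_id, 0)),
        # "NA" marks values missing from sample_patient.txt.
        "DNA_INPUT": info.get("input_ng", "NA"),
        "LIBRARY_YIELD": info.get("Library_yield", "NA"),
    }
```

MUTATION_COUNT and NONSYNONYMOUS_MUTATION_COUNT use the same counter on different files (analysis MAF vs. portal MAF), which matches how the columns are defined above.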

PMs are requesting that DNA_INPUT, LIBRARY_YIELD, IGO_ID, and MOUSE_CONTAMINATION be hidden in cBioPortal.

Should we add TMB?

@timosong You should be able to control which fields are shown or hidden by default; look up "cBioPortal file formats". TMB is tracked in a separate GitHub ticket, so don't implement it now.
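For reference, a cBioPortal clinical sample file starts with four `#`-prefixed, tab-separated metadata rows (display names, descriptions, datatypes, priorities), followed by the attribute-name row and the data. A real sample file also needs PATIENT_ID alongside SAMPLE_ID. The sketch below sets the priority row for the fields the PMs asked to hide. Treating priority 0 as "hidden by default" is an ASSUMPTION to verify against the file formats docs. The attribute descriptions and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

# (column name, display name, description, datatype) -- illustrative values.
Attribute = Tuple[str, str, str, str]

# Fields PMs asked to hide. Using priority 0 to hide them is an ASSUMPTION;
# confirm the exact semantics in the cBioPortal file formats documentation.
HIDDEN = {"DNA_INPUT", "LIBRARY_YIELD", "IGO_ID", "MOUSE_CONTAMINATION"}


def clinical_sample_lines(attributes: Sequence[Attribute],
                          rows: Sequence[dict],
                          default_priority: int = 1) -> List[str]:
    """Build tab-separated lines for a cBioPortal clinical sample file.

    The first four lines are '#'-prefixed metadata rows (display names,
    descriptions, datatypes, priorities), then the column names, then one
    line per sample.
    """
    names = [a[0] for a in attributes]
    priorities = [str(0 if n in HIDDEN else default_priority) for n in names]
    meta = [
        [a[1] for a in attributes],  # display names
        [a[2] for a in attributes],  # descriptions
        [a[3] for a in attributes],  # datatypes: STRING, NUMBER, BOOLEAN
        priorities,
    ]
    lines = ["#" + "\t".join(fields) for fields in meta]
    lines.append("\t".join(names))
    for row in rows:
        # Missing values are written as "NA".
        lines.append("\t".join(str(row.get(n, "NA")) for n in names))
    return lines
```

Keeping the hidden set in one place makes it easy to adjust if the PMs' list changes.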