Syksy/curatedPCaData

meeting notes

Closed this issue · 4 comments

07-09-2020

  • For Taylor et al., use cBioPortal normalization and our own pipeline
  • Create/update vignettes with information on the source of data for each study
  • Create a format for other clinical variables (e.g. var_name = var_value | var_name2 = var_value2 | var_name3 = var_value3)
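
A minimal sketch of the "var_name = var_value | ..." format from the last bullet, written in Python for illustration only (the package itself is in R). The function names `pack_fields` and `unpack_fields` and the example variables are hypothetical, not part of the package.

```python
def pack_fields(fields):
    """Join extra clinical variables into one 'name = value | name2 = value2' string."""
    return " | ".join(f"{name} = {value}" for name, value in fields.items())


def unpack_fields(text):
    """Split a packed string back into a dict; all values come back as strings."""
    if not text.strip():
        return {}
    result = {}
    for part in text.split("|"):
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = part.partition("=")
        result[name.strip()] = value.strip()
    return result


packed = pack_fields({"psa": 4.2, "race": "white"})
unpacked = unpack_fields(packed)
```

Note that this sketch assumes no variable name or value contains "|"; such values would need escaping or a different separator.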
Syksy commented

08-07-2020

Specific details for imputation:

  • Median imputation?
  • k-NN; how to determine optimal k?
  • Make sure that rows/columns are correct (assuming that rows are samples, columns are genes)
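
The median option could look roughly like the sketch below, under the orientation assumed above (rows are samples, columns are genes). It is written in plain Python for illustration; `impute_median` is a hypothetical name and missing values are represented as `None`.

```python
from statistics import median


def impute_median(matrix):
    """Fill None entries in each gene column with the median of that column's observed values."""
    n_genes = len(matrix[0])
    # Guard against a transposed or ragged input before imputing.
    if any(len(row) != n_genes for row in matrix):
        raise ValueError("all sample rows must have the same number of gene columns")
    imputed = [list(row) for row in matrix]
    for j in range(n_genes):
        observed = [row[j] for row in matrix if row[j] is not None]
        if not observed:
            continue  # gene missing in every sample: nothing to impute from
        fill = median(observed)
        for row in imputed:
            if row[j] is None:
                row[j] = fill
    return imputed


# Three samples (rows) by two genes (columns).
example = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_median(example)
```

Imputing per gene across samples is the assumption here; if the matrix were transposed, the same code would silently impute per sample instead, which is why the orientation check in the notes matters.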

Immune deconvolution methods:

  • Determine which genes' expression values each method requires
  • Correlate methods with each other and see if they suggest similar compositions
  • Correlate results with existing results available from e.g. papers
  • Leave out CIBERSORT due to registration wall (no R-package/source code available?)
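
One way to correlate two methods, sketched in plain Python for illustration: a Spearman rank correlation between the per-sample estimates of one cell type from two methods. Rank correlation is an assumption on our part, chosen because different methods may report scores on different scales; the function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Calling `spearman` on the estimates of the same cell type from two methods, with samples in the same order, gives a value near 1 when the methods rank the samples similarly.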

Gene names:

  • Take the whole list of gene names (biomaRt), save that as an R object inside the package (internal?)
  • Have all GEX/CNA/... contain those gene names, even if they're mainly populated with NA values (-> conforming dimensions for genes)
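
The "conforming dimensions" idea can be sketched as follows, in plain Python for illustration. The name `conform_genes`, the dict-of-lists layout, and the gene symbols in the example are assumptions; `None` stands in for R's NA.

```python
def conform_genes(values, master_genes):
    """Return a dict keyed by every gene in master_genes, in that order.

    values maps gene name -> list of per-sample values. Genes absent from
    values get a list of None; genes not in master_genes are dropped.
    """
    n_samples = len(next(iter(values.values()))) if values else 0
    return {
        gene: list(values[gene]) if gene in values else [None] * n_samples
        for gene in master_genes
    }


master = ["AR", "PTEN", "TP53"]
gex = {"TP53": [1.0, 2.0], "AR": [0.5, 0.7]}
conformed = conform_genes(gex, master)
```

With every data type passed through the same master list, the gene dimension matches across GEX, CNA and so on, at the cost of mostly empty rows for unmeasured genes.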
Syksy commented

08-26-2020

Naming conventions

  • Lower case in all function names
  • Lower case also in function parameters
  • Instead of "." use always "_" (due to the special use of "." in R)
  • Data structure variables are named "DataType_StudyName", e.g. "MAE_TCGA"
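
A small sketch of the function and parameter naming rule (lower case, "_" instead of "."), in plain Python for illustration. The helper names and the example identifiers are hypothetical.

```python
import re


def to_convention(name):
    """Lower-case a name and replace '.' with '_'."""
    return name.replace(".", "_").lower()


def follows_convention(name):
    """True if the name uses only lower-case letters, digits and '_', starting with a letter."""
    return re.fullmatch(r"[a-z][a-z0-9_]*", name) is not None


renamed = to_convention("Impute.KNN")
```

This check covers function and parameter names only; structure variables such as "MAE_TCGA" follow the separate "DataType_StudyName" pattern and would not pass it.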
Syksy commented

09-11-2020

  • Get rid of cBioPortal version of Taylor et al., focus on GEO-derived portion that uses our harmonization discipline
  • Elaborate further on example analyses
  • Elaborate further on uses of MultiAssayExperiment (perhaps use their examples for inspiration, e.g. the cheat sheet at https://bioconductor.riken.jp/packages/3.6/bioc/vignettes/MultiAssayExperiment/inst/doc/MultiAssayExperiment_cheatsheet.pdf)
  • Make sure everybody is up to date with the master branch (and meet about conflicts if a better solution exists than what is currently in master)
  • Figure out which immunedeconv methods can be used with data that's not in TPM GEX-input format (e.g. xCell supporting ranked orders if the native package is used?)
Syksy commented

These issues have now been addressed for the most part, or altered and completed (for example, CIBERSORT was included after all).