davismcc/archive-scater

Outlier detection with TPM

Closed this issue · 3 comments

PCA outlier detection does not work when TPM values are used as the expression values. The function expects e.g. "pct_counts_top_100_features" rather than "pct_exprs_top_100_features".

sceset <- scater::plotPCA(
    sceset,
    size_by = "total_features",
    shape_by = "use",
    pca_data_input = "pdata",
    detect_outliers = TRUE,
    return_SCESet = TRUE
)

Results in:

The following selected_variables were not found in pData(object): pct_counts_top_100_features
The following selected_variables were not found in pData(object): pct_counts_feature_controls
The following selected_variables were not found in pData(object): log10_counts_endogenous_features
The following selected_variables were not found in pData(object): log10_counts_feature_controls

Hi @boombard

Thanks for raising the issue.

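The function does work with TPM values, but you need to pass the variables you want to use (e.g. pct_tpm_top_100_features) as a character vector to the selected_variables argument. Without it, the function falls back to counts-based variables such as pct_counts_top_100_features, which is why they were reported as missing.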

From the help page for plotPCA:

selected_variables  
character vector indicating which variables in pData(object) to use for the phenotype-data based PCA. Ignored if the argument pca_data_input is anything other than "pdata".
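As a sketch of that workaround, the hypothetical helper tpm_qc_variables() below (not part of scater) turns the counts-based names reported in the error above, plus total_features (used for size_by in the report), into their TPM equivalents and keeps only those present in pData(object). The pct_tpm_*/log10_tpm_* spelling is an assumption extrapolated from the pct_tpm_top_100_features example, so check colnames(pData(sceset)) for what your QC step actually produced.

```r
# Hypothetical helper (not part of scater): build a selected_variables vector
# for TPM-based QC metrics. The counts-based names are those scater reported
# as missing; the "_tpm_" spelling is an assumption based on
# pct_tpm_top_100_features.
tpm_qc_variables <- function(available) {
  count_based <- c("pct_counts_top_100_features",
                   "total_features",
                   "pct_counts_feature_controls",
                   "log10_counts_endogenous_features",
                   "log10_counts_feature_controls")
  # Swap "_counts_" for "_tpm_"; names without that token are left unchanged
  tpm_based <- sub("_counts_", "_tpm_", count_based, fixed = TRUE)
  # Keep only variables that actually exist, in the order listed above
  intersect(tpm_based, available)
}

# Example with a made-up set of pData column names
cols <- c("use", "total_features", "pct_tpm_top_100_features",
          "log10_tpm_feature_controls")
tpm_qc_variables(cols)
# returns c("pct_tpm_top_100_features", "total_features", "log10_tpm_feature_controls")
```

With the sceset from the report, the result could then be passed as selected_variables = tpm_qc_variables(colnames(pData(sceset))) in the plotPCA() call, alongside pca_data_input = "pdata".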

The function should fail more gracefully, or point the user towards that argument, so I'll add that to the devel version.
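One way such a check could look, as a hypothetical sketch rather than the code that went into scater: compare the requested variables against the columns of pData(object) once, then stop with a single message that lists every missing name and points the user to the selected_variables argument. The function name check_selected_variables() is illustrative only.

```r
# Hypothetical sketch (not scater's implementation): validate the requested
# variables up front and report all missing ones in one readable message.
check_selected_variables <- function(selected, available) {
  missing <- setdiff(selected, available)
  if (length(missing) > 0) {
    stop("The following selected_variables were not found in pData(object): ",
         paste(missing, collapse = ", "),
         ". Pass variables that exist in pData(object) via the ",
         "'selected_variables' argument.",
         call. = FALSE)
  }
  # Return the validated names invisibly so the call can be used inline
  invisible(selected)
}

# A counts-based variable checked against TPM-only metadata fails with one message
res <- tryCatch(
  check_selected_variables(c("total_features", "pct_counts_top_100_features"),
                           c("total_features", "pct_tpm_top_100_features")),
  error = function(e) conditionMessage(e)
)
```

Collecting all missing names before stopping avoids the run-together, one-message-per-variable output shown in the report.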

Please let me know if specifying the variables with selected_variables does not work as it should.

Best
Davis

Thanks @davismcc

I figured that out a few minutes after opening the issue. Happy to close it unless you'd like to keep it open for reference while you work on the graceful failure.

OK, I've added some extra messages for this case, so hopefully things are a bit clearer to the user. These changes will go out in the next Bioconductor release (25 April). Thanks again for the report!