MRCIEU/metaboprep

Query re. perform.metabolite.qc.R, step (10)

In step (10), PC outliers are identified and removed, but I'm not sure how this works. If it pulls the list of independent features from the summary-statistics part of the script (i.e. the PC outliers section in run_MetaboQC_pipeline.R), is that correct, or does the tree need to be rebuilt on the QC'd dataset? The rebuild only seems to happen when ind_feature_names[1] doesn't already exist (i.e. is NA), and I'm not sure when that is the case. See the code snippet below.

So, can we just check this part of the code and its dependencies?


if( is.na(ind_feature_names[1]) ){
  cat( paste0("\t\t- QCstep: identify independent features through correlation analysis and dendrogram clustering.\n") )
  ## we need to estimate independent features de novo, if not available
  featuresumstats = feature.sum.stats( wdata = wdata, sammis = samplemis)
  ## keep only the features flagged as independent
  w = which(featuresumstats$table$independent_features_binary == 1)
  ind_feature_names = rownames(featuresumstats$table)[w]
  cat( paste0("\t\t\t* ", length(ind_feature_names), " independent features identified.\n") )
}
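
For reference, here is a minimal base-R sketch of when a guard of the form is.na(ind_feature_names[1]) evaluates to TRUE. The helper name needs_reestimate is hypothetical and only mirrors the condition in the snippet above; how ind_feature_names is actually initialised in the pipeline is not shown here.

```r
# Hypothetical helper that mirrors the guard condition from the snippet above.
needs_reestimate <- function(ind_feature_names) {
  is.na(ind_feature_names[1])
}

# An NA placeholder: nothing was supplied, so the guard fires.
a <- needs_reestimate(NA)

# An empty character vector: indexing element 1 yields NA, so the guard also fires.
b <- needs_reestimate(character(0))

# A vector of names passed in from an earlier step: the guard does not fire,
# and the previously computed list is reused as-is.
d <- needs_reestimate(c("feature_1", "feature_2"))

print(c(a, b, d))
```

So with this guard, a list computed on the pre-QC data would be reused whenever one is passed in, which is the behaviour being questioned.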

OK, I have:

  1. made it so that the independent features are always re-estimated from the newly QC'd data (as it stands at that point in the process)
  2. added an option to supply your own tree cut height, and moved the default up to 0.5
  3. updated the parameter file and all functions where necessary
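
As a rough illustration of what re-estimating independent features with a tree cut height can look like, here is a base-R sketch. It is not the metaboprep implementation: pick_independent_features and tree_cut_height are hypothetical names, and the choices of Spearman correlation, a 1 - |rho| distance, complete linkage, and keeping the first feature in each cluster are all assumptions made for the example.

```r
# Sketch only: pick_independent_features() is a hypothetical helper,
# not a function from metaboprep.
pick_independent_features <- function(wdata, tree_cut_height = 0.5) {
  # Pairwise Spearman correlations between features (columns),
  # using whatever complete pairs of observations are available.
  cmat <- cor(wdata, method = "spearman", use = "pairwise.complete.obs")

  # Turn correlation into a distance: strongly correlated features are close.
  dmat <- as.dist(1 - abs(cmat))

  # Build the dendrogram and cut it at the requested height.
  tree <- hclust(dmat, method = "complete")
  clusters <- cutree(tree, h = tree_cut_height)

  # Keep the first feature encountered in each cluster as its representative
  # (an assumption; other selection rules are possible).
  names(clusters)[!duplicated(clusters)]
}

# Toy data: f2 is a near copy of f1; f3 and f4 are unrelated noise.
set.seed(1)
x <- rnorm(200)
wdata <- cbind(
  f1 = x,
  f2 = x + rnorm(200, sd = 0.05),
  f3 = rnorm(200),
  f4 = rnorm(200)
)

# Re-estimate on the data as it stands now, rather than reusing an older list.
ind_feature_names <- pick_independent_features(wdata, tree_cut_height = 0.5)
cat(length(ind_feature_names), "independent features identified.\n")
```

Lowering tree_cut_height makes the cut stricter about what counts as the same cluster, so more features are retained as independent; raising it merges more features into shared clusters.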