Japal/zCompositions

Trying to reconcile multRepl() with cmultRepl(method="CZM")

I've been trying to run cmultRepl(method="CZM") on some very sparse microbiome data where the sample totals vary widely. I am finding that the output values are negative for some datasets. The Bayesian methods all fail with the datasets I'm using, likely due to their sparseness.

Is multRepl meant to be similar to cmultRepl(method="CZM")? I noticed that multRepl has the nice feature of checking for negative output values and letting the user supply a closure value to make sure none occur. When I run the following, I get different outputs:

multRepl(LPdata, label = 0, imp.missing = TRUE, closure = 10^6)$Cu

cmultRepl(LPdata, output = "p-counts", method="CZM")$Cu

It's likely I misunderstand the differences between these two functions. If I'm asking an impossible question, then is there a way to "correct" the output of cmultRepl to not get negative values?

Japal commented

Thanks, Pat, for your enquiry. Yes, if the data set is very sparse, the methods probably struggle to find enough usable information to carry out any imputation. Even if some formulation worked technically, imputing may still be too daring when sparseness is very serious. You might consider some sensible aggregation to reduce the resolution, for example working at a higher taxonomic level.
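As a rough illustration of the aggregation idea, here is a minimal base-R sketch that sums count columns by a higher taxonomic level and compares the share of zero cells before and after. The toy matrix `counts` and the `genus` mapping are hypothetical; real data would need its own taxonomy table.

```r
# Hypothetical toy counts: 3 samples (rows) x 5 taxa (columns).
counts <- matrix(
  c(0, 3, 0, 12, 0,
    5, 0, 0,  0, 7,
    0, 0, 2,  9, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:3), paste0("otu", 1:5))
)

# Hypothetical mapping from each column to a higher taxonomic level.
genus <- c(otu1 = "A", otu2 = "A", otu3 = "B", otu4 = "B", otu5 = "C")

# rowsum() sums rows by group, so transpose, aggregate, transpose back.
agg <- t(rowsum(t(counts), group = genus[colnames(counts)]))

# Fraction of zero cells before and after aggregation.
zero_before <- mean(counts == 0)
zero_after  <- mean(agg == 0)
```

Sample totals are unchanged by this step; only the number of parts, and usually the number of zeros, goes down.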

Yes, multRepl and cmultRepl are similar. However, the latter replaces zeros using a threshold of 0.5/n, in addition to the user-specified fraction of the DL (detection limit). This is detailed on page 145 of the Martin et al. (2015) paper, based on previous literature on count zeros. Hence the difference you are finding: multRepl is meant for continuous data and does not apply that threshold.
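To see how a 0.5/n threshold can lead to negative values in very sparse samples, here is a small base-R sketch of a CZM-style multiplicative replacement. It is not the zCompositions source: `czm_sketch` is a hypothetical helper, it assumes n is the sample total and a fraction of 0.65, and it leaves out any further adjustment the package may apply.

```r
# Hypothetical helper, NOT the zCompositions implementation:
# replace each zero by frac * threshold / n (n = sample total, an assumption),
# then rescale the non-zero parts so that each row still sums to 1.
czm_sketch <- function(counts, frac = 0.65, threshold = 0.5) {
  n <- rowSums(counts)
  props <- counts / n                      # closed proportions per row
  for (i in seq_len(nrow(counts))) {
    z <- counts[i, ] == 0
    delta <- frac * threshold / n[i]       # imputed value for each zero
    props[i, z] <- delta
    props[i, !z] <- props[i, !z] * (1 - sum(z) * delta)
  }
  props
}

# Toy data: 400 taxa; "deep" has a large total, "shallow" a small one.
D <- 400
deep    <- c(rep(1000, 10), rep(0, D - 10))   # total 10000, 390 zeros
shallow <- c(rep(10, 10),   rep(0, D - 10))   # total 100,   390 zeros
out <- czm_sketch(rbind(deep = deep, shallow = shallow))

# Imputed mass is 390 * 0.65 * 0.5 / n: about 0.013 for "deep",
# but about 1.27 for "shallow", so its non-zero parts turn negative.
any_negative <- apply(out, 1, function(p) any(p < 0))
```

Under these assumptions, a row goes negative whenever the number of zeros times the imputed value exceeds 1, which is most likely for samples with small totals and many parts. Fewer parts, as with aggregation, lowers that product.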

There is no way to correct the output of cmultRepl other than, as said above, reducing the resolution level. Alternatively, you can work with the proportions themselves and use multRepl as you did. I have made a note to implement those warnings in cmultRepl with CZM too.