"X" vs formula confusion
Dear Amy,
First of all, many thanks to You and Your team for producing great tools! : ]
As in the title, I'm having trouble understanding the X and formula parameters:
- In the divnet function description You write that the X parameter is "The covariate matrix, with samples as rows and variables as columns". Yet in the vignette, the actual value passed to the parameter is a string (for a phyloseq object, a column name in the sample data). I understand that some subsetting is being done here, but still...
- From both the vignette and Your answer to issue #53, it seemed to me that the X parameter was being used in the same way as a formula (i.e. You provide an example X value: X = season + plot). Yet formula is a separate parameter...
- Reading the divnet function code, it seems to me that when formula is used, the X value is ignored. This is because at the beginning of the function You check:

      if (!is.null(formula)) {
        if ("phyloseq" %in% class(W)) {
          X <- data.frame(phyloseq::sample_data(W))
          X <- stats::model.matrix(object = formula, data = X)
        }
      }

  Which I think means that if formula is provided, X is produced based only on W and formula. Are these two parameters mutually exclusive?
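To make the quoted branch concrete, here is a minimal sketch of what the stats::model.matrix() call does to sample data. It is not DivNet code: sample_df is a made-up data frame standing in for data.frame(phyloseq::sample_data(W)), the season and plot columns are borrowed from the X = season + plot example above, and the one-sided formula ~ season + plot is an assumption about how such a model would be written.

```r
# Hypothetical sample data standing in for data.frame(phyloseq::sample_data(W)).
sample_df <- data.frame(
  season = factor(c("spring", "spring", "autumn", "autumn")),
  plot   = factor(c("A", "B", "A", "B")),
  row.names = paste0("sample", 1:4)
)

# Same call shape as in the quoted branch: the formula is expanded into a
# numeric design matrix with one row per sample. Factors are dummy-coded
# against their first level (here "autumn" and "A").
X <- stats::model.matrix(object = ~ season + plot, data = sample_df)
print(X)
```

The result has samples as rows and (dummy-coded) variables as columns, which matches the documented description of X. In this sketch, nothing other than the formula and the sample data goes into building it.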
Perhaps clarification of these concepts would be beneficial for users such as myself - with poor statistical knowledge and mediocre coding skills : ]
Thank You and best wishes,
Adrian
Hi Adrian,
Thanks for your question! I believe the answer is yes: you can specify your model either as a model matrix (X) or with a formula (a newer feature in DivNet).
More generally, in many cases a formula is a convenient shorthand for the model matrix used in the linear (or generalized linear) model you are fitting. For more details, see the documentation for lm() and model.matrix() in the stats package that ships with R.
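As a small illustration of that correspondence outside DivNet, the sketch below fits the same linear model twice with base R: once from a formula via lm(), and once from an explicit model matrix via lm.fit(). The data frame d and its columns group, dose and y are simulated purely for this example.

```r
set.seed(1)

# Simulated data, for illustration only.
d <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  dose  = rep(1:4, times = 3)
)
d$y <- 2 + 0.5 * d$dose + rnorm(nrow(d))

# 1) Specify the model with a formula.
fit_formula <- lm(y ~ group + dose, data = d)

# 2) Build the model matrix explicitly and fit with it.
X <- model.matrix(~ group + dose, data = d)
fit_matrix <- lm.fit(x = X, y = d$y)

# Both routes estimate the same coefficients.
all.equal(coef(fit_formula), fit_matrix$coefficients)
```

The formula route is usually more convenient. Building the matrix by hand gives direct control over the columns, which can help when checking how factors were coded.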
I hope this helps!
Best,
David
We've added documentation to address this issue! Please refer to the updated Getting Started documentation.