mooreryan/divnet-rs

Null sample data file for independent samples & choosing base taxa


I'm using a dataset too large for DivNet, so I was excited to learn that divnet-rs was created! In my attempted DivNet model, I was using formula=NULL to reflect that the samples were assumed to be independent. How should I go about creating a sample data file to reflect no covariates being included in the model?

Also, no single taxon is present across all samples, so I chose the most abundant taxon as the base taxon in my attempted DivNet model. Is choosing a base taxon also required in divnet-rs?

Using no covariates

You can run the divnet estimation without covariates. (I'm wondering why you want to do it without covariates though?) You still need to set up the model matrix using some placeholder variable. Sample names are generally a good option for that. So something like this:

library(tibble)    # provides rownames_to_column()
library(magrittr)  # provides the %>% pipe

sample_names <- ...code to get names of your samples...

# And here you should generate the divnet-rs sample data file.
# 
# The [, -1] thing drops the intercept from the output of model.matrix.
model.matrix(~sample_names)[, -1] %>%
  as.data.frame %>%
  rownames_to_column("sample") %>%
  write.table("sample_data.csv",
              quote = FALSE,
              sep = ",",
              row.names = FALSE)
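
To see what that placeholder model matrix looks like, here is a minimal base-R sketch on toy data. The four sample names are made up for illustration. Note that model.matrix labels its rows "1", "2", ..., so this version attaches the sample names explicitly. Whether divnet-rs requires that column to match the names in your count table is something to check in the docs.

```r
# Hypothetical sample names; replace with the names from your own count table.
sample_names <- c("s1", "s2", "s3", "s4")

# One indicator column per sample level; [, -1] drops the intercept column.
# drop = FALSE keeps the result a matrix even if only one column remains.
mm <- model.matrix(~sample_names)[, -1, drop = FALSE]

# model.matrix labels rows "1", "2", ..., so add the real names as a column.
sample_data <- data.frame(sample = sample_names, mm, check.names = FALSE)

print(sample_data)
```

With n samples this gives n - 1 indicator columns, because the first sample is absorbed into the dropped intercept.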

If you need info about the input and output files, see the docs.

Specifying base taxa

You still have to choose a base taxon. DivNet (and so divnet-rs) uses the additive log-ratio transformation, and the base of the transform is the taxon you choose. Not having a single taxon present in all samples is pretty common, so I would try to pick the one that occurs in the largest number of samples. In cases like this, it may be a good idea to run the model a few times with different base taxa and check that your results do not change too drastically.
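
One way to find candidates is to rank taxa by the number of samples in which they appear. The sketch below does this in base R on a made-up count table. It assumes taxa are in rows and samples in columns, and the tie-break by total abundance is my own choice, not something DivNet or divnet-rs prescribes.

```r
# Hypothetical count table: rows are taxa, columns are samples.
counts <- matrix(
  c(10, 0, 3, 0,
     0, 5, 2, 7,
     1, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"),
                  c("s1", "s2", "s3", "s4"))
)

# Number of samples in which each taxon has a nonzero count.
prevalence <- rowSums(counts > 0)

# Rank by prevalence, breaking ties by total abundance (an assumption).
ranked <- order(prevalence, rowSums(counts), decreasing = TRUE)
candidates <- rownames(counts)[ranked]

print(prevalence)
print(candidates)
```

The first few entries of candidates are reasonable base taxa to try in separate runs when checking how sensitive the results are.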

If you need help specifying the base taxon, see this section of the docs.