vegan::vegist and vegan::metaMDS don't produce equivalent results
befriendabacterium opened this issue · 7 comments
Crossposting from Stack here (https://stats.stackexchange.com/questions/628994/why-dont-veganvegist-and-veganmetamds-produce-equivalent-results) just so it's raised as an issue and to draw your attention to it:
This question has been discussed before but the proposed solution (https://stats.stackexchange.com/a/459820/170801) is no longer reproducible for some reason. As a refresher - it has been suggested that turning off the autotransform and halfchange options in veganMDS() will allow you to produce identical results whether you are directly entering a community data matrix or distance matrix of it:
library('vegan')
data(varespec)
dij <- vegdist(varespec, method = 'bray')
set.seed(1)
ord1 <- metaMDS(varespec, distance = 'bray', k = 2)
set.seed(1)
ord2 <- metaMDS(dij, k = 2)
set.seed(1)
ord3 <- metaMDS(varespec, distance = 'bray', k = 2, autotransform = FALSE,
halfchange = FALSE)
However, when we run
all.equal(scores(ord2, 'sites'), scores(ord3, 'sites'))
it no longer produces:
[1] TRUE
as it did previously, but instead now produces:
[1] "Mean relative difference: 0.5965117"
Can anyone explain what's going on, please? @gavinsimpson maybe?
You do not give us sufficient information, but here some points you can look at:
- Check that transformation and scaling are equal in your analysis.
metaMDS
with community data matrix may perform transformation, such assqrt(x)
if values ofx
are high, and Wisconsin double standardization (wisconsin
). There may also be post-analysis scaling and rotation of axes. For instance, withvarespec
data the default with raw input data will give you:
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(varespec)) <=== both wisconsin and sqrt
Distance: bray
Dimensions: 2
Stress: 0.1825658
Stress type 1, weak ties
Best solution was repeated 1 time in 20 tries
The best solution was from try 5 (random start)
Scaling: centring, PC rotation, halfchange scaling <=== half-change scaling and rotatin
Species: expanded scores based on ‘wisconsin(sqrt(varespec))’ <=== info on transformation also shown here
All these should be equal.
- Even if axes are similarly scaled and rotated, they may be reflected: mirror image of the solution switches left & right and/or up & down even when the solutions are identical (the directions of axes are undefined in all ordination methods). You should use Procrustes rotation to compare two solutions:
procrustes(ord2, ord3)
Function procrustes
was specifically written to enable comparison or ordination configurations in vegan. Use it.
Hi Jari,
Thanks for your prompt response.
Given I was following up on a previous issue and just flagging that @gavinsimpson's previously reproducible solution is no longer reproducible (copy and pasting it and linking to the original Stack Exchange discussion), I thought I had provided sufficient information?
Thanks for pointing me to vegan::procrustes (which led me in turn to vegan::protest) - if I do a 'protest' comparison of these these two ordinations, it does indeed appear that they are equivalent i.e.
protest(ord2, ord3, scores = "sites", permutations = 999)
returns:
Call:
protest(X = ord2, Y = ord3, scores = "sites", permutations = 999)
Procrustes Sum of Squares (m12 squared): 2.22e-16
Correlation in a symmetric Procrustes rotation: 1
Significance: 0.001
Permutation: free
Number of permutations: 999
This is great, though I am still having some difficulty understanding why vegdist-then-metaMDS and metaMDS solutions don't produce identical ordinations. Particularly:
-
Why is the default not for both methods to produce exactly the same ordinations? I had previously assumed that when a community data matrix was the input to metaMDS, metaMDS would just call the vegdist function for its distance matrix calculation (i.e. act as a wrapper to it), but this appears to not be the case. Although the results may indeed be equivalent when tested via protest(), this could be a source of confusion (for context, two PhD students originally raised this issue to me because they were understandably confused that when using the same input data and parameters, they each got different results via the two different methods).
-
Accepting that they do not produce the same result by default, why does @gavinsimpson's previously working solution of setting 'autotransform' and 'halfchange' to FALSE (as well as setting an identical seed before each execution), no longer work?
I am just raising this as I think it could be a potential source of confusion (as indeed it was with the students and myself) - it may very well be that it is just something that can be clarified in the package documentation/a warning message. Thanks all for all your hard work on the vegan package, it really is an integral package for ecology in R.
Cheers
Matt
I answered this in Cross Validated. A brief summary of that response:
- Your model
ord2
uses halfchange scaling butord3
does not. This is sufficient to explain the differences. - You should not use
all.equal
to assess the similarity of configurations because signs of axes are not defined (they may be reflected). You should usevegan::procrustes
which will tell you that configurations are identical (Procrustes some of squares is 0).
In principle the default is to use halfchange scaling if possible (unless you set halfchange = FALSE
). Earlier we were unable to do so when dissimilarities were supplied instead of raw data. This was corrected in the 2.6-4 release with vegan distance functions. Earlier the same data gave different results when dissimilarities were calculated within the function or when these same dissimilarities were supplied as input. The major change is commit f4faf84 based on commit cb86519 (plus several minor commits). You should always set halfchange = FALSE
if you want so instead of trusting us to be unable to use halfchange scaling (we may improve).
The relevant NEWS items in the 2.6-4 release were:
-
vegdist
,betadiver
andraupcrick
set attributemaxdist
giving
the numeric value of theoretical maximum of the dissimilarity index.
For many dissimilarities this is 1, but √2 for Chord and
Hellinger distances, for instance. The attribute isNA
for open
indices that do not have such a ceiling.betadiver
has three
similarity indices and these setmaxdist
0. -
metaMDS
defaults to halfchange scaling when the dissimilarities have
a numericmaxdist
attribute, and adapt the threshold to the ceiling
value. For open indices without ceiling, the threshold will be in the
scale of dissimilarities.metaMDS
used a simple test to detect index
ceiling 1, but the test is now more robust and can also find other
maximum values. If such inference is made, the function will broadcast
a message of assumed value of the ceiling. -
Mountford index in
vegdist
is now scaled to maximum value log(2).
Earlier Mountford distances were scaled to maximum 1.
I think the old code was a source of confusion: the default is halfchange = TRUE
, but this was not honoured always. For ord3
you set explicitly halfchange = FALSE
, but for ord2
you used the default halfchange = TRUE
. Your model had different arguments and therefore you got different models. Earlier we were not capable to obey setting halfchange = TRUE
with dissimilarities as input, but this was improved in the release 2.6-4.
Your original view was correct: if input is rectangular (raw) data, metaMDS
just calls vegdist
and this is unchanged. Hafchange scaling is performed on final solution and does not influence dissimilarities. The old behaviour was inconsistent (and confusing in my opinion), because halfchange scaling was performed when dissimilarities were calculated within metaMDS
but halfchange scaling was not performed when these very same dissimilarities were input. Same data in two different forms and different results. Now it is fixed and these two give equivalent results:
d <- vegdist(dune)
metaMDS(dune)
metaMDS(d) # different than previous pre vegan 2.6-4
as do
metaMDS(dune, halfchange=FALSE)
metaMDS(d, halfchange=FALSE)
Hi Jari,
Thanks for the extra clarification of the changes to the packages here, as well as providing an answer on CrossValidated - this is beginning to make much more sense to me now!
I can reproduce your example with the 'dune' dataset fine - same results from the two alternative methods. However, when I try this with @gavinsimpson's example with the 'varespec' dataset, I still get different results from the two alternative methods:
library('vegan')
data(varespec)
d <- vegdist(varespec)
metaMDS(varespec)
metaMDS(d)
metaMDS(varespec, halfchange=F)
metaMDS(d, halfchange=F)
Setting the seed to be the same doesn't solve this either:
set.seed(1)
metaMDS(varespec)
gives:
Call:
metaMDS(comm = varespec)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(varespec))
Distance: bray
Dimensions: 2
Stress: 0.1825658
Stress type 1, weak ties
Best solution was not repeated after 20 tries
The best solution was from try 10 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘wisconsin(sqrt(varespec))’
whilst:
set.seed(1)
metaMDS(d)
gives:
Call:
metaMDS(comm = d)
global Multidimensional Scaling using monoMDS
Data: d
Distance: bray
Dimensions: 2
Stress: 0.1000211
Stress type 1, weak ties
Best solution was repeated 9 times in 20 tries
The best solution was from try 7 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
Is this something to do with the nature of the input data, perhaps? For example, the 'dune' dataset appears to be cover class values of species (integers) but the 'varespec' dataset appears to be estimated cover values of species (decimals).
Thanks
Matt
metaMDS
uses internally vegdist(wisconsin(varespec))
, so the following should be equal:
## default metaMDS
metaMDS(varespec)
metaMDS(vegdist(wisconsin(sqrt(varespec))))
## alternatively with default vegdist
metaMDS(varespec, autotransform = FALSE)
metaMDS(vegdist(varespec))
It is indeed related to dune
using cover scale: transformations are triggered with values with large variation. The heuristics are explained in the documentation (help(metaMDS)
, chapter Details, bullet point 1). If you want to be sure that data are not transformed, and you want to be sure that halfchange scaling is not used, set metaMDS(..., autotransform=FALSE, halfchange=FALSE)
.
Hi Jari,
Thanks for clarifying this - I think it was indeed the Wisconsin double standardization that was throwing this off. I am now able to get the same result with both methods.
Thanks for your help and patience with this one - feel free to post this sligthtly udpated answer over at CrossValidated too, for anyone who comes across the question there. I'll close this issue on here now.
Cheers
Matt