vegan::vegdist and vegan::metaMDS don't produce equivalent results

Question

befriendabacterium opened this issue a year ago · 7 comments

Crossposting from Stack here (https://stats.stackexchange.com/questions/628994/why-dont-veganvegist-and-veganmetamds-produce-equivalent-results) just so it's raised as an issue and to draw your attention to it:

This question has been discussed before, but the proposed solution (https://stats.stackexchange.com/a/459820/170801) is no longer reproducible for some reason. As a refresher: it has been suggested that turning off the autotransform and halfchange options in metaMDS() lets you produce identical results whether you supply a community data matrix directly or a distance matrix computed from it:

library('vegan')
data(varespec)

dij <- vegdist(varespec, method = 'bray')

set.seed(1)
ord1 <- metaMDS(varespec, distance = 'bray', k = 2)
set.seed(1)
ord2 <- metaMDS(dij, k = 2)
set.seed(1)
ord3 <- metaMDS(varespec, distance = 'bray', k = 2, autotransform = FALSE, 
                halfchange = FALSE)

However, when we run

all.equal(scores(ord2, 'sites'), scores(ord3, 'sites'))

it no longer produces:

[1] TRUE

as it did previously, but instead now produces:

[1] "Mean relative difference: 0.5965117"

Can anyone explain what's going on, please? @gavinsimpson maybe?

Answer 1 · 2023-10-17T15:53:32.000Z

You do not give us sufficient information, but here are some points you can look at:

Check that transformation and scaling are equal in your analyses. metaMDS with a community data matrix may perform transformations, such as sqrt(x) if values of x are high, and Wisconsin double standardization (wisconsin). There may also be post-analysis scaling and rotation of axes. For instance, with varespec data the default with raw input data will give you:

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(varespec))  <=== both wisconsin and sqrt
Distance: bray 

Dimensions: 2 
Stress:     0.1825658 
Stress type 1, weak ties
Best solution was repeated 1 time in 20 tries
The best solution was from try 5 (random start)
Scaling: centring, PC rotation, halfchange scaling  <=== half-change scaling and rotation
Species: expanded scores based on ‘wisconsin(sqrt(varespec))’ <=== info on transformation also shown here

All these should be equal.

Even if axes are similarly scaled and rotated, they may be reflected: mirror image of the solution switches left & right and/or up & down even when the solutions are identical (the directions of axes are undefined in all ordination methods). You should use Procrustes rotation to compare two solutions:

procrustes(ord2, ord3)

Function procrustes was specifically written to enable comparison of ordination configurations in vegan. Use it.
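As a minimal sketch of the comparison suggested above, the following rebuilds ord2 and ord3 from the question and compares them with a symmetric Procrustes analysis. The use of trace = FALSE (to silence iteration output) and symmetric = TRUE are choices made here for illustration, not part of the original answer.

```r
library(vegan)
data(varespec)

dij <- vegdist(varespec, method = "bray")

# Set the same seed before each call so both runs use the same random starts
set.seed(1)
ord2 <- metaMDS(dij, k = 2, trace = FALSE)
set.seed(1)
ord3 <- metaMDS(varespec, distance = "bray", k = 2,
                autotransform = FALSE, halfchange = FALSE, trace = FALSE)

# Symmetric Procrustes analysis ignores reflection, rotation and overall scaling
pro <- procrustes(ord2, ord3, symmetric = TRUE)

# Procrustes sum of squares: close to 0 when the configurations are equivalent
pro$ss
```

A value of pro$ss that is effectively zero indicates that the two site configurations differ only by reflection, rotation or scaling, which all.equal on raw scores cannot detect.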

Answer 2 · 2023-10-18T09:25:29.000Z

Hi Jari,

Thanks for your prompt response.

Given I was following up on a previous issue and just flagging that @gavinsimpson's previously reproducible solution is no longer reproducible (copy and pasting it and linking to the original Stack Exchange discussion), I thought I had provided sufficient information?

Thanks for pointing me to vegan::procrustes (which led me in turn to vegan::protest) - if I do a 'protest' comparison of these two ordinations, it does indeed appear that they are equivalent, i.e.

protest(ord2, ord3, scores = "sites", permutations = 999)

returns:

Call:
protest(X = ord2, Y = ord3, scores = "sites", permutations = 999) 

Procrustes Sum of Squares (m12 squared):        2.22e-16 
Correlation in a symmetric Procrustes rotation:     1 
Significance:  0.001 

Permutation: free
Number of permutations: 999

This is great, though I am still having some difficulty understanding why vegdist-then-metaMDS and metaMDS solutions don't produce identical ordinations. Particularly:

Why is the default not for both methods to produce exactly the same ordinations? I had previously assumed that when a community data matrix was the input to metaMDS, metaMDS would just call the vegdist function for its distance matrix calculation (i.e. act as a wrapper to it), but this appears to not be the case. Although the results may indeed be equivalent when tested via protest(), this could be a source of confusion (for context, two PhD students originally raised this issue to me because they were understandably confused that when using the same input data and parameters, they each got different results via the two different methods).
Accepting that they do not produce the same result by default, why does @gavinsimpson's previously working solution of setting 'autotransform' and 'halfchange' to FALSE (as well as setting an identical seed before each execution), no longer work?

I am just raising this as I think it could be a potential source of confusion (as indeed it was with the students and myself) - it may very well be that it is just something that can be clarified in the package documentation/a warning message. Thanks all for all your hard work on the vegan package, it really is an integral package for ecology in R.

Cheers
Matt

Answer 3 · 2023-10-18T09:38:31.000Z

I answered this in Cross Validated. A brief summary of that response:

Your model ord2 uses halfchange scaling but ord3 does not. This is sufficient to explain the differences.
You should not use all.equal to assess the similarity of configurations because signs of axes are not defined (they may be reflected). You should use vegan::procrustes which will tell you that configurations are identical (Procrustes sum of squares is 0).

In principle the default is to use halfchange scaling if possible (unless you set halfchange = FALSE). Earlier we were unable to do so when dissimilarities were supplied instead of raw data. This was corrected in the 2.6-4 release for dissimilarities computed with vegan distance functions. Earlier the same data gave different results depending on whether the dissimilarities were calculated within the function or the same dissimilarities were supplied as input. The major change is commit f4faf84 based on commit cb86519 (plus several minor commits). If you do not want halfchange scaling, always set halfchange = FALSE explicitly instead of relying on us being unable to apply it (we may improve).

The relevant NEWS items in the 2.6-4 release were:

vegdist, betadiver and raupcrick set attribute maxdist giving the numeric value of theoretical maximum of the dissimilarity index. For many dissimilarities this is 1, but √2 for Chord and Hellinger distances, for instance. The attribute is NA for open indices that do not have such a ceiling. betadiver has three similarity indices and these set maxdist 0.
metaMDS defaults to halfchange scaling when the dissimilarities have a numeric maxdist attribute, and adapt the threshold to the ceiling value. For open indices without ceiling, the threshold will be in the scale of dissimilarities. metaMDS used a simple test to detect index ceiling 1, but the test is now more robust and can also find other maximum values. If such inference is made, the function will broadcast a message of assumed value of the ceiling.
Mountford index in vegdist is now scaled to maximum value log(2). Earlier Mountford distances were scaled to maximum 1.
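The maxdist attribute described in these NEWS items can be inspected directly. The sketch below assumes vegan 2.6-4 or later (earlier versions do not set the attribute) and uses Bray-Curtis as an example of an index with a ceiling and Euclidean distance as an example of an open index. The object names are illustrative.

```r
library(vegan)
data(varespec)

d_bray <- vegdist(varespec, method = "bray")       # index with a ceiling
d_eucl <- vegdist(varespec, method = "euclidean")  # open index without ceiling

# Theoretical maximum stored by vegdist; metaMDS uses it to decide on
# halfchange scaling when dissimilarities are supplied as input
attr(d_bray, "maxdist")
attr(d_eucl, "maxdist")
```

Per the NEWS text, a numeric value is expected for the Bray-Curtis object and NA for the Euclidean one.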

Answer 4 · 2023-10-18T10:00:10.000Z

I think the old code was a source of confusion: the default is halfchange = TRUE, but this was not always honoured. For ord3 you explicitly set halfchange = FALSE, but for ord2 you used the default halfchange = TRUE. Your models had different arguments and therefore you got different models. Earlier we were unable to honour halfchange = TRUE with dissimilarities as input, but this was improved in release 2.6-4.

Your original view was correct: if input is rectangular (raw) data, metaMDS just calls vegdist and this is unchanged. Halfchange scaling is performed on the final solution and does not influence dissimilarities. The old behaviour was inconsistent (and confusing in my opinion), because halfchange scaling was performed when dissimilarities were calculated within metaMDS but was not performed when these very same dissimilarities were supplied as input: the same data in two different forms gave different results. Now it is fixed and these two give equivalent results:

library(vegan)
data(dune)
d <- vegdist(dune)
metaMDS(dune)
metaMDS(d)   # differs from the result in vegan versions before 2.6-4

as do

metaMDS(dune, halfchange=FALSE)
metaMDS(d, halfchange=FALSE)
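To illustrate that halfchange scaling only rescales the final configuration, the following sketch runs metaMDS on the same dune dissimilarities with and without it. The seed value, trace = FALSE and the object names m_hc and m_nohc are illustrative choices, not taken from the thread.

```r
library(vegan)
data(dune)

d <- vegdist(dune)

# Identical seeds so both runs go through the same random starts
set.seed(42)
m_hc <- metaMDS(d, trace = FALSE)                        # default: halfchange scaling
set.seed(42)
m_nohc <- metaMDS(d, halfchange = FALSE, trace = FALSE)  # no halfchange scaling

# Stress comes from the MDS fit itself, so it should not depend on the scaling
c(m_hc$stress, m_nohc$stress)

# Symmetric Procrustes sum of squares: near 0 if only the axis scale differs
procrustes(m_hc, m_nohc, symmetric = TRUE)$ss

# The raw site scores may still differ in their units
range(scores(m_hc, "sites"))
range(scores(m_nohc, "sites"))
```

If the stress values agree and the Procrustes sum of squares is effectively zero, any difference in raw scores comes from the post-analysis scaling alone.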

Answer 5 · 2023-10-18T12:10:58.000Z

Hi Jari,

Thanks for the extra clarification of the changes to the packages here, as well as providing an answer on CrossValidated - this is beginning to make much more sense to me now!

I can reproduce your example with the 'dune' dataset fine - same results from the two alternative methods. However, when I try this with @gavinsimpson's example with the 'varespec' dataset, I still get different results from the two alternative methods:

library('vegan')
data(varespec)

d <- vegdist(varespec)

metaMDS(varespec)
metaMDS(d)

metaMDS(varespec, halfchange=FALSE)
metaMDS(d, halfchange=FALSE)

Setting the seed to be the same doesn't solve this either:

set.seed(1)
metaMDS(varespec)

gives:

Call:
metaMDS(comm = varespec) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(varespec)) 
Distance: bray 

Dimensions: 2 
Stress:     0.1825658 
Stress type 1, weak ties
Best solution was not repeated after 20 tries
The best solution was from try 10 (random start)
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on ‘wisconsin(sqrt(varespec))’

whilst:

set.seed(1)
metaMDS(d)

gives:

Call:
metaMDS(comm = d) 

global Multidimensional Scaling using monoMDS

Data:     d 
Distance: bray 

Dimensions: 2 
Stress:     0.1000211 
Stress type 1, weak ties
Best solution was repeated 9 times in 20 tries
The best solution was from try 7 (random start)
Scaling: centring, PC rotation, halfchange scaling 
Species: scores missing

Is this something to do with the nature of the input data, perhaps? For example, the 'dune' dataset appears to be cover class values of species (integers) but the 'varespec' dataset appears to be estimated cover values of species (decimals).

Thanks
Matt

Answer 6 · 2023-10-18T12:28:15.000Z

For varespec, metaMDS internally uses vegdist(wisconsin(sqrt(varespec))), so the following should be equal:

## default metaMDS
metaMDS(varespec)
metaMDS(vegdist(wisconsin(sqrt(varespec))))
## alternatively with default vegdist
metaMDS(varespec, autotransform = FALSE)
metaMDS(vegdist(varespec))

It is indeed related to dune using a cover scale: transformations are triggered when data values are large. The heuristics are explained in the documentation (help(metaMDS), section Details, bullet point 1). If you want to be sure that the data are not transformed and that halfchange scaling is not used, set metaMDS(..., autotransform = FALSE, halfchange = FALSE).
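The equivalence described in this answer can be checked along the following lines. This is a sketch only: the seed, trace = FALSE and the object names m_raw and m_dist are illustrative, and printing the data maxima is just one simple way to see how the two data sets differ in scale.

```r
library(vegan)
data(varespec)
data(dune)

# Compare the size of the abundance values in the two data sets
max(dune)
max(varespec)

# Default metaMDS on raw data versus the same transformations applied by hand
set.seed(1)
m_raw <- metaMDS(varespec, trace = FALSE)
set.seed(1)
m_dist <- metaMDS(vegdist(wisconsin(sqrt(varespec))), trace = FALSE)

# Equivalent models should have the same stress ...
c(m_raw$stress, m_dist$stress)

# ... and a symmetric Procrustes sum of squares close to 0
procrustes(m_raw, m_dist, symmetric = TRUE)$ss
```

The "Data:" line in the printed metaMDS output shows which transformations were actually applied, so it is worth reading before comparing results from raw data and from precomputed dissimilarities.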

Answer 7 · 2023-10-18T13:40:38.000Z

Hi Jari,

Thanks for clarifying this - I think it was indeed the Wisconsin double standardization that was throwing this off. I am now able to get the same result with both methods.

Thanks for your help and patience with this one - feel free to post this slightly updated answer over at CrossValidated too, for anyone who comes across the question there. I'll close this issue on here now.

Cheers
Matt