bilat.symmetry is returning landmarks with specimens reordered alphabetically

Question

bilat.symmetry is returning landmarks with specimens reordered alphabetically

Closed this issue 6 years ago · 10 comments

Hi,

Like the title says, I noticed this happening in my data last night and spent the morning testing whether other geomorph functions are behaving in the same way (thankfully, they are not). I have attached a standalone example that reproduces the error on my computer.
geomorph_rowname_test.zip

Is it doing the same thing on your end?
-David A

Answer 1 · 2018-06-26T18:17:59.000Z

The R script you attached seems to have permissions associated with it. I cannot open it.

Answer 2 · 2018-06-26T18:24:58.000Z

Sorry about that, here's the content of the script and I saved the data as an excel file so I can just paste it here:

#I discovered that my code produced landmark-file rownames that were in alphabetical order and out of their original order, breaking the correspondence between my landmarks and the covariates used in analysis.
#I don't think it was intentional, so I looked into the source of the reshuffling
#long story short, it's originating from bilat.symmetry

#rownames are not alphabetical
Pheno.Coord <- read.csv("path/to/data/geomorph_test_data.csv", header = T, row.names = 1)

true.names <- rownames(Pheno.Coord)

library(geomorph)

#let's run through several steps in the registration process and see where we get reordering and where it is preserved

array.test <- arrayspecs(Pheno.Coord, 68, 3)
all.equal.character(true.names, dimnames(array.test)[[3]])

gdf.test <- geomorph.data.frame(shape = array.test, ind = rownames(Pheno.Coord))
all.equal.character(true.names, dimnames(gdf.test$shape)[[3]])

gpa.test <- gpagen(gdf.test$shape)
all.equal.character(true.names, dimnames(gpa.test$coords)[[3]])

LM.namesX<- colnames(Pheno.Coord)[seq(2, dim(Pheno.Coord)[2], 3)]
LM.names <- substr(LM.namesX,1,nchar(LM.namesX)-2)
bilat.test <- bilat.symmetry(shape, object.sym = T, land.pairs=matrix(c(grep("R", LM.names[]),grep("L", LM.names[])),ncol=2), data = gdf.test, iter = 5)

all.equal.character(true.names, dimnames(bilat.test$symm.shape)[[3]])

two.d.test.gpa <- two.d.array(gpa.test$coords)
all.equal.character(true.names, rownames(two.d.test.gpa))

two.d.test.bilat <- two.d.array(bilat.test$symm.shape)
all.equal.character(true.names, rownames(two.d.test.bilat))

#the source of the rowname shuffling is the bilat.symmetry function, it just gets propagated to any function that uses the data after
#here's my session info

# sessionInfo()
# R version 3.4.4 (2018-03-15)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04 LTS
# 
# Matrix products: default
# BLAS: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRblas.so
# LAPACK: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRlapack.so
# 
# locale:
#   [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8   
# [6] LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
#   [1] stats4    grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] kangar00_1.1         org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0     Biobase_2.38.0       rmarkdown_1.10      
# [8] plotly_4.7.1         Morpho_2.6           shiny_1.1.0          geomorph_3.0.6       ape_5.1              rgl_0.99.16          Rgraphviz_2.22.0    
# [15] graph_1.56.0         BiocGenerics_0.24.0  KEGGgraph_1.38.0     ggplot2_2.2.1.9000   biomaRt_2.34.2       RevoUtils_10.0.9     RevoUtilsMath_10.0.1
# 
# loaded via a namespace (and not attached):
#   [1] subplex_1.5-4           nlme_3.1-131.1          bitops_1.0-6            bit64_0.9-7             doParallel_1.0.11       webshot_0.5.0          
# [7] progress_1.2.0          httr_1.3.1              rprojroot_1.3-2         tools_3.4.4             backports_1.1.2         R6_2.2.2               
# [13] DBI_1.0.0               lazyeval_0.2.1          colorspace_1.3-2        manipulateWidget_0.10.0 withr_2.1.2             tidyselect_0.2.4       
# [19] prettyunits_1.0.2       chron_2.3-52            bit_1.1-14              compiler_3.4.4          scales_0.5.0            mvtnorm_1.0-8          
# [25] stringr_1.3.1           digest_0.6.15           colorRamps_2.3          jpeg_0.1-8              pkgconfig_2.0.1         htmltools_0.3.6        
# [31] geiger_2.0.6            htmlwidgets_1.2         rlang_0.2.1             rstudioapi_0.7          RSQLite_2.1.1           Rvcg_0.17              
# [37] bindr_0.1.1             jsonlite_1.5            crosstalk_1.0.0         dplyr_0.7.5             RCurl_1.95-4.10         magrittr_1.5           
# [43] Matrix_1.2-12           Rcpp_0.12.17            munsell_0.5.0           abind_1.4-5             proto_1.0.0             sqldf_0.4-11           
# [49] yaml_2.1.19             stringi_1.2.3           CompQuadForm_1.4.3      MASS_7.3-49             plyr_1.8.4              blob_1.1.1             
# [55] promises_1.0.1          bigmemory.sri_0.1.3     crayon_1.3.4            miniUI_0.1.1.1          lattice_0.20-35         hms_0.4.2              
# [61] knitr_1.20              pillar_1.2.3            codetools_0.2-15        XML_3.98-1.11           glue_1.2.0              evaluate_0.10.1        
# [67] data.table_1.11.4       deSolve_1.21            httpuv_1.4.4.1          foreach_1.4.4           gtable_0.2.0            purrr_0.2.5            
# [73] tidyr_0.8.1             gsubfn_0.7              assertthat_0.2.0        mime_0.5                xtable_1.8-2            coda_0.19-1            
# [79] later_0.7.3             viridisLite_0.3.0       tibble_1.4.2            iterators_1.0.9         memoise_1.1.0           bindrcpp_0.2.2         
# [85] bigmemory_4.5.33

geomorph_test_data.xlsx

Answer 3 · 2018-06-26T19:10:37.000Z

Received, and investigating. d Dr. Dean C. Adams Director of Graduate Education, EEB Program Professor Department of Ecology, Evolution, and Organismal Biology Iowa State University www.public.iastate.edu/~dcadams/<http://www.public.iastate.edu/~dcadams/> phone: 515-294-3834 From: J0vid <notifications@github.com> Sent: Tuesday, June 26, 2018 1:25 PM To: geomorphR/geomorph <geomorph@noreply.github.com> Cc: Subscribed <subscribed@noreply.github.com> Subject: Re: [geomorphR/geomorph] bilat.symmetry is returning landmarks with specimens reordered alphabetically (#16) Sorry about that, here's the content of the script and I saved the data as an excel file so I can just paste it here: #I discovered that my code produced landmark-file rownames that were in alphabetical order and out of their original order, breaking the correspondence between my landmarks and the covariates used in analysis. #I don't think it was intentional, so I looked into the source of the reshuffling #long story short, it's originating from bilat.symmetry #rownames are not alphabetical Pheno.Coord <- read.csv("path/to/data/geomorph_test_data.csv", header = T, row.names = 1) true.names <- rownames(Pheno.Coord) library(geomorph) #let's run through several steps in the registration process and see where we get reordering and where it is preserved array.test <- arrayspecs(Pheno.Coord, 68, 3) all.equal.character(true.names, dimnames(array.test)[[3]]) gdf.test <- geomorph.data.frame(shape = array.test, ind = rownames(Pheno.Coord)) all.equal.character(true.names, dimnames(gdf.test$shape)[[3]]) gpa.test <- gpagen(gdf.test$shape) all.equal.character(true.names, dimnames(gpa.test$coords)[[3]]) LM.namesX<- colnames(Pheno.Coord)[seq(2, dim(Pheno.Coord)[2], 3)] LM.names <- substr(LM.namesX,1,nchar(LM.namesX)-2) bilat.test <- bilat.symmetry(shape, object.sym = T, land.pairs=matrix(c(grep("R", LM.names[]),grep("L", LM.names[])),ncol=2), data = gdf.test, iter = 5) all.equal.character(true.names, dimnames(bilat.test$symm.shape)[[3]]) two.d.test.gpa <- two.d.array(gpa.test$coords) all.equal.character(true.names, rownames(two.d.test.gpa)) two.d.test.bilat <- two.d.array(bilat.test$symm.shape) all.equal.character(true.names, rownames(two.d.test.bilat)) #the source of the rowname shuffling is the bilat.symmetry function, it just gets propagated to any function that uses the data after #here's my session info # sessionInfo() # R version 3.4.4 (2018-03-15) # Platform: x86_64-pc-linux-gnu (64-bit) # Running under: Ubuntu 18.04 LTS # # Matrix products: default # BLAS: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRblas.so # LAPACK: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRlapack.so # # locale: # [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 # [6] LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C # [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C # # attached base packages: # [1] stats4 grid parallel stats graphics grDevices utils datasets methods base # # other attached packages: # [1] kangar00_1.1 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0 IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0 rmarkdown_1.10 # [8] plotly_4.7.1 Morpho_2.6 shiny_1.1.0 geomorph_3.0.6 ape_5.1 rgl_0.99.16 Rgraphviz_2.22.0 # [15] graph_1.56.0 BiocGenerics_0.24.0 KEGGgraph_1.38.0 ggplot2_2.2.1.9000 biomaRt_2.34.2 RevoUtils_10.0.9 RevoUtilsMath_10.0.1 # # loaded via a namespace (and not attached): # [1] subplex_1.5-4 nlme_3.1-131.1 bitops_1.0-6 bit64_0.9-7 doParallel_1.0.11 webshot_0.5.0 # [7] progress_1.2.0 httr_1.3.1 rprojroot_1.3-2 tools_3.4.4 backports_1.1.2 R6_2.2.2 # [13] DBI_1.0.0 lazyeval_0.2.1 colorspace_1.3-2 manipulateWidget_0.10.0 withr_2.1.2 tidyselect_0.2.4 # [19] prettyunits_1.0.2 chron_2.3-52 bit_1.1-14 compiler_3.4.4 scales_0.5.0 mvtnorm_1.0-8 # [25] stringr_1.3.1 digest_0.6.15 colorRamps_2.3 jpeg_0.1-8 pkgconfig_2.0.1 htmltools_0.3.6 # [31] geiger_2.0.6 htmlwidgets_1.2 rlang_0.2.1 rstudioapi_0.7 RSQLite_2.1.1 Rvcg_0.17 # [37] bindr_0.1.1 jsonlite_1.5 crosstalk_1.0.0 dplyr_0.7.5 RCurl_1.95-4.10 magrittr_1.5 # [43] Matrix_1.2-12 Rcpp_0.12.17 munsell_0.5.0 abind_1.4-5 proto_1.0.0 sqldf_0.4-11 # [49] yaml_2.1.19 stringi_1.2.3 CompQuadForm_1.4.3 MASS_7.3-49 plyr_1.8.4 blob_1.1.1 # [55] promises_1.0.1 bigmemory.sri_0.1.3 crayon_1.3.4 miniUI_0.1.1.1 lattice_0.20-35 hms_0.4.2 # [61] knitr_1.20 pillar_1.2.3 codetools_0.2-15 XML_3.98-1.11 glue_1.2.0 evaluate_0.10.1 # [67] data.table_1.11.4 deSolve_1.21 httpuv_1.4.4.1 foreach_1.4.4 gtable_0.2.0 purrr_0.2.5 # [73] tidyr_0.8.1 gsubfn_0.7 assertthat_0.2.0 mime_0.5 xtable_1.8-2 coda_0.19-1 # [79] later_0.7.3 viridisLite_0.3.0 tibble_1.4.2 iterators_1.0.9 memoise_1.1.0 bindrcpp_0.2.2 # [85] bigmemory_4.5.33 geomorph_test_data.xlsx<https://github.com/geomorphR/geomorph/files/2138411/geomorph_test_data.xlsx> — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub<#16 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AGSvqZ7eAs_5f37wU13hD9Sb61vVU6mhks5uAnx7gaJpZM4U4aJe>.

Answer 4 · 2018-06-26T19:22:17.000Z

I can see where the reshuffling occurs in the script but it is a challenge to fix it because the example we use just happens to have the variable, ind, ordered the same way the fitted value from the linear model used to estimate the symmetric shape component. I noticed that you did not supply an "ind" variable. This might be a short-term solution, and even if it does not work, as desired, you could rearrange the result with

bilat.test$symm.shape <- bilat.test$symm.shape[,, ind]

Because I cannot open your data (also do not have permission) I cannot replicate your exact error. I'm a little apprehensive to start updating the code without a full appreciation for what causes the error and the best way to solve it, which would require an example like yours. (I don't want to trade once issue for another.) Furthermore, this function is begging for an overhaul, so the best solution (for us) might be to start from scratch, provided we can get it to a serviceable place for you.

Maybe Dean will see something I didn't.

Mike

Answer 5 · 2018-06-26T21:01:55.000Z

David,

It was not as bad as it seemed. Many of the specimens were still in the same order but the term "ind" had been appended to the rownames (based on an internal model matrix). This is not a simple fix but something we will have to overhaul in the function, but I have a kluge for you, so that you can continue working Starting with one of your steps:

all.equal.character(new.names, dimnames(bilat.test$symm.shape)[[3]])
new.names <- paste("ind", true.names, sep="")
fix.order <- match(new.names, dimnames(bilat.test$symm.shape)[[3]])

bilat.test$symm.shape <- bilat.test$symm.shape[,,fix.order]
all.equal.character(new.names, dimnames(bilat.test$symm.shape)[[3]])

Which yielded this:

all.equal.character(new.names, dimnames(bilat.test$symm.shape)[[3]])
[1] TRUE

I'm sure the asymm.shape could be similarly affected.

Sorry for the hassle. As I mentioned before this particular function requires more overhaul than a simple tweak will fix. Unfortunately that is not easy to do at the spur of the moment. Your coding skills suggest that this kluge should not be an impediment in the interim and hopefully we can get the function updated soon.

Cheers!
Mike

Answer 6 · 2018-06-26T21:10:37.000Z

Yea, I can see what you're saying about most of the issues being from the addition of ind to the names. Most of it is fixed by substring the output.

all.equal.character(true.names, substr(dimnames(bilat.test$symm.shape)[[3]], start = 4, stop = nchar(dimnames(bilat.test$symm.shape)[[3]])))
[1] "279 string mismatches"

One thought that comes to mind is saving the original order of the row names explicitly early in the function and then when lm.fit or whatever you're using for modelling returns the residuals for the symmetric and asymmetric components, index them by the original row names to restore the order. Thanks for being so quick to respond.
-David

Answer 7 · 2018-06-26T21:17:08.000Z

That is, in fact, that plan. D Dr. Dean C. Adams Director of Graduate Education, EEB Program Professor Department of Ecology, Evolution, and Organismal Biology Iowa State University www.public.iastate.edu/~dcadams/<http://www.public.iastate.edu/~dcadams/> phone: 515-294-3834 From: J0vid <notifications@github.com> Sent: Tuesday, June 26, 2018 4:11 PM To: geomorphR/geomorph <geomorph@noreply.github.com> Cc: Adams, Dean [EEOBS] <dcadams@iastate.edu>; Comment <comment@noreply.github.com> Subject: Re: [geomorphR/geomorph] bilat.symmetry is returning landmarks with specimens reordered alphabetically (#16) Yea, I can see what you're saying about most of the issues being from the addition of ind to the names. Most of it is fixed by substring the output. all.equal.character(true.names, substr(dimnames(bilat.test$symm.shape)[[3]], start = 4, stop = nchar(dimnames(bilat.test$symm.shape)[[3]]))) [1] "279 string mismatches" One thought that comes to mind is saving the original order of the row names explicitly early in the function and then when lm.fit or whatever you're using for modelling returns the residuals for the symmetric and asymmetric components, index them by the original row names to restore the order. Thanks for being so quick to respond. -David — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#16 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AGSvqeK68YBMADsjdhqvtEvJ6KkntmApks5uAqNNgaJpZM4U4aJe>.

Answer 8 · 2018-06-26T21:59:42.000Z

And actually, it is not that simple if I remember my history correctly. This solution would work well for object symmetry but not necessarily matching symmetry. If I recall correctly, R does not like data frames with the same row names in multiple places. The “ind” variable was a way to get around this when there are multiple shapes to match. With two names per specimen, and shapes in any kind of order, this proposed solution would not work well. I think we tried solving ways to make the specimen labels obvious, even if order was not maintained, while working within the bounds of a single function. The best solution will be to make two different functions and have one parent function to manage them. Then “ind” is not needed for object symmetry and the order can be maintained easier. Thanks for illuminating this issue! Mike

…

Sent from my iPhone

On Jun 26, 2018, at 5:17 PM, Dean Adams ***@***.***> wrote: That is, in fact, that plan. D Dr. Dean C. Adams Director of Graduate Education, EEB Program Professor Department of Ecology, Evolution, and Organismal Biology Iowa State University www.public.iastate.edu/~dcadams/<http://www.public.iastate.edu/~dcadams/> phone: 515-294-3834 From: J0vid ***@***.***> Sent: Tuesday, June 26, 2018 4:11 PM To: geomorphR/geomorph ***@***.***> Cc: Adams, Dean [EEOBS] ***@***.***>; Comment ***@***.***> Subject: Re: [geomorphR/geomorph] bilat.symmetry is returning landmarks with specimens reordered alphabetically (#16) Yea, I can see what you're saying about most of the issues being from the addition of ind to the names. Most of it is fixed by substring the output. all.equal.character(true.names, substr(dimnames(bilat.test$symm.shape)[[3]], start = 4, stop = nchar(dimnames(bilat.test$symm.shape)[[3]]))) [1] "279 string mismatches" One thought that comes to mind is saving the original order of the row names explicitly early in the function and then when lm.fit or whatever you're using for modelling returns the residuals for the symmetric and asymmetric components, index them by the original row names to restore the order. Thanks for being so quick to respond. -David — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#16 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AGSvqeK68YBMADsjdhqvtEvJ6KkntmApks5uAqNNgaJpZM4U4aJe>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

Answer 9 · 2018-06-26T22:19:35.000Z

Yes, that is correct. And things will become more complex when there are replicates. A broader solution is needed. Dean Dr. Dean C. Adams Director of Graduate Education, EEB Program Professor Department of Ecology, Evolution, and Organismal Biology Iowa State University www.public.iastate.edu/~dcadams/<http://www.public.iastate.edu/~dcadams/> phone: 515-294-3834 From: Michael Collyer <notifications@github.com> Sent: Tuesday, June 26, 2018 5:00 PM To: geomorphR/geomorph <geomorph@noreply.github.com> Cc: Adams, Dean [EEOBS] <dcadams@iastate.edu>; Comment <comment@noreply.github.com> Subject: Re: [geomorphR/geomorph] bilat.symmetry is returning landmarks with specimens reordered alphabetically (#16) And actually, it is not that simple if I remember my history correctly. This solution would work well for object symmetry but not necessarily matching symmetry. If I recall correctly, R does not like data frames with the same row names in multiple places. The “ind” variable was a way to get around this when there are multiple shapes to match. With two names per specimen, and shapes in any kind of order, this proposed solution would not work well. I think we tried solving ways to make the specimen labels obvious, even if order was not maintained, while working within the bounds of a single function. The best solution will be to make two different functions and have one parent function to manage them. Then “ind” is not needed for object symmetry and the order can be maintained easier. Thanks for illuminating this issue! Mike

Sent from my iPhone

On Jun 26, 2018, at 5:17 PM, Dean Adams ***@***.******@***.***>> wrote: That is, in fact, that plan. D Dr. Dean C. Adams Director of Graduate Education, EEB Program Professor Department of Ecology, Evolution, and Organismal Biology Iowa State University www.public.iastate.edu/~dcadams/<http://www.public.iastate.edu/~dcadams/<http://www.public.iastate.edu/~dcadams/%3chttp:/www.public.iastate.edu/~dcadams/>> phone: 515-294-3834 From: J0vid ***@***.******@***.***>> Sent: Tuesday, June 26, 2018 4:11 PM To: geomorphR/geomorph ***@***.******@***.***>> Cc: Adams, Dean [EEOBS] ***@***.******@***.***>>; Comment ***@***.******@***.***>> Subject: Re: [geomorphR/geomorph] bilat.symmetry is returning landmarks with specimens reordered alphabetically (#16) Yea, I can see what you're saying about most of the issues being from the addition of ind to the names. Most of it is fixed by substring the output. all.equal.character(true.names, substr(dimnames(bilat.test$symm.shape)[[3]], start = 4, stop = nchar(dimnames(bilat.test$symm.shape)[[3]]))) [1] "279 string mismatches" One thought that comes to mind is saving the original order of the row names explicitly early in the function and then when lm.fit or whatever you're using for modelling returns the residuals for the symmetric and asymmetric components, index them by the original row names to restore the order. Thanks for being so quick to respond. -David — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#16 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AGSvqeK68YBMADsjdhqvtEvJ6KkntmApks5uAqNNgaJpZM4U4aJe>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub<#16 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AGSvqUzI45BRRxyRJOtbIlkznZ40K0Nqks5uAq7OgaJpZM4U4aJe>.

Answer 10 · 2018-06-28T15:03:29.000Z

Dean fixed this isse