geomorphR/geomorph

bilat.symmetry is returning landmarks with specimens reordered alphabetically

Closed this issue · 10 comments

J0vid commented

Hi,

Like the title says, I noticed this happening in my data last night and spent the morning testing whether other geomorph functions are behaving in the same way (thankfully, they are not). I have attached a standalone example that reproduces the error on my computer.
geomorph_rowname_test.zip

Is it doing the same thing on your end?
-David A

The R script you attached seems to have permissions associated with it. I cannot open it.

J0vid commented

Sorry about that, here's the content of the script and I saved the data as an excel file so I can just paste it here:

#I discovered that my code produced landmark-file rownames that were in alphabetical order and out of their original order, breaking the correspondence between my landmarks and the covariates used in analysis.
#I don't think it was intentional, so I looked into the source of the reshuffling
#long story short, it's originating from bilat.symmetry

#rownames are not alphabetical
Pheno.Coord <- read.csv("path/to/data/geomorph_test_data.csv", header = T, row.names = 1)

true.names <- rownames(Pheno.Coord)

library(geomorph)

#let's run through several steps in the registration process and see where we get reordering and where it is preserved

array.test <- arrayspecs(Pheno.Coord, 68, 3)
all.equal.character(true.names, dimnames(array.test)[[3]])

gdf.test <- geomorph.data.frame(shape = array.test, ind = rownames(Pheno.Coord))
all.equal.character(true.names, dimnames(gdf.test$shape)[[3]])

gpa.test <- gpagen(gdf.test$shape)
all.equal.character(true.names, dimnames(gpa.test$coords)[[3]])

LM.namesX<- colnames(Pheno.Coord)[seq(2, dim(Pheno.Coord)[2], 3)]
LM.names <- substr(LM.namesX,1,nchar(LM.namesX)-2)
bilat.test <- bilat.symmetry(shape, object.sym = T, land.pairs=matrix(c(grep("R", LM.names[]),grep("L", LM.names[])),ncol=2), data = gdf.test, iter = 5)

all.equal.character(true.names, dimnames(bilat.test$symm.shape)[[3]])

two.d.test.gpa <- two.d.array(gpa.test$coords)
all.equal.character(true.names, rownames(two.d.test.gpa))

two.d.test.bilat <- two.d.array(bilat.test$symm.shape)
all.equal.character(true.names, rownames(two.d.test.bilat))

#the source of the rowname shuffling is the bilat.symmetry function, it just gets propagated to any function that uses the data after
#here's my session info

# sessionInfo()
# R version 3.4.4 (2018-03-15)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04 LTS
# 
# Matrix products: default
# BLAS: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRblas.so
# LAPACK: /opt/microsoft/ropen/3.4.4/lib64/R/lib/libRlapack.so
# 
# locale:
#   [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8   
# [6] LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
#   [1] stats4    grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] kangar00_1.1         org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0     Biobase_2.38.0       rmarkdown_1.10      
# [8] plotly_4.7.1         Morpho_2.6           shiny_1.1.0          geomorph_3.0.6       ape_5.1              rgl_0.99.16          Rgraphviz_2.22.0    
# [15] graph_1.56.0         BiocGenerics_0.24.0  KEGGgraph_1.38.0     ggplot2_2.2.1.9000   biomaRt_2.34.2       RevoUtils_10.0.9     RevoUtilsMath_10.0.1
# 
# loaded via a namespace (and not attached):
#   [1] subplex_1.5-4           nlme_3.1-131.1          bitops_1.0-6            bit64_0.9-7             doParallel_1.0.11       webshot_0.5.0          
# [7] progress_1.2.0          httr_1.3.1              rprojroot_1.3-2         tools_3.4.4             backports_1.1.2         R6_2.2.2               
# [13] DBI_1.0.0               lazyeval_0.2.1          colorspace_1.3-2        manipulateWidget_0.10.0 withr_2.1.2             tidyselect_0.2.4       
# [19] prettyunits_1.0.2       chron_2.3-52            bit_1.1-14              compiler_3.4.4          scales_0.5.0            mvtnorm_1.0-8          
# [25] stringr_1.3.1           digest_0.6.15           colorRamps_2.3          jpeg_0.1-8              pkgconfig_2.0.1         htmltools_0.3.6        
# [31] geiger_2.0.6            htmlwidgets_1.2         rlang_0.2.1             rstudioapi_0.7          RSQLite_2.1.1           Rvcg_0.17              
# [37] bindr_0.1.1             jsonlite_1.5            crosstalk_1.0.0         dplyr_0.7.5             RCurl_1.95-4.10         magrittr_1.5           
# [43] Matrix_1.2-12           Rcpp_0.12.17            munsell_0.5.0           abind_1.4-5             proto_1.0.0             sqldf_0.4-11           
# [49] yaml_2.1.19             stringi_1.2.3           CompQuadForm_1.4.3      MASS_7.3-49             plyr_1.8.4              blob_1.1.1             
# [55] promises_1.0.1          bigmemory.sri_0.1.3     crayon_1.3.4            miniUI_0.1.1.1          lattice_0.20-35         hms_0.4.2              
# [61] knitr_1.20              pillar_1.2.3            codetools_0.2-15        XML_3.98-1.11           glue_1.2.0              evaluate_0.10.1        
# [67] data.table_1.11.4       deSolve_1.21            httpuv_1.4.4.1          foreach_1.4.4           gtable_0.2.0            purrr_0.2.5            
# [73] tidyr_0.8.1             gsubfn_0.7              assertthat_0.2.0        mime_0.5                xtable_1.8-2            coda_0.19-1            
# [79] later_0.7.3             viridisLite_0.3.0       tibble_1.4.2            iterators_1.0.9         memoise_1.1.0           bindrcpp_0.2.2         
# [85] bigmemory_4.5.33

geomorph_test_data.xlsx

I can see where the reshuffling occurs in the script but it is a challenge to fix it because the example we use just happens to have the variable, ind, ordered the same way the fitted value from the linear model used to estimate the symmetric shape component. I noticed that you did not supply an "ind" variable. This might be a short-term solution, and even if it does not work, as desired, you could rearrange the result with

bilat.test$symm.shape <- bilat.test$symm.shape[,, ind]

Because I cannot open your data (also do not have permission) I cannot replicate your exact error. I'm a little apprehensive to start updating the code without a full appreciation for what causes the error and the best way to solve it, which would require an example like yours. (I don't want to trade once issue for another.) Furthermore, this function is begging for an overhaul, so the best solution (for us) might be to start from scratch, provided we can get it to a serviceable place for you.

Maybe Dean will see something I didn't.

Mike

David,

It was not as bad as it seemed. Many of the specimens were still in the same order but the term "ind" had been appended to the rownames (based on an internal model matrix). This is not a simple fix but something we will have to overhaul in the function, but I have a kluge for you, so that you can continue working Starting with one of your steps:

all.equal.character(new.names, dimnames(bilat.test$symm.shape)[[3]])
new.names <- paste("ind", true.names, sep="")
fix.order <- match(new.names, dimnames(bilat.test$symm.shape)[[3]])

bilat.test$symm.shape <- bilat.test$symm.shape[,,fix.order]
all.equal.character(new.names, dimnames(bilat.test$symm.shape)[[3]])

Which yielded this:

all.equal.character(new.names, dimnames(bilat.test$symm.shape)[[3]])
[1] TRUE

I'm sure the asymm.shape could be similarly affected.

Sorry for the hassle. As I mentioned before this particular function requires more overhaul than a simple tweak will fix. Unfortunately that is not easy to do at the spur of the moment. Your coding skills suggest that this kluge should not be an impediment in the interim and hopefully we can get the function updated soon.

Cheers!
Mike

J0vid commented

Yea, I can see what you're saying about most of the issues being from the addition of ind to the names. Most of it is fixed by substring the output.

all.equal.character(true.names, substr(dimnames(bilat.test$symm.shape)[[3]], start = 4, stop = nchar(dimnames(bilat.test$symm.shape)[[3]])))
[1] "279 string mismatches"

One thought that comes to mind is saving the original order of the row names explicitly early in the function and then when lm.fit or whatever you're using for modelling returns the residuals for the symmetric and asymmetric components, index them by the original row names to restore the order. Thanks for being so quick to respond.
-David

Dean fixed this isse