USCbiostats/aphylo

Use `ape::as.phylo` methods

gvegayon opened this issue · 4 comments

The phylo class of the ape package seems to be a very popular storage format for phylogenetic trees, so it is worthwhile either to store this package's objects as phylo objects or to create conversion functions that take advantage of the ape package.

I found the formal definition of phylo objects (not trivial at all) in the following document:

Definition of Formats for Coding Phylogenetic Trees in R, Emmanuel Paradis, October 24, 2012

The class "phylo" is used to code “acyclical” phylogenetic trees. These trees have no reticulations, and all their internal nodes are of degree 3 or more, except the root (in the case of rooted trees) which is of degree 2 or more. An object of class "phylo" is a list with the following mandatory elements:

  1. A numeric matrix named edge with two columns and as many rows as there are branches in the tree;

  2. A character vector of length n named tip.label with the labels of the tips;

  3. An integer value named Nnode giving the number of (internal) nodes;

  4. An attribute class equal to "phylo".

In the matrix edge, each branch is coded by the nodes it connects: tips are coded 1, ..., n, and internal nodes are coded n+1, ..., n+m (n+1 is the root). Both series are numbered without gaps.
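To make the coding concrete, here is a minimal sketch that builds a "phylo" object by hand in base R for the rooted tree ((A,B),C). The tree and its labels are made up for illustration; only the element names (`edge`, `tip.label`, `Nnode`) and the class come from the definition quoted here.

```r
# Rooted tree ((A,B),C): n = 3 tips, m = 2 internal nodes.
# Tips are coded 1..3, internal nodes 4..5, and 4 (= n + 1) is the root.
n <- 3L

edge <- matrix(
  c(4L, 5L,   # root -> internal node 5
    5L, 1L,   # node 5 -> tip A
    5L, 2L,   # node 5 -> tip B
    4L, 3L),  # root -> tip C
  ncol = 2, byrow = TRUE
)

tree <- structure(
  list(
    edge      = edge,             # one row per branch
    tip.label = c("A", "B", "C"), # length n
    Nnode     = 2L                # number of internal nodes
  ),
  class = "phylo"
)

# Inspect the underlying list without relying on any print method.
str(unclass(tree))
```

If ape is installed, an object built this way should be usable with its functions (for example `plot(tree)` after `library(ape)`), since it carries the mandatory elements listed above.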

The matrix edge has the following properties:

  • The first column has only values greater than n (thus, values less than or equal to n appear only in the second column).

  • All nodes appear in the first column at least twice.

  • The number of occurrences of a node in the first column is related to the nature of the node: twice if it is dichotomous (i.e., of degree 3), three times if it is trichotomous (degree 4), and so on.

  • All elements, except the root n + 1, appear once in the second column.

This representation is used for rooted and unrooted trees. For the latter, the position of the root is arbitrary.
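The edge-matrix properties listed in the definition can be checked mechanically. The following is only a sketch in base R; the function name `check_edge` and the names of the individual checks are hypothetical and not part of ape or aphylo.

```r
# Check the edge-matrix properties from the definition.
# `edge` is a two-column integer matrix, `n` the number of tips.
check_edge <- function(edge, n) {
  parent <- edge[, 1]
  child  <- edge[, 2]
  root   <- n + 1L
  nodes  <- sort(unique(c(parent, child)))
  top    <- max(nodes)

  c(
    # Tips and internal nodes are numbered 1..n+m without gaps.
    numbered_without_gaps = identical(as.integer(nodes), seq_len(top)),
    # The first column only holds values greater than n.
    parents_are_internal = all(parent > n),
    # Every internal node appears at least twice in the first column.
    internal_nodes_at_least_twice =
      all(tabulate(parent, nbins = top)[-seq_len(n)] >= 2L),
    # The root never appears in the second column ...
    root_never_a_child = !(root %in% child),
    # ... and every other node appears there exactly once.
    non_root_once_as_child =
      all(tabulate(child, nbins = top)[-root] == 1L)
  )
}

# A valid tree ((A,B),C) with n = 3.
good <- matrix(c(4L, 5L, 5L, 1L, 5L, 2L, 4L, 3L), ncol = 2, byrow = TRUE)

# An invalid tree with n = 2: internal node 4 has a single descendant.
bad <- matrix(c(3L, 4L, 4L, 1L, 3L, 2L), ncol = 2, byrow = TRUE)

print(check_edge(good, 3L))
print(check_edge(bad, 2L))
```

ape also ships its own validator, `checkValidPhylo()`, which is probably the better choice when ape is available; the sketch above is only meant to make the rules explicit.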

I see this was closed, but was this implemented?

Reprex

```r
URL <- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/hg38.100way.commonNames.nh"
tree_path <- file.path("~/Desktop", basename(URL))
download.file(URL,
              destfile = tree_path,
              method = "wget")

# Remove ";" at the end, which causes errors
l <- readLines(tree_path)
l2 <- gsub(";", "", l, fixed = TRUE)
writeLines(l2, tree_path)

# tr <- ape::read.tree(file = tree_path)
tr <- aphylo::read_nhx(tree_path)
ape::as.phylo(tr)
```

```
Error in UseMethod("as.phylo") : 
  no applicable method for 'as.phylo' applied to an object of class "list"
```
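The error above says that S3 dispatch found no `as.phylo` method for a plain list, which is what `read_nhx` returned here. One possible workaround is sketched below under a loud assumption: that the returned list stores the tree itself in an element named `tree` that already has class "phylo". That element name is an assumption, not something confirmed in this thread, so check `names(tr)` first. The helper `extract_phylo` is hypothetical, and a hand-built stand-in list is used so the sketch runs without downloading anything.

```r
# Hypothetical helper: pull a "phylo" object out of a reader's return value.
# ASSUMPTION: the list has an element called `tree` of class "phylo".
extract_phylo <- function(x) {
  if (inherits(x, "phylo")) {
    return(x)
  }
  if (is.list(x) && inherits(x[["tree"]], "phylo")) {
    return(x[["tree"]])
  }
  stop("no 'phylo' object found; available elements: ",
       paste(names(x), collapse = ", "))
}

# Stand-in for the reader's output: a two-tip tree plus an empty annotation list.
stand_in <- list(
  tree = structure(
    list(
      edge      = matrix(c(3L, 1L, 3L, 2L), ncol = 2, byrow = TRUE),
      tip.label = c("a", "b"),
      Nnode     = 1L
    ),
    class = "phylo"
  ),
  nhx = list()
)

phy <- extract_phylo(stand_in)
class(phy)
```

If the assumption holds for the real object, `extract_phylo(tr)` would give something that ape functions accept directly, without going through `ape::as.phylo`.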

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] aphylo_0.2-1    phytools_1.0-1  maps_3.4.0      caper_1.0.1     mvtnorm_1.1-3  
 [6] MASS_7.3-55     ape_5.6-1       rotl_3.0.12     orthogene_1.1.2 dplyr_1.0.8    
[11] ggplot2_3.3.5   ggtree_3.2.1   

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2          ggsignif_0.6.3            ellipsis_0.3.2           
  [4] rprojroot_2.0.2           fs_1.5.2                  aplot_0.1.2              
  [7] rstudioapi_0.13           ggpubr_0.4.0              remotes_2.4.2            
 [10] fansi_1.0.2               xml2_1.3.3                codetools_0.2-18         
 [13] mnormt_2.0.2              cachem_1.0.6              knitr_1.37               
 [16] pkgload_1.2.4             jsonlite_1.7.3            broom_0.7.12             
 [19] BiocManager_1.30.16       rentrez_1.2.3             compiler_4.1.0           
 [22] httr_1.4.2                backports_1.4.1           assertthat_0.2.1         
 [25] Matrix_1.4-0              fastmap_1.1.0             lazyeval_0.2.2           
 [28] cli_3.2.0                 htmltools_0.5.2           prettyunits_1.1.1        
 [31] tools_4.1.0               igraph_1.2.11             coda_0.19-4              
 [34] gtable_0.3.0              glue_1.6.1                GenomeInfoDbData_1.2.7   
 [37] clusterGeneration_1.3.7   fastmatch_1.1-3           Rcpp_1.0.8               
 [40] carData_3.0-5             vctrs_0.3.8               babelgene_21.4           
 [43] nlme_3.1-155              xfun_0.29                 ps_1.6.0                 
 [46] brio_1.1.3                testthat_3.1.2            ggimage_0.3.0            
 [49] lifecycle_1.0.1           phangorn_2.8.1            devtools_2.4.3           
 [52] rstatix_0.7.0             XML_3.99-0.8              scales_1.1.1             
 [55] hms_1.1.1                 parallel_4.1.0            expm_0.999-6             
 [58] gprofiler2_0.2.1          curl_4.3.2                yaml_2.2.2               
 [61] memoise_2.0.1             ggfun_0.0.5               yulab.utils_0.0.4        
 [64] desc_1.4.0                plotrix_3.8-2             tidytree_0.3.7           
 [67] fmcmc_0.5-1               pkgbuild_1.3.1            rlang_1.0.1              
 [70] pkgconfig_2.0.3           rncl_0.8.4                evaluate_0.14            
 [73] lattice_0.20-45           purrr_0.3.4               treeio_1.18.1            
 [76] patchwork_1.1.1           htmlwidgets_1.5.4         tidyselect_1.1.1         
 [79] processx_3.5.2            magrittr_2.0.2            R6_2.5.1                 
 [82] magick_2.7.3              generics_0.1.2            combinat_0.0-8           
 [85] DBI_1.1.2                 pillar_1.7.0              withr_2.4.3              
 [88] scatterplot3d_0.3-41      abind_1.4-5               tibble_3.1.6             
 [91] homologene_1.4.68.19.3.27 crayon_1.5.0              car_3.0-12               
 [94] utf8_1.2.2                tmvnsim_1.0-2             plotly_4.10.0            
 [97] rmarkdown_2.11            usethis_2.1.5             progress_1.2.2           
[100] grid_4.1.0                data.table_1.14.2         callr_3.7.0              
[103] digest_0.6.29             tidyr_1.2.0               numDeriv_2016.8-1.1      
[106] gridGraphics_0.5-1        munsell_0.5.0             viridisLite_0.4.0        
[109] ggplotify_0.1.0           sessioninfo_1.2.2         quadprog_1.5-8         

The methods `plot`, `Ntip`, `Nnode`, and `Nedge` are implemented. As for reading the file, it seems to be a problem with ape, not with aphylo: I tried your code and got nothing back, whether reading directly from the file or from the text object in R.