liamrevell/phytools

read.newick() occasionally crushes R

kfuku52 opened this issue · 4 comments

I was working on 10k trees and got core dumps in around 200 trees in the step of read.newick(). ape's read.tree() worked well in the identical trees. The R version I tested was 4.1.1. Here is a reproducible example, and I would appreciate it if you could give me any suggestions to solve the problem.

Reproducible example

library(ape)
library(phytools)
cat(paste('ape version:', packageVersion('ape'), '\n'))
cat(paste('phytools version:', packageVersion('phytools'), '\n'))

nwk_txt = '(tip1:193.0061,(tip2:153.7637944,(((tip3:106.5819,((tip4:66.0381,tip5:66.0381)n52:32.16806667,tip6:98.20616667)n55:8.375733333)n58:16.70018,tip7:123.28208)n61:8.295902353,(tip8:124.5550833,(((tip9:48.0139,(tip10:41.575,tip11:41.575)n19:6.4389)n22:24.5671,tip12:72.581)n25:28.30439091,((((tip13:26.121,tip14:26.121)n28:26.8957,tip15:53.0167)n31:23.975175,tip16:76.991875)n34:5.651567857,(tip17:51.4917,(tip18:38.2097,tip19:38.2097)n37:13.282)n40:31.15174286)n43:18.24194805)n46:23.66969242)n49:7.02289902)n64:22.18581209)n67:39.24230556)n137:0;'
read.tree(text=nwk_txt)
print('Finished read.tree()')
read.newick(text=nwk_txt)
print('Finished read.newick()')

STDOUT

ape version: 5.5 
phytools version: 1.2.0 

Phylogenetic tree with 19 tips and 18 internal nodes.

Tip labels:
  tip1, tip2, tip3, tip4, tip5, tip6, ...
Node labels:
  n137, n67, n64, n61, n58, n55, ...

Rooted; includes branch lengths.
[1] "Finished read.tree()"

# ...and produce a core dump after a while

Hi @kfuku52 ,
your tree has a root edge read.newick seems not to handle well.
Something like nwk_txt <- gsub(":0;", ";", nwk_txt) might work. But why not using read.tree?
Regards,
Klaus

Thank you for your response. I switched to read.tree() but just wanted to report it here just in case if this is not intended behavior.

Thanks @kfuku52 for bringing this to my attention & @KlausVigo for your comments. read.newick is now basically redundant & I should probably synonymize it with read.tree. As I comment in the help file: "The function read.newick is almost completely redundant with read.tree. At the time of development, it was more 'robust' than read.tree in that it didn't fail if the tree contained so-called 'singles' (nodes with only one descendant); however, read.tree can now handle singleton nodes without difficulty."

OTOH, I kind of like that it exists as a totally independently coded Newick parser within R.

In fact, the reason I decided to use read.newick() in a script several years ago was because it was more robust than read.tree() at that time. I recently tried to reuse that script and encountered the above problem. I agree that the current implementation shouldn't be completely obsolete. If you decide to synonymize them, it would be great if the current implementation could be kept available, perhaps as a legacy mode like read.newick(legacy=TRUE).