Hangs loading tree

Question

Hangs loading tree

Closed this issue 7 months ago · 12 comments

Had to add a few things to get this installed in Win11-64bit in R4.2. Main thing was had to add a few steps:

install.packages("remotes")
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ggtree")
remotes::install_github("gtonkinhill/panstripe")
Install seemed to complete fine.
Tried to then input existing data from roary. When it got to the step to load the tree data, started to load with a "1:" but then got an hourglass, nothing progressed and had

Steps taken:

library(panstripe)
library(ape)
library(patchwork)
set.seed(1234)
setwd("f:\DNAwork\Scohnii\roaryout")
phylo.file.name <- system.file("extdata", "accessory_binary_genes.fa.newick", package = "panstripe")
rtab.file.name <- system.file("extdata", "gene_presence_absence.Rtab", package = "panstripe")
pa <- read_rtab(rtab.file.name)

�[1mindexing�[0m �[34mgene_presence_absence.Rtab�[0m [===============================] �[32m2.15GB/s�[0m, eta: �[36m 0s�[0m

Rows: 2521 Columns: 51
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Gene
dbl (50): t36, t39, t37, t3, t44, t4, t20, t18, t25, t32, t50, t17, t10, t13, t14, t...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

tree <- read.tree(phylo.file.name)
1:

Waited 10 minutes and then stopped the current computation
inputfiles.zip

Answer 1 · 2023-01-22T14:13:55.000Z

In playing around some more and seeing if a different Newick tree would suffice I now see that when I invoke 'tree <- read.tree(phylo.file.name) the R system presents a '1:' as though it wants an input and if I hit enter I get:
Warning message:
In read.tree(phylo.file.name) : empty character string.
There is definitely information in the Newick file, so it is not clear is the empty string for the variable phylo.file.name or for the file specified in ' phylo.file.name <- system.file("extdata", "Poppunk_viz_core_NJ.nwk", package = "panstripe")'? I suspect the error must relate to the contents of the file because the preceding command 'pa <- read_rtab(rtab.file.name)' seemed to work fine.

Answer 2 · 2023-01-22T16:48:57.000Z

more experimentation: Whereas the command 'tree <- read.tree(phylo.file.name)' does not complete, the command 'mytree <- read.tree(file= "accessory_binary_genes.fa.newick")' does complete but then when I try to run the fit command I get this:

mytree <- read.tree(file = "accessory_binary_genes.fa.newick")
fit <- panstripe(pa, mytree)
Error in panstripe(pa, mytree) :
number of taxa in tree and p/a matrix need to be the same!
If I run length(mytree) the return is 5.

I then got the package TreeTools and loaded it which allowed me to do Consensus(mytree) and find out I have 106 tips and 104 nodes, but when I do View(pa) there are only 50 rows and 108 columns and the row/clm labels don't match the counts in the Rtab file I loaded (16000x80)

Answer 3 · 2023-01-24T02:00:23.000Z

Hi,

I have not been able to reproduce the issue but don't have access to a windows machine so can not check there.
The read.tree command is from the ape package which is very stable and should run on windows. Are you able to run the following checks?

library(ape)
library(panstripe)

tree <- read.tree("~/Downloads/inputfiles/accessory_binary_genes.fa.newick")
pa <- read_rtab("~/Downloads/inputfiles/gene_presence_absence.Rtab")

dim(pa)
length(tree$tip.label)

all(tree$tip.label %in% rownames(pa))

This should return 'TRUE'

Answer 4 · 2023-01-24T13:54:50.000Z

Thank you very much for these additional tips/tricks/validation steps. I think this has allowed me to spot the problem, just not certain why it is a problem. This is what I did: *******R code start************

setwd("F:\\DNAwork\\Scohnii\\roaryout") tree <- read.tree("accessory_binary_genes.fa.newick") pa <- read_rtab("gene_presence_absence.Rtab")

[1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m Rows: 15999 Columns: 107 ── Column specification ───────────────────────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Gene dbl (106): 073AN, 1637, 1638, 1649, 1658, 1659, 1661, 1663, 1664, 1665, 1666, 1667, 1697, 170... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dim(pa)

[1] 106 15999

length(tree$tip.label)

[1] 106

all(tree$tip.label %in% rownames(pa))

[1] TRUE **********R code stop********* So it appears that the data in pa and tree now match when they did NOT before. So I went back and followed the steps from the instructions. **********Rcode starts********

rtab.file.name <- system.file("extdata", "gene_presence_absence.Rtab", package = "panstripe") pa <- read_rtab(rtab.file.name)

[1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m Rows: 2521 Columns: 51 ── Column specification ───────────────────────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Gene dbl (50): t36, t39, t37, t3, t44, t4, t20, t18, t25, t32, t50, t17, t10, t13, t14, t15, t29, ... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dim(pa)

[1] 50 2521

*********R code stops********* So what we see is that now pa does NOT match tree, because only 50 columns were read instead of 106. I am only a novice at R but there is something strange going on by using an intermediate variable for the rtab file name. So then I tried the same recommended procedure for the tree file using a variable. *********R code starts*********

phylo.file.name <- system.file("extdata", "accessory_binary_genes.fa.newick", package = "panstripe") tree <- read.tree(phylo.file.name)

1: Warning message: In read.tree(phylo.file.name) : empty character string.

*********R code stops******** So the line 1: appeared with the cursor tabbed in some spaces and the mouse pointer turned into an hourglass. I had to hit enter and then it produced the warning. I then set tree back to the command that reads correctly and ran fit, but it doesn’t look good for me. *********R code starts*******

tree <- read.tree("accessory_binary_genes.fa.newick") pa <- read_rtab("gene_presence_absence.Rtab")

[1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m Rows: 15999 Columns: 107 ── Column specification ───────────────────────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Gene dbl (106): 073AN, 1637, 1638, 1649, 1658, 1659, 1661, 1663, 1664, 1665, 1666, 1667, 1697, 170... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

all(tree$tip.label %in% rownames(pa))

[1] TRUE

fit <- panstripe(pa, tree)

There were 50 or more warnings (use warnings() to see the first 50)

warnings()

Warning messages: 1: glm.fit: algorithm did not converge 2: glm.fit: algorithm did not converge 3: glm.fit: algorithm did not converge 4: glm.fit: algorithm did not converge 5: glm.fit: algorithm did not converge 6: glm.fit: algorithm did not converge 7: glm.fit: algorithm did not converge 8: glm.fit: algorithm did not converge 9: glm.fit: algorithm did not converge 10: glm.fit: algorithm did not converge 11: glm.fit: algorithm did not converge 12: glm.fit: algorithm did not converge 13: glm.fit: algorithm did not converge 14: glm.fit: algorithm did not converge 15: glm.fit: algorithm did not converge 16: glm.fit: algorithm did not converge 17: glm.fit: algorithm did not converge 18: glm.fit: algorithm did not converge 19: glm.fit: algorithm did not converge 20: glm.fit: algorithm did not converge 21: glm.fit: algorithm did not converge 22: glm.fit: algorithm did not converge 23: glm.fit: algorithm did not converge 24: glm.fit: algorithm did not converge 25: glm.fit: algorithm did not converge 26: glm.fit: algorithm did not converge 27: glm.fit: algorithm did not converge 28: glm.fit: algorithm did not converge 29: glm.fit: algorithm did not converge 30: glm.fit: algorithm did not converge 31: glm.fit: algorithm did not converge 32: glm.fit: algorithm did not converge 33: glm.fit: algorithm did not converge 34: glm.fit: algorithm did not converge 35: glm.fit: algorithm did not converge 36: glm.fit: algorithm did not converge 37: glm.fit: algorithm did not converge 38: glm.fit: algorithm did not converge 39: glm.fit: algorithm did not converge 40: glm.fit: algorithm did not converge 41: glm.fit: algorithm did not converge 42: glm.fit: algorithm did not converge 43: glm.fit: algorithm did not converge 44: glm.fit: algorithm did not converge 45: glm.fit: algorithm did not converge 46: glm.fit: algorithm did not converge 47: glm.fit: algorithm did not converge 48: glm.fit: algorithm did not converge 49: glm.fit: algorithm did not converge 50: glm.fit: algorithm did not converge

********R code stops*********** From: Gerry Tonkin-Hill ***@***.***> Sent: Monday, January 23, 2023 8:01 PM To: gtonkinhill/panstripe ***@***.***> Cc: Douglas Duane Rhoads ***@***.***>; Author ***@***.***> Subject: Re: [gtonkinhill/panstripe] Hangs loading tree (Issue #4) Hi, I have not been able to reproduce the issue but don't have access to a windows machine so can not check there. The read.tree command is from the ape package which is very stable and should run on windows. Are you able to run the following checks? library(ape) library(panstripe) tree <- read.tree("~/Downloads/inputfiles/accessory_binary_genes.fa.newick") pa <- read_rtab("~/Downloads/inputfiles/gene_presence_absence.Rtab") dim(pa) length(tree$tip.label) all(tree$tip.label %in% rownames(pa)) This should return 'TRUE' — Reply to this email directly, view it on GitHub<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgtonkinhill%2Fpanstripe%2Fissues%2F4%23issuecomment-1401303055&data=05%7C01%7Cdrhoads%40uark.edu%7Cfc9de1b67b2e46e70ac808dafdaec85b%7C79c742c4e61c4fa5be89a3cb566a80d1%7C0%7C0%7C638101224395838016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=4%2BnVZiEIR5p%2FEjoO72j%2Bcwh6M%2F7qdPJzAEHMs5cdxfY%3D&reserved=0>, or unsubscribe<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAIX22VVTBVNJY73JDCLCAHTWT4ZUFANCNFSM6AAAAAAUCR76JE&data=05%7C01%7Cdrhoads%40uark.edu%7Cfc9de1b67b2e46e70ac808dafdaec85b%7C79c742c4e61c4fa5be89a3cb566a80d1%7C0%7C0%7C638101224395838016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=TbHKQfNNkVgDVhG7LM3CVZmNtf%2Bx0evNPvOaCVYr4FQ%3D&reserved=0>. You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>

Answer 5 · 2023-01-24T15:01:28.000Z

Thank you very much for these additional tips/tricks/validation steps. I think this has allowed me to spot the problem, just not certain why it is a problem. This is what I did:
R code start*****

setwd("F:\DNAwork\Scohnii\roaryout")
tree <- read.tree("accessory_binary_genes.fa.newick")
pa <- read_rtab("gene_presence_absence.Rtab")

[1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m

Rows: 15999 Columns: 107
── Column specification ─────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Gene
dbl (106): 073AN, 1637, 1638, 1649, 1658, 1659, 1661, 1663, 1664, 1665, 1666, 1667, 1697, 170...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

dim(pa)
[1] 106 15999
length(tree$tip.label)
[1] 106
all(tree$tip.label %in% rownames(pa))
[1] TRUE

*R code stop
So it appears that the data in pa and tree now match when they did NOT before. So I went back and followed the steps from the instructions.

**Rcode starts

rtab.file.name <- system.file("extdata", "gene_presence_absence.Rtab", package = "panstripe")
pa <- read_rtab(rtab.file.name)

[1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m

Rows: 2521 Columns: 51
── Column specification ─────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Gene
dbl (50): t36, t39, t37, t3, t44, t4, t20, t18, t25, t32, t50, t17, t10, t13, t14, t15, t29, ...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

dim(pa)
[1] 50 2521

R code stops
So what we see is that now pa does NOT match tree, because only 50 columns were read instead of 106. I am only a novice at R but there is something strange going on by using an intermediate variable for the rtab file name. So then I tried the same recommended procedure for the tree file using a variable.
R code starts

phylo.file.name <- system.file("extdata", "accessory_binary_genes.fa.newick", package = "panstripe")
tree <- read.tree(phylo.file.name)
1:
Warning message:
In read.tree(phylo.file.name) : empty character string.

*R code stops
So the line 1: appeared with the cursor tabbed in some spaces and the mouse pointer turned into an hourglass. I had to hit enter and then it produced the warning. I then set tree back to the command that reads correctly and ran fit, but it doesn’t look good for me.
**R code starts

tree <- read.tree("accessory_binary_genes.fa.newick")

pa <- read_rtab("gene_presence_absence.Rtab")

[1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m

Rows: 15999 Columns: 107
── Column specification ─────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Gene
dbl (106): 073AN, 1637, 1638, 1649, 1658, 1659, 1661, 1663, 1664, 1665, 1666, 1667, 1697, 170...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

all(tree$tip.label %in% rownames(pa))
[1] TRUE
fit <- panstripe(pa, tree)
There were 50 or more warnings (use warnings() to see the first 50)
warnings()
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: algorithm did not converge
3: glm.fit: algorithm did not converge
4: glm.fit: algorithm did not converge
5: glm.fit: algorithm did not converge
6: glm.fit: algorithm did not converge
7: glm.fit: algorithm did not converge
8: glm.fit: algorithm did not converge
9: glm.fit: algorithm did not converge
10: glm.fit: algorithm did not converge
11: glm.fit: algorithm did not converge
12: glm.fit: algorithm did not converge
13: glm.fit: algorithm did not converge
14: glm.fit: algorithm did not converge
15: glm.fit: algorithm did not converge
16: glm.fit: algorithm did not converge
17: glm.fit: algorithm did not converge
18: glm.fit: algorithm did not converge
19: glm.fit: algorithm did not converge
20: glm.fit: algorithm did not converge
21: glm.fit: algorithm did not converge
22: glm.fit: algorithm did not converge
23: glm.fit: algorithm did not converge
24: glm.fit: algorithm did not converge
25: glm.fit: algorithm did not converge
26: glm.fit: algorithm did not converge
27: glm.fit: algorithm did not converge
28: glm.fit: algorithm did not converge
29: glm.fit: algorithm did not converge
30: glm.fit: algorithm did not converge
31: glm.fit: algorithm did not converge
32: glm.fit: algorithm did not converge
33: glm.fit: algorithm did not converge
34: glm.fit: algorithm did not converge
35: glm.fit: algorithm did not converge
36: glm.fit: algorithm did not converge
37: glm.fit: algorithm did not converge
38: glm.fit: algorithm did not converge
39: glm.fit: algorithm did not converge
40: glm.fit: algorithm did not converge
41: glm.fit: algorithm did not converge
42: glm.fit: algorithm did not converge
43: glm.fit: algorithm did not converge
44: glm.fit: algorithm did not converge
45: glm.fit: algorithm did not converge
46: glm.fit: algorithm did not converge
47: glm.fit: algorithm did not converge
48: glm.fit: algorithm did not converge
49: glm.fit: algorithm did not converge
50: glm.fit: algorithm did not converge

R code stops***

Answer 6 · 2023-02-17T01:21:12.000Z

Apologies, I missed your reply.

It looks like you are running the test data rather than your dataset in this example.

This line is only needed in the example and simply finds that path of the example tree on your system

phylo.file.name <- system.file("extdata", "accessory_binary_genes.fa.newick", package = "panstripe")

Instead, you should replace phylo.file.name with the path to you phylogeny on your computer.

Answer 7 · 2023-02-17T13:48:01.000Z

Unfortunately, I had already determined that command to be a problem and in what I posted had proceeded with what got me past that point: **R code starts tree <- read.tree("accessory_binary_genes.fa.newick") pa <- read_rtab("gene_presence_absence.Rtab") [1mindexing[0m [34mgene_presence_absence.Rtab[0m [========================================] [32m2.15GB/s[0m, eta: [36m 0s[0m Rows: 15999 Columns: 107 ── Column specification ───────────────────────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Gene dbl (106): 073AN, 1637, 1638, 1649, 1658, 1659, 1661, 1663, 1664, 1665, 1666, 1667, 1697, 170... ℹ Use spec() to retrieve the full column specification for this data. ℹ Specify the column types or set show_col_types = FALSE to quiet this message. all(tree$tip.label %in% rownames(pa)) [1] TRUE fit <- panstripe(pa, tree) There were 50 or more warnings (use warnings() to see the first 50) warnings() Warning messages: From: Gerry Tonkin-Hill ***@***.***> Sent: Thursday, February 16, 2023 7:21 PM To: gtonkinhill/panstripe ***@***.***> Cc: Douglas Duane Rhoads ***@***.***>; Author ***@***.***> Subject: Re: [gtonkinhill/panstripe] Hangs loading tree (Issue #4) Apologies, I missed your reply. It looks like you are running the test data rather than your dataset in this example. This line is only needed in the example and simply finds that path of the example tree on your system phylo.file.name <- system.file("extdata", "accessory_binary_genes.fa.newick", package = "panstripe") Instead, you should replace phylo.file.name with the path to you phylogeny on your computer. — Reply to this email directly, view it on GitHub<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgtonkinhill%2Fpanstripe%2Fissues%2F4%23issuecomment-1433962106&data=05%7C01%7Cdrhoads%40uark.edu%7Ca639e6a8034949843d4408db10854866%7C79c742c4e61c4fa5be89a3cb566a80d1%7C0%7C0%7C638121936851945578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Ha%2FsAPf6Q52nkgBoMnsC9oPSX7XeUGVJtljbcuR9SOw%3D&reserved=0>, or unsubscribe<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAIX22VVACUUX2FOYN3VDIATWX3HBHANCNFSM6AAAAAAUCR76JE&data=05%7C01%7Cdrhoads%40uark.edu%7Ca639e6a8034949843d4408db10854866%7C79c742c4e61c4fa5be89a3cb566a80d1%7C0%7C0%7C638121936851945578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=xQ5OVag8gXLxIE5%2Bv9sp8CulkE1VxleojQp4LkFTdAM%3D&reserved=0>. You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>

Answer 8 · 2023-02-18T06:17:25.000Z

Hi, sorry I missed the later section of your output.
It looks like the algorithm is having trouble converging. As outlined in the documentation, this can sometimes be resolved by using a different family in the regression.

You could try running

panstripe(pa, tree, family='gaussian')

It might also be caused by issues with the phylogeny including very short branch lengths.

Answer 9 · 2023-02-18T13:59:51.000Z

This is a somewhat unusual species that I am running as it shows three clusters in a network view of the phylogenomic tree constructed using poppunk, so there are some longer branch lengths. I have just wanted to try panstripe to see what it can show me as we have several different species that we are working on for host preference and pathogen/commensal. I have scoary output for the genes that distinguish the three clusters for the species I have been trying. Is it possible there is just too much diversity? May be a while before I can get back to seeing what I can do with panstripe. Thanks for the reply and will let you know when I try changing the family setting From: Gerry Tonkin-Hill ***@***.***> Sent: Saturday, February 18, 2023 12:18 AM To: gtonkinhill/panstripe ***@***.***> Cc: Douglas Duane Rhoads ***@***.***>; Author ***@***.***> Subject: Re: [gtonkinhill/panstripe] Hangs loading tree (Issue #4) Hi, sorry I missed the later section of your output. It looks like the algorithm is having trouble converging. As outlined in the documentation, this can sometimes be resolved by using a different family in the regression. You could try running panstripe(pa, tree, family='gaussian') It might also be caused by issues with the phylogeny including very short branch lengths. - Reply to this email directly, view it on GitHub<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgtonkinhill%2Fpanstripe%2Fissues%2F4%23issuecomment-1435496995&data=05%7C01%7Cdrhoads%40uark.edu%7C2f901a7533d144341a7808db1177d430%7C79c742c4e61c4fa5be89a3cb566a80d1%7C0%7C0%7C638122978584425258%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=5gWQA6jwHHIyIz9vcikFurlAV3aSIs41Nv2zKwHmZlY%3D&reserved=0>, or unsubscribe<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAIX22VTI5OQRDUEOBXF3I4TWYBSP7ANCNFSM6AAAAAAUCR76JE&data=05%7C01%7Cdrhoads%40uark.edu%7C2f901a7533d144341a7808db1177d430%7C79c742c4e61c4fa5be89a3cb566a80d1%7C0%7C0%7C638122978584893970%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=D5EChCH3uR86V9e56%2FJ0Rw%2BCoFbE3SsiaMrkCqMJSUM%3D&reserved=0>. You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>

Answer 10 · 2023-02-18T23:31:34.000Z

Hi,

First of all, thanks for the tool. The paper was eye opening about some concepts I had about pangenomes.

I just wanted to add that I had the same problem when trying to open my phylogeny file. I solved it by directly applying the path of my file. So that seemed to solve that issue.

However, I now have a problem about the amount of taxa my tree had compared with my presence absence matrix. I was wondering how this should be solved? I know for a fact that my tree has very short branches since there's no much diversity in the species I'm looking at.

Thanks for any solution you could give me.

P.S: I didn't knew if I had to open a new issue, but I decided to post it here in case someone else was wondering the same question as me.

Answer 11 · 2023-02-20T22:43:04.000Z

Hi,

It is usually better to open a new issue for every question as it's easy for me to miss comments on older issues.

This might be something to do with your tree construction algorithm. Some of them will filter identical sequences. I would suggest checking that your tree has the tip names you expect by running

tree <- read.tree("path/to/your/tree.newick")
tree$tip.labels

Answer 12 · 2023-02-20T22:45:07.000Z

@drhoads yes sometimes the long branch lengths can cause issues. Panstripe includes an option to automatically filter branches below a specified depth that can be used to check this.

panstripe(pa, tree, min_depth=X)