ERROR: Number of populations is not the same in the input matrices!
Hi!
I got a strange error, shown in the title. I produced the f2 summary statistics by following Step 1 of the instructions, which ran normally with no errors reported, but Step 2, running the actual OrientAGraph analysis, reported this message.
What is the problem, and how can I address it?
Thank you!
Chen Yan
Thank you so much for your interest in using OrientAGraph, Chen Yan. I am happy to help you figure out the issue. Could you share with us the exact commands you are using as well as the input data file for the second command (i.e., the two matrix files corresponding to the f2 statistics)? You could rename population names or share via email (ekmolloy@umd.edu) if you don't want to post publicly. Thanks! -Erin
Hi, Prof. Molloy
Thank you for your help.
I have uploaded the two input files for OrientAGraph: the two matrix files that were generated from the frequency file for TreeMix.
This step was run with the command:
orientagraph -i treemix.frq.gz -root PopPD -f2 -freq2stat -k 1000 -seed 123 -o orientagraph.k1000.seed123
orientagraph.k1000.seed123.cov.txt
orientagraph.k1000.seed123.covse.txt
And the command for running the main OrientAGraph analysis is:
orientagraph -i orientagraph.k1000.seed123.cov.txt -givenmat orientagraph.k1000.seed123.covse.txt -f2 -mlno -root PopPD -k 1000 -noss -o orientagraph_m0 -tf Auto.tree.txt -score 1
I provided a guide topology file (Auto.tree.txt) that I thought was suitable for my analysis.
Auto.tree.txt
It is a great honor to be helped by you and your team. Thanks very much!
Best Regards,
Chen Yan
Dear Chen Yan,
Apologies for the delayed reply; we were on a U.S. holiday around the time of this message.
I have taken a look at your command and data.
To get the command to work, I made the following changes:
- Removed option -noss because it's not compatible with -givenmat
- Removed option -k 1000 because it's not compatible with -givenmat (you already used this option to create the matrix)
- Revised -score 1 (the options are -score asis, -score rfit, -score mlbt, -score mlno)
- Removed option -mlno because you aren't adding any migration edges (-m).
This gives the command
../src/orientagraph-v1.2-linux64 \
-i orientagraph.k1000.seed123.cov.txt \
-givenmat orientagraph.k1000.seed123.covse.txt \
-f2 \
-root PopPD \
-tf Auto.tree.txt \
-score rfit \
-o output_wst_mig0
which computes the likelihood of the starting tree (-154.44436). This is a really good score.
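For anyone who wants to apply the same edits in a script, here is a minimal Python sketch of the changes listed above, applied to the command as originally posted. The function name `revise` is hypothetical, and the sketch only knows about this one command: it assumes -mlno appears without a value (as it did in the original post) and picks rfit as the replacement for the invalid -score 1. It keeps the original binary name and output prefix, so the result differs from the command above in those two places.

```python
import shlex

# The second-step command exactly as first posted in this thread.
original = shlex.split(
    "orientagraph -i orientagraph.k1000.seed123.cov.txt "
    "-givenmat orientagraph.k1000.seed123.covse.txt -f2 -mlno -root PopPD "
    "-k 1000 -noss -o orientagraph_m0 -tf Auto.tree.txt -score 1"
)


def revise(tokens):
    """Apply the four edits described in the reply to a list of tokens.

    Hypothetical helper: it is written for this specific command and is
    not a general OrientAGraph option parser.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-noss":
            i += 1                      # dropped: not compatible with -givenmat
        elif tok == "-k":
            i += 2                      # dropped together with its value
        elif tok == "-mlno":
            i += 1                      # dropped: no migration edges (-m) requested
        elif tok == "-score":
            out += ["-score", "rfit"]   # "1" is not one of the listed values
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


revised = revise(original)
print(" ".join(revised))
```

The printed line can be compared by eye with the working command above to confirm that only the incompatible options were removed.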
I then tried adding in one migration edge and performing maximum likelihood network orientation (mlno), with this command:
../src/orientagraph-v1.2-linux64 \
-i orientagraph.k1000.seed123.cov.txt \
-givenmat orientagraph.k1000.seed123.covse.txt \
-f2 \
-root PopPD \
-tf Auto.tree.txt \
-m 1 \
-mlno 1 \
-o output_wst_mig1
It gave the message, "Failed to add migration edge so won't continue trying". This means it couldn't find a way to add one, but since some of this relates to the TreeMix code, it's not 100% clear to me what the issue was.
I then tried to build a tree with this command:
../src/orientagraph-v1.2-linux64 \
-i orientagraph.k1000.seed123.cov.txt \
-givenmat orientagraph.k1000.seed123.covse.txt \
-f2 \
-root PopPD \
-seed 12345 \
-o orientagraph_tree
It produced this tree
(PopPD:0.203138,(PopSP:0.115542,(((PopAm:0.0266065,(PopPB:0.0163228,PopBB:0.00467496):0.0141134):0.0060117,PopCB:0.00733744):0.00748499,((PopSU:0.0357004,PopSL:0.0359395):0.00379131,PopAs:0.0218087):0.00506879):0.083868):0.203138);
which has an even better likelihood (43.136152) than the tree in Auto.tree.txt. It flips the positions of PopCB and PopAm compared to the tree in Auto.tree.txt.
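A topological difference like this can also be checked programmatically by listing the clades (the leaf set below each internal node) of each tree. The sketch below is a small hand-written parser, not part of OrientAGraph or TreeMix; the function name `clades` is hypothetical, and it assumes a Newick string with branch lengths and no internal node labels, as in the tree above.

```python
def clades(newick):
    """Return a set of frozensets, one per internal node of the tree.

    Assumes leaf names contain none of the characters "(),:;" and that
    internal nodes carry no labels (only optional ":length" fields).
    """
    found = set()
    stack = []        # one set of leaf names per currently open "("
    name = ""
    skipping = False  # True while inside a ":<branch length>" field
    for ch in newick.strip():
        if ch in "(,):;":
            if name:                  # a leaf name just ended
                stack[-1].add(name)
                name = ""
            skipping = (ch == ":")
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                done = stack.pop()
                found.add(frozenset(done))
                if stack:             # leaves also belong to the parent clade
                    stack[-1] |= done
        elif not skipping and not ch.isspace():
            name += ch
    return found


# The tree reported above, copied verbatim.
tree = (
    "(PopPD:0.203138,(PopSP:0.115542,(((PopAm:0.0266065,"
    "(PopPB:0.0163228,PopBB:0.00467496):0.0141134):0.0060117,"
    "PopCB:0.00733744):0.00748499,((PopSU:0.0357004,PopSL:0.0359395)"
    ":0.00379131,PopAs:0.0218087):0.00506879):0.083868):0.203138);"
)
tree_clades = clades(tree)

# In this tree PopAm groups with (PopPB, PopBB), and PopCB joins one level up.
print(frozenset({"PopAm", "PopPB", "PopBB"}) in tree_clades)
print(frozenset({"PopCB", "PopPB", "PopBB"}) in tree_clades)
```

Running the same function on the contents of Auto.tree.txt (not reproduced in this thread) and taking the symmetric difference of the two clade sets would show exactly which groupings differ.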
Lastly, I tried estimating a network with one migration edge and performing maximum likelihood network orientation (MLNO):
../src/orientagraph-v1.2-linux64 \
-i orientagraph.k1000.seed123.cov.txt \
-givenmat orientagraph.k1000.seed123.covse.txt \
-f2 \
-root PopPD \
-seed 12345 \
-m 1 \
-mlno 1 \
-o orientagraph_1mig
This produces the network
(PopPD:0.203138,(PopSP:0.115542,((PopCB:0.00692547,((PopBB:0.00467496,PopPB:0.0163228):0.0138687,PopAm:0.0266101):0.00663179):0.00856977,(PopSL:0.0355972,(PopSU:0.0338442,PopAs:0.0453469):0.00191585):0.00855608):0.0825982):0.203138);
0.265451 NA NA NA PopAm:0.0266101 PopAs:0.0453469
which increases the likelihood to 133.187.
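The second line of that output describes the migration edge. Here is a minimal Python sketch for pulling it apart, assuming the line follows the TreeMix treeout convention (weight, then three optional jackknife/p-value columns, then the subtree below the origin of the edge, then the subtree below its destination). That column order is an assumption carried over from TreeMix and is worth confirming against the OrientAGraph documentation; the function name `parse_migration_line` is hypothetical.

```python
def parse_migration_line(line):
    """Split one migration-edge line into named fields.

    Assumed layout (TreeMix treeout convention, not confirmed in this thread):
    weight, jackknife weight, jackknife s.e., p-value, source, destination.
    """
    fields = line.split()

    def num_or_none(text):
        # "NA" marks a value that was not computed.
        return None if text == "NA" else float(text)

    return {
        "weight": float(fields[0]),
        "jackknife": num_or_none(fields[1]),
        "jackknife_se": num_or_none(fields[2]),
        "p_value": num_or_none(fields[3]),
        "source": fields[-2],   # subtree below the origin of the edge
        "dest": fields[-1],     # subtree below the destination of the edge
    }


edge = parse_migration_line("0.265451 NA NA NA PopAm:0.0266101 PopAs:0.0453469")
print(edge["weight"], edge["source"], edge["dest"])
```

Under that assumed layout, the edge would run from the PopAm branch into the PopAs branch with weight 0.265451.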
I hope this helps, and please let me know if you have further questions!
-Erin
Oh I was able to add a migration edge to your starting tree! I used the command
../src/orientagraph-v1.2-linux64 \
-i orientagraph.k1000.seed123.cov.txt \
-givenmat orientagraph.k1000.seed123.covse.txt \
-f2 \
-root PopPD \
-tf Auto.tree.txt \
-m 1 \
-mlno 1 \
-allmigs 1 \
-o output_wst_mig1_try2
This has likelihood 147.976554273 (which is close to, but slightly better than, the previous network with one edge).
This command (1) read the starting tree, (2) added one migration edge (the option -allmigs 1 enabled an edge to be added, unlike before), and (3) tried to find a better orientation (with option -mlno 1) but didn't succeed (the highest likelihood orientation had already been found).
I recommend comparing this result to what you are getting with TreeMix on the frequency matrix.
Here is the output (orientagraph-issue.tar.gz) so that you can visualize it.