sriramlab/OrientAGraph

ERROR: Number of populations is not the same in the input matrices!

Closed this issue · 4 comments

Hi!
I got a strange error, shown in the title. I produced the f2 summary statistics by following Step 1 of the instructions, which ran normally without any errors, but Step 2 (running the main OrientAGraph analysis) reported this message.
What is the problem, and how can I address it?

Thank you!
Chen Yan

Thank you so much for your interest in using OrientAGraph, Chen Yan. I am happy to help you figure out the issue. Could you share with us the exact commands you are using as well as the input data files for the second command (i.e., the two matrix files corresponding to the f2 statistics)? You could rename the populations or share via email (ekmolloy@umd.edu) if you don't want to post publicly. Thanks! -Erin

Hi, Prof. Molloy
Thank you for your help.
I have uploaded the two OrientAGraph input files, i.e., the two matrix files generated from the TreeMix frequency file.
This step was run with the command:
orientagraph -i treemix.frq.gz -root PopPD -f2 -freq2stat -k 1000 -seed 123 -o orientagraph.k1000.seed123

orientagraph.k1000.seed123.cov.txt
orientagraph.k1000.seed123.covse.txt
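Given the error in the title, one quick sanity check is to compare how many populations each input file declares. The sketch below assumes a TreeMix-style layout in which the first line of each file (the frequency file and both matrix files) is a whitespace-separated list of population names; that layout, and the helper names `count_header_pops` and `check_pop_counts`, are assumptions for illustration, not documented OrientAGraph behavior.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of populations declared in the TreeMix frequency
# file and in the two f2 matrix files.
# ASSUMPTION: each file's first line is a whitespace-separated list of
# population names (TreeMix-style header); gzipped files end in ".gz".

# Hypothetical helper: print the number of names on the first line of a file.
count_header_pops() {
  local f="$1"
  if [[ "$f" == *.gz ]]; then
    gzip -dc "$f" | awk 'NR == 1 { print NF; exit }'
  else
    awk 'NR == 1 { print NF; exit }' "$f"
  fi
}

# Hypothetical helper: report the three counts and fail if they differ.
check_pop_counts() {
  local n_frq n_cov n_covse
  n_frq=$(count_header_pops "$1")
  n_cov=$(count_header_pops "$2")
  n_covse=$(count_header_pops "$3")
  echo "frq=$n_frq cov=$n_cov covse=$n_covse"
  if [[ "$n_frq" == "$n_cov" && "$n_cov" == "$n_covse" ]]; then
    echo "OK: population counts match"
  else
    echo "MISMATCH: population counts differ" >&2
    return 1
  fi
}

# Usage (file names from this thread):
# check_pop_counts treemix.frq.gz orientagraph.k1000.seed123.cov.txt orientagraph.k1000.seed123.covse.txt
```

If the counts match, the mismatch is more likely caused by how the options are combined than by the files themselves.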

The command for the main OrientAGraph analysis is:
orientagraph -i orientagraph.k1000.seed123.cov.txt -givenmat orientagraph.k1000.seed123.covse.txt -f2 -mlno -root PopPD -k 1000 -noss -o orientagraph_m0 -tf Auto.tree.txt -score 1

I provided a starting topology file (Auto.tree.txt) that I thought was suitable for my analysis.
Auto.tree.txt
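Since a starting tree passed with -tf presumably has to describe the same populations as the matrices, it can also help to compare the tip labels in the tree file with the matrix header. This is a sketch under two assumptions: the tree file contains a single Newick string, and the first line of the matrix file lists the population names. `newick_tips` and `compare_tree_to_matrix` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that a Newick starting tree and an f2 matrix name the same populations.
# ASSUMPTIONS: the tree file holds one Newick string; the matrix file's first
# line is a whitespace-separated list of population names.

# Hypothetical helper: print sorted tip labels. In Newick, a tip label follows
# "(" or "," and ends at ":", ",", ")" or ";" (internal labels follow ")").
newick_tips() {
  tr -d ' \n' < "$1" | grep -oE '[(,][^(),:;]+' | cut -c2- | LC_ALL=C sort
}

# Hypothetical helper: diff tree tips against the matrix header; returns 0 if identical.
compare_tree_to_matrix() {
  diff <(newick_tips "$1") \
       <(head -n 1 "$2" | tr -s ' \t' '\n' | sed '/^$/d' | LC_ALL=C sort)
}

# Usage: compare_tree_to_matrix Auto.tree.txt orientagraph.k1000.seed123.cov.txt
```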

It is a great honor to receive help from you and your team. Thank you very much!

Best Regards,
Chen Yan

Dear Chen Yan,

Apologies for the delayed reply; we were on a U.S. holiday around the time of this message.

I have taken a look at your command and data.

To get the command to work, I made the following changes:

  • Removed option -noss because it's not compatible with -givenmat
  • Removed option -k 1000 because it's not compatible with -givenmat (you already used this option to create the matrix)
  • Changed -score 1 to -score rfit (the valid options are -score asis, -score rfit, -score mlbt, and -score mlno)
  • Removed option -mlno because you aren't adding any migration edges (-m).

This gives the command

../src/orientagraph-v1.2-linux64 \
    -i orientagraph.k1000.seed123.cov.txt \
    -givenmat orientagraph.k1000.seed123.covse.txt \
    -f2 \
    -root PopPD \
    -tf Auto.tree.txt \
    -score rfit \
    -o output_wst_mig0

which computes the likelihood of the starting tree (-154.44436). This is a really good score.
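The option changes listed above can be turned into a quick pre-flight check before starting a long run. `check_givenmat_flags` is a hypothetical helper, not part of OrientAGraph, and it only encodes the four rules stated in this thread (no -noss or -k with -givenmat, a valid -score value, no -mlno without -m); it knows nothing about other option interactions.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of OrientAGraph) for the option
# combinations discussed in this thread. Prints one line per problem and
# returns 1 if any problem was found.
check_givenmat_flags() {
  local has_givenmat=0 has_noss=0 has_k=0 has_m=0 has_mlno=0
  local score="" prev="" problems=0 arg
  for arg in "$@"; do
    case "$arg" in
      -givenmat) has_givenmat=1 ;;
      -noss) has_noss=1 ;;
      -k) has_k=1 ;;
      -m) has_m=1 ;;
      -mlno) has_mlno=1 ;;
    esac
    # Remember the value that follows -score.
    if [[ "$prev" == "-score" ]]; then score="$arg"; fi
    prev="$arg"
  done
  if (( has_givenmat && has_noss )); then
    echo "remove -noss: not compatible with -givenmat"; problems=1
  fi
  if (( has_givenmat && has_k )); then
    echo "remove -k: not compatible with -givenmat (it was already used to build the matrix)"; problems=1
  fi
  if [[ -n "$score" ]]; then
    case "$score" in
      asis|rfit|mlbt|mlno) ;;
      *) echo "invalid -score '$score': use asis, rfit, mlbt, or mlno"; problems=1 ;;
    esac
  fi
  if (( has_mlno && has_m == 0 )); then
    echo "remove -mlno: no migration edges are being added (-m)"; problems=1
  fi
  return "$problems"
}

# Usage: check_givenmat_flags -i cov.txt -givenmat covse.txt -f2 -score rfit ...
```

Run against the original Step 2 command, it reports all four problems; against the corrected command, it prints nothing.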

I then tried adding in one migration edge and performing maximum likelihood network orientation (mlno), with this command:

../src/orientagraph-v1.2-linux64 \
    -i orientagraph.k1000.seed123.cov.txt \
    -givenmat orientagraph.k1000.seed123.covse.txt \
    -f2 \
    -root PopPD \
    -tf Auto.tree.txt \
    -m 1 \
    -mlno 1 \
    -o output_wst_mig1

It gave the message, "Failed to add migration edge so won't continue trying". This means it couldn't find a way to add one, but since some of this relates to the TreeMix code, it's not 100% clear to me what the issue was.

I then tried to build a tree with this command:

../src/orientagraph-v1.2-linux64 \
    -i orientagraph.k1000.seed123.cov.txt \
    -givenmat orientagraph.k1000.seed123.covse.txt \
    -f2 \
    -root PopPD \
    -seed 12345 \
    -o orientagraph_tree

It produced this tree

(PopPD:0.203138,(PopSP:0.115542,(((PopAm:0.0266065,(PopPB:0.0163228,PopBB:0.00467496):0.0141134):0.0060117,PopCB:0.00733744):0.00748499,((PopSU:0.0357004,PopSL:0.0359395):0.00379131,PopAs:0.0218087):0.00506879):0.083868):0.203138);

which has an even better likelihood (43.136152) than the tree in Auto.tree.txt. Compared to the tree in Auto.tree.txt, it swaps the positions of PopCB and PopAm.

Lastly, I tried to estimate a network with one migration edge and to perform maximum likelihood network orientation (MLNO):

../src/orientagraph-v1.2-linux64 \
    -i orientagraph.k1000.seed123.cov.txt \
    -givenmat orientagraph.k1000.seed123.covse.txt \
    -f2 \
    -root PopPD \
    -seed 12345 \
    -m 1 \
    -mlno 1 \
    -o orientagraph_1mig

This produces the network

(PopPD:0.203138,(PopSP:0.115542,((PopCB:0.00692547,((PopBB:0.00467496,PopPB:0.0163228):0.0138687,PopAm:0.0266101):0.00663179):0.00856977,(PopSL:0.0355972,(PopSU:0.0338442,PopAs:0.0453469):0.00191585):0.00855608):0.0825982):0.203138);
0.265451 NA NA NA PopAm:0.0266101 PopAs:0.0453469

which increases the likelihood to 133.187.
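The second line of the network output above describes the migration edge. The snippet below assumes the TreeMix tree-output convention, in which the first field is the migration weight and the last two fields are the subtrees at the origin and destination of the edge (here single populations written as `name:branch_length`); the three `NA` fields are not interpreted. That reading of the format is an assumption, and `describe_migration` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: summarize a migration-edge line from the network output.
# ASSUMPTION (TreeMix-style layout): field 1 = migration weight,
# second-to-last field = origin subtree, last field = destination subtree.
describe_migration() {
  awk '{
    src = $(NF-1); dst = $NF
    sub(/:[^:]*$/, "", src)   # drop the trailing ":branch_length"
    sub(/:[^:]*$/, "", dst)
    printf "weight=%s from=%s to=%s\n", $1, src, dst
  }'
}

# Usage: echo "<migration line>" | describe_migration
```

For the line above this would read as an edge of weight 0.265451 from PopAm to PopAs, if the assumed column order holds.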

I hope this helps, and please let me know if you have further questions!

-Erin

Oh, I was able to add a migration edge to your starting tree after all! I used the command:

../src/orientagraph-v1.2-linux64 \
    -i orientagraph.k1000.seed123.cov.txt \
    -givenmat orientagraph.k1000.seed123.covse.txt \
    -f2 \
    -root PopPD \
    -tf Auto.tree.txt \
    -m 1 \
    -mlno 1 \
    -allmigs 1 \
    -o output_wst_mig1_try2

This has a likelihood of 147.976554273, which is close to but slightly better than the previous network with one migration edge (133.187).
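To keep track of which run scored best, the likelihoods reported in this thread can be tabulated by hand and sorted. The run names below are the -o prefixes used above, and the values are copied from the messages in this thread rather than parsed from output files, since no particular output-file format is assumed. Runs are grouped by number of migration edges, because adding an edge can only keep or raise the maximum likelihood, so raw scores across different edge counts are not a fair comparison. `best_per_m` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report the highest-likelihood run for each number of migration edges.
# Input lines: "<run_prefix> <num_migration_edges> <log_likelihood>".
best_per_m() {
  # Sort by edge count (ascending), then likelihood (highest first),
  # and keep only the first row for each edge count.
  LC_ALL=C sort -k2,2n -k3,3gr | awk '!seen[$2]++ { printf "m=%s best=%s lnL=%s\n", $2, $1, $3 }'
}

# Values copied by hand from this thread.
best_per_m <<'EOF'
output_wst_mig0 0 -154.44436
orientagraph_tree 0 43.136152
orientagraph_1mig 1 133.187
output_wst_mig1_try2 1 147.976554273
EOF
```

With these values it selects orientagraph_tree for m=0 and output_wst_mig1_try2 for m=1.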

This command (1) read the starting tree, (2) added one migration edge (the option -allmigs 1 enabled an edge to be added, unlike before), and (3) tried to find a better orientation (with option -mlno 1) but did not succeed (the orientation with the highest likelihood had already been found).

I recommend comparing this result to what you are getting with TreeMix on the frequency matrix.

Here is the output (orientagraph-issue.tar.gz) so that you can visualize it.