collinearity file does not have any collinear genes
Hi,
I am trying to compare two snake genomes to identify collinear genes, but I am currently having some issues with MCScanX. Even though I generated the gff and blast files with exactly the same gene IDs in both files, the output collinearity file is empty.
############### Parameters ###############
MATCH_SCORE: 50
MATCH_SIZE: 2
GAP_PENALTY: -1
OVERLAP_WINDOW: 3
E_VALUE: 1e-10
MAX GAPS: 15
############### Statistics ###############
Number of collinear genes: 0, Percentage: -nan
Number of all genes: 0
##########################################
Can you please suggest how I can fix this?
My blast and gff input files have the following content:
$ head nn_nk.blast
nn14A57224 nn14A57224 100.000 10294 0 0 1 10294 1 10294 0.0 18565
nn14A57224 nn14A57224 95.876 291 10 2 7740 8028 5213 4923 1.09e-127 464
nn14A57224 nn14A57224 95.876 291 10 2 4923 5213 8028 7740 1.09e-127 464
nn14A57224 nn14A57224 76.336 524 78 13 2697 3184 1623 2136 6.46e-99 369
nn14A57224 nn14A57224 76.336 524 78 13 1623 2136 2697 3184 6.46e-99 369
nn14A57224 nn14A57224 84.615 312 31 7 5818 6120 1833 2136 1.62e-87 331
nn14A57224 nn14A57224 84.615 312 31 7 1833 2136 5818 6120 1.62e-87 331
nn14A57224 nn14A57224 80.872 298 51 4 5819 6111 2879 3175 1.63e-68 268
nn14A57224 nn14A57224 80.872 298 51 4 2879 3175 5819 6111 1.63e-68 268
nn14A57224 nn14A57224 89.130 138 13 2 5677 5813 1997 2133 2.75e-40 174
$ head nn_nk.gff
nn14 nn14A57224 29426 39720
nn14 nn14A57225 41412 54382
nn14 nn14A57226 58913 65705
nn14 nn14A57227 70249 82715
nn14 nn14A57228 83915 85686
nn14 nn14A57229 93024 239710
nn14 nn14A57230 248031 267724
nn14 nn14A57231 267964 268794
nn14 nn14A57232 293986 419804
nn14 nn14A57233 422076 434387
Having the same issue!
Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (645901 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to test/dt.collinearity [2.411 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Done! [0.001 seconds elapsed]
$ head test.bed
mt0 28 2484 9_0
mt0 2411 3570 9_1
mt0 3825 5316 9_2
mt0 4245 6630 9_3
mt0 6633 6940 9_4
mt0 6942 7695 9_5
mt0 7699 8373 9_6
mt0 8482 9022 9_7
mt0 8941 10923 9_8
mt0 9024 12637 9_9
$ head test.blast
18_0 19_2516 93.3 119 8 0 1 119 330 448 8.0e-73 223.4
18_0 19_14053 93.3 119 8 0 1 119 343 461 1.1e-72 223.4
18_0 19_24 87.0 123 16 0 1 123 311 433 5.6e-70 215.7
18_0 19_6034 80.5 123 24 0 1 123 228 350 8.4e-68 207.6
18_0 19_8548 35.3 116 67 3 1 109 250 364 2.5e-14 67.4
18_0 19_2323 35.7 112 64 3 1 105 323 433 5.0e-14 66.6
18_0 19_5878 27.1 129 83 3 2 119 352 480 1.9e-09 53.5
18_0 19_8859 35.6 118 57 6 2 104 379 492 4.8e-09 52.4
18_0 19_23809 34.7 118 58 6 2 104 343 456 2.3e-08 50.4
18_1 19_2518 92.6 703 51 1 159 860 5 707 0.0e+00 1262.7
@holmrenser Does it help if you rename test.bed to test.gff and/or re-arrange the columns so that the name of the gene is second?
mt0 9_0 28 2484
@brkldj It did, took me a while to figure it out. The readme says input files should be in bed format, which has the name of the feature as 4th column (see https://genome.ucsc.edu/FAQ/FAQformat.html#format1).
I'll submit a PR to suggest a fix for the readme.
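For anyone else who starts from a BED file, here is a minimal sketch of the column re-arrangement described above. It assumes whitespace-separated BED lines with the feature name in the 4th column, and it copies the coordinates unchanged. The function name `bed_to_mcscanx_gff` is made up for illustration and is not part of MCScanX.

```python
def bed_to_mcscanx_gff(bed_lines):
    """Yield 'chrom<TAB>gene<TAB>start<TAB>end' lines from BED-like lines.

    Assumption: columns are chrom, start, end, name (BED order).
    Coordinates are passed through as-is, without any conversion.
    """
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got: {line!r}")
        chrom, start, end, name = fields[:4]
        # Move the gene name into the second column.
        yield "\t".join((chrom, name, start, end))


if __name__ == "__main__":
    # Example line taken from the test.bed excerpt in this thread.
    for out in bed_to_mcscanx_gff(["mt0 28 2484 9_0"]):
        print(out)
```

Writing the result to a file named with the same prefix as the blast file (for example `test.gff` next to `test.blast`) matches what worked in this thread.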
Thanks that just reallllly hurt my head for a good 45 minutes now...
Hello,
I was having the same issue and none of the above solutions worked for me. My files are in the same format as @farhan-lab 's.
I finally realized that (surprise, surprise) my command was not quite right. Say my files are named dir.blast and dir.gff. Initially I was running MCScanX path/to/dir when I needed to specify the prefix of the .blast and .gff files as well: MCScanX path/to/dir/dir.
I both hope this works for someone and also apologize if it does because now I feel stupid :)
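In case it helps, a small pre-flight check along these lines can catch the prefix mistake before running MCScanX. This is only a sketch: `check_prefix` is a hypothetical helper, and it assumes nothing beyond what is described above, namely that the argument is a prefix and that `<prefix>.blast` and `<prefix>.gff` must both exist.

```python
from pathlib import Path


def check_prefix(prefix):
    """Return a list of human-readable problems for an MCScanX input prefix."""
    problems = []
    if Path(prefix).is_dir():
        # The mistake described above: passing the directory, not the prefix.
        problems.append(f"{prefix} is a directory, not a file prefix")
    for ext in (".blast", ".gff"):
        candidate = Path(str(prefix) + ext)
        if not candidate.is_file():
            problems.append(f"missing input file: {candidate}")
    return problems


if __name__ == "__main__":
    # 'path/to/dir/dir' is the placeholder prefix used in this comment.
    for problem in check_prefix("path/to/dir/dir"):
        print(problem)
```

An empty list means both input files were found under that prefix. It says nothing about whether their contents are valid.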
Hi,
Did you ever get this working, @farhan-lab? I have the same issue: my inputs seem to be correct (after a lot of confusion), but MCScanX still isn't running.
Hey @holmrenser and @siv-n,
It looks like I am having the same issue as you two did. In particular, I am getting the same message as @holmrenser that all matches have been discarded. Was changing the column order of the .gff the only thing you did to get it to work?
My .gff is already as described with gene coming in second place.
I can get the program to run on the test data, but not this data, which is the same in format as far as I have been able to discern, unfortunately.
Any pointers would be appreciated.
./MCScanX aa_at
Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (554113 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to aa_at.collinearity [4.383 seconds elapsed]
Writing multiple syntenic blocks to HTML files
aa1.html
aa10.html
aa11.html
aa12.html
aa13.html
aa14.html
aa15.html
aa16.html
aa17.html
aa18.html
aa2.html
aa3.html
aa4.html
aa5.html
aa6.html
aa7.html
aa8.html
aa9.html
at1.html
at10.html
at11.html
at12.html
at2.html
at3.html
at4.html
at5.html
at6.html
at7.html
at8.html
at9.html
Done! [0.339 seconds elapsed]
head aa_at.gff
aa1 AA0000101 1 11347
aa1 AA0000201 19136 23567
aa1 AA0000301 81296 108098
aa1 AA0000401 157331 158227
aa1 AA0000501 183579 184807
aa1 AA0000601 188429 189500
aa1 AA0000701 191079 194975
aa1 AA0000801 198362 198840
aa1 AA0000901 203576 206430
aa1 AA0001001 213149 214163
head aa_at.blast
AA0000101 XP_021987606 72.73 484 60 6 1 413 1 483 0 628
AA0000101 XP_021987611 72.73 484 60 6 1 413 1 483 0 628
AA0000101 OTG38523 72.73 484 60 6 1 413 78 560 0 627
AA0000101 XP_021994914 72.33 459 71 7 1 409 1 453 0 617
AA0000101 OTG09514 72.33 459 71 7 1 409 83 535 0 617
AA0000101 XP_021994915 72.33 459 70 8 1 409 1 452 0 612
AA0000201 XP_021987587 94.33 353 18 2 6 358 5 355 0 671
AA0000201 XP_024983527 89.52 353 35 2 6 358 5 355 0 644
AA0000201 XP_023761470 84.92 358 52 2 1 358 1 356 0 624
AA0000201 XP_024985842 82.78 360 57 3 1 358 1 357 0 600
It is important to note that although the GitHub README says the gff file should be 'ch#\tstart\tstop\tgeneID', this is incorrect. The geneID needs to be the second column.
Also, the gene IDs in the gff file need to EXACTLY match the gene IDs in the blast file. Any difference will result in no matches.
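A quick way to test the second point is to count how many blast hits have both IDs present in the gff. The sketch below assumes whitespace-separated files, gene IDs in column 2 of the gff, and query/subject IDs in columns 1 and 2 of the blast file. The function names are made up for illustration, and the check only approximates whatever MCScanX does internally.

```python
def gff_gene_ids(gff_lines):
    """Collect gene IDs from the second column of an MCScanX-style gff."""
    ids = set()
    for line in gff_lines:
        fields = line.split()
        if len(fields) >= 4:
            ids.add(fields[1])
    return ids


def blast_id_report(blast_lines, gene_ids):
    """Return (total_hits, hits_with_both_ids_known, set_of_unknown_ids)."""
    total = 0
    both_known = 0
    unknown = set()
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        missing = {g for g in fields[:2] if g not in gene_ids}
        if missing:
            unknown |= missing
        else:
            both_known += 1
    return total, both_known, unknown


if __name__ == "__main__":
    # Lines taken from the aa_at excerpts earlier in this thread.
    gff = ["aa1 AA0000101 1 11347", "aa1 AA0000201 19136 23567"]
    blast = ["AA0000101 XP_021987606 72.73 484 60 6 1 413 1 483 0 628"]
    total, ok, unknown = blast_id_report(blast, gff_gene_ids(gff))
    print(total, ok, sorted(unknown))
```

If the second number is 0 on the full files, the IDs in the two files do not line up (for example a version suffix or a different accession style), which would be consistent with every match being discarded.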
I had this same issue. What solved mine (after changing the position of the geneID column) was making sure the gff file had the extension '.gff', not '.gff3'.
Hope this saves a headache.