wyp1125/MCScanX

collinearity file does not have any collinear genes


Hi,

I am trying to compare two snake genomes to identify collinear genes, but I am having trouble with MCScanX. Even though I generated the gff and blast files with exactly the same gene IDs in both, the output collinearity file is empty.

############### Parameters ###############

MATCH_SCORE: 50

MATCH_SIZE: 2

GAP_PENALTY: -1

OVERLAP_WINDOW: 3

E_VALUE: 1e-10

MAX GAPS: 15

############### Statistics ###############

Number of collinear genes: 0, Percentage: -nan

Number of all genes: 0

##########################################
Can you please suggest how I can fix this?
My blast and gff input files have the following content:

$ head nn_nk.blast
nn14A57224 nn14A57224 100.000 10294 0 0 1 10294 1 10294 0.0 18565
nn14A57224 nn14A57224 95.876 291 10 2 7740 8028 5213 4923 1.09e-127 464
nn14A57224 nn14A57224 95.876 291 10 2 4923 5213 8028 7740 1.09e-127 464
nn14A57224 nn14A57224 76.336 524 78 13 2697 3184 1623 2136 6.46e-99 369
nn14A57224 nn14A57224 76.336 524 78 13 1623 2136 2697 3184 6.46e-99 369
nn14A57224 nn14A57224 84.615 312 31 7 5818 6120 1833 2136 1.62e-87 331
nn14A57224 nn14A57224 84.615 312 31 7 1833 2136 5818 6120 1.62e-87 331
nn14A57224 nn14A57224 80.872 298 51 4 5819 6111 2879 3175 1.63e-68 268
nn14A57224 nn14A57224 80.872 298 51 4 2879 3175 5819 6111 1.63e-68 268
nn14A57224 nn14A57224 89.130 138 13 2 5677 5813 1997 2133 2.75e-40 174

$ head nn_nk.gff
nn14 nn14A57224 29426 39720
nn14 nn14A57225 41412 54382
nn14 nn14A57226 58913 65705
nn14 nn14A57227 70249 82715
nn14 nn14A57228 83915 85686
nn14 nn14A57229 93024 239710
nn14 nn14A57230 248031 267724
nn14 nn14A57231 267964 268794
nn14 nn14A57232 293986 419804
nn14 nn14A57233 422076 434387

Having the same issue!

Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (645901 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to test/dt.collinearity [2.411 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Done! [0.001 seconds elapsed]
$ head test.bed
mt0     28      2484    9_0
mt0     2411    3570    9_1
mt0     3825    5316    9_2
mt0     4245    6630    9_3
mt0     6633    6940    9_4
mt0     6942    7695    9_5
mt0     7699    8373    9_6
mt0     8482    9022    9_7
mt0     8941    10923   9_8
mt0     9024    12637   9_9
$ head test.blast
18_0    19_2516 93.3    119     8       0       1       119     330     448     8.0e-73 223.4
18_0    19_14053        93.3    119     8       0       1       119     343     461     1.1e-72 223.4
18_0    19_24   87.0    123     16      0       1       123     311     433     5.6e-70 215.7
18_0    19_6034 80.5    123     24      0       1       123     228     350     8.4e-68 207.6
18_0    19_8548 35.3    116     67      3       1       109     250     364     2.5e-14 67.4
18_0    19_2323 35.7    112     64      3       1       105     323     433     5.0e-14 66.6
18_0    19_5878 27.1    129     83      3       2       119     352     480     1.9e-09 53.5
18_0    19_8859 35.6    118     57      6       2       104     379     492     4.8e-09 52.4
18_0    19_23809        34.7    118     58      6       2       104     343     456     2.3e-08 50.4
18_1    19_2518 92.6    703     51      1       159     860     5       707     0.0e+00 1262.7

@holmrenser Does it help if you rename test.bed to test.gff and/or re-arrange the columns so that the name of the gene is second?
mt0 9_0 28 2484

@brkldj It did; it took me a while to figure that out. The readme says input files should be in bed format, which has the name of the feature as the 4th column (see https://genome.ucsc.edu/FAQ/FAQformat.html#format1).
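For anyone landing here with a BED-ordered file, the column swap described above can be sketched in Python. This is only an illustration: the function name bed_to_mcscanx_gff is made up for this example, and it assumes whitespace-separated input with at least four columns (chromosome, start, end, name).

```python
def bed_to_mcscanx_gff(bed_line: str) -> str:
    """Reorder one BED-style line (chrom, start, end, name)
    into the order that worked in this thread (chrom, name, start, end)."""
    fields = bed_line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {bed_line!r}")
    chrom, start, end, name = fields[:4]
    # Tab-separated output; any extra BED columns are dropped.
    return "\t".join([chrom, name, start, end])


if __name__ == "__main__":
    # First line of the test.bed shown above.
    print(bed_to_mcscanx_gff("mt0     28      2484    9_0"))
```

Applied to every line of test.bed and written out as test.gff, this produces lines in the same column order as the mt0 9_0 28 2484 example above.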

I'll submit a PR to suggest a fix for the readme.

Thanks that just reallllly hurt my head for a good 45 minutes now...

I couldn't resolve the error; I am still getting an empty collinearity file:
blast : ACO.zip
gff : ACO.zip

please help me to resolve this.
Here is my input fasta file
ACO.zip

Hello,

I was having the same issue and none of the above solutions worked for me. My files are in the same format as @farhan-lab 's.
I finally realized that (surprise, surprise) my command was not quite right. Say my files are named dir.blast and dir.gff and live in path/to/dir. Initially I was running MCScanX path/to/dir, when I needed to include the shared prefix of the .blast and .gff files as well: MCScanX path/to/dir/dir.

I both hope this works for someone and also apologize if it does because now I feel stupid :)
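To make the prefix point concrete, here is a small Python sketch of which two files a given argument implies, based only on the behaviour described in this thread (the argument is a prefix, and the inputs are that prefix plus .blast and .gff). The helper name expected_inputs is hypothetical and is not part of MCScanX.

```python
import os


def expected_inputs(prefix: str) -> list[str]:
    """Return the two input paths implied by a prefix argument,
    assuming inputs are named <prefix>.blast and <prefix>.gff."""
    return [prefix + ".blast", prefix + ".gff"]


if __name__ == "__main__":
    # The two invocations from the comment above.
    for prefix in ("path/to/dir", "path/to/dir/dir"):
        for path in expected_inputs(prefix):
            status = "found" if os.path.isfile(path) else "missing"
            print(f"{prefix} -> {path} ({status})")
```

With path/to/dir the implied files are path/to/dir.blast and path/to/dir.gff, which do not exist in the layout described; with path/to/dir/dir they are the real files inside the directory.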

siv-n commented

Hi,

Did you ever get this working, @farhan-lab? I have the same issue: the inputs seem to be correct (after a lot of confusion), but MCScanX still isn't running.

Hey @holmrenser and @siv-n,

It looks like I am having the same issue as you two did. In particular, I am getting the same message as @holmrenser, that all matches have been discarded. Was changing the column order of the .gff the only thing you did to get it to work?

My .gff is already laid out as described, with the gene ID in the second column.

Unfortunately, I can get the program to run on the test data but not on this data, which is in the same format as far as I have been able to tell.

Any pointers would be appreciated.

./MCScanX aa_at

Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (554113 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to aa_at.collinearity [4.383 seconds elapsed]
Writing multiple syntenic blocks to HTML files
aa1.html
aa10.html
aa11.html
aa12.html
aa13.html
aa14.html
aa15.html
aa16.html
aa17.html
aa18.html
aa2.html
aa3.html
aa4.html
aa5.html
aa6.html
aa7.html
aa8.html
aa9.html
at1.html
at10.html
at11.html
at12.html
at2.html
at3.html
at4.html
at5.html
at6.html
at7.html
at8.html
at9.html
Done! [0.339 seconds elapsed]
head aa_at.gff
aa1     AA0000101       1       11347
aa1     AA0000201       19136   23567
aa1     AA0000301       81296   108098
aa1     AA0000401       157331  158227
aa1     AA0000501       183579  184807
aa1     AA0000601       188429  189500
aa1     AA0000701       191079  194975
aa1     AA0000801       198362  198840
aa1     AA0000901       203576  206430
aa1     AA0001001       213149  214163

head aa_at.blast
AA0000101       XP_021987606    72.73   484     60      6       1       413     1       483     0       628
AA0000101       XP_021987611    72.73   484     60      6       1       413     1       483     0       628
AA0000101       OTG38523        72.73   484     60      6       1       413     78      560     0       627
AA0000101       XP_021994914    72.33   459     71      7       1       409     1       453     0       617
AA0000101       OTG09514        72.33   459     71      7       1       409     83      535     0       617
AA0000101       XP_021994915    72.33   459     70      8       1       409     1       452     0       612
AA0000201       XP_021987587    94.33   353     18      2       6       358     5       355     0       671
AA0000201       XP_024983527    89.52   353     35      2       6       358     5       355     0       644
AA0000201       XP_023761470    84.92   358     52      2       1       358     1       356     0       624
AA0000201       XP_024985842    82.78   360     57      3       1       358     1       357     0       600

It is important to note that although the GitHub README says the gff file should be 'ch#\tstart\tstop\tgeneID', this is incorrect. The geneID needs to be the second column.

Also, the geneIDs in the gff file need to EXACTLY match the geneIDs in the blast file. Any difference will result in no matches.
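A quick way to test the ID-matching point before running MCScanX is to count how many blast lines have both IDs present in the second column of the gff. The Python sketch below does that. It assumes whitespace-separated files and that a hit is only usable when both of its IDs appear in the gff, which is my reading of the "discarded" counts in this thread rather than something taken from the MCScanX source. The function name and the sample IDs (g1, g2, gX) are made up.

```python
def count_usable_hits(gff_lines, blast_lines):
    """Return (kept, dropped): blast lines whose two IDs are both
    in the gff's second column, versus those with an unknown ID."""
    # Gene IDs are taken from column 2 of the gff, as discussed above.
    gene_ids = {parts[1] for parts in (line.split() for line in gff_lines)
                if len(parts) >= 2}
    kept = dropped = 0
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] in gene_ids and fields[1] in gene_ids:
            kept += 1
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical data: gX is not in the gff, so that hit is counted as dropped.
    gff = ["chr1\tg1\t1\t100", "chr1\tg2\t200\t300"]
    blast = ["g1\tg2\t90.0\t100", "g1\tgX\t85.0\t100"]
    print(count_usable_hits(gff, blast))
```

If the first number comes back as 0 for your real files, the IDs in the two files do not line up (different naming scheme, version suffixes, or IDs from a database that is not in the gff), which would be consistent with the "0 matches imported" output above.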

I had this same issue. What solved mine (after changing the position of the geneID column) was making sure the gff file had the extension '.gff', not '.gff3'.

Hope this saves a headache.