AntonBankevich/LJA

Inconsistency between mdbg.gfa and assembly.fasta

MrTomRod opened this issue · 0 comments

Dear Anton

There is a contig (6) in mdbg.gfa that is missing from assembly.fasta:

$ grep ">" assembly.fasta 
>20
>7
>10
>1
>3
>16
>15

Preview of mdbg.gfa:

H       VN:Z:1.0
S       20      ATTTAGGCGCTAATTTTCCAAAACGCTCAAAACGAGGGTCAGAG...
S       7       AAAATGACCATTTTCTGAACACTCAAATATTGAAAAAAATATAG...
S       6       AAAATGACCATTTTCTGAACACTCAAATATTGAAAAAAATATAG...
S       10      AGACACTCATTCTCTCGTTGCAAACATTGCAAAGTTTTGAAAAA...
S       1       GTTCACTTAAAAGAGAACAAATTCGAGCAAAATATTTTGCCAGA...
S       3       CACAACAACAACGTATTACTCAACTAAAGCAAAGGCTGCAACGC...
S       16      AAAATGACCATTTTCTGAACATCTGATAAAAAAGGAAAAACGAT...
S       15      AAAATGACCATTTTCTGAACATCTGATAAAAAAGGAAAAACGAT...
L       20      +       20      +       7308M
L       7       +       6       -       29769M
L       16      +       6       -       29769M
L       7       +       15      -       29769M
L       16      +       15      -       29769M
L       16      -       7       +       38240M
L       15      -       7       +       38240M
L       16      -       6       +       38240M
L       15      -       6       +       38240M
L       10      +       10      +       7449M
L       1       +       1       +       7514M
L       3       +       3       +       379M1D85M1I6141M2I6M
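For anyone who wants to reproduce the comparison without reading the files by eye, here is a minimal sketch that lists the segment names present in only one of the two files. The function names are made up for illustration and are not part of LJA. It assumes the GFA `S` lines carry the segment name in the second whitespace-separated field and that the FASTA ID is the first token after `>`.

```python
def gfa_segment_ids(gfa_text):
    """Collect segment names from the S lines of a GFA 1.0 text."""
    ids = set()
    for line in gfa_text.splitlines():
        fields = line.split()  # GFA is tab-separated; names contain no whitespace
        if len(fields) >= 2 and fields[0] == "S":
            ids.add(fields[1])
    return ids


def fasta_ids(fasta_text):
    """Collect record IDs (first token after '>') from a FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def compare_ids(gfa_text, fasta_text):
    """Return (only_in_gfa, only_in_fasta) as sorted lists of names."""
    g = gfa_segment_ids(gfa_text)
    f = fasta_ids(fasta_text)
    return sorted(g - f), sorted(f - g)


if __name__ == "__main__":
    # File names taken from this report; adjust paths as needed.
    with open("mdbg.gfa") as gfa, open("assembly.fasta") as fasta:
        only_gfa, only_fasta = compare_ids(gfa.read(), fasta.read())
    print("only in mdbg.gfa:", only_gfa)
    print("only in assembly.fasta:", only_fasta)
```

The sketch compares names only. It does not check whether two segments with different names carry the same sequence.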

Visualization using gfaviz --no-gui --render --labels --output mdbg.gfa.svg mdbg.gfa:

[Image: gfaviz rendering of mdbg.gfa]

Moreover, some of the sequences in mdbg.gfa are longer than the corresponding ones in assembly.fasta. They do not seem to be circularized.
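The length difference can be quantified with a short sketch like the one below, which reports every name present in both files whose sequence length differs. The helper names are hypothetical. It assumes the sequence is written out on each GFA `S` line (not replaced by `*` with an `LN` tag) and that the FASTA may wrap sequences over several lines.

```python
def gfa_lengths(gfa_text):
    """Map segment name -> sequence length from GFA S lines.

    Assumes the sequence itself is in the third field (not '*').
    """
    lengths = {}
    for line in gfa_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "S":
            lengths[fields[1]] = len(fields[2])
    return lengths


def fasta_lengths(fasta_text):
    """Map record ID -> total sequence length, summing wrapped lines."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else None
            if name is not None:
                lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def length_differences(gfa_text, fasta_text):
    """Return {name: (gfa_length, fasta_length)} for shared names that differ."""
    g = gfa_lengths(gfa_text)
    f = fasta_lengths(fasta_text)
    return {n: (g[n], f[n]) for n in sorted(g.keys() & f.keys()) if g[n] != f[n]}


if __name__ == "__main__":
    # File names taken from this report; adjust paths as needed.
    with open("mdbg.gfa") as gfa, open("assembly.fasta") as fasta:
        diffs = length_differences(gfa.read(), fasta.read())
    for name, (gfa_len, fasta_len) in diffs.items():
        print(name, gfa_len, fasta_len, gfa_len - fasta_len)
```

The sketch only reports the size of each difference. It does not try to explain where the extra bases in mdbg.gfa come from.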

Is that a bug? Which file should I trust?