vgteam/vg

Recent versions of `vg deconstruct` misbehave in different ways on graphs that follow the `PanSN-spec`


With this graph

scerevisiae7.community.0.fa.gz.d1a145e.417fcdf.7493449.smooth.final.gfa.gz

whose path names follow the PanSN-spec, I obtain:

# vg 1.40.0
vg deconstruct -P S288C -H '#' -e -a -t 16 scerevisiae7.community.0.fa.gz.d1a145e.417fcdf.7493449.smooth.final.gfa | grep chrI | head -n 5
##contig=<ID=S288C#1#chrI,length=219929>
S288C#1#chrI	5	>8>10	CCCCA	C	60	.	AC=1;AF=1;AN=1;AT=>8>9>10,>8>10;NS=1;LV=0	GT	.	.	.	.	1	.
S288C#1#chrI	35	>10>13	A	C	60	.	AC=1;AF=1;AN=1;AT=>10>12>13,>10>11>13;NS=1;LV=0	GT	.	.	.	.	1	.
S288C#1#chrI	37	>13>16	C	A	60	.	AC=1;AF=1;AN=1;AT=>13>14>16,>13>15>16;NS=1;LV=0	GT	.	.	.	.	1	.
S288C#1#chrI	41	>16>19	A	C	60	.	AC=1;AF=1;AN=1;AT=>16>17>19,>16>18>19;NS=1;LV=0	GT	.	.	.	.	1	.

This output is correct.

# vg 1.43.0
vg deconstruct -P S288C -H '#' -e -a -t 16 scerevisiae7.community.0.fa.gz.d1a145e.417fcdf.7493449.smooth.final.gfa | grep chrI | head -n 5 
##contig=<ID=S288C#1#chrI#0,length=219929>
S288C#1#chrI#0	5	>8>10	CCCCA	C	60	.	AC=1;AF=1;AN=1;AT=>8>9>10,>8>10;NS=1;LV=0	GT	.	.	.	.	1	.
S288C#1#chrI#0	35	>10>13	A	C	60	.	AC=1;AF=1;AN=1;AT=>10>12>13,>10>11>13;NS=1;LV=0	GT	.	.	.	.	1	.
S288C#1#chrI#0	37	>13>16	C	A	60	.	AC=1;AF=1;AN=1;AT=>13>14>16,>13>15>16;NS=1;LV=0	GT	.	.	.	.	1	.
S288C#1#chrI#0	41	>16>19	A	C	60	.	AC=1;AF=1;AN=1;AT=>16>17>19,>16>18>19;NS=1;LV=0	GT	.	.	.	.	1	.

This is strange: `#0` is appended to the reference path name.
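To make the difference concrete: a PanSN-spec name has three delimiter-separated fields (sample, haplotype, contig). The sketch below uses a hypothetical helper, `pansn_field_count`, which is not part of vg, to count the `#`-separated fields in a name. It reports three fields for the 1.40.0 name and four for the 1.43.0 name.

```shell
# Hypothetical helper (not part of vg): count the '#'-separated fields in a path name.
pansn_field_count() {
    printf '%s\n' "$1" | awk -F'#' '{ print NF }'
}

pansn_field_count 'S288C#1#chrI'     # name reported by vg 1.40.0
pansn_field_count 'S288C#1#chrI#0'   # name reported by vg 1.43.0
```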

# vg 1.44.0
vg deconstruct -P S288C -H '#' -e -a -t 16 scerevisiae7.community.0.fa.gz.d1a145e.417fcdf.7493449.smooth.final.gfa | grep chrI | head -n 5 
##contig=<ID=S288C#1#chrI#0,length=219929>
chrI	5	>8>10	CCCCA	C	60	.	AC=1;AF=1;AN=1;AT=>8>9>10,>8>10;NS=1;LV=0	GT	.	.	.	.	1	.
chrI	35	>10>13	A	C	60	.	AC=1;AF=1;AN=1;AT=>10>12>13,>10>11>13;NS=1;LV=0	GT	.	.	.	.	1	.
chrI	37	>13>16	C	A	60	.	AC=1;AF=1;AN=1;AT=>13>14>16,>13>15>16;NS=1;LV=0	GT	.	.	.	.	1	.
chrI	41	>16>19	A	C	60	.	AC=1;AF=1;AN=1;AT=>16>17>19,>16>18>19;NS=1;LV=0	GT	.	.	.	.	1	.

This is wrong: the CHROM column (the first one) contains `chrI`, which does not match the contig ID declared in the header, and the header again carries the `#0` suffix. The mismatch causes problems in pggb, as described in a comment on pangenome/pggb#262.
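The inconsistency can be detected mechanically. This is a minimal sketch, assuming the VCF is read on standard input and that contig IDs contain no commas. The helper `vcf_undeclared_chroms` is hypothetical and is not part of vg or pggb. It prints each CHROM value that has no matching `##contig=<ID=...>` header line.

```shell
# Hypothetical helper (not part of vg or pggb): read a VCF on stdin and print
# every CHROM value that has no matching ##contig=<ID=...> header line.
vcf_undeclared_chroms() {
    awk -F'\t' '
        /^##contig=<ID=/ {
            id = $0
            sub(/^##contig=<ID=/, "", id)   # drop the prefix
            sub(/[,>].*$/, "", id)          # drop ",length=..." and the closing ">"
            declared[id] = 1
            next
        }
        /^#/ { next }                       # skip the remaining header lines
        !($1 in declared) && !seen[$1]++ { print $1 }
    '
}

# Possible usage (graph.gfa is a placeholder file name):
#   vg deconstruct -P S288C -H '#' -e -a -t 16 graph.gfa | vcf_undeclared_chroms
```

With the 1.44.0 output above, this should print `chrI`, since the header only declares `S288C#1#chrI#0`. With the 1.40.0 output it should print nothing.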

How can I obtain the behavior of vg deconstruct 1.40.0 with the latest versions of vg?