zaeleus/noodles

noodles sam successfully reads a record but fails to write it

natir opened this issue · 2 comments

Hello,

I was creating random sam records (for test data sets) and when I wanted to transform them into bam via noodles_util I got a strange bug.

How to reproduce:

  1. save the test.sam content below as ~/tmp.sam
  2. run cargo run --example sam_count -- ~/tmp.sam -> prints 5
  3. run cargo run --example sam_view -- ~/tmp.sam -> error

sam_view fails with Error: Kind(InvalidInput) on the record record_@]APF. I haven't been able to find an explanation of what's wrong with this record (maybe the flag, but I'm not sure).

test.sam
@HD	VN:1.0
@SQ	SN:1	LN:2147483647
@SQ	SN:2	LN:2147483647
@SQ	SN:3	LN:2147483647
@SQ	SN:4	LN:2147483647
@SQ	SN:5	LN:2147483647
@SQ	SN:6	LN:2147483647
@SQ	SN:7	LN:2147483647
@SQ	SN:8	LN:2147483647
@SQ	SN:9	LN:2147483647
@SQ	SN:10	LN:2147483647
@SQ	SN:11	LN:2147483647
@SQ	SN:12	LN:2147483647
@SQ	SN:13	LN:2147483647
@SQ	SN:14	LN:2147483647
@SQ	SN:15	LN:2147483647
@SQ	SN:16	LN:2147483647
@SQ	SN:17	LN:2147483647
@SQ	SN:18	LN:2147483647
@SQ	SN:19	LN:2147483647
@SQ	SN:22	LN:2147483647
@SQ	SN:X	LN:2147483647
@SQ	SN:Y	LN:2147483647
@SQ	SN:MT	LN:2147483647
@SQ	SN:chr1	LN:2147483647
@SQ	SN:chr2	LN:2147483647
@SQ	SN:chr3	LN:2147483647
@SQ	SN:chr4	LN:2147483647
@SQ	SN:chr5	LN:2147483647
@SQ	SN:chr6	LN:2147483647
@SQ	SN:chr7	LN:2147483647
@SQ	SN:chr8	LN:2147483647
@SQ	SN:chr9	LN:2147483647
@SQ	SN:chr10	LN:2147483647
@SQ	SN:chr11	LN:2147483647
@SQ	SN:chr12	LN:2147483647
@SQ	SN:chr13	LN:2147483647
@SQ	SN:chr14	LN:2147483647
@SQ	SN:chr15	LN:2147483647
@SQ	SN:chr16	LN:2147483647
@SQ	SN:chr17	LN:2147483647
@SQ	SN:chr18	LN:2147483647
@SQ	SN:chr19	LN:2147483647
@SQ	SN:chr22	LN:2147483647
@SQ	SN:chrX	LN:2147483647
@SQ	SN:chrY	LN:2147483647
@SQ	SN:chrMT	LN:2147483647
record_`IbUX	4025	chrX	3136	74	50M	*	0	50	gAAtCGCgtGTTAGTTAagccAcggtAatGcTtgtaCgcAGgAtaTcgAA	2?8C,30C5-D.$.=A@2/&='6A0A$@D&4,1+=!/'@ED:C577DF%"
record_D]MO]	2169	chr18	7114	16	50M	*	0	50	cAtgCtGCAAtTacCGtTAAcaGGtatTCaTCctcTGgAActTgCGAcaA	FG>!$!3A6+9#(7E7<??C;*184,;E>-"=BH3?"6;%13=A-?!2FH
record_@]APF	427	10	13635	47	50M	*	0	50	aCGctGagattTGtgCttaAGggTcCTGcGTAGCTGTCCACgTTTGagtG	>61-B'!01"'!H":,=$*$6*-95FH5D2?BA,+@58%75BH0D?G0+@
record_dE^c]	115	10	61882	50	50M	*	0	50	CTacgtCTaTgTCAGgCtaGTtcCCTcgcTgAgGgAtCAAatTCTATTGT	H/6DHFB;'.<<&0A=(@9!DA+-D/,:*B7C+'=07$C&&C9%H;B=!6
record_E]PA`	3624	chr2	17136	111	50M	*	0	50	AtaatcaCtGcTAGCCAgaTTgcAaTtaTGgACTTagGgtATACCtcTct	.'/!$D()7D,',GB55&(!**$F=@0?3G183F?>6<.C$$6AB2FH4#

The reader will now decode a superset of the specification, but the writer still requires valid fields. record_@]APF is not a valid read name because it includes the @ symbol. See § 1.4 "The alignment section: mandatory fields" (2023-05-24):

Col  Field  Type    Regexp/Range     Brief description
1    QNAME  String  [!-?A-~]{1,254}  Query template NAME
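
To make the regex concrete: `[!-?A-~]` covers the printable ASCII range from `!` to `~` with a single gap, the `@` character (0x40), and `{1,254}` bounds the length. The following is a minimal standalone sketch of that check using only the standard library. The names `QnameError` and `validate_qname` are hypothetical and are not part of the noodles API; the actual writer may validate differently.

```rust
use std::fmt;

/// Why a candidate QNAME was rejected.
/// Hypothetical type for illustration; not part of noodles.
#[derive(Debug, PartialEq, Eq)]
enum QnameError {
    Empty,
    TooLong(usize),
    InvalidByte { index: usize, byte: u8 },
}

impl fmt::Display for QnameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Empty => write!(f, "invalid QNAME: empty"),
            Self::TooLong(n) => write!(f, "invalid QNAME: length {n} exceeds 254"),
            Self::InvalidByte { index, byte } => write!(
                f,
                "invalid QNAME: byte {:?} at index {index} is outside [!-?A-~]",
                *byte as char
            ),
        }
    }
}

/// Checks a read name against the SAM QNAME pattern `[!-?A-~]{1,254}`.
fn validate_qname(name: &str) -> Result<(), QnameError> {
    let bytes = name.as_bytes();

    if bytes.is_empty() {
        return Err(QnameError::Empty);
    }
    if bytes.len() > 254 {
        return Err(QnameError::TooLong(bytes.len()));
    }

    for (index, &byte) in bytes.iter().enumerate() {
        // '!'..='?' is 0x21..=0x3F and 'A'..='~' is 0x41..=0x7E,
        // so the only printable character excluded is '@' (0x40).
        if !matches!(byte, b'!'..=b'?' | b'A'..=b'~') {
            return Err(QnameError::InvalidByte { index, byte });
        }
    }

    Ok(())
}
```

Under this sketch, `validate_qname("record_@]APF")` is rejected at index 7 (the `@`), while the other four read names in test.sam pass, since `]`, `^` and the backtick all fall inside `A-~`.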

Thanks a lot.

After wasting several long minutes, I finally understood what this regex meant.

Perhaps a more explicit error message (one that indicates the field concerned) would be useful, but I know it's not necessarily easy.