zeek/spicy

Regex processing using {n,m} repeat syntax is off by one

kkvarfordt opened this issue · 2 comments

If a unit has a 4-byte field followed by a field matching 0 to 3 null bytes, and the input ends with 5 null bytes, the second field ends up containing 4 null bytes when it should contain 3. See the following code for details.

Version: spicy-driver v1.9.0 (7b8eff5)

Code:

module Test;
import spicy;

public type Foo = unit {
    field_1: bytes &size=4;
    field_2: /\x00{0,3}/;

    on %done {
        print "field_1 => %x" % self.field_1;
        print "field_2 => %x" % self.field_2;
    }
}

Command:
printf '\01\02\03\04\00\00\00\00\00' | spicy-driver regex-broken-repeat-syntax.spicy

Result:

clang: warning: -lc++abi: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: -lc++abi: 'linker' input unused [-Wunused-command-line-argument]
field_1 => \x01\x02\x03\x04
field_2 => \x00\x00\x00\x00

Thank you for the ticket.

Interestingly, at least for this reproducer, it seems to work for maximum repeat counts of 1 and 2 and to fail for every larger maximum tried (3, 4 and 5):

module foo;

assert |*b"\x00\x00\x00\x00\x00\x00\x00".match(/\x00{0,1}/)| == 1; # WORKS.
assert |*b"\x00\x00\x00\x00\x00\x00\x00".match(/\x00{0,2}/)| == 2; # WORKS.
assert |*b"\x00\x00\x00\x00\x00\x00\x00".match(/\x00{0,3}/)| == 3; # FAILS.
assert |*b"\x00\x00\x00\x00\x00\x00\x00".match(/\x00{0,4}/)| == 4; # FAILS.
assert |*b"\x00\x00\x00\x00\x00\x00\x00".match(/\x00{0,5}/)| == 5; # FAILS.
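For comparison, here is a minimal sketch in Python rather than Spicy, using the standard `re` module as a reference. It shows the match lengths expected from an anchored match of `\x00{0,n}` against the same seven null bytes, and the value `field_2` should receive for the original input. This assumes that Spicy's bounded-repeat semantics should agree with Python's for this simple pattern, which is the expectation the asserts above encode. The helper name `match_len` is hypothetical.

```python
import re

# Seven null bytes, mirroring the input used in the Spicy asserts above.
data = b"\x00" * 7

def match_len(max_repeat: int) -> int:
    # Hypothetical helper: anchored match of /\x00{0,max_repeat}/ at the start of `data`.
    pattern = rb"\x00{0,%d}" % max_repeat
    return re.match(pattern, data).end()

for n in range(1, 6):
    print(n, match_len(n))

# The original unit: 4 bytes for field_1, then /\x00{0,3}/ on the remaining 5 null bytes.
payload = b"\x01\x02\x03\x04" + b"\x00" * 5
field_2 = re.match(rb"\x00{0,3}", payload[4:]).group(0)
print(field_2)
```

The bounded repeat never consumes more than `n` bytes, so the expected lengths are 1 through 5, and `field_2` should hold exactly three null bytes rather than the four that spicy-driver printed.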

Since this is very likely a bug in https://github.com/rsmmr/justrx, is this something you could look into, @rsmmr?