[BUG] `string.well_known_regex` cannot handle binary HTTP header values
Opened this issue · 2 comments
Description
well_known_regex
is on a string
type. string
must be a valid UTF-8 encoded Unicode code point sequence. But HTTP header values don't have to be. I.e. valid HTTP header values cannot be represented as a proto string
. Yet the validation rule is on the string
type.
Steps to Reproduce
message HeaderValues {
string val = 1 [(buf.validate.field).string.well_known_regex = KNOWN_REGEX_HTTP_HEADER_VALUE];
}
func TestAbc(t *testing.T) {
val := []byte("\xff")
valid := utf8.Valid(val)
assert.True(t, valid, "Invalid UTF-8")
msg := &HeaderValues{
Val: "\xff",
}
_, err := proto.Marshal(msg)
assert.NoError(t, err, "proto.Marshal() failed")
v, err := protovalidate.New()
require.NoError(t, err)
err = v.Validate(msg)
assert.NoError(t, err)
}
Prints:
=== RUN TestAbc
prototool_test.go:16:
Error Trace: file_test.go:16
Error: Should be true
Test: TestAbc
Messages: Invalid UTF-8
prototool_test.go:27:
Error Trace: file_test.go:27
Error: Received unexpected error:
string field contains invalid UTF-8
Test: TestAbc
Messages: proto.Marshal() failed
--- FAIL: TestAbc (0.02s)
FAIL
Process finished with the exit code 1
Expected Behavior
Validation rule should be on bytes
. At least there too, not just on string
.
Actual Behavior
Validation rule on string
that is problematic.
Possible Solution
Additional Context
HTTP header value spec: https://datatracker.ietf.org/doc/html/rfc9110#name-field-values
See original provenance of these regex here: bufbuild/protoc-gen-validate#297
I'd actually rather see these removed from protovalidate, and users can provide predefined constraints that match the semantics they want (according to the thread above, the RFC recommends only using ASCII).
As I explained in the description, I think the regexes the way they are right now cannot be used. So I agree something needs to change. To be clear, this is not a problem for me, I just wanted to report the issue I noticed.