bufbuild/protovalidate

[BUG] `string.well_known_regex` cannot handle binary HTTP header values

Opened this issue · 2 comments

ash2k commented

Description

well_known_regex is on a string type. string must be a valid UTF-8 encoded Unicode code point sequence. But HTTP header values don't have to be. I.e. valid HTTP header values cannot be represented as a proto string. Yet the validation rule is on the string type.

Steps to Reproduce

message HeaderValues {
  string val = 1 [(buf.validate.field).string.well_known_regex = KNOWN_REGEX_HTTP_HEADER_VALUE];
}
func TestAbc(t *testing.T) {
	val := []byte("\xff")
	valid := utf8.Valid(val)
	assert.True(t, valid, "Invalid UTF-8")

	msg := &HeaderValues{
		Val: "\xff",
	}
	_, err := proto.Marshal(msg)
	assert.NoError(t, err, "proto.Marshal() failed")

	v, err := protovalidate.New()
	require.NoError(t, err)
	err = v.Validate(msg)
	assert.NoError(t, err)
}

Prints:

=== RUN   TestAbc
    prototool_test.go:16: 
        	Error Trace:	file_test.go:16
        	Error:      	Should be true
        	Test:       	TestAbc
        	Messages:   	Invalid UTF-8
    prototool_test.go:27: 
        	Error Trace:	file_test.go:27
        	Error:      	Received unexpected error:
        	            	string field contains invalid UTF-8
        	Test:       	TestAbc
        	Messages:   	proto.Marshal() failed
--- FAIL: TestAbc (0.02s)

FAIL

Process finished with the exit code 1

Expected Behavior

Validation rule should be on bytes. At least there too, not just on string.

Actual Behavior

Validation rule on string that is problematic.

Possible Solution

Additional Context

HTTP header value spec: https://datatracker.ietf.org/doc/html/rfc9110#name-field-values

See original provenance of these regex here: bufbuild/protoc-gen-validate#297

I'd actually rather see these removed from protovalidate, and users can provide predefined constraints that match the semantics they want (according to the thread above, the RFC recommends only using ASCII).

ash2k commented

As I explained in the description, I think the regexes the way they are right now cannot be used. So I agree something needs to change. To be clear, this is not a problem for me, I just wanted to report the issue I noticed.