rust-lang/regex

regex-lite with a &[u8] haystack

SimonSapin opened this issue · 2 comments

Describe your feature request

regex::bytes::Regex can search &[u8] haystacks that are not necessarily well-formed UTF-8. This is great for text-based file formats that predate Unicode and hand-wave encodings as a platform detail.
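To make the motivation concrete, here is a minimal std-only sketch of why a `&str`-only API falls short for such files. A Latin-1 encoded line is not valid UTF-8, so it cannot be viewed as `&str` without a lossy conversion, yet a byte-level search over the same data works fine. The `find_bytes` helper is hypothetical and only stands in for a real matcher; it is not part of regex or regex-lite.

```rust
/// Hypothetical helper (not from any crate): naive byte-substring search,
/// standing in for a regex engine that accepts `&[u8]` haystacks.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, so treat the empty needle as matching at 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    // "café=1\n" encoded as Latin-1: the lone 0xE9 byte is not valid UTF-8.
    let haystack: &[u8] = b"caf\xE9=1\n";

    // A `&str`-based API cannot take this haystack as-is.
    assert!(std::str::from_utf8(haystack).is_err());

    // Searching the raw bytes still works.
    assert_eq!(find_bytes(haystack, b"=1"), Some(4));
}
```

With regex::bytes::Regex the same haystack can be passed directly to the search methods, which is the behavior being requested for regex-lite.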

Could the regex-lite crate have a similar feature? As far as I can tell, it doesn't as of 0.1.5.

Yeah that is definitely the intent. The main internal APIs are specifically defined on &[u8]. I just didn't do this initially because I wasn't 100% certain folks would want it.

The main hitch here is that I think it needs to be an opt-in feature that is disabled by default. The reason is that it will add a fair bit of code (basically a copy of string.rs, but for bytes), and the primary purpose of regex-lite is to keep binary size small and compilation times short.

I'm not sure when I'll have a chance to work on this, but I could review a PR adding this.

Thanks for the feedback. For what it’s worth, for now I went with regex::bytes and disabled Unicode-related cargo features.
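For anyone landing here, that workaround might look roughly like this in Cargo.toml. The exact feature list is an assumption, since the comment above doesn't say which features were kept; the idea is just to turn off the defaults and re-enable everything except `unicode`.

```toml
[dependencies]
# Assumed feature selection: drop the default `unicode` feature, keep `std` and `perf`.
regex = { version = "1", default-features = false, features = ["std", "perf"] }
```

With Unicode support compiled out, patterns that rely on Unicode-aware classes such as `\w` will likely need to opt out of Unicode mode, for example with `(?-u)`, or they will fail to compile.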