rust-lang/regex

Expose whether a regex_automata error was a size overflow or another error

konstin opened this issue · 2 comments

I'm building a DFA from user-provided expressions for a fast-path optimization, which I can skip when the DFA would be too large. Currently, there is no way to tell whether building the DFA failed because of a syntax error (which I want to raise to the user) or because of a size overflow (which is non-fatal). It would be great if regex_automata::dfa::dense::BuildError allowed inspecting whether it is a size error.

Motivating example:

let dfa_builder = dfa::dense::Builder::new()
    .configure(
        dfa::dense::Config::new()
            // DFA can grow exponentially, in which case we bail out
            .dfa_size_limit(Some(DFA_SIZE_LIMIT))
            .determinize_size_limit(Some(DFA_SIZE_LIMIT)),
    )
    .build_many(&regexes);
let dfa = match dfa_builder {
    Ok(dfa) => Some(dfa),
    Err(_) => {
        // TODO(konsti): `regex_automata::dfa::dense::BuildError` should allow asking whether
        // it is a size error
        warn!(
            "Glob expressions regex is larger than {DFA_SIZE_LIMIT} bytes, \
            falling back to full directory traversal!"
        );
        None
    }
};

Yeah, I think adding a simple predicate, like is_exceeded_size_limit or something like that, would be appropriate here. There are multiple different ways to blow the size limit. There are the configured size limits of course, but there are also built-in size limits due to states and patterns using u32 as their identifier type (e.g., if you try to build a DFA with more than 2^32 - 1 states). So even if all configured size limits are disabled, you can still get a size limit error.

The other two classes of errors are "NFA failed to build" and "regex feature unsupported." The latter, I believe, can never happen if Unicode mode is disabled. The former is only relevant if you're using the convenience APIs that build a DFA from a pattern string (which you are here). But even that can be avoided by using Builder::build_from_nfa.

So if you use Builder::build_from_nfa and disable Unicode mode, then the only possible error remaining in BuildError is a size limit related error. This means you can work around this today, but I agree that adding a predicate here would make use cases like yours a little smoother.

@konstin Do you have ideas for what the predicate should be named?

is_size_limit_exceeded sounds good
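
For illustration, here is a rough sketch of how a caller might use such a predicate once it exists. It does not use regex_automata at all: BuildError and BuildErrorKind below are hypothetical stand-ins that only mirror the error classes described in this thread, and fast_path is an invented helper name. Only the predicate name is_size_limit_exceeded comes from the discussion above; the real type's internals may look quite different.

```rust
use std::fmt;

// Hypothetical stand-in for the error classes discussed in this thread.
// This is NOT regex_automata's actual definition.
#[derive(Debug)]
enum BuildErrorKind {
    /// The underlying NFA failed to build (for example, a syntax error).
    Nfa(String),
    /// The pattern uses a regex feature that is unsupported.
    Unsupported(String),
    /// A configured size limit was exceeded.
    ConfiguredLimit { limit: usize },
    /// A built-in limit was exceeded (too many states or patterns for the
    /// identifier type), which can happen even with no configured limits.
    TooManyIdentifiers,
}

// Hypothetical stand-in for the DFA build error.
#[derive(Debug)]
struct BuildError {
    kind: BuildErrorKind,
}

impl BuildError {
    /// Returns true for both configured and built-in size limit errors.
    fn is_size_limit_exceeded(&self) -> bool {
        matches!(
            self.kind,
            BuildErrorKind::ConfiguredLimit { .. } | BuildErrorKind::TooManyIdentifiers
        )
    }
}

impl fmt::Display for BuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.kind {
            BuildErrorKind::Nfa(msg) => write!(f, "NFA failed to build: {msg}"),
            BuildErrorKind::Unsupported(msg) => write!(f, "unsupported regex feature: {msg}"),
            BuildErrorKind::ConfiguredLimit { limit } => {
                write!(f, "exceeded configured size limit of {limit} bytes")
            }
            BuildErrorKind::TooManyIdentifiers => write!(f, "too many states or patterns"),
        }
    }
}

/// Caller-side handling: a size limit error only disables the fast path
/// (Ok(None)), while every other error is returned to the user.
fn fast_path<T>(result: Result<T, BuildError>) -> Result<Option<T>, BuildError> {
    match result {
        Ok(dfa) => Ok(Some(dfa)),
        Err(err) if err.is_size_limit_exceeded() => Ok(None),
        Err(err) => Err(err),
    }
}
```

With something like this, the `Err(_)` arm in the motivating example could log the fallback warning only when the predicate returns true and surface the error otherwise.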