facebookincubator/fastmod

Feature request: supporting lookarounds (PCRE2 or fancy-regex)

LeoniePhiline opened this issue · 2 comments

Feature request

First of all, thank you very much for open-sourcing fastmod! I find it quite joyful to use!

At the moment, it does not have support for positive and negative lookaheads and lookbehinds.

These are often essential for complex searches.

Security

Note that fastmod is usually run over trusted code, not over untrusted user input, so an opt-in --fancy flag (originally proposed as --pcre2) should not pose any unacceptable security risk.

Prior art

Note that ripgrep (which does search, but not replace) has optional support for switching its regex engine to use PCRE2.

Among other things, this makes it possible to use look-around and backreferences in your patterns, which are not supported in ripgrep's default regex engine. PCRE2 support can be enabled with -P/--pcre2 (use PCRE2 always) or --auto-hybrid-regex (use PCRE2 only if needed). An alternative syntax is provided via the --engine (default|pcre2|auto-hybrid) option.
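If fastmod were to mirror ripgrep's engine switch, the option could be modelled roughly as follows. This is only a sketch using the standard library: `EngineChoice` and `use_lookaround_engine` are hypothetical names, not part of fastmod or ripgrep, and treating "only if needed" as "the default engine failed to compile the pattern" is an assumption made for illustration.

```rust
use std::str::FromStr;

/// Hypothetical engine selection, modelled on the values of ripgrep's
/// `--engine (default|pcre2|auto-hybrid)` option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EngineChoice {
    Default,
    Pcre2,
    AutoHybrid,
}

impl FromStr for EngineChoice {
    type Err = String;

    /// Parse the option value as it would be given on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(EngineChoice::Default),
            "pcre2" => Ok(EngineChoice::Pcre2),
            "auto-hybrid" => Ok(EngineChoice::AutoHybrid),
            other => Err(format!("unknown engine: {}", other)),
        }
    }
}

/// Decide whether the lookaround-capable engine should be used.
/// `default_compiles` says whether the pattern compiled with the default
/// engine (assumption: this is what "only if needed" means here).
fn use_lookaround_engine(choice: EngineChoice, default_compiles: bool) -> bool {
    match choice {
        EngineChoice::Default => false,
        EngineChoice::Pcre2 => true,
        EngineChoice::AutoHybrid => !default_compiles,
    }
}
```

With this shape, an opt-in flag keeps the default engine untouched unless the user asks for something else.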

Pivot from rust-pcre2 to fancy-regex

See #49 (comment). grep::pcre2 is unlikely to expose pcre2_substitute any time soon, due to obvious maintenance overload.

The fancy-regex crate is the go-to escape from Rust's regex crate wherever lookarounds are unavoidable.

Implementation

https://docs.rs/pcre2/ should be usable as a base for this feature.

This crate is recommended by, and as a replacement for, https://github.com/BurntSushi/ripgrep/blob/master/crates/pcre2/README.md

However, since fastmod already uses grep (rg as a library), it might just as well enable the pcre2 cargo dependency flag and use grep::pcre2 just as ripgrep does: https://github.com/BurntSushi/ripgrep/blob/327d74f1616e135a6eb09a0c3016f8f45cfc0cfc/crates/core/search.rs#L199

Enum-dispatched regex matcher and replacer based on regex and fancy-regex.
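To illustrate what enum dispatch means here, the following is a minimal, standard-library-only sketch. The two variants are toy stand-ins, not real engines: in an actual implementation they would presumably wrap a compiled `regex::Regex` and a compiled `fancy_regex::Regex`. The names `Matcher`, `Plain` and `NotFollowedBy` are hypothetical. The second variant imitates a negative lookahead of the form `needle(?!forbidden)` on literal strings only.

```rust
/// Toy stand-ins for two regex backends (hypothetical; a real version would
/// hold compiled regexes from `regex` and `fancy-regex` instead).
enum Matcher {
    /// Plain literal search, standing in for the default engine.
    Plain { needle: String },
    /// Literal search with a negative lookahead, standing in for the
    /// lookaround-capable engine: matches `needle` unless `forbidden` follows.
    NotFollowedBy { needle: String, forbidden: String },
}

impl Matcher {
    /// Replace every match with `replacement`, dispatching on the variant.
    fn replace_all(&self, text: &str, replacement: &str) -> String {
        match self {
            Matcher::Plain { needle } => {
                if needle.is_empty() {
                    return text.to_string();
                }
                text.replace(needle.as_str(), replacement)
            }
            Matcher::NotFollowedBy { needle, forbidden } => {
                if needle.is_empty() {
                    return text.to_string();
                }
                let mut out = String::with_capacity(text.len());
                let mut rest = text;
                while let Some(pos) = rest.find(needle.as_str()) {
                    // Copy everything before the candidate match.
                    out.push_str(&rest[..pos]);
                    let after = &rest[pos + needle.len()..];
                    if after.starts_with(forbidden.as_str()) {
                        // Lookahead rejected this position: copy one
                        // character and retry from the next one.
                        let ch = rest[pos..].chars().next().unwrap();
                        out.push(ch);
                        rest = &rest[pos + ch.len_utf8()..];
                    } else {
                        out.push_str(replacement);
                        rest = after;
                    }
                }
                out.push_str(rest);
                out
            }
        }
    }
}
```

The caller only ever sees `Matcher::replace_all`, so the choice of engine is made once at construction time and no trait objects are needed. For example, replacing `foo` with `X` in `foo foobar` rewrites both occurrences with the `Plain` variant, but only the first with `NotFollowedBy { needle: "foo", forbidden: "bar" }`.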

Update:

The fastmod crate is updated in #50 (can be merged!).

Judging by BurntSushi/rust-pcre2#26 and BurntSushi/rust-pcre2#27, implementing PCRE2 support looks like a dead end.

Andrew appears to not have the bandwidth for maintenance, as even quality PRs from years ago are unreviewed.

I will drop the attempts to implement PCRE2 support and pivot to fancy-regex, which does not require unsafe code (implemented in pure Rust).

It supports lookarounds, with the same risk of catastrophic backtracking as PCRE2. That risk is not relevant to fastmod, which is run over trusted code.

Another update: Andrew Gallant expressed openness towards accepting a PR to the pcre2 crate to expose substitution.

BurntSushi/ripgrep#2763 (reply in thread)

I am interested in trying to create a quality patch and get it merged. If that succeeds, then fastmod could get its PCRE2 mode after all.


The stale PRs in https://github.com/BurntSushi/rust-pcre2 do not give me much hope for a fast solution, but there are good reasons to hope this entanglement can be resolved:

Fish shell appears to have more bandwidth (or more need): their fork already implements, and maintains, a PCRE2 substitution patch.

Their fork remains maintained, since UTF-32 matching was not upstreamed after all.

This means the situation is quite fragmented:

  • BurntSushi's grep-pcre2 (which is exposed as grep::pcre2) uses his pcre2-sys C bindings crate, which does not have support for substitution and probably never will.
  • Fish shell's maintained pcre2-sys fork does support substitution. But in order to use it, grep-pcre2 would need to be forked.

All this sounds quite doable for a private project, but getting the changes merged back into https://github.com/facebookincubator/ sounds ... somewhat unlikely?

➡️ Ideally, the substitution feature (which BurntSushi has indicated might well be accepted into his upstream pcre2 & pcre2-sys crates) would be ported from Fish shell's fork into BurntSushi's upstream.

A possible route to success: