paurkedal/ppx_regexp

Not_found exception

konstantin-korovin opened this issue · 5 comments

Thank you for a nice library.

I have an issue with an unexpected Not_found exception.

This works as expected:

```ocaml
let parse_line i =
  match%pcre i with
  | {|(?<v1>(abc))[[:space:]]*(?<v2>(xyz))|} ->
    Printf.printf "string: v1: %s v2: %s\n" v1 v2
  | {|(?<f>[-+]?[[:digit:]]+.[[:digit:]]*)|} ->
    Printf.printf "float: %f \n" (float_of_string f)
  | _ -> failwith "parse_line"

let () = parse_line "abc xyz"
```

Output:

```
string: v1: abc v2: xyz
```

If I swap the first two match cases:

```ocaml
let parse_line i =
  match%pcre i with
  | {|(?<f>[-+]?[[:digit:]]+.[[:digit:]]*)|} ->
    Printf.printf "float: %f \n" (float_of_string f)
  | {|(?<v1>(abc))[[:space:]]*(?<v2>(xyz))|} ->
    Printf.printf "string: v1: %s v2: %s\n" v1 v2
  | _ -> failwith "parse_line"

let () = parse_line "abc xyz"
```

Output:

```
Fatal error: exception Not_found
```

PS: As a side question, is it possible to define a regular expression (or a string representing a regular expression) as an OCaml variable and use it inside {| |}, to avoid copying definitions?

Thanks,
Konstantin

Drup commented

@paurkedal It seems you have a bug in your handling of offsets! The exception comes from a misaligned Re.get. The bug doesn't happen in the tyre version.

@konstantin-korovin For your side question: ppx_tyre solves precisely that problem. You can look at the documentation. Also, you should use the code block syntax when you post code on GitHub, like so: ```ocaml <the code> ```. I fixed your first message.

konstantin-korovin commented

@Drup many thanks for your quick reply and for fixing my message. I'll try tyre.

paurkedal commented

The bug was caused by top-level group elimination being applied when extracting the bindings but not when extracting the regular expression. I integrated your test and fixed it. Thanks!
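To see why the order of the cases matters, here is a toy model of the group accounting. It is only a sketch and not the actual ppx_regexp code: the `case` record, the idea that each case is wrapped in one extra group, and all function names are assumptions made for illustration. It only shows how counting groups differently on the binding side and on the regex side shifts the offsets of every later case.

```ocaml
(* Toy model, not ppx_regexp internals. A case is described by how many
   capture groups its pattern contains and whether the whole pattern is a
   single top-level group. *)
type case = { groups : int; top_level_group : bool }

(* Assumed layout: in the combined regex, every case gets one wrapper group
   followed by its own groups. *)
let slots_in_regex c = 1 + c.groups

(* Hypothetical buggy accounting on the binding side: a top-level group is
   merged with the wrapper, so one slot fewer is counted. *)
let slots_in_bindings c =
  if c.top_level_group then c.groups else 1 + c.groups

(* Starting group offset of each case; group 0 is the whole match. *)
let offsets slots cases =
  let _, rev =
    List.fold_left
      (fun (off, acc) c -> (off + slots c, off :: acc))
      (1, []) cases
  in
  List.rev rev

(* (?<f>...) is one top-level group. *)
let float_case = { groups = 1; top_level_group = true }

(* (?<v1>(abc))[[:space:]]*(?<v2>(xyz)) has four groups, none top-level. *)
let string_case = { groups = 4; top_level_group = false }

let show l = String.concat "; " (List.map string_of_int l)

let () =
  let report name cases =
    Printf.printf "%s: regex [%s], bindings [%s]\n" name
      (show (offsets slots_in_regex cases))
      (show (offsets slots_in_bindings cases))
  in
  report "float first" [ float_case; string_case ];
  report "string first" [ string_case; float_case ]
```

In this model the two sides agree when the string case comes first, because the case with the top-level group is last and nothing follows it. With the float case first, the string case's bindings start one slot too early, which is the kind of misalignment that would make a group lookup fail.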

I considered your suggestion, but decided against implementing (?&...) in %pcre, at least for now. The main reason is scoping. The current PPX assumes all regular expressions are global constants, which makes it easy to compile them at program initialization time. This assumption could be dropped, but doing so would require more complex PPX code to detect the optimal initialization point with respect to scoping.
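The scoping concern can be illustrated with a small sketch in plain OCaml. Nothing here is ppx_regexp or tyre code: `compile` is a hypothetical stand-in for an expensive regex compiler (it just does exact string comparison), and the counter exists only to make the number of compilations visible.

```ocaml
(* Hypothetical stand-in for a regex compiler; counts how often it runs. *)
let compile_count = ref 0

let compile (pattern : string) : string -> bool =
  incr compile_count;
  (* Placeholder matcher: exact equality stands in for real matching. *)
  fun s -> s = pattern

(* A pattern that is a global constant can be compiled once, when the
   module is initialized. *)
let is_abc = compile "abc"

(* A pattern that depends on a local variable cannot be hoisted to the top
   level; in this naive version it is compiled on every call. *)
let matches_word word s =
  let re = compile word in
  re s

let () =
  ignore (is_abc "abc");
  ignore (is_abc "xyz");
  ignore (matches_word "abc" "abc");
  ignore (matches_word "abc" "xyz");
  Printf.printf "compilations: %d\n" !compile_count
```

The two calls through `is_abc` share one compilation, while each call to `matches_word` triggers its own, so the program prints `compilations: 3`. Choosing a better place to compile in the second situation is the extra work a PPX would have to do once patterns may refer to variables in scope.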

So I can also recommend looking into %tyre, which may be better suited to more complex use cases anyway. It leaves the initialization point up to the user, and thus has no issue with scope dependency.

I'll prepare a bugfix release tomorrow.

This was fixed in v0.4.2.