MaLeLabTs/RegexGenerator

How does it work?

Closed this issue · 6 comments

Hi!
I need something similar to parse JavaScript stack strings in different environments and extract the type, message and stack frames (called function, location). Almost every browser has its own stack string format, and I can test only a few environments. E.g.:

Old Opera:

Statement on line 44: Type mismatch (usually a non-object value used where an object is required)
Backtrace:
  Line 44 of linked script file://localhost/G:/js/stacktrace.js
    this.undef();
  Line 31 of linked script file://localhost/G:/js/stacktrace.js
    ex = ex || this.createException();

V8 (Chrome, Node):

ReferenceError: x is not defined
    at repl:1:5
    at REPLServer.self.eval (repl.js:110:21)
    at repl.js:249:20
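For reference, the two sample formats above can be parsed by hand with a couple of regular expressions. This is a minimal sketch that assumes only the layouts shown; `parseStack`, `parseOperaStack` and `parseV8Stack` are hypothetical names, and real engines emit more variants (eval frames, console prefixes such as `Uncaught`, other Opera versions):

```javascript
// Hypothetical hand-written parsers for the two sample formats only.
const OPERA_FRAME = /^\s*Line (\d+) of linked script (\S+)\s*$/;
// "at func (file:line:col)" or "at file:line:col"
const V8_FRAME = /^\s*at (?:(.+?) \()?(.+):(\d+):(\d+)\)?$/;

function parseOperaStack(lines) {
  // Header: "Statement on line 44: Type mismatch (...)"; old Opera gives no error type here.
  const header = /^Statement on line \d+: (.*)$/.exec(lines[0]);
  const frames = [];
  for (let i = 1; i < lines.length; i++) {
    const m = OPERA_FRAME.exec(lines[i]);
    if (!m) continue;
    // The following indented line, unless it is another frame, holds the source text.
    const next = lines[i + 1];
    const source = next && !OPERA_FRAME.test(next) ? next.trim() : null;
    frames.push({ func: null, file: m[2], line: Number(m[1]), column: null, source });
  }
  return { type: null, message: header ? header[1] : lines[0], frames };
}

function parseV8Stack(lines) {
  // Header: "ReferenceError: x is not defined"
  const header = /^(\w+)(?:: (.*))?$/.exec(lines[0]);
  const frames = [];
  for (const line of lines.slice(1)) {
    const m = V8_FRAME.exec(line);
    if (!m) continue; // skip anything that is not a frame
    frames.push({ func: m[1] || null, file: m[2], line: Number(m[3]), column: Number(m[4]) });
  }
  return {
    type: header ? header[1] : null,
    message: header ? header[2] || "" : lines[0],
    frames,
  };
}

// Pick a parser by looking at the first line.
function parseStack(stack) {
  const lines = stack.split("\n");
  return lines[0].startsWith("Statement on line ") ? parseOperaStack(lines) : parseV8Stack(lines);
}
```

Hard-coding one regex per format like this is exactly what breaks when an untested environment formats its stack differently.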

It would be nice to write a parser that is adaptive and learns the actual environment on the fly. Can you tell me more about the algorithm you use to generate the regex, or how your library works in general?

Edit:
I just read in another issue that there is no JavaScript port because of a missing feature in the JS regex engine. Does that mean it is not possible to port this library at all? If so, is it possible to write something more specific, with a different algorithm, to solve my problem?

You are looking for a "JavaScript stack trace parser".

Here is one: https://github.com/errwischt/stacktrace-parser

I'm sure there are more out there.

@Inf3rno the RegexGenerator algorithm is documented in our paper here.

Regarding the lack of possessive quantifiers in the JS regex implementation (there are also small syntax differences), there are workarounds to mimic them. After a quick Google search I think I found the right one (not sure, a lot of time has passed): http://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead
To quote the example:

a++ becomes (?=(a+))\1.
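A quick way to check the workaround in a JavaScript console, using made-up inputs: the lookahead captures the longest run of `a`s, and because ECMAScript never backtracks into a lookahead once it has matched, `\1` consumes exactly that run. That is the behavior of the possessive `a++`, which never gives characters back:

```javascript
// Plain greedy a+ can backtrack: it gives back one "a" so that "ab" can match.
const backtracking = /^a+ab$/;
// (?=(a+))\1 emulates a++: the lookahead grabs every "a", the engine cannot
// backtrack into the lookahead, and \1 consumes exactly what it captured.
const atomic = /^(?=(a+))\1ab$/;

console.log(backtracking.test("aaab")); // true
console.log(atomic.test("aaab")); // false
console.log(/^(?=(a+))\1b$/.test("aaab")); // true: nothing has to be given back
```

The emulation adds a capturing group, so when translating a Java pattern with several possessive quantifiers, the numbers of later backreferences shift accordingly.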

I don't know how well our tool will work in your scenario; its effectiveness surely depends on the quality (and representativeness) of the provided training instances.
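To make the point about representative training data concrete, here is a hypothetical check (not RegexGenerator's actual fitness function) that scores an extractor by precision and recall over labeled examples. The example data is made up from the stacks in this thread: an extractor that only ever saw V8-style locations keeps perfect precision but misses the Opera location:

```javascript
// Hypothetical scoring helper. Each example is a text plus the substrings
// that should be extracted from it.
function scoreExtractor(regex, examples) {
  // matchAll requires the global flag.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const re = new RegExp(regex.source, flags);
  let hits = 0;
  let extracted = 0;
  let wantedTotal = 0;
  for (const { text, wanted } of examples) {
    const found = Array.from(text.matchAll(re), (m) => m[0]);
    const pending = [...wanted];
    for (const f of found) {
      // Simplification: compared by value, not by position in the text.
      const i = pending.indexOf(f);
      if (i !== -1) {
        hits++;
        pending.splice(i, 1);
      }
    }
    extracted += found.length;
    wantedTotal += wanted.length;
  }
  return {
    precision: extracted ? hits / extracted : 0,
    recall: wantedTotal ? hits / wantedTotal : 0,
  };
}

const examples = [
  { text: "at REPLServer.self.eval (repl.js:110:21)", wanted: ["repl.js:110:21"] },
  { text: "Line 44 of linked script file://localhost/G:/js/stacktrace.js", wanted: ["file://localhost/G:/js/stacktrace.js"] },
];
console.log(scoreExtractor(/\w+\.js:\d+:\d+/, examples)); // { precision: 1, recall: 0.5 }
```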

As for Java != JS regex: a from-scratch JS reimplementation is always possible, but it is not in our plans.
Regards

Last thought: our tool generates a text extractor (a regular expression) that extracts text entities. Perhaps this fits your scenario, but... using a regex to find things when a real parser is needed is usually wrong. Usually.

@uiteoi Thanks, but I am writing a stack trace parser, not looking for one. I already know every major project on the topic. :-)

@ftarlao In the meantime I checked your online example. It is impressive, but I don't think it is the right tool for my problem. Still, I have a strategy to build a regex pattern by probing the actual environment. For example, eval("throw new Error;"); will result in a stack like:

Uncaught Error
    at eval (eval at <anonymous> (x.html:2), <anonymous>:1:7)
    at x.html:2

while eval(" throw new Error;"); will result in a stack like:

Uncaught Error
    at eval (eval at <anonymous> (x.html:2), <anonymous>:1:13)
    at x.html:2

So from the difference, the parser will know where to look for the column. The same strategy can be applied to every changing part of the stack, and after a while I will have enough information to convert the at eval (eval at <anonymous> (x.html:2), <anonymous>:1:13) frame string to the at func (path:line:col) template. I am not sure whether this approach can be generalized, but I guess I am not the only one who uses it to teach programs. It is far from the intuitive approach a human would use, but I guess sooner or later we will have algorithms for that too.
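The probe-and-diff strategy can be sketched roughly as follows. This is a hypothetical sketch, assuming a V8-like engine where `Error.prototype.stack` exists (it is non-standard) and a second probe shifted right by 6 columns, which matches the sample columns 7 and 13. `probeStack`, `escapeRegExp` and `inferColumnPattern` are illustrative names: the last one looks for the single numeric field that moved by the shift and turns the frame into a regex that captures that field as the column:

```javascript
// Throw from eval'd code shifted right by `padding` spaces and return the stack.
// Error.prototype.stack is non-standard, so its format depends on the engine.
function probeStack(padding) {
  try {
    eval(" ".repeat(padding) + "throw new Error;");
  } catch (e) {
    return String(e.stack);
  }
  return null; // not reached: the evaluated code always throws
}

// Escape regex metacharacters so literal text can be embedded in a pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compare the same frame line from two probes. If exactly one numeric field
// changed, and by exactly `shift`, treat it as the column and return a regex
// capturing it; other numbers become \d+ and all remaining text stays literal.
function inferColumnPattern(lineA, lineB, shift) {
  // Splitting on a capturing group keeps the digit runs at the odd indices.
  const a = lineA.split(/(\d+)/);
  const b = lineB.split(/(\d+)/);
  if (a.length !== b.length) return null;
  let columnIndex = -1;
  for (let i = 0; i < a.length; i++) {
    if (a[i] === b[i]) continue;
    const isNumber = i % 2 === 1;
    if (isNumber && columnIndex === -1 && Number(b[i]) - Number(a[i]) === shift) {
      columnIndex = i;
    } else {
      return null; // something else changed, so this pair is not informative
    }
  }
  if (columnIndex === -1) return null;
  const source = a
    .map((tok, i) => (i === columnIndex ? "(\\d+)" : i % 2 === 1 ? "\\d+" : escapeRegExp(tok)))
    .join("");
  return new RegExp("^" + source + "$");
}

// Using the two eval frames from the stacks above:
const pattern = inferColumnPattern(
  "    at eval (eval at <anonymous> (x.html:2), <anonymous>:1:7)",
  "    at eval (eval at <anonymous> (x.html:2), <anonymous>:1:13)",
  6
);
console.log(pattern.exec("    at eval (eval at <anonymous> (x.html:5), <anonymous>:1:42)")[1]); // 42
```

Repeating the diff with probes that change only one other part (line, file, function) would, in principle, replace the remaining literals with capture groups one by one. In V8, the second line of `probeStack(0)` and `probeStack(6)` should be the eval frame and can be fed to `inferColumnPattern` with a shift of 6; engines that format frames differently or omit columns will simply yield `null`.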

I'm sorry that our tool is not the solution, hope you'll find one :-)