utopia-group/regel

Issues running stack overflow benchmarks


I'm having a couple of issues trying to run the Stack Overflow benchmarks.

  1. Timeouts. Most of the benchmarks seem to time out before synthesizing a consistent regex. I've tried increasing the timeout (up to a full minute), but the issue persists. Is this expected behavior? If so, what timeout is appropriate to ensure the benchmarks complete? Or is it possible to run Regel without timeouts?

  2. Java errors. When running exp.py, some benchmarks fail to complete due to Java errors. For example, benchmark 35 throws a "StringIndexOutOfBoundsException" and benchmark 70 throws a "NullPointerException". Any idea what causes this? I've attached my full terminal output below for reference.
    terminal_output_2.txt

  3. Running interactively. When running interactively, the top-k synthesized regexes often appear to be identical or to include redundancies. This makes it difficult to input guiding examples. It seems equivalent synthesized regexes are not being eliminated?

Hello,

  1. Are you running in Sketch Completion mode? Note that for each benchmark we run the top-25 sketches, and not every sketch is guaranteed to yield a consistent regex. This means that a benchmark almost always contains some sketches that will time out under any "realistic" timeout, so it is necessary to run Regel with timeouts. With the timeout set to 600 seconds per benchmark, there are 11 StackOverflow benchmarks for which all sketches time out. Some benchmarks need more than a minute to find the first consistent regex, but in general a benchmark should finish within a couple of seconds.

  2. I think I have just fixed this issue. Let me know if there are any other benchmarks I didn't fix.

  3. Yes, equivalent synthesized regexes are not being eliminated. It's a nice feature to have, and I will add it in the future.
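
Until that feature exists, redundant candidates from point 3 could be filtered on the user side. Below is a minimal sketch of one way to do it, not how Regel works internally: it groups candidates by which strings in a probe set they accept and keeps the first of each group. It assumes the candidates have already been translated to Python `re` syntax (Regel's own DSL is different), and the function name `dedupe_by_behavior` and the sample patterns are made up for illustration. Agreement on a finite probe set only approximates equivalence, so two regexes that differ outside the probes would still be merged.

```python
import re
from typing import Iterable, List


def dedupe_by_behavior(patterns: Iterable[str], probes: Iterable[str]) -> List[str]:
    """Keep the first pattern of each group that accepts the same probe strings.

    Assumption: patterns are Python `re` patterns, not Regel DSL terms.
    """
    probes = list(probes)
    seen = set()
    kept = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        # The accept/reject vector over the probes acts as a behavioral fingerprint.
        signature = tuple(compiled.fullmatch(s) is not None for s in probes)
        if signature not in seen:
            seen.add(signature)
            kept.append(pattern)
    return kept


# Hypothetical top-k output and probe strings, for illustration only.
candidates = [r"[0-9]+", r"\d+", r"[0-9][0-9]*", r"[0-9]*"]
probes = ["", "7", "42", "a1", "007"]
print(dedupe_by_behavior(candidates, probes))
```

The positive and negative examples already given for a benchmark are a natural starting probe set; adding a few extra strings makes it less likely that genuinely different regexes are merged.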

Thanks for the update. I reran the benchmarks in sketch completion mode and am still getting an error for benchmarks 35, 60, 16, 28, 17, 26, 110, 91, 30, 106, 55, 70. The update fixed benchmarks 93, 7, 98, 63, 113 (which were all throwing an "example type incorrect" exception). I've attached my terminal output for reference.
terminal_output_rerun.txt

Thanks for providing the outputs! I think all these benchmark reading errors are fixed now.

After the update I'm still getting an error for 28, 60, and 110. It looks like the ground truth for 28 has an extra "concat" and changing it to concat(<A>,concat(<B>,concat(or(<1>,or(<2>,or(<3>,or(<4>,or(<5>,<6>))))),concat(<_>,concat(<num>,concat(<_>,concat(optional(<num1-9>),concat(<.>,concat(<e>,concat(<x>,<e>)))))))))) seems to work. Benchmark 60 doesn't have a ground truth (right now it's just one of the example strings). I'm not sure what's wrong with 110. There seems to be a problem parsing the regex startwith(<{>).

I fixed benchmarks 28 and 60. The error in benchmark 110 was caused by the current regex grammar not being able to recognize the terminal "<{>", so I updated the grammar to support this symbol. Thanks!
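
For anyone who wants to catch this kind of ground-truth problem (an extra "concat", a missing parenthesis) before running a benchmark, here is a rough sketch of an arity checker. It is not Regel's parser. It only knows the four operators that appear in this thread, and the arities in `ARITY` are assumptions read off those examples (two arguments for concat and or, one for optional and startwith). It also assumes no whitespace in the input and treats anything of the form `<...>` as a terminal, which includes `<{>`.

```python
from typing import Optional

# Assumed arities, inferred from the expressions quoted in this thread.
ARITY = {"concat": 2, "or": 2, "optional": 1, "startwith": 1}


def _parse(text: str, pos: int) -> int:
    """Parse one expression starting at pos and return the index just past it."""
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    if text[pos] == "<":
        # A terminal has at least one character between '<' and '>'.
        end = text.find(">", pos + 2)
        if end == -1:
            raise ValueError(f"unterminated terminal at index {pos}")
        return end + 1
    open_paren = text.find("(", pos)
    if open_paren == -1 or text[pos:open_paren] not in ARITY:
        raise ValueError(f"unknown operator at index {pos}")
    name = text[pos:open_paren]
    pos = open_paren + 1
    argc = 0
    while True:
        pos = _parse(text, pos)
        argc += 1
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        break
    if pos >= len(text) or text[pos] != ")":
        raise ValueError(f"missing ')' for {name}")
    if argc != ARITY[name]:
        raise ValueError(f"{name} expects {ARITY[name]} argument(s), got {argc}")
    return pos + 1


def validate(text: str) -> Optional[str]:
    """Return None if text is well formed, otherwise a short error message."""
    try:
        end = _parse(text, 0)
    except ValueError as err:
        return str(err)
    return None if end == len(text) else f"trailing input at index {end}"


# Short made-up expressions, not the actual benchmark ground truths.
print(validate("concat(<A>,concat(<B>,<C>))"))
print(validate("concat(<A>,concat(concat(<B>,<C>)))"))
print(validate("startwith(<{>)"))
```

The second expression has the same shape as the problem reported for benchmark 28: a stray "concat" wraps a single argument, so the checker reports an arity mismatch instead of failing later at benchmark-reading time.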

Thanks. It looks like this fixes all the issues with the SO benchmarks!