Trouble splitting string with Regexp
Closed this issue · 50 comments
Hi,
I'm having some issues splitting strings with Regexps in mruby-onig-regexp
The following:
"<%= 1 + 1 %>".split(/(<%=)|(%>)/)doesn't split and returns ["<%= 1 + 1 %>"]. The same expression splits fine in MRI. I checked that mruby-onig-regexp substitutes String#split with a Regexp-aware implementation, so either I'm doing something seriously wrong or maybe there's a bug in mruby-onig-regexp
thanks,
Ricardo
Also, this returns nil, which is a bit strange:
/(%)/.match("<%= 1 + 1 %>")Another thing that may be relevant, I'm getting a segfault with the following code (in mirb) adapted from the docs:
Regexp.compile('abc').match('abcdef')Are you using mruby-require? Then do require 'mruby-onig-regexp'. If not, you will get same result as mine.
Hi,
Thanks for your answer.
Requiring mruby-onig-regexp does improve things, but not completely. For example, this no longer returns nil and matches as supposed:
Regexp.compile('abc').match('abcdef')but
"<%= 1 + 1 %>".split(/(<%=)|(%>)/)still doesn't split properly (returns an array with a single item instead of 4 items). The same expression works well in MRI.
thanks,
Ricardo
Ah, I understand your said just now. It seems broken. I'll fix in later.
Try to use mruby-pcre-regexp.
I tried that, but mruby-pcre-regexp isn't compiling at this time (against mruby/master). I get the following error:
build/mrbgems/mruby-pcre-regexp//src/mruby_pcre_regexp.c:168: error: ‘PCRE_CASELESS’ undeclared (first use in this function)PCRE_CASELESS is defined as 0x00000001 in my pcre.h
Below is a part of version information.
#define PCRE_MAJOR 8
#define PCRE_MINOR 32
#define PCRE_PRERELEASE
#define PCRE_DATE 2012-11-30
didn't have pcre properly installed, it's working now. it doesn't really give the same result as MRI, but does split the string (returns ["", " 1 + 1 ", ""]). I'll recheck later tonight and get back to you.
Hi,
PCRE does work better, but since the regular expressions I'm using were written for oniguruma, it doesn't split exactly how I need it to. since I'm porting existing software, it wouldn't be ideal to change the regexps.
the ideal for me would be to use oniguruma, if the bug above is fixed, otherwise I'll change the regexps to use PCRE (or I'll try fixing the oniguruma bug).
thanks,
Ricardo
Could you please write test case that you expected?
ok, I'll do that. I tried running mrbtest, but it segfaults. is there any way to fix it, or should I install something like https://github.com/iij/mruby-mtest ?
thanks,
Ricardo
Ah, I don't mind if you write simple script that reproduce it.
Hi,
This very basic script demonstrates the problem. If you run it in MRI it will work fine, in mruby with onig-regexp it will fail.
def test_split
raise 'Split by regexp returned wrong results' unless '<%= 1 + 1 %>'.split(/(<%=)|(%>)/) == ['', '<%=', ' 1 + 1 ', '%>']
'Split by regexp correct'
end
test_splitAfter I get the test suite running I'll add it as a proper test.
thanks,
Ricardo
Currently, you have two cases issue, isn't it? :)
Sorry mattn, couldn't understand your question :(
Sorry, I was thinking you has more problems, so I wonder it's enough
cases to check.
On 4/4/13, Ricardo Trindade notifications@github.com wrote:
Sorry mattn, couldn't understand your question :(
Reply to this email directly or view it on GitHub:
#4 (comment)
- Yasuhiro Matsumoto
For now this is the only problem I'm having. If I find other problems with onig-regexp, I'll create test scripts for them.
thanks,
Ricardo
I fixed it. Please try in latest.
It does work, but only if mruby-require is being used. If mruby-require is not being included in the build, it still fails as before. This is not a problem for me as I can include mruby-require, but it doesn't appear to be 100% ok yet.
thanks,
Ricardo
Did you do require ?
mruby define String#split even if Regexp is disabled.
When not using mruby-require, I didn't require since require isn't defined. But in that case Regexp == OnigRegexp, so I'm assuming that onig-regexp gets included, no ?
- include
mruby-onig-regexp', not includemruby-require`
should work correctly. - include
mruby-onig-regexp', includemruby-requireyou need dorequire 'mruby-onig-regexp'`
This case doesn't work correctly for me:
- include mruby-onig-regexp', not includemruby-require`
should work correctly (in this case the array returned from splitting the string above has just one item, instead of 4)
In this case, mruby defines default String#split that Regexp disabled. It works but it treat pattern as string. So return non-splited strings.
So mruby-onig-regexp doesn't fully work unless it's used with mruby-require ?
I get the same results as in your screenshot when I include mruby-require in the build. That case works well for me. My problem is when I don't include mruby-require in the build.
You mean follow?
MRuby::Build.new do |conf|
# load specific toolchain settings
toolchain :gcc
conf.gem "#{root}/mrbgems/mruby-sprintf"
conf.gem "#{root}/mrbgems/mruby-print"
conf.gem "#{root}/mrbgems/mruby-math"
conf.gem "#{root}/mrbgems/mruby-time"
conf.gem "#{root}/mrbgems/mruby-struct"
conf.gem "#{root}/mrbgems/mruby-enum-ext"
conf.gem "#{root}/mrbgems/mruby-string-ext"
conf.gem "#{root}/mrbgems/mruby-numeric-ext"
conf.gem "#{root}/mrbgems/mruby-array-ext"
conf.gem "#{root}/mrbgems/mruby-hash-ext"
conf.gem "#{root}/mrbgems/mruby-random"
conf.gem "#{root}/mrbgems/mruby-eval"
conf.gem :git => 'git@github.com:mattn/mruby-onig-regexp.git'
#conf.gem :git => 'git@mattn/mruby-require.git'
endI built mruby with build_config.rb above. And working good for me.
This is what happens to me - http://cl.ly/image/3g3C130e2g3B
Probably there is a difference in the mrbgems we're including, I'll compare both lists
Did you update mruby-onig-regexp?
I'm cleaning mruby/build between tests, so I'm getting a new checkout of mruby-onig-regexp each time. I'm also using latest mruby (just updated now).
I'm still getting the same result - http://cl.ly/image/1k14130s2U0y
What toolchain do you use? mingw? msys? visual studio?
Could you please compile & run follow?
#include <stdio.h>
int
main(int argc, char* argv[]) {
puts(onig_version());
return 0;
}I get 5.9.3
I'm using gcc (on macOS X).
I'm also with oniguruma 5.9.3
Try following
onig.c
#include <stdio.h>
#include <memory.h>
#include <onigposix.h>
int
main(int argc, char* argv[]) {
size_t nmatch = 999;
regmatch_t match[nmatch];
int regerr;
int i;
char err[256];
char *ptr = "<%= 1 + 1 %>";
regex_t reg;
regerr = regcomp(®, "(<%=)|(%>)", REG_EXTENDED | REG_NEWLINE);
if (regerr) {
regerror(regerr, ®, err, sizeof(err));
regfree(®);
puts(err);
exit(1);
}
for (i = 0; i < nmatch; i++)
match[i].rm_so = -1;
regerr = regexec(®, "<%= 1 + 1 %>", nmatch, match, 0);
if (regerr == REG_NOMATCH) {
puts("no match");
exit(1);
}
if (regerr) {
regerror(regerr, ®, err, sizeof(err));
puts(err);
exit(1);
}
for (i = 0; i < nmatch; i++) {
if (match[i].rm_so != -1) {
fwrite(ptr + match[i].rm_so, match[i].rm_eo - match[i].rm_so, 1, stdout);
putchar('\n');
}
}
regfree(®);
return 0;
}$ gcc -o onig onig.c -lonig
$ ./onig
I got
<%=
<%=
Same here - http://cl.ly/image/2a2y2O2E0t2u
Hmm, it seems working for me. I can't understand.
> "a1b".split /\d/
=> ["a", "b"]
Is this working?
No, also doesn't work - http://cl.ly/image/3c3t2e0W2C3Z
At this point the only thing I can think of is the platform difference - mac vs windows. Do you have anyone near you with a Mac that can test ? I do have windows machines, but they don't have cygwin etc.. so it wouldn't be so easy (I can do it on the weekend if no other option exists)
But you get same result in onig.c, so I don't think platform differences.
Yes, you are right. I'll try again later tonight with a fresh checkout of mruby, to rule out any other issue.
Thanks for your work
On 4/5/13, Ricardo Trindade notifications@github.com wrote:
Yes, you are right. I'll try again later tonight with a fresh checkout of
mruby, to rule out any other issue.
Reply to this email directly or view it on GitHub:
#4 (comment)
- Yasuhiro Matsumoto
Just tried again with a new git checkout, and getting the same behaviour.
- With mruby-require, splitting works fine
- When not using mruby-require, an array with a single element is returned, instead of the expected result
Could you please try to add printf("something\n"); code to make sure
passing C's functions?
On 4/12/13, Ricardo Trindade notifications@github.com wrote:
Just tried again with a new git checkout, and getting the same behaviour.
- With mruby-require, splitting works fine
- When not using mruby-require, an array with a single element is returned,
instead of the expected result
Reply to this email directly or view it on GitHub:
#4 (comment)
- Yasuhiro Matsumoto
I did that and found a difference. With mruby-require, the onig_regexp_match C function is called twice. Without mruby-require, it's only called once.
http://cl.ly/image/0o3k3X3k3S3D - The difference between both mirb sessions show shows this (there was a build in between)
Do you get follow?
> /\d/.match('a3d')[0]
=> "3"
Cases that match method return nil,
Invalid argument for match (ex match("a3d", -1))
https://github.com/mattn/mruby-onig-regexp/blob/master/src/mruby_onig_regexp.c#L93
Or no matches
https://github.com/mattn/mruby-onig-regexp/blob/master/src/mruby_onig_regexp.c#L106
First cast should be checked with
> /\d/.match("a3d", -1)
=> nil
> /\d/.match("a3d", 0)
=> #<OnigMatchData:0xe0e6f0>
And second case should be checked with
> //
=> #<OnigRegexp:0xe0ea98 @source="", @regexp=#<Object:0xe0ea80>>
@regexp shouldn't be nil


