Conflicting symbols from oniguruma and libc cause segfault / bad memory writes
Ancurio opened this issue · 3 comments
Hi.
I'm not very familiar with oniguruma, so please excuse any possible misunderstandings.
Your gem always causes segfaults/aborts for me (even with the example) during mruby's GC heap cleanup. Valgrind reports multiple invalid writes in "regcomp" and "regexec". From what I have seen, oniguruma defines the type regex_t and the above functions, but the same type and symbols are also defined in the POSIX header "regex.h", and when I trace my program, I see that at e.g. "regcomp" the CPU indeed jumps into libc rather than libonig.
Now, I'm sure that the regex_t type defined in the POSIX header is bigger than the one in "onigposix.h", so what I think happens is that the gem allocates a regex_t that is too small, and the regcomp/regexec defined in libc then read and write past the end of that allocation.
This is how regex_t looks in onigposix.h:
typedef struct {
    void* onig; /* Oniguruma regex_t* */
    size_t re_nsub;
    int comp_options;
} regex_t;
And this is how it looks in the posix header regex.h:
struct re_pattern_buffer
{
    unsigned char *__REPB_PREFIX(buffer);
    unsigned long int __REPB_PREFIX(allocated);
    unsigned long int __REPB_PREFIX(used);
    reg_syntax_t __REPB_PREFIX(syntax);
    char *__REPB_PREFIX(fastmap);
    size_t re_nsub;
    unsigned __REPB_PREFIX(can_be_null) : 1;
    unsigned __REPB_PREFIX(regs_allocated) : 2;
    unsigned __REPB_PREFIX(fastmap_accurate) : 1;
    unsigned __REPB_PREFIX(no_sub) : 1;
    unsigned __REPB_PREFIX(not_bol) : 1;
    unsigned __REPB_PREFIX(not_eol) : 1;
    unsigned __REPB_PREFIX(newline_anchor) : 1;
};
typedef struct re_pattern_buffer regex_t;
I am using oniguruma 5.9.4
Edit: I just tried it out. In onig_regexp_init:
reg = malloc(sizeof(struct mrb_onig_regexp));
This allocates 32 bytes.
But if I check sizeof(regex_t) with the POSIX "regex.h" header, I get 64 bytes.
(I am on a 64-bit Linux system, btw.)
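For anyone who wants to reproduce the size comparison without installing oniguruma, here is a minimal sketch. It copies the onigposix.h layout quoted above into a struct with a made-up name (onig_posix_regex_t, which is not a real oniguruma identifier) so it can sit next to the system regex_t from "regex.h". The helper name bytes_past_allocation is also just an illustration. The exact numbers depend on your platform and libc, so treat the output as a hint, not as proof of what the gem does.

```c
#include <regex.h>   /* system (libc) definition of regex_t */
#include <stddef.h>
#include <stdio.h>

/* Copy of the regex_t layout quoted from onigposix.h.
 * Renamed (hypothetical name) so it does not clash with <regex.h>. */
typedef struct {
    void  *onig;          /* Oniguruma regex_t* */
    size_t re_nsub;
    int    comp_options;
} onig_posix_regex_t;

/* Number of bytes by which the libc regex_t exceeds a buffer that was
 * sized for the Oniguruma layout; 0 if the libc type fits inside it. */
static size_t bytes_past_allocation(void)
{
    size_t small = sizeof(onig_posix_regex_t);
    size_t large = sizeof(regex_t);
    return large > small ? large - small : 0;
}

#ifdef REGEX_SIZE_DEMO_MAIN
int main(void)
{
    printf("onigposix-style regex_t: %zu bytes\n", sizeof(onig_posix_regex_t));
    printf("libc regex_t:            %zu bytes\n", sizeof(regex_t));
    printf("possible overrun:        %zu bytes\n", bytes_past_allocation());
    return 0;
}
#endif
```

Build it with something like cc -DREGEX_SIZE_DEMO_MAIN demo.c (the file name is arbitrary). A non-zero "possible overrun" means that a libc regcomp handed a buffer sized for the oniguruma layout could write beyond it, which would match the invalid writes Valgrind reports.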
Hmm, what can we do...
Looking at both struct definitions, it looks like onigposix.h is trying to (very unsafely) wrap the POSIX regex implementation. The "real" oniguruma regex implementation seems to be declared in "oniguruma.h". I don't really understand much about regular expressions or oniguruma, but is there a reason you used the onigposix.h API rather than the native oniguruma.h API? Did you use MRI as sample code? Maybe they have an answer.
I solved it (I think)! All I had to do was add -lonig to the linker flags of my application (which statically links mruby). Now, in your binding, the CPU jumps into libonig instead of libc, and there are no more crashes.