App crashes with "A heap has been corrupted"
danipen opened this issue · 11 comments
Hi dear Onuguruma creator,
What I'm writing here is maybe a little bit out-of-topic, so sorry for that. I'd just like to kindly ask for some tip/help/advise ...
I'm creating a wrapper around your fantastic library to build TextMateSharp, a grammar interpreter for C#.
I'm asking you because you may help me to find or debug the issue. The problem is that we only reproduce it really very sporadically and in production environments, and that is killing us since we don't have a controlled environment to reproduce and fix.
The exception we get
What we're seeing only sometimes, is the following crash:
Unhandled exception at 0x00007FFE1ABCBE99 (ntdll.dll) in winplasticx.exe.31276.dmp: 0xC0000374: A heap has been corrupted (parameters: 0x00007FFE1AC36780).
This is the mixed stack trace:
ntdll.dll!RtlReportFatalFailure()
ntdll.dll!RtlReportCriticalFailure()
ntdll.dll!RtlpHeapHandleError()
ntdll.dll!RtlpHpHeapHandleError()
ntdll.dll!RtlpLogHeapFailure()
ntdll.dll!RtlpFreeHeapInternal()
ntdll.dll!RtlFreeHeap()
ntdll.dll!RtlpReAllocateHeap()
ntdll.dll!RtlpReAllocateHeapInternal()
ntdll.dll!RtlReAllocateHeap()
onigwrap.dll!00007ffde0ebd03f()
onigwrap.dll!00007ffde0eadf62()
onigwrap.dll!00007ffde0ea4919()
onigwrap.dll!00007ffde0eb35d1()
onigwrap.dll!00007ffde0e9744c()
onigwrap.dll!00007ffde0e971da()
onigwrap.dll!00007ffde0e910af()
[Managed to Native Transition]
TextMateSharp!TextMateSharp.Internal.Oniguruma.ORegex.ORegex(string pattern, bool ignoreCase, bool multiline)
Some useful information
We reproduce the issue in both Windows 10 x64 and macOS platforms. Not sure if it's reproduced in Linux, but probably.
The onig-wrapper is compiled including the static version of the oniguruma library.
We have a minidump file for the issue.
Can you take a look at the wrapper to see if you see everything correct? I think we're getting that "heap has been corrupted when doing onig_new
: https://github.com/danipen/TextMateSharp/blob/49631c692556169a833cfccfc46c96576655d4f6/onigwrap/src/onigwrap.c#L20
And this is how we use the wrapper from C# code:
https://github.com/danipen/TextMateSharp/blob/5315ab80d6f1233ec28057d2788a5d32b7408435/src/TextMateSharp/Internal/Oniguruma/ORegex.cs
Thanks in advance!
No problems were found in onigwrap.c.
I think it's useless to generate region every time in onigwrap_search(), but it has nothing to do with the current problem.
This library has been tested by OSS-Fuzz for one year and nine months, so I don't think the problem will happen so easily.
I think you need to narrow down the conditions under which the problem occurs from various perspectives, such as whether the problem occurs in single-threaded mode or only in multi-threaded mode.
Thank you very much for pointing it out, I'll try to rework the region part in onigwrap_search()
.
Yes! the library works perfectly. We have been extensively using it for a long time and 99.9% of the time we run without issues, but under some machine load or other circumstances, it crashes the program.
I noticed that the crash is much easier to reproduce when I launch many instances of my program at the same time (for example 6 instances). When I do that I reproduce the issue 100% of the time.
I compiled oniguruma and onigwrap libs in debug mode, and it unveiled that the last call is executed before the heap reallocation is onig_set_callout_of_name()
. Then, the heap reallocation happens, the heap corruption is detected and the program crashes.
This is the log from the application verifier:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<avrf:logfile xmlns:avrf="Application Verifier">
<avrf:logSession TimeStarted="2022-02-18 : 09:08:39" PID="26512" Version="2">
<avrf:logEntry Time="2022-02-18 : 09:08:56" LayerName="Heaps" StopCode="0x10" Severity="Error">
<avrf:message>Corrupted start stamp for heap block.</avrf:message>
<avrf:parameter1>1fc5cce1000 - Heap handle used in the call.</avrf:parameter1>
<avrf:parameter2>1fc2f968b50 - Heap block involved in the operation.</avrf:parameter2>
<avrf:parameter3>4b0 - Size of the heap block.</avrf:parameter3>
<avrf:parameter4>abcdbbbb - Corrupted stamp value.</avrf:parameter4>
<avrf:stackTrace>
<avrf:trace>vrfcore!VerifierDisableVerifier+7a7 ( @ 0)</avrf:trace>
<avrf:trace>verifier!VerifierStopMessage+b9 ( @ 0)</avrf:trace>
<avrf:trace>verifier!VerifierDisableFaultInjectionExclusionRange+3730 ( @ 0)</avrf:trace>
<avrf:trace>verifier!VerifierDisableFaultInjectionExclusionRange+3a9a ( @ 0)</avrf:trace>
<avrf:trace>verifier!VerifierCheckPageHeapAllocation+77 ( @ 0)</avrf:trace>
<avrf:trace>vfbasics!+7ffa89d6397e ( @ 0)</avrf:trace>
<avrf:trace>onigwrap!_realloc_base+73 (minkernel\crts\ucrt\src\appcrt\heap\realloc_base.cpp @ 46)</avrf:trace>
<avrf:trace>onigwrap!onig_set_callout_of_name+2a2 ( @ 0)</avrf:trace>
<avrf:trace>onigwrap!sprintf_s+859 ( @ 0)</avrf:trace>
<avrf:trace>onigwrap!onig_initialize_encoding+b1 ( @ 0)</avrf:trace>
<avrf:trace>onigwrap!onig_reg_init+5e ( @ 0)</avrf:trace>
<avrf:trace>onigwrap!onig_new+5a ( @ 0)</avrf:trace>
<avrf:trace>onigwrap!onigwrap_create+af ( @ 0)</avrf:trace>
<avrf:trace>????????+00007FF9A824BE85</avrf:trace>
</avrf:stackTrace>
</avrf:logEntry>
</avrf:logSession>
</avrf:logfile>
More findings: It seems to be a multi-threading issue. I'm able to avoid the issue if I statically lock every access to onig_new
so only one thread at a time calls to it.
For the moment I will workaround the issue in my code it by synchronizing the threads when creating new oniguruma regexs:
https://github.com/danipen/TextMateSharp/blob/bd074b790ec58beda39a96a8b04908a0b861a578/src/TextMateSharp/Internal/Oniguruma/ORegex.cs#L33
But please, let me if you want to take a deeper dive into the issue, and/or if you need my help from my side.
It seems to be a multi-threading issue.
The problem seems to be that onig_set_callout_of_name
is growing GlobalCalloutNameList
without taking any locks. Two threads trying to grow the list at the same time will lead to this heap corruption.
It is true that onig_set_callout_of_name() does not support multi-threading. However, it is intended to affect all threads, so if you are in a multi-threaded environment, it is assumed that you will call it only as needed from one thread first and not use it after that, so there is no exclusive control.
And onig_set_callout_of_name() is not used in onigwrap.c, so it cannot be called directly.
However, I remembered that there is a place where it is called indirectly.
If the onig_initialize() function has never been called by the application, we have no choice but to call onig_initialize() in onig_new(). (I would like to stop this if possible).
In onig_initialize(), for a character encoding specified by an argument, the initialization function for that encoding is called, and onig_set_callout_of_name() may be executed from within it.
I'm not sure if this is the cause yet, but if it is, adding the following initialization function to onigwrap.c and calling it only once from the application first (not from multiple threads at the same time) may help.
extern int
onigwrap_initialize()
{
OnigEncoding use_encs[1];
use_encs[0] = ONIG_ENCODING_UTF16_LE;
return onig_initialize(use_encs, sizeof(use_encs)/sizeof(use_encs[0]));
}
Even if this is not the cause of the problem, it should be done that way. (We also call onig_initialize() in sample/simple.c etc..)
@kkos thank you for the detailed information. I'll call onig_initialize() one at app level as you suggest.
Additionally, I would like to improve this ...
I think it's useless to generate region every time in onigwrap_search(), but it has nothing to do with the current problem.
Unfortunately I didn't catch you. Could you please clarify what you exactly mean and elaborate a little bit more?
You don't have to create a region for each search, you can use one region for all searches.
In a multi-threaded environment, one region for each thread that performs the search is sufficient.
I don't know about C#, but there are probably thread-local variables/storage.
But it just means that you can reduce the number of calls to heap allocation by at most two in each search call, so if you don't mind that, you can leave it as it is.
#include "onigwrap.h"
extern int
onigwrap_initialize()
{
OnigEncoding use_encs[1];
use_encs[0] = ONIG_ENCODING_UTF16_LE;
return onig_initialize(use_encs, sizeof(use_encs)/sizeof(use_encs[0]));
}
regex_t *onigwrap_create(char *pattern, int len, int ignoreCase, int multiline)
{
regex_t *reg;
OnigErrorInfo einfo;
OnigOptionType onigOptions = ONIG_OPTION_NONE | ONIG_OPTION_CAPTURE_GROUP;
if (ignoreCase == 1)
onigOptions |= ONIG_OPTION_IGNORECASE;
if (multiline == 1)
onigOptions |= ONIG_OPTION_MULTILINE;
OnigUChar *stringStart = (OnigUChar*) pattern;
OnigUChar *stringEnd = (OnigUChar*) pattern + len;
int res = onig_new(
®,
stringStart,
stringEnd,
onigOptions,
ONIG_ENCODING_UTF16_LE,
ONIG_SYNTAX_DEFAULT,
&einfo);
return reg;
}
void onigwrap_region_free(OnigRegion *region)
{
onig_region_free(region, 1);
}
void onigwrap_free(regex_t *reg)
{
onig_free(reg);
}
int onigwrap_index_in(regex_t *reg, char *charPtr, int offset, int length, OnigRegion *region)
{
OnigUChar *stringStart = (OnigUChar*) charPtr;
OnigUChar *stringEnd = (OnigUChar*) (charPtr + length);
OnigUChar *stringOffset = (OnigUChar*) (charPtr + offset);
OnigUChar *stringRange = (OnigUChar*) stringEnd;
int result = onig_search(
reg,
stringStart,
stringEnd,
stringOffset,
stringRange,
region,
ONIG_OPTION_NONE);
if (result >= 0)
return result >> 1;
if (result == ONIG_MISMATCH)
return -1;
return -2;
}
int onigwrap_search(regex_t *reg, char *charPtr, int offset, int length, OnigRegion *region)
{
OnigUChar *stringStart = (OnigUChar*) charPtr;
OnigUChar *stringEnd = (OnigUChar*) (charPtr + length);
OnigUChar *stringOffset = (OnigUChar*) (charPtr + offset);
OnigUChar *stringRange = (OnigUChar*) stringEnd;
int result = onig_search(reg, stringStart, stringEnd, stringOffset, stringRange, region, ONIG_OPTION_NONE);
return result;
}
OnigRegion* onigwrap_region_new()
{
return onig_region_new();
}
int onigwrap_num_regs(OnigRegion *region)
{
return region->num_regs;
}
int onigwrap_pos(OnigRegion *region, int nth)
{
if (nth < region->num_regs)
{
int result = region->beg[nth];
if (result < 0)
return result;
return result >> 1;
}
return -3;
}
int onigwrap_len(OnigRegion *region, int nth)
{
if (nth < region->num_regs)
{
int result = region->end[nth] - region->beg[nth];
return result >> 1;
}
return -4;
}
@kkos are any other library parts that are not thread-safe?
In other words... If I call onigwrap_initialize()
at program level, do you consider safe then to remove this lock?
I think so.
Except for some initialization/setting functions and functions that manipulate regset elements, I think I've made it thread-safe.