lextudio/sharpsnmplib

Crash due to access violation (exception code 0xc0000005) in CLR.DLL for repeated Messenger.GetAsync calls

levant opened this issue · 7 comments

The following simple program will crash after about 200 iterations of the Messenger.GetAsync call:

// Test program to demonstrate memory corruption and memory access violation in CLR.dll using Sharpsnmplib
// Author: Levant Tinaz, Profound Medical Inc. ltinaz@profoundmedical.com
namespace TDC7561SnmpCrashTest
{
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Lextm.SharpSnmpLib;
    using Lextm.SharpSnmpLib.Messaging;
    using System.Net;
    using System.Threading;

    class Program
    {
        private const string PortMibString = "1.3.6.1.2.1.1.3.0"; // iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.sysUpTimeInstance
        private static IPAddress AgentIpAddress = new IPAddress(new byte[] { 172, 16, 0, 2 });

        // ReSharper disable once FunctionNeverReturns
        private static void Task1()
        {
            // Background task that keeps logging while the SNMP requests are sent.
            while (true)
            {
                Thread.Sleep(1000);
                Console.WriteLine("task 1 running");
            }
        }

        private static async Task<bool> SendSnmpMessageGetTime()
        {
            try
            {
                Variable switchPortMib = new Variable(new ObjectIdentifier(PortMibString));
                List<Variable> vList = new List<Variable> { switchPortMib };
                Task<IList<Variable>> udpTask = Messenger.GetAsync(VersionCode.V1, new IPEndPoint(AgentIpAddress, 161), new OctetString("public"), vList);
                Task timeoutTask = Task.Delay(5000);

                await Task.WhenAny(udpTask, timeoutTask);

                if (timeoutTask.Status != TaskStatus.RanToCompletion && udpTask.Status == TaskStatus.RanToCompletion)
                {
                    IList<Variable> vListRet = udpTask.Result;
                    if (vListRet[0].Data != null)
                    {
                        Console.WriteLine("Returned data {0}", vListRet[0].Data.ToString());
                        return true;
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("SNMP GET failed: {0}", e);
            }

            return false;
        }

        static async Task Main()
        {
            Console.WriteLine("Testing SNMP requests to cause CLR memory access violation (0xc0000005) and crash");
            _ = Task.Run(Task1);
            for (int i = 1; i <= 1000; i++)
            {
                if (!await SendSnmpMessageGetTime())
                {
                    Console.WriteLine("SendSnmpMessageGetTime failed");
                }
                Console.WriteLine("Trial {0}", i);
                Thread.Sleep(10);
            }
        }
    }
}

A crash report similar to this one will appear in Event Viewer on Windows:

Faulting application name: TDC7561SnmpCrashTest.exe, version: 1.0.0.0, time stamp: 0xb44b371b
Faulting module name: clr.dll, version: 4.8.4042.0, time stamp: 0x5d7a9f30
Exception code: 0xc0000005
Fault offset: 0x0000000000003d08
Faulting process id: 0x93dc
Faulting application start time: 0x01d5a929ca778dbd
Faulting application path: C:\sharpsnmplib\TDC7561SnmpCrashTest\bin\x64\Debug\TDC7561SnmpCrashTest.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Report Id: dd6dfabe-3dbd-482d-8edb-7db02c8e45b7
Faulting package full name:
Faulting package-relative application ID:

A more detailed WER (Windows Error Reporting) report is also generated under C:\ProgramData\Microsoft\Windows\WER and shown in Event Viewer.

The crash seems to be caused by unmanaged code, invoked via Messenger.GetAsync, corrupting the managed heap. It appears to have been introduced by the following change, made on November 21, 2016:

Release 9.0.5 (2016-11-23) includes the change, and the test program crashes with it.

Commit: c8cba7e [c8cba7e]
Parents: 97489d3
Author: Lex Li support@lextm.com
Date: November 21, 2016 12:52:38 PM
Committer: Lex Li
Started to reuse SocketAsyncEventArgs instances.

Release 9.0.3 (2016-10-06) does not include the change, and the test program does not crash with it.

As a workaround, our product currently downgrades the SharpSnmpLib NuGet package from 11.1.0 (our current version) to 9.0.3.
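Commit c8cba7e started caching and reusing SocketAsyncEventArgs instances. The general idea can be sketched as a lock-protected queue. This is a hypothetical illustration, not SharpSnmpLib's actual SocketAsyncEventArgsFactory: the class name SocketArgsPool and the members Create and IdleCount are made up, and only Reuse, _root and _queue mirror names that appear later in this thread.

```csharp
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical pool that caches SocketAsyncEventArgs instances for reuse.
// Illustration only; NOT SharpSnmpLib's SocketAsyncEventArgsFactory.
public sealed class SocketArgsPool
{
    private readonly object _root = new object();
    private readonly Queue<SocketAsyncEventArgs> _queue = new Queue<SocketAsyncEventArgs>();

    // Number of idle instances currently waiting in the queue.
    public int IdleCount
    {
        get
        {
            lock (_root)
            {
                return _queue.Count;
            }
        }
    }

    // Returns a cached instance when one is idle; otherwise allocates a new one.
    public SocketAsyncEventArgs Create()
    {
        lock (_root)
        {
            if (_queue.Count > 0)
            {
                return _queue.Dequeue();
            }
        }

        return new SocketAsyncEventArgs();
    }

    // Puts an instance back into the queue. Callers must do this only after the
    // asynchronous operation that used the instance has completed, because a
    // SocketAsyncEventArgs cannot be used by two pending operations at once.
    public void Reuse(SocketAsyncEventArgs args)
    {
        lock (_root)
        {
            _queue.Enqueue(args);
        }
    }
}
```

The invariant in the Reuse comment is what any such pool has to guarantee: once an instance is back in the queue, the next Create call may hand it to a new socket operation.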

If Application Verifier is enabled, it generates a report like this one:

C:\Users\ltinaz\AppData\Local\Temp\TDC7561SnmpCrashTest.exe.4.dat.xml

<avrf:logfile xmlns:avrf="Application Verifier">
<avrf:logSession TimeStarted="2019-12-02 : 11:01:43" PID="37852" Version="2">
<avrf:logEntry Time="2019-12-02 : 11:01:56" LayerName="Heaps" StopCode="0x13" Severity="Error">
<avrf:message>First chance access violation for current stack trace.</avrf:message>
<avrf:parameter1>28 - Invalid address causing the exception.</avrf:parameter1>
<avrf:parameter2>7fff3876a839 - Code address executing the invalid access.</avrf:parameter2>
<avrf:parameter3>d6f03fe630 - Exception record.</avrf:parameter3>
<avrf:parameter4>d6f03fe140 - Context record.</avrf:parameter4>
<avrf:stackTrace>
<avrf:trace>vrfcore!VerifierDisableVerifier+700 ( @ 0)</avrf:trace>
<avrf:trace>verifier!VerifierStopMessage+b9 ( @ 0)</avrf:trace>
<avrf:trace>ntdll!RtlApplicationVerifierStop+96 ( @ 0)</avrf:trace>
<avrf:trace>vfbasics!+7ffee1832669 ( @ 0)</avrf:trace>
<avrf:trace>vfbasics!+7ffee183335a ( @ 0)</avrf:trace>
<avrf:trace>vfbasics!+7ffee18329aa ( @ 0)</avrf:trace>
<avrf:trace>ntdll!RtlIsGenericTableEmpty+1a6 ( @ 0)</avrf:trace>
<avrf:trace>ntdll!RtlRaiseException+1e6 ( @ 0)</avrf:trace>
<avrf:trace>ntdll!KiUserExceptionDispatcher+2e ( @ 0)</avrf:trace>
<avrf:trace>KERNELBASE!RaiseException+69 ( @ 0)</avrf:trace>
<avrf:trace>clr!+7ffee7f311e9 ( @ 0)</avrf:trace>
<avrf:trace>clr!+7ffee7f3121b ( @ 0)</avrf:trace>
<avrf:trace>clr!+7ffee7f31225 ( @ 0)</avrf:trace>
</avrf:stackTrace>
</avrf:logEntry>
</avrf:logSession>
</avrf:logfile>

lextm commented

Cannot reproduce this on Windows Server 2019 with either .NET Framework or .NET Core (using a localhost SNMP agent). Maybe there are other details you left out. It is very difficult to debug without a reliable way to reproduce the crash.

Here are more details that may make a difference in reproducing the issue, most notably the OS version (Windows 10 Pro, 64-bit) and the clr.dll version (4.8.4042.0):

C:\tdc-dev\TDC\TDC7561SnmpCrashTest\bin\x64\Release>systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.18362 N/A Build 18362

C:\tdc-dev\TDC\TDC7561SnmpCrashTest\bin\x64\Release>listdlls -v TDC7561SnmpCrashTest | grep -C 9 clr.dll
0x0000000030ea0000 0xa000 C:\WINDOWS\SYSTEM32\VERSION.dll
Verified: Microsoft Windows
Publisher: Microsoft Corporation
Description: Version Checking and File Installation Libraries
Product: Microsoft® Windows® Operating System
Version: 10.0.18362.1
File version: 6.2.18362.1
Create time: Sat Nov 16 19:33:42 2047

0x00000000e7f30000 0xac6000 C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Verified: Microsoft Windows
Publisher: Microsoft Corporation
Description: Microsoft .NET Runtime Common Language Runtime - WorkStation
Product: Microsoft® .NET Framework
Version: 4.0.30319.0
File version: 4.8.4042.0
Create time: Thu Sep 12 15:40:32 2019

Hi lextm,
could this be the same issue I found a while ago?
See GetNext stress-test fails #119.

@pmaItelio Yes, looks like same issue.

I did some more investigation using the source code and narrowed down the combination of changes that leads to this crash. The first change introduced "public sealed class SocketAsyncEventArgsFactory" to cache and reuse SocketAsyncEventArgs instances:

Commit: c8cba7e [c8cba7e]
Parents: 97489d3
Author: Lex Li support@lextm.com
Date: November 21, 2016 12:52:38 PM
Committer: Lex Li
Started to reuse SocketAsyncEventArgs instances.

That change, combined with the later change below, which switched to Socket.ReceiveMessageFromAsync (a call that accesses, and depends on, a valid underlying SocketAsyncEventArgs.RemoteEndPoint), introduces the crashing behaviour, first shipped in release 11.0.0:

Commit: b6de163 [b6de163]
Parents: 8140efc
Author: Lex Li support@lextm.com
Date: August 26, 2018 4:46:55 PM
Committer: Lex Li
Forced packet information query.

It seems that even though the queue in SocketAsyncEventArgsFactory still holds a reference to the SocketAsyncEventArgs, either the instance itself or one of its underlying resources (most likely the RemoteEndPoint) is disposed after a garbage collection by the CLR.
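The dependency on RemoteEndPoint described above can be illustrated with a minimal receive helper. This is a sketch under stated assumptions, not SharpSnmpLib code: ReceiveSketch and ReceiveOnceAsync are hypothetical names, the socket is assumed to be a bound IPv4 UDP socket, and the caller is assumed to have already set a buffer on the SocketAsyncEventArgs. It assigns a fresh RemoteEndPoint before every call, because ReceiveMessageFromAsync throws if RemoteEndPoint is null and replaces it with the sender's address on completion. It only shows that dependency; it is not a confirmed fix for this crash.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper, NOT SharpSnmpLib code: awaits one ReceiveMessageFromAsync
// call on a bound IPv4 UDP socket using a caller-supplied SocketAsyncEventArgs
// whose buffer has already been set.
public static class ReceiveSketch
{
    public static Task<EndPoint> ReceiveOnceAsync(Socket socket, SocketAsyncEventArgs args)
    {
        var tcs = new TaskCompletionSource<EndPoint>(TaskCreationOptions.RunContinuationsAsynchronously);

        // ReceiveMessageFromAsync rejects a null RemoteEndPoint, so assign a fresh
        // placeholder before every call instead of relying on whatever a previous
        // operation left on a reused instance.
        args.RemoteEndPoint = new IPEndPoint(IPAddress.Any, 0);

        EventHandler<SocketAsyncEventArgs> handler = null;
        handler = (sender, e) =>
        {
            e.Completed -= handler;
            Complete(e, tcs);
        };
        args.Completed += handler;

        if (!socket.ReceiveMessageFromAsync(args))
        {
            // Finished synchronously: the Completed event will not be raised.
            args.Completed -= handler;
            Complete(args, tcs);
        }

        return tcs.Task;
    }

    private static void Complete(SocketAsyncEventArgs e, TaskCompletionSource<EndPoint> tcs)
    {
        if (e.SocketError == SocketError.Success)
        {
            // On success RemoteEndPoint holds the sender's address.
            tcs.TrySetResult(e.RemoteEndPoint);
        }
        else
        {
            tcs.TrySetException(new SocketException((int)e.SocketError));
        }
    }
}
```

Subscribing to Completed before starting the operation avoids missing a completion that happens on another thread before the call returns.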

lextm commented

@levant, can you try changing SocketAsyncEventArgsFactory.Reuse to the following?

        internal void Reuse(SocketAsyncEventArgs args)
        {
            lock (_root)
            {
                // Clear the endpoint left over from the previous operation before pooling the instance.
                args.RemoteEndPoint = null;
                _queue.Enqueue(args);
            }
        }

That might tell us more.

@lextm Adding the "args.RemoteEndPoint = null;" doesn't fix the issue.

lextm commented

#136 introduced an important change, so please test the 12.0 release and provide feedback there.