dotnet/diagnostics

"No CLR runtime found" analyzing Linux .NET 6.0.26 crash dump

bthharper opened this issue · 7 comments

Description

I am attempting to analyze a memory leak detected in a .NET Core application running in a Linux Docker container on AWS. However, analyzing the crash dump fails with the error:

ERROR: No CLR runtime found. This means that a .NET runtime module or the DAC for the runtime can not be found or downloaded.

The core dump was generated using dotnet-dump on a Linux docker container running dotnet 6.0.26 (the latest version at the time the docker image was built), as follows:

sh-4.2# dotnet dump ps
60 dotnet /usr/share/dotnet/dotnet MyService.dll --console
3060 dotnet /usr/share/dotnet/dotnet dotnet dump ps

sh-4.2# dotnet dump collect -p 60 -o ./crash.dmp --diag
Writing full to /app/logs/crash.dmp
Complete

sh-4.2# zip -9 crash.zip crash.dmp
adding: crash.dmp (deflated 87%)

The dump is then downloaded locally, and analyzed:

sh-4.2# dotnet dump analyze crash.dmp
Loading core dump: crash.dmp ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> clrmodules
ERROR: No CLR runtime found. This means that a .NET runtime module or the DAC for the runtime can not be found or downloaded.

I believe this may be related to mismatching .NET Core versions - our code is running against 6.0.26 (6.0.418) but the developer machine is running 6.0.27. However, installing 6.0.26 on the developer machine does not fix the issue. We have also tried 6.0.28.

I also tried a Docker image built from the same base image with 6.0.26 installed, run locally in Docker Desktop, but this also fails.

Configuration

The docker image is based on:

FROM public.ecr.aws/amazonlinux/amazonlinux:2

RUN yum install -y aspnetcore-runtime-6.0-6.0.26-1
RUN yum install -y dotnet-sdk-6.0-6.0.418-1

.NET info:

sh-4.2# dotnet --info
.NET SDK (reflecting any global.json):
 Version:   6.0.418
 Commit:    21f869269c

Runtime Environment:
 OS Name:     amzn
 OS Version:  2
 OS Platform: Linux
 RID:         linux-x64
 Base Path:   /usr/share/dotnet/sdk/6.0.418/

global.json file:
  Not found

Host:
  Version:      6.0.28
  Architecture: x64
  Commit:       34a109148c

.NET SDKs installed:
  6.0.418 [/usr/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.26 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.28 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Download .NET:
  https://aka.ms/dotnet-download

Learn about .NET Runtimes and SDKs:
  https://aka.ms/dotnet/runtimes-sdk-info

Regression?

This is the first time we have tried to analyze a Linux-generated core dump.

Other information

I have tried:

dotnet-symbol  core_20240312_055956
dotnet-sos install

The same dump cannot be debugged in the latest Visual Studio 2022 or WinDbg.

Taking a crash dump from an image running 6.0.27 does work, however, the issue to be diagnosed occurred in 6.0.26.

What does dotnet-symbol -d core_20240312_055956 report? I suspect that this 6.0 runtime is source-built, meaning that the distro owner (Amazon) builds and releases the binaries on their own feed, but the necessary debugging binaries don't get published to the Microsoft symbol store, so the tooling can't find them. It looks like you are running dotnet-dump analyze on a different machine than the one the dump was generated on. Can you try it on the machine the dump was generated on?

That makes sense, as the source of the core dump was an AWS task running under a Fargate Linux container. We captured the core dump before the task was recycled; it was a production server, so we needed to get it up and running again.

We do know that the commands work on a running container; however, we'd like to debug locally rather than inside AWS.

When I run dotnet-symbol, I get several errors about missing or unreadable PDB files, such as:

ERROR: Reading PDB records for /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.26/mscorlib.dll: Virtual address range is not mapped 00007FEB5821EF70 4

Which specific dll/so should I be on the lookout for?

That error message is "ok" (known). What you should see is errors attempting to download libmscordaccore.so.
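One way to look for those download errors, as a minimal sketch: save the diagnostic output to a file and search it for the DAC file name. The helper name dac_lines and the log name symbol.log are hypothetical, and nothing is assumed about the exact wording of dotnet-symbol's messages beyond the lines mentioning libmscordaccore.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a saved dotnet-symbol log that
# mentions the DAC, so any download failure for it is easy to spot.
# Returns non-zero (grep's exit status) when no such line exists.
dac_lines() {
    grep -i 'libmscordaccore' "$1"
}

# Usage (dump name taken from this thread; symbol.log is an assumed file name):
#   dotnet-symbol -d core_20240312_055956 > symbol.log 2>&1
#   dac_lines symbol.log || echo "no DAC lines found in symbol.log"
```

If no DAC lines show up at all, the log is worth reading in full, since that likely means the tool never got as far as requesting the file.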

If you want to copy the core dump to another machine, also copy the entire runtime directory from the source machine (i.e. /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.26), and then in SOS use the setclrpath command to point to the local copy.
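A rough sketch of that copy step, assuming tar is available on both machines. The helper pack_runtime, the archive name, and the ~/dump-runtime location are all hypothetical; only the runtime path and the setclrpath and clrmodules commands come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: archive a runtime directory, refusing to continue if
# the DAC (libmscordaccore.so) is not present in it.
pack_runtime() {
    runtime_dir=$1
    archive=$2
    if [ ! -f "$runtime_dir/libmscordaccore.so" ]; then
        echo "libmscordaccore.so not found in $runtime_dir" >&2
        return 1
    fi
    # -C makes archive paths relative to the runtime directory's parent,
    # so the archive unpacks as a single version-named folder.
    tar -czf "$archive" -C "$(dirname "$runtime_dir")" "$(basename "$runtime_dir")"
}

# On the source machine (runtime path from this thread):
#   pack_runtime /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.26 runtime-6.0.26.tar.gz
# On the analysis machine (~/dump-runtime is an assumed location):
#   mkdir -p ~/dump-runtime && tar -xzf runtime-6.0.26.tar.gz -C ~/dump-runtime
#   dotnet dump analyze crash.dmp
#   > setclrpath /home/<user>/dump-runtime/6.0.26
#   > clrmodules
```

Checking for the DAC before archiving avoids shipping a runtime folder that cannot help the analysis.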

Thanks for the response. Just to confirm: I should unzip the source machine's .NET folder, and use setclrpath to set the path to that unzipped folder?

Yes, as long as the machine you load the core dump on has the same OS and architecture.

Was your issue resolved by using setclrpath to point to the source machine's .NET runtime directory? FYI, the directory should contain libmscordaccore.so.

Yes - that fixed it for me - although the Amazon image has now moved on to 6.0.27. However, that’s my issue, so thanks for the help.