dotnet/diagnostics

`dotnet-trace` speedscope report: the call of one method is divided into different pieces

JaneySprings opened this issue · 8 comments

Affected platform version

.NET 8.0

Description

Hi! I am analyzing the report received with dotnet-trace and noticed a strange behavior (it seems to me): I call one method three times. For the first time, it is divided into many small calls. Moreover, you can see how the calls are slightly shifted (even if you look at the colors):
Screenshot 2024-02-02 at 22 13 12

But the next two calls look right:
Screenshot 2024-02-02 at 22 16 44

Steps to Reproduce

I attached a speedscope.json for example. Maybe it will helpful.

com.companyname.agendaviewexample.speedscope.json

Did you find any workaround?

No response

Relevant log output

No response

Hi @JaneySprings, I'm taking a look at the speedscope you provided. I'm trying to follow what the bug is, but

  1. Is the issue that those sections (25.13s vs 32.70s & 39.17s from the timestamps in your screenshots) should be the same method?

I don't know the name of the method you are looking for, but judging from the timestamps,

the first section @ 25.13s is IShellContentController.GetOrCreateContent, and that seems to fall under JNI -> n_OnCreateView_Landroid_view_LayoutInflater_Landroid_view_ViewGroup_Landroid_os_Bundle_ -> -> ShellSectionRenderer.OnCreateView -> IShellContentController.GetOrCreateContent(23.89s, 27.92s).

The second (32.70s, 34.82s) and third (39.17s, 41.45s) sections seem to have similar call stacks to each other
JNI -> IOnClickListenerInvoker.n_OnClick_Landroid_view_View_ -> ButtonHandler.ButtonClickListener.OnClick but this seems completely different from that first section.

To me it seems like the method at the first section is different from the method at the second and third sections (again I dont know the name of the method you are looking for).

  1. Or is the issue that in the first section there seems to be two calls interleaving each other, whereas the second and third section appear to be a solid block? I think in the first section, it looks like there are two calls interleaving each other, but I think in reality its probably the same call, just missing a couple calls up the stack trace (I'm noticing that visually shifting the 1st 3rd and 5th column in the large circled area in the first screenshot down 3 rows, it seems to align). If that is the issue, I'm not sure if its easily remedied.

Did you notice a different behavior with a previous version of dotnet-trace, and so this behavior is a regression?
For reproduction steps, it would be helpful to provide a sample app to trace that demonstrates the bug.
It would also be helpful if you clarified Expected behavior vs Actual behavior

Hi, @mdh1418 !

but I think in reality its probably the same call, just missing a couple calls up the stack trace

Yes, I mean this problem. I think that for some reason three root calls:

> mono.android!Android.Runtime.JNINativeWrapper.Wrap_JniMarshal_PPLLL_L(_JniMarshal_PPLLL_L,intptr,intptr,intptr,intptr,intptr)
> xamarin.androidx.fragment!AndroidX.Fragment.App.Fragment.n_OnCreateView_Landroid_view_LayoutInflater_Landroid_view_ViewGroup_Landroid_os_Bundle_(intptr,intptr,intptr,intptr,intptr)
> microsoft.maui.controls!Microsoft.Maui.Controls.Platform.Compatibility.ShellSectionRenderer.OnCreateView(Android.Views.LayoutInflater,Android.Views.ViewGroup,Android.OS.Bundle)

stops being recorded.
Screenshot 2024-02-05 at 23 36 08

I will try to reproduce it on a simple app and create a repository for it!

Hi, @mdh1418 ! I have created a simple .NET MAUI iOS app for this bug:
https://github.com/JaneySprings/MauiInterleaving

You can check the demo below:
(My Button_Clicked method consists of several parts in the report)

Screen.Recording.2024-02-09.at.10.25.35.mp4

Hi @JaneySprings, thank you for providing that sample. The timeframes where it seemed like base events were being omitted had deeper call stacks to the neighbors where the base events were not omitted. From offline discussion with @noahfalk, he had a suspicion that the max stack depth of 100 led to base events being trimmed in favor of newer events.

After bumping that stack depth to 1000, from the speedscopes I've obtained, it looks like the max stack depth of 100 is the cause for base events being omitted.

Max depth 100

Screenshot 2024-02-12 at 2 29 19 PM

trace and speedscope 100

Max depth 1000

Screenshot 2024-02-12 at 2 29 46 PM

trace and speedscope 1000

I'm guessing this value could be bumped, but not sure to what depth we are comfortable bumping to without leading to significant memory issues. Also not sure what would be a reasonable depth would be. /cc @lateralusX @noahfalk @lambdageek thoughts?

Is the staggering of events blocking you @JaneySprings ?

Hi, @mdh1418 . Thank you for your answer!

I capture a speedscope report with my .NET Meteor VSCode extension. It uses a fork of your dotnet-diagnostics repository as submodule:
https://github.com/JaneySprings/DotNet.Meteor/tree/main/src

I can increase this limit in my fork. Can you show me the location in code where I can change this? Or, maybe, it has an environmental variable for a configuration (like DOTNET_MaxStack=1000)

@JaneySprings right now, I don't think there is a configuration setting that we can set from the diagnostics tooling side to control the max number of frames as that is set in the runtime here.

Have you happened to have built the dotnet/runtime repo before? Specifically, to apply the changes to the max stack frames, the runtime pack corresponding to the target platform/architecture would need to be built, and that is achieved through building the mono+libs subsets through these instructions. For example, I built for iOS arm64 on my mac via ./build.sh -s mono+libs -os ios -arch arm64 -c debug using the release/8.0 branch after changing the value of the max stack frames to 1000.

After doing so, the runtime pack can be found in the artifacts directory as microsoft.netcore.app.runtime.<runtime identifier>/Debug. I found mine as <runtime_repo_root>/artifacts/bin/microsoft.netcore.app.runtime.ios-arm64/Debug.
That runtime pack can be substituted into the project using a target that adds that runtimepack as the ResolvedRuntimePack msbuild itemgroup during the dotnet build process. I added this to the bottom of the csproj

	<Target Name="UpdateRuntimePack" AfterTargets="ResolveFrameworkReferences">
		<ItemGroup>
			<ResolvedRuntimePack PackageDirectory="/Users/mitchellhwang/projects/runtime/artifacts/bin/microsoft.netcore.app.runtime.ios-arm64/Debug" Condition="'%(ResolvedRuntimePack.FrameworkName)' == 'Microsoft.NETCore.App'" />
		</ItemGroup>
	</Target>

After doing so, the changes from the locally built runtime should apply.

I will try it. That's enough for me, thank you so much!

Glad to be of help! I think there was some performance impact of bumping up the max stack frames to 1000, the trace took longer and the app seemed to be slower, so you might need to tweak that value if you want better performance and not omit base frames from events. There might also be an increased chance of events being dropped, but more performance testing would need to be done to measure these metrics before we can bump the value in dotnet/runtime or add in customization from the diagnostics tooling side.

Let me know if you need more help building the runtime/applying the change, but if there isn't anything else, I'll go ahead and close this issue.