Linux/Arm: BBJ_ALWAYS block remains unvisited during dominance computation
kunalspathak opened this issue · 10 comments
On arm, we prevent having retless calls by adding the BBJ_ALWAYS block to the enter blocks list.
runtime/src/coreclr/jit/fgopt.cpp
Lines 318 to 321 in 4d72c6b
Because of this, we also skip iterating over these blocks while computing dominance, and we hit an assert because, in the scenario below, that block is never processed during the inverted post-order traversal of the blocks.
The code is dated, so it might be that we can now simply add a condition to not create retless calls if we are processing the BBJ_ALWAYS block. I will check whether this approach is possible and then remove BBJ_ALWAYS from the entry block list. If the change is bigger, we will have to temporarily ignore the assert in these situations.
using System.Runtime.CompilerServices;

public class TestClass
{
    public struct S2
    {
        public struct S2_D1_F1
        {
            public double double_1;
        }
    }

    static int s_int_6 = -2;
    static S2 s_s2_16 = new S2();
    int int_6 = -2;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public int LeafMethod6()
    {
        return 1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public S2 Method0(out short p_short_1)
    {
        long long_7 = -1;
        p_short_1 = 0;
        switch (long_7)
        {
            case -5:
            {
                do
                {
                    try
                    {
                        int_6 ^= int_6;
                    }
                    finally
                    {
                        // The expression doesn't matter, it just has to be long enough
                        // to have few extra blocks which we don't walk when doing inverse
                        // post order while computing dominance information.
                        long_7 &= long_7;
                        int_6 &= (int_6 /= (int_6 -= LeafMethod6() - int_6) + 69) / ((int_6 << (int_6 - int_6)) + (int_6 |= LeafMethod6()) + (LeafMethod6() >> s_int_6) + 62);
                    }
                }
                while (long_7 == 8);
                break;
            }
            default:
            {
                break;
            }
        }
        return s_s2_16;
    }

    public static void Main(string[] args)
    {
        new TestClass().Method0(out short s);
    }
}
On linux/arm we get the following assertion:
Assert failure(PID 29776 [0x00007450], Thread: 45856 [0xb320]): Assertion failed 'postIndex == fgBBcount + 1' in 'TestClass:Method1(ushort,byref,byref,byref):S2:this' during 'Compute blocks reachability' (IL size 403)
File: D:\git\runtime\src\coreclr\jit\fgopt.cpp Line: 672
Image: D:\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\tests\Core_Root\CoreRun.exe
Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.
@dotnet/jit-contrib
We have had this code for a while, so this is definitely not a .NET 6-only issue. Marking it for .NET 7.
It looks like this code is indeed ancient. It seems like there has to be a better way to handle this than pretending these blocks are entries simply to avoid deleting other blocks that we need to keep around.
@BruceForstall any ideas?
I believe I added that code (and comment) when arm32 support was originally added. I don't have any specific ideas on how to ensure that we don't create retless callfinally besides this mechanism.
I think we should also re-evaluate the whole decision to not allow retless callfinally on ARM. Perhaps we should allow them, and generate "normal" calls instead of the mov lr, <continuation>; jmp <finally> form we generate now. Then we might be able to get rid of all the BBF_FINALLY_TARGET code, but we would need to set FEATURE_EH_CALLFINALLY_THUNKS=1. It might generate somewhat worse code, which shouldn't normally matter because we now have finally cloning. We'd have to make sure the VM stack walker is ok with it. It would make ARM32 work just like ARM64, which might be advantageous. And if we could get rid of BBF_FINALLY_TARGET, that would be nice. If ARM32 chips implement a return address stack keyed off bl instructions (I don't know if they do), it might improve performance.
I must not be understanding what it is we're trying to fix here.
Can't we just mark the "always block" as BBF_DONT_REMOVE (and the successor block, if that's needed to form the continuation label)? Why do we need to pretend the always block can be reached via flow?
Can't we just mark the "always block" as BBF_DONT_REMOVE
Maybe? Worth a try.
fwiw, I added #59453 to track the idea to re-evaluate the use of retless callfinally on arm.
Can't we just mark the "always block" as BBF_DONT_REMOVE (and the successor block, if that's needed to form the continuation label)?
I tried the option of skipping the removal of BBJ_ALWAYS and BBF_FINALLY_TARGET nodes, but then there are scenarios where we might end up removing a block XYZ that is only reachable via a BBF_FINALLY_TARGET block, which in turn is only reachable from a BBJ_ALWAYS block. If we don't have the BBJ_ALWAYS block's entry in fgEnterBlks, the code below would think that block XYZ is dead.
runtime/src/coreclr/jit/fgopt.cpp
Lines 399 to 403 in 580d17f
So we definitely need a list that keeps track of the BBJ_ALWAYS blocks for ARM, so that for any block we can quickly check whether any of the BBJ_ALWAYS blocks can reach it. One simple solution I could think of is to just have another fgAlwaysBlks list that tracks such blocks separately instead of including them in fgEnterBlks; that fixes the original issue as well as handling the scenario I mentioned above.
Your example shows how we really should try to change the model to allow creating retless callfinally, because we end up keeping dead code around as a result of not allowing it.
One of the dumb solution that I could think of is just have another fgAlwaysBlks list to track such blocks separately instead of including them in fgEnterBlks and that fixes the original issue as well as handles the scenario I mentioned above.
Presumably only the BBJ_ALWAYS of BBJ_CALLFINALLY/BBJ_ALWAYS pairs?