dotnet/runtime

Linux/Arm: BBJ_ALWAYS block remains unvisited during dominance computation

kunalspathak opened this issue · 10 comments

On Arm, we prevent creating retless calls by adding the BBJ_ALWAYS block to the enter blocks list:

// TODO-ARM-Cleanup: The ARM code here to prevent creating retless calls by adding the BBJ_ALWAYS
// to the enter blocks is a bit of a compromise, because sometimes the blocks are already reachable,
// and it messes up DFS ordering to have them marked as enter block. We should prevent the
// creation of retless calls some other way.
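
For context, the code around that comment looks roughly like the following (a sketch reconstructed from the comment and the identifiers discussed in this thread, not a verbatim quote):

#if defined(FEATURE_EH_FUNCLETS) && defined(TARGET_ARM)
    for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
    {
        if (block->bbJumpKind == BBJ_CALLFINALLY)
        {
            assert(block->isBBCallAlwaysPair());

            // Treat the paired BBJ_ALWAYS block as an entry point so that
            // flowgraph cleanup never removes it, even when the finally
            // does not return (which would make it a retless callfinally).
            BlockSetOps::AddElemD(this, fgEnterBlks, block->bbNext->bbNum);
        }
    }
#endif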

Because of this, we also skip iterating over these blocks while computing dominance, and we hit an assert because, in the scenario below, that block is never processed during the inverted post-order traversal of the blocks.

The code is dated, so it might be that we can now simply add a condition to not create retless calls when we are processing the BBJ_ALWAYS block. I will check whether this approach is possible and then remove BBJ_ALWAYS from the enter blocks list. If the change turns out to be bigger, we will have to temporarily ignore the assert in these situations.

using System.Runtime.CompilerServices;
public class TestClass
{
    public struct S2
    {
        public struct S2_D1_F1
        {
            public double double_1;
        }
    }
    static int s_int_6 = -2;
    static S2 s_s2_16 = new S2();
    int int_6 = -2;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public int LeafMethod6()
    {
        return 1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public S2 Method0(out short p_short_1)
    {
        long long_7 = -1;
        p_short_1 = 0;
        switch (long_7)
        {
            case -5:
                {
                    do
                    {
                        try
                        {
                            int_6 ^= int_6;
                        }
                        finally
                        {
                            // The expression doesn't matter, it just has to be long enough
                            // to have few extra blocks which we don't walk when doing inverse
                            // post order while computing dominance information.
                            long_7 &= long_7;
                            int_6 &= (int_6 /= (int_6 -= LeafMethod6() - int_6) + 69) / ((int_6 << (int_6 - int_6)) + (int_6 |= LeafMethod6()) + (LeafMethod6() >> s_int_6) + 62);
                        }
                    }
                    while (long_7 == 8);
                    break;
                }
            default:
                {
                    break;
                }
        }
        return s_s2_16;
    }

    public static void Main(string[] args)
    {
        new TestClass().Method0(out short s);
    }
}

On Linux/Arm we get the following assert failure:

Assert failure(PID 29776 [0x00007450], Thread: 45856 [0xb320]): Assertion failed 'postIndex == fgBBcount + 1' in 'TestClass:Method1(ushort,byref,byref,byref):S2:this' during 'Compute blocks reachability' (IL size 403)
    File: D:\git\runtime\src\coreclr\jit\fgopt.cpp Line: 672
    Image: D:\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\tests\Core_Root\CoreRun.exe
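
To see why the invariant fails, here is a self-contained toy (plain C++, not JIT code) that assigns post-order numbers from a set of DFS roots. Any block the walk never reaches leaves the final index short of blockCount + 1, which is exactly the shape of the assert above:

#include <cassert>
#include <cstdio>
#include <vector>

int main()
{
    // Four blocks; 0 -> 1 and 2 -> 3. Block 2 stands in for a block that is
    // only visited if it is seeded as an enter block.
    std::vector<std::vector<int>> succs = {{1}, {}, {3}, {}};
    const unsigned blockCount = 4;

    std::vector<bool> visited(blockCount, false);
    unsigned postIndex = 1; // post-order numbers start at 1, as in the JIT

    // Depth-first walk that gives each visited block a post-order number.
    auto dfs = [&](auto&& self, int b) -> void {
        visited[b] = true;
        for (int s : succs[b])
            if (!visited[s])
                self(self, s);
        std::printf("block %d -> post-order %u\n", b, postIndex);
        postIndex++;
    };

    dfs(dfs, 0);
    // dfs(dfs, 2); // without this second root, blocks 2 and 3 are never numbered

    assert(postIndex == blockCount + 1); // mirrors 'postIndex == fgBBcount + 1'; fires here
    return 0;
}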

@dotnet/jit-contrib

We have had this code for a while, so this is definitely not a .NET 6-only issue. Marking it for .NET 7.

It looks like this code is indeed ancient. It seems like there has to be a better way to handle this than pretending these blocks are entries simply to avoid deleting other blocks that we need to keep around.

@BruceForstall any ideas?

I believe I added that code (and comment) when arm32 support was originally added. I don't have any specific ideas on how to ensure that we don't create retless callfinally besides this mechanism.

I think we should also re-evaluate the whole decision to not allow retless callfinally on ARM. Perhaps we should allow them, and generate "normal" calls instead of the mov lr, <continuation>; jmp <finally> form we generate now. Then we might be able to get rid of all the BBF_FINALLY_TARGET code, but we would need to set FEATURE_EH_CALLFINALLY_THUNKS=1. It might generate some worse code, which shouldn't normally matter because we now have finally cloning. We'd have to make sure the VM stack walker is ok with it. It would make ARM32 work just like ARM64, which might be advantageous. And if we could get rid of BBF_FINALLY_TARGET, that would be nice. If ARM32 chips implement a return address stack keyed off bl instructions (I don't know if they do), it might improve performance.

I must not be understanding what it is we're trying to fix here.

Can't we just mark the "always block" as BBF_DONT_REMOVE (and the successor block, if that's needed to form the continuation label)? Why do we need to pretend the always block can be reached via flow?

Can't we just mark the "always block" as BBF_DONT_REMOVE

Maybe? Worth a try.
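
A minimal sketch of that suggestion, assuming it would sit wherever the ARM BBJ_CALLFINALLY/BBJ_ALWAYS pair is walked (the placement is a guess; BBF_DONT_REMOVE and isBBCallAlwaysPair() are existing JIT names):

if (block->bbJumpKind == BBJ_CALLFINALLY)
{
    assert(block->isBBCallAlwaysPair());

    // Keep the paired BBJ_ALWAYS block alive via the flag instead of
    // pretending it is reachable as an enter block.
    block->bbNext->bbFlags |= BBF_DONT_REMOVE;
}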

fwiw, I added #59453 to track the idea to re-evaluate the use of retless callfinally on arm.

Can't we just mark the "always block" as BBF_DONT_REMOVE (and the successor block, if that's needed to form the continuation label)?

I tried the option of skipping the removal of BBJ_ALWAYS and BBF_FINALLY_TARGET blocks, but there are scenarios where we might end up removing a block XYZ that is only reachable via a BBF_FINALLY_TARGET block, which in turn is only reachable from a BBJ_ALWAYS block. If we don't have the BBJ_ALWAYS block's entry in fgEnterBlks, the code below would think that block XYZ is dead.

// If any of the entry blocks can reach this block, then we skip it.
if (!BlockSetOps::IsEmptyIntersection(this, fgEnterBlks, block->bbReach))
{
    goto SKIP_BLOCK;
}

So we definitely need a list that keeps track of the BBJ_ALWAYS blocks for ARM, so that for any block we can quickly check whether any BBJ_ALWAYS block can reach it. One dumb solution I could think of is to just have another fgAlwaysBlks list that tracks such blocks separately instead of including them in fgEnterBlks; that fixes the original issue as well as the scenario I mentioned above.
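
A rough sketch of that check (the helper name is made up here; fgAlwaysBlks is the proposed set, and BlockSetOps::IsEmptyIntersection is the existing API shown above):

// Hypothetical helper: a block is considered live if it is reachable from an
// ordinary enter block, or, on ARM, from one of the tracked BBJ_ALWAYS blocks.
bool Compiler::fgIsReachableFromEntries(BasicBlock* block)
{
    if (!BlockSetOps::IsEmptyIntersection(this, fgEnterBlks, block->bbReach))
    {
        return true;
    }
#if defined(FEATURE_EH_FUNCLETS) && defined(TARGET_ARM)
    if (!BlockSetOps::IsEmptyIntersection(this, fgAlwaysBlks, block->bbReach))
    {
        return true;
    }
#endif
    return false;
}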

Your example shows why we really should try to change the model to allow creating retless callfinally: we end up keeping dead code around as a result of not allowing it.

One dumb solution I could think of is to just have another fgAlwaysBlks list that tracks such blocks separately instead of including them in fgEnterBlks; that fixes the original issue as well as the scenario I mentioned above.

Presumably only the BBJ_ALWAYS of BBJ_CALLFINALLY/BBJ_ALWAYS pairs?
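
For concreteness, populating that set with only the paired blocks might look like this (the same walk as in the enter-blocks sketch above, but adding to the separate, hypothetical fgAlwaysBlks set):

for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
{
    if (block->bbJumpKind == BBJ_CALLFINALLY)
    {
        assert(block->isBBCallAlwaysPair());

        // Track only the BBJ_ALWAYS half of the BBJ_CALLFINALLY/BBJ_ALWAYS pair.
        BlockSetOps::AddElemD(this, fgAlwaysBlks, block->bbNext->bbNum);
    }
}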