dotnet/efcore

Remove unnecessary *last column* in ORDER BY when joining for collection

roji opened this issue Β· 39 comments

roji commented

When loading related one-to-many entities, EF adds ORDER BY clauses to make sure all related entities for a given entity are grouped together.

However, we can remove the last ORDER BY clause as it is unnecessary and causes more work (see #19571).

FYI I have a repro case where I get a timeout (> 40s) retrieving a single line (amongst 10) solely due to this (removing the ORDER BY causes the query to fetch data instantly).

I have a few owned entities which have owned collections.

We encountered the same issue here, we're using Polemo MySQL provider, in our case, the auto generated ORDER BY of Include() forces MySQL to use temp table which had significant performance impact, the query usually take less than 1ms to finish without ORDER BY and it took 14s when ORDER BY applied, and we also caught several exceptions from MySQL reporting 'Can't change size of file (OS errno 28 - No space left on device)', turns out MySQL tries to allocate space for temp tables and the file exceeds 16MB which is the default settings on our MySQL server.

Please do take this seriously as ORDER BY is performed differently on different SQL products, we had to split the LINQ query into three separate queries to workaround it, but I'm not sure how this will affect other queries in the system, it's basically impossible to reproduce this in the testing environment without extensive load tests as the performance impact is closely related to the complexity of the query as well as the result set data volume. Hopefully the fix comes sooner.

Generated SQL:

SELECT `t`.`Id`, `t`.`ApprovedTime`, `t`.`Approver`, `t`.`ApproverComments`, `t`.`AutoCreatePO`, `t`.`CancelReason`, `t`.`ChargedAmount`, `t`.`CreatedTime`, `t`.`Creator`, `t`.`CustomerId`, `t`.`DeliveryDate`, `t`.`ImageFileInfo`, `t`.`IsInternalOrder`, `t`.`LastUpdatedTime`, `t`.`Notes`, `t`.`OrderNumber`, `t`.`OrderType`, `t`.`PackagingMethod`, `t`.`Priority`, `t`.`ProjectName`, `t`.`SalesOrderCompleted`, `t`.`SalesRep`, `t`.`Status`, `t`.`TenantId`, `t`.`TotalAmount`, `t`.`UniqueId`, `t1`.`Id`, `t1`.`Area`, `t1`.`AreaName`, `t1`.`ClientSalesOrderNumber`, `t1`.`ClientWorkOrderNumber`, `t1`.`CreatedTime`, `t1`.`Creator`, `t1`.`CuttingMachineNumber`, `t1`.`CuttingOperator`, `t1`.`ImageFileInfo`, `t1`.`InternalWorkOrderNumber`, `t1`.`LastUpdatedTime`, `t1`.`Length`, `t1`.`OrderPlacingDate`, `t1`.`PackingCaseId`, `t1`.`Position`, `t1`.`Price`, `t1`.`ProcessingDate`, `t1`.`SalesOrderId`, `t1`.`Status`, `t1`.`StockInOperator`, `t1`.`StockInTime`, `t1`.`StockOutOperator`, `t1`.`StockOutTime`, `t1`.`StockingAreaId`, `t1`.`TenantId`, `t1`.`TileNumber`, `t1`.`Width`, `t3`.`Id`, `t3`.`CaseNumber`, `t3`.`CreatedTime`, `t3`.`Creator`, `t3`.`LastUpdatedTime`, `t3`.`SalesOrderId`, `t3`.`Status`, `t3`.`StockInOperator`, `t3`.`StockInTime`, `t3`.`StockOutOperator`, `t3`.`StockOutTime`, `t3`.`StockingAreaId`, `t3`.`TenantId`, `t3`.`UniqueId`
FROM (
    SELECT `s`.`Id`, `s`.`ApprovedTime`, `s`.`Approver`, `s`.`ApproverComments`, `s`.`AutoCreatePO`, `s`.`CancelReason`, `s`.`ChargedAmount`, `s`.`CreatedTime`, `s`.`Creator`, `s`.`CustomerId`, `s`.`DeliveryDate`, `s`.`ImageFileInfo`, `s`.`IsInternalOrder`, `s`.`LastUpdatedTime`, `s`.`Notes`, `s`.`OrderNumber`, `s`.`OrderType`, `s`.`PackagingMethod`, `s`.`Priority`, `s`.`ProjectName`, `s`.`SalesOrderCompleted`, `s`.`SalesRep`, `s`.`Status`, `s`.`TenantId`, `s`.`TotalAmount`, `s`.`UniqueId`
    FROM `SalesOrders` AS `s`
    WHERE (`s`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420') AND (`s`.`Id` = 1620)
    LIMIT 2
) AS `t`
LEFT JOIN (
    SELECT `t0`.`Id`, `t0`.`Area`, `t0`.`AreaName`, `t0`.`ClientSalesOrderNumber`, `t0`.`ClientWorkOrderNumber`, `t0`.`CreatedTime`, `t0`.`Creator`, `t0`.`CuttingMachineNumber`, `t0`.`CuttingOperator`, `t0`.`ImageFileInfo`, `t0`.`InternalWorkOrderNumber`, `t0`.`LastUpdatedTime`, `t0`.`Length`, `t0`.`OrderPlacingDate`, `t0`.`PackingCaseId`, `t0`.`Position`, `t0`.`Price`, `t0`.`ProcessingDate`, `t0`.`SalesOrderId`, `t0`.`Status`, `t0`.`StockInOperator`, `t0`.`StockInTime`, `t0`.`StockOutOperator`, `t0`.`StockOutTime`, `t0`.`StockingAreaId`, `t0`.`TenantId`, `t0`.`TileNumber`, `t0`.`Width`
    FROM `Tiles` AS `t0`
    WHERE `t0`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t1` ON `t`.`Id` = `t1`.`SalesOrderId`
LEFT JOIN (
    SELECT `t2`.`Id`, `t2`.`CaseNumber`, `t2`.`CreatedTime`, `t2`.`Creator`, `t2`.`LastUpdatedTime`, `t2`.`SalesOrderId`, `t2`.`Status`, `t2`.`StockInOperator`, `t2`.`StockInTime`, `t2`.`StockOutOperator`, `t2`.`StockOutTime`, `t2`.`StockingAreaId`, `t2`.`TenantId`, `t2`.`UniqueId`
    FROM `TilePackingCases` AS `t2`
    WHERE `t2`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t3` ON `t`.`Id` = `t3`.`SalesOrderId`
ORDER BY `t`.`Id`, `t1`.`Id`, `t3`.`Id`

Execution Results Comparison (first 3 with ORDER BY on, last 3 without ORDER BY)

image

MySQL Execution Plan (note the red circle around ORDER)

image

roji commented

@LiangZugeng @PaulARoy thanks for posting here - note that this is only about removing the last ORDER BY; previous ones are necessary in order to properly materialize results.

What could be helpful for us to evaluate the priority of this, is a runnable code sample which clearly shows that the last ORDER BY has a significant impact on perf.

I'm sorry but it's been 2 month, we fixed it and switched to postgres in between.
The best way to reach it was to stack owned entities / owned collections.

The last order by was the only cause of perf downgrade.

I will spend some time to write some code to reproduce the perf degrade issue this weekend, I will be targeting MySQL as it’s where we found the issue. Will post a Github repo here later for this.

What is the status about this issue?

roji commented

@grietine the issue is in our backlog, which means it won't be part of the upcoming 5.0 release. We'll probably consider it for 6.0.

Here is a sample project that can show the last order reducing performance and drastically increasing IO. I tried to come up with a slightly realistic example but ended up just throwing some stuff on there to get the row count and size higher. Since we use PaaS SQL Server instances our IO is more tightly limited than my local computer so IO intensive operations like the last Order By hurts performance there more than locally.

EfBug.zip

With the ORDER BY
image

Without the ORDER BY
image

Full SQL

exec sp_executesql N'SELECT [t1].[Id], [t1].[ConstructionType], [t1].[ExteriorDescription], [t1].[GeneralDescription], [t1].[InteriorDescription], [t1].[Name], [t1].[SalesDescription], [t1].[YearBuilt], [t1].[Id0], [t1].[Supervisor_Name], [t1].[Supervisor_Title], [b5].[BuildingId], [b5].[Id], [b5].[Number], [b6].[SupervisorBuildingId], [b6].[Id], [b6].[Number], [b7].[SupervisorBuildingId], [b7].[Id], [b7].[Number], [b7].[Type], [t3].[BuildingId], [t3].[Id], [t3].[Notes], [t3].[Number], [t3].[UnitBuildingId], [t3].[UnitId], [t3].[Id0], [t3].[Name], [t3].[Notes0], [t3].[ResidentUnitBuildingId], [t3].[ResidentUnitId], [t3].[ResidentId], [t3].[Id00], [t3].[Number0], [t3].[ResidentUnitBuildingId0], [t3].[ResidentUnitId0], [t3].[ResidentId0], [t3].[Id1], [t3].[ExtraDeductible], [t3].[ResidentUnitBuildingId1], [t3].[ResidentUnitId1], [t3].[ResidentId1], [t3].[Id2], [t3].[Number00], [t3].[Type]
FROM (
    SELECT TOP(1) [b].[Id], [b].[ConstructionType], [b].[ExteriorDescription], [b].[GeneralDescription], [b].[InteriorDescription], [b].[Name], [b].[SalesDescription], [b].[YearBuilt], [t0].[Id] AS [Id0], [t0].[Supervisor_Name], [t0].[Supervisor_Title]
    FROM [Building] AS [b]
    LEFT JOIN (
        SELECT [b0].[Id], [b0].[Supervisor_Name], [b0].[Supervisor_Title]
        FROM [Building] AS [b0]
        WHERE [b0].[Supervisor_Title] IS NOT NULL OR [b0].[Supervisor_Name] IS NOT NULL
        UNION
        SELECT [b1].[Id], [b1].[Supervisor_Name], [b1].[Supervisor_Title]
        FROM [Building] AS [b1]
        INNER JOIN [Building_ParkingSpaces1] AS [b2] ON [b1].[Id] = [b2].[SupervisorBuildingId]
        UNION
        SELECT [b3].[Id], [b3].[Supervisor_Name], [b3].[Supervisor_Title]
        FROM [Building] AS [b3]
        INNER JOIN [Building_PhoneNumbers] AS [b4] ON [b3].[Id] = [b4].[SupervisorBuildingId]
    ) AS [t0] ON [b].[Id] = [t0].[Id]
    WHERE [b].[Id] = @__p_0
) AS [t1]
LEFT JOIN [Building_ParkingSpaces] AS [b5] ON [t1].[Id] = [b5].[BuildingId]
LEFT JOIN [Building_ParkingSpaces1] AS [b6] ON [t1].[Id0] = [b6].[SupervisorBuildingId]
LEFT JOIN [Building_PhoneNumbers] AS [b7] ON [t1].[Id0] = [b7].[SupervisorBuildingId]
LEFT JOIN (
    SELECT [u].[BuildingId], [u].[Id], [u].[Notes], [u].[Number], [t2].[UnitBuildingId], [t2].[UnitId], [t2].[Id] AS [Id0], [t2].[Name], [t2].[Notes] AS [Notes0], [t2].[ResidentUnitBuildingId], [t2].[ResidentUnitId], [t2].[ResidentId], [t2].[Id0] AS [Id00], [t2].[Number] AS [Number0], [t2].[ResidentUnitBuildingId0], [t2].[ResidentUnitId0], [t2].[ResidentId0], [t2].[Id1], [t2].[ExtraDeductible], [t2].[ResidentUnitBuildingId1], [t2].[ResidentUnitId1], [t2].[ResidentId1], [t2].[Id2], [t2].[Number0] AS [Number00], [t2].[Type]
    FROM [Unit] AS [u]
    LEFT JOIN (
        SELECT [r].[UnitBuildingId], [r].[UnitId], [r].[Id], [r].[Name], [r].[Notes], [r0].[ResidentUnitBuildingId], [r0].[ResidentUnitId], [r0].[ResidentId], [r0].[Id] AS [Id0], [r0].[Number], [p].[ResidentUnitBuildingId] AS [ResidentUnitBuildingId0], [p].[ResidentUnitId] AS [ResidentUnitId0], [p].[ResidentId] AS [ResidentId0], [p].[Id] AS [Id1], [p].[ExtraDeductible], [r1].[ResidentUnitBuildingId] AS [ResidentUnitBuildingId1], [r1].[ResidentUnitId] AS [ResidentUnitId1], [r1].[ResidentId] AS [ResidentId1], [r1].[Id] AS [Id2], [r1].[Number] AS [Number0], [r1].[Type]
        FROM [Resident] AS [r]
        LEFT JOIN [Resident_ParkingSpaces] AS [r0] ON (([r].[UnitBuildingId] = [r0].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [r0].[ResidentUnitId])) AND ([r].[Id] = [r0].[ResidentId])
        LEFT JOIN [Pet] AS [p] ON (([r].[UnitBuildingId] = [p].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [p].[ResidentUnitId])) AND ([r].[Id] = [p].[ResidentId])
        LEFT JOIN [Resident_PhoneNumbers] AS [r1] ON (([r].[UnitBuildingId] = [r1].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [r1].[ResidentUnitId])) AND ([r].[Id] = [r1].[ResidentId])
    ) AS [t2] ON ([u].[BuildingId] = [t2].[UnitBuildingId]) AND ([u].[Id] = [t2].[UnitId])
) AS [t3] ON [t1].[Id] = [t3].[BuildingId]
ORDER BY [t1].[Id], [b5].[BuildingId], [b5].[Id], [b6].[SupervisorBuildingId], [b6].[Id], [b7].[SupervisorBuildingId], [b7].[Id], [t3].[BuildingId], [t3].[Id], [t3].[UnitBuildingId], [t3].[UnitId], [t3].[Id0], [t3].[ResidentUnitBuildingId], [t3].[ResidentUnitId], [t3].[ResidentId], [t3].[Id00], [t3].[ResidentUnitBuildingId0], [t3].[ResidentUnitId0], [t3].[ResidentId0], [t3].[Id1], [t3].[ResidentUnitBuildingId1], [t3].[ResidentUnitId1], [t3].[ResidentId1], [t3].[Id2]',N'@__p_0 int',@__p_0=1
roji commented

Thanks @dandenton, it's great to have some hard data on this!

roji commented

@dandenton just to make sure we're discussing the same thing... Can you please confirm that your comparison checks when removing only the last term in the ORDER BY clause (i.e. [t3].[Id2]), rather than the entire ORDER BY clause? This issue is only about the former option, which is something we can actually do - for the latter see #19571.

@roji Nope, I didn't notice #20076 and thought this was to remove the full thing. Removing only the last column from the ORDER BY still results in slightly less CPU/IO but not nearly as much as removing the entire thing.

Including [t3].[Id2]
image

Excluding [t3].[Id2]
image

roji commented

Thanks for confirming.

Our team is using Pomelo and MySql and this is causing real performance issues.

Are there any workarounds for the time being beyond just loading the related entities and join them manually?

roji commented

@rabberbock please note that this issue on only about removing the last ORDER BY when joining; if you're concerned about non-last ORDER BY clauses, see #19571 (and also consider trying out split queries).

If you're convinced you're seeing a perf issue because of the last ORDER BY, it would be useful to have a code sample that shows this - it would help us bump up the priority of this issue.

@roji Thanks for the pointers! I can try to put something together. The gist of it is as follows:

The execution plan with the last ORDER BY is the following:

image

Without the last ORDER BY it's:

image

So with the last ORDER BY it uses tmp table, which in our case really degrades performance.

Without the last ORDER BY it runs in miliseconds, with the ORDER BY the same query takes a minute.

Here is a link to the MySql docs that explain when a tmp table is used. https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html.

What would you like to see in a code sample beyond the above execution plans?

roji commented

Thanks for this info @rabberbock! I think the above (and the link to the MySQL docs) should be sufficient IMHO.

This is a pretty significant issue. Is there a better way to get it resolved?
a class we can write and swap in maybe on our apps?

We have queries that are taking two minutes, and literally removing the last ORDER BY makes them less than .2 seconds

EDIT: To be clear, this is not an exaggeration - customers are angry

@chazt3n A hack would be to use IDbCommandInterceptor to remove the last ORDER BY if it's a critical situation.

roji commented

@chazt3n this issue is in the 6.0 milestone, and there are already PRs out to resolve it. So chances are pretty good it'll get fixed for 6.0.

@chazt3n A hack would be to use IDbCommandInterceptor to remove the last ORDER BY if it's a critical situation.

@mguinness! thank you for showing up, you know I love your advice. Ok I'll look into that, thank you

@roji - that is wonderful news, we're PSYCHED about all things 6.0

we're only trying 5 out of necessity (split queries, filtered includes - we have very hierarchical data that gets queried from many different vectors) but are trying to stay LTS.

Thank you for the insight, didn't mean to come in all emotionally charged - starting to sweat πŸ˜…

@chazt3n The PR above is going to fix this issue, once it is merged. Pomelo will keep up-to-date with the EF Core 6 previews, so you should be able to verify it in a timely fashion, once it has been merged.

Until then, I would go with the approach that @mguinness suggested. If this is only about a hand full of queries, tag them and use the tag to specifically taget those queries with an interceptor (or use some other proprietary mechanism to mark them somehow), and remove their ORDER BY statement (a regular expression is probably sufficient).

EDIT: To be clear, this is not an exaggeration - customers are angry

If you need some sample code quickly, I can post some (or take a look at the Example: Command interception to add query hints in the EF Core docs).

if anyone's interested in a shoddy temp fix

    public class RemoveLastOrderByInterceptor : DbCommandInterceptor
    {
        public const string QueryTag = "RemoveLastOrderBy";
        public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
            DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
            CancellationToken cancellationToken = new CancellationToken())
        {
            try
            {
                const string orderBy = "ORDER BY";
                if (command.CommandText.Contains(QueryTag) && command.CommandText.Contains(orderBy))
                {
                    int lastOrderBy = command.CommandText.LastIndexOf(orderBy, StringComparison.Ordinal);
                    //beware of string manip on memory consumption
                    command.CommandText = command.CommandText.Remove(lastOrderBy);
                    command.CommandText += ";";
                }
            }
            catch (Exception ex)
            {
                var logger = LogManager.GetCurrentClassLogger();
                logger.Error(ex, "Failed to intercept query.");
            }

            return base.ReaderExecutingAsync(command, eventData, result, cancellationToken);

Usage:

                var things = await _context.Things
                    .Include(x => x.OtherThing)
                    .ThenInclude(e => e.AnotherThing)
                    .TagWith(RemoveLastOrderByInterceptor.QueryTag)
                    .ToListAsync(cancellationToken);

context registration

 services.AddDbContextPool<MyDataContext>((serviceProvider, options) =>
            {
                ...
               //probably should pull this from the service provider..
                options.AddInterceptors(new List<IInterceptor>() {new RemoveLastOrderByInterceptor()});

            });

Is this about removing the last column in the ORDER BY or the last ORDER BY as a whole? I thought it was the former but the interceptor has made me doubt that.

roji commented

@stevendarby the last column (or possibly multiple column it they're a composite key), see #19571 for more details.

Thanks @roji, in that case @chazt3n I would be cautious with that interceptor as it appears to remove the entire ORDER BY which might affect the proper materialisation of results.

My workaround.

Usage:

var things = await _context.Things
    .Include(x => x.OtherThing)
    .RemoveOrdering()
    .ToListAsync();

Implementation:

public static class MyQueryableExtensions
{
    public static IQueryable<TEntity> RemoveOrdering<TEntity>(
        [NotNull] this IQueryable<TEntity> source)
        where TEntity : class
    {
        return source.Provider is EntityQueryProvider
            ? source.Provider.CreateQuery<TEntity>(new RemoveOrderingExpression(source.Expression))
            : source;
    }
}

public class RemoveOrderingExpression : Expression, IPrintableExpression
{
    public Expression Source { get; }
    public sealed override ExpressionType NodeType => ExpressionType.Extension;
    public override Type Type => typeof(RemoveOrderingExpression);

    public RemoveOrderingExpression(Expression source)
    {
        Source = source;
    }

    protected override Expression VisitChildren(ExpressionVisitor visitor)
    {
        var source = visitor.Visit(Source);

        if (source is ShapedQueryExpression sqe &&
            sqe.QueryExpression is SelectExpression se &&
            0 < se.Orderings.Count)
        {
            se.ClearOrdering();
            return source;
        }
        
        return new RemoveOrderingExpression(source);
    }

    /// <inheritdoc />
    void IPrintableExpression.Print(ExpressionPrinter expressionPrinter)
    {
        expressionPrinter.AppendLine($"{nameof(RemoveOrderingExpression)}(");
        using (expressionPrinter.Indent())
        {
            expressionPrinter.Visit(Source);
            expressionPrinter.AppendLine(")");
        }
    }
}

@mikhail-khalizev - Love your work-around, but it does not work for me. It is raising the following exception:

Expression of type 'MyNamespace.Extensions.RemoveOrderingExpression' cannot be used for parameter of type 'System.Linq.IQueryable1[MyNamespace.Entities.Company]' of method 'MyNamespace.Entities.Company FirstOrDefault[Company](System.Linq.IQueryable1[MyNamespace.Entities.Company])' (Parameter 'arg0')"

For a straight-forward query:

_dbContext.Set<Company>().AsQueryable().AsNoTracking().Include(c => c.Business).RemoveOrdering().FirstOrDefaultAsync()

Any thoughts? I copied your code 'as-is'...

roji commented

@mikhail-khalizev @dariusonsched removing all orderings from a query as the above RemoveOrdering does (#19828 (comment)) is dangerous and could result in incorrect results. EF Core generates the orderings it needs in order to perform its job, interfering with that may have various undefined consequences.

@roji thank you for the insight, though I must then not be understanding how this is/was resolved in EF 6 (we are still on EF 5).

roji commented

@dariusonsched this issue tracked removing the last ORDER BY, which indeed isn't necessary. The others are necessary.

I would clarify further that - as the title now states - it’s about removing the last column in the ORDER BY and not the last ORDER BY as a whole (there could possibly be more than one set of ORDER BY statements in the whole query and the workaround in #19828 (comment) was predicated on this misunderstanding)

Strange thing happens to me with this new feature. Surprisingly it significantly slows down my query.

Described here:
https://stackoverflow.com/questions/73375578/performance-downgrade-after-upgrading-ef-core-5-to-6

Does anybody know what may cause this?

roji commented

@vdolek it's very unlikely that removing an ORDER BY makes a query slower... I'd first confirm this by executing the SQL directly against the database, outside of your EF Core application. This would remove other changes in your application, the .NET version, the EF version...

In any case, if you see a problem, please open a new issue with the full repro details.

I m on EF 6.0.7 with Pomelo.EntityFrameworkCore.MySql 6.0.2 and the order by is still getting appended when doing an Order by. It causes huge performance degradation on my query. If I run the query without the order by it is super super fast. Adding it back causes slow perf. Do I have to do anything to remove the order by when running INNER and LEFT Join?

@roji I have tried that and really when I run those two queries directly, the one without tle last order by clause has bad performance (5 min vs 10ms).

roji commented

@pantonis this issue only removed the very last ordering (see #19571 for details), and only if it's added by EF for internal materialization purposes. If you're convinced you're seeing an unnecessary last ordering, please open a new issue with a full repro and we'll investigate.

@vdolek here as well, please post the full query, as well as a the database schema with data so we can repro this here. If you'd like to share that privately, you can send an email to the address on my github profile.

@vdolek Can you provide the output using SET SHOWPLAN_TEXT ON; for both queries?

roji commented

@vdolek and others, please open a new issue rather than continue posting here - that would be better.

I'm still having issues with the default ORDER BY found at the end of the query when handling selections with a lot of inclusions (mainly for serving complex objects from a REST endpoint).
I took the @chazt3n workaround and optimized it a bit to keep the explicitly stated sorting queries working removing only the redundant primary keys sorting (I know it's not actually redundant as per MySQL documentation it's not guaranteed to be reliable without).

So, when using this workaround keep in mind that queries will NOT be idempotent.

/// <summary>
/// Workaround to avoid MySql out of memory on queries with bigger field lists<br/>
/// Inspired by <a href="https://github.com/dotnet/efcore/issues/19828#issuecomment-847222980">chazt3n workaround</a>
/// </summary>
/// <remarks>
/// The interceptor works on the assumption of the first field of the main SELECT statement being also the first field
/// in the ORDER BY statement after the explicitly requested OrderBy expressions
/// </remarks>
public class RemoveLastOrderByInterceptor : DbCommandInterceptor {
	public const string QueryTag = "RemoveLastOrderBy";

	public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
		DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
		CancellationToken token = new()) {
		try {
			TryApplyPatch(command);
		} catch (Exception ex) { // The exception handling can be different based on your requirements
			// _logger.LogError(ex, "Failed to intercept query.");
			Console.WriteLine(ex);
			throw; // Fails forcefully to avoid unexpected silent behaviours
		}

		return base.ReaderExecutingAsync(command, eventData, result, token);
	}

	private static bool TryApplyPatch(DbCommand command) {
		const string orderBy = "ORDER BY";
		const string select = "SELECT ";
		var query = command.CommandText;
		int idx;
		if (!query.StartsWith("-- "))
			// Check if the command is tagged
			return false;
		var separatorIdx = query.IndexOf("\n\n", StringComparison.Ordinal);
		if (separatorIdx < 0
		    || query.IndexOf(QueryTag, 0, separatorIdx + 1, StringComparison.Ordinal) < 0)
			// Efficiently checks if the tags block contains the required QueryTag
			return false;
		if ((idx = query.LastIndexOf(orderBy, StringComparison.Ordinal)) < 0)
			// The query doesn't have an ORDER BY statement
			return false;
		// Using StringBuilder to avoid string allocation issues
		// While using early versions of .NET Framework there would be buffer allocation exceptions so it's
		// necessary to remove part of the pre-allocated string before appending (or specify capacity explicitly)
		var cmd = new StringBuilder(query);
		// Identify first SELECT field
		var start = query.IndexOf(select, StringComparison.Ordinal);
		if (start >= 0) {
			var nextIdx = query.IndexOf(",", start, StringComparison.Ordinal);
			var fromIdx = query.IndexOf("FROM", start, StringComparison.Ordinal);
			// Support both selection with only one value and multi value selection
			var end = nextIdx < 0 ? fromIdx : Math.Min(nextIdx, fromIdx);
			var from = start + select.Length;
			// Assemble first selected field query
			var firstField = cmd.ToString(from, end - from);
			// Check if the ORDER BY starts with a different field than the first selected field and, in such
			// case, identifies the index where the "redundant" ORDER BY begins
			var orderStart = query.IndexOf(firstField, idx, StringComparison.Ordinal);
			if (orderStart > idx + orderBy.Length + 1)
				idx = orderStart - 2 /* Subtract 2 characters to take account of the trailing comma */;
		}

		// Cut ORDER BY statement or remove it entirely
		command.CommandText = cmd
			.Remove(idx, query.Length - idx)
			.Append(';')
			.ToString();
		return true;
	}
}