Remove unnecessary *last column* in ORDER BY when joining for collection
roji opened this issue Β· 39 comments
When loading related one-to-many entities, EF adds ORDER BY clauses to make sure all related entities for a given entity are grouped together.
However, we can remove the last ORDER BY clause as it is unnecessary and causes more work (see #19571).
FYI I have a repro case where I get a timeout (> 40s) retrieving a single line (amongst 10) solely due to this (removing the ORDER BY causes the query to fetch data instantly).
I have a few owned entities which have owned collections.
We encountered the same issue here, we're using Polemo MySQL provider, in our case, the auto generated ORDER BY of Include() forces MySQL to use temp table which had significant performance impact, the query usually take less than 1ms to finish without ORDER BY and it took 14s when ORDER BY applied, and we also caught several exceptions from MySQL reporting 'Can't change size of file (OS errno 28 - No space left on device)', turns out MySQL tries to allocate space for temp tables and the file exceeds 16MB which is the default settings on our MySQL server.
Please do take this seriously as ORDER BY is performed differently on different SQL products, we had to split the LINQ query into three separate queries to workaround it, but I'm not sure how this will affect other queries in the system, it's basically impossible to reproduce this in the testing environment without extensive load tests as the performance impact is closely related to the complexity of the query as well as the result set data volume. Hopefully the fix comes sooner.
Generated SQL:
SELECT `t`.`Id`, `t`.`ApprovedTime`, `t`.`Approver`, `t`.`ApproverComments`, `t`.`AutoCreatePO`, `t`.`CancelReason`, `t`.`ChargedAmount`, `t`.`CreatedTime`, `t`.`Creator`, `t`.`CustomerId`, `t`.`DeliveryDate`, `t`.`ImageFileInfo`, `t`.`IsInternalOrder`, `t`.`LastUpdatedTime`, `t`.`Notes`, `t`.`OrderNumber`, `t`.`OrderType`, `t`.`PackagingMethod`, `t`.`Priority`, `t`.`ProjectName`, `t`.`SalesOrderCompleted`, `t`.`SalesRep`, `t`.`Status`, `t`.`TenantId`, `t`.`TotalAmount`, `t`.`UniqueId`, `t1`.`Id`, `t1`.`Area`, `t1`.`AreaName`, `t1`.`ClientSalesOrderNumber`, `t1`.`ClientWorkOrderNumber`, `t1`.`CreatedTime`, `t1`.`Creator`, `t1`.`CuttingMachineNumber`, `t1`.`CuttingOperator`, `t1`.`ImageFileInfo`, `t1`.`InternalWorkOrderNumber`, `t1`.`LastUpdatedTime`, `t1`.`Length`, `t1`.`OrderPlacingDate`, `t1`.`PackingCaseId`, `t1`.`Position`, `t1`.`Price`, `t1`.`ProcessingDate`, `t1`.`SalesOrderId`, `t1`.`Status`, `t1`.`StockInOperator`, `t1`.`StockInTime`, `t1`.`StockOutOperator`, `t1`.`StockOutTime`, `t1`.`StockingAreaId`, `t1`.`TenantId`, `t1`.`TileNumber`, `t1`.`Width`, `t3`.`Id`, `t3`.`CaseNumber`, `t3`.`CreatedTime`, `t3`.`Creator`, `t3`.`LastUpdatedTime`, `t3`.`SalesOrderId`, `t3`.`Status`, `t3`.`StockInOperator`, `t3`.`StockInTime`, `t3`.`StockOutOperator`, `t3`.`StockOutTime`, `t3`.`StockingAreaId`, `t3`.`TenantId`, `t3`.`UniqueId`
FROM (
SELECT `s`.`Id`, `s`.`ApprovedTime`, `s`.`Approver`, `s`.`ApproverComments`, `s`.`AutoCreatePO`, `s`.`CancelReason`, `s`.`ChargedAmount`, `s`.`CreatedTime`, `s`.`Creator`, `s`.`CustomerId`, `s`.`DeliveryDate`, `s`.`ImageFileInfo`, `s`.`IsInternalOrder`, `s`.`LastUpdatedTime`, `s`.`Notes`, `s`.`OrderNumber`, `s`.`OrderType`, `s`.`PackagingMethod`, `s`.`Priority`, `s`.`ProjectName`, `s`.`SalesOrderCompleted`, `s`.`SalesRep`, `s`.`Status`, `s`.`TenantId`, `s`.`TotalAmount`, `s`.`UniqueId`
FROM `SalesOrders` AS `s`
WHERE (`s`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420') AND (`s`.`Id` = 1620)
LIMIT 2
) AS `t`
LEFT JOIN (
SELECT `t0`.`Id`, `t0`.`Area`, `t0`.`AreaName`, `t0`.`ClientSalesOrderNumber`, `t0`.`ClientWorkOrderNumber`, `t0`.`CreatedTime`, `t0`.`Creator`, `t0`.`CuttingMachineNumber`, `t0`.`CuttingOperator`, `t0`.`ImageFileInfo`, `t0`.`InternalWorkOrderNumber`, `t0`.`LastUpdatedTime`, `t0`.`Length`, `t0`.`OrderPlacingDate`, `t0`.`PackingCaseId`, `t0`.`Position`, `t0`.`Price`, `t0`.`ProcessingDate`, `t0`.`SalesOrderId`, `t0`.`Status`, `t0`.`StockInOperator`, `t0`.`StockInTime`, `t0`.`StockOutOperator`, `t0`.`StockOutTime`, `t0`.`StockingAreaId`, `t0`.`TenantId`, `t0`.`TileNumber`, `t0`.`Width`
FROM `Tiles` AS `t0`
WHERE `t0`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t1` ON `t`.`Id` = `t1`.`SalesOrderId`
LEFT JOIN (
SELECT `t2`.`Id`, `t2`.`CaseNumber`, `t2`.`CreatedTime`, `t2`.`Creator`, `t2`.`LastUpdatedTime`, `t2`.`SalesOrderId`, `t2`.`Status`, `t2`.`StockInOperator`, `t2`.`StockInTime`, `t2`.`StockOutOperator`, `t2`.`StockOutTime`, `t2`.`StockingAreaId`, `t2`.`TenantId`, `t2`.`UniqueId`
FROM `TilePackingCases` AS `t2`
WHERE `t2`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t3` ON `t`.`Id` = `t3`.`SalesOrderId`
ORDER BY `t`.`Id`, `t1`.`Id`, `t3`.`Id`
Execution Results Comparison (first 3 with ORDER BY on, last 3 without ORDER BY)
MySQL Execution Plan (note the red circle around ORDER)
@LiangZugeng @PaulARoy thanks for posting here - note that this is only about removing the last ORDER BY; previous ones are necessary in order to properly materialize results.
What could be helpful for us to evaluate the priority of this, is a runnable code sample which clearly shows that the last ORDER BY has a significant impact on perf.
I'm sorry but it's been 2 month, we fixed it and switched to postgres in between.
The best way to reach it was to stack owned entities / owned collections.
The last order by was the only cause of perf downgrade.
I will spend some time to write some code to reproduce the perf degrade issue this weekend, I will be targeting MySQL as itβs where we found the issue. Will post a Github repo here later for this.
What is the status about this issue?
@grietine the issue is in our backlog, which means it won't be part of the upcoming 5.0 release. We'll probably consider it for 6.0.
Here is a sample project that can show the last order reducing performance and drastically increasing IO. I tried to come up with a slightly realistic example but ended up just throwing some stuff on there to get the row count and size higher. Since we use PaaS SQL Server instances our IO is more tightly limited than my local computer so IO intensive operations like the last Order By hurts performance there more than locally.
Full SQL
exec sp_executesql N'SELECT [t1].[Id], [t1].[ConstructionType], [t1].[ExteriorDescription], [t1].[GeneralDescription], [t1].[InteriorDescription], [t1].[Name], [t1].[SalesDescription], [t1].[YearBuilt], [t1].[Id0], [t1].[Supervisor_Name], [t1].[Supervisor_Title], [b5].[BuildingId], [b5].[Id], [b5].[Number], [b6].[SupervisorBuildingId], [b6].[Id], [b6].[Number], [b7].[SupervisorBuildingId], [b7].[Id], [b7].[Number], [b7].[Type], [t3].[BuildingId], [t3].[Id], [t3].[Notes], [t3].[Number], [t3].[UnitBuildingId], [t3].[UnitId], [t3].[Id0], [t3].[Name], [t3].[Notes0], [t3].[ResidentUnitBuildingId], [t3].[ResidentUnitId], [t3].[ResidentId], [t3].[Id00], [t3].[Number0], [t3].[ResidentUnitBuildingId0], [t3].[ResidentUnitId0], [t3].[ResidentId0], [t3].[Id1], [t3].[ExtraDeductible], [t3].[ResidentUnitBuildingId1], [t3].[ResidentUnitId1], [t3].[ResidentId1], [t3].[Id2], [t3].[Number00], [t3].[Type]
FROM (
SELECT TOP(1) [b].[Id], [b].[ConstructionType], [b].[ExteriorDescription], [b].[GeneralDescription], [b].[InteriorDescription], [b].[Name], [b].[SalesDescription], [b].[YearBuilt], [t0].[Id] AS [Id0], [t0].[Supervisor_Name], [t0].[Supervisor_Title]
FROM [Building] AS [b]
LEFT JOIN (
SELECT [b0].[Id], [b0].[Supervisor_Name], [b0].[Supervisor_Title]
FROM [Building] AS [b0]
WHERE [b0].[Supervisor_Title] IS NOT NULL OR [b0].[Supervisor_Name] IS NOT NULL
UNION
SELECT [b1].[Id], [b1].[Supervisor_Name], [b1].[Supervisor_Title]
FROM [Building] AS [b1]
INNER JOIN [Building_ParkingSpaces1] AS [b2] ON [b1].[Id] = [b2].[SupervisorBuildingId]
UNION
SELECT [b3].[Id], [b3].[Supervisor_Name], [b3].[Supervisor_Title]
FROM [Building] AS [b3]
INNER JOIN [Building_PhoneNumbers] AS [b4] ON [b3].[Id] = [b4].[SupervisorBuildingId]
) AS [t0] ON [b].[Id] = [t0].[Id]
WHERE [b].[Id] = @__p_0
) AS [t1]
LEFT JOIN [Building_ParkingSpaces] AS [b5] ON [t1].[Id] = [b5].[BuildingId]
LEFT JOIN [Building_ParkingSpaces1] AS [b6] ON [t1].[Id0] = [b6].[SupervisorBuildingId]
LEFT JOIN [Building_PhoneNumbers] AS [b7] ON [t1].[Id0] = [b7].[SupervisorBuildingId]
LEFT JOIN (
SELECT [u].[BuildingId], [u].[Id], [u].[Notes], [u].[Number], [t2].[UnitBuildingId], [t2].[UnitId], [t2].[Id] AS [Id0], [t2].[Name], [t2].[Notes] AS [Notes0], [t2].[ResidentUnitBuildingId], [t2].[ResidentUnitId], [t2].[ResidentId], [t2].[Id0] AS [Id00], [t2].[Number] AS [Number0], [t2].[ResidentUnitBuildingId0], [t2].[ResidentUnitId0], [t2].[ResidentId0], [t2].[Id1], [t2].[ExtraDeductible], [t2].[ResidentUnitBuildingId1], [t2].[ResidentUnitId1], [t2].[ResidentId1], [t2].[Id2], [t2].[Number0] AS [Number00], [t2].[Type]
FROM [Unit] AS [u]
LEFT JOIN (
SELECT [r].[UnitBuildingId], [r].[UnitId], [r].[Id], [r].[Name], [r].[Notes], [r0].[ResidentUnitBuildingId], [r0].[ResidentUnitId], [r0].[ResidentId], [r0].[Id] AS [Id0], [r0].[Number], [p].[ResidentUnitBuildingId] AS [ResidentUnitBuildingId0], [p].[ResidentUnitId] AS [ResidentUnitId0], [p].[ResidentId] AS [ResidentId0], [p].[Id] AS [Id1], [p].[ExtraDeductible], [r1].[ResidentUnitBuildingId] AS [ResidentUnitBuildingId1], [r1].[ResidentUnitId] AS [ResidentUnitId1], [r1].[ResidentId] AS [ResidentId1], [r1].[Id] AS [Id2], [r1].[Number] AS [Number0], [r1].[Type]
FROM [Resident] AS [r]
LEFT JOIN [Resident_ParkingSpaces] AS [r0] ON (([r].[UnitBuildingId] = [r0].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [r0].[ResidentUnitId])) AND ([r].[Id] = [r0].[ResidentId])
LEFT JOIN [Pet] AS [p] ON (([r].[UnitBuildingId] = [p].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [p].[ResidentUnitId])) AND ([r].[Id] = [p].[ResidentId])
LEFT JOIN [Resident_PhoneNumbers] AS [r1] ON (([r].[UnitBuildingId] = [r1].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [r1].[ResidentUnitId])) AND ([r].[Id] = [r1].[ResidentId])
) AS [t2] ON ([u].[BuildingId] = [t2].[UnitBuildingId]) AND ([u].[Id] = [t2].[UnitId])
) AS [t3] ON [t1].[Id] = [t3].[BuildingId]
ORDER BY [t1].[Id], [b5].[BuildingId], [b5].[Id], [b6].[SupervisorBuildingId], [b6].[Id], [b7].[SupervisorBuildingId], [b7].[Id], [t3].[BuildingId], [t3].[Id], [t3].[UnitBuildingId], [t3].[UnitId], [t3].[Id0], [t3].[ResidentUnitBuildingId], [t3].[ResidentUnitId], [t3].[ResidentId], [t3].[Id00], [t3].[ResidentUnitBuildingId0], [t3].[ResidentUnitId0], [t3].[ResidentId0], [t3].[Id1], [t3].[ResidentUnitBuildingId1], [t3].[ResidentUnitId1], [t3].[ResidentId1], [t3].[Id2]',N'@__p_0 int',@__p_0=1
Thanks @dandenton, it's great to have some hard data on this!
@dandenton just to make sure we're discussing the same thing... Can you please confirm that your comparison checks when removing only the last term in the ORDER BY clause (i.e. [t3].[Id2]
), rather than the entire ORDER BY clause? This issue is only about the former option, which is something we can actually do - for the latter see #19571.
Thanks for confirming.
Our team is using Pomelo and MySql and this is causing real performance issues.
Are there any workarounds for the time being beyond just loading the related entities and join them manually?
@rabberbock please note that this issue on only about removing the last ORDER BY when joining; if you're concerned about non-last ORDER BY clauses, see #19571 (and also consider trying out split queries).
If you're convinced you're seeing a perf issue because of the last ORDER BY, it would be useful to have a code sample that shows this - it would help us bump up the priority of this issue.
@roji Thanks for the pointers! I can try to put something together. The gist of it is as follows:
The execution plan with the last ORDER BY is the following:
Without the last ORDER BY it's:
So with the last ORDER BY it uses tmp table, which in our case really degrades performance.
Without the last ORDER BY it runs in miliseconds, with the ORDER BY the same query takes a minute.
Here is a link to the MySql docs that explain when a tmp table is used. https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html.
What would you like to see in a code sample beyond the above execution plans?
Thanks for this info @rabberbock! I think the above (and the link to the MySQL docs) should be sufficient IMHO.
This is a pretty significant issue. Is there a better way to get it resolved?
a class we can write and swap in maybe on our apps?
We have queries that are taking two minutes, and literally removing the last ORDER BY makes them less than .2 seconds
EDIT: To be clear, this is not an exaggeration - customers are angry
@chazt3n A hack would be to use IDbCommandInterceptor to remove the last ORDER BY
if it's a critical situation.
@chazt3n this issue is in the 6.0 milestone, and there are already PRs out to resolve it. So chances are pretty good it'll get fixed for 6.0.
@chazt3n A hack would be to use IDbCommandInterceptor to remove the last
ORDER BY
if it's a critical situation.
@mguinness! thank you for showing up, you know I love your advice. Ok I'll look into that, thank you
@roji - that is wonderful news, we're PSYCHED about all things 6.0
we're only trying 5 out of necessity (split queries, filtered includes - we have very hierarchical data that gets queried from many different vectors) but are trying to stay LTS.
Thank you for the insight, didn't mean to come in all emotionally charged - starting to sweat π
@chazt3n The PR above is going to fix this issue, once it is merged. Pomelo will keep up-to-date with the EF Core 6 previews, so you should be able to verify it in a timely fashion, once it has been merged.
Until then, I would go with the approach that @mguinness suggested. If this is only about a hand full of queries, tag them and use the tag to specifically taget those queries with an interceptor (or use some other proprietary mechanism to mark them somehow), and remove their ORDER BY
statement (a regular expression is probably sufficient).
EDIT: To be clear, this is not an exaggeration - customers are angry
If you need some sample code quickly, I can post some (or take a look at the Example: Command interception to add query hints in the EF Core docs).
if anyone's interested in a shoddy temp fix
public class RemoveLastOrderByInterceptor : DbCommandInterceptor
{
public const string QueryTag = "RemoveLastOrderBy";
public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
CancellationToken cancellationToken = new CancellationToken())
{
try
{
const string orderBy = "ORDER BY";
if (command.CommandText.Contains(QueryTag) && command.CommandText.Contains(orderBy))
{
int lastOrderBy = command.CommandText.LastIndexOf(orderBy, StringComparison.Ordinal);
//beware of string manip on memory consumption
command.CommandText = command.CommandText.Remove(lastOrderBy);
command.CommandText += ";";
}
}
catch (Exception ex)
{
var logger = LogManager.GetCurrentClassLogger();
logger.Error(ex, "Failed to intercept query.");
}
return base.ReaderExecutingAsync(command, eventData, result, cancellationToken);
Usage:
var things = await _context.Things
.Include(x => x.OtherThing)
.ThenInclude(e => e.AnotherThing)
.TagWith(RemoveLastOrderByInterceptor.QueryTag)
.ToListAsync(cancellationToken);
context registration
services.AddDbContextPool<MyDataContext>((serviceProvider, options) =>
{
...
//probably should pull this from the service provider..
options.AddInterceptors(new List<IInterceptor>() {new RemoveLastOrderByInterceptor()});
});
Is this about removing the last column in the ORDER BY or the last ORDER BY as a whole? I thought it was the former but the interceptor has made me doubt that.
@stevendarby the last column (or possibly multiple column it they're a composite key), see #19571 for more details.
My workaround.
Usage:
var things = await _context.Things
.Include(x => x.OtherThing)
.RemoveOrdering()
.ToListAsync();
Implementation:
public static class MyQueryableExtensions
{
public static IQueryable<TEntity> RemoveOrdering<TEntity>(
[NotNull] this IQueryable<TEntity> source)
where TEntity : class
{
return source.Provider is EntityQueryProvider
? source.Provider.CreateQuery<TEntity>(new RemoveOrderingExpression(source.Expression))
: source;
}
}
public class RemoveOrderingExpression : Expression, IPrintableExpression
{
public Expression Source { get; }
public sealed override ExpressionType NodeType => ExpressionType.Extension;
public override Type Type => typeof(RemoveOrderingExpression);
public RemoveOrderingExpression(Expression source)
{
Source = source;
}
protected override Expression VisitChildren(ExpressionVisitor visitor)
{
var source = visitor.Visit(Source);
if (source is ShapedQueryExpression sqe &&
sqe.QueryExpression is SelectExpression se &&
0 < se.Orderings.Count)
{
se.ClearOrdering();
return source;
}
return new RemoveOrderingExpression(source);
}
/// <inheritdoc />
void IPrintableExpression.Print(ExpressionPrinter expressionPrinter)
{
expressionPrinter.AppendLine($"{nameof(RemoveOrderingExpression)}(");
using (expressionPrinter.Indent())
{
expressionPrinter.Visit(Source);
expressionPrinter.AppendLine(")");
}
}
}
@mikhail-khalizev - Love your work-around, but it does not work for me. It is raising the following exception:
Expression of type 'MyNamespace.Extensions.RemoveOrderingExpression' cannot be used for parameter of type 'System.Linq.IQueryable1[MyNamespace.Entities.Company]' of method 'MyNamespace.Entities.Company FirstOrDefault[Company](System.Linq.IQueryable1[MyNamespace.Entities.Company])' (Parameter 'arg0')"
For a straight-forward query:
_dbContext.Set<Company>().AsQueryable().AsNoTracking().Include(c => c.Business).RemoveOrdering().FirstOrDefaultAsync()
Any thoughts? I copied your code 'as-is'...
@mikhail-khalizev @dariusonsched removing all orderings from a query as the above RemoveOrdering does (#19828 (comment)) is dangerous and could result in incorrect results. EF Core generates the orderings it needs in order to perform its job, interfering with that may have various undefined consequences.
@roji thank you for the insight, though I must then not be understanding how this is/was resolved in EF 6 (we are still on EF 5).
@dariusonsched this issue tracked removing the last ORDER BY, which indeed isn't necessary. The others are necessary.
I would clarify further that - as the title now states - itβs about removing the last column in the ORDER BY and not the last ORDER BY as a whole (there could possibly be more than one set of ORDER BY statements in the whole query and the workaround in #19828 (comment) was predicated on this misunderstanding)
Strange thing happens to me with this new feature. Surprisingly it significantly slows down my query.
Described here:
https://stackoverflow.com/questions/73375578/performance-downgrade-after-upgrading-ef-core-5-to-6
Does anybody know what may cause this?
@vdolek it's very unlikely that removing an ORDER BY makes a query slower... I'd first confirm this by executing the SQL directly against the database, outside of your EF Core application. This would remove other changes in your application, the .NET version, the EF version...
In any case, if you see a problem, please open a new issue with the full repro details.
I m on EF 6.0.7 with Pomelo.EntityFrameworkCore.MySql 6.0.2 and the order by is still getting appended when doing an Order by. It causes huge performance degradation on my query. If I run the query without the order by it is super super fast. Adding it back causes slow perf. Do I have to do anything to remove the order by when running INNER and LEFT Join?
@roji I have tried that and really when I run those two queries directly, the one without tle last order by
clause has bad performance (5 min vs 10ms).
@pantonis this issue only removed the very last ordering (see #19571 for details), and only if it's added by EF for internal materialization purposes. If you're convinced you're seeing an unnecessary last ordering, please open a new issue with a full repro and we'll investigate.
@vdolek here as well, please post the full query, as well as a the database schema with data so we can repro this here. If you'd like to share that privately, you can send an email to the address on my github profile.
@vdolek Can you provide the output using SET SHOWPLAN_TEXT ON;
for both queries?
@vdolek and others, please open a new issue rather than continue posting here - that would be better.
I'm still having issues with the default ORDER BY found at the end of the query when handling selections with a lot of inclusions (mainly for serving complex objects from a REST endpoint).
I took the @chazt3n workaround and optimized it a bit to keep the explicitly stated sorting queries working removing only the redundant primary keys sorting (I know it's not actually redundant as per MySQL documentation it's not guaranteed to be reliable without).
So, when using this workaround keep in mind that queries will NOT be idempotent.
/// <summary>
/// Workaround to avoid MySql out of memory on queries with bigger field lists<br/>
/// Inspired by <a href="https://github.com/dotnet/efcore/issues/19828#issuecomment-847222980">chazt3n workaround</a>
/// </summary>
/// <remarks>
/// The interceptor works on the assumption of the first field of the main SELECT statement being also the first field
/// in the ORDER BY statement after the explicitly requested OrderBy expressions
/// </remarks>
public class RemoveLastOrderByInterceptor : DbCommandInterceptor {
public const string QueryTag = "RemoveLastOrderBy";
public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
CancellationToken token = new()) {
try {
TryApplyPatch(command);
} catch (Exception ex) { // The exception handling can be different based on your requirements
// _logger.LogError(ex, "Failed to intercept query.");
Console.WriteLine(ex);
throw; // Fails forcefully to avoid unexpected silent behaviours
}
return base.ReaderExecutingAsync(command, eventData, result, token);
}
private static bool TryApplyPatch(DbCommand command) {
const string orderBy = "ORDER BY";
const string select = "SELECT ";
var query = command.CommandText;
int idx;
if (!query.StartsWith("-- "))
// Check if the command is tagged
return false;
var separatorIdx = query.IndexOf("\n\n", StringComparison.Ordinal);
if (separatorIdx < 0
|| query.IndexOf(QueryTag, 0, separatorIdx + 1, StringComparison.Ordinal) < 0)
// Efficiently checks if the tags block contains the required QueryTag
return false;
if ((idx = query.LastIndexOf(orderBy, StringComparison.Ordinal)) < 0)
// The query doesn't have an ORDER BY statement
return false;
// Using StringBuilder to avoid string allocation issues
// While using early versions of .NET Framework there would be buffer allocation exceptions so it's
// necessary to remove part of the pre-allocated string before appending (or specify capacity explicitly)
var cmd = new StringBuilder(query);
// Identify first SELECT field
var start = query.IndexOf(select, StringComparison.Ordinal);
if (start >= 0) {
var nextIdx = query.IndexOf(",", start, StringComparison.Ordinal);
var fromIdx = query.IndexOf("FROM", start, StringComparison.Ordinal);
// Support both selection with only one value and multi value selection
var end = nextIdx < 0 ? fromIdx : Math.Min(nextIdx, fromIdx);
var from = start + select.Length;
// Assemble first selected field query
var firstField = cmd.ToString(from, end - from);
// Check if the ORDER BY starts with a different field than the first selected field and, in such
// case, identifies the index where the "redundant" ORDER BY begins
var orderStart = query.IndexOf(firstField, idx, StringComparison.Ordinal);
if (orderStart > idx + orderBy.Length + 1)
idx = orderStart - 2 /* Subtract 2 characters to take account of the trailing comma */;
}
// Cut ORDER BY statement or remove it entirely
command.CommandText = cmd
.Remove(idx, query.Length - idx)
.Append(';')
.ToString();
return true;
}
}