ardalis/Specification

Rethinking Include implementation

devbased opened this issue Β· 17 comments

As said @fiseni in #182 (comment)

Regarding the Include infrastructure, here is the extension IncludeExtensions. I changed it a few times and it's not terribly bad now, but still, I'm not happy that we have to do it this way. We're duplicating some work done in EF, and Expression.Call uses reflection in the background (it's a lightweight reflection, but anyhow would have been nice not to have it).

For context, read this PR #83 and this issue

It would be great to change 'Include' implementation in terms of performance and logic encapsulation (currently we're using EFCore internals which is not reliable).

Proposal

Dynamically construct and call following delegates in IncludeExtensions.Include or IncludeExtensions.ThenInclude

Instead of copying efcore logic

// EntityFrameworkQueryableExtensions.cs
public static IIncludableQueryable<TEntity, TProperty> Include<TEntity, TProperty>(
    [NotNull] this IQueryable<TEntity> source,
    [NotNull] Expression<Func<TEntity, TProperty>> navigationPropertyPath)
    where TEntity : class
{
    Check.NotNull(source, nameof(source));
    Check.NotNull(navigationPropertyPath, nameof(navigationPropertyPath));

    return new IncludableQueryable<TEntity, TProperty>(
        source.Provider is EntityQueryProvider
            ? source.Provider.CreateQuery<TEntity>(
                Expression.Call(
                    instance: null,
                    method: IncludeMethodInfo.MakeGenericMethod(typeof(TEntity), typeof(TProperty)),
                    arguments: new[] { source.Expression, Expression.Quote(navigationPropertyPath) }))
            : source);
}
// our IncludeExtensions.cs
public static IQueryable<T> Include<T>(this IQueryable<T> source, IncludeExpressionInfo info)
{
    _ = info ?? throw new ArgumentNullException(nameof(info));

    var queryExpr = Expression.Call(
        typeof(EntityFrameworkQueryableExtensions),
        "Include",
        new Type[] {
            info.EntityType,
            info.PropertyType
        },
        source.Expression,
        info.LambdaExpression
        );

    // !!! we've dug into efcore source and copied private code, vulnerable to unannounced modifications !!!
    return source.Provider.CreateQuery<T>(queryExpr);
}

Do this

// our IncludeExtensions.cs
// public API we're safe to use it until efcore team announce breaking changes
private static readonly MethodInfo IncludeMethodInfo = typeof(EntityFrameworkQueryableExtensions)
    .GetTypeInfo().GetDeclaredMethods(nameof(EntityFrameworkQueryableExtensions.Include))
    .Single(mi => mi.GetGenericArguments().Length == 2
        && mi.GetParameters().Any(pi => pi.ParameterType.GetGenericTypeDefinition() == typeof(Expression<>)));

private static readonly CachedReadConcurrentDictionary<(Type EntityType, Type PropertyType, Type? PreviousPropertyType), Lazy<Func<IQueryable, LambdaExpression, IQueryable>>> IncludesCache =
    new CachedReadConcurrentDictionary<(Type EntityType, Type PropertyType, Type? PreviousPropertyType), Lazy<Func<IQueryable, LambdaExpression, IQueryable>>>();

public static IQueryable<T> IncludeCached<T>(this IQueryable<T> source, IncludeExpressionInfo info)
{
    _ = info ?? throw new ArgumentNullException(nameof(info));

    var include = IncludesCache.GetOrAdd((info.EntityType, info.PropertyType, null), CreateIncludeDelegate).Value;

    return (IQueryable<T>)include(source, info.LambdaExpression);
}

private static Lazy<Func<IQueryable, LambdaExpression, IQueryable>> CreateIncludeDelegate((Type EntityType, Type PropertyType, Type? PreviousPropertyType) cacheKey)
    => new Lazy<Func<IQueryable, LambdaExpression, IQueryable>>(() =>
    {
        var concreteInclude = IncludeMethodInfo.MakeGenericMethod(cacheKey.EntityType, cacheKey.PropertyType);
        var sourceParameter = Expression.Parameter(typeof(IQueryable));
        var selectorParameter = Expression.Parameter(typeof(LambdaExpression));

        var call = Expression.Call(
            concreteInclude,
            Expression.Convert(sourceParameter, typeof(IQueryable<>).MakeGenericType(cacheKey.EntityType)),
            Expression.Convert(selectorParameter, typeof(Expression<>).MakeGenericType(typeof(Func<,>).MakeGenericType(cacheKey.EntityType, cacheKey.PropertyType))));

        var lambda = Expression.Lambda<Func<IQueryable, LambdaExpression, IQueryable>>(call, sourceParameter, selectorParameter);

        var include = lambda.Compile();

        return include;
    });

Note: CachedReadConcurrentDictionary is thread-safe dictionary for read-heavy workloads

Pros and cons

Pros

  • Relying only on public efcore API
  • No MethodInfo lookup on each Include/ThenInclude invocation (see: Expression.Call)
  • Faster|less allocative than current implementation (and somehow faster than efcore, i would be very grateful if someone explain this one to me 😰)

Cons

  • Expressions and especially their caching may introduce subtle bugs
  • ???

Implications

// IncludableBuilderExtensions.cs
public static IIncludableSpecificationBuilder<TEntity, TProperty> ThenInclude<TEntity, TPreviousProperty, TProperty>(
    this IIncludableSpecificationBuilder<TEntity, IEnumerable<TPreviousProperty>> previousBuilder,
    Expression<Func<TPreviousProperty, TProperty>> thenIncludeExpression)
    where TEntity : class
{
    var info = new IncludeExpressionInfo(thenIncludeExpression, typeof(TEntity), typeof(TProperty), typeof(TPreviousProperty));

    ((List<IncludeExpressionInfo>)previousBuilder.Specification.IncludeExpressions).Add(info);

    var includeBuilder = new IncludableSpecificationBuilder<TEntity, TProperty>(previousBuilder.Specification);

    return includeBuilder;
}

Should be replaced with

public static IIncludableSpecificationBuilder<TEntity, TProperty> ThenIncludeCached<TEntity, TPreviousProperty, TProperty>(
    this IIncludableSpecificationBuilder<TEntity, IEnumerable<TPreviousProperty>> previousBuilder,
    Expression<Func<TPreviousProperty, TProperty>> thenIncludeExpression)
    where TEntity : class
{
    var info = new IncludeExpressionInfo(thenIncludeExpression, typeof(TEntity), typeof(TProperty), typeof(IEnumerable<TPreviousProperty>));

    ((List<IncludeExpressionInfo>)previousBuilder.Specification.IncludeExpressions).Add(info);

    var includeBuilder = new IncludableSpecificationBuilder<TEntity, TProperty>(previousBuilder.Specification);

    return includeBuilder;
}

Or even consolidated with this one since typeof(TPreviousProperty) == typeof(IEnumerable<TPreviousProperty>) (branching will be done in CreateThenIncludeDelegate)

public static IIncludableSpecificationBuilder<TEntity, TProperty> ThenInclude<TEntity, TPreviousProperty, TProperty>(
    this IIncludableSpecificationBuilder<TEntity, TPreviousProperty> previousBuilder,
    Expression<Func<TPreviousProperty, TProperty>> thenIncludeExpression)
    where TEntity : class
{
    var info = new IncludeExpressionInfo(thenIncludeExpression, typeof(TEntity), typeof(TProperty), typeof(TPreviousProperty));

    ((List<IncludeExpressionInfo>)previousBuilder.Specification.IncludeExpressions).Add(info);

    var includeBuilder = new IncludableSpecificationBuilder<TEntity, TProperty>(previousBuilder.Specification);

    return includeBuilder;
}
  • Also to comply with efcore's extension methods, Include method return type should be changed from IQueryable<T> to IIncludableQueryable<T>. And ThenInclude source parameter type from IQueryable<T> to IIncludableQueryable<T>. Or move this logic directly to the evaluator.

Benchmark

Modified #83 (comment)

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19042.1165 (20H2/October2020Update)
Intel Core i5-9600K CPU 3.70GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT

Method Mean Error StdDev Gen 0 Gen 1 Allocated
EFIncludeExpression 2.309 s 0.0317 s 0.0281 s 351000.0000 - 2 GB
SpecIncludeExpression 4.903 s 0.0371 s 0.0347 s 748000.0000 2000.0000 3 GB
SpecIncludeExpressionCached 1.805 s 0.0092 s 0.0086 s 295000.0000 - 1 GB
using Ardalis.Specification;
using Ardalis.Specification.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore;

namespace Benchmark.Specification.Evaluators;

[MemoryDiagnoser]
public class SpecIncludeBenchmark
{
    private readonly int max = 500000;
    private readonly SpecificationEvaluator evaluator = SpecificationEvaluator.Default;
    private readonly SpecificationEvaluator evaluatorCached = new SpecificationEvaluator(new IEvaluator[]
    {
        WhereEvaluator.Instance,
        Ardalis.Specification.EntityFrameworkCore.SearchEvaluator.Instance,
        IncludeEvaluatorCached.Instance,
        OrderEvaluator.Instance,
        PaginationEvaluator.Instance,
        AsNoTrackingEvaluator.Instance,
#if NETSTANDARD2_1
                AsSplitQueryEvaluator.Instance,
                AsNoTrackingWithIdentityResolutionEvaluator.Instance
#endif
    });
    private readonly Specification<Store> specInclude = new StoreIncludeProductsSpec();
    private readonly Specification<Store> specIncludeCached = new StoreIncludeProductsSpecCached();

    private readonly IQueryable<Store> Stores;

    public SpecIncludeBenchmark()
    {
        Stores = new BenchmarkDbContext().Stores.AsQueryable();
    }

    [Benchmark]
    public void EFIncludeExpression()
    {
        for (int i = 0; i < max; i++)
        {
            _ = Stores.Include(x => x.Products).ThenInclude(x => x.CustomFields);
        }
    }

    [Benchmark]
    public void SpecIncludeExpression()
    {
        for (int i = 0; i < max; i++)
        {
            _ = evaluator.GetQuery(Stores, specInclude);
        }
    }

    [Benchmark]
    public void SpecIncludeExpressionCached()
    {
        for (int i = 0; i < max; i++)
        {
            _ = evaluatorCached.GetQuery(Stores, specIncludeCached);
        }
    }
}

public class BenchmarkDbContext : DbContext
{
    public DbSet<Store> Stores { get; set; }

    /// <inheritdoc />
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        base.OnConfiguring(optionsBuilder);

        optionsBuilder.UseSqlServer("Server=(localdb)\\mssqllocaldb;Integrated Security=SSPI;Initial Catalog=SpecificationEFTestsDB;ConnectRetryCount=0");
    }
}

public class StoreIncludeProductsSpec : Specification<Store>
{
    public StoreIncludeProductsSpec()
    {
        Query.Include(x => x.Products).ThenInclude(x => x.CustomFields);
    }
}

public class StoreIncludeProductsSpecCached : Specification<Store>
{
    public StoreIncludeProductsSpecCached()
    {
        Query.Include(x => x.Products).ThenIncludeCached(x => x.CustomFields);
    }
}

public class Store
{
    public int Id { get; set; }
    public IEnumerable<Product> Products { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public CustomFields CustomFields { get; set; }
}

public class CustomFields
{
    public int Id { get; set; }
    public string CustomText1 { get; set; }
    public string CustomText2 { get; set; }
}

Also to comply with efcore's extension methods, Include method return type should be changed from IQueryable<T> to IIncludableQueryable<T>. And ThenInclude source parameter type from IQueryable<T> to IIncludableQueryable<T>.
Or move this logic directly to the evaluator.

Hi @devbased,

Thank you, we appreciate your effort here. I'll analyze your fork in more detail soon. At first glance, I have a few remarks.

Let's have the following specification

public class StoreSpec : Specification<Store>
{
    public StoreSpec(StoreFilter filter)
    {
        Query.Include(x => x.Products.Where(x => x.Name.Equals(filter.Name) && x.Descriptions.Contains(filter.DescriptionSearch))
				.ThenInclude(x => x.CustomFields.Where(x => x.Id == filter.CustomFieldId);
    }
}
  • You assume that Include statements will contain only selector expressions (e.g. `Include(x => x.Products)), which is no longer the case. Starting with EF Core 5, filtered includes were introduced, and Include statements can get really complicated (as seen above). That's one of the reasons we're passing the expressions as they are without altering them at all.
  • That said, the cache key can't be constructed simply from EntityType, PropertyType, PreviousPropertyType, but we have to account for the various user parameters. We can't know how many parameters there will be, nor how complex the expression is. We might think of using the hash key of the expression as part of the cache key, but two expressions with the same parameters will still produce different hash keys, so that won't work.
  • Even if we find an efficient way how to generate the cache key, we face another issue. We have no idea how large the user's data store is. In the above example, there might be 1M products, and based on end-user inputs you'll end up with >>1M entries in the cached dictionary. You get the idea.
  • The most frightening part here is that we have some quite nasty closures. We have them now too, but since all objects are transient, eventually they will be disposed. But, once you start caching those expressions, I think GC won't be able to collect them anymore. All those objects (in this example StoreFilter, and consequently the specification) will remain alive and bloat the memory.
  • Since we're just an abstraction over EF, being faster than EF sounds a bit fishy, right πŸ˜„? I have to analyze and debug this, but I assume the source IQueryable is not passed correctly to EF, this check source.Provider is EntityQueryProvider fails and actually, no operation is done by EF at all.

Also, I don't think you need to replace TPreviousProperty with IEnumerable<TPreviousProperty>), you'll face issues with chains. Anyhow, I'll go through this over the weekend if I can, then we can discuss it in more detail.

Thanks!

@fiseni
Oh, i didn't think about complex expressions in the include. Really sad..

Thank you for the feedback! Now i see downsides and that this problem is not quite trivial
At least maybe my attempt will inspire someone else 😞

No need for disappointment, the road almost always is bumpy πŸ˜„

We'll think about it, but I don't think caching would be a smart idea (especially I try to avoid it in libraries). In this case, it's even a security concern. A malicious user can provide random strings as parameters, and your cache will grow... until eventually, something crashes.

Cache grow could be resolved by expiration or using something like LRU.

Indeed you can. But, what will be your limit? How are you gonna decide for that for a broad audience? If the limits are small, the performance gains will be somewhat trivial, and if you set them high you're messing with the user's intentions. I mean we're stepping in a different realm there, with a whole set of new problems. Even EF is not caching the query expressions. In EF Core 6, they're compiling the model and some of the queries I think, but it's a tricky subject.

In any case, we appreciate your work here. We'll think about how we can evolve this πŸ‘

Wait, i don't cache user's expressions. I cache 'proxy' delegates which used to pass user's expressions to efcore extension methods πŸ€”

Yeah, my point was that not even EF was going toward that idea. You're caching the delegates, but you have an issue with the cache keys. It will grow, so still, you will need some expiration there.

Also, please try if your changes actually work. Run the integration tests or do some additional ones. It seems EF extension methods are not called correctly, and internally EF bypasses the logic and just returns the source. That's why you're faster than EF in the tests.

Hey @devbased,

Now I'm checking your fork and your changes. I was wrong about the cache keys, that's not an issue. Let's have the following specifications

public class StoreSpec1 : Specification<Store>
{
    public StoreSpec1()
    {
        Query.Include(x => x.Products);
    }
}

public class StoreSpec2 : Specification<Store>
{
    public StoreSpec2(StoreFilter filter)
    {
        Query.Include(x => x.Products.Where(x => x.Name.Equals(filter.Name));
    }
}

The Include expressions in these specifications, even though they are different, will map to the same cached delegate. But, that's not an issue actually. All you care about are the generic parameters, and in both cases they're same.

It looks promising. Let's test it a bit more and think of possible issues. Now that I'm checking your code, it doesn't seem we'll have an issue with closures here.

Actually, caching could be opt-in by some option in IncludeEvaluator. Without caching just do this

query = (IQueryable<T>)EfCoreIncludeMethodInfo.MakeGenericMethod(includeInfo.EntityType, includeInfo.PropertyType).Invoke(null, new object[] { query, includeInfo.LambdaExpression, });

There will be reflection overhead but still we'll use efcore public API.

Yes, that's reasonable. Enabling it per specification might not be nice. Perhaps we can provide that possibility in the specification evaluator.

    public class SpecificationEvaluator : ISpecificationEvaluator
    {
        // Will use singleton for default configuration. Yet, it can be instantiated if necessary, with default or provided evaluators.
        public static SpecificationEvaluator Default { get; } = new SpecificationEvaluator();
        public static SpecificationEvaluator Cached { get; } = new SpecificationEvaluator(true);

        private readonly List<IEvaluator> evaluators = new List<IEvaluator>();

        public SpecificationEvaluator(bool cacheInclude = false)
        {
            this.evaluators.AddRange(new IEvaluator[]
            {
                WhereEvaluator.Instance,
                SearchEvaluator.Instance,
                cacheInclude ? (IEvaluator)IncludeEvaluatorCached.Instance : (IEvaluator)IncludeEvaluator.Instance,
                OrderEvaluator.Instance,
                PaginationEvaluator.Instance,
                AsNoTrackingEvaluator.Instance,
#if NETSTANDARD2_1
                AsSplitQueryEvaluator.Instance,
                AsNoTrackingWithIdentityResolutionEvaluator.Instance
#endif
            });
        }
        public SpecificationEvaluator(IEnumerable<IEvaluator> evaluators)
        {
            this.evaluators.AddRange(evaluators);
        }
		
		...
	}

Then users can opt-in for the cached version as following

public class MyRepository<T> : RepositoryBase<T>, IRepository<T> where T : class
{
	private readonly SampleDbContext dbContext;

	public MyRepository(SampleDbContext dbContext) : base(dbContext, SpecificationEvaluator.Cached)
	{
		this.dbContext = dbContext;
	}

	// Not required to implement anything. Add additional functionalities if required.
}

@fiseni Done here (+ cleanup after expressions refactoring).
All tests passed.

Thanks @devbased, it looks great. Would you like to create a PR?

Btw, we have to do the same for EF6 package. Would you like to work on it too? If yes, please let's include it in the same PR.

@fiseni I'm not familiar with ef6 (for me it is deprecatedπŸ˜…), so i need to find a time to take a look what could be done.

@fiseni EF6 supports Include method with string argument and provides extension to pass lambda expression which is parsed to string anyway.

IQueryable<Store>.Include(x => x.Company.Stores.Select(x => x.Products))
// converted internally to
IQueryable<Store>.Include("Company.Stores.Products")

I see a couple of options:

  1. Discard ThenInclude to comply with EF6 design: a bad one because EfCore does not support such lambdas.
  2. Split IncludeExpressionInfos by Include expression type
    a. Include->ThenInclude
    b. Include->ThenInclude->ThenInclude->...
    And then build final lambda expression with .Select invocations. This approach is the same as current with difference that expressions are concatenated instead of strings. With good unit/integration testing it is worthless.
  3. Leave it as is (i prefer this, use efcore, guys! πŸ˜‹).

Indeed, there is no ThenInclude in EF6. But, in order to be consistent with our API, we do evaluate our ThenInclude expressions and convert them into string parameters for the EF6 plugin package. There is much room for improvement there, but I think we won't invest more in it. The EF6 package was the contribution from a community member, and we still welcome improvements and changes. But, it won't be us.

Anyhow, we also provide an overload for Include which accepts a string parameter. So, anyone using EF6 can simply use Include("Company.Stores.Products") or Include($"{nameof(Company)}.{nameof(Company.Stores)}.{nameof(Company.Stores.Products)}") in their specifications. This actually would be quite efficient since there is no any additional operation involved.

So, yeah, we'll leave it as it is, and won't implement caching for EF6.