NetFabric/NetFabric.Hyperlinq

ToArray Result Element Type is Different Than ToList

Mike-E-angelo opened this issue ยท 8 comments

Greetings. I have been enjoying getting Hyperlinq integrated into my code and the performance benefits that result of such.

I did find a possible issue, however, and wanted to get some eyes on it to verify if possible.

Using this file and line here:

https://github.com/DragonSpark/Framework/blob/04415dea2427ed6a0040fd580360544591a6e5c5/DragonSpark.Presentation/Delegates.cs#L42

And attempting to invoke ToArray instead, you can see that the returning element type is different from that which is returned by either ToList or the ToDictionary value type. Screenshot of this here:

Am I misunderstanding something fundamental here, by chance? It would seem to me that they all should return the same element value type.

Thank you for any context/insight you can provide, and thanks for your efforts into this library!

Hi @Mike-EEE! I'm glad you enjoy the benefits of using Hyperlinq.

Are you measuring the performance improvements? I'd love to get real world data.

A query with AsValueEnumerable().Where().Select() is handled by the WhereSelectEnumerable that takes a IValueEnumerables<,> as source.

It contains an implementation of ToList() that calls an optimized version that uses only one enumerator.

I don't have an equivalent optimized version of ToArray() so, your code is calling a method implemented by a code generator. It looks like it's getting confused because a Select()changes the output type. I thought I had fixed those issues.

I'm currently focusing on increasing the code coverage so that I catch all these evasive issues and finally release a stable version. I'll try to release soon a version that fixes this particular one.

Any feedback is welcome and please do report any issue you find.

Ok cool! As far as measuring, no not yet. What I meant, in particular, is that I use the wonderful ReSharper extension Heap Allocations Viewer which underlines/demarks/warns of all Linq calls as possible allocations.

For example, the above looks like this without Hyperlinq:

So, these warnings are now gone when replacing them with Hyperlinq calls. ๐Ÿ˜Š

@Mike-EEE I didn't know ReSharper could do that. Good to know!

I'm already working on a fix for this issue.

I'd like to make a suggestion about your code. The ToList and ToArray methods allocate on the heap and have to copy each result item into it. Exactly what you're trying to avoid, and rightly so, by using Hyperlinq. These methods should be used only when you know that:

  • the result will be enumerated many times and, avoiding the execution of the query code all those time, is worth the heap allocation

  • the result will be enumerated more than once and getting it is very expensive (remote sources)

I don't think you fall into the second case so, we should worry about the first. My opinion is that, it's the caller that knows what will be done with the result. You should not use neither ToList or ToArray. Return the enumeration result without caching it.

Remove these methods from the query and return IValueReadOnlyList<,,>, IValueReadOnlyCollection<,,> or IValueEnumerable<,,>. Depending on what the query return. These are fully compatible with the enumeration interfaces available in .NET. The caller can use Hyperlinq, LINQ or foreach on any of these.

For the case the user uses Hyperlinq or foreach, you can avoid the boxing of the enumerable, resulting in a heap allocation, by returning the actual enumerable type, for example WhereSelectEnumerable<,,>.

I wrote a post about some of this:
https://medium.com/@antao.almada/enumeration-in-net-v-tolist-and-toarray-395ce615ecfd

Ah indeed. In this case, I am storing delegates compiled by lambda expressions which will be used throughout the application's lifetime. This is essentially called the first time it's used (and then cached via ConditionalWeakTable). :)

I will keep your suggestion in mind, however. The problem I see is that they all require numerous generic arguments and result in lengthy symbols throughout the code as a result, which does impact readability IMO. I know you are used to it so no big deal to you, but those of us who are not used to it have a little warming up to do. ๐Ÿ˜…

@Mike-EEE Can you please try version 3.0.0-beta8?

Woohoo! Thank you @aalmada I will try this out ASAP... currently in the middle of a refactor but will let you know! THANK YOU for your quick efforts here and attending to your awesome project here. ๐Ÿ‘

OK! Finally wrapped up my refactor and tried to bump up the version of Hyperlinq. It appears there are new delegates used? I am actually a fan of this as if I understand correctly this differentiates it from Linq calls. Unfortunately all my code is set to Func<TIn, TOut> and also Func<T, bool> so I will have to spend some time figuring a good approach here.

Alright, turns out this wasn't that big of a deal. I was able to wrap Selector<TIn, TOut> in a Func<TIn, TOut> (and vice versa) ... not sure what this does for performance though. ๐Ÿ˜

Example

Also, not sure if this is known, but I had to add an explicit reference to Microsoft.Bcl.AsyncInterfaces to compile:
https://github.com/DragonSpark/Framework/blob/ef853774c535c798dcac3bdeaca314b4e58c7ce1/DragonSpark/DragonSpark.csproj#L15