Some ideas of improvements
oleggolovkov opened this issue · 5 comments
I was looking at the code and thought there are some things worth trying to further improve the execution time:
- pass
Summary
by ref toMerge
Summary
is 16 bytes and for min/maxint
is not needed,short
should be enough..Select(x => (Name: x.Key.ToString(), x.Value))
for final result may not be needed, instead Utf8Span could implement IComparable and do comparison onSpan
property- if the previous point is implemented
OrderBy
can be replaced with something likeSpan<Utf8Span> spans = stackalloc Utf8Span[result.Count]; spans.Sort();
- apply
[SkipLocalsInit]
toSummary
since it looks like theInit
method is always invoked for a new instance anyway .ToList()
may not be needed before callingAggregate
becauseAggregate
already has a version forParallelQuery<TSource>
Console.Write($"{pair.Name} = {pair.Value}");
allocates an unnecessary string, could instead rewrite it asConsole.Write(pair.Name); Console.Write(" = "); Console.Write(pair.Value)
however not sure what is the performance of actual Console api, maybe the better strategy would be to just form 1 big resulting string and invoke a singleConsole.Write
?
pass Summary by ref to Merge
Everything besides ProcessChunk
has near zero impact on performance
Summary is 16 bytes and for min/max int is not needed, short should be enough.
Thanks, I did that after your original comment. It was helpful to reduce the struct size to 16 bytes.
.Select(x => (Name: x.Key.ToString(), x.Value)) for final result may not be needed, instead Utf8Span could implement IComparable and do comparison on Span property
Again, doesn't matter. I need to materialize the span at some point, it's better to just use string sort. Then I reuse the same strings for printing,
The rest is really not relevant for performance, it happens only once.
I need to materialize the span at some point
why? I think it should be possible to just print span to console
The rest is really not relevant for performance, it happens only once.
yes but it depends on the number of unique cities and if we combine overhead of ToList
, allocations caused by OrderBy
(btw we could sort this with Radix sort in O(N)) it may be noticeable
yes but it depends on the number of unique cities and if we combine overhead of ToList, allocations caused by OrderBy (btw we could sort this with Radix sort in O(N)) it may be noticeable
Oh, please. Profile it before suggesting. It's becoming boring.
The problem is there is no exact data set to profile, of course for the sample in the original repo with like 80 cities there would be no difference, however it was mentioned that we can not make any assumptions about the data and if all 1B rows have distinct cities it is obvious there will be a difference
Everything is 100% reproducible... I cannot even imagine which step could be difficult even for a total noob. Both Java repo (with its instructions), and this plain vanilla repo with script showing which commands to call.
Profiling is built into free IDEs.