buybackoff/1brc

Some ideas of improvements

oleggolovkov opened this issue · 5 comments

I was looking at the code and thought there are some things worth trying to further improve the execution time:

  • pass Summary by ref to Merge
  • Summary is 16 bytes and for min/max int is not needed, short should be enough.
  • .Select(x => (Name: x.Key.ToString(), x.Value)) for final result may not be needed, instead Utf8Span could implement IComparable and do comparison on Span property
  • if the previous point is implemented OrderBy can be replaced with something like Span<Utf8Span> spans = stackalloc Utf8Span[result.Count]; spans.Sort();
  • apply [SkipLocalsInit] to Summary since it looks like the Init method is always invoked for a new instance anyway
  • .ToList() may not be needed before calling Aggregate because Aggregate already has a version for ParallelQuery<TSource>
  • Console.Write($"{pair.Name} = {pair.Value}"); allocates an unnecessary string, could instead rewrite it as Console.Write(pair.Name); Console.Write(" = "); Console.Write(pair.Value) however not sure what is the performance of actual Console api, maybe the better strategy would be to just form 1 big resulting string and invoke a single Console.Write?

pass Summary by ref to Merge

Everything besides ProcessChunk has near zero impact on performance

Summary is 16 bytes and for min/max int is not needed, short should be enough.

Thanks, I did that after your original comment. It was helpful to reduce the struct size to 16 bytes.

.Select(x => (Name: x.Key.ToString(), x.Value)) for final result may not be needed, instead Utf8Span could implement IComparable and do comparison on Span property

Again, doesn't matter. I need to materialize the span at some point, it's better to just use string sort. Then I reuse the same strings for printing,

The rest is really not relevant for performance, it happens only once.

I need to materialize the span at some point

why? I think it should be possible to just print span to console

The rest is really not relevant for performance, it happens only once.

yes but it depends on the number of unique cities and if we combine overhead of ToList, allocations caused by OrderBy (btw we could sort this with Radix sort in O(N)) it may be noticeable

yes but it depends on the number of unique cities and if we combine overhead of ToList, allocations caused by OrderBy (btw we could sort this with Radix sort in O(N)) it may be noticeable

Oh, please. Profile it before suggesting. It's becoming boring.

The problem is there is no exact data set to profile, of course for the sample in the original repo with like 80 cities there would be no difference, however it was mentioned that we can not make any assumptions about the data and if all 1B rows have distinct cities it is obvious there will be a difference

Everything is 100% reproducible... I cannot even imagine which step could be difficult even for a total noob. Both Java repo (with its instructions), and this plain vanilla repo with script showing which commands to call.

Profiling is built into free IDEs.