aardvark-platform/aardvark.base

Optimize Constant<T>.XX

luithefirst opened this issue · 9 comments

Using Constant<T>.XX does not compile to an optimal code, see:
https://github.com/aardvark-platform/aardvark.base/blob/v51/src/Tests/Aardvark.Base.Benchmarks/ConstantsBenchmark.cs

We should evaluate how this could be improved. Maybe it is possible by changing the way Constant is defined.

Isn't the benchmark flawed? The cost of the static constructor of Constant<T> is included in the runtimes. If you manually run the constructor beforehand (e.g. by referencing Constant<double>.PositiveTinyValue) I get the following results:

Method Mean Error StdDev Code Size
ApproximateEqDefault 140.4 μs 2.35 μs 1.96 μs 190 B
ApproximateEqInline 140.6 μs 2.05 μs 1.72 μs 190 B

Thanks for reviewing this! Interesting, I would say that depends on the definition of flawed. When I change the benchmark to use Constant in the constructor, I get the same results as you.
Without this modification, I would have assumed that since Constant is static and thereby needs to be created only once per runtime that it would happen in the first Warmup run. But of course, this code must exist in some place, and without any other setup, it is placed in the benchmark routine. Checking the constructor should be super lightweight, but for some reason when looking at the disassembly it appears to happen every loop iteration (call CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS). I would have expected this TypeInitialization code to be hidden and done by the runtime and would have also expected something like this also if Constant has already been created, but when looking at the disassembly after the change, both benchmark routines are identical. This means that the compiler was intelligent enough to optimize this.

So, in conclusion, this means there will be always a single use of Constant that has bad performance per application and in all other cases, we can expect the compiler to optimize this properly. Can we? Maybe this is only true for small routines.
I would say that no change is needed, but do you think it would be worthwhile creating a [OnAardvarkInit] that initializes those constants?

I really don't think it's a good idea to initialize those constants with [OnAardvarkInit]. All you're doing is moving the one time overhead to another place.

Moving the code to a deterministic place would be what I suggest. Otherwise, there would be one place (depending on code generation order) that would have the initialization and not result in an optimal code, the same as we have here with the benchmark. With an [OnAardvarkInit] method we can control this and make sure that it generates the best code everywhere.

The [OnAardvarkInit] introduces additonal overhead, even if those constants are never used (also you'll have to initialize them for all primitive types?). The static constructor is just a small one-time overhead and should only be called once in theory. So if you use ApproximateEquals in a loop, only the first invocation should have that overhead.

I don't understand what's happening with this benchmark code but I don't think it's a good idea to potentially increase the startup time for all applications that use Aardvark.Base based on this. If this is really an issue that noticeably affects performance in a specific use case, you could just manually call the constructor as we did here.

there are several arguments against this:

  • aardvark.init becomes necessary
  • it is optimization tailored to a particular compiler/runtime scenario
  • it is not idiomatic

did we understand why original generated code is so bad? should we try with .net 6?

Adding this initialization would resolve the issue with the benchmark (takes 6ms):

static class Constants
{
    [OnAardvarkInit]
    public static void Init()
    {
        var ignore1 = Constant<double>.MachineEpsilon;
        var ignore2 = Constant<float>.MachineEpsilon;
        var ignore3 = Constant<byte>.MachineEpsilon;
        var ignore4 = Constant<sbyte>.MachineEpsilon;
        var ignore5 = Constant<short>.MachineEpsilon;
        var ignore6 = Constant<ushort>.MachineEpsilon;
        var ignore7 = Constant<uint>.MachineEpsilon;
        var ignore8 = Constant<int>.MachineEpsilon;
        var ignore9 = Constant<long>.MachineEpsilon;
        var ignore0 = Constant<ulong>.MachineEpsilon;
    }
}

The disassembly code without any prior usage of Constant<double> contains call CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS within the loop. This is obviously bad, only with an Init method we have control of this and can guarantee that it is not embedded anywhere else.
I've then added [MethodImpl(MethodImplOptions.NoOptimization)] everywhere to see what the original code is in both scenarios. With the init method there is no call CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS anywhere, and the code is identical (using the same overload, once passing 1e-16, once Constant<double>.TinyValue). This is actually different from my original assumption, where I thought that the overhead of Constant.XX comes from reading the constant value from some memory location every time, similar to the issue we had with V3d.OOI, but its actually only a bad place where the static class initialization check is placed. So even the constant values are fields, the values are actually inlined from the beginning, even without optimizations. This means that the Constant<T> pattern seems a nice solution, except that its initialization might have an uncontrolled effect.

thanks for the clarification.. so it is one call overhead which returns immediately or is it more? This aardvark init does not harm, so should be cool?
EDIT: from a compiler perspective this one is not that easy. this trampline approach cannot be improved in an obvious way, so i think this will stay...

one thing - maybe we could put this thing with the attribute somewhere else. say in the future we move out math from introspection stuff i would be better if it is already separated..