Sergio0694/ComputeSharp

Testability of compute shaders

matthew-a-thomas opened this issue · 15 comments

Thanks for this library! I've played around with it some and I love it. It has been a surprisingly gentle introduction to compute shaders for me (I have no prior experience).

However, I notice that none of the shaders I've written are unit testable:

  • They depend on the Hlsl, ThreadIds, etc classes, which are static and therefore cannot be mocked
  • They depend on various buffer types, like ReadWriteBuffer, which lack any accessible constructors and are sealed and therefore can neither be instantiated in a unit testable way nor mocked
  • My go-to refactorings (such as "encapsulate the logic into a thing which is testable") are often impossible due to limited supported syntax, absence of interfaces on the above types, etc

Do you have any tips for improving the testability of my shaders?

If not, would you be open to discussing the road toward unit testability? I have some suggestions in mind, but I think implementing any of them would be a major version bump (not to mention the programming effort).

Hey there! Glad to hear the library has allowed you to play around with compute shaders without having prior experience! That's precisely what the main goal of this project is (to make all of this easy to use), so I'm happy to see that materialize 😄

"Do you have any tips for improving the testability of my shaders?"

I do! I think you're approaching things from the wrong angle here. You shouldn't try to mock any of the ComputeSharp classes (such as Hlsl, ThreadIds, etc.), as those are strictly GPU intrinsics (ie. methods that directly map to GPU operations). Similarly for the various buffer types: they specifically refer to GPU memory and have very precise semantics. There's no realistic way to abstract all of this in a way that allows your shaders to run the same in the testing environment. Even if you could, at that point you'd just be testing your own mock abstraction rather than the real functionality, which means you'd have done a whole lot of work for nothing. It'd also be extremely easy to have differences in functionality between the two.

What you should do if you want to unit test your shaders is... Just run your shaders in unit tests! You can always use ComputeSharp even when a GPU is not available (eg. in a CI runner in a VM), because it'll always automatically pick the software adapter as a default device (ie. the WARP device). This is how all of my unit tests can also run in GitHub Actions 🙂

I'd recommend taking a look at my unit tests, in case they can provide some inspiration. Eg. here's one:

public void StaticConstants(Device device)
{
    using ReadWriteBuffer<float> buffer = device.Get().AllocateReadWriteBuffer<float>(8);

    device.Get().For(1, new StaticConstantsShader(buffer));

    float[] results = buffer.ToArray();

    Assert.AreEqual(3.14f, results[0], 0.00001f);
    Assert.AreEqual(results[1], results[2], 0.00001f);
    Assert.AreEqual(1, results[3], 0.00001f);
    Assert.AreEqual(2, results[4], 0.00001f);
    Assert.AreEqual(3, results[5], 0.00001f);
    Assert.AreEqual(4, results[6], 0.00001f);
    Assert.AreEqual(3.14f, results[7], 0.00001f);
}

[AutoConstructor]
[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
internal readonly partial struct StaticConstantsShader : IComputeShader
{
    public readonly ReadWriteBuffer<float> buffer;

    static readonly float Pi = 3.14f;
    static readonly float SinPi = Hlsl.Sin(Pi);
    static readonly int2x2 Mat = new(1, 2, 3, 4);
    static readonly float Combo = Hlsl.Abs(Hlsl.Clamp(Hlsl.Min(Hlsl.Max(3.14f, 2), 10), 0, 42));

    public void Execute()
    {
        buffer[0] = Pi;
        buffer[1] = SinPi;
        buffer[2] = Hlsl.Sin(3.14f);
        buffer[3] = Mat.M11;
        buffer[4] = Mat.M12;
        buffer[5] = Mat.M21;
        buffer[6] = Mat.M22;
        buffer[7] = Combo;
    }
}

Hope this helps!

Thank you for the quick response! I have a few thoughts in reply...

This is how all of my unit tests can also run in GitHub Actions 🙂

I'd recommend taking a look at my unit tests, in case they can provide some inspiration.

I did happen to notice your automated tests, and I hear you on that point. Your tests run quickly enough and you have decent coverage. So no gripes there.

But I think we're operating with different definitions of "unit test", and I don't want to be talking past you. On my (very) subjective unit test<->integration test scale, I personally would place your automated tests more on the "integration" side. That's not a critique of them; that's just my intuition. My spidey sense tells me that your tests require alignment across several layers, and I suspect a bug in a lower layer will manifest as lots of failures in higher layers among tests that are otherwise unrelated. In my opinion unit tests by definition would not exhibit that kind of behavior. So whether that's right or wrong, I hope that helps place what I'm saying on the map.

I think you're approaching things from the wrong angle here. You shouldn't try to mock any of the ComputeSharp classes (such as Hlsl, ThreadIds, etc.), as those are strictly GPU intrinsics (ie. methods that directly map to GPU operations). Similarly for the various buffer types: they specifically refer to GPU memory and have very precise semantics. There's no realistic way to abstract all of this in a way that allows your shaders to run the same in the testing environment. Even if you could, at that point you'd just be testing your own mock abstraction rather than the real functionality, which means you'd have done a whole lot of work for nothing. It'd also be extremely easy to have differences in functionality between the two.

I think it depends on the chosen abstraction :)

For example, what if IComputeShader looked like this:

public interface IComputeShader
{
    void Execute(ComputationContext context);
}

public class ComputationContext
{
  public virtual Hlsl Hlsl => new Hlsl();
  public virtual ThreadIds ThreadIds => new ThreadIds(0, 0, 0);
}

public class Hlsl
{
  public virtual float Max(float x, float y) => System.Math.Max(x, y);
  public virtual Float2 Max(Float2 x, Float2 y) => new Float2(Max(x.X, y.X), Max(x.Y, y.Y));
  // Lots of legwork to implement all these intrinsics for the CPU...
  // ...or just keep all the exceptions, either way is fine
}

public sealed record ThreadIds(int X, int Y, int Z)
{
  // Implement all the XX, XY, etc as properties which delegate to X, Y, Z...
}

Then I could write my shader:

public readonly partial struct MyFancyShader(ReadWriteBuffer<int> buffer) : IComputeShader
{
  public void Execute(ComputationContext context)
  {
    buffer[context.ThreadIds.X] *= 2;
  }
}

...and test it (in isolation... in theory even without Windows):

[Fact]
public void MyFancyShaderWorks()
{
  ReadWriteBuffer<int> buffer = /* get a buffer from somewhere? */;
  var shader = new MyFancyShader(buffer);
  for (var i = 0; i < 10; i++)
  {
    var context = Mock.Of<ComputationContext>(x => x.ThreadIds == new ThreadIds(i, 0, 0));
    shader.Execute(context);
  }
  Assert.Equal(Enumerable.Range(0, 10).Select(x => x * 2), buffer.ToArray());
}

Of note: since this particular shader doesn't depend on the Hlsl class at all, there's no need to worry about mocking any of those intrinsics.

I would call the above "Refactoring Phase 1".

There is still the lingering "problem" of having to get a ReadWriteBuffer<int> from somewhere. At this point I can't help but notice that my silly shader doesn't care at all what kind of buffer it gets, as long as it can read from it and write to it. So why can't my shader instead depend on System.Memory<int>?

public readonly partial struct MyFancyShader(Memory<int> buffer) : IComputeShader
{
  public void Execute(ComputationContext context)
  {
    buffer.Span[context.ThreadIds.X] *= 2;
  }
}

I would call this shader "fully unit testable". All its dependencies are fully injected and are easily mocked in isolation.

But getting to this point would be a pretty major undertaking. And I'm not sure it fits with your future plans for this project.


Random thoughts:

  • With a custom implementation of MemoryManager it would be possible to have GraphicsDevice.AllocateXYZ<T>() return a Memory<T> that throws an exception when you try accessing its .Span CPU-side yet still contains a valid pointer to the underlying GPU blob. Or who knows, maybe you'd want CPU-side accesses to .Span to transparently cause a copy from GPU memory, then it'd "just work"? A rough sketch of the first idea follows below.
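
A minimal sketch of that first idea, assuming a hypothetical IGpuBuffer<T> handle type (none of this is ComputeSharp API, just an illustration of what such a custom MemoryManager<T> could look like):

using System;
using System.Buffers;

// Imaginary handle standing in for whatever would own the underlying GPU allocation.
public interface IGpuBuffer<T> : IDisposable where T : unmanaged
{
    int Length { get; }
}

// A MemoryManager<T> whose Memory<T> is backed by GPU memory. CPU-side access via
// .Span is simply disallowed here; the alternative mentioned above would be to copy
// the GPU contents into a CPU staging array inside GetSpan() instead.
public sealed class GpuMemoryManager<T> : MemoryManager<T> where T : unmanaged
{
    private readonly IGpuBuffer<T> gpuBuffer;

    public GpuMemoryManager(IGpuBuffer<T> gpuBuffer) => this.gpuBuffer = gpuBuffer;

    public override Span<T> GetSpan() =>
        throw new InvalidOperationException("This memory lives on the GPU and has no CPU-side view.");

    public override MemoryHandle Pin(int elementIndex = 0) =>
        throw new NotSupportedException();

    public override void Unpin()
    {
    }

    protected override void Dispose(bool disposing) => this.gpuBuffer.Dispose();
}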

My spidey sense tells me that your tests require alignment across several layers, and I suspect a bug in a lower layer will manifest as lots of failures in higher layers among tests that are otherwise unrelated

This is often desirable, especially in lower level frameworks/libraries. They help show when you're making a behavioral change or break that someone else may be depending on.

Ultimately testing methodology is a personal preference and not everyone agrees to the same ideology around them. There are many libraries (including the Core .NET libraries that ship in box) that do not (and never will) support mocking in the way some users want it. That is something that other consumers of those libraries will have to accept.

@tannergooding I don't disagree. I just think those kinds of tests where all the layers come together are better called integration tests. I was only trying to clarify how I'm using the terms "testable" and "unit test". As an example of a unit test, Sergio pointed to the tests in this repo, but I personally would not call them unit tests. Are they necessary and good tests? Yes. Are they unit tests? Not in my opinion.

But bringing the discussion back to shaders as consumers of this library, in my mind the discussion overlaps with code reuse and other aspects of good software design. It's not just that my shaders don't feel very testable... my shader code to me just doesn't feel very well designed. "Testability" and "good design" are good buddies with lots of synergy, but I don't get those good vibes from my shaders.

To illustrate, consider the Hlsl class. Those intrinsics provide a lot of value for shaders. But that class can only be used within a shader. That means that any logic around those intrinsics that I want to use in a shader can only live in a shader.

Now, as the complexity of a block of code increases, so too does the complexity of the automated tests that are keeping regressions out of it. So what are my options when the complexity crosses my subjective threshold and I want to refactor it into smaller pieces? There's friction if I want to pull the logic out into a separate class which is tested by itself. Sources of friction include:

  • The source code must remain accessible at compile time. I can't peel it off into a separate class library
  • The peeled-off thing must be paired with another shader (because it'll use the Hlsl class, for example). I can't just toss the code into a class by itself; I also have to create another shader whose sole purpose is to execute that peeled-off code in a test suite

And so I feel this pressure to just keep piling logic into the same shader, and to just keep writing more and more complex automated tests. The end result is my shader code feels fragile.

Also, what are my options if I want to reuse shader logic? Or what if I want to reuse shader logic across platforms? I feel like some of the friction is unnecessary.

Are they unit tests? Not in my opinion.

This is where it comes down to technical definition vs common usage.

Technically speaking, you are correct, and they are not "strictly" unit tests. But the common usage and how many, if not most, developers refer to them is unit tests. They do this regardless of whether they are functional, end to end, integration, perf, unit, or another kind of test.

other aspects of good software design

This is also an area where there is no true consensus. Some people say you must follow DRY or SOLID or TDD or SOME_OTHER_ACRONYM principles and if you don't it's "bad code". Others would say that vaporware isn't good software, that having something that actually exists, ships, and gets used makes it good software. Some people qualify it as software that people want or enjoy using, etc.

At the end of the day, there is a balance. There is no one true best software design principle. Some can help in certain contexts, they can equally hinder you in others. A lot of the most broadly known/used software follows a balance and picks and chooses what works where and avoids what doesn't.

An example of this is things like 100% code coverage. Simply executing a line of code doesn't mean you've caught the bugs, it doesn't mean you've considered the interesting inputs, it doesn't mean you can't introduce breaking changes. So in practice, 100% code coverage doesn't make your software more robust. You can easily have better tests that meaningfully test the important edge cases which provide more value. -- This also doesn't mean 100% code coverage is bad, it can equally help when done "correctly". Just that it simply isn't the "be all" of how to do things and there are alternatives that can be just as good or even better.

I feel like some of the friction is unnecessary.

You're never going to find a friction free experience in software development and there will always be cases where friction may feel unnecessary but it is necessary in practice.

My suggestion would be to write up a concrete example of what you'd like to do vs what you have to do now. Provide that as an issue and potentially offer some suggestions on how you might like to see that made possible.

This is exactly how large projects, like the .NET Libraries/Runtime, tackle many problems. We have literally millions of lines of production code and equally millions of lines of tests. We do not mock and not everything is unit testable. However, we still have very high levels of robustness, backwards compatibility, performance, and rarely ever have to go and update tests because the behavior of some other API changed. Equally, most consumers don't have to do the same either.

These are very solvable problems.

I very much agree with what Tanner has said here so far. I also wanted to further elaborate on some points you made.

"My spidy sense tells me that your tests require alignment across several layers, and I suspect a bug in a lower layer will manifest as lots of failures in higher layers among tests that are otherwise unrelated"

Why is that a problem? The whole point of unit tests is to verify that your logic is correct, and to catch regressions. Making sure that any arbitrarily small bug will only ever result in the smallest subset of failing tests is a non-goal, and honestly, just feels like wasted effort. Yes, if I broke something in ComputeSharp, it's likely that several dozen unit tests would fail. That would immediately show that I did break something, and I'd go fix it until the CI becomes green. That's what you should be striving for, not spending effort just to make sure that only one test fails if you break a specific thing. Why should you even care? Similarly, in the .NET runtime for instance, any regression is likely to cause hundreds of thousands of unit tests to break, all across the stack. That is still perfectly fine and still allows you to have tests that do exactly what they're supposed to do 🙂

"I would call this shader "fully unit testable". All its dependencies are fully injected and are easily mocked in isolation."

But at that point all you've done is spend hundreds of hours essentially building a parallel implementation of all of that logic, which you're now testing against. You'd literally just be writing code to test mocks, not actual production code. It just seems like completely wasted effort to me. And now you've just introduced even more friction, because any time a test fails you'd have to go check whether it's actually a real bug or just a bug or behavioral difference in the mock implementation, not to mention all the cases where you simply cannot mock things, because they specifically rely on GPU-only features that you cannot even express in C# at all (such as when using swizzling operations, eg. Foo(ref x.RAGB)). This can just never work on the CPU.

"To illustrate, consider the Hlsl class. Those intrinsics provide a lot of value for shaders. But that class can only be used within a shader. That means that any logic around those intrinsics that I want to use in a shader can only live in a shader."

Two points to this:

  • First: it is not true that you can only use those APIs physically inside a shader type. Nothing's stopping you from creating a set of helper methods in some other class, and sharing that across multiple shaders. That will work perfectly fine.
  • Those APIs are specifically GPU intrinsics. It's like having APIs that map to specific CPU instructions. There are lots of these in .NET itself as well (see the brief example below), and for the same reason, those cannot be mocked, as they specifically refer to hardware instructions.
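
For instance (an illustrative aside, not part of ComputeSharp): the hardware intrinsics under System.Runtime.Intrinsics map one-to-one to CPU instructions, and nobody mocks those either:

using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Vector128<float> a = Vector128.Create(1f, 2f, 3f, 4f);
Vector128<float> b = Vector128.Create(5f, 6f, 7f, 8f);

if (Sse.IsSupported)
{
    // Sse.Add lowers to a single addps instruction, just like Hlsl.* methods lower
    // to specific GPU instructions when a shader is transpiled to HLSL.
    Vector128<float> sum = Sse.Add(a, b); // <6, 8, 10, 12>
    Console.WriteLine(sum);
}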

"Also, what are my options if I want to reuse shader logic?"

As I said, just move code to shared helper methods and invoke those from multiple shaders; that is a supported scenario.

"Or what if I want to reuse shader logic across platforms?"

ComputeSharp is specifically a library that abstracts DirectX shaders. It's not meant to be cross platform because DirectX is not cross platform. The same applies to literally any Windows API, or any other OS-specific API or UI framework that exists 😅

But at that point all you've done is spend hundreds of hours essentially building a parallel implementation of all of that logic

Additionally, you wouldn't be able to directly compare the results of the mock against the results of running it on a real GPU. The rules for floating point evaluation are a bit relaxed on the GPU side, in the name of performance. Or at the very least the shader compiler will optimize differently than the .NET JIT. And a lot of GPU intrinsics don't have good analogs on the CPU side, or you'd have to wrestle with implementing similar approximations and it would just be an enormous headache.

Useful, sure. But not worth it. Better to spend the time and energy elsewhere.

@tannergooding @Sergio0694 @rickbrew Thank you for your patient replies. I appreciate the discussion we're having!

Much of what all of you have said I fully agree with. I would like to clarify a few things though, because I fear we risk talking past one another:

First, when I said this above:

I personally would place your automated tests more on the "integration" side.

...I was not saying the automated tests in this project need to change. I was only saying that I personally would not have used them as examples of unit tests, as Sergio did. That really is all I meant by that. I thought it would be helpful to highlight that point of disagreement between us so that the conversation would be clear. I especially wanted to help clarify what I personally and very subjectively meant by the word "testable" in the title and other words I used elsewhere.

And so...

My spidey sense tells me that your tests require alignment across several layers, and I suspect a bug in a lower layer will manifest as lots of failures in higher layers among tests that are otherwise unrelated. In my opinion unit tests by definition would not exhibit that kind of behavior.

Why is that a problem?

It is not a problem at all! My point was not to say there was a problem with the tests in this or any library or framework :)

I think we all understand and agree that there exists a continuous spectrum between "unit" and "integration" test, and one person's "unit" test is another's "integration" test, and different people target different points on that spectrum for various reasons when they write software. I also understand and agree with both of you that it's not always helpful to always target the far extreme "unit test" end.

And so I think the discussion about testing methodologies is tangential. Likewise the discussion about how this or that library or framework tests itself. I'd like to focus instead on the "testability" of shaders which use this library.

Finally, I understand that at the end of the day everything is being turned into HLSL and so something somewhere will be touching a real, unmocked GPU.


My suggestion would be to write up a concrete example of what you'd like to do vs what you have to do now. Provide that as an issue and potentially offer some suggestions on how you might like to see that made possible.

This is an excellent suggestion. But I have conflicting thoughts about that.

I think if I filed such an issue it would look similar to my second post. That post hints at what I'd like to do: I'd like to just new up my shader and test it in isolation without having to spin up another framework if I don't have to. That post also hints at a path which would let me get there. Until proven otherwise, I have to assume that the theoretical existence of that path implies that some of that friction I'm perceiving is in fact unnecessary and only exists because of abstractions which this library has chosen.

But at that point all you've done is spend hundreds of hours essentially building a parallel implementation of all of that logic, which you're now testing against. You'd literally just be writing code to test mocks, not actual production code.

I completely understand the sentiment here. I have done this very thing to myself in the past, and so yes I know exactly the unpleasant sensation you're describing.

But I also understand that when I have done that to myself in the past, it was always because I chose improper abstractions :)

For example, if I find myself needing to mock Hlsl.VeryComplicatedIntrinsic over and over and over again, with tests to verify the behavior of the mocks, then my spidey sense says "encapsulate that behavior, inject it, and mock it in tests".

I think the important point here is: there exists a world in which a correct CPU-side implementation of the Hlsl class is not needed at all:

public abstract class Hlsl
{
  public abstract Float4 VeryComplicatedIntrinsic(Float4 a, Float4 b, Float4 c, Float4 d, Float4 e, Float4 f, Float4 g, Float4 h);
}

public readonly record struct ComputationContext(Hlsl Hlsl, ThreadIds ThreadIds);

public readonly partial struct MyFancyShader(ReadWriteBuffer<Float4> buffer) : IComputeShader
{
  public void Execute(ComputationContext context)
  {
    var x = context.ThreadIds.X;
    buffer[x] = context.Hlsl.VeryComplicatedIntrinsic(
      buffer[x + 1],
      buffer[x + 2],
      // etc
    );
  }
}

[Fact]
public void MyFancyShaderWorks()
{
  var buffer = /* Still have to figure out where to get a buffer... */;
  var expectedResult = new Float4(...);
  var shader = new MyFancyShader(buffer);
  shader.Execute(
    new ComputationContext(
      Mock.Of<Hlsl>(x => x.VeryComplicatedIntrinsic(It.IsAny<Float4>(), /*...and so on*/) == expectedResult),
      new ThreadIds(0, 0, 0)
    )
  );
  Assert.Equal(expectedResult, buffer[0]);
}

There are no mocks of mocks of mocks here, nor tests of tests of tests. Instead, only the behavior under test is being tested ("did my fancy shader in fact call the very complicated intrinsic and put its result where I expected", not "did the very complicated intrinsic work when it was used like this"), and only the dependencies are mocked. Yes, the entire Hlsl class is technically a dependency, but in this case only the VeryComplicatedIntrinsic is being used. And so that's all that needs to be mocked. And for cases where lots of intrinsics are used over and over between different shaders, instead of writing mocks upon mocks upon mocks the programmer would have the option of abstracting away that repeated behavior.

not to mention all the cases where you simply cannot mock things, because they specifically rely on GPU-only features that you cannot even express in C# at all (such as when using swizzling operations, eg. Foo(ref x.RAGB)). This can just never work on the CPU.

Yes, swizzling would be tedious to implement on the CPU. But not impossible. And I'm not sure what you mean by "cannot express in C# at all". If that were true then nobody could use swizzle operations with this library. Yet this library already has 5000+ lines of source code declaring a bunch of the swizzling operations on Float4, which is a great start. I'm sure a T4 template or something could do the rest.

But again, that's only really needed if one needs correct CPU-side swizzling operations, which I think I've illustrated is not necessary.

First: it is not true that you can only use those APIs physically inside a shader type. Nothing's stopping you from creating a set of helper methods in some other class, and sharing that across multiple shaders. That will work perfectly fine.

I don't understand what you mean. This doesn't work:

public static class MyFancyAbstraction
{
    public static void DoIt(Int2 index, ReadWriteTexture2D<Rgba32, Float4> lhs, ReadWriteTexture2D<Rgba32, Float4> rhs)
    {
        lhs[index] = Hlsl.Max(rhs[index], rhs[index + 1]);
    }
}

[ThreadGroupSize(DefaultThreadGroupSizes.XY)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct MyFancyShader(
    ReadWriteTexture2D<Rgba32, Float4> lhs,
    ReadWriteTexture2D<Rgba32, Float4> rhs) : IComputeShader
{
    public void Execute()
    {
        var index = ThreadIds.XY;
        MyFancyAbstraction.DoIt(index, lhs, rhs); // This line won't compile... Error CMPS0050
        lhs[index] = Hlsl.Max(rhs[index], rhs[index + 1]); // But this works
    }
}

To say nothing of turning the abstraction into something non-static that can be injected and mocked.

@matthew-a-thomas Why is it important to test "does this call to this GPU intrinsic that I wrote get passed the values I think it will be passed?" Do the exact values passed to this exact call in the middle of the algorithm matter, as long as the algorithm's implementation is correct for the given inputs?

Another way to think about how to know if you're over-testing something: Would a non-breaking change that is not observable given the inputs to this function break the tests? More specifically, if all parameters (and the this pointer if relevant) were set in stone and didn't change and you re-implemented the algorithm such that you get the same return value for all of the same inputs, would your test fail?

If your test would fail in the above case, that's "over-testing". You're not testing that the algorithm is correct, you're testing the implementation details of the algorithm. A unit test is meant to validate the correctness of a chunk of code and validate that when you make changes, it doesn't break the tests. If you need to change the tests every time you change the code, you're not testing what a unit test is supposed to test.

In this case, testing "do I call this particular GPU intrinsic at this point in the algorithm" would be broken the moment you change the algorithm implementation for another that has the same results. Is it important that this particular intrinsic is called at this time or that the algorithm has the right results? I would tend to think that all that matters is that the algorithm is correct.

To me at least, validating that the algorithm calls a particular intrinsic at a particular point feels like testing that the compiler compiled the code you wrote, not that the algorithm is correct.
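
To make that distinction concrete, here's an illustrative sketch (the Doubler and IMathOps types are hypothetical, not from ComputeSharp or this thread): the first test verifies that a mocked dependency was called in a particular way, so it breaks under any behaviorally equivalent rewrite; the second only asserts outputs for given inputs and survives such rewrites:

using System.Linq;
using Moq;
using Xunit;

public interface IMathOps
{
    int Multiply(int x, int y);
}

public sealed class Doubler(IMathOps math)
{
    public int[] Run(int[] input) => input.Select(x => math.Multiply(x, 2)).ToArray();
}

public class DoublerTests
{
    [Fact]
    public void OverTested_PinsTheImplementation()
    {
        var math = new Mock<IMathOps>();
        math.Setup(m => m.Multiply(It.IsAny<int>(), 2)).Returns((int x, int y) => x * y);

        new Doubler(math.Object).Run(new[] { 1, 2, 3 });

        // Fails as soon as Run is rewritten as "x + x" or "x << 1", even though the
        // observable results are identical: this tests the implementation, not the behavior.
        math.Verify(m => m.Multiply(It.IsAny<int>(), 2), Times.Exactly(3));
    }

    [Fact]
    public void BehaviorOnly_SurvivesEquivalentRewrites()
    {
        var doubler = new Doubler(new RealMathOps());

        // Only the input/output contract is asserted.
        Assert.Equal(new[] { 2, 4, 6 }, doubler.Run(new[] { 1, 2, 3 }));
    }

    private sealed class RealMathOps : IMathOps
    {
        public int Multiply(int x, int y) => x * y;
    }
}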

@jkoritzinsky Those are excellent observations. The direct answer is yes that specific example over-tests things; and yes, it is merely validating that the compiler compiled the code.

But with that specific example I only intended to show that the intrinsics in fact do not have to have CPU-side implementations in order to follow the path I'm suggesting. I suppose it was a little artificial and contrived.

Let's consider something more complex. Suppose I write a suite of prime-number finding algorithms. Given an integer, each of them returns a bool indicating whether it's prime. So they all yield the same result, but some are better suited to smaller numbers (they use a closed form math equation for small enough numbers to yield a direct result) while others are better suited for larger numbers. Furthermore I will probably want a composite algorithm that delegates to other algorithms depending on the context. And of course these algorithms are reusable.

My default approach would be:

  1. Implement the algorithms
  2. Verify the algorithms' behaviors with automated tests
  3. Introduce an abstraction over the algorithms
  4. Glue that abstraction into the context
    • WPF? Whip up a view-model that is injected with the algorithm
    • Console app? Inject the algorithm into my menu-navigation code
    • Web API? Whip up a controller and inject the algorithm
    • A very boring game? Inject it into the game control logic
  5. Test the value-add of the context

Of course the abstraction in 3. is important because it allows me to write WPF view-models, console app menu-navigation code, and Web API controllers which all use the algorithm and can themselves be unit-tested. All the consumers of an algorithm have higher "testability" because they're depending on something which can be mocked; the value-adding behavior in them (and only in them) can more easily be verified without relying on the precise behavior of an algorithm.

And in 5., the implementation details of the algorithms are not important within the context of unit tests on those various consumers of the algorithms. Which is precisely why it's valuable to be able to mock an algorithm within those tests... the tests should be testing the value-add of those various consumers in 5., not the correctness of their dependencies. Other tests will be responsible for verifying the correct behavior of the various prime-number finding algorithms. So for the tests on those various consumers of the algorithms, there does not need to be a single working algorithm at all.

Now let's try gluing the abstraction into a compute shader. And for the sake of keeping this example motivating, suppose the shader additionally does something else that is sufficiently valuable and complex enough to convince anyone it's complex and valuable. Maybe the shader uses the prime numbers to perform international bank transactions for the most sue-happy people in the world. If a whole department of people have to be paid to manually test it around the clock, that's what will happen.

So let's use the prime-number algorithm for each cell in a compute shader...

Oops. I can't use C# interfaces. Well there goes the nice abstraction.

Oops. I can't use my class library where all the above was already done ten years ago and is battle-tested and certified by fifteen government agencies. I have to re-write all the prime-number finding code inside a shader.

Oops. I can't use bool or any of the managed types that are used within the algorithm.

Oops. Turns out my algorithm recursively has dependencies which also must be ported.

Oops. Now I have to test both the value-add of the shader and the prime-number finding algorithm, all in the same tests. That leads to more complex tests and more fragile code.

I suppose I could split the prime-number finding into its own separate compute shader (and wrap that in the algorithm interface). And put the other value-added stuff into a separate shader. But remember I promised this example was sufficiently complex! That means that the determination of a prime number happens sufficiently deep within the twisted logic of this very valuable compute shader, and the two things are not going to be easy to separate. A dozen new buffers will be needed to manage the state, and umpteen back-and-forth calls between the shaders are now needed to keep all the state in sync. And no fewer than five custom state machines!

In a word, friction.

Is there a better way?

Now, I understand that a lot of this friction is strictly necessary. You can't easily issue HTTP calls from the GPU, after all. So those recursive dependencies which depend on HttpClient are going to have to go, one way or another.

But I don't think all of this friction is necessary.

That's my point. And I'm framing it in terms of "testability of compute shaders" because "testability" == "abstractability" == "reusability" == a bunch of other stuff (in my mind). It's all jumbled up.


Edit: I can hear it now: But Matt, you're never going to be able to execute IL on the GPU. So it was vain to hope that your class library could be used here. We're talking about compute shaders.

Yes, but I think that's missing the point: not all of the friction needs to be there.

Suppose I finally manage to write a compute shader that keeps the government agencies happy and keeps all the lawyers off my back. Further, suppose I want to reuse some of the logic that's in it somewhere else. Most very normal C# abstractions are off the table. Why is that?

So they all yield the same result, but some are better suited to smaller numbers (they use a closed form math equation for small enough numbers to yield a direct result) while others are better suited for larger numbers. Furthermore I will probably want a composite algorithm that delegates to other algorithms depending on the context.

I'd typically say that you only have 1 public API, the composite algorithm. The fact that there are then two algorithms for detecting large vs small primes is an implementation detail, one that isn't relevant to test or expose. That only adds additional complexity, points of failure, and validation/overhead you have to introduce to your users. All you need to test is that the exposed public API correctly handles various inputs and returns the correct output. It will never be, and does not need to be, exhaustive.

In your test, you simply consume the API as any external consumer would and validate it works correctly. This is often very valuable because you are testing what your customer tests, not abstracting away real world details via some mock that may or may not do the right thing.

Then, because you have a known good/stable algorithm for detecting primes, you now have zero reason to mock it, because it will only ever give correct results. It is, by definition, deterministic and you won't ever see different results. So why would you need to abstract it?
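
Sketched with a hypothetical Primes class (illustrative only), that looks something like this: the small-number and large-number strategies stay private implementation details, and the test drives only the single public API:

using Xunit;

public static class Primes
{
    // The one public API. Which strategy runs is an implementation detail.
    public static bool IsPrime(long value) =>
        value < 1_000_000 ? IsPrimeSmall(value) : IsPrimeLarge(value);

    private static bool IsPrimeSmall(long value)
    {
        if (value < 2) return false;

        for (long i = 2; i * i <= value; i++)
        {
            if (value % i == 0) return false;
        }

        return true;
    }

    // A different strategy (e.g. Miller-Rabin) would live here; the sketch just reuses
    // trial division.
    private static bool IsPrimeLarge(long value) => IsPrimeSmall(value);
}

public class PrimesTests
{
    // The test consumes the API exactly as an external caller would. Nothing is mocked,
    // because the result is deterministic for any given input.
    [Theory]
    [InlineData(2, true)]
    [InlineData(15, false)]
    [InlineData(7_919, true)]
    [InlineData(1_000_003, true)]
    public void IsPrime_ReturnsExpectedResult(long value, bool expected) =>
        Assert.Equal(expected, Primes.IsPrime(value));
}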


Now, at the other end you might consider "generating" primes. This may be a reason to allow an API to take in a PrimeNumberGenerator, and a reason to ensure that such a generator is initialized with a fixed seed so that results can be deterministic in tests. In such a case, it might be reasonable to build a limited/simplified mock that picks from a smaller pool of inputs. However, in a more ideal scenario the PrimeNumberGenerator has a built-in version that supports basic functionality like providing a fixed seed and guaranteeing determinism across runs. -- This is how System.Random works after all, and in such a case, a mock should likewise not be needed.
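
For reference (nothing ComputeSharp-specific), this is the System.Random behavior being alluded to: a fixed seed makes the sequence reproducible, so tests can consume "random" inputs without any mocking:

using System;

// Two instances seeded identically produce identical sequences, run after run.
var a = new Random(1234);
var b = new Random(1234);

Console.WriteLine(a.Next(100) == b.Next(100)); // True
Console.WriteLine(a.Next(100) == b.Next(100)); // True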


Not everything can or should be mocked. Not everything can or should be abstracted. You shouldn't ever be mocking something like System.Math.Sqrt(double x) for example. You likewise shouldn't mock something like BinaryPrimitives.ReadInt32LittleEndian. Both of these have a single deterministic behavior. What you may want to mock is your explicitly designed extensibility points, where it would be reasonable to provide a custom algorithm. Such as dynamic value generation. But, it still comes with the consideration of why such a custom algorithm is needed and if that should be a mock or an explicit separate thing that is itself tested and which you use/pass in. Many prefer the latter approach over mocking.

For ComputeSharp, you can already write and use common abstractions where it makes sense to do so. You can already extract and make your code reusable. You can use interfaces, you can use classes, you can use helper methods, etc. You just can't abstract everything, because many parts do not make sense to abstract (within the context of ComputeSharp itself).

Not all code needs the same type of testing or validation. For a compute shader, you can run a test to validate its output and verify that it hasn't changed compared to yesterday's results -- and you'd use WARP/CPU here to remove the GPU and its driver as a factor. For integration you'd then validate a GPU's output against the CPU's, which would tell you if a recent GPU swap or driver update is breaking things. Strict equality wouldn't work, you'd need to verify within an error/epsilon range, as results will vary across GPU manufacturers, GPU models, and even driver versions.
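
A sketch of that tolerance-based comparison, using the same MSTest Assert.AreEqual(float, float, delta) overload as the test shown earlier (the epsilon value here is arbitrary and would need tuning per shader):

using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class ShaderAssert
{
    // Compare two result buffers within an epsilon rather than with strict equality,
    // since floating point results legitimately differ across GPU vendors, models,
    // and driver versions (and between WARP and hardware).
    public static void AreClose(float[] expected, float[] actual, float epsilon = 1e-4f)
    {
        Assert.AreEqual(expected.Length, actual.Length);

        for (int i = 0; i < expected.Length; i++)
        {
            Assert.AreEqual(expected[i], actual[i], epsilon, $"Mismatch at index {i}");
        }
    }
}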

Different domains of execution do require different domains of implementation. ComputeSharp transpiling C# to HLSL is a massive convenience, but you literally can't do everything in HLSL that you can do in C#. No generics, no interfaces, etc. It's an abstraction with sharp edges and leaks, and it doesn't promise to be a full projection between these two domains.

The prime number scenario is deeply flawed. Shaders have a limited time budget for execution. If they overstay their welcome, either they'll get aborted, the GPU driver will restart, or the system may even blue screen (ask me how I know 😂). Shaders also cannot perform I/O, so they cannot perform bank transactions. Also, you would never perform financial calculations in floating point, nor on a GPU that doesn't guarantee strict IEEE compliance (which again is floating point, so you wouldn't anyway). Shaders are good for very simple but deeply parallel algorithms -- hence their fixation on graphics and ML/DL/AI.

For ComputeSharp, you can already write and use common abstractions where it makes sense to do so. You can already extract and make your code reusable. You can use interfaces, you can use classes, you can use helper methods, etc.

I think if I could figure out how to do such simple things as these I would be content. Can you show some examples?

Below are some things I've tried which don't work in 3.0.0-preview1 -- should I be cutting my teeth on a different version?

Interfaces

Error CMPS0001 : The compute shader of type MyFancyShader contains a field "abstraction" of an invalid type IMyFancyInterface

public interface IMyFancyInterface
{
    uint Value { get; }
}

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public partial struct MyFancyShader(
    ReadWriteBuffer<uint> buffer,
    IMyFancyInterface abstraction) : IComputeShader
{
    public void Execute()
    {
        buffer[ThreadIds.X] = abstraction.Value;
    }
}

Classes

Error CMPS0001 : The compute shader of type MyFancyShader contains a field "abstraction" of an invalid type MyFancyAbstraction

public sealed class MyFancyAbstraction
{
    public uint Value => 42U;
}

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public partial struct MyFancyShader(
    ReadWriteBuffer<uint> buffer,
    MyFancyAbstraction abstraction) : IComputeShader
{
    public void Execute()
    {
        buffer[ThreadIds.X] = abstraction.Value;
    }
}

Helper methods

One way

Error CMPS0046 : The shader of type MyFancyShader failed to compile due to an HLSL compiler error (Message: "The DXC compiler encountered one or more errors while trying to compile the shader: "error: use of undeclared identifier 'MyFancyAbstraction' __reserved__buffer[ThreadIds.x] = MyFancyAbstraction.Value; .". Make sure to only be using supported features by checking the README file in the ComputeSharp repository: https://github.com/Sergio0694/ComputeSharp. If you're sure that your C# shader code is valid, please open an issue an include a working repro and this error message.")

public static class MyFancyAbstraction
{
    public static uint Value => 42U;
}

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public partial struct MyFancyShader(ReadWriteBuffer<uint> buffer) : IComputeShader
{
    public void Execute()
    {
        buffer[ThreadIds.X] = MyFancyAbstraction.Value;
    }
}

Another way

Error CMPS0050 : The compute shader or method MyFancyShader uses the invalid type ComputeSharp.ReadWriteBuffer (only some .NET primitives and vector types, HLSL primitive, vector and matrix types, and custom types containing these types can be used, and bool fields in custom struct types have to be replaced with the ComputeSharp.Bool type for alignment reasons)

public static class MyFancyAbstraction
{
    public static void DoIt(int x, ReadWriteBuffer<uint> buffer)
    {
        buffer[x] = 42;
    }
}

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public partial struct MyFancyShader(ReadWriteBuffer<uint> buffer) : IComputeShader
{
    public void Execute()
    {
        MyFancyAbstraction.DoIt(ThreadIds.X, buffer);
    }
}

You can have shared methods, but they can't take buffers as parameters. If you declare shared methods that just do operations, you can have as many as you want. Rick has a whole mini-library of shared helpers for his shaders; it's fully supported 🙂

The "One Way" error is because I don't support properties yet. If you make Value() a method it'll work.
Supporting properties is on my backlog; I just haven't had time to do that yet.
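
Putting those two notes together, a pattern along these lines should work (an untested sketch: the shared helper is a method rather than a property, takes and returns plain values, and never receives a buffer; the buffer indexing stays inside the shader):

using ComputeSharp;

// Shared helper: value parameters only, and a method (not a property), so it can be
// transpiled to HLSL and reused from any number of shaders.
public static class MyFancyAbstraction
{
    public static float Combine(float left, float right)
    {
        return Hlsl.Max(left, right) * 0.5f;
    }
}

[ThreadGroupSize(DefaultThreadGroupSizes.X)]
[GeneratedComputeShaderDescriptor]
public readonly partial struct MyFancyShader(
    ReadWriteBuffer<float> lhs,
    ReadWriteBuffer<float> rhs) : IComputeShader
{
    public void Execute()
    {
        // Only values cross the helper boundary; the buffers never leave the shader.
        lhs[ThreadIds.X] = MyFancyAbstraction.Combine(lhs[ThreadIds.X], rhs[ThreadIds.X]);
    }
}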

Okay I think I have a clearer picture now of what's supported, what isn't, and what the general programming landscape is with regard to compute shaders. In summary:

  • It is possible to write automated tests for these shaders
    • The tests will execute quickly enough
    • The tests will be independent enough
  • Some C# abstractions are possible
  • Those abstractions fulfill the needs of most shaders

By implication I'm gathering that most shaders are somewhat "simple" with regard to abstraction. E.g. direct method calls into static classes, instead of being injected with an interface.

I can also understand why shaders would tend to be that way. As has been said, they're kind of a niche computation solution that's only suited to certain kinds of problems. For example, I/O is limited to what can be passed in and out through buffers and textures, and that naturally limits the kinds of problems you can solve with them. And so abstractions are often not needed.

I love this library. I wrote an application using two things which are both new to me: video capture through Media Foundation, and processing those video frames with compute shaders. My application does a simple median averaging across the previous N frames to get an image with higher bit depth. And then it boosts the brightness, which I'm able to do without sacrificing image quality thanks to the higher bit depth. It's fun taking my very average laptop webcam outside in the evening and being able to see things with much more clarity than is otherwise possible. With just the slightest light I can see things in good detail.

It has been a great learning experience for me. And like I said in the beginning, this library has a surprisingly gentle learning curve.

Thank you for your patience, everyone.