MatrixMulSubgroupCompute

This demo only runs as-is on machines whose workgroup size is 16. To run it on a machine with a different workgroup size, change the dispatch size in the C++ files, and change the workgroup size and m_B in the HLSL files.
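
When the workgroup size changes, the dispatch size usually has to be recomputed to match. The following is a minimal sketch of that calculation, not the demo's actual code: the names DispatchCount, ComputeDispatchSize, and DispatchSize are hypothetical, and it assumes one thread per output element with square workgroup tiles, which may not match how this demo divides the work.

```cpp
#include <cstdint>

// Hypothetical helper: number of workgroups needed to cover `elements`
// items when each workgroup handles `workgroupSize` of them (rounds up).
constexpr std::uint32_t DispatchCount(std::uint32_t elements,
                                      std::uint32_t workgroupSize)
{
    return (elements + workgroupSize - 1) / workgroupSize;
}

// Hypothetical container for a 2D dispatch size.
struct DispatchSize
{
    std::uint32_t x;
    std::uint32_t y;
};

// Assumption: one thread per output element, and each workgroup covers a
// workgroupSize x workgroupSize tile of the output matrix.
constexpr DispatchSize ComputeDispatchSize(std::uint32_t rows,
                                           std::uint32_t cols,
                                           std::uint32_t workgroupSize)
{
    return { DispatchCount(cols, workgroupSize),
             DispatchCount(rows, workgroupSize) };
}
```

The resulting x and y values would be the ones passed to the dispatch call in the C++ files. The workgroup size used here must be the same value that is set in the HLSL files.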