asik/FixedMath.Net

Seems to be possible to convert into hardware implementation

Piedone opened this issue · 6 comments

FixedMath.Net seems to be almost completely possible to convert into an FPGA-based hardware implementation with our Hastlayer project. Hastlayer can automatically convert a subset of .NET into hardware implementations, providing significantly better performance in massively parallelizable compute-bound algorithms with lower power consumption.

And fixed-point arithmetic is very efficient on FPGAs.
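To illustrate why this maps well to hardware: a fixed-point value is an ordinary integer with an implied binary point, so addition is a plain integer add and multiplication is an integer multiply followed by a shift. Below is a minimal Python sketch of that idea, assuming a Q31.32 layout (32 fractional bits). The helper names are illustrative only and are not FixedMath.Net's API. The sketch ignores overflow and saturation, which a real 64-bit implementation has to handle.

```python
# Minimal fixed-point sketch. Assumption: Q31.32 layout (32 fractional bits).
# Python integers are unbounded, so no 64-bit overflow handling is modeled here.
FRACTIONAL_BITS = 32
ONE = 1 << FRACTIONAL_BITS  # raw representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to its raw fixed-point integer."""
    return int(round(x * ONE))


def to_float(raw: int) -> float:
    """Convert a raw fixed-point integer back to a float."""
    return raw / ONE


def fixed_add(a: int, b: int) -> int:
    """Addition is a plain integer addition on the raw values."""
    return a + b


def fixed_mul(a: int, b: int) -> int:
    """Multiply the raw values, then shift right to drop the extra fractional bits."""
    return (a * b) >> FRACTIONAL_BITS


a = to_fixed(1.5)
b = to_fixed(2.25)
total = to_float(fixed_add(a, b))
product = to_float(fixed_mul(a, b))
```

Since only integer adds, multiplies and shifts are involved, no floating-point unit is needed, which is what makes the representation attractive on an FPGA.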

Would be quite cool. What do you think?

I am VERY interested in this! Amazing!

Glad to hear, John! Let me know if you try Hastlayer!

asik commented

Do you need any source code changes to achieve this? I'm not sure how I can help. The source code is under a permissive license, so you can do whatever you please with it.

I'm looking through the project. At the moment the only issue I see is that Hastlayer doesn't yet support static fields. Nevertheless, I started a PoC in this branch: https://github.com/Lombiq/Hastlayer-SDK/commits/issue/HAST-140. See the diff of the work done here: Lombiq/Hastlayer-SDK@dev...issue/HAST-140.

I added a PoC implementation after some necessary modifications. What's now transformable into hardware is this implementation: https://github.com/Lombiq/Hastlayer-SDK/blob/issue/HAST-140/Hast.Algorithms/Fix64.cs

The two most notable changes I made are that I got rid of the static fields (not supported by Hastlayer) and applied a workaround that is temporarily necessary for large hex literals.

I tested various operations and they seem to be working alright. This sample, which simply sums numbers, needs about 106 ms on my machine (Core i7, 8 logical cores) with an input of 10000000, and 1300 ms on a low-end FPGA. This is with purely sequential code (for an FPGA to be feasible the algorithm needs to be massively parallelizable), but still, the FPGA is only about 12 times slower with a 32 times lower clock speed (100 MHz vs 3.2 GHz), and most probably proportionally more power efficient.

With a degree of parallelism of 10 (i.e. 10 threads working in parallel) the same sample takes 264 ms on the CPU and still 1300 ms on the FPGA. As you can see, as we increase the level of parallelism the CPU gets slower (once the level of parallelism exceeds the number of cores), while the FPGA stays the same. The FPGA is still slower overall because more threads won't fit on this low-end FPGA of ours (but we're working on supporting bigger ones).
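As a quick sanity check on the figures quoted above, here is a small Python calculation of the ratios involved. It uses only the timings and clock speeds mentioned in this thread. The "work per clock" figure is a rough derived estimate, not a measurement.

```python
# Timings and clock speeds as quoted in the thread.
cpu_ms_sequential = 106    # CPU, sequential sample
cpu_ms_parallel_10 = 264   # CPU, degree of parallelism 10
fpga_ms = 1300             # FPGA, same in both cases
cpu_mhz = 3200             # 3.2 GHz
fpga_mhz = 100             # 100 MHz

# How much slower the FPGA is in wall-clock time.
slowdown_sequential = fpga_ms / cpu_ms_sequential
slowdown_parallel_10 = fpga_ms / cpu_ms_parallel_10

# How much slower the FPGA's clock is.
clock_ratio = cpu_mhz / fpga_mhz

# Rough estimate of how much more work the FPGA does per clock cycle
# in the sequential case (clock disadvantage divided by time disadvantage).
per_clock_advantage = clock_ratio / slowdown_sequential

print(f"sequential slowdown:  {slowdown_sequential:.1f}x")
print(f"parallel-10 slowdown: {slowdown_parallel_10:.1f}x")
print(f"clock ratio:          {clock_ratio:.0f}x")
print(f"work per clock:       {per_clock_advantage:.1f}x in the FPGA's favor")
```

So the gap narrows from roughly 12x to roughly 5x as the CPU runs out of cores, while the FPGA's time stays flat.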

And thus FixedMath.Net is now the fixed-point library included in Hastlayer.

asik commented

Good to know this library was useful to you! I guess this issue can be closed now.