This is a C++ template library for linear algebra at various precisions. It targets research on approximate computing for optimization algorithms built on linear algebra arithmetic at the software layer. The code is synthesizable with a High Level Synthesis (HLS) tool, targeting hardware designs on FPGAs, and is being extended to more machine learning applications.
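For example, a precision-templated kernel could be written as follows. This is only a minimal sketch under assumed names, not the library's actual API, but it shows how the data type can be swapped between float, double, and a fixed-point type without touching the algorithm itself.

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of a precision-templated kernel: T can be float, double,
// or (with the Xilinx headers set up as described below) a fixed-point type.
template <typename T>
T dot(const std::vector<T>& a, const std::vector<T>& b) {
    T acc = T(0);
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += a[i] * b[i];   // the same code path for every precision
    return acc;
}
```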
sudo apt install python python-numpy python-matplotlib libboost-all-dev libeigen3-dev fftw-dev -y
Notice: currently the fixed-point arithmetic is based on the Xilinx Vivado HLS fixed-point templates, so you MUST have a Xilinx license first. Then copy the "include" folder into the "header" folder and rename it "xilinx". Otherwise, the code will not compile with fixed-point support. All copyright of the fixed-point arithmetic templates belongs to Xilinx.
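For example, once the headers are in place, a fixed-point type can be declared as below. The include path assumes the folder layout described above, and the widths are only examples matching the notation used in the results:

```cpp
#include "header/xilinx/ap_fixed.h"   // Xilinx Vivado HLS fixed-point template

// ap_fixed<W, I>: W total bits, of which I are integer bits and W - I are fraction bits.
typedef ap_fixed<28, 10> fixed_28_10_t;   // e.g. the "W28, I10" configuration below
typedef ap_fixed<32, 12> fixed_32_12_t;   // e.g. the "W32, I12" configuration below
```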
Different precisions of the gradient process are evaluated on an x86 platform with an Intel i7-6820HK and 8 GB of DDR4 memory. (W: total bit width, I: integer bit width, M: mantissa bit width)
float: Number of iterations = 40529, Time Usage = 252.721 ms
double: Number of iterations = 40683, Time Usage = 574.483 ms
float: Number of iterations = 47934, Time Usage = 833.31 ms
double: Number of iterations = 48191, Time Usage = 1487.99 ms
float: Number of iterations = 224087, Time Usage = 11561.9 ms
double: Number of iterations = 224665, Time Usage = 21787.5 ms
float: Number of iterations = 403136, Time Usage = 66061.3 ms
double: Number of iterations = 402571, Time Usage = 122316 ms
float: Number of iterations = 297133, Time Usage = 313763 ms
double: Number of iterations = 261360, Time Usage = 395742 ms
W10, M5: Number of iterations = 99998, Time Usage = 11750 ms
W12, M5: Number of iterations = 99998, Time Usage = 11062.5 ms
W16, M5: Number of iterations = 99998, Time Usage = 11265.6 ms
W24, M9: Number of iterations = 40570, Time Usage = 4546.88 ms
W10, M5: Number of iterations = 99998, Time Usage = 41703.1 ms
W12, M5: Number of iterations = 99998, Time Usage = 41781.2 ms
W16, M5: Number of iterations = 99998, Time Usage = 35000 ms
W24, M9: Number of iterations = 47960, Time Usage = 19796.9 ms
W28, I10: Number of iterations = 1617, Time Usage = 597.451 ms
W32, I12: Number of iterations = 3678, Time Usage = 3830.12 ms
W48, I12: Number of iterations = 41602, Time Usage = 70051.8 ms
W28, I12: Number of iterations = 2635, Time Usage = 332.926 ms
W32, I12: Number of iterations = 3228, Time Usage = 877.335 ms
W48, I16: Number of iterations = 36661, Time Usage = 17046.1 ms
As shown, the iteration counts of the float and double precisions are very similar across dimension sizes from 64 to 256. At dimension 1024, there is a significant difference in iteration count between the float and double precision data types.
As shown, the execution time of double precision is always larger than that of float precision, even when fewer iterations are needed at dimension 1024.
As shown, the iteration count of the fixed-point precisions is significantly lower than that of the floating-point precisions, but at the cost of the degree of optimization. Specifically, the gradient descent iteration terminates earlier due to quantization of either the integer or the fraction part of the fixed-point number. It is hard to guarantee that there are enough bits to represent both the integer and the fraction parts unless a very large number of bits is used. Once the update is quantized away, the gradient descent cannot continue, because the error between the current iteration and the last iteration becomes ZERO, which means there is no further 'descent'.
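A minimal sketch of this effect under assumed names (not the library's actual loop) is shown below: once the per-iteration update is smaller than the quantization step of the chosen type T, the measured change between consecutive iterates rounds to exactly zero and the loop terminates, regardless of how far away the true optimum still is.

```cpp
// Sketch: generic descent loop whose stopping criterion is the change between
// consecutive iterates. With a narrow fixed-point T, `change` can become exactly
// zero long before a float/double run would stop.
template <typename T, typename Grad>
int descend(T& x, Grad grad, T step, T tol, int max_iter) {
    for (int iter = 0; iter < max_iter; ++iter) {
        T x_next = x - step * grad(x);                        // quantized to T's precision
        T change = (x_next > x) ? T(x_next - x) : T(x - x_next);
        x = x_next;
        if (change <= tol)                                    // hits zero once updates quantize away
            return iter + 1;
    }
    return max_iter;
}
```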
The execution time comparison between fixed point and floating point is not entirely fair on an x86 hard-core processor, because the fixed-point types incur a large overhead in approximating floating-point numbers in software. The fixed-point arithmetic is based on the Xilinx HLS library headers. The results are better considered as a verification of the effect of various precisions on the proximal gradient descent algorithm.
Comparing the optimized solutions from the C++ and CVX code, it is clear that both float and double precision maintain well-optimized solutions, with a normalized error of only 0.0007 relative to the CVX solution, while the fixed-point precisions have a much worse normalized error of up to 0.25 relative to the CVX solution. With fixed-point precision, the minimum value of the cost function is much larger than with float and double precision, because the gradient descent did not reach as optimal a point as it did with float and double precision.
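Here the normalized error is assumed to be the relative l2 distance to the CVX reference solution; a small helper (hypothetical, not part of the library) that computes it could look like this:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Relative l2 error of a solution x against a reference x_ref (e.g. the CVX solution):
// ||x - x_ref||_2 / ||x_ref||_2.
double normalized_error(const std::vector<double>& x, const std::vector<double>& x_ref) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        num += (x[i] - x_ref[i]) * (x[i] - x_ref[i]);
        den += x_ref[i] * x_ref[i];
    }
    return std::sqrt(num) / std::sqrt(den);
}
```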
A similar situation occurs for dimension 128.