NervanaSystems/maxas

for newbies: filling the missing bits in the documentation

Stefan20162016 opened this issue · 0 comments

This is not an issue per se, just some additional documentation for newcomers to CUDA assembler.

I found it challenging to follow the docs and the code together, so I started by looking at the tid/address shifts and XORs in a small C program, also printing the addresses in binary as mentioned.
Then I figured out that matrix A is stored non-transposed and B transposed, so the loading is the same for both, and how the FFMA results are actually accumulated, which surprisingly is missing from the documentation.
Finally, I wrote three pages of explanations so that I don't forget them; they may be helpful to others.

Greetings from Munich, Germany
Stefan