hikettei/cl-waffe2

[Enhancement] Compile time still needs to be optimized.

hikettei opened this issue · 4 comments

cl-waffe2 generates and compiles forward kernels on the fly, specialized to the dimensions and views of the given tensors. This approach lets me reduce the cost of computing multidimensional offsets and schedule multithreading in advance. However, this compilation never happens at the top level; it goes through the (compile nil ...) function at runtime. About 80% of the total compile time is spent compiling these kernels (e.g., the expansion of SinNode).

For example, (!sin (!sin (!sin x))) uses exactly the same kernel code at each step, yet it is compiled three times. Therefore, one primary strategy for reducing compile time is to reuse compiled kernels.

At compile time, the costs of (!sin x) and (!sin (!sin (!sin x))) should be the same, because all tensors involved have the same shape and the same views.
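To make the reuse idea concrete, here is a minimal sketch in plain Common Lisp. It is not cl-waffe2's actual implementation: `*kernel-cache*`, `*compile-count*`, `make-kernel-source`, and `get-kernel` are hypothetical names, and the cache key (operation name, shape, view) is an assumption about what uniquely identifies a kernel.

```lisp
;; Hypothetical sketch: none of these names come from cl-waffe2.
(defvar *kernel-cache* (make-hash-table :test #'equal)
  "Maps (op-name shape view) keys to compiled functions.")

(defvar *compile-count* 0
  "Number of times COMPILE was actually invoked.")

(defun make-kernel-source (op-name shape)
  "Build a lambda form that applies OP-NAME elementwise to a flat vector
whose length is fixed by SHAPE."
  (let ((size (reduce #'* shape)))
    `(lambda (in out)
       (dotimes (i ,size out)
         (setf (aref out i) (,op-name (aref in i)))))))

(defun get-kernel (op-name shape &optional (view :contiguous))
  "Return a compiled kernel, calling COMPILE only on a cache miss."
  (let ((key (list op-name shape view)))
    (or (gethash key *kernel-cache*)
        (progn
          (incf *compile-count*)
          (setf (gethash key *kernel-cache*)
                (compile nil (make-kernel-source op-name shape)))))))

;; sin applied three times to a tensor of shape (2 3):
;; three cache lookups, but only one call to COMPILE.
(let* ((shape '(2 3))
       (x   (make-array 6 :initial-element 0.5d0))
       (buf (make-array 6 :initial-element 0.0d0)))
  (dotimes (n 3)
    (funcall (get-kernel 'sin shape) x buf)
    (replace x buf))
  (format t "compiles: ~a~%" *compile-count*)) ; prints "compiles: 1"
```

Because the key is compared with `equal`, any node with the same operation, shape, and view hits the same entry, so the chain of three sin calls costs one compilation.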

Goal: compile time for the forward and backward passes of a 3-layer MLP << 5 sec.

With cache.lisp, compile time should be approximately:

O(number of kernel types used in nodes)

while without cache.lisp it is:

O(number of operations)
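The difference between the two bounds can be illustrated by counting compilations over an operation trace. This is only a sketch under assumptions: `hidden-layer-trace`, `compiles-without-cache`, and `compiles-with-cache` are hypothetical helpers, the trace format is invented for illustration, and "kernel type" is taken to mean a distinct (operation, shapes) entry.

```lisp
;; Hypothetical helpers; the trace format is illustrative, not cl-waffe2's.
(defun hidden-layer-trace (n-layers batch width)
  "Return a list of (op . shapes) entries for N-LAYERS identical
matmul -> add -> relu blocks."
  (loop repeat n-layers
        append (list (list 'matmul (list batch width) (list width width))
                     (list 'add (list batch width))
                     (list 'relu (list batch width)))))

(defun compiles-without-cache (trace)
  "Every operation is compiled once: O(number of operations)."
  (length trace))

(defun compiles-with-cache (trace)
  "Only distinct entries are compiled: O(number of kernel types)."
  (length (remove-duplicates trace :test #'equal)))

(let ((trace (hidden-layer-trace 10 32 256)))
  (format t "without cache: ~a, with cache: ~a~%"
          (compiles-without-cache trace)
          (compiles-with-cache trace)))
;; prints "without cache: 30, with cache: 3"
```

Adding more layers of the same shape grows the first count linearly while the second stays fixed, which is the behaviour the two bounds describe.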

The latest pull request, #10, solved this issue; however, compile time still has room for optimization, especially for backward compilation...

(MLP compile time) is now << 0.5 s