
[WIP] Petalisp as a high-level IR?

hikettei opened this issue · 0 comments

(This article is WIP)

An overview of Petalisp


As far as I know, Petalisp is a DSL implemented in Common Lisp for generating parallelized array-processing code. It offers the following:

  • Petalisp works at a fundamental level:
    • Abstract array processing, polyhedral compilation, and so on.
    • The IR is sophisticated: lazy-reshape, transform, and ranges, together with the many optimization techniques specialized for them, can yield an extremely fast and systematic JIT compiler.
  • It could be applied (possibly with more effort) to multiple backends, as tinygrad is; that is, it should be relatively easy to implement new backends such as Metal, x86, gcc, and NEON.
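To make the idea of a lazy reshape concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not Petalisp's actual data structures: the class `LazyArray` and its methods are hypothetical names invented for illustration. The point it shows is that a reshape can be recorded as an index mapping and composed with other operations, with no element being read until the result is requested.

```python
class LazyArray:
    """Toy lazy array (hypothetical): a shape plus a function index -> value."""

    def __init__(self, shape, getter):
        self.shape = tuple(shape)
        self.getter = getter

    @classmethod
    def from_nested(cls, rows):
        # Wrap a 2-D nested list without copying it.
        return cls((len(rows), len(rows[0])),
                   lambda idx: rows[idx[0]][idx[1]])

    def reshape(self, new_shape, index_map):
        # index_map sends an index of the new view to an index of the old one.
        # Only the mapping is stored; no element is read here.
        old = self.getter
        return LazyArray(new_shape, lambda idx: old(index_map(idx)))

    def elementwise(self, fn):
        # Record an elementwise operation, again without evaluating anything.
        old = self.getter
        return LazyArray(self.shape, lambda idx: fn(old(idx)))

    def compute(self):
        # Materialize a 2-D view into nested lists; work happens only here.
        assert len(self.shape) == 2, "this toy only materializes 2-D views"
        rows, cols = self.shape
        return [[self.getter((i, j)) for j in range(cols)]
                for i in range(rows)]


a = LazyArray.from_nested([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
t = a.reshape((3, 2), lambda idx: (idx[1], idx[0]))      # transpose as a view
result = t.elementwise(lambda x: 2 * x).compute()
print(result)  # [[2, 8], [4, 10], [6, 12]]
```

Because the whole computation is known before anything runs, a compiler working on such a representation is free to fuse, reorder, or parallelize the work, which is the property the list above relies on.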


Deep learning models are everywhere, but what about the technology behind them? Many deep learning frameworks are in development today, and there are DL compilers that focus on efficient inference (or training). TVM is one good option, but when you want to target a model at an arbitrary environment, compatibility issues keep coming up (e.g., pytorch/pytorch#49890, although that is a PyTorch case).

Concretely, it is possible to implement gemm for many devices (CPU, GPU, NEON, AVX, Metal, etc.) and many data types (uint8, int8, int16, ..., float16, bfloat16, float32, ...). But can it be easier?

With Petalisp, gemm is written once at a higher layer (like a template) and can then run on various backends, instead of being implemented separately for each one:

;; Petalisp
(defun matrix-multiplication (A B)
  (lazy-reduce #'+
   (lazy #'*
    (lazy-reshape A (transform m n to n m 1))
    (lazy-reshape B (transform n k to n 1 k)))))

The same computation can be written in tinygrad:

# Tinygrad
c = (a.reshape(N, 1, N) * b.permute(1,0).reshape(1, N, N)).sum(axis=2)
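Both snippets express matrix multiplication as reshape, broadcast multiply, then reduce. The following plain-Python sketch spells out the index arithmetic behind the Petalisp version. It assumes that the reduction runs over the leading axis, as the `transform` in the snippet suggests; the function name `matmul_broadcast_reduce` and the helpers `a3` and `b3` are hypothetical names of mine, not part of either library.

```python
from itertools import product


def matmul_broadcast_reduce(A, B):
    """Matrix product as reshape -> broadcast multiply -> reduce.

    A has shape (M, N) and B has shape (N, K), both as nested lists.
    """
    M, N = len(A), len(A[0])
    N2, K = len(B), len(B[0])
    if N != N2:
        raise ValueError("inner dimensions must match")

    # Step 1: "reshape" purely by index mapping; no data is copied.
    #   A[m][n] is viewed as a3[n, m, 0]  (shape N x M x 1)
    #   B[n][k] is viewed as b3[n, 0, k]  (shape N x 1 x K)
    def a3(n, m, _k):
        return A[m][n]

    def b3(n, _m, k):
        return B[n][k]

    # Step 2: broadcasting stretches the size-1 axes, so the elementwise
    #         product has shape N x M x K.
    # Step 3: reducing with + over the leading axis n leaves shape M x K.
    C = [[0] * K for _ in range(M)]
    for n, m, k in product(range(N), range(M), range(K)):
        C[m][k] += a3(n, m, k) * b3(n, m, k)
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = matmul_broadcast_reduce(A, B)
print(C)  # [[58, 64], [139, 154]]
```

Every iteration of the loop is independent apart from the final accumulation, so nothing in the high-level description fixes a loop order or a thread layout; that choice is left to whatever compiles it.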

Users no longer need to worry about parallelization; they can simply rely on the compiler.

If TVM were CISC, tinygrad would be RISC.

Why Petalisp is a good choice for replacing the cl-waffe2 compiler