bytedance/byteir

This work is really great, can you provide more documentation

APeiZou opened this issue · 8 comments

@bytedance-oss-robot This work is really great. Can you provide more documentation and basic tutorials on how to implement basic operators and how to implement custom operators?

And what are the differences with xxIR?

@bytedance-oss-robot Are there plans to increase support for ARM-series DSPs? Will you add some convolution acceleration work?

And what are the differences with xxIR?

It might sound odd, but ByteIR is not an IR or a set of IRs. The name is mostly a legacy.

The compiler actually just contains a bunch of passes, which work with upstream MLIR dialects or Google's HLO dialects. These passes are used to build our internal pipelines for some hardware, and they are generic enough for most hardware if the compiler also uses upstream MLIR or HLO. Since we already use so many passes from the MLIR community internally, we would like to contribute our passes back to the community. We don't officially release a pipeline for any specific hardware, other than some test compiler pipelines for public hardware for demo purposes.

The frontends are simply regular frontend pipelines that generate HLO, reusing code we have already contributed back to TensorFlow, ONNX-MLIR, or Torch-MLIR (the last of which will be released after the remaining PRs are merged back into Torch-MLIR). These frontends are almost the same versions we use internally, mainly so our hardware partners can generate the exact versions we use internally; they are also the versions expected by the compiler passes we provide, should anyone want to use them.

The runtime is simply a regular runtime backend we use internally, mainly for our hardware partners to glue their code into. It also contains a default provider for demo purposes, so our hardware partners can see how to do the gluing.

The frontends, compiler, and runtime can work independently.

@bytedance-oss-robot Are there plans to increase support for ARM-series DSPs? Will you add some convolution acceleration work?

Since we don't officially release a pipeline for any specific hardware (other than some test compiler pipelines for public hardware for demo purposes), and DSPs don't fit our current focus, it is unlikely to happen in the near future.

But we can still point out a route: you can lower the Linalg fusion ops generated by our passes into LLVM through standard MLIR passes for your target ARM DSP.
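As a rough sketch of that route (assuming a recent upstream MLIR build; exact pass names vary across MLIR releases, and a real DSP target would need its own codegen flags), a Linalg-to-LLVM lowering pipeline using only upstream passes might look like:

```shell
# Hypothetical sketch: lower a Linalg-level module to the LLVM dialect
# using only upstream MLIR passes. Pass names differ between MLIR versions.
mlir-opt input.mlir \
  --one-shot-bufferize="bufferize-function-boundaries" \
  --convert-linalg-to-loops \
  --convert-scf-to-cf \
  --convert-cf-to-llvm \
  --convert-arith-to-llvm \
  --finalize-memref-to-llvm \
  --convert-func-to-llvm \
  --reconcile-unrealized-casts \
  -o lowered.mlir

# Translate to LLVM IR, then hand it to your ARM cross-compilation toolchain.
mlir-translate --mlir-to-llvmir lowered.mlir -o lowered.ll
llc -march=arm lowered.ll -o lowered.s
```

The key point is that once the fusion ops are in Linalg, everything downstream is vanilla upstream MLIR/LLVM tooling, so no ByteIR-specific support is needed for a new target.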

@bytedance-oss-robot This work is really great. Can you provide more documentation and basic tutorials on how to implement basic operators and how to implement custom operators?

I am not sure what "implementing operators" refers to.
If you are referring to the compiler, the operators either come from the frontends or are lowered from other dialects, depending on which dialects you use. Custom operators depend on where they come from. If one comes from a dialect, you can lower it to HLO, Linalg, or the scf dialect we support. If it comes from a language like Python or a DSL, then you need a parser that converts it into IR represented in HLO or any MLIR dialect.

If you are referring to binding a hand-written op implementation into the runtime, you need to create a provider or register an op implementation in an existing provider. You can check https://github.com/bytedance/byteir/tree/main/runtime/include/brt/backends for creating a provider. Registering an op implementation typically happens in a provider's factory; taking the demo default CUDA provider as an example, it happens in https://github.com/bytedance/byteir/blob/main/runtime/lib/backends/cuda/providers/default/cuda_provider.cc.
In most use cases, if your op is generated through the compiler, you only need to register a hook op once for all generated ops.

@liwenchangbdbz Thank you for your reply.

Hi, contributors. Please do not @-mention @bytedance-oss-robot in daily development. It is used to automate PRs, issues, etc., and it won't notify the repo maintainers. If you need to notify maintainers, feel free to @ the GitHub teams.

Thanks for your reply; the name ByteIR is admittedly a bit misleading.