Intel® Extension for MLIR (IMEX) is a collection of MLIR dialects and passes from Intel for supporting MLIR lowering to Intel silicon (CPU, GPU, …). The goal of this project is to support development of MLIR enhancements for upstream contribution, and to provide a sandbox for validation independent of front-end frameworks. The current project scope includes:
- Dialects and passes needed to lower and execute MLIR entry dialects (linalg, CFG, etc.) on Intel GPU.
- Wrapper libraries to interface with the Level Zero runtime and the SYCL runtime supporting Intel GPU.
- Other experimental dialects: NDArray, Dist
- CMake >= 3.20.0
- Ninja
- doxygen (Optional for building docs)
- pre-commit
- clang-format
- lit (only if building with option 2 below; https://pypi.org/project/lit/)
Install the Intel GPU driver. Instructions are here:
https://dgpu-docs.intel.com/installation-guides/index.html
Install the DPC++ compiler. Instructions are here:
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#dpcpp-cpp
Once DPC++ is installed, source the compiler vars:
source /PATH_TO/intel/oneapi/compiler/latest/env/vars.sh
- Build the Level Zero loader from source for a non-system-wide (local) install:
git clone https://github.com/oneapi-src/level-zero.git
cd level-zero
cmake -G Ninja -B build -S . \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=../level-zero-install
cmake --build build --target install
- Binary package for system-wide install: https://github.com/oneapi-src/level-zero/releases
conda create -n imex-dev -c intel -c defaults -c conda-forge pip">=21.2.4" pre-commit cmake clang-format lit doxygen
conda activate imex-dev
pre-commit install -f -c .pre-commit-config.yaml
IMEX supports three different ways of building, depending on how LLVM is set up. Option 1 is in-tree (built as part of LLVM); options 2 and 3 are out-of-tree (built outside of LLVM).
Option 1: IMEX can be treated like a sub-project of LLVM and built as part of LLVM by using an LLVM config option called LLVM_EXTERNAL_PROJECTS.
git clone https://github.com/intel/mlir-extensions.git
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout `cat ../mlir-extensions/build_tools/llvm_version.txt`
git apply ../mlir-extensions/build_tools/patches/*
cmake -G Ninja -B build -S llvm \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_BUILD_EXAMPLES=ON \
-DLLVM_TARGETS_TO_BUILD="X86" \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_EXTERNAL_PROJECTS="Imex" \
-DLLVM_EXTERNAL_IMEX_SOURCE_DIR=../mlir-extensions
# For GPU support, pass these cmake variables to enable the required runtime libraries
# -DIMEX_ENABLE_L0_RUNTIME=1
# -DIMEX_ENABLE_SYCL_RUNTIME=1
# Additionally, if using a non-system-wide Level Zero loader built from source
# -DLEVEL_ZERO_DIR=/PATH_TO/level-zero-install
cmake --build build --target check-imex
Note: -DLLVM_INSTALL_UTILS=ON is not needed for this build, since all tests will run using the FileCheck utility that is available in the build tree. An external lit is not needed either, since all tests will run using llvm-lit in the build tree.
Note for option 2 (out-of-tree with an installed LLVM): make sure to pass -DLLVM_INSTALL_UTILS=ON when building LLVM with CMake so that it installs FileCheck to the chosen installation prefix. Additionally, lit has to be installed separately, as it does not install with the rest of LLVM.
Make sure the installed LLVM is built from the git commit SHA stated in build_tools/llvm_version.txt and has all LLVM patches in build_tools/patches applied.
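The commit check described here can be scripted. The following is a minimal sketch, assuming that build_tools/llvm_version.txt contains only the commit SHA (possibly with surrounding whitespace). The helper names `read_pinned_sha`, `llvm_matches_pin`, and `current_sha` are hypothetical and not part of IMEX.

```python
from pathlib import Path
import subprocess


def read_pinned_sha(version_file):
    """Return the commit SHA stored in the version file, stripped of whitespace."""
    return Path(version_file).read_text().strip()


def llvm_matches_pin(checked_out_sha, version_file):
    """Return True if the checked-out LLVM commit equals the pinned one."""
    return checked_out_sha.strip() == read_pinned_sha(version_file)


def current_sha(llvm_repo):
    """Ask git for the commit currently checked out in llvm_repo."""
    result = subprocess.run(
        ["git", "-C", str(llvm_repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

With both repositories cloned side by side, `llvm_matches_pin(current_sha("llvm-project"), "mlir-extensions/build_tools/llvm_version.txt")` would report whether the LLVM checkout matches the pinned commit. It does not verify that the patches were applied.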
cmake -G Ninja -B build -S . \
-DMLIR_DIR=<PATH_TO_DIRECTORY_WITH_MLIRConfig.cmake> \
-DLLVM_EXTERNAL_LIT=<PATH_TO_LIT> \
-DCMAKE_BUILD_TYPE=Release
# For GPU support, pass these cmake variables to enable the required runtime libraries
# -DIMEX_ENABLE_L0_RUNTIME=1
# -DIMEX_ENABLE_SYCL_RUNTIME=1
# Additionally, if using a non-system-wide Level Zero loader built from source
# -DLEVEL_ZERO_DIR=/PATH_TO/level-zero-install
cmake --build build --target check-imex
Option 3 is similar to option 2, but uses an LLVM build tree instead of an installed LLVM.
Before building LLVM, make sure to check out the git commit SHA stated in build_tools/llvm_version.txt and apply all LLVM patches in build_tools/patches.
cmake -G Ninja -B build -S . \
-DMLIR_DIR=<PATH_TO_DIRECTORY_WITH_MLIRConfig.cmake> \
-DCMAKE_BUILD_TYPE=Release
# For GPU support, pass these cmake variables to enable the required runtime libraries
# -DIMEX_ENABLE_L0_RUNTIME=1
# -DIMEX_ENABLE_SYCL_RUNTIME=1
# Additionally, if using a non-system-wide Level Zero loader built from source
# -DLEVEL_ZERO_DIR=/PATH_TO/level-zero-install
cmake --build build --target check-imex
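The configure commands for the two out-of-tree options differ only in a few flags. As a rough sketch, the hypothetical helper `imex_configure_args` below assembles the argument list from the CMake variables shown in this section; it is an illustration rather than part of IMEX, and all paths passed to it are placeholders.

```python
def imex_configure_args(mlir_dir, external_lit=None, l0_runtime=False,
                        sycl_runtime=False, level_zero_dir=None):
    """Assemble the argument list for an out-of-tree IMEX configure step."""
    args = ["cmake", "-G", "Ninja", "-B", "build", "-S", ".",
            f"-DMLIR_DIR={mlir_dir}"]
    if external_lit:
        # Only passed when building against an installed LLVM (option 2).
        args.append(f"-DLLVM_EXTERNAL_LIT={external_lit}")
    args.append("-DCMAKE_BUILD_TYPE=Release")
    if l0_runtime:
        args.append("-DIMEX_ENABLE_L0_RUNTIME=1")
    if sycl_runtime:
        args.append("-DIMEX_ENABLE_SYCL_RUNTIME=1")
    if level_zero_dir:
        # Only needed for a Level Zero loader built from source.
        args.append(f"-DLEVEL_ZERO_DIR={level_zero_dir}")
    return args
```

The returned list can be handed to `subprocess.run(args, check=True)` from the IMEX source root, followed by the `cmake --build build --target check-imex` step.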
To build the user documentation, run
cmake --build build --target mlir-doc
It will render docs to the 'doc' directory.
To build the code documentation, pass '-DIMEX_INCLUDE_DOCS' when configuring with cmake and run
cd build
cmake --build . --target doc_doxygen
# enter the root directory of mlir-extensions
cd mlir-extensions
python scripts/add_dialect.py <name-of-new-dialect>
This will
- Generate directories IR and Transforms in the directories (include/mlir/Dialect and lib/dialect)
- Extend/create CMake infrastructure with defaults
- Create stub source files for IR and transforms
  - include/imex/Dialect/<name>/IR/<name>Ops.h
  - include/imex/Dialect/<name>/IR/<name>Ops.td
  - lib/Dialect/IR/<name>Ops.cpp
  - include/imex/Dialect/<name>/Transforms/Passes.h
  - include/imex/Dialect/<name>/Transforms/Passes.td
  - lib/Dialect/Transforms/PassDetail.h
Now, it's your turn to
- Add your dialect and its transforms/passes to appropriate places in
  - include/imex/InitIMEXDialects.h
  - include/imex/InitIMEXPasses.h
  - lib/Conversion/IMEXPassDetail.h
- Fill in what's marked with FIXME
- The documentation of the dialect should go into the description fields in <name>Ops.td. At build time the description will be extracted and a file doc/<name>.md will be generated automatically. It will include descriptions of the dialect and operations in a standardized way.
# enter the root directory of mlir-extensions
cd mlir-extensions
python scripts/add_conversion.py $name-of-source-dialect $name-of-target-dialect
This will
- Let $conversion-name be "$name-of-source-dialectTo$name-of-target-dialect"
- Add directories include/mlir/Conversion/<conversion-name> and lib/Conversion/<conversion-name>
- Extend/create CMake infrastructure with defaults
- Add declarations to header include/mlir/Conversion/<conversion-name>/<conversion-name>.h
- Put cpp definition stubs to lib/Conversion/<conversion-name>/<conversion-name>.cpp
- Add conversion to include/imex/Conversion/IMEXPasses.td and include/imex/Conversion/IMEXPasses.h
- Add a pass def stub to include/imex/Conversion/IMEXPasses.td and include/imex/Conversion/Passes.td
You will now have to
- Fill in what's marked with FIXME in the above files
- The documentation of the pass should go into the description field in Passes.td. At build time the description will be extracted and a file doc/Conversions.md will be generated automatically.
- Write your pattern rewriters
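To see which names the conversion script derives, the naming rule can be written out as a small sketch. It assumes the conversion name is the plain concatenation of source name, "To", and target name, and that the directories and stub files follow the paths listed in this section. The function `conversion_paths` and the dialect names `Foo` and `Bar` are purely illustrative.

```python
def conversion_paths(source, target):
    """Derive the conversion name and the paths listed for a new conversion."""
    name = f"{source}To{target}"
    return {
        "name": name,
        "include_dir": f"include/mlir/Conversion/{name}",
        "lib_dir": f"lib/Conversion/{name}",
        "header": f"include/mlir/Conversion/{name}/{name}.h",
        "source": f"lib/Conversion/{name}/{name}.cpp",
    }


paths = conversion_paths("Foo", "Bar")
print(paths["name"])    # FooToBar
print(paths["header"])  # include/mlir/Conversion/FooToBar/FooToBar.h
```

This only mirrors the naming described here; the actual files are created by scripts/add_conversion.py.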
To run the FileCheck-based tests, run:
cmake --build build --target check-imex
Add '-v' to the above command-line to get verbose output.
IMEX provides an initial set of benchmarks for studying its performance. To build these benchmarks, users need to manually add the -DIMEX_ENABLE_BENCHMARK=ON option when building IMEX. The benchmark test cases and the script for running them will be generated under the build/benchmarks folder.
Currently, IMEX provides benchmarks for the following 4 categories of operations:
| Operation | CPU | GPU |
|---|---|---|
| elementwise (relu and silu) | Yes | Yes |
| reduction (softmax) | Yes | Yes |
| transpose (transpose) | Yes | Yes |
| fusion (kInputFusion and kLoopFusion) | No | Yes |
These test cases are mainly implemented using the linalg dialect; SPIR-V test cases for relu are also provided. Each test case is named following the pattern opname_shape_dtype.mlir.
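The naming pattern can be decoded mechanically. Below is a minimal sketch, assuming that the shape dimensions are separated by `x` and that the data type is the last underscore-separated field, as in the file name relu_1x160x160x120_f16.mlir used in this document. The function `parse_testcase_name` is a hypothetical helper, not part of the benchmark suite.

```python
from pathlib import Path


def parse_testcase_name(filename):
    """Split an opname_shape_dtype.mlir file name into its three parts."""
    base = Path(filename).name
    if not base.endswith(".mlir"):
        raise ValueError(f"not an .mlir file: {filename}")
    stem = base[: -len(".mlir")]
    # Split from the right so that an opname containing '_' stays intact.
    opname, shape, dtype = stem.rsplit("_", 2)
    dims = tuple(int(d) for d in shape.split("x"))
    return opname, dims, dtype


print(parse_testcase_name("relu/cpu/relu_1x160x160x120_f16.mlir"))
# ('relu', (1, 160, 160, 120), 'f16')
```

Splitting from the right is a design choice of this sketch; a name with fewer than three fields raises a `ValueError`.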
For simplicity, the bench_imex script is provided to run the benchmarks. It takes either an MLIR file or a folder as input; in the latter case, it simply runs all test cases inside the folder. It also needs a runtime, selected with one of the following three options:
- -c for the CPU runtime
- -l for the Level Zero runtime (for Intel GPU)
- -s for the SYCL runtime (for Intel GPU)
# run a specific test case on CPU
./bench_imex -c relu/cpu/relu_1x160x160x120_f16.mlir
# run a set of test cases on GPU using sycl runtime
./bench_imex -s relu/gpu/
NOTE: if you are using -c, please use test cases under the cpu subfolder; similarly, if you are using -s or -l, please use test cases under the gpu subfolder. Otherwise, errors or unspecified behavior may occur.
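A mismatch between the runtime option and the subfolder can be caught before launching a run. The following sketch assumes that the test case path contains a `cpu` or `gpu` directory component, as in the examples above. `flag_matches_path` is a hypothetical pre-flight check and is not something bench_imex itself provides.

```python
from pathlib import PurePosixPath

# Runtime option -> subfolder its test cases are expected to live in.
EXPECTED_SUBFOLDER = {"-c": "cpu", "-l": "gpu", "-s": "gpu"}


def flag_matches_path(flag, path):
    """Return True if the path contains the subfolder expected for the flag."""
    if flag not in EXPECTED_SUBFOLDER:
        raise ValueError(f"unknown runtime option: {flag}")
    return EXPECTED_SUBFOLDER[flag] in PurePosixPath(path).parts


print(flag_matches_path("-c", "relu/cpu/relu_1x160x160x120_f16.mlir"))  # True
print(flag_matches_path("-l", "relu/cpu/relu_1x160x160x120_f16.mlir"))  # False
```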
The IMEX benchmark suite is implemented using a CMake template and initially provides a limited set of shapes extracted from some production models, e.g., BERT and AlexNet.
- ReLU: 1x160x160x120, 50x640x20x15, 512x640x20x15
- SiLU: 1x1024x40x30, 50x20x3072, 512x640x20x15
- Softmax: 1x2000, 16x2000, 64x2000, 256x2000, 1024x2000
- Transpose: 128x136, 1024x1024, 16x96x96, 96x7x96
- Reduce: 32x16x512x512
Users can evaluate more shapes by editing the shapes file in each subfolder (e.g., relu.shapes.in) and then rebuilding IMEX. Users can also add new data types, but these are currently limited to basic data types such as fp32, fp16, and int32.
export IMEX_ENABLE_PROFILING=ON
Then run the test.
python {your_path}/imex_runner.py xxx -o test.mlir
mlir-translate test.mlir -mlir-to-llvmir -o test.ll
llc test.ll -filetype=obj -o test.o
clang++ test.o {path}/libmlir_runner_utils.so {path}/libmlir_c_runner_utils.so {path}/libsycl-runtime.so -no-pie -o test
ze_tracer ./test
- Not using LoadOp. Instead, everything is a SubviewOp. Any size-1 dim must be annotated with static size 1.
- Right now, we can only broadcast size-1 dims if their extent is statically known (to be 1)
- Generally, rank reduction of SubviewOp needs overhaul.
- Right now, no rank reduction is happening, and appropriate affine maps are generated accordingly
- Without dist-coalesce, repartitioning of 0d arrays coming from a subview does not work correctly. Only the owning process will have the right data.
- Even if SubviewOp resulted in rank-reduced arrays, we cannot view into our local data since the element might be remote.
- To follow existing mechanisms (e.g. target parts) we'd basically need to duplicate the entire array.
- We probably need some special feature to hold duplicates of slices with only one element on the distributed axis.
- NDArray/dist tests can be run (without GPU tests etc.) using
cmake --build . --target check-ndarray
This code is made available under the Apache License 2.0 with LLVM Exceptions.
See the LICENSE.txt file for more details.