This project aims to develop composable abstractions for high-performance code generation within MLIR. The proposed abstractions and transformations offer both immediate and long-term benefits. Our approach breaks generic computations down into smaller tiles by exploiting their algebraic properties and structure. Tiled computations can be fused and progressively lowered into loops over retargetable vector primitives. The abstractions apply both to immutable tensor values and to in-memory buffers that may have side effects, and they support multiple storage formats, including dense, sparse, and quantized representations. An in-place bufferization pass efficiently materializes programs in tensor form into memory buffers, including programs that have already been transformed by tiling, fusion, and other passes.
Our approach preserves high-level, domain-specific information, preventing the premature loss of computational structure. This enables transformations that avoid the performance limitations of numerical libraries when a fused operation lacks a high-performance implementation. These transformations can lower operations either to hardware instructions that implement coarse-grained vector operations or, as a fallback, to numerical libraries such as Eigen. This flexibility enhances compiler transformations and opens new possibilities for compiler-library co-design.
Additionally, tiled operations target subsets of tensor values or memory buffers. They leverage the natural structure of tensor operations while remaining generic in the tensor representation (values or side effects, vectors or scalars, dense or sparse) and in the decomposition method (various forms of tiling). This improves composability, because transformations apply to individual operations or groups of operations rather than to entire loops or control-flow graphs. It also simplifies the expression of complex transformation and lowering sequences, which facilitates autotuning.
The intermediate representation remains executable at any intermediate transformation and lowering step, greatly simplifying debugging, testing, and performance evaluation, and blurring the lines between the programmer's and the compiler's responsibilities.
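To make the idea of tiling concrete, here is a minimal sketch in plain Rust. It is not MLIR and not this crate's API: the function names `matmul_naive` and `matmul_tiled`, the row-major `f32` layout, and the square-matrix restriction are all assumptions made for illustration. It shows a computation decomposed into loops over tiles, where each inner loop nest touches only a subset of each buffer.

```rust
/// Reference version: C = A * B for row-major `n x n` matrices.
/// (Hypothetical helper, used only to check the tiled version.)
fn matmul_naive(a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
    c
}

/// Tiled version: the same computation, split into `tile x tile` blocks.
/// The three outer loops walk over tiles; the three inner loops work on
/// one tile of each operand. `min(n)` handles tiles that overhang the edge.
fn matmul_tiled(a: &[f32], b: &[f32], n: usize, tile: usize) -> Vec<f32> {
    assert!(tile > 0, "tile size must be positive");
    let mut c = vec![0.0; n * n];
    for i0 in (0..n).step_by(tile) {
        for j0 in (0..n).step_by(tile) {
            for k0 in (0..n).step_by(tile) {
                for i in i0..(i0 + tile).min(n) {
                    for k in k0..(k0 + tile).min(n) {
                        let aik = a[i * n + k];
                        // Innermost loop over contiguous elements of B and C:
                        // the kind of loop a compiler could map to vector ops.
                        for j in j0..(j0 + tile).min(n) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

Calling `matmul_tiled(&a, &b, n, 2)` should give the same result as `matmul_naive(&a, &b, n)`; only the iteration order over the buffers changes, which is what makes the tile size a natural autotuning parameter.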
Add the following line to the [dependencies] section of your Cargo.toml file:
composable = "0.1.0"
- Write tests for all functions, including edge cases
- Miri: detects undefined behaviour and leaks, even if nothing panics
- Sanitizers: detects problematic threading and memory access patterns
- Embrace Chaos: turmoil/shuttle (async/sync chaos), quickcheck/proptest (value chaos), cargo-mutants (logic chaos)
- Be exhaustive when possible: Loom (all possible and distinguishable concurrent executions), Kani (all possible and distinguishable inputs). This is only feasible for core primitives.
- Use profilers to detect problem areas and prioritize experimentation: flamegraph, samply, counts, coz (causal profiling)
- Benchmark tools: Criterion, Divan, Hyperfine, Bencher
- Capture these in the benchmarks: pathological cases; micro and macro; under, at, and over capacity; all relevant targets; usefulness, throughput, memory usage, latency.
- Use statistical tests (some are implemented in Criterion) for comparisons, not plain pointwise comparisons.
- iai-callgrind, tango
- Decisions taken: Which alternatives were discarded and why? Which tradeoffs were accepted and why?
- (Y)ADRs: tools for documentation
- Missing handling of corner cases: todo!()/unimplemented!(). Future optimization opportunities: absence of an impl (like From)
- Newtypes (not aliases): Meters(u64) vs Miles(u64)
- Typestates: a Rocket in one state vs a Rocket in another, each as a distinct type. This may be usable for training vs inference.
- Two-phase structs: TomlConfig vs ResolvedConfig
- Enums over Booleans
- Enums for linked arguments: if only f(true, Some(_)) and f(false, None) are valid, replace f(bool, Option) with an enum whose variants carry the data
- Clippy is your friend
- The Rust API Guidelines
- Try to use Rust features instead of trying to re-interpret Python or C/C++
- Concrete types: prefer -> impl Trait, avoid pub fields
- args, return types, trait impls, ...
- impl From: prefer non-pub inherent methods!
- cargo-semver-checks, cargo-public-api, cargo-vet are tools for automatic change detection, but are not perfect.
- Have reminders when there are changes in dependencies.
- Auto-merging bump PRs
- Dependabot
- Upstream changes (no forks!)
- Wrap unstable dependencies.
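The type-design items in the list above (newtypes, enums for linked arguments, typestates) can be sketched in a few lines of Rust. This is an illustrative sketch only: `Transport`, `describe`, `Model`, `Untrained`, and `Trained` are hypothetical names chosen for the example, and the "training" step is a placeholder rather than a real algorithm.

```rust
use std::marker::PhantomData;

// Newtypes (not aliases): Meters and Miles are distinct types, so passing
// one where the other is expected is a compile error.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meters(u64);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Miles(u64);

impl From<Miles> for Meters {
    fn from(m: Miles) -> Self {
        Meters(m.0 * 1609) // truncated; one mile is about 1609.344 m
    }
}

// Enum for linked arguments: instead of f(use_tls: bool, cert: Option<String>),
// where only (true, Some) and (false, None) make sense, each variant carries
// exactly the data it needs.
pub enum Transport {
    Plain,
    Tls { cert: String },
}

pub fn describe(t: &Transport) -> String {
    match t {
        Transport::Plain => "plain".to_string(),
        Transport::Tls { cert } => format!("tls with {}", cert),
    }
}

// Typestate: the state lives in the type parameter, so calling `infer` on an
// untrained model does not compile.
pub struct Untrained;
pub struct Trained;

pub struct Model<State> {
    weights: Vec<f32>,
    _state: PhantomData<State>,
}

impl Model<Untrained> {
    pub fn new(size: usize) -> Self {
        Model { weights: vec![0.0; size], _state: PhantomData }
    }

    /// Consumes the untrained model and returns a trained one.
    /// Placeholder "training": adds `step` to every weight.
    pub fn train(self, step: f32) -> Model<Trained> {
        let weights = self.weights.iter().map(|w| w + step).collect();
        Model { weights, _state: PhantomData }
    }
}

impl Model<Trained> {
    /// Placeholder "inference": sum of weight * x.
    pub fn infer(&self, x: f32) -> f32 {
        self.weights.iter().map(|w| w * x).sum()
    }
}
```

With these definitions, `Model::<Untrained>::new(2).train(0.5).infer(2.0)` compiles, while `Model::<Untrained>::new(2).infer(2.0)` is rejected by the compiler because `infer` exists only on `Model<Trained>`.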
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DLLVM_TARGETS_TO_BUILD="Native;ARM;Mips" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_CCACHE_BUILD=OFF \
  -DLLVM_USE_SANITIZER="Address;Undefined" \
  -DMLIR_INCLUDE_INTEGRATION_TESTS=ON