Issues
[Performance] Improve the flash attention performance on bottom-up optimization pipeline
#2177 opened by chengjunlu - 9
[PyTorch pin update] `951c21d6` - Inductor tests are starting to fail after pybind11 update to 2.13.6
#2260 opened by pbchekin - 0
Compilation errors with Clang 17.0.6
#2261 opened by pbchekin - 1
[E2E Accuracy] Enhancements
#2165 opened by vlad-penkin - 0
Subset of E2E models for accuracy tests
#2246 opened by pbchekin - 0
Increase `warmup` and `rep` for FA benchmark
#2255 opened by anmyachev - 0
[FA] Identify the root causes for the recent Triton geomean performance degradation vs XeTLA
#2257 opened by vlad-penkin - 0
[Advanced Path] Cannot reproduce shader dump
#2250 opened by Dewei-Wang-sh - 1
Align the conditions under which benchmarks are run for different implementations
#2235 opened by anmyachev - 0
Rename branch llvm-target to main and make it default
#2176 opened by pbchekin - 3
Torch inductor tests failed for PyTorch main
#2206 opened by pbchekin - 2
Some Triton kernels generated by Inductor have low efficiency on PVC 1550 compared to A100
#2229 opened by jianyizh - 0
Merge OpenAI Triton till Sept 27th
#2244 opened by whitneywhtsang - 0
Lit tests executed twice in CI
#2240 opened by pbchekin - 4
CXX tests not running in CI
#2238 opened by anmyachev - 0
[Benchmark] [attention] Core dumped
#2228 opened by AshburnLee - 0
[TritonGEN] Use type mangling utility to mangle the function name for 2D block reads calls
#2225 opened by whitneywhtsang - 1
Create presentation for Triton developer meeting
#2169 opened by etiotto - 1
Collect performance metrics for tutorials
#2207 opened by pbchekin - 0
[CI][Local runs] Unify test runners code in CI and in local env. Switch to `test-triton.sh` script in CI
#2184 opened by vlad-penkin - 0
Re-organize build-test workflows
#2218 opened by pbchekin - 0
Computed results are incorrect with blocked pointer matrix multiplication in TRITON_INTEL_ADVANCED_PATH
#2209 opened by arunjose696 - 0
[Local runs] Enhancements to `test-triton.sh` to support local test runs with nightly wheels mode
#2187 opened by vlad-penkin - 0
Clean up `test-triton.sh`
#2208 opened by pbchekin - 0
ValueError: Pointer argument (at 0) doesn't reference XPU device memory (cpu tensor?)
#2188 opened by dvrogozh - 1
[sycl-free-inference-for-llms] Get FlashAttention kernel up to ~100 TFLOPS
#2195 opened by Dewei-Wang-sh - 1
[sycl-free-inference-for-llms] Run llama3-8B with PyTorch for XPU and get the baseline
#2197 opened by Dewei-Wang-sh - 1
[sycl-free-inference-for-llms] Integrate Triton GEMM/attention in PyTorch for XPU
#2196 opened by Dewei-Wang-sh - 0
Rewrite benchmarks to be more `elapsed_time` friendly
#2198 opened by anmyachev - 0
[CI] FlashAttention SLM path failure
#2185 opened by vlad-penkin - 0
[IPEX deprecation] Add Benchmarks job to the PyTorch pin update validation workflow
#2172 opened by kwasd - 0
[benchmarks] Add Prefix sums benchmark
#2167 opened by vlad-penkin - 0