Issues
[Performance] Improve the flash attention performance on bottom-up optimization pipeline
#2177 opened by chengjunlu - 9
[PyTorch pin update] `951c21d6` - Inductor tests are starting to fail after pybind11 update to 2.13.6
#2260 opened by pbchekin - 0
Compilation errors with Clang 17.0.6
#2261 opened by pbchekin - 1
[E2E Accuracy] Enhancements
#2165 opened by vlad-penkin - 0
Subset of E2E models for accuracy tests
#2246 opened by pbchekin - 0
Increase `warmup` and `rep` for FA benchmark
#2255 opened by anmyachev - 0
[FA] Identify the root causes for the recent Triton geomean performance degradation vs XeTLA
#2257 opened by vlad-penkin - 0
[Advanced Path] Cannot reproduce shader dump
#2250 opened by Dewei-Wang-sh - 1
Align the conditions under which benchmarks are run for different implementations
#2235 opened by anmyachev - 0
Rename branch llvm-target to main and make it default
#2176 opened by pbchekin - 3
Torch inductor tests failed for PyTorch main
#2206 opened by pbchekin - 2
Some Triton kernels generated by Inductor have low efficiency on PVC 1550 compared to A100
#2229 opened by jianyizh - 0
Merge OpenAI Triton till Sept 27th
#2244 opened by whitneywhtsang - 0
Lit tests executed twice in CI
#2240 opened by pbchekin - 4
CXX tests not running in CI
#2238 opened by anmyachev - 0
[Benchmark] [attention] Core dumped
#2228 opened by AshburnLee - 0
[TritonGEN] Use type mangling utility to mangle the function name for 2D block reads calls
#2225 opened by whitneywhtsang - 1
Create presentation for Triton developer meeting
#2169 opened by etiotto - 1
Collect performance metrics for tutorials
#2207 opened by pbchekin - 0
[CI][Local runs] Unify test runners code in CI and in local env. Switch to `test-triton.sh` script in CI
#2184 opened by vlad-penkin - 0
Re-organize build-test workflows
#2218 opened by pbchekin - 0
Computed results are incorrect with blocked pointer matrix multiplication in TRITON_INTEL_ADVANCED_PATH
#2209 opened by arunjose696 - 0
[Local runs] Enhancements to `test-triton.sh` to support local test runs with nightly wheels mode
#2187 opened by vlad-penkin - 0
Clean up `test-triton.sh`
#2208 opened by pbchekin - 0
ValueError: Pointer argument (at 0) doesn't reference XPU device memory (cpu tensor?)
#2188 opened by dvrogozh - 1
[sycl-free-inference-for-llms] Get FlashAttention kernel up to ~100 TFLOPS
#2195 opened by Dewei-Wang-sh - 1
[sycl-free-inference-for-llms] Run llama3-8B with PyTorch for XPU and get the baseline
#2197 opened by Dewei-Wang-sh - 1
[sycl-free-inference-for-llms] Integrate Triton GEMM/attention in PyTorch for XPU
#2196 opened by Dewei-Wang-sh - 0
Rewrite benchmarks to be more `elapsed_time` friendly
#2198 opened by anmyachev - 0
[CI] FlashAttention SLM path failure
#2185 opened by vlad-penkin - 0
[IPEX deprecation] Add Benchmarks job to the PyTorch pin update validation workflow
#2172 opened by kwasd - 0
[benchmarks] Add Prefix sums benchmark
#2167 opened by vlad-penkin - 0