Pinned Repositories
flash-attention
Fast and memory-efficient exact attention
flux
Official inference repo for FLUX.1 models
kernels
minRF-ONNX
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
nexfort
OneDiff compiler infrastructure using TorchInductor
quant_dit_models
quanto
test_attn
Testing and benchmarking different attention implementations and backends (a minimal backend-comparison sketch follows this list)
triton-kernels
Triton kernels for Flux
xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU clusters
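
Two of the pinned repos, flash-attention and test_attn, revolve around the same question: which attention implementation is fastest for a given shape. As a minimal sketch of such a comparison (assuming PyTorch 2.3+ on a CUDA GPU; this is not the repos' actual harness), torch.nn.attention.sdpa_kernel can pin scaled_dot_product_attention to one backend at a time:

```python
# Hypothetical sketch, not test_attn's actual code: time PyTorch's
# scaled_dot_product_attention under each available backend.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

def bench_backend(backend, q, k, v, iters=50):
    with sdpa_kernel(backend):            # pin SDPA to one backend
        for _ in range(5):                # warmup
            F.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            F.scaled_dot_product_attention(q, k, v)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # mean ms per call

if __name__ == "__main__":
    # (batch, heads, seq_len, head_dim), fp16 on CUDA as flash attention requires
    shape = (4, 16, 4096, 64)
    q, k, v = (torch.randn(shape, device="cuda", dtype=torch.float16)
               for _ in range(3))
    for backend in (SDPBackend.FLASH_ATTENTION,
                    SDPBackend.EFFICIENT_ATTENTION,
                    SDPBackend.MATH):
        print(f"{backend.name:20s} {bench_backend(backend, q, k, v):.3f} ms")
```

The MATH backend materializes the full attention matrix, so at long sequence lengths the flash and memory-efficient backends should win by a wide margin, which is exactly the claim in the flash-attention tagline above.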
AI Compiler Study's Repositories
ai-compiler-study/triton-kernels
Triton kernels for Flux (an illustrative kernel sketch follows this list)
ai-compiler-study/flux
Official inference repo for FLUX.1 models
ai-compiler-study/minRF-ONNX
Minimal implementation of scalable rectified flow transformers, based on SD3's approach (a training-step sketch follows this list)
ai-compiler-study/nexfort
OneDiff compiler infrastructure using TorchInductor
ai-compiler-study/quanto
ai-compiler-study/cutlass-kernels
ai-compiler-study/flash-attention
Fast and memory-efficient exact attention
ai-compiler-study/kernels
ai-compiler-study/quant_dit_models
ai-compiler-study/test_attn
Testing and benchmarking different attention implementations and backends
ai-compiler-study/unet.cu
UNet diffusion model in pure CUDA
ai-compiler-study/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU clusters
ai-compiler-study/flux-tinygrad-opt
Optimize Flux on tinygrad
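
minRF-ONNX, flux, and xDiT all sit on the rectified-flow formulation that SD3 and FLUX.1 use: instead of a DDPM noise schedule, the model learns a velocity field along the straight line between a noise sample and a data sample. Below is a toy training step, as a sketch only; the network, shapes, and the direction convention for t are illustrative, not minRF-ONNX's actual code:

```python
# Minimal rectified-flow training step; illustrative only, with a toy
# MLP standing in for the transformer used by SD3/FLUX-style models.
import torch
import torch.nn as nn

class ToyVelocityNet(nn.Module):
    """Predicts velocity v(x_t, t); a hypothetical stand-in for a DiT."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(),
                                 nn.Linear(256, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def rectified_flow_loss(model, x1):
    """x1: a batch of data samples; x0: Gaussian noise."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], device=x1.device)      # t ~ U[0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1      # straight-line interpolant
    v_target = x1 - x0                                 # constant velocity along the line
    v_pred = model(x_t, t)
    return torch.mean((v_pred - v_target) ** 2)

model = ToyVelocityNet()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x1 = torch.randn(32, 64)   # stand-in "data" batch
opt.zero_grad()
loss = rectified_flow_loss(model, x1)
loss.backward()
opt.step()
```

Sampling then just integrates dx/dt = v(x, t) from noise to data, for example with a handful of Euler steps, which is what makes rectified-flow models cheap to sample compared with DDPM-style schedules.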
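
triton-kernels is described only as "Triton kernels for Flux". Flux applies RMSNorm to queries and keys, so a row-wise RMSNorm is a plausible example of the kind of kernel such a repo contains; the sketch below is illustrative, not the repo's actual code:

```python
# Hypothetical row-wise RMSNorm in Triton (one program instance per row),
# the normalization Flux-style models apply to Q and K.
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK: tl.constexpr):
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    # Load one row, accumulating in float32 for numerical stability.
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    y = (x / rms) * w
    # Cast back to the output tensor's own dtype on store.
    tl.store(out_ptr + row * n_cols + cols,
             y.to(out_ptr.dtype.element_ty), mask=mask)

def rmsnorm(x: torch.Tensor, w: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # x: (n_rows, n_cols) contiguous CUDA tensor; w: (n_cols,) scale weights
    out = torch.empty_like(x)
    BLOCK = triton.next_power_of_2(x.shape[-1])
    rmsnorm_kernel[(x.shape[0],)](x, w, out, x.shape[-1], eps, BLOCK=BLOCK)
    return out
```

Accumulating the sum of squares in float32 while storing in the tensor's own dtype is the usual trick for keeping fp16/bf16 normalization numerically stable without paying for a full-precision tensor.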