Pinned Repositories
onnxruntime
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
bitsandbytes
8-bit CUDA functions for PyTorch
cutlass
CUDA Templates for Linear Algebra Subroutines
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
docker_files
FasterTransformer
Transformer-related optimizations, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
onnx
Open Neural Network Exchange
yufenglee's Repositories
yufenglee/onnx
Open Neural Network Exchange
yufenglee/bitsandbytes
8-bit CUDA functions for PyTorch
yufenglee/cutlass
CUDA Templates for Linear Algebra Subroutines
yufenglee/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
yufenglee/docker_files
yufenglee/FasterTransformer
Transformer-related optimizations, including BERT and GPT
yufenglee/flash-attention
Fast and memory-efficient exact attention
yufenglee/mmperf
MatMul performance benchmarks for a single CPU core, comparing hand-engineered and codegen kernels.
yufenglee/onnxruntime
ONNX Runtime: cross-platform, high-performance scoring engine for ML models
yufenglee/llama
Inference code for LLaMA models
yufenglee/neural-speed
An innovative library for efficient LLM inference via low-bit quantization and sparsity
yufenglee/optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy-to-use hardware optimization tools
yufenglee/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
yufenglee/triton
Development repository for the Triton language and compiler
yufenglee/tutorials
Tutorials for creating and using ONNX models
yufenglee/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
yufenglee/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
yufenglee/Windows-Machine-Learning
Samples for Windows ML.