xiayuqing0622's Repositories
xiayuqing0622/AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
xiayuqing0622/alphafold
Open source code for AlphaFold.
xiayuqing0622/antares
Antares: an automatic engine for multi-platform kernel generation and optimization, supporting CPU, CUDA, ROCm, DirectX12, and GraphCore platforms.
xiayuqing0622/chatgpt-api
Node.js client for the official ChatGPT API. 🔥
xiayuqing0622/cutlass
CUDA Templates for Linear Algebra Subroutines
xiayuqing0622/cutlass-kernels
xiayuqing0622/cuvs
cuVS - a library for vector search and clustering on the GPU
xiayuqing0622/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
xiayuqing0622/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
xiayuqing0622/faiss
A library for efficient similarity search and clustering of dense vectors.
xiayuqing0622/finetune-transformer-lm
Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
xiayuqing0622/flash-attention
Fast and memory-efficient exact attention
xiayuqing0622/forwebhook
xiayuqing0622/incubator-tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
xiayuqing0622/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
xiayuqing0622/pytorch-lightning-transformers
Fine-tune transformers with pytorch-lightning
xiayuqing0622/TASO
The Tensor Algebra SuperOptimizer for Deep Learning
xiayuqing0622/ThunderKittens
Tile primitives for speedy kernels
xiayuqing0622/triton
Development repository for the Triton language and compiler
xiayuqing0622/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
xiayuqing0622/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities