xingjinglu's Stars
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
CompVis/stable-diffusion
A latent text-to-image diffusion model
LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
OAI/OpenAPI-Specification
The OpenAPI Specification Repository
facebook/folly
An open-source C++ library developed and used at Facebook.
apple/ml-stable-diffusion
Stable Diffusion with Core ML on Apple Silicon
triton-lang/triton
Development repository for the Triton language and compiler
temporalio/temporal
Temporal service
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
gperftools/gperftools
Main gperftools repository
THUDM/GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
facebookincubator/AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
sovrasov/flops-counter.pytorch
FLOPs counter for convolutional networks in the PyTorch framework
herumi/xbyak
A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
skarupke/flat_hash_map
A very fast hashtable
pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
python/pyperformance
Python Performance Benchmark Suite
stochasticai/x-stable-diffusion
Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6
ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
amirgholami/ZeroQ
[CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework
kssteven418/I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
WoosukKwon/retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
zengkid/pdf-books
:books: PDF book library
astojanov/Clover
Clover: Quantized 4-bit Linear Algebra Library
kssteven418/LTP
[KDD'22] Learned Token Pruning for Transformers
renzibei/fph-table
Flash Perfect Hash Table: an implementation of a dynamic perfect hash table, extremely fast for lookups
masahi/tvm-winograd
Test Winograd convolution written in TVM for CUDA and AMDGPU
kssteven418/Q-ASR
[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition