Lzhang-hub's Stars
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
karpathy/LLM101n
LLM101n: Let's build a Storyteller
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
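FlashAttention's core trick is computing exact softmax attention in tiles with an online softmax, so the full score matrix is never materialized. A minimal NumPy sketch of that tiling (function names are illustrative, not the library's API — the real kernels run fused on GPU):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full (n, n) score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style tiling: walk K/V in blocks, keeping a running
    # row-wise max (m), softmax denominator (l), and weighted V sum (acc),
    # so no (n, n) matrix is ever formed.
    n, d = Q.shape
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    acc = np.zeros((n, d))
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)    # (n, block) score tile
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)                # rescale previous state
        P = np.exp(S - m_new[:, None])
        acc = acc * scale[:, None] + P @ V[j:j + block]
        l = l * scale + P.sum(axis=-1)
        m = m_new
    return acc / l[:, None]
```

Both functions produce identical outputs; the tiled version only ever holds an (n, block) slice of scores, which is what makes the memory footprint linear in sequence length.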
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
pytorch/torchtune
PyTorch native finetuning library
open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
huggingface/autotrain-advanced
🤗 AutoTrain Advanced
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
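NCCL's all-reduce is commonly implemented as a ring: a reduce-scatter pass followed by an all-gather, each taking P−1 steps for P ranks. A NumPy simulation of that schedule (names are illustrative; real NCCL runs these transfers on GPUs over NVLink/InfiniBand):

```python
import numpy as np

def ring_allreduce(per_rank):
    # per_rank: list of P equal-length vectors, one per simulated rank
    # (length must be divisible by P).
    P = len(per_rank)
    data = [np.split(v.astype(np.float64).copy(), P) for v in per_rank]
    # Phase 1: reduce-scatter. After P-1 steps, rank r holds the fully
    # reduced sum of chunk (r + 1) % P.
    for step in range(P - 1):
        for r in range(P):
            idx = (r - step) % P
            data[(r + 1) % P][idx] = data[(r + 1) % P][idx] + data[r][idx]
    # Phase 2: all-gather. Circulate the reduced chunks for P-1 more
    # steps so every rank ends with the complete sum.
    for step in range(P - 1):
        for r in range(P):
            idx = (r + 1 - step) % P
            data[(r + 1) % P][idx] = data[r][idx]
    return [np.concatenate(chunks) for chunks in data]
```

The ring schedule is bandwidth-optimal: each rank sends roughly 2(P−1)/P of the data volume regardless of rank count, which is why it scales to large GPU clusters.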
pytorch/executorch
On-device AI across mobile, embedded and edge for PyTorch
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
openlit/openlit
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 30+ LLM Providers, VectorDBs, Agent Frameworks and GPUs.
microsoft/MInference
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
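One of the simplest compression schemes such libraries support is symmetric per-tensor int8 weight quantization: store one floating-point scale plus int8 weights. A NumPy sketch of the idea (not llm-compressor's actual API; illustrative names):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: one fp32 scale maps the weight
    # range onto the signed int8 grid [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate fp32 weights; rounding error is at most scale / 2.
    return q.astype(np.float32) * scale
```

Real deployments typically use finer granularity (per-channel or per-group scales) and calibrate activations too, but the storage win is the same: 4x smaller weights than fp32.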
linux-rdma/perftest
InfiniBand verbs performance tests.
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
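Mixed-precision training depends on loss scaling: gradients smaller than fp16's subnormal range (~6e-8) underflow to zero, so the loss is multiplied by a scale factor before the backward pass and the gradients are divided by it afterward. A tiny NumPy demonstration of why (values chosen for illustration):

```python
import numpy as np

# A gradient of 1e-8 is below fp16's smallest subnormal (~6e-8),
# so it underflows to exactly zero:
tiny_grad = np.float16(1e-8)                   # -> 0.0

# Scaling the loss (and hence every gradient) by 1024 before the
# backward pass keeps the value representable in fp16:
scaled = np.float16(1e-8 * 1024)               # nonzero

# Unscaling in fp32 afterward recovers the true gradient:
unscaled = float(np.float32(scaled)) / 1024    # ~1e-8 again
```

Production AMP implementations grow and shrink the scale dynamically, backing off when scaled gradients overflow to inf.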
pytorch-labs/attention-gym
Helpful tools and examples for working with flex-attention
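flex-attention's central idea is expressing attention variants as a user-supplied score-modification function applied to the logits before the softmax. A NumPy sketch of that pattern (helper names are illustrative, not the PyTorch API, which compiles the score_mod into a fused kernel):

```python
import numpy as np

def attention_with_score_mod(Q, K, V, score_mod):
    # Standard attention, except the logits pass through a user-supplied
    # score_mod(scores, q_idx, k_idx) before the softmax.
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)
    q_idx, k_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    S = score_mod(S, q_idx, k_idx)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

# A causal mask and a relative-position bias are each just a score_mod:
causal = lambda s, q, k: np.where(q >= k, s, -np.inf)
rel_bias = lambda s, q, k: s - 0.1 * np.abs(q - k)
```

The appeal is composability: masks, ALiBi-style biases, and sliding windows all become a few lines of score_mod instead of separate hand-written kernels.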
BBuf/how-to-learn-deep-learning-framework
How to learn PyTorch and OneFlow.
microsoft/vidur
A large-scale simulation framework for LLM inference
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
NVIDIA/trt-llm-as-openai-windows
This reference implementation can be used with any existing OpenAI-integrated app to run TRT-LLM inference locally on a GeForce GPU on Windows instead of in the cloud.
coreweave/nccl-tests
NVIDIA NCCL Tests for Distributed Training