Lzhang-hub's Stars
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
karpathy/LLM101n
LLM101n: Let's build a Storyteller
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
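FlashAttention's core trick is computing exact softmax attention in tiles with an online softmax, so the full score matrix is never materialized. A minimal NumPy sketch of that tiling (function names are illustrative, not the library's API — the real kernels run fused on GPU):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full (n, n) score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style tiling: walk K/V in blocks, keeping a running
    # row-wise max (m), softmax denominator (l), and weighted V sum (acc),
    # so no (n, n) matrix is ever formed.
    n, d = Q.shape
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    acc = np.zeros((n, d))
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)    # (n, block) score tile
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)                # rescale previous state
        P = np.exp(S - m_new[:, None])
        acc = acc * scale[:, None] + P @ V[j:j + block]
        l = l * scale + P.sum(axis=-1)
        m = m_new
    return acc / l[:, None]
```

Both functions produce identical outputs; the tiled version only ever holds an (n, block) slice of scores, which is what makes the memory footprint linear in sequence length.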
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
pytorch/torchtune
PyTorch native finetuning library
open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
huggingface/autotrain-advanced
🤗 AutoTrain Advanced
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
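NCCL's all-reduce is commonly implemented as a ring: a reduce-scatter pass followed by an all-gather, each taking P−1 steps for P ranks. A NumPy simulation of that schedule (names are illustrative; real NCCL runs these transfers on GPUs over NVLink/InfiniBand):

```python
import numpy as np

def ring_allreduce(per_rank):
    # per_rank: list of P equal-length vectors, one per simulated rank
    # (length must be divisible by P).
    P = len(per_rank)
    data = [np.split(v.astype(np.float64).copy(), P) for v in per_rank]
    # Phase 1: reduce-scatter. After P-1 steps, rank r holds the fully
    # reduced sum of chunk (r + 1) % P.
    for step in range(P - 1):
        for r in range(P):
            idx = (r - step) % P
            data[(r + 1) % P][idx] = data[(r + 1) % P][idx] + data[r][idx]
    # Phase 2: all-gather. Circulate the reduced chunks for P-1 more
    # steps so every rank ends with the complete sum.
    for step in range(P - 1):
        for r in range(P):
            idx = (r + 1 - step) % P
            data[(r + 1) % P][idx] = data[r][idx]
    return [np.concatenate(chunks) for chunks in data]
```

The ring schedule is bandwidth-optimal: each rank sends roughly 2(P−1)/P of the data volume regardless of rank count, which is why it scales to large GPU clusters.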
pytorch/executorch
On-device AI across mobile, embedded and edge for PyTorch
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
openlit/openlit
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 30+ LLM Providers, VectorDBs, Agent Frameworks and GPUs.
microsoft/MInference
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
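One of the simplest compression schemes such libraries support is symmetric per-tensor int8 weight quantization: store one floating-point scale plus int8 weights. A NumPy sketch of the idea (not llm-compressor's actual API; illustrative names):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: one fp32 scale maps the weight
    # range onto the signed int8 grid [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate fp32 weights; rounding error is at most scale / 2.
    return q.astype(np.float32) * scale
```

Real deployments typically use finer granularity (per-channel or per-group scales) and calibrate activations too, but the storage win is the same: 4x smaller weights than fp32.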
linux-rdma/perftest
InfiniBand verbs performance tests.
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
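Mixed-precision training depends on loss scaling: gradients smaller than fp16's subnormal range (~6e-8) underflow to zero, so the loss is multiplied by a scale factor before the backward pass and the gradients are divided by it afterward. A tiny NumPy demonstration of why (values chosen for illustration):

```python
import numpy as np

# A gradient of 1e-8 is below fp16's smallest subnormal (~6e-8),
# so it underflows to exactly zero:
tiny_grad = np.float16(1e-8)                   # -> 0.0

# Scaling the loss (and hence every gradient) by 1024 before the
# backward pass keeps the value representable in fp16:
scaled = np.float16(1e-8 * 1024)               # nonzero

# Unscaling in fp32 afterward recovers the true gradient:
unscaled = float(np.float32(scaled)) / 1024    # ~1e-8 again
```

Production AMP implementations grow and shrink the scale dynamically, backing off when scaled gradients overflow to inf.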
pytorch-labs/attention-gym
Helpful tools and examples for working with flex-attention
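flex-attention's central idea is expressing attention variants as a user-supplied score-modification function applied to the logits before the softmax. A NumPy sketch of that pattern (helper names are illustrative, not the PyTorch API, which compiles the score_mod into a fused kernel):

```python
import numpy as np

def attention_with_score_mod(Q, K, V, score_mod):
    # Standard attention, except the logits pass through a user-supplied
    # score_mod(scores, q_idx, k_idx) before the softmax.
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)
    q_idx, k_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    S = score_mod(S, q_idx, k_idx)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

# A causal mask and a relative-position bias are each just a score_mod:
causal = lambda s, q, k: np.where(q >= k, s, -np.inf)
rel_bias = lambda s, q, k: s - 0.1 * np.abs(q - k)
```

The appeal is composability: masks, ALiBi-style biases, and sliding windows all become a few lines of score_mod instead of separate hand-written kernels.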
BBuf/how-to-learn-deep-learning-framework
How to learn PyTorch and OneFlow.
microsoft/vidur
A large-scale simulation framework for LLM inference
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
NVIDIA/trt-llm-as-openai-windows
This reference implementation can be used with any existing OpenAI-integrated app to run TRT-LLM inference locally on a GeForce GPU on Windows instead of in the cloud.
coreweave/nccl-tests
NVIDIA NCCL Tests for Distributed Training