ningpengtao-coder's Stars
colinhacks/zod
TypeScript-first schema validation with static type inference
servo/servo
Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine
taichi-dev/taichi
Productive, portable, and performant GPU programming in Python.
huggingface/text-generation-inference
Large Language Model Text Generation Inference
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
zilliztech/GPTCache
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
fluxcd/flux2
Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
versotile-org/verso
A web browser that plays old world blues to build new world hope
cdarlint/winutils
winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows
noamgat/lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
datageartech/datagear
DataGear data visualization and analytics platform; freely build any data dashboard you want
itsOwen/CyberScraper-2077
A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama
hao-ai-lab/LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
hemingkx/SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
fighting41love/DeepLearning-500-questions
Deep Learning 500 Questions: a Q&A-style treatment of frequently used topics in probability, linear algebra, machine learning, deep learning, computer vision, and other hot areas, written to help the author and interested readers. The book comprises 15 chapters and nearly 200,000 characters. Given the author's limited expertise, readers are kindly asked to point out any errors. To be continued... For collaboration inquiries, contact scutjy2015@163.com. All rights reserved; violations will be pursued. Tan 2018.06
lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
hemingkx/Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
MoonshotAI/moonpalace
MoonPalace (月宫) is an API debugging tool provided by Moonshot AI.
neuralmagic/AutoFP8
conveyordata/data-product-portal
Data product portal created by Dataminded
shreyansh26/Speculative-Sampling
Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind
romsto/Speculative-Decoding
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
uw-mad-dash/decoding-speculative-decoding
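Several of the starred repos above (LLMSpeculativeSampling, lucidrains/speculative-decoding, Spec-Bench, shreyansh26/Speculative-Sampling, romsto/Speculative-Decoding) implement the accept/reject step at the heart of speculative decoding. A minimal sketch of that step, using toy categorical distributions rather than real model logits (function name and setup are illustrative, not taken from any of these repos):

```python
import random

def speculative_step(p, q, draft_token, rng=None):
    """One accept/reject step of speculative sampling (Leviathan et al., 2023).

    p: target-model token distribution (list of probabilities over the vocab)
    q: draft-model token distribution over the same vocab
    draft_token: token index the draft model proposed by sampling from q
    Returns (token, accepted).
    """
    rng = rng or random.Random()
    # Accept the draft token with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    # On rejection, resample from the residual max(0, p - q), renormalized.
    # This correction keeps the overall output distribution exactly p.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return rng.choices(range(len(p)), weights=weights)[0], False
```

In a full system the draft model proposes several tokens per target-model forward pass and this step is applied to each in sequence, stopping at the first rejection; the speedup comes from verifying the whole draft in one batched target pass.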