foricee's Stars
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
ggerganov/llama.cpp
LLM inference in C/C++
sony/ctm
leffff/adversarial-diffusion-distillation
My Implementation of Adversarial Diffusion Distillation https://arxiv.org/pdf/2311.17042.pdf
ExponentialML/Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
bnabis93/vision-language-examples
Vision-language model example code.
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
wangsiping97/FastGEMV
High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.
alibaba/animate-anything
Fine-Grained Open Domain Image Animation with Motion Guidance
openmlsys/openmlsys-zh
"Machine Learning Systems: Design and Implementation" (Chinese version)
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
triton-lang/triton
Development repository for the Triton language and compiler
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
karpathy/ng-video-lecture
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
NVIDIA/FasterTransformer
Transformer-related optimization, including BERT and GPT
HqWu-HITCS/Awesome-Chinese-LLM
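A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.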
benbalter/word-to-markdown
A ruby gem to liberate content from Microsoft Word documents
HIT-SCIR-SC/QiaoBan
wilmerwang/autoLiterature
autoLiterature is a Python-based command-line tool for automatic literature management.
NaiboWang/EasySpider
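A visual no-code/code-free web crawler/spider. EasySpider (易采集) is visual software for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.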
ai-shifu/ChatALL
Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, 讯飞星火, 文心一言 and more, discover the best answers
PolyAI-LDN/conversational-datasets
Large datasets for conversational AI
yanqiangmiffy/InstructGLM
ChatGLM-6B instruction learning | instruction data | Instruct
wenda-LLM/wenda
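Wenda (闻达): an LLM invocation platform. It targets efficient content generation for specific environments, while accounting for the limited computing resources of individuals and small and medium-sized businesses, as well as knowledge security and privacy concerns.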
linjinjin123/awesome-AIOps
A collection of AIOps learning materials. Contributions to help complete this repository are welcome, as are stars.
microsoft/JARVIS
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf