ItsAbdula's Stars
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
UChi-JCL/CacheGen
PSAL-POSTECH/ONNXim
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
mit-han-lab/lmquant
rapidsai/cuvs
cuVS - a library for vector search and clustering on the GPU
microsoft/Samba
Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
friendliai/LLMServingPerfEvaluator
mistralai/mistral-inference
Official inference library for Mistral models
openvinotoolkit/openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
RussWong/CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
OpenNMT/CTranslate2
Fast inference engine for Transformer models
lapp0/lm-inference-engines
Comparison of Language Model Inference Engines
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—foundation models
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
ItzCrazyKns/Perplexica
Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI
PyTorchKorea/tutorials-kr
🇰🇷 Repository for translating the tutorials provided by PyTorch into Korean 🇰🇷
hao-ai-lab/MuxServe
hemingkx/Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
hemingkx/SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
MrYxJ/calculate-flops.pytorch
calflops is designed to calculate FLOPs, MACs, and parameters for a wide variety of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models)
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to easily gain performance on Intel platforms
Doubiiu/ToonCrafter
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
NVlabs/DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
NVIDIA/gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
2noise/ChatTTS
A generative speech model for daily dialogue.
tabtoyou/KoLLaVA
KoLLaVA: Korean Large Language-and-Vision Assistant (feat. LLaVA)
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence