mahaddad
Founder at konko.ai // Electrical and Computer Engineer with an interest in ML, gaming mods & scripts
New York, New York
mahaddad's Stars
mui/material-ui
Material UI: Comprehensive React component library that implements Google's Material Design. Free forever.
shadcn-ui/ui
Beautifully designed components that you can copy and paste into your apps. Accessible. Customizable. Open Source.
oobabooga/text-generation-webui
A Gradio web UI for Large Language Models.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
BerriAI/litellm
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
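The pitch here is "call 100+ LLM APIs in OpenAI format". As a stdlib-only sketch (no litellm install, no network), this is roughly the OpenAI-style chat payload that such gateways accept and translate to each provider's native API; the model name is illustrative only:

```python
# Minimal sketch of an OpenAI-format chat completion request body, the
# common interface that gateways like litellm normalize providers to.
# Pure stdlib; the model name here is an illustrative placeholder.
import json

def build_chat_request(model: str, user_message: str, temperature: float = 0.7) -> str:
    """Serialize an OpenAI-format chat completion request body as JSON."""
    payload = {
        "model": model,  # gateways often accept provider-prefixed names
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_chat_request("gpt-4o-mini", "Hello!")
print(json.loads(body)["messages"][1]["content"])  # Hello!
```

A real call would POST this body to a `/chat/completions` endpoint; the gateway's job is keeping this shape stable across backends.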
ShishirPatil/gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
openai/triton
Development repository for the Triton language and compiler
vercel/ai
Build AI-powered applications with React, Svelte, Vue, and Solid
Mooler0410/LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
huggingface/text-generation-inference
Large Language Model Text Generation Inference
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
kedacore/keda
KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
alibaba/ali-dbhub
已迁移新仓库,此版本将不再维护
(Translation: migrated to a new repository; this version is no longer maintained.)
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
TimDettmers/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
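The core trick behind k-bit quantization — store weights as small integers plus a scale factor — can be sketched in a few lines of pure Python. This toy absmax 4-bit round-trip is illustrative only, not bitsandbytes' actual kernels (which use per-block scales and fused GPU code):

```python
# Toy absmax 4-bit quantization round-trip, stdlib only. Illustrates the
# idea behind k-bit schemes: keep int4 codes plus one float scale, and
# dequantize on the fly. Real libraries quantize per block, not per tensor.

def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit integers in [-7, 7] with an absmax scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid div-by-zero for all-zero input
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_4bit(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]

w = [0.12, -0.7, 0.35, 0.0]
codes, scale = quantize_4bit(w)
restored = dequantize_4bit(codes, scale)
# Reconstruction error is bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= scale / 2 for a, b in zip(w, restored))
```

The memory win is the point: 4 bits per weight plus one shared scale, versus 16 or 32 bits per weight.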
imoneoi/openchat
OpenChat: Advancing Open-source Language Models with Imperfect Data
kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
juncongmoo/pyllama
LLaMA: Open and Efficient Foundation Language Models
FranxYao/chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
openai/openai-openapi
OpenAPI specification for the OpenAI API
ray-project/ray-llm
RayLLM - LLMs on Ray
OpenLMLab/LOMO
LOMO: LOw-Memory Optimization
Muennighoff/sgpt
SGPT: GPT Sentence Embeddings for Semantic Search
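Once sentences are embedded, semantic search reduces to similarity ranking over vectors. A toy cosine-similarity sketch of that retrieval step, with made-up 3-d vectors (real SGPT embeddings come from a GPT model, not from hand-written lists):

```python
# Toy cosine-similarity ranking: the retrieval step behind embedding-based
# semantic search (SGPT and similar). Vectors are invented for illustration;
# a real system would embed query and documents with a trained model.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

docs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
print(rank([1.0, 0.0, 0.0], docs))  # [0, 1, 2]
```

At scale this brute-force loop is replaced by an approximate nearest-neighbor index, but the similarity metric is the same.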
paradigmxyz/flux
Graph-based LLM power tool for exploring many completions in parallel.
ray-project/llmperf
LLMPerf is a library for validating and benchmarking LLMs
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
stanford-crfm/ecosystem-graphs