Pinned Repositories
argoexec
candle
Minimalist ML framework for Rust
ci-pipeline
ci-pipeline
codebox-api
CodeBox is the simplest cloud infrastructure for your LLM Apps and Services.
codeinterpreter-api
Open source implementation of the ChatGPT Code Interpreter 👾
container-images
Common container images
langfuse
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
QiZhenMedicalExpert
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
zTaoplus
I know you know what I mean..
zTaoplus's Repositories
zTaoplus/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
zTaoplus/zTaoplus
I know you know what I mean..
zTaoplus/langfuse
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
zTaoplus/candle
Minimalist ML framework for Rust
zTaoplus/ci-pipeline
ci-pipeline
zTaoplus/codebox-api
CodeBox is the simplest cloud infrastructure for your LLM Apps and Services.
zTaoplus/codeinterpreter-api
Open source implementation of the ChatGPT Code Interpreter 👾
zTaoplus/container-images
Common container images
zTaoplus/enterprise_gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
zTaoplus/guidance
A guidance language for controlling large language models.
zTaoplus/ESFT
Expert Specialized Fine-Tuning
zTaoplus/fastmoe
A fast MoE impl for PyTorch
zTaoplus/image-mirror
Mirror unreachable images
zTaoplus/inference-framework-benchmark
Benchmark for various inference frameworks
zTaoplus/jupyter-images
Kubeflow Jupyter images
zTaoplus/langchain
🦜🔗 Build context-aware reasoning applications
zTaoplus/litellm
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
zTaoplus/LLaMA-Efficient-Tuning
Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan)
zTaoplus/lm-evaluation-harness
A framework for few-shot evaluation of language models.
zTaoplus/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
zTaoplus/Megatron-LM
Ongoing research training transformer models at scale
zTaoplus/mindsdb
MindsDB connects AI models to real time data
zTaoplus/mirrored-image
zTaoplus/Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
zTaoplus/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
zTaoplus/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
zTaoplus/tablegpt-agent
A pre-built agent for TableGPT2.
zTaoplus/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
zTaoplus/tensorrtllm_backend
The Triton TensorRT-LLM Backend
zTaoplus/text-generation-inference
Large Language Model Text Generation Inference