Pinned Repositories
acl
An advanced C/C++ library for UNIX and Windows
advanced-java
😮 A complete guide to advanced knowledge for Java engineers at internet companies, covering high concurrency, distributed systems, high availability, microservices, massive-scale data processing, and more. Essential reading for backend developers; frontend developers can learn from it too.
ATen
ATen: A TENsor library for C++11
cutlass
CUDA Templates for Linear Algebra Subroutines
flash-attention
Fast and memory-efficient exact attention
git
Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
workerman
A high-performance multi-process socket server framework for network applications, implemented in PHP with libevent support
rucene
Rust port of Lucene
ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
unix1986's Repositories
unix1986/cutlass
CUDA Templates for Linear Algebra Subroutines
unix1986/flash-attention
Fast and memory-efficient exact attention
unix1986/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
unix1986/Awesome-ChatGPT
A curated collection of ChatGPT learning resources, continuously updated
unix1986/BMInf
Efficient Inference for Big Models
unix1986/BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
unix1986/cccl
CUDA Core Compute Libraries
unix1986/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
unix1986/CPM-Live
Live Training for Open-source Big Models
unix1986/cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
unix1986/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
unix1986/FasterTransformer
Transformer related optimization, including BERT, GPT
unix1986/flashinfer
FlashInfer: Kernel Library for LLM Serving
unix1986/gpt4free
Decentralising the AI industry, just some language model APIs...
unix1986/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
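For context on what post-training quantization means here, the sketch below shows the simple symmetric round-to-nearest baseline in plain Python. This is an illustrative assumption, not GPTQ's actual method, which improves on this baseline with a Hessian-aware, column-by-column weight update.

```python
import random

random.seed(0)

def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization to `bits`-bit integers.

    One scale per weight tensor; the baseline that GPTQ improves on.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Map quantized integers back to approximate float weights.
    return [qi * scale for qi in q]

weights = [random.gauss(0.0, 1.0) for _ in range(64)]
q, scale = quantize_rtn(weights)
restored = dequantize(q, scale)
# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The per-weight error bound of `scale / 2` is what Hessian-aware methods such as GPTQ trade against: they allow larger error on unimportant weights to reduce the overall loss degradation.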
unix1986/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
unix1986/kubectl-node-shell
Exec into a node via kubectl
unix1986/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
unix1986/llama.cpp
LLM inference in C/C++
unix1986/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
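The speculative decoding idea named above can be sketched with toy stand-ins for the two models: a cheap draft model proposes tokens, and the target model accepts or rejects them so that the output distribution matches the target model exactly. The uniform probability tables below are illustrative assumptions, not real models.

```python
import random

random.seed(0)

VOCAB = [0, 1, 2, 3]

def target_probs(ctx):
    # Stand-in for the large, accurate target model (assumed fixed table).
    return [0.1, 0.2, 0.3, 0.4]

def draft_probs(ctx):
    # Stand-in for the small, fast draft model (assumed uniform).
    return [0.25, 0.25, 0.25, 0.25]

def sample(probs):
    # Sample an index from a discrete probability distribution.
    r = random.random()
    acc = 0.0
    for tok, p in enumerate(probs):
        acc += p
        if r < acc:
            return tok
    return len(probs) - 1

def speculative_step(ctx, k=4):
    """Draft up to k tokens cheaply, then accept/reject against the target.

    In real systems the target model scores all drafted positions in one
    parallel pass, which is where the speed-up comes from.
    """
    out = list(ctx)
    for _ in range(k):
        q = draft_probs(out)
        x = sample(q)
        p = target_probs(out)
        # Accept with probability min(1, p(x)/q(x)); on rejection, resample
        # from the residual distribution max(0, p - q), renormalised.
        if random.random() < min(1.0, p[x] / q[x]):
            out.append(x)
        else:
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            out.append(sample([r / z for r in residual]))
            break  # drafted tokens after a rejection are discarded
    return out

seq = speculative_step([0])
```

The accept/reject rule guarantees each emitted token is distributed exactly as the target model would sample it, so speculative decoding is lossless with respect to output quality.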
unix1986/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
unix1986/MOSS
An open-source tool-augmented conversational language model from Fudan University
unix1986/openai-python
The official Python library for the OpenAI API
unix1986/OpenDelta
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
unix1986/Qwen-7B
The official repo of Qwen-7B (通义千问-7B), the chat and pretrained large language model proposed by Alibaba Cloud.
unix1986/Qwen-Agent
Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
unix1986/uvicorn
An ASGI web server, for Python. 🦄
unix1986/veGiantModel
unix1986/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
unix1986/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.