Pinned Repositories
acl
An advanced C/C++ library for UNIX and Windows
advanced-java
😮 A complete guide to advanced knowledge for Java engineers at internet companies, covering high concurrency, distributed systems, high availability, microservices, massive-scale data processing, and more. Essential reading for backend developers; frontend developers can learn from it too.
ATen
ATen: A TENsor library for C++11
cutlass
CUDA Templates for Linear Algebra Subroutines
flash-attention
Fast and memory-efficient exact attention
git
Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
workerman
A high-performance multi-process socket server framework for network applications, implemented in PHP with libevent support
rucene
Rust port of Lucene
ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
unix1986's Repositories
unix1986/cutlass
CUDA Templates for Linear Algebra Subroutines
unix1986/flash-attention
Fast and memory-efficient exact attention
unix1986/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
unix1986/Awesome-ChatGPT
A curated collection of ChatGPT learning resources, continuously updated
unix1986/BMInf
Efficient Inference for Big Models
unix1986/BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
unix1986/cccl
CUDA Core Compute Libraries
unix1986/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
unix1986/CPM-Live
Live Training for Open-source Big Models
unix1986/cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
unix1986/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
unix1986/FasterTransformer
Transformer related optimization, including BERT, GPT
unix1986/flashinfer
FlashInfer: Kernel Library for LLM Serving
unix1986/gpt4free
Decentralising the AI industry, just some language model APIs...
unix1986/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
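For context on what post-training quantization means here, the sketch below shows the simple symmetric round-to-nearest baseline in plain Python. This is an illustrative assumption, not GPTQ's actual method, which improves on this baseline with a Hessian-aware, column-by-column weight update.

```python
import random

random.seed(0)

def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization to `bits`-bit integers.

    One scale per weight tensor; the baseline that GPTQ improves on.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Map quantized integers back to approximate float weights.
    return [qi * scale for qi in q]

weights = [random.gauss(0.0, 1.0) for _ in range(64)]
q, scale = quantize_rtn(weights)
restored = dequantize(q, scale)
# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The per-weight error bound of `scale / 2` is what Hessian-aware methods such as GPTQ trade against: they allow larger error on unimportant weights to reduce the overall loss degradation.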
unix1986/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
unix1986/kubectl-node-shell
Exec into a node via kubectl
unix1986/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
unix1986/llama.cpp
LLM inference in C/C++
unix1986/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
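The speculative decoding idea named above can be sketched with toy stand-ins for the two models: a cheap draft model proposes tokens, and the target model accepts or rejects them so that the output distribution matches the target model exactly. The uniform probability tables below are illustrative assumptions, not real models.

```python
import random

random.seed(0)

VOCAB = [0, 1, 2, 3]

def target_probs(ctx):
    # Stand-in for the large, accurate target model (assumed fixed table).
    return [0.1, 0.2, 0.3, 0.4]

def draft_probs(ctx):
    # Stand-in for the small, fast draft model (assumed uniform).
    return [0.25, 0.25, 0.25, 0.25]

def sample(probs):
    # Sample an index from a discrete probability distribution.
    r = random.random()
    acc = 0.0
    for tok, p in enumerate(probs):
        acc += p
        if r < acc:
            return tok
    return len(probs) - 1

def speculative_step(ctx, k=4):
    """Draft up to k tokens cheaply, then accept/reject against the target.

    In real systems the target model scores all drafted positions in one
    parallel pass, which is where the speed-up comes from.
    """
    out = list(ctx)
    for _ in range(k):
        q = draft_probs(out)
        x = sample(q)
        p = target_probs(out)
        # Accept with probability min(1, p(x)/q(x)); on rejection, resample
        # from the residual distribution max(0, p - q), renormalised.
        if random.random() < min(1.0, p[x] / q[x]):
            out.append(x)
        else:
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            out.append(sample([r / z for r in residual]))
            break  # drafted tokens after a rejection are discarded
    return out

seq = speculative_step([0])
```

The accept/reject rule guarantees each emitted token is distributed exactly as the target model would sample it, so speculative decoding is lossless with respect to output quality.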
unix1986/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
unix1986/MOSS
An open-source tool-augmented conversational language model from Fudan University
unix1986/openai-python
The official Python library for the OpenAI API
unix1986/OpenDelta
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
unix1986/Qwen-7B
The official repo of Qwen-7B (通义千问-7B), the chat and pretrained large language model proposed by Alibaba Cloud.
unix1986/Qwen-Agent
Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
unix1986/uvicorn
An ASGI web server, for Python. 🦄
unix1986/veGiantModel
unix1986/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
unix1986/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.