Pinned Repositories
flash-attention
Fast and memory-efficient exact attention
codeview
Automatically exported from code.google.com/p/codeview
foricee.github.io
Blog
qinglai_py_lib
vim-bash-rc
Automatically exported from code.google.com/p/vim-bash-rc
vimfiles
Vim configuration files
ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.