Pinned Repositories
flash-attention
Fast and memory-efficient exact attention
codeview
Automatically exported from code.google.com/p/codeview
foricee.github.io
Blog
qinglai_py_lib
vim-bash-rc
Automatically exported from code.google.com/p/vim-bash-rc
vimfiles
Vim configuration files
ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.