Pinned Repositories
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
bitsandbytes
8-bit CUDA functions for PyTorch
FasterTransformer
Transformer related optimization, including BERT, GPT
GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
query_doc_topk
SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
SystemC
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
wenet_trt8
huismiling's Repositories
huismiling/accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
huismiling/bitsandbytes
8-bit CUDA functions for PyTorch
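The core idea behind bitsandbytes-style 8-bit inference is symmetric absmax quantization: scale a tensor so its largest magnitude maps to 127, then round to int8. A minimal pure-Python sketch of that idea (illustrative only; the library itself implements this as fused CUDA kernels with per-block scales and outlier handling):

```python
def quantize_absmax_int8(values):
    """Symmetric absmax int8 quantization: scale so the largest
    magnitude maps to 127, then round and clamp to [-127, 127]."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 codes and the scale."""
    return [v * scale for v in q]

x = [0.1, -0.5, 2.0, -1.25]
q, s = quantize_absmax_int8(x)
x_hat = dequantize_int8(q, s)
# q holds int8 codes; x_hat approximates x (the absmax element is exact)
```

The clamp to [-127, 127] keeps the grid symmetric, so dequantization is a single multiply by the stored scale.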
huismiling/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
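PEFT's best-known method, LoRA, replaces a full weight update with a low-rank product: the merged weight is W' = W + (alpha / r) * B @ A. A tiny pure-Python sketch of the merge step (illustrative of the math only, not PEFT's actual API):

```python
def matmul(A, B):
    # naive dense matrix multiply: (n x k) @ (k x m) -> (n x m)
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def lora_merge(W, A, B, alpha):
    """W' = W + (alpha / r) * B @ A.
    A is r x in_features, B is out_features x r, so r = len(A)."""
    r = len(A)
    delta = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight with a rank-1 update
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]            # r=1, in_features=2
B = [[0.5], [0.25]]         # out_features=2, r=1
W_merged = lora_merge(W, A, B, alpha=1.0)
# W_merged == [[1.5, 1.0], [0.25, 1.5]]
```

Only A and B are trained, so the trainable parameter count scales with r * (in + out) rather than in * out.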
huismiling/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
huismiling/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
huismiling/autogen
Enable Next-Gen Large Language Model Applications. Join our Discord: https://discord.gg/pAbnFJrkgZ
huismiling/AutoGPTQ
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
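4-bit schemes like GPTQ and AWQ store two quantized weights per byte. A minimal sketch of nibble packing and unpacking in pure Python (illustrative; the real repos pack into int32 words with group-wise scales and zero points):

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15), two per byte:
    low nibble first, then high nibble."""
    assert len(values) % 2 == 0
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        assert 0 <= lo < 16 and 0 <= hi < 16
        out.append(lo | (hi << 4))
    return bytes(out)

def unpack_int4(data):
    """Invert pack_int4: split each byte back into two nibbles."""
    vals = []
    for b in data:
        vals.append(b & 0x0F)
        vals.append(b >> 4)
    return vals

vals = [0, 15, 7, 8]
packed = pack_int4(vals)        # half the storage of one byte per value
assert unpack_int4(packed) == vals
```

Halving the bytes per weight is where the memory saving of 4-bit quantization comes from; dequantization then applies a per-group scale to each unpacked nibble.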
huismiling/BitDistiller
A novel QAT with Self-Distillation framework to enhance ultra-low-bit LLMs.
huismiling/CLIP
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image.
huismiling/ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
huismiling/deepmd-kit
A deep learning package for many-body potential energy representation and molecular dynamics
huismiling/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
huismiling/faiss
A library for efficient similarity search and clustering of dense vectors.
huismiling/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
huismiling/flash-attention
Fast and memory-efficient exact attention
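FlashAttention's key trick is the online softmax: it never materializes the full attention row, instead keeping a running max, normalizer, and accumulator that are rescaled as new scores arrive. A scalar-valued sketch for a single query row (illustrative of the recurrence only; the real kernel does this blockwise on GPU tiles):

```python
import math

def online_attention_row(scores, values):
    """One-pass softmax(scores)-weighted sum of values, keeping only
    a running max m, normalizer l, and accumulator acc."""
    m = float("-inf")   # running max of scores seen so far
    l = 0.0             # running sum of exp(score - m)
    acc = 0.0           # running weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        # rescale old state to the new max; exp(-inf) == 0.0 handles step 1
        correction = math.exp(m - m_new)
        w = math.exp(s - m_new)
        l = l * correction + w
        acc = acc * correction + w * v
        m = m_new
    return acc / l

scores = [0.5, 2.0, -1.0]
values = [1.0, 2.0, 3.0]
out = online_attention_row(scores, values)
# equals the usual two-pass softmax-weighted average of values
```

Because the state is O(1) per query regardless of sequence length, memory no longer scales with the full score matrix, which is the source of FlashAttention's memory efficiency.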
huismiling/JCLIP
huismiling/llm-export
llm-export can export LLM models to ONNX.
huismiling/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
huismiling/mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
huismiling/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba.
huismiling/optimum-benchmark
A unified multi-backend utility for benchmarking Transformers and Diffusers with support for Optimum's arsenal of hardware optimizations/quantization schemes.
huismiling/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
huismiling/Qwen
The official repo of Qwen (通义千问), the chat & pretrained large language model proposed by Alibaba Cloud.
huismiling/qwen.cpp
C++ implementation of Qwen-LM
huismiling/RWKV-LM
RWKV is an RNN with transformer-level LLM performance that can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
huismiling/safetensors
Simple, safe way to store and distribute tensors
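The safetensors layout is simple enough to sketch with the stdlib: an 8-byte little-endian u64 header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor data. A minimal writer/reader under that assumption (illustrative; the real library also validates offsets and supports an optional `__metadata__` entry):

```python
import json
import os
import struct
import tempfile

def write_safetensors(path, tensors):
    """tensors: name -> (dtype_str, shape, raw_bytes).
    Layout: u64 header length (LE), JSON header, concatenated data."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    hjson = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hjson)))
        f.write(hjson)
        for raw in blobs:
            f.write(raw)

def read_safetensors_header(path):
    """Read only the JSON header, without touching tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

path = os.path.join(tempfile.gettempdir(), "demo.safetensors")
raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
write_safetensors(path, {"w": ("F32", [2, 2], raw)})
hdr = read_safetensors_header(path)
# hdr["w"] describes the tensor without loading its bytes
```

Reading shapes and offsets without parsing tensor data is what makes lazy and zero-copy (mmap) loading cheap in this format.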
huismiling/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
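SmoothQuant migrates activation outliers into the weights: each input channel j gets a scale s_j = max|X_j|^alpha / max|W_j|^(1-alpha), activations are divided by s and the matching weight rows multiplied by s, leaving X @ W mathematically unchanged. A small pure-Python sketch of that migration (illustrative of the identity, not the paper's code):

```python
def smooth_scales(x_absmax, w_absmax, alpha=0.5):
    """Per-channel scales s_j = x_max^alpha / w_max^(1 - alpha)."""
    return [(x ** alpha) / (w ** (1 - alpha))
            for x, w in zip(x_absmax, w_absmax)]

def migrate(X, W, s):
    """Divide activation columns by s, multiply matching weight rows
    by s: (X / s) @ (s * W) == X @ W exactly."""
    Xs = [[x / s[j] for j, x in enumerate(row)] for row in X]
    Ws = [[w * s[i] for w in row] for i, row in enumerate(W)]
    return Xs, Ws

X = [[10.0, 0.2], [8.0, 0.1]]   # channel 0 carries the outliers
W = [[0.5, 0.4], [0.3, 0.2]]    # in_features x out_features
xm = [max(abs(row[j]) for row in X) for j in range(2)]
wm = [max(abs(v) for v in W[j]) for j in range(2)]
s = smooth_scales(xm, wm, alpha=0.5)
Xs, Ws = migrate(X, W, s)
# Xs @ Ws equals X @ W, but Xs has a much flatter dynamic range
```

After migration both operands quantize well to int8, which is why the method works post-training without retraining.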
huismiling/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
huismiling/triton
Development repository for the Triton language and compiler
huismiling/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs