Pinned Repositories
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
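The core mechanic of a Mixture-of-Experts model like DeepSeek-V2 is routing each token to a small subset of experts. A minimal sketch of top-k gating, assuming per-token expert logits are already computed (function and variable names are illustrative, not DeepSeek's actual routing code):

```python
import numpy as np

def topk_route(logits, k):
    """Toy top-k MoE router: pick the k highest-scoring experts
    for one token and renormalize their gate weights with a
    softmax over just those k logits."""
    idx = np.argsort(logits)[::-1][:k]          # indices of the k best experts
    g = np.exp(logits[idx] - logits[idx].max()) # numerically stable softmax
    return idx, g / g.sum()
```

The token's output is then the gate-weighted sum of the selected experts' outputs; the other experts are never evaluated, which is what makes MoE economical at inference time.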
diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It automatically tunes the number of bits used per weight or group of weights to achieve a given trade-off between model size and accuracy.
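The idea behind pseudo quantization noise can be sketched in a few lines: rounding weights to discrete levels is non-differentiable, so during training one instead adds uniform noise with the same scale as the rounding error, keeping the bit width a quantity gradients can flow through. A toy sketch, assuming a symmetric uniform quantizer (names are illustrative, not DiffQ's API):

```python
import numpy as np

def pseudo_quantize(w, bits, rng):
    """Simulate b-bit uniform quantization of weights `w` by adding
    noise drawn from U(-delta/2, delta/2), where delta is the
    quantization step of a symmetric quantizer over [-max|w|, max|w|]."""
    s = np.abs(w).max()
    delta = 2 * s / (2 ** bits - 1)   # step size of the b-bit grid
    noise = rng.uniform(-delta / 2, delta / 2, size=w.shape)
    return w + noise                  # differentiable w.r.t. w (and bits, if relaxed)
```

Fewer bits means a larger `delta`, hence more noise and more accuracy loss, which is exactly the size/accuracy trade-off the description refers to.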
involution
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator
onnxruntime-inference-examples
Examples for using ONNX Runtime for machine learning inferencing.
vision
Datasets, Transforms and Models specific to Computer Vision
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
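Much of vLLM's memory efficiency comes from paging the KV cache into fixed-size blocks, so a sequence claims physical memory one block at a time instead of reserving a contiguous maximum-length buffer up front. A toy allocator sketch of that idea (class and method names are hypothetical, not vLLM's API):

```python
class BlockAllocator:
    """Toy paged KV-cache allocator: each sequence maps to a list of
    physical block ids, claimed lazily as tokens arrive."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of physical blocks
        self.tables = {}                     # seq_id -> list of block ids

    def append_token(self, seq_id, pos):
        # Claim a new physical block only when the sequence crosses a
        # block boundary; otherwise the token lands in the last block.
        table = self.tables.setdefault(seq_id, [])
        if pos % self.block_size == 0:
            table.append(self.free.pop())
        return table[-1], pos % self.block_size  # (block id, offset in block)

    def release(self, seq_id):
        # Finished sequences return their blocks to the pool immediately.
        self.free.extend(self.tables.pop(seq_id, []))
```

Because blocks are recycled as soon as a sequence finishes, many requests can share the GPU's KV memory with little fragmentation, which is what enables high-throughput batching.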
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
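The attention-sink idea is a cache-keeping policy: retain the KV entries of the first few tokens (the "sinks", which attention scores concentrate on) plus a sliding window of the most recent tokens, and evict everything in between. A minimal sketch of which cache positions survive (function name is hypothetical):

```python
def kept_positions(cache_len, n_sink, window):
    """Positions retained in the KV cache under a sink + sliding-window
    policy: the first n_sink tokens and the last `window` tokens."""
    if cache_len <= n_sink + window:
        return list(range(cache_len))  # nothing to evict yet
    return list(range(n_sink)) + list(range(cache_len - window, cache_len))
```

The cache therefore stays at a fixed size of `n_sink + window` entries no matter how long the stream runs, which is what makes the decoding cost constant per token.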
SWE-bench
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
mamba
Mamba: a selective state-space model (SSM) architecture for efficient sequence modeling
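At its core, an SSM layer runs a linear recurrence over the sequence: h_t = A·h_{t-1} + B·x_t, y_t = C·h_t. A minimal sequential scan for a diagonal SSM (illustrative only; it omits Mamba's input-dependent selectivity and hardware-aware parallel scan):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Sequential scan of a diagonal linear state-space model.
    x: (T,) input sequence; A, B, C: (N,) per-state parameters.
    h_t = A * h_{t-1} + B * x_t ;  y_t = C . h_t
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        h = A * h + B * x[t]   # elementwise: A is a diagonal transition
        y[t] = C @ h           # readout
    return y
```

With A = 1, B = C = e_1 this recurrence reduces to a running sum, which shows how the state carries information across arbitrarily long contexts in O(1) memory per step.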
hsm1997's Repositories
hsm1997/diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It automatically tunes the number of bits used per weight or group of weights to achieve a given trade-off between model size and accuracy.
hsm1997/involution
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator
hsm1997/onnxruntime-inference-examples
Examples for using ONNX Runtime for machine learning inferencing.
hsm1997/vision
Datasets, Transforms and Models specific to Computer Vision
hsm1997/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs