Pinned Repositories
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
accelerated-pytorch-transformers-generation
Anki-Furigana-Creator
An add-on for Anki to generate furigana on demand during Japanese vocabulary card creation
bettertransformer_demo
An end-to-end Gradio demo of the BetterTransformer integration with 🤗 Transformers, using TorchServe or HF's Inference Endpoints
directvoxgo-mareva
Easy custom datasets and visualization of newly synthesized views from DirectVoxGO
efficient-attention-benchmark
Benchmarking PyTorch eager attention against torch.nn.functional.scaled_dot_product_attention and the HazyResearch FlashAttention implementation
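For reference, a minimal sketch of what such a timing comparison might look like (the shapes, dtype, and harness below are assumptions, not taken from the repository):

```python
import math
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    # Naive attention: materializes the full (seq, seq) score matrix.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

# Assumed shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

for name, fn in [("eager", eager_attention), ("sdpa", F.scaled_dot_product_attention)]:
    for _ in range(3):  # warmup
        fn(q, k, v)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(10):
        fn(q, k, v)
    end.record()
    torch.cuda.synchronize()
    print(f"{name}: {start.elapsed_time(end) / 10:.3f} ms/iter")
```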
gpu-gemm-hierarchy
A description of a simple GEMM hierarchy on NVIDIA GPUs, as used in CUTLASS
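As a rough illustration of the idea, a conceptual NumPy sketch with made-up tile sizes (the real CUTLASS hierarchy maps tiles onto threadblocks, warps, and tensor-core fragments rather than Python loops):

```python
import numpy as np

def tiled_gemm(A, B, block_m=128, block_n=128, block_k=32):
    # Outer (i, j) tiles play the role of CUDA threadblock tiles;
    # the k-slices mimic staging operand blocks through shared memory.
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, block_m):
        for j in range(0, N, block_n):
            for kk in range(0, K, block_k):
                C[i:i+block_m, j:j+block_n] += (
                    A[i:i+block_m, kk:kk+block_k] @ B[kk:kk+block_k, j:j+block_n]
                )
    return C

A = np.random.rand(256, 512).astype(np.float32)
B = np.random.rand(512, 384).astype(np.float32)
assert np.allclose(tiled_gemm(A, B), A @ B, atol=1e-3)
```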
q4f16-gemm-gemv-benchmark
rikai-mpv
A port of the Rikaichamp Japanese dictionary and parser to the mpv video player
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
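As a quick taste of Optimum's ONNX Runtime integration (the model checkpoint here is illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Optimum makes hardware acceleration easy.", return_tensors="pt")
print(model(**inputs).logits)
```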
fxmarty's Repositories
fxmarty/accelerated-pytorch-transformers-generation
fxmarty/flash-attention-rocm
Fast and memory-efficient exact attention
fxmarty/hgemm_vs_gemmex
fxmarty/pyrsmi
A Python package for rocm-smi-lib
fxmarty/transformers-regression-test
fxmarty/vllm-public
A high-throughput and memory-efficient inference and serving engine for LLMs
fxmarty/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
fxmarty/autogptq-test
fxmarty/bench-flash
fxmarty/dummy-repo
fxmarty/exllama-kernels
q4f16 kernel extracted from exllama
fxmarty/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
fxmarty/marlin
An FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
fxmarty/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
fxmarty/neural-compressor
Provides unified APIs for SOTA model compression techniques, such as low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
fxmarty/onnxruntime
ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator
fxmarty/optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy-to-use hardware optimization tools
fxmarty/optimum-benchmark
A repository for benchmarking HF Optimum's optimizations for inference and training.
fxmarty/optimum-nvidia
fxmarty/optimum-quanto
A PyTorch quantization backend for Optimum
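A minimal sketch of weight-only int8 quantization with quanto (the model choice is illustrative):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Swap eligible Linear layers for quantization-aware equivalents,
# then freeze to materialize the int8 weights in place.
quantize(model, weights=qint8)
freeze(model)
```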
fxmarty/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
fxmarty/rocm-vllm
fxmarty/safetensors
Simple, safe way to store and distribute tensors
fxmarty/sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
fxmarty/test-github-actions-environments
fxmarty/text-embeddings-inference
A blazing-fast inference solution for text embedding models
fxmarty/text-generation-inference
Large Language Model Text Generation Inference
fxmarty/torch_library_playground
fxmarty/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
fxmarty/transformers-hard-fork