Pinned Repositories
truss
The simplest way to serve AI/ML models in production
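Truss packages a model as a Python class with load and predict hooks in model/model.py; a minimal sketch of that interface, assuming the layout `truss init` generates (the pipeline and payload shape here are illustrative):

```python
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Called once when the server starts; load weights here.
        from transformers import pipeline
        self._model = pipeline("sentiment-analysis")

    def predict(self, model_input):
        # Called per request with the deserialized JSON body.
        return self._model(model_input["text"])
```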
flux_fp8
Flux diffusion model implementation that runs its matmuls in quantized fp8 and the remaining layers in half precision with fast accumulation, making it ~2x faster on consumer GPUs.
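The speedup comes from doing the large matmuls in fp8 with per-tensor scales and reduced-precision (fast) accumulation. A hedged sketch of the idea using PyTorch's private `torch._scaled_mm`, not the repo's actual kernels; the API varies across PyTorch releases and needs an fp8-capable GPU (Ada/Hopper):

```python
import torch

x = torch.randn(16, 64, device="cuda")  # activations
w = torch.randn(32, 64, device="cuda")  # linear weight (out_features, in_features)

# Per-tensor scales so values fit fp8's narrow dynamic range.
f8_max = torch.finfo(torch.float8_e4m3fn).max
scale_x = x.abs().max() / f8_max
scale_w = w.abs().max() / f8_max

x_fp8 = (x / scale_x).to(torch.float8_e4m3fn)
w_fp8 = (w / scale_w).to(torch.float8_e4m3fn)

# fp8 x fp8 matmul; fast accumulation trades a little precision for speed.
out = torch._scaled_mm(
    x_fp8,
    w_fp8.t(),  # second operand must be column-major
    scale_a=scale_x,
    scale_b=scale_w,
    out_dtype=torch.bfloat16,
    use_fast_accum=True,
)
```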
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
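Recent releases expose that Python API as a high-level LLM class; a hedged sketch (model name and prompt are illustrative, and the engine is built or downloaded on first use):

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["What does TensorRT-LLM optimize?"], params):
    print(output.outputs[0].text)
```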
test_70b
truss-examples
Examples of models deployable with Truss
unmanic-documentation
All documentation for Unmanic
unsloth
Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
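The memory saving comes from 4-bit base weights plus LoRA adapters; a hedged sketch of a typical Unsloth QLoRA setup (model name, rank, and sequence length are illustrative):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit base weights: the main memory saving
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# `model` now trains only the small LoRA adapters, e.g. with trl's SFTTrainer.
```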
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
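A hedged sketch of vLLM's offline batch-generation API (model and prompt are illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```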
dsingal0's Repositories
dsingal0/flux_fp8
Flux diffusion model implementation that runs its matmuls in quantized fp8 and the remaining layers in half precision with fast accumulation, making it ~2x faster on consumer GPUs.
dsingal0/landing_page
Landing page for qdrant.tech
dsingal0/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
dsingal0/test_70b
dsingal0/truss-examples
Examples of models deployable with Truss
dsingal0/unmanic-documentation
All documentation for Unmanic
dsingal0/unsloth
Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
dsingal0/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs