monologuer's Repositories
monologuer/candle
Minimalist ML framework for Rust
monologuer/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
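The core idea behind distributing the workload and dividing RAM usage is tensor parallelism: each device holds only a shard of a layer's weight matrix and computes a partial result. A minimal single-process sketch in NumPy (not the repo's actual implementation, which runs across networked devices) of a column-parallel linear layer:

```python
import numpy as np

def linear_tensor_parallel(x, W, n_devices=2):
    """Column-parallel linear layer: y = x @ W split across devices.

    Each "device" stores only its slice of W's output columns and
    computes the matching slice of y; concatenating the partial
    outputs (an all-gather on a real cluster) recovers the full result.
    """
    shards = np.array_split(W, n_devices, axis=1)  # one weight shard per device
    partials = [x @ shard for shard in shards]     # each runs independently
    return np.concatenate(partials, axis=-1)
```

Because each shard is a column slice, no communication is needed until the final gather, and each device's memory footprint for the layer shrinks by roughly a factor of `n_devices`.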
monologuer/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
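The trick the forward pass exploits can be shown in a few lines of NumPy: stream over key/value blocks with a running max and running softmax denominator, so the full N x N score matrix is never materialized. This is a sketch of the online-softmax idea, not the repo's CUDA kernel:

```python
import numpy as np

def attention_naive(Q, K, V):
    # Standard attention: softmax(Q K^T / sqrt(d)) V, materializing all scores.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def attention_tiled(Q, K, V, block=4):
    # Flash-attention-style forward pass: process K/V in blocks,
    # tracking a running row max (m) and running denominator (l),
    # rescaling the accumulated output whenever the max grows.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full((N, 1), -np.inf)   # running max of scores per query row
    l = np.zeros((N, 1))           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)  # correction for the previous blocks
        P = np.exp(S - m_new)
        l = l * scale + P.sum(axis=-1, keepdims=True)
        O = O * scale + P @ Vj
        m = m_new
    return O / l
```

In the CUDA version the same loop runs per thread block with the K/V tiles staged in shared memory, which is what removes the O(N^2) memory traffic.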
monologuer/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
monologuer/NyuziProcessor
GPGPU microprocessor architecture
monologuer/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up