monologuer's Repositories
monologuer/candle
Minimalist ML framework for Rust
monologuer/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
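The core idea behind distributing the workload and dividing RAM usage is tensor parallelism: each device holds only a shard of a layer's weight matrix and computes a partial result. A minimal single-process sketch in NumPy (not the repo's actual implementation, which runs across networked devices) of a column-parallel linear layer:

```python
import numpy as np

def linear_tensor_parallel(x, W, n_devices=2):
    """Column-parallel linear layer: y = x @ W split across devices.

    Each "device" stores only its slice of W's output columns and
    computes the matching slice of y; concatenating the partial
    outputs (an all-gather on a real cluster) recovers the full result.
    """
    shards = np.array_split(W, n_devices, axis=1)  # one weight shard per device
    partials = [x @ shard for shard in shards]     # each runs independently
    return np.concatenate(partials, axis=-1)
```

Because each shard is a column slice, no communication is needed until the final gather, and each device's memory footprint for the layer shrinks by roughly a factor of `n_devices`.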
monologuer/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
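The trick the forward pass exploits can be shown in a few lines of NumPy: stream over key/value blocks with a running max and running softmax denominator, so the full N x N score matrix is never materialized. This is a sketch of the online-softmax idea, not the repo's CUDA kernel:

```python
import numpy as np

def attention_naive(Q, K, V):
    # Standard attention: softmax(Q K^T / sqrt(d)) V, materializing all scores.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def attention_tiled(Q, K, V, block=4):
    # Flash-attention-style forward pass: process K/V in blocks,
    # tracking a running row max (m) and running denominator (l),
    # rescaling the accumulated output whenever the max grows.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full((N, 1), -np.inf)   # running max of scores per query row
    l = np.zeros((N, 1))           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)  # correction for the previous blocks
        P = np.exp(S - m_new)
        l = l * scale + P.sum(axis=-1, keepdims=True)
        O = O * scale + P @ Vj
        m = m_new
    return O / l
```

In the CUDA version the same loop runs per thread block with the K/V tiles staged in shared memory, which is what removes the O(N^2) memory traffic.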
monologuer/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
monologuer/NyuziProcessor
GPGPU microprocessor architecture
monologuer/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up