mananshah99's Stars
golang/go
The Go programming language
kubernetes/kubernetes
Production-Grade Container Scheduling and Management
mingrammer/diagrams
🎨 Diagram as Code for prototyping cloud system architectures
kilimchoi/engineering-blogs
A curated list of engineering blogs
karpathy/llm.c
LLM training in simple, raw C/CUDA
simdjson/simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
ahmetb/kubectx
Faster way to switch between clusters and namespaces in kubectl
benfred/py-spy
Sampling profiler for Python programs
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
google/gemma.cpp
Lightweight, standalone C++ inference engine for Google's Gemma models.
ReagentX/imessage-exporter
Export iMessage data + run iMessage Diagnostics
axboe/liburing
Library providing helpers for the Linux kernel io_uring support
unitycatalog/unitycatalog
Open, Multi-modal Catalog for Data & AI
unum-cloud/ucall
Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️
ianlancetaylor/libbacktrace
A C library that may be linked into a C/C++ program to produce symbolic backtraces
Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
roma-glushko/awesome-distributed-system-projects
🚀 List of distributed system projects for inspiration and learning to build distributed services from real world examples
PABannier/bark.cpp
Suno AI's Bark model in C/C++ for fast text-to-speech generation
baidu-research/baidu-allreduce
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
mattgodbolt/pt-three-ways
Path tracing, done three ways
siboehm/ShallowSpeed
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
eschluntz/PytorchBridge
Designing bridge trusses with PyTorch autograd
jkomoros/card-web
The web app behind thecompendium.cards
AIWintermuteAI/whispercpp
Pybind11 bindings for Whisper.cpp
okuvshynov/llama_duo
Asynchronous/distributed speculative evaluation for llama3
srush/anynp
Proof-of-concept of global switching between numpy/jax/pytorch in a library.
evelynmitchell/shouldersOfGiants.rs
I have no idea what I'm doing, but llm.c in Rust
jpetazzo/color
Tigerrr07/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.