khushi-411

Open Source PyTorch && Compilers: LPython & LFortran && GSoC'22 @cupy && Intern @Quansight-Labs'21

IvyIndia

khushi-411's Stars

pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 83.6k 1.7k 46.2k 22.6k
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Language: Python 37.1k 374 3185.9k
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda 24.3k 246 1392.7k
karpathy/llama2.c
Inference Llama 2 in one file of pure C
Language: C 17.4k 193 2212.1k
karpathy/micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Language: Jupyter Notebook 10.4k 150 301.5k
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
Language: Python 8.4k 100 1.2k 1.4k
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Language: SystemVerilog 7.1k 68 24532
intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Language: Python 6.7k 251 2.6k 1.3k
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Language: Python 5.6k 61 104512
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 5.6k 109 1.1k 956
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
Language: Python 3.4k 39 98189
NVIDIA/libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
Language: C++ 2.3k 67 94186
NVIDIA/stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
Language: C++ 1.6k 56 548161
lcompilers/lpython
Python compiler
Language: C++ 1.5k 34 1k 163
NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ 1.2k 30 1.4k 158
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Language: Python 1.2k 34 54178
imteekay/programming-language-research
✨ Programming Language Research, Applied PLT & Compilers
Language: Clojure 873 20 0 52
j2kun/mlir-tutorial
MLIR For Beginners tutorial
Language: C++ 808 18 1767
illustrated-machine-learning/illustrated-machine-learning.github.io
Website containing illustrations about Machine Learning theory!
Language: JavaScript 598 12 1262
NVIDIA/jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
Language: C++ 516 25 4864
modularml/max
A collection of sample programs, notebooks, and tools which highlight the power of the MAX Platform
Language: Python 362 24 13147
NVIDIA/cuQuantum
Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
Language: Jupyter Notebook 347 20 6165
microsoft/onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
Language: Python 280 27 57353
NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Language: C++ 266 18 68653
jax-ml/ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
Language: C++ 206 9 4828
albanD/subclass_zoo
Language: Jupyter Notebook 146 13 2824
metaopt/optree
OpTree: Optimized PyTree Utilities
Language: Python 141 5 227
csarofeen/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: C++ 26 5 6827
emcastillo/torch-mlir-ltc-backend
Standalone backend compilation for torch-mlir ltc
Language: C++ 4 3 0 0
csarofeen/simple_ir
Language: C++ 2 1 0 0
