khushi-411
Open Source PyTorch && Compilers: LPython & LFortran && GSoC'22 @cupy && Intern @Quansight-Labs'21
IvyIndia
khushi-411's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
karpathy/llm.c
LLM training in simple, raw C/CUDA
karpathy/llama2.c
Inference Llama 2 in one file of pure C
karpathy/micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
NVIDIA/libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
NVIDIA/stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
lcompilers/lpython
Python compiler
NVIDIA/cccl
CUDA Core Compute Libraries
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
imteekay/programming-language-research
✨ Programming Language Research, Applied PLT & Compilers
j2kun/mlir-tutorial
MLIR For Beginners tutorial
illustrated-machine-learning/illustrated-machine-learning.github.io
Website containing illustrations about Machine Learning theory!
NVIDIA/jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
modularml/max
A collection of sample programs, notebooks, and tools which highlight the power of the MAX Platform
NVIDIA/cuQuantum
Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
microsoft/onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
jax-ml/ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
albanD/subclass_zoo
metaopt/optree
OpTree: Optimized PyTree Utilities
csarofeen/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
emcastillo/torch-mlir-ltc-backend
Standalone backend compilation for torch-mlir ltc
csarofeen/simple_ir