rebel-jueonpark's Stars

bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
Language: C++ · 20115
bloomberg/memray
Memray is a memory profiler for Python
Language: Python · Stars: 13.2k · Forks: 394
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++ · Stars: 8.4k · Forks: 947
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Language: LLVM · Stars: 28.5k · Forks: 11.8k
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
Language: Python · 67998
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Language: Python · Stars: 8.5k · Forks: 605
pytorch-labs/triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
Language: C++ · 343
tenstorrent/tt-buda
Tenstorrent TT-BUDA Repository
Language: Python · 21429
tenstorrent/tt-metal
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Language: C++ · 43159
open-mpi/ompi
Open MPI main development repository
Language: C · Stars: 2.1k · Forks: 859
nod-ai/SHARK-Studio
SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Language: Python · Stars: 1.4k · Forks: 170
fmtlib/fmt
A modern formatting library
Language: C++ · Stars: 20.6k · Forks: 2.5k
gcc-mirror/gcc
Language: C++ · Stars: 9.2k · Forks: 4.4k
nod-ai/techtalks
16
facebookresearch/fairscale
PyTorch extensions for high performance and large scale training.
Language: Python · Stars: 3.2k · Forks: 279
microsoft/triton-shared
Shared Middle-Layer for Triton Compilation
Language: MLIR · 17137
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python · Stars: 28.4k · Forks: 4.2k
intel/mlir-extensions
Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.
Language: MLIR · 11944
triton-inference-server/pytorch_backend
The Triton backend for the PyTorch TorchScript models.
Language: C++ · 11943
modularml/mojo
The Mojo Programming Language
Language: Mojo · Stars: 23k · Forks: 2.6k
mlc-ai/docs
The documents for TVM Unity
Language: Shell · 112
huggingface/optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
Language: Python · Stars: 2.5k · Forks: 451
llvm/torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
Language: C++ · Stars: 1.3k · Forks: 496
octoml/relax
A fork of tvm/unity
Language: Python · 1513
gabime/spdlog
Fast C++ logging library.
Language: C++ · Stars: 24.1k · Forks: 4.5k
plaidml/plaidml
PlaidML is a framework for making deep learning work everywhere.
Language: C++ · Stars: 4.6k · Forks: 400
triton-lang/triton
Development repository for the Triton language and compiler
Language: C++ · Stars: 13k · Forks: 1.6k
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language: Python · Stars: 13.8k · Forks: 1.3k
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python · Stars: 82.9k · Forks: 22.4k
ise-uiuc/neuri-artifact
Artifact for ESEC/FSE'23 paper "NeuRI: Diversifying DNN Generation via Inductive Rule Inference"
Language: Python · 295