Pinned Repositories
AlphaCodium
AMD-CK-fork
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
AMD-flashinfer-benchmark-fork
FlashInfer: Kernel Library for LLM Serving
GC-OXFORD-CVPR2021-gbp-poplar
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)
llama-cpp-python
Python bindings for llama.cpp
llama.cpp
Port of Facebook's LLaMA model in C/C++
NV-DOCA-code-examples
DOCA Application code sharing Contest
NV-nccl-tests
NCCL Tests
NV_grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM for MoE.
Tooklkit-remote-pdb-for-pytorch-distributed
Debugging torch distributed programs
yiakwy-xpu-ml-framework-team's Repositories
yiakwy-xpu-ml-framework-team/Tooklkit-remote-pdb-for-pytorch-distributed
Debugging torch distributed programs
yiakwy-xpu-ml-framework-team/AMD-flashinfer-benchmark-fork
FlashInfer: Kernel Library for LLM Serving
yiakwy-xpu-ml-framework-team/AMD-GCN-ASM
amdgpu example code in hip/asm
yiakwy-xpu-ml-framework-team/AMD-lab-notes-fork
AMD lab notes with code examples to demonstrate use of AMD GPUs
yiakwy-xpu-ml-framework-team/AMD-libhipcxx
The C++ Standard Library for your entire system.
yiakwy-xpu-ml-framework-team/AMD-MIGraphX-fork
AMD's graph optimization engine.
yiakwy-xpu-ml-framework-team/AMD-Profiler-MI-omniperf
Advanced Profiling and Analytics for AMD Hardware
yiakwy-xpu-ml-framework-team/AMD-ROCIR-fork
yiakwy-xpu-ml-framework-team/AMD-ROCM-Tracer
Omnitrace: Application Profiling, Tracing, and Analysis
yiakwy-xpu-ml-framework-team/AMD-rocOpt-base
Next generation library for iterative sparse solvers for ROCm platform
yiakwy-xpu-ml-framework-team/AMD-rocPRIM-fork
ROCm Parallel Primitives
yiakwy-xpu-ml-framework-team/AMD-sglang-benchmark-fork
SGLang is a fast serving framework for large language models and vision language models.
yiakwy-xpu-ml-framework-team/AMD-vllm-compile-stack
A high-throughput and memory-efficient inference and serving engine for LLMs
yiakwy-xpu-ml-framework-team/amdbench
CUDA/ROCM Kernel Benchmarking Library
yiakwy-xpu-ml-framework-team/hipbench
HIP Kernel Benchmarking Library
yiakwy-xpu-ml-framework-team/Liger-Kernel
Efficient Triton Kernels for LLM Training
yiakwy-xpu-ml-framework-team/META-torch-xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
yiakwy-xpu-ml-framework-team/MODEL_MS_VLM_LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
yiakwy-xpu-ml-framework-team/MODEL_MS_VLM_OSCAR
Oscar and VinVL
yiakwy-xpu-ml-framework-team/NV-cccl-fork
CUDA Core Compute Libraries
yiakwy-xpu-ml-framework-team/NV-cuOpt-Resources
A collection of NVIDIA cuOpt samples and other resources
yiakwy-xpu-ml-framework-team/NV-cutlass-fork
CUDA Templates for Linear Algebra Subroutines
yiakwy-xpu-ml-framework-team/NV-Megatron-LM
Ongoing research training transformer models at scale
yiakwy-xpu-ml-framework-team/NV-MODELS-LLM-Cosmos
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Cosmos is purpose-built for physical AI. The Cosmos repository enables end users to run the Cosmos models, run inference scripts, and generate videos.
yiakwy-xpu-ml-framework-team/OSS-cuPDLP-C
Code for solving LP on GPU using first-order methods
yiakwy-xpu-ml-framework-team/Tools-dockerhub
CUDA and ROCm Dockerfile repo
yiakwy-xpu-ml-framework-team/xDiT-fork
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
yiakwy-xpu-ml-framework-team/xDiT-long-context-attention-fork
Sequence Parallel Attention for Long Context LLM Model Training and Inference
yiakwy-xpu-ml-framework-team/yiakwy-xpu-ml-framework-team
Config files for my GitHub profile.
yiakwy-xpu-ml-framework-team/ZLUDA-fork
CUDA on non-NVIDIA GPUs