cumtchw

Qingdao, China

cumtchw's Stars

deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Language: Cuda · Stars: 5k · Forks: 498
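As the term is generally used, "fine-grained scaling" means keeping separate scale factors for small blocks of the low-precision inputs, rather than one scale per tensor, and applying them when each block's partial product is dequantized. The minimal C++ sketch below illustrates that general idea on the CPU. It is not DeepGEMM's code or API. Assumptions: `quantize_group` and `scaled_gemm` are hypothetical names, the block size is a free parameter, and int8-style uniform rounding stands in for a real FP8 format, which this sketch does not simulate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize n values (spaced `stride` apart) to int8-range codes sharing one scale.
// Stand-in for FP8 (assumption): real FP8 is a floating-point format, not uniform steps.
float quantize_group(const float* src, std::size_t n, std::size_t stride, std::int8_t* dst) {
    float amax = 0.0f;
    for (std::size_t i = 0; i < n; ++i) amax = std::max(amax, std::fabs(src[i * stride]));
    const float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    for (std::size_t i = 0; i < n; ++i)
        dst[i * stride] = static_cast<std::int8_t>(std::lround(src[i * stride] / scale));
    return scale;
}

// C (M x N) = A (M x K) * B (K x N), all row-major. A gets one scale per (row, K-block),
// B one scale per (K-block, column). K must be a multiple of `block`.
std::vector<float> scaled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                               std::size_t M, std::size_t N, std::size_t K, std::size_t block) {
    assert(block > 0 && K % block == 0);
    const std::size_t nb = K / block;
    std::vector<std::int8_t> qa(M * K), qb(K * N);
    std::vector<float> sa(M * nb), sb(nb * N);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t b = 0; b < nb; ++b)
            sa[i * nb + b] = quantize_group(&A[i * K + b * block], block, 1, &qa[i * K + b * block]);
    for (std::size_t b = 0; b < nb; ++b)
        for (std::size_t j = 0; j < N; ++j)  // column j of block b: stride N through B
            sb[b * N + j] = quantize_group(&B[b * block * N + j], block, N, &qb[b * block * N + j]);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t b = 0; b < nb; ++b) {
                std::int32_t partial = 0;  // low-precision products summed exactly in int32
                for (std::size_t k = b * block; k < (b + 1) * block; ++k)
                    partial += std::int32_t(qa[i * K + k]) * std::int32_t(qb[k * N + j]);
                acc += float(partial) * sa[i * nb + b] * sb[b * N + j];  // per-block dequantize
            }
            C[i * N + j] = acc;
        }
    return C;
}
```

Smaller blocks track local magnitudes more closely (an outlier only inflates its own block's scale), at the cost of storing and applying more scale factors.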
Tony-Tan/CUDA_Freshman
Language: Cuda · Stars: 2.4k · Forks: 459
ShadyBoukhary/GPU-research-FFT-OpenACC-CUDA
Case studies are a modern, interdisciplinary, and valuable teaching practice that plays a fundamental role in developing new skills and forming new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels: the Fast Fourier Transform and matrix multiplication. Both routines are implemented in two of the most popular current many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) takes a signal sampled over a period of time and decomposes it into its frequency components by computing the Discrete Fourier Transform (DFT) of the sample sequence. Whereas computing a DFT directly takes O(n²) operations, FFT algorithms reduce the complexity to O(n log₂ n). Matrix multiplication is a cornerstone routine in mathematics, artificial intelligence, and machine learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model gives the greatest performance benefit.
Language:Cuda133
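To make the O(n²) versus O(n log₂ n) contrast concrete, here is a minimal C++ sketch (not the repository's CUDA or OpenACC code) of a direct DFT next to a recursive radix-2 Cooley-Tukey FFT. It assumes the input length is a power of two; `naive_dft` and `fft` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Direct DFT: X[k] = sum over t of x[t] * e^(-2*pi*i*k*t/n). Two nested loops: O(n^2).
std::vector<cd> naive_dft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
            sum += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
        out[k] = sum;
    }
    return out;
}

// Radix-2 Cooley-Tukey FFT: transform the even- and odd-indexed halves recursively,
// then combine them with twiddle factors. log2(n) levels of O(n) work: O(n log2 n).
std::vector<cd> fft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");
    if (n <= 1) return x;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cd> e = fft(even);
    const std::vector<cd> o = fft(odd);
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd tw = std::polar(1.0, -2.0 * pi * double(k) / double(n)) * o[k];
        out[k] = e[k] + tw;          // butterfly: top half
        out[k + n / 2] = e[k] - tw;  // butterfly: bottom half
    }
    return out;
}
```

Both functions return the same spectrum up to floating-point rounding; only the operation count differs.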
airockchip/ultralytics_yolov8
NEW - YOLOv8 🚀 in PyTorch > ONNX > CoreML > TFLite
Language:Python17442
airockchip/rknn_model_zoo
Language: C · Stars: 1.3k · Forks: 236
a-hamdi/GPU
100 days of building GPU kernels!
Language: Cuda · Stars: 290 · Forks: 27
Tongkaio/CUDA_Kernel_Samples
Hand-coding CUDA operators from scratch, plus an interview guide
Language:Cuda21120
Maharshi-Pandya/cudacodes
Learnings and programs related to CUDA
Language:Cuda32612
Open-LLM-VTuber/Open-LLM-VTuber
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
Language: Python · Stars: 2.7k · Forks: 273
ifzhang/ByteTrack
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Language: Python · Stars: 5.1k · Forks: 967
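The ByteTrack title stresses associating every detection box, including low-confidence ones. A rough C++ sketch of that two-stage idea (not the ByteTrack implementation): high-score detections are matched to existing tracks first, and tracks still unmatched then get a second chance against low-score detections instead of those boxes being thrown away. For simplicity this sketch uses greedy IoU matching on raw track boxes with no motion model; the thresholds (0.6, 0.1, 0.3) and all names (`Box`, `Detection`, `associate`, `greedy_match`) are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box { float x1, y1, x2, y2; };
struct Detection { Box box; float score; };

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedily match the detections in det_ids to tracks that are still unmatched
// (track_to_det[t] == -1), taking the highest-IoU pairs first.
void greedy_match(const std::vector<Box>& tracks, const std::vector<Detection>& dets,
                  const std::vector<std::size_t>& det_ids, float min_iou,
                  std::vector<int>& track_to_det) {
    struct Cand { float iou; std::size_t t; std::size_t d; };
    std::vector<Cand> cands;
    for (std::size_t t = 0; t < tracks.size(); ++t) {
        if (track_to_det[t] != -1) continue;
        for (std::size_t d : det_ids) {
            const float v = iou(tracks[t], dets[d].box);
            if (v >= min_iou) cands.push_back({v, t, d});
        }
    }
    std::sort(cands.begin(), cands.end(),
              [](const Cand& a, const Cand& b) { return a.iou > b.iou; });
    std::vector<bool> det_used(dets.size(), false);
    for (const Cand& c : cands) {
        if (track_to_det[c.t] != -1 || det_used[c.d]) continue;
        track_to_det[c.t] = static_cast<int>(c.d);
        det_used[c.d] = true;
    }
}

// Two-stage association: high-score boxes first, then low-score boxes for leftover tracks.
// Boxes scoring below low_thresh are discarded. Returns a detection index per track, or -1.
std::vector<int> associate(const std::vector<Box>& tracks, const std::vector<Detection>& dets,
                           float high_thresh = 0.6f, float low_thresh = 0.1f,
                           float min_iou = 0.3f) {
    std::vector<std::size_t> high, low;
    for (std::size_t d = 0; d < dets.size(); ++d) {
        if (dets[d].score >= high_thresh) high.push_back(d);
        else if (dets[d].score >= low_thresh) low.push_back(d);
    }
    std::vector<int> track_to_det(tracks.size(), -1);
    greedy_match(tracks, dets, high, min_iou, track_to_det);
    greedy_match(tracks, dets, low, min_iou, track_to_det);
    return track_to_det;
}
```

A tracker that kept only high-score boxes would leave a partially occluded object (whose score dropped) unmatched; the second stage lets that track continue.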
ultralytics/ultralytics
Ultralytics YOLO11 🚀
Language: Python · Stars: 38k · Forks: 7.4k
ireader/media-server
RTSP/RTP/RTMP/FLV/HLS/MPEG-TS/MPEG-PS/MPEG-DASH/MP4/fMP4/MKV/WebM
Language: C · Stars: 3.2k · Forks: 1.1k
ZLMediaKit/ZLMediaKit
WebRTC/RTSP/RTMP/HTTP/HLS/HTTP-FLV/WebSocket-FLV/HTTP-TS/HTTP-fMP4/WebSocket-TS/WebSocket-fMP4/GB28181/SRT server and client framework based on C++11
Language: C++ · Stars: 14.7k · Forks: 3.6k
gelldur/EventBus
A lightweight and very fast event bus / event framework for C++17
Language:C++37980
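An event bus decouples publishers from subscribers: code registers a callback for an event type, and posting an event of that type invokes every registered callback. A minimal C++17 sketch of that pattern, keyed by `std::type_index`; `SimpleEventBus`, `listen`, and `post` are hypothetical names and not gelldur/EventBus's API, and this version is single-threaded.

```cpp
#include <cassert>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal event bus: listeners are stored per event type.
class SimpleEventBus {
public:
    // Register a callback for events of type Event.
    template <typename Event>
    void listen(std::function<void(const Event&)> callback) {
        // Type-erase to a void* callback; the cast back is safe because post() only
        // calls listeners registered under the same std::type_index.
        listeners_[std::type_index(typeid(Event))].push_back(
            [cb = std::move(callback)](const void* e) { cb(*static_cast<const Event*>(e)); });
    }

    // Deliver an event synchronously to every listener registered for its type.
    template <typename Event>
    void post(const Event& event) const {
        auto it = listeners_.find(std::type_index(typeid(Event)));
        if (it == listeners_.end()) return;  // nobody listens for this type
        for (const auto& fn : it->second) fn(&event);
    }

private:
    std::unordered_map<std::type_index, std::vector<std::function<void(const void*)>>> listeners_;
};
```

Usage: `bus.listen<Ping>([](const Ping& p) { /* ... */ });` then `bus.post(Ping{5});`. The explicit template argument is needed because a lambda is not itself a `std::function`.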
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python · Stars: 44.3k · Forks: 5.4k
NVIDIA/workbench-llamafactory
This is an NVIDIA AI Workbench example project that demonstrates an end-to-end model development workflow using Llamafactory.
Language:Jupyter Notebook5021
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Language: C · Stars: 7.1k · Forks: 2k
Cambricon/CNStream
CNStream is a streaming framework for building Cambricon machine learning pipelines http://forum.cambricon.com https://gitee.com/SolutionSDK/CNStream
Language:C++498
brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
Language: Cuda · Stars: 1.7k · Forks: 338
progschj/ThreadPool
A simple C++11 Thread Pool implementation
Language: C++ · Stars: 8.2k · Forks: 2.3k
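The usual shape of a simple fixed-size thread pool: worker threads block on a condition variable, pull `std::function` jobs from a mutex-protected queue, and `std::packaged_task` turns each submitted callable into a `std::future`. The C++11-style sketch below shows that general pattern; `TinyThreadPool` and `submit` are hypothetical names, and this is not progschj/ThreadPool's code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal fixed-size thread pool.
class TinyThreadPool {
public:
    explicit TinyThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lock(mutex_);
                        cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                        if (stop_ && jobs_.empty()) return;  // drain the queue, then exit
                        job = std::move(jobs_.front());
                        jobs_.pop();
                    }
                    job();  // run outside the lock so other workers can dequeue
                }
            });
        }
    }

    // Queue a no-argument callable; the future yields its result (or its exception).
    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // shared_ptr because std::function needs a copyable callable and packaged_task is move-only.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

    ~TinyThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```

The destructor lets workers finish every queued job before joining, so futures obtained from `submit` never end up abandoned.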
cumtchw/MemoryPool
An advanced C++ memory pool implementation, with a detailed code walkthrough, a CMake build project, and usage examples.
Language: Makefile · Stars: 2
cacay/MemoryPool
An easy to use and efficient memory pool allocator written in C++.
Language: C++ · Stars: 1.3k · Forks: 414
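Both MemoryPool entries above are C++ memory pool allocators. A common core technique is a fixed-size block pool: carve one buffer into equal blocks and thread the free ones into an intrusive singly linked list, so allocation and deallocation are O(1) pointer swaps. The sketch below illustrates that technique only; it is not code from either repository. `FixedBlockPool` is a hypothetical name, it is single-threaded, and it assumes rounding block sizes up to `alignof(std::max_align_t)` is acceptable.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool with an intrusive free list (not thread-safe).
class FixedBlockPool {
    struct Node { Node* next; };  // lives inside each free block

public:
    FixedBlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size < sizeof(Node) ? sizeof(Node) : block_size)),
          storage_(block_size_ * block_count) {
        // Push every block onto the free list.
        for (std::size_t i = 0; i < block_count; ++i)
            free_list_ = new (storage_.data() + i * block_size_) Node{free_list_};
    }
    FixedBlockPool(const FixedBlockPool&) = delete;             // free list points into storage_,
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;  // so copying would be unsafe

    // Pop one block; throws std::bad_alloc when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) throw std::bad_alloc();
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Push a block obtained from allocate() back onto the free list (LIFO reuse).
    void deallocate(void* p) { free_list_ = new (p) Node{free_list_}; }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;  // one contiguous buffer for all blocks
    Node* free_list_ = nullptr;
};
```

Callers construct objects in a block with placement `new` and must destroy them before calling `deallocate`. Production pools typically add thread safety and growth by chaining extra buffers.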
wispytrace/magik-toolkit
Language:C73
open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Language: JavaScript · Stars: 83.6k · Forks: 10.1k
MarkFzp/act-plus-plus
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
Language: Python · Stars: 3.2k · Forks: 587
MarkFzp/mobile-aloha
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Language: Jupyter Notebook · Stars: 4k · Forks: 692
weaigc/bingo
Bingo, a New Bing that lets you breathe easy.
Language: TypeScript · Stars: 2.9k · Forks: 1.3k
nxp-imx/uboot-imx
i.MX U-Boot
Language: C · Stars: 117 · Forks: 136
diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
Language: Python · Stars: 1.9k · Forks: 121
microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Language: C++ · Stars: 16k · Forks: 3.1k