BBuf

Asian regional gold medal in the 42nd ACM International Collegiate Programming Contest (ACM-ICPC). Working at OneFlow. Creator of the GiantPandaCV official account.

SkyWork, Chengdu

Pinned Repositories

Darknet
AlexeyAB-DarkNet source code analysis
Language: C | 346 10 9118
giantpandacv.com
www.giantpandacv.com
Language: Python | 149 9 330
how-to-learn-deep-learning-framework
how to learn PyTorch and OneFlow
332 7 120
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Language: Cuda | 1.5k 22 9121
Image-processing-algorithm
Paper implementations
Language: C++ | 883 34 3274
Image-processing-algorithm-Speed
opencv
Language: C++ | 237 9 484
Keras-Semantic-Segmentation
Keras-Semantic-Segmentation
Language: Python | 334 11 26101
oneflow-cifar
Language: Python | 13 3 02
tvm_mlir_learn
A collection of compiler learning resources.
Language: Python | 2.1k 36 4324
oneflow_convert
OneFlow->ONNX
Language: Python | 41 45 118

BBuf's Repositories

BBuf/tvm_mlir_learn
A collection of compiler learning resources.
Language: Python | 2.1k 36 4324
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Language: Cuda | 1.5k 22 9121
BBuf/Image-processing-algorithm
Paper implementations
Language: C++ | 883 34 3274
BBuf/how-to-learn-deep-learning-framework
how to learn PyTorch and OneFlow
332 7 120
BBuf/giantpandacv.com
www.giantpandacv.com
Language: Python | 149 9 330
BBuf/ArmNeonOptimization
arm-neon
Language: C++ | 85 3 323
BBuf/RWKV-World-HF-Tokenizer
Language: Python | 33 4 35
BBuf/flash-rwkv
Language: Python | 28 2 21
BBuf/run-rwkv-world-4-in-mlc-llm
20 2 4
BBuf/megatron-lm-parallel-group-playground
Language: Python | 13 2 0
BBuf/mlc-llm-code-analysis
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
Language: Python | 10 1 01
BBuf/trl
Train transformer language models with reinforcement learning.
Language: Python | 4 1 0
BBuf/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (LLaMA, LLaMa2, ChatGLM2, ChatGPT, Claude, etc) over 50+ datasets.
Language: Python | 3 1 0
BBuf/BBuf
2 2 05
BBuf/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
Language: Cuda | 2 1 0
BBuf/tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
Language: C++ | 2 1 0
BBuf/FasterTransformer
Transformer related optimization, including BERT, GPT
Language: C++ | 1 1 0
BBuf/LLaMA-Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
Language: Python | 1 1 0
BBuf/nndeploy
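nndeploy is an end-to-end model deployment framework. Built around multi-platform inference and model deployment based on directed acyclic graphs, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.
Language: C++ | 1 0 0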
BBuf/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Language: Python | 1 1 01
BBuf/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Language: Python | 0 0
BBuf/ChatRWKV
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
Language: Python | 1 0
BBuf/deepseekv2-profile
Language: Jupyter Notebook | 0 0
BBuf/fastllm
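A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
Language: C++ | 1 0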
BBuf/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Language: HTML | 0 0
BBuf/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
Language: Python | 1 0
BBuf/RWKV-CUDA
The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)
Language: Cuda | 1 0
BBuf/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
BBuf/tvm_gpu_gemm
play gemm with tvm
Language: Cuda | 1 0
BBuf/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs