BBuf
Asia regional gold medal in the 42nd ACM International Collegiate Programming Contest (ACM-ICPC). Working at OneFlow. Creator of the GiantPandaCV official account.
SkyWorkChengDu
Pinned Repositories
Darknet
AlexeyAB-DarkNet source code analysis
giantpandacv.com
www.giantpandacv.com
how-to-learn-deep-learning-framework
how to learn PyTorch and OneFlow
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Image-processing-algorithm
Paper implementations
Image-processing-algorithm-Speed
opencv
Keras-Semantic-Segmentation
Keras-Semantic-Segmentation
oneflow-cifar
tvm_mlir_learn
A collection of compiler learning resources.
oneflow_convert
OneFlow->ONNX
BBuf's Repositories
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
BBuf/Image-processing-algorithm
Paper implementations
BBuf/how-to-learn-deep-learning-framework
how to learn PyTorch and OneFlow
BBuf/giantpandacv.com
www.giantpandacv.com
BBuf/ArmNeonOptimization
arm-neon
BBuf/RWKV-World-HF-Tokenizer
BBuf/flash-rwkv
BBuf/run-rwkv-world-4-in-mlc-llm
BBuf/megatron-lm-parallel-group-playground
BBuf/mlc-llm-code-analysis
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
BBuf/trl
Train transformer language models with reinforcement learning.
BBuf/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (LLaMA, LLaMa2, ChatGLM2, ChatGPT, Claude, etc) over 50+ datasets.
BBuf/BBuf
BBuf/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
BBuf/tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
BBuf/FasterTransformer
Transformer related optimization, including BERT, GPT
BBuf/LLaMA-Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
BBuf/nndeploy
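nndeploy is an end-to-end model deployment framework. Built around multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.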
BBuf/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
BBuf/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
BBuf/ChatRWKV
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
BBuf/deepseekv2-profile
BBuf/fastllm
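A pure C++ cross-platform LLM acceleration library, callable from Python. ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.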
BBuf/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
BBuf/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
BBuf/RWKV-CUDA
The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )
BBuf/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
BBuf/tvm_gpu_gemm
Experiments with GEMM in TVM
BBuf/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs