Pinned Repositories
AFPQ
AFPQ code implementation
awesome-vit-quantization-acceleration
List of papers related to Vision Transformer quantization and hardware acceleration in recent AI conferences and journals.
BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
cs61c
Hi, I'm a student self-learning CS61C (Summer 2020). This repository contains my work on the CS61C labs and projects. If you find any mistakes, please tell me or open an Issue. Communication is welcome!
Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
dataset_mm
LabelTrack
LabelTrack is a graphical automatic annotation platform for multi-object tracking.
Model_Learning
nlp_course
YSDA course in Natural Language Processing
TensorRT-in-Action
TensorRT-in-Action is a GitHub repository providing code examples for using TensorRT, with corresponding Jupyter Notebooks.
DD-DuDa's Repositories
DD-DuDa/BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
DD-DuDa/LabelTrack
LabelTrack is a graphical automatic annotation platform for multi-object tracking.
DD-DuDa/awesome-vit-quantization-acceleration
List of papers related to Vision Transformer quantization and hardware acceleration in recent AI conferences and journals.
DD-DuDa/Cute-Gemm-Optimization
DD-DuDa/TensorRT-in-Action
TensorRT-in-Action is a GitHub repository providing code examples for using TensorRT, with corresponding Jupyter Notebooks.
DD-DuDa/Model_Learning
DD-DuDa/nlp_course
YSDA course in Natural Language Processing
DD-DuDa/AFPQ
AFPQ code implementation
DD-DuDa/cs61c
Hi, I'm a student self-learning CS61C (Summer 2020). This repository contains my work on the CS61C labs and projects. If you find any mistakes, please tell me or open an Issue. Communication is welcome!
DD-DuDa/dataset_mm
DD-DuDa/DD-DuDa
Config files for my GitHub profile.
DD-DuDa/LLMEvalution
DD-DuDa/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
DD-DuDa/Pattern-Recognition-Project
DD-DuDa/SGEMM-Optimization
DD-DuDa/Sparse-Needle
DD-DuDa/team-learning-nlp
Mainly stores materials for the "Natural Language Processing" track of Datawhale team learning.
DD-DuDa/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.