ClubieDong
I'm a graduate student at Nanjing University, interested in accelerating ML training/inference.
Nanjing University, Nanjing, China
ClubieDong's Stars
markverick/ns3-ospf
Simplified, native OSPFv2 implementation as an ns-3 external module for research purposes.
HigherOrderCO/Bend
A massively parallel, high-level programming language
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
facebookresearch/HolisticTraceAnalysis
A library to analyze PyTorch traces.
openucx/ucc
Unified Collective Communication Library
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.
rendercv/rendercv
The engine of the RenderCV App
AnthonyCalandra/modern-cpp-features
A cheatsheet of modern C++ language and library features.
ClubieDong/QAQ-KVCacheQuantization
QAQ: Quality Adaptive Quantization for LLM KV Cache
merrymercy/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
cabaletta/baritone
google maps for block game
NiHoel/Anno1800Calculator
Calculator for the production and consumption of goods in the computer game Anno 1800
KarlsruheMIS/KaMIS
Maximum independent sets and vertex covers of large sparse graphs.
ChatGPTNextWeb/ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini/Claude LLM app in one click.
LijunChang/Near-Maximum-Independent-Set
Near-linear time algorithm for computing near-maximum independent set
iPapatsoris/Maximum-Independent-Set
An exact algorithm for computing the Maximum Independent Set on graphs
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
alibaba/clusterdata
cluster data collected from production clusters in Alibaba for cluster management research
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
cypress-io/cypress
Fast, easy and reliable testing for anything that runs in a browser.
gabime/spdlog
Fast C++ logging library.
michalusio/screeps-async-example
Using some fancy Rollup plugins to convert async/await into generators under the hood!
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
jupyter-xeus/xeus-cling
Jupyter kernel for the C++ programming language
bencbartlett/screeps-packrat
Lightning-fast and memory-efficient serialization of Screeps IDs, Coords, and RoomPositions
mgth/LittleBigMouse
DPI Aware mouse move across screens