Wu0103

Wu0103's Stars

CyC2018/CS-Notes
:books: Essential fundamentals for technical interviews, Leetcode, computer operating systems, computer networks, system design
175k 5.3k 56650.8k
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Language: Python 10k 160 7152.3k
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++ 8.2k 87 1.8k 908
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
Language: C++ 3.1k 152 1.3k 791
inducer/pycuda
CUDA integration for Python, plus shiny features
Language: Python 1.8k 56 260285
enjoy-digital/litepcie
Small footprint and configurable PCIe core
Language: Python 465 28 68116
ai4co/rl4co
A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
Language: Python 386 8 7870
microsoft/msccl
Microsoft Collective Communication Library
Language: C++ 304 13 2729
CMU-SAFARI/MQSim
MQSim is a fast and accurate simulator modeling the performance of modern multi-queue (MQ) SSDs as well as traditional SATA based SSDs. MQSim faithfully models new high-bandwidth protocol implementations, steady-state SSD conditions, and the full end-to-end latency of requests in modern SSDs. It is described in detail in the FAST 2018 paper by Arash Tavakkol et al., "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices" (https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf)
Language: C++ 270 28 54148
CMU-SAFARI/ramulator2
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM standards, emerging RowHammer mitigation techniques). Described in our paper https://people.inf.ethz.ch/omutlu/pub/Ramulator2_arxiv23.pdf
Language: C++ 216 13 4650
harvard-acc/gem5-aladdin
End-to-end SoC simulation: integrating the gem5 system simulator with the Aladdin accelerator simulator.
Language: C++ 212 14 3859
tukl-msd/DRAMSys
DRAMSys is a SystemC TLM-2.0 based DRAM simulator.
Language: C++ 200 15 4753
Xilinx/DPU-PYNQ
DPU on PYNQ
Language: Tcl 198 16 7267
vineodd/PIMSim
PIMSim is a Process-In-Memory simulator compatible with GEM5 full-system simulation.
Language: C++ 178 6 2685
SlugLab/CXLMemSim
A place to store the CXL simulator
Language: C++ 121 5 1720
aliyun/aicb
Language: HTML 100 6 317
harvard-acc/smaug
SMAUG: Simulating Machine Learning Applications Using Gem5-Aladdin
Language: C++ 96 7 2727
BUAA-CI-LAB/Literatures-on-Homomorphic-Encryption
A reading list for homomorphic encryption
86 5 07
TeCSAR-UNCC/gem5-SALAM
Language: C++ 83 7 2322
PSAL-POSTECH/ONNXim
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
Language: C++ 43 1 410
qiyancos/gem5-with-chinese-comment
Gem5 with Chinese comments and an introduction (master), plus some other standard gem5 versions.
Language: C++ 39 1 014
ece-fast-lab/cxl_type3_tests
This is the repository that holds the artifacts of MICRO'23 -- Demystifying CXL Memory with True CXL-Ready Systems and CXL Memory Devices
Language: C 32 1 07
scale-snu/attacc_simulator
Language: Python 31 0 02
YuxueYang1204/CudaDemo
Implement custom operators in PyTorch with CUDA/C++
Language: Python 23 2 15
FCAS-ZJU/Chiplet-Gem5-SharedMemory
Language: C++ 17 4 012
THU-DSP-LAB/ventus-gpgpu-cpp-simulator
Cycle-accurate C++ & SystemC simulator for the RISC-V GPGPU Ventus
Language: C 15 2 11
sg20180546/CXL-awesome-paper
Papers related to Compute Express Link
7 1 00
clustbench/network-tests2
Benchmarks and analysis of interconnection in HPC cluster
Language: C++ 4 6 177
FYNCH-BIO/dpu
Data Processing Unit for eVOLVER
Language: HTML 419
Wu0103/UPMEM_GPT2XL
Language: C 1 1 00