Wu0103's Stars
CyC2018/CS-Notes
:books: Essential fundamentals for technical interviews: Leetcode, computer operating systems, computer networking, system design
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
inducer/pycuda
CUDA integration for Python, plus shiny features
enjoy-digital/litepcie
Small footprint and configurable PCIe core
ai4co/rl4co
A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
microsoft/msccl
Microsoft Collective Communication Library
CMU-SAFARI/MQSim
MQSim is a fast and accurate simulator modeling the performance of modern multi-queue (MQ) SSDs as well as traditional SATA based SSDs. MQSim faithfully models new high-bandwidth protocol implementations, steady-state SSD conditions, and the full end-to-end latency of requests in modern SSDs. It is described in detail in the FAST 2018 paper by Arash Tavakkol et al., "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices" (https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf)
CMU-SAFARI/ramulator2
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM standards, emerging RowHammer mitigation techniques). Described in our paper https://people.inf.ethz.ch/omutlu/pub/Ramulator2_arxiv23.pdf
harvard-acc/gem5-aladdin
End-to-end SoC simulation: integrating the gem5 system simulator with the Aladdin accelerator simulator.
tukl-msd/DRAMSys
DRAMSys is a SystemC TLM-2.0 based DRAM simulator.
Xilinx/DPU-PYNQ
DPU on PYNQ
vineodd/PIMSim
PIMSim is a Process-In-Memory simulator compatible with GEM5 full-system simulation.
SlugLab/CXLMemSim
A place to store the CXL simulator
aliyun/aicb
harvard-acc/smaug
SMAUG: Simulating Machine Learning Applications Using Gem5-Aladdin
BUAA-CI-LAB/Literatures-on-Homomorphic-Encryption
A reading list for homomorphic encryption
TeCSAR-UNCC/gem5-SALAM
PSAL-POSTECH/ONNXim
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
qiyancos/gem5-with-chinese-comment
Gem5 with Chinese comments and an introduction (master), plus some other standard gem5 versions.
ece-fast-lab/cxl_type3_tests
This is the repository that holds the artifacts of MICRO'23 -- Demystifying CXL Memory with True CXL-Ready Systems and CXL Memory Devices
scale-snu/attacc_simulator
YuxueYang1204/CudaDemo
Implement custom operators in PyTorch with CUDA/C++
FCAS-ZJU/Chiplet-Gem5-SharedMemory
THU-DSP-LAB/ventus-gpgpu-cpp-simulator
Cycle-accurate C++ & SystemC simulator for the RISC-V GPGPU Ventus
sg20180546/CXL-awesome-paper
Papers related to Compute Express Link
clustbench/network-tests2
Benchmarks and analysis of interconnection in HPC cluster
FYNCH-BIO/dpu
Data Processing Unit for eVOLVER
Wu0103/UPMEM_GPT2XL