xiaoyu1004's Stars
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
microsoft/AI-System
System for AI Education Resource.
Bruce-Lee-LY/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
karpathy/llama2.c
Inference Llama 2 in one file of pure C
ggerganov/llama.cpp
LLM inference in C/C++
THU-DSP-LAB/ventus-gpgpu-verilog
GPGPU supporting RISCV-V, developed with Verilog HDL
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
raphaelmeyer/pickle-cpp
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Jerc007/Open-GPGPU-FlexGrip-
FlexGripPlus: an open-source GPU model for reliability evaluation and micro architectural simulation
ROCm/ROCm-ComputeABI-Doc
ROCm - AMDGPU Compute Application Binary Interface
ROCm/LLVM-AMDGPU-Assembler-Extra
LLVM AMDGPU Assembler Helper Tools
feifei14119/rocm_start_sample
HIP/ROCm starter sample for AMD GPUs
FlippingLogic/fpga_read_bram
Read FPGA BRAM content and transfer the data to a PC over UART.
WangXuan95/FPGA-UART
Includes 3 independent modules: UART receiver, UART transmitter, and UART-to-AXI4 master (an interactive debugger).
OpenXiangShan/XiangShan
Open-source high-performance RISC-V processor
THU-DSP-LAB/ventus-gpgpu-cpp-simulator
Cycle-accurate C++ & SystemC simulator for the RISC-V GPGPU Ventus
PawelPerek/eeric
An online RISC-V simulator with vector instruction support
light-ly/chisel-template
Self-built Chisel project template
github-3rr0r/RV32ISC
A RISC-V RV32I ISA Single Cycle CPU
debtanu09/systolic_array_matrix_multiplier
A Verilog implementation of a 4x4 systolic array multiplier
XinyiYuan/Computer_Architecture
Source code for the Computer Architecture seminar course (junior year, first semester)
NVIDIA/cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
hughperkins/VeriGPU
OpenSource GPU, in Verilog, loosely based on RISC-V ISA
prajna-lang/prajna
A programming language for AI infrastructure
Multi2Sim/m2s-bench-amdsdk-2.5-src
AMD Software Development Kit 2.5 Sources
scratch-gpu/MIAOW2
MIAOW2.0 FPGA implementable design
ROCm/HIP
HIP: C++ Heterogeneous-Compute Interface for Portability
THU-DSP-LAB/ventus-gpgpu-doc
Documentation for Ventus GPGPU