Pinned Repositories
dxvk
Vulkan-based implementation of D3D9, D3D10 and D3D11 for Linux / Wine
aurora
awesome-ncnn
😎 A Collection of Awesome NCNN-based Projects
basecode
The Basecode compiler toolchain and language workbench.
builder
Continuous builder and binary build scripts for pytorch
caffe
Caffe: a fast open framework for deep learning.
Caffe_Code_Analysis
Caffe_Code_Analysis
chisel-template
Self-built Chisel project template
chisel-test
cmake-examples
Useful CMake Examples
xiaoyu1004's Repositories
xiaoyu1004/rvemu
xiaoyu1004/chisel-test
xiaoyu1004/FPGA-DDR-SDRAM
An AXI4-based DDR1 controller that provides cheap, high-capacity memory for low-end FPGA embedded systems.
xiaoyu1004/FPGA-UART
3 modules: UART receiver, UART transmitter, and UART-to-AXI4 master (an interactive debugger).
xiaoyu1004/aurora
xiaoyu1004/rvemu-singlecycle
A single-cycle RISC-V simulator
xiaoyu1004/chisel-template
Self-built Chisel project template
xiaoyu1004/gpgpu-simx
A cycle-approximate simulator
xiaoyu1004/RV32ISC
A RISC-V RV32I ISA Single Cycle CPU
xiaoyu1004/rvcc
A C compiler
xiaoyu1004/VeriGPU
Open-source GPU in Verilog, loosely based on the RISC-V ISA
xiaoyu1004/ics-pa
The wrapper repo for NJU ICS PA.
xiaoyu1004/how_to_optimize_convolution_in_CPU
how_to_optimize_convolution_in_CPU
xiaoyu1004/ConvolutionBackward
xiaoyu1004/conv3DBwdFilter
xiaoyu1004/cublas_gemm_benchmark
xiaoyu1004/cudnnTest
xiaoyu1004/how-to-optimize-gemm-in-cpu
A GEMM compute library
xiaoyu1004/NyuziProcessor
GPGPU microprocessor architecture
xiaoyu1004/cuda-tensorcore-hgemm
xiaoyu1004/how-to-optimize-gemm-cuda
xiaoyu1004/optimize-in-gpu
xiaoyu1004/gemm-optimize
optimize gemm
xiaoyu1004/juliuscblas
A simple BLAS library
xiaoyu1004/How_to_optimize_in_GPU
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. These kernels perform at or near the theoretical limit.
xiaoyu1004/mtensor
A C++ CUDA tensor lazy-computing library
xiaoyu1004/MetaNN
xiaoyu1004/kompute
General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
xiaoyu1004/how-to-optimize-gemm
Row-major SGEMM optimization
xiaoyu1004/ROCm-ComputeABI-Doc
ROCm - AMDGPU Compute Application Binary Interface