Pinned Repositories
triton-viz
hpctoolkit
HPCToolkit performance tools: measurement and analysis components
Awesome-GPU
Awesome resources for GPUs
gBolt
gBolt: a very fast implementation of the gSpan algorithm for data mining
GPA
GPU Performance Advisor
Notes
Computer Science Reading Notes
triton-samples
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Triton-Puzzles
Puzzles for learning Triton
triton
Development repository for the Triton language and compiler
Jokeren's Repositories
Jokeren/nvprof-overhead
Jokeren/amrex
AMReX: Software Framework for Block Structured AMR
Jokeren/batched_gemm
Jokeren/benchmark
Jokeren/BERT-pytorch
Google AI 2018 BERT pytorch implementation
Jokeren/Castro
An adaptive mesh, astrophysical compressible (radiation-, magneto-) hydrodynamics simulation code for massively parallel CPU and GPU architectures.
Jokeren/CUDA-CFG-10.1
Jokeren/darknet
Windows and Linux version of Darknet Yolo v3 & v2 Neural Networks for object detection (Tensor Cores are used)
Jokeren/dyninst
DyninstAPI: Tools for binary instrumentation, analysis, and modification.
Jokeren/googletest
Googletest - Google Testing and Mocking Framework
Jokeren/gpu-rodinia
Rodinia benchmark
Jokeren/hatchet
Graph-indexed Pandas DataFrames for analyzing hierarchical performance data
Jokeren/Kripke
Kripke is a simple, scalable, 3D Sn deterministic particle transport code
Jokeren/lammps
Public development project of the LAMMPS MD software package
Jokeren/NVBit
Jokeren/nvbit-call-stack
Jokeren/Nyx
An adaptive mesh, N-body hydro cosmological simulation code
Jokeren/omniscidb
OmniSciDB (formerly MapD Core)
Jokeren/PeleC
An AMR code for compressible reacting flow simulations
Jokeren/pprof
pprof is a tool for visualization and analysis of profiling data
Jokeren/pytorch-fixes
Jokeren/Quicksilver
A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037
Jokeren/recommonmark
A markdown parser for docutils
Jokeren/snappy
A fast compressor/decompressor
Jokeren/spack
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
Jokeren/stack-unwind-samples
Jokeren/TinySAT
Jokeren/tsm2-imp
Implementation of Tall-and-Skinny Matrix Multiplication for CUDA
Jokeren/TurboTransformers
A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU.
Jokeren/waka-box
📊 Update a pinned gist to contain your weekly WakaTime stats