brunowu
Scientific Researcher in HPC and numerical linear algebra
Jülich Supercomputing Centre, Germany
brunowu's Stars
vietnh1009/ASCII-generator
ASCII generator (image to text, image to image, video to video)
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
MOLOjl/WCycleSVD
FredTingaud/quick-bench-front-end
Front end side of quick-bench
DefTruth/CUDA-Learn-Notes
📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
JuliaGPU/NCCL.jl
A Julia wrapper for the NVIDIA Collective Communications Library.
facebookincubator/gloo
Collective communications library with various primitives for multi-machine training.
shap/shap
A game theoretic approach to explain the output of any machine learning model.
nidode/BLAS-Tensor-Contractions
This repository contains the source code associated with the numerical tests of the paper "Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions"
RIKEN-RCCS/EigenExa
parallel eigenvalue solver
FZJ-JSC/JUBE
The JUBE benchmarking environment provides a script based framework to easily create benchmark sets, run those sets on different computer systems and evaluate the results. It is actively developed by the Jülich Supercomputing Centre of Forschungszentrum Jülich, Germany.
mfherbst/2024-siamla-minitutorial
SIAM LA 2024 electronic structure minitutorial
BallisticLA/RandBLAS
A header-only C++ library for sketching in randomized linear algebra
BallisticLA/RandLAPACK
A high-performance C++ library for randomized numerical linear algebra
ChASE-library/ChASE
This repository mirrors the principal GitLab repository of the Chebyshev Accelerated Subspace iteration Eigensolver. If you want to contribute to this project as a developer, please contact e.di.napoli@fz-juelich.de.
HybridScale/CholeskyQR2-IM
CholeskyQR2 with Gram-Schmidt orthogonalization for extremely ill-conditioned matrices
leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
BigDFT-group/bigdft-school
Colab notebooks to execute BigDFT school tutorials
sb17v/bspmm
A mini-app that captures the communication pattern of Block-sparse Matrix Multiplication in flat MPI and hybrid MPI+OpenMP configurations.
NVIDIA/NVPLSamples
NVIDIA Performance Libraries: Sample code
enp1s0/ozIMMU
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
Yaraslaut/prop
2D FDTD solver of Maxwell's equations
federico-busato/Modern-CPP-Programming
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
QBouts/BitsOfQ
Code from the BitsOfQ youtube channel
SawyerHood/draw-a-ui
Draw a mockup and generate html for it
ORNL/ReSolve
Library of GPU-resident linear solvers
buildbot/buildbot
Python-based continuous integration testing framework; your pull requests are more than welcome!
spcl/dace
DaCe - Data Centric Parallel Programming
scalable-matrix/CA3DMM
Communication-Avoiding 3D Matrix Multiplication
krahets/hello-algo
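"Hello Algo": a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; an English version is in progress.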