Pinned Repositories
cuda_examples
Simple CUDA Examples
cuFFT_example
Simple cuFFT examples
cuGemmProf
A simple tool to profile performance of multiple combinations of GEMM of cuBLAS
cutlass
CUDA Templates for Linear Algebra Subroutines
KerasToTensorRT
A simple demonstration of running a Keras model on TensorFlow with TensorRT integration (TFTRT), or on TensorRT directly, without invoking "freeze_graph.py".
openacc_fortran_examples
Simple OpenACC Fortran Examples
Paddle
PArallel Distributed Deep LEarning (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment)
Tensorflow_Inception_v3_TensorRT
A simple demonstration of running the TensorFlow Inception v3 model on TensorRT
tf_keras_example
TensorFlow and Keras Examples
trt-se-resnext
A sample that runs SE-ResNeXt on TensorRT
jeng1220's Repositories
jeng1220/KerasToTensorRT
A simple demonstration of running a Keras model on TensorFlow with TensorRT integration (TFTRT), or on TensorRT directly, without invoking "freeze_graph.py".
jeng1220/openacc_fortran_examples
Simple OpenACC Fortran Examples
jeng1220/cuGemmProf
A simple tool to profile performance of multiple combinations of GEMM of cuBLAS
jeng1220/Tensorflow_Inception_v3_TensorRT
A simple demonstration of running the TensorFlow Inception v3 model on TensorRT
jeng1220/trt-se-resnext
A sample that runs SE-ResNeXt on TensorRT
jeng1220/cuda_examples
Simple CUDA Examples
jeng1220/cuFFT_example
Simple cuFFT examples
jeng1220/cutlass
CUDA Templates for Linear Algebra Subroutines
jeng1220/Paddle
PArallel Distributed Deep LEarning (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment)
jeng1220/tf_keras_example
TensorFlow and Keras Examples
jeng1220/amazon-dsstne
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models
jeng1220/CUDALibrarySamples
CUDA Library Samples
jeng1220/cupy
NumPy-like API accelerated with CUDA
jeng1220/dlrm
An implementation of a deep learning recommendation model (DLRM)
jeng1220/flash-attention
Fast and memory-efficient exact attention
jeng1220/FluidDoc
Documentations for PaddlePaddle
jeng1220/git_test
jeng1220/gpu_isac_mirror
gpu_isac mirror
jeng1220/gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
jeng1220/install_numba_and_pyculib_by_pip
Installation instructions for numba and pyculib by pip, tested on Ubuntu.
jeng1220/stream_benchmark
CUDA stream benchmark
jeng1220/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
jeng1220/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.