jeng1220

Specializes in heterogeneous computing, such as CUDA and OpenCL.

NVIDIA, Taiwan

Pinned Repositories

cuda_examples
Simple CUDA Examples
Language: C++ 30
cuFFT_example
Simple cuFFT examples
Language: Cuda 10
cuGemmProf
A simple tool for profiling the performance of multiple cuBLAS GEMM combinations
Language: C++ 24 3 17
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 1 4 01
KerasToTensorRT
A simple demonstration of running a Keras model on TensorFlow with TensorRT integration (TFTRT), or on TensorRT directly, without invoking "freeze_graph.py".
Language: Python 67 9 723
openacc_fortran_examples
Simple OpenACC Fortran Examples
Language: Fortran 53 6 010
Paddle
PArallel Distributed Deep LEarning (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training, and cross-platform deployment)
Language: C++ 10
Tensorflow_Inception_v3_TensorRT
A simple demonstration of running the TensorFlow Inception v3 model on TensorRT
Language: C++ 12 5 27
tf_keras_example
TensorFlow and Keras Examples
Language: Python 1 3 00
trt-se-resnext
A sample that runs SE-ResNeXt on TensorRT
Language: C++ 6 3 02

jeng1220's Repositories

jeng1220/KerasToTensorRT
A simple demonstration of running a Keras model on TensorFlow with TensorRT integration (TFTRT), or on TensorRT directly, without invoking "freeze_graph.py".
Language: Python 67 9 723
jeng1220/openacc_fortran_examples
Simple OpenACC Fortran Examples
Language: Fortran 53 6 010
jeng1220/cuGemmProf
A simple tool for profiling the performance of multiple cuBLAS GEMM combinations
Language: C++ 24 3 17
jeng1220/Tensorflow_Inception_v3_TensorRT
A simple demonstration of running the TensorFlow Inception v3 model on TensorRT
Language: C++ 12 5 27
jeng1220/trt-se-resnext
A sample that runs SE-ResNeXt on TensorRT
Language: C++ 6 3 02
jeng1220/cuda_examples
Simple CUDA Examples
Language: C++ 30
jeng1220/cuFFT_example
Simple cuFFT examples
Language: Cuda 10
jeng1220/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 1 4 01
jeng1220/Paddle
PArallel Distributed Deep LEarning (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training, and cross-platform deployment)
Language: C++ 10
jeng1220/tf_keras_example
TensorFlow and Keras Examples
Language: Python 1 3 00
jeng1220/amazon-dsstne
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models
jeng1220/CUDALibrarySamples
CUDA Library Samples
Language: Cuda 1 0
jeng1220/cupy
NumPy-like API accelerated with CUDA
Language: Python 3 0
jeng1220/dlrm
An implementation of a deep learning recommendation model (DLRM)
Language: Python 2 0
jeng1220/flash-attention
Fast and memory-efficient exact attention
jeng1220/FluidDoc
Documentation for PaddlePaddle
Language: Shell 2 0
jeng1220/git_test
3 01
jeng1220/gpu_isac_mirror
gpu_isac mirror
Language: Python
jeng1220/gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
Language: Jupyter Notebook 2 0
jeng1220/install_numba_and_pyculib_by_pip
Installation instructions for numba and pyculib by pip, tested on Ubuntu.
jeng1220/stream_benchmark
CUDA stream benchmark
Language: Python 3 01
jeng1220/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
Language: C++ 2 0
jeng1220/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
Language: Python 1 0
