Pinned Repositories
jren73's Repositories
jren73/delay_param_update
jren73/Benchmarks
jren73/Efficient-Tensor-Management-on-HM-for-Deep-Learning
jren73/dlrm_embedding_temp
jren73/jren73.github.io
jren73/linux
Linux kernel source tree
jren73/opt_dlrm
jren73/RecMG
RecMG AD
jren73/bfs_simd_benchmarks
Parallel Graph Analytics System on Knights Landing
jren73/CSrankings
A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas.
jren73/cuda_programming
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
jren73/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
jren73/DeepSpeedExamples
Example models using DeepSpeed
jren73/deploy_cluster
jren73/dlrm_processed_dataset
jren73/Folder-Structure-Conventions
Folder / directory structure options and naming conventions for software projects
jren73/hcm-workshop.github.io
jren73/HM-ANN
jren73/layer-to-layer-pytorch
PyTorch implementation of L2L execution algorithm
jren73/Learning-based_MM
jren73/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
jren73/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
jren73/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
jren73/MM_bigmem
jren73/new_dataloader
jren73/NPB-CPP
The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures
jren73/pebs_example
Example code that uses PEBS for memory profiling
jren73/seq2seq
PyTorch implementation of the RNN-based sequence-to-sequence architecture.
jren73/sunshineatnoon.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
jren73/visualization_datasets