TimJZ's Pinned Repositories
a3d3-institute.github.io
Draft website using the HTML5 UP template
comanche
Component-based development framework for memory-centric storage systems
cs526_mlir
CS526 Final Project MLIR pass
FBTT-Embedding
A Tensor Train (TT) based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. This library can reduce total model size by up to 100x on Facebook's open-sourced DLRM model while maintaining model quality, and our implementation is faster than state-of-the-art alternatives. Existing state-of-the-art libraries decompress whole embedding tables on the fly, so they provide no memory reduction during training. Our library decompresses only the requested rows, which can reduce the memory footprint per embedding table by up to 10,000x. It also includes a software cache that keeps a portion of the table entries in decompressed form for faster lookup and processing.
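The idea behind per-row decompression can be sketched in a few lines of NumPy. This is a hedged illustration, not the FBTT-Embedding API: the core shapes, rank choices, and the `lookup` helper are assumptions made for the example. An (N, D) embedding table with N = n1*n2*n3 and D = d1*d2*d3 is stored as three small TT cores, and a single row is materialized on demand by contracting one slice of each core, so the full table never exists in memory.

```python
import numpy as np

# Hypothetical TT-format embedding table (not the FBTT-Embedding API):
# a (64, 8) table with N = 4*4*4 and D = 2*2*2, stored as three cores
# G[k] of shape (r_k, n_k, d_k, r_{k+1}), where r_0 = r_3 = 1.
n, d, r = (4, 4, 4), (2, 2, 2), (1, 8, 8, 1)
rng = np.random.default_rng(0)
cores = [rng.standard_normal((r[k], n[k], d[k], r[k + 1])) for k in range(3)]

def lookup(row):
    """Decompress one row (0 <= row < n1*n2*n3) without expanding the table."""
    # Express the flat row index in mixed radix as (i1, i2, i3).
    idx = []
    for nk in reversed(n):
        idx.append(row % nk)
        row //= nk
    idx.reverse()
    out = cores[0][:, idx[0], :, :]                 # (1, d1, r1)
    for k in range(1, 3):
        slc = cores[k][:, idx[k], :, :]             # (r_k, d_k, r_{k+1})
        out = np.einsum('abr,rcs->abcs', out, slc)  # contract the TT rank
        out = out.reshape(1, -1, slc.shape[-1])     # fold the d-dims together
    return out.reshape(-1)                          # length D = d1*d2*d3
```

Storage here is the three cores (a few hundred floats) instead of 64*8 table entries; in a real recommendation workload, where N can be in the hundreds of millions, this gap is what drives the memory reduction, and a small cache of frequently hit decompressed rows amortizes the contraction cost.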
hls4ml
Machine learning on FPGAs using HLS
HugeCTR_comment
llvm_cs526
pylog
PyLog: An Algorithm-Centric FPGA Programming and Synthesis Flow
PyLog-HLS4ML
An integrated flow of PyLog & HLS4ML
scalehls
A scalable High-Level Synthesis framework on MLIR