CedricHwong
When I wrote this code, only God and I understood what I did. Now only God knows.
Bellevue, WA
Pinned Repositories
cedrichwong.github.io
My personal
Cloud-Deployment-of-Deep-Learning-Networks
gettingMean-2
Official Repository for the Manning Book Getting MEAN with MongoDB, Express, Angular and Node, 2nd Edition.
Image_Server
ImageSearchWebApp
Practical-research-on-a-high-performance-convolution-operator-on-ARM-Processors-
Deep learning shines brightly in computer vision, and many problems that traditional methods cannot solve are being overcome one by one. However, the high computational cost also dramatically limits the use of deep learning, and its computationally intensive characteristics are particularly prominent on platforms with limited computing resources such as mobile devices and embedded devices. High performance is the pursuit of modern deep learning training frameworks. Researchers use many operators when implementing algorithms, and many of these operators are non-computational, such as transpose, zip, conditional slice, irregular split/concatenate, and top-k. The fine-grained operator takes more and more time in the model training process. There are few operators in the classification model, and the operators in the detection and other models take up even more than 50% of the time. Therefore, the high performance of the training framework relies heavily on the overall optimization of these continuous operators. In this article, the team members will specify the compilation platform, architecture, and instruction set during the compilation process. It can make the compilation as close as possible to the characteristics of the Arm architecture. For the optimized compilation of Arm processors and their instruction sets, the experimental results will be compared with the performance indicators of the algorithms compiled from general instructions.
Practical-research-on-machine-learning
For the realization of a series of classic machine learning algorithms and simple models, the Python language is used, and the basic practice and visualization provided by some libraries of Numpy and Matplotlib are involved.
rainbow
🌈‒ the Ethereum wallet that lives in your pocket
Social-Media-Web-Application-Base-on-MERN
A Social Media Web Application Create by using React Framework, MongoDB, GraphQL
flash-attention
Fast and memory-efficient exact attention
CedricHwong's Repositories
CedricHwong/cedrichwong.github.io
My personal
CedricHwong/Cloud-Deployment-of-Deep-Learning-Networks
CedricHwong/gettingMean-2
Official Repository for the Manning Book Getting MEAN with MongoDB, Express, Angular and Node, 2nd Edition.
CedricHwong/Image_Server
CedricHwong/ImageSearchWebApp
CedricHwong/Practical-research-on-a-high-performance-convolution-operator-on-ARM-Processors-
Deep learning shines brightly in computer vision, and many problems that traditional methods cannot solve are being overcome one by one. However, the high computational cost also dramatically limits the use of deep learning, and its computationally intensive characteristics are particularly prominent on platforms with limited computing resources such as mobile devices and embedded devices. High performance is the pursuit of modern deep learning training frameworks. Researchers use many operators when implementing algorithms, and many of these operators are non-computational, such as transpose, zip, conditional slice, irregular split/concatenate, and top-k. The fine-grained operator takes more and more time in the model training process. There are few operators in the classification model, and the operators in the detection and other models take up even more than 50% of the time. Therefore, the high performance of the training framework relies heavily on the overall optimization of these continuous operators. In this article, the team members will specify the compilation platform, architecture, and instruction set during the compilation process. It can make the compilation as close as possible to the characteristics of the Arm architecture. For the optimized compilation of Arm processors and their instruction sets, the experimental results will be compared with the performance indicators of the algorithms compiled from general instructions.
CedricHwong/Practical-research-on-machine-learning
For the realization of a series of classic machine learning algorithms and simple models, the Python language is used, and the basic practice and visualization provided by some libraries of Numpy and Matplotlib are involved.
CedricHwong/rainbow
🌈‒ the Ethereum wallet that lives in your pocket
CedricHwong/Social-Media-Web-Application-Base-on-MERN
A Social Media Web Application Create by using React Framework, MongoDB, GraphQL