AI Hardware and Applications
Hayden Kwok-Hay So
Dec 30, 2021
Deep learning (DL) algorithms have significantly advanced machine intelligence. However, their massive computing workload hinders broad adoption in real-world applications, because it prevents realtime implementation on low-power embedded platforms.
In this project, we investigated the design of parallel hardware accelerators for featured DL applications, i.e., convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
Using novel accelerator architectures implemented through FPGA/ASIC flows, we improved the processing throughput of the selected DL applications by over 20 times, which enables realtime performance.
Publications
Model-Platform Optimized Deep Neural Network Accelerator Generation through Mixed-integer Geometric Programming
Although there are distinct power-performance advantages in customizing an accelerator for a specific combination of FPGA platform …
Yuhao Ding, Jiajun Wu, Yizhao Gao, Maolin Wang, Hayden Kwok-Hay So
MSD: Mixing Signed Digit Representations for Hardware-efficient DNN Acceleration on FPGA with Heterogeneous Resources
While quantizing deep neural networks (DNNs) to 8-bit fixed point representations has become the de facto technique in modern …
Jiajun Wu, Jiajun Zhou, Yizhao Gao, Yuhao Ding, Ngai Wong, Hayden Kwok-Hay So
DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design
By eliminating compute operations intelligently based on the run time input, dynamic pruning (DP) promises to improve deep neural …
Yizhao Gao, Baoheng Zhang, Xiaojuan Qi, Hayden Kwok-Hay So
NITI: Training Integer Neural Networks Using Integer-only Arithmetic
Low bitwidth integer arithmetic has been widely adopted in hardware implementations of deep neural network inference applications. …
Maolin Wang, Seyedramin Rasoulinezhad, Philip H.W. Leong, Hayden Kwok-Hay So
HAO: Hardware-aware Neural Architecture Optimization for Efficient Inference
Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs. However, this …
Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden Kwok-Hay So, Kurt Keutzer
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, …
Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden Kwok-Hay So, Xuehai Qian, Yanzhi Wang, Xue Lin
Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network
Real-time in situ image analytics impose stringent latency requirements on intelligent neural network inference operations. While …
Maolin Wang, Kelvin C.M. Lee, Bob M.F. Chung, Sharatchandra Varma Bogaraju, Ho-Cheung Ng, Justin S.J. Wong, Ho Cheung Shum, Kevin K. Tsia, Hayden Kwok-Hay So
Vision Guided Crop Detection in Field Robots using FPGA-Based Reconfigurable Computers
A case study in applying modern FPGAs as a platform to accelerate intelligent vision-guided crop detection in agricultural field robots …
Cyrus Wing-Hei Chan, Philip H.W. Leong, Hayden Kwok-Hay So
FTDL: A tailored FPGA-overlay for deep learning with high scalability
Fast inference is of paramount value to a wide range of deep learning applications. To address the architecture and hardware mismatch …
Runbin Shi, Yuhao Ding, Xuechao Wei, He Li, Hang Liu, Hayden Kwok-Hay So, Caiwen Ding
CSB-RNN: A faster-than-realtime RNN acceleration framework with compressed structured blocks
Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. …
Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden Kwok-Hay So, Martin Herbordt, Ang Li, Yanzhi Wang
Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and …
Junjie Liu, Zhe Xu, Runbin Shi, Ray C.C. Cheung, Hayden Kwok-Hay So
FTDL: An FPGA-tailored architecture for deep learning systems
Hardware acceleration of deep learning (DL) systems has been increasingly studied to achieve desirable performance and energy …
Runbin Shi, Yuhao Ding, Xuechao Wei, Hang Liu, Hayden Kwok-Hay So, Caiwen Ding
A super real-time RNN framework with compressed structured block
This paper presents CSB-RNN, an optimized full-stack RNN framework with the novel compressed structured block (CSB) technique. The …
Runbin Shi, Peiyan Dong, Tong Geng, Martin Herbordt, Hayden Kwok-Hay So, Yanzhi Wang
Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks
Elasticities in depth, width, kernel size and resolution have been explored in compressing deep neural networks (DNNs). Recognizing …
Jie Ran, Rui Lin, Hayden Kwok-Hay So, Graziano Chesi, Ngai Wong
E-LSTM: Efficient inference of sparse LSTM on embedded heterogeneous system
Various models with Long Short-Term Memory (LSTM) network have demonstrated prior art performances in sequential information …
Runbin Shi, Junjie Liu, Hayden Kwok-Hay So, Shuo Wang, Yun Liang
Large-scale multi-class image-based cell classification with deep learning
Recent advances in ultra-high-throughput microscopy have enabled a new generation of cell classification methodologies using …
Nan Meng, Edmund Lam, Kevin Tsia, Hayden Kwok-Hay So
NnCore: A parameterized non-linear function generator for machine learning applications in FPGAs
Efficient implementation of machine learning applications on FPGAs often …
Sam M.H. Ho, Hayden Kwok-Hay So
Image super-resolution for ultrafast optical time-stretch imaging
We report on a super-resolution scheme for optical time-stretch imaging. It is particularly applicable to ultrafast flow imaging, but …
Runbin Shi, Antony Chan, Edmund Lam, Hayden Kwok-Hay So
A parameterizable activation function generator for FPGA-based neural network applications
Neural network applications on FPGAs at times require arithmetic operators that are either not available in the manufacturer’s …
Sam M.H. Ho, C.H. Dominic Hung, Ho-Cheung Ng, Maolin Wang, Hayden Kwok-Hay So
Real-time object detection and classification for high-speed asymmetric-detection time-stretch optical microscopy on FPGA
A real-time object detection and classification system using FPGA developed for high-speed asymmetric time-stretched optical microscopy …
Maolin Wang, Ho-Cheung Ng, Bob MF Chung, B. Sharat Chandra Varma, Manish Kumar Jaiswal, Kevin K. Tsia, Ho Cheung Shum, Hayden Kwok-Hay So
Computationally Efficient Hyperspectral Data Learning Based on the Doubly Stochastic Dirichlet Process
The Dirichlet process (DP) prior is effective in modeling hyperspectral images (HSIs) and identifying land-cover classes. However, modeling a …
Xing Sun, Nelson Yung, Edmund Lam, Hayden Kwok-Hay So