AI Hardware and Applications

Hayden Kwok-Hay So

Dec 30, 2021

Deep learning (DL) algorithms have significantly advanced machine intelligence. However, their massive computing workload hinders broad adoption in real-world applications, because it prevents realtime implementation on low-power embedded platforms.

In this project, we investigated the design of parallel hardware accelerators for selected DL applications, namely convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

With novel accelerator architectures implemented through FPGA/ASIC flows, we improved the processing throughput of the selected DL applications by over 20 times, enabling realtime performance.

Publications

Model-Platform Optimized Deep Neural Network Accelerator Generation through Mixed-integer Geometric Programming

Although there are distinct power-performance advantages in customizing an accelerator for a specific combination of FPGA platform …

Yuhao Ding, Jiajun Wu, Yizhao Gao, Maolin Wang, Hayden Kwok-Hay So

PDF Project

MSD: Mixing Signed Digit Representations for Hardware-efficient DNN Acceleration on FPGA with Heterogeneous Resources

While quantizing deep neural networks (DNNs) to 8-bit fixed point representations has become the de facto technique in modern …

Jiajun Wu, Jiajun Zhou, Yizhao Gao, Yuhao Ding, Ngai Wong, Hayden Kwok-Hay So

PDF Project

DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design

By eliminating compute operations intelligently based on the run time input, dynamic pruning (DP) promises to improve deep neural …

Yizhao Gao, Baoheng Zhang, Xiaojuan Qi, Hayden Kwok-Hay So

PDF Code Project DOI

NITI: Training Integer Neural Networks Using Integer-only Arithmetic

Low bitwidth integer arithmetic has been widely adopted in hardware implementations of deep neural network inference applications. …

Maolin Wang, Seyedramin Rasoulinezhad, Philip H.W. Leong, Hayden Kwok-Hay So

PDF Code Project DOI

HAO: Hardware-aware Neural Architecture Optimization for Efficient Inference

Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs. However, this …

Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden Kwok-Hay So, Kurt Keutzer

Project DOI

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, …

Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden Kwok-Hay So, Xuehai Qian, Yanzhi Wang, Xue Lin

Project DOI

Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network

Real-time in situ image analytics impose stringent latency requirements on intelligent neural network inference operations. While …

Maolin Wang, Kelvin C.M. Lee, Bob M.F. Chung, Sharatchandra Varma Bogaraju, Ho-Cheung Ng, Justin S.J. Wong, Ho Cheung Shum, Kevin K. Tsia, Hayden Kwok-Hay So

Project DOI

Vision Guided Crop Detection in Field Robots using FPGA-Based Reconfigurable Computers

A case study in applying modern FPGAs as a platform to accelerate intelligent vision-guided crop detection in agricultural field robots …

Cyrus Wing-Hei Chan, Philip H.W. Leong, Hayden Kwok-Hay So

Project DOI

FTDL: A tailored FPGA-overlay for deep learning with high scalability

Fast inference is of paramount value to a wide range of deep learning applications. To address the architecture and hardware mismatch …

Runbin Shi, Yuhao Ding, Xuechao Wei, He Li, Hang Liu, Hayden Kwok-Hay So, Caiwen Ding

PDF Code Project Project

CSB-RNN: A faster-than-realtime RNN acceleration framework with compressed structured blocks

Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. …

Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden Kwok-Hay So, Martin Herbordt, Ang Li, Yanzhi Wang

PDF Project DOI

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and …

Junjie Liu, Zhe Xu, Runbin Shi, Ray C.C. Cheung, Hayden Kwok-Hay So

PDF Code Project DOI

FTDL: An FPGA-tailored architecture for deep learning systems

Hardware acceleration of deep learning (DL) systems has been increasingly studied to achieve desirable performance and energy …

Runbin Shi, Yuhao Ding, Xuechao Wei, Hang Liu, Hayden Kwok-Hay So, Caiwen Ding

PDF Code Project Project Poster Slides DOI

A super real-time RNN framework with compressed structured block

This paper presents CSB-RNN, an optimized full-stack RNN framework with the novel compressed structured block (CSB) technique. The …

Runbin Shi, Peiyan Dong, Tong Geng, Martin Herbordt, Hayden Kwok-Hay So, Yanzhi Wang

PDF Project Slides

Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks

Elasticities in depth, width, kernel size and resolution have been explored in compressing deep neural networks (DNNs). Recognizing …

Jie Ran, Rui Lin, Hayden Kwok-Hay So, Graziano Chesi, Ngai Wong

Project DOI

E-LSTM: Efficient inference of sparse LSTM on embedded heterogeneous system

Various models with Long Short-Term Memory (LSTM) network have demonstrated prior art performances in sequential information …

Runbin Shi, Junjie Liu, Hayden Kwok-Hay So, Shuo Wang, Yun Liang

PDF Code Project Poster Slides DOI

Large-scale multi-class image-based cell classification with deep learning

Recent advances in ultra-high-throughput microscopy have enabled a new generation of cell classification methodologies using …

Nan Meng, Edmund Lam, Kevin Tsia, Hayden Kwok-Hay So

Project DOI

NnCore: A parameterized non-linear function generator for machine learning applications in FPGAs

Efficient implementation of machine learning applications on FPGAs often …

Sam M.H. Ho, Hayden Kwok-Hay So

PDF Code Project DOI

Image super-resolution for ultrafast optical time-stretch imaging

We report on a super-resolution scheme for optical time-stretch imaging. It is particularly applicable to ultrafast flow imaging, but …

Runbin Shi, Antony Chan, Edmund Lam, Hayden Kwok-Hay So

PDF Project Slides

A parameterizable activation function generator for FPGA-based neural network applications

Neural network applications on FPGAs at times require arithmetic operators that are either not available in the manufacturer’s …

Sam M.H. Ho, C.H. Dominic Hung, Ho-Cheung Ng, Maolin Wang, Hayden Kwok-Hay So

PDF Project DOI

Real-time object detection and classification for high-speed asymmetric-detection time-stretch optical microscopy on FPGA

A real-time object detection and classification system using FPGA developed for high-speed asymmetric time-stretched optical microscopy …

Maolin Wang, Ho-Cheung Ng, Bob MF Chung, B. Sharat Chandra Varma, Manish Kumar Jaiswal, Kevin K. Tsia, Ho Cheung Shum, Hayden Kwok-Hay So

Project Project DOI

Computationally Efficient Hyperspectral Data Learning Based on the Doubly Stochastic Dirichlet Process

The Dirichlet process (DP) prior is effective in modeling hyperspectral images (HSIs) and identifying land-cover classes. However, modeling a …

Xing Sun, Nelson Yung, Edmund Lam, Hayden Kwok-Hay So

Project DOI