AI Hardware and Applications

Deep learning (DL) algorithms have delivered significant improvements in machine intelligence. However, the main obstacle to broad adoption of DL techniques in real-world applications is their massive computational workload, which prevents real-time implementation on low-power embedded platforms.
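To give a sense of the scale of this workload, the sketch below estimates the multiply-accumulate (MAC) count of a single convolutional layer; the layer dimensions are hypothetical and chosen only for illustration, not taken from the project itself.

```python
# Rough estimate of the multiply-accumulate (MAC) operations in one
# convolutional layer: each output element needs K*K*C_in MACs.
# The layer shape below is hypothetical (roughly VGG-like).
h_out, w_out = 56, 56     # output feature-map height and width
c_in, c_out = 128, 256    # input and output channels
k = 3                     # square kernel size

macs = h_out * w_out * c_in * c_out * k * k
print(f"MACs for this single layer: {macs / 1e9:.2f} GMAC")
# A full CNN stacks dozens of such layers, so a single inference easily
# reaches tens of GMACs -- well beyond what a low-power embedded CPU
# can sustain at real-time frame rates.
```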

In this project, we investigated the design of parallel hardware accelerators for representative DL applications, i.e., convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
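As a rough illustration of the parallelism such accelerators exploit, the sketch below shows a plain convolution loop nest and marks the loops that a hardware design would typically unroll into an array of parallel MAC units. It is a simplified software model written for explanation only, not the actual accelerator architecture developed in the project.

```python
import numpy as np

def conv2d_reference(x, w):
    """Plain convolution loop nest (no padding, stride 1).

    x: input feature map, shape (C_in, H, W)
    w: weights,           shape (C_out, C_in, K, K)

    In a hardware accelerator, the output-channel and input-channel
    loops are the ones typically unrolled into parallel MAC units,
    while the spatial loops stream data through the array.
    """
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    y = np.zeros((c_out, h - k + 1, wd - k + 1))
    for co in range(c_out):               # unrolled across MAC units
        for i in range(h - k + 1):        # streamed
            for j in range(wd - k + 1):   # streamed
                for ci in range(c_in):    # unrolled across MAC units
                    y[co, i, j] += np.sum(x[ci, i:i+k, j:j+k] * w[co, ci])
    return y

# Small hypothetical example
x = np.random.rand(4, 8, 8)
w = np.random.rand(8, 4, 3, 3)
print(conv2d_reference(x, w).shape)  # (8, 6, 6)
```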

With the novel accelerator architectures and their implementation through FPGA/ASIC design flows, we improved the processing throughput of the selected DL applications by over 20 times, enabling real-time performance.