FPGA Overlay

Numerous researchers have demonstrated that FPGAs used as compute accelerators are an effective way to meet performance requirements across many application domains. However, the design productivity of developing FPGA accelerators remains much lower than that of a typical software development flow. Although high-level design tools may partly alleviate this shortcoming, the lengthy low-level FPGA implementation process, including synthesis, placement, and routing, still dramatically limits the number of compile-debug-edit cycles per day and hinders the widespread adoption of FPGAs.

In this project, we investigate the FPGA overlay, an effective FPGA design abstraction for both general dataflow computation and domain-specific scenarios.

For the general case, we have developed QuickDough, a framework for rapid generation of FPGA loop accelerators. It uses a soft coarse-grained reconfigurable array (SCGRA) overlay built on top of off-the-shelf FPGAs. QuickDough first compiles a high-level loop to the overlay through rapid operation scheduling, and then generates the FPGA accelerator bitstream by rapidly integrating the scheduling result with a pre-built overlay bitstream.
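To make the operation scheduling step more concrete, the following is a minimal sketch of a generic list scheduler that places the operations of a loop body onto time slots of a small array of processing elements. It is an illustration under stated assumptions, not QuickDough's actual scheduler: the function name `list_schedule`, the dictionary-based dataflow graph, the single-cycle operation latency, and the absence of any routing or memory constraints are all hypothetical simplifications.

```python
from collections import defaultdict


def list_schedule(dfg, num_pes):
    """Assign each operation a (cycle, pe) slot.

    dfg maps an operation name to the list of operations it depends on.
    Assumptions (hypothetical, for illustration only): every operation
    takes one cycle, any PE can run any operation, and at most num_pes
    operations are issued per cycle.
    """
    # Count unmet dependencies and record the consumers of each operation.
    pending = {op: len(deps) for op, deps in dfg.items()}
    consumers = defaultdict(list)
    for op, deps in dfg.items():
        for dep in deps:
            consumers[dep].append(op)

    ready = sorted(op for op, n in pending.items() if n == 0)
    schedule = {}
    cycle = 0
    while ready:
        # Issue as many ready operations as there are PEs in this cycle.
        issued, ready = ready[:num_pes], ready[num_pes:]
        for pe, op in enumerate(issued):
            schedule[op] = (cycle, pe)
        # Operations whose inputs are now all produced become ready
        # for the next cycle.
        for op in issued:
            for nxt in consumers[op]:
                pending[nxt] -= 1
                if pending[nxt] == 0:
                    ready.append(nxt)
        cycle += 1
    return schedule


# Loop body for y[i] = a[i] * b[i] + c[i] * d[i]
loop_body = {
    "mul0": [],
    "mul1": [],
    "add0": ["mul0", "mul1"],
}
schedule = list_schedule(loop_body, num_pes=2)
```

With two processing elements, both multiplications are issued in cycle 0 and the addition follows in cycle 1. Because a schedule of this kind is only a table of slots, it can in principle be loaded into an already-implemented overlay without re-running synthesis, placement, and routing, which is the source of the fast turnaround described above.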

Domain-specific overlays are also popular; they provide specialization for a family of parameterized applications. For example, we present FTDL, an FPGA-tailored deep learning overlay that is optimized for timing in the FPGA implementation and for hardware utilization in real-world cases.

Demonstration of the FPGA overlay framework.