CSB-RNN: A faster-than-realtime RNN acceleration framework with compressed structured blocks

Abstract

Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from a heavy computational workload because the models often come with large weight matrices. Pruning schemes (a class of model compression methods) have been proposed for RNNs to eliminate redundant (close-to-zero) weight values. On one hand, non-structured pruning methods achieve a high pruning rate but introduce computational irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from a low pruning rate due to strict constraints on the allowable pruning structures.
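The trade-off between the two pruning styles can be illustrated with a toy sketch. The NumPy example below is illustrative only; the block size, thresholds, and norm-based keep criterion are assumptions for demonstration, not the compressed-structured-block (CSB) scheme proposed in the paper. It contrasts element-wise pruning, which yields random sparsity, with block-wise pruning, which yields a regular, hardware-friendly zero pattern at a lower pruning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # toy RNN weight matrix

# Non-structured (element-wise) pruning: drop the smallest-magnitude
# weights individually. High pruning rate, but the surviving weights
# are scattered irregularly (random sparsity), which parallel
# hardware struggles to exploit.
threshold = np.quantile(np.abs(W), 0.75)       # prune 75% of elements
W_unstructured = np.where(np.abs(W) > threshold, W, 0.0)

# Structured (block-wise) pruning: drop entire BxB blocks based on
# their norm. The zero pattern is regular, but each keep/drop decision
# covers a whole block, which limits the achievable pruning rate.
B = 4
blocks = W.reshape(W.shape[0] // B, B, W.shape[1] // B, B)
norms = np.linalg.norm(blocks, axis=(1, 3))    # norm of each BxB block
keep = norms > np.median(norms)                # keep the strongest half
W_structured = (blocks * keep[:, None, :, None]).reshape(W.shape)
```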

Publication
2020 International Conference on Supercomputing
Yuhao Ding
PhD Candidate
Hayden Kwok-Hay So
Associate Professor