While quantizing deep neural networks (DNNs) to 8-bit fixed-point representations has become the de facto technique in modern inference accelerator designs, further improving hardware efficiency by reducing the bitwidth remains a major challenge due to (i) the significant loss in accuracy and (ii) the need for specialized hardware to operate on ultra-low-bitwidth data, which is not readily available in commodity devices. This work presents a hybrid approach that employs both the fine-grained configurable logic resources and the coarse-grained signal processing blocks in modern FPGAs by combining a novel restricted signed digit (RSD) representation, which utilizes a limited number of effectual bits, with the conventional 2's complement representation of weights. Depending on the availability of fine-grained and coarse-grained resources, the proposed framework encodes a subset of weights with RSD to enable a highly efficient bit-serial multiply-accumulate implementation using LUT resources. Furthermore, the number of effectual bits used in RSD is optimized so that the bit-serial hardware latency matches that of the bit-parallel operation on the coarse-grained resources, ensuring the highest run-time utilization of all on-chip resources. Experiments show that the proposed mixed signed digit (MSD) framework achieves a 1.23× speedup on the ResNet-18 model over the state-of-the-art, and a remarkable 4.78% higher accuracy on MobileNet-V2.
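The abstract does not spell out the RSD encoding itself, so the following is only a rough illustrative sketch of the general idea of capping the number of effectual (nonzero) digits in a signed-digit weight representation. It uses the standard non-adjacent form (NAF) as a stand-in for the paper's encoding, and the function names (`to_signed_digits`, `restrict_effectual_bits`, `from_signed_digits`) are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of an RSD-style encoding. Assumption: the weight is
# first expressed in the non-adjacent form (a minimal-weight signed-digit
# representation with digits in {-1, 0, +1}), then truncated so that only
# the k most significant nonzero ("effectual") digits survive. The paper's
# actual RSD algorithm may differ.

def to_signed_digits(n: int) -> list[int]:
    """Non-adjacent form of integer n, least-significant digit first."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)  # pick +1 or -1 so that (n - d) is divisible by 4
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def restrict_effectual_bits(digits: list[int], k: int) -> list[int]:
    """Keep only the k most significant nonzero digits (an approximation)."""
    out = [0] * len(digits)
    kept = 0
    for i in reversed(range(len(digits))):
        if digits[i] != 0 and kept < k:
            out[i] = digits[i]
            kept += 1
    return out

def from_signed_digits(digits: list[int]) -> int:
    """Decode a signed-digit list (LSB first) back to an integer."""
    return sum(d << i for i, d in enumerate(digits))

if __name__ == "__main__":
    w = 119                         # example 8-bit weight
    sd = to_signed_digits(w)        # [-1, 0, 0, -1, 0, 0, 0, 1] = 128 - 8 - 1
    rsd = restrict_effectual_bits(sd, k=2)
    print(from_signed_digits(sd))   # 119 (exact, 3 effectual digits)
    print(from_signed_digits(rsd))  # 120 (approximation with k=2: 128 - 8)
```

As the abstract reasons, a bit-serial multiply-accumulate unit built from LUTs need only iterate over the nonzero digit positions, so capping the effectual-digit count k bounds the serial latency, which is what allows it to be matched to the bit-parallel throughput of the coarse-grained DSP blocks.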