Motivation

Convolutional Neural Networks (CNNs) are used across a wide range of applications because of their record-breaking accuracy. However, they are computationally expensive, and microring resonator (MRR) based CNN accelerators offer better energy efficiency and throughput than electronic accelerators. MRR-based accelerators perform convolution operations by transforming them into vector dot product (VDP) operations. The VDP size, the number of VDP operations, and the supported bit precision of these accelerators are limited [Scalability Analysis]. CNN VDP sizes, however, vary drastically within and across models. Because the VDP size of an MRR-based accelerator is fixed, this variation leads to underutilization of MRRs whenever a layer's VDP size is smaller than the size the accelerator supports. Prior work [Reconfigurable Accelerator] proposed a reconfigurable accelerator that employs comb switches to reduce this underutilization and improve throughput. However, for CNN layers whose VDP size exceeds the accelerator's supported VDP size, the GeMM compiler breaks each VDP into smaller VDPs that match the supported size. Each smaller VDP produces a partial sum, and a partial-sum reduction network is employed to obtain the final result. In addition, because of the limited bit precision, accumulating these partial sums incurs a large latency, which decreases the throughput and energy efficiency of these accelerators. Therefore, reducing this partial-sum latency can improve the throughput and energy efficiency of these accelerators.
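The decomposition described above can be illustrated with a minimal Python sketch. It is not the GeMM compiler's actual procedure: the function names `split_vdp` and `mrr_utilization`, the example sizes, and the utilization formula (useful elements divided by occupied MRR slots) are assumptions made only for illustration.

```python
import math


def split_vdp(inputs, weights, n):
    """Break one VDP into chunks of at most n elements (n = supported VDP size,
    a hypothetical parameter) and return the partial sum of each chunk."""
    assert len(inputs) == len(weights)
    partial_sums = []
    for start in range(0, len(inputs), n):
        chunk_x = inputs[start:start + n]
        chunk_w = weights[start:start + n]
        partial_sums.append(sum(x * w for x, w in zip(chunk_x, chunk_w)))
    return partial_sums


def mrr_utilization(s, n):
    """Assumed metric: fraction of MRR slots doing useful work when a VDP of
    size s is mapped onto VDP elements of fixed size n."""
    passes = math.ceil(s / n)      # number of smaller VDPs needed
    return s / (passes * n)


# A VDP of size 5 on an accelerator that supports size 2 yields 3 partial sums.
partials = split_vdp([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 2)
final = sum(partials)              # stands in for the reduction network
```

With these illustrative numbers, a layer needing a VDP of size 9 on a size-36 element uses a quarter of the MRRs, while a layer needing size 40 requires two passes and a reduction step.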

Idea

Generally, MRR-based accelerators employ digital reduction networks to add the partial sums. We aim to relax the requirement for these reduction networks by using a photodetector (PD) based time-integrating receiver (TIR) to add the partial sums [TIR]. This can be achieved by replacing the balanced PDs in current MRR-based accelerators with TIR circuits. However, this design would restrict the partial VDPs of a given VDP to be mapped onto the same VDP element, which reduces the throughput of the final VDP. In addition, we provide the flexibility to add different combinations of partial sums.
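The contrast between the two reduction schemes can be sketched with a purely behavioral Python model. It ignores all circuit details (noise, integration capacity, bit precision), and the names `digital_reduction`, `tir_reduction`, and `group_size` are hypothetical. The assumption that the digital baseline needs one readout per partial sum, while the TIR needs one readout per accumulated group, is ours and is not taken from a specific design.

```python
def digital_reduction(partial_sums):
    """Baseline sketch: every partial sum is read out separately and then
    added by a digital reduction network (assumed: one readout per sum)."""
    total = 0
    for p in partial_sums:
        total += p
    return total, len(partial_sums)


def tir_reduction(partial_sums, group_size=None):
    """TIR sketch: partial sums arriving in consecutive time slots on the
    same VDP element accumulate before a single readout per group.
    group_size (hypothetical) selects which partial sums are combined."""
    if group_size is None:
        group_size = len(partial_sums)
    results = []
    for start in range(0, len(partial_sums), group_size):
        charge = 0.0
        for p in partial_sums[start:start + group_size]:
            charge += p            # integrate one time slot
        results.append(charge)     # single readout, then reset
    return results, len(results)


total, digital_readouts = digital_reduction([3, 7, 5])
accumulated, tir_readouts = tir_reduction([3, 7, 5])
```

In this model the partial sums of a group must arrive sequentially at one receiver, which mirrors the restriction that partial VDPs are mapped onto the same VDP element; choosing a smaller `group_size` illustrates adding different combinations of partial sums.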

Queries

  • Motivation for reconfigurability is not clear