VW-SDK: Efficient Convolutional Weight Mapping Using Variable Windows for Processing-In-Memory Architectures.
With their high energy efficiency, processing-in-memory (PIM) arrays are increasingly being used for convolutional neural network (CNN) inference. In PIM-based CNN inference, the computational latency and energy are dependent on how the CNN weights are mapped to the PIM array. A recent study proposed shifted and duplicated kernel (SDK) mapping that reuses the input feature maps with a unit of a parallel window, which is convolved with duplicated kernels to obtain multiple output elements in parallel. However, the existing SDK-based mapping algorithm does not always result in the minimum computing cycles because it only maps a square-shaped parallel window with the entire channels. In this paper, we introduce a novel mapping algorithm called variable-window SDK (VW-SDK), which adaptively determines the shape of the parallel window that leads to the minimum computing cycles for a given convolutional layer and PIM array. By allowing rectangular-shaped windows with partial channels, VW-SDK utilizes the PIM array more efficiently, thereby further reduces the number of computing cycles. The simulation with a 512x512 PIM array and Resnet-18 shows that VW-SDK improves the inference speed by 1.69x compared to the existing SDK-based algorithm.
- python3.x+
- This code calculates the computing cycle according to the mapping methods (im2col, SDK, VW-SDK).
- You have to input some parameters for calculation using Parser.
- This code includes functions for VW-SDK and shows the information of the network and PIM array.
- This code calculates the computing cycle according to the mapping methods (im2col, SDK, VW-SDK).
With list, you have to input the PIM array size. ex) [rows, columns]
And then, choice the network.
In these form, you can input other networks.
These tables show the computing cycles according to the convolutional layer.
We set the array size to 512x512.
Image size | Conv layer | Im2col | SDK | VW-SDK | Optimal parallel window |
---|---|---|---|---|---|
224x224 | 3x3x3x64 | 49284 | 12321 | 6216 | 10x3x3x64 |
224x224 | 3x3x64x64 | 98568 | 24642 | 24642 | 4x4x32x128 |
112x112 | 3x3x64x128 | 24200 | 6050 | 6050 | 4x4x32x128 |
112x112 | 3x3x128x128 | 36300 | 36300 | 12100 | 4x4x32x128 |
56x56 | 3x3x128x256 | 8748 | 8748 | 5832 | 4x3x42x256 |
56x56 | 3x3x256x256 | 14580 | 14580 | 10206 | 4x3x42x256 |
Image size | Conv layer | Im2col | SDK | VW-SDK | Optimal parallel window |
---|---|---|---|---|---|
112x112 | 7x7x3x64 | 11236 | 2809 | 1431 | 10x8x3x64 |
56x56 | 3x3x64x64 | 5832 | 1458 | 1458 | 4x4x32x64 |
28x28 | 3x3x128x128 | 2028 | 2028 | 676 | 4x4x32x128 |
14x14 | 3x3x256x256 | 720 | 720 | 504 | 4x3x42x256 |
You can read the original pdf here
Each kernel with size KxKxIC (where K is kernel, IC is input channel) is unrolled into the column. A kernel-sized window in an input feature map (IFM) is convolved with the kernel.
You can read the original pdf here
This mapping computes multiple windows instead of single window simultaneously in each cycle. To reuses the input data, this method forms the parallel window that is a set of windows. Thus, it obtains multiple output feature maps (OFMs) by utilizing the PIM array.
You can read the original pdf here
You can read the original pdf here