# intel_assignment

This source code was written for an Intel interview assignment.
## Running

```sh
sh run.sh
```

or

```sh
gcc -std=gnu99 -mavx2 -mfma -mfma4 -fopenmp \
    trace.c tensor.c im2col.c conv2d.c pooling.c relu.c main.c \
    -o main.o -lm \
    && ./main.o
```

or, with compiler optimizations enabled:

```sh
gcc -O3 -std=gnu99 -mavx2 -mfma -mfma4 -fopenmp \
    trace.c tensor.c im2col.c conv2d.c pooling.c relu.c main.c \
    -o main.o -lm \
    && ./main.o
```
## Tensor storage

In my design, every matrix corresponds to a one-dimensional array, as shown in Figure 1. Given the 3-D matrix with shape 3x4x5 on the left side, my program stores each element into memory along the channel direction, forming the sequential one-dimensional space on the right side of Figure 1. A 4-D matrix is stored in almost the same way as a 3-D matrix, as shown in Figure 2. I have put the index on each element square, which I hope helps with understanding.
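As a concrete example of this mapping, here is a minimal sketch of the offset computation, assuming the channel is the slowest-varying dimension (all of channel 0 first, then channel 1, and so on); the exact traversal order is the one drawn in Figure 1, and the helper name `tensor_index` is hypothetical, not an identifier from this repository.

```c
/* Hypothetical helper (not from tensor.c): map coordinate (c, h, w) of a
 * C x H x W tensor onto its offset in the flat one-dimensional array,
 * assuming channel-major order as drawn in Figure 1. */
static inline int tensor_index(int c, int h, int w, int H, int W)
{
    return c * H * W + h * W + w;
}
```

For the 3x4x5 example, `tensor_index(2, 3, 4, 4, 5)` returns 59, the last slot of the 60-element array.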
Notably, I use a single struct to represent both 3-D and 4-D matrices, as shown in the following code block. It is easy to see how this struct represents a 3-D matrix. However, in my design, a 4-D matrix is also stored in this struct, which means the fourth dimension has to be represented implicitly. As shown in Figure 3, the OUTSIDE format is what you can see, represented by my Tensor struct, whose shape is [Input Channel, Output Channel, Kernel Size^2], a 3-D shape. Meanwhile, the internal shape of this 4-D matrix is [Input Channel, Output Channel, Kernel Size, Kernel Size], which you have to imagine. Their storage sequences in memory are exactly the same.
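Since the struct definition itself is not reproduced in this text, the block below is a minimal sketch consistent with the description above; the field names `shape` and `data` are assumptions, not necessarily the identifiers used in tensor.c.

```c
/* Minimal sketch of the Tensor struct described above; the field names
 * are assumptions, not necessarily those used in tensor.c. */
typedef struct {
    int shape[3];   /* outside, 3-D shape; for a 4-D convolution kernel this
                     * holds [in_ch, out_ch, ksize * ksize], while the internal
                     * [in_ch, out_ch, ksize, ksize] view stays implicit */
    float *data;    /* flat one-dimensional storage, ordered as in Figure 1 */
} Tensor;
```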
## Output example

```text
Name: conv2d
Average cycle : 859606612.6
Average second: 3.737420e-01
GFlop : 0.0720
GFlop/s : 0.1926

Name: conv2d_omp
Average cycle : 123192591.4
Average second: 5.356200e-02
GFlop : 0.0720
GFlop/s : 1.3442

Name: conv2d_omp_im2col_locality
Average cycle : 67256949.2
Average second: 2.924215e-02
GFlop : 0.0720
GFlop/s : 2.4620

Name: conv2d_simd_fma_omp_im2col_locality
Average cycle : 55251988.8
Average second: 2.402260e-02
GFlop : 0.0720
GFlop/s : 2.9970
```
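For context on how such numbers are obtained, below is a sketch of a measurement harness in the spirit of reference [7]; it is not the actual trace.c code, and the `benchmark` function and its parameters are assumptions. Cycles come from the `__rdtsc()` intrinsic, wall time from `clock_gettime`, and GFlop/s is simply GFlop divided by the average seconds (e.g. 0.0720 / 2.402260e-02 ≈ 2.9970 for the last entry above).

```c
/* Hypothetical benchmark harness, not the actual trace.c implementation:
 * averages cycles and seconds over several runs and derives
 * GFlop/s = gflop / (average seconds). */
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>   /* __rdtsc() */

static void benchmark(const char *name, void (*fn)(void),
                      double gflop, int runs)
{
    double cycles = 0.0, seconds = 0.0;
    for (int i = 0; i < runs; i++) {
        unsigned long long c0 = __rdtsc();
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn();                               /* kernel under test */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        cycles  += (double)(__rdtsc() - c0);
        seconds += (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    }
    printf("Name: %s\n", name);
    printf("Average cycle : %.1f\n", cycles / runs);
    printf("Average second: %e\n", seconds / runs);
    printf("GFlop : %.4f\n", gflop);
    printf("GFlop/s : %.4f\n", gflop / (seconds / runs));
}
```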
## Reference
- [1] https://sahnimanas.github.io/post/anatomy-of-a-high-performance-convolution/
- [2] https://github.com/BVLC/caffe
- [3] https://github.com/pytorch/pytorch
- [4] https://software.intel.com/content/www/us/en/develop/articles/a-simple-example-to-measure-the-performance-of-an-intel-mkl-function.html
- [5] https://arxiv.org/pdf/1808.05567.pdf
- [6] https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-sse-vector-sum-or-other-reduction
- [7] https://www.intel.com/content/dam/support/us/en/documents/processors/APP-for-Intel-Xeon-Processors.pdf
## To-do

- MatMul with tiling
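As a sketch of the planned technique (the function name, tile size `T`, and row-major operands are assumptions, not code from this repository), loop tiling blocks the matmul loops so each T x T sub-block stays cache-resident while it is reused:

```c
/* Hypothetical sketch of tiled (blocked) matmul C += A * B, where A is
 * MxK, B is KxN, C is MxN, all row-major. The tile size T is a tunable
 * assumption, chosen so a few T x T blocks fit in cache. */
#define T 32

static void matmul_tiled(int M, int N, int K,
                         const float *A, const float *B, float *C)
{
    for (int i0 = 0; i0 < M; i0 += T)
        for (int k0 = 0; k0 < K; k0 += T)
            for (int j0 = 0; j0 < N; j0 += T)
                /* compute the (i0, j0) block of C using the
                 * (i0, k0) block of A and the (k0, j0) block of B */
                for (int i = i0; i < M && i < i0 + T; i++)
                    for (int k = k0; k < K && k < k0 + T; k++) {
                        float a = A[i * K + k];
                        for (int j = j0; j < N && j < j0 + T; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```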