MMALIB Datasheet
Performance Summary

Performance data was obtained on the J7AEP EVM. EVM warm cycle counts were obtained by profiling the kernel's compute code after a cold run of the same code. Refer to each kernel's documentation for more information about the parameters in the tables below.
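The cold-then-warm ordering described above can be illustrated with a minimal host-side sketch. This is not the profiling harness used on the EVM: the function name `measure_cold_and_warm` is hypothetical, and wall-clock nanoseconds from Python's `time.perf_counter_ns` stand in for the hardware cycle counts reported in the tables.

```python
import time


def measure_cold_and_warm(kernel, *args):
    """Run `kernel` twice and return (cold_ns, warm_ns).

    The first call is the cold run (caches and other state not yet primed).
    The second call is the warm run, which corresponds to the kind of
    measurement reported as "warm cycles". Units here are nanoseconds of
    wall-clock time, an assumption made for illustration only.
    """
    start = time.perf_counter_ns()
    kernel(*args)                      # cold run
    cold_ns = time.perf_counter_ns() - start

    start = time.perf_counter_ns()
    kernel(*args)                      # warm run of the same code
    warm_ns = time.perf_counter_ns() - start

    return cold_ns, warm_ns


if __name__ == "__main__":
    # Stand-in workload; any callable can be passed in.
    cold, warm = measure_cold_and_warm(sum, range(100_000))
    print(f"cold: {cold} ns, warm: {warm} ns")
```

Only the second measurement is comparable to the warm-cycle columns. The first includes one-time costs that the tables exclude.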



CNN kernels

This section contains tables that depict the expected performance numbers for the CNN kernels.

Row-wise Convolution (MMALIB_CNN_convolveBias_row_ixX_ixX_oxX)

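| Bit Width | Stride | Fr | Fc | Ni | No | inWidth | validColsPerRowIn | validRowsIn | ValidColsIn | inChOffset | Bias | quantMethod | Pad Left | Pad Right | Pad Top | Pad Bottom | EVM Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1 | 3 | 3 | 256 | 64 | 64 | NA | NA | 768 | 1024 | Yes | Yes | 1 | 1 | 0 | 0 | 25261 |
| 8 | 1 | 5 | 5 | 64 | 64 | 64 | NA | NA | 1280 | 2048 | Yes | Yes | 2 | 2 | 0 | 0 | 27441 |
| 8 | 1 | 1 | 1 | 64 | 128 | 256 | NA | NA | 1024 | 1088 | Yes | Yes | 0 | 0 | 0 | 0 | 2657 |
| 8 | 2 | 5 | 5 | 12 | 64 | 128 | 128 | 18 | NA | 3072 | Yes | Yes | 2 | 2 | 2 | 2 | 3516 |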

Depth-wise Convolution (MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX)

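| Bit Width | Dilation Y | Top Pad | Bottom Pad | Left Pad | Right Pad | Pad Fill Value | Lc | Lr | Num. In Channels | Num. Out Channels | Num. Groups Per Kernel | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1 | 1 | 1 | 1 | 1 | 0 | 256 | 256 | 1 | 1 | 4 | 4823 |
| 8 | 1 | 2 | 2 | 2 | 2 | 0 | 256 | 256 | 1 | 1 | 4 | 4745 |
| 8 | 1 | 1 | 1 | 1 | 1 | -87 | 512 | 128 | 1 | 1 | 4 | 6884 |
| 16 | 1 | 2 | 2 | 2 | 2 | -17 | 256 | 128 | 1 | 1 | 4 | 11160 |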

Depth-wise Convolution with High Precision (MMALIB_CNN_convolve_col_smallNo_highPrecision)

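| Bit Width | kernelWidth | kernelHeight | strideX | strideY | dilationX | Dilation Y | Top Pad | Bottom Pad | Left Pad | Right Pad | Pad Fill Value | Lc | Lr | Num. In Channels | Num. Out Channels | Num. Groups Per Kernel | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 256 | 256 | 1 | 1 | 2 | 2529 |
| 8 | 5 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 0 | 256 | 256 | 1 | 1 | 2 | 2517 |
| 8 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | -56 | 512 | 128 | 1 | 1 | 2 | 3584 |
| 16 | 5 | 5 | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 70 | 256 | 128 | 1 | 1 | 2 | 5728 |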

Fully-connected Layer (MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX)

| Bit Width | Batch Size | Ni | No | Stride Y | Stride X | Stride H | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|
| 8 | 64 | 1280 | 128 | 64 | 1600 | 81920 | 3214 |
| 16 | 64 | 1280 | 32 | 64 | 2624 | 81920 | 3110 |


LINALG kernels

This section contains tables that depict expected performance numbers for the linear algebra kernels.

Matrix-matrix Multiplication

| Bit Width | m | k | n | Stride C | Stride A | Stride B | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|
| 8 | 255 | 255 | 255 | 320 | 320 | 320 | 4521 |
| 8 | 2048 | 64 | 64 | 64 | 64 | 64 | 2306 |
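To put the warm cycle counts in perspective, they can be converted into an effective throughput figure. The sketch below is illustrative arithmetic only. It assumes that m, k, and n are the usual matrix dimensions, so one product takes m × k × n multiply-accumulates (MACs), and that the reported cycle count covers the whole product. The helper name `macs_per_cycle` is hypothetical and is not part of MMALIB.

```python
def macs_per_cycle(m: int, k: int, n: int, cycles: int) -> float:
    """Effective multiply-accumulates per cycle for an (m x k) by (k x n) product."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    # Each of the m * n output elements needs k multiply-accumulates.
    return (m * k * n) / cycles


if __name__ == "__main__":
    # (m, k, n, EVM warm cycles) copied from the 8-bit rows of the table above.
    rows = [
        (255, 255, 255, 4521),
        (2048, 64, 64, 2306),
    ]
    for m, k, n, cycles in rows:
        rate = macs_per_cycle(m, k, n, cycles)
        print(f"m={m} k={k} n={n}: {rate:.1f} MACs/cycle")
```

Under these assumptions, both rows work out to roughly 3.6 to 3.7 thousand MACs per cycle.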

Matrix-matrix Multiply and Accumulate

| Bit Width | m | k | n | EVM Warm Cycles |
|---|---|---|---|---|
| 8 | 65 | 257 | 257 | 5190 |
| 32 | 65 | 17 | 257 | 9484 |

Pointwise Matrix-matrix Multiply

| Bit Width | m | n | EVM Warm Cycles |
|---|---|---|---|
| 8 | 1025 | 65 | 2228 |
| 16 | 513 | 33 | 1193 |
| 32 | 255 | 63 | 1211 |

Matrix Transpose

| Bit Width | M | N | EVM Warm Cycles |
|---|---|---|---|
| 32 | 56 | 56 | 412 |
| 8 | 256 | 256 | 1470 |


DSP kernels

Shown below are the expected performance numbers for the FIR filter kernel.

FIR Filter

| Bit Width | Data Size | Batch Size | Filter Size | EVM Warm Cycles |
|---|---|---|---|---|
| 32 | 2048 | 1 | 512 | 3840 |
| 32 | 4096 | 4 | 16 | 2231 |


FFT kernels

Shown below are the expected performance numbers for the FFT kernel.

| Bit Width | FFT Size | Batch Size | EVM Warm Cycles |
|---|---|---|---|
| 16 | 256 | 16 | 1060 |
| 16 | 4096 | 1 | 1668 |