Performance data was obtained on the J7AEP EVM. The EVM warm-cycle counts were obtained by profiling the execution of the kernel's compute code after a cold run of the same code. Refer to each kernel's documentation for more information about the parameters in the tables below.
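The cold-then-warm procedure described above can be sketched as follows. This is a minimal host-side illustration, not the profiling harness used on the EVM: `measure_warm` and `read_counter` are hypothetical names, and `time.perf_counter_ns` only stands in for a hardware cycle counter.

```python
import time
from typing import Callable


def measure_warm(kernel: Callable[[], None],
                 read_counter: Callable[[], int] = time.perf_counter_ns) -> int:
    """Return the counter delta of a warm run of `kernel`.

    `read_counter` is an assumed stand-in for a cycle counter; on real
    hardware it would read the device's cycle register instead.
    """
    kernel()                      # cold run: executed first, not reported
    start = read_counter()
    kernel()                      # warm run: same code, now profiled
    return read_counter() - start


if __name__ == "__main__":
    # Hypothetical workload used only to exercise the helper.
    def workload() -> None:
        sum(i * i for i in range(10_000))

    print("warm delta (ns):", measure_warm(workload))
```

Only the second invocation is timed, which mirrors the statement that the warm figure follows a cold run of the same code.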
CNN kernels
This section contains tables that depict the expected performance numbers for the CNN kernels.
Row-wise Convolution (MMALIB_CNN_convolveBias_row_ixX_ixX_oxX)
| Bit Width | Stride | Fr | Fc | Ni | No | inWidth | validColsPerRowIn | validRowsIn | validColsIn | inChOffset | Bias | quantMethod | Pad Left | Pad Right | Pad Top | Pad Bottom | EVM Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1 | 3 | 3 | 256 | 64 | 64 | NA | NA | 768 | 1024 | Yes | Yes | 1 | 1 | 0 | 0 | 25261 |
| 8 | 1 | 5 | 5 | 64 | 64 | 64 | NA | NA | 1280 | 2048 | Yes | Yes | 2 | 2 | 0 | 0 | 27441 |
| 8 | 1 | 1 | 1 | 64 | 128 | 256 | NA | NA | 1024 | 1088 | Yes | Yes | 0 | 0 | 0 | 0 | 2657 |
| 8 | 2 | 5 | 5 | 12 | 64 | 128 | 128 | 18 | NA | 3072 | Yes | Yes | 2 | 2 | 2 | 2 | 3516 |
Depth-wise Convolution (MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX)
| Bit Width | Dilation Y | Top Pad | Bottom Pad | Left Pad | Right Pad | Pad Fill Value | Lc | Lr | Num. In Channels | Num. Out Channels | Num. Groups Per Kernel | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1 | 1 | 1 | 1 | 1 | 0 | 256 | 256 | 1 | 1 | 4 | 4823 |
| 8 | 1 | 2 | 2 | 2 | 2 | 0 | 256 | 256 | 1 | 1 | 4 | 4745 |
| 8 | 1 | 1 | 1 | 1 | 1 | -87 | 512 | 128 | 1 | 1 | 4 | 6884 |
| 16 | 1 | 2 | 2 | 2 | 2 | -17 | 256 | 128 | 1 | 1 | 4 | 11160 |
Depth-wise Convolution with High Precision (MMALIB_CNN_convolve_col_smallNo_highPrecision)
| Bit Width | kernelWidth | kernelHeight | strideX | strideY | dilationX | Dilation Y | Top Pad | Bottom Pad | Left Pad | Right Pad | Pad Fill Value | Lc | Lr | Num. In Channels | Num. Out Channels | Num. Groups Per Kernel | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 256 | 256 | 1 | 1 | 2 | 2529 |
| 8 | 5 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 0 | 256 | 256 | 1 | 1 | 2 | 2517 |
| 8 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | -56 | 512 | 128 | 1 | 1 | 2 | 3584 |
| 16 | 5 | 5 | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 70 | 256 | 128 | 1 | 1 | 2 | 5728 |
Fully-connected Layer (MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX)
| Bit Width | Batch Size | Ni | No | Stride Y | Stride X | Stride H | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|
| 8 | 64 | 1280 | 128 | 64 | 1600 | 81920 | 3214 |
| 16 | 64 | 1280 | 32 | 64 | 2624 | 81920 | 3110 |
LINALG kernels
This section contains tables that depict expected performance numbers for the linear algebra kernels.
Matrix-matrix Multiplication
| Bit Width | m | k | n | Stride C | Stride A | Stride B | EVM Warm Cycles |
|---|---|---|---|---|---|---|---|
| 8 | 255 | 255 | 255 | 320 | 320 | 320 | 4521 |
| 8 | 2048 | 64 | 64 | 64 | 64 | 64 | 2306 |
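As a reading aid for the matrix-matrix multiplication figures, the cycle counts can be converted into multiply-accumulates (MACs) per cycle. The sketch below assumes that m, k and n describe an m x k by k x n product, so that one call performs m * k * n MACs. That convention and the helper name `macs_per_cycle` are assumptions made for illustration and are not taken from the kernel documentation.

```python
def mac_count(m: int, k: int, n: int) -> int:
    """MACs in an (m x k) by (k x n) matrix product (assumed convention)."""
    return m * k * n


def macs_per_cycle(m: int, k: int, n: int, cycles: int) -> float:
    """Average MACs completed per cycle for one kernel call."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return mac_count(m, k, n) / cycles


if __name__ == "__main__":
    # (m, k, n, EVM warm cycles) copied from the two 8-bit table rows.
    rows = [
        (255, 255, 255, 4521),
        (2048, 64, 64, 2306),
    ]
    for m, k, n, cycles in rows:
        rate = macs_per_cycle(m, k, n, cycles)
        print(f"m={m} k={k} n={n}: {rate:.1f} MACs/cycle")
```

Under this assumption both rows work out to a little over 3,600 MACs per cycle, which suggests the two problem shapes reach similar throughput.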
Matrix-matrix Multiply and Accumulate
| Bit Width | m | k | n | EVM Warm Cycles |
|---|---|---|---|---|
| 8 | 65 | 257 | 257 | 5190 |
| 32 | 65 | 17 | 257 | 9484 |
Pointwise Matrix-matrix Multiply
| Bit Width | m | n | EVM Warm Cycles |
|---|---|---|---|
| 8 | 1025 | 65 | 2228 |
| 16 | 513 | 33 | 1193 |
| 32 | 255 | 63 | 1211 |
Matrix Transpose
| Bit Width | M | N | EVM Warm Cycles |
|---|---|---|---|
| 32 | 56 | 56 | 412 |
| 8 | 256 | 256 | 1470 |
DSP kernels
Shown below are the expected performance numbers for the FIR filter kernel.
FIR Filter
| Bit Width | Data Size | Batch Size | Filter Size | EVM Warm Cycles |
|---|---|---|---|---|
| 32 | 2048 | 1 | 512 | 3840 |
| 32 | 4096 | 4 | 16 | 2231 |
FFT kernels
Shown below are the expected performance numbers for the FFT kernel.
| Bit Width | FFT Size | Batch Size | EVM Warm Cycles |
|---|---|---|---|
| 16 | 256 | 16 | 1060 |
| 16 | 4096 | 1 | 1668 |