Performance data was obtained on the AM62A EVM. EVM warm cycles were obtained by profiling the execution of the kernel's compute code after a cold run of the same code. Refer to each kernel's documentation for more information about the parameters in the tables below.
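The cold-then-warm measurement order described above can be sketched as follows. This is only a host-side illustration in Python: the names `measure_cold_and_warm` and `example_kernel` are hypothetical, the workload is a stand-in rather than an MMALIB kernel, and wall-clock nanoseconds are used in place of the cycle counts reported for the EVM.

```python
import time
from typing import Callable, Tuple


def measure_cold_and_warm(kernel: Callable[[], None]) -> Tuple[int, int]:
    """Run `kernel` twice and return (cold_ns, warm_ns).

    The first call is the cold run; the second call, made immediately
    afterwards on the same code, is the warm run.
    """
    start = time.perf_counter_ns()
    kernel()  # cold run: caches and other state are not yet primed
    cold_ns = time.perf_counter_ns() - start

    start = time.perf_counter_ns()
    kernel()  # warm run: same code, executed right after the cold run
    warm_ns = time.perf_counter_ns() - start
    return cold_ns, warm_ns


def example_kernel() -> None:
    # Hypothetical stand-in workload, not an MMALIB kernel.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    cold, warm = measure_cold_and_warm(example_kernel)
    print(f"cold: {cold} ns, warm: {warm} ns")
```

Only the second measurement corresponds to what the tables call warm cycles; the first run is there to prime the system and its timing is not the reported figure.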
CNN kernels
This section contains tables that depict the expected performance numbers for the CNN kernels.
Row-wise Convolution (MMALIB_CNN_convolveBias_row_ixX_ixX_oxX)
Bit Width | Stride | Fr | Fc | Ni | No | inWidth | validColsPerRowIn | validRowsIn | ValidColsIn | inChOffset | Bias | quantMethod | Pad Left | Pad Right | Pad Top | Pad Bottom | EVM Cycles
8 | 1 | 3 | 3 | 378 | 64 | 128 | NA | NA | 512 | 576 | Yes | No | 1 | 1 | 0 | 0 | 55388
8 | 1 | 5 | 5 | 64 | 64 | 64 | NA | NA | 1280 | 2048 | Yes | Yes | 2 | 2 | 0 | 0 | 103360
8 | 1 | 1 | 1 | 64 | 128 | 256 | NA | NA | 1024 | 1088 | Yes | Yes | 0 | 0 | 0 | 0 | 9105
8 | 2 | 3 | 3 | 3 | 32 | 2048 | 2048 | 31 | NA | 65536 | Yes | Yes | 1 | 1 | 1 | 0 | 19282
Depth-wise Convolution (MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX)
Bit Width | Kernel Width | Kernel Height | StrideX | strideY | topPad | bottomPad | leftPad | rightPad | padFillValue | Lc | Lr | numInChannels | numOutChannels | numGroupsPerKernel | EVM Cycles
8 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | -122 | 256 | 128 | 1 | 1 | 4 | 4848
8 | 5 | 5 | 1 | 1 | 2 | 2 | 2 | 2 | -112 | 128 | 128 | 1 | 1 | 4 | 2648
8 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | -97 | 256 | 128 | 1 | 1 | 4 | 6726
16 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | -62 | 128 | 128 | 1 | 1 | 4 | 4850
Depth-wise Convolution with High Precision (MMALIB_CNN_convolve_col_smallNo_highPrecision
Fully-connected Layer (MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX)
Bit Width | Batch Size | Ni | No | Stride Y | Stride X | Stride H | EVM Warm Cycles
8 | 64 | 1280 | 128 | 128 | 1600 | 40960 | 10880
16 | 64 | 1280 | 32 | 128 | 2624 | 40960 | 10762