Kernel for computing CNN-style 2D convolution using column-major data ordering on the input and output feature maps. This approach is faster when filter grouping is chosen such that Ni = No = 1, or such that Ni * Fr * Fc < MMA_SIZE; otherwise, use the regular convolution method MMALIB_CNN_convolve_row_ixX_ixX_oxX. This kernel is also referred to as depth-wise convolution.
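The selection rule above can be sketched as a small predicate. This is only an illustration: prefer_col_kernel is a hypothetical helper, not an MMALIB function, and mmaSize stands for whatever value MMA_SIZE has for the data type in use (the caller must supply it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of MMALIB): returns true when the
 * column-major kernel is expected to be the faster choice according to
 * the rule stated above, i.e. Ni == No == 1 or Ni * Fr * Fc < MMA_SIZE.
 * mmaSize is an assumption supplied by the caller. */
static bool prefer_col_kernel(int32_t Ni, int32_t No,
                              int32_t Fr, int32_t Fc,
                              int32_t mmaSize)
{
    if (Ni == 1 && No == 1) {
        return true; /* pure depth-wise grouping */
    }
    /* Widen to 64 bits so the product cannot overflow. */
    return (int64_t)Ni * Fr * Fc < (int64_t)mmaSize;
}
```

For example, with an assumed mmaSize of 64, a grouping with Ni = No = 2 and a 3x3 filter gives 2 * 3 * 3 = 18 < 64, so the column-major kernel would be preferred; with Ni = No = 8 the product is 72 and the row-based kernel would be used instead.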
The kernel is designed to process a pair of MMA-wide columns at a time, although processing a single column is supported. These two columns are typically adjacent columns in the same input feature map, although provisions are made in the interface for the two columns to come from separate input feature maps.
Supports Ni == No for small values of No
8-bit and 16-bit integer data types supported
3x3, 5x5 and 7x7 (8-bit only) kernel sizes supported
Supports stride values of 1 or 2
When stride == 2, the number of rows in an input feature map, including the top and bottom pad, should be even, i.e. (pKerInitArgs->blockFeatureHeight + pKerInitArgs->topPad + pKerInitArgs->bottomPad) % 2 == 0
When stride == 2, the kernel size must be 3x3
May compute multiple groups in a single call
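The constraints listed above can be collected into a single pre-flight check, sketched below. The struct ConvColParams and the function conv_col_params_ok are hypothetical and exist only for this illustration; only the field names blockFeatureHeight, topPad and bottomPad are taken from the text above. The limit implied by "small values of No" is not stated here, so it is not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter bundle for illustration only; this is NOT the
 * MMALIB init-args struct. */
typedef struct {
    int32_t Ni;                 /* input feature maps per group  */
    int32_t No;                 /* output feature maps per group */
    int32_t kernelSize;         /* 3, 5 or 7 (square kernels)    */
    int32_t stride;             /* 1 or 2                        */
    int32_t dataBits;           /* 8 or 16                       */
    int32_t blockFeatureHeight; /* rows, excluding pad           */
    int32_t topPad;
    int32_t bottomPad;
} ConvColParams;

/* Returns true when the parameters satisfy the constraints listed above. */
static bool conv_col_params_ok(const ConvColParams *p)
{
    if (p->Ni != p->No) return false;
    if (p->dataBits != 8 && p->dataBits != 16) return false;
    if (p->kernelSize != 3 && p->kernelSize != 5 && p->kernelSize != 7) return false;
    if (p->kernelSize == 7 && p->dataBits != 8) return false; /* 7x7 is 8-bit only */
    if (p->stride != 1 && p->stride != 2) return false;
    if (p->stride == 2) {
        if (p->kernelSize != 3) return false; /* stride 2 requires 3x3 */
        /* Padded row count should be even when stride == 2. */
        if ((p->blockFeatureHeight + p->topPad + p->bottomPad) % 2 != 0) return false;
    }
    return true;
}
```

A caller would run such a check before initializing the handle, so that an unsupported combination (for instance, a 5x5 kernel with stride 2) is rejected early.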
The input data for this kernel consists of the filter coefficients and the input feature maps, while the output is the output feature maps. The kernel requires the filter coefficients to be preprocessed and stored in a custom arrangement prior to calling the kernel execute function. This reordering can be done offline (more efficient) or at runtime. MMALIB provides the utilities in MMALIB_CNN_convolve_col_smallNo_highPrecision_reorderWeights to generate this reordering (src0) and the associated MMALIB_bufParams2D_t struct (src0_addr).
The input feature map memory arrangement is flexible and is described by the parameters illustrated in the figure below. Note that the input feature maps for each group must be vertically stacked on top of each other, as there is no parameter to control the offset between groups.
The output feature map is described by the parameters illustrated in the figure below.
MMALIB_CNN_convolve_col_smallNo_highPrecision requires that the weights be preprocessed into a specific arrangement. The functions in this module perform that preprocessing and other associated tasks.
This function call is required to initialize the handle. In this function most of the one-time operations are performed and the results are stored in the handle.
This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column-based data arrangement. It is called multiple times.