Kernel for computing CNN-style 2D convolution using column-major data ordering on the input and output feature maps. This approach is faster when filter grouping is chosen such that Ni = No = 1, or such that Ni*Fr*Fc < MMA_SIZE; otherwise, use the regular convolution method MMALIB_CNN_convolve_row_ixX_ixX_oxX. This kernel is also referred to as depth-wise convolution.
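The selection rule above can be sketched as a small predicate. This is only an illustration: `prefer_col_kernel` is a hypothetical helper, not an MMALIB function, and the MMA size is passed in by the caller because its value depends on the data type and device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of MMALIB): returns true when the
 * column-major kernel is expected to be the faster choice according to
 * the rule stated above. mmaSize is supplied by the caller; no
 * particular value of MMA_SIZE is assumed here. */
static bool prefer_col_kernel(int32_t Ni, int32_t No,
                              int32_t Fr, int32_t Fc, int32_t mmaSize)
{
    /* Case 1: one input and one output feature map per group. */
    if (Ni == 1 && No == 1) {
        return true;
    }
    /* Case 2: the whole filter footprint fits within one MMA width. */
    return (int64_t)Ni * Fr * Fc < (int64_t)mmaSize;
}
```

When the predicate returns false, the text above recommends MMALIB_CNN_convolve_row_ixX_ixX_oxX instead.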
The kernel is designed to process a pair of MMA-wide columns at a time, although processing a single column is supported. These two columns are typically adjacent columns in the same input feature map, although provisions are made in the interface for the two columns to come from separate input feature maps.
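For readers unfamiliar with depth-wise convolution, the arithmetic for a single group with Ni = No = 1 can be sketched in plain C. This is a naive reference under simplifying assumptions, not the kernel's implementation: it uses a row-major layout on a pre-padded input rather than the MMA-wide column arrangement, and it omits bias, scale, shift and ReLU. The function name `depthwise_ref` is hypothetical.

```c
#include <stdint.h>

/* Naive depth-wise convolution for one group with Ni = No = 1.
 *   in     : H x W input feature map, row-major, already padded by the caller
 *   w      : F x F filter coefficients, row-major
 *   out    : Ho x Wo raw accumulators, where Ho = (H - F) / stride + 1
 *            and Wo = (W - F) / stride + 1
 * Post-processing (bias, scale, shift, ReLU) is intentionally left out. */
static void depthwise_ref(const int8_t *in, int32_t H, int32_t W,
                          const int8_t *w, int32_t F, int32_t stride,
                          int32_t *out)
{
    int32_t Ho = (H - F) / stride + 1;
    int32_t Wo = (W - F) / stride + 1;

    for (int32_t r = 0; r < Ho; r++) {
        for (int32_t c = 0; c < Wo; c++) {
            int32_t acc = 0;
            /* Slide the F x F window over the input at (r*stride, c*stride). */
            for (int32_t i = 0; i < F; i++) {
                for (int32_t j = 0; j < F; j++) {
                    acc += (int32_t)in[(r * stride + i) * W + (c * stride + j)]
                         * (int32_t)w[i * F + j];
                }
            }
            out[r * Wo + c] = acc;
        }
    }
}
```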
Supports Ni == No for small values of No
8 and 16-bit integer data type support
3x3, 5x5 and 7x7 (8-bit only) kernel sizes supported
Supports stride values of 1 or 2
When stride == 2, the number of rows in an input feature map should be even, including the top and bottom pad – i.e. (pKerInitArgs->blockFeatureHeight + pKerInitArgs->topPad + pKerInitArgs->bottomPad) % 2 == 0
When stride == 2, the kernel size must be 3x3
May compute multiple groups in a single call
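The constraints listed above can be collected into a single pre-flight check. The sketch below is an illustration only: `conv_cfg_t` and `cfg_is_supported` are hypothetical names, and the struct fields do not mirror the MMALIB InitArgs structure. The "small values of No" condition is not checked because the text gives no numeric bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one call; field names are illustrative. */
typedef struct {
    int32_t bitDepth;       /* 8 or 16 */
    int32_t Fr, Fc;         /* filter rows and columns */
    int32_t stride;         /* 1 or 2 */
    int32_t Ni, No;         /* input and output feature maps per group */
    int32_t featureHeight;  /* rows in the input feature map */
    int32_t topPad, bottomPad;
} conv_cfg_t;

/* Returns true when the configuration satisfies the listed constraints. */
static bool cfg_is_supported(const conv_cfg_t *c)
{
    if (c->bitDepth != 8 && c->bitDepth != 16) return false;
    if (c->Ni != c->No) return false;
    if (c->Fr != c->Fc) return false;

    if (c->Fr == 7) {
        if (c->bitDepth != 8) return false;   /* 7x7 is 8-bit only */
    } else if (c->Fr != 3 && c->Fr != 5) {
        return false;
    }

    if (c->stride == 2) {
        if (c->Fr != 3) return false;         /* stride 2 requires 3x3 */
        /* Padded row count must be even for stride 2. */
        if ((c->featureHeight + c->topPad + c->bottomPad) % 2 != 0) return false;
    } else if (c->stride != 1) {
        return false;
    }
    return true;
}
```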
The input data for this kernel consists of the filter coefficients and the input feature maps, while the output is the output feature maps. The kernel requires the filter coefficients to be preprocessed and stored in a custom arrangement prior to calling the kernel execute function. This reordering can be done offline (more efficient) or at runtime. MMALIB provides the utilities in MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_reorderWeights to generate this reordering (src0) and the associated MMALIB_bufParams2D_t struct (src0_addr).
The input feature map memory arrangement is flexible and is described by the parameters illustrated in the figure below. Note that the input feature maps for each group must be vertically stacked on top of each other, as there is no parameter to control the offset between groups.
The output feature map is described by the parameters illustrated in the figure below.
int32_t MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_getHandleSize (MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs)
Query function that returns the size of the kernel's internal handle.
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs)
This function call is required to initialize the handle. In this function most of the one time operations are performed and results are stored in the handle.
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs)
This function checks the parameters and should be called before kernel execution. It can be called once.
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const uint8_t *shiftValues, const void *lutValues, void *restrict dst, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *pKerOutArgs)
This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const uint8_t *shiftValues, const void *lutValues, const void *dst, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pKerInArgs)
This function checks the parameters and should be called before kernel execution. It can be called once.
void MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pExecInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *pExecOutArgs, uint64_t *archCycles, uint64_t *estCycles)
This function estimates the cycles consumed for the kernel execution.
◆ MMALIB_CONVOLVE_COL_SHIFT_SINGLE
#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE 0
◆ MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP 1
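The page does not spell out how these two modes are interpreted. Going by the names alone, one plausible reading is that the single mode applies shiftValues[0] to every group while the per-group mode indexes shiftValues by group. The sketch below encodes that assumption; `shift_for_group` is a hypothetical helper and the interpretation should be confirmed against the MMALIB header before relying on it.

```c
#include <stdint.h>

/* Re-declared here only so the sketch is self-contained; the values
 * match the definitions above. */
#ifndef MMALIB_CONVOLVE_COL_SHIFT_SINGLE
#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE    0
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP 1
#endif

/* Hypothetical helper: picks the shift for a given group.
 * ASSUMPTION: SINGLE means one shared shift in shiftValues[0];
 * PER_GROUP means one entry per group. */
static uint8_t shift_for_group(const uint8_t *shiftValues,
                               int32_t shiftMode, int32_t group)
{
    if (shiftMode == MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP) {
        return shiftValues[group];
    }
    return shiftValues[0];
}
```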
◆ MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_STATUS_NAME
Enumeration for different error codes for MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost kernel.
Enumerator MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_ERR_SMALL_K
MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_ERR_MAX Error case because k < Ni*Fr*Fc
Definition at line 85 of file MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost.h.
◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_getHandleSize()
Query function that returns the size of the kernel's internal handle.
Parameters
[in] pKerInitArgs : Pointer to structure holding init parameters
Returns Size of the buffer in bytes
◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init()
This function call is required to initialize the handle. In this function most of the one time operations are performed and results are stored in the handle.
Parameters
[in] handle : Active handle to the kernel
[in] src0_addr : Pointer to structure containing dimensional information of src0, content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_reorderWeights_fillBufParams
[in] src1_addr : Pointer to structure containing dimensional information of src1
[in] src2_addr : Pointer to structure containing dimensional information of src2 (bias)
[in] src3_addr : Pointer to structure containing dimensional information of src3 (scale)
[out] dst_addr : Pointer to structure containing dimensional information of dst
[in] pKerInitArgs : Pointer to structure holding init parameters
Returns Status of success or error with error codes, refer to MMALIB_STATUS.
◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init_checkParams()
This function checks the parameters and should be called before kernel execution. It can be called once.
Parameters
[in] handle : Active handle to the kernel
[in] src0_addr : Pointer to structure containing dimensional information of src0
[in] src1_addr : Pointer to structure containing dimensional information of src1
[in] src2_addr : Pointer to structure containing dimensional information of src2 (bias)
[in] src3_addr : Pointer to structure containing dimensional information of src3 (scale)
[out] dst_addr : Pointer to structure containing dimensional information of dst
[in] pKerInitArgs : Pointer to structure holding init parameters
Returns Status of success or error with error codes, refer to MMALIB_STATUS.
◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec()
This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.
Parameters
Returns Status of success or error with error codes, refer to MMALIB_STATUS.
Assumptions:
I/O buffer pointers are assumed to be not aliased.
Fr = Fc = 3, 5 or 7 (defaults to natural-c if not the case, 7 valid for 8-bit data type only)
stride-by-2 calls have an even number of rows in the feature map (defaults to natural-c if not the case)
Ni == No (defaults to natural-c if not the case)
Ni*Fr*Fc < MMA_SIZE (defaults to natural-c if not the case)
Performance Considerations:
For best performance, the following parameter settings are recommended:
Set widths equal to strides
Align all pointers to 8 byte boundaries
Set all stride values to a multiple of 8
Set all width values to a multiple of 16
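The recommendations above can be checked up front with a small predicate. This is a sketch under stated assumptions: `layout_is_recommended` is a hypothetical helper, and width and stride are taken in whatever unit the buffer parameters use, since the list above does not state one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a buffer follows all four
 * recommendations listed above. It does not affect correctness; a
 * false result only suggests the call may run slower. */
static bool layout_is_recommended(const void *ptr, size_t width, size_t stride)
{
    bool ptrAligned   = ((uintptr_t)ptr % 8u) == 0u;  /* 8-byte aligned pointer */
    bool strideOk     = (stride % 8u) == 0u;          /* stride multiple of 8   */
    bool widthOk      = (width % 16u) == 0u;          /* width multiple of 16   */
    bool widthIsPitch = (width == stride);            /* no gap between rows    */

    return ptrAligned && strideOk && widthOk && widthIsPitch;
}
```

Calling it once per buffer before init keeps the check out of the per-tile execute path.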
◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec_checkParams()
This function checks the parameters and should be called before kernel execution. It can be called once.
Parameters
[in] handle : Active handle to the kernel
[in] src0[] : Pointer to buffer holding convolution weights [ A matrix]
[in] src1[] : Pointer to buffer holding input feature map data [ B matrix]
[in] src2[] : Pointer to buffer holding bias data
[in] src3[] : Pointer to buffer holding scale values
[in] shiftValues[] : Pointer to buffer of shift values to apply to results prior to output
[in] lutValues[] : Pointer to buffer of lookup table values; set to NULL if lookup table not used
[out] dst[] : Pointer to buffer holding output feature map data [ C matrix]
[in] pKerInArgs : Pointer to structure holding input arguments
Returns Status of success or error with error codes, refer to MMALIB_STATUS.
◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_perfEst()
void MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pExecInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *pExecOutArgs, uint64_t *archCycles, uint64_t *estCycles)
This function estimates the cycles consumed for the kernel execution.
Parameters
[in] handle : Active handle to the kernel
[in] src0_addr : Pointer to the structure containing dimensional information of src0
[in] src1_addr : Pointer to the structure containing dimensional information of src1
[in] src2_addr : Pointer to structure containing dimensional information of src2 (bias)
[in] src3_addr : Pointer to structure containing dimensional information of src3 (scale)
[out] dst_addr : Pointer to the structure containing dimensional information of dst
[in] pKerInitArgs : Pointer to structure holding init parameters
[in] pExecInArgs : Pointer to structure holding execute input arguments
[in] pExecOutArgs : Pointer to structure holding execute output arguments
[out] archCycles : Cycles estimated for the compute, startup and teardown
[out] estCycles : Cycles estimated for the compute, startup, teardown and any associated overhead