MMALIB User Guide
Kernel for computing CNN-style 2D convolution using column-major data ordering on the input and output feature maps. This approach is faster when filter grouping is chosen such that Ni = No = 1, or such that Ni*Fr*Fc < MMA_SIZE; otherwise, use the regular convolution method MMALIB_CNN_convolve_row_ixX_ixX_oxX. This kernel is also referred to as depth-wise convolution.
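The selection rule above can be written as a small predicate. The following is a minimal sketch, not MMALIB code: the helper name `prefer_col_smallNo` is hypothetical, and the MMA width is passed in by the caller because its value depends on the data type and device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMALIB. Returns true when the rule
 * stated above favours the column-major smallNo kernel:
 *   Ni == No == 1, or Ni*Fr*Fc < MMA_SIZE.
 * mmaSize is supplied by the caller (assumption: it matches MMA_SIZE
 * for the data type in use). */
bool prefer_col_smallNo(int32_t Ni, int32_t No, int32_t Fr, int32_t Fc,
                        int32_t mmaSize)
{
    assert(Ni > 0 && No > 0 && Fr > 0 && Fc > 0 && mmaSize > 0);

    if (Ni == 1 && No == 1) {
        return true; /* depth-wise grouping */
    }
    /* 64-bit product avoids overflow for large filter dimensions. */
    return ((int64_t)Ni * Fr * Fc) < (int64_t)mmaSize;
}
```

When the predicate returns false, the text above recommends MMALIB_CNN_convolve_row_ixX_ixX_oxX instead.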
The kernel is designed to process a pair of MMA-wide columns at a time, although processing a single column is supported. These two columns are typically adjacent columns in the same input feature map, although provisions are made in the interface for the two columns to come from separate input feature maps.
On C7100 devices, this kernel requires that the input feature maps are padded before they are passed to the kernel, whereas other devices support padding inside this kernel.
This kernel differs from MMALIB_CNN_convolve_col_smallNo_highPrecision in that it requires all scale, shift and bias values to be constant across all groups called at the same time. This restriction enables the same amount of input data to be processed in fewer cycles.
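To make the shared-parameter restriction concrete, the sketch below is a scalar reference model of depth-wise convolution followed by ReLU, with one bias and one shift applied to every group. It is not the MMALIB implementation. The function name, the plain row-major planar layout, the unsigned 8-bit output, the truncating shift, and the omission of the scale factor are all assumptions chosen for readability.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model only (hypothetical, not MMALIB code).
 * Depth-wise convolution with Ni = No = 1 per group, stride 1,
 * input already padded, and ONE bias and ONE shift shared by all groups.
 * Layouts (assumed, row-major for readability):
 *   in      [groups][inRows][inCols]
 *   weights [groups][Fr][Fc]
 *   out     [groups][inRows-Fr+1][inCols-Fc+1]                      */
void depthwise_conv_ref(const int8_t *in, const int8_t *weights, uint8_t *out,
                        int groups, int inRows, int inCols, int Fr, int Fc,
                        int32_t bias, uint8_t shift)
{
    assert(in != 0 && weights != 0 && out != 0);
    assert(groups > 0 && Fr > 0 && Fc > 0);
    assert(inRows >= Fr && inCols >= Fc && shift < 31);

    const int outRows = inRows - Fr + 1;
    const int outCols = inCols - Fc + 1;

    for (int g = 0; g < groups; g++) {
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                int32_t acc = bias; /* same bias for every group */
                for (int fr = 0; fr < Fr; fr++) {
                    for (int fc = 0; fc < Fc; fc++) {
                        const int8_t x = in[(g * inRows + r + fr) * inCols + c + fc];
                        const int8_t k = weights[(g * Fr + fr) * Fc + fc];
                        acc += (int32_t)x * (int32_t)k;
                    }
                }
                if (acc < 0) { acc = 0; }     /* ReLU */
                acc >>= shift;                /* same shift for every group */
                if (acc > 255) { acc = 255; } /* saturate to uint8_t */
                out[(g * outRows + r) * outCols + c] = (uint8_t)acc;
            }
        }
    }
}
```

Because `bias` and `shift` are scalars rather than per-group arrays, the post-processing step is identical for every group, which is the property the restriction above relies on.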
The input data for this kernel consists of the filter coefficients and the input feature maps, while the output is the output feature maps. The kernel requires the filter coefficients to be preprocessed and stored in a custom arrangement prior to calling the kernel execute function. This reordering can be done offline (more efficient) or at runtime. MMALIB provides the utilities in MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights to generate this reordering (src0) and the associated MMALIB_bufParams2D_t struct (src0_addr).
The input feature map memory arrangement is flexible and is described by the parameters illustrated in the figure below. Note that the input feature maps for each group must be vertically stacked on top of each other, as there is no parameter to control the offset between groups.
The output feature map is described by the parameters illustrated in the figure below.
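Because the input feature maps of consecutive groups are stacked with no configurable offset, the start of each group follows directly from the size of one group. The helper below is a hedged sketch of that arithmetic. The function and parameter names are hypothetical, and it assumes each group occupies a fixed number of consecutive rows with a common row pitch, which may not match the exact parameters shown in the figure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MMALIB. Assumes each group occupies
 * rowsPerGroup consecutive rows of pitchBytes bytes, and that group g
 * starts immediately after group g-1 (no gap between groups). */
size_t stacked_group_offset(size_t group, size_t rowsPerGroup, size_t pitchBytes)
{
    assert(rowsPerGroup > 0 && pitchBytes > 0);
    return group * rowsPerGroup * pitchBytes;
}
```

For example, under these assumptions a buffer with 18 rows per group and a 64-byte pitch places group 3 at byte offset 3 * 18 * 64.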
Sub Modules | |
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights | |
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX requires that the weights be preprocessed into a specific arrangement. The functions in this module perform that preprocessing and other associated tasks. | |
Data Structures | |
struct | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs |
This structure holds all the initialization parameters for CNN column based convolution kernel. More... | |
struct | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs |
This structure holds all the execution input parameters for the CNN column based convolution kernel. More... | |
struct | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs |
This structure holds all the runtime output parameters for CNN column based convolution kernel. More... | |
Functions | |
int32_t | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_getHandleSize (MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs) |
This is a query function that returns the size of the kernel's internal handle. More... | |
MMALIB_STATUS | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs) |
This function call is required to initialize the handle. Most of the one-time operations are performed in this function, and the results are stored in the handle. More... | |
MMALIB_STATUS | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs) |
This function checks the parameters and should be called before kernel execution. It can be called once. More... | |
MMALIB_STATUS | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, const uint8_t *shiftValues, const int32_t *biasBValues, void *restrict dst, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs *pKerOutArgs) |
This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times. More... | |
MMALIB_STATUS | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const uint8_t *shiftValues, const int32_t *biasBValues, const void *dst, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *pKerInArgs) |
This function checks the parameters and should be called before kernel execution. It can be called once. More... | |
void | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *pExecInArgs, MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs *pExecOutArgs, uint64_t *archCycles, uint64_t *estCycles) |
This function estimates the cycles consumed for the kernel execution. More... | |
Enumerations | |
enum | MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_STATUS_NAME { MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_SMALL_K = MMALIB_ERROR_MAX , MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_MAX } |
Enumeration for different error codes for MMALIB_CNN_CONVOLVE_COL_SMALLNO kernel. More... | |
Macros | |
#define | MMALIB_CONVOLVE_COL_SHIFT_SINGLE 0 |
Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec. More... | |
#define | MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP 1 |
[Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec. More... | |
#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE 0 |
Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec.
Definition at line 103 of file MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX.h.
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP 1 |
[Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec.
Definition at line 104 of file MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX.h.
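Since only MMALIB_CONVOLVE_COL_SHIFT_SINGLE is currently supported, a caller may want to reject the per-group setting before initializing the kernel. The following is a minimal sketch: the macro values are copied from the definitions above, while the helper name and the use of `int` for the shift method are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the macro definitions above; the guards avoid
 * redefinition when the real MMALIB header is also included. */
#ifndef MMALIB_CONVOLVE_COL_SHIFT_SINGLE
#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE 0
#endif
#ifndef MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP 1
#endif

/* Hypothetical helper, not part of MMALIB: true only for the shift
 * method that the documentation lists as currently supported. */
bool shift_method_is_supported(int shiftMethod)
{
    return shiftMethod == MMALIB_CONVOLVE_COL_SHIFT_SINGLE;
}
```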
Enumeration for different error codes for MMALIB_CNN_CONVOLVE_COL_SMALLNO kernel.
Enumerator | |
---|---|
MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_SMALL_K | Error case because k < Ni*Fr*Fc |
MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_MAX | |
Definition at line 96 of file MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX.h.
int32_t MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_getHandleSize | ( | MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs * | pKerInitArgs | ) |
This is a query function that returns the size of the kernel's internal handle.
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init | ( | MMALIB_kernelHandle | handle, |
const MMALIB_bufParams2D_t * | src0_addr, | ||
const MMALIB_bufParams2D_t * | src1_addr, | ||
const MMALIB_bufParams3D_t * | dst_addr, | ||
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs * | pKerInitArgs | ||
) |
This function call is required to initialize the handle. Most of the one-time operations are performed in this function, and the results are stored in the handle.
[in] | handle | : Active handle to the kernel |
[in] | src0_addr | : Pointer to structure containing dimensional information of src0, content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights_fillBufParams |
[in] | src1_addr | : Pointer to structure containing dimensional information of src1 |
[out] | dst_addr | : Pointer to structure containing dimensional information of dst |
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init_checkParams | ( | MMALIB_kernelHandle | handle, |
const MMALIB_bufParams2D_t * | src0_addr, | ||
const MMALIB_bufParams2D_t * | src1_addr, | ||
const MMALIB_bufParams3D_t * | dst_addr, | ||
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs * | pKerInitArgs | ||
) |
This function checks the parameters and should be called before kernel execution. It can be called once.
[in] | handle | : Active handle to the kernel |
[in] | src0_addr | : Pointer to structure containing dimensional information of src0 |
[in] | src1_addr | : Pointer to structure containing dimensional information of src1 |
[out] | dst_addr | : Pointer to structure containing dimensional information of dst |
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec | ( | MMALIB_kernelHandle | handle, |
const void * | src0, | ||
const void * | src1, | ||
const uint8_t * | shiftValues, | ||
const int32_t * | biasBValues, | ||
void *restrict | dst, | ||
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs * | pKerInArgs, | ||
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs * | pKerOutArgs | ||
) |
This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.
[in] | handle | : Active handle to the kernel |
[in] | src0[] | : Pointer to buffer holding convolution weights. Content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights_exec. [ A matrix] |
[in] | src1[] | : Pointer to buffer holding input feature map data. [ B matrix] |
[in] | shiftValues[] | : Pointer to buffer of shift values to apply to results prior to output. Set to NULL if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_SINGLE, which is currently the only supported option. [Future development] Set to numGroupsPerKernel if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP. |
[in] | biasBValues[] | : Pointer to buffer of bias values to load into the MMA B matrix. Set to NULL if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_SINGLE, which is currently the only supported option. [Future development] Set to numGroupsPerKernel if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP. MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::numBiasVals must equal 1 for MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP. |
[out] | dst[] | : Pointer to buffer holding output feature map data. [ C matrix] |
[in] | pKerInArgs | : Pointer to structure holding input arguments |
[out] | pKerOutArgs | : Pointer to structure holding output arguments |
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec_checkParams | ( | MMALIB_kernelHandle | handle, |
const void * | src0, | ||
const void * | src1, | ||
const uint8_t * | shiftValues, | ||
const int32_t * | biasBValues, | ||
const void * | dst, | ||
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs * | pKerInArgs | ||
) |
This function checks the parameters and should be called before kernel execution. It can be called once.
[in] | handle | : Active handle to the kernel |
[in] | src0[] | : Pointer to buffer holding convolution weights [ A matrix] |
[in] | src1[] | : Pointer to buffer holding input feature map data [ B matrix] |
[in] | shiftValues[] | : Pointer to buffer of shift values to apply to results prior to output |
[in] | biasBValues[] | : Pointer to buffer of bias values to load into the MMA B-matrix |
[out] | dst[] | : Pointer to buffer holding output feature map data [ C matrix] |
[in] | pKerInArgs | : Pointer to structure holding input arguments |
void MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_perfEst | ( | MMALIB_kernelHandle | handle, |
const MMALIB_bufParams2D_t * | src0_addr, | ||
const MMALIB_bufParams2D_t * | src1_addr, | ||
const MMALIB_bufParams3D_t * | dst_addr, | ||
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs * | pKerInitArgs, | ||
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs * | pExecInArgs, | ||
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs * | pExecOutArgs, | ||
uint64_t * | archCycles, | ||
uint64_t * | estCycles | ||
) |
This function estimates the cycles consumed for the kernel execution.
[in] | handle | : Active handle to the kernel |
[in] | src0_addr | : Pointer to the structure containing dimensional information of src0 |
[in] | src1_addr | : Pointer to the structure containing dimensional information of src1 |
[out] | dst_addr | : Pointer to the structure containing dimensional information of dst |
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
[in] | pExecInArgs | : Pointer to structure holding execute input arguments |
[in] | pExecOutArgs | : Pointer to structure holding execute output arguments |
[out] | archCycles | : Cycles estimated for the compute, startup and teardown |
[out] | estCycles | : Cycles estimated for the compute, startup, teardown and any associated overhead |
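The two outputs differ only by the overhead term described above, so the overhead can be recovered by subtraction. The helper below is a small sketch with a hypothetical name. It assumes both counts come from the same perfEst call and clamps to zero if the estimate is unexpectedly smaller than the architectural count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMALIB. Returns the overhead cycles
 * implied by the two perfEst outputs: estCycles - archCycles. */
uint64_t perf_overhead_cycles(uint64_t archCycles, uint64_t estCycles)
{
    /* Guard against an unexpected estimate below the architectural count. */
    return (estCycles >= archCycles) ? (estCycles - archCycles) : 0u;
}
```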