MMALIB User Guide
MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost

Introduction

Kernel for computing CNN-style 2D convolution using column-major data ordering on the input and output feature maps. This approach computes more quickly if filter grouping is chosen such that Ni = No = 1, or such that Ni*Fr*Fc < MMA_SIZE; otherwise, use the regular convolution method MMALIB_CNN_convolve_row_ixX_ixX_oxX. This kernel is also referred to as depth-wise convolution.
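The selection rule above can be sketched as a small predicate. This is only an illustration: the helper name prefer_column_kernel is hypothetical and not part of MMALIB, and mmaSize is passed in by the caller because the value of MMA_SIZE is not stated on this page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an MMALIB function): returns true when the
 * column-major kernel is expected to be the faster choice according to the
 * rule of thumb in the introduction. mmaSize is an assumption supplied by
 * the caller; this page does not state the value of MMA_SIZE. */
static bool prefer_column_kernel(int32_t Ni, int32_t No,
                                 int32_t Fr, int32_t Fc, int32_t mmaSize)
{
    if (Ni <= 0 || No <= 0 || Fr <= 0 || Fc <= 0 || mmaSize <= 0) {
        return false; /* reject nonsensical dimensions */
    }
    if (Ni == 1 && No == 1) {
        return true; /* grouping chosen such that Ni = No = 1 */
    }
    /* Use a 64-bit product so large dimensions cannot overflow. */
    int64_t k = (int64_t)Ni * Fr * Fc;
    return k < (int64_t)mmaSize;
}
```

When the predicate returns false, the introduction recommends the row-based kernel instead.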

The kernel is designed to process a pair of MMA-wide columns at a time, although processing a single column is supported. These two columns are typically adjacent columns in the same input feature map, although provisions are made in the interface for the two columns to come from separate input feature maps.

The input data for this kernel consists of the filter coefficients and the input feature maps, while the output is the output feature maps. The kernel requires the filter coefficients to be preprocessed and stored in a custom arrangement prior to calling the kernel execute function. This reordering can be done offline (more efficient) or at runtime. MMALIB provides the utilities in MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_reorderWeights to generate this reordering (src0) and the associated MMALIB_bufParams2D_t struct (src0_addr).

The input feature map memory arrangement is flexible and is described by the parameters illustrated in the figure below. Note that the input feature maps for each group must be vertically stacked on top of each other, as there is no parameter to control the offset between groups.

The output feature map is described by the parameters illustrated in the figure below.

Sub Modules

 MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_reorderWeights
 MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost requires that the weights be preprocessed into a specific arrangement. The functions in this module perform that preprocessing and other associated tasks.
 

Data Structures

struct  MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs
 This structure holds all the initialization parameters for CNN column based convolution kernel.
 
struct  MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs
 This structure holds all the execution input parameters for the CNN column based convolution kernel.
 
struct  MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs
 This structure holds all the runtime output parameters for CNN column based convolution kernel.
 

Functions

int32_t MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_getHandleSize (MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs)
 This is a query function to get the size of the kernel's internal handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs)
 This function call is required to initialize the handle. In this function most of the one time operations are performed and results are stored in the handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const uint8_t *shiftValues, const void *lutValues, void *restrict dst, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *pKerOutArgs)
 This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const uint8_t *shiftValues, const void *lutValues, const void *dst, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pKerInArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
void MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *pExecInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *pExecOutArgs, uint64_t *archCycles, uint64_t *estCycles)
 This function estimates the cycles consumed for the kernel execution.
 

Enumerations

enum  MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_STATUS_NAME { MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_ERR_SMALL_K = MMALIB_ERROR_MAX , MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_ERR_MAX }
 Enumeration for different error codes for MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost kernel.
 

Macros

#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE   0
 Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec.
 
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP   1
 [Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec.
 

Macro Definition Documentation

◆ MMALIB_CONVOLVE_COL_SHIFT_SINGLE

#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE   0

Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec.

Definition at line 92 of file MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost.h.

◆ MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP

#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP   1

[Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec.

Definition at line 93 of file MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost.h.
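As an illustration of how the two shift-method macros relate to the shiftValues argument of the exec function, the following sketch picks the pointer to pass. The helper select_shift_values is hypothetical, the int32_t type of the shift method is an assumption, and the macros are redefined locally only so the sketch compiles without the MMALIB header.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the macro values documented above, guarded so they do not
 * clash with the real header if it is included. */
#ifndef MMALIB_CONVOLVE_COL_SHIFT_SINGLE
#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE 0
#endif
#ifndef MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP 1
#endif

/* Hypothetical helper (not an MMALIB function): chooses the shiftValues
 * pointer for the exec call. Returns 0 on success and -1 when the requested
 * shift method is not usable. */
static int select_shift_values(int32_t shiftMethod,
                               const uint8_t **shiftValuesOut)
{
    if (shiftValuesOut == NULL) {
        return -1;
    }
    *shiftValuesOut = NULL;
    if (shiftMethod == MMALIB_CONVOLVE_COL_SHIFT_SINGLE) {
        /* The exec documentation says to pass NULL for this method. */
        return 0;
    }
    /* MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP is documented as a future
     * feature, so it is rejected here along with unknown values. */
    return -1;
}
```

Rejecting the per-group method up front keeps an unsupported configuration from reaching the kernel.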

Enumeration Type Documentation

◆ MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_STATUS_NAME

Enumeration for different error codes for MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost kernel.

Enumerator
MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_ERR_SMALL_K: Error case because k < Ni*Fr*Fc
MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_ERR_MAX 

Definition at line 85 of file MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost.h.

Function Documentation

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_getHandleSize()

int32_t MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_getHandleSize ( MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *  pKerInitArgs)

This is a query function to get the size of the kernel's internal handle.

Parameters
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Size of the buffer in bytes
Remarks
The application is expected to allocate a buffer of the requested size and provide it during init and exec function calls.
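A minimal sketch of the allocation step described in the remark above. The helper alloc_handle_buffer is hypothetical; handleSize stands in for the value returned by the getHandleSize query, and plain malloc is an assumption, since a target application may need a specific memory region or alignment that this page does not specify.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an MMALIB function): allocates a handle buffer of
 * the size reported by the getHandleSize query. A non-positive size is
 * treated as a failure. The caller releases the buffer with free() once the
 * kernel is no longer needed. */
static void *alloc_handle_buffer(int32_t handleSize)
{
    if (handleSize <= 0) {
        return NULL;
    }
    return malloc((size_t)handleSize);
}
```

The returned pointer would then be passed as the handle argument to the init and exec calls.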

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams3D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams2D_t *  src2_addr,
const MMALIB_bufParams1D_t *  src3_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *  pKerInitArgs 
)

This function call is required to initialize the handle. In this function most of the one time operations are performed and results are stored in the handle.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0, content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_reorderWeights_fillBufParams
[in]src1_addr: Pointer to structure containing dimensional information of src1
[in]src2_addr: Pointer to structure containing dimensional information of src2 (bias)
[in]src3_addr: Pointer to structure containing dimensional information of src3 (scale)
[out]dst_addr: Pointer to structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
Application is expected to provide a valid handle

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_init_checkParams ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams3D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams2D_t *  src2_addr,
const MMALIB_bufParams1D_t *  src3_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *  pKerInitArgs 
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0
[in]src1_addr: Pointer to structure containing dimensional information of src1
[in]src2_addr: Pointer to structure containing dimensional information of src2 (bias)
[in]src3_addr: Pointer to structure containing dimensional information of src3 (scale)
[out]dst_addr: Pointer to structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
None

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
const void *  src2,
const void *  src3,
const uint8_t *  shiftValues,
const void *  lutValues,
void *restrict  dst,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *  pKerInArgs,
MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *  pKerOutArgs 
)

This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights. Content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_reorderWeights_exec. [ A matrix]
[in]src1[]: Pointer to buffer holding input feature map data. [ B matrix]
[in]src2[]: Pointer to buffer holding bias data
[in]src3[]: Pointer to buffer holding scale values
[in]shiftValues[]: Pointer to buffer of shift values to apply to results prior to output. Set to NULL if using MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_SINGLE, which is currently the only supported option. [Future development] Set to numGroupsPerKernel if using MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP.
[in]lutValues[]: Pointer to buffer of lookup table values; set to NULL if lookup table not used
[out]dst[]: Pointer to buffer holding output feature map data. [ C matrix]
[in]pKerInArgs: Pointer to structure holding input arguments
[out]pKerOutArgs: Pointer to structure holding output arguments
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Assumptions:
  • I/O buffer pointers are assumed to be not aliased.
  • Fr = Fc = 3, 5 or 7 (defaults to natural-c if not the case, 7 valid for 8-bit data type only)
  • stride-by-2 calls have an even number of rows in the feature map (defaults to natural-c if not the case)
  • Ni == No (defaults to natural-c if not the case)
  • Ni*Fr*Fc < MMA_SIZE (defaults to natural-c if not the case)
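The assumptions listed above that trigger the natural-c fallback can be collected into one predicate, which may help when checking a layer configuration ahead of time. This is a sketch only: the helper name, its parameters (stride, numRows, elemBits, mmaSize) and their integer types are hypothetical, and the non-aliasing assumption is not checked here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an MMALIB function): returns true when a
 * configuration satisfies the listed assumptions, and false when the kernel
 * would be expected to fall back to natural-c.
 *   stride   - convolution stride (1 or 2 in this sketch)
 *   numRows  - number of rows in the feature map
 *   elemBits - bit width of the data type (for example 8 or 16)
 *   mmaSize  - value of MMA_SIZE, supplied by the caller as an assumption */
static bool meets_optimized_assumptions(int32_t Ni, int32_t No,
                                        int32_t Fr, int32_t Fc,
                                        int32_t stride, int32_t numRows,
                                        int32_t elemBits, int32_t mmaSize)
{
    /* Fr = Fc = 3, 5 or 7, with 7 valid for 8-bit data only. */
    bool filterOk = (Fr == Fc) &&
                    (Fr == 3 || Fr == 5 || (Fr == 7 && elemBits == 8));
    /* Stride-by-2 calls need an even number of rows. */
    bool strideOk = (stride != 2) || (numRows % 2 == 0);
    /* Ni == No. */
    bool channelsOk = (Ni == No);
    /* Ni*Fr*Fc < MMA_SIZE, computed in 64 bits to avoid overflow. */
    bool sizeOk = ((int64_t)Ni * Fr * Fc) < (int64_t)mmaSize;

    return filterOk && strideOk && channelsOk && sizeOk;
}
```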
Performance Considerations:
  • For best performance, the following parameter settings are recommended:
    • Set widths equal to strides
    • Align all pointers to 8 byte boundaries
    • Set all stride values to a multiple of 8
    • Set all width values to a multiple of 16
Remarks
The application is expected to call the checkParams function prior to this function; exec itself does not re-check the parameters on each invocation, which keeps the compute path optimized.
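The recommended settings under Performance Considerations can be checked per buffer with a helper along the following lines. The function name is hypothetical, and the sketch assumes that width and stride are both expressed in bytes so that they can be compared directly; this page does not state the units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an MMALIB function): returns true when a buffer
 * follows all four recommendations. Units are an assumption: width and
 * stride are both taken to be in bytes. */
static bool meets_perf_recommendations(const void *ptr,
                                       uint32_t width, int32_t stride)
{
    bool aligned  = ((uintptr_t)ptr % 8u) == 0u;        /* 8-byte aligned pointer */
    bool strideOk = (stride > 0) && ((stride % 8) == 0); /* stride multiple of 8  */
    bool widthOk  = (width > 0u) && ((width % 16u) == 0u); /* width multiple of 16 */
    bool packed   = ((int64_t)width == (int64_t)stride); /* width equals stride   */

    return aligned && strideOk && widthOk && packed;
}
```

A buffer that fails this check is still usable; the recommendations only affect performance.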

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_exec_checkParams ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
const void *  src2,
const void *  src3,
const uint8_t *  shiftValues,
const void *  lutValues,
const void *  dst,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *  pKerInArgs 
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights [ A matrix]
[in]src1[]: Pointer to buffer holding input feature map data [ B matrix]
[in]src2[]: Pointer to buffer holding bias data
[in]src3[]: Pointer to buffer holding scale values
[in]shiftValues[]: Pointer to buffer of shift values to apply to results prior to output
[in]lutValues[]: Pointer to buffer of lookup table values; set to NULL if lookup table not used
[out]dst[]: Pointer to buffer holding output feature map data [ C matrix]
[in]pKerInArgs: Pointer to structure holding input arguments
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
None

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_perfEst()

void MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_perfEst ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams3D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams2D_t *  src2_addr,
const MMALIB_bufParams1D_t *  src3_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs *  pKerInitArgs,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs *  pExecInArgs,
MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs *  pExecOutArgs,
uint64_t *  archCycles,
uint64_t *  estCycles 
)

This function estimates the cycles consumed for the kernel execution.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to the structure containing dimensional information of src0
[in]src1_addr: Pointer to the structure containing dimensional information of src1
[in]src2_addr: Pointer to structure containing dimensional information of src2 (bias)
[in]src3_addr: Pointer to structure containing dimensional information of src3 (scale)
[out]dst_addr: Pointer to the structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
[in]pExecInArgs: Pointer to structure holding execute input arguments
[in]pExecOutArgs: Pointer to structure holding execute output arguments
[out]archCycles: Cycles estimated for the compute, startup and teardown
[out]estCycles: Cycles estimated for the compute, startup, teardown and any associated overhead
Remarks
None