MMALIB User Guide
MMALIB_CNN_convolve_col_smallNo_highPrecision

Introduction

NOTE: This API is now a wrapper to MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost with the lutValues input argument set to NULL. It is recommended to call MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost directly.

Kernel for computing CNN-style 2D convolution using column-major data ordering on the input and output feature maps. This approach is faster than regular convolution when filter grouping is chosen such that Ni = No = 1, or such that Ni*Fr*Fc < MMA_SIZE; otherwise, use the regular convolution method MMALIB_CNN_convolve_row_ixX_ixX_oxX. This kernel is also referred to as depth-wise convolution.
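For orientation, the arithmetic of a depth-wise convolution (one filter per channel, Ni = No = 1 per group) can be sketched in plain C as follows. This is only an illustrative reference under stated assumptions: the function name depthwise_conv_ref, the row-major buffer layouts, stride 1, no padding, and the omission of bias, scale, shift and ReLU are all choices of this sketch, not the MMALIB memory arrangement or implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference, not part of MMALIB.
 * Depth-wise 2D convolution (CNN convention, no filter flip), stride 1,
 * no padding. Layouts chosen for this sketch only:
 *   in  : [groups][inRows][inCols]
 *   w   : [groups][F][F]
 *   out : [groups][inRows - F + 1][inCols - F + 1], 32-bit accumulators */
static void depthwise_conv_ref(const int8_t *in, const int8_t *w, int32_t *out,
                               int groups, int inRows, int inCols, int F)
{
    int outRows = inRows - F + 1;
    int outCols = inCols - F + 1;

    for (int g = 0; g < groups; g++) {
        /* Each group reads only its own channel and its own filter. */
        const int8_t *inG  = in  + (size_t)g * inRows * inCols;
        const int8_t *wG   = w   + (size_t)g * F * F;
        int32_t      *outG = out + (size_t)g * outRows * outCols;

        for (int y = 0; y < outRows; y++) {
            for (int x = 0; x < outCols; x++) {
                int32_t acc = 0;
                for (int r = 0; r < F; r++) {
                    for (int c = 0; c < F; c++) {
                        acc += (int32_t)inG[(y + r) * inCols + (x + c)] *
                               (int32_t)wG[r * F + c];
                    }
                }
                outG[y * outCols + x] = acc;
            }
        }
    }
}
```

Because each group touches only Fr*Fc weights, the per-group work is small, which is why a dedicated kernel can pay off when Ni*Fr*Fc stays below MMA_SIZE.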

The kernel is designed to process a pair of MMA-wide columns at a time, although processing a single column is supported. These two columns are typically adjacent columns in the same input feature map, although provisions are made in the interface for the two columns to come from separate input feature maps.

The input data for this kernel consists of the filter coefficients and the input feature maps, while the output is the output feature maps. The kernel requires the filter coefficients to be preprocessed and stored in a custom arrangement prior to calling the kernel execute function. This reordering can be done offline (more efficient) or at runtime. MMALIB provides the utilities in MMALIB_CNN_convolve_col_smallNo_highPrecision_reorderWeights to generate this reordering (src0) and the associated MMALIB_bufParams3D_t struct (src0_addr).

The input feature map memory arrangement is flexible and is described by the parameters illustrated in the figure below. Note that the input feature maps for each group must be vertically stacked on top of each other, as there is no parameter to control the offset between groups.

The output feature map is described by the parameters illustrated in the figure below.

Sub Modules

 MMALIB_CNN_convolve_col_smallNo_highPrecision_reorderWeights
 MMALIB_CNN_convolve_col_smallNo_highPrecision requires that the weights be preprocessed into a specific arrangement. The functions in this module perform that preprocessing and other associated tasks.
 

Data Structures

struct  MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs
 This structure holds all the initialization parameters for CNN column based convolution kernel.
 
struct  MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs
 This structure holds all the execution input parameters for the CNN column based convolution kernel.
 
struct  MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs
 This structure holds all the runtime output parameters for CNN column based convolution kernel.
 

Functions

int32_t MMALIB_CNN_convolve_col_smallNo_highPrecision_getHandleSize (MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *pKerInitArgs)
 This function queries the kernel for the size of its internal handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_init (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *pKerInitArgs)
 This function call is required to initialize the handle. In this function most of the one-time operations are performed and results are stored in the handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *pKerInitArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const uint8_t *shiftValues, void *restrict dst, const MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs *pKerOutArgs)
 This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const uint8_t *shiftValues, const void *dst, const MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs *pKerInArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
void MMALIB_CNN_convolve_col_smallNo_highPrecision_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams3D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs *pExecInArgs, MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs *pExecOutArgs, uint64_t *archCycles, uint64_t *estCycles)
 This function estimates the cycles consumed for the kernel execution.
 

Typedefs

typedef MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_POINTWISEPOST_STATUS_NAME MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_STATUS_NAME
 
typedef MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_InitArgs MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs
 
typedef MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecInArgs MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs
 
typedef MMALIB_CNN_convolve_col_smallNo_highPrecision_pointwisePost_ExecOutArgs MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs
 

Macros

#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE   0
 Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_exec.
 
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP   1
 [Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_exec.
 

Macro Definition Documentation

◆ MMALIB_CONVOLVE_COL_SHIFT_SINGLE

#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE   0

Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_exec.

Definition at line 85 of file MMALIB_CNN_convolve_col_smallNo_highPrecision.h.

◆ MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP

#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP   1

[Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_highPrecision_exec.

Definition at line 86 of file MMALIB_CNN_convolve_col_smallNo_highPrecision.h.

Typedef Documentation

◆ MMALIB_CNN_CONVOLVE_COL_SMALLNO_HIGHPRECISION_STATUS_NAME

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs

Function Documentation

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_getHandleSize()

int32_t MMALIB_CNN_convolve_col_smallNo_highPrecision_getHandleSize ( MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *  pKerInitArgs)

This function queries the kernel for the size of its internal handle.

Parameters
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Size of the buffer in bytes
Remarks
The application is expected to allocate a buffer of the requested size and provide it during the init and exec function calls.

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_init()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_init ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams3D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams2D_t *  src2_addr,
const MMALIB_bufParams1D_t *  src3_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *  pKerInitArgs
)

This function call is required to initialize the handle. In this function most of the one-time operations are performed and results are stored in the handle.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0, content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_highPrecision_reorderWeights_fillBufParams
[in]src1_addr: Pointer to structure containing dimensional information of src1
[in]src2_addr: Pointer to structure containing dimensional information of src2 (bias)
[in]src3_addr: Pointer to structure containing dimensional information of src3 (scale)
[out]dst_addr: Pointer to structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
Application is expected to provide a valid handle

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_init_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_init_checkParams ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams3D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams2D_t *  src2_addr,
const MMALIB_bufParams1D_t *  src3_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *  pKerInitArgs
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0
[in]src1_addr: Pointer to structure containing dimensional information of src1
[in]src2_addr: Pointer to structure containing dimensional information of src2 (bias)
[in]src3_addr: Pointer to structure containing dimensional information of src3 (scale)
[out]dst_addr: Pointer to structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
None

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_exec()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_exec ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
const void *  src2,
const void *  src3,
const uint8_t *  shiftValues,
void *restrict  dst,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs *  pKerInArgs,
MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs *  pKerOutArgs
)

This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights. Content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_highPrecision_reorderWeights_exec. [ A matrix]
[in]src1[]: Pointer to buffer holding input feature map data. [ B matrix]
[in]src2[]: Pointer to buffer holding bias data
[in]src3[]: Pointer to buffer holding scale values
[in]shiftValues[]: Pointer to buffer of shift values to apply to results prior to output.
[out]dst[]: Pointer to buffer holding output feature map data. [ C matrix]
[in]pKerInArgs: Pointer to structure holding input arguments
[out]pKerOutArgs: Pointer to structure holding output arguments
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Assumptions:
  • I/O buffer pointers are assumed to be not aliased.
  • Fr = Fc = 3, 5 or 7 (defaults to natural-c if not the case, 7 valid for 8-bit data type only)
  • stride-by-2 calls have an even number of rows in the feature map (defaults to natural-c if not the case)
  • Ni == No (defaults to natural-c if not the case)
  • Ni*Fr*Fc < MMA_SIZE (defaults to natural-c if not the case)
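The assumptions listed above can be checked up front by the caller. The following is a minimal sketch of such a check; the helper name assumptions_hold and its parameters are hypothetical and not part of MMALIB. In particular, mmaSize and is8Bit are passed in by the caller because the value of MMA_SIZE and the data type in use are not stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMALIB.
 * Returns true when every assumption in the list holds, i.e. when the
 * optimized path rather than the natural-C fallback would be expected.
 * mmaSize : caller-supplied MMA_SIZE for the device and data type (assumed)
 * is8Bit  : true when the data type is 8-bit (assumed flag) */
static bool assumptions_hold(int32_t Ni, int32_t No, int32_t Fr, int32_t Fc,
                             int32_t stride, int32_t rows,
                             int32_t mmaSize, bool is8Bit)
{
    /* Fr = Fc = 3, 5 or 7; a 7x7 filter is valid for 8-bit data only. */
    if (Fr != Fc) return false;
    if (!(Fr == 3 || Fr == 5 || (Fr == 7 && is8Bit))) return false;

    /* Stride-by-2 calls need an even number of feature-map rows. */
    if (stride == 2 && (rows % 2) != 0) return false;

    /* Ni == No */
    if (Ni != No) return false;

    /* Ni*Fr*Fc < MMA_SIZE, computed in 64 bits to avoid overflow. */
    if ((int64_t)Ni * Fr * Fc >= (int64_t)mmaSize) return false;

    return true;
}
```

A caller could use the result to log a warning when a layer is expected to take the slower natural-C path.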
Performance Considerations:
  • For best performance, the following parameter settings are recommended:
    • Set widths equal to strides
    • Align all pointers to 8 byte boundaries
    • Set all stride values to a multiple of 8
    • Set all width values to a multiple of 16
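The recommendations above can be verified for a buffer before it is handed to the kernel. The sketch below assumes that width and stride are expressed in the same unit, which this section does not specify; the helper name follows_perf_recommendations is hypothetical and not part of MMALIB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMALIB.
 * Checks one buffer against the four recommended settings.
 * Assumption: width and stride use the same unit. */
static bool follows_perf_recommendations(const void *ptr,
                                         int32_t width, int32_t stride)
{
    if (width != stride) return false;              /* widths equal to strides */
    if (((uintptr_t)ptr % 8u) != 0u) return false;  /* 8-byte aligned pointer  */
    if ((stride % 8) != 0) return false;            /* stride multiple of 8    */
    if ((width % 16) != 0) return false;            /* width multiple of 16    */
    return true;
}
```

These are performance hints rather than correctness requirements, so a failed check is a reason to revisit the buffer layout, not necessarily an error.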
Remarks
The application is expected to call the checkParams function before this function. For performance, this function does not check its parameters on each invocation.

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_exec_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_highPrecision_exec_checkParams ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
const void *  src2,
const void *  src3,
const uint8_t *  shiftValues,
const void *  dst,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs *  pKerInArgs
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights [ A matrix]
[in]src1[]: Pointer to buffer holding input feature map data [ B matrix]
[in]src2[]: Pointer to buffer holding bias data
[in]src3[]: Pointer to buffer holding scale values
[in]shiftValues[]: Pointer to buffer of shift values to apply to results prior to output
[out]dst[]: Pointer to buffer holding output feature map data [ C matrix]
[in]pKerInArgs: Pointer to structure holding input arguments
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
None

◆ MMALIB_CNN_convolve_col_smallNo_highPrecision_perfEst()

void MMALIB_CNN_convolve_col_smallNo_highPrecision_perfEst ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams3D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams2D_t *  src2_addr,
const MMALIB_bufParams1D_t *  src3_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_InitArgs *  pKerInitArgs,
const MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecInArgs *  pExecInArgs,
MMALIB_CNN_convolve_col_smallNo_highPrecision_ExecOutArgs *  pExecOutArgs,
uint64_t *  archCycles,
uint64_t *  estCycles 
)

This function estimates the cycles consumed for the kernel execution.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to the structure containing dimensional information of src0
[in]src1_addr: Pointer to the structure containing dimensional information of src1
[in]src2_addr: Pointer to structure containing dimensional information of src2 (bias)
[in]src3_addr: Pointer to structure containing dimensional information of src3 (scale)
[out]dst_addr: Pointer to the structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
[in]pExecInArgs: Pointer to structure holding execute input arguments
[in]pExecOutArgs: Pointer to structure holding execute output arguments
[out]archCycles: Cycles estimated for the compute, startup and teardown
[out]estCycles: Cycles estimated for the compute, startup, teardown and any associated overhead
Remarks
None