MMALIB User Guide
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX

Introduction

Kernel for computing CNN-style 2D convolution using column-major data ordering on the input and output feature maps. This approach is faster when the filter grouping is chosen such that Ni = No = 1, or such that Ni*Fr*Fc < MMA_SIZE; otherwise, use the regular convolution method MMALIB_CNN_convolve_row_ixX_ixX_oxX. This kernel is also referred to as depth-wise convolution.

The kernel is designed to process a pair of MMA-wide columns at a time, although processing a single column is supported. These two columns are typically adjacent columns in the same input feature map, although provisions are made in the interface for the two columns to come from separate input feature maps.

On C7100 devices, this kernel requires that the input feature maps are padded before they are passed to the kernel whereas other devices support padding inside this kernel.

This kernel differs from MMALIB_CNN_convolve_col_smallNo_highPrecision in that it requires all scale, shift and bias values to be constant across all groups processed in the same call. This restriction enables the same amount of input data to be processed in fewer cycles.
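
For orientation, the depth-wise case (Ni = No = 1 per group) with a single bias and a single shift shared by every group can be modeled in plain C roughly as follows. This is a hypothetical reference sketch, not MMALIB code: the function name, the int8 data type, the row-major array layout, stride 1, the pre-padded input, and the ReLU, shift, saturate ordering are all assumptions made for illustration, and the sketch does not model the column-major arrangement the kernel actually uses.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of MMALIB).
 * in:  groups x (H x W) pre-padded input, one channel per group (Ni = 1)
 * w:   groups x (F x F) filter taps, one filter per group (No = 1)
 * out: groups x (Ho x Wo), with Ho = H - F + 1 and Wo = W - F + 1
 * bias and shift are single values shared by every group. */
static void depthwise_ref(const int8_t *in, const int8_t *w, int8_t *out,
                          int groups, int H, int W, int F,
                          int32_t bias, uint8_t shift)
{
    int Ho = H - F + 1;
    int Wo = W - F + 1;
    for (int g = 0; g < groups; g++) {
        for (int r = 0; r < Ho; r++) {
            for (int c = 0; c < Wo; c++) {
                int32_t acc = bias;
                for (int fr = 0; fr < F; fr++) {
                    for (int fc = 0; fc < F; fc++) {
                        acc += (int32_t)in[(g * H + r + fr) * W + c + fc] *
                               (int32_t)w[(g * F + fr) * F + fc];
                    }
                }
                if (acc < 0) acc = 0;      /* ReLU (assumed to precede the shift) */
                acc >>= shift;             /* same shift for every group */
                if (acc > 127) acc = 127;  /* saturate to the int8 range */
                out[(g * Ho + r) * Wo + c] = (int8_t)acc;
            }
        }
    }
}
```

Because bias and shift are scalars rather than per-group arrays, the inner loops are identical for every group, which is the property the restriction above relies on.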

The input data for this kernel consists of the filter coefficients and the input feature maps, while the output is the output feature maps. The kernel requires the filter coefficients to be preprocessed and stored in a custom arrangement prior to calling the kernel execute function. This reordering can be done offline (more efficient) or at runtime. MMALIB provides the utilities in MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights to generate this reordering (src0) and the associated MMALIB_bufParams2D_t struct (src0_addr).

The input feature map memory arrangement is flexible and is described by the parameters illustrated in the figure below. Note that the input feature maps for each group must be vertically stacked on top of each other, as there is no parameter to control the offset between groups.

The output feature map is described by the parameters illustrated in the figure below.

Sub Modules

 MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights
 MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX requires that the weights be preprocessed into a specific arrangement. The functions in this module perform that preprocessing and other associated tasks.
 

Data Structures

struct  MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs
 This structure holds all the initialization parameters for the CNN column based convolution kernel.
 
struct  MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs
 This structure holds all the execution input parameters for the CNN column based convolution kernel.
 
struct  MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs
 This structure holds all the runtime output parameters for the CNN column based convolution kernel.
 

Functions

int32_t MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_getHandleSize (MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 Query function that returns the size of the kernel's internal handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This function call is required to initialize the handle. In this function most of the one-time operations are performed and results are stored in the handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, const uint8_t *shiftValues, const int32_t *biasBValues, void *restrict dst, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs *pKerOutArgs)
 This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.
 
MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const uint8_t *shiftValues, const int32_t *biasBValues, const void *dst, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *pKerInArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
void MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *pExecInArgs, MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs *pExecOutArgs, uint64_t *archCycles, uint64_t *estCycles)
 This function estimates the cycles consumed for the kernel execution.
 

Enumerations

enum  MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_STATUS_NAME { MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_SMALL_K = MMALIB_ERROR_MAX , MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_MAX }
 Enumeration for different error codes for MMALIB_CNN_CONVOLVE_COL_SMALLNO kernel.
 

Macros

#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE   0
 Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec.
 
#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP   1
 [Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec.
 

Macro Definition Documentation

◆ MMALIB_CONVOLVE_COL_SHIFT_SINGLE

#define MMALIB_CONVOLVE_COL_SHIFT_SINGLE   0

Use the same shift value for all groups in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec.

Definition at line 103 of file MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX.h.

◆ MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP

#define MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP   1

[Future feature, not yet supported] Use a unique shift value for each group in calls to MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec.

Definition at line 104 of file MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX.h.

Enumeration Type Documentation

◆ MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_STATUS_NAME

Enumeration for different error codes for MMALIB_CNN_CONVOLVE_COL_SMALLNO kernel.

Enumerator
MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_SMALL_K: Error case because k < Ni*Fr*Fc
MMALIB_CNN_CONVOLVE_COL_SMALLNO_IXX_IXX_OXX_ERR_MAX

Definition at line 96 of file MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX.h.

Function Documentation

◆ MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_getHandleSize()

int32_t MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_getHandleSize ( MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *  pKerInitArgs)

Query function that returns the size of the kernel's internal handle.

Parameters
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Size of the buffer in bytes
Remarks
The application is expected to allocate a buffer of the requested size and provide it during the init and exec function calls.
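
The call order implied by this page (query the handle size, allocate, check and initialize once, then execute repeatedly) might be sketched as follows. The `demo_*` functions and the `demoHandle` type are hypothetical stand-ins so that the sketch compiles on its own; real code would call the corresponding MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_* functions with their full argument lists, and the handle size of 256 bytes is made up.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the MMALIB entry points (not the real API). */
typedef void *demoHandle;
static int32_t demo_getHandleSize(void)        { return 256; } /* made-up size */
static int demo_init_checkParams(demoHandle h) { return h ? 0 : -1; }
static int demo_init(demoHandle h)             { return h ? 0 : -1; }
static int demo_exec_checkParams(demoHandle h) { return h ? 0 : -1; }
static int demo_exec(demoHandle h)             { return h ? 0 : -1; }

/* Returns the number of exec calls completed, or -1 on any failure. */
static int run_lifecycle(int numCalls)
{
    int done = -1;
    int32_t size = demo_getHandleSize();       /* 1. query the handle size    */
    demoHandle handle = malloc((size_t)size);  /* 2. application allocates it */
    if (handle == NULL) {
        return -1;
    }
    if (demo_init_checkParams(handle) == 0 &&  /* 3. validate init parameters */
        demo_init(handle) == 0 &&              /* 4. one-time setup           */
        demo_exec_checkParams(handle) == 0) {  /* 5. validate exec parameters */
        done = 0;
        while (done < numCalls) {              /* 6. exec is called many times */
            if (demo_exec(handle) != 0) {
                done = -1;
                break;
            }
            done++;
        }
    }
    free(handle);                              /* the application owns the buffer */
    return done;
}
```

The same handle buffer is passed to every call after allocation, so it has to stay alive until the last exec call has returned.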

◆ MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *  pKerInitArgs 
)

This function call is required to initialize the handle. In this function most of the one time operations are performed and results are stored in the handle.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0, content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights_fillBufParams
[in]src1_addr: Pointer to structure containing dimensional information of src1
[out]dst_addr: Pointer to structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
Application is expected to provide a valid handle

◆ MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_init_checkParams ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *  pKerInitArgs 
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0
[in]src1_addr: Pointer to structure containing dimensional information of src1
[out]dst_addr: Pointer to structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
None

◆ MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
const uint8_t *  shiftValues,
const int32_t *  biasBValues,
void *restrict  dst,
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *  pKerInArgs,
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs *  pKerOutArgs 
)

This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the column based data arrangement. It is called multiple times.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights. Content at this pointer address must be generated by MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_reorderWeights_exec. [ A matrix]
[in]src1[]: Pointer to buffer holding input feature map data. [ B matrix]
[in]shiftValues[]: Pointer to buffer of shift values to apply to results prior to output. Set to NULL if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_SINGLE, which is currently the only supported option. [Future development] Set to numGroupsPerKernel if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP.
[in]biasBValues[]: Pointer to buffer of bias values to load into the MMA B matrix. Set to NULL if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_SINGLE, which is currently the only supported option. [Future development] Set to numGroupsPerKernel if using MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP. MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::numBiasVals must equal 1 for MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs::shiftMethod = MMALIB_CONVOLVE_COL_SHIFT_PER_GROUP.
[out]dst[]: Pointer to buffer holding output feature map data. [ C matrix]
[in]pKerInArgs: Pointer to structure holding input arguments
[out]pKerOutArgs: Pointer to structure holding output arguments
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Assumptions:
  • I/O buffer pointers are assumed to be not aliased.
  • Fr = Fc = 3, 5 or 7, with 7 valid for the 8-bit data type only (defaults to natural-c if not the case)
  • stride-by-2 calls have an even number of rows in the feature map (defaults to natural-c if not the case)
  • Ni == No (defaults to natural-c if not the case)
  • Ni*Fr*Fc < MMA_SIZE (defaults to natural-c if not the case)
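
The assumptions above can be collected into a single predicate, sketched below. The helper name and its parameters (`elemBits`, `stride`, `numRows`, `mmaSize`) are hypothetical and not part of MMALIB; `mmaSize` is passed in by the caller because the value of MMA_SIZE is not stated on this page. Pointer aliasing is not checked, since it cannot be expressed in terms of these parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the listed assumptions.
 * Returns true when none of them would trigger the natural-C fallback. */
static bool meets_exec_assumptions(int32_t Ni, int32_t No,
                                   int32_t Fr, int32_t Fc,
                                   int32_t elemBits, int32_t stride,
                                   int32_t numRows, int32_t mmaSize)
{
    /* Fr = Fc = 3, 5 or 7, with 7 allowed for 8-bit data only */
    bool filterOk = (Fr == Fc) &&
                    (Fr == 3 || Fr == 5 || (Fr == 7 && elemBits == 8));
    /* stride-by-2 calls need an even number of feature map rows */
    bool strideOk = (stride != 2) || (numRows % 2 == 0);
    /* Ni == No */
    bool channelsOk = (Ni == No);
    /* Ni*Fr*Fc < MMA_SIZE */
    bool fitsMma = (Ni * Fr * Fc < mmaSize);
    return filterOk && strideOk && channelsOk && fitsMma;
}
```

An application could run a check like this before choosing between this kernel and MMALIB_CNN_convolve_row_ixX_ixX_oxX.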
Performance Considerations:
  • For best performance, the following parameter settings are recommended:
    • Set widths equal to strides
    • Align all pointers to 8-byte boundaries
    • Set all stride values to a multiple of 8
    • Set all width values to a multiple of 16
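
A buffer description can be checked against these recommendations with a small helper such as the one below. The function name is hypothetical, and the sketch assumes that width and stride are expressed in the same unit, which this page does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of the performance recommendations for one buffer.
 * width and stride are assumed to use the same unit. */
static bool follows_perf_hints(const void *ptr, uint32_t width, uint32_t stride)
{
    bool widthEqStride = (width == stride);             /* widths equal strides */
    bool ptrAligned    = ((uintptr_t)ptr % 8u) == 0u;   /* 8-byte alignment     */
    bool strideMult8   = (stride % 8u) == 0u;           /* stride multiple of 8 */
    bool widthMult16   = (width % 16u) == 0u;           /* width multiple of 16 */
    return widthEqStride && ptrAligned && strideMult8 && widthMult16;
}
```

These are recommendations rather than requirements, so a false result only indicates that performance may be lower, not that the call would fail.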
Remarks
The application is expected to call the checkParams function prior to this function, because this function skips parameter checking on each invocation for performance.

◆ MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_exec_checkParams ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
const uint8_t *  shiftValues,
const int32_t *  biasBValues,
const void *  dst,
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *  pKerInArgs 
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights [ A matrix]
[in]src1[]: Pointer to buffer holding input feature map data [ B matrix]
[in]shiftValues[]: Pointer to buffer of shift values to apply results prior to output
[in]biasBValues[]: Pointer to buffer of bias values to load into the MMA B-matrix
[out]dst[]: Pointer to buffer holding output feature map data [ C matrix]
[in]pKerInArgs: Pointer to structure holding input arguments
Returns
Status of success or error with error codes, refer to MMALIB_STATUS.
Remarks
None

◆ MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_perfEst()

void MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_perfEst ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_InitArgs *  pKerInitArgs,
const MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecInArgs *  pExecInArgs,
MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX_ExecOutArgs *  pExecOutArgs,
uint64_t *  archCycles,
uint64_t *  estCycles 
)

This function estimates the cycles consumed for the kernel execution.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to the structure containing dimensional information of src0
[in]src1_addr: Pointer to the structure containing dimensional information of src1
[out]dst_addr: Pointer to the structure containing dimensional information of dst
[in]pKerInitArgs: Pointer to structure holding init parameters
[in]pExecInArgs: Pointer to structure holding execute input arguments
[in]pExecOutArgs: Pointer to structure holding execute output arguments
[out]archCycles: Cycles estimated for the compute, startup and teardown
[out]estCycles: Cycles estimated for the compute, startup, teardown and any associated overhead
Remarks
None