MMALIB User Guide
MMALIB_CNN_convolve_row_ixX_ixX_oxX

Introduction

Kernel for computing dense CNN convolution with row based processing and matrix multiplication.

Figure (MMALIB_convolve_row_stride1_1.svg): Input and coefficient buffer for stride 1 convolution
Figure (MMALIB_convolve_row_stride1_2.svg): Output buffer for stride 1 convolution
Figure (MMALIB_convolve_row_strided_1.svg): Input and coefficient buffer for strided convolution
Figure (MMALIB_convolve_row_strided_2.svg): Output buffer for strided convolution
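
As a plain point of comparison for what a dense convolution computes, here is a minimal scalar sketch. It is not the MMALIB implementation: the function name conv_ref, the [channel][row][column] buffer layouts, and the 32-bit accumulator are assumptions chosen for illustration, and bias, ReLU, padding, and quantization are left out.

```c
#include <stdint.h>

/* Scalar reference: unpadded 2-D convolution (CNN-style cross-correlation)
 * with a stride. Layouts are assumptions made for this sketch:
 *   in : [ni][h][w]        input feature maps, row-major
 *   wt : [no][ni][fr][fc]  filter weights
 *   out: [no][ho][wo]      32-bit accumulators,
 *        ho = (h - fr) / stride + 1, wo = (w - fc) / stride + 1 */
void conv_ref(const int8_t *in, const int8_t *wt, int32_t *out,
              int ni, int no, int h, int w, int fr, int fc, int stride)
{
    int ho = (h - fr) / stride + 1;
    int wo = (w - fc) / stride + 1;

    for (int o = 0; o < no; ++o) {
        for (int y = 0; y < ho; ++y) {
            for (int x = 0; x < wo; ++x) {
                int32_t acc = 0;
                /* One multiply-accumulate per (channel, filter row, filter column). */
                for (int c = 0; c < ni; ++c)
                    for (int r = 0; r < fr; ++r)
                        for (int s = 0; s < fc; ++s)
                            acc += (int32_t)in[(c * h + y * stride + r) * w + x * stride + s]
                                 * (int32_t)wt[((o * ni + c) * fr + r) * fc + s];
                out[(o * ho + y) * wo + x] = acc;
            }
        }
    }
}
```

Each output element is a dot product of length ni*fr*fc, which is what allows the convolution to be expressed as a matrix multiplication.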

Data Structures

struct  MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs
 Structure containing the initialization parameters for the CNN convolution computation.
 
struct  MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs
 Structure containing the parameters for input to the execute phase of CNN convolution computation.
 
struct  MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args
 Structure holding all the input parameters for reordering CNN filter weights for the row convolution kernel.
 
struct  MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecOutArgs
 Structure containing the parameters for output from the execute phase of CNN convolution computation.
 

Functions

int32_t MMALIB_CNN_convolve_row_ixX_ixX_oxX_getHandleSize (MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This is a query function to calculate the size of internal handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_init (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This function must be called to initialize the handle. Most one-time operations are performed here and their results are stored in the handle.
 
MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This function checks the validity of the parameters passed to the init function.
 
MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, void *dst, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecOutArgs *pKerOutArgs)
 This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the row based data arrangement. It is called multiple times.
 
MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, void *dst, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs *pKerInArgs)
 This function checks the parameters and should be called before kernel execution. It can be called once.
 
int32_t MMALIB_CNN_generateFillSeamPredicateRegisters (MMALIB_kernelHandle handle, int32_t inputWidth, int32_t pad, int32_t inputHeight, int32_t mmaWidth, int32_t MChannels, int32_t subMChannels)
 This function generates the predicate registers once per layer. Predicate buffers are created to identify where to insert pad in the output generated between consecutive rows. The pad inserted is either the same as that of the current layer or the one used for the next layer.
 
int32_t MMALIB_CNN_seamPredicateRegistersSize (int32_t inputWidth, int32_t pad, int32_t inputHeight, int32_t mmaWidth, int32_t MChannels, int32_t subMChannels)
 This function provides the total number of bytes needed for the seam insertion buffer.
 
int32_t MMALIB_CNN_convolve_row_reorderWeights (const void *restrict pWeights, void *restrict pReorderWeights, void *restrict pBias, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src2_addr, MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args *reorderWeights, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This function reorders the weights for the M < 1 and K < 1 cases.
 
int32_t MMALIB_CNN_convolve_row_reorderWeightsFlag (MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args *reorderWeights)
 This function returns the flag for the M < 1 and K < 1 cases.
 
int32_t MMALIB_CNN_convolve_row_reorderWeightsBufferSize (MMALIB_bufParams2D_t *src0_addr, MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args *reorderWeights, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs)
 This function returns the buffer size for the M < 1 and K < 1 cases.
 
int32_t MMALIB_CNN_seamPredicateRegistersSizeDefault ()
 This function returns the default size of the seam predicate buffer.
 
void MMA_CNNLIB_convolveBiasReLUCompute_ixX_ixX_oxX_perfEst (const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams3D_t *dst_addr, MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs, const MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs *pKerInArgs, MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecOutArgs *pKerOutArgs, int32_t iterN, uint64_t *archCycles, uint64_t *estCycles)
 This function estimates the performance of MMALIB kernels.
 

Enumerations

enum  MMALIB_CNN_CONVOLVE_ROW_IXX_IXX_OXX_STATUS_NAME { MMALIB_CNN_CONVOLVE_ROW_IXX_IXX_OXX_ERR_SMALL_K = MMALIB_ERROR_MAX, MMALIB_CNN_CONVOLVE_ROW_IXX_IXX_OXX_ERR_MAX }
 Enum to define the error codes.
 

Enumeration Type Documentation

◆ MMALIB_CNN_CONVOLVE_ROW_IXX_IXX_OXX_STATUS_NAME

Enum to define the error codes.

Enumerator
MMALIB_CNN_CONVOLVE_ROW_IXX_IXX_OXX_ERR_SMALL_K: Error case because k < Ni*Fr*Fc
MMALIB_CNN_CONVOLVE_ROW_IXX_IXX_OXX_ERR_MAX

Definition at line 68 of file MMALIB_CNN_convolve_row_ixX_ixX_oxX.h.
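
The condition k < Ni*Fr*Fc can be checked on the application side before calling the kernel. The sketch below assumes the usual reading of the symbols (Ni input channels, Fr filter rows, Fc filter columns), which this page does not spell out; the helper names are hypothetical and not part of MMALIB.

```c
#include <stdint.h>

/* Reduction length of one output element when the convolution is written as
 * a matrix multiply: one term per (input channel, filter row, filter column).
 * The meaning of ni, fr and fc is an assumption of this sketch. */
int32_t conv_reduction_length(int32_t ni, int32_t fr, int32_t fc)
{
    return ni * fr * fc;
}

/* Returns 1 when k covers the full reduction and 0 when it is too small,
 * the situation the ERR_SMALL_K code describes. */
int k_is_large_enough(int32_t k, int32_t ni, int32_t fr, int32_t fc)
{
    return k >= conv_reduction_length(ni, fr, fc);
}
```

For a 3x3 filter over 3 input channels the reduction length is 27, so any k below 27 would fall into the error case.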

Function Documentation

◆ MMALIB_CNN_convolve_row_ixX_ixX_oxX_getHandleSize()

int32_t MMALIB_CNN_convolve_row_ixX_ixX_oxX_getHandleSize ( MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *  pKerInitArgs)

This is a query function to calculate the size of internal handle.

Parameters
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Size of the buffer in bytes
Remarks
The application is expected to allocate a buffer of the requested size and pass it to the other functions that require it.

◆ MMALIB_CNN_convolve_row_ixX_ixX_oxX_init()

MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_init ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *  pKerInitArgs 
)

This function must be called to initialize the handle. Most one-time operations are performed here and their results are stored in the handle.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0 weights/coefficients
[in]src1_addr: Pointer to structure containing dimensional information of src1 feature maps
[out]dst_addr: Pointer to structure containing dimensional information of dst feature maps
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or Error with Error Codes
Remarks
The application is expected to provide a valid handle

◆ MMALIB_CNN_convolve_row_ixX_ixX_oxX_init_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_init_checkParams ( MMALIB_kernelHandle  handle,
const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams3D_t *  dst_addr,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *  pKerInitArgs 
)

This function checks the validity of the parameters passed to the init function.

Parameters
[in]handle: Active handle to the kernel
[in]src0_addr: Pointer to structure containing dimensional information of src0 weights/coefficients
[in]src1_addr: Pointer to structure containing dimensional information of src1 input feature maps
[out]dst_addr: Pointer to structure containing dimensional information of dst output feature maps
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Status of success or Error with Error Codes
Remarks
The application is expected to provide a valid handle

◆ MMALIB_CNN_convolve_row_ixX_ixX_oxX_exec()

MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_exec ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
void *  dst,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs *  pKerInArgs,
MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecOutArgs *  pKerOutArgs 
)

This function is the main compute function, and performs the convolution primitive (conv + ReLU) for CNN on the row based data arrangement. It is called multiple times.

The flow and the expectations of this function are as follows:

  • Performs both strided and non-strided CNN convolution
  • Generates partial or full output feature maps over multiple calls by the application
  • Expects all the input and weight data for one block of output to be available
  • One output block has 64 output feature maps and 64 columns for 8 bit
  • One output block has 64 output feature maps and 64 columns for 16 bit
  • Handles output feature map counts that are not a multiple of 64 (8 bit) or 32 (16 bit) without requiring extra memory
  • Bias is computed as a product of a constant value in the B matrix and variable values in the A matrix, both 8 bit or 16 bit depending on precision. For example, Bias = (A0 + A1 + A2 + ...)*B.
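
To make the relation Bias = (A0 + A1 + A2 + ...)*B concrete, the following sketch splits a bias value into several 8-bit A terms for a given constant B. This is one possible decomposition written for illustration only; the function name split_bias_i8 and the greedy fill strategy are assumptions, not the method used by the library.

```c
#include <stdint.h>

/* Split `bias` into n int8 terms a[0..n-1] such that
 * b * (a[0] + ... + a[n-1]) approximates bias.
 * b must be non-zero. Returns the residual bias - b * sum(a), which is
 * non-zero when b does not divide bias or when n terms cannot hold it. */
int32_t split_bias_i8(int32_t bias, int32_t b, int8_t *a, int n)
{
    int32_t target = bias / b;   /* value the A terms should sum to (truncated) */
    int32_t sum = 0;

    for (int i = 0; i < n; ++i) {
        int32_t t = target - sum;            /* amount still to be covered */
        if (t > INT8_MAX) t = INT8_MAX;      /* clamp to the 8-bit range */
        if (t < INT8_MIN) t = INT8_MIN;
        a[i] = (int8_t)t;
        sum += t;
    }
    return bias - b * sum;
}
```

With bias = 1000 and B = 4 the A terms must sum to 250, which needs at least two 8-bit entries (127 + 123). A non-zero return value signals that more A terms or a larger B would be needed.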
Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights/coefficients
[in]src1[]: Pointer to buffer holding input feature map
[out]dst[]: Pointer to buffer holding output feature map
[in]pKerInArgs: Pointer to structure holding input Arguments
[out]pKerOutArgs: Pointer to structure holding output Arguments
Returns
Status of success or Error with Error Codes
Assumptions:
  • I/O buffer pointers are assumed to be not aliased.
Performance Considerations:
  • For best performance, the following parameter settings are recommended:
    • Set widths equal to strides
    • Align all pointers to 64 byte boundaries
    • Set all stride values to a multiple of 64 for 8 bit and 32 for 16 bit
    • Set all width values to a multiple of 64 for 8 bit and 32 for 16 bit
    • Set the number of output feature maps to 64 for 8 bit and 32 for 16 bit
    • Train the bias values to fit in the B matrix rows that pad the B matrix up to a multiple of the SIMD width
Remarks
The application is expected to call the checkParams function before this function; doing so avoids re-checking the parameters on every invocation.
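
The alignment and multiple-of-64/32 recommendations above can be verified with a few small helpers. This is a minimal sketch; the helper names are hypothetical, and treating the multiples as element counts is an assumption, since this page does not state the unit.

```c
#include <stdint.h>

/* Round n up to the next multiple of m (m > 0, n >= 0). */
int32_t round_up(int32_t n, int32_t m)
{
    return ((n + m - 1) / m) * m;
}

/* Preferred multiple from the list above: 64 for 8-bit data, 32 for 16-bit. */
int32_t preferred_multiple(int32_t bits)
{
    return bits == 8 ? 64 : 32;
}

/* Returns 1 when p sits on a 64-byte boundary, 0 otherwise. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64u) == 0;
}
```

For example, a 70-column 8-bit feature map would use a stride of round_up(70, preferred_multiple(8)), that is 128, if the stride is to follow the recommendation.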

◆ MMALIB_CNN_convolve_row_ixX_ixX_oxX_exec_checkParams()

MMALIB_STATUS MMALIB_CNN_convolve_row_ixX_ixX_oxX_exec_checkParams ( MMALIB_kernelHandle  handle,
const void *  src0,
const void *  src1,
void *  dst,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs *  pKerInArgs 
)

This function checks the parameters and should be called before kernel execution. It can be called once.

Parameters
[in]handle: Active handle to the kernel
[in]src0[]: Pointer to buffer holding convolution weights/coefficients
[in]src1[]: Pointer to buffer holding input feature map
[out]dst[]: Pointer to buffer holding output feature map
[in]pKerInArgs: Pointer to structure holding input Arguments
Returns
Status of success or Error with Error Codes
Remarks
None
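
The functions documented so far suggest a call order: query the handle size, allocate the handle, check and run init once, check the exec parameters once, then call exec repeatedly. The sketch below shows only that control flow. Every demo_ name is a hypothetical stand-in defined here so the example compiles without MMALIB; the stand-ins take fewer arguments than the real functions, and the value 0 merely plays the role of a success status.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the MMALIB API. */
typedef void *demo_handle;
typedef int32_t demo_status;

static int32_t demo_getHandleSize(void)                 { return 256; }
static demo_status demo_init_checkParams(demo_handle h) { return h ? 0 : -1; }
static demo_status demo_init(demo_handle h)             { return h ? 0 : -1; }
static demo_status demo_exec_checkParams(demo_handle h) { return h ? 0 : -1; }
static demo_status demo_exec(demo_handle h)             { return h ? 0 : -1; }

/* Runs one layer as numBlocks exec calls.
 * Returns the number of exec calls made, or -1 on any failure. */
int32_t run_layer(int32_t numBlocks)
{
    /* 1. Query the handle size and allocate the handle buffer. */
    int32_t size = demo_getHandleSize();
    demo_handle handle = calloc(1, (size_t)size);
    if (handle == NULL) return -1;

    int32_t calls = -1;
    /* 2. One-time parameter checks and initialization. */
    if (demo_init_checkParams(handle) == 0 &&
        demo_init(handle) == 0 &&
        demo_exec_checkParams(handle) == 0) {
        /* 3. The compute function is called once per output block. */
        calls = 0;
        for (int32_t i = 0; i < numBlocks; ++i) {
            if (demo_exec(handle) != 0) { calls = -1; break; }
            ++calls;
        }
    }

    free(handle);
    return calls;
}
```

Keeping both checkParams calls outside the loop reflects the remark that parameter checking is separated from exec so that it is not repeated on every invocation.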

◆ MMALIB_CNN_generateFillSeamPredicateRegisters()

int32_t MMALIB_CNN_generateFillSeamPredicateRegisters ( MMALIB_kernelHandle  handle,
int32_t  inputWidth,
int32_t  pad,
int32_t  inputHeight,
int32_t  mmaWidth,
int32_t  MChannels,
int32_t  subMChannels 
)

This function generates the predicate registers once per layer. Predicate buffers are created to identify where to insert pad in the output generated between consecutive rows. The pad inserted is either the same as that of the current layer or the one used for the next layer.

Parameters
[in]handle: Active handle to the kernel
[in]inputWidth: Width of Feature map
[in]pad: Pad between rows
[in]inputHeight: Maximum height of feature map
[in]mmaWidth: MMA width
[in]MChannels: Number of output channels
[in]subMChannels: Number of output channels per kernel call
Returns
number of bytes allocated for the predicate buffer
Remarks
None
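
The idea of a predicate that marks where pad falls between consecutive rows can be illustrated with a bit mask over a flattened row stream. This is a conceptual sketch only: the function name seam_mask64, the 64-lane width, and the layout (inputWidth valid columns followed by pad columns, repeated per row) are assumptions, and the actual predicate buffer format of the library is not described on this page.

```c
#include <stdint.h>

/* Bit i of the result is set when flattened position start + i falls in the
 * pad gap that follows each run of inputWidth valid columns.
 * Requires start >= 0 and inputWidth + pad > 0. */
uint64_t seam_mask64(int32_t start, int32_t inputWidth, int32_t pad)
{
    int32_t period = inputWidth + pad;   /* one row plus its trailing pad */
    uint64_t mask = 0;

    for (int32_t i = 0; i < 64; ++i) {
        if ((start + i) % period >= inputWidth)
            mask |= (uint64_t)1 << i;    /* lane i is a seam (pad) position */
    }
    return mask;
}
```

With inputWidth = 6 and pad = 2 the pattern repeats every 8 positions, so lanes 6 and 7 of every group of 8 are marked as seam positions.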

◆ MMALIB_CNN_seamPredicateRegistersSize()

int32_t MMALIB_CNN_seamPredicateRegistersSize ( int32_t  inputWidth,
int32_t  pad,
int32_t  inputHeight,
int32_t  mmaWidth,
int32_t  MChannels,
int32_t  subMChannels 
)

This function provides total bytes needed for seam insertion buffer.

Parameters
[in]inputWidth: Width of Feature map
[in]pad: Pad between rows
[in]inputHeight: Maximum height of feature map
[in]mmaWidth: MMA width
[in]MChannels: Number of output channels
[in]subMChannels: Number of output channels per kernel call
Returns
number of bytes allocated for the predicate buffer
Remarks
None

◆ MMALIB_CNN_convolve_row_reorderWeights()

int32_t MMALIB_CNN_convolve_row_reorderWeights ( const void *restrict  pWeights,
void *restrict  pReorderWeights,
void *restrict  pBias,
const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src2_addr,
MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args *  reorderWeights,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *  pKerInitArgs 
)

This function reorders the weights for the M < 1 and K < 1 cases.

Parameters
[in]pWeights[]: Pointer to buffer holding convolution weights/coefficients
[out]pReorderWeights[]: Pointer to buffer holding the reordered convolution weights/coefficients
[in]pBias[]: Pointer to buffer holding Bias value
[in]src0_addr: Pointer to structure containing dimensional information of src0 weights/coefficients
[in]src2_addr: Pointer to structure containing dimensional information of src2 bias
[in]reorderWeights: Pointer to structure holding reorderWeights parameters
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
status

◆ MMALIB_CNN_convolve_row_reorderWeightsFlag()

int32_t MMALIB_CNN_convolve_row_reorderWeightsFlag ( MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args *  reorderWeights)

This function returns the flag for the M < 1 and K < 1 cases.

Parameters
[in]reorderWeights: Pointer to structure holding reorderWeights parameters
Returns
status

◆ MMALIB_CNN_convolve_row_reorderWeightsBufferSize()

int32_t MMALIB_CNN_convolve_row_reorderWeightsBufferSize ( MMALIB_bufParams2D_t *  src0_addr,
MMALIB_CNN_convolve_row_ixX_ixX_oxX_reorderWeights_Args *  reorderWeights,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *  pKerInitArgs 
)

This function returns the buffer size for the M < 1 and K < 1 cases.

Parameters
[in]src0_addr: Pointer to structure containing dimensional information of src0 weights/coefficients
[in]reorderWeights: Pointer to structure holding reorderWeights parameters
[in]pKerInitArgs: Pointer to structure holding init parameters
Returns
Size of the buffer needed for the reordered weights

◆ MMALIB_CNN_seamPredicateRegistersSizeDefault()

int32_t MMALIB_CNN_seamPredicateRegistersSizeDefault ( )

This function returns the default size of the seam predicate buffer.

Returns
number of bytes allocated for the predicate buffer
Remarks
None

◆ MMA_CNNLIB_convolveBiasReLUCompute_ixX_ixX_oxX_perfEst()

void MMA_CNNLIB_convolveBiasReLUCompute_ixX_ixX_oxX_perfEst ( const MMALIB_bufParams2D_t *  src0_addr,
const MMALIB_bufParams2D_t *  src1_addr,
const MMALIB_bufParams3D_t *  dst_addr,
MMALIB_CNN_convolve_row_ixX_ixX_oxX_InitArgs *  pKerInitArgs,
const MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecInArgs *  pKerInArgs,
MMALIB_CNN_convolve_row_ixX_ixX_oxX_ExecOutArgs *  pKerOutArgs,
int32_t  iterN,
uint64_t *  archCycles,
uint64_t *  estCycles 
)

This function estimates the performance of MMALIB kernels.

Parameters
[in]src0_addr: Pointer to structure containing dimensional information of src0 weights/coefficients
[in]src1_addr: Pointer to structure containing dimensional information of src1 input feature maps
[out]dst_addr: Pointer to structure containing dimensional information of dst output feature maps
[in]pKerInitArgs: Pointer to structure holding init parameters
[in]pKerInArgs: Pointer to structure holding input Arguments
[in]pKerOutArgs: Pointer to structure holding output Arguments
[in]iterN: number of subMBlocks iterations
[out]archCycles: pointer to store architecture cycles
[out]estCycles: pointer to store estimated kernel cycles
Remarks
None