MMALIB User Guide
Kernel for computing dense CNN deconvolution with row-based processing and matrix-matrix multiplication.
Data Structures | |
struct | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs |
Structure containing the initialization parameters for the CNN deconvolution computation. More... | |
struct | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs |
Structure containing the parameters for input to the execute phase of CNN deconvolution computation. More... | |
struct | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecOutArgs |
Functions | |
int32_t | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_getHandleSize (MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs) |
This is a query function to return the size of internal handle. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_init (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *dst_addr, const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs) |
This function must be called to initialize the handle. Most one-time operations are performed here, and their results are stored in the handle. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *dst_addr, const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs *pKerInitArgs) |
This function checks the parameters passed to MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_init and is intended to be called before the handle is initialized. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, void *dst, const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs *pKerInArgs, MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecOutArgs *pKerOutArgs) |
This function is the main compute function and performs the deconvolution primitive (conv + ReLU) for CNN on the row-based data arrangement. It is typically called multiple times. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *dst, const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs *pKerInArgs) |
This function checks the parameters and should be called before kernel execution. It can be called once. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_4x4Stride2PreProcessParameters (uint32_t kDim, uint32_t numInChannels, uint32_t pitchA, uint32_t numOutChannels, uint32_t numGroups, const uint32_t mmaSize, const void *restrict src, void *restrict dst) |
This pre-processing function reshapes the parameter buffer from \( N_o \times N_i \times F_r \times F_c \) to \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \). The kernel expects the parameter tensor in this shape so that it can perform \( 4 \times 4 \) stride-2 deconvolution as four \( 2 \times 2 \) stride-1 convolutions. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_2x2Stride2PreProcessParameters (uint32_t kDim, uint32_t numInChannels, uint32_t pitchA, uint32_t numOutChannels, uint32_t numGroups, const uint32_t mmaSize, const void *restrict src, void *restrict dst) |
This pre-processing function reshapes the parameter buffer from \( N_o \times N_i \times F_r \times F_c \) to \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \). The kernel expects the parameter tensor in this shape so that it can perform \( 2 \times 2 \) stride-2 deconvolution as four \( 1 \times 1 \) stride-1 convolutions. More... | |
MMALIB_STATUS | MMALIB_CNN_deconvolve_row_8x8Stride2PreProcessParameters (uint32_t kDim, uint32_t numInChannels, uint32_t pitchA, uint32_t numOutChannels, uint32_t numGroups, const uint32_t mmaSize, const void *restrict src, void *restrict dst) |
This pre-processing function reshapes the parameter buffer from \( N_o \times N_i \times F_r \times F_c \) to \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \). The kernel expects the parameter tensor in this shape so that it can perform \( 8 \times 8 \) stride-2 deconvolution as four \( 4 \times 4 \) stride-1 convolutions. More... | |
void | MMALIB_CNN_deconvolveBiasReLUCompute_ixX_ixX_oxX_perfEst (const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *dst_addr, const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs *kerInitArgs, const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs *pKerInArgs, MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecOutArgs *pKerOutArgs, int32_t iterN, uint64_t *archCycles, uint64_t *estCycles) |
This function estimates the cycles consumed for the kernel execution. More... | |
Enumerations | |
enum | MMALIB_CNN_DECONVOLVE_ROW_IXX_IXX_OXX_STATUS_NAME { MMALIB_CNN_DECONVOLVE_ROW_IXX_IXX_OXX_ERR_SMALL_K , MMALIB_CNN_DECONVOLVE_ROW_IXX_IXX_OXX_ERR_MAX } |
Enumeration for different Error codes for MMALIB_CNN_DECONVOLVE_ROW Kernel. More... | |
Enumeration for different Error codes for MMALIB_CNN_DECONVOLVE_ROW Kernel.
Enumerator | |
---|---|
MMALIB_CNN_DECONVOLVE_ROW_IXX_IXX_OXX_ERR_SMALL_K | Error case because k < Ni*Fr*Fc |
MMALIB_CNN_DECONVOLVE_ROW_IXX_IXX_OXX_ERR_MAX | |
Definition at line 162 of file MMALIB_CNN_deconvolve_row_ixX_ixX_oxX.h.
int32_t MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_getHandleSize | ( | MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs * | pKerInitArgs | ) |
This is a query function to return the size of internal handle.
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
MMALIB_STATUS MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_init | ( | MMALIB_kernelHandle | handle, |
const MMALIB_bufParams2D_t * | src0_addr, | ||
const MMALIB_bufParams2D_t * | src1_addr, | ||
const MMALIB_bufParams2D_t * | dst_addr, | ||
const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs * | pKerInitArgs | ||
) |
This function must be called to initialize the handle. Most one-time operations are performed here, and their results are stored in the handle.
[in] | handle | : Active handle to the kernel |
[in] | src0_addr | : Pointer to structure containing dimensional information of src0 |
[in] | src1_addr | : Pointer to structure containing dimensional information of src1 |
[out] | dst_addr | : Pointer to structure containing dimensional information of dst |
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
MMALIB_STATUS MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_init_checkParams | ( | MMALIB_kernelHandle | handle, |
const MMALIB_bufParams2D_t * | src0_addr, | ||
const MMALIB_bufParams2D_t * | src1_addr, | ||
const MMALIB_bufParams2D_t * | dst_addr, | ||
const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs * | pKerInitArgs | ||
) |
This function checks the parameters passed to MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_init and is intended to be called before the handle is initialized.
[in] | handle | : Active handle to the kernel |
[in] | src0_addr | : Pointer to structure containing dimensional information of src0 weights/coefficients |
[in] | src1_addr | : Pointer to structure containing dimensional information of src1 input feature maps |
[out] | dst_addr | : Pointer to structure containing dimensional information of dst output feature maps |
[in] | pKerInitArgs | : Pointer to structure holding init parameters |
MMALIB_STATUS MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_exec | ( | MMALIB_kernelHandle | handle, |
const void * | src0, | ||
const void * | src1, | ||
void * | dst, | ||
const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs * | pKerInArgs, | ||
MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecOutArgs * | pKerOutArgs | ||
) |
This function is the main compute function and performs the deconvolution primitive (conv + ReLU) for CNN on the row-based data arrangement. It is typically called multiple times.
[in] | handle | : Active handle to the kernel |
[in] | src0[] | : Pointer to buffer holding convolution weights [A matrix] |
[in] | src1[] | : Pointer to buffer holding input feature map [B matrix] |
[out] | dst[] | : Pointer to buffer holding partial output feature map [C matrix] |
[in] | pKerInArgs | : Pointer to structure holding input Arguments |
[out] | pKerOutArgs | : Pointer to structure holding output Arguments |
MMALIB_STATUS MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_exec_checkParams | ( | MMALIB_kernelHandle | handle, |
const void * | src0, | ||
const void * | src1, | ||
const void * | dst, | ||
const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs * | pKerInArgs | ||
) |
This function checks the parameters and should be called before kernel execution. It can be called once.
[in] | handle | : Active handle to the kernel |
[in] | src0[] | : Pointer to buffer holding convolution weights [A matrix] |
[in] | src1[] | : Pointer to buffer holding input feature map data [B matrix] |
[out] | dst[] | : Pointer to buffer holding output feature map data [C matrix] |
[in] | pKerInArgs | : Pointer to structure holding input Arguments |
MMALIB_STATUS MMALIB_CNN_deconvolve_row_4x4Stride2PreProcessParameters | ( | uint32_t | kDim, |
uint32_t | numInChannels, | ||
uint32_t | pitchA, | ||
uint32_t | numOutChannels, | ||
uint32_t | numGroups, | ||
const uint32_t | mmaSize, | ||
const void *restrict | src, | ||
void *restrict | dst | ||
) |
This pre-processing function reshapes the parameter buffer from \( N_o \times N_i \times F_r \times F_c \) to \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \). The kernel expects the parameter tensor in this shape so that it can perform \( 4 \times 4 \) stride-2 deconvolution as four \( 2 \times 2 \) stride-1 convolutions.
[in] | kDim | : Length of parameter buffer |
[in] | numInChannels | : Number of input channels in parameter tensor |
[in] | pitchA | : Pitch of parameter buffer |
[in] | numOutChannels | : Number of output channels in parameter tensor |
[in] | numGroups | : Number of groups in parameter tensor |
[in] | mmaSize | : MMA width |
[in] | src | : Pointer to buffer with parameter tensor |
[out] | dst | : Pointer to buffer with reshaped parameter tensor |
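The decomposition described above can be illustrated with a minimal plain-C sketch. It is not the library's implementation and does not reproduce the memory layout written by MMALIB_CNN_deconvolve_row_4x4Stride2PreProcessParameters (pitchA, numGroups and mmaSize are ignored). It assumes a single input and output channel, no padding, and integer data; the function names `deconv4x4s2_direct`, `split4x4` and `deconv4x4s2_split` are hypothetical. The sketch only shows why a \( 4 \times 4 \) stride-2 deconvolution yields the same result as four \( 2 \times 2 \) stride-1 convolutions, one per output row/column parity.

```c
#include <assert.h>
#include <string.h>

/* Reference: direct 4x4, stride-2 deconvolution in scatter form.
 * x   : h * w input, row-major
 * wgt : 4 * 4 kernel, row-major
 * y   : (2h + 2) * (2w + 2) output, row-major */
void deconv4x4s2_direct(const int *x, int h, int w, const int *wgt, int *y)
{
    assert(h > 0 && w > 0);
    int oh = 2 * h + 2, ow = 2 * w + 2;
    memset(y, 0, sizeof(int) * (size_t)oh * (size_t)ow);
    for (int i = 0; i < h; i++)
        for (int j = 0; j < w; j++)
            for (int r = 0; r < 4; r++)
                for (int c = 0; c < 4; c++)
                    y[(2 * i + r) * ow + (2 * j + c)] += x[i * w + j] * wgt[r * 4 + c];
}

/* Split the 4x4 kernel into four 2x2 sub-kernels by row/column parity (a, b):
 * sub[2a + b][u][v] = wgt[a + 2u][b + 2v], stored flat as 4 * (2 * 2) ints.
 * This ordering is an assumption made for the sketch. */
void split4x4(const int *wgt, int *sub)
{
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            for (int u = 0; u < 2; u++)
                for (int v = 0; v < 2; v++)
                    sub[(2 * a + b) * 4 + 2 * u + v] = wgt[(a + 2 * u) * 4 + (b + 2 * v)];
}

/* Same result computed as four 2x2 stride-1 convolutions with zero padding
 * at the border; sub-kernel (a, b) produces output pixels (2m + a, 2n + b). */
void deconv4x4s2_split(const int *x, int h, int w, const int *sub, int *y)
{
    assert(h > 0 && w > 0);
    int ow = 2 * w + 2;
    for (int m = 0; m <= h; m++)
        for (int n = 0; n <= w; n++)
            for (int a = 0; a < 2; a++)
                for (int b = 0; b < 2; b++) {
                    int acc = 0;
                    for (int u = 0; u < 2; u++)
                        for (int v = 0; v < 2; v++) {
                            int i = m - u, j = n - v;
                            if (i >= 0 && i < h && j >= 0 && j < w)
                                acc += x[i * w + j] * sub[(2 * a + b) * 4 + 2 * u + v];
                        }
                    y[(2 * m + a) * ow + (2 * n + b)] = acc;
                }
}
```

Calling `split4x4` once and then `deconv4x4s2_split` should give the same output buffer as `deconv4x4s2_direct` for any input, which mirrors the split between one-time parameter pre-processing and repeated kernel execution.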
MMALIB_STATUS MMALIB_CNN_deconvolve_row_2x2Stride2PreProcessParameters | ( | uint32_t | kDim, |
uint32_t | numInChannels, | ||
uint32_t | pitchA, | ||
uint32_t | numOutChannels, | ||
uint32_t | numGroups, | ||
const uint32_t | mmaSize, | ||
const void *restrict | src, | ||
void *restrict | dst | ||
) |
This pre-processing function reshapes the parameter buffer from \( N_o \times N_i \times F_r \times F_c \) to \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \). The kernel expects the parameter tensor in this shape so that it can perform \( 2 \times 2 \) stride-2 deconvolution as four \( 1 \times 1 \) stride-1 convolutions.
[in] | kDim | : Length of parameter buffer |
[in] | numInChannels | : Number of input channels in parameter tensor |
[in] | pitchA | : Pitch of parameter buffer |
[in] | numOutChannels | : Number of output channels in parameter tensor |
[in] | numGroups | : Number of groups in parameter tensor |
[in] | mmaSize | : MMA width |
[in] | src | : Pointer to buffer with parameter tensor |
[out] | dst | : Pointer to buffer with reshaped parameter tensor |
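As a worked illustration of the \( 2 \times 2 \) case, consider a sketch that assumes a single group, no padding and no bias. The symbols \( x \) (input feature map), \( y \) (output feature map), \( w \) (parameter tensor), the pixel indices \( m, n \) and the parities \( a, b \) are introduced here for illustration and are not part of the library documentation. Because the kernel size equals the stride, the footprints of neighbouring input pixels do not overlap, so each output pixel receives exactly one kernel tap per input channel:

\[ y_o[2m + a][2n + b] = \sum_{i=0}^{N_i - 1} w_{o,i}[a][b] \, x_i[m][n], \qquad a, b \in \{0, 1\} \]

For a fixed pair \( (a, b) \) this is a \( 1 \times 1 \) stride-1 convolution over the input channels, which gives four sub-kernels of shape \( N_o \times N_i \times 1 \times 1 \), in agreement with \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \) for \( F_r = F_c = 2 \).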
MMALIB_STATUS MMALIB_CNN_deconvolve_row_8x8Stride2PreProcessParameters | ( | uint32_t | kDim, |
uint32_t | numInChannels, | ||
uint32_t | pitchA, | ||
uint32_t | numOutChannels, | ||
uint32_t | numGroups, | ||
const uint32_t | mmaSize, | ||
const void *restrict | src, | ||
void *restrict | dst | ||
) |
This pre-processing function reshapes the parameter buffer from \( N_o \times N_i \times F_r \times F_c \) to \( 4 \times N_o \times N_i \times \frac{F_r}{2} \times \frac{F_c}{2} \). The kernel expects the parameter tensor in this shape so that it can perform \( 8 \times 8 \) stride-2 deconvolution as four \( 4 \times 4 \) stride-1 convolutions.
[in] | kDim | : Length of parameter buffer |
[in] | numInChannels | : Number of input channels in parameter tensor |
[in] | pitchA | : Pitch of parameter buffer |
[in] | numOutChannels | : Number of output channels in parameter tensor |
[in] | numGroups | : Number of groups in parameter tensor |
[in] | mmaSize | : MMA width |
[in] | src | : Pointer to buffer with parameter tensor |
[out] | dst | : Pointer to buffer with reshaped parameter tensor |
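The shape arithmetic shared by the three pre-processing functions can be sketched as a small helper. This is an illustration only: the type `ReshapedDims` and the functions `stride2_reshaped_dims` and `element_count` are hypothetical and not part of MMALIB, and the count is the logical number of parameters, ignoring any padding implied by pitchA or mmaSize.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the logical shape 4 x No x Ni x Fr/2 x Fc/2. */
typedef struct {
    uint32_t phases; /* always 4: one sub-kernel per output row/column parity */
    uint32_t no;     /* number of output channels */
    uint32_t ni;     /* number of input channels */
    uint32_t fr;     /* sub-kernel rows, Fr / 2 */
    uint32_t fc;     /* sub-kernel columns, Fc / 2 */
} ReshapedDims;

/* Logical shape after reshaping a No x Ni x Fr x Fc tensor for stride 2. */
ReshapedDims stride2_reshaped_dims(uint32_t no, uint32_t ni, uint32_t fr, uint32_t fc)
{
    assert(fr % 2u == 0u && fc % 2u == 0u); /* kernel must split evenly */
    ReshapedDims d = {4u, no, ni, fr / 2u, fc / 2u};
    return d;
}

/* Logical element count of the reshaped tensor (no pitch or MMA padding). */
uint64_t element_count(ReshapedDims d)
{
    return (uint64_t)d.phases * d.no * d.ni * d.fr * d.fc;
}
```

For an \( 8 \times 8 \) kernel the helper yields four \( 4 \times 4 \) sub-kernels, and the logical element count equals that of the original tensor, since the reshape only regroups the parameters.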
void MMALIB_CNN_deconvolveBiasReLUCompute_ixX_ixX_oxX_perfEst | ( | const MMALIB_bufParams2D_t * | src0_addr, |
const MMALIB_bufParams2D_t * | src1_addr, | ||
const MMALIB_bufParams2D_t * | dst_addr, | ||
const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_InitArgs * | kerInitArgs, | ||
const MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecInArgs * | pKerInArgs, | ||
MMALIB_CNN_deconvolve_row_ixX_ixX_oxX_ExecOutArgs * | pKerOutArgs, | ||
int32_t | iterN, | ||
uint64_t * | archCycles, | ||
uint64_t * | estCycles | ||
) |
This function estimates the cycles consumed for the kernel execution.
[in] | src0_addr | : Pointer to the structure containing dimensional information of src0 |
[in] | src1_addr | : Pointer to the structure containing dimensional information of src1 |
[out] | dst_addr | : Pointer to the structure containing dimensional information of dst |
[in] | kerInitArgs | : Pointer to structure holding init parameters |
[in] | pKerInArgs | : Pointer to structure holding input arguments |
[in] | pKerOutArgs | : Pointer to structure holding output arguments |
[in] | iterN | : Number of subMBlocks iterations |
[out] | archCycles | : Cycles estimated for the compute, startup and teardown |
[out] | estCycles | : Cycles estimated for the compute, startup, teardown and any associated overhead |