The kernel computes a fully connected layer: \( Y^T = X^T \times H^T + B^T \).
The kernel requires the feature map (\( X^T \)), coefficients (\( H^T \)), and bias (\( B^T \)) to be available in memory.
8- and 16-bit datatypes are supported
Supported datatypes for feature map are 8- and 16-bit signed or unsigned
Supported datatypes for coefficients are 8- and 16-bit signed
Supported datatypes for bias are 32- and 64-bit signed
Supported datatypes for output are 8- and 16-bit signed or unsigned
This kernel requires a specific data arrangement of the kernel (coefficient) matrix to alleviate bank conflicts in L2 when DMA transfers run concurrently with the kernel
The desired physical bank-access pattern in L2 for the streaming engines (SEs) and DMA is {0,0,1,1,2,2,3,3,0,0,1,1,2,2,3,3, ...}
The re-ordering functionality of MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_reorderWeights may be used to produce this arrangement
Figure: Filter Coefficient Buffer, two examples with MMALIB_CNN_fullyConnected_ixX_ixX_oxX_InitArgs.Ni = 512 and 128 or 104 output features
Buffers: src0 \( \rightarrow X^T \), src1 \( \rightarrow H^T \), and dst \( \rightarrow Y^T \) are described by MMALIB_bufParams2D_t; src2 \( \rightarrow B^T \) is described by MMALIB_bufParams1D_t
The kernel also supports cases where the parameter (coefficient) matrix or the feature map matrix does not fit in L2 memory
int32_t MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_getHandleSize (MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_InitArgs *pKerInitArgs)
Query function that returns the size of the internal handle.
MMALIB_STATUS MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_init (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams1D_t *src2_addr, const MMALIB_bufParams1D_t *src3_addr, const MMALIB_bufParams2D_t *dst_addr, const MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_InitArgs *pKerInitArgs)
Initializes the handle. Most one-time operations are performed in this function and their results are stored in the handle.
MMALIB_STATUS MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_init_checkParams (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams1D_t *src2_addr, const MMALIB_bufParams2D_t *dst_addr, const MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_InitArgs *pKerInitArgs)
Checks the init parameters; call it before kernel execution. It needs to be called only once.
MMALIB_STATUS MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_exec (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *src3, const void *src4, void *dst, const MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_ExecInArgs *pKerExecInArgs, MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_ExecOutArgs *pKerExecOutArgs)
Main compute function; performs the matrix-matrix multiplication.
MMALIB_STATUS MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *dst)
Checks the exec parameters; call it before kernel execution. It needs to be called only once.
void MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_perfEst (MMALIB_kernelHandle handle, const MMALIB_bufParams2D_t *src0_addr, const MMALIB_bufParams2D_t *src1_addr, const MMALIB_bufParams2D_t *dst_addr, uint64_t *idealCycles, uint64_t *archCycles, uint64_t *estCycles, int32_t *caseNumber)
Estimates the cycles consumed by kernel execution.
◆ MMALIB_CNN_FULLYCONNECTEDBIAS_IXX_IXX_OXX_STATUS_NAME
Enumeration of different error codes for the MMALIB_CNN_FULLYCONNECTED kernel.
Enumerator MMALIB_CNN_FULLYCONNECTEDBIAS_IXX_IXX_OXX_ERR_SMALL_K
MMALIB_CNN_FULLYCONNECTEDBIAS_IXX_IXX_OXX_ERR_MAX
Definition at line 134 of file MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX.h.
◆ MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_getHandleSize()
Query function that returns the size of the internal handle.
Parameters
[in] pKerInitArgs : Pointer to structure holding init parameters
Returns Size of the handle buffer, in bytes
◆ MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_init()
Initializes the handle. Most one-time operations are performed in this function and their results are stored in the handle.
Parameters
[in] handle : Active handle to the kernel
[in] src0_addr : Pointer to the structure containing dimensional information of src0, which is the feature map matrix
[in] src1_addr : Pointer to the structure containing dimensional information of src1, which is the kernel matrix
[in] src2_addr[] : Pointer to the structure containing dimensional information of src2, which is the bias vector
[in] src3_addr[] : Pointer to the structure containing dimensional information of src3, which is the scale vector
[out] dst_addr : Pointer to the structure containing dimensional information of dst
[in] pKerInitArgs : Pointer to the structure holding init parameters
Returns Status value indicating success or failure. Refer to MMALIB_STATUS.
◆ MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_init_checkParams()
Checks the init parameters; call it before kernel execution. It needs to be called only once.
Parameters
[in] handle : Active handle to the kernel
[in] src0_addr : Pointer to the structure containing dimensional information of src0, which is the feature map
[in] src1_addr : Pointer to the structure containing dimensional information of src1, which is the kernel matrix
[in] src2_addr[] : Pointer to the structure containing dimensional information of src2, which is the bias vector
[out] dst_addr : Pointer to the structure containing dimensional information of dst
[in] pKerInitArgs : Pointer to the structure holding init parameters
Returns Status value indicating success or failure. Refer to MMALIB_STATUS.
◆ MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_exec()
This is the main compute function; it performs the matrix-matrix multiplication.
The flow and the expectations of this function are as follows
Parameters
[in] handle : Active handle to the kernel
[in] src0[] : Pointer to buffer holding the first matrix input [ A matrix], which is the feature map
[in] src1[] : Pointer to buffer holding the second matrix input [ B matrix], which is the pre-processed kernel matrix with bias values
[in] src2[] : Pointer to buffer holding the bias vector
[in] src3[] : Pointer to buffer holding the scale vector
[in] src4[] : Pointer to buffer holding the shift vector
[out] dst[] : Pointer to buffer holding the output matrix [ C matrix]
[in] pKerExecInArgs : Pointer to the structure holding input exec parameters
[out] pKerExecOutArgs : Pointer to the structure holding exec output parameters
Returns Status value indicating success or failure. Refer to MMALIB_STATUS.
Assumptions:
I/O buffer pointers are assumed not to be aliased.
Performance Considerations:
For best performance, the following parameter settings are recommended:
Align all pointers to 8-byte boundaries
Set all matrix dimensions to a multiple of
64 for 8-bit data
32 for 16-bit data
◆ MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_exec_checkParams()
MMALIB_STATUS MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_exec_checkParams (MMALIB_kernelHandle handle, const void *src0, const void *src1, const void *src2, const void *dst)
Checks the exec parameters; call it before kernel execution. It needs to be called only once.
Parameters
[in] handle : Active handle to the kernel
[in] src0[] : Pointer to buffer holding the first matrix input [ A matrix], which is the feature map
[in] src1[] : Pointer to buffer holding the second matrix input [ B matrix], which is the pre-processed parameter matrix
[in] src2[] : Pointer to buffer holding the bias vector
[out] dst[] : Pointer to buffer holding the output matrix [ C matrix]
Returns Status value indicating success or failure. Refer to MMALIB_STATUS.
◆ MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX_perfEst()
This function estimates the cycles consumed for the kernel execution.
Parameters
[in] handle : Active handle to the kernel
[in] src0_addr : Pointer to the structure containing dimensional information of src0
[in] src1_addr : Pointer to the structure containing dimensional information of src1
[out] dst_addr : Pointer to the structure containing dimensional information of dst
[out] idealCycles : Cycles estimated for the compute alone, under ideal conditions
[out] archCycles : Cycles estimated for the compute, startup and teardown
[out] estCycles : Cycles estimated for the compute, startup, teardown and any associated overhead
[out] caseNumber : The case (execution path) taken inside the kernel execution