The input feature maps are 8-bit or 16-bit, signed or unsigned
The weights are 8-bit or 16-bit signed
The output feature maps are 8-bit or 16-bit, signed or unsigned
The bias is loaded into an array with one value per output feature map, at 32-bit precision for 8-bit data and 64-bit precision for 16-bit data
After accumulation, each output feature map can be scaled (8-bit scale) and shifted (8-bit shift)
Programmable saturation is supported on the output values
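The following minimal C sketch shows one accumulated value passing through the post-accumulation steps: bias addition, 8-bit scaling, 8-bit right shift, and programmable saturation. The function name `requantize`, the order of operations, and the round-to-nearest behavior are assumptions for illustration only; the kernel's actual arithmetic may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical post-accumulation stage (not the kernel API):
 * (acc + bias) * scale, then a rounding right shift, then clamping
 * to a programmable [satMin, satMax] range. */
static int32_t requantize(int64_t acc, int64_t bias, uint8_t scale,
                          uint8_t shift, int32_t satMin, int32_t satMax)
{
    assert(satMin <= satMax);
    assert(shift < 63);                       /* keep the int64 shift defined */
    int64_t v = (acc + bias) * (int64_t)scale;
    if (shift > 0) {
        v += (int64_t)1 << (shift - 1);       /* round to nearest, ties toward +inf */
        v >>= shift;                          /* assumes arithmetic right shift */
    }
    if (v < satMin) v = satMin;               /* programmable lower saturation */
    if (v > satMax) v = satMax;               /* programmable upper saturation */
    return (int32_t)v;
}
```

Saturating to [0, 255] models an unsigned 8-bit output and [-128, 127] a signed one; 16-bit outputs would use the corresponding 16-bit limits.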
The input feature maps are passed to the kernel with all rows stored contiguously. All input feature maps, each with the same number of pixels, are fed into the kernel buffer. Processing can begin at a given column of a row, specified by the col parameter, so the kernel starts from that intermediate location.
Input buffer for strided and non-strided convolution
Starting location set with the col parameter
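A minimal C sketch of the addressing this layout implies, assuming each feature map occupies `height` consecutive rows of `rowPitch` elements and that consecutive feature maps are `chPitch` elements apart. The `InLayout` struct and `in_offset` helper are hypothetical names, not part of the kernel API. Because rows are contiguous, starting at column `col` of a row is only an offset from the row start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout descriptor; field names are illustrative only. */
typedef struct {
    size_t width;     /* valid pixels per row */
    size_t rowPitch;  /* elements between the starts of consecutive rows */
    size_t height;    /* rows per feature map */
    size_t chPitch;   /* elements between the starts of consecutive feature maps */
} InLayout;

/* Element offset of pixel (ch, row, col) from the start of the buffer. */
static size_t in_offset(const InLayout *l, size_t ch, size_t row, size_t col)
{
    assert(row < l->height && col < l->width);
    return ch * l->chPitch + row * l->rowPitch + col;
}
```

When `rowPitch` equals `width`, the element after the last pixel of one row is the first pixel of the next row, which is what lets processing start mid-row and continue linearly.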
The filter coefficient buffer for each output feature map is laid out linearly with Ni*Fr*Fc values. For dilated kernels, only the actual coefficients are stored; the zeros introduced by dilation are not.
Filter coefficients buffer
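As a sketch of this linear layout, the helper below computes the position of one coefficient, assuming the Ni*Fr*Fc values for an output feature map are ordered by input channel, then filter row, then filter column, with no padding between output feature maps. That ordering and the helper names are assumptions. The second helper shows why dilation needs no stored zeros: tap (fr, fc) simply reads input at an offset of fr*dilation rows and fc*dilation columns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index of coefficient (no, ni, fr, fc) in an
 * [No][Ni][Fr][Fc] linear layout (ordering is an assumption). */
static size_t coeff_index(size_t Ni, size_t Fr, size_t Fc,
                          size_t no, size_t ni, size_t fr, size_t fc)
{
    assert(ni < Ni && fr < Fr && fc < Fc);
    return ((no * Ni + ni) * Fr + fr) * Fc + fc;
}

/* Span covered in the input by F stored taps at a given dilation. */
static size_t dilated_extent(size_t F, size_t dilation)
{
    assert(F > 0 && dilation > 0);
    return (F - 1) * dilation + 1;
}
```

A 3x3 kernel with dilation 2 therefore stores 9 values per input channel but covers a 5x5 input window.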
The kernel requires multiple handles for a given set of feature maps; these handles are pre-stored in L1D memory. For a given CNN layer, the handles are fixed. There are three categories of handles:
Top-row handles, which require top, right, and left padding
Middle-row handles, which require right and left padding
Bottom-row handles, which require right, left, and bottom padding
Top-row handles require the starting pointer srcPtr to point to the first pixel of the feature map, with the "col" location provided
Middle-row handles can have srcPtr start at any location, with the "col" location provided
Bottom-row handles can have srcPtr start at any location, with the "col" location provided; they also take validColsOutBottom as an input to the kernel
Examples of parameters for the different handles for 3x3 stride-1 convolution
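To illustrate which handle category an output row falls into, here is a minimal C sketch. It assumes an output row needs top padding when its filter window starts above input row 0 and bottom padding when the window ends below the last input row; dilation is ignored. The `RowHandleKind` enum and `classify_row` function are hypothetical, not part of the kernel API.

```c
#include <assert.h>

typedef enum { ROW_TOP, ROW_MIDDLE, ROW_BOTTOM } RowHandleKind;  /* hypothetical */

/* Classify an output row by the vertical padding its window needs. */
static RowHandleKind classify_row(int outRow, int stride, int padTop,
                                  int Fr, int inHeight)
{
    assert(stride > 0 && Fr > 0 && inHeight > 0);
    int firstIn = outRow * stride - padTop;   /* first input row touched */
    int lastIn  = firstIn + Fr - 1;           /* last input row touched */
    if (firstIn < 0)
        return ROW_TOP;                       /* needs top padding */
    if (lastIn > inHeight - 1)
        return ROW_BOTTOM;                    /* needs bottom padding */
    return ROW_MIDDLE;                        /* left/right padding only */
}
```

For a 3x3 stride-1 convolution with one row of padding, row 0 is a top row, the last row is a bottom row, and every row in between is a middle row. Very short feature maps can need both top and bottom padding; this sketch reports such rows as top rows.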
Strided convolution
For strided convolution, all handles set the starting pointer srcPtr to the first pixel of the row.
Strided convolution generates a complete output row per kernel call.
The top-row, middle-row, and bottom-row handles are different.
Examples of parameters for the different handles for 3x3 stride-2 convolution
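The standard output-size formula helps explain why a strided call starts at the first pixel of a row and emits a whole output row: output column j reads input columns starting at j*stride - padBefore, so the column stepping is fixed from the row start. The helper names below are illustrative, not kernel APIs, and dilation is ignored.

```c
#include <assert.h>

/* Output length along one dimension for a convolution without dilation. */
static int out_size(int in, int padBefore, int padAfter, int F, int stride)
{
    assert(stride > 0 && F > 0);
    assert(in + padBefore + padAfter >= F);
    return (in + padBefore + padAfter - F) / stride + 1;
}

/* First input index read by output index outIdx (negative means padding). */
static int in_start(int outIdx, int stride, int padBefore)
{
    return outIdx * stride - padBefore;
}
```

For an 8-pixel row with a 3-tap filter and one pixel of padding on each side, stride 1 yields 8 outputs and stride 2 yields 4, and the stride-2 output column 3 starts reading at input column 5.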
Structure containing the input parameters for the execute phase of the CNN convolution computation. These parameters do not exist in J7AM; they are kept for J7ES compatibility.
This function call is required to initialize the handle. Most one-time operations are performed in this function, and their results are stored in the handle.
This function call is required to initialize the handle. Most one-time operations are performed in this function, and their results are stored in the handle.
This is the main compute function. It performs the convolution primitive (conv + ReLU) for CNN on the row-based data arrangement and is called multiple times.
This function call is required to initialize the handle. Most one-time operations are performed in this function, and their results are stored in the handle.
Parameters
[in] handle: Active handle to the kernel
[in] src0_addr: Pointer to the structure containing the dimensional information of the src0 weights/coefficients
[in] src1_addr: Pointer to the structure containing the dimensional information of the src1 feature maps
[in] src2_addr: Pointer to the structure containing the dimensional information of the src2 bias
[in] src3_addr: Pointer to the structure containing the dimensional information of the src3 scale values
[out] dst_addr: Pointer to the structure containing the dimensional information of the dst feature maps
[in] pKerInitArgs: Pointer to the structure holding the init parameters
Returns
Status of success, or an error code on failure
Remarks
The application is expected to provide a valid handle
This function call is required to initialize the handle. Most one-time operations are performed in this function, and their results are stored in the handle.
Parameters
[in] handle: Active handle to the kernel
[in] src0_addr: Pointer to the structure containing the dimensional information of the src0 weights/coefficients
[in] src1_addr: Pointer to the structure containing the dimensional information of the src1 input feature maps
[in] src2_addr: Pointer to the structure containing the dimensional information of the src2 bias
[in] src3_addr: Pointer to the structure containing the dimensional information of the src3 scale values
[out] dst_addr: Pointer to the structure containing the dimensional information of the dst output feature maps
[in] pKerInitArgs: Pointer to the structure holding the init parameters
Returns
Status of success, or an error code on failure
Remarks
The application is expected to provide a valid handle
This is the main compute function. It performs the convolution primitive (conv + ReLU) for CNN on the row-based data arrangement and is called multiple times.
The flow and expectations of this function are as follows:
Performs both strided and non-strided CNN convolution
Generates partial or full output feature maps over multiple calls by the application
Creates at least three output blocks when KBlocks is less than 3
Creates at least one output block when KBlocks is greater than or equal to 3, except for 1x1 stride-2 convolution, where KBlocks must be greater than 3
Expects all input and weight data for one output block to be available
One output block has 64 output feature maps and 64 columns for 8-bit data
One output block has 32 output feature maps and 32 columns for 16-bit data
Handles output feature map counts that are not multiples of 64 (8-bit) or 32 (16-bit) without requiring extra memory
Computes the bias inside the matrix multiply, using a constant value in the B matrix and variable values in the A matrix, both 8-bit or 16-bit depending on precision; for example, Bias = (A0 + A1 + A2 + ...) * B
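One way to read the bias formula is that a wide bias is expressed as several narrow A-matrix terms multiplied by a shared constant in the B matrix. The sketch below splits a 32-bit bias into k int8 terms so that (A[0] + ... + A[k-1]) * B equals the bias exactly. It assumes 8-bit terms, a positive constant B, and a bias that is an exact multiple of B; the function name and these conditions are hypothetical, not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: split bias into k int8 terms A[i] such that
 * (A[0] + ... + A[k-1]) * B == bias. Returns 0 on success,
 * -1 if the bias cannot be represented this way. */
static int split_bias_int8(int32_t bias, int8_t B, int8_t *A, int k)
{
    assert(B > 0 && k > 0);
    if (bias % B != 0)
        return -1;                         /* must be an exact multiple of B */
    int32_t q = bias / B;                  /* required sum of the A terms */
    if (q > 127 * k || q < -128 * k)
        return -1;                         /* too large for k int8 terms */
    for (int i = 0; i < k; ++i) {
        int32_t a = q;
        if (a > 127) a = 127;              /* clamp each term to int8 range */
        if (a < -128) a = -128;
        A[i] = (int8_t)a;
        q -= a;                            /* remainder for the next terms */
    }
    return 0;                              /* q has been fully distributed */
}
```

Each extra term costs one row of the matrix multiply, which is why fitting the bias into rows that would otherwise be padding is attractive.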
Parameters
[in] handle: Active handle to the kernel
[in] src0[]: Pointer to the buffer holding the convolution weights/coefficients
[in] src1[]: Pointer to the buffer holding the input feature maps
[in] src2[]: Pointer to the buffer holding the bias
[in] src3[]: Pointer to the buffer holding the scale values
[in] src4[]: Pointer to the buffer holding the shift values
[out] dst[]: Pointer to the buffer holding the output feature maps
[in] pKerInArgs: Pointer to the structure holding the input arguments
[out] pKerOutArgs: Pointer to the structure holding the output arguments
Returns
Status of success, or an error code on failure
Assumptions:
I/O buffer pointers are assumed to be not aliased.
Performance Considerations:
For best performance, the following parameter settings are recommended:
Set widths equal to strides
Align all pointers to 64-byte boundaries
Set all stride values to a multiple of 64 for 8-bit data and 32 for 16-bit data
Set all width values to a multiple of 64 for 8-bit data and 32 for 16-bit data
Set the number of output feature maps to 64 for 8-bit data and 32 for 16-bit data
Train the bias values so that they fit in the B-matrix rows that pad the B matrix up to a multiple of the SIMD width
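The recommendations above follow from how output feature maps map onto blocks: a count that is not a multiple of the block width leaves lanes idle in the last block. The sketch below counts channel blocks and idle lanes, assuming 64 lanes per block for 8-bit data and 32 for 16-bit data, per the multiples listed above. All function names are hypothetical.

```c
#include <assert.h>

/* Lanes per output-channel block, assumed from the 64/32 multiples above. */
static int simd_lanes(int bits)
{
    assert(bits == 8 || bits == 16);
    return bits == 8 ? 64 : 32;
}

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Number of channel blocks needed for No output feature maps. */
static int channel_blocks(int No, int bits)
{
    return ceil_div(No, simd_lanes(bits));
}

/* Lanes computed but not used in the last, partially filled block. */
static int idle_lanes(int No, int bits)
{
    return channel_blocks(No, bits) * simd_lanes(bits) - No;
}
```

For example, 80 output feature maps at 8 bits need two blocks and leave 48 of 128 lanes idle, while 64 fill one block exactly.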
Remarks
The application is expected to call the checkParams function before this function; for performance, this function does not recheck its parameters on each invocation.