3. Supported Layer Patterns
A layer pattern is a sequence of layers. This section explains which layer patterns can be offloaded to the software library or the NPU for execution. This documentation uses ONNX layer names.
3.1. Overview of Layer Types
First Convolution layer: FCONV, not depth-wise, not point-wise, input feature map has 1 channel
Generic Convolution layer: GCONV, not depth-wise, not point-wise, number of input feature map channels is a multiple of 4
Depth-Wise Convolution layer: DWCONV
Point-Wise Convolution layer: PWCONV
Point-Wise Convolution with Residual input layer: PWCONVRES
Transposed Convolution layer: TCONV
Fully-Connected layer: FC
Average Pooling layer: AVGPOOL
Max Pooling layer: MAXPOOL
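The distinction between the four convolution types above can be sketched as a small classifier. This is only an illustration: the function name `classify_conv` and its parameters are hypothetical, and it assumes the usual definitions of depth-wise (`group` equals the number of input channels) and point-wise (1x1 kernel), which this documentation does not spell out. It does not check the restrictions listed in section 3.3.1.

```python
def classify_conv(in_channels, group, kernel_shape):
    """Return the layer-type name for a 2-D Conv node, or None if no type applies.

    Assumptions (not stated in the documentation):
      - depth-wise means group == in_channels (and more than one channel)
      - point-wise means a 1x1 kernel
    """
    kh, kw = kernel_shape
    if group == in_channels and group > 1:
        return "DWCONV"
    if group != 1:
        return None                      # grouped, but not depth-wise
    if (kh, kw) == (1, 1):
        return "PWCONV"
    if in_channels == 1:
        return "FCONV"                   # first layer, single-channel input
    if in_channels % 4 == 0:
        return "GCONV"
    return None                          # e.g. a 3-channel input fits neither type


if __name__ == "__main__":
    for args in [(1, 1, (3, 3)), (8, 1, (3, 3)), (8, 8, (3, 3)), (8, 1, (1, 1))]:
        print(args, "->", classify_conv(*args))
```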
3.2. Terminology
| Common Sequence | Layers |
|---|---|
| BNORM | Add (bias) => Mul (scale) => Div (2^n, right shift) => Floor => Clip |
| BNORMRES | Add (bias) => Mul (scale) => Div (2^n, right shift) => Floor => Clip (-512, 511) => Add (residual) => Clip |
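The two sequences can be read as plain integer arithmetic. The following NumPy sketch follows the operator order in the table; the function names `bnorm` and `bnormres` and their parameter names are hypothetical, and the sketch is a reference model of the arithmetic only, not the library's or the NPU's implementation.

```python
import numpy as np


def bnorm(x, bias, scale, shift, clip_min, clip_max):
    """Add (bias) => Mul (scale) => Div (2^shift) => Floor => Clip."""
    x = np.asarray(x, dtype=np.int64)
    y = (x + bias) * scale
    # Integer floor division rounds toward negative infinity,
    # which matches Div followed by Floor.
    y = np.floor_divide(y, 1 << shift)
    return np.clip(y, clip_min, clip_max)


def bnormres(x, residual, bias, scale, shift, clip_min, clip_max):
    """Same as bnorm, but clipped to (-512, 511) before the residual is added."""
    y = bnorm(x, bias, scale, shift, -512, 511)
    y = y + np.asarray(residual, dtype=np.int64)
    return np.clip(y, clip_min, clip_max)


if __name__ == "__main__":
    # Signed 8-bit output range is used here purely as an example.
    print(bnorm([100, -99, 1000], bias=4, scale=3, shift=2,
                clip_min=-128, clip_max=127))
    print(bnormres([1000], [-500], bias=0, scale=1, shift=0,
                   clip_min=-128, clip_max=127))
```

In the second call the intermediate Clip (-512, 511) matters: the value 1000 is first limited to 511, so adding the residual -500 gives 11 rather than saturating at 127.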
3.2.1. Restrictions
bias is a signed 16-bit integer when the input/output feature maps are 2-bit or 4-bit, and a signed 24-bit integer when they are 8-bit.
scale is an unsigned 8-bit integer when the kernel weights are 2-bit or 4-bit, and is always 1 when the kernel weights are 8-bit.
The right-shift amount n is limited to a 5-bit integer.
Div (2^n) could also be represented as Mul (1/(2^n)).
The Clip range always corresponds to the range of a signed or unsigned 2-bit, 4-bit, or 8-bit integer, except for the intermediate Clip (-512, 511) noted in the BNORMRES sequence.
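Two of these restrictions are easy to make concrete: the Clip bounds for a given bit width, and the fact that Div (2^n) followed by Floor is an arithmetic right shift on integers. The helper names below are hypothetical, and the sketch assumes the 5-bit shift amount is unsigned (0 to 31), which the text does not state explicitly.

```python
def clip_range(bits, signed):
    """Return (min, max) of a signed or unsigned integer with the given bit width."""
    if bits not in (2, 4, 8):
        raise ValueError("only 2-bit, 4-bit and 8-bit ranges are listed")
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def div_floor_pow2(value, n):
    """Div (2^n) => Floor, written as an arithmetic right shift."""
    if not 0 <= n <= 31:          # assumption: unsigned 5-bit shift amount
        raise ValueError("shift amount must fit in 5 bits")
    return value >> n             # Python's >> floors, also for negative values


if __name__ == "__main__":
    for bits in (2, 4, 8):
        print(bits, clip_range(bits, True), clip_range(bits, False))
    # The shift and the floor division agree for negative inputs as well.
    print(div_floor_pow2(-7, 1), -7 // 2)
```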
3.3. Layer Patterns
| Pattern Name | Layers |
|---|---|
| FCONV | Conv (2d, not depth-wise, not point-wise, ifmap channel is 1) => BNORM sequence |
| GCONV | Conv (2d, not depth-wise, not point-wise, ifmap channel is multiple of 4) => BNORM sequence |
| DWCONV | Conv (2d, depth-wise) => BNORM sequence |
| PWCONV | Conv (2d, point-wise) => BNORM sequence |
| PWCONVRES | Conv (2d, point-wise) => BNORMRES sequence |
| TCONV | ConvTranspose (2d) => BNORM sequence |
| FC | MatMul => BNORM sequence |
| AVGPOOL (global) | ReduceSum (sum across height and width) => round => BNORM sequence |
| AVGPOOL (non-global) | AveragePool => Mul (by total pool size) => round => BNORM sequence |
| MAXPOOL | MaxPool |
3.3.1. Restrictions
Input, output and residual feature maps are 8-bit integers.
Only a batch size of 1 is supported for inference.
The number of groups in the convolution layers is always 1, except for the DWCONV layer.
The number of input/output channels should be a multiple of 4, except for FCONV input.
The TCONV layer’s strides must be the same as the kernel size.
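The structural restrictions above could be checked before attempting to offload a layer. The sketch below is a hypothetical pre-flight check: `LayerInfo`, `check_restrictions`, and all field names are made up for illustration and are not part of any tool described here. It assumes that "convolution layers" in the group restriction covers FCONV, GCONV, PWCONV, PWCONVRES and TCONV, and it does not check feature map bit widths.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: these are the patterns whose group count must be 1.
GROUP_ONE_PATTERNS = {"FCONV", "GCONV", "PWCONV", "PWCONVRES", "TCONV"}


@dataclass
class LayerInfo:
    """Hypothetical summary of one matched layer pattern."""
    pattern: str
    batch_size: int
    in_channels: int
    out_channels: int
    group: int = 1
    kernel_shape: Tuple[int, int] = (1, 1)
    strides: Tuple[int, int] = (1, 1)


def check_restrictions(layer: LayerInfo) -> List[str]:
    """Return a list of violated restrictions (empty if none)."""
    problems = []
    if layer.batch_size != 1:
        problems.append("batch size must be 1")
    if layer.pattern in GROUP_ONE_PATTERNS and layer.group != 1:
        problems.append("group must be 1 except for DWCONV")
    if layer.out_channels % 4 != 0:
        problems.append("output channels must be a multiple of 4")
    if layer.pattern != "FCONV" and layer.in_channels % 4 != 0:
        problems.append("input channels must be a multiple of 4")
    if layer.pattern == "TCONV" and tuple(layer.strides) != tuple(layer.kernel_shape):
        problems.append("TCONV strides must equal the kernel size")
    return problems


if __name__ == "__main__":
    ok = LayerInfo("GCONV", 1, 8, 16, kernel_shape=(3, 3))
    bad = LayerInfo("TCONV", 1, 8, 8, kernel_shape=(2, 2), strides=(1, 1))
    print(check_restrictions(ok))
    print(check_restrictions(bad))
```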