4. Supported Layer Patterns

A layer pattern is a sequence of layers. This section explains which layer patterns can be offloaded to the NPU hardware accelerator or software library for execution. Layers that do not match a supported pattern are handled by TVM code generation and run on the host CPU. This documentation uses ONNX layer names.

4.1. Overview of Layer Types

  • First Convolution layer: FCONV, not depth-wise, not point-wise, input feature map has 1 channel

  • Generic Convolution layer: GCONV, not depth-wise, not point-wise, input feature map channel count is a multiple of 4

  • Depth-Wise Convolution layer: DWCONV

  • Point-Wise Convolution layer: PWCONV

  • Point-Wise Convolution with Residual input layer: PWCONVRES

  • Transposed Convolution layer: TCONV

  • Fully-Connected layer: FC

  • Average Pooling layer: AVGPOOL

  • Max Pooling layer: MAXPOOL
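
The convolution types above can be told apart from a few Conv attributes. The following is a minimal sketch, not the toolchain's actual matcher: the function name classify_conv is hypothetical, and it assumes that "depth-wise" means one group per input channel and "point-wise" means a 1x1 kernel with a single group. It also applies the channel and group restrictions listed in section 4.3.1.

```python
def classify_conv(in_channels, kernel_shape, group=1):
    """Return the pattern name for a 2-D Conv, or None if no pattern matches.

    Assumptions (not stated in this document):
      - point-wise: 1x1 kernel, group == 1
      - depth-wise: group == in_channels (more than one channel)
    """
    if len(kernel_shape) != 2:
        return None  # only 2-D convolutions appear in the pattern list

    point_wise = tuple(kernel_shape) == (1, 1) and group == 1
    depth_wise = group == in_channels and in_channels > 1

    if in_channels == 1:
        # Only FCONV accepts a single input channel.
        return "FCONV" if group == 1 and not point_wise else None
    if in_channels % 4 != 0:
        return None  # every other type needs a multiple of 4 input channels
    if depth_wise:
        return "DWCONV"
    if group != 1:
        return None  # grouped convolution is not supported yet
    return "PWCONV" if point_wise else "GCONV"


if __name__ == "__main__":
    print(classify_conv(1, (3, 3)))             # single-channel first layer
    print(classify_conv(16, (3, 3), group=16))  # one group per channel
    print(classify_conv(16, (1, 1)))            # 1x1 kernel
```

Whether a point-wise convolution is PWCONV or PWCONVRES depends on the layers that follow it, so the sketch cannot distinguish the two from the Conv attributes alone.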

4.2. Terminology

Common Sequence   Layers
---------------   ------
BNORM             Add (bias) => Mul (scale) => Div (2^n, right shift) => Floor => Clip
BNORMRES          Add (bias) => Mul (scale) => Div (2^n, right shift) => Floor => Clip (-512, 511) => Add (residual) => Clip
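
The two sequences can be modelled with plain integer arithmetic. The sketch below is illustrative only: the function names (bnorm, bnormres, int_range) and the default 8-bit signed output range are assumptions made for the example, and it operates on a single accumulator value rather than a tensor.

```python
def int_range(bits, signed):
    """Value range of a signed or unsigned integer of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def bnorm(acc, bias, scale, shift, out_bits=8, out_signed=True):
    """Add (bias) => Mul (scale) => Div (2^n) => Floor => Clip."""
    v = (acc + bias) * scale
    # For Python ints, >> rounds toward negative infinity,
    # so it matches Div (2^n) followed by Floor.
    v >>= shift
    return clip(v, *int_range(out_bits, out_signed))


def bnormres(acc, bias, scale, shift, residual, out_bits=8, out_signed=True):
    """BNORM with an intermediate Clip (-512, 511) and a residual Add."""
    v = ((acc + bias) * scale) >> shift
    v = clip(v, -512, 511)   # intermediate clip before the residual is added
    v += residual
    return clip(v, *int_range(out_bits, out_signed))


if __name__ == "__main__":
    print(bnorm(100, bias=28, scale=3, shift=2))
    print(bnormres(4000, bias=0, scale=1, shift=0, residual=-500))
```

The second call shows why the intermediate clip matters: the value is limited to 511 before the residual is added, so a large negative residual can still bring the result back inside the output range.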

4.2.1. Restrictions

  • bias is a signed 16-bit integer when weights are 2- or 4-bit. bias is a signed 24-bit integer when weights are 8-bit.

  • scale is an unsigned 8-bit integer when kernel weights are 2- or 4-bit. scale is always 1 when kernel weights are 8-bit.

  • The right-shift amount is limited to a 5-bit integer.

  • Div (2^n) can also be represented as Mul (1/(2^n)).

  • The Clip range always corresponds to the range of a signed or unsigned 2-bit, 4-bit, or 8-bit integer, except for the intermediate Clip (-512, 511) in the BNORMRES sequence.
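
The parameter limits above can be checked before a model is submitted for compilation. This is a hedged sketch: check_bnorm_params is a hypothetical helper, not part of any toolchain API, and it assumes the 5-bit shift amount is unsigned (0 to 31), which the text does not state explicitly.

```python
def check_bnorm_params(weight_bits, bias, scale, shift):
    """Return a list of violated BNORM restrictions (empty if none)."""
    problems = []

    if weight_bits in (2, 4):
        bias_bits = 16
        if not 0 <= scale <= 255:
            problems.append("scale must fit in an unsigned 8-bit integer")
    elif weight_bits == 8:
        bias_bits = 24
        if scale != 1:
            problems.append("scale must be 1 for 8-bit weights")
    else:
        raise ValueError("weight_bits must be 2, 4, or 8")

    lo, hi = -(1 << (bias_bits - 1)), (1 << (bias_bits - 1)) - 1
    if not lo <= bias <= hi:
        problems.append(f"bias must fit in a signed {bias_bits}-bit integer")

    # Assumption: the 5-bit shift amount is unsigned, i.e. 0..31.
    if not 0 <= shift <= 31:
        problems.append("right shift must fit in 5 bits")

    return problems


if __name__ == "__main__":
    print(check_bnorm_params(2, bias=32767, scale=255, shift=31))
    print(check_bnorm_params(8, bias=0, scale=2, shift=0))
```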

4.3. Layer Patterns

Pattern Name          Layers
------------          ------
FCONV                 Conv (2d, not depth-wise, not point-wise, 1 input channel) => BNORM sequence
GCONV                 Conv (2d, not depth-wise, not point-wise, input channels a multiple of 4) => BNORM sequence
DWCONV                Conv (2d, depth-wise) => BNORM sequence
PWCONV                Conv (2d, point-wise) => BNORM sequence
PWCONVRES             Conv (2d, point-wise) => BNORMRES sequence
TCONV                 ConvTranspose (2d) => BNORM sequence
FC                    MatMul => BNORM sequence
AVGPOOL (global)      ReduceSum (sum across height and width) => Round => BNORM sequence
AVGPOOL (non-global)  AveragePool => Mul (by total pool size) => Round => BNORM sequence
MAXPOOL               MaxPool
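
One plausible reading of the non-global AVGPOOL pattern is that multiplying the average by the pool size and rounding recovers the exact integer sum of each window, which the BNORM sequence can then rescale. That interpretation is an assumption, not something this section states. The sketch below illustrates it on a single window with hypothetical helper names.

```python
def average_pool(window):
    """Floating-point mean of one pooling window, as AveragePool would emit."""
    return sum(window) / len(window)


def window_sum(window):
    """AveragePool => Mul (by total pool size) => Round, for one window."""
    avg = average_pool(window)
    # Multiplying by the pool size undoes the division; rounding removes
    # any floating-point error so the result is an exact integer again.
    return round(avg * len(window))


if __name__ == "__main__":
    window = [-128, 127, 5, 7]           # a 2x2 window of signed 8-bit values
    print(window_sum(window), sum(window))
```

Under this reading, the global and non-global variants both hand an integer sum to the BNORM sequence, so the division by the pool size would be folded into the scale and right-shift parameters.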

4.3.1. Restrictions

  • Input, output, and residual tensor data (feature maps) are 8-bit integers, signed or unsigned.

  • Weights can be 2-bit or 8-bit and are always signed.

  • A combination of 4-bit data and 4-bit weights will be supported in the future.

  • Only a batch size of 1 is supported for inference.

  • The number of groups in the convolution layers is always 1, except for the DWCONV layer. Grouped convolution will be supported in the future.

  • The number of input/output channels should be a multiple of 4, except for FCONV input.

  • The TCONV layer’s strides must be the same as the kernel size.

  • There is no limit on the number of layers in the model.

  • A layer’s input can have a different signedness and bit-width than the layer’s output.

  • Layers can have mixed precision; for example, one layer may use 8-bit weights while another layer uses 2-bit weights.
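
Several of the restrictions above are simple enough to check mechanically. The following sketch assumes a hypothetical per-layer description (pattern name, channel counts, weight width, batch size, kernel shape, strides); it is not an API of the compiler and covers only the restrictions that can be expressed with those fields.

```python
def check_layer(pattern, in_channels, out_channels, weight_bits=None,
                batch=1, kernel_shape=None, strides=None):
    """Return a list of violated pattern restrictions (empty if none).

    weight_bits may be None for patterns without weights (e.g. MAXPOOL).
    """
    problems = []

    if batch != 1:
        problems.append("batch size must be 1")
    if weight_bits is not None and weight_bits not in (2, 8):
        problems.append("weights must be 2-bit or 8-bit")

    if pattern == "FCONV":
        if in_channels != 1:
            problems.append("FCONV input must have 1 channel")
    elif in_channels % 4 != 0:
        problems.append("input channels must be a multiple of 4")
    if out_channels % 4 != 0:
        problems.append("output channels must be a multiple of 4")

    if pattern == "TCONV" and (
        kernel_shape is None or strides is None
        or tuple(strides) != tuple(kernel_shape)
    ):
        problems.append("TCONV strides must equal the kernel size")

    return problems


if __name__ == "__main__":
    print(check_layer("FCONV", 1, 16, weight_bits=8))
    print(check_layer("GCONV", 3, 16, weight_bits=8))
```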