6. Layer Configurations Accelerated by Arm Cortex-M33 CDE

Layers in neural network models trained for the NPU can be mapped to the Arm Cortex-M33 CDE instructions for machine-learning acceleration. This page lists the layer types and configurations that are accelerated by the M33 CDE. Layer types and configurations not listed here can still run on the Arm Cortex-M33, but they will not be accelerated by the CDE.

6.1. Overview of Layer Types

  • Generic Convolution layer (GCONV; not depth-wise, not point-wise; the input feature map channel count is a multiple of 4)

  • Depth-Wise Convolution layer (DWCONV)

  • Point-Wise Convolution layer (PWCONV)

  • Point-Wise Convolution with Residual input layer (PWCONVRES)

  • Transposed Convolution layer (TCONV)

  • Fully-Connected layer (FC)

6.2. Terminology and Notation

A layer computation takes an input feature map, applies weights such as a convolution kernel, and produces an output feature map.

[Figure: convolution terminology]

In the tables below, the column headings indicate the following:

  • ifmap: input feature map, also known as input tensor

  • ofmap: output feature map, also known as output tensor

  • kernel: convolution weights matrix, also known as filter

  • iB, iH, iW, iC: input feature map bit-width, height, width, channels

  • oB, oH, oW, oC: output feature map bit-width, height, width, channels

  • kB, kH, kW: kernel (or pool) bit-width, height, width

  • sH, sW: stride on height, width

  • pL, pR, pT, pB: padding of the input feature map on the left, right, top, and bottom. In general, padding of the input feature map is supported. Non-zero values in a layer configuration row mean that the specified padding is handled within the accelerated layer implementation; otherwise, padding (if any) is handled separately, outside of the layer implementation.

In the tables below, the values in the rows beneath the headings indicate the following (a short sketch illustrating this shorthand follows the list):

  • any: any positive integer value

  • m4: multiples of 4, e.g., 4, 8, 12, …

  • m5: multiples of 5, e.g., 5, 10, 15, …

  • m8b16: multiples of 8, beginning at 16 (inclusive), e.g., 16, 24, 32, …

  • m1b69e72: multiples of 1, beginning at 69 and ending at 72 (inclusive on both ends), i.e., 69, 70, 71, or 72

  • NA: not applicable
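
As an aside, the shorthand above can be read mechanically. The Python sketch below is a hypothetical helper (not part of any tool shipped for the NPU or the CDE support) that checks a dimension value against one of these tokens:

```python
import re


def satisfies(value: int, constraint: str) -> bool:
    """Check a dimension value against the shorthand used in the tables below.

    Hypothetical helper, for illustration only. Tokens handled:
      'any'     -> any positive integer
      'NA'      -> not applicable (never matches a concrete value)
      'mK'      -> multiple of K, e.g. 'm4'
      'mKbB'    -> multiple of K, beginning at B inclusive, e.g. 'm8b16'
      'mKbBeE'  -> multiple of K between B and E inclusive, e.g. 'm1b69e72'
    """
    if constraint == "any":
        return value > 0
    if constraint == "NA":
        return False
    match = re.fullmatch(r"m(\d+)(?:b(\d+))?(?:e(\d+))?", constraint)
    if match is None:
        raise ValueError(f"unknown constraint token: {constraint}")
    step = int(match.group(1))
    begin = int(match.group(2)) if match.group(2) else step
    end = int(match.group(3)) if match.group(3) else None
    if value % step != 0 or value < begin:
        return False
    return end is None or value <= end


# Reading the notation above:
assert satisfies(12, "m4")           # 4, 8, 12, ...
assert satisfies(24, "m8b16")        # 16, 24, 32, ...
assert not satisfies(8, "m8b16")     # multiple of 8, but below 16
assert satisfies(70, "m1b69e72")     # 69 through 72 inclusive
```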

6.3. GCONV

| iB | oB | kB | kH  | kW | sH  | sW | iH  | iW  | iC  | oH  | oW  | oC | pL | pR | pT | pB | comment |
|----|----|----|-----|----|-----|----|-----|-----|-----|-----|-----|----|----|----|----|----|---------|
| 8  | 8  | 8  | any | 1  | any | 1  | any | any | any | any | any | m4 | 0  | 0  | 0  | 0  |         |
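
For illustration, a convolution matching this row might be written as the Keras sketch below. TensorFlow is assumed; the concrete sizes are placeholders chosen to satisfy the constraints, and the 8-bit widths would come from int8 quantization of the deployed model, which is not shown here.

```python
import tensorflow as tf

# Generic convolution candidate (per the GCONV row): kW = 1, sW = 1,
# output channels a multiple of 4, no padding. Sizes are placeholders.
x = tf.keras.Input(shape=(32, 32, 8))           # iH, iW, iC illustrative
y = tf.keras.layers.Conv2D(filters=8,           # oC: multiple of 4
                           kernel_size=(3, 1),  # kH: any, kW = 1
                           strides=(2, 1),      # sH: any, sW = 1
                           padding="valid")(x)  # pL = pR = pT = pB = 0
```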

6.4. DWCONV

| iB | oB | kB | kH  | kW  | sH  | sW  | iH  | iW  | iC | oH  | oW  | oC | pL | pR | pT | pB | comment |
|----|----|----|-----|-----|-----|-----|-----|-----|----|-----|-----|----|----|----|----|----|---------|
| 8  | 8  | 8  | any | any | any | any | any | any | m4 | any | any | m4 | 0  | 0  | 0  | 0  |         |
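
A Keras sketch of a depth-wise convolution that fits this row (placeholder sizes; int8 quantization of the deployed model is assumed but not shown):

```python
import tensorflow as tf

# Depth-wise convolution candidate (per the DWCONV row): input channels a
# multiple of 4, any kernel size and stride, no padding.
x = tf.keras.Input(shape=(32, 32, 8))                    # iC = 8, multiple of 4
y = tf.keras.layers.DepthwiseConv2D(kernel_size=(3, 3),  # kH, kW: any
                                    strides=(2, 2),      # sH, sW: any
                                    padding="valid")(x)  # zero padding
```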

6.5. PWCONV

| iB | oB | kB | kH | kW | sH | sW | iH  | iW  | iC | oH  | oW  | oC | pL | pR | pT | pB | comment |
|----|----|----|----|----|----|----|-----|-----|----|-----|-----|----|----|----|----|----|---------|
| 8  | 8  | 8  | 1  | 1  | 1  | 1  | any | any | m4 | any | any | m4 | 0  | 0  | 0  | 0  |         |
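
Similarly, a point-wise convolution fitting this row could look like the following sketch (placeholder sizes, int8 quantization not shown):

```python
import tensorflow as tf

# Point-wise (1x1) convolution candidate (per the PWCONV row): kernel and
# stride fixed at 1, channel counts multiples of 4, no padding.
x = tf.keras.Input(shape=(32, 32, 8))                      # iC = 8, multiple of 4
y = tf.keras.layers.Conv2D(filters=16, kernel_size=1,      # oC = 16, multiple of 4
                           strides=1, padding="valid")(x)
```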

6.6. PWCONVRES

| iB | oB | kB | kH | kW | sH | sW | iH  | iW  | iC | oH  | oW  | oC | pL | pR | pT | pB | comment |
|----|----|----|----|----|----|----|-----|-----|----|-----|-----|----|----|----|----|----|---------|
| 8  | 8  | 8  | 1  | 1  | 1  | 1  | any | any | m4 | any | any | m4 | 0  | 0  | 0  | 0  |         |
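
A sketch of a point-wise convolution whose output is added to a residual input, as this row describes (placeholder sizes, int8 quantization not shown):

```python
import tensorflow as tf

# Point-wise convolution with residual input (per the PWCONVRES row): the 1x1
# convolution result is added to a second tensor of the same shape.
x = tf.keras.Input(shape=(32, 32, 16))                     # iC multiple of 4
residual = tf.keras.Input(shape=(32, 32, 16))              # residual input
y = tf.keras.layers.Conv2D(filters=16, kernel_size=1,      # oC multiple of 4
                           strides=1, padding="valid")(x)
out = tf.keras.layers.Add()([y, residual])                 # residual addition
```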

6.7. TCONV

| iB | oB | kB | kH  | kW  | sH | sW | iH  | iW  | iC | oH  | oW  | oC | pL | pR | pT | pB | comment |
|----|----|----|-----|-----|----|----|-----|-----|----|-----|-----|----|----|----|----|----|---------|
| 8  | 8  | 8  | any | any | kH | kW | any | any | m4 | any | any | m4 | 0  | 0  | 0  | 0  |         |
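
A sketch of a transposed convolution fitting this row; note that the stride equals the kernel size (sH = kH, sW = kW). Sizes are placeholders and int8 quantization is not shown:

```python
import tensorflow as tf

# Transposed convolution candidate (per the TCONV row): stride equal to the
# kernel size, channel counts multiples of 4, no padding.
x = tf.keras.Input(shape=(16, 16, 8))                   # iC multiple of 4
y = tf.keras.layers.Conv2DTranspose(filters=8,          # oC multiple of 4
                                    kernel_size=(2, 2),
                                    strides=(2, 2),     # sH = kH, sW = kW
                                    padding="valid")(x)
```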

6.8. FC

| iB | oB | kB | kH | kW | sH | sW | iH | iW | iC  | oH | oW | oC  | pL | pR | pT | pB | comment |
|----|----|----|----|----|----|----|----|----|-----|----|----|-----|----|----|----|----|---------|
| 8  | 8  | 8  | NA | NA | NA | NA | NA | NA | any | NA | NA | any | NA | NA | NA | NA |         |
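
A sketch of a fully-connected layer fitting this row (placeholder sizes, int8 quantization not shown); only the channel dimensions apply, and the spatial, stride, and padding parameters are not applicable:

```python
import tensorflow as tf

# Fully-connected candidate (per the FC row): iC and oC can be any size.
x = tf.keras.Input(shape=(64,))         # iC: any
y = tf.keras.layers.Dense(units=10)(x)  # oC: any
```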

6.9. AVGPOOL (non-global)

Non-global AVGPOOL layers are converted to DWCONV layers during compilation. Refer to the DWCONV section for the supported configurations.
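
For intuition about this rewrite, the sketch below (an illustration under simple assumptions, not the compiler's actual transformation) shows that a kH x kW average pool produces the same result as a depth-wise convolution whose weights are all 1 / (kH * kW):

```python
import numpy as np
import tensorflow as tf

# A non-global average pool equals a depth-wise convolution with uniform
# weights 1 / (kH * kW), using the same window size and stride.
kH, kW, C = 2, 2, 4
x = tf.random.uniform((1, 8, 8, C))

pooled = tf.keras.layers.AveragePooling2D(pool_size=(kH, kW))(x)

dw = tf.keras.layers.DepthwiseConv2D(kernel_size=(kH, kW), strides=(kH, kW),
                                     use_bias=False)
dw.build(x.shape)
dw.set_weights([np.full((kH, kW, C, 1), 1.0 / (kH * kW), dtype=np.float32)])
as_dwconv = dw(x)

assert np.allclose(pooled.numpy(), as_dwconv.numpy(), atol=1e-6)
```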