TI Deep Learning Library User Guide
TIDL: Meta Architectures Support

Introduction

TIDL supports various base feature extraction / backbone networks such as ResNet, MobileNet, ShuffleNet, VGG, DenseNet, etc. TIDL also supports the vision processing meta architectures listed below.

Supported Model Types

  • Object Detection
    • Single Shot Detection (SSD) - Vehicle, Pedestrian detection, etc.
    • Feature Pyramid Network (FPN) + SSD - Using a Resize layer for feature up-sampling
  • Pixel Level Segmentation
    • Semantic Segmentation - Free space and lane marking detection, etc.
    • Motion Segmentation
    • Pixel level depth estimation

Object Detection Architectures

Caffe-SSD

TIDL supports SSD networks and post processing layers as defined in the caffe-ssd implementation by the original SSD authors. The user needs to provide the Caffe prototxt and caffemodel. We have validated this architecture with multiple models, including the below:

  1. JdetNet512x512 - Link
  2. Pelee Pascal VOC 304x304 - Link

Tensorflow Object Detection API - SSD

  • TIDL supports SSD post processing as defined in the TensorFlow Object Detection API. We have validated this with a couple of networks, including the below:

    • ssd_mobilenet_v2 SSD - Link

If the user is not using the TensorFlow Object Detection API and is instead using SSD post processing as defined by the original author, then we recommend the method described in the next section, ONNX - SSD.

ONNX - SSD

  • We have defined a protocol buffer to accept SSD post processing as defined by the original author, along with a model trained using any framework. In this case, the user needs to provide the list of tensor names of the Box and Class prediction heads in the original model in this prototxt file. Specify the meta architecture type as TIDL_metaArchTidlSsd in the import config file (an illustrative import config is sketched below); the TIDL model import tool will then build the complete network with the Flatten, Concatenate, and OD post processing layers. The protocol buffer definition is available in the below file.
    ├── ti_dl                             # Base Directory
    │   ├── utils                        
    │   │   ├── tidlMetaArch/tidl_meta_arch.proto

We have validated this flow with an SSD model trained in PyTorch and exported to ONNX. The object detection demo in the SDK uses this flow.
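For reference, the below is a minimal sketch of the import config fragment for this flow. The parameter names and numeric values here are assumptions based on typical TIDL import configs and shall be verified against the import tool documentation of your SDK version.

# Illustrative import config fragment (names and values are assumptions, not verified)
modelType           = 2                              # assumed code for an ONNX model
inputNetFile        = "ssd_pytorch_export.onnx"      # hypothetical file names
outputNetFile       = "tidl_net_onnx_ssd.bin"
outputParamsFile    = "tidl_io_onnx_ssd.bin"
metaArchType        = 3                              # assumed value for TIDL_metaArchTidlSsd
metaLayersNamesList = "tidl_meta_arch_ssd.prototxt"  # the TIDL SSD proto described above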

Performance of SSD Post processing layer

The optimized implementation of the SSD post processing (box decoding, score computation, non-maximum suppression) is targeted at generic models (any number of classes, prior boxes, heads, etc.). Also, currently only the 8-bit Caffe-SSD / TIDL-SSD path is optimized. The TensorFlow Object Detection API post processing and the 16-bit version are provided for feature completeness and are not optimized. It is recommended to write an optimized version of the post processing for a given configuration. If the number of classes and prior boxes are known upfront, this post processing can be optimized well on the C66x DSP or A72 to offload the C7x-MMA for the compute-heavy layers (convolutions, pooling, etc.).
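For illustration only, the below NumPy sketch walks through these three steps (CENTER_SIZE box decoding with prior box variances, score thresholding, and greedy per-class NMS). It is not the TIDL implementation; the function names, tensor layouts, and default thresholds are assumptions, loosely matching the detection_output_param values in the example proto later in this page.

# Illustrative SSD post processing sketch (NOT the TIDL implementation).
import numpy as np

def decode_boxes(loc, priors, variances=(0.1, 0.1, 0.2, 0.2)):
    """Decode CENTER_SIZE offsets against prior boxes.

    loc, priors: (N, 4) arrays; priors given as (cx, cy, w, h).
    Returns (N, 4) boxes as (xmin, ymin, xmax, ymax).
    """
    cx = priors[:, 0] + loc[:, 0] * variances[0] * priors[:, 2]
    cy = priors[:, 1] + loc[:, 1] * variances[1] * priors[:, 3]
    w = priors[:, 2] * np.exp(loc[:, 2] * variances[2])
    h = priors[:, 3] * np.exp(loc[:, 3] * variances[3])
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def nms(boxes, scores, iou_thresh=0.6, top_k=100):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1][:top_k]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def ssd_post_process(loc, conf, priors, conf_thresh=0.5,
                     iou_thresh=0.6, keep_top_k=100):
    """loc: (N, 4); conf: (N, num_classes) scores with class 0 = background."""
    boxes = decode_boxes(loc, priors)
    detections = []
    for c in range(1, conf.shape[1]):        # skip the background class
        mask = conf[:, c] > conf_thresh      # score thresholding
        if not mask.any():
            continue
        for i in nms(boxes[mask], conf[mask, c], iou_thresh):
            idx = np.where(mask)[0][i]
            detections.append((c, conf[idx, c], boxes[idx]))
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections[:keep_top_k]

If the number of classes and prior boxes are fixed, the per-class loops and the NMS working buffers above can be fully specialized and vectorized, which is the kind of optimization recommended for the C66x DSP or A72.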

Pixel Level Segmentation

  • Pixel level task networks need the below layers on top of the layers used by the backbone networks
    • Up-sampling layers
    • Argmax layer
    • Dilated / Atrous convolution (a sketch follows this list)
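For illustration, the below PyTorch snippet shows a dilated (atrous) convolution; it is a generic sketch with assumed channel counts, not a layer taken from one of the validated networks.

# Minimal sketch of a dilated (atrous) convolution (assumed channel counts).
# With dilation=2 a 3x3 kernel covers a 5x5 receptive field while keeping the
# parameter count and the output resolution unchanged (padding = dilation).
import torch
import torch.nn as nn

atrous = nn.Conv2d(in_channels=256, out_channels=256,
                   kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 256, 48, 96)   # e.g. a 1/8-resolution feature map
print(atrous(x).shape)            # torch.Size([1, 256, 48, 96])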

Layers For Up-sampling

  • TIDL supports the below layers for up-sampling the features
    • De-convolution / transpose convolution
    • Resize
      • Bi-linear
      • Nearest Neighbor
  • We have validated the below networks (a minimal export sketch is given after this list)
    • UNet trained in TensorFlow and converted to a TFLite model with Nearest Neighbor resize
    • MobileNet V2 + ASPP network trained in PyTorch with Bi-linear resize - in ONNX model format
    • Deconvolution (kernel size 4x4 with stride 2x2) based SegNet trained in Caffe
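The below PyTorch sketch (an assumed toy head, not one of the validated networks above) shows the supported up-sampling constructs - a 4x4 / stride-2 transpose convolution, a bilinear Resize, and a channel-wise Argmax - exported to ONNX in the form the TIDL import tool consumes.

# Minimal PyTorch sketch of an up-sampling head (assumed layer sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleHead(nn.Module):
    def __init__(self, in_ch=256, num_classes=5):
        super().__init__()
        # Transpose convolution (kernel 4x4, stride 2x2) doubles the resolution
        self.deconv = nn.ConvTranspose2d(in_ch, 64, kernel_size=4,
                                         stride=2, padding=1)
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.deconv(x)
        # Bilinear resize (maps to the ONNX Resize operator) for a further 2x
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        x = self.classifier(x)
        # Argmax over the class channel gives the per-pixel label map
        return torch.argmax(x, dim=1)

model = UpsampleHead().eval()
dummy = torch.randn(1, 256, 48, 96)
torch.onnx.export(model, dummy, "upsample_head.onnx", opset_version=11)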

Example TIDL Proto File for Custom SSD network

In the below example, box_input: "376" indicates that 376 is the output tensor name of the convolution layer with the box/loc prediction. in_width and in_height are the base image resolution; these shall match the width and height in the import config. All the other parameters are as defined by the original Caffe-SSD implementation.

name: "TIAD SSD ARCH"
caffe_ssd {
name: "ssd_post_proc"
box_input: "376"
box_input: "380"
box_input: "384"
box_input: "388"
box_input: "392"
box_input: "396"
class_input: "378"
class_input: "382"
class_input: "386"
class_input: "390"
class_input: "394"
class_input: "398"
output: "psd_bboxes"
in_width: 768
in_height: 384
prior_box_param {
min_size: 46.1
max_size: 113.7
aspect_ratio: 3.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 16
}
prior_box_param {
min_size: 113.7
max_size: 181.2
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 32
}
prior_box_param {
min_size: 181.2
max_size: 248.8
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 64
}
prior_box_param {
min_size: 248.8
max_size: 316.4
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 128
}
prior_box_param {
min_size: 316.4
max_size: 384.0
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step_w: 256
step_h: 192
}
prior_box_param {
min_size: 384.0
max_size: 768.0
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 384
}
detection_output_param {
num_classes: 4
share_location: true
background_label_id: 0
nms_param {
nms_threshold: 0.60
top_k: 100
}
code_type: CENTER_SIZE
keep_top_k: 100
confidence_threshold: 0.5
}
}