TI Deep Learning Product User Guide
TIDL-RT: Meta Architectures Support

Introduction

TIDL-RT supports various base feature extractors / backbone networks such as ResNets, MobileNets, EfficientNets, ShuffleNets, VGG, DenseNet, etc. In addition to these backbone networks, TIDL-RT also supports the following object detection post-processing meta architectures:

  • Single Shot Detection (SSD)
  • You Only Look Once (YOLO) V3 and V5 Architecture
  • RetinaNet Architecture
  • pointPillars Architecture for 3D object detection from lidar data

A. Single Shot Detection (SSD)

A.1 Caffe

TIDL-RT supports SSD networks and post processing layers as defined in the Caffe-SSD implementation by the original SSD authors. The user should follow these steps to provide this information to TIDL-RT via the import configuration file (an illustrative snippet follows the list):

  • Set the metaArchType = 0 (TIDL_metaArchCaffeJacinto)
  • Set inputNetFile and inputParamsFile to point to Caffe Prototxt and Caffemodel file with post processing information
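For illustration, a minimal fragment of an import configuration file for the Caffe-SSD case could look as shown below. Only the meta-architecture related parameters from the steps above are shown; the file paths are placeholders, and the remaining import parameters are omitted.

metaArchType    = 0
inputNetFile    = "../../models/my_ssd_deploy.prototxt"
inputParamsFile = "../../models/my_ssd.caffemodel"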

The following models and configurations can be used as reference:

  • JdetNet512x512:
    • Model Link
    • Import config file : ti_dl/test/testvecs/config/import/public/caffe/tidl_import_jdetNet.txt
  • Pelee Pascal VOC 304x304:
    • Model Link
    • Import config file : ti_dl/test/testvecs/config/import/public/caffe/tidl_import_peeleNet.txt

A.2 TensorFlow/TFLite

TIDL-RT supports SSD post processing as defined in the TensorFlow Object Detection API. The user should follow these steps to provide this information to TIDL-RT via the import configuration file (an illustrative snippet follows the list):

  • Set the metaArchType = 1 (TIDL_metaArchTFSSD)
  • List all the Box and Class prediction heads as part of outDataNamesList
  • Set metaLayersNamesList to point to the corresponding pipeline config file
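For illustration, a minimal fragment of an import configuration file for this case could look as shown below. The head tensor names and file paths are placeholders and must be replaced with the actual names and paths from the user's model; refer to the ssd_mobilenet_v2 import config listed below for the exact syntax.

metaArchType        = 1
outDataNamesList    = "box_head_0, class_head_0, box_head_1, class_head_1"
metaLayersNamesList = "../../configs/my_ssd_pipeline.config"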

The following model and configuration can be used as reference:

  • ssd_mobilenet_v2:

    • Model Link
    • Import Config file : ti_dl/test/testvecs/config/import/public/tensorflow/tidl_import_mobileNetv2_ssd.txt
    • Pipeline Config file : ti_dl/test/testvecs/config/import/public/mobilenet_ssd_pipeline.config

    Note: If the user is not using the TensorFlow Object Detection API and is instead using SSD post processing as defined by the original author, we recommend the method described in the next section.

A.3 ONNX

TIDL-RT supports SSD post processing in the ONNX model format. To enable this, TIDL-RT defines a protocol buffer format that allows the SSD post processing information, as defined by the original SSD author, to be provided to TIDL. The protocol buffer definition is available in the following file:

├── ti_dl                                      # Base Directory
│   ├── utils
│   │   ├── tidlMetaArch/tidl_meta_arch.proto

The user should follow these steps to provide this information to TIDL-RT via the import configuration file (an illustrative snippet follows the list):

  • Set the metaArchType = 3 (TIDL_metaArchTIDLSSD)
  • List the tensor names of the Box and Class prediction heads, as named in the original model, in the prototxt file. An example is given in the section "Example TIDL Proto File for Custom SSD network" below
  • Set metaLayersNamesList to point to the prototxt file
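For illustration, a minimal fragment of an import configuration file for this case could look as shown below; the prototxt path is a placeholder and only the meta-architecture related parameters are shown.

metaArchType        = 3
metaLayersNamesList = "../../models/my_ssd_metaarch.prototxt"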

The TIDL-RT model import tool constructs the complete network with Flatten, Concatenate and OD post processing layers. This mechanism has been validated with models trained in PyTorch and exported to ONNX. The object detection demo in the SDK uses this flow. The following model and configuration can be used as reference:

  • MLPerf ssd-resnet34:
    • Model link
    • Import Config file: ti_dl/test/testvecs/config/import/public/onnx/tidl_import_mlperf_resnet34_ssd.txt

B. YOLO Architecture

TIDL-RT supports the YOLO architecture (V3 and V5) for object detection post processing. This architecture takes the post processing information in the same way as described in the ONNX-SSD section. The user should follow these steps to provide this information to TIDL-RT via the import configuration file (an illustrative snippet follows the list):

  • Set the metaArchType = 4 (TIDL_metaArchTIDLYolo) for V3 architecture and metaArchType = 6 (TIDL_metaArchTIDLYoloV5) for V5 architecture
  • List the tensor names of the Box and Class prediction heads, as named in the original model, in the prototxt file.
  • Set metaLayersNamesList to point to the prototxt file
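For illustration, a minimal fragment of an import configuration file for a YOLOv3 model could look as shown below; for a YOLOv5 model, metaArchType would be set to 6 instead. The prototxt path is a placeholder.

metaArchType        = 4
metaLayersNamesList = "../../models/my_yolov3_metaarch.prototxt"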

The following model and configuration can be used as reference:

  • YoloV3 model:
    • Model link
    • Import Config file: ti_dl/test/testvecs/config/import/public/onnx/tidl_import_yolo3.txt
    • Protocol Buffer file: ti_dl/test/testvecs/config/import/public/onnx/tidl_import_yolo3_metaarch.prototxt

C. RetinaNet Architecture

TIDL-RT supports the RetinaNet architecture for object detection post processing. This architecture takes the post processing information in the same way as described in the ONNX-SSD section. The user should follow these steps to provide this information to TIDL-RT via the import configuration file (an illustrative snippet follows the list):

  • Set the metaArchType = 5 (TIDL_metaArchTIDLRetinaNet)
  • List the tensor names of the Box and Class prediction heads, as named in the original model, in the prototxt file.
  • Set metaLayersNamesList to point to the prototxt file
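For illustration, a minimal fragment of an import configuration file for this case could look as shown below; the prototxt path is a placeholder.

metaArchType        = 5
metaLayersNamesList = "../../models/my_retinanet_metaarch.prototxt"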

D. 3D Object Detection

TIDL-RT supports the pointPillars architecture for 3D object detection post processing. This architecture takes the post processing information in the same way as described in the ONNX-SSD section. The user should follow these steps to provide this information to TIDL-RT via the import configuration file (an illustrative snippet follows the list):

  • Set the metaArchType = 7 (TIDL_metaArchTIDL3DOD)
  • List the tensor names of the Box, Class and Direction prediction heads, as named in the original model, in the prototxt file. An example is given in the section "Example TIDL Proto File for Custom 3D-OD network based on pointPillars" below
  • Set metaLayersNamesList to point to the prototxt file
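For illustration, a minimal fragment of an import configuration file for this case could look as shown below; the prototxt path is a placeholder.

metaArchType        = 7
metaLayersNamesList = "../../models/my_pointpillars_metaarch.prototxt"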

Example TIDL Proto File for Custom SSD network

In the example below, box_input: "376" indicates that 376 is the output tensor name of the convolution layer with the box/loc prediction. in_width and in_height are the base image resolution; these shall match the width and height parameters set in the import config file. All other parameters are as defined by the original Caffe-SSD implementation.

name: "TIAD SSD ARCH"
caffe_ssd {
name: "ssd_post_proc"
box_input: "376"
box_input: "380"
box_input: "384"
box_input: "388"
box_input: "392"
box_input: "396"
class_input: "378"
class_input: "382"
class_input: "386"
class_input: "390"
class_input: "394"
class_input: "398"
output: "psd_bboxes"
in_width: 768
in_height: 384
prior_box_param {
min_size: 46.1
max_size: 113.7
aspect_ratio: 3.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 16
}
prior_box_param {
min_size: 113.7
max_size: 181.2
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 32
}
prior_box_param {
min_size: 181.2
max_size: 248.8
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 64
}
prior_box_param {
min_size: 248.8
max_size: 316.4
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 128
}
prior_box_param {
min_size: 316.4
max_size: 384.0
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step_w: 256
step_h: 192
}
prior_box_param {
min_size: 384.0
max_size: 768.0
aspect_ratio: 3.0
aspect_ratio: 5.0
flip: true
clip: false
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
offset: 0.5
step: 384
}
detection_output_param {
num_classes: 4
share_location: true
background_label_id: 0
nms_param {
nms_threshold: 0.60
top_k: 100
}
code_type: CENTER_SIZE
keep_top_k: 100
confidence_threshold: 0.5
}
}

Example TIDL Proto File for Custom 3D-OD network based on pointPillars

A sample prototxt file for a pointPillars network is provided below. In this example, 208, 206 and 207 are the box, class and direction head convolution outputs of the pointPillars network. voxel_size_x and voxel_size_y are the voxel sizes in meters. The valid area range is provided through min_x/y/z and max_x/y/z, and those values are in meters. The valid area is divided into multiple voxels, each of size voxel_size_x/voxel_size_y, as described in the original pointPillars paper. max_points_per_voxel is the maximum number of points allowed inside each voxel. If any voxel has more 3D points than max_points_per_voxel, the extra 3D points are discarded.

name: "3dod_ssd"
tidl_3dod {
name: "point_pillars"
min_x: 0.0
max_x: 69.120
min_y: -39.680
max_y: 39.680
min_z: -1.78
max_z: -1.78
voxel_size_x : 0.16
voxel_size_y : 0.16
max_points_per_voxel : 32
box_input: "208"
class_input: "206"
dir_input: "207"
prior_box_3dod_param {
anchor_width: 1.6
anchor_length: 3.9
anchor_height: 1.56
rotation: 0.0
rotation: 90.0
}
detection_output_param {
num_classes: 1
share_location: true
background_label_id: -1
nms_param {
nms_threshold: 0.01
top_k: 100
}
code_type: CODE_TYPE_3DOD
keep_top_k: 100
confidence_threshold: 0.1
}
}
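As a rough sanity check of the values above (assuming the pillar grid is simply the valid area divided by the voxel size), the x range of 0.0 to 69.120 m with voxel_size_x = 0.16 m gives 69.120 / 0.16 = 432 voxels along x, and the y range of -39.680 to 39.680 m with voxel_size_y = 0.16 m gives 79.360 / 0.16 = 496 voxels along y.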