Introduction
TIDL supports various base feature-extraction / backbone networks such as ResNets, MobileNets, ShuffleNet, VGG, DenseNet, etc. TIDL also supports the vision-processing meta-architectures listed below.
Supported Model Types
- Object Detection
  - Single Shot Detection (SSD) - vehicle and pedestrian detection, etc.
  - Feature Pyramid Network (FPN) + SSD - using a Resize layer for feature up-sampling
- Pixel-Level Segmentation
  - Semantic segmentation - free-space and lane-marking detection, etc.
  - Motion segmentation
  - Pixel-level depth estimation
Object Detection Architectures
Caffe-SSD
TIDL supports SSD networks and post-processing layers as defined in the caffe-ssd implementation by the original SSD authors. The user needs to provide the Caffe prototxt and caffemodel. We have validated this architecture with multiple models, including the ones below:
- JdetNet512x512 - Link
- Pelee Pascal VOC 304x304 - Link
Tensorflow Object Detection API - SSD
TIDL supports SSD post-processing as defined in the TensorFlow Object Detection API. We have validated this with a couple of networks, including the one below:
- ssd_mobilenet_v2 SSD - Link
If the user is not using the TensorFlow Object Detection API and instead uses SSD post-processing as defined by the original authors, then we recommend the method described in the next section, ONNX SSD.
ONNX - SSD
We have validated this flow with an SSD model trained in PyTorch and exported to ONNX. The object detection demo in the SDK uses this flow.
Performance of the SSD Post-processing Layer
The optimized implementation of SSD post-processing (box decoding, score computation, non-maximum suppression) targets generic models (any number of classes, prior boxes, heads, etc.). Also, currently only the 8-bit Caffe-SSD / TIDL-SSD variant is optimized. The TensorFlow Object Detection API post-processing and the 16-bit version are provided for feature completeness and are not optimized. It is recommended to write an optimized version of the post-processing for a given configuration. If the number of classes and prior boxes are known upfront, this post-processing can be optimized well on the C66x DSP or A72, offloading the C7x-MMA for the compute-heavy layers (convolutions, pooling, etc.).
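As a starting point for such a configuration-specific rewrite, the box decoding (CENTER_SIZE) and greedy NMS steps can be sketched in plain numpy as below. This is a minimal illustration with assumed defaults (the variances and thresholds mirror the values in the example proto file on this page), not TIDL's optimized implementation.

```python
import numpy as np

def decode_boxes(loc, priors, variances=(0.1, 0.1, 0.2, 0.2)):
    # CENTER_SIZE decoding as in Caffe-SSD: priors are (cx, cy, w, h),
    # loc holds the encoded offsets predicted by the box/loc head.
    cx = priors[:, 0] + loc[:, 0] * variances[0] * priors[:, 2]
    cy = priors[:, 1] + loc[:, 1] * variances[1] * priors[:, 3]
    w = priors[:, 2] * np.exp(loc[:, 2] * variances[2])
    h = priors[:, 3] * np.exp(loc[:, 3] * variances[3])
    # Return corner-form boxes (x1, y1, x2, y2)
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def nms(boxes, scores, nms_threshold=0.6, top_k=100):
    # Greedy non-maximum suppression over corner-form boxes;
    # top_k limits the candidates before suppression, as in nms_param.
    order = np.argsort(scores)[::-1][:top_k]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= nms_threshold]
    return keep
```

With the class count and prior-box layout fixed at build time, the loops above can be specialized and vectorized for the target core.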
Pixel Level Segmentation
- Pixel-level task networks need the layers below on top of the layers used by the backbone networks:
  - Up-sampling layers
  - Argmax layer
  - Dilated / atrous convolution
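For illustration, the behavior of a dilated (atrous) convolution can be sketched in plain numpy as below. This is only a reference for what the layer computes, not TIDL's implementation.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=2):
    # 'Valid' 2-D convolution with a dilated kernel: the kernel taps are
    # spaced `dilation` pixels apart, enlarging the receptive field
    # without adding parameters or reducing feature resolution.
    kh, kw = kernel.shape
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1  # effective size
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out
```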
Layers For Up-sampling
- TIDL supports the layers below for up-sampling the features:
  - De-convolution / transpose convolution
  - Resize
    - Bilinear
    - Nearest neighbor
- We have validated the networks below:
  - UNet trained in TensorFlow and converted to a TFLite model, with nearest-neighbor resize
  - MobileNet V2 + ASPP network trained in PyTorch with bilinear resize, in ONNX model format
  - Deconvolution-based (kernel size 4x4 with stride 2x2) SegNet trained in Caffe
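As a reference for the Resize layer's nearest-neighbor mode, a minimal numpy sketch is below (an illustration only, not TIDL's implementation):

```python
import numpy as np

def resize_nearest(x, scale=2):
    # Nearest-neighbor up-sampling: each feature pixel is repeated
    # `scale` times along both spatial axes.
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
```

A stride-2 transpose convolution with a suitably initialized 4x4 kernel (as in the SegNet example above) achieves a similar up-sampling effect with learnable weights.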
Example TIDL Proto File for Custom SSD network
In the example below, box_input: "376" refers to the output tensor name of the convolution layer with the box/loc prediction. in_width and in_height are the base image resolution; these shall match the width and height in the import configuration. All the other parameters are as defined by the original Caffe-SSD implementation.
name: "TIAD SSD ARCH"
caffe_ssd {
  name: "ssd_post_proc"
  box_input: "376"
  box_input: "380"
  box_input: "384"
  box_input: "388"
  box_input: "392"
  box_input: "396"
  class_input: "378"
  class_input: "382"
  class_input: "386"
  class_input: "390"
  class_input: "394"
  class_input: "398"
  output: "psd_bboxes"
  in_width: 768
  in_height: 384
  prior_box_param {
    min_size: 46.1
    max_size: 113.7
    aspect_ratio: 3.0
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    offset: 0.5
    step: 16
  }
  prior_box_param {
    min_size: 113.7
    max_size: 181.2
    aspect_ratio: 3.0
    aspect_ratio: 5.0
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    offset: 0.5
    step: 32
  }
  prior_box_param {
    min_size: 181.2
    max_size: 248.8
    aspect_ratio: 3.0
    aspect_ratio: 5.0
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    offset: 0.5
    step: 64
  }
  prior_box_param {
    min_size: 248.8
    max_size: 316.4
    aspect_ratio: 3.0
    aspect_ratio: 5.0
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    offset: 0.5
    step: 128
  }
  prior_box_param {
    min_size: 316.4
    max_size: 384.0
    aspect_ratio: 3.0
    aspect_ratio: 5.0
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    offset: 0.5
    step_w: 256
    step_h: 192
  }
  prior_box_param {
    min_size: 384.0
    max_size: 768.0
    aspect_ratio: 3.0
    aspect_ratio: 5.0
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    offset: 0.5
    step: 384
  }
  detection_output_param {
    num_classes: 4
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.60
      top_k: 100
    }
    code_type: CENTER_SIZE
    keep_top_k: 100
    confidence_threshold: 0.5
  }
}
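For reference, the prior (anchor) boxes implied by one prior_box_param block can be generated with a minimal numpy sketch of the Caffe-SSD PriorBox scheme, as below. This is an illustration only; the exact behavior is defined by the original Caffe-SSD implementation.

```python
import numpy as np

def prior_boxes(feat_w, feat_h, img_w, img_h, min_size, max_size,
                aspect_ratios, step, offset=0.5, flip=True, clip=False):
    # Caffe-SSD PriorBox: per feature-map cell, one square box of min_size,
    # one of sqrt(min_size * max_size), plus one box per aspect ratio
    # (and its reciprocal when flip is true).
    wh = [(min_size, min_size),
          (np.sqrt(min_size * max_size),) * 2]
    for ar in aspect_ratios:
        wh.append((min_size * np.sqrt(ar), min_size / np.sqrt(ar)))
        if flip:
            wh.append((min_size / np.sqrt(ar), min_size * np.sqrt(ar)))
    boxes = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + offset) * step, (y + offset) * step
            for w, h in wh:
                boxes.append([cx / img_w, cy / img_h, w / img_w, h / img_h])
    boxes = np.array(boxes)  # (cx, cy, w, h), normalized to [0, 1]
    return np.clip(boxes, 0, 1) if clip else boxes
```

For the first prior_box_param above (min_size 46.1, aspect_ratio 3.0, flip, step 16 on a 768x384 input, i.e. a 48x24 feature map), this yields 4 boxes per cell.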