8. Semantic Segmentation
The ti_vision_cnn node is a versatile deep-learning (DL) inference ROS node that is optimized for the DL cores and hardware accelerators of TI processors. The ti_vision_cnn node supports compute-intensive DL inference operations, including 2D object detection and semantic segmentation. Figure 1 shows the high-level block diagram of the applications built around the ti_vision_cnn node, which consist of multiple processing blocks deployed on hardware accelerators and DSP processors for optimized pre-processing and post-processing.
8.1. System Description
Referring to Figure 1, the processing blocks implemented in this application are described below:
The first step, handling input images with lens distortion, is performed by the LDC (Lens Distortion Correction) hardware accelerator (HWA), which carries out undistortion or rectification. Tools for generating the LDC look-up tables are provided with the SDK (tools/stereo_camera/generate_rect_map.py and tools/mono_camera/generate_rect_map_mono.py). Note that the LDC HWA not only removes lens distortion or rectifies the images, but also changes the image format: the input image to the application is in YUV422 (UYVY) format, which the LDC converts to YUV420 (NV12).
The input images are then resized to match the input tensor size required by the semantic segmentation model. This resizing is done by the MSC (Multi-Scaler) HWA.
The pre-processing block, which runs on the C6x DSP, converts YUV420 (NV12) to RGB, the input format expected by the semantic segmentation model.
The semantic segmentation network is accelerated on C7x/MMA using the DLR runtime and outputs a tensor that contains class information for every pixel.
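For reference, the sketch below approximates in Python (NumPy/OpenCV) the format conversion and resizing performed by the LDC, MSC, and C6x blocks described above. It is an illustrative sketch only: the undistortion/rectification step is omitted, and the actual pipeline runs on the hardware accelerators, not in Python. The 768 x 432 input size is taken from the model description in the next section.

```python
# Illustrative sketch only: an offline approximation of the pre-processing that
# the LDC, MSC, and C6x blocks perform on the target hardware.
import cv2
import numpy as np

MODEL_W, MODEL_H = 768, 432   # input tensor size of the semantic segmentation model

def preprocess(uyvy_frame: np.ndarray) -> np.ndarray:
    """uyvy_frame: HxWx2 uint8 image in packed YUV422 (UYVY) format."""
    # The LDC also performs undistortion/rectification via a look-up table
    # (omitted here) and converts YUV422 (UYVY) to YUV420 (NV12); for this
    # offline sketch we convert straight to RGB.
    rgb = cv2.cvtColor(uyvy_frame, cv2.COLOR_YUV2RGB_UYVY)
    # The MSC resizes the image to the model's input resolution.
    return cv2.resize(rgb, (MODEL_W, MODEL_H), interpolation=cv2.INTER_LINEAR)
```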
8.2. Semantic Segmentation Model
A CNN model for semantic segmentation has been developed using Jacinto AI DevKit (PyTorch).
Model: deeplabv3lite_mobilenetv2_tv (for details, see LINK)
Input image: 768 x 432 pixels in RGB
Training data: Cityscapes Dataset, and a small dataset collected from ZED camera
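As a rough, hedged illustration of how this model can be exercised outside the ROS node, the snippet below runs compiled model artifacts with the DLR runtime on a dummy 768 x 432 RGB input. The artifacts path, the uint8 NHWC input layout, and the output shape are assumptions; check them against the actual model artifacts in dl_model_path.

```python
# Sketch: offline sanity check of the segmentation model with the DLR runtime.
# The artifacts path, input layout/dtype (uint8 NHWC assumed), and output shape
# are assumptions, not taken from the SDK.
import numpy as np
from dlr import DLRModel

model = DLRModel('/path/to/dl_model_artifacts', 'cpu')  # hypothetical path
rgb = np.zeros((432, 768, 3), dtype=np.uint8)           # 768 x 432 RGB input
outputs = model.run([rgb[np.newaxis, ...]])             # add batch dimension
print(outputs[0].shape)                                 # per-pixel class information
```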
Note
cam_id and subdev_id for cameras: You can check the cam_id and subdev_id for the camera attached to the SK board by running /opt/edgeai-gst-apps/scripts/setup_cameras.sh. Please update the parameters accordingly or pass them as launch arguments.
Note
IMX390 Camera: Before running, please refer to gscam2/README_TI.md for generating the required LUT files for tiovxldc.
8.3. Run the Application in ROS 2
[SK] To launch the semantic segmentation demo with a ZED stereo camera, run the following command inside the ROS 2 Docker container on the target:
ros2 launch ti_vision_cnn zed_semseg_cnn_launch.py cam_id:=x zed_sn:=SNxxxxx
To process the image stream from a USB mono camera:
ros2 launch ti_vision_cnn gscam_semseg_cnn_launch.py cam_id:=x
To process the image stream from an IMX219 camera:
ros2 launch ti_vision_cnn gscam_semseg_cnn_imx219_launch.py cam_id:=x subdev_id:=y
To process the image stream from an IMX390 camera, run one of the following commands depending on the desired resolution:
ros2 launch ti_vision_cnn gscam_semseg_cnn_imx390_launch.py cam_id:=x subdev_id:=y width:=1920 height:=1080
ros2 launch ti_vision_cnn gscam_semseg_cnn_imx390_launch.py cam_id:=x subdev_id:=y width:=1280 height:=720
ros2 launch ti_vision_cnn gscam_semseg_cnn_imx390_launch.py cam_id:=x subdev_id:=y width:=960 height:=540
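To bake the launch arguments above into your own launch file, the hedged sketch below wraps the USB-camera launch file with fixed argument values. The install location of the launch file (the 'launch' subdirectory of the ti_vision_cnn package) and the example values are assumptions; adjust them to your setup.

```python
# Sketch of a wrapper launch file that includes the USB-camera semantic
# segmentation launch with fixed arguments. Paths and values are assumptions.
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch.substitutions import PathJoinSubstitution
from launch_ros.substitutions import FindPackageShare

def generate_launch_description():
    semseg = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            PathJoinSubstitution([
                FindPackageShare('ti_vision_cnn'),
                'launch', 'gscam_semseg_cnn_launch.py',  # assumed install location
            ])
        ),
        launch_arguments={'cam_id': '0'}.items(),        # '0' is an example value
    )
    return LaunchDescription([semseg])
```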
[Visualization on Ubuntu PC] To set up the environment on the remote PC, please follow the Docker Setup for ROS 2 instructions.
Depending on the resolution setting of the capture node, run one of the following:
ros2 launch ti_viz_nodes rviz_semseg_cnn_launch.py width:=1920 height:=1080
ros2 launch ti_viz_nodes rviz_semseg_cnn_launch.py width:=1280 height:=720 # default for USB camera and IMX219
ros2 launch ti_viz_nodes rviz_semseg_cnn_launch.py width:=960 height:=540
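As a quick, optional sanity check on the remote PC before launching RViz, the hedged sketch below subscribes to the rectified image topic listed in the launch parameters table (Section 8.4) and prints the incoming resolution. The topic name is taken from that table; treating it as a standard sensor_msgs/Image stream is an assumption.

```python
# Sketch: probe the rectified image topic and print the incoming resolution.
# Topic name from the launch parameters table; sensor_msgs/Image is assumed.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

class RectifiedImageProbe(Node):
    def __init__(self):
        super().__init__('rectified_image_probe')
        self.create_subscription(
            Image, 'camera/right/image_rect_nv12', self.on_image, 10)

    def on_image(self, msg: Image):
        self.get_logger().info(f'{msg.width}x{msg.height} ({msg.encoding})')

def main():
    rclpy.init()
    rclpy.spin(RectifiedImageProbe())

if __name__ == '__main__':
    main()
```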
8.4. Launch File Parameters
Parameter | Description | Value
---|---|---
rosparam file | Algorithm configuration parameters (see “ROSPARAM Parameters” section) | config/params.yaml
image_format | Input image format: 0 - VX_DF_IMAGE_U8, 1 - VX_DF_IMAGE_NV12, 2 - VX_DF_IMAGE_UYVY | 0, 1, 2
enable_ldc_node | LDC node enable/disable flag | 0, 1
lut_file_path | LDC rectification table path | String
dl_model_path | DL model artifacts folder path | String
input_topic_name | Subscribe topic name for input camera image | camera/right/image_raw
rectified_image_topic | Publish topic name for output rectified image | camera/right/image_rect_nv12
rectified_image_frame_id | frame_id for output rectified image | right_frame
vision_cnn_tensor_topic | Publish topic name for output tensor | vision_cnn/tensor
exportPerfStats | Flag for exporting the performance data to a file: 0 - disable, 1 - enable | 0, 1
8.5. ROSPARAM Parameters
The table below describes the parameters in config/params.yaml:
Parameter | Description | Value
---|---|---
width | Input image width | Integer
height | Input image height | Integer
num_classes | Number of semantic segmentation classes | 0 - 20
pipeline_depth | OpenVX graph pipeline depth | 1 - 4
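For illustration only, the hedged sketch below shows how a few of these parameters, together with some of the launch-file parameters from Section 8.4, could be set on the node directly from a Python launch file instead of config/params.yaml. The executable name and the one-to-one parameter mapping are assumptions; the shipped launch files load config/params.yaml directly.

```python
# Sketch: setting node parameters from a launch file instead of params.yaml.
# The executable name and exact parameter mapping are assumptions.
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    semseg_node = Node(
        package='ti_vision_cnn',
        executable='vision_cnn',                             # hypothetical executable name
        parameters=[{
            'dl_model_path': '/path/to/dl_model_artifacts',  # hypothetical path
            'input_topic_name': 'camera/right/image_raw',
            'width': 1280,                                   # input image width
            'height': 720,                                   # input image height
            'num_classes': 20,                               # segmentation classes
            'pipeline_depth': 2,                             # OpenVX graph pipeline depth
        }],
    )
    return LaunchDescription([semseg_node])
```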
8.6. Known Issue
The default semantic segmentation CNN was initially trained with the Cityscapes dataset and subsequently re-trained with a small dataset collected from a stereo camera (ZED camera, HD mode). This re-training was done for a limited set of scenarios with coarse annotation. As a result, the model’s accuracy may be limited when it is used with a different camera model or in different environmental scenes.