8.10. Developing deep learning applications
8.10.1. Introduction
TI Jacinto 7 SoCs support the following methods to deploy deep learning applications on the device:
TensorFlow Lite based heterogeneous execution on Cortex-A** + C7x-MMA
ONNX Runtime based heterogeneous execution on Cortex-A** + C7x-MMA
TVM/Neo-AI-DLR based heterogeneous execution on Cortex-A** + C7x-MMA
OpenVX API in A72 to execute complete CNN model on C7x-MMA
8.10.1.1. OpenVX API in A72 to execute complete CNN model on C7x-MMA
TI Deep Learning Product (TIDL) is a deep learning inference engine that accelerates deep learning networks on TI Jacinto 7 SoCs. TIDL runs on the C7x/MMA. TI OpenVX provides access to TIDL from the A72. Hence, to use TIDL in a system scenario, one needs to understand both the TIDL product and how to use it from within OpenVX.
This developer note describes the high-level steps to follow to learn more about TIDL and to run custom networks on a TI SoC using TIDL.
Note
It is HIGHLY recommended to follow the steps listed below to run your networks on the EVM.
DO NOT skip the steps listed in Steps to run your custom network with TIDL Runtime. Each step verifies a certain aspect of the custom network and helps to identify and isolate issues in network execution.
8.10.1.2. TFLite Runtime based Heterogeneous Execution on A72+C7x-MMA
The Processor SDK implements heterogeneous execution of CNN models on A72 and C7x-MMA using the TensorFlow Lite runtime. This heterogeneous execution enables:
TensorFlow Lite as the top-level inference API for user applications
Offloading subgraphs to C7x/MMA for accelerated execution with TIDL
Executing layers that are not supported by TIDL on the ARM A72 core
Please refer to the section TFLite Runtime for detailed instructions on usage.
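For orientation, the snippet below is a minimal Python sketch of this flow: it loads the TIDL delegate into the TFLite runtime so that supported subgraphs are offloaded to the C7x/MMA. The delegate library name (libtidl_tfl_delegate.so), the artifacts_folder option key, and all paths are illustrative assumptions; the TFLite Runtime section documents the exact values for your SDK version.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the TIDL delegate. The library name and the option key below are
# assumptions for illustration; see the TFLite Runtime section for the
# exact names supported by your SDK version.
tidl_delegate = tflite.load_delegate(
    'libtidl_tfl_delegate.so',
    {'artifacts_folder': '/path/to/tidl_artifacts'})   # hypothetical path

# Subgraphs supported by TIDL are offloaded to C7x/MMA; the remaining
# layers execute on the A72 inside the TFLite runtime.
interpreter = tflite.Interpreter(
    model_path='/path/to/model.tflite',                # hypothetical path
    experimental_delegates=[tidl_delegate])
interpreter.allocate_tensors()

# Run one dummy frame end to end.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=inp['dtype']))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
```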
8.10.1.3. ONNX Runtime based Heterogeneous Execution on A72+C7x-MMA
The Processor SDK implements heterogeneous execution of CNN models on A72 and C7x-MMA using the ONNX runtime. This heterogeneous execution enables:
ONNX Runtime as the top-level inference API for user applications
Offloading subgraphs to C7x/MMA for accelerated execution with TIDL
Executing layers that are not supported by TIDL on the ARM A72 core
Please refer to the section ONNX Runtime for detailed instructions on usage.
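As a rough sketch (not the authoritative API usage), the Python snippet below shows how a TIDL execution provider is typically selected alongside the CPU provider, so that unsupported layers fall back to the A72. The provider name, option key, and paths are assumptions; see the ONNX Runtime section for the exact values.

```python
import numpy as np
import onnxruntime as ort

# Provider name and option key are assumptions for illustration; the
# ONNX Runtime section documents the exact values for your SDK version.
ep_options = {'artifacts_folder': '/path/to/tidl_artifacts'}   # hypothetical
session = ort.InferenceSession(
    '/path/to/model.onnx',                                     # hypothetical
    providers=['TIDLExecutionProvider', 'CPUExecutionProvider'],
    provider_options=[ep_options, {}])

# Subgraphs claimed by the TIDL provider run on C7x/MMA; everything
# else falls back to the CPU (A72) provider.
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]    # resolve dynamic dims
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
```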
8.10.1.4. TVM/Neo-AI-DLR based Heterogeneous Execution on A72+C7x-MMA
The Processor SDK implements heterogeneous execution of CNN models on A72 and C7x-MMA using the TVM runtime and Neo-AI-DLR runtime. This heterogeneous execution enables:
TVM/Neo-AI-DLR as the top-level inference API for user applications
Offloading subgraphs to C7x/MMA for accelerated execution with TIDL
Generating code for layers that are not supported by TIDL and running them on the ARM A72 core
Please refer to the section TVM/Neo-AI-DLR for detailed instructions on usage.
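For orientation, the Python sketch below loads TVM-compiled artifacts with the Neo-AI-DLR runtime. The artifacts directory, device type, input name, and input shape are illustrative assumptions; see the TVM/Neo-AI-DLR section for the actual compilation and deployment flow.

```python
import numpy as np
from dlr import DLRModel

# Load the TVM/TIDL compiled artifacts directory (hypothetical path).
# Subgraphs compiled for TIDL execute on C7x/MMA; the rest run as
# TVM-generated code on the A72.
model = DLRModel('/path/to/dlr_artifacts', 'cpu')

# run() maps input names to arrays and returns a list of output arrays.
# The input name and shape below are placeholders for illustration.
outputs = model.run({'input': np.zeros((1, 3, 224, 224), dtype=np.float32)})
```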
8.10.2. Documentation References
| SDK Component | Documentation | Section |
|---|---|---|
| TIDL | TIDL Product on C7x | Main Page |
| TIDL | TensorFlow Lite Runtime | Open Source Runtime > TensorFlow Lite Runtime |
| TIDL | ONNX Runtime | Open Source Runtime > ONNX Runtime |
| TIDL | TVM/Neo-AI DLR | Open Source Runtime > TVM/Neo-AI DLR |
| TI OpenVX | TIDL OpenVX Node API on A72 | TIOVX User Guide > TIOVX Support Kernels > TI Extension Kernels > tivxTIDLNode |
| vision apps | Deep Learning Applications | Application / Demos > DL Demos |
| vision apps | Pre/post processing sample OpenVX nodes | APIs > TI defined OpenVX Kernels for Vision Apps > TIVX Kernels for Image Pre/Post Processing |
8.10.3. Source Code References
| SDK Component | File / Folder | Description |
|---|---|---|
| TIDL | c7x-mma-tidl/ti_dl/inc/itidl_ti.h | TIDL Product interface on C7x |
| TIDL | c7x-mma-tidl/arm-tidl/rt/out/PC/x86_64/LINUX/release/PC_dsp_test_dl_algo_host_rt.out | Standalone TIDL Runtime host emulation executable |
| TIDL | c7x-mma-tidl/arm-tidl/rt/out/J721E/A72/LINUX/release/TI_DEVICE_armv8_test_dl_algo_host_rt.out | TIDL A72 executable for running a model on C7x-MMA using OpenVX |
| TIDL | c7x-mma-tidl/ti_dl/demos/readme.txt | TFLite Runtime demo doing file-based image classification |
| TIDL | c7x-mma-tidl/ti_dl/demos/readme.txt | ONNX Runtime demo doing file-based image classification |
| TIDL | c7x-mma-tidl/ti_dl/demos/readme.txt | Neo-AI-DLR demo doing file-based image classification |
| TI OpenVX | c7x-mma-tidl/arm-tidl/tiovx_kernels/tidl | TIDL OpenVX node implementation on C7x and A72 |
| TI OpenVX | c7x-mma-tidl/arm-tidl/tiovx_kernels/include/TI/j7_tidl.h | TIDL OpenVX node interface on A72 |
| vision apps | vision_apps/apps/dl_demos/app_tidl | TIDL OpenVX demo doing file-based image classification |
| vision apps | vision_apps/apps/dl_demos/app_tidl_cam | TIDL OpenVX demo doing camera-based image classification |
| vision apps | vision_apps/apps/dl_demos/app_tidl_od | TIDL OpenVX demo doing file-based object detection |
| vision apps | vision_apps/apps/dl_demos/app_tidl_seg | TIDL OpenVX demo doing file-based semantic segmentation |
| vision apps | vision_apps/apps/dl_demos/app_tidl_vl | TIDL OpenVX demo doing file-based visual localization |
| vision apps | vision_apps/apps/dl_demos/app_tidl_avp* | TIDL OpenVX demos for Auto Valet Parking applications |
| vision apps | vision_apps/kernels/img_proc | Pre/post processing sample OpenVX nodes |
8.10.4. Steps to run your custom network with TIDL Runtime
8.10.4.1. Step 1: Import your network to TIDL Runtime network format
See the TIDL product user guide [LINK], section TIDL Runtime > TIDL-RT Getting Started.
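For orientation only, the block below sketches what a minimal import configuration file can look like. Every key name, value, and path here is an illustrative assumption; the authoritative parameter list and examples are in the TIDL-RT Getting Started section referenced above.

```
# Hypothetical TIDL import config sketch -- consult TIDL-RT Getting
# Started for the real parameter names and values
modelType        = 2                         # e.g. an ONNX model
inputNetFile     = "model.onnx"
outputNetFile    = "out/tidl_net_model.bin"
outputParamsFile = "out/tidl_io_model_"
inWidth          = 224
inHeight         = 224
inNumChannels    = 3
numFrames        = 1
inData           = "calibration_image_list.txt"
```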
Important
Make sure your network uses only layers supported by the TIDL runtime. See the TIDL product user guide [LINK], section TIDL Runtime > TIDL-RT Supported layers.
Proceed to the next step after the network import is successful.
8.10.4.2. Step 2: Run the imported TIDL network in PC HOST emulation mode
Run the newly imported network on a PC to confirm that the TIDL runtime can execute it.
See the TIDL user guide [LINK], section TIDL Runtime > TIDL-RT Inference.
Proceed to the next step after network execution on the PC is successful.
Note
Typically you can use the pre-built executable; there is no need to build it from source.
It is enough to test the network with 1-2 frames at this step.
The actual execution speed will be slow at this stage since the network runs in an emulated manner on the PC.
8.10.4.3. Step 3: Run the imported TIDL network on EVM with OpenVX (non-pipelined mode)
Run the newly imported network on the EVM with OpenVX. Here you can feed a larger number of frames; use Linux to feed them and/or visualize the output on a display.
Choose and customize the demos per your needs:
vision_apps/apps/dl_demos/app_tidl for image classification
vision_apps/apps/dl_demos/app_tidl_od for object detection
vision_apps/apps/dl_demos/app_tidl_seg for semantic segmentation
vision_apps/apps/dl_demos/app_tidl_cam for camera based image classification
Important
By default the demos are customized to run with a pre-trained network and specific pre/post processing requirements. You will need to modify these demos to run and visualize your DL network as required.
Note
In these demos, pre-processing, inference, post-processing and visualization all run on different cores.
By default they run in a pipelined manner, but they can be made to run back-to-back by NOT defining the APP_ENABLE_PIPELINE_FLOW macro in the respective demo's vision_apps/apps/dl_demos/app_tidl_XXX/main.c file.
However, back-to-back execution is not optimal from a system point of view, since in a final system many of these steps can be pipelined at frame boundaries to achieve a higher overall system FPS.
8.10.4.4. Step 4: Run the imported TIDL network on EVM with OpenVX (pipelined mode)
At this stage you may want to run your network alongside the rest of the system processing, such as camera capture, display, or other networks.
Now you can build your own application according to your system requirements.
To get maximum performance from the system, you must pipeline TIDL network execution with other processing such as camera capture, pre-processing, post-processing, visualization, and display.
Note
Start with a single-network, single-task demo application such as app_tidl_od, app_tidl_seg, or app_tidl_cam, and modify it to run your DL network.
Refer to the other Auto Valet Parking demos to perform more tasks such as running multiple networks, multiple cameras, multiple tasks, etc.