8.10. Developing deep learning applications

8.10.1. Introduction

TI Jacinto 7 SoCs supports below methods to deploy Deep learning Applications on the device

  • TensorFlow Lite based heterogeneous execution on cortex-A** + C7x-MMA

  • ONNX Runtime based heterogeneous execution on cortex-A** + C7x-MMA

  • TVM/Neo-AI-DLR based heterogeneous execution on cortex-A** + C7x-MMA

  • OpenVX API in A72 to execute complete CNN model on C7x-MMA

8.10.1.1. OpenVX API in A72 to execute complete CNN model on C7x-MMA

TI Deep Learning Product is a deep learning inference engine to accelerate your deep learning networks on TI Jacinto 7 SoCs. TIDL runs on C7x/MMA. TI OpenVX allows to access TIDL from A72. Hence to use TIDL in system scenario one need to understand the TIDL Product and also how to use it from within OpenVX.

This developer note describes the high level steps one should follow to know more about TIDL and how to run custom networks on TI SoC using TIDL.

Note

  • It is HIGHLY recommended to follow the steps listed below to run your networks on EVM.

  • DO NOT skip the steps listed in Steps to run your custom network with TIDL Runtime. Each step verifies a certain aspect of the custom network and would help better identify and isolate issues in network execution.

8.10.1.2. TFLite Runtime based Heterogeneous Execution on A72+C7x-MMA

The Processor SDK implements heterogeneous execution of CNN models on A72 and C7x-MMA using the TensorFlow Lite runtime. This heterogeneous execution enables:

  • TensorFlow Lite as the top level inference API for user applications

  • Offloading subgraphs to C7x/MMA for accelerated execution with TIDL

  • Executing layers on the ARM A72 core for layers that are not supported by TIDL

Please refer section TFLite Runtime for detailed instruction on usage.

8.10.1.3. ONNX Runtime based Heterogeneous Execution on A72+C7x-MMA

The Processor SDK implements heterogeneous execution of CNN models on A72 and C7x-MMA using the ONNX runtime. This heterogeneous execution enables:

  • ONNX Runtime as the top level inference API for user applications

  • Offloading subgraphs to C7x/MMA for accelerated execution with TIDL

  • Executing layers on the ARM A72 core for layers that are not supported by TIDL

Please refer section ONNX Runtime for detailed instruction on usage.

8.10.1.4. TVM/Neo-AI-DLR based Heterogeneous Execution on A72+C7x-MMA

The Processor SDK implements heterogeneous execution of CNN models on A72 and C7x-MMA using the TVM runtime and Neo-AI-DLR runtime. This heterogeneous execution enables:

  • TVM/Neo-AI-DLR as the top level inference API for user applications

  • Offloading subgraphs to C7x/MMA for accelerated execution with TIDL

  • Generating code and running on the ARM A72 core for layers that are not supported by TIDL

Please refer section TVM/Neo-AI-DLR for detailed instruction on usage.

8.10.2. Documentation References

SDK Component

Documentation

Description

Section

TIDL

LINK

TIDL Product on C7x

Main Page

TIDL

TFLite Runtime

TensorFlow Lite Runtime

Open Source Runtime > TensorFlow Lite Runtime

TIDL

ONNX Runtime

ONNX Runtime

Open Source Runtime > ONNX Runtime

TIDL

TVM/Neo-AI-DLR

TVM/Neo-AI DLR

Open Source Runtime > TVM/Neo-AI DLR

TI OpenVX

LINK

TIDL OpenVX Node API on A72

TIOVX User Guide > TIOVX Support Kernels > TI Extension Kernels > tivxTIDLNode

vision apps

LINK

Deep Learning Applications

Application / Demos > DL Demos

vision apps

LINK

Pre/post processing sample OpenVX nodes

APIs > TI defined OpenVX Kernels for Vision Apps > TIVX Kernels for Image Pre/Post Processing

8.10.3. Source Code References

SDK Component

File / Folder

Decription

TIDL

c7x-mma-tidl/ti_dl/inc/itidl_ti.h

TIDL Product interface on C7x

TIDL

c7x-mma-tidl/arm-tidl/rt/out/PC/x86_64/LINUX/release/PC_dsp_test_dl_algo_host_rt.out

Standalone TIDL Runtime Host emulation executable

TIDL

c7x-mma-tidl/arm-tidl/rt/out/J721E/A72/LINUX/release/TI_DEVICE_armv8_test_dl_algo_host_rt.out

TIDL A72 executable for running model on C7x-MMA using openVX

TIDL

c7x-mma-tidl/ti_dl/demos/readme.txt

TFLite Runtime Demo doing file based image classification

TIDL

c7x-mma-tidl/ti_dl/demos/readme.txt

ONNX Runtime Demo doing file based image classification

TIDL

c7x-mma-tidl/ti_dl/demos/readme.txt

NEO-AI-DLR Demo doing file based image classification

TI OpenVX

c7x-mma-tidl/arm-tidl/tiovx_kernels/tidl

TIDL OpenVX Node implementation on C7x and A72

TI OpenVX

c7x-mma-tidl/arm-tidl/tiovx_kernels/include/TI/j7_tidl.h

TIDL OpenVX Node interface on A72

vision apps

vision_apps/apps/dl_demos/app_tidl

TIDL OpenVX Demo doing file based image classification

vision apps

vision_apps/apps/dl_demos/app_tidl_cam

TIDL OpenVX Demo doing camera based image classification

vision apps

vision_apps/apps/dl_demos/app_tidl_od

TIDL OpenVX Demo doing file based object detection

vision apps

vision_apps/apps/dl_demos/app_tidl_seg

TIDL OpenVX Demo doing file based semantic segmentation

vision apps

vision_apps/apps/dl_demos/app_tidl_vl

TIDL OpenVX Demo doing file based visual localization

vision apps

vision_apps/apps/dl_demos/app_tidl_avp*

TIDL OpenVX Demo doing Auto valet parking applications

vision apps

vision_apps/kernels/img_proc

Pre/post processing sample OpenVX nodes

8.10.4. Steps to run your custom network with TIDL Runtime

8.10.4.1. Step 1: Import your network to TIDL Runtime network format

  • See TIDL Product user guide [LINK], section TIDL Runtime > TIDL-RT Getting Started

Important

Make sure your network only uses the TIDL runtime supported layers. See TIDL product user guide [LINK], section TIDL Runtime > TIDL-RT Supported layers

  • Proceed to next step, after network import is successful.

8.10.4.2. Step 2: Run the imported TIDL network in PC HOST emulation mode

  • Run the newly imported network on PC to confirm that TIDL runtime can execute this network.

  • See TIDL user guide [LINK], section TIDL Runtime > TIDL-RT Inference

  • Proceed to next step, after network execution on PC is successful.

Note

  • Typically one can use the pre-built executable and no need to build the executable from source.

  • It is enough to run test the network with 1-2 frames at this step

  • Here the actual execution speed will be slow since its running in a emulated manner on PC

8.10.4.3. Step 3: Run the imported TIDL network on EVM with OpenVX (non-pipelined mode)

  • Run the newly imported network on EVM with OpenVX. Here one can feed more number of frames. Use Linux to feed a large number of frames and/or visualize the output on display

  • Choose and customize the demos as per your need,

    • vision_apps/apps/dl_demos/app_tidl for image classification

    • vision_apps/apps/dl_demos/app_tidl_od for object detection

    • vision_apps/apps/dl_demos/app_tidl_seg for semantic segmentation

    • vision_apps/apps/dl_demos/app_tidl_cam for camera based image classification

Important

By default the demos are customized to run with a pre-trained network and specific pre/post processing requirements. You will need to modify these demos to run and visualize your DL network as required.

Note

  • In these demos, pre-processing, inference, post processing, visualization are all run on different cores.

  • By default they run in a pipeline execution manner but can be made to run back-to-back by NOT defining APP_ENABLE_PIPELINE_FLOW macro in respective demo’s vision_apps/apps/dl_demos/app_tidl_XXX/main.c file.

  • However this is not the most optimal from system execution point of view, since in final system many of these steps can be pipelined at frame boundary to allow overall higher system FPS.

8.10.4.4. Step 4: Run the imported TIDL network on EVM with OpenVX (pipelined mode)

  • At this stage you may want to run your network with rest of system processing like camera or display or other networks.

  • Now you can make your own application according to your system requirements.

  • To get maximum performance in system, one must pipeline TIDL network execution with other processing like camera, pre-processing, post-processing, visualization, display

Note

  • Start with single network, single task demo applications such as app_tidl_od, app_tidl_seg or app_tidl_cam, modify to run your DL network.

  • Refer to other Auto Valet Parking Demos to perform more tasks such as running multiple networks, multiple cameras, multiple tasks etc.