TI Deep Learning Product User Guide
TFLite Runtime

Introduction

The Processor SDK implements TIDL offload support using the TFLite Delegate runtime.

This heterogeneous execution enables:

  1. TFLite runtime as the top-level inference API for user applications
  2. Offloading subgraphs to C7x/MMA for accelerated execution with TIDL
  3. Running optimized code on the ARM core for layers that are not supported by TIDL

TFLite runtime based user workflow

The diagram below illustrates the TFLite-based workflow. The user runs model compilation (sub-graph creation and quantization) on a PC, and the generated artifacts are then used for inference on the device.

tflrt_work_flow.png
TFLite runtime based user workflow

Model Compilation on PC

osrt_compile_steps.png
OSRT Compile Steps

The Processor SDK package includes all the required Python packages for runtime support.

Pre-requisite: PSDK RA should be installed on the host Ubuntu 18.04 machine, and the pre-built demos should run successfully on the EVM.

Follow the steps below. (Note: all the scripts below must be run from the ${PSDKRA_PATH}/tidl_xx_xx_xx_xx/ti_dl/test/tflrt/ folder.)

  1. Prepare the environment for model compilation
    source prepare_model_compliation_env.sh
    This script needs to be executed only once per terminal session. It performs the operations below; the user can also perform these steps manually by following the script.
    • Installs all the dependent Python packages, such as the TensorFlow Lite runtime, NumPy, and the Python Imaging Library (Pillow). If a conflicting package version is present because of other installations, we recommend creating a conda environment with Python 3.6.x and running these scripts there.
    • Downloads the models used by the OOB scripts if they are not available in the file system
    • Sets the environment variables required by the scripts, e.g. the path to tools and shared libraries
    • Checks that all the tools required by TIDL are available in the tools path

      Note
      If you observe any issue with pip, run the command below to update it:

      python -m pip install --upgrade pip
  2. Run model compilation – This step generates the artifacts needed for inference in the tflrt-artifacts folder. Each subgraph is identified in the artifacts by the tensor index of its output in the model (a minimal sketch of this flow follows the list).
    python3 tflrt_delegate.py -c
  3. Run inference on PC – Optionally, the user can test the inference in host-emulation mode and check the output; the output images are saved in the specified artifacts folder.
    python3 tflrt_delegate.py
  4. Run inference on PC without offload – Optionally, the user can test the inference in host-emulation mode without delegating any subgraph to the TIDL delegate.
    python3 tflrt_delegate.py -d
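
For reference, the compilation pass in tflrt_delegate.py essentially creates a TFLite interpreter with the TIDL delegate and pushes the calibration frames through it; the delegate writes the subgraph artifacts as a side effect. Below is a minimal sketch of that flow. The model path, image names, and option values are placeholders, and it assumes the same delegate library name as the example in 'User options' below; the shipped script may load a dedicated import/compilation library instead.

import numpy as np
import tflite_runtime.interpreter as tflite
from PIL import Image

delegate_options = {
    'tidl_tools_path':  '<path to tidl_tools>',      # required, see 'User options'
    'artifacts_folder': './tflrt-artifacts/model1',  # required
    'advanced_options:calibration_frames': 4,        # matches len of image list below
}
tidl_delegate = tflite.load_delegate('libtidl_tfl_delegate.so.1.0', delegate_options)
interpreter = tflite.Interpreter(model_path='model.tflite',
                                 experimental_delegates=[tidl_delegate])
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
h, w = inp['shape'][1], inp['shape'][2]

# Each invocation feeds one calibration frame to the import/quantization step
for name in ['calib0.bmp', 'calib1.bmp', 'calib2.bmp', 'calib3.bmp']:
    img = np.array(Image.open(name).resize((w, h)), dtype=inp['dtype'])
    interpreter.set_tensor(inp['index'], img[np.newaxis, ...])
    interpreter.invoke()
# The artifacts for EVM inference are now in delegate_options['artifacts_folder']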

Model Inference on EVM

The artifacts generated by the Python scripts in the above section can be used for inference with either the Python or the C/C++ APIs. The following steps run inference using the Python API; refer to the 'C API based inference' section below for usage of the C APIs.

Python API based inference

osrt_run_steps.png
OSRT Run Steps
  1. Copy the "${PSDKRA_PATH}/tidl_xx_xx_xx_xx/ti_dl/test/" folder to the file system where the EVM is running Linux (SD card or NFS mount). This folder contains all the OOB scripts and artifacts.
  2. "cd tflrt"
  3. "LD_LIBRARY_PATH=/usr/lib python3 tflrt_delegate.py" - Run the inference on the EVM and check the results, performance etc.

Note: These scripts are only for basic functional testing and performance checks. The accuracy of the models can be benchmarked using the Python module released here: edgeai-benchmark

We also provide IPython notebooks for running inference on the EVM. More details on these can be found here: Link

User options

An example call to the interpreter from the Python interface using the delegate mechanism:

interpreter = tflite.Interpreter(model_path='path_to_model', \
                    experimental_delegates=[tflite.load_delegate('libtidl_tfl_delegate.so.1.0', delegate_options)])

'delegate_options' in the interpreter call comprises the options below (required and optional):

Required

The following options need to be specified by the user while creating the TFLite interpreter:

Name | Description
tidl_tools_path | Path from which the TIDL-related tools are picked up; set this to ${PSDKRA_PATH}/tidl_xx_xx_xx_xx/tidl_tools/
artifacts_folder | Folder where the user intends to store all the compilation artifacts
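
For example, a minimal delegate_options dictionary with only the required entries (it assumes ${PSDKRA_PATH} is exported in the environment; the artifacts path is a placeholder):

import os

delegate_options = {
    'tidl_tools_path':  os.path.join(os.environ['PSDKRA_PATH'],
                                     'tidl_xx_xx_xx_xx', 'tidl_tools'),
    'artifacts_folder': './tflrt-artifacts/mobilenet_v1',
}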

Optional

The following options are set to default values and need to be specified only if the user wants to modify them. The optional arguments below are specific to model compilation and are not applicable to inference, with the exception of 'debug_level'.

Name | Description | Default value
platform | "J7" | "J7"
version | TIDL version - open source runtimes are supported from version 7.2 onwards | (7, 3)
tensor_bits | Number of bits for TIDL tensors and weights - 8/16 | 8
debug_level | 0 - no debug, 1 - rt debug prints, >=2 - increasing levels of debug and trace dump | 0
max_num_subgraphs | Offload up to <num> TIDL subgraphs | 16
deny_list | Force-disable offload of a particular operator to TIDL [^2] | "" - empty list
accuracy_level | 0 - basic calibration, 1 - higher accuracy (advanced bias calibration), 9 - user defined [^3] | 1
advanced_options:calibration_frames | Number of frames to be used for calibration - a minimum of 10 frames is recommended | 20
advanced_options:calibration_iterations | Number of bias calibration iterations [^1] | 50
advanced_options:output_feature_16bit_names_list | List of names of the layers (comma-separated string) as in the original model whose feature/activation outputs the user wants in 16 bit [^1] [^4] | ""
advanced_options:params_16bit_names_list | List of names of the output layers (separated by comma, space, or tab) as in the original model whose parameters the user wants in 16 bit [^1] [^5] | ""
advanced_options:quantization_scale_type | 0 for non-power-of-2, 1 for power-of-2 | 0
advanced_options:high_resolution_optimization | 0 for disable, 1 for enable | 0
advanced_options:pre_batchnorm_fold | Fold batchnorm layers into the following convolution layer; 0 for disable, 1 for enable | 1
advanced_options:add_data_convert_ops | Adds the input and output format conversions to the model and performs them on the DSP instead of the ARM core; currently an experimental feature | 0
object_detection:confidence_threshold | Overrides the "nms_score_threshold" parameter in the tflite detection post-processing layer | Read from model
object_detection:nms_threshold | Overrides the "nms_iou_threshold" parameter in the tflite detection post-processing layer | Read from model
object_detection:top_k | Overrides the "detections_per_class" parameter in the tflite detection post-processing layer | Read from model
object_detection:keep_top_k | Overrides the "max_detections" parameter in the tflite detection post-processing layer | Read from model
ti_internal_nc_flag | Internal use only | -
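
Putting the two tables together, an illustrative delegate_options dictionary combining the required entries with a few optional ones (all values below are examples, not recommendations):

delegate_options = {
    # required
    'tidl_tools_path':  '<path to tidl_tools>',
    'artifacts_folder': './tflrt-artifacts/mobilenet_v1',
    # optional - compilation only, except 'debug_level'
    'tensor_bits':      8,
    'debug_level':      1,
    'deny_list':        '1, 2',   # keep AveragePool2d and Concatenation on ARM, see [^2]
    'accuracy_level':   1,
    'advanced_options:calibration_frames':     20,
    'advanced_options:calibration_iterations': 50,
}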

The options below take effect only if accuracy_level = 9; otherwise they are discarded. When accuracy_level = 9, any options specified by the user are applied and the rest are set to their default values. For accuracy_level = 0/1, these options are preset internally.

Name | Description | Default value
advanced_options:activation_clipping | 0 for disable, 1 for enable [^1] | 1
advanced_options:weight_clipping | 0 for disable, 1 for enable [^1] | 1
advanced_options:bias_calibration | 0 for disable, 1 for enable [^1] | 1
advanced_options:channel_wise_quantization | 0 for disable, 1 for enable [^1] | 0
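
For instance, a user-defined calibration that enables channel-wise quantization and disables weight clipping could look like the snippet below (an illustrative combination, not a recommendation; it extends the delegate_options dictionary from the example above):

delegate_options.update({
    'accuracy_level': 9,                              # user-defined calibration
    'advanced_options:channel_wise_quantization': 1,  # override (default 0)
    'advanced_options:weight_clipping':           0,  # override (default 1)
    # activation_clipping and bias_calibration are not specified,
    # so they are set to their default values (1)
})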

[^1]: Advanced calibration can help improve 8-bit quantization. Please see TIDL Quantization for details.
[^2]: The deny list is a string of comma-separated numbers representing operators as identified in the TFLite builtin ops. Please refer to the TFLite builtin ops list; e.g. deny_list = "1, 2" denies offloading the 'AveragePool2d' and 'Concatenation' operators to TIDL.
[^3]: Advanced calibration options can be specified by setting accuracy_level = 9.
[^4]: Note that if the feature/activations of a given layer are in 16 bit, its parameters automatically become 16 bit as well, and the user need not specify them in "advanced_options:params_16bit_names_list". Example format: "conv1_2, fire9/concat_1"
[^5]: This is not the name of a parameter of the layer; it is expected to be the output name of the layer. Note that if a given layer's feature/activations are in 16 bit, its parameters automatically become 16 bit even if the layer is not part of this list.
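
As an example, to keep the activations of two layers in 16 bit while the rest of the network runs in 8 bit (layer names taken from the format example in [^4]):

delegate_options['tensor_bits'] = 8
delegate_options['advanced_options:output_feature_16bit_names_list'] = 'conv1_2, fire9/concat_1'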

C API based inference

Pre-requisite: Compiled artifacts stored in the artifacts folder, as specified in step 2 of 'Model Compilation on PC' above.

The following steps build and run the C API based demo for the TFLite runtime:

cd ${PSDKRA_PATH}
git clone --single-branch -b r2.4 https://github.com/tensorflow/tensorflow.git

cd tensorflow
git checkout 582c8d236cb079023657287c318ff26adb239002
git am ../tidl_j7_xx_xx_xx_xx/ti_dl/tfl_delegate/0001-tflite-interpreter-add-support-for-custom-data.patch
./tensorflow/lite/tools/make/download_dependencies.sh

cd ..
export PSDK_INSTALL_PATH=$(pwd)

cd targetfs/usr/lib/
ln -s libtbb.so.2 libtbb.so
ln -s libtiff.so.5 libtiff.so
ln -s libwebp.so.7 libwebp.so
ln -s libopencv_highgui.so.4.1 libopencv_highgui.so
ln -s libopencv_imgcodecs.so.4.1 libopencv_imgcodecs.so
ln -s libopencv_core.so.4.1.0  libopencv_core.so
ln -s libopencv_imgproc.so.4.1.0 libopencv_imgproc.so  

cd tidl_j7_xx_xx_xx_xx
make demos DIRECTORIES=tfl

# To run on EVM:

Mount ${PSDKRA_PATH} on the EVM
export LD_LIBRARY_PATH=/usr/lib
cd ${PSDKRA_PATH}/tidl_j7_xx_xx_xx_xx/ti_dl/demos/out/J7/A72/LINUX/release/

./tidl_tfl_classification.out -m ../../../../../../test/testvecs/models/public/tflite/mobilenet_v1_1.0_224.tflite -l ../../../../../../test/testvecs/input/labels.txt -i ../../../../../../test/testvecs/input/airshow.bmp -a 1

Known Issues/Limitations

  • Some accuracy degradation is seen in quantized models when multiple subgraphs are created; this will be fixed in the next release.
  • The tidl_denylist feature has not been extensively tested with object detection networks in this release.
  • Some networks with odd input resolutions, e.g. 513x513, are seen to get stuck on the target in 8-bit mode; please use 16-bit mode for these networks. This issue will be fixed in the next release.