TI Deep Learning Library User Guide
The Processor SDK implements TIDL offload support using the TVM runtime and the Neo-AI-DLR runtime. With this heterogeneous execution, supported subgraphs of a network are offloaded to TIDL for accelerated execution, while the remaining operations run under the TVM/Neo-AI-DLR runtime.
Neo-AI-DLR is an open-source common runtime for machine learning models compiled by AWS SageMaker Neo, TVM, or Treelite. The Processor SDK focuses on models compiled by TVM; for these models, the Neo-AI-DLR runtime can be considered a wrapper around the TVM runtime.
The following sections describe the details for compiling and deploying machine learning models for TVM/Neo-AI-DLR + TIDL heterogeneous execution.
The picture below shows the TVM/Neo-AI-DLR based workflow. The user runs model compilation (subgraph creation and quantization) on a PC; the generated artifacts are then used to run inference on the device.
The Processor SDK package includes all the required python packages for runtime support.
Prerequisite: PSDK RA must be installed on the host Ubuntu 18.04 machine, and you should be able to run the pre-built demos on the EVM.
Follow the steps below. (Note: all scripts below must be run from the ${PSDKRA_PATH}/tidl_xx_xx_xx_xx/ti_dl/test/tvm-dlr/ folder.)
Note
These scripts download the TVM and DLR Python packages from the latest PSDK release on ti.com. If you are using a different version of the SDK (for example, an RC version), please update the links for these two Python wheel files in download_models.py:

```
'dlr-1.4.0-py3-none-any.whl' : {'url':'http://swubn03.india.englab.ti.com/webgen/publish/PROCESSOR-SDK-LINUX-J721E/07_02_00_05/exports//dlr-1.4.0-py3-none-any.whl', 'dir':'./'}
```

If you observe any issue with pip, run the following command to update it:

```
python -m pip install --upgrade pip
```
Note: These scripts are only for basic functional testing and performance checks. For accuracy benchmarking, more tutorials will be released in upcoming releases.
There are only 4 lines that are specific to TIDL offload in "test_tidl_j7.py". The rest of the script is no different from a regular TVM compilation script without TIDL offload.
```python
tidl_compiler = tidl.TIDLCompiler(tidl_platform, tidl_version,
                                  num_tidl_subgraphs=num_tidl_subgraphs,
                                  artifacts_folder=tidl_artifacts_folder,
                                  tidl_tools_path=get_tidl_tools_path(),
                                  tidl_tensor_bits=8,
                                  tidl_calibration_options={'iterations': 10},
                                  tidl_denylist=args.denylist)
```
We first instantiate a TIDLCompiler object. The parameters are explained in the following table.
| Name/Position | Value |
|---|---|
| tidl_platform | "J7" |
| tidl_version | (7, 1) |
| num_tidl_subgraphs | offload up to <num> TIDL subgraphs |
| artifacts_folder | where to store the deployable module |
| tidl_tools_path | set to the environment variable TIDL_TOOLS_PATH |
| tidl_tensor_bits | 8 or 16: bit width for importing TIDL tensors and weights |
| tidl_calibration_options | optional; a dictionary that overrides the default calibration options |
| tidl_denylist | optional; denies a TVM Relay op for TIDL offloading |
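As a point of reference, here is a minimal sketch of what a helper like get_tidl_tools_path() (used in the instantiation above) could look like; the body shown is an assumption, but reading the TIDL_TOOLS_PATH environment variable matches the table entry for tidl_tools_path:

```python
import os

def get_tidl_tools_path():
    # Per the table above, tidl_tools_path is taken from the
    # TIDL_TOOLS_PATH environment variable (assumed helper body).
    return os.environ['TIDL_TOOLS_PATH']
```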
Advanced calibration can help improve 8-bit quantization; please see TIDL Quantization for details. The default calibration options are specified in the TVM source file python/tvm/relay/backend/contrib/tidl.py (grep for "default_calib_options").
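As an illustration, a hedged sketch of overriding the default calibration options follows; it reuses names from "test_tidl_j7.py", and 'iterations' is the only option key shown in this guide, so consult the grep above for the full supported set:

```python
# Hypothetical override: run more calibration iterations than the
# example above; 'iterations' is the only key documented here.
custom_calib_options = {'iterations': 50}

tidl_compiler = tidl.TIDLCompiler(
    "J7", (7, 1),
    num_tidl_subgraphs=num_tidl_subgraphs,
    artifacts_folder=tidl_artifacts_folder,
    tidl_tools_path=get_tidl_tools_path(),
    tidl_tensor_bits=8,
    tidl_calibration_options=custom_calib_options)
```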
```python
mod, status = tidl_compiler.enable(mod_orig, params, model_input_list)
```
In this step, the original machine learning model/network represented in TVM Relay IR, "mod_orig", is transformed: supported operations are partitioned into TIDL subgraphs (up to num_tidl_subgraphs of them), and each subgraph is imported and calibrated for quantized TIDL execution. The transformed module is returned as "mod", together with a "status" value.
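A small usage sketch follows; the exact semantics of "status" are not spelled out in this guide, so treating a true value as "TIDL subgraphs were created" is an assumption for illustration:

```python
mod, status = tidl_compiler.enable(mod_orig, params, model_input_list)
if status:
    # Assumed meaning: TIDL subgraphs were created and imported;
    # continue with relay.build_module.build as shown below.
    print("TIDL offload enabled")
else:
    # Assumed meaning: no TIDL subgraphs; the model runs entirely on Arm.
    print("No TIDL offload")
```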
```python
with tidl.build_config(tidl_compiler=tidl_compiler):
    graph, lib, params = relay.build_module.build(mod, target=target, params=params)
```
In this step, TVM code generation takes place. Inside the TVM codegen, there is a TIDL codegen backend. "tidl.build_config" creates a context and tells the TIDL codegen backend where the artifacts from TIDL importing are. The backend then embeds the artifacts into the "lib".
```python
tidl.remove_tidl_params(params)
```
This optional step removes the weights in TIDL subgraphs that have already been imported into the artifacts. Removing them results in a smaller deployable module.
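At this point the deployable module can be written out. Below is a minimal sketch assuming the standard TVM APIs of this era (export_library and relay.save_param_dict) and the file names from the listing in the next section; cross-compilation details (such as passing a cross compiler to export_library) are omitted:

```python
import os
from tvm import relay

# Write out the three files of the deployable module; 'graph', 'lib'
# and 'params' come from relay.build_module.build above.
lib.export_library(os.path.join(tidl_artifacts_folder, 'deploy_lib.so'))
with open(os.path.join(tidl_artifacts_folder, 'deploy_graph.json'), 'w') as f:
    f.write(graph)
with open(os.path.join(tidl_artifacts_folder, 'deploy_param.params'), 'wb') as f:
    f.write(relay.save_param_dict(params))
```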
Neo-AI-DLR on the EVM/target supports both a Python API and a C API. This section describes the usage of the C API.
This demo uses previously created TVM artifacts from NN models. For details on how to compile NN models in TVM, using TI's J7 platform as a target, please refer to "Compilation with TVM compiler" in the "TVM/Neo-AI-DLR + TIDL Heterogeneous Execution" section.
| Platform | Linux x86_64 | Linux+RTOS mode | QNX+RTOS mode | SoC |
|---|---|---|---|---|
| Support | NO | YES | NO | J721e |
This demo showcases parallel execution of heterogeneous TVM/Neo-AI-DLR models with a display, using pthreads and OpenVX graphs.
The result of compilation is called a "deployable module". It consists of three files, shown in the directory listing below.
Taking the output of "test_tidl_j7.py" for TensorFlow MobilenetV1 as an example, the deployable module for the J7 target is located in "artifacts_MobileNetV1_target/". You can copy this deployable module to the target EVM for execution; please see the "Inference" sections below for details.
```
artifacts_MobileNetV1_target
|-- deploy_graph.json
|-- deploy_lib.so
|-- deploy_param.params
```
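For orientation, here is a minimal sketch of loading this deployable module with the Neo-AI-DLR Python API on the target; the input tensor name and shape below are placeholders, not values taken from the model:

```python
import numpy as np
from dlr import DLRModel

# Load the deployable module copied to the EVM (directory from the listing above)
model = DLRModel('artifacts_MobileNetV1_target')

# Placeholder input: a 1x224x224x3 image tensor is assumed for MobilenetV1
data = np.random.rand(1, 224, 224, 3).astype('float32')

# 'input' is a placeholder tensor name; substitute the model's real input name
outputs = model.run({'input': data})
print(outputs[0].shape)
```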
All other compilation artifacts are stored in the "tempDir" directory under the specified "artifacts_folder". Interested users can look into this directory for TIDL importing details. This directory is for information only, and is not needed for inference/deployment.
One useful file is "relay.gv.svg". It gives a graphical view of the whole network and where the TIDL subgraphs are. You can view it using a browser or other viewer, for example:
```
firefox artifacts_MobileNetV1_target/tempDir/relay.gv.svg
```
You can set the environment variable TIDL_RELAY_IMPORT_DEBUG to 0, 1, 2, 3, or 4 for detailed internal debug information and progress during TVM compilation. For example, the compiler will dump the graph represented in TVM Relay IR, the Relay IR passed to TIDL importing, and so on.
When TIDL_RELAY_IMPORT_DEBUG is set to 4, TIDL import will generate the output for each TIDL layer in the imported TIDL subgraph, using calibration inputs. The compilation will also generate corresponding output from running the original model in floating point mode, by compiling and running on the host using TVM. We name the tensors from TIDL quantized calibration execution "tidl_tensor"; we name the corresponding tensors from TVM floating point execution "tvm_tensor". A simple script, "compare_tensors.py", is provided to compare these two tensors.
```
TIDL_RELAY_IMPORT_DEBUG=4 python3 ./test_tidl_j7.py --target
# python3 ./compare_tensors.py <artifacts_folder> <subgraph_id> <layer_id>
python3 ./compare_tensors.py artifacts_MobileNetV1_target 0 1
```
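For reference, here is a hypothetical illustration of the kind of check a script like "compare_tensors.py" performs; the dump file names below are assumptions, not the tool's actual naming convention:

```python
import numpy as np

# Hypothetical dump file names for one layer; the actual naming convention
# used by the TIDL import debug output may differ.
tidl = np.fromfile('artifacts_MobileNetV1_target/tempDir/tidl_tensor.bin',
                   dtype=np.float32)
ref = np.fromfile('artifacts_MobileNetV1_target/tempDir/tvm_tensor.bin',
                  dtype=np.float32)

print('max abs diff :', np.abs(tidl - ref).max())
print('mean abs diff:', np.abs(tidl - ref).mean())
```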