TI Deep Learning Library User Guide
While you can use TIDL and its API directly, the Processor SDK also implements TIDL offload support using the TVM runtime and the Neo-AI-DLR runtime. This heterogeneous execution enables TIDL-supported layers to be offloaded to and accelerated by TIDL, while the remaining layers run on the Arm cores.
Neo-AI-DLR is an open source common runtime for machine learning models compiled by AWS SageMaker Neo, TVM, or Treelite. For the Processor SDK, we focus on models compiled by TVM. For these models, the Neo-AI-DLR runtime can be considered as a wrapper around the TVM runtime.
The following sections describe the details for compiling and deploying machine learning models for TVM/Neo-AI-DLR + TIDL heterogeneous execution.
The Processor SDK does not package TVM by default. You will need to build TVM on an x86_64 Linux machine running Ubuntu 18.04. We assume that the TIDL package has already been built or is available from the Processor SDK installation.
# Starting point: an x86_64 Linux environment running Ubuntu 18.04
# Install pre-requisites
cd ${HOME}
sudo apt install cmake python3-pip libtinfo-dev zlib1g-dev libxml2-dev graphviz
sudo apt install lib32ncurses5 lib32z1
pip3 install matplotlib decorator pytest antlr4-python3-runtime typed_ast
pip3 install onnx gluoncv mxnet tflite torch torchvision
pip3 uninstall tensorflow
pip3 install tensorflow==1.14

# Set TIDL_PATH
export TIDL_PATH=/path/to/your/TIDL/package

# Set ARM_GCC_PATH/ARM64_GCC_PATH to your installation
# Download 64-bit gcc-arm from:
# https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-a/downloads/9-2-2019-12
wget https://developer.arm.com/-/media/Files/downloads/gnu-a/9.2-2019.12/binrel/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar.xz
tar xf gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar.xz
export ARM_GCC_PATH=${HOME}/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin
export ARM64_GCC_PATH=${ARM_GCC_PATH}

# Download llvm 10.0
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-10.0.0/clang+llvm-10.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz
tar xf clang+llvm-10.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz

# Make a j7 directory
mkdir -p ${HOME}/tvm-j7

# Clone TVM and build with TIDL backends
cd ${HOME}/tvm-j7
git clone --single-branch -b tidl-j7 https://github.com/TexasInstruments/tvm.git
cd tvm
# Check out a tag in the format REL.TIDL.J7.XX.YY.ZZ.WW, where XX.YY.ZZ.WW is
# the corresponding TIDL package version in the Processor SDK,
# e.g. git checkout REL.TIDL.J7.01.03.00.11 for Processor SDK 7.1
git checkout REL.TIDL.J7.<TIDL_package_version>
git submodule update --init --recursive
mkdir build; cd build
cmake -DUSE_SORT=ON \
      -DUSE_LLVM=${HOME}/clang+llvm-10.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/llvm-config \
      -DUSE_TIDL=ON -DUSE_TIDL_RT_PATH=${TIDL_PATH}/ti_dl/rt ..
make -j12

export TVM_HOME=${HOME}/tvm-j7/tvm
export PYTHONPATH=${TVM_HOME}/python:${TVM_HOME}/topi/python:${TVM_HOME}/nnvm/python:${PYTHONPATH}
We provide an example compilation test script called "test_tidl_j7.py" to illustrate TVM compilation with TIDL offload.
# Set TIDL tools directory, copy or link files from your TIDL package
cd ${HOME}/tvm-j7
mkdir -p tidl_tools; cd tidl_tools
ln -s $TIDL_PATH/ti_dl/test/testvecs/config/import/device_config.cfg .
ln -s $TIDL_PATH/ti_dl/test/PC_dsp_test_dl_algo.out .
ln -s $TIDL_PATH/ti_dl/utils/perfsim/ti_cnnperfsim.out .
ln -s $TIDL_PATH/ti_dl/utils/tidlModelImport/out/tidl_model_import_relay.so .
export TIDL_TOOLS_PATH=${HOME}/tvm-j7/tidl_tools

# Run compilation tests. The --target option specifies compilation for ARM,
# for running inference on the EVM. With --host, TVM compiles for X86,
# for running inference in host emulation mode.
# Run with the "-h" option for help messages. "--target" is the default.
cd ${HOME}/tvm-j7/tvm/tests/python/relay/ti_tests
python3 ./test_tidl_j7.py -h
python3 ./test_tidl_j7.py --target
python3 ./test_tidl_j7.py --host
There are only 4 lines that are specific to TIDL offload in "test_tidl_j7.py". The rest of the script is no different from a regular TVM compilation script without TIDL offload.
tidl_compiler = tidl.TIDLCompiler(tidl_platform, tidl_version,
                                  num_tidl_subgraphs=num_tidl_subgraphs,
                                  artifacts_folder=tidl_artifacts_folder,
                                  tidl_tools_path=get_tidl_tools_path(),
                                  tidl_tensor_bits=8,
                                  tidl_calibration_options={'iterations': 10},
                                  tidl_denylist=args.denylist)
We first instantiate a TIDLCompiler object. The parameters are explained in the following table.
Name/Position | Value
---|---
tidl_platform | "J7" |
tidl_version | (7,1) |
num_tidl_subgraphs | offload up to <num> tidl subgraphs |
artifacts_folder | where to store deployable module |
tidl_tools_path | set to environment variable TIDL_TOOLS_PATH |
tidl_tensor_bits | 8 or 16, bit width for imported TIDL tensors and weights
tidl_calibration_options | optional, a dictionary to overwrite default calibration options |
tidl_denylist | optional, deny a TVM relay op for TIDL offloading |
Advanced calibration can help improve 8-bit quantization; please see TIDL Quantization for details. Default calibration options are specified in the TVM source file python/tvm/relay/backend/contrib/tidl.py; grep for "default_calib_options".
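To illustrate how a user-supplied dictionary overrides the defaults, the merge can be sketched in plain Python. The default values and option names below (other than 'iterations', which appears in the example above) are hypothetical placeholders, not TIDL's actual defaults; consult "default_calib_options" in tidl.py for the real ones.

```python
# Sketch: user-supplied calibration options override the defaults.
# These default values are illustrative only, not TIDL's actual defaults.
default_calib_options = {
    "iterations": 50,          # hypothetical default iteration count
    "activation_range": "on",  # hypothetical placeholder key
}

def merge_calib_options(user_options):
    """Return the defaults, overridden by any user-supplied entries."""
    merged = dict(default_calib_options)
    merged.update(user_options or {})
    return merged

# A user passing {'iterations': 10} keeps all other defaults.
options = merge_calib_options({"iterations": 10})
print(options["iterations"])       # user override wins
print(options["activation_range"]) # default retained
```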
mod, status = tidl_compiler.enable(mod_orig, params, model_input_list)
In this step, the original machine learning model/network represented in TVM Relay IR, "mod_orig", goes through the following transformations: operators supported by TIDL are annotated, the graph is partitioned into TIDL subgraphs, and each subgraph is imported by the TIDL tools. The transformed module is returned in "mod", and "status" indicates whether TIDL importing succeeded.
with tidl.build_config(tidl_compiler=tidl_compiler):
    graph, lib, params = relay.build_module.build(mod, target=target, params=params)
In this step, TVM code generation takes place. Inside the TVM codegen, there is a TIDL codegen backend. "tidl.build_config" creates a context and tells the TIDL codegen backend where the artifacts from TIDL importing are. The backend then embeds the artifacts into the "lib".
tidl.remove_tidl_params(params)
This optional step removes the weights in TIDL subgraphs that have already been imported into the artifacts. Removing them results in a smaller deployable module.
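Conceptually, this pruning amounts to dropping parameter entries that belong to TIDL subgraphs from the params dictionary. The sketch below illustrates the idea with plain Python; the "tidl_" key prefix is a hypothetical naming convention for demonstration, not necessarily what the real implementation keys on.

```python
# Sketch: drop weights already embedded in the TIDL artifacts.
# The "tidl_" prefix is a hypothetical marker for TIDL-subgraph parameters.
def remove_tidl_params_sketch(params):
    """Return a params dict without entries owned by TIDL subgraphs."""
    return {name: value for name, value in params.items()
            if not name.startswith("tidl_")}

params = {
    "tidl_0_weight": [1.0] * 1000,  # imported into TIDL artifacts already
    "arm_conv_weight": [0.5] * 10,  # still needed by the Arm-side graph
}
slim = remove_tidl_params_sketch(params)
print(sorted(slim))  # only the non-TIDL parameter remains
```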
The result of compilation is called a "deployable module". It consists of three files:
Taking the output of "test_tidl_j7.py" for TensorFlow MobilenetV1 for example, the deployable module for J7 target is located in "artifacts_MobileNetV1_target/". You can copy this deployable module to the target EVM for execution. Please see the "Inference" sections below for details.
artifacts_MobileNetV1_target
|-- deploy_graph.json
|-- deploy_lib.so
|-- deploy_param.params
All other compilation artifacts are stored in the "tempDir" directory under the specified "artifacts_folder". Interested users can look into this directory for TIDL importing details. This directory is for information only, and is not needed for inference/deployment.
One useful file is "relay.gv.svg". It gives a graphical view of the whole network and where the TIDL subgraphs are. You can view it using a browser or other viewer, for example:
firefox artifacts_MobileNetV1_target/tempDir/relay.gv.svg
You can set the environment variable TIDL_RELAY_IMPORT_DEBUG to 0, 1, 2, 3, or 4 for detailed internal debug information and progress during TVM compilation. For example, the compiler will dump the graph represented in TVM Relay IR, the Relay IR passed to TIDL importing, etc.
When TIDL_RELAY_IMPORT_DEBUG is set to 4, TIDL import will generate the output for each TIDL layer in the imported TIDL subgraph, using calibration inputs. The compilation will also generate corresponding output from running the original model in floating point mode, by compiling and running on the host using TVM. We name the tensors from TIDL quantized calibration execution "tidl_tensor"; we name the corresponding tensors from TVM floating point execution "tvm_tensor". A simple script, "compare_tensors.py", is provided to compare these two tensors.
TIDL_RELAY_IMPORT_DEBUG=4 python3 ./test_tidl_j7.py --target

# python3 ./compare_tensors.py <artifacts_folder> <subgraph_id> <layer_id>
python3 ./compare_tensors.py artifacts_MobileNetV1_target 0 1
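The comparison performed by such a script can be sketched as an element-wise error summary between the quantized TIDL output and the floating-point TVM output. The function below is an illustration of the idea, not the actual "compare_tensors.py" implementation; it assumes both tensors are flattened to the same length.

```python
def compare_tensors_sketch(tidl_tensor, tvm_tensor):
    """Return (max-abs, mean-abs) error between two flattened tensors."""
    assert len(tidl_tensor) == len(tvm_tensor), "tensors must match in size"
    diffs = [abs(a - b) for a, b in zip(tidl_tensor, tvm_tensor)]
    return max(diffs), sum(diffs) / len(diffs)

# Compare a hypothetical quantized layer output against its float reference.
max_err, mean_err = compare_tensors_sketch([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
print(max_err, mean_err)
```

A large max-abs error concentrated in one layer is a typical sign of a quantization problem in that layer, which is exactly what comparing "tidl_tensor" against "tvm_tensor" layer by layer helps locate.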
The Neo-AI-DLR runtime is pre-built and packaged in the target filesystem.
We provide an example inference script, "test_tidl_j7_deploy.py" to illustrate running a TVM-compiled deployable module with the Neo-AI-DLR runtime. Note that this script does not have anything TIDL specific: with or without TIDL offload, this script remains the same. Run the script with "-h" option for help messages. "--target --dlr --cv" are the default options.
# on target J7 EVM
cd /PATH/TO/tvm/tests/python/relay/ti_tests
python3 ./test_tidl_j7_deploy.py -h
python3 ./test_tidl_j7_deploy.py --target --dlr --cv <input_image>
python3 ./test_tidl_j7_deploy.py <input_image>
Please refer to the example inference script for details.
module = DLRModel(artifacts_dir)
results = module.run({input_tensor : input_data})
tvm_output = results[0]
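For a classification model such as MobilenetV1, the returned output is typically a vector of class scores. A minimal post-processing sketch, assuming the output has been flattened to a plain list of scores (this is an illustration, not the example script's actual output handling):

```python
def top1_class(scores):
    """Return (index, score) of the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]

# Hypothetical 3-class score vector in place of a real model output.
idx, score = top1_class([0.1, 0.7, 0.2])
print(idx, score)  # class 1 with score 0.7
```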
If the model was compiled using the TIDLCompiler augmentation as shown above, TIDL-supported subgraphs will be offloaded and accelerated.
Should you want debug information during inference, you can set the environment variable TIDL_RT_DEBUG to 1, 2, 3, or 4 to see more internal details of the inference progress.
TIDL_RT_DEBUG=1 python3 ./test_tidl_j7_deploy.py <input_image>
If you choose to run the deployable module with the TVM runtime (rather than Neo-AI-DLR) on the target EVM, you will need to build TVM on the target. The steps are similar to building TVM on an x86_64 host, except that the target toolchain will be for 64-bit ARM.
Then, run the inference script with the --tvm option:
# on target J7 EVM
cd /PATH/TO/tvm/tests/python/relay/ti_tests
python3 ./test_tidl_j7_deploy.py --tvm
Please refer to the example inference script for details.
loaded_json = open(artifacts_dir + "deploy_graph.json").read()
loaded_lib = tvm.runtime.load_module(artifacts_dir + "deploy_lib.so")
loaded_params = bytearray(open(artifacts_dir + "deploy_param.params", "rb").read())

# create a runtime executor module
module = runtime.create(loaded_json, loaded_lib, tvm.cpu())
# load params into the module
module.load_params(loaded_params)
# feed input data
module.set_input(input_tensor, tvm.nd.array(input_data))
# run
module.run()
# get output
tvm_output = module.get_output(0).asnumpy()