1.3. Compiling Models
TI TVM supports two options for compiling models with TIDL offload. The distinction is based on where layers that are not supported by TIDL are executed during inference:
- Executing unsupported layers on Arm (see the flow in Figure 1).
- Executing unsupported layers on C7x (see the flow in Figure 2). This option frees the Arm cores to run other parts of the application. It can also improve overall inference performance by minimizing communication between the Arm and the C7x.
The following figures show components added by TIDL and the TI TVM fork outlined in red.
Figure 1: Mapped to Arm
Figure 2: Mapped to C7x
See Compilation Explained for further details about TI TVM compilation.
1.3.1. Compiling for TIDL Offload
TVM compilation is typically performed using a Python compilation script. Example scripts are provided in the TI edgeai-tidl-tools and TI TVM fork Git repositories. You can use these examples as templates to modify for your own use cases.
An overview of the TI Open Source Runtime and its compilation options is provided in the TI edgeai-tidl-tools overview.
The following Python functions are used in the examples provided by the TI TVM fork.
1.3.1.1. compile_model
The compile_model function encapsulates the steps required to compile a model with TIDL offload.
- relay.ti_tests.compile_model.compile_model(model_name: str, platform: str, compile_for_device: bool, enable_tidl_offload: bool, enable_c7x_codegen: bool, batch_size: int = 0)
Compile a model based on the parameters specified.
- Parameters
model_name – name of the model as specified in models.py (E.g. mv2_onnx).
platform – in [“J7”, “J721S2”, “AM62A”]
compile_for_device – True => Compile module for inference on device (aarch64). False => Compile module for inference on host (x86).
enable_tidl_offload – Set to True to enable TIDL offload.
enable_c7x_codegen – True => Enable C7x code generation for layers not offloaded to TIDL, i.e. the entire network runs on the C7x. False => Enable Arm code generation for layers not offloaded to TIDL, i.e. unsupported layers run on Arm (aarch64).
batch_size – 0 => Use the batch size that comes with the model. Any other value => Override the model's default batch size.
- Return type
True for success, False for failure.
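As an illustration of how the two flows map onto these parameters, the following minimal sketch builds the keyword arguments for compile_model. The helper names compile_model_kwargs and run_compile are hypothetical and are not part of the TI TVM fork. The import path is taken from the signature above and is assumed to resolve only in an environment where the fork's test scripts are on the Python path.

```python
from typing import Dict, Union


def compile_model_kwargs(model_name: str, platform: str,
                         unsupported_on: str = "arm",
                         on_device: bool = True,
                         batch_size: int = 0) -> Dict[str, Union[str, bool, int]]:
    """Hypothetical helper: translate "where do unsupported layers run?"
    into the keyword arguments documented for compile_model."""
    if unsupported_on not in ("arm", "c7x"):
        raise ValueError("unsupported_on must be 'arm' or 'c7x'")
    return {
        "model_name": model_name,
        "platform": platform,
        "compile_for_device": on_device,
        "enable_tidl_offload": True,
        # True => layers not offloaded to TIDL are compiled for the C7x
        # False => layers not offloaded to TIDL are compiled for Arm
        "enable_c7x_codegen": unsupported_on == "c7x",
        "batch_size": batch_size,
    }


def run_compile(kwargs: Dict[str, Union[str, bool, int]]) -> bool:
    """Hypothetical wrapper: call compile_model if it can be imported."""
    # Assumption: this import path matches the documented signature and is
    # only importable inside a TI TVM fork checkout.
    from relay.ti_tests.compile_model import compile_model
    return compile_model(**kwargs)


# Flow in Figure 1: unsupported layers run on Arm.
arm_flow = compile_model_kwargs("mv2_onnx", "J7", unsupported_on="arm")
# Flow in Figure 2: unsupported layers run on the C7x.
c7x_flow = compile_model_kwargs("mv2_onnx", "J7", unsupported_on="c7x")
```

In a working TI TVM environment, run_compile(c7x_flow) would then be expected to return True on success and False on failure, matching the return type described above.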
Listing 1.1 shows the key functions called from compile_model.
os.environ["TIDL_RELAY_MAX_BATCH_SIZE"] = str(batch_size)

# Obtain model and convert to Relay
mod, params = get_relay_model(model_name, batch_size)

# Get inputs to use for calibration (required for TIDL offload)
calibration_input_list = get_calib_inputs(model_name, batch_size)

# Generate a name for the artifacts folder based on the model and other parameters
artifacts_folder = get_artifacts_folder(model_name, platform, compile_for_device,
                                        enable_tidl_offload, enable_c7x_codegen,
                                        batch_size)

# Compile the model using TVM and place the output in the artifacts_folder
result = compile.compile_relay(mod, params, calibration_input_list, platform, compile_for_device,
                               enable_tidl_offload, enable_c7x_codegen, artifacts_folder)
After a successful compile, the artifacts required to deploy the model are stored in the artifacts_folder.
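A deployment script may want to confirm that compilation succeeded before packaging the artifacts. The sketch below is one way to do that, assuming only what is stated here: the compile call returns True or False, and the output lands in artifacts_folder. The helper name collect_artifacts is hypothetical, and the file name used in the demonstration is a placeholder, not an actual TIDL artifact name.

```python
import os
import tempfile
from typing import List


def collect_artifacts(result: bool, artifacts_folder: str) -> List[str]:
    """Hypothetical helper: return the sorted relative paths of all files
    under artifacts_folder, or raise if the compile step reported failure."""
    if not result:
        raise RuntimeError(
            "compilation failed; contents of %r may be incomplete" % artifacts_folder
        )
    found = []
    for root, _dirs, files in os.walk(artifacts_folder):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, artifacts_folder))
    return sorted(found)


# Demonstration with a temporary folder standing in for artifacts_folder.
with tempfile.TemporaryDirectory() as demo_folder:
    # Placeholder file; real artifact names depend on the TI TVM release.
    open(os.path.join(demo_folder, "placeholder.bin"), "wb").close()
    demo_listing = collect_artifacts(True, demo_folder)
```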
1.3.1.2. compile_relay
The compile_relay function uses the TIDLCompiler class to partition the Relay graph and offload the supported subgraphs to TIDL.
- tvm.contrib.tidl.compile.compile_relay(mod: IRModule, params: Dict[str, NDArray], calibration_input_list: List[Dict[str, NDArray]], platform: str, compile_for_device: bool, enable_tidl_offload: bool, enable_c7x_codegen: bool, artifacts_folder: str, tidl_tensor_bits: int = 8) -> bool
Compile a Relay IR module based on the parameters specified.
- Parameters
mod – Input Relay IR module.
params – The parameter dict used by Relay.
platform – in [“J7”, “J721S2”, “AM62A”]
calibration_input_list – A list of dictionaries, each mapping an input name to an input tensor.
compile_for_device – True => Compile module for inference on device (aarch64). False => Compile module for inference on host (x86).
enable_tidl_offload – Set to True to enable TIDL offload.
enable_c7x_codegen – True => Enable c7x code generation for layers not offloaded to TIDL. i.e. entire network runs on the C7x. False => Enable Arm code generation for layers not offloaded to TIDL. Unsupported layers are run on Arm (aarch64).
tidl_tensor_bits – Number of bits used to represent TIDL tensors and weights.
- Return type
True for success, False for failure.
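Because compile_relay reports failure only through its boolean return value, it can help to check the arguments before starting a long compile. The following sketch validates the platform string and the structure of calibration_input_list as described in the parameter list above. The helper name check_compile_relay_inputs is hypothetical, and nested Python lists stand in for NDArray tensors so that the example runs without TVM installed.

```python
from typing import Any, Dict, List, Sequence

# Platform strings listed for the platform parameter above.
SUPPORTED_PLATFORMS = ("J7", "J721S2", "AM62A")


def check_compile_relay_inputs(platform: str,
                               calibration_input_list: Sequence[Dict[str, Any]],
                               enable_tidl_offload: bool = True) -> List[str]:
    """Hypothetical pre-flight check: return a list of problems found
    (an empty list means the arguments look structurally valid)."""
    problems = []
    if platform not in SUPPORTED_PLATFORMS:
        problems.append("unknown platform: %r" % platform)
    # Calibration inputs are required when TIDL offload is enabled.
    if enable_tidl_offload and not calibration_input_list:
        problems.append("calibration_input_list is empty")
    expected_names = None
    for index, entry in enumerate(calibration_input_list):
        if not isinstance(entry, dict) or not all(isinstance(k, str) for k in entry):
            problems.append("entry %d is not a dict keyed by input name" % index)
            continue
        if expected_names is None:
            expected_names = set(entry)
        elif set(entry) != expected_names:
            problems.append("entry %d has different input names" % index)
    return problems


# Stand-in calibration data: two samples for a model with one input named "input".
good_inputs = [{"input": [[0.0, 1.0]]}, {"input": [[0.5, 0.25]]}]
good_report = check_compile_relay_inputs("AM62A", good_inputs)
bad_report = check_compile_relay_inputs("J8", [{"input": [[0.0]]}, {"data": [[0.0]]}])
```

An empty report indicates only that the arguments have the expected shape. It says nothing about whether the tensors match the model's actual input names, shapes, or data types.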