1.3. Compiling Models

TI TVM supports two options for compiling models with TIDL offload. The distinction is based on where layers that are not supported by TIDL are executed during inference:

  1. Executing unsupported layers on Arm (see the flow in Figure 1).

  2. Executing unsupported layers on C7x (see the flow in Figure 2). This option frees up the Arm device to run other aspects of the application. It can also improve overall inference performance by minimizing communication between the Arm and the C7x.

In the following figures, the components added by TIDL and the TI TVM fork are outlined in red.

Table 1.1 Model compilation with unsupported layers

Figure 1: Mapped to Arm (image: ../_images/TVM_Compile_Arm.png)

Figure 2: Mapped to C7x (image: ../_images/TVM_Compile_C7x.png)

See Compilation Explained for further details about TI TVM compilation.

1.3.1. Compiling for TIDL Offload

TVM compilation is typically performed using a Python compilation script. Example scripts are provided in the TI edgeai-tidl-tools and TI TVM fork Git repositories. You can use these examples as templates to modify for your own use cases.

An overview of the TI Open Source Runtime and compilation options is provided in the TI edgeai-tidl-tools overview.

The following Python functions are used in the examples provided by the TI TVM fork.

1.3.1.1. compile_model

The compile_model function encapsulates the steps required to compile a model with TIDL offload.

relay.ti_tests.compile_model.compile_model(model_name: str, platform: str, compile_for_device: bool, enable_tidl_offload: bool, enable_c7x_codegen: bool, batch_size: int = 0)

Compile a model based on the parameters specified.

Parameters
  • model_name – name of the model as specified in models.py (e.g., mv2_onnx).

  • platform – one of ["J7", "J721S2", "AM62A"].

  • compile_for_device – True => Compile module for inference on device (aarch64). False => Compile module for inference on host (x86).

  • enable_tidl_offload – Set to True to enable TIDL offload.

  • enable_c7x_codegen – True => Enable C7x code generation for layers not offloaded to TIDL, i.e., the entire network runs on the C7x. False => Enable Arm code generation for layers not offloaded to TIDL, i.e., unsupported layers run on Arm (aarch64).

  • batch_size – 0 => use the batch size that comes with the model; any other value overrides the model's default batch size.

Returns

True for success, False for failure.
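
For illustration, a compilation script could invoke compile_model as in the minimal sketch below. The import path is assumed from the qualified name shown above, and mv2_onnx is the example model name from the parameter description; adjust both for your installation.

# Sketch: compile MobileNetV2 (ONNX) for a J7 device with TIDL offload,
# running any layers TIDL does not support on the C7x.
# The import path is assumed from the qualified name
# relay.ti_tests.compile_model.compile_model shown above.
from tvm.relay.ti_tests.compile_model import compile_model

ok = compile_model(model_name="mv2_onnx",    # model key from models.py
                   platform="J7",
                   compile_for_device=True,  # target the aarch64 device
                   enable_tidl_offload=True,
                   enable_c7x_codegen=True,  # unsupported layers run on C7x
                   batch_size=0)             # keep the model's own batch size
if not ok:
    raise RuntimeError("model compilation failed")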

Listing 1.1 shows the key functions called from compile_model.

Listing 1.1 Compiling a model with TIDL offload
# Pass the requested batch size to the TIDL flow via an environment variable
os.environ["TIDL_RELAY_MAX_BATCH_SIZE"] = str(batch_size)

# Obtain model and convert to Relay
mod, params = get_relay_model(model_name, batch_size)

# Get inputs to use for calibration (required for TIDL offload)
calibration_input_list = get_calib_inputs(model_name, batch_size)

# Generate a name for the artifacts folder based on the model and other parameters
artifacts_folder = get_artifacts_folder(model_name, platform, compile_for_device,
                                        enable_tidl_offload, enable_c7x_codegen,
                                        batch_size)

# Compile the model using TVM and place the output in the artifacts_folder
result = compile.compile_relay(mod, params, calibration_input_list, platform,
                               compile_for_device, enable_tidl_offload,
                               enable_c7x_codegen, artifacts_folder)
After a successful compile, the artifacts required to deploy the model are stored in the artifacts_folder.

1.3.1.2. compile_relay

The compile_relay function uses the TIDLCompiler class to partition the Relay graph and offload supported subgraphs to TIDL.

tvm.contrib.tidl.compile.compile_relay(mod: IRModule, params: Dict[str, NDArray], calibration_input_list: List[Dict[str, NDArray]], platform: str, compile_for_device: bool, enable_tidl_offload: bool, enable_c7x_codegen: bool, artifacts_folder: str, tidl_tensor_bits: int = 8) -> bool

Compile a Relay IR module based on the parameters specified.

Parameters
  • mod – Input Relay IR module.

  • params – The parameter dict used by Relay.

  • calibration_input_list – A list of dictionaries, where each dictionary maps input names to input tensors.

  • platform – one of ["J7", "J721S2", "AM62A"].

  • compile_for_device – True => Compile module for inference on device (aarch64). False => Compile module for inference on host (x86).

  • enable_tidl_offload – Set to True to enable TIDL offload.

  • enable_c7x_codegen – True => Enable C7x code generation for layers not offloaded to TIDL, i.e., the entire network runs on the C7x. False => Enable Arm code generation for layers not offloaded to TIDL, i.e., unsupported layers run on Arm (aarch64).

  • artifacts_folder – Directory in which the compilation artifacts are stored.

  • tidl_tensor_bits – Number of bits used to represent TIDL tensors and weights.

Returns

True for success, False for failure.
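
As an illustration, a direct call to compile_relay might look like the sketch below, assuming the TI TVM fork is installed. The tiny ReLU module stands in for a real model import (e.g., from relay.frontend), and the input name and shape are hypothetical placeholders that must match the actual model.

import numpy as np
import tvm
from tvm import relay
from tvm.contrib.tidl import compile as tidl_compile  # module from the signature above

# Build a tiny Relay module as a stand-in for a real model import
# (e.g., relay.frontend.from_onnx would produce mod and params).
x = relay.var("input", shape=(1, 3, 224, 224), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.relu(x))
params = {}

# One calibration sample; the key and shape are placeholders and must
# match the model's actual input name and shape.
calibration_input_list = [
    {"input": tvm.nd.array(np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))}
]

success = tidl_compile.compile_relay(mod, params, calibration_input_list,
                                     platform="J7",
                                     compile_for_device=True,
                                     enable_tidl_offload=True,
                                     enable_c7x_codegen=False,  # unsupported layers run on Arm
                                     artifacts_folder="./artifacts",
                                     tidl_tensor_bits=8)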