1. Compilation Explained¶
This section explains how to use the TI Neural Network Compiler for MCUs to compile neural networks.
1.1. Environment Setup¶
For a particular TI family of MCU devices, begin by downloading the corresponding SDK and setting environment variables to help with compilation.
1.1.1. Setup for TI C28 Device Family¶
Follow these steps to set up your environment for using the TI Neural Network Compiler for MCUs for the TI C28 device family.
Download and install C2000WARE (version 5.02 or newer) from https://www.ti.com/tool/C2000WARE.
Define the following environment variables, specifying your installation path, device, and FPU type:
export C2000WARE_PATH=/path/to/C2000WARE
export C2000_DEVICE=<your_device>        # e.g. f28p55x or f2837xd or f28p65x
export C2000_DEVICE_FPU=<fpu32, fpu64>   # e.g. fpu32 for f28p55x and f2837xd, fpu64 for f28p65x
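For example, a setup for an F28P55x device might look like the following (the installation path shown here is only a placeholder; use your actual C2000WARE location):

export C2000WARE_PATH=$HOME/ti/c2000/C2000Ware_5_02_00_00   # placeholder path
export C2000_DEVICE=f28p55x
export C2000_DEVICE_FPU=fpu32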
Download and install the C2000 Code Generation Tools from https://www.ti.com/tool/C2000-CGT.
Define the following environment variables, specifying your CGT installation path:
export C2000_CGT_PATH=/path/to/C2000-CGT
export PATH=$C2000_CGT_PATH/bin:$PATH
export TVMC_OPTIONS="--output-format=a --runtime=crt --executor=aot --executor-aot-interface-api=c --executor-aot-unpacked-api=1 --pass-config tir.disable_vectorize=1 --pass-config tir.usmp.algorithm=hill_climb"
export CL2000_OPTIONS="--float_support=${C2000_DEVICE_FPU} --abi=eabi -O3 --opt_for_speed=5 --c99 -v28 -ml -mt --gen_func_subsections -I${C2000_CGT_PATH}/include -I${C2000WARE_PATH}/driverlib/${C2000_DEVICE}/driverlib -I${C2000WARE_PATH}/device_support/${C2000_DEVICE}/common/include -I."
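As an optional sanity check, confirm that the code generation tools are on your PATH and that the variables are set, for example:

which cl2000
echo "$C2000WARE_PATH $C2000_CGT_PATH $C2000_DEVICE $C2000_DEVICE_FPU"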
1.2. Compilation Command¶
The following compilation command example compiles a neural network (model) into a library that can run on a C28 host core with TVM auto-generated code.
tvmc compile $TVMC_OPTIONS --target="c, ti-npu type=soft" --target-c-mcpu=c28 ./int_model.onnx -o artifacts_c28_soft_auto/mod.a --cross-compiler="cl2000" --cross-compiler-options="$CL2000_OPTIONS -Iartifacts_c28_soft_auto --obj_directory=."
In this example:
- int_model.onnx is the input model.
- artifacts_c28_soft_auto/mod.a specifies artifacts_c28_soft_auto as the compilation artifacts directory and mod.a as the generated library. You can choose different names for the directory and the library.
1.2.1. Compiler Options¶
You can replace different portions of the example compilation command above using the following options. Note that options following ti-npu can be stacked together. For example, you could change --target="c, ti-npu type=soft" to --target="c, ti-npu type=hard skip_normalize=true output_int=true".
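As an illustration, the example command from the previous section could be adapted as follows to target the hardware NPU with both performance options stacked (the artifacts directory name artifacts_c28_hard is arbitrary; see Performance Options below for the implications of these options):

tvmc compile $TVMC_OPTIONS --target="c, ti-npu type=hard skip_normalize=true output_int=true" --target-c-mcpu=c28 ./int_model.onnx -o artifacts_c28_hard/mod.a --cross-compiler="cl2000" --cross-compiler-options="$CL2000_OPTIONS -Iartifacts_c28_hard --obj_directory=."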
Host Processor Options | Description
---|---
--target-c-mcpu=c28 | Host processor is C28
Accelerator Options | Description
---|---
 | Run layers with an optimized software library on a host processor.
ti-npu type=soft | Run layers with an optimized software implementation on a host processor.
ti-npu type=hard | Run layers on a hardware Neural Processing Unit (NPU).
Additional Options | Description
---|---
skip_normalize=true | Skip the float-to-integer input normalization sequence; see Performance Options for details
output_int=true | Skip the integer-to-float output casting; see Performance Options for details
1.3. Compilation Artifacts¶
Compilation artifacts are stored in the specified artifacts directory, for example, artifacts_c28_soft_auto in the compilation command example above.
This artifacts directory will contain:
- A header file (for example, tvmgen_default.h)
- Generated C code files (for example, lib0.c, lib1.c, and lib2.c)
- A library file (for example, mod.a)
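For example, after running the compilation command above, the artifacts directory can be inspected from the shell (exact file names may vary by model):

ls artifacts_c28_soft_auto
# expected contents: tvmgen_default.h, lib0.c, lib1.c, lib2.c, mod.a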
During compilation, the header file and generated C code files are compiled along with runtime C code files in the tinie-api directory to generate the mod.a library file. This makes the output from the compiler easier to integrate into a CCS project, as described in the following section.
1.4. Integrating Compilation Artifacts into a CCS Project¶
1.4.1. Add to CCS Project¶
Follow these steps to integrate the output from the TI Neural Network Compiler for MCUs into a CCS project:
1. Copy the library and header file from the compilation artifacts directory into the CCS project for the user application (a command-line example follows these steps).
2. In the CCS project’s linker command file, place the .rodata.tvm section in FLASH, and place the .bss.noinit.tvm section in SRAM.
3. Ensure that the compiler options used to compile the model are compatible with those in the CCS project properties and the device that will run the application. For example, set the float_support option (C2000_DEVICE_FPU) used to compile the model to fpu32 for F28P55x devices and to fpu64 for F28P65x devices.
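For example, the copy in step 1 could be performed as follows (the CCS project path is hypothetical):

cp artifacts_c28_soft_auto/mod.a artifacts_c28_soft_auto/tvmgen_default.h /path/to/ccs_project/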
1.4.2. Hardware NPU Specific¶
If the hardware NPU accelerator was specified as the target (--target="c, ti-npu type=hard") when compiling the neural network, then the CCS project needs the following settings:
- Place .bss.noinit.tvm in global shared SRAM so that the hardware NPU can access it, for example RAMGS0, RAMGS1, RAMGS2, or RAMGS3.
- Make sure the RAM model is selected for the CCS Project properties > Linker > Advanced Options > Runtime Environment > Initialization model option.
1.5. Performance Options¶
1.5.1. Skip Input Feature Normalization (skip_normalize=true)¶
NPU-trained models have an input feature normalization sequence that converts float input data to quantized int8_t/uint8_t data. By default, TVM generates code for this float to integer conversion.
If the NPU option skip_normalize=true is specified, TVM prunes the model to skip the input feature normalization sequence and instead computes directly on integer data. In that case, the user application should provide integer data, rather than float data, as input to the model.
That is, the user application should perform input feature normalization outside of the TVM-generated code.
The user application may be able to improve performance when processing time-series data in a sliding-window fashion, because the application can choose to normalize only new data in the window, while reusing already normalized results for old data in the window.
TVM logs the bias, scale, and shift parameters used for input feature normalization in a generated header file (for example, tvmgen_default.h).
If no input feature normalization sequence is found in the model, the generated model library still uses the data type of the original model input.
Please refer to the generated header file in the artifacts directory for details.
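For example, assuming the artifacts directory and header file name from the earlier command, the logged normalization parameters can be located with a simple text search (the exact identifier names in the header may differ):

grep -n -i -E "bias|scale|shift" artifacts_c28_soft_auto/tvmgen_default.h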
1.5.2. Skip Output Cast (output_int=true)¶
NPU-trained models produce float outputs by default: if the original ONNX model output is float, TVM generates code to cast the model output from integer to float.
When the option output_int=true is specified, TVM prunes the model to skip this final integer-to-float cast and outputs integer values directly; the user application should then interpret the inference results as integers.
1.6. Mapping Between tvmc Command-Line Options and Tiny ML ModelMaker Arguments¶
If you are using TI’s EdgeAI Studio IDE, you will not use the TI Neural Network Compiler for MCUs directly. EdgeAI Studio interfaces with a command-line tool called Tiny ML ModelMaker, which in turn interfaces with the TI Neural Network Compiler.
Tiny ML ModelMaker passes parsed arguments directly instead of building a tvmc command line. For example, the --target-c-mcpu=c28 tvmc command-line option appears as the entry {"target-c-mcpu" : "c28", ...} in the Python args dictionary used by the Tiny ML ModelMaker tool.