3. Inference Explained
This section explains how to invoke the inference function from the compiled library.
There are two inference scenarios, depending on the ti-npu type= option specified during compilation:
Running the model on a host processor with an optimized software implementation.
Running the model on a dedicated hardware accelerator, for example, the Neural network Processing Unit (NPU).
3.1. Inference Function and Input/Output
The header file generated by the compiler (tvmgen_default.h), stored in the artifacts directory, describes the input/output data shapes and types and defines the input/output data structures and the inference function. For example:
/* The generated model library expects the following inputs/outputs:
 * Inputs:
 *   Tensor[(1, 3, 128, 1), float32]
 * Outputs:
 *   Tensor[(1, 6), float32]
 *   Tensor[(1, 60), float32]
 */

/*!
 * \brief Input tensor pointers for TVM module "default"
 */
struct tvmgen_default_inputs {
  void* onnx__Add_0;
};

/*!
 * \brief Output tensor pointers for TVM module "default"
 */
struct tvmgen_default_outputs {
  void* output0;
  void* output1;
};

/*!
 * \brief entrypoint function for TVM module "default"
 * \param inputs Input tensors for the module
 * \param outputs Output tensors for the module
 */
int32_t tvmgen_default_run(
  struct tvmgen_default_inputs* inputs,
  struct tvmgen_default_outputs* outputs
);
In this example, the neural network takes one input tensor and produces two output tensors. The expected input is a 1x3x128x1 tensor of 32-bit floating-point values (float32). The two outputs are 1x6 and 1x60 float32 tensors. Define the input and output buffers in your application code, then initialize the struct members with pointers to those buffers. For example:
struct tvmgen_default_inputs inputs = { (void*) &my_input[0] };
struct tvmgen_default_outputs outputs = { (void*) &my_output0[0], (void*) &my_output1[0] };
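The buffer setup described above can be sketched as follows. This is a minimal illustration, not the generated code: the two structs are re-declared locally as stand-ins for tvmgen_default.h so the sketch compiles on its own, and the buffer names, the element-count macros, and the bind_io() helper are hypothetical. The buffer sizes are derived from the tensor shapes in the example header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the declarations that tvmgen_default.h provides.
 * In an application, include the generated header instead. */
struct tvmgen_default_inputs  { void* onnx__Add_0; };
struct tvmgen_default_outputs { void* output0; void* output1; };

/* Element counts taken from the tensor shapes in the example header. */
#define MY_INPUT_ELEMS   (1 * 3 * 128 * 1)
#define MY_OUTPUT0_ELEMS (1 * 6)
#define MY_OUTPUT1_ELEMS (1 * 60)

/* Application-owned buffers (hypothetical names). */
static float my_input[MY_INPUT_ELEMS];
static float my_output0[MY_OUTPUT0_ELEMS];
static float my_output1[MY_OUTPUT1_ELEMS];

/* Hypothetical helper: point the struct members at the buffers above. */
void bind_io(struct tvmgen_default_inputs* in,
             struct tvmgen_default_outputs* out) {
  assert(in != NULL && out != NULL);
  in->onnx__Add_0 = my_input;
  out->output0 = my_output0;
  out->output1 = my_output1;
}
```

Deriving the sizes from the shape comment keeps the buffers in step with the header; if the model is recompiled with different shapes, only the three macros need to change.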
When the skip_normalize=true and output_int=true compiler options are specified, the generated model library may expect integer inputs and produce integer outputs. Refer to the tvmgen_default.h file for details about the input/output data types and the input feature normalization parameters. For example:
/* The generated model library expects the following inputs/outputs:
 * Inputs:
 *   Tensor[(1, 3, 128, 1), int8]
 * Outputs:
 *   Tensor[(1, 6), int8]
 *   Tensor[(1, 60), uint8]
 */

/* Input feature normalization parameters:
 *   input_int = clip(((int32_t)((input_float + bias) * scale)) >> shift, min, max)
 *   where (min, max) = (-128, 127) if int8 type, (0, 255) if uint8 type
 */
extern const int32_t tvmgen_default_bias_data[] __attribute__((weak)) = {12, 12, 18};
extern const int32_t tvmgen_default_scale_data[] __attribute__((weak)) = {171};
extern const int32_t tvmgen_default_shift_data[] __attribute__((weak)) = {5};
3.2. Converting Input Data from Float to Int with Input Normalization
When running the model with the skip_normalize=true compiler option, you must convert the model input values from floating point to integer before invoking the model inference function. The tvmgen_default_bias_data, tvmgen_default_scale_data, and tvmgen_default_shift_data definitions in tvmgen_default.h can be used to convert the model input from float to int, as in the example code below.
for (c=0; c<...) {
  int32_t scaled_val = (int32_t)floorf((float_val
                       + tvmgen_default_bias_data[c % TVMGEN_DEFAULT_BIAS_LEN])
                       * tvmgen_default_scale_data[c % TVMGEN_DEFAULT_SCALE_LEN]);
  int32_t shifted_val = scaled_val >> tvmgen_default_shift_data[c % TVMGEN_DEFAULT_SHIFT_LEN];
  // clip to 8-bit range
}
On C2000 devices, there are no 8-bit integer data types, so int8_t is aliased to int16_t and uint8_t is aliased to uint16_t. This type aliasing is consistent with the TI C2000Ware SDK. Declare your input/output tensor data as int16_t (or uint16_t for unsigned tensors) accordingly. The input feature normalization sequence must still clip the values to the range [-128, 127] or [0, 255], respectively.
3.3. Running Model on Host Processor (ti-npu type=soft)
When running the model on the host processor, the symbol TVMGEN_DEFAULT_TI_NPU_SOFT is defined in the generated header file (tvmgen_default.h):
/* Symbol defined when running model on the host processor */
#define TVMGEN_DEFAULT_TI_NPU_SOFT
#ifdef TVMGEN_DEFAULT_TI_NPU
#error Conflicting definition for where model should run.
#endif
After the input/output data structure has been set up, running inference is as simple as invoking the inference function.
#include "tvmgen_default.h"
tvmgen_default_run(&inputs, &outputs);
When the inference function returns, the inference results are stored in the buffers referenced by outputs.
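The host-processor flow above can be wrapped in a small helper that also checks the status returned by tvmgen_default_run(). This is a sketch under stated assumptions: the structs and the run function are stubbed locally so the example builds without the compiled library, run_inference_once() is a hypothetical helper, and treating a return value of 0 as success follows the usual TVM convention rather than anything stated in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the declarations that tvmgen_default.h provides. */
struct tvmgen_default_inputs  { void* onnx__Add_0; };
struct tvmgen_default_outputs { void* output0; void* output1; };

/* Stub replacing the compiled model so this sketch links on its own.
 * ASSUMPTION: the real function returns 0 on success. */
int32_t tvmgen_default_run(struct tvmgen_default_inputs* inputs,
                           struct tvmgen_default_outputs* outputs) {
  (void)inputs;
  (void)outputs;
  return 0;
}

/* Hypothetical helper: run one blocking inference on the host processor.
 * Returns true if the inference function reported success. */
bool run_inference_once(float* input, float* output0, float* output1) {
  assert(input != NULL && output0 != NULL && output1 != NULL);

  struct tvmgen_default_inputs inputs = { input };
  struct tvmgen_default_outputs outputs = { output0, output1 };

  /* With the real library, the call returns only after the results
   * have been written to output0 and output1. */
  int32_t status = tvmgen_default_run(&inputs, &outputs);
  return status == 0;
}
```

Because the host-processor variant runs synchronously, no completion flag is needed here; the caller can read the output buffers as soon as the helper returns true.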
3.4. Running Model on Hardware NPU Accelerator (ti-npu)
Because the NPU accelerator is a separate core that runs asynchronously to the host processor,
the accelerator must be initialized by the application before it can be used. Perform this initialization by calling the TI_NPU_init() function.
After invoking the inference function, the application must poll the volatile flag tvmgen_default_finished to determine whether the inference has completed.
When running the model on the hardware NPU accelerator, the symbol TVMGEN_DEFAULT_TI_NPU is defined in the generated header file (tvmgen_default.h).
Both the NPU initialization function and the flag for checking model completion are declared in the same header file:
/* Symbol defined when running model on TI NPU hardware accelerator */
#define TVMGEN_DEFAULT_TI_NPU
#ifdef TVMGEN_DEFAULT_TI_NPU_SOFT
#error Conflicting definition for where model should run.
#endif

/* TI NPU hardware accelerator initialization */
extern void TI_NPU_init();

/* Flag for model execution completion on TI NPU hardware accelerator */
extern volatile int32_t tvmgen_default_finished;
Example code for running inference on the hardware NPU accelerator is as follows:
#include "tvmgen_default.h"
TI_NPU_init(); /* one time initialization */
/* ... other code can go here ... */
tvmgen_default_run(&inputs, &outputs);
/* ... other code can go here ... */
while (!tvmgen_default_finished) ;
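The busy-wait loop above never exits if the accelerator fails to signal completion. One possible variation is to bound the number of polls, as sketched below. The wait_for_inference() helper and its max_polls parameter are hypothetical and not part of the generated library, and the flag is defined locally as a stand-in for the extern declaration in tvmgen_default.h so the sketch builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the flag declared in tvmgen_default.h. In an application,
 * the generated library defines it and the NPU flow sets it. */
volatile int32_t tvmgen_default_finished = 0;

/* Hypothetical helper: poll the completion flag at most max_polls times.
 * Returns true if the flag was observed set, false if the poll budget
 * ran out first. */
bool wait_for_inference(const volatile int32_t* finished, uint32_t max_polls) {
  assert(finished != NULL);
  for (uint32_t i = 0; i < max_polls; ++i) {
    if (*finished) {
      return true;   /* the accelerator reported completion */
    }
    /* Other short, non-blocking work could run here between polls. */
  }
  return *finished != 0;   /* one last check after the budget is spent */
}
```

In an application, this call would take the place of the while loop after tvmgen_default_run(&inputs, &outputs), with the flag passed as &tvmgen_default_finished. A suitable poll budget depends on the model and the clock rate and has to be determined by measurement. The volatile qualifier on the pointer is what forces the flag to be re-read on every iteration.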