2. Inference Explained

This section explains how to invoke the inference function from the compiled library.

There are two inference scenarios, depending on the ti-npu type= option specified during compilation:

  • Running the model on a host processor with an optimized software implementation.

  • Running the model on a dedicated hardware accelerator, for example, the Neural Network Processing Unit (NPU).

2.1. Inference Function and Input/Output

The header file generated by the compiler (tvmgen_default.h), stored in the artifacts directory, documents the input/output data shapes and types and defines the input/output data structures and the inference function. For example:

/* The generated model library expects the following inputs/outputs:
 * Inputs:
 *    Tensor[(1, 3, 128, 1), float32]
 * Outputs:
 *    Tensor[(1, 6), float32]
 *    Tensor[(1, 60), float32]
 */

/*!
 * \brief Input tensor pointers for TVM module "default"
 */
struct tvmgen_default_inputs {
  void* onnx__Add_0;
};

/*!
 * \brief Output tensor pointers for TVM module "default"
 */
struct tvmgen_default_outputs {
  void* output0;
  void* output1;
};

/*!
 * \brief entrypoint function for TVM module "default"
 * \param inputs Input tensors for the module
 * \param outputs Output tensors for the module
 */
int32_t tvmgen_default_run(
  struct tvmgen_default_inputs* inputs,
  struct tvmgen_default_outputs* outputs
);

In this example, the neural network takes one input tensor and produces two output tensors. The expected input is a 1x3x128x1 floating-point tensor, and the two outputs are 1x6 and 1x60 floating-point tensors. Define the input/output buffers in your application code and initialize the struct members with pointers to them. For example:

struct tvmgen_default_inputs  inputs  = { (void*) &my_input[0] };
struct tvmgen_default_outputs outputs = { (void*) &my_output0[0], (void*) &my_output1[0] };
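
For the shapes in this example, the buffers referenced by these pointers can simply be statically allocated arrays whose sizes match the tensor shapes (the buffer names follow the example above and are illustrative):

/* 1x3x128x1 float32 input and 1x6 / 1x60 float32 outputs */
float my_input[1 * 3 * 128 * 1];
float my_output0[1 * 6];
float my_output1[1 * 60];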

When the skip_normalize=true and output_int=true compiler options are specified, the generated model library may expect integer input and produce integer outputs. Please refer to the tvmgen_default.h file for details about input/output data types and the input feature normalization parameters. For example:

/* The generated model library expects the following inputs/outputs:
 * Inputs:
 *    Tensor[(1, 3, 128, 1), int8]
 * Outputs:
 *    Tensor[(1, 6), int8]
 *    Tensor[(1, 60), uint8]
 */

/* Input feature normalization parameters:
 *   input_int = clip(((int32_t)((input_float + bias) * scale)) >> shift, min, max)
 *   where (min, max) = (-128, 127) if int8 type, (0, 255) if uint8 type
 */
#define TVMGEN_DEFAULT_NUM_CHANNELS 3
#define TVMGEN_DEFAULT_INPUT_NORMALIZATION_IS_CHANNELWISE 1
const int32_t tvmgen_default_bias_data[] = {12, 12, 18};
const int32_t tvmgen_default_scale_data[] = {171};
const int32_t tvmgen_default_shift_data[] = {5};

Note

On C2000 devices, which have no native 8-bit integer data types, int8_t is aliased to int16_t and uint8_t is aliased to uint16_t. This type aliasing is consistent with the TI C2000Ware SDK. Please use int16_t/uint16_t to declare your input/output tensor data accordingly. The input feature normalization sequence should still clip the values to the range [-128, 127] or [0, 255], respectively.
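
As an illustration only, the following sketch applies the normalization formula documented in the header to a single input sample. It assumes the per-channel bias with shared scale/shift layout shown in the example header; the helper name is hypothetical, and on C2000 devices int8_t resolves to int16_t as described in the note above.

#include <stdint.h>
#include "tvmgen_default.h"   /* bias/scale/shift tables and channel count */

/* Hypothetical helper: normalize one floating-point sample from channel
 * `ch` into the int8 value expected by the model input, following the
 * formula documented in tvmgen_default.h. */
static int8_t normalize_sample(float x, int ch)
{
    /* (input_float + bias) * scale, truncated to int32, then shifted */
    int32_t v = (int32_t)((x + tvmgen_default_bias_data[ch]) *
                          tvmgen_default_scale_data[0]);
    v >>= tvmgen_default_shift_data[0];

    /* Clip to the int8 range (-128, 127); use (0, 255) for uint8 inputs. */
    if (v < -128) v = -128;
    if (v >  127) v =  127;
    return (int8_t)v;
}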

2.2. Running Model on Host Processor (ti-npu type=soft)

After the input/output data structures have been set up, running inference is as simple as invoking the inference function.

#include "tvmgen_default.h"

tvmgen_default_run(&inputs, &outputs);

When the inference function returns, the inference results are available in the buffers referenced by outputs.
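
Putting the pieces together, a typical host-side application reuses the same structures across inferences. The sketch below combines the earlier snippets; fill_input() and handle_results() are hypothetical application functions standing in for data acquisition and post-processing:

#include "tvmgen_default.h"

extern void fill_input(float *buf);                           /* hypothetical */
extern void handle_results(const float *o0, const float *o1); /* hypothetical */

static float my_input[1 * 3 * 128 * 1];
static float my_output0[1 * 6];
static float my_output1[1 * 60];

void inference_loop(void)
{
    struct tvmgen_default_inputs  inputs  = { (void*) &my_input[0] };
    struct tvmgen_default_outputs outputs = { (void*) &my_output0[0],
                                              (void*) &my_output1[0] };

    for (;;) {
        fill_input(my_input);                   /* acquire the next input frame */
        tvmgen_default_run(&inputs, &outputs);  /* returns after inference completes */
        handle_results(my_output0, my_output1); /* consume the results */
    }
}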

2.3. Running Model on Hardware NPU Accelerator (ti-npu type=hard)

Because the NPU accelerator is a separate core that runs asynchronously to the host processor, the accelerator must be initialized by the application before it can be used. Perform this initialization by calling the TI_NPU_init() function.

After invoking the inference function, the application must check a volatile flag, as shown below, to determine whether the inference has completed.

Both the NPU initialization function and the flag for checking model completion are declared in the generated header file (tvmgen_default.h).

/* TI NPU hardware accelerator initialization */
extern void TI_NPU_init();

/* Flag for model execution completion on TI NPU hardware accelerator */
extern volatile int32_t tvmgen_default_finished;

Example code for running inference on the hardware NPU accelerator is as follows:

#include "tvmgen_default.h"

TI_NPU_init();  /* one-time initialization */

/* ... other code can go here ... */

tvmgen_default_run(&inputs, &outputs);

/* ... other code can go here ... */

while (!tvmgen_default_finished) ;  /* busy-wait until the NPU signals completion */
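
The placeholders for other code in the example above can be used to overlap host-side work with the NPU execution. A minimal sketch of that pattern follows; it assumes TI_NPU_init() has already been called once at startup, and do_other_work() is a hypothetical application function:

#include "tvmgen_default.h"

extern void do_other_work(void);   /* hypothetical application function */

void infer_on_npu(struct tvmgen_default_inputs  *inputs,
                  struct tvmgen_default_outputs *outputs)
{
    /* Start inference; the NPU core executes the model while the host
     * continues running. */
    tvmgen_default_run(inputs, outputs);

    /* Overlap useful host-side work with the NPU execution. */
    do_other_work();

    /* Block until the NPU signals completion; the results are then
     * available through the pointers in outputs. */
    while (!tvmgen_default_finished) ;
}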