3. Inference Explained

This section explains how to invoke the inference function from the compiled library. There are two inference scenarios, depending on the ti-npu type= option specified during compilation:

  • Running the model on a host processor with an optimized software implementation.

  • Running the model on a dedicated hardware accelerator, for example, the Neural Network Processing Unit (NPU).

3.1. Inference Function and Input/Output

The header file generated by the compiler (tvmgen_default.h) and stored in the artifacts directory documents the input/output data shapes and types, defines the input/output data structures, and declares the inference function. For example:

/* The generated model library expects the following inputs/outputs:
 * Inputs:
 *    Tensor[(1, 3, 128, 1), float32]
 * Outputs:
 *    Tensor[(1, 6), float32]
 *    Tensor[(1, 60), float32]
 */

/*!
 * \brief Input tensor pointers for TVM module "default"
 */
struct tvmgen_default_inputs {
  void* onnx__Add_0;
};

/*!
 * \brief Output tensor pointers for TVM module "default"
 */
struct tvmgen_default_outputs {
  void* output0;
  void* output1;
};

/*!
 * \brief entrypoint function for TVM module "default"
 * \param inputs Input tensors for the module
 * \param outputs Output tensors for the module
 */
int32_t tvmgen_default_run(
  struct tvmgen_default_inputs* inputs,
  struct tvmgen_default_outputs* outputs
);

In this example, the neural network takes one tensor input and produces two tensor outputs. The expected input is a 1x3x128x1 tensor of type float32; the two outputs are 1x6 and 1x60 tensors of type float32. Define the input/output buffers in your application code and initialize the struct members with pointers to your input/output data. For example:

struct tvmgen_default_inputs  inputs  = { (void*) &my_input[0] };
struct tvmgen_default_outputs outputs = { (void*) &my_output0[0], (void*) &my_output1[0] };
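
Here, my_input, my_output0, and my_output1 are application-owned buffers; a minimal sketch of their declarations, sized to match the shapes above (the names are placeholders for your application's data):

/* Hypothetical buffers sized to the model's 1x3x128x1 input and
 * 1x6 / 1x60 outputs from the example header above. */
float my_input[1 * 3 * 128 * 1];
float my_output0[1 * 6];
float my_output1[1 * 60];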

When the skip_normalize=true and output_int=true compiler options are specified, the generated model library may expect integer input and produce integer outputs. Please refer to the tvmgen_default.h file for details about input/output data types and the input feature normalization parameters. For example:

/* The generated model library expects the following inputs/outputs:
 * Inputs:
 *    Tensor[(1, 3, 128, 1), int8]
 * Outputs:
 *    Tensor[(1, 6), int8]
 *    Tensor[(1, 60), uint8]
 */

/* Input feature normalization parameters:
 *   input_int = clip(((int32_t)((input_float + bias) * scale)) >> shift, min, max)
 *   where (min, max) = (-128, 127) if int8 type, (0, 255) if uint8 type
 */
#define TVMGEN_DEFAULT_BIAS_LEN 3
#define TVMGEN_DEFAULT_SCALE_LEN 1
#define TVMGEN_DEFAULT_SHIFT_LEN 1
extern const int32_t tvmgen_default_bias_data[] __attribute__((weak)) = {12, 12, 18};
extern const int32_t tvmgen_default_scale_data[] __attribute__((weak)) = {171};
extern const int32_t tvmgen_default_shift_data[] __attribute__((weak)) = {5};
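
As a worked example using the generated parameters above, an input value of 0.5 in the first channel converts as follows:

/* channel 0: bias = 12, scale = 171, shift = 5
 * (0.5 + 12) * 171 = 2137.5  ->  (int32_t)2137.5 = 2137  ->  2137 >> 5 = 66
 * 66 is already within [-128, 127], so input_int = 66
 */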

3.2. Converting Input Data from Float to Int with Input Normalization

When running the model with the skip_normalize=true compiler option, you must convert model input values from floating point to integer before invoking the model inference function. The tvmgen_default_bias_data, TVMGEN_DEFAULT_BIAS_LEN, tvmgen_default_scale_data, TVMGEN_DEFAULT_SCALE_LEN, tvmgen_default_shift_data, and TVMGEN_DEFAULT_SHIFT_LEN definitions in tvmgen_default.h can be used for this conversion, as in the example below (my_input_float, my_input_int, and num_elements are placeholders for your application's data):

#include <math.h>            /* for floorf() */
#include "tvmgen_default.h"  /* normalization parameters */

for (int c = 0; c < num_elements; c++) {
   int32_t scaled_val = (int32_t)floorf((my_input_float[c]
       + tvmgen_default_bias_data[c % TVMGEN_DEFAULT_BIAS_LEN])
       * tvmgen_default_scale_data[c % TVMGEN_DEFAULT_SCALE_LEN]);
   int32_t shifted_val = scaled_val >> tvmgen_default_shift_data[c % TVMGEN_DEFAULT_SHIFT_LEN];
   /* clip to the 8-bit range: [-128, 127] for int8, [0, 255] for uint8 */
   if (shifted_val < -128) shifted_val = -128;
   if (shifted_val >  127) shifted_val =  127;
   my_input_int[c] = (int8_t)shifted_val;
}

Note

On C2000 devices, since there are no 8-bit integer data types, int8_t is aliased to int16_t, and uint8_t is aliased to uint16_t. This type aliasing is consistent with the TI C2000Ware SDK. Please use int16_t/uint16_t to declare your input/output tensor data accordingly. The input feature normalization sequence should still clip the values in the range of [-128, 127] or [0, 255], respectively.
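
For example, a minimal sketch of tensor declarations on C2000 (buffer names are placeholders):

/* On C2000, int8_t/uint8_t alias to 16-bit types, so declare 8-bit
 * tensor data with int16_t/uint16_t; values are still clipped to the
 * 8-bit ranges during input normalization. */
int16_t  my_input_int[1 * 3 * 128 * 1];  /* int8 data, values in [-128, 127] */
uint16_t my_output1[1 * 60];             /* uint8 data, values in [0, 255] */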

3.3. Running Model on Host Processor (ti-npu type=soft)

When running the model on the host processor, the symbol TVMGEN_DEFAULT_TI_NPU_SOFT is defined in the generated header file (tvmgen_default.h).

/* Symbol defined when running model on the host processor */
#define TVMGEN_DEFAULT_TI_NPU_SOFT
#ifdef TVMGEN_DEFAULT_TI_NPU
   #error Conflicting definition for where model should run.
#endif

After the input/output data structures have been set up, running inference is as simple as invoking the inference function.

#include "tvmgen_default.h"

tvmgen_default_run(&inputs, &outputs);

When the inference function returns, the inference results are stored in outputs.
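
The entrypoint returns an int32_t status code; a minimal sketch of checking it, assuming the usual TVM AOT convention that zero indicates success:

/* Assumption: a nonzero return value signals an inference error */
int32_t status = tvmgen_default_run(&inputs, &outputs);
if (status != 0) {
    /* inference failed; handle the error */
}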

3.4. Running Model on Hardware NPU Accelerator (ti-npu)

Because the NPU accelerator is a separate core that runs asynchronously to the host processor, the application must initialize the accelerator before using it. Perform this initialization by calling the TI_NPU_init() function.

Inference on the accelerator likewise completes asynchronously, so after invoking the inference function the application must poll a volatile flag to determine when the inference has finished.

When running the model on the hardware NPU accelerator, the symbol TVMGEN_DEFAULT_TI_NPU is defined in the generated header file (tvmgen_default.h), which also declares the NPU initialization function and the completion flag:

/* Symbol defined when running model on TI NPU hardware accelerator */
#define TVMGEN_DEFAULT_TI_NPU
#ifdef TVMGEN_DEFAULT_TI_NPU_SOFT
   #error Conflicting definition for where model should run.
#endif

/* TI NPU hardware accelerator initialization */
extern void TI_NPU_init();

/* Flag for model execution completion on TI NPU hardware accelerator */
extern volatile int32_t tvmgen_default_finished;

Example code for running inference on the hardware NPU accelerator is as follows:

#include "tvmgen_default.h"

#ifdef TVMGEN_DEFAULT_TI_NPU
TI_NPU_init();  /* one time initialization */
#endif

/* ... other code can go here ... */

tvmgen_default_run(&inputs, &outputs);

/* ... other code can go here ... */

#ifdef TVMGEN_DEFAULT_TI_NPU
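/* Busy-wait until the NPU signals that inference has completed */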
while (!tvmgen_default_finished) ;
#endif
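
Once tvmgen_default_finished becomes nonzero, the inference has completed and the results are available in outputs.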