1.4. Running Inference

TVM inference can be run from a Python script or from a C/C++ application. Examples are provided in the TI edgeai-tidl-tools and TI TVM fork repositories on GitHub. You can use these examples as templates for your own use cases.

The default runtime setup used by edgeai-tidl-tools is DLR (Deep Learning Runtime from Amazon AWS). For the purpose of running TVM-compiled models with the DLR runtime, DLR is simply a wrapper around the TVM runtime.
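
For example, with the DLR Python package, a folder of TVM compilation artifacts can be loaded and run in a few lines. The sketch below is illustrative only; the artifacts path, the input name "data", and the input shape are placeholders for your own model.

import numpy as np
import dlr

# Placeholder path to the folder of TVM/TIDL compilation artifacts.
model = dlr.DLRModel("artifacts_folder", dev_type="cpu")

# Placeholder input name/shape -- use the input name and shape your model expects.
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
outputs = model.run({"data": data})
print([o.shape for o in outputs])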

See Inference Explained for further details about TVM inference.

The following Python functions and C++ code are used in the examples provided in the TI TVM fork.

1.4.1. Python

The run_model function shows how to run inference with DLR or with the TVM Runtime.

relay.ti_tests.infer_model.run_model(artifacts_folder: str, input_dict, use_dlr: bool)

Run model with given input using DLR or TVM Runtime.

Parameters
  • artifacts_folder – Folder containing compilation artifacts.

  • input_dict – Dictionary of input name (str) to input data (numpy.ndarray).

  • use_dlr – If True, use DLR. If False, use the TVM runtime directly.

Returns

results – list of result tensors

Return type

list
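
As a minimal sketch, run_model could be called as shown below. The import path is an assumption based on the documented module path and may differ depending on the layout of the TI TVM fork; the input name "data" and the shape are placeholders for your model.

import numpy as np
# Import path is an assumption; adjust to match the TI TVM fork layout.
from tvm.relay.ti_tests.infer_model import run_model

# Placeholder input name/shape -- use the names and shapes your model expects.
input_dict = {"data": np.random.uniform(size=(1, 3, 224, 224)).astype("float32")}

# use_dlr=True runs through DLR; use_dlr=False calls the TVM runtime directly.
results = run_model("artifacts_folder/", input_dict, use_dlr=True)
print([r.shape for r in results])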

1.4.2. C++

Listing 1.2 (from the relay_mul example in the TI TVM repo) shows how to use the C++ TVM Runtime APIs to:

  • Load compilation artifacts (shared library, parameter file, and JSON representation of the network graph)

  • Create a TVM Graph Executor

  • Set up inputs to the Graph Executor

  • Run inference on the Graph Executor

  • Extract outputs from the Graph Executor

Listing 1.2 Running inference using C++ and the TVM Runtime
void DeployGraphExecutor() {
  const std::string artifacts_folder("artifacts_relay_mul_c7x_target/");

  // load in the library
  DLDevice dev{kDLCPU, 0};
  tvm::runtime::Module loaded_lib = tvm::runtime::Module::LoadFromFile(artifacts_folder + "deploy_lib.so");

  // Load JSON
  std::ifstream loaded_json(artifacts_folder + "deploy_graph.json");
  std::string json_data((std::istreambuf_iterator<char>(loaded_json)), std::istreambuf_iterator<char>());
  loaded_json.close();

  // Load params from file
  std::ifstream loaded_params(artifacts_folder + "deploy_param.params", std::ios::binary);
  std::string params_data((std::istreambuf_iterator<char>(loaded_params)), std::istreambuf_iterator<char>());
  loaded_params.close();
  TVMByteArray params_arr;
  params_arr.data = params_data.c_str();
  params_arr.size = params_data.length();

  LOG(INFO) << "Creating graph executor...";
  // Create the graph executor module
  int device_type = dev.device_type; // Need an int; the DLDeviceType enum
                                     // results in an ambiguity for TVMArgsSetter.

  tvm::runtime::Module mod =
    (*tvm::runtime::Registry::Get("tvm.graph_executor.create"))(json_data,
                                                                loaded_lib,
                                                                device_type,
                                                                dev.device_id);

  // Load params into Graph Executor
  LOG(INFO) << "Loading params ...";
  tvm::runtime::PackedFunc load_params = mod.GetFunction("load_params");
  load_params(params_arr);

  tvm::runtime::PackedFunc set_input  = mod.GetFunction("set_input");
  tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
  tvm::runtime::PackedFunc run        = mod.GetFunction("run");

  LOG(INFO) << "Initializing inputs ...";
  auto f32 = tvm::runtime::DataType::Float(32);
  tvm::runtime::NDArray a = tvm::runtime::NDArray::Empty({672, 14, 14}, f32, dev);
  tvm::runtime::NDArray b = tvm::runtime::NDArray::Empty({672, 1, 1},   f32, dev);
  tvm::runtime::NDArray c = tvm::runtime::NDArray::Empty({672, 14, 14}, f32, dev);

  for (int i = 0; i < 672; ++i)
    static_cast<float*>(b->data)[i] = 4;

  for (int i = 0; i < 672*14*14; ++i)
    static_cast<float*>(a->data)[i] = i;

  set_input("a", a);
  set_input("b", b);

  // run the code
  LOG(INFO) << "Running ...";
  run();

  // get the output
  get_output(0, c);

  for (int i = 0; i < 672*14*14; ++i)
    ICHECK_EQ(static_cast<float*>(c->data)[i], i * 4);

  LOG(INFO) << "Pass";
}

Listing 1.3 (from the test_dlr_cpp example in the TI TVM repo) shows how to use the C++ DLR Runtime APIs to run inference.

Listing 1.3 Running inference using C++ and the DLR Runtime
  // Step 1: Create DLR model from compiled model artifacts
  DLRModelHandle model;
  const char *model_path = "../artifacts/mv2_onnx_J7_target_tidl_c7x";
  if (argc > 1)  model_path = argv[1];
  status = CreateDLRModel(&model, model_path, 1, 0);
  check_status(status, &model, "CreateDLRModel");

  // Step 2: Set input tensor
  const char *model_input_name = "data";
  int64_t shape[4] = {1, 3, 224, 224};
  DLTensor in_tensor = { (void*) airshow,
                         {kDLCPU, 0},
                         4,
                         {kDLFloat, 32, 1},
                         shape,
                         NULL,
                         0
                       };
  status = SetDLRInputTensorZeroCopy(&model, model_input_name, &in_tensor);
  check_status(status, &model, "SetDLRInputTensorZeroCopy");

  // Step 3: Run inference
  status = RunDLRModel(&model);
  check_status(status, &model, "RunDLRModel");

  // Step 4: Get output
  float *probs;
  status = GetDLROutputPtr(&model, 0, (const void**) &probs);
  check_status(status, &model, "GetDLROutputPtr");

  int64_t size;
  int dim;
  int64_t out_shape[8];
  char* type_name;
  status = GetDLROutputSizeDim(&model, 0, &size, &dim);
  check_status(status, &model, "GetDLROutputSizeDim");
  status = GetDLROutputShape(&model, 0, out_shape);
  check_status(status, &model, "GetDLROutputShape");
  status = GetDLROutputType(&model, 0, (const char**) &type_name);
  check_status(status, &model, "GetDLROutputType");

  printf("\nModel output 0 size=%" PRId64 ", dim=%d\n", size, dim);
  printf("Model output 0 shape: ");
  for (int i = 0; i < dim; i++)  printf("%" PRId64 "x", out_shape[i]);
  printf("\n");
  printf("Model output 0 type: %s\n", type_name);

  // Step 5: Interpret results
  int imax = 0;
  for (int i = 0; i < size; i++)
    if (probs[i] > probs[imax])
      imax = i;
  printf("Top 1 index = %d, probability = %f\n\n", imax, probs[imax]);

  // Step 6: Tear down
  status = DeleteDLRModel(&model);
  check_status(status, NULL, "DeleteDLRModel");