1.4. Running Inference

TVM inference can be run from a Python script or a C/C++ application. Examples are provided in the TI edgeai-tidl-tools and TI TVM fork repositories on GitHub; you can use them as templates and adapt them to your own use cases.

The default runtime used by edgeai-tidl is DLR (Deep Learning Runtime, from Amazon AWS). When running TVM-compiled models, DLR is simply a wrapper around the TVM runtime.

See Inference Explained for further details about TVM inference.

The following Python functions and C++ code are used in the examples provided in the TI TVM fork.

1.4.1. Python

The run_model function shows how to run inference with the DLR or the TVM Runtime.

relay.ti_tests.infer_model.run_model(artifacts_folder: str, input_dict, use_dlr: bool)

Run model with given input using DLR or TVM Runtime.

  • artifacts_folder – Folder containing compilation artifacts.

  • input_dict – Dictionary of input name (str) to input data (numpy.ndarray).

  • use_dlr – If True, use DLR. If False, use the TVM runtime directly.

Return type

List of result tensors.

1.4.2. C++

Listing 1.2 (the relay_mul example in the TI TVM repo) shows how to use the C++ TVM Runtime API to:

  • Load compilation artifacts (shared library, parameter file, and JSON representation of the network graph)

  • Create a TVM Graph Executor

  • Set up inputs to the Graph Executor

  • Run inference on the Graph Executor

  • Extract outputs from the Graph Executor

Listing 1.2 Running inference using C++ and the TVM Runtime
void DeployGraphExecutor() {
  const std::string artifacts_folder("artifacts_relay_mul_c7x_target/");

  // Load the compiled shared library
  DLDevice dev{kDLCPU, 0};
  tvm::runtime::Module loaded_lib = tvm::runtime::Module::LoadFromFile(artifacts_folder + "deploy_lib.so");

  // Load the JSON representation of the network graph
  std::ifstream loaded_json(artifacts_folder + "deploy_graph.json");
  std::string json_data((std::istreambuf_iterator<char>(loaded_json)), std::istreambuf_iterator<char>());
  loaded_json.close();

  // Load params from file
  std::ifstream loaded_params(artifacts_folder + "deploy_param.params", std::ios::binary);
  std::string params_data((std::istreambuf_iterator<char>(loaded_params)), std::istreambuf_iterator<char>());
  loaded_params.close();
  TVMByteArray params_arr;
  params_arr.data = params_data.c_str();
  params_arr.size = params_data.length();

  LOG(INFO) << "Creating graph executor...";
  // Create the graph executor module
  int device_type = dev.device_type; // Need an int, the DLDeviceType enum
                                     // results in an ambiguity for TVMArgsSetter.

  tvm::runtime::Module mod =
    (*tvm::runtime::Registry::Get("tvm.graph_executor.create"))(json_data,
                                                                loaded_lib,
                                                                device_type,
                                                                dev.device_id);

  // Load params into Graph Executor
  LOG(INFO) << "Loading params ...";
  tvm::runtime::PackedFunc load_params = mod.GetFunction("load_params");
  load_params(params_arr);

  // Look up the functions used to set inputs, run, and read outputs
  tvm::runtime::PackedFunc set_input  = mod.GetFunction("set_input");
  tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
  tvm::runtime::PackedFunc run        = mod.GetFunction("run");

  LOG(INFO) << "Initializing inputs ...";
  auto f32 = tvm::runtime::DataType::Float(32);
  tvm::runtime::NDArray a = tvm::runtime::NDArray::Empty({672, 14, 14}, f32, dev);
  tvm::runtime::NDArray b = tvm::runtime::NDArray::Empty({672, 1, 1},   f32, dev);
  tvm::runtime::NDArray c = tvm::runtime::NDArray::Empty({672, 14, 14}, f32, dev);

  // Every element of b is 4
  for (int i = 0; i < 672; ++i)
    static_cast<float*>(b->data)[i] = 4;

  // Element i of a holds the value i
  for (int i = 0; i < 672*14*14; ++i)
    static_cast<float*>(a->data)[i] = i;

  set_input("a", a);
  set_input("b", b);

  // Run inference
  LOG(INFO) << "Running ...";
  run();

  // Copy output 0 into c
  get_output(0, c);

  // b is broadcast over a, so each output element must equal a[i] * 4
  for (int i = 0; i < 672*14*14; ++i)
    ICHECK_EQ(static_cast<float*>(c->data)[i], i * 4);

  LOG(INFO) << "Pass";
}
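
Listing 1.2 reads deploy_graph.json and deploy_param.params without checking that the files were opened, so a wrong artifacts path only shows up later as a failure inside the runtime. The following is a minimal sketch of a checked loader that uses only the C++ standard library. The helper name ReadFileOrThrow is hypothetical; it is not part of the TVM runtime or of the TI examples.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of TVM): read a whole file into a string,
// throwing std::runtime_error if the file cannot be opened.
std::string ReadFileOrThrow(const std::string& path, bool binary = false) {
  std::ios::openmode mode = std::ios::in;
  if (binary) mode |= std::ios::binary;

  std::ifstream file(path, mode);
  if (!file.is_open()) {
    throw std::runtime_error("Cannot open artifact file: " + path);
  }

  // Copy the full stream contents, including any embedded zero bytes.
  return std::string((std::istreambuf_iterator<char>(file)),
                     std::istreambuf_iterator<char>());
}
```

Under this assumption, json_data in Listing 1.2 could be initialized from ReadFileOrThrow(artifacts_folder + "deploy_graph.json"), and params_data from ReadFileOrThrow(artifacts_folder + "deploy_param.params", true).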

Listing 1.3 (the test_dlr_cpp example in the TI TVM repo) shows how to use the C++ DLR Runtime API to run inference.

Listing 1.3 Running inference using C++ and the DLR Runtime
  // Step 1: Create DLR model from compiled model artifacts
  DLRModelHandle model;
  const char *model_path = "../artifacts/mv2_onnx_J7_target_tidl_c7x";
  if (argc > 1)  model_path = argv[1];
  status = CreateDLRModel(&model, model_path, 1, 0);  // device type 1 (kDLCPU), device id 0
  check_status(status, &model, "CreateDLRModel");

  // Step 2: Set input tensor
  const char *model_input_name = "data";
  int64_t shape[4] = {1, 3, 224, 224};
  DLTensor in_tensor = { (void*) airshow,    // data
                         {kDLCPU, 0},        // device
                         4,                  // ndim
                         {kDLFloat, 32, 1},  // dtype: code, bits, lanes
                         shape,              // shape
                         NULL,               // strides (NULL means compact)
                         0                   // byte_offset
                       };
  status = SetDLRInputTensorZeroCopy(&model, model_input_name, &in_tensor);
  check_status(status, &model, "SetDLRInputTensorZeroCopy");

  // Step 3: Run inference
  status = RunDLRModel(&model);
  check_status(status, &model, "RunDLRModel");

  // Step 4: Get output
  float *probs;
  status = GetDLROutputPtr(&model, 0, (const void**) &probs);
  check_status(status, &model, "GetDLROutputPtr");

  int64_t size;
  int dim;
  int64_t out_shape[8];
  char* type_name;
  status = GetDLROutputSizeDim(&model, 0, &size, &dim);
  check_status(status, &model, "GetDLROutputSizeDim");
  status = GetDLROutputShape(&model, 0, out_shape);
  check_status(status, &model, "GetDLROutputShape");
  status = GetDLROutputType(&model, 0, (const char**) &type_name);
  check_status(status, &model, "GetDLROutputType");

  printf("\nModel output 0 size=%" PRId64 ", dim=%d\n", size, dim);
  printf("Model output 0 shape: ");
  for (int i = 0; i < dim; i++)  printf("%" PRId64 "x", out_shape[i]);
  printf("\n");
  printf("Model output 0 type: %s\n", type_name);

  // Step 5: Interpret results
  int imax = 0;
  for (int i = 0; i < size; i++)
    if (probs[i] > probs[imax])
      imax = i;
  printf("Top 1 index = %d, probability = %f\n\n", imax, probs[imax]);

  // Step 6: Tear down
  status = DeleteDLRModel(&model);
  check_status(status, NULL, "DeleteDLRModel");
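
Step 5 of Listing 1.3 reports only the single highest-scoring class. When several candidate classes are of interest, the same output buffer can be ranked instead. The following is a minimal sketch of that post-processing that uses only the C++ standard library. The function name TopK is hypothetical and is not part of the DLR API; the sketch assumes that probs points to size float values, as obtained from GetDLROutputPtr and GetDLROutputSizeDim in the listing.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of DLR): return the indices of the k largest
// values in probs[0..size), ordered from highest to lowest score.
std::vector<int64_t> TopK(const float* probs, int64_t size, int64_t k) {
  if (probs == nullptr || size <= 0 || k <= 0) return {};
  k = std::min(k, size);  // never ask for more entries than exist

  // Start with the identity permutation 0, 1, ..., size - 1.
  std::vector<int64_t> indices(static_cast<std::size_t>(size));
  std::iota(indices.begin(), indices.end(), int64_t{0});

  // Sort only the first k positions, comparing by score in descending order.
  std::partial_sort(indices.begin(), indices.begin() + k, indices.end(),
                    [probs](int64_t lhs, int64_t rhs) {
                      return probs[lhs] > probs[rhs];
                    });

  indices.resize(static_cast<std::size_t>(k));
  return indices;
}
```

With the variables from Listing 1.3, TopK(probs, size, 5) would return up to five class indices; element 0 of the result corresponds to the imax value computed in Step 5, provided the maximum score is unique.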