1.4. Running Inference
TVM inference can be run from a Python script or from a C/C++ application. Examples are provided in the TI edgeai-tidl-tools and TI TVM fork repositories on GitHub; you can use these examples as templates for your own use cases.
The default runtime used by edgeai-tidl-tools is DLR (Deep Learning Runtime, from Amazon AWS). When running TVM-compiled models, DLR acts simply as a thin wrapper around the TVM runtime.
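For example, running a TVM-compiled model through the DLR Python API only requires the folder of compilation artifacts. The following is a minimal sketch, not taken from the TI examples: the folder name, the input name "data", and the input shape are placeholders for your own model.

import numpy as np
from dlr import DLRModel

# Load the TVM compilation artifacts from a folder (placeholder name).
model = DLRModel("artifacts_folder", dev_type="cpu")

# Run inference with a dictionary mapping input name to numpy array
# (the input name and shape are placeholders).
outputs = model.run({"data": np.zeros((1, 3, 224, 224), dtype="float32")})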
See Inference Explained for further details about TVM inference.
The following Python functions and C++ code are used in the examples provided in the TI TVM fork.
1.4.1. Python
The run_model function shows how to run inference with either DLR or the TVM runtime.
relay.ti_tests.infer_model.run_model(artifacts_folder: str, input_dict, use_dlr: bool)
Run the model with the given input using DLR or the TVM runtime.
- Parameters
  - artifacts_folder – Folder containing compilation artifacts.
  - input_dict – Dictionary mapping input name (str) to input data (numpy.ndarray).
  - use_dlr – If True, use DLR. If False, use the TVM runtime directly.
- Returns
  - results – List of result tensors.
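The sketch below outlines the direct TVM runtime path that run_model follows when use_dlr is False. It is a minimal illustration, assuming the standard TVM graph-executor Python API and the deploy_lib.so / deploy_graph.json / deploy_param.params artifact names used elsewhere in these examples; the folder path, input name, and shape are placeholders.

import numpy as np
import tvm
from tvm.contrib import graph_executor

folder = "artifacts_folder/"   # placeholder for the compilation artifacts folder

# Load the three compilation artifacts.
lib = tvm.runtime.load_module(folder + "deploy_lib.so")
graph_json = open(folder + "deploy_graph.json").read()
params_bytes = bytearray(open(folder + "deploy_param.params", "rb").read())

# Create a graph executor on the CPU and load the parameters.
dev = tvm.cpu()
mod = graph_executor.create(graph_json, lib, dev)
mod.load_params(params_bytes)

# Set inputs, run, and read back output 0 (input name and shape are placeholders).
mod.set_input("data", np.zeros((1, 3, 224, 224), dtype="float32"))
mod.run()
result = mod.get_output(0).numpy()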
1.4.2. C++
Listing 1.2 shows how to use the C++ TVM Runtime API (relay_mul example in the TI TVM repo) to:
- Load the compilation artifacts (shared library, parameter file, and JSON representation of the network graph)
- Create a TVM Graph Executor
- Set up inputs to the Graph Executor
- Run inference on the Graph Executor
- Extract outputs from the Graph Executor
void DeployGraphExecutor() {
  const std::string artifacts_folder("artifacts_relay_mul_c7x_target/");

  // load in the library
  DLDevice dev{kDLCPU, 0};
  tvm::runtime::Module loaded_lib = tvm::runtime::Module::LoadFromFile(artifacts_folder + "deploy_lib.so");

  // Load JSON
  std::ifstream loaded_json(artifacts_folder + "deploy_graph.json");
  std::string json_data((std::istreambuf_iterator<char>(loaded_json)), std::istreambuf_iterator<char>());
  loaded_json.close();

  // Load params from file
  std::ifstream loaded_params(artifacts_folder + "deploy_param.params", std::ios::binary);
  std::string params_data((std::istreambuf_iterator<char>(loaded_params)), std::istreambuf_iterator<char>());
  loaded_params.close();
  TVMByteArray params_arr;
  params_arr.data = params_data.c_str();
  params_arr.size = params_data.length();

  LOG(INFO) << "Creating graph executor...";
  // Create the graph executor module
  int device_type = dev.device_type;  // Need an int, the DLDeviceType enum
                                      // results in an ambiguity for TVMArgsSetter.

  tvm::runtime::Module mod =
      (*tvm::runtime::Registry::Get("tvm.graph_executor.create"))(json_data,
                                                                  loaded_lib,
                                                                  device_type,
                                                                  dev.device_id);

  // Load params into Graph Executor
  LOG(INFO) << "Loading params ...";
  tvm::runtime::PackedFunc load_params = mod.GetFunction("load_params");
  load_params(params_arr);

  tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input");
  tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
  tvm::runtime::PackedFunc run = mod.GetFunction("run");

  LOG(INFO) << "Initializing inputs ...";
  auto f32 = tvm::runtime::DataType::Float(32);
  tvm::runtime::NDArray a = tvm::runtime::NDArray::Empty({672, 14, 14}, f32, dev);
  tvm::runtime::NDArray b = tvm::runtime::NDArray::Empty({672, 1, 1}, f32, dev);
  tvm::runtime::NDArray c = tvm::runtime::NDArray::Empty({672, 14, 14}, f32, dev);

  for (int i = 0; i < 672; ++i)
    static_cast<float*>(b->data)[i] = 4;

  for (int i = 0; i < 672*14*14; ++i)
    static_cast<float*>(a->data)[i] = i;

  set_input("a", a);
  set_input("b", b);

  // run the code
  LOG(INFO) << "Running ...";
  run();

  // get the output
  get_output(0, c);

  for (int i = 0; i < 672*14*14; ++i)
    ICHECK_EQ(static_cast<float*>(c->data)[i], i * 4);

  LOG(INFO) << "Pass";
}
Listing 1.3 shows how to use the C++ DLR API to run inference (test_dlr_cpp example in the TI TVM repo).
  // Step 1: Create DLR model from compiled model artifacts
  DLRModelHandle model;
  const char *model_path = "../artifacts/mv2_onnx_J7_target_tidl_c7x";
  if (argc > 1) model_path = argv[1];
  status = CreateDLRModel(&model, model_path, 1, 0);
  check_status(status, &model, "CreateDLRModel");

  // Step 2: Set input tensor
  const char *model_input_name = "data";
  int64_t shape[4] = {1, 3, 224, 224};
  DLTensor in_tensor = { (void*) airshow,    // preprocessed input image data
                         {kDLCPU, 0},        // device
                         4,                  // number of dimensions
                         {kDLFloat, 32, 1},  // data type
                         shape,              // tensor shape
                         NULL,               // strides (NULL: compact layout)
                         0                   // byte offset
                       };
  status = SetDLRInputTensorZeroCopy(&model, model_input_name, &in_tensor);
  check_status(status, &model, "SetDLRInputTensorZeroCopy");

  // Step 3: Run inference
  status = RunDLRModel(&model);
  check_status(status, &model, "RunDLRModel");

  // Step 4: Get output
  float *probs;
  status = GetDLROutputPtr(&model, 0, (const void**) &probs);
  check_status(status, &model, "GetDLROutputPtr");

  int64_t size;
  int dim;
  int64_t out_shape[8];
  char* type_name;
  status = GetDLROutputSizeDim(&model, 0, &size, &dim);
  check_status(status, &model, "GetDLROutputSizeDim");
  status = GetDLROutputShape(&model, 0, out_shape);
  check_status(status, &model, "GetDLROutputShape");
  status = GetDLROutputType(&model, 0, (const char**) &type_name);
  check_status(status, &model, "GetDLROutputType");

  printf("\nModel output 0 size=%" PRId64 ", dim=%d\n", size, dim);
  printf("Model output 0 shape: ");
  for (int i = 0; i < dim; i++) printf("%" PRId64 "x", out_shape[i]);
  printf("\n");
  printf("Model output 0 type: %s\n", type_name);

  // Step 5: Interpret results
  int imax = 0;
  for (int i = 0; i < size; i++)
    if (probs[i] > probs[imax])
      imax = i;
  printf("Top 1 index = %d, probability = %f\n\n", imax, probs[imax]);

  // Step 6: Tear down
  status = DeleteDLRModel(&model);
  check_status(status, NULL, "DeleteDLRModel");