1.3. Quick Start¶
This page takes you from a trained model to running inference on the EVM in four steps. For a full explanation of each step, see Compiling Models and Running Inference.
1.3.1. Prepare calibration data¶
TIDL compilation needs a small set of representative, preprocessed input samples for calibration. Save them as a compressed NumPy .npz archive:
import numpy as np
# Replace "input" with your model's actual input name, and preprocess()
# and calib_images with your own data loading and preprocessing logic.
calib = np.stack([preprocess(img) for img in calib_images])
np.savez_compressed("calibration.npz", input=calib) # key must match model input name
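If you are unsure of the model's input name, you can read it from the ONNX file before saving the archive. This is a minimal sketch using the onnx package; model.onnx stands in for your own trained model file.
import onnx
# Print each graph input's name and static shape so the .npz key and the
# calibration tensor shape can be matched exactly.
model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)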
1.3.2. Compile the model¶
python -m tvm.driver.tvmc compile model.onnx \
--target tidl \
--c7x-codegen 1 \
--tidl-calibration-input calibration.npz \
--output ./artifacts/
This produces three files in ./artifacts/: deploy_lib.so,
deploy_graph.json, and deploy_param.params.
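As a quick sanity check before copying anything to the EVM, you can confirm that all three artifacts were written. This is just an illustrative snippet using the file names listed above.
from pathlib import Path
# Verify the compilation artifacts exist and report their sizes.
for name in ("deploy_lib.so", "deploy_graph.json", "deploy_param.params"):
    path = Path("./artifacts") / name
    status = f"{path.stat().st_size} bytes" if path.exists() else "MISSING"
    print(f"{name}: {status}")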
1.3.3. Copy artifacts to the EVM¶
ssh root@<evm-ip> "mkdir -p ~/model"
scp ./artifacts/* root@<evm-ip>:~/model/
1.3.4. Run inference on the EVM¶
import tvm
from tvm.contrib import graph_executor
import numpy as np
# Load the compiled module and its graph/parameter files
lib = tvm.runtime.load_module("model/deploy_lib.so")
with open("model/deploy_graph.json") as f:
    graph = f.read()
with open("model/deploy_param.params", "rb") as f:
    params = bytearray(f.read())
# The graph executor runs on the Arm host; TIDL-offloaded subgraphs
# execute on the C7x accelerator.
dev = tvm.cpu(0)
module = graph_executor.create(graph, lib, dev)
module.load_params(params)
# Replace this placeholder with your own preprocessed input; the shape is
# only an example and must match the model's input shape.
input_data = np.random.rand(1, 3, 224, 224).astype("float32")
# Run inference ("input" must match the model's input name)
module.set_input("input", input_data)
module.run()
output = module.get_output(0).numpy()
print(output)
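Post-processing depends on your model. Purely as an illustration, assuming a classification model whose output has shape (1, num_classes), the top prediction can be read from the output array produced above:
# Illustrative post-processing for a classifier (assumed output shape (1, num_classes)).
top = int(np.argmax(output[0]))
print("Predicted class index:", top, "score:", float(output[0][top]))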
Note
The first inference on the EVM includes TIDL initialization overhead. Run a warm-up inference before measuring performance.
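A minimal timing sketch, continuing from the script above: run one warm-up inference to absorb the initialization cost, then average several timed runs. Host-side wall-clock timing is only a rough measure; see Running Inference for proper profiling tools.
import time
# Warm-up run: the first inference includes TIDL initialization overhead.
module.run()
# Average wall-clock time over several runs.
runs = 10
start = time.perf_counter()
for _ in range(runs):
    module.run()
avg_ms = (time.perf_counter() - start) * 1000 / runs
print(f"Average inference time: {avg_ms:.2f} ms over {runs} runs")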
Next steps:
Compiling Models — full TVMC and Python API reference, compilation artifacts, debugging
Running Inference — inference runtimes, performance profiling, debugging
Once inference results match expectations, use the performance profiling tools in Running Inference to measure execution time and identify bottlenecks.