1.3. Quick Start¶
This page takes you from a trained model to running inference on the EVM in four steps. For a full explanation of each step, see Compiling Models and Running Inference.
1.3.1. Prepare calibration data¶
TIDL compilation needs a small set of representative, preprocessed input samples for calibration. Save them as a compressed NumPy .npz archive:
import numpy as np
# Replace "input" with your model's actual input name, and preprocess()
# and calib_images with your own data loading and preprocessing logic.
calib = np.stack([preprocess(img) for img in calib_images])
np.savez_compressed("calibration.npz", input=calib) # key must match model input name
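If you are unsure of the model's input name, you can read it from the ONNX file before saving the archive. This is a minimal sketch using the onnx package; model.onnx stands in for your own trained model file.
import onnx
# Print each graph input's name and static shape so the .npz key and the
# calibration tensor shape can be matched exactly.
model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)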
1.3.2. Compile the model¶
python -m tvm.driver.tvmc compile model.onnx \
--target tidl \
--c7x-codegen 1 \
--tidl-calibration-input calibration.npz \
--output ./artifacts/
This produces three files in ./artifacts/: deploy_lib.so,
deploy_graph.json, and deploy_param.params.
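As a quick sanity check before copying anything to the EVM, you can confirm that all three artifacts were written. This is just an illustrative snippet using the file names listed above.
from pathlib import Path
# Verify the compilation artifacts exist and report their sizes.
for name in ("deploy_lib.so", "deploy_graph.json", "deploy_param.params"):
    path = Path("./artifacts") / name
    status = f"{path.stat().st_size} bytes" if path.exists() else "MISSING"
    print(f"{name}: {status}")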
1.3.3. Copy artifacts to the EVM¶
ssh root@<evm-ip> "mkdir -p ~/model"
scp ./artifacts/* root@<evm-ip>:~/model/
1.3.4. Run inference on the EVM¶
import tvm
from tvm.contrib import graph_executor
import numpy as np
# Load the compiled module and its graph/parameter files
lib = tvm.runtime.load_module("model/deploy_lib.so")
with open("model/deploy_graph.json") as f:
    graph = f.read()
with open("model/deploy_param.params", "rb") as f:
    params = bytearray(f.read())
# The graph executor runs on the Arm host; TIDL-offloaded subgraphs
# execute on the C7x accelerator.
dev = tvm.cpu(0)
module = graph_executor.create(graph, lib, dev)
module.load_params(params)
# Replace this placeholder with your own preprocessed input; the shape is
# only an example and must match the model's input shape.
input_data = np.random.rand(1, 3, 224, 224).astype("float32")
# Run inference ("input" must match the model's input name)
module.set_input("input", input_data)
module.run()
output = module.get_output(0).numpy()
print(output)
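Post-processing depends on your model. Purely as an illustration, assuming a classification model whose output has shape (1, num_classes), the top prediction can be read from the output array produced above:
# Illustrative post-processing for a classifier (assumed output shape (1, num_classes)).
top = int(np.argmax(output[0]))
print("Predicted class index:", top, "score:", float(output[0][top]))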
Note
The first inference on the EVM includes TIDL initialization overhead. Run a warm-up inference before measuring performance.
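A minimal timing sketch, continuing from the script above: run one warm-up inference to absorb the initialization cost, then average several timed runs. Host-side wall-clock timing is only a rough measure; see Running Inference for proper profiling tools.
import time
# Warm-up run: the first inference includes TIDL initialization overhead.
module.run()
# Average wall-clock time over several runs.
runs = 10
start = time.perf_counter()
for _ in range(runs):
    module.run()
avg_ms = (time.perf_counter() - start) * 1000 / runs
print(f"Average inference time: {avg_ms:.2f} ms over {runs} runs")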
Next steps:
Compiling Models — full TVMC and Python API reference, compilation artifacts, debugging
Running Inference — inference runtimes, performance profiling, debugging
Once inference results match expectations, use the performance profiling tools in Running Inference to measure execution time and identify bottlenecks.