Performance Visualization Tool

The performance visualization tool can be used to view all the performance statistics recorded when running the edge AI C++ demo application. This includes the CPU and HWA loading, DDR bandwidth and FPS obtained. Refer to Available options for details on the performance metrics available to be plotted.

The edge AI C++ demo generates log files in the directory edge_ai_apps/perf_logs. Each log file contains real-time values for some performance metrics, averaged over a 2 s window. The performance visualization tool parses these log files one by one and plots the performance metrics so they can be viewed in any browser. The tool can be run in parallel with the edge AI C++ demo application, or offline once the log files have been generated.
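Since the tool depends on log files being present, it can be handy to check the log directory before launching it. The following is a minimal Python sketch of such a check, not part of the SDK: the helper name list_perf_logs is hypothetical, and because the log file names and format are not described here, it simply lists every regular file in the directory, oldest first.

```python
from pathlib import Path


def list_perf_logs(log_dir="edge_ai_apps/perf_logs"):
    """Return the regular files in log_dir, oldest first.

    The default directory comes from the text above. File names and
    contents are not specified there, so no extension filter is applied.
    """
    directory = Path(log_dir)
    if not directory.is_dir():
        # A missing directory usually means no C++ demo has been run yet.
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    # Sort by modification time (name as tie-breaker) so the newest log is last.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


if __name__ == "__main__":
    logs = list_perf_logs()
    if logs:
        print(f"{len(logs)} log file(s) found, newest: {logs[-1].name}")
    else:
        print("No log files found; run a C++ demo first.")
```

Run from the directory that contains edge_ai_apps, or pass the full path to the log directory as the argument.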

Note

While the log files are always generated when running the C++ demo, the performance visualization tool can only be run inside the Docker environment in the current release. Please refer to Docker Environment for building and running a Docker container.

Running the tool

To use this tool, start a Docker session and run the command given below. The script expects some log files to be present in the directory edge_ai_apps/perf_logs, which is the case after running any C++ demo. The tool can also be brought up while the demo is running, but this might affect the performance of the demo itself: the tool consumes some ARM cycles during launch, and its load stabilizes after a certain duration.

[docker] root@j7-evm:/opt/edge_ai_apps# streamlit run scripts/perf_vis.py --theme.base="light"

A network URL can be seen in the terminal output. The graphs can be viewed by visiting this URL in any browser. The plotted graphs will keep updating based on the available log files.

_images/perf_plots.png

Fig. 19 Performance visualizer dashboard showing CPU and HWA loading and DDR bandwidth

To exit, press Ctrl+C in the terminal.

Note

The above screenshot is for an image classification task using the model TVM-CL-3410-gluoncv-mxnet-mobv2. Also note that lighting conditions affect the FPS obtained from the camera, and the above statistics were obtained at about 24 FPS. When running at 30 FPS, there is an additional 10% increase in the A72 load.

Available options

Using the checkboxes in the sidebar, one can select which performance metrics to view. There are 14 metrics available to be plotted, as seen from the above image:

  • CPU Load: Total loading for the A72 (mpu1_0), R5F (mcu2_0/1), C66x (c6x_1/2) and C71x (c7x_1) cores.

  • HWA Load: Loading (percentage) for the various available hardware accelerators.

  • DDR Bandwidth: Average read, write and total bandwidth recorded in the previous 2s interval.

  • FPS Statistics: Average frames per second recorded by the application.

  • Task Table: A separate graph for each CPU showing the loading due to the various tasks running on it.

  • Heap Table: A separate graph for each CPU showing the heap memory usage statistics.

For the first four metrics, there is a choice to view line graphs with a 30s history or bar graphs with only the real-time values. The remaining ten have real-time bar graphs as the only option.
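To make the difference between the two views concrete: with values averaged over 2 s windows, a 30 s history corresponds to the 15 most recent samples, while the bar graph needs only the latest one. The sketch below illustrates this with a fixed-length buffer. It is an illustration under these assumptions and not the tool's actual implementation; the class name MetricHistory and its methods are hypothetical.

```python
from collections import deque

SAMPLE_PERIOD_S = 2   # each logged value is averaged over a 2 s window
HISTORY_S = 30        # the line graphs show a 30 s history
MAX_SAMPLES = HISTORY_S // SAMPLE_PERIOD_S  # 15 samples


class MetricHistory:
    """Fixed-length history of one metric (hypothetical helper)."""

    def __init__(self, max_samples=MAX_SAMPLES):
        # A deque with maxlen silently drops the oldest value when full.
        self._values = deque(maxlen=max_samples)

    def add(self, value):
        """Record the newest averaged value."""
        self._values.append(value)

    def line_series(self):
        """Values for a line graph, oldest first."""
        return list(self._values)

    def latest(self):
        """Value for a real-time bar graph, or None if nothing is recorded."""
        return self._values[-1] if self._values else None
```

For example, after adding 20 samples to a MetricHistory, line_series() returns only the last 15 of them and latest() returns the 20th.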