7. Measuring performance

The performance visualization tool can be used to view all the performance statistics recorded when running the Edge AI C++ demo application. These include CPU and HWA loading, DDR bandwidth, junction temperatures, and the FPS obtained.

7.1. Logging device metrics

Each log file contains real-time values for some performance metrics, averaged over a 2s window. The temperature sensor values are sampled in real time, every 2s. The performance visualization tool then parses these log files one by one based on the modification timestamps.

The Edge AI C++ demo will automatically generate log files and store them in the directory ../perf_logs, that is, one level up from where the C++ app is run. For example, if the app is run from edgeai-gst-apps/apps_cpp, the logs will be stored in edgeai-gst-apps/perf_logs.
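
The ordering described above (log files handled one by one, by modification timestamp) can be sketched in a few lines of Python. This is only an illustration and not the visualization tool's actual code: the helper name logs_by_mtime is hypothetical, and the sketch assumes every regular file in the directory is a log file, since the log file naming is not described here.

```python
from pathlib import Path


def logs_by_mtime(log_dir):
    """Return the regular files in log_dir, oldest modification time first."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Directory taken from the example above; adjust to where the app was run.
    log_dir = Path("edgeai-gst-apps/perf_logs")
    if log_dir.is_dir():
        for path in logs_by_mtime(log_dir):
            print(path.name)
```

Sorting on st_mtime rather than on the file name keeps the order correct even if the names carry no sequence number.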

A standalone binary that performs the same logging can also be compiled. Its source is available under edgeai-gst-apps/scripts/perf_stats/, and the README.md file there has simple instructions to build and run it. After building it, use the following command to print the statistics on the terminal and also save them in log files that can be parsed.

/opt/edgeai-gst-apps/scripts/perf_stats/build# ../bin/Release/ti_perfstats -l

7.1.1. Available metrics

Average frames per second (FPS) recorded by the application is displayed by default. Using the checkboxes in the sidebar, one can select which performance metrics to view. There are 14 metrics available to be plotted, as seen from the above image:

  • CPU Load: Total loading for the A72 (mpu1_0) and the C7x DSPs (c7x_0/1/2/3)

  • HWA Load: Loading (percentage) for the various available hardware accelerators

  • DDR Bandwidth: Average read, write and total bandwidth recorded in the previous 2s interval

  • Junction Temperatures: The live temperatures recorded at various junctions

  • Task Table: A separate graph for each CPU showing the loading due to the various tasks running on it

  • Heap Table: A separate graph for each CPU showing the heap memory usage statistics

For the first three metrics, there is a choice to view line graphs with a 30s history or bar graphs with only the real-time values. The remaining eleven have real-time bar graphs as the only option.
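
The difference between the two views can be illustrated with a small rolling buffer. The sketch below is not taken from the tool: the class name MetricHistory and its methods are hypothetical, and it assumes one sample arrives every 2 s, so a 30 s history corresponds to 15 samples.

```python
from collections import deque

SAMPLE_PERIOD_S = 2   # logging interval stated in this section
HISTORY_S = 30        # length of the line-graph history stated above
MAX_SAMPLES = HISTORY_S // SAMPLE_PERIOD_S  # 15 samples under that assumption


class MetricHistory:
    """Keep only the most recent samples of one metric (hypothetical helper)."""

    def __init__(self, max_samples=MAX_SAMPLES):
        # A bounded deque drops the oldest sample once it is full.
        self._samples = deque(maxlen=max_samples)

    def add(self, value):
        self._samples.append(value)

    def latest(self):
        """Value a real-time bar graph would show, or None if empty."""
        return self._samples[-1] if self._samples else None

    def series(self):
        """Values a line graph with history would show, oldest first."""
        return list(self._samples)
```

A bar graph would read latest(), while a line graph would plot series().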

7.2. Reporting tools

Simple tools are available to report performance numbers such as core loadings, DDR bandwidths, junction temperatures, and GStreamer element latencies on the bash terminal.

7.2.1. tiperfoverlay GStreamer plugin

This custom GStreamer plugin is a non-intrusive element that users can include in the pipeline to overlay the performance information directly on the output image displayed on the screen. The entire processing is done in the native NV12 format, which makes it convenient to use with opTIFlow pipelines. A preview of the performance overlay on the display is shown below:

../_images/tda4vm_perf_overlay.jpg

7.2.2. Perf-stats tool

The perf-stats tool is a simple C++ application which prints the stats on the terminal and updates them every second. To use this tool, compile it and run it in a parallel SSH terminal alongside the application. For detailed instructions, please refer to edgeai-gst-apps/scripts/perf_stats/README.md.

Below is sample output of the tool:

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 43.81 % ( HWI =  0.74 %, SWI =  0.24 % )
CPU: c7x_1: TOTAL LOAD = 12. 0 % ( HWI =  0. 0 %, SWI =  0. 0 % )

HWA performance statistics,
===========================
HWA:  MSC0: LOAD =  6.93 % ( 45 MP/s )
HWA:  MSC1: LOAD =  6.93 % ( 60 MP/s )

DDR performance statistics,
===========================

DDR: READ BW: AVG =  1455 MB/s, PEAK =  6140 MB/s
DDR: WRITE BW: AVG =   332 MB/s, PEAK =  2138 MB/s
DDR: TOTAL BW: AVG =  1787 MB/s, PEAK =  8278 MB/s
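
If this terminal output needs to be consumed by a script, it can be picked apart with regular expressions. The following is a minimal sketch written against the sample lines above only; the function name parse_perf_stats is hypothetical, and the handling of values such as "12. 0" assumes the fractional part is a two-digit field padded with a space, which is a guess from the sample rather than documented behaviour.

```python
import re

CPU_RE = re.compile(r"CPU:\s*(\w+):\s*TOTAL LOAD\s*=\s*(\d+)\.\s*(\d+)\s*%")
DDR_RE = re.compile(
    r"DDR:\s*(READ|WRITE|TOTAL) BW:\s*AVG\s*=\s*(\d+)\s*MB/s,"
    r"\s*PEAK\s*=\s*(\d+)\s*MB/s"
)


def parse_perf_stats(text):
    """Return (cpu_load_percent_by_core, ddr_bandwidth_by_kind)."""
    cpu, ddr = {}, {}
    for line in text.splitlines():
        m = CPU_RE.search(line)
        if m:
            core, whole, frac = m.groups()
            # Assumption: the fraction is a space-padded two-digit field.
            cpu[core] = float(f"{whole}.{frac.zfill(2)}")
            continue
        m = DDR_RE.search(line)
        if m:
            kind, avg, peak = m.groups()
            ddr[kind.lower()] = {"avg": int(avg), "peak": int(peak)}
    return cpu, ddr


SAMPLE = """\
CPU: mpu1_0: TOTAL LOAD = 43.81 % ( HWI =  0.74 %, SWI =  0.24 % )
CPU: c7x_1: TOTAL LOAD = 12. 0 % ( HWI =  0. 0 %, SWI =  0. 0 % )
DDR: READ BW: AVG =  1455 MB/s, PEAK =  6140 MB/s
DDR: WRITE BW: AVG =   332 MB/s, PEAK =  2138 MB/s
DDR: TOTAL BW: AVG =  1787 MB/s, PEAK =  8278 MB/s
"""

if __name__ == "__main__":
    cpu, ddr = parse_perf_stats(SAMPLE)
    print(cpu)
    # In this sample the total average equals read plus write.
    print(ddr["read"]["avg"] + ddr["write"]["avg"] == ddr["total"]["avg"])
```

In the sample, both the average and the peak total bandwidth equal the sum of the read and write figures, which makes a convenient sanity check on parsed values.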

7.2.3. Parse GST Tracers

GStreamer has a feature called tracers that reports useful statistics such as element-wise latency and CPU loading as part of the GST debug logs. These logs are very verbose and difficult to interpret in their raw format. We provide a simple Python script that parses these logs on the fly and displays the stats on the terminal. For detailed instructions on using the script, please refer to edgeai-gst-apps/scripts/gst_tracers/README.md.

Below is sample output of the script:

+------------------------------------------------------------------------------+
|element                      latency     out-latancy     out-fps    frames    |
+------------------------------------------------------------------------------+
|h264parse0                   1.72        6580.05         0          3         |
|v4l2h264dec0                 329.79      33.29           30         886       |
|tiovxmemalloc0               0.11        33.29           30         886       |
|capsfilter0                  0.08        33.29           30         886       |
|split_01                     20.37       16.65           60         1770      |
|queue0                       0.31        33.30           30         885       |
|capsfilter1                  0.16        33.30           30         885       |
|queue1                       0.22        33.30           30         885       |
|capsfilter3                  0.07        33.30           30         885       |
|tiovxdlpreproc0              1.63        33.30           30         885       |
|capsfilter2                  0.43        33.30           30         885       |
|tidlinferer0                 7.28        33.30           30         885       |
|post_0                       2.57        33.30           30         885       |
|queue2                       0.18        33.30           30         885       |
|mosaic_0                     51.00       33.30           30         883       |
|capsfilter4                  0.14        33.30           30         883       |
|queue3                       30.80       33.34           30         882       |
|tiperfoverlay0               3.40        33.34           30         882       |
+------------------------------------------------------------------------------+
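
A table like the one above can be post-processed to find where time is spent in the pipeline. The sketch below is not part of the provided script: the function name parse_tracer_table is hypothetical, and it assumes the five-column layout shown in the sample, with the header spelled exactly as printed there.

```python
def parse_tracer_table(text):
    """Turn the printed table into a list of dicts, one per element."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        fields = line.strip("|").split()
        if len(fields) != 5 or fields[0] == "element":
            continue  # skip the header row and anything malformed
        name, latency, out_latency, out_fps, frames = fields
        rows.append({
            "element": name,
            "latency": float(latency),
            "out_latency": float(out_latency),
            "out_fps": int(out_fps),
            "frames": int(frames),
        })
    return rows


# A few rows copied from the sample output, borders shortened.
SAMPLE = """\
+---+
|element                      latency     out-latancy     out-fps    frames    |
+---+
|v4l2h264dec0                 329.79      33.29           30         886       |
|tidlinferer0                 7.28        33.30           30         885       |
|mosaic_0                     51.00       33.30           30         883       |
+---+
"""

if __name__ == "__main__":
    rows = parse_tracer_table(SAMPLE)
    slowest = max(rows, key=lambda r: r["latency"])
    print(slowest["element"], slowest["latency"])
```

In the sample rows, out-fps matches 1000 divided by the out-latency value, rounded to the nearest integer (for example 1000 / 33.29 is about 30). That is consistent with latencies reported in milliseconds, although the unit is not stated here.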