7. Measuring performance

Simple tools are available to report performance numbers such as core loading, DDR bandwidth, HWA loading, and GStreamer element latencies on the bash terminal.

7.1. GStreamer plugin for Performance measurement

This custom GStreamer plugin provides a non-intrusive element that can be added to a pipeline to overlay performance information directly on the output image displayed on the screen. All processing is done in the native NV12 format, which makes the plugin convenient to use with opTIFlow pipelines. For detailed instructions on using the plugin, please refer to tiperfoverlay.

A preview of the performance overlay on the display is shown below.

../_images/am67a_perf_overlay.jpg

7.2. Perf-stats tool

The perf-stats tool is a simple C++ application that prints statistics on the terminal and updates them every second. To use it, compile the tool and run it in a separate SSH terminal in parallel with the application. For detailed instructions, please refer to the perf-stats readme.

Below is sample output from the tool:

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 43.81 % ( HWI =  0.74 %, SWI =  0.24 % )
CPU: c7x_1: TOTAL LOAD = 12. 0 % ( HWI =  0. 0 %, SWI =  0. 0 % )

HWA performance statistics,
===========================
HWA:  MSC0: LOAD =  6.93 % ( 45 MP/s )
HWA:  MSC1: LOAD =  6.93 % ( 60 MP/s )

DDR performance statistics,
===========================

DDR: READ BW: AVG =  1455 MB/s, PEAK =  6140 MB/s
DDR: WRITE BW: AVG =   332 MB/s, PEAK =  2138 MB/s
DDR: TOTAL BW: AVG =  1787 MB/s, PEAK =  8278 MB/s
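
If the numbers need to be logged or compared across runs, the text above can be turned into structured data. The following is a minimal Python sketch, not part of the perf-stats tool: the function name `parse_perf_stats` and the dictionary layout are hypothetical, and the regular expressions are written only against the sample lines shown above. It also assumes that a value printed as `12. 0` means 12.00 (a space-padded two-digit fraction), which should be checked against the tool's source.

```python
import re

# Sample lines copied from the perf-stats output shown above.
SAMPLE = """\
CPU: mpu1_0: TOTAL LOAD = 43.81 % ( HWI =  0.74 %, SWI =  0.24 % )
CPU: c7x_1: TOTAL LOAD = 12. 0 % ( HWI =  0. 0 %, SWI =  0. 0 % )
HWA:  MSC0: LOAD =  6.93 % ( 45 MP/s )
HWA:  MSC1: LOAD =  6.93 % ( 60 MP/s )
DDR: READ BW: AVG =  1455 MB/s, PEAK =  6140 MB/s
DDR: WRITE BW: AVG =   332 MB/s, PEAK =  2138 MB/s
DDR: TOTAL BW: AVG =  1787 MB/s, PEAK =  8278 MB/s
"""

CPU_RE = re.compile(r"CPU:\s*(\w+):\s*TOTAL LOAD\s*=\s*(\d+)\.\s*(\d+)\s*%")
HWA_RE = re.compile(
    r"HWA:\s*(\w+):\s*LOAD\s*=\s*(\d+)\.\s*(\d+)\s*%\s*\(\s*(\d+)\s*MP/s"
)
DDR_RE = re.compile(
    r"DDR:\s*(\w+) BW:\s*AVG\s*=\s*(\d+)\s*MB/s,\s*PEAK\s*=\s*(\d+)\s*MB/s"
)


def _percent(whole, frac):
    # ASSUMPTION: the fraction is a space-padded two-digit field,
    # so "12. 0" is read as 12.00 and "12. 5" would be read as 12.05.
    return float(f"{whole}.{frac.zfill(2)}")


def parse_perf_stats(text):
    """Return CPU, HWA and DDR figures from perf-stats style text."""
    stats = {"cpu": {}, "hwa": {}, "ddr": {}}
    for line in text.splitlines():
        m = CPU_RE.search(line)
        if m:
            stats["cpu"][m.group(1)] = _percent(m.group(2), m.group(3))
            continue
        m = HWA_RE.search(line)
        if m:
            stats["hwa"][m.group(1)] = {
                "load_pct": _percent(m.group(2), m.group(3)),
                "mp_per_s": int(m.group(4)),
            }
            continue
        m = DDR_RE.search(line)
        if m:
            stats["ddr"][m.group(1).lower()] = {
                "avg_mb_s": int(m.group(2)),
                "peak_mb_s": int(m.group(3)),
            }
    return stats


stats = parse_perf_stats(SAMPLE)
print(stats["cpu"])
print(stats["ddr"]["total"])
```

In practice the text would come from the tool's captured terminal output rather than from an embedded string.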

7.3. Parse GST Tracers

GStreamer provides a feature called tracers, which reports useful statistics such as element-wise latency and CPU loading as part of the GST debug logs. These logs are verbose and difficult to interpret in raw form. We provide a simple Python script that parses the logs on the fly and displays the statistics on the terminal. For detailed instructions on using the script, please refer to the gst-tracers readme.
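
To give a feel for what such parsing involves, here is a minimal Python sketch that averages per-element latency from raw tracer lines. It is not the script shipped with the SDK. It assumes the `element-latency` record layout that the upstream GStreamer latency tracer emits when element-level reporting is enabled, with `time` in nanoseconds. The lines in `EXAMPLE_LINES` are hand-written for illustration (real log lines carry additional prefixes such as timestamps), and the function name `average_latency_ms` is hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: records look like
#   element-latency, element-id=(string)..., element=(string)NAME,
#   src=(string)PAD, time=(guint64)NANOSECONDS, ts=(guint64)...;
LATENCY_RE = re.compile(
    r"element-latency,.*?element=\(string\)(?P<element>[^,]+),"
    r".*?time=\(guint64\)(?P<time_ns>\d+)"
)


def average_latency_ms(lines):
    """Return {element name: mean latency in milliseconds}."""
    totals = defaultdict(lambda: [0, 0])  # element -> [sum_ns, count]
    for line in lines:
        m = LATENCY_RE.search(line)
        if m is None:
            continue  # not an element-latency record
        entry = totals[m.group("element")]
        entry[0] += int(m.group("time_ns"))
        entry[1] += 1
    return {
        name: total_ns / count / 1e6
        for name, (total_ns, count) in totals.items()
    }


# Hand-written illustrative lines, not captured output.
EXAMPLE_LINES = [
    "element-latency, element-id=(string)0x1, element=(string)queue0, "
    "src=(string)src, time=(guint64)300000, ts=(guint64)1000;",
    "element-latency, element-id=(string)0x1, element=(string)queue0, "
    "src=(string)src, time=(guint64)500000, ts=(guint64)2000;",
    "element-latency, element-id=(string)0x2, element=(string)tidlinferer0, "
    "src=(string)src, time=(guint64)7000000, ts=(guint64)3000;",
    "an unrelated debug line that should be ignored",
]

averages = average_latency_ms(EXAMPLE_LINES)
for name, ms in averages.items():
    print(f"{name:20s} {ms:8.2f} ms")
```

A running mean per element keeps memory use constant, which matters when the log is consumed on the fly rather than read after the pipeline exits.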

Below is sample output from the script:

+------------------------------------------------------------------------------+
|element                      latency     out-latancy     out-fps    frames    |
+------------------------------------------------------------------------------+
|h264parse0                   1.72        6580.05         0          3         |
|v4l2h264dec0                 329.79      33.29           30         886       |
|tiovxmemalloc0               0.11        33.29           30         886       |
|capsfilter0                  0.08        33.29           30         886       |
|split_01                     20.37       16.65           60         1770      |
|queue0                       0.31        33.30           30         885       |
|capsfilter1                  0.16        33.30           30         885       |
|queue1                       0.22        33.30           30         885       |
|capsfilter3                  0.07        33.30           30         885       |
|tiovxdlpreproc0              1.63        33.30           30         885       |
|capsfilter2                  0.43        33.30           30         885       |
|tidlinferer0                 7.28        33.30           30         885       |
|post_0                       2.57        33.30           30         885       |
|queue2                       0.18        33.30           30         885       |
|mosaic_0                     51.00       33.30           30         883       |
|capsfilter4                  0.14        33.30           30         883       |
|queue3                       30.80       33.34           30         882       |
|tiperfoverlay0               3.40        33.34           30         882       |
+------------------------------------------------------------------------------+
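
A table like this is easy to post-process when looking for bottlenecks. The sketch below is a hypothetical helper, not part of the SDK: it reads rows in the layout shown above and ranks elements by the `latency` column. The function names `parse_tracer_table` and `slowest` are illustrative, the embedded rows are a subset copied from the sample, and the values are kept in whatever units the script prints.

```python
# A subset of the rows from the sample table above (borders shortened).
SAMPLE_TABLE = """\
+----------------------------------------------------------------------+
|element                      latency     out-latancy     out-fps    frames    |
+----------------------------------------------------------------------+
|h264parse0                   1.72        6580.05         0          3         |
|v4l2h264dec0                 329.79      33.29           30         886       |
|tidlinferer0                 7.28        33.30           30         885       |
|mosaic_0                     51.00       33.30           30         883       |
|queue3                       30.80       33.34           30         882       |
+----------------------------------------------------------------------+
"""


def parse_tracer_table(text):
    """Parse '|'-delimited rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines
        fields = line.strip("|").split()
        if len(fields) != 5 or fields[0] == "element":
            continue  # skip the header and malformed rows
        name, latency, out_latency, out_fps, frames = fields
        rows.append(
            {
                "element": name,
                "latency": float(latency),
                "out_latency": float(out_latency),
                "out_fps": int(out_fps),
                "frames": int(frames),
            }
        )
    return rows


def slowest(rows, n=3):
    """Return the n rows with the highest per-element latency."""
    return sorted(rows, key=lambda r: r["latency"], reverse=True)[:n]


rows = parse_tracer_table(SAMPLE_TABLE)
for row in slowest(rows):
    print(f'{row["element"]:16s} {row["latency"]:8.2f}')
```

On the sample rows, the decoder (`v4l2h264dec0`) and `mosaic_0` come out on top, which matches what a visual scan of the table suggests.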