TI Deep Learning Product User Guide
TIDL-RT: Performance issues

Steps to Analyze Performance

  • Set enableLayerPerfTraces = 1 in the TIDL-RT inference configuration file to enable layer-level performance traces.
  • To get the performance traces, the user must also provide a valid callback function in TIDL_CreateParams.TIDLVprintf and set debugTraceLevel = 1.
  • With these options set, TIDL-RT prints the execution cycles for each layer to the console, as shown in the log below:
Network Cycles 6294273
Layer, Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles
1, 81811, 48850, 49277, 7779, 14969, 18, 1007, 16,
2, 71051, 52722, 53246, 1473, 3290, 16, 0, 0,
3, 34063, 16700, 17307, 7379, 3952, 18, 17, 16,
4, 60926, 45133, 45431, 6625, 4176, 18, 777, 9,
5, 29990, 5996, 6040, 871, 3432, 9, 0, 0,
6, 30806, 14975, 15275, 6575, 4114, 61, 10, 9,
7, 20355, 5508, 5810, 6360, 3480, 11, 10, 9,
8, 34670, 20921, 21031, 6222, 2291, 18, 727, 9,
  • In the log above, "Network Cycles" is the total number of cycles consumed to execute the given network on C7x-MMA. To convert it to time in ms, divide by 10^6 (with C7x-MMA@1GHz); for example, 6294273 cycles corresponds to about 6.29 ms.
  • The log gives detailed information about various profile points, but from an end user's point of view only the "Layer Cycles" column should be used: it gives the cycles consumed by each layer. The remaining columns are meant for TI's internal team to debug performance issues.
  • The traces are printed in the same order in which the layers are executed on the EVM. The user can identify each layer by its layerIdx identifier, as defined in the model visualization output.
  • Per-layer execution traces are available only for target/EVM execution, not for host emulation mode of the inference execution.
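
The conversion and the per-layer reading described above can be sketched as a small host-side script. This is a minimal sketch, not part of TIDL-RT: the function names parse_perf_trace and cycles_to_ms are hypothetical, the sample text is copied from the log above, and the 1 GHz clock is the assumption stated in the text.

```python
import re

# 1 GHz clock (assumption from the text): 10^6 cycles per millisecond.
CYCLES_PER_MS = 1_000_000


def parse_perf_trace(text):
    """Return (network_cycles, [(layer_idx, layer_cycles), ...]).

    Hypothetical helper: it only reads the "Network Cycles" line and the
    first two columns of each numeric row; the other columns are ignored.
    """
    network_cycles = None
    layers = []
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"Network Cycles\s+(\d+)", line)
        if match:
            network_cycles = int(match.group(1))
            continue
        # Rows end with a trailing comma, so drop empty fields.
        fields = [f.strip() for f in line.split(",") if f.strip()]
        # Data rows are all-numeric; the header row is skipped by this check.
        if len(fields) >= 2 and all(f.isdigit() for f in fields):
            layers.append((int(fields[0]), int(fields[1])))
    return network_cycles, layers


def cycles_to_ms(cycles, cycles_per_ms=CYCLES_PER_MS):
    """Convert a cycle count to milliseconds for the given clock."""
    return cycles / cycles_per_ms


# First rows of the log shown above, copied verbatim.
SAMPLE = """\
Network Cycles 6294273
Layer, Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles
1, 81811, 48850, 49277, 7779, 14969, 18, 1007, 16,
2, 71051, 52722, 53246, 1473, 3290, 16, 0, 0,
3, 34063, 16700, 17307, 7379, 3952, 18, 17, 16,
"""

network_cycles, layers = parse_perf_trace(SAMPLE)
print(f"Network: {cycles_to_ms(network_cycles):.3f} ms")
# List the most expensive layers first.
for layer_idx, layer_cycles in sorted(layers, key=lambda r: r[1], reverse=True):
    print(f"layer {layer_idx}: {layer_cycles} cycles")
```

Sorting by "Layer Cycles" gives a quick list of the layers worth looking up by layerIdx in the model visualization output. For a device clock other than 1 GHz, pass a different cycles_per_ms value.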

Troubleshooting steps for performance issues

  • Make sure that the MSMC SRAM available on the device is correctly provided to TIDL-RT through the import configuration parameter "msmcSizeKB". Providing a smaller MSMC SRAM size may degrade performance if it causes more layers to be placed in DDR, so the user should provide the largest MSMC size available to TIDL-RT.
  • Try changing the value of the parameter ENABLE_PERSIT_WT_ALLOC between 1 (default) and 0 in the device_config file, which in turn is provided through the "perfSimConfig" parameter in the TIDL-RT import config file. By default, the device_config file is taken from ti_dl/test/testvecs/config/import/device_config.cfg.
  • Make sure that the TIDL_deactivate function is called appropriately: the user should call TIDL_deactivate whenever the application switches from one network to another.
  • If the user's application can process multiple inputs together as a batch, set numBatches = val in the TIDL-RT import config file. This option can give considerable performance improvements, especially for small-resolution networks. The user can try different batch sizes to identify an optimal one (a batch size of 4 has been identified as a good size for 224x224 images). For details about this option please refer here.
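
When trying different numBatches values, it helps to compare cycles per input rather than total cycles. The sketch below is a minimal illustration under two assumptions: that the reported "Network Cycles" figure covers the whole batch (verify this on your setup), and that the helper names cycles_per_frame and best_batch_size are hypothetical. The numbers are placeholders chosen for illustration, not measurements.

```python
def cycles_per_frame(network_cycles, num_batches):
    """Cycles spent per input, assuming network_cycles covers the whole batch."""
    return network_cycles / num_batches


def best_batch_size(measurements):
    """Pick the numBatches value with the lowest cycles per input.

    measurements maps numBatches -> measured Network Cycles for that batch.
    """
    return min(measurements, key=lambda n: cycles_per_frame(measurements[n], n))


# Placeholder values for illustration only, NOT measured on any device.
placeholder = {1: 1000, 2: 1600, 4: 2800, 8: 6000}

best = best_batch_size(placeholder)
for n, cycles in placeholder.items():
    print(f"numBatches={n}: {cycles_per_frame(cycles, n):.1f} cycles per input")
print(f"lowest cycles per input at numBatches={best}")
```

Replace the placeholder dictionary with the "Network Cycles" values measured on the EVM for each batch size tried.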