TI Deep Learning Library User Guide

Steps to Analyze Performance

  • Set enableLayerPerfTraces = 1 in the inference configuration file to enable layer-level performance traces.
  • These traces help identify layers that run slower than expected.
  • "Network Cycles" reports the time taken to run the whole network; the time taken by each layer is listed below it. Take the overall network performance from "Network Cycles".
  • The sum of all layer cycles may differ slightly from the network cycles because of time spent in common initialization, de-initialization, etc.
  • Users should look only at "Layer Cycles"; the remaining values are meant to be shared with TI for further debugging.
  • Check the graphViz output or the imported model to see the layer execution order.
  • Note that debugTraceLevel should be set to 1 to get the right performance numbers.

Sample trace output:
Network Cycles 6294273
Layer, Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles
1, 81811, 48850, 49277, 7779, 14969, 18, 1007, 16,
2, 71051, 52722, 53246, 1473, 3290, 16, 0, 0,
3, 34063, 16700, 17307, 7379, 3952, 18, 17, 16,
4, 60926, 45133, 45431, 6625, 4176, 18, 777, 9,
5, 29990, 5996, 6040, 871, 3432, 9, 0, 0,
6, 30806, 14975, 15275, 6575, 4114, 61, 10, 9,
7, 20355, 5508, 5810, 6360, 3480, 11, 10, 9,
8, 34670, 20921, 21031, 6222, 2291, 18, 727, 9,
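
The trace can be post-processed with a short script to rank layers by "Layer Cycles". The following is a minimal Python sketch, not a TI tool: the function name parse_trace and the approach of pasting the trace into a string are assumptions made for illustration. Note that the sample above appears to list only the first eight layers, so the sum of its layer cycles is far below "Network Cycles"; with a complete trace the two values should be close.

```python
# Sample trace copied verbatim from the output shown above.
TRACE = """\
Network Cycles 6294273
Layer, Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles
1, 81811, 48850, 49277, 7779, 14969, 18, 1007, 16,
2, 71051, 52722, 53246, 1473, 3290, 16, 0, 0,
3, 34063, 16700, 17307, 7379, 3952, 18, 17, 16,
4, 60926, 45133, 45431, 6625, 4176, 18, 777, 9,
5, 29990, 5996, 6040, 871, 3432, 9, 0, 0,
6, 30806, 14975, 15275, 6575, 4114, 61, 10, 9,
7, 20355, 5508, 5810, 6360, 3480, 11, 10, 9,
8, 34670, 20921, 21031, 6222, 2291, 18, 727, 9,
"""


def parse_trace(text):
    """Hypothetical helper: return (network_cycles, rows).

    rows is a list of dicts keyed by the column names in the header line.
    """
    network_cycles = None
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Network Cycles"):
            network_cycles = int(line.split()[-1])
        elif line.startswith("Layer,"):
            # Column names are comma separated with irregular spacing.
            header = [name.strip() for name in line.split(",") if name.strip()]
        elif header and line[0].isdigit():
            # Data rows end with a trailing comma, so drop empty fields.
            values = [int(v) for v in line.split(",") if v.strip()]
            rows.append(dict(zip(header, values)))
    return network_cycles, rows


network_cycles, rows = parse_trace(TRACE)

# Rank layers by the only column users are expected to interpret.
slowest = sorted(rows, key=lambda r: r["Layer Cycles"], reverse=True)
layer_sum = sum(r["Layer Cycles"] for r in rows)

print("Network Cycles:", network_cycles)
print("Sum of listed Layer Cycles:", layer_sum)
for row in slowest[:3]:
    print("Layer", row["Layer"], "->", row["Layer Cycles"], "cycles")
```

The layer numbers in the ranking refer to execution order, so they still need to be matched against the graphViz output or the imported model to find the corresponding layers.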

Tips to improve runtime performance

  • If the user plans to run the same network on multiple frames, or runs only a single network, network performance can be improved depending on how much MSMC memory is allocated to TIDL.
  • TIDL can use any MSMC/L2 memory left over after allocating the minimum memory required to run the network. This extra memory is used to avoid fetching the kernel weights for some layers, depending on how much free memory is available in L2/MSMC. TIDL keeps using this memory to avoid re-fetching the weights until the user calls the TIDL_deactivate function.
  • This feature is enabled by default. If required, it can be disabled by setting ENABLE_PERSIT_WT_ALLOC = 0 in the device_config file, which needs to be provided as part of the import config file (with the parameter perfSimConfig).
  • This feature is typically useful when the network resolution is small and the user wants to run it on multiple images/ROIs.
  • As an example, say the user imported a TIDL model with 7 MB of MSMC (configured by setting MSMCSIZE_KB in the perfSimConfig file given during import; by default this is the device_config.cfg file located in the ti_dl/test/testvecs/config/import directory).
  • For a small resolution, say TIDL needs a minimum of only 4 MB of MSMC memory. TIDL can then use the remaining 3 MB of MSMC to hold the weights of the layers that fit into it, copying them only during the TIDL_activate function.
  • With this feature, kernel weights for some layers (depending on the extra memory available) are fetched only during the TIDL_activate call. If the user does not call TIDL_deactivate between frames, the weights are not fetched again for each frame. This can improve performance, especially for small-resolution networks.
  • Note that the user should call TIDL_deactivate appropriately when running multiple different networks. Typically, TIDL_deactivate should be called whenever switching from one network to another.
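
The arithmetic behind the 7 MB / 4 MB example can be illustrated with a small Python model. This is only a sketch under stated assumptions: this guide does not describe how TIDL chooses which layers keep their weights resident, so the first-fit policy, the function names, and the per-layer weight sizes below are all hypothetical and serve only to show why fewer weight fetches are needed across frames.

```python
def pick_resident_layers(msmc_total_kb, msmc_min_kb, weight_kb_by_layer):
    """Hypothetical first-fit policy (NOT TIDL's documented algorithm).

    Walk the layers in order and keep a layer's weights resident whenever
    they still fit in the MSMC left over after the minimum allocation.
    """
    spare_kb = msmc_total_kb - msmc_min_kb
    if spare_kb < 0:
        raise ValueError("configured MSMC is below the minimum requirement")
    resident = []
    for layer, size_kb in weight_kb_by_layer.items():
        if size_kb <= spare_kb:
            resident.append(layer)
            spare_kb -= size_kb
    return resident, spare_kb


def fetched_kb(weight_kb_by_layer, resident, frames):
    """Total weight traffic: resident layers are fetched once (at activate),
    all other layers are fetched again for every frame."""
    once = sum(weight_kb_by_layer[layer] for layer in resident)
    per_frame = sum(size for layer, size in weight_kb_by_layer.items()
                    if layer not in resident)
    return once + per_frame * frames


MSMCSIZE_KB = 7 * 1024        # 7 MB configured at import time
MIN_REQUIRED_KB = 4 * 1024    # 4 MB minimum assumed in the example above

# Made-up per-layer weight sizes in KB, for illustration only.
weights = {1: 512, 2: 2048, 3: 1024, 4: 256}

resident, leftover_kb = pick_resident_layers(MSMCSIZE_KB, MIN_REQUIRED_KB, weights)
frames = 100
with_feature = fetched_kb(weights, resident, frames)
without_feature = fetched_kb(weights, [], frames)

print("Spare MSMC (KB):", MSMCSIZE_KB - MIN_REQUIRED_KB)
print("Resident layers:", resident, "leftover KB:", leftover_kb)
print("Weight traffic over", frames, "frames (KB):", with_feature, "vs", without_feature)
```

In this model the saving grows with the number of frames processed between TIDL_activate and TIDL_deactivate, which matches the guidance above to deactivate only when switching networks.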