Supported Combinations
Parameter | Value |
CPU + OS | r5fss0-0 nortos |
Toolchain | ti-arm-clang |
Boards | am261x-som, am261x-lp |
Example folder | examples/benchmarks/sram_overlay_benchmark |
Introduction
- This demo provides a rough measurement of the CPU cycles taken to execute a given piece of code (reported as CPI, cycles per instruction) when that code is located in different memory regions (L2OCRAM and flash memory). It also shows the performance when the overlay feature is used.
- The example does the following:
- Initializes the drivers and board.
- Initializes the cycle counters for counting the CPU cycles.
- Triggers the FLC (Fast Local Copy) hardware to copy the code from flash to RAM.
- Runs three iterations: one with the code executing from RAM, one with the code executing from flash (XIP), and one with the FLC overlay.
- Calculates the average number of cycles taken for code execution.
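The measurement steps above can be sketched roughly as follows. This is a host-side illustration only, not the example's actual source: `read_cycle_counter()`, `workload()`, and `average_cycles()` are hypothetical names, and the counter is simulated with a plain variable so the snippet can be compiled anywhere. On the target, the counter read would be replaced by the SDK's cycle counter API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware cycle counter (simulated here). */
static uint32_t fake_counter = 0u;

static uint32_t read_cycle_counter(void)
{
    return fake_counter;
}

/* Hypothetical workload: pretends that every call costs 120 cycles. */
static void workload(void)
{
    fake_counter += 120u;
}

typedef void (*bench_fn_t)(void);

/* Run fn `iterations` times and return the average cycle count per run. */
static uint32_t average_cycles(bench_fn_t fn, uint32_t iterations)
{
    uint64_t total = 0u;

    if (iterations == 0u) {
        return 0u; /* nothing measured */
    }

    for (uint32_t i = 0u; i < iterations; i++) {
        uint32_t start = read_cycle_counter();
        fn();
        uint32_t end = read_cycle_counter();
        /* Unsigned subtraction stays correct across one counter wrap-around. */
        total += (uint32_t)(end - start);
    }

    return (uint32_t)(total / iterations);
}
```

Calling `average_cycles(workload, 10u)` once per placement (RAM, flash XIP, FLC overlay) would give one average per memory region to compare.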
Performance optimization for overlay
- Note
- Here, the code does not wait for the function to be copied entirely into internal RAM.
The sample output shows that when the CPI of a function is less than 5, the code should wait for the function to be copied completely before executing it. Otherwise, there is no need to wait for the copy to complete, and the function can be executed right away.
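The rule of thumb in this note can be written as a small decision helper. The sketch below is illustrative and not part of the example: `compute_cpi()` and `should_wait_for_copy()` are hypothetical names, and the only value taken from the text is the CPI threshold of 5.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold taken from the note: below this CPI, wait for the copy. */
#define CPI_WAIT_THRESHOLD 5.0

/* CPI = cycles / instructions. Returns 0.0 when no instructions were counted,
 * which this sketch treats as "wait" (the conservative choice). */
static double compute_cpi(uint32_t cycles, uint32_t instructions)
{
    if (instructions == 0u) {
        return 0.0;
    }
    return (double)cycles / (double)instructions;
}

/* True when the function should wait for the overlay copy to finish. */
static bool should_wait_for_copy(double cpi)
{
    return cpi < CPI_WAIT_THRESHOLD;
}
```

For instance, a function measured at 300 cycles for 100 instructions has a CPI of 3, so by this rule it would wait for the copy, while one measured at 800 cycles for 100 instructions (CPI of 8) would start executing right away.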
Steps to Run the Example
Building SRAM Overlay application
- When using CCS projects to build, import the CCS project for the required combination and build it using the CCS project menu (see Using SDK with CCS Projects).
- When using makefiles to build, note the required combination and build using the make command (see Using SDK with Makefiles).
Running the SRAM Overlay application
To flash the application binary to the device, follow the documented flashing steps (see Flash a Hello World example).
Sample output for SRAM Overlay example