7.3. Building and Running Memory Benchmarking Configurations

7.3.1. Demo Overview

  • This demo provides a means of measuring the performance of a realistic application where the application text (code) resides in various memory locations and the data resides in On-Chip-Memory RAM (referred to as OCM, OCMC or OCMRAM).
  • The application executes 10 different configurations of the same text, varying in data vs. instruction cache intensity. Each test makes a total of 500 calls to 16 separate functions in random order.
  • The most instruction-intensive configuration achieves an instruction cache miss rate (ICM/sec) of roughly 3-4 million per second when run entirely from OCMRAM. This rate is similar to what we have seen in real-world customer applications.
  • The more data-intensive tests have more repetitive code and therefore achieve much lower ICM rates.

Application Output Description

Output line                      Description
Mem Cpy size      => 100         Size of the memcpy in bytes executed by each task
Exec Time in usec => 2567        Execution time of the test in microseconds
Iter              => 1           Number of times the test was run
Task calls        => 500         Number of randomly ordered calls to the 16 tasks
Inst Cache miss   => 11421       Total instruction cache misses
Inst Cache acc    => 650207      Total instruction cache accesses
num switches      => 1469        Total number of context switches
num instr exec    => 1029260     Total number of executed instructions
ICM/sec           => 4449162     Instruction cache misses per second
INST/sec          => 400958317   Instructions executed per second
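
The two per-second figures in the sample output are consistent with dividing the raw counter by the execution time. The sketch below reproduces them under that assumption; the formula is inferred from the sample values, not taken from the application source, and the function name rate_per_sec is hypothetical.

```shell
#!/bin/sh
# rate_per_sec COUNT EXEC_TIME_USEC
# Prints COUNT events per second, truncated to an integer.
# Assumption: rate = count * 1,000,000 / execution time in microseconds.
rate_per_sec() {
    count=$1
    usec=$2
    echo $(( count * 1000000 / usec ))
}

# Counter values and execution time taken from the sample output above.
echo "ICM/sec  => $(rate_per_sec 11421 2567)"     # prints ICM/sec  => 4449162
echo "INST/sec => $(rate_per_sec 1029260 2567)"   # prints INST/sec => 400958317
```

This can be handy for cross-checking a captured UART log, or for deriving other ratios (for example misses per access) from the same counters.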

7.3.2. Supported Configurations

Core              SOC     Supported Memory Configs (MEM_CONF)
mcu1_0            j721e   ocmc, msmc, ddr, xip
mcu2_0            j721e   ocmc, msmc, ddr, xip
mcu1_0 + mcu2_0   j721e   ddr, xip
mcu1_0            j7200   ocmc, msmc, ddr, xip
mcu2_0            j7200   ocmc, msmc, ddr, xip
mcu1_0 + mcu2_0   j7200   ddr, xip
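
As a small illustration of the table above, the following sketch checks whether a MEM_CONF value is listed for a given core selection before a build is started. The function name is_supported and the label "multicore" (standing for mcu1_0 + mcu2_0) are hypothetical; the allowed values are copied from the table and are the same for j721e and j7200.

```shell
#!/bin/sh
# is_supported CORE MEM_CONF
# Returns 0 if MEM_CONF is listed for CORE in the table above, 1 otherwise.
# "multicore" is a hypothetical label for the mcu1_0 + mcu2_0 rows.
is_supported() {
    core=$1
    mem_conf=$2
    case "$core" in
        mcu1_0|mcu2_0) allowed="ocmc msmc ddr xip" ;;
        multicore)     allowed="ddr xip" ;;
        *)             return 1 ;;
    esac
    for conf in $allowed; do
        [ "$conf" = "$mem_conf" ] && return 0
    done
    return 1
}

if is_supported multicore msmc; then
    echo "multicore/msmc: supported"
else
    echo "multicore/msmc: not supported"
fi
```
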

7.3.3. How to Build

  • Go to the build folder: <pdk_path>/packages/ti/build/

7.3.3.1. Basic Single Core Benchmarking

  • Build the bootloader
    • make sbl_lib_ospi_clean SOC=j721e BOARD=j721e_evm CORE=mcu1_0 -sj
    • make sbl_ospi_img SOC=j721e BOARD=j721e_evm CORE=mcu1_0 -sj
  • Build the application (XIP and non-XIP):
    • make [MEM_CONF]_memory_benchmarking_app_freertos CORE=[CORE] SOC=j721e BOARD=j721e_evm -sj
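
To build every single-core memory configuration in one pass, the application build command can be expanded over the MEM_CONF values. The sketch below only prints the commands so they can be reviewed first; the function name print_build_cmds is hypothetical, and the SOC and BOARD values are the j721e ones used in this section.

```shell
#!/bin/sh
# print_build_cmds CORE
# Prints (does not run) one application build command per memory configuration.
print_build_cmds() {
    core=$1
    for mem_conf in ocmc msmc ddr xip; do
        echo "make ${mem_conf}_memory_benchmarking_app_freertos CORE=${core} SOC=j721e BOARD=j721e_evm -sj"
    done
}

print_build_cmds mcu1_0
```

The printed lines can be pasted into a shell in the build folder, or piped to sh once they look right.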

7.3.3.2. Multicore Benchmarking

  • If building for XIP, edit /boot/sbl/src/ospi/sbl_ospi.c - be sure to “#define BUILD_XIP” and comment out any line that may say “#undef BUILD_XIP”.
  • Run the following commands:
    • make sbl_ospi_img SOC=j721e BOARD=j721e_evm -sj
    • make [MEM_CONF]_memory_benchmarking_app_freertos CORE=mcu1_0 SOC=j721e BOARD=j721e_evm MULTICORE=1 -sj
    • make [MEM_CONF]_memory_benchmarking_app_freertos CORE=mcu2_0 SOC=j721e BOARD=j721e_evm MULTICORE=1 -sj
    • ../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 [MEM_CONF]_multicore.appimage 4 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_freertos_mcu1_0_release.rprc  6 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_freertos_mcu2_0_release.rprc
  • The appimage to flash (the fourth flashing command in the run instructions) should now be the [MEM_CONF]_multicore.appimage file in your build directory.
  • If building for XIP, run the following command as well.
    • ../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 xip_multicore.appimage_xip 4 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/xip_mcu1_0_memory_benchmarking_app_freertos_release.rprc_xip 6 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/xip_mcu2_0_memory_benchmarking_app_freertos_release.rprc_xip
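
Because the non-XIP MulticoreImageGen command differs only in MEM_CONF, it can be generated instead of edited by hand. This is a minimal sketch that prints the command for review; the function name print_multicore_cmd is hypothetical, and the paths, the leading arguments (LE 55) and the core IDs (4 and 6) are copied unchanged from the non-XIP command above.

```shell
#!/bin/sh
# print_multicore_cmd MEM_CONF
# Prints (does not run) the non-XIP MulticoreImageGen command for MEM_CONF.
print_multicore_cmd() {
    mem_conf=$1
    app="${mem_conf}_memory_benchmarking_app_freertos"
    bin_dir="../binary/${app}/bin/j721e_evm"
    echo "../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 ${mem_conf}_multicore.appimage" \
         "4 ${bin_dir}/${app}_mcu1_0_release.rprc" \
         "6 ${bin_dir}/${app}_mcu2_0_release.rprc"
}

print_multicore_cmd ddr
```
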

7.3.4. How to Run (Linux Host Machine Assumed)

7.3.4.1. Via OSPI

  • Build the OSPI SBL image
  • Put the EVM in UART boot mode
  • Download and install the UniFlash tool

7.3.4.1.1. Non-XIP Use Cases

If using a multicore appimage, replace the appimage in the fourth flashing command with the one created in the multicore build step.

  • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
  • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0
  • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
  • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage -d 3 -o 100000
  • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
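
The five non-XIP flashing commands share everything except the file and offset, so they can be generated from three inputs. The sketch below prints them for review rather than running them; the function name print_flash_cmds and its argument order are hypothetical, while the file paths and the -d/-o/-i values are copied from the list above.

```shell
#!/bin/sh
# print_flash_cmds UNIFLASH_DIR PDK_PATH APPIMAGE [TTY]
# Prints (does not run) the non-XIP flashing sequence for the j721e EVM.
print_flash_cmds() {
    uniflash=$1
    pdk=$2
    appimage=$3
    tty=${4:-/dev/ttyUSB1}
    dslite="sudo ${uniflash}/dslite.sh --mode processors -c ${tty}"
    # 1. Flash programmer, loaded over UART
    echo "${dslite} -f ${uniflash}/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0"
    # 2. OSPI SBL at offset 0
    echo "${dslite} -f ${pdk}/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0"
    # 3. tifs.bin at offset 80000
    echo "${dslite} -f ${pdk}/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000"
    # 4. Application image (single-core or multicore) at offset 100000
    echo "${dslite} -f ${appimage} -d 3 -o 100000"
    # 5. OSPI patterns file at offset 3FE0000
    echo "${dslite} -f ${pdk}/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000"
}

# Example with placeholder directories; substitute your own paths.
print_flash_cmds /opt/uniflash /opt/pdk ddr_multicore.appimage
```

Passing the multicore appimage as the third argument covers the multicore case described above.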

7.3.4.1.2. Single Core XIP

  • To flash single core XIP:
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage -d 3 -o 100000
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage_xip -d 3
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000

7.3.4.1.3. Multicore XIP

  • To flash multicore XIP (be sure to see the build instructions above for getting xip_multicore.appimage and xip_multicore.appimage_xip):
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/build/xip_multicore.appimage -d 3 -o 100000
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/build/xip_multicore.appimage_xip -d 3
    • sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
  • For mcu1_0: Attach USB cable to UART Terminal 1 of the MCU UART port (sudo minicom -D /dev/ttyUSB1) to see the output of the application
  • For mcu2_0: Attach USB cable to UART Terminal 0 of the Main UART port (sudo minicom -D /dev/ttyUSB0) to see the output of the application
  • Power on the EVM in OSPI boot mode

7.3.5. Additional Notes

  • Again, XIP cannot run via CCS due to the location in memory where the application code sits.
  • When building the sbl_cust_img, if you would like to see more verbose output, you may change the “-DSBL_LOG_LEVEL” flag in CUST_SBL_TEST_FLAGS in /packages/ti/boot/sbl/sbl_component.mk from 1 to 3. However, this causes the cache miss rate to increase substantially and performance to degrade, so use it only for debugging, not for actual performance benchmarking.
  • A “Mem Cpy size” of 0 means that no memcpy occurred and the application test was strictly instruction-based.
  • When switching between memory configurations, always do a clean build. Some consecutive builds will work, but others will not, so building cleanly is the safe choice.
  • Problems have been noted when running at 166 MHz with a chip select delay of 2. Symptoms include junk characters in the UART output or the test stopping after executing only a few of the tasks. The issue is intermittent and unlikely, but possible. If you see it, change the clock speed to 133 MHz or the CSDA value to 3 in the sbl_ospi.c file.