7.3. Building and Running Memory Benchmarking Configurations
7.3.1. Demo Overview
- This demo provides a means of measuring the performance of a realistic application where the text of the application is sitting in various memory locations and the data is sitting in On-Chip-Memory RAM (referred to as OCM, OCMC or OCMRAM).
- The application executes 10 different configurations of the same text varying by data vs. instruction cache intensity. Each test calls 16 separate functions 500 total times in random order.
- The most instruction-intensive example achieves an instruction cache miss rate (ICM/sec) of ~3-4 million per second when run entirely from OCMRAM. This is similar to the rate we have seen in real-world customer examples.
- More data-intensive tests have more repetitive code and achieve much lower ICM rates.
Application Output | Description |
---|---|
Mem Cpy size => 100 | Size of the memcpy in bytes executed by each task |
Exec Time in usec => 2567 | Amount of time in microseconds |
Iter => 1 | Number of times the test was run |
Task calls => 500 | Number of randomly ordered calls to the 16 tasks |
Inst Cache miss => 11421 | Total instruction cache misses |
Inst Cache acc => 650207 | Total instruction cache accesses |
num switches => 1469 | Number of total context switches |
num instr exec => 1029260 | Total number of executed instructions |
ICM/sec => 4449162 | Instruction cache misses per second |
INST/sec => 400958317 | Instructions executed per second |
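The two rate fields in the sample output are consistent with dividing the corresponding counter by the execution time. The sketch below reproduces that arithmetic with integer shell arithmetic. The helper name `rate_per_sec` is hypothetical, and the assumption that the application truncates rather than rounds is inferred only from the sample numbers.

```shell
# rate_per_sec COUNT EXEC_TIME_USEC
# Prints COUNT events per second, truncated to an integer.
# Hypothetical helper; the formula is inferred from the sample output.
rate_per_sec() {
    count=$1
    usec=$2
    if [ "$usec" -le 0 ]; then
        echo "exec time must be positive" >&2
        return 1
    fi
    # Scale to seconds first so the integer division keeps precision.
    echo $(( count * 1000000 / usec ))
}

# Values taken from the sample output in the table above.
echo "ICM/sec  => $(rate_per_sec 11421 2567)"
echo "INST/sec => $(rate_per_sec 1029260 2567)"
```

With the sample values this prints 4449162 and 400958317, matching the ICM/sec and INST/sec rows.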
7.3.2. Supported Configurations
Core | SOC | Supported Memory Configs (MEM_CONF) |
---|---|---|
mcu1_0 | j721e | ocmc msmc ddr xip |
mcu2_0 | j721e | ocmc msmc ddr xip |
mcu1_0 + mcu2_0 | j721e | ddr xip |
mcu1_0 | j7200 | ocmc msmc ddr xip |
mcu2_0 | j7200 | ocmc msmc ddr xip |
mcu1_0 + mcu2_0 | j7200 | ddr xip |
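As a quick guard before starting a build, the table above can be encoded as a small lookup. This is only a sketch: the function name `is_supported_config` and the `mcu1_0+mcu2_0` spelling for the multicore case are assumptions made for illustration, not part of the build system.

```shell
# is_supported_config SOC CORES MEM_CONF
# Returns 0 if the combination appears in the supported-configurations table.
# CORES is "mcu1_0", "mcu2_0", or "mcu1_0+mcu2_0" (hypothetical spelling).
is_supported_config() {
    soc=$1
    cores=$2
    mem_conf=$3
    case "$soc" in
        j721e|j7200) ;;
        *) return 1 ;;
    esac
    case "$cores" in
        mcu1_0|mcu2_0)   allowed="ocmc msmc ddr xip" ;;
        "mcu1_0+mcu2_0") allowed="ddr xip" ;;
        *) return 1 ;;
    esac
    for m in $allowed; do
        [ "$m" = "$mem_conf" ] && return 0
    done
    return 1
}

is_supported_config j721e mcu1_0 ocmc && echo "mcu1_0 ocmc: supported"
is_supported_config j721e "mcu1_0+mcu2_0" ocmc || echo "multicore ocmc: not supported"
```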
7.3.3. How to Build
- Go to the build folder, <pdk_path>/packages/ti/build/
7.3.3.1. Basic Single Core Benchmarking
- Build the bootloader
make sbl_lib_cust_clean SOC=j721e BOARD=j721e_evm CORE=mcu1_0 -sj
make sbl_cust_img SOC=j721e BOARD=j721e_evm CORE=mcu1_0 -sj
- Build the application (XIP and non-XIP):
make [MEM_CONF]_memory_benchmarking_app_freertos CORE=[CORE] SOC=j721e BOARD=j721e_evm -sj
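To make the placeholder substitution concrete, the following sketch prints the single core build commands for a chosen MEM_CONF and CORE without running them. The function name `print_single_core_build` is hypothetical; the make targets and variables are copied from the commands above, including building the bootloader for mcu1_0.

```shell
# print_single_core_build MEM_CONF CORE
# Dry run: prints the build commands instead of executing them.
print_single_core_build() {
    mem_conf=$1
    core=$2
    common="SOC=j721e BOARD=j721e_evm"
    # Bootloader targets, as listed above (always built for mcu1_0).
    echo "make sbl_lib_cust_clean $common CORE=mcu1_0 -sj"
    echo "make sbl_cust_img $common CORE=mcu1_0 -sj"
    # Application target with [MEM_CONF] and [CORE] filled in.
    echo "make ${mem_conf}_memory_benchmarking_app_freertos CORE=$core $common -sj"
}

print_single_core_build ocmc mcu2_0
```

Run from the build folder, the printed lines can be reviewed and then pasted into the shell.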
7.3.3.2. Multicore Benchmarking
- If building for XIP, edit /boot/sbl/src/ospi/sbl_ospi.c - be sure to “#define BUILD_XIP” and comment out any line that may say “#undef BUILD_XIP”.
- Run the command:
make sbl_cust_img SOC=j721e BOARD=j721e_evm -sj
make [MEM_CONF]_memory_benchmarking_app_freertos CORE=mcu1_0 SOC=j721e BOARD=j721e_evm MULTICORE=1 -sj
make [MEM_CONF]_memory_benchmarking_app_freertos CORE=mcu2_0 SOC=j721e BOARD=j721e_evm MULTICORE=1 -sj
../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 [MEM_CONF]_multicore.appimage 8 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_freertos_mcu1_0_release.rprc 10 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_freertos_mcu2_0_release.rprc
- The appimage to flash (the fourth flash command) should now be the [MEM_CONF]_multicore.appimage file in your build directory.
- If building for XIP, run the following command as well.
../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 xip_multicore.appimage_xip 8 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/xip_memory_benchmarking_app_freertos_mcu1_0_release.rprc_xip 10 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/xip_memory_benchmarking_app_freertos_mcu2_0_release.rprc_xip
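Because the MulticoreImageGen command lines are long and repeat the same placeholder several times, it can help to assemble them from one variable. The sketch below prints the non-XIP invocation for a given MEM_CONF without running the tool. The function name `print_multicore_image_cmd` is hypothetical, and the arguments `LE 55`, `8` and `10` are copied verbatim from the command above rather than derived.

```shell
# print_multicore_image_cmd MEM_CONF
# Dry run: prints the MulticoreImageGen command for the non-XIP appimage.
print_multicore_image_cmd() {
    mem_conf=$1
    app="${mem_conf}_memory_benchmarking_app_freertos"
    bin_dir="../binary/${app}/bin/j721e_evm"
    gen="../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen"
    # "LE 55", "8" and "10" are taken unchanged from the documented command.
    echo "$gen LE 55 ${mem_conf}_multicore.appimage" \
         "8 ${bin_dir}/${app}_mcu1_0_release.rprc" \
         "10 ${bin_dir}/${app}_mcu2_0_release.rprc"
}

print_multicore_image_cmd ddr
```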
7.3.4. How to Run (Linux Host Machine Assumed)
7.3.4.1. Via OSPI
- Build the cust sbl image
- Put the EVM in UART boot mode
- Download and install the uniflash tool
7.3.4.1.1. Non-XIP Use Cases
If using a multicore appimage, replace the appimage in the fourth flash command below with the one that was created in the multicore build step.
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/cust/bin/sbl_cust_img_mcu1_0_release.tiimage -d 3 -o 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage -d 3 -o 100000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
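The five flash commands differ only in the file being written and its offset, so the sequence can be viewed as a small file/offset table. The sketch below prints the non-XIP sequence for given paths without invoking dslite.sh. The function name `print_flash_cmds` and its argument order are hypothetical, paths containing spaces are not handled, and the offsets are the j721e values shown above.

```shell
# print_flash_cmds UNIFLASH_DIR PDK_PATH MEM_CONF CORE [TTY]
# Dry run: prints the non-XIP flash commands instead of executing them.
print_flash_cmds() {
    uniflash=$1
    pdk=$2
    mem_conf=$3
    core=$4
    tty=${5:-/dev/ttyUSB1}
    dslite="sudo $uniflash/dslite.sh --mode processors -c $tty"
    app="${mem_conf}_memory_benchmarking_app"
    # First command: load the UART flash programmer.
    echo "$dslite -f $uniflash/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0"
    # Remaining commands: one "file offset" pair per line, relative to packages/ti.
    while read -r file offset; do
        echo "$dslite -f $pdk/packages/ti/$file -d 3 -o $offset"
    done <<EOF
boot/sbl/binary/j721e_evm/cust/bin/sbl_cust_img_mcu1_0_release.tiimage 0
drv/sciclient/soc/V1/tifs.bin 80000
binary/${app}_freertos/bin/j721e_evm/${app}_${core}_freertos_release.appimage 100000
board/src/flash/nor/ospi/nor_spi_patterns.bin 3FE0000
EOF
}

print_flash_cmds "/opt/uniflash" "/opt/pdk" ocmc mcu1_0
```

The example paths `/opt/uniflash` and `/opt/pdk` are placeholders for `<uniflash_directory>` and `<pdk_path>`.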
7.3.4.1.2. Single Core XIP
- To flash single core XIP:
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/cust/bin/sbl_cust_img_mcu1_0_release.tiimage -d 3 -o 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage -d 3 -o 100000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage_xip -d 3
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
7.3.4.1.3. Multicore XIP
- To flash multicore XIP (be sure to see the build instructions above for getting xip_multicore.appimage and xip_multicore.appimage_xip):
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/cust/bin/sbl_cust_img_mcu1_0_release.tiimage -d 3 -o 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/build/xip_multicore.appimage -d 3 -o 100000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/build/xip_multicore.appimage_xip -d 3
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
- For mcu1_0: Attach a USB cable to UART Terminal 1 of the MCU UART port (sudo minicom -D /dev/ttyUSB1) to see the output of the application
- For mcu2_0: Attach a USB cable to UART Terminal 0 of the Main UART port (sudo minicom -D /dev/ttyUSB0) to see the output of the application
- Power on the EVM in OSPI boot mode
7.3.5. Additional Notes
- Again, XIP cannot run via CCS due to the location in memory where the application code sits.
- When building the sbl_cust_img, you can get more verbose output by changing the “-DSBL_LOG_LEVEL” flag in CUST_SBL_TEST_FLAGS in /packages/ti/boot/sbl/sbl_component.mk from 1 to 3. However, this substantially increases the cache miss rate and degrades performance, so use it only for debugging and not for actual performance benchmarking.
- A “Mem Cpy size” of 0 means that no memcpy occurred and the application test was strictly instruction-based.
- When building different memory configurations, always do a clean build. Some consecutive builds will work, but others will not.
- Problems have been noted when running at 166 MHz with a chip select delay of 2. Symptoms include junk characters on the UART output or the test stopping after executing only a few of the tasks. This is intermittent and unlikely, but possible. If you see this issue, change the clock speed to 133 MHz or the CSDA value to 3 in the sbl_ospi.c file.
- The offset for flashing nor_spi_patterns.bin is SOC dependent. Please check the SOC-specific documentation for the correct offset.
- ttyUSB1 is used as the MCU UART terminal and ttyUSB0 as the Main UART terminal. This might not always be the case, so check the device names after connecting to the UART.