7.3. Building and Running Memory Benchmarking Configurations¶
7.3.1. Demo Overview¶
- This demo measures the performance of a realistic application whose text (code) is placed in various memory locations while its data resides in On-Chip-Memory RAM (referred to as OCM, OCMC or OCMRAM).
- The application runs 10 configurations of the same code, which vary in how data-intensive versus instruction-cache-intensive they are. Each test makes a total of 500 calls, in random order, to 16 separate functions.
- The most instruction-intensive configuration achieves an instruction cache miss rate (ICM/sec) of ~3-4 million per second when run entirely from OCMRAM. This rate is similar to what we have seen in real-world customer applications.
- More data-intensive tests have more repetitive code and therefore achieve much lower ICM rates.
Application Output | Description |
---|---|
Mem Cpy size => 100 | Size of the memcpy in bytes executed by each task |
Exec Time in usec => 2567 | Amount of time in microseconds |
Iter => 1 | Number of times the test was run |
Task calls => 500 | Number of randomly ordered calls to the 16 tasks |
Inst Cache miss => 11421 | Total instruction cache misses |
Inst Cache acc => 650207 | Total instruction cache accesses |
num switches => 1469 | Number of total context switches |
num instr exec => 1029260 | Total number of executed instructions |
ICM/sec => 4449162 | Instruction cache misses per second |
INST/sec => 400958317 | Instructions executed per second |
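The two rate fields at the end of the output are consistent with dividing the raw counters by the execution time. A minimal sketch of that arithmetic, using the sample values from the table above (the helper name `rate_per_sec` is illustrative and not part of the demo):

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the PDK): events per second from a raw
# event count and an execution time in microseconds, using integer math.
rate_per_sec() {
    local count=$1 exec_usec=$2
    echo $(( count * 1000000 / exec_usec ))
}

# Sample values taken from the application output table.
rate_per_sec 11421 2567      # ICM/sec  -> 4449162
rate_per_sec 1029260 2567    # INST/sec -> 400958317
```

Integer division truncates the result, which matches the sample values shown in the table.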
7.3.2. Supported Configurations¶
Core | SOC | Supported Memory Configs (MEM_CONF) |
---|---|---|
mcu1_0 | j721e | ocmc msmc ddr xip |
mcu2_0 | j721e | ocmc msmc ddr xip |
mcu1_0 + mcu2_0 | j721e | ddr xip |
mcu1_0 | j7200 | ocmc msmc ddr xip |
mcu2_0 | j7200 | ocmc msmc ddr xip |
mcu1_0 + mcu2_0 | j7200 | ddr xip |
7.3.3. How to Build¶
- Go to the build folder, namely, /packages/ti/build/
7.3.3.1. Basic Single Core Benchmarking¶
- Build the bootloader
make sbl_lib_ospi_clean SOC=j721e BOARD=j721e_evm CORE=mcu1_0 -sj
make sbl_ospi_img SOC=j721e BOARD=j721e_evm CORE=mcu1_0 -sj
- Build the application (XIP and non-XIP):
make [MEM_CONF]_memory_benchmarking_app_freertos CORE=[CORE] SOC=j721e BOARD=j721e_evm -sj
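The `[MEM_CONF]` and `[CORE]` placeholders take the values listed in the Supported Configurations table. As a sketch, the loop below prints (rather than runs) the single-core build command for every j721e combination. The function name `print_single_core_builds` is hypothetical, and the printed commands are meant to be run from the build folder.

```shell
#!/usr/bin/env bash
# Illustrative dry run: print one build command per supported single-core
# configuration on j721e. Nothing is built; the commands are only echoed.
print_single_core_builds() {
    local core mem_conf
    for core in mcu1_0 mcu2_0; do
        for mem_conf in ocmc msmc ddr xip; do
            echo "make ${mem_conf}_memory_benchmarking_app_freertos CORE=${core} SOC=j721e BOARD=j721e_evm -sj"
        done
    done
}

print_single_core_builds
```

Printing the commands keeps the sketch free of side effects, so each one can be reviewed and run individually.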
7.3.3.2. Multicore Benchmarking¶
- If building for XIP, edit /boot/sbl/src/ospi/sbl_ospi.c - be sure to “#define BUILD_XIP” and comment out any line that may say “#undef BUILD_XIP”.
- Run the following commands:
make sbl_ospi_img SOC=j721e BOARD=j721e_evm -sj
make [MEM_CONF]_memory_benchmarking_app_freertos CORE=mcu1_0 SOC=j721e BOARD=j721e_evm MULTICORE=1 -sj
make [MEM_CONF]_memory_benchmarking_app_freertos CORE=mcu2_0 SOC=j721e BOARD=j721e_evm MULTICORE=1 -sj
../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 [MEM_CONF]_multicore.appimage 4 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_freertos_mcu1_0_release.rprc 6 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_freertos_mcu2_0_release.rprc
- The appimage to flash (the fourth of the flashing commands) should now be the [MEM_CONF]_multicore.appimage file in your build directory.
- If building for XIP, run the following command as well.
../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55 xip_multicore.appimage_xip 4 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/xip_mcu1_0_memory_benchmarking_app_freertos_release.rprc_xip 6 ../binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/xip_mcu2_0_memory_benchmarking_app_freertos_release.rprc_xip
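In the MulticoreImageGen commands above, the arguments after the output file name come in pairs: a numeric core ID followed by that core's `.rprc` file. The pairing of 4 with mcu1_0 and 6 with mcu2_0 is read off those commands. The sketch below assembles the non-XIP command for a given memory configuration and prints it; `multicore_image_cmd` is a hypothetical helper, not part of the PDK.

```shell
#!/usr/bin/env bash
# Illustrative helper: build the non-XIP MulticoreImageGen command line for
# one memory configuration (run from the build folder). It only prints it.
multicore_image_cmd() {
    local mem_conf=$1
    local app="${mem_conf}_memory_benchmarking_app_freertos"
    local bin_dir="../binary/${app}/bin/j721e_evm"
    # echo joins its arguments with single spaces.
    echo "../boot/sbl/tools/multicoreImageGen/bin/MulticoreImageGen LE 55" \
         "${mem_conf}_multicore.appimage" \
         "4 ${bin_dir}/${app}_mcu1_0_release.rprc" \
         "6 ${bin_dir}/${app}_mcu2_0_release.rprc"
}

multicore_image_cmd ddr
```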
7.3.4. How to Run (Linux Host Machine Assumed)¶
7.3.4.1. Via OSPI¶
- Build the ospi sbl image
- Put the EVM in UART boot mode
- Download and install the uniflash tool
7.3.4.1.1. Non XIP Use Cases¶
If using a multicore appimage, replace the appimage in flashing step 4 (the fourth command below) with the appimage that was created in the multicore build step.
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V1/tifs.bin -d 3 -o 80000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage -d 3 -o 100000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
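The five flashing commands above differ only in the file being written and the flash offset. The sketch below lists the non-XIP sequence as file/offset pairs and prints the resulting commands. `UNIFLASH_DIR`, `PDK_PATH`, their default values and `print_flash_cmds` are placeholder names introduced here; the appimage path is an argument so that a multicore appimage can be substituted.

```shell
#!/usr/bin/env bash
# Placeholder install locations; override them in the environment.
UNIFLASH_DIR=${UNIFLASH_DIR:-/opt/uniflash}
PDK_PATH=${PDK_PATH:-/opt/pdk}

# Illustrative dry run: print the non-XIP flashing sequence for one appimage.
# Assumes none of the paths contain spaces.
print_flash_cmds() {
    local appimage=$1 entry
    local dslite="sudo ${UNIFLASH_DIR}/dslite.sh --mode processors -c /dev/ttyUSB1"
    # Step 1: load the UART flash programmer.
    echo "${dslite} -f ${UNIFLASH_DIR}/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0"
    # Steps 2-5: "file offset" pairs, offsets as in the commands above.
    for entry in \
        "${PDK_PATH}/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage 0" \
        "${PDK_PATH}/packages/ti/drv/sciclient/soc/V1/tifs.bin 80000" \
        "${appimage} 100000" \
        "${PDK_PATH}/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin 3FE0000"; do
        # ${entry% *} is the file, ${entry##* } is the offset.
        echo "${dslite} -f ${entry% *} -d 3 -o ${entry##* }"
    done
}

print_flash_cmds ddr_multicore.appimage
```

Keeping the offsets in one list makes it easier to spot a mismatch between the image being flashed and the address it is written to.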
7.3.4.1.2. Single Core XIP¶
- To flash single core XIP:
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V2/tifs.bin -d 3 -o 80000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage -d 3 -o 100000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/binary/[MEM_CONF]_memory_benchmarking_app_freertos/bin/j721e_evm/[MEM_CONF]_memory_benchmarking_app_[CORE]_freertos_release.appimage_xip -d 3
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
7.3.4.1.3. Multicore XIP¶
- To flash multicore XIP (be sure to see the build instructions above for getting xip_multicore.appimage and xip_multicore.appimage_xip):
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <uniflash_directory>/processors/FlashWriter/j721e_evm/uart_j721e_evm_flash_programmer_release.tiimage -i 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/boot/sbl/binary/j721e_evm/ospi/bin/sbl_ospi_img_mcu1_0_release.tiimage -d 3 -o 0
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/drv/sciclient/soc/V2/tifs.bin -d 3 -o 80000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/build/xip_multicore.appimage -d 3 -o 100000
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/build/xip_multicore.appimage_xip -d 3
sudo <uniflash_directory>/dslite.sh --mode processors -c /dev/ttyUSB1 -f <pdk_path>/packages/ti/board/src/flash/nor/ospi/nor_spi_patterns.bin -d 3 -o 3FE0000
- For mcu1_0: Attach USB cable to UART Terminal 1 of the MCU UART port (sudo minicom -D /dev/ttyUSB1) to see the output of the application
- For mcu2_0: Attach USB cable to UART Terminal 0 of the Main UART port (sudo minicom -D /dev/ttyUSB0) to see the output of the application
- Power on the EVM in OSPI boot mode
7.3.5. Additional Notes¶
- Again, XIP cannot run via CCS due to the location in memory where the application code sits.
- When building the sbl_cust_img, if you would like to see more verbose output, you can change the “-DSBL_LOG_LEVEL” flag in CUST_SBL_TEST_FLAGS in /packages/ti/boot/sbl/sbl_component.mk from 1 to 3. However, this substantially increases the cache miss rate and degrades performance, so use it only for debugging and not for actual performance benchmarking.
- A “Mem Cpy size” of 0 means that no memcpy occurred and the application test was strictly instruction-based.
- When switching between memory configurations, always do a clean build. Some consecutive builds will work, but others will not, so building cleanly is the safe option.
- Problems have been noted when running at 166 MHz with a chip select delay of 2. Symptoms can be junk characters in the UART output, or the test stopping after executing only a few of the tasks. The issue is intermittent and unlikely, but possible. If you see it, change the clock speed to 133 MHz or the CSDA value to 3 in the sbl_ospi.c file.
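Because a raised SBL log level skews the results, it can be worth checking the flag before a benchmarking run. The sketch below assumes the flag appears in the makefile in the form `-DSBL_LOG_LEVEL=<n>`; the exact spelling in sbl_component.mk is not shown in this section, so treat the pattern as an assumption. `sbl_log_level` and `PDK_PATH` are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustrative helper: print the first SBL_LOG_LEVEL value found in a file.
# Assumes the flag is written as -DSBL_LOG_LEVEL=<n>.
sbl_log_level() {
    grep -o -e '-DSBL_LOG_LEVEL=[0-9]*' "$1" | head -n 1 | cut -d= -f2
}

# Warn if the makefile exists and the level has been raised above 1.
mk_file="${PDK_PATH:-.}/packages/ti/boot/sbl/sbl_component.mk"
if [ -f "$mk_file" ]; then
    level=$(sbl_log_level "$mk_file")
    if [ "${level:-1}" -gt 1 ]; then
        echo "warning: SBL_LOG_LEVEL=${level}; benchmark results will not be representative" >&2
    fi
fi
```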