1. J721E Datasheet
1.1. Introduction
This section provides the performance numbers of the device drivers supported in the PDK.
1.1.1. Setup Details
SOC Details | Values |
---|---|
Core | R5F |
Core Operating Speed | 1GHz |
DDR Speed | 4266 MT/s |
VPAC Frequency | 650 MHz |
DMPAC Frequency | 520 MHz |
Cache status | Enabled |
Optimization Details | Values |
---|---|
Profile | Release |
Compile Options for R5F | -g -ms -DMAKEFILE_BUILD -c -qq -pdsw225 --endian=little -mv7R5 --abi=eabi -eo.oer5f -ea.ser5f --symdebug:dwarf --embed_inline_assembly --float_support=vfpv3d16 --emit_warnings_as_errors |
Linker Options for R5F | --emit_warnings_as_errors -w -q -u _c_int00 -c -mv7R5 --diag_suppress=10063 -x --zero_init=on |
Code Placement | DDR |
Data Placement | DDR |
1.1.2. Software Performance Numbers
1.1.2.1. VHWA
VHWA Driver | Configuration | Measured Throughput (MPix/s) |
---|---|---|
DOF | 2MP (2048x1024), 12b Packed, 6 Levels, SR191x96 | 158.65 |
DOF | 1MP (1312x736), 12b Packed, 5 Levels, SR170x124 | 150.93 |
MSC | 1080P, 8b YUV420, 10 Scales output | 609.46 |
NF | 720P, 8b YUV420, Bilateral filter | 615.54 |
SDE | 2MP (2048x1024), 12b Packed, SR 192, LR Enabled | 84.22 |
SDE | 720P, 12b Packed, SR 192, LR Disabled | 98.92 |
LDC | 1080P, 8b YUV420, Single region | 632.98 |
VISS | 1080P, Raw 12 input, 2 frame Merge, YUV420 12b and 8b output | 603.25 |
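To relate a throughput figure to a frame rate, the MPix/s value can be divided by the frame size. The sketch below assumes that 1 MPix means 10^6 pixels and that the throughput is sustained over whole frames; the function name is illustrative and not part of the PDK.

```python
def frames_per_second(throughput_mpix_s: float, width: int, height: int) -> float:
    """Convert a pixel throughput in MPix/s into frames per second."""
    pixels_per_frame = width * height
    # Assumption: 1 MPix = 1,000,000 pixels.
    return throughput_mpix_s * 1_000_000 / pixels_per_frame


# DOF row from the table above: 2048x1024 frames at 158.65 MPix/s.
dof_fps = frames_per_second(158.65, 2048, 1024)
print(f"DOF 2MP: {dof_fps:.2f} frames/s")
```

Under these assumptions the 2MP DOF configuration corresponds to roughly 75 frames per second.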
1.1.2.2. DSS
Display Type | Configuration | CPU Load |
---|---|---|
HDMI | 1080P60 RGB888 | 1.0% (MCU2_0) |
DP | 1080P60 BGRA32 | 1.0% (MCU2_0) |
1.1.2.3. CSI-Rx
Capture Type | Configuration | CPU Load |
---|---|---|
CSI2Rx Inst 0 | 4CH 1080P30 IMX390 Sensor Raw12 | 1.2% (MCU2_0) |
1.1.2.4. CPSW
1.1.2.4.1. TCP (IP stack) Performance
1.1.2.4.1.1. CPSW9G - Main domain R5_0 core 0 (mcu2_0)
- Main domain R5_0 at 1GHz
- RGMII interface at 1Gbps
- TCP window size: 128 KByte
Single Direction Test
Test | Measured Throughput (Mbps) | CPU Load (%) |
---|---|---|
TCP RX | 92.9 | 100 |
TCP TX | 73.8 | 100 |
Bidirectional Test
Test | Measured Throughput (Mbps) | CPU Load (%) |
---|---|---|
TCP RX | 35.0 | 100 |
TCP TX | 45.2 | 100 |
1.1.2.4.1.2. CPSW2G - MCU domain R5 core 0 (mcu1_0)
- MCU domain R5_0 at 1GHz
- RGMII interface at 1Gbps
- TCP window size: 128 KByte
Single Direction Test
Test | Measured Throughput (Mbps) | CPU Load (%) |
---|---|---|
TCP RX | 93.4 | 100 |
TCP TX | 70.6 | 100 |
Bidirectional Test
Test | Measured Throughput (Mbps) | CPU Load (%) |
---|---|---|
TCP RX | 33.5 | 100 |
TCP TX | 44.3 | 100 |
1.1.2.4.2. UDP (IP stack) Performance
1.1.2.4.2.1. CPSW9G - Main domain R5_0 core 0 (mcu2_0)
- Main domain R5_0 at 1GHz
- RGMII interface at 1Gbps
Single Direction Test
Test | Measured Throughput (Mbps) | CPU Load (%) | Drop (%) |
---|---|---|---|
UDP RX | 163.0 | 92 | 0.002 |
UDP TX | 128.0 | 100 | 0.000 |
1.1.2.4.2.2. CPSW2G - MCU domain R5 core 0 (mcu1_0)
- MCU domain R5_0 at 1GHz
- RGMII interface at 1Gbps
Single Direction Test
Test | Measured Throughput (Mbps) | CPU Load (%) | Drop (%) |
---|---|---|---|
UDP RX | 163.0 | 93 | 0.002 |
UDP TX | 161.0 | 100 | 0.000 |
Note:
- The current performance numbers are preliminary, as throughput profiling was not done in a fully optimized environment.
1.1.2.5. UDMA
1.1.2.5.1. DMA Parameters
- Ring Order ID: 0
- Channel Order ID: 0
- Channel DMA Priority: 1
- Channel Bus Priority: 4
- Channel BUS QOS: 4
- Channel TX FIFO depth: 128
- Channel Fetch Word Size: 16
- Channel Burst Size: 64 bytes for normal channel, 128 bytes for HC and UHC channels
1.1.2.5.2. Test Parameters
- Type: TR15 Block copy
- TR: one TR per TRPD in PBR mode
- TR Memory: Same as buffer memory (DDR, MSMC or OCMC, depending on the test performed)
- Transfer Size: 1 MB read and 1 MB write
- Here 1 MB means 1000x1000 bytes and 1 KB means 1000 bytes
Note: The throughput numbers reported are the combined memory throughput of the read and write operations.
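The note on combined throughput can be read as counting each copied byte twice, once for the read and once for the write. The sketch below works under that assumption and uses the decimal megabyte defined above; the elapsed time in the example is a made-up value for illustration, not a measurement.

```python
BYTES_PER_MB = 1000 * 1000  # this section defines 1 MB as 1000x1000 bytes


def combined_throughput_mb_s(bytes_copied: int, elapsed_us: float) -> float:
    """Combined read + write throughput in MB/s for one block copy."""
    bytes_moved = 2 * bytes_copied  # each byte is read once and written once
    # Bytes per microsecond equals MB/s when 1 MB = 10^6 bytes.
    return bytes_moved / elapsed_us


# Hypothetical example: a 1 MB copy that completes in 200 us.
example_mb_s = combined_throughput_mb_s(1 * BYTES_PER_MB, elapsed_us=200.0)
print(f"{example_mb_s:.0f} MB/s combined")
```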
1.1.2.5.3. DRU Blockcopy
DRU channel performance with TR submitted through ring
Test Description | Throughput (MCU2) | CPU Load (MCU2) | Throughput (C66x_1/2) | CPU Load (C66x_1/2) |
---|---|---|---|---|
[PDK-3501] 1CH DDR 1MB to DDR 1MB | 11262 MB/sec | 100% | 12011 MB/sec | 100% |
[PDK-3502] 1CH MSMC 1KB Circular to DDR 1MB | 17757 MB/sec | 100% | 18657 MB/sec | 100% |
[PDK-3503] 1CH DDR 1MB to MSMC circular 1KB | 20580 MB/sec | 100% | 22844 MB/sec | 100% |
[PDK-3504] 1CH MSMC 1KB to MSMC circular 1KB (1MB per TR) | 27377 MB/sec | 100% | 29413 MB/sec | 100% |
[PDK-3505] Multi CH DDR 1MB to DDR 1MB | 12446 MB/sec (2CH) | 100% | 12679 MB/sec (4CH) | 100% |
[PDK-3506] Multi CH MSMC 1KB to MSMC circular 1KB (1 MB per TR) | 33581 MB/sec (2CH) | 100% | 33554 MB/sec (4CH) | 100% |
1.1.2.6. IPC
1.1.2.6.1. Test Set-up
Release build binaries are used for measurement
Ring Buffer : Uncached DDR
Buffer to be sent (RPMSG) – Cached DDR
C66x - L2 Cache 128K
C7x - L2 Cache 128K
Software/Application Used : ipc_multicore_perf_test loaded through SBL. Output is printed to UART.
R5F/MPU config : DDR config
- bufferable - 1
- cacheable - 1
- shareable - 0
Round-trip time is captured in microseconds (us) for different data sizes.
1.1.2.6.2. Performance - Host Core A72, Bios, 2 GHz
Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
---|---|---|---|---|---|---|---|
MCU R5F0 | 20 | 20 | 22 | 25 | 32 | 44 | 70 |
Main R5F0 | 18 | 19 | 20 | 24 | 29 | 41 | 65 |
C66x1 | 17 | 16 | 17 | 16 | 18 | 20 | 25 |
C7x | 20 | 20 | 20 | 20 | 23 | 24 | 25 |
1.1.2.6.3. Performance - Host Core MCU R5F0, 1 GHz
Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
---|---|---|---|---|---|---|---|
A72 (Bios) | 21 | 21 | 23 | 26 | 32 | 43 | 68 |
Main R5F0 | 17 | 18 | 19 | 22 | 28 | 39 | 65 |
C66x1 | 17 | 17 | 19 | 22 | 28 | 40 | 64 |
C7x | 18 | 18 | 20 | 23 | 29 | 40 | 66 |
1.1.2.6.4. Performance - Host Core MAIN R5F0, 1 GHz
Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
---|---|---|---|---|---|---|---|
A72 (Bios) | 17 | 17 | 18 | 21 | 26 | 37 | 59 |
MCU R5F0 | 16 | 15 | 17 | 20 | 25 | 35 | 58 |
Main R5F1 | 16 | 16 | 17 | 21 | 26 | 36 | 59 |
C66x1 | 16 | 15 | 17 | 20 | 25 | 36 | 58 |
C7x | 16 | 16 | 17 | 20 | 25 | 36 | 58 |
1.1.2.6.5. Performance - Host Core C66X1, 1.35 GHz
Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
---|---|---|---|---|---|---|---|
A72 (Bios) | 19 | 18 | 18 | 18 | 18 | 22 | 26 |
MCU R5F0 | 26 | 26 | 28 | 30 | 37 | 52 | 81 |
Main R5F0 | 25 | 25 | 27 | 29 | 35 | 48 | 75 |
C66x2 | 23 | 22 | 22 | 21 | 23 | 28 | 35 |
C7x | 30 | 29 | 29 | 28 | 31 | 34 | 37 |
1.1.2.6.6. Performance - Host Core C7x, 1 GHz
Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
---|---|---|---|---|---|---|---|
A72 (Bios) | 21 | 21 | 21 | 21 | 24 | 23 | 25 |
MCU R5F0 | 32 | 32 | 34 | 37 | 45 | 55 | 82 |
Main R5F0 | 28 | 29 | 30 | 34 | 42 | 51 | 75 |
C66x1 | 29 | 28 | 28 | 27 | 20 | 31 | 36 |
1.1.2.7. OSPI
1.1.2.7.1. OSPI Memory Non Cached Test Set-up
- Platform: J721e EVM.
- OS Type: Baremetal/Sysbios
- Core : R5F_0 at 1 GHz, A72_0 at 2 GHz.
- Software/Application Used: OSPI_Flash_TestApp/OSPI_Flash_Dma_TestApp/OSPI_Baremetal_Flash_TestApp/OSPI_Baremetal_Flash_Dma_TestApp
- System Configuration: Cache OFF, Read/Write Buffer in DDR. DMA Enabled/Disabled, Interrupts ON.
1.1.2.7.2. OSPI Read/Write Performance (DDR Octal Mode)
OSPI RCLK | OS | CPU | Mode | Write Tput (MB/s) | Write CPU Load | Read Tput (MB/s) | Read CPU Load |
---|---|---|---|---|---|---|---|
133 MHz | Baremetal | R5F_0 | DAC | 0.209 | 100% | 7.125 | 100% |
133 MHz | Baremetal | R5F_0 | DAC DMA | 1.501 | 100% | 218 | 100% |
133 MHz | Baremetal | R5F_0 | INDAC | 1.493 | 100% | 8.25 | 100% |
133 MHz | Baremetal | A72_0 | DAC | 0.075 | 100% | 5.625 | 100% |
133 MHz | Baremetal | A72_0 | DAC DMA | 1.501 | 100% | 208.875 | 100% |
133 MHz | Baremetal | A72_0 | INDAC | 1.505 | 100% | 8.25 | 100% |
133 MHz | RTOS | R5F_0 | DAC | 0.209 | 100% | 7.125 | 100% |
133 MHz | RTOS | R5F_0 | DAC DMA | 1.501 | 0% | 217.625 | 19% |
133 MHz | RTOS | R5F_0 | INDAC | 1.498 | 8% | 8.25 | 59% |
133 MHz | RTOS | A72_0 | DAC | 0.075 | 100% | 5.625 | 100% |
133 MHz | RTOS | A72_0 | DAC DMA | 1.501 | 0% | 209 | 22% |
133 MHz | RTOS | A72_0 | INDAC | 1.499 | 4% | 8.25 | 97% |
166 MHz | Baremetal | R5F_0 | DAC | 0.238 | 100% | 8 | 100% |
166 MHz | Baremetal | R5F_0 | DAC DMA | 1.581 | 100% | 197 | 100% |
166 MHz | Baremetal | R5F_0 | INDAC | 1.577 | 100% | 10.375 | 100% |
166 MHz | Baremetal | A72_0 | DAC | 0.078 | 100% | 4.875 | 100% |
166 MHz | Baremetal | A72_0 | DAC DMA | 1.581 | 100% | 171.75 | 100% |
166 MHz | Baremetal | A72_0 | INDAC | 1.586 | 100% | 10.375 | 100% |
166 MHz | RTOS | R5F_0 | DAC | 0.237 | 100% | 8 | 100% |
166 MHz | RTOS | R5F_0 | DAC DMA | 1.581 | 0% | 197.375 | 41% |
166 MHz | RTOS | R5F_0 | INDAC | 1.583 | 8% | 10.375 | 68% |
166 MHz | RTOS | A72_0 | DAC | 0.078 | 100% | 6.25 | 100% |
166 MHz | RTOS | A72_0 | DAC DMA | 1.581 | 0% | 173.375 | 48% |
166 MHz | RTOS | A72_0 | INDAC | 1.584 | 5% | 10.375 | 100% |
1.1.2.7.3. OSPI Memory Cached Test Set-up
- Platform: J721e EVM.
- OS Type: Baremetal/Sysbios
- Core : R5F_0 at 1 GHz, A72_0 at 2 GHz.
- Software/Application Used: OSPI_Flash_Cache_TestApp/OSPI_Flash_Dma_Cache_TestApp/OSPI_Baremetal_Flash_Cache_TestApp/OSPI_Baremetal_Flash_Dma_Cache_TestApp
- System Configuration: Cache ON, Read/Write Buffer in DDR. DMA Enabled/Disabled, Interrupts ON.
1.1.2.7.4. OSPI Read/Write Performance (DDR Octal Mode)
OSPI RCLK | OS | CPU | Mode | Write Tput (MB/s) | Write CPU Load | Read Tput (MB/s) | Read CPU Load |
---|---|---|---|---|---|---|---|
133 MHz | Baremetal | R5F_0 | DAC | 0.299 | 100% | 45 | 100% |
133 MHz | Baremetal | R5F_0 | DAC DMA | 1.501 | 100% | 218.25 | 100% |
133 MHz | Baremetal | R5F_0 | INDAC | 1.491 | 100% | 8.25 | 100% |
133 MHz | Baremetal | A72_0 | DAC | | 100% | | 100% |
133 MHz | Baremetal | A72_0 | DAC DMA | | 100% | | 100% |
133 MHz | Baremetal | A72_0 | INDAC | | 100% | | 100% |
133 MHz | RTOS | R5F_0 | DAC | 0.301 | 100% | 45.125 | 100% |
133 MHz | RTOS | R5F_0 | DAC DMA | 1.501 | 0% | 218.75 | 18% |
133 MHz | RTOS | R5F_0 | INDAC | 1.498 | 8% | 8.25 | 59% |
133 MHz | RTOS | A72_0 | DAC | 0.074 | 100% | 5.5 | 100% |
133 MHz | RTOS | A72_0 | DAC DMA | 0.898 | 100% | 209 | 22% |
133 MHz | RTOS | A72_0 | INDAC | 1.497 | 4% | 8.25 | 98% |
166 MHz | Baremetal | R5F_0 | DAC | 0.336 | 100% | 51.25 | 100% |
166 MHz | Baremetal | R5F_0 | DAC DMA | 1.581 | 100% | 196.625 | 100% |
166 MHz | Baremetal | R5F_0 | INDAC | 1.576 | 100% | 10.375 | 100% |
166 MHz | Baremetal | A72_0 | DAC | | 100% | | 100% |
166 MHz | Baremetal | A72_0 | DAC DMA | | 100% | | 100% |
166 MHz | Baremetal | A72_0 | INDAC | | 100% | | 100% |
166 MHz | RTOS | R5F_0 | DAC | 0.337 | 100% | 51.625 | 100% |
166 MHz | RTOS | R5F_0 | DAC DMA | 1.581 | 0% | 198 | 41% |
166 MHz | RTOS | R5F_0 | INDAC | 1.583 | 8% | 10.375 | 68% |
166 MHz | RTOS | A72_0 | DAC | 0.077 | 100% | 6.125 | 100% |
166 MHz | RTOS | A72_0 | DAC DMA | 0.978 | 0% | 171.875 | 48% |
166 MHz | RTOS | A72_0 | INDAC | 1.583 | 5% | 10.375 | 100% |
1.1.2.8. MMCSD
1.1.2.8.1. Test Set-up
- Platform: J721e EVM.
- OS Type: Sysbios
- Core : A72_0, 2 GHz.
- Software/Application Used: MMCSD_<EMMC>_Regression_TestApp (A menu based application which outputs the benchmark numbers on UART)
- System Configuration: Cache ON, Read/Write Buffer in DDR. ADMA enabled, Interrupts ON.
- SD Card used: Sandisk 16GB, Class 10. FAT32 formatted with allocation size = 4K (for optimal FAT32 throughput & compatibility with various cards)
- EMMC: EMMC on J721E EVM. Please refer to the EVM data sheet for details.
1.1.2.8.2. SD Card Performance
1.1.2.8.2.1. DS Mode (25 MHz, 4-bit) Theoretical Max: 12.5 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) | FATFS Write Throughput (MB/s) | FATFS Read Throughput (MB/s) |
---|---|---|---|---|
256 | 9.1059 | 9.4340 | 4.1804 | 7.5307 |
512 | 9.8377 | 10.4257 | 4.5550 | 8.0084 |
1024 | 10.0432 | 10.7388 | 4.9630 | 8.2052 |
2048 | 10.4119 | 10.9066 | 5.8666 | 8.0361 |
5120 | 10.0376 | 10.9829 | 4.7683 | 8.3273 |
1.1.2.8.2.2. HS Mode (50 MHz, 4-bit) Theoretical Max: 25 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) | FATFS Write Throughput (MB/s) | FATFS Read Throughput (MB/s) |
---|---|---|---|---|
256 | 15.9483 | 16.4356 | 4.3909 | 11.8113 |
512 | 18.5548 | 19.6683 | 6.2893 | 12.6380 |
1024 | 19.9566 | 20.8116 | 6.5560 | 13.1697 |
2048 | 19.9830 | 21.4463 | 6.5847 | 13.4176 |
5120 | 20.0178 | 21.8337 | 6.2207 | 13.4776 |
1.1.2.8.2.3. SDR12 Mode (25 MHz, 4-bit) Theoretical Max: 12.5 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) | FATFS Write Throughput (MB/s) | FATFS Read Throughput (MB/s) |
---|---|---|---|---|
256 | 9.0146 | 9.4187 | 4.2206 | 7.4148 |
512 | 9.7703 | 10.4165 | 4.9643 | 8.0081 |
1024 | 10.0714 | 10.7345 | 4.7311 | 8.2015 |
2048 | 9.6667 | 10.8930 | 5.0503 | 8.3087 |
5120 | 10.0025 | 11.0095 | 4.8343 | 8.3287 |
1.1.2.8.2.4. SDR25 Mode (50 MHz, 4-bit) Theoretical Max: 25 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) | FATFS Write Throughput (MB/s) | FATFS Read Throughput (MB/s) |
---|---|---|---|---|
256 | 16.2732 | 16.4143 | 5.6652 | 11.2796 |
512 | 18.3847 | 19.6669 | 6.3413 | 12.6358 |
1024 | 19.0623 | 20.8100 | 6.5959 | 13.1657 |
2048 | 17.4704 | 21.3765 | 6.3836 | 13.4073 |
5120 | 19.6133 | 21.8508 | 6.0397 | 12.5147 |
1.1.2.8.2.5. SDR50 Mode (100 MHz, 4-bit) Theoretical Max: 50 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) | FATFS Write Throughput (MB/s) | FATFS Read Throughput (MB/s) |
---|---|---|---|---|
256 | 24.6037 | 26.1130 | 4.5208 | 7.6322 |
512 | 29.9576 | 35.3214 | 4.9401 | 7.9848 |
1024 | 32.6505 | 39.1811 | 4.9564 | 8.1912 |
2048 | 30.3629 | 41.3373 | 4.9362 | 8.2954 |
5120 | 34.7683 | 43.0374 | 4.8785 | 8.3285 |
1.1.2.8.2.6. DDR50 Mode (50 MHz, 4-bit) Theoretical Max: 50 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) | FATFS Write Throughput (MB/s) | FATFS Read Throughput (MB/s) |
---|---|---|---|---|
256 | 23.4774 | 25.6365 | 4.2197 | 7.5511 |
512 | 26.2276 | 34.4773 | 4.4524 | 7.9936 |
1024 | 34.0707 | 38.1547 | 4.9994 | 8.2083 |
2048 | 29.2400 | 40.1979 | 5.0277 | 8.3036 |
5120 | 32.5992 | 41.6822 | 4.8337 | 8.3316 |
1.1.2.8.3. EMMC Performance
1.1.2.8.3.1. DS Mode (25 MHz, 8-bit) Theoretical Max: 25 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
---|---|---|
256 | 15.9600 | 18.5776 |
512 | 18.1068 | 20.1941 |
1024 | 19.4310 | 21.1389 |
2048 | 20.1785 | 21.6574 |
5120 | 20.6573 | 21.9851 |
1.1.2.8.3.2. HS-SDR Mode (50 MHz, 8-bit) Theoretical Max: 50 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
---|---|---|
256 | 25.6862 | 31.8970 |
512 | 31.7678 | 36.9522 |
1024 | 36.0882 | 40.2272 |
2048 | 38.7699 | 42.1508 |
5120 | 39.6647 | 43.3818 |
1.1.2.8.3.3. HS-DDR Mode (50 MHz, 8-bit) Theoretical Max: 100 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
---|---|---|
256 | 34.8107 | 47.9176 |
512 | 41.8965 | 60.3240 |
1024 | 48.6215 | 69.5793 |
2048 | 53.9672 | 75.5317 |
5120 | 56.1397 | 79.6654 |
1.1.2.8.3.4. HS-200 Mode (200 MHz, 8-bit) Theoretical Max: 200 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
---|---|---|
256 | 37.8881 | 68.9168 |
512 | 46.4331 | 97.8488 |
1024 | 50.7672 | 124.6944 |
2048 | 54.6804 | 145.1625 |
5120 | 55.0597 | 160.8638 |
1.1.2.8.3.5. HS-400 Mode (200 MHz, 8-bit) Theoretical Max: 400 MB/s
Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
---|---|---|
256 | 36.2206 | 84.0709 |
512 | 47.7269 | 130.8260 |
1024 | 51.6706 | 184.4708 |
2048 | 55.3375 | 203.5146 |
5120 | 56.7088 | 208.5778 |
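The theoretical maxima quoted in the mode headings can be reproduced from the bus clock, the bus width, and whether data is transferred on one or both clock edges. The sketch below is an illustration of that arithmetic only; the function name and the mode table are assumptions made for this example, not PDK definitions.

```python
def bus_peak_mb_s(clock_mhz: int, bus_bits: int, ddr: bool = False) -> float:
    """Peak bus throughput in MB/s: clock x bus width x edges per clock / 8."""
    edges_per_clock = 2 if ddr else 1
    return clock_mhz * bus_bits * edges_per_clock / 8


# (clock in MHz, bus width in bits, double data rate?) taken from the headings above.
modes = {
    "SD DS": (25, 4, False),
    "SD DDR50": (50, 4, True),
    "eMMC HS-SDR": (50, 8, False),
    "eMMC HS-DDR": (50, 8, True),
    "eMMC HS-200": (200, 8, False),
    "eMMC HS-400": (200, 8, True),
}

peaks = {name: bus_peak_mb_s(*cfg) for name, cfg in modes.items()}
for name, peak in peaks.items():
    print(f"{name}: {peak:g} MB/s")
```

Dividing a measured value by the corresponding peak gives the bus efficiency; for example, the 5120 KB HS-400 read figure of 208.5778 MB/s is a little over half of the 400 MB/s peak.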
1.1.2.9. CSL-FL based Optimized OSPI Example
1.1.2.9.1. CPU Mode - Test Set-up
- Platform: J721e EVM.
- OS Type: Baremetal
- Core: R5F_0 at 1 GHz
- Software/Application Used: csl_ospi_flash_app
- System Configuration:
  - RCLK 133/166 MHz
  - Cache ON
  - Buffers and critical functions in TCMB
  - DMA Disabled
  - Interrupts OFF
- Theoretical Max Throughput:
  - 133 MHz: 253.67 MB/s
  - 166 MHz: 316.62 MB/s
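The two theoretical maxima appear consistent with one byte transferred on each edge of RCLK (octal bus, dual data rate), expressed in units of 2^20 bytes. That reading is inferred from the numbers and is not stated by the source; the sketch below simply reproduces the arithmetic under that assumption.

```python
def ospi_ddr_octal_peak(rclk_hz: int) -> float:
    """Peak OSPI read rate for an 8-bit DDR bus, in units of 2**20 bytes per second."""
    bytes_per_second = rclk_hz * 2  # one byte per clock edge, two edges per clock
    return bytes_per_second / (1 << 20)


peak_133 = ospi_ddr_octal_peak(133_000_000)
peak_166 = ospi_ddr_octal_peak(166_000_000)
print(f"133 MHz: {peak_133:.2f}")
print(f"166 MHz: {peak_166:.2f}")
```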
1.1.2.9.2. DAC Mode OSPI Read Performance (Dual Data Rate - Octal Mode)
OSPI RCLK | Size of transfer (B) | Read Time (ns) | Throughput (MB/s) |
---|---|---|---|
133 MHz | 16 | 815 | 19.6 |
133 MHz | 32 | 1445 | 22.1 |
133 MHz | 64 | 2700 | 23.7 |
133 MHz | 128 | 5225 | 24.5 |
133 MHz | 256 | 10265 | 24.9 |
133 MHz | 512 | 20360 | 25.1 |
133 MHz | 1024 | 40510 | 25.3 |
166 MHz | 16 | 945 | 16.9 |
166 MHz | 32 | 2330 | 13.7 |
166 MHz | 64 | 4580 | 14.0 |
166 MHz | 128 | 9105 | 14.1 |
166 MHz | 256 | 18145 | 14.1 |
166 MHz | 512 | 36185 | 14.1 |
166 MHz | 1024 | 72295 | 14.2 |
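The throughput column follows from the transfer size and the read time. A minimal sketch of that conversion, assuming decimal megabytes (10^6 bytes), which is what reproduces the 133 MHz values in the table:

```python
def read_throughput_mb_s(size_bytes: int, read_time_ns: int) -> float:
    """Throughput in MB/s (1 MB = 10**6 bytes) for a read of size_bytes taking read_time_ns."""
    # bytes per nanosecond * 1000 = bytes per microsecond = MB/s
    return size_bytes / read_time_ns * 1000


# (size in bytes, read time in ns) for the smallest and largest 133 MHz DAC reads above.
smallest = round(read_throughput_mb_s(16, 815), 1)
largest = round(read_throughput_mb_s(1024, 40510), 1)
print(smallest, largest)
```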
1.1.2.9.3. DMA Mode - Test Set-up
- Platform: J721e EVM.
- OS Type: Baremetal
- Core: R5F_0 at 1 GHz
- Software/Application Used: udma_baremetal_ospi_flash_testapp
- System Configuration:
  - RCLK 133/166 MHz
  - Cache ON
  - Buffers and critical functions in TCMB
  - DMA Enabled - SW Trigger mode
  - Interrupts OFF
- Theoretical Max Throughput:
  - 133 MHz: 253.67 MB/s
  - 166 MHz: 316.62 MB/s
1.1.2.9.4. DAC DMA Mode OSPI Read Performance (Dual Data Rate - Octal Mode)
OSPI RCLK | Size of transfer (B) | Read Time (ns) | Throughput (MB/s) |
---|---|---|---|
133 MHz | 16 | 800 | 20 |
133 MHz | 32 | 805 | 39.8 |
133 MHz | 64 | 970 | 66 |
133 MHz | 128 | 1315 | 97.3 |
133 MHz | 256 | 1955 | 130.9 |
133 MHz | 512 | 3120 | 164.1 |
133 MHz | 1024 | 5450 | 187.9 |
166 MHz | 16 | 675 | 23.7 |
166 MHz | 32 | 805 | 39.8 |
166 MHz | 64 | 850 | 75.3 |
166 MHz | 128 | 1180 | 108.5 |
166 MHz | 256 | 1685 | 151.9 |
166 MHz | 512 | 2730 | 187.5 |
166 MHz | 1024 | 4670 | 219.3 |
1.1.2.10. SBL OSPI Boot Performance App
1.1.2.10.1. Test Set-up
- Platform: J721e EVM.
- OS Type: Baremetal
- Core : R5F_0 at 1 GHz
- Software/Application Used: sbl_cust_img (with custom flags) and sbl_boot_perf_test appimage
- These performance numbers are from the 8.2 release.
1.1.2.10.2. GP EVM Performance
SBL Boot Time Breakdown | Time (ms) |
---|---|
MCU_PORZ_OUT to MCU_RESETSTATz | 0.63 |
ROM: init + SBL load from OSPI | 14.00 |
SBL: Board_init (PINMUX) | 2.90 |
SBL: SPI_init | 0.15 |
SBL: SBL_SciClientInit: ReadSysfwImage | 6.08 |
Load/Start SYSFW | 7.83 |
Board Config | 2.00 |
PM Config | 2.00 |
RM Config | 0.78 |
Security Config | 0.65 |
SBL: SoC Late-Init | |
SBL: Board_init (PLL) | 1.52 |
SBL: Board_init (CLOCKS) | 0.55 |
SBL: OSPI init | 0.05 |
SBL: Misc (Sciclient_pmSetModule) | |
SBL: App copy to MCU SRAM & Jump to App | 3.90 |
MCUSW: CAN response | 1.00 |
TOTAL time | 44.0 |
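As a cross-check, the individual stage times can be summed and compared with the stated total. The sketch below copies the values from the table above and skips the two rows that carry no time; the variable names are illustrative.

```python
# Stage times in milliseconds, in table order; rows without a value are omitted.
boot_stages_ms = [
    ("MCU_PORZ_OUT to MCU_RESETSTATz", 0.63),
    ("ROM: init + SBL load from OSPI", 14.00),
    ("SBL: Board_init (PINMUX)", 2.90),
    ("SBL: SPI_init", 0.15),
    ("SBL: SBL_SciClientInit: ReadSysfwImage", 6.08),
    ("Load/Start SYSFW", 7.83),
    ("Board Config", 2.00),
    ("PM Config", 2.00),
    ("RM Config", 0.78),
    ("Security Config", 0.65),
    ("SBL: Board_init (PLL)", 1.52),
    ("SBL: Board_init (CLOCKS)", 0.55),
    ("SBL: OSPI init", 0.05),
    ("SBL: App copy to MCU SRAM & Jump to App", 3.90),
    ("MCUSW: CAN response", 1.00),
]

total_ms = sum(ms for _, ms in boot_stages_ms)
print(f"Sum of listed stages: {total_ms:.2f} ms")
```

The listed stages add up to about 44.04 ms, in line with the reported total of 44.0 ms.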
1.1.2.11. OSPI Memory Configuration Benchmarking
- These numbers were collected with the memory_benchmarking_app demo, which measures the performance of a realistic application whose text (code) is placed in various memory locations while its data resides in On-Chip-Memory RAM (referred to as OCM, OCMC or OCMRAM).
- The application executes 10 different configurations of the same text, varying in data versus instruction cache intensity. Each test calls 16 separate functions a total of 500 times in random order.
- The most instruction-intensive example achieves an instruction cache miss rate (ICM/sec) of ~3-4 million per second when run entirely from OCMRAM, a rate similar to that seen in real-world customer examples.
- More data-intensive tests have more repetitive code and achieve much lower ICM rates.
- The “Multicore” configuration is defined as the same AUTOSAR application executing simultaneously, by means of a synchronization delay, on MCU Core 0 (mcu1_0) and MAIN Core 0 (mcu2_0).
- The memcpy size is just a knob that makes the synthetic benchmark application more data-centric or more instruction-centric; it has no additional significance. A small memcpy size is more instruction-centric with a higher ICM rate, and vice versa.
1.1.2.12. Supported Configurations
Core | SOC | Supported Memory Configurations (MEM_CONF) |
---|---|---|
mcu1_0 | j721e | ocmc msmc ddr xip |
mcu2_0 | j721e | ocmc msmc ddr xip |
mcu1_0 + mcu2_0 | j721e | ddr xip |
1.1.2.12.1. Test Set-up
- Platform: J721e EVM.
- OS Type: FreeRTOS
- Core: MCU Domain R5_0 (MCU1_0) and Main Domain R5_0 (MCU2_0)
- Software/Application Used: sbl_ospi_img and [MEM_CONF]_memory_benchmarking_app_freertos appimage
1.1.2.12.2. MCU Domain Single Core Execution
- Cache miss rate of 3M/sec is at memcpy size ~500 bytes.
- Only the XIP 133 MHz benchmarking numbers are taken from the 8.2 release, due to a measurement issue in the current release.
Memory | Memcpy Size (bytes) | 0 | 50 | 500 | 1000 | 2048 |
---|---|---|---|---|---|---|
OCMC | OCMC Baseline Execution Time (us) | 4688 | 5004 | 7809 | 9888 | 19851 |
OCMC | ICM/sec | 4388225 | 4182853 | 2579843 | 2056735 | 1064228 |
DDR | DDR execution time (us) | 8440 | 8712 | 8934 | 13208 | 23005 |
DDR | DDR / OCMC Baseline | 1.800 | 1.741 | 1.144 | 1.335 | 1.158 |
MSMC | MSMC execution time (us) | 6699 | 7038 | 9273 | 11364 | 20528 |
MSMC | MSMC / OCMC Baseline | 1.428 | 1.406 | 1.187 | 1.149 | 1.034 |
XIP | XIP 133 MHz execution time (us) | 13825 | 13813 | 15940 | 18270 | 29760 |
XIP | XIP 133 MHz / OCMC Baseline | 3.609 | 3.404 | 2.63 | 2.29 | 1.827 |
XIP | XIP 166 MHz execution time (us) | 14561 | 14816 | 17914 | 20155 | 32672 |
XIP | XIP 166 MHz / OCMC Baseline | 3.106 | 2.960 | 2.294 | 2.038 | 1.645 |
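Each "/ OCMC Baseline" row is the execution time from the given memory divided by the OCMC baseline execution time at the same memcpy size. The sketch below recomputes the DDR row of the MCU-domain single-core table; it truncates to three decimals because that appears to match the published values, which is an inference rather than something the source states.

```python
def slowdown_vs_baseline(exec_time_us: int, baseline_us: int) -> float:
    """Ratio of an execution time to the OCMC baseline, truncated to three decimals."""
    # Integer arithmetic avoids floating-point surprises when truncating.
    return (exec_time_us * 1000 // baseline_us) / 1000


ocmc_baseline_us = [4688, 5004, 7809, 9888, 19851]  # memcpy sizes 0, 50, 500, 1000, 2048
ddr_exec_us = [8440, 8712, 8934, 13208, 23005]

ddr_ratios = [slowdown_vs_baseline(t, b) for t, b in zip(ddr_exec_us, ocmc_baseline_us)]
print(ddr_ratios)
```

A ratio above 1 means the configuration runs slower than the all-OCMC baseline; the ratios shrink as the memcpy size grows and the workload becomes less instruction-fetch bound.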
1.1.2.12.3. MAIN Domain Single Core Execution
- Cache miss rate of 3M/sec is at memcpy size of ~0 bytes.
- Only the XIP 133 MHz benchmarking numbers are taken from the 8.2 release, due to a measurement issue in the current release.
Memory | Memcpy Size (bytes) | 0 | 50 | 500 | 1000 | 2048 |
---|---|---|---|---|---|---|
OCMC | OCMC Baseline Execution Time (us) | 6114 | 6371 | 10089 | 12188 | 26229 |
OCMC | ICM/sec | 3176807 | 3059959 | 1905342 | 1594108 | 761942 |
DDR | DDR execution time (us) | 9316 | 9609 | 13230 | 15643 | 30689 |
DDR | DDR / OCMC Baseline | 1.523 | 1.508 | 1.311 | 1.283 | 1.170 |
MSMC | MSMC execution time (us) | 7764 | 7961 | 11420 | 13806 | 28793 |
MSMC | MSMC / OCMC Baseline | 1.199 | 1.249 | 1.131 | 1.133 | 1.065 |
XIP | XIP 133 MHz execution time (us) | 13928 | 14317 | 17129 | 19967 | 37601 |
XIP | XIP 133 MHz / OCMC Baseline | 2.316 | 2.287 | 1.837 | 1.721 | 1.445 |
XIP | XIP 166 MHz execution time (us) | 17700 | 17938 | 21901 | 24249 | 40798 |
XIP | XIP 166 MHz / OCMC Baseline | 2.894 | 2.815 | 2.170 | 1.989 | 1.555 |
1.1.2.12.4. MCU Domain Multi-Core Execution
- Cache miss rate of 3M/sec is at memcpy size ~500 bytes.
- The OCMC and XIP 133 MHz memory benchmarking numbers are taken from the 8.2 release, due to a measurement issue in the current release.
Memory | Memcpy Size (bytes) | 0 | 50 | 500 | 1000 | 2048 |
---|---|---|---|---|---|---|
OCMC | OCMC Baseline Execution Time (us) | 3831 | 4058 | 6060 | 7977 | 16288 |
OCMC | ICM/sec | 4912033 | 4712173 | 3078877 | 2380468 | 1173379 |
DDR | DDR execution time (us) | 8135 | 8483 | 10897 | 12989 | 22813 |
DDR | DDR / OCMC Baseline | 2.123 | 2.090 | 1.798 | 1.628 | 1.400 |
XIP | XIP 133 MHz execution time (us) | 18439 | 18916 | 21984 | 24860 | 38519 |
XIP | XIP 133 MHz / OCMC Baseline | 4.813 | 4.661 | 3.628 | 3.116 | 2.365 |
XIP | XIP 166 MHz execution time (us) | 18571 | 18884 | 21225 | 24189 | 36238 |
XIP | XIP 166 MHz / OCMC Baseline | 4.847 | 4.653 | 3.502 | 3.032 | 2.224 |
1.1.2.12.5. MAIN Domain Multi-Core Execution
- Cache miss rate of 3M/sec is at memcpy size of ~0 bytes.
- Only the OCMC and XIP 133 MHz memory benchmarking numbers are taken from the 8.2 release, due to a measurement issue in the current release.
Memory | Memcpy Size (bytes) | 0 | 50 | 500 | 1000 | 2048 |
---|---|---|---|---|---|---|
OCMC | OCMC Baseline Execution Time (us) | 6013 | 6260 | 9326 | 11599 | 26018 |
OCMC | ICM/sec | 3091801 | 2964217 | 1899099 | 1559875 | 724383 |
DDR | DDR execution time (us) | 9507 | 9663 | 12839 | 15696 | 30972 |
DDR | DDR / OCMC Baseline | 1.581 | 1.543 | 1.376 | 1.353 | 1.190 |
XIP | XIP 133 MHz execution time (us) | 19077 | 19961 | 22791 | 25153 | 43066 |
XIP | XIP 133 MHz / OCMC Baseline | 3.173 | 3.189 | 2.444 | 2.169 | 1.655 |
XIP | XIP 166 MHz execution time (us) | 23180 | 23506 | 27375 | 30321 | 46702 |
XIP | XIP 166 MHz / OCMC Baseline | 3.854 | 3.754 | 2.935 | 2.614 | 1.794 |
1.1.2.12.6. Extra OCMC Baseline Details - MCU Domain
- View ICM/sec row to see that cache miss rate of 3M/sec is at memcpy size of ~500 bytes.
Mem Cpy Size | 0 | 50 | 100 | 200 | 500 | 750 | 1000 | 1250 | 1500 | 2048 |
---|---|---|---|---|---|---|---|---|---|---|
Start Time in Usec | 54039 | 333045 | 615045 | 898045 | 1181042 | 1467040 | 1754038 | 2043040 | 2336037 | 2631037 |
Exec Time in Usec | 4688 | 5004 | 5166 | 5836 | 7809 | 8890 | 9888 | 12248 | 14849 | 19851 |
Task Calls | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 |
Inst Cache Miss | 20572 | 20931 | 20542 | 20962 | 20146 | 20789 | 20337 | 20703 | 20883 | 21126 |
Inst Cache Acc | 1046776 | 1153490 | 1252084 | 1456485 | 3026784 | 2570821 | 3067214 | 3599033 | 4110426 | 5255838 |
Num Instr Exec | 1287974 | 1463678 | 1637911 | 1990681 | 3026935 | 3916351 | 4783907 | 5672015 | 6535782 | 8456080 |
ICM/sec | 4388225 | 4182853 | 3976384 | 3591843 | 2579843 | 2338470 | 2056735 | 1690316 | 1406357 | 1064228 |
INST/sec | 274738481 | 292501598 | 317055942 | 341103666 | 387601997 | 440534420 | 483809364 | 463097240 | 440149639 | 425977532 |
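The ICM/sec and INST/sec rows can be derived from the raw counters and the execution time in the same column. A minimal sketch using the memcpy size 0 column of the MCU-domain table above; the function name is illustrative, and integer division is used because it reproduces the published values.

```python
def events_per_second(event_count: int, exec_time_us: int) -> int:
    """Event rate per second from a counter value and an execution time in microseconds."""
    return event_count * 1_000_000 // exec_time_us


exec_time_us = 4688        # Exec Time in Usec
inst_cache_miss = 20572    # Inst Cache Miss
num_instr_exec = 1287974   # Num Instr Exec

icm_per_sec = events_per_second(inst_cache_miss, exec_time_us)
inst_per_sec = events_per_second(num_instr_exec, exec_time_us)
print(icm_per_sec, inst_per_sec)
```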
1.1.2.12.7. Extra OCMC Baseline Details - MAIN Domain
- View ICM/sec row to see that cache miss rate of 3M/sec is at memcpy size of ~0 bytes.
Mem Cpy Size | 0 | 50 | 100 | 200 | 500 | 750 | 1000 | 1250 | 1500 | 2048 |
---|---|---|---|---|---|---|---|---|---|---|
Start Time in Usec | 53044 | 332051 | 614050 | 897049 | 1181047 | 1470045 | 1760045 | 2052044 | 2347044 | 2646045 |
Exec Time in Usec | 6114 | 6371 | 6846 | 7534 | 10089 | 11222 | 12188 | 15567 | 19362 | 26229 |
Task Calls | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 |
Inst Cache Miss | 19423 | 19495 | 19563 | 19868 | 19223 | 19700 | 19429 | 19601 | 20003 | 19985 |
Inst Cache Acc | 992278 | 1094765 | 1198109 | 1401401 | 2002737 | 2514784 | 3010884 | 3541694 | 4054954 | 5197623 |
Num Instr Exec | 1288762 | 1463547 | 1639434 | 1991088 | 3027313 | 3917126 | 4784174 | 5673336 | 6536863 | 8458329 |
ICM/sec | 3176807 | 3059959 | 2857581 | 2637111 | 1905342 | 1755480 | 1594108 | 1259137 | 1033106 | 761942 |
INST/sec | 210788681 | 229720138 | 239473269 | 264280329 | 300060759 | 349057743 | 392531506 | 364446328 | 337613004 | 322480041 |