7. Data Sheet for J721e
Note
All performance measurements provided in this document are PRELIMINARY and have not necessarily been fully optimized yet.
7.1. Common parameters used for performance benchmarking
Any additional parameters, or overrides to these common parameters, are specified in the respective sections.
Parameter | Value |
---|---|
SoC | J721e |
Board | J721e EVM with infotainment card, Fusion1 card and/or GESI card |
A72 | 2 GHz (L1P, L1D, L2 cache ON) |
R5F | 1 GHz (L1P, L1D cache ON) |
C6x | 1.25 GHz (32K L1P, 32K L1D, 64K L2 cache) |
C7x | 1 GHz (32K L1P, 32K L1D, 64K L2 cache) |
MSMC cache | 0 bytes |
DDR | 3733 MT/s |
VPAC | 650 MHz |
DMPAC | 520 MHz |
Others | Cache Enabled, Release Build, Code/Data sections in DDR |
7.2. Vision Apps (ADAS, Vision, DL demos)
For performance details, see the Vision Apps datasheet here [LINK]. See the specific demo page for the performance numbers of that demo.
7.4. MCU Software (MCU SW)
For performance details, see the MCUSW datasheet here [LINK]. See the specific demo page for the performance numbers of that demo.
7.8. TI Autonomous Driving Algorithms (TIADALG)
For performance details, see the TIADALG datasheet here [LINK].
7.9. Video Codec (RTOS H264 video encode/decode)
For performance details, see the video codec datasheet here [LINK].
7.10. AUTOSAR Benchmark
7.10.1. Introduction
This benchmark simulates a representative AUTOSAR application in which the control-centric code causes instruction cache misses to dominate the performance of the application. The benchmark allows for tuning the application to mimic a customer application with respect to the number of instruction and data cache misses generated.
7.10.2. Setup
The application consists of 16 slave tasks placed in memory such that each task occupies the same cache entry. The tasks are randomly signaled by a master task to perform a series of control and data movement (memcpy) operations. The cache miss ratios are tuned by reducing the size of the data movement so that each task executes for a shorter time, resulting in more context switches and hence an increased instruction cache miss rate.
Cache misses are profiled through PMU statistics collection, and task switches are counted through a BIOS hook function.
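The effect of placing every task on the same cache entry can be illustrated with a toy model. The sketch below is not the benchmark code: it assumes a deliberately simplified direct-mapped instruction cache that holds exactly one task, a 32-byte line size, and a hypothetical placement of the tasks one cache size apart so that they alias to the same lines. The function name `simulate` and all constants other than the task count and task size are illustrative assumptions, and the model is not meant to match the real cache organization of the R5F.

```python
import random

NUM_TASKS = 16      # slave tasks (from the setup description)
TASK_SIZE = 2048    # bytes of code per task (from the task attributes)
LINE_SIZE = 32      # bytes per cache line (assumed for this toy model)
CACHE_SIZE = 2048   # toy direct-mapped cache that holds exactly one task (assumed)


def simulate(num_signals, seed=0):
    """Return (task_switches, instruction_cache_misses) for the toy model."""
    rng = random.Random(seed)
    num_lines = CACHE_SIZE // LINE_SIZE
    cache = [None] * num_lines      # cache[index] stores the tag currently held
    switches = 0
    misses = 0
    previous = None
    for _ in range(num_signals):
        # The master task signals one slave task at random.
        task = rng.randrange(NUM_TASKS)
        if task != previous:
            switches += 1
        previous = task
        # Hypothetical placement: tasks sit one cache size apart,
        # so every task maps onto the same cache lines.
        base = task * CACHE_SIZE
        # The toy task fetches all of its code once per signal.
        for offset in range(0, TASK_SIZE, LINE_SIZE):
            addr = base + offset
            index = (addr // LINE_SIZE) % num_lines
            tag = addr // CACHE_SIZE
            if cache[index] != tag:
                misses += 1
                cache[index] = tag
    return switches, misses


if __name__ == "__main__":
    switches, misses = simulate(10_000)
    print(f"task switches: {switches}, instruction cache misses: {misses}")
```

In this model each switch to a different task evicts the previous task's code entirely, so the miss count scales directly with the number of task switches. That is the mechanism the benchmark exploits when it shortens the per-task work to raise the switch rate.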
7.10.3. Results
The following results were obtained on a single Main domain R5F CPU. The two test profiles reflect a control-centric scenario, where the memory copy size is zero, and a data-centric scenario, where the memory copy size is 2 KB. The smaller memcpy size results in a much higher rate of task switches and instruction cache misses. All figures below are approximate and vary slightly with the exact placement of memory sections, but they do not depend on the specific memory region in which code/data is located.
Test Profile | Memcpy size (bytes) | Task Switch Rate | Instruction Cache Miss Rate | Data Cache Miss Rate |
---|---|---|---|---|
Control Centric | 0 | 300K/sec | 3.5M/sec | 250K/sec |
Data Centric | 2048 | 66K/sec | 800K/sec | 5M/sec |
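One way to read the rates in the table above is to normalize the cache misses by the task switch rate. The sketch below does that arithmetic; the dictionary layout and the function name `misses_per_switch` are illustrative, and because the input figures are approximate, the results are only indicative.

```python
# Approximate rates from the table above, in events per second.
rates = {
    "Control Centric": {"task_switch": 300e3, "icache_miss": 3.5e6, "dcache_miss": 250e3},
    "Data Centric": {"task_switch": 66e3, "icache_miss": 800e3, "dcache_miss": 5e6},
}


def misses_per_switch(profile):
    """Return instruction and data cache misses per task switch for one profile."""
    switches = profile["task_switch"]
    return {
        "icache": profile["icache_miss"] / switches,
        "dcache": profile["dcache_miss"] / switches,
    }


per_switch = {name: misses_per_switch(p) for name, p in rates.items()}

if __name__ == "__main__":
    for name, result in per_switch.items():
        print(f"{name}: {result['icache']:.1f} I-cache and "
              f"{result['dcache']:.1f} D-cache misses per task switch")
```

Both profiles come out at roughly 12 instruction cache misses per task switch, which is consistent with the instruction cache miss rate tracking the task switch rate. The data cache misses per switch differ widely between the two profiles, as expected when the memcpy size changes from 0 to 2048 bytes.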
Execution time is shown in microseconds for code and data placement in various memory regions:
Test Profile | Code/Data in OCMC | Code/Data in MSMC | Code/Data in DDR |
---|---|---|---|
Control Centric | 3839 | 4549 | 6248 |
Data Centric | 23470 | 30477 | 41917 |
All memory sections are specified as Outer and Inner write-back, write-allocate Normal memory.
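To compare the memory regions more directly, the execution times in the table above can be expressed relative to the OCMC placement. The following sketch only performs that division; the names `exec_time_us` and `slowdown_vs_ocmc` are illustrative and not part of any SDK.

```python
# Execution times from the table above, in microseconds.
exec_time_us = {
    "Control Centric": {"OCMC": 3839, "MSMC": 4549, "DDR": 6248},
    "Data Centric": {"OCMC": 23470, "MSMC": 30477, "DDR": 41917},
}


def slowdown_vs_ocmc(times):
    """Return each region's execution time as a multiple of the OCMC time."""
    base = times["OCMC"]
    return {region: t / base for region, t in times.items()}


slowdown = {name: slowdown_vs_ocmc(t) for name, t in exec_time_us.items()}

if __name__ == "__main__":
    for name, ratios in slowdown.items():
        formatted = ", ".join(f"{region}: {ratio:.2f}x" for region, ratio in ratios.items())
        print(f"{name}: {formatted}")
```

With these figures, running from DDR takes about 1.63 times as long as OCMC for the control-centric profile and about 1.79 times as long for the data-centric profile, while MSMC falls in between at about 1.18 and 1.30 times respectively.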
Some attributes of the application and the individual slave tasks are as follows:
- Total application image size: 200 KB
- Task size: 2 KB