7. Data Sheet for J721e

Note

All performance measurements provided in this document are PRELIMINARY and have not necessarily been fully optimized yet.

7.1. Common parameters used for performance benchmarking

Any additional parameters, or overrides to the common parameters, are specified in the respective sections.

Parameter	Value
SoC	J721e
Board	J721e EVM with infotainment card, Fusion1 card and/or GESI card
A72	2 GHz (L1P, L1D, L2 cache ON)
R5F	1 GHz (L1P, L1D cache ON)
C6x	1.25 GHz (32K L1P, 32K L1D, 64K L2 cache)
C7x	1 GHz (32K L1P, 32K L1D, 64K L2 cache)
MSMC cache	0 bytes
DDR	3733 MT/s
VPAC	650 MHz
DMPAC	520 MHz
Others	Cache Enabled, Release Build, Code/Data sections in DDR

7.2. Vision Apps (ADAS, Vision, DL demos)

For performance details, see the vision apps datasheet here [LINK]. See the specific demo page for the performance numbers of that demo.

7.3. Platform Development Kit (PDK)

For performance details, see the PDK datasheet here [LINK].

7.4. MCU Software (MCU SW)

For performance details, see the MCUSW datasheet here [LINK]. See the specific demo page for the performance numbers of that demo.

7.5. TI Deep Learning Library (TIDL)

For performance details, see the TIDL datasheet here [LINK].

7.6. MMALIB

For performance details, see the MMALIB datasheet here [LINK].

7.7. TI OpenVX (TIOVX)

For performance details, see the TI OpenVX datasheet here [LINK].

7.8. TI Autonomous Driving Algorithms (TIADALG)

For performance details, see the TIADALG datasheet here [LINK].

7.9. Video Codec (RTOS H264 video encode/decode)

For performance details, see the video codec datasheet here [LINK].

7.10. AUTOSAR Benchmark

7.10.1. Introduction

This benchmark simulates a representative AUTOSAR application in which the control-centric code causes instruction cache misses to dominate the performance of the application. The benchmark allows for tuning the application to mimic a customer application with respect to the number of instruction and data cache misses generated.

7.10.2. Setup

The application consists of 16 slave tasks placed in memory such that each task occupies the same cache entry. The tasks are randomly signaled by a master task to perform a series of control and data movement (memcpy) operations. The cache miss ratios are tuned by reducing the size of the data movement so that each task executes for a shorter time, resulting in more context switches and hence an increased instruction cache miss rate.

Cache misses are profiled through PMU statistics collection, and task switches are counted through a BIOS hook function.

7.10.3. Results

The following results were obtained on a single Main domain R5F CPU. The two test profiles shown reflect the control-centric scenario, where the memory copy size is zero, and the data-centric scenario, where the memory copy size is 2 KB. The smaller memcpy size results in a much higher rate of task switches and instruction cache misses. All figures below are approximate and vary slightly with the exact placement of memory sections, but they do not depend on the specific memory region in which code/data is located.

Test Profile	Memcpy size (bytes)	Task Switch Rate	Instruction Cache Miss Rate	Data Cache Miss Rate
Control Centric	0	300K/sec	3.5M/sec	250K/sec
Data Centric	2048	66K/sec	800K/sec	5M/sec
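
As a rough cross-check of the rates in the table above, the per-second figures can be divided to estimate cache misses per task switch. The following is an illustrative Python sketch and not part of the benchmark itself; the dictionary layout and the function name `misses_per_switch` are assumptions made for this example, and the inputs are the approximate values from the table.

```python
# Approximate per-second rates copied from the results table above.
PROFILES = {
    "Control Centric": {
        "task_switches": 300_000,
        "icache_misses": 3_500_000,
        "dcache_misses": 250_000,
    },
    "Data Centric": {
        "task_switches": 66_000,
        "icache_misses": 800_000,
        "dcache_misses": 5_000_000,
    },
}


def misses_per_switch(profile):
    """Return (instruction, data) cache misses per task switch."""
    switches = profile["task_switches"]
    return (
        profile["icache_misses"] / switches,
        profile["dcache_misses"] / switches,
    )


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        icache, dcache = misses_per_switch(profile)
        print(f"{name}: {icache:.1f} I-cache and {dcache:.1f} D-cache misses per switch")
```

With these inputs, both profiles come out at roughly 12 instruction cache misses per task switch, which is consistent with the task switch rate driving the instruction cache miss rate. The data cache misses per switch differ widely between the profiles, as expected when the memcpy size changes from 0 to 2 KB.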

Execution time is shown in microseconds for code and data placement in various memory regions:

Test Profile	Code/Data in OCMC	Code/Data in MSMC	Code/Data in DDR
Control Centric	3839	4549	6248
Data Centric	23470	30477	41917
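
To compare the placements more directly, the execution times above can be normalized to the OCMC case. The following is an illustrative Python sketch under the assumption that OCMC is a reasonable baseline; the name `slowdown_vs_ocmc` and the data layout are hypothetical and chosen only for this example.

```python
# Execution times in microseconds, copied from the table above.
EXEC_TIME_US = {
    "Control Centric": {"OCMC": 3839, "MSMC": 4549, "DDR": 6248},
    "Data Centric": {"OCMC": 23470, "MSMC": 30477, "DDR": 41917},
}


def slowdown_vs_ocmc(times):
    """Return each region's execution time as a multiple of the OCMC time."""
    baseline = times["OCMC"]
    return {region: t / baseline for region, t in times.items()}


if __name__ == "__main__":
    for name, times in EXEC_TIME_US.items():
        ratios = slowdown_vs_ocmc(times)
        summary = ", ".join(f"{region} {ratio:.2f}x" for region, ratio in ratios.items())
        print(f"{name}: {summary}")
```

With these figures, placing code/data in MSMC is roughly 1.2x to 1.3x slower than OCMC, and placing it in DDR is roughly 1.6x to 1.8x slower, with the data-centric profile showing the larger penalty in both cases.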

All memory sections are specified as Outer and Inner write-back, write-allocate Normal memory.

Some attributes of the application and the individual slave tasks are as follows:

Total application image size: 200 KB
Task size: 2 KB