7. Data Sheet for J721e

Note

All performance measurements provided in this document are PRELIMINARY and have not necessarily been fully optimized yet.

7.1. Common parameters used for performance benchmarking

Any additional parameters, or overrides to the common parameters, are specified in the respective sections.

Parameter   Value
SoC         J721e
Board       J721e EVM with infotainment card, Fusion1 card and/or GESI card
A72         2 GHz (L1P, L1D, L2 cache ON)
R5F         1 GHz (L1P, L1D cache ON)
C6x         1.25 GHz (32K L1P, 32K L1D, 64K L2 cache)
C7x         1 GHz (32K L1P, 32K L1D, 64K L2 cache)
MSMC cache  0 bytes
DDR         3733 MT/s
VPAC        650 MHz
DMPAC       520 MHz
Others      Cache Enabled, Release Build, Code/Data sections in DDR

7.2. Vision Apps (ADAS, Vision, DL demos)

For performance details, see the Vision Apps datasheet here [LINK]. See the specific demo page for the performance numbers of that demo.

7.3. Platform Development Kit (PDK)

For performance details, see the PDK datasheet here [LINK].

7.4. MCU Software (MCU SW)

For performance details, see the MCUSW datasheet here [LINK]. See the specific demo page for the performance numbers of that demo.

7.5. TI Deep Learning Library (TIDL)

For performance details, see the TIDL datasheet here [LINK].

7.6. MMALIB

For performance details, see the MMALIB datasheet here [LINK].

7.7. TI OpenVX (TIOVX)

For performance details, see the TI OpenVX datasheet here [LINK].

7.8. TI Autonomous Driving Algorithms (TIADALG)

For performance details, see the TIADALG datasheet here [LINK].

7.9. Video Codec (RTOS H264 video encode/decode)

For performance details, see the video codec datasheet here [LINK].

7.10. AUTOSAR Benchmark

7.10.1. Introduction

This benchmark simulates a representative AUTOSAR application in which the control-centric code causes instruction cache misses to dominate the performance of the application. The benchmark allows for tuning the application to mimic a customer application with respect to the number of instruction and data cache misses generated.

7.10.2. Setup

The application consists of 16 slave tasks placed in memory such that each task occupies the same cache entry. The tasks are randomly signaled by a master task to perform a series of control and data movement (memcpy) operations. The cache miss ratios are tuned by reducing the size of the data movement so that each task executes for a shorter time, resulting in more context switches and hence an increased instruction cache miss rate.

Cache misses are profiled through PMU statistics collection, and task switches are counted through a BIOS hook function.
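The effect of placing all tasks on the same cache entry can be illustrated with a toy model. The sketch below is not the benchmark code; it assumes a 4-way set-associative cache with LRU replacement and models each task as a single cache line, and the function name count_misses and its parameters are hypothetical. It compares 16 tasks that alias to one cache set with 16 tasks spread over distinct sets, signaled at random as the master task does.

```python
import random
from collections import OrderedDict

def count_misses(task_sets, num_ways, num_signals, seed=0):
    """Count cache misses when tasks are signaled at random.

    task_sets[i] is the cache set index that task i maps to. Each task is
    modeled as one cache line (an assumption made to keep the model small).
    """
    rng = random.Random(seed)
    sets = {}  # set index -> OrderedDict of resident task ids in LRU order
    misses = 0
    for _ in range(num_signals):
        task = rng.randrange(len(task_sets))
        lines = sets.setdefault(task_sets[task], OrderedDict())
        if task in lines:
            lines.move_to_end(task)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= num_ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[task] = True
    return misses

NUM_TASKS = 16
NUM_SIGNALS = 10_000

# All tasks alias to set 0 (assumed 4 ways), versus one set per task.
aliased = count_misses([0] * NUM_TASKS, num_ways=4, num_signals=NUM_SIGNALS)
spread = count_misses(list(range(NUM_TASKS)), num_ways=4, num_signals=NUM_SIGNALS)
print(f"aliased: {aliased} misses, spread: {spread} misses")
```

In this model the spread layout only incurs the initial cold misses, while the aliased layout misses on most task switches because only four of the 16 tasks can be resident at a time.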

7.10.3. Results

The following results were obtained on a single Main domain R5F CPU. The two test profiles shown reflect the control-centric scenario, where the memory copy size is zero, and the data-centric scenario, where the memory copy size is 2 KB. The smaller memcpy size results in a much higher rate of task switches and instruction cache misses. All figures below are approximate and vary slightly with the exact placement of memory sections, but they do not depend on the specific memory region in which code/data is located.

Test Profile     Memcpy size (bytes)  Task Switch Rate  Instruction Cache Miss Rate  Data Cache Miss Rate
Control Centric  0                    300K/sec          3.5M/sec                     250K/sec
Data Centric     2048                 66K/sec           800K/sec                     5M/sec
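To relate the miss rates to the task switch rate, the rates above can be divided to give cache misses per task switch. The sketch below is only arithmetic on the approximate figures in the table; the names profiles and misses_per_switch are illustrative.

```python
# Approximate rates from the table above, in events per second.
profiles = {
    "Control Centric": {"task_switches": 300e3, "icache_misses": 3.5e6, "dcache_misses": 250e3},
    "Data Centric": {"task_switches": 66e3, "icache_misses": 800e3, "dcache_misses": 5e6},
}

def misses_per_switch(profile):
    """Return (instruction, data) cache misses per task switch."""
    switches = profile["task_switches"]
    return profile["icache_misses"] / switches, profile["dcache_misses"] / switches

per_switch = {name: misses_per_switch(p) for name, p in profiles.items()}
for name, (icache, dcache) in per_switch.items():
    print(f"{name}: {icache:.1f} I-cache, {dcache:.1f} D-cache misses per task switch")
```

With these approximate figures, both profiles work out to roughly 12 instruction cache misses per task switch, while data cache misses per task switch rise from below 1 in the control-centric profile to roughly 76 in the data-centric profile.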

Execution time is shown in microseconds for code and data placement in various memory regions:

Test Profile     Code/Data in OCMC (us)  Code/Data in MSMC (us)  Code/Data in DDR (us)
Control Centric  3839                    4549                    6248
Data Centric     23470                   30477                   41917
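The cost of each memory region is easier to compare as a ratio to the OCMC execution time. The sketch below is only arithmetic on the table values; the names exec_time_us and slowdown_vs_ocmc are illustrative.

```python
# Execution times in microseconds, from the table above.
exec_time_us = {
    "Control Centric": {"OCMC": 3839, "MSMC": 4549, "DDR": 6248},
    "Data Centric": {"OCMC": 23470, "MSMC": 30477, "DDR": 41917},
}

def slowdown_vs_ocmc(times):
    """Return each region's execution time as a multiple of the OCMC time."""
    return {region: t / times["OCMC"] for region, t in times.items()}

slowdown = {name: slowdown_vs_ocmc(t) for name, t in exec_time_us.items()}
for name, ratios in slowdown.items():
    summary = ", ".join(f"{region} {ratio:.2f}x" for region, ratio in ratios.items())
    print(f"{name}: {summary}")
```

Relative to OCMC, execution from MSMC takes about 1.18x (control centric) and 1.30x (data centric) as long, and execution from DDR about 1.63x and 1.79x.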

All memory sections are specified as Outer and Inner write-back, write-allocate Normal memory.

Some attributes of the application and the individual slave tasks are as follows:

  • Total application image size: 200 KB
  • Task size: 2 KB