1. J721E Datasheet

1.1. Introduction

This section provides performance numbers for the device drivers supported in the PDK.

1.1.1. Setup Details

| SOC Details | Values |
|---|---|
| Core | R5F |
| Core Operating Speed | 1 GHz |
| DDR Speed | 4266 MT/s |
| Cache status | Enabled |

| Optimization Details | Values |
|---|---|
| Profile | Release |
| Compile Options for R5F | -g -ms -DMAKEFILE_BUILD -c -qq -pdsw225 --endian=little -mv7R5 --abi=eabi -eo.oer5f -ea.ser5f --symdebug:dwarf --embed_inline_assembly --float_support=vfpv3d16 --emit_warnings_as_errors |
| Linker Options for R5F | --emit_warnings_as_errors -w -q -u _c_int00 -c -mv7R5 --diag_suppress=10063 -x --zero_init=on |
| Code Placement | DDR |
| Data Placement | DDR |

1.1.2. Software Performance Numbers

1.1.2.1. DSS

| Display Type | Configuration | FPS | CPU Load |
|---|---|---|---|
| HDMI | 1080P60 RGB888 | 60 | 1.0% (MCU2_0) |
| DP | 1080P60 BGRA32 | 60 | 1.0% (MCU2_0) |

1.1.2.2. CSI-Rx

| Capture Type | Configuration | CPU Load |
|---|---|---|
| CSI2Rx Inst 0 | 4CH 1080P30 IMX390 Sensor Raw12 | 1.2% (MCU2_0) |

| Instance | Configuration | Time taken to receive one frame | ISR latency |
|---|---|---|---|
| CSI2Rx Inst 0 | 1CH 1080P30 IMX390 Sensor Raw12 | 33.3 ms (MCU2_0) | 9 us (MCU2_0) |

1.1.2.3. CSI-Tx

| Instance | Configuration | Time taken to transmit one frame | ISR latency |
|---|---|---|---|
| CSI2Tx Inst 0 | 1CH 1080P 2.5GBPS IMX390 Sensor Raw12 | 6.7 ms (MCU2_0) | 21 us (MCU2_0) |

1.1.2.4. CPSW_9G

1.1.2.4.1. Test Setup

[Figure: J721E CPSW9G Enet test setup]

| Hardware Configuration | Value |
|---|---|
| Processing Core | Main R5F0 Core 0 |
| Core Frequency | 1 GHz |
| Ethernet Interface Type | RGMII at 1 Gbps |
| Packet buffer memory | DDR |
| Hardware checksum offload | Yes |
| Scatter-gather TX | Yes |
| Scatter-gather RX | No |

| Software Configuration | Value |
|---|---|
| RTOS | FreeRTOS |
| RTOS application | Enet LLD lwIP example |
| TCP/IP stack | lwIP 2.2.0 |
| Host PC tool version | iperf v2.0.10 |

1.1.2.4.2. TCP Performance

| Test | Bandwidth (Mbps) | CPU Load (%) |
|---|---|---|
| TCP RX | 144 | 88 |
| TCP TX | 142 | 100 |
| TCP Bidirectional | RX=74.7 TX=74.5 | 100 |

Host PC commands:

iperf -c <evm_ip> -r
iperf -c <evm_ip> -d
1.1.2.4.3. UDP Performance

| Test | 64B Bandwidth (Mbps) | 64B CPU Load (%) | 64B Packet Loss (%) | 256B Bandwidth (Mbps) | 256B CPU Load (%) | 256B Packet Loss (%) | 512B Bandwidth (Mbps) | 512B CPU Load (%) | 512B Packet Loss (%) | 1470B Bandwidth (Mbps) | 1470B CPU Load (%) | 1470B Packet Loss (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UDP RX | 4.83 | 31 | 0.00 | 24.2 | 48 | 0.00 | 24.0 | 35 | 0.00 | 24.0 | 27 | 0.00 |
| UDP RX | 9.68 | 43 | 0.00 | 48.4 | 79 | 0.001 | 48.1 | 52 | 0.00 | 48.1 | 36 | 0.00 |
| UDP RX | 14.5 | 56 | 0.00 | 105 | | | 96.2 | 87 | 0.001 | 96.4 | 53 | 0.00 |
| UDP RX (Max) | 22.1 | 77 | 0.023 | 51.1 | 83 | 0.068 | 110 | 90 | 0.031 | 235 | 100 | 0.28 |
| UDP TX (Max) | 24.9 | 100 | 0.08 | 57.0 | 100 | 0.004 | 114 | 100 | 0.003 | 327 | 100 | 0.002 |

Host PC commands:

  • Test with datagram length of 64B:

    iperf -c <evm_ip> -u -l64 -b<bw> -r
    where <bw> is 5M, 10M, 15M, etc
    
  • Test with datagram length of 256B:

    iperf -c <evm_ip> -u -l256 -b<bw> -r
    where <bw> is 25M, 50M, 100M, etc
    
  • Test with datagram length of 512B:

    iperf -c <evm_ip> -u -l512 -b<bw> -r
    where <bw> is 25M, 50M, 100M, etc
    
  • Test with datagram length of 1470B (max):

    iperf -c <evm_ip> -u -b<bw> -r
    where <bw> is 25M, 50M, 100M, etc
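
The sweep described by the commands above can be scripted on the host PC. The following is a minimal sketch that only builds the command strings from the flags shown above; the helper name `udp_sweep_commands` and the address `192.168.1.10` are illustrative placeholders, not part of the PDK, and the bandwidth steps are just the examples listed above.

```python
def udp_sweep_commands(evm_ip, sweeps):
    """Build iperf UDP command lines for each (datagram length, bandwidth) pair.

    `sweeps` maps a datagram length in bytes to a list of -b values.
    A length of None omits -l, as in the 1470B command above.
    """
    commands = []
    for length, bandwidths in sweeps.items():
        for bw in bandwidths:
            parts = ["iperf", "-c", evm_ip, "-u"]
            if length is not None:
                parts.append(f"-l{length}")
            parts += [f"-b{bw}", "-r"]
            commands.append(" ".join(parts))
    return commands


# Bandwidth steps taken from the examples above; extend the lists as needed.
SWEEPS = {
    64: ["5M", "10M", "15M"],
    256: ["25M", "50M", "100M"],
    512: ["25M", "50M", "100M"],
    None: ["25M", "50M", "100M"],  # 1470B case: no -l flag
}

if __name__ == "__main__":
    # Placeholder EVM address; replace with the real <evm_ip>.
    for cmd in udp_sweep_commands("192.168.1.10", SWEEPS):
        print(cmd)
```

Generating the strings separately from running them keeps the sketch side-effect free; the commands can be passed to a shell or to `subprocess.run` once the EVM address is known.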
    

1.1.2.5. UDMA

1.1.2.5.1. DMA Parameters
  • Ring Order ID: 0

  • Channel Order ID: 0

  • Channel DMA Priority: 1

  • Channel Bus Priority: 4

  • Channel BUS QOS: 4

  • Channel TX FIFO depth: 128

  • Channel Fetch Word Size: 16

  • Channel Burst Size: 64 bytes for normal channel, 128 bytes for HC and UHC channels

1.1.2.5.2. Test Parameters
  • Type: TR15 Block copy

  • TR: one TR per TRPD in PBR mode

  • TR Memory: Same as buffer memory (DDR, MSMC or OCMC, depending on the test performed)

  • Transfer Size: 1 MB read and 1 MB write

  • 1 MB means 1000 x 1000 bytes and 1 KB means 1000 bytes

Note: The throughput numbers reported are the combined memory throughput of the read and write operations.
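
As an illustration of how a combined figure of this kind can be computed, the sketch below adds the bytes read and the bytes written and divides by the elapsed time, using the decimal definition of 1 MB given above. The function name and the 200 microsecond timing are hypothetical and are not measurements from the test application.

```python
MB = 1000 * 1000  # 1 MB as defined above: 1000 x 1000 bytes


def combined_throughput_mb_s(bytes_read, bytes_written, elapsed_s):
    """Return the combined read + write throughput in MB/sec."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (bytes_read + bytes_written) / MB / elapsed_s


if __name__ == "__main__":
    # Hypothetical timing: 1 MB read and 1 MB written in 200 microseconds.
    rate = combined_throughput_mb_s(1 * MB, 1 * MB, 200e-6)
    print(f"{rate:.0f} MB/sec")
```

Because reads and writes are both counted, the copy rate seen by a consumer of the data is half of the combined number.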

1.1.2.5.3. DRU Blockcopy

DRU channel performance with TR submitted through ring

| Test Description | Throughput (MCU2) | CPU Load (MCU2) | Throughput (C66x_1/2) | CPU Load (C66x_1/2) | Throughput (C7x_1) | CPU Load (C7x_1) |
|---|---|---|---|---|---|---|
| [PDK-3501] 1CH DDR 1MB to DDR 1MB | 11612 MB/sec | 10% | 11956 MB/sec | 3% | 11161 MB/sec | 7% |
| [PDK-3502] 1CH MSMC 1KB Circular to DDR 1MB | 18444 MB/sec | 13% | 18493 MB/sec | 5% | 17432 MB/sec | 8% |
| [PDK-3503] 1CH DDR 1MB to MSMC circular 1KB | 21575 MB/sec | 11% | 22574 MB/sec | 4% | 20203 MB/sec | 9% |
| [PDK-3504] 1CH MSMC 1KB to MSMC circular 1KB (1MB per TR) | 28966 MB/sec | 14% | 29086 MB/sec | 6% | 26783 MB/sec | 9% |
| [PDK-3505] Multi CH DDR 1MB to DDR 1MB | 12203 MB/sec | 25% | 12572 MB/sec (4CH) | 7% | 10603 MB/sec (4CH) | 14% |
| [PDK-3506] Multi CH MSMC 1KB to MSMC circular 1KB (1 MB per TR) | 30908 MB/sec | 31% | 30988 MB/sec (4CH) | 15% | 18036 MB/sec (4CH) | 14% |

1.1.2.5.5. MCU NAVSS Blockcopy (Normal Channel)

MCU NAVSS normal channel performance with TR submitted through ring

| Test Description | Throughput (MCU1) | CPU Load (MCU1) |
|---|---|---|
| [PDK-3490] 1CH DDR 1MB to DDR 1MB | 667 MB/sec | 2% |
| [PDK-3491] 1CH MSMC 1KB Circular to DDR 1MB | 977 MB/sec | 2% |
| [PDK-3492] 1CH DDR 1MB to MSMC circular 1KB | 717 MB/sec | 2% |
| [PDK-3493] 1CH MSMC 1KB to MSMC circular 1KB (1MB per TR) | 964 MB/sec | 2% |
| [PDK-3489] 1CH OCMC 1KB to OCMC circular 1KB (1MB per TR) | 2453 MB/sec | 3% |
| [PDK-3495] Multi CH DDR 1MB to DDR 1MB | 1183 MB/sec (2CH) | 3% |
| [PDK-3497] Multi CH MSMC 1KB to MSMC circular 1KB (1 MB per TR) | 1630 MB/sec (2CH) | 4% |
| [PDK-12918] 1CH MCU OCMC 1MB to DDR 1MB | 1498 MB/sec | 3% |
| [PDK-12919] 1CH DDR 1MB to MCU OCMC 1 MB | 1232 MB/sec | 2% |

1.1.2.6. IPC

1.1.2.6.1. Test Set-up
  • Release build binaries are used for measurement

  • Ring Buffer : Uncached DDR

  • Buffer to be sent (RPMSG) – Cached DDR

  • C66x - L2 Cache 128K

  • C7x - L2 Cache 128K

  • Software/Application Used : ipc_multicore_perf_test loaded through SBL. Output is printed to UART.

  • R5F/MPU config : DDR config

    • bufferable - 1

    • cacheable - 1

    • shareable - 0

Round-trip time is captured in microseconds (us) for different data sizes.

1.1.2.6.2. Performance - Host Core A72, Bios, 2 GHz

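| Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
|---|---|---|---|---|---|---|---|
| MCU R5F0 | 20 | 20 | 22 | 25 | 32 | 44 | 70 |
| Main R5F0 | 18 | 19 | 20 | 24 | 29 | 41 | 65 |
| C66x1 | 17 | 16 | 17 | 16 | 18 | 20 | 25 |
| C7x | 20 | 20 | 20 | 20 | 23 | 24 | 25 |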
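
One way to read a row of round-trip times is to separate a fixed per-message cost from a per-byte cost with a straight-line fit. The sketch below does this for the MCU R5F0 row of the A72-host table above. The linear model is an assumption made for illustration, not something the benchmark itself claims, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Payload sizes and round-trip times (us) for remote core MCU R5F0, host A72.
SIZES_BYTES = [4, 8, 16, 32, 64, 128, 256]
MCU_R5F0_RTT_US = [20, 20, 22, 25, 32, 44, 70]

fixed_us, per_byte_us = fit_line(SIZES_BYTES, MCU_R5F0_RTT_US)
print(f"fixed cost ~{fixed_us:.1f} us, per-byte cost ~{per_byte_us:.2f} us/byte")
```

Rows that stay nearly flat across payload sizes, such as C66x1 and C7x in the same table, would show a much smaller slope under the same fit.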

1.1.2.6.3. Performance - Host Core MCU R5F0, 1 GHz

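| Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
|---|---|---|---|---|---|---|---|
| A72 (Bios) | 21 | 21 | 23 | 26 | 32 | 43 | 68 |
| Main R5F0 | 17 | 18 | 19 | 22 | 28 | 39 | 65 |
| C66x1 | 17 | 17 | 19 | 22 | 28 | 40 | 64 |
| C7x | 18 | 18 | 20 | 23 | 29 | 40 | 66 |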

1.1.2.6.4. Performance - Host Core MAIN R5F0, 1 GHz

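| Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
|---|---|---|---|---|---|---|---|
| A72 (Bios) | 17 | 17 | 18 | 21 | 26 | 37 | 59 |
| MCU R5F0 | 16 | 15 | 17 | 20 | 25 | 35 | 58 |
| Main R5F1 | 16 | 16 | 17 | 21 | 26 | 36 | 59 |
| C66x1 | 16 | 15 | 17 | 20 | 25 | 36 | 58 |
| C7x | 16 | 16 | 17 | 20 | 25 | 36 | 58 |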

1.1.2.6.5. Performance - Host Core C66X1, 1.35 GHz

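| Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
|---|---|---|---|---|---|---|---|
| A72 (Bios) | 19 | 18 | 18 | 18 | 18 | 22 | 26 |
| MCU R5F0 | 26 | 26 | 28 | 30 | 37 | 52 | 81 |
| Main R5F0 | 25 | 25 | 27 | 29 | 35 | 48 | 75 |
| C66x2 | 23 | 22 | 22 | 21 | 23 | 28 | 35 |
| C7x | 30 | 29 | 29 | 28 | 31 | 34 | 37 |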

1.1.2.6.6. Performance - Host Core C7x, 1GHz

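| Remote Core | 4 Bytes | 8 Bytes | 16 Bytes | 32 Bytes | 64 Bytes | 128 Bytes | 256 Bytes |
|---|---|---|---|---|---|---|---|
| A72 (Bios) | 21 | 21 | 21 | 21 | 24 | 23 | 25 |
| MCU R5F0 | 32 | 32 | 34 | 37 | 45 | 55 | 82 |
| Main R5F0 | 28 | 29 | 30 | 34 | 42 | 51 | 75 |
| C66x1 | 29 | 28 | 28 | 27 | 20 | 31 | 36 |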

1.1.2.7. OSPI

1.1.2.7.1. OSPI Memory Non Cached Test Set-up
  • Platform: J721e EVM.

  • OS Type: Baremetal/FreeRTOS

  • Core : R5F_0 at 1 GHz.

  • Software/Application Used: OSPI_Flash_TestApp/OSPI_Flash_Dma_TestApp

  • System Configuration: Cache OFF, Read/Write Buffer in DDR. DMA Enabled/Disabled, Interrupts ON.

1.1.2.7.2. OSPI Phy Tuning Time (DDR Octal Mode)

| OSPI RCLK | Tuning Time |
|---|---|
| 133 MHz | 3.462 |
| 166 MHz | 3.113 |

Note: PHY tuning time varies across silicon samples and PHY tuning point varies with voltage and temperature.

1.1.2.7.3. OSPI Read/Write Performance (DDR Octal Mode)

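| OSPI RCLK | Mode | Write Tput (MB/s) | Write CPU Load | Read Tput (MB/s) | Read CPU Load | Read Tput Theoretical Max (MB/s) |
|---|---|---|---|---|---|---|
| 133 MHz | DAC | 0.77 | 100% | 7.186 | 51% | 266 |
| 133 MHz | DAC DMA | 1.557 | 70% | 264.925 | 2% | 266 |
| 133 MHz | INDAC | 1.548 | 75% | 8.331 | 0% | 266 |
| 166 MHz | DAC | 0.081 | 100% | 8.213 | 51% | 332 |
| 166 MHz | DAC DMA | 1.632 | 71% | 330.572 | 1% | 332 |
| 166 MHz | INDAC | 1.628 | 76% | 10.414 | 1% | 332 |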
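
The theoretical maximum values in the table above are consistent with one byte transferred on each clock edge in DDR octal mode (RCLK x 2 edges x 8 bits). The sketch below reproduces that arithmetic and expresses a measured read throughput as a fraction of the peak. The function names are illustrative and not part of the OSPI driver.

```python
def ospi_read_max_mb_s(rclk_mhz, bus_width_bits=8, double_data_rate=True):
    """Peak bus rate in MB/s: clock (MHz) x edges per clock x bytes per edge."""
    edges_per_clock = 2 if double_data_rate else 1
    return rclk_mhz * edges_per_clock * bus_width_bits / 8


def bus_utilization(measured_mb_s, rclk_mhz):
    """Fraction of the peak bus rate reached by a measured read throughput."""
    return measured_mb_s / ospi_read_max_mb_s(rclk_mhz)


if __name__ == "__main__":
    # DAC DMA read throughput values taken from the table above.
    for rclk_mhz, dac_dma_read in ((133, 264.925), (166, 330.572)):
        peak = ospi_read_max_mb_s(rclk_mhz)
        share = bus_utilization(dac_dma_read, rclk_mhz)
        print(f"{rclk_mhz} MHz: peak {peak:.0f} MB/s, DAC DMA read at {share:.1%} of peak")
```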

1.1.2.7.4. OSPI Memory Cached Test Set-up
  • Platform: J721e EVM.

  • OS Type: Baremetal/FreeRTOS

  • Core : R5F_0 at 1 GHz.

  • Software/Application Used: OSPI_Flash_Cache_TestApp/OSPI_Flash_Dma_Cache_TestApp

  • System Configuration: Cache ON, Read/Write Buffer in DDR. DMA Enabled/Disabled, Interrupts ON.

1.1.2.7.5. OSPI Read/Write Performance (DDR Octal Mode)

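| OSPI RCLK | Mode | Write Tput (MB/s) | Write CPU Load | Read Tput (MB/s) | Read CPU Load | Read Tput Theoretical Max (MB/s) |
|---|---|---|---|---|---|---|
| 133 MHz | DAC | 0.302 | 100% | 46.284 | 51% | 266 |
| 133 MHz | DAC DMA | 1.504 | 75% | 264.858 | 20% | 266 |
| 133 MHz | INDAC | 1.503 | 100% | 8.331 | 0% | 266 |
| 166 MHz | DAC | 0.340 | 100% | 57.503 | 51% | 332 |
| 166 MHz | DAC DMA | 1.570 | 72% | 339.637 | 2% | 332 |
| 166 MHz | INDAC | 1.572 | 76% | 10.414 | 0% | 332 |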

1.1.2.8. MMCSD

1.1.2.8.1. Test Set-up
  • Platform: J721e EVM.

  • OS Type: FreeRTOS

  • Core : R5F_0 at 1 GHz.

  • Software/Application Used: MMCSD_<EMMC>_Regression_TestApp (A menu based application which outputs the benchmark numbers on UART)

  • System Configuration: Cache ON, Read/Write Buffer in DDR. ADMA enabled, Interrupts ON.

  • SD Card used: Sandisk 16GB, Class 10. FAT32 formatted with allocation size = 4K (for optimal FAT32 throughput & compatibility with various cards)

  • EMMC: EMMC on J721E EVM. Please refer to the EVM data sheet for details

1.1.2.8.2. SD Card Performance
1.1.2.8.2.1. DS Mode (25 MHz, 4-bit) Theoretical Max: 12.5 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 9.946 | 11.201 |
| 512 | 10.484 | 11.389 |
| 1024 | 10.778 | 11.441 |
| 2048 | 11.075 | 11.465 |
| 5120 | 10.462 | 11.475 |

1.1.2.8.2.2. HS Mode (50 MHz, 4-bit) Theoretical Max: 25 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 16.638 | 21.731 |
| 512 | 20.286 | 22.450 |
| 1024 | 20.871 | 22.649 |
| 2048 | 21.686 | 22.744 |
| 5120 | 21.680 | 22.803 |

1.1.2.8.2.3. SDR12 Mode (25 MHz, 4-bit) Theoretical Max: 12.5 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 9.037 | 11.194 |
| 512 | 10.780 | 11.391 |
| 1024 | 10.948 | 11.439 |
| 2048 | 11.036 | 11.465 |
| 5120 | 11.037 | 11.480 |

1.1.2.8.2.4. SDR25 Mode (50 MHz, 4-bit) Theoretical Max: 25 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 13.948 | 21.719 |
| 512 | 20.519 | 22.460 |
| 1024 | 21.264 | 22.645 |
| 2048 | 21.273 | 22.745 |
| 5120 | 19.968 | 22.803 |

1.1.2.8.2.5. SDR50 Mode (100 MHz, 4-bit) Theoretical Max: 50 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 22.783 | 40.996 |
| 512 | 27.901 | 43.689 |
| 1024 | 37.582 | 44.397 |
| 2048 | 40.934 | 44.773 |
| 5120 | 41.188 | 44.992 |

1.1.2.8.2.6. DDR50 Mode (50 MHz, 4-bit) Theoretical Max: 50 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 24.739 | 39.858 |
| 512 | 32.020 | 42.436 |
| 1024 | 37.169 | 43.141 |
| 2048 | 40.220 | 43.475 |
| 5120 | 40.193 | 43.702 |

1.1.2.8.3. EMMC Performance
1.1.2.8.3.1. DS Mode (25 MHz, 8-bit) Theoretical Max: 25 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 15.931 | 18.426 |
| 512 | 17.999 | 20.058 |
| 1024 | 19.581 | 20.994 |
| 2048 | 19.943 | 21.493 |
| 5120 | 20.487 | 21.805 |

1.1.2.8.3.2. HS-SDR Mode (50 MHz, 8-bit) Theoretical Max: 50 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 25.399 | 31.410 |
| 512 | 30.382 | 36.506 |
| 1024 | 34.968 | 39.711 |
| 2048 | 37.818 | 41.528 |
| 5120 | 39.769 | 42.711 |

1.1.2.8.3.3. HS-DDR Mode (50 MHz, 8-bit) Theoretical Max: 100 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 32.824 | 46.822 |
| 512 | 42.265 | 59.121 |
| 1024 | 41.457 | 68.050 |
| 2048 | 52.341 | 73.576 |
| 5120 | 54.078 | 77.343 |

1.1.2.8.3.4. HS-200 Mode (200 MHz, 8-bit) Theoretical Max: 200 MB/s

| Size of transfer (KB) | RAW Write Throughput (MB/s) | RAW Read Throughput (MB/s) |
|---|---|---|
| 256 | 30.821 | 48.619 |
| 512 | 43.690 | 61.828 |
| 1024 | 47.311 | 71.659 |
| 2048 | 51.253 | 77.801 |
| 5120 | 51.320 | 81.940 |

1.1.2.9. CSL-FL based Optimized OSPI Example

1.1.2.9.1. CPU Mode - Test Set-up
  • Platform: J721e EVM.

  • OS Type: Baremetal

  • Core : R5F_0 at 1 GHz

  • Software/Application Used: csl_ospi_flash_app

  • System Configuration:
    • RCLK 133/166 MHz

    • Cache ON

    • Buffer and critical functions in TCMB

    • DMA disabled

    • Interrupts OFF

  • Theoretical Max Throughput:
    • 133 MHz: 253.67 MB/s

    • 166 MHz: 316.62 MB/s
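
The theoretical maximum figures quoted above are lower than the 266 and 332 MB/s obtained from RCLK x 2 bytes per cycle. One plausible reading, which the source does not state, is that they are expressed in binary megabytes (2^20 bytes). The sketch below performs that conversion, and the results fall within 0.01 of the quoted values. The helper name is illustrative.

```python
MIB = 1024 * 1024  # bytes in one binary megabyte


def mb_to_mib_per_s(rate_mb_s):
    """Convert a rate in decimal MB/s (10^6 bytes) to MiB/s (2^20 bytes)."""
    return rate_mb_s * 1_000_000 / MIB


if __name__ == "__main__":
    for rclk_mhz in (133, 166):
        raw_mb_s = rclk_mhz * 2  # DDR octal mode: two bytes per RCLK cycle
        print(f"{rclk_mhz} MHz: {raw_mb_s} MB/s = {mb_to_mib_per_s(raw_mb_s):.3f} MiB/s")
```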

1.1.2.9.2. DAC Mode OSPI Read Performance (Dual Data Rate - Octal Mode)

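| OSPI RCLK | Size of transfer (B) | Read Time (ns) | Throughput (MB/s) |
|---|---|---|---|
| 133 MHz | 16 | 815 | 19.6 |
| 133 MHz | 32 | 1445 | 22.1 |
| 133 MHz | 64 | 2700 | 23.7 |
| 133 MHz | 128 | 5225 | 24.5 |
| 133 MHz | 256 | 10265 | 24.9 |
| 133 MHz | 512 | 20360 | 25.1 |
| 133 MHz | 1024 | 40510 | 25.3 |
| 166 MHz | 16 | 945 | 16.9 |
| 166 MHz | 32 | 2330 | 13.7 |
| 166 MHz | 64 | 4580 | 14.0 |
| 166 MHz | 128 | 9105 | 14.1 |
| 166 MHz | 256 | 18145 | 14.1 |
| 166 MHz | 512 | 36185 | 14.1 |
| 166 MHz | 1024 | 72295 | 14.2 |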
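
The throughput column above follows from the transfer size divided by the read time, with 1 MB taken as 10^6 bytes. The sketch below reproduces that calculation for the 133 MHz rows. The function name is illustrative and not part of `csl_ospi_flash_app`.

```python
def throughput_mb_s(size_bytes, read_time_ns):
    """Bytes per nanosecond scaled to MB/s (x 1e9 ns/s, / 1e6 bytes/MB)."""
    return size_bytes / read_time_ns * 1000.0


# (size in bytes, read time in ns) for RCLK = 133 MHz, from the table above.
SAMPLES_133_MHZ = [
    (16, 815), (32, 1445), (64, 2700), (128, 5225),
    (256, 10265), (512, 20360), (1024, 40510),
]

if __name__ == "__main__":
    for size_bytes, read_time_ns in SAMPLES_133_MHZ:
        rate = throughput_mb_s(size_bytes, read_time_ns)
        print(f"{size_bytes:5d} B in {read_time_ns:6d} ns -> {rate:.1f} MB/s")
```

The same function applies to the other read-time tables in this section, since they use the same units.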

1.1.2.9.3. DMA Mode - Test Set-up
  • Platform: J721e EVM.

  • OS Type: Baremetal

  • Core : R5F_0 at 1 GHz

  • Software/Application Used: udma_baremetal_ospi_flash_testapp

  • System Configuration:
    • RCLK 133/166 MHz

    • Cache ON

    • Buffer and critical functions in TCMB

    • DMA enabled (SW trigger mode)

    • Interrupts OFF

  • Theoretical Max Throughput:
    • 133 MHz: 253.67 MB/s

    • 166 MHz: 316.62 MB/s

1.1.2.9.4. DAC DMA Mode OSPI Read Performance (Dual Data Rate - Octal Mode)

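| OSPI RCLK | Size of transfer (B) | Read Time (ns) | Throughput (MB/s) |
|---|---|---|---|
| 133 MHz | 16 | 800 | 20 |
| 133 MHz | 32 | 805 | 39.8 |
| 133 MHz | 64 | 970 | 66 |
| 133 MHz | 128 | 1315 | 97.3 |
| 133 MHz | 256 | 1955 | 130.9 |
| 133 MHz | 512 | 3120 | 164.1 |
| 133 MHz | 1024 | 5450 | 187.9 |
| 166 MHz | 16 | 675 | 23.7 |
| 166 MHz | 32 | 805 | 39.8 |
| 166 MHz | 64 | 850 | 75.3 |
| 166 MHz | 128 | 1180 | 108.5 |
| 166 MHz | 256 | 1685 | 151.9 |
| 166 MHz | 512 | 2730 | 187.5 |
| 166 MHz | 1024 | 4670 | 219.3 |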

1.1.2.10. SBL Boot Performance Numbers

1.1.2.10.1. Test Set-up
  • Platform: J721E EVM.

  • OS Type: Baremetal

  • Core : R5F_0 at 1 GHz

  • Note that app image load time could vary depending on the actual image size

  • Note that RBL boot time numbers are not accounted in the below table

1.1.2.10.2. GP EVM Performance (Legacy Boot)

| Boot Modes | SBL Used | Application Used |
|---|---|---|
| MMCSD | sbl_mmcsd_img | sbl_boot_perf_test |
| eMMC Boot0 | sbl_emmc_boot0_img | sbl_boot_perf_test |
| eMMC UDA | sbl_emmc_uda_img | sbl_boot_perf_test |
| OSPI NOR | sbl_ospi_img | sbl_boot_perf_test |
| OSPI NOR Optimized | sbl_boot_perf_cust_img | sbl_boot_perf_early_can_test |

| SBL Boot Time Breakdown | MMCSD | eMMC BOOT0 | OSPI NOR Optimized | OSPI NOR | eMMC UDA |
|---|---|---|---|---|---|
| SBL : SBL_SciClientInit: ReadSysfwImage | 191.220ms | 274.733ms | 8.270ms | 8.270ms | 291.171ms |
| Load/Start SYSFW | 4.093ms | 4.144ms | 4.067ms | 4.144ms | 3.982ms |
| Sciclient_init | 3.164ms | 3.164ms | 3.165ms | 3.164ms | 3.165ms |
| Board Config | 7.096ms | 7.008ms | 2.008ms | 7.116ms | 6.893ms |
| PM Config | 1.441ms | 1.360ms | 0.106ms | 1.220ms | 1.445ms |
| Security Config | 3.430ms | 3.449ms | 5.538ms | 3.448ms | 3.407ms |
| RM Config | 3.348ms | 3.350ms | 0.770ms | 3.348ms | 3.319ms |
| SBL : Board_init (pinmux) | 2.963ms | 3.058ms | 2.819ms | 3.054ms | 0.865ms |
| SBL : Board_init (PLL) | 0.225ms | 0.227ms | 0.796ms | 0.214ms | 0.185ms |
| SBL: Board_init (CLOCKS) | 1.279ms | 1.399ms | 0.658ms | 1.287ms | 1.241ms |
| SBL: DDR initialization | 30.122ms | 30.277ms | 0.000ms | 30.197ms | 30.099ms |
| SBL: Ethernet Configuration | 146.182ms | 146.191ms | 0.000ms | 146.192ms | 146.180ms |
| SBL: EEPROM copying time | 6.839ms | 6.839ms | 0.000ms | 0.123ms | 6.839ms |
| SBL: HSM Core App Copying Time | 0.491ms | 0.492ms | 0.481ms | 0.492ms | 0.491ms |
| SBL: Boot Media Drivers init | 77.001ms | 24.402ms | 2.304ms | 2.228ms | 213.624ms |
| SBL: OSPI PHY Tuning time | 0.271ms | 0.001ms | 3.293ms | 3.309ms | 0.146ms |
| SBL: Application Image Verification | 0.001ms | 0.000ms | 0.000ms | 0.000ms | 0.000ms |
| SBL: App copy to MCU SRAM & Jump to App | 80.067ms | 2.516ms | 2.604ms | 2.606ms | 54.541ms |
| Misc | 0.035ms | 0.036ms | 0.035ms | 0.036ms | 0.037ms |
| TOTAL time | 559.255ms | 512.646ms | 36.914ms | 220.448ms | 767.630ms |
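
The total time in the breakdown above is the sum of the individual stage times. As a check, the sketch below adds up the OSPI NOR Optimized column and lists the largest contributors. The dictionary simply transcribes that column, and the variable and function names are illustrative. Other columns may not reconcile to the last digit.

```python
# Stage times in ms for the OSPI NOR Optimized column of the GP EVM table above.
OSPI_NOR_OPTIMIZED_MS = {
    "ReadSysfwImage": 8.270,
    "Load/Start SYSFW": 4.067,
    "Sciclient_init": 3.165,
    "Board Config": 2.008,
    "PM Config": 0.106,
    "Security Config": 5.538,
    "RM Config": 0.770,
    "Board_init (pinmux)": 2.819,
    "Board_init (PLL)": 0.796,
    "Board_init (CLOCKS)": 0.658,
    "DDR initialization": 0.000,
    "Ethernet Configuration": 0.000,
    "EEPROM copying time": 0.000,
    "HSM Core App Copying Time": 0.481,
    "Boot Media Drivers init": 2.304,
    "OSPI PHY Tuning time": 3.293,
    "Application Image Verification": 0.000,
    "App copy to MCU SRAM & Jump to App": 2.604,
    "Misc": 0.035,
}


def total_ms(stages):
    """Sum the stage times, rounded to the table's 3-decimal resolution."""
    return round(sum(stages.values()), 3)


def top_stages(stages, count=3):
    """Return the `count` longest stages as (name, ms) pairs, longest first."""
    return sorted(stages.items(), key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    print("total:", total_ms(OSPI_NOR_OPTIMIZED_MS), "ms")
    for name, ms in top_stages(OSPI_NOR_OPTIMIZED_MS):
        print(f"{name}: {ms:.3f} ms")
```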

1.1.2.10.3. HS EVM Performance (Legacy Boot)

| Boot Modes | SBL Used | Application Used |
|---|---|---|
| MMCSD | sbl_mmcsd_img_hs | sbl_boot_perf_test |
| OSPI NOR | sbl_ospi_img_hs | sbl_boot_perf_test |
| OSPI NOR Optimized | sbl_boot_perf_cust_img_hs | sbl_boot_perf_hs_early_can_test |

| SBL Boot Time Breakdown | OSPI NOR Optimized | OSPI NOR | MMCSD |
|---|---|---|---|
| SBL : SBL_SciClientInit: ReadSysfwImage | 8.270ms | 8.270ms | 135.894ms |
| Load/Start SYSFW | 12.904ms | 12.904ms | 12.722ms |
| Sciclient_init | 3.164ms | 3.164ms | 3.164ms |
| Board Config | 4.209ms | 9.303ms | 9.196ms |
| PM Config | 0.121ms | 1.342ms | 1.366ms |
| Security Config | 8.063ms | 6.002ms | 6.976ms |
| RM Config | 3.062ms | 5.631ms | 5.635ms |
| SBL : Board_init (pinmux) | 2.819ms | 2.895ms | 2.894ms |
| SBL : Board_init (PLL) | 0.796ms | 0.218ms | 0.219ms |
| SBL: Board_init (CLOCKS) | 0.660ms | 1.282ms | 1.345ms |
| SBL: DDR initialization | 0.000ms | 30.111ms | 30.159ms |
| SBL: Ethernet Configuration | 0.000ms | 146.196ms | 146.276ms |
| SBL: EEPROM copying time | 0.000ms | 6.838ms | 6.839ms |
| SBL: HSM Core App Copying Time | 0.487ms | 0.497ms | 0.498ms |
| SBL: Boot Media Drivers init | 2.290ms | 2.221ms | 80.600ms |
| SBL: OSPI PHY Tuning time | 3.346ms | 3.331ms | 0.223ms |
| SBL: Application Image Verification | 50.359ms | 51.321ms | 131.874ms |
| SBL: App copy to MCU SRAM & Jump to App | 1.944ms | 3.289ms | 2.526ms |
| Misc | 0.035ms | 0.035ms | 0.035ms |
| TOTAL time | 102.529ms | 294.850ms | 578.441ms |

1.1.2.11. Early CAN Response

  • CAN response is measured from MCU_PORZ_OUT to pulling the CAN-H line out of standby.

  • The numbers below were measured on a J721E ES2.0 GP EVM.

| | Measured Time |
|---|---|
| Early CAN | 53.6 ms |
| POST + Early CAN | 81.3 ms |