2.2. Linux Performance Guide

Read This First

All performance numbers provided in this document were gathered using the following Evaluation Modules unless otherwise specified.

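Name        Description
----------  --------------------------------------------------------------------------------------------------------
AM572x EVM  AM57xx Evaluation Module rev A2 with ARM running at 1500 MHz, DDR3L-533 (533 MHz/1066 MT/s), TMDSEVM572x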

Table: Evaluation Modules


About This Manual

This document provides performance data for each of the device drivers that are part of the Processor SDK Linux package. Use it in conjunction with the release notes and user guides provided with the Processor SDK Linux package, which describe specific issues present in the drivers included in a particular release.

If You Need Assistance

For further information or to report any problems, contact http://community.ti.com/ or http://support.ti.com/

2.2.1. System Benchmarks

2.2.1.1. LMBench

LMBench is a collection of microbenchmarks, of which the memory bandwidth and latency tests are typically used to estimate processor memory system performance. More information about LMBench is available at http://lmbench.sourceforge.net/whatis_lmbench.html and http://lmbench.sourceforge.net/man/lmbench.8.html

Latency: lat_mem_rd-stride128-szN, where N is equal to or smaller than the cache size at a given level, measures the cache miss penalty. When N is at least double the size of the last-level cache, the result is the latency to external memory.

Bandwidth: bw_mem_bcopy-N, where N is equal to or smaller than the cache size at a given level, measures the achievable memory bandwidth from software doing a memcpy() type operation. The typical use is external memory bandwidth calculation. The bandwidth is calculated so that a byte read and written counts as 1, which should make the result roughly half of the STREAM copy result.
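As a rough cross-check of the halving relationship described above, the sketch below compares the 16 MB bcopy result with the STREAM copy score, both taken from the tables in this guide. The function name is an illustrative assumption and is not part of LMBench or STREAM.

```python
# Illustrative check (not part of LMBench or STREAM): compare an LMBench
# bcopy result with a STREAM copy result using the byte accounting above.


def expected_bcopy_from_stream(stream_copy_mb_s):
    """LMBench bcopy counts a byte that is read and written once, while
    STREAM counts the read and the write separately, so the expected
    bcopy figure is about half of the STREAM copy figure."""
    return stream_copy_mb_s / 2.0


# Values taken from the am57xx-evm tables in this guide.
stream_copy = 3959.90   # STREAM copy (MB/s)
bcopy_16mb = 1960.06    # bw_mem-bcopy-16mb (MB/s)

expected = expected_bcopy_from_stream(stream_copy)
relative_gap = abs(bcopy_16mb - expected) / expected

print(f"expected bcopy: {expected:.2f} MB/s, measured: {bcopy_16mb:.2f} MB/s")
print(f"relative gap: {relative_gap:.1%}")
```

With these two figures the measured bcopy bandwidth lands within about one percent of half the STREAM copy score.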

Benchmarks                                      am57xx-evm: perf
----------------------------------------------  -------------------------------------
af_unix_sock_stream_latency (microsec)          34.31
af_unix_socket_stream_bandwidth (MBs)           1643.24
bw_file_rd-io-1mb (MB/s)                        1634.14
bw_file_rd-o2c-1mb (MB/s)                       948.77
bw_mem-bcopy-16mb (MB/s)                        1960.06
bw_mem-bcopy-1mb (MB/s)                         4991.05
bw_mem-bcopy-2mb (MB/s)                         2955.49
bw_mem-bcopy-4mb (MB/s)                         2075.05
bw_mem-bcopy-8mb (MB/s)                         1986.10
bw_mem-bzero-16mb (MB/s)                        5140.15
bw_mem-bzero-1mb (MB/s)                         5524.33 (min 4991.05, max 6057.60)
bw_mem-bzero-2mb (MB/s)                         4388.55 (min 2955.49, max 5821.60)
bw_mem-bzero-4mb (MB/s)                         3783.52 (min 2075.05, max 5491.99)
bw_mem-bzero-8mb (MB/s)                         3621.17 (min 1986.10, max 5256.24)
bw_mem-cp-16mb (MB/s)                           1048.63
bw_mem-cp-1mb (MB/s)                            4775.13 (min 3540.31, max 6009.95)
bw_mem-cp-2mb (MB/s)                            3457.07 (min 1223.46, max 5690.68)
bw_mem-cp-4mb (MB/s)                            3288.05 (min 1068.47, max 5507.62)
bw_mem-cp-8mb (MB/s)                            3148.87 (min 1028.08, max 5269.66)
bw_mem-fcp-16mb (MB/s)                          1147.28
bw_mem-fcp-1mb (MB/s)                           4563.51 (min 3069.42, max 6057.60)
bw_mem-fcp-2mb (MB/s)                           3535.80 (min 1250.00, max 5821.60)
bw_mem-fcp-4mb (MB/s)                           3320.88 (min 1149.76, max 5491.99)
bw_mem-fcp-8mb (MB/s)                           3205.57 (min 1154.90, max 5256.24)
bw_mem-frd-16mb (MB/s)                          1057.08
bw_mem-frd-1mb (MB/s)                           3077.92 (min 3069.42, max 3086.42)
bw_mem-frd-2mb (MB/s)                           1767.06 (min 1250.00, max 2284.11)
bw_mem-frd-4mb (MB/s)                           1178.98 (min 1149.76, max 1208.19)
bw_mem-frd-8mb (MB/s)                           1113.57 (min 1072.24, max 1154.90)
bw_mem-fwr-16mb (MB/s)                          5115.09
bw_mem-fwr-1mb (MB/s)                           4548.19 (min 3086.42, max 6009.95)
bw_mem-fwr-2mb (MB/s)                           3987.40 (min 2284.11, max 5690.68)
bw_mem-fwr-4mb (MB/s)                           3357.91 (min 1208.19, max 5507.62)
bw_mem-fwr-8mb (MB/s)                           3170.95 (min 1072.24, max 5269.66)
bw_mem-rd-16mb (MB/s)                           3048.20
bw_mem-rd-1mb (MB/s)                            12158.45 (min 10893.45, max 13423.44)
bw_mem-rd-2mb (MB/s)                            7872.41 (min 6918.53, max 8826.29)
bw_mem-rd-4mb (MB/s)                            2492.86 (min 1634.39, max 3351.33)
bw_mem-rd-8mb (MB/s)                            2202.48 (min 1320.68, max 3084.28)
bw_mem-rdwr-16mb (MB/s)                         1188.97
bw_mem-rdwr-1mb (MB/s)                          5551.32 (min 3540.31, max 7562.32)
bw_mem-rdwr-2mb (MB/s)                          2346.10 (min 1223.46, max 3468.74)
bw_mem-rdwr-4mb (MB/s)                          1225.98 (min 1068.47, max 1383.48)
bw_mem-rdwr-8mb (MB/s)                          1120.84 (min 1028.08, max 1213.59)
bw_mem-wr-16mb (MB/s)                           1288.24
bw_mem-wr-1mb (MB/s)                            10492.88 (min 7562.32, max 13423.44)
bw_mem-wr-2mb (MB/s)                            5193.64 (min 3468.74, max 6918.53)
bw_mem-wr-4mb (MB/s)                            1508.94 (min 1383.48, max 1634.39)
bw_mem-wr-8mb (MB/s)                            1267.14 (min 1213.59, max 1320.68)
bw_mmap_rd-mo-1mb (MB/s)                        4119.85
bw_mmap_rd-o2c-1mb (MB/s)                       1234.92
bw_pipe (MB/s)                                  434.29
bw_unix (MB/s)                                  1643.24
lat_connect (us)                                48.80
lat_ctx-2-128k (us)                             19.15
lat_ctx-2-256k (us)                             35.35
lat_ctx-4-128k (us)                             21.62
lat_ctx-4-256k (us)                             34.18
lat_fs-0k (num_files)                           679.00
lat_fs-10k (num_files)                          211.00
lat_fs-1k (num_files)                           294.00
lat_fs-4k (num_files)                           308.00
lat_mem_rd-stride128-sz1000k (ns)               19.27
lat_mem_rd-stride128-sz125k (ns)                19.04
lat_mem_rd-stride128-sz250k (ns)                12.83
lat_mem_rd-stride128-sz31k (ns)                 6.47
lat_mem_rd-stride128-sz50 (ns)                  2.67
lat_mem_rd-stride128-sz500k (ns)                12.84
lat_mem_rd-stride128-sz62k (ns)                 17.19
lat_mmap-1m (us)                                49.00
lat_ops-double-add (ns)                         1.09
lat_ops-double-mul (ns)                         5.01
lat_ops-float-add (ns)                          1.09
lat_ops-float-mul (ns)                          5.01
lat_ops-int-add (ns)                            0.72
lat_ops-int-bit (ns)                            0.45
lat_ops-int-div (ns)                            76.97
lat_ops-int-mod (ns)                            13.79
lat_ops-int-mul (ns)                            2.69
lat_ops-int64-add (ns)                          1.25
lat_ops-int64-bit (ns)                          1.01
lat_ops-int64-div (ns)                          163.00
lat_ops-int64-mod (ns)                          33.79
lat_pagefault (us)                              1.09
lat_pipe (us)                                   36.82
lat_proc-exec (us)                              925.67
lat_proc-fork (us)                              845.00
lat_proc-proccall (us)                          0.01
lat_select (us)                                 22.20
lat_sem (us)                                    4.60
lat_sig-catch (us)                              2.96
lat_sig-install (us)                            0.61
lat_sig-prot (us)                               0.35
lat_syscall-fstat (us)                          0.98
lat_syscall-null (us)                           0.27
lat_syscall-open (us)                           246.23
lat_syscall-read (us)                           0.52
lat_syscall-stat (us)                           2.86
lat_syscall-write (us)                          0.33
lat_tcp (us)                                    0.69
lat_unix (us)                                   34.31
latency_for_0.50_mb_block_size (nanosec)        12.84
latency_for_1.00_mb_block_size (nanosec)        9.63 (min 0.00, max 19.27)
pipe_bandwidth (MBs)                            434.29
pipe_latency (microsec)                         36.82
procedure_call (microsec)                       0.01
select_on_200_tcp_fds (microsec)                22.20
semaphore_latency (microsec)                    4.60
signal_handler_latency (microsec)               0.61
signal_handler_overhead (microsec)              2.96
tcp_ip_connection_cost_to_localhost (microsec)  48.80
tcp_latency_using_localhost (microsec)          0.69

Table: LM Bench Metrics

2.2.1.2. Dhrystone

Dhrystone is a core-only benchmark that runs from warm L1 caches in all modern processors. It scales linearly with clock speed. For standard ARM cores, the DMIPS/MHz score will be identical given the same compiler and flags.
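The relationship between the raw Dhrystones-per-second figure and DMIPS/MHz can be sketched as follows. The divisor 1757 is the conventional VAX 11/780 reference score that defines 1 DMIPS. The function name is an illustrative assumption, and the inputs are the am57xx-evm values reported in this section.

```python
# Illustrative conversion from raw Dhrystones/second to DMIPS/MHz.
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # conventional reference for 1 DMIPS


def dmips_per_mhz(dhrystones_per_sec, cpu_mhz):
    """Normalise a Dhrystone score to the VAX reference and the core clock."""
    dmips = dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC
    return dmips / cpu_mhz


# am57xx-evm values from this section: 9090909 Dhrystones/s at 1500 MHz.
score = dmips_per_mhz(9090909.0, 1500.0)
print(f"{score:.2f} DMIPS/MHz")
```

This evaluates to roughly 3.45 DMIPS/MHz, which is consistent with the reported 3.40 if that figure is rounded to one decimal place.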

Benchmarks                         am57xx-evm: perf
---------------------------------  ----------------
cpu_clock (MHz)                    1500.00
dhrystone_per_mhz (DMIPS/MHz)      3.40
dhrystone_per_second (DhrystoneP)  9090909.00

Table: Dhrystone Benchmark

2.2.1.3. Whetstone

Benchmarks        am57xx-evm: perf
----------------  ----------------
whetstone (MIPS)  5000.00

Table: Whetstone Benchmark

2.2.1.4. Linpack

Linpack measures peak double precision (64 bit) floating point performance in solving a dense linear system.

Benchmarks        am57xx-evm: perf
----------------  ----------------
linpack (Kflops)  907329.00

Table: Linpack Benchmark

2.2.1.5. NBench

NBench, which stands for Native Benchmark, is used to measure macro benchmarks for commonly used operations such as sorting and analysis algorithms. More information about NBench is available at https://en.wikipedia.org/wiki/NBench and https://nbench.io/articles/index.html

Benchmarks                     am57xx-evm: perf
-----------------------------  ----------------
assignment (Iterations)        16.49
fourier (Iterations)           26342.00
fp_emulation (Iterations)      169.17
huffman (Iterations)           1358.80
idea (Iterations)              3608.50
lu_decomposition (Iterations)  693.97
neural_net (Iterations)        22.14
numeric_sort (Iterations)      632.83
string_sort (Iterations)       118.08

Table: NBench Benchmarks

2.2.1.6. Stream

STREAM is a microbenchmark for measuring data memory system performance without any data reuse. It is designed to miss in the caches and exercise the data prefetcher and speculative accesses. It uses double-precision floating point (64-bit), but in most modern processors the memory access will be the bottleneck. The four individual scores are copy, scale (multiply by a constant), add (add two numbers), and triad (multiply-accumulate). For bandwidth, a byte read counts as one and a byte written counts as one, resulting in a score that is double the bandwidth LMBench will show.
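The byte accounting behind the four scores can be sketched as below. This is a plain-Python illustration of how many bytes each kernel is credited with per array element, not the benchmark itself (which is compiled code). The helper names are assumptions made for this example, and MB is taken as 10^6 bytes, the way STREAM reports it.

```python
# Illustrative sketch of STREAM's per-kernel byte accounting.
BYTES_PER_DOUBLE = 8  # STREAM operates on 64-bit doubles

# Arrays touched per element as (reads, writes).
KERNEL_ACCESSES = {
    "copy": (1, 1),   # c[i] = a[i]
    "scale": (1, 1),  # b[i] = q * c[i]
    "add": (2, 1),    # c[i] = a[i] + b[i]
    "triad": (2, 1),  # a[i] = b[i] + q * c[i]
}


def bytes_moved(kernel, n_elements):
    """Bytes credited to one pass of a kernel: reads and writes both count."""
    reads, writes = KERNEL_ACCESSES[kernel]
    return (reads + writes) * BYTES_PER_DOUBLE * n_elements


def bandwidth_mb_s(kernel, n_elements, seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one timed pass."""
    return bytes_moved(kernel, n_elements) / seconds / 1e6


for name in KERNEL_ACCESSES:
    print(name, bytes_moved(name, 1), "bytes per element")
```

Copy and scale are credited with 16 bytes per element, add and triad with 24.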

Benchmarks    am57xx-evm: perf
------------  ----------------
add (MB/s)    3676.60
copy (MB/s)   3959.90
scale (MB/s)  4450.80
triad (MB/s)  3651.00

Table: Stream

2.2.2. Boot-time Measurement

2.2.2.1. Boot media: MMCSD

Boot Configuration                                                                              am57xx-evm: boot time (sec)
----------------------------------------------------------------------------------------------  ----------------------------
Kernel boot time test when bootloader, kernel and sdk-rootfs are in mmc-sd                      37.40 (min 34.53, max 41.28)
Kernel boot time test when init is /bin/sh and bootloader, kernel and sdk-rootfs are in mmc-sd  14.01 (min 13.40, max 16.44)

Table: Boot time MMC/SD

2.2.3. ALSA SoC Audio Driver

  1. Access type - RW_INTERLEAVED

  2. Channels - 2

  3. Format - S16_LE

  4. Period size - 64

Sampling Rate (Hz)  am57xx-evm: Throughput (bits/sec)  am57xx-evm: CPU Load (%)
------------------  ---------------------------------  ------------------------
8000                255833.00                          0.09
11025               352573.00                          0.14
16000               511666.00                          0.22
22050               705146.00                          0.25
24000               705146.00                          0.26
32000               1023331.00                         0.36
44100               1410291.00                         0.38
48000               1534995.00                         0.45
88200               2820580.00                         0.83
96000               3069989.00                         1.06

Table: Audio Capture


Sampling Rate (Hz)  am57xx-evm: Throughput (bits/sec)  am57xx-evm: CPU Load (%)
------------------  ---------------------------------  ------------------------
8000                255936.00                          0.07
11025               352715.00                          0.13
16000               511872.00                          0.10
22050               705430.00                          0.15
24000               705430.00                          0.13
32000               1023744.00                         0.21
44100               1410860.00                         0.31
48000               1535615.00                         0.26
88200               2821718.00                         0.49
96000               3071227.00                         0.61

Table: Audio Playback
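The throughput columns in the two audio tables can be sanity-checked against the nominal PCM bit rate implied by the test settings listed at the start of this section (2 channels, S16_LE, so 16 bits per sample). The sketch below is only an illustration; the function name is an assumption, and the measured values are copied from the capture table.

```python
# Illustrative check of measured audio throughput against the nominal PCM rate.


def nominal_pcm_bits_per_sec(rate_hz, channels=2, bits_per_sample=16):
    """Nominal bit rate of an uncompressed PCM stream."""
    return rate_hz * channels * bits_per_sample


# Measured capture throughput (bits/sec) for two sampling rates from the table.
measured_capture = {8000: 255833.0, 48000: 1534995.0}

ratios = {}
for rate, measured in measured_capture.items():
    nominal = nominal_pcm_bits_per_sec(rate)
    ratios[rate] = measured / nominal
    print(f"{rate} Hz: nominal {nominal} bits/s, measured/nominal = {ratios[rate]:.4f}")
```

For both rates the measured figure is within about 0.1 percent of the nominal rate.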


2.2.4. Sensor Capture

Capture video frames (MMAP buffers) with v4l2-ctl and record the reported fps.

Resolution  Format  am57xx-evm: Fps  am57xx-evm: Sensor
----------  ------  ---------------  ------------------
1280x800    nv12    30.01            ov10635
1280x800    rgb4    30.01            ov10635
320x240     nv12    30.01            ov10635
320x240     rgb4    30.01            ov10635

Table: Sensor Capture


2.2.5. Graphics SGX/RGX Driver

2.2.5.1. GLBenchmark

Run GLBenchmark and capture the reported performance: display rate (Fps), fill rate, vertex throughput, and so on. All display outputs (HDMI, DisplayPort and/or LCD) are connected when running these tests.

2.2.5.1.1. Performance (Fps)

Benchmark                               am57xx-evm: Test Number  am57xx-evm: Fps
--------------------------------------  -----------------------  ----------------------------
GLB25_EgyptTestC24Z16FixedTime test     2500005.00               25.17 (min 14.63, max 51.56)
GLB25_EgyptTestC24Z16_ETC1 test         2501001.00               31.65 (min 14.87, max 59.49)
GLB25_EgyptTestC24Z16_ETC1to565 test    2501401.00               32.48 (min 14.87, max 59.49)
GLB25_EgyptTestC24Z16_PVRTC4 test       2501101.00               31.66 (min 14.87, max 59.49)
GLB25_EgyptTestC24Z24MS4 test           2500003.00               27.65 (min 14.87, max 59.49)
GLB25_EgyptTestStandard_inherited test  2000000.00               59.32 (min 51.56, max 59.49)

Table: GLBenchmark 2.5 Performance

2.2.5.1.2. Vertex Throughput

Benchmark                                    am57xx-evm: Test Number  am57xx-evm: Rate (triangles/sec)
-------------------------------------------  -----------------------  --------------------------------
GLB25_TriangleTexFragmentLitTestC24Z16 test  2500511.00               32937770.00
GLB25_TriangleTexTestC24Z16 test             2500301.00               99030688.00
GLB25_TriangleTexVertexLitTestC24Z16 test    2500411.00               39563644.00

Table: GLBenchmark 2.5 Vertex Throughput

2.2.5.1.3. Pixel Throughput

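Benchmark                  am57xx-evm: Test Number  am57xx-evm: Rate (texel/sec)  am57xx-evm: Fps
-------------------------  -----------------------  ----------------------------  ----------------------------
GLB25_FillTestC24Z16 test  2500101.00               731048896.00                  29.74 (min 29.74, max 29.75)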

Table: GLBenchmark 2.5 Pixel Throughput

2.2.5.2. Glmark2

Run Glmark2 and capture the reported performance (Score). All display outputs (HDMI, DisplayPort and/or LCD) are connected when running these tests.

Benchmark        am57xx-evm: Score
---------------  -----------------
Glmark2-DRM      56.00
Glmark2-Wayland  375.00

Table: Glmark2


2.2.6. SATA Driver

AM57XX-EVM


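--------------------------------------------------------------------------------------------------------------------------
Buffer size   am57xx-evm: Write EXT2           am57xx-evm: Write EXT2       am57xx-evm: Read EXT2    am57xx-evm: Read EXT2
(bytes)       Throughput (Mbytes/sec)          CPU Load (%)                 Throughput (Mbytes/sec)  CPU Load (%)
------------  -------------------------------  ---------------------------  -----------------------  ---------------------
102400        121.86 (min 89.84, max 130.44)   9.90 (min 5.82, max 24.17)   133.60                   11.54

262144        126.69 (min 118.46, max 129.38)  11.61 (min 6.09, max 32.19)  134.17                   11.71

524288        126.32 (min 116.93, max 130.24)  11.40 (min 6.05, max 30.93)  134.03                   10.75

1048576       126.67 (min 119.65, max 129.25)  11.60 (min 6.31, max 31.77)  133.74                   10.80

5242880       127.08 (min 117.61, max 129.64)  11.24 (min 6.08, max 30.33)  133.89                   10.34
--------------------------------------------------------------------------------------------------------------------------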


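--------------------------------------------------------------------------------------------------------------------------
Buffer size   am57xx-evm: Write EXT4           am57xx-evm: Write EXT4       am57xx-evm: Read EXT4    am57xx-evm: Read EXT4
(bytes)       Throughput (Mbytes/sec)          CPU Load (%)                 Throughput (Mbytes/sec)  CPU Load (%)
------------  -------------------------------  ---------------------------  -----------------------  ---------------------
102400        125.87 (min 124.12, max 129.29)  11.49 (min 7.17, max 27.08)  133.20                   10.65

262144        127.07 (min 125.81, max 128.05)  11.55 (min 7.02, max 27.32)  130.24                   10.13

524288        124.11 (min 122.95, max 124.97)  11.55 (min 7.57, max 26.44)  129.19                   9.79

1048576       124.08 (min 123.55, max 124.79)  10.57 (min 7.17, max 23.22)  129.12                   10.04

5242880       123.92 (min 121.87, max 125.60)  11.02 (min 7.41, max 24.51)  128.88                   8.61
--------------------------------------------------------------------------------------------------------------------------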



  • File size used: 1G

  • SATA II hard disk used: Seagate ST3500514NS 500G

2.2.6.1. mSATA Driver

AM57XX-EVM


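--------------------------------------------------------------------------------------------------------------------------
Buffer size   am57xx-evm: Write VFAT           am57xx-evm: Write VFAT       am57xx-evm: Read VFAT    am57xx-evm: Read VFAT
(bytes)       Throughput (Mbytes/sec)          CPU Load (%)                 Throughput (Mbytes/sec)  CPU Load (%)
------------  -------------------------------  ---------------------------  -----------------------  ---------------------
102400        62.41 (min 52.31, max 65.06)     10.21 (min 7.08, max 21.30)  220.29                   20.50

262144        62.23 (min 51.78, max 65.05)     10.52 (min 7.50, max 20.89)  228.85                   21.31

524288        62.34 (min 51.82, max 65.20)     10.58 (min 7.85, max 20.92)  241.67                   21.89

1048576       62.92 (min 51.74, max 66.16)     10.70 (min 7.37, max 20.88)  246.65                   21.92

5242880       62.65 (min 51.64, max 65.75)     10.57 (min 7.61, max 20.87)  243.72                   21.22
--------------------------------------------------------------------------------------------------------------------------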


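--------------------------------------------------------------------------------------------------------------------------
Buffer size   am57xx-evm: Write EXT2           am57xx-evm: Write EXT2       am57xx-evm: Read EXT2    am57xx-evm: Read EXT2
(bytes)       Throughput (Mbytes/sec)          CPU Load (%)                 Throughput (Mbytes/sec)  CPU Load (%)
------------  -------------------------------  ---------------------------  -----------------------  ---------------------
102400        64.55 (min 63.06, max 65.61)     4.86 (min 2.64, max 12.28)   225.78                   18.32

262144        64.89 (min 63.89, max 66.03)     4.61 (min 3.00, max 10.64)   234.40                   19.67

524288        64.79 (min 64.32, max 65.50)     4.84 (min 3.16, max 10.79)   249.80                   22.06

1048576       65.93 (min 62.13, max 68.10)     4.99 (min 3.18, max 11.42)   260.06                   21.05

5242880       65.39 (min 64.03, max 67.35)     4.60 (min 3.14, max 9.33)    265.28                   20.40
--------------------------------------------------------------------------------------------------------------------------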


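--------------------------------------------------------------------------------------------------------------------------
Buffer size   am57xx-evm: Write EXT4           am57xx-evm: Write EXT4       am57xx-evm: Read EXT4    am57xx-evm: Read EXT4
(bytes)       Throughput (Mbytes/sec)          CPU Load (%)                 Throughput (Mbytes/sec)  CPU Load (%)
------------  -------------------------------  ---------------------------  -----------------------  ---------------------
102400        64.78 (min 63.95, max 65.67)     4.85 (min 3.45, max 9.40)    230.42                   17.37

262144        65.69 (min 63.86, max 66.93)     4.64 (min 3.89, max 7.45)    237.43                   19.20

524288        67.24 (min 63.78, max 73.80)     5.20 (min 3.60, max 10.60)   254.59                   20.58

1048576       64.05 (min 63.17, max 64.99)     4.74 (min 3.49, max 9.34)    264.88                   16.99

5242880       64.37 (min 63.46, max 65.15)     4.60 (min 3.31, max 8.95)    271.15                   22.86
--------------------------------------------------------------------------------------------------------------------------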


  • File size used: 1G

  • mSATA drive used: SMS200S3/30G Kingston mSATA SSD


2.2.7. MMC/SD Driver

Warning

IMPORTANT: The performance numbers can be severely affected if the media is mounted in sync mode. Hot plug scripts in the filesystem mount removable media in sync mode to ensure data integrity. For performance-sensitive applications, unmount the auto-mounted filesystem and remount it in async mode.
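One way to confirm how a card is currently mounted is to inspect the mount options that the kernel reports in /proc/mounts. The sketch below parses text in that format and checks for the sync option. The function names, device names, and mount points in the sample are hypothetical and are used only for illustration.

```python
# Illustrative helper: detect whether a mount point carries the "sync" option.
# Input is text in /proc/mounts format:
#   device mount_point fs_type options dump pass


def mount_options(mounts_text, mount_point):
    """Return the list of options for mount_point, or None if it is not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def is_sync_mounted(mounts_text, mount_point):
    """True only if the mount point exists and lists the exact option 'sync'."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "sync" in options


# Hypothetical sample; on a target, read the real file instead:
#   mounts_text = open("/proc/mounts").read()
SAMPLE_MOUNTS = """\
/dev/mmcblk0p1 /run/media/mmcblk0p1 vfat rw,sync,relatime 0 0
/dev/mmcblk0p2 / ext4 rw,relatime 0 0
"""

print(is_sync_mounted(SAMPLE_MOUNTS, "/run/media/mmcblk0p1"))
print(is_sync_mounted(SAMPLE_MOUNTS, "/"))
```

If the check reports a sync mount, a remount along the lines of `mount -o remount,async <mount point>` is the usual way to switch modes before measuring.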


2.2.7.1. AM57XX-EVM


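-------------------------------------------------------------------------------------------------------------
Buffer size   am57xx-evm: Write EXT4   am57xx-evm: Write EXT4  am57xx-evm: Read EXT4    am57xx-evm: Read EXT4
(bytes)       Throughput (Mbytes/sec)  CPU Load (%)            Throughput (Mbytes/sec)  CPU Load (%)
------------  -----------------------  ----------------------  -----------------------  ---------------------
1m            16.40                    0.70                    22.30                    0.67

4m            16.10                    0.76                    22.50                    0.59

4k            2.31                     2.24                    10.70                    7.81

256k          15.40                    0.48                    22.10                    0.89
-------------------------------------------------------------------------------------------------------------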


The performance numbers were captured using the following:

  • SanDisk 8GB MicroSDHC Class 10 Memory Card

  • Partition was mounted with async option


2.2.8. UBoot MMC/SD Driver


2.2.8.1. AM57XX-EVM

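File size (bytes in hex)  am57xx-evm: Write Throughput (Kbytes/sec)  am57xx-evm: Read Throughput (Kbytes/sec)
------------------------  -----------------------------------------  ----------------------------------------
400000                    6159.40                                    21222.80
800000                    12172.36                                   21962.47
1000000                   19574.67                                   22321.53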

The performance numbers were captured using the following:

  • SanDisk 8GB MicroSDHC Class 10 Memory Card
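Because the file sizes in the U-Boot table are given as hexadecimal byte counts, a small conversion makes them easier to compare with the Linux results. The sketch below is illustrative only; the function name is an assumption.

```python
# Illustrative conversion of the hexadecimal file sizes in the U-Boot table.


def hex_size_to_mib(size_hex):
    """Interpret a hexadecimal byte count and return the size in MiB."""
    return int(size_hex, 16) / (1024 * 1024)


for size_hex in ("400000", "800000", "1000000"):
    print(f"0x{size_hex} bytes = {hex_size_to_mib(size_hex):g} MiB")
```

The three rows therefore correspond to 4 MiB, 8 MiB, and 16 MiB transfers.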


2.2.9. USB Driver

2.2.9.1. USB Device Controller

Number of Blocks  am57xx-evm: Throughput (MB/sec)
----------------  -------------------------------
150               34.80

Table: USBDEVICE HIGHSPEED SLAVE READ THROUGHPUT


Number of Blocks  am57xx-evm: Throughput (MB/sec)
----------------  -------------------------------
150               32.20

Table: USBDEVICE HIGHSPEED SLAVE WRITE THROUGHPUT


2.2.10. CRYPTO Driver

2.2.10.1. OpenSSL Performance

Algorithm    Buffer Size (in bytes)  am57xx-evm: throughput (KBytes/Sec)
-----------  ----------------------  -----------------------------------
aes-128-cbc  1024                    14360.23
aes-128-cbc  16                      4890.10
aes-128-cbc  16384                   35296.60
aes-128-cbc  256                     5314.39
aes-128-cbc  64                      15177.24
aes-128-cbc  8192                    32033.45
aes-192-cbc  1024                    15467.52
aes-192-cbc  16                      4827.38
aes-192-cbc  16384                   35749.89
aes-192-cbc  256                     5331.46
aes-192-cbc  64                      14566.68
aes-192-cbc  8192                    32721.58
aes-256-cbc  1024                    14389.59
aes-256-cbc  16                      4753.24
aes-256-cbc  16384                   33581.74
aes-256-cbc  256                     5111.89
aes-256-cbc  64                      14157.76
aes-256-cbc  8192                    30580.74
des-cbc      1024                    8903.68
des-cbc      16                      312.33
des-cbc      16384                   16149.16
des-cbc      256                     3708.93
des-cbc      64                      1172.52
des-cbc      8192                    15226.20
des3         1024                    8768.85
des3         16                      319.39
des3         16384                   15805.10
des3         256                     3824.21
des3         64                      1215.30
des3         8192                    14158.51
md5          1024                    10163.20
md5          16                      1023.37
md5          16384                   66174.98
md5          256                     2971.31
md5          64                      3957.12
md5          8192                    48283.65
sha1         1024                    10249.90
sha1         16                      999.31
sha1         16384                   66306.05
sha1         256                     2899.29
sha1         64                      3911.55
sha1         8192                    48493.91
sha224       1024                    9752.23
sha224       16                      894.64
sha224       16384                   63673.69
sha224       256                     2841.17
sha224       64                      3409.17
sha224       8192                    47325.18
sha256       1024                    10128.38
sha256       16                      905.81
sha256       16384                   66060.29
sha256       256                     2868.22
sha256       64                      3447.30
sha256       8192                    48209.92
sha384       1024                    10723.33
sha384       16                      826.69
sha384       16384                   76797.27
sha384       256                     2832.98
sha384       64                      3320.41
sha384       8192                    52450.65
sha512       1024                    10304.17
sha512       16                      841.09
sha512       16384                   76693.50
sha512       256                     2855.00
sha512       64                      3370.90
sha512       8192                    53545.64



Algorithm    am57xx-evm: CPU Load
-----------  --------------------
aes-128-cbc  42.00
aes-192-cbc  40.00
aes-256-cbc  42.00
des-cbc      20.00
des3         15.00
md5          50.00
sha1         50.00
sha224       57.00
sha256       51.00
sha384       56.00
sha512       54.00


Each benchmark test was run with a command of the following form, shown here for aes-128-cbc.

::

    time -v openssl speed -elapsed -evp aes-128-cbc
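Only the aes-128-cbc invocation is given here. Assuming the other algorithms in the tables were run with the same options and only the algorithm name changed, the full set of command lines could be generated as in the sketch below. The helper name is hypothetical, the `time -v` prefix is left out, and nothing is executed.

```python
# Illustrative generator for the per-algorithm "openssl speed" command lines.
# Assumption: every algorithm uses the same options as the aes-128-cbc example.
import shlex

ALGORITHMS = [
    "aes-128-cbc", "aes-192-cbc", "aes-256-cbc", "des-cbc", "des3",
    "md5", "sha1", "sha224", "sha256", "sha384", "sha512",
]


def speed_command(algorithm):
    """Build the argument list for one benchmark run (not executed here)."""
    return ["openssl", "speed", "-elapsed", "-evp", algorithm]


for algorithm in ALGORITHMS:
    print(shlex.join(speed_command(algorithm)))
```

Keeping each command as an argument list makes it straightforward to pass to a process launcher later without shell quoting problems.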

2.2.10.2. IPSec Hardware Performance

Note: queue_len is set to 300 and the software fallback threshold is set to 9 to enable software support for optimal performance.

Algorithm  am57xx-evm: Throughput (Mbps)  am57xx-evm: Packets/Sec  am57xx-evm: CPU Load
---------  -----------------------------  -----------------------  --------------------
3des       68.80                          6.00                     41.16
aes128     88.70                          7.00                     61.72
aes192     89.80                          7.00                     47.42
aes256     91.10                          8.00                     48.63

2.2.10.3. IPSec Software Performance

Algorithm  am57xx-evm: Throughput (Mbps)  am57xx-evm: Packets/Sec  am57xx-evm: CPU Load
---------  -----------------------------  -----------------------  --------------------
aes128     91.20                          8.00                     31.45
aes192     91.30                          8.00                     33.67
aes256     5.40                           0.00                     50.36