6.1.10. Boot Time Optimizations¶
6.1.10.1. Introduction¶
This guide explains how to reduce the time taken to boot from power on to userspace for Sitara MPU AM62Ax devices. In the fast-evolving landscape of automotive technology, quick processor boot times are paramount for delivering a seamless, responsive user experience and sets the stage for faster access to critical systems.
Note
The same workflow is applied to the entire Sitara MPU family, but for each SoC, specific steps will differ and will be highlighted
The objectives of this document are as follows:
Explain various techniques to reduce boot time
Highlight the tradeoffs to reach the milestone
Measurement and breakdown of default boot time
Measurements after optimizations
6.1.10.1.1. Software environment¶
This guide uses 9.2 Processor SDK as reference.
6.1.10.1.2. Hardware setup and equipment¶
Development kit used for testing
Micro-USB cables for UART connection
Logic Analyzer with at least 4 channels and a sample rate of 10MS/s
6.1.10.1.3. Typical boot flow¶
This section details the Out-Of-Box boot sequence:
PMIC or Power Management IC controls the power supply to the SoC. As a unit, it consists of diode controllers, DC-DC conversion and voltage regulation. TI’s Fulton PMIC needs about 30ms to supply power while Burton PMIC requires 18ms.
BootROM (Primary Program Loader) is executed first from ROM and performs basic initializations such PLLs and SRAM configuration. It then loads a bootloader image in the boot device specified by the boot switches. This entity takes about 12ms to complete.
SPL (Secondary Program Loader) is the first stage of the bootloader. It consists of code that fits into the SRAM and is run by the Main R5. R5 SPL initializes some peripherals and, most importantly, DDR. Subsequently, it loads TF-A, OPTEE and A53 SPL into DDR and then jumps to TF-A. A53 SPL is an intermediate Linux friendly bootloader stage used to jump to U-boot.
TF-A (Trusted Firmware - Arm) provides a reference trusted code base for the Armv8 architecture. It implements various ARM interface standards. The binary is typically included in the bootloader binary. It starts in the early stages of U-Boot. Without ATF, the kernel cannot setup the services which need to be executed in the Secure World environment
OPTEE (Trusted Execution Environment) is designed as a companion to a non-secure Linux kernel running on Arm; Cortex-A cores using the TrustZone technology.
U-boot proper is the second stage bootloader. It offers a flexible way to load and start the Linux Kernel and provides a minimal set of tools to interact with the board’s hardware via a command line interface. It runs from DRAM, initializing additional hardware devices (network, USB, DSI/CSI, etc.). Then, it loads and prepares the device tree (FDT). The main task handled by the U-Boot is the loading and starting of the kernel image itself.
Kernel runs from DDR and takes over the system completely.
Userspace process is executed by a user in the operating system, rather than being part of the operating system itself. It might also be executed by an init system (e.g. systemd), but it isn’t part of the kernel. User space is the area of memory that non-kernel applications run in non-privileged execution mode.
6.1.10.1.4. Optimized bootflow¶
This section describes an overview of the modifications that can be done to achieve shorter boot times. The succeeding sections will detail how to achieve these sequences.
6.1.10.2. Reducing bootloader time¶
Falcon Mode:
This is a feature that allows us to skip A53 SPL and U-boot proper and jump to TF-A and then the kernel directly saving ~5s in our boot time. It is implemented differently depending on the bootloader.
Choosing the right bootmedia:
Part
Bootmedia
Theoretical Read performance (MBps)
Default offering
AM62 EVM
AM62A EVM
AM62P EVM
S28HS512T
OSPI-NOR
330
YES
NO
YES
W35N01JWTBAG
OSPI-NAND
50
NO
YES
NO
MTFC16GAPALBH-IT
eMMC (HS200)
200
YES
YES
NO
MTFC32GAZAQHD-IT
eMMC (HS400)
400
NO
NO
YES
You can track current performance numbers here: AM62AX
Flashing binaries:
6.1.10.2.1. Secondary Boot Loader (SBL)¶
The following section will be in reference to MCU+ SDK’s SBL examples.
Removing unnecessary prints
The default examples contain a large number of prints that impact boot time and need to be removed.
Navigate to the main.c of your example and remove calls to Bootloader_profileAddCore, Bootloader_profileAddProfilePoint, Bootloader_profileUpdateAppimageSize, Bootloader_profileUpdateMediaAndClk and Bootloader_profilePrintProfileLog.
Navigate to <mcu-plus-sdk>/source/drivers/device_manager/sciclient_direct/sciclient_direct_wrapper.c and remove the DebugP_log from Sciclient_getVersionCheck function.
Note
If an RTOS example is being used, remove prints in additional files in <mcu-plus-sdk>/examples/drivers/boot/common/
Skipping OSPI PHY tuning (in case of OSPI bootmedia)
PHY calibration allows the flash to function at maximum performance and this consumes some time that is dependent on the current algorithm implementation. By default, only stage 2 examples will skip tuning.
To validate this, do not remove the log prints from the previous subsection and observe the SBL Board_driversOpen parameter. Currently, the tuning algorithm takes 22ms to complete. If skipping is successful, it should drop down to ~150us.
Open the relevant example’s syscfg by navigating into <mcu-plus-path>/examples/drivers/boot/sbl_ospi_linux_multistage/sbl_ospi_linux_stage1/<soc-name>/<example-type>/ti-arm-clang/ and running
make syscfg-gui
. Navigate to the OSPI section and enable OSPI skip Tuning option. Ensure that Enable PHY is checked as well.Enabling DMA in the bootloader
Open the relevant example’s syscfg and navigate to the Bootloader section and click on Enable DMA if not enabled by default.
6.1.10.3. Reducing Linux kernel boot time¶
Adding
quiet
To save 8+ seconds, add “quiet” argument in the Kernel “bootargs”. It suppresses most messages during the Linux start-up sequence. To access the logs after login, you can run
dmesg
for the logs to be printed. By default, quiet is at a loglevel of 4 and should be adequate to suppress the majority of logs but if finer control is required :litera:`quiet` can be replaced withloglevel=x
where x can be 1-14.The kernel looks for bootargs in 3 places: U-Boot environment variable, the device tree and the kernel config. You can add the following in any of the 3 locations.
U-Boot console:
U-Boot=> setenv bootargs 'console=ttyS2,115200n8 fsck.mode=skip sysrq_always_enabled quiet'
Device Tree:
chosen { ... bootargs = "console=ttyS2,115200n8 earlycon=ns16550a,mmio32,0x02800000 quiet"; ... };
Kernel config:
CONFIG_CMDLINE="console=ttyS2,115200n8 earlycon=ns16550a,mmio32,0x02800000 quiet"
Using a smaller kernel system
By default, the kernel image contains a lot of drivers and filesystems to enable the functionality supported for the board but are not necessary for early boot. Trim kernel capabilities by using
ti_arm64_prune.config - removes irrelevant platform drivers
ti_early_display.config - converts the majority of functionality into loadable modules
Usage:
kernel$ make ARCH=arm64 CROSS_COMPILE=<path-to-compiler>/aarch64-none-linux-gnu- defconfig ti_arm64_prune.config ti_early_display.config
Note: You can access <kernel-path>/kernel/configs/ti_early_display.config and see the breakdown of how much time is saved by disabling each module and take a call on whether the functionality is required and its effect on boot time
Disabling nodes in DT
Unnecessary nodes can be disabled by adding
status = "disabled"
to the nodes. While this will not directly affect boot time, the minimal kernel will not throw probe errors during boot.
6.1.10.4. Reducing userspace boot time¶
It is recommended to use a tiny intermediate filesystem that can be used to run applications early with minimal configuration and then mount into a filesystem with full functionality. For this purpose, the installer packages a tisdk-tiny-initramfs-am62xx-evm.cpio under filesystem/<machine> that can be used as an initramfs.
In order to package the filesystem as initramfs into the kernel, follow these steps:
Extract the cpio archive to a preferred location via GUI or
$ mkdir output $ cd output $ cpio -idv < tisdk-tiny-initramfs-am62xx-evm.cpio
Edit the kernel config:
.config:
CONFIG_INITRAMFS_SOURCE="/path/to/filesystem"
or using
menuconfig
:kernel$ make ARCH=arm64 CROSS_COMPILE=<path-to-compiler>/aarch64-none-linux-gnu- menuconfig General setup -> Initial RAM filesystem and RAM disk (initramfs/initrd) support -> Initramfs source file(s) /path/to/filesystem
Rebuild the kernel
kernel$ make ARCH=arm64 CROSS_COMPILE=<path-to-compiler>/aarch64-none-linux-gnu- Image -j64
The time taken to boot filesystem is measured from Process ID 1(PID1) to login prompt which is 1.98s with the initramfs filesystem. In order to further drop this time, you can:
Note
Please ensure that you do not mistakenly affect the host computer while making the below changes
Remove startup scripts from the tiny filesystem
host$ rm <filesystem>/etc/rc5.d/* host$ cd <filesystem>/etc/rcS.d host$ rm S02banner.sh S04udev S05checkroot.sh S06modutils.sh S07bootlogd S29read-only-rootfs-hook.sh S36bootmisc.sh S37populate-volatile.sh S38dmesg.sh S38urandom
This shaves off 1.536s from filesystem boot time. udev alone takes up 1.152s.
Remove package manager, console logo and add /dev/null in the filesystem
host$ rm <filesystem>/usr/lib/opkg host$ rm <filesystem>/etc/issue host$ cd <filesystem>/dev host$ mknod -m 0600 null c 1 3
This removes 52ms from the boot up time.
6.1.10.5. Measurements¶
The following section displays the time taken by each stage to start and end. Four profile points were used:
MCU_PORz (White) - MCU Power-On-Reset
SBL_start (Brown) - GPIO is set to LOW as soon as SBL stage 1 is started (Before System_init)
SBL_end (Red) - GPIO is set to LOW as soon as SBL stage 2 completes (After App_driversClose)
Kernel_end (Gold) - GPIO is set to HIGH when kernel is about to jump to ramdisk (init/main.c: kernel_init)
ROM time : SBL_start - MCU_PORz = 48ms
SBL time til Linux CPU is started: SBL_end - SBL_start = 624ms
SBL C7X load + DriversClose + TF-A + OPTEE + Kernel time: Kernel_end - SBL_end = 679ms (~450ms for Kernel)
[2024-03-29 13:02:19.196] NOTICE: BL31: v2.10.0(release):v2.10.0-367-g00f1ec6b87-dirty
[2024-03-29 13:02:19.196] NOTICE: BL31: Built : 16:09:05, Feb 9 2024
[2024-03-29 13:02:19.991]
[2024-03-29 13:02:19.991] am62xx-evm login:
To calculate userspace, we use console logs to take timestamps from TF-A to login prompt and subtract kernel time = 345ms
6.1.10.6. Additional notes¶
While AM62A ships with OSPI-NAND, it can be replaced with the OSPI-NOR flash with ease. It is not recommended to resolder used flashes onto a board. NAND flash support needs to be replaced with NOR flash support
6.1.10.7. Troubleshooting¶
If the following logs are noticed and kernel does not come up, it suggests that TF-A is not receiving data from DM which probably hasn’t had enough time to run completely
ERROR: Timeout waiting for thread SP_RESPONSE to fill ERROR: Thread SP_RESPONSE verification failed (-60) ERROR: Message receive failed (-60) ERROR: Failed to get response (-60) ERROR: Transfer send failed (-60)
6.1.10.8. Known issues¶
Due to the implementation of OSPI-NAND software layer in MCU+ SDK, it has been observed that the read performance drops when tuning is skipped. This fix will be incoporated for 10.0 SDK release.