6.1.11. Boot Time Optimizations
6.1.11.1. Introduction
In today’s fast-paced automotive industry, the ability to achieve quick processor boot times is more crucial than ever. This guide will walk you through the necessary steps and considerations for achieving faster boot times on Sitara MPU AM62Ax devices. From adjusting configurations to implementing best practices, you’ll gain the insights needed to deliver a seamless, responsive user experience in your automotive applications. By implementing specific modifications, the default SDK offering can be optimized to boot much faster.
Reducing boot time is essential for enhancing user experience and operational efficiency. Quick boot times lead to more responsive systems, which are critical in automotive applications where every second counts. This ensures that drivers and passengers have immediate access to essential features and systems, contributing to overall safety and satisfaction.
Moreover, in the context of the rapid technological advancements and increasing demands for smart, connected vehicles, optimizing boot times can provide a competitive edge. Whether it’s for infotainment systems, advanced driver-assistance systems (ADAS), or other critical automotive functions, minimizing boot times can significantly improve performance and reliability.
Note
The same workflow applies to the entire Sitara MPU family, but specific steps differ for each SoC and are highlighted where they do.
The objectives of this document are as follows:
Explain various techniques to reduce boot time
Highlight the tradeoffs to reach the milestone
Measure and break down the default boot time
Present measurements after optimizations
6.1.11.1.1. Software environment
This guide uses Processor SDK 10.0 as its reference.
6.1.11.1.2. Hardware setup and equipment
Development kit used for testing
Micro-USB cables for UART connection
Logic Analyzer with at least 4 channels and a sample rate of 10MS/s
6.1.11.1.3. Typical boot flow
This section details the Out-Of-Box boot sequence:
PMIC or Power Management IC controls the power supply to the SoC. As a unit, it consists of diode controllers, DC-DC conversion and voltage regulation. TI’s Fulton PMIC needs about 30ms to supply power while Burton PMIC requires 18ms.
BootROM (Primary Program Loader) is executed first from ROM and performs basic initializations such as PLL and SRAM configuration. It then loads a bootloader image from the boot device specified by the boot switches. This stage takes about 12ms to complete.
SPL (Secondary Program Loader) is the first stage of the bootloader. It consists of code that fits into the SRAM and is run by the Main R5. R5 SPL initializes some peripherals and, most importantly, DDR. Subsequently, it loads TF-A, OPTEE and A53 SPL into DDR and then jumps to TF-A. A53 SPL is an intermediate Linux-friendly bootloader stage used to jump to U-Boot.
TF-A (Trusted Firmware - Arm) provides a reference trusted code base for the Armv8 architecture. It implements various Arm interface standards. The binary is typically included in the bootloader binary. It starts in the early stages of U-Boot. Without TF-A, the kernel cannot set up the services that need to be executed in the Secure World environment.
OPTEE (Trusted Execution Environment) is designed as a companion to a non-secure Linux kernel running on Arm Cortex-A cores using the TrustZone technology.
U-Boot proper is the second stage bootloader. It offers a flexible way to load and start the Linux kernel and provides a minimal set of tools to interact with the board’s hardware via a command line interface. It runs from DRAM, initializing additional hardware devices (network, USB, DSI/CSI, etc.). Then, it loads and prepares the device tree (FDT). The main task handled by U-Boot is loading and starting the kernel image itself.
Kernel runs from DDR and takes over the system completely.
A userspace process is executed by a user of the operating system rather than being part of the operating system itself. It might also be started by an init system (e.g. systemd), but it isn’t part of the kernel. User space is the area of memory in which non-kernel applications run, in non-privileged execution mode.
6.1.11.1.4. Optimized bootflow
This section describes an overview of the modifications that can be done to achieve shorter boot times. The succeeding sections will detail how to achieve these sequences.
6.1.11.2. Reducing bootloader time
Falcon Mode:
This feature skips A53 SPL and U-Boot proper and jumps to TF-A and then directly to the kernel, saving ~5s of boot time. It is implemented differently depending on the bootloader.
Choosing the right bootmedia:
| Part | Bootmedia | Theoretical read performance (MBps) | Default on AM62 EVM | Default on AM62A EVM | Default on AM62P EVM |
|---|---|---|---|---|---|
| S28HS512T | OSPI-NOR | 330 | YES | NO | YES |
| W35N01JWTBAG | OSPI-NAND | 50 | NO | YES | NO |
| MTFC16GAPALBH-IT | eMMC (HS200) | 200 | YES | YES | NO |
| MTFC32GAZAQHD-IT | eMMC (HS400) | 400 | NO | NO | YES |
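To get a feel for what these read rates mean for a real payload, the following sketch divides an image size by each theoretical throughput. It is a back-of-the-envelope estimate only: the 21 MB payload is borrowed from the Kernel+FS image size quoted in the measurements section, `estimate_read_ms` is a hypothetical helper name, and real transfers will be slower than the theoretical rate.

```python
# Theoretical read rates (MBps) copied from the boot media table.
BOOT_MEDIA_MBPS = {
    "OSPI-NOR (S28HS512T)": 330,
    "OSPI-NAND (W35N01JWTBAG)": 50,
    "eMMC HS200 (MTFC16GAPALBH-IT)": 200,
    "eMMC HS400 (MTFC32GAZAQHD-IT)": 400,
}


def estimate_read_ms(image_mb, rate_mbps):
    """Best-case time in milliseconds to read image_mb at rate_mbps."""
    return image_mb / rate_mbps * 1000.0


if __name__ == "__main__":
    image_mb = 21  # assumption: Kernel+FS image size from the measurements
    for media, rate in BOOT_MEDIA_MBPS.items():
        print(f"{media}: {estimate_read_ms(image_mb, rate):.1f} ms")
```

Under these assumptions the same image needs roughly eight times longer to read from OSPI-NAND than from eMMC in HS400 mode, which is why the choice of boot media is worth revisiting before tuning anything else.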
You can track current performance numbers here: AM62AX
Flashing binaries:
6.1.11.2.1. Secondary Boot Loader (SBL)
The following section will reference AM62AX MCU+ SDK’s SBL examples.
Removing unnecessary prints
The default examples contain a large number of prints that impact boot time and need to be removed.
- Navigate to the main.c of your example and remove calls to the following functions
Bootloader_profileAddCore
Bootloader_profileAddProfilePoint
Bootloader_profileUpdateAppimageSize
Bootloader_profileUpdateMediaAndClk
Bootloader_profilePrintProfileLog
Navigate to <mcu-plus-sdk>/source/drivers/device_manager/sciclient_direct/sciclient_direct_wrapper.c and remove the DebugP_log calls from the Sciclient_getVersionCheck function.
Note
If an RTOS example is being used, remove prints of the additional files in
<mcu-plus-sdk>/examples/drivers/boot/common/
Skipping OSPI PHY tuning (in case of OSPI bootmedia)
PHY calibration allows the flash to function at maximum performance, but this tuning consumes a significant amount of time that depends on the current algorithm implementation. In the SDK, tuning is skipped by default only in the stage 2 examples.
To validate this, do not remove the log prints from the previous subsection and observe the SBL Board_driversOpen parameter. Currently, the tuning algorithm takes 22ms to complete. If skipping is successful, it should drop down to ~150us.
Open the relevant example’s syscfg by navigating into <mcu-plus-path>/examples/drivers/boot/<example>/<soc-name>/<example-type>/ti-arm-clang/ and running make syscfg-gui. Navigate to the OSPI section and enable the OSPI skip Tuning option. Ensure that Enable PHY is checked as well.
Enabling DMA in the bootloader
Open the relevant example’s syscfg and navigate to the Bootloader section and click on Enable DMA if not enabled by default.
6.1.11.3. Reducing Linux kernel boot time
Adding quiet
To save 8+ seconds, add the “quiet” argument to the kernel “bootargs”. It suppresses most messages during the Linux start-up sequence. To access the logs after login, run dmesg. By default, quiet sets a loglevel of 4, which should be adequate to suppress the majority of logs, but if finer control is required, quiet can be replaced with loglevel=x, where x is a kernel console log level (the kernel documents levels 0 through 7).
The kernel looks for bootargs in 3 places: the U-Boot environment variable, the device tree and the kernel config. You can add the following in any of the 3 locations.
U-Boot console:
U-Boot=> setenv bootargs 'console=ttyS2,115200n8 fsck.mode=skip sysrq_always_enabled quiet'
Device Tree:
chosen {
    ...
    bootargs = "console=ttyS2,115200n8 earlycon=ns16550a,mmio32,0x02800000 quiet";
    ...
};
Kernel config:
CONFIG_CMDLINE="console=ttyS2,115200n8 earlycon=ns16550a,mmio32,0x02800000 quiet"
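When bootargs are generated by a build or flashing script, the switch between quiet and an explicit loglevel can be automated. The sketch below is one possible approach, not part of the SDK: `set_console_verbosity` is a hypothetical helper, and it assumes that no argument value contains spaces, so a plain whitespace split is safe.

```python
def set_console_verbosity(bootargs, loglevel=None):
    """Return bootargs ending in 'quiet', or 'loglevel=N' if loglevel is given.

    Existing 'quiet' and 'loglevel=' tokens are dropped first so the result
    carries exactly one verbosity setting.
    """
    tokens = [
        tok for tok in bootargs.split()
        if tok != "quiet" and not tok.startswith("loglevel=")
    ]
    tokens.append("quiet" if loglevel is None else f"loglevel={loglevel}")
    return " ".join(tokens)


if __name__ == "__main__":
    base = "console=ttyS2,115200n8 fsck.mode=skip sysrq_always_enabled"
    print(set_console_verbosity(base))
    print(set_console_verbosity(base + " quiet", loglevel=3))
```

The resulting string can be pasted into any of the three locations listed above.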
Using a smaller kernel system
By default, the kernel image contains many drivers and filesystems that enable the functionality supported on the board but are not necessary for early boot. Trim kernel capabilities by using:
ti_arm64_prune.config - removes irrelevant platform drivers
ti_early_display.config - converts the majority of functionality into loadable modules
Usage:
kernel$ make ARCH=arm64 CROSS_COMPILE=<path-to-compiler>/aarch64-none-linux-gnu- defconfig ti_arm64_prune.config ti_early_display.config
Tip
You can open <kernel-path>/kernel/configs/ti_early_display.config to see the breakdown of how much time is saved by disabling each module, and then decide whether each piece of functionality is worth its effect on boot time.
Disabling nodes in DT
Unnecessary nodes can be disabled by adding
status = "disabled"
to the nodes. While this will not directly affect boot time, the minimal kernel will not throw probe errors during boot.
6.1.11.4. Reducing userspace boot time
It is recommended to use a tiny intermediate filesystem that runs applications early with minimal configuration and then switches to a filesystem with full functionality. For this purpose, the installer packages a filesystem, <PSDK_PATH>/filesystem/<machine>/tisdk-tiny-initramfs-am62xx-evm.cpio, that can be used as an initramfs.
In order to package the filesystem as initramfs into the kernel, follow these steps:
Extract the cpio archive to a preferred location via GUI or
$ mkdir output
$ cd output
$ cpio -idv < tisdk-tiny-initramfs-am62xx-evm.cpio
Edit the kernel config:
.config:
CONFIG_INITRAMFS_SOURCE="/path/to/filesystem"
or using menuconfig:
kernel$ make ARCH=arm64 CROSS_COMPILE=<path-to-compiler>/aarch64-none-linux-gnu- menuconfig
General setup -> Initial RAM filesystem and RAM disk (initramfs/initrd) support -> Initramfs source file(s) /path/to/filesystem
Rebuild the kernel
kernel$ make ARCH=arm64 CROSS_COMPILE=<path-to-compiler>/aarch64-none-linux-gnu- Image -j64
The time taken to boot the filesystem is measured from Process ID 1 (PID 1) to the login prompt, which is 1.98s with the initramfs filesystem. To reduce this time further, you can:
Caution
Run the commands below only against the extracted filesystem directory so that you do not mistakenly modify the host computer.
Remove startup scripts from the tiny filesystem
host$ rm <filesystem>/etc/rc5.d/*
host$ cd <filesystem>/etc/rcS.d
host$ rm S02banner.sh S04udev S05checkroot.sh S06modutils.sh S07bootlogd S29read-only-rootfs-hook.sh S36bootmisc.sh S37populate-volatile.sh S38dmesg.sh S38urandom
This shaves off 1.536s from filesystem boot time. udev alone takes up 1.152s.
Remove package manager, console logo and add /dev/null in the filesystem
host$ rm -r <filesystem>/usr/lib/opkg
host$ rm <filesystem>/etc/issue
host$ cd <filesystem>/dev
host$ mknod -m 0600 null c 1 3
This removes 52ms from the boot up time.
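The savings quoted in this section can be tallied against the 1.98s baseline. The sketch below is plain arithmetic on those numbers and assumes the savings add up linearly, which a real system will not follow exactly; the Tiny FS figure in the measurements table comes from an actual run and differs from this estimate. The names used are illustrative only.

```python
BASELINE_MS = 1980  # PID 1 to login prompt with the tiny initramfs

# Savings quoted in the steps above, in milliseconds.
SAVINGS_MS = {
    "startup scripts removed": 1536,
    "opkg and /etc/issue removed, /dev/null added": 52,
}
UDEV_MS = 1152  # part of the startup-script saving attributed to udev


def remaining_ms(baseline_ms, savings_ms):
    """Estimated userspace boot time once every listed saving is applied."""
    return baseline_ms - sum(savings_ms.values())


if __name__ == "__main__":
    print(f"estimated remaining: {remaining_ms(BASELINE_MS, SAVINGS_MS)} ms")
    share = UDEV_MS / SAVINGS_MS["startup scripts removed"] * 100
    print(f"udev share of startup-script saving: {share:.0f}%")
```

The udev share shows where most of the gain comes from, so if some startup scripts must be kept, udev is the one to scrutinize first.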
6.1.11.5. Measurements
The following section displays the time taken by each stage to start and end. Four profile points were used:
PMIC time is taken from the datasheet
MCU_PORz (White)
SBL_start (Brown)
SBL_end (Red)
Kernel_end (Gold)
SBL_start is set to LOW as soon as the Bootloader comes up. To enable this, navigate into the main.c of your bootloader (Example: <mcu-plus-path>/examples/drivers/boot/sbl_ospi_nand_linux_multistage/sbl_nand_ospi_linux_stage1/<soc-name>/<example-type>/main.c) and add the following section to set MCU_I2C0_SCL (Pin #24 on the MCU Header J11) to LOW. It can be modified for any other pin as well.
#include <drivers/gpio.h>
/* GPIO PIN Macros */
#define CONFIG_GPIO0_BASE_ADDR (CSL_MCU_GPIO0_BASE)
#define CONFIG_GPIO0_PIN (17)
#define CONFIG_GPIO0_DIR (GPIO_DIRECTION_OUTPUT)
#define CONFIG_GPIO0_TRIG_TYPE (GPIO_TRIG_TYPE_NONE)
#define CONFIG_GPIO_NUM_INSTANCES (1U)
static Pinmux_PerCfg_t gPinMuxMcuCfg[] = {
    /* MCU_GPIO0 pin config MCU_GPIO0_17 -> MCU_I2C0_SCL (E11) */
    {
        PIN_MCU_I2C0_SCL,
        ( PIN_MODE(7) | PIN_INPUT_ENABLE | PIN_PULL_DISABLE )
    },
    {PINMUX_END, PINMUX_END}
};
int main()
{
    Pinmux_config(gPinMuxMcuCfg, PINMUX_DOMAIN_ID_MCU); // Configure PinMux
    GPIO_setDirMode(CONFIG_GPIO0_BASE_ADDR, CONFIG_GPIO0_PIN, CONFIG_GPIO0_DIR); // Set GPIO direction
    GPIO_pinWriteLow(CONFIG_GPIO0_BASE_ADDR, CONFIG_GPIO0_PIN); // Set GPIO state to LOW
    ...
For this measurement, the OSPI NAND example was used (<mcu_plus_sdk>/examples/drivers/boot/sbl_ospi_nand_linux_multistage/sbl_ospi_nand_linux_stage2/am62ax-sk/r5fss0-0_nortos/main.c). Turn the GPIO HIGH after the App_loadLinuxImages function.
The Kernel_end profile point is toggled from the kernel serial driver (drivers/tty/serial/8250/8250_omap.c). We will be using the GPIO0_39 pin on the User Expansion Header (J4).
In the relevant dts, add the following:
&main_gpio0 {
    ...
    status = "okay";
    pinctrl-names = "default";
    pinctrl-0 = <&test_gpio_default>;
};
In the &main_pmx0 node, add the relevant IOPAD:
test_gpio_default: test-gpio {
    pinctrl-single,pins = <
        AM62AX_IOPAD(0x00a0, PIN_INPUT, 7) /* (K17) GPMC0_WPn.GPIO0_39 */
    >;
};
In the &main_uart0 node, connect the GPIO by adding
test-gpios = <&main_gpio0 39 GPIO_ACTIVE_HIGH>;
- The 622ms SBL time includes the C7x image load
[2024-03-29 13:02:19.196] NOTICE: BL31: v2.10.0(release):v2.10.0-367-g00f1ec6b87-dirty
[2024-03-29 13:02:19.196] NOTICE: BL31: Built : 16:09:05, Feb 9 2024
[2024-03-29 13:02:19.991]
[2024-03-29 13:02:19.991] am62xx-evm login:
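The bracketed timestamps in a capture like this one can be used as a cross-check on the logic analyzer numbers. The sketch below is a minimal example that assumes the timestamps were added by the host serial terminal in the format shown; `parse_stamp` and `elapsed_ms` are hypothetical helpers, and host timestamps are only as accurate as the terminal's buffering allows.

```python
import re
from datetime import datetime, timedelta

# Matches a leading "[YYYY-MM-DD HH:MM:SS.mmm]" stamp as seen in the capture.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


def elapsed_ms(log_lines, start_marker, end_marker):
    """Whole milliseconds between the first line containing start_marker
    and the first later line containing end_marker."""
    start = None
    for line in log_lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if start is None and start_marker in line:
            start = stamp
        elif start is not None and end_marker in line:
            return (stamp - start) // timedelta(milliseconds=1)
    raise ValueError("markers not found in log")


LOG = [
    "[2024-03-29 13:02:19.196] NOTICE: BL31: v2.10.0(release):v2.10.0-367-g00f1ec6b87-dirty",
    "[2024-03-29 13:02:19.196] NOTICE: BL31: Built : 16:09:05, Feb 9 2024",
    "[2024-03-29 13:02:19.991]",
    "[2024-03-29 13:02:19.991] am62xx-evm login:",
]

if __name__ == "__main__":
    print(elapsed_ms(LOG, "BL31:", "login:"), "ms from BL31 to login prompt")
```

For the capture above this gives 795 ms between the first BL31 message and the login prompt.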
| Stage | Time (ms) |
|---|---|
| PMIC (TPS65931211) | 30 |
| ROM | 48 |
| SBL | 622 |
| Linux Kernel | 450 |
| Tiny FS | 345 |
| Total | 1495 |
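To see where the remaining time goes, the per-stage figures from the table can be summed and expressed as shares of the total. The sketch below only restates the table's numbers; the helper names are illustrative.

```python
# Stage times in milliseconds, copied from the measurements table.
STAGES_MS = {
    "PMIC (TPS65931211)": 30,
    "ROM": 48,
    "SBL": 622,
    "Linux Kernel": 450,
    "Tiny FS": 345,
}


def total_ms(stages):
    """Sum of all stage times in milliseconds."""
    return sum(stages.values())


def share_percent(stages):
    """Map each stage name to its percentage of the total boot time."""
    total = total_ms(stages)
    return {name: 100.0 * ms / total for name, ms in stages.items()}


if __name__ == "__main__":
    shares = share_percent(STAGES_MS)
    for name, ms in STAGES_MS.items():
        print(f"{name:20s} {ms:5d} ms  {shares[name]:5.1f}%")
    print(f"{'Total':20s} {total_ms(STAGES_MS):5d} ms")
```

The SBL stage is the largest single contributor in this breakdown, so boot media throughput and image size are the first things to look at for further gains.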
In the above measurements, the bootloader loads the HSM binary (9KB), the MCU/DSP image (50KB) and the Kernel+FS image (21MB).
6.1.11.6. Additional notes
While AM62A ships with OSPI-NAND, it can easily be replaced with OSPI-NOR flash. NAND flash support then needs to be replaced with NOR flash support in the bootloader:
SPL:
Rebuild U-Boot with OSPI NOR support.
SBL:
Update the Flash type in Flash section in syscfg to reflect NOR. Save and build SBL.
6.1.11.7. Troubleshooting
If the following logs appear and the kernel does not come up, it suggests that TF-A is not receiving data from DM, which probably has not had enough time to run completely:
ERROR: Timeout waiting for thread SP_RESPONSE to fill
ERROR: Thread SP_RESPONSE verification failed (-60)
ERROR: Message receive failed (-60)
ERROR: Failed to get response (-60)
ERROR: Transfer send failed (-60)