11.4. LTO Debug Aid

11.4.1. Debuggability of Applications Linked with LTO

When an application is linked with LTO enabled, the code generated for a given function can differ drastically from what you would expect based on the function's definition in the C/C++ source file. This is because enabling LTO instructs the linker to recompile the application at link time with inter-module optimizations turned on. This level of optimization can yield significant code size savings and performance improvements, but it also makes the application harder to debug.

To mitigate this loss of debuggability, the linker can be instructed to include pre- and post-LTO symbol reference information in the linker-generated map file. This information shows how a given function was transformed during the linker-invoked recompile of the application.

11.4.2. Generating Pre-/Post-LTO Symbol Reference Information

If Link-Time Optimization (LTO) is enabled and a linker-generated map file is requested on the linker command line (--mapfile option), you can use the --mapfile_contents=ltosymrefs option to instruct the linker to add information to the map file that helps with debugging applications built with LTO.

For each function that is a candidate to be included in the link, the map file will include:

Function Definition Details
- demangled (and mangled, if applicable) name of function
- function symbol’s binding (local, global, or weak)
- pre-LTO recompile size of function
- post-LTO recompile size of function
- run-time address of function symbol
List of Pre-LTO Recompile Symbol References
- demangled (and mangled, if applicable) name of referenced symbol
- type of referenced symbol (function, object, section)
- location in function body where symbol reference(s) occur(s)
List of Post-LTO Recompile Symbol References
- demangled (and mangled, if applicable) name of referenced symbol
- type of referenced symbol (function, object, section)
- location in function body where symbol reference(s) occur(s)

With this information in hand, you can begin to understand how a given function has been transformed by the LTO recompile. The following example explores how to gain insights from the display of pre- and post-LTO symbol references in the map file.

11.4.3. Example

Consider a simple application with two source files:

debug_aid_1.c:

    extern void f1(void);
    extern void f2(void);
    extern void f3(void);

    int main() {
      f1();
      f2();
      f3();

      return 0;
    }

debug_aid_2.c:

    #include <stdio.h>

    __attribute__((section(".one_section")))
    void f1(void) {
      printf("this is f1\n");
    }

    __attribute__((section(".one_section")))
    void f2(void) {
      printf("this is f2\n");
    }

    __attribute__((section(".one_section")))
    void f3(void) {
      printf("this is f3\n");
    }

We’ll compile and link these files together with LTO enabled (-flto), instructing the linker to generate a map file (-ma.map) with pre-/post-LTO symbol reference information included (--mapfile_contents=ltosymrefs):

%> tiarmclang -mcpu=cortex-m4 -flto -Oz debug_aid_1.c debug_aid_2.c -o a.out \
   -Wl,-llnk.cmd,-ma.map,--mapfile_contents=ltosymrefs

Now consider an excerpt from the linker-generated map file:

%> cat a.map
...

PRE/POST-LTO FUNCTION SYMBOL REFERENCES

...

Function: f1
---------
  Binding:       global
  Pre-LTO Size:  12
  Post-LTO Size: 0
  Run Address:   0x00000001

Pre-LTO Symbol References
-------------------------
Symbol:       .rodata.str1.3318338828525585858.1
Type:         section
Offset:       0x00000008

Symbol:       puts
Type:         function
Offset:       0x00000002


Function: f2
---------
  Binding:       global
  Pre-LTO Size:  12
  Post-LTO Size: 0
  Run Address:   0x0000000d

Pre-LTO Symbol References
-------------------------
Symbol:       .rodata.str1.15142827100918509680.1
Type:         section
Offset:       0x00000014

Symbol:       puts
Type:         function
Offset:       0x0000000e

Function: f3
---------
  Binding:       global
  Pre-LTO Size:  12
  Post-LTO Size: 0
  Run Address:   0x00000019

Pre-LTO Symbol References
-------------------------
Symbol:       .rodata.str1.11035941194088382892.1
Type:         section
Offset:       0x00000020

Symbol:       puts
Type:         function
Offset:       0x0000001a

...

Function: main
---------
  Binding:       local
  Pre-LTO Size:  18
  Post-LTO Size: 36
  Run Address:   0x00000b15

Pre-LTO Symbol References
-------------------------
Symbol:       f1
Type:         function
Offset:       0x00000002

Symbol:       f2
Type:         function
Offset:       0x00000006

Symbol:       f3
Type:         function
Offset:       0x0000000a

Post-LTO Symbol References
--------------------------
Symbol:       .rodata.str1.13341988216064122737.1
Type:         section
Offset:       0x00000014

Symbol:       .rodata.str1.16633329539669029678.1
Type:         section
Offset:       0x0000001c

Symbol:       .rodata.str1.403218229802607084.1
Type:         section
Offset:       0x00000020

Symbol:       puts
Type:         function
Offset:       0x00000018

From the above excerpt of the map file, we can make the following observations:

The definitions of f1(), f2(), and f3() are removed from the application during the LTO recompile. This is indicated by a Post-LTO Size value of 0 for each of these functions and by the absence of a Post-LTO Symbol References list for each of them.
The definition of main() has been transformed by the LTO recompile:
- f1(), f2(), and f3() have been inlined into main(). This is indicated by the fact that the references to f1(), f2(), and f3() included in the list of Pre-LTO Symbol References for main() do not appear in main()’s Post-LTO Symbol References list.
- Instead of three separate references to puts(), one in each of the Pre-LTO Symbol References lists for f1(), f2(), and f3(), there is only a single reference to puts() in the Post-LTO Symbol References list for main().

In fact, if we look at the disassembly of main() from the linked application:

%> tiarmobjdump -d -S a.out
...

00000b14 <main>:
     b14: b510          push    {r4, lr}
     b16: 4804          ldr     r0, [pc, #0x10]         @ 0xb28 <main+0x14>
     b18: 4c04          ldr     r4, [pc, #0x10]         @ 0xb2c <main+0x18>
     b1a: 47a0          blx     r4
     b1c: 4804          ldr     r0, [pc, #0x10]         @ 0xb30 <main+0x1c>
     b1e: 47a0          blx     r4
     b20: 4804          ldr     r0, [pc, #0x10]         @ 0xb34 <main+0x20>
     b22: 47a0          blx     r4
     b24: 2000          movs    r0, #0x0
     b26: bd10          pop     {r4, pc}
     b28: 34 0c 00 00   .word   0x00000c34
     b2c: 99 0b 00 00   .word   0x00000b99
     b30: 3f 0c 00 00   .word   0x00000c3f
     b34: 4a 0c 00 00   .word   0x00000c4a

...

We can make some additional observations:

Rather than branching to puts() directly, the code loads the address of puts() from the embedded constant table into r4 once, and each call to puts() is then made with a blx r4 indirect call instruction.
Each load of r0 loads the address of a string constant before the corresponding call to puts().
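
As a rough source-level picture of the disassembly above, the following sketch shows what main() effectively does after the LTO recompile. It is an illustration only: the linker does not emit C source, and the names main_after_lto and emit are hypothetical. The references are to puts() rather than printf() because the compiler typically rewrites a printf() call whose format string is a plain literal ending in a newline into an equivalent puts() call.

```c
#include <stdio.h>

/*
 * Hypothetical source-level view of main() after the LTO recompile.
 * The names main_after_lto and emit are illustrative assumptions.
 */
int main_after_lto(void) {
  /* Mirrors "ldr r4, [pc, #0x10]": the address of puts() is loaded once. */
  int (*emit)(const char *) = puts;

  /* Each call mirrors one "ldr r0, ..." / "blx r4" pair in the disassembly. */
  emit("this is f1");
  emit("this is f2");
  emit("this is f3");

  return 0; /* Mirrors "movs r0, #0x0". */
}
```

The three string literals correspond to the three .rodata.str1 section references in main()'s Post-LTO Symbol References list, and the single function pointer corresponds to the single puts reference.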

Even though the size of main() grows from 18 to 36 bytes due to inlining done during the LTO recompile, the definitions of f1(), f2(), and f3() (12 bytes each, 36 bytes in total) are removed from the link, leaving a net savings of 18 bytes.
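
The same bookkeeping can be automated for larger applications. The following is a minimal sketch, assuming the map file uses the "Pre-LTO Size:" and "Post-LTO Size:" field labels exactly as they appear in the excerpt above. The lto_totals structure and the lto_scan function are hypothetical helpers written for this illustration; they are not part of the TI toolchain.

```c
#include <stdio.h>
#include <string.h>

/* Totals gathered from the pre-/post-LTO section of a map file. */
struct lto_totals {
  unsigned long pre_bytes;  /* sum of all "Pre-LTO Size" values   */
  unsigned long post_bytes; /* sum of all "Post-LTO Size" values  */
  unsigned removed;         /* functions whose Post-LTO Size is 0 */
};

/*
 * Scan NUL-terminated map-file text line by line and accumulate the
 * size fields. Field labels are assumed to match the excerpt above.
 */
struct lto_totals lto_scan(const char *text) {
  struct lto_totals t = {0, 0, 0};
  unsigned long value;

  while (*text != '\0') {
    /* Skip indentation on this line only (spaces and tabs, not newlines). */
    const char *p = text + strspn(text, " \t");

    if (sscanf(p, "Pre-LTO Size: %lu", &value) == 1) {
      t.pre_bytes += value;
    } else if (sscanf(p, "Post-LTO Size: %lu", &value) == 1) {
      t.post_bytes += value;
      if (value == 0) {
        t.removed++; /* the definition was dropped from the link */
      }
    }

    /* Advance to the start of the next line, if there is one. */
    const char *nl = strchr(text, '\n');
    if (nl == NULL) {
      break;
    }
    text = nl + 1;
  }
  return t;
}
```

For the four functions shown in the excerpt, the totals are 54 bytes pre-LTO and 36 bytes post-LTO with three functions removed, which matches the 18-byte net savings. A caller might read a.map into a NUL-terminated buffer and pass it to lto_scan(); on a complete map file the totals cover every function listed, not only the four shown here.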