11.4. LTO Debug Aid
11.4.1. Debuggability of Applications Linked with LTO
When an application is linked with LTO enabled, the code generated for a given function can differ drastically from what you would expect based on the function’s definition in the C/C++ source file. This is because enabling LTO instructs the linker to recompile the application at link time with inter-module optimizations turned on. This level of optimization can provide significant code size savings and performance improvements, but it also hinders the debuggability of the application.
To mitigate this loss of debuggability, you can instruct the linker to include pre- and post-LTO symbol reference information in the linker-generated map file. This information shows how a given function is transformed during the linker-invoked recompile of the application.
11.4.2. Generating Pre-/Post-LTO Symbol Reference Information
If Link-Time Optimization (LTO) is enabled and a linker-generated map file is requested on the linker command line (--map_file option, or its -m short form), you can use the --mapfile_contents=ltosymrefs option to instruct the linker to add information to the map file that helps with debugging applications built with LTO.
For each function that is a candidate to be included in the link, the map file will include:
Function Definition Details
    demangled (and mangled, if applicable) name of function
    function symbol’s binding (local, global, or weak)
    pre-LTO recompile size of function
    post-LTO recompile size of function
    run-time address of function symbol
List of Pre-LTO Recompile Symbol References
    demangled (and mangled, if applicable) name of referenced symbol
    type of referenced symbol (function, object, section)
    location in function body where symbol reference(s) occur(s)
List of Post-LTO Recompile Symbol References
    demangled (and mangled, if applicable) name of referenced symbol
    type of referenced symbol (function, object, section)
    location in function body where symbol reference(s) occur(s)
With this information in hand, you can begin to understand how a given function has been transformed by the LTO recompile. The following example explores how to gain insights from the display of pre- and post-LTO symbol references in the map file.
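As a rough mental model of the records described above, the following C sketch represents one function entry together with two checks that are useful when reading the map file: whether a function’s definition was removed (post-LTO size of 0), and how many function references disappear between the pre- and post-LTO lists. All type and function names here (func_entry, sym_ref, was_removed, dropped_calls) are hypothetical; they are not part of the linker or of any TI tool, and the sketch only mirrors the fields listed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one map file entry (illustrative names). */
enum ref_type { REF_FUNCTION, REF_OBJECT, REF_SECTION };
enum binding  { BIND_LOCAL, BIND_GLOBAL, BIND_WEAK };

struct sym_ref {
    const char   *name;    /* name of the referenced symbol */
    enum ref_type type;    /* function, object, or section */
    unsigned long offset;  /* location of the reference, as listed in the map file */
};

struct func_entry {
    const char           *name;
    enum binding          binding;
    unsigned              pre_size;   /* size in bytes before the LTO recompile */
    unsigned              post_size;  /* size in bytes after the LTO recompile */
    unsigned long         run_addr;   /* run-time address of the function symbol */
    const struct sym_ref *pre_refs;   /* pre-LTO symbol references */
    size_t                n_pre;
    const struct sym_ref *post_refs;  /* post-LTO symbol references */
    size_t                n_post;
};

/* A post-LTO size of 0 means no definition survived the recompile. */
static int was_removed(const struct func_entry *f) {
    assert(f != NULL);
    return f->post_size == 0;
}

/* Return 1 if a symbol called name appears in the given reference list. */
static int has_ref(const struct sym_ref *refs, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(refs[i].name, name) == 0)
            return 1;
    return 0;
}

/* Count function references that are present before the LTO recompile
 * but absent afterwards. */
static size_t dropped_calls(const struct func_entry *f) {
    size_t count = 0;
    assert(f != NULL);
    for (size_t i = 0; i < f->n_pre; i++)
        if (f->pre_refs[i].type == REF_FUNCTION &&
            !has_ref(f->post_refs, f->n_post, f->pre_refs[i].name))
            count++;
    return count;
}
```

A dropped function reference is consistent with the callee having been inlined or eliminated, but the map file alone does not say which; the disassembly is needed to confirm it.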
11.4.3. Example
Consider a simple application with two source files:
debug_aid_1.c:
extern void f1(void);
extern void f2(void);
extern void f3(void);
int main() {
    f1();
    f2();
    f3();
    return 0;
}
debug_aid_2.c:
#include <stdio.h>
__attribute__((section(".one_section")))
void f1(void) {
    printf("this is f1\n");
}
__attribute__((section(".one_section")))
void f2(void) {
    printf("this is f2\n");
}
__attribute__((section(".one_section")))
void f3(void) {
    printf("this is f3\n");
}
We’ll compile and link these files with LTO enabled (-flto), instructing the linker to generate a map file (-ma.map) that includes pre-/post-LTO symbol reference information (--mapfile_contents=ltosymrefs). The linker options are passed through the compiler driver with -Wl:
%> tiarmclang -mcpu=cortex-m4 -flto -Oz debug_aid_1.c debug_aid_2.c -o a.out \
-Wl,-llnk.cmd,-ma.map,--mapfile_contents=ltosymrefs
Now consider an excerpt from the linker-generated map file:
%> cat a.map
...
PRE/POST-LTO FUNCTION SYMBOL REFERENCES
...
Function: f1
---------
Binding: global
Pre-LTO Size: 12
Post-LTO Size: 0
Run Address: 0x00000001
Pre-LTO Symbol References
-------------------------
Symbol: .rodata.str1.3318338828525585858.1
Type: section
Offset: 0x00000008
Symbol: puts
Type: function
Offset: 0x00000002
Function: f2
---------
Binding: global
Pre-LTO Size: 12
Post-LTO Size: 0
Run Address: 0x0000000d
Pre-LTO Symbol References
-------------------------
Symbol: .rodata.str1.15142827100918509680.1
Type: section
Offset: 0x00000014
Symbol: puts
Type: function
Offset: 0x0000000e
Function: f3
---------
Binding: global
Pre-LTO Size: 12
Post-LTO Size: 0
Run Address: 0x00000019
Pre-LTO Symbol References
-------------------------
Symbol: .rodata.str1.11035941194088382892.1
Type: section
Offset: 0x00000020
Symbol: puts
Type: function
Offset: 0x0000001a
...
Function: main
---------
Binding: local
Pre-LTO Size: 18
Post-LTO Size: 36
Run Address: 0x00000b15
Pre-LTO Symbol References
-------------------------
Symbol: f1
Type: function
Offset: 0x00000002
Symbol: f2
Type: function
Offset: 0x00000006
Symbol: f3
Type: function
Offset: 0x0000000a
Post-LTO Symbol References
--------------------------
Symbol: .rodata.str1.13341988216064122737.1
Type: section
Offset: 0x00000014
Symbol: .rodata.str1.16633329539669029678.1
Type: section
Offset: 0x0000001c
Symbol: .rodata.str1.403218229802607084.1
Type: section
Offset: 0x00000020
Symbol: puts
Type: function
Offset: 0x00000018
The above excerpt of the map file supports the following observations:
The definitions of f1(), f2(), and f3() are removed from the application during the LTO recompile. This is indicated by the Post-LTO Size value of 0 for each of these functions and by the absence of a Post-LTO Symbol References list for each function.
The definition of main() has been transformed by the LTO recompile:
    f1(), f2(), and f3() have been inlined into main(). This is indicated by the fact that the references to f1(), f2(), and f3() included in the list of Pre-LTO Symbol References for main() do not appear in main()’s Post-LTO Symbol References list.
    Instead of three separate references to puts(), one in each of the Pre-LTO Symbol References lists for f1(), f2(), and f3(), there is only a single reference to puts() in the Post-LTO Symbol References list for main().
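At the source level, these observations suggest that the recompiled main() behaves roughly like the sketch below. This is an illustration only: the function name main_after_lto is hypothetical, and the linker does not emit any such source. The calls are to puts() rather than printf() because the Pre-LTO Symbol References for f1(), f2(), and f3() already list puts; compilers commonly rewrite a printf() of a constant string that ends in a newline and contains no conversion specifiers as an equivalent puts() call, which appends the newline itself.

```c
#include <stdio.h>

/* Hypothetical source-level equivalent of main() after the LTO recompile.
 * It only mirrors what the map file data implies; no such source exists
 * in the application. */
int main_after_lto(void) {
    puts("this is f1");  /* body of f1(), inlined */
    puts("this is f2");  /* body of f2(), inlined */
    puts("this is f3");  /* body of f3(), inlined */
    return 0;
}
```

The three string arguments correspond to the three .rodata.str1 section references in main()’s Post-LTO Symbol References list.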
In fact, if we look at the disassembly of main() from the linked application:
%> tiarmobjdump -d -S a.out
...
00000b14 <main>:
b14: b510 push {r4, lr}
b16: 4804 ldr r0, [pc, #0x10] @ 0xb28 <main+0x14>
b18: 4c04 ldr r4, [pc, #0x10] @ 0xb2c <main+0x18>
b1a: 47a0 blx r4
b1c: 4804 ldr r0, [pc, #0x10] @ 0xb30 <main+0x1c>
b1e: 47a0 blx r4
b20: 4804 ldr r0, [pc, #0x10] @ 0xb34 <main+0x20>
b22: 47a0 blx r4
b24: 2000 movs r0, #0x0
b26: bd10 pop {r4, pc}
b28: 34 0c 00 00 .word 0x00000c34
b2c: 99 0b 00 00 .word 0x00000b99
b30: 3f 0c 00 00 .word 0x00000c3f
b34: 4a 0c 00 00 .word 0x00000c4a
...
We can make some additional observations:
Instead of calling puts() directly, the code loads the address of puts() from the embedded constant table (the word at 0xb2c) into r4 and makes each call to puts() with a blx r4 indirect call instruction.
Each load of r0 loads the address of a string constant (the words at 0xb28, 0xb30, and 0xb34) prior to the corresponding call to puts().
Even though the size of main() grows from 18 to 36 bytes due to the inlining done during the LTO recompile, the definitions of f1(), f2(), and f3() (12 bytes each, 36 bytes in total) are removed from the link, leaving a net savings of 18 bytes.
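The size accounting can be checked with a few lines of C. The sketch below is only a worked version of that arithmetic: the struct and function names are hypothetical, the sizes are copied from the map file excerpt above, and functions elided from the excerpt are not considered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record holding the two size fields of one map file entry. */
struct lto_size {
    const char *name;
    unsigned    pre;   /* Pre-LTO Size, in bytes */
    unsigned    post;  /* Post-LTO Size, in bytes */
};

/* Sum of (post - pre) over all entries.
 * A negative result is a net savings; a positive result is net growth. */
static long net_size_change(const struct lto_size *fns, size_t n) {
    long delta = 0;
    assert(fns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        delta += (long)fns[i].post - (long)fns[i].pre;
    return delta;
}

/* Sizes taken from the map file excerpt in this example. */
static const struct lto_size example_sizes[] = {
    { "f1",   12,  0 },
    { "f2",   12,  0 },
    { "f3",   12,  0 },
    { "main", 18, 36 },
};
```

Calling net_size_change(example_sizes, 4) yields -18: the three removed functions contribute -36 bytes and main() contributes +18 bytes. Note that main()’s 36 bytes include the four-word constant table that follows its instructions in the disassembly.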