9.1. Benefits of Using LTO - Enabling Inter-Module Optimizations
9.1.1. A Simple Example
Consider a simple example application that demonstrates just one of the potential benefits of using LTO to enable inter-module optimization.
Suppose we have a set of source files in which many of the same string constants are referenced repeatedly, both within a file and across files.
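As a rough illustration of this pattern, the sketch below folds two hypothetical modules into a single file. The function names and the message text are assumptions made for illustration; they are not the contents of the s10.c or s20.c files used in the builds that follow.

```c
#include <string.h>

/* Hypothetical stand-in for one source file (for example, s10.c). */
const char *s10_status(void)
{
    return "operation completed successfully";
}

/* Hypothetical stand-in for a second source file (for example, s20.c).
 * It refers to the same literal text. When the two functions are
 * compiled into separate object files, each object file may carry its
 * own copy of the string unless a later step merges the copies. */
const char *s20_status(void)
{
    return "operation completed successfully";
}

/* Returns 1 when both modules refer to identical text. Only the string
 * contents are compared; whether the two pointers are equal depends on
 * the toolchain and the optimizations applied. */
int same_text(void)
{
    return strcmp(s10_status(), s20_status()) == 0;
}
```

In a real project the two functions would sit in different files, which is the situation where an inter-module view of the program becomes useful for merging the duplicated constants.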
If we compile and link without LTO turned on:
%> c29clang -mcpu=c29.c0 -Oz constant_merge_test.c ic_s10.c ic_s20.c ic_s30.c ic_s40.c s10.c s20.c s30.c s40.c -o no_lto.out -Wl,-llnk.cmd,-mno_lto.map
The linker-generated map file, no_lto.map, shows that the .rodata section, where all of the string constants are defined, is 0x4ad2 (19154) bytes long:
...
SEGMENT ALLOCATION MAP
run origin  load origin  length      init length  attrs  members
----------  -----------  ----------  -----------  -----  -------
00000020    00000020     00007a4c    00007a4c     r-x
  00000020    00000020     00004ad2    00004ad2     r--    .rodata
...
...
But if we then compile with LTO enabled:
%> c29clang -mcpu=c29.c0 -flto -Oz constant_merge_test.c ic_s10.c ic_s20.c ic_s30.c ic_s40.c s10.c s20.c s30.c s40.c -o with_lto.out -Wl,-llnk.cmd,-mwith_lto.map
Then the map file, with_lto.map, shows that the .rodata output section is significantly smaller in the LTO-enabled build, at 0x1674 (5748) bytes:
...
SEGMENT ALLOCATION MAP
run origin  load origin  length      init length  attrs  members
----------  -----------  ----------  -----------  -----  -------
00000020    00000020     00005b84    00005b84     r-x
...
  00004530    00004530     00001674    00001674     r--    .rodata
...
In this example, LTO enables the compiler to perform an inter-module constant merging optimization that saves 0x4ad2 - 0x1674 = 0x345e (13406) bytes in the .rodata section. Part of that saving is offset by increased code size in other sections, such as .text, so the net savings for the segment as a whole is 0x7a4c - 0x5b84 = 0x1ec8 (7880) bytes.
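The arithmetic can be checked with a few lines of C. This is only a sketch: the constant and function names are made up for illustration, the lengths are copied from the two map files shown above, and the last function assumes that .rodata is counted within the segment length, which is what the map excerpts suggest.

```c
#include <stdint.h>

/* Lengths copied from no_lto.map and with_lto.map. */
enum {
    RODATA_NO_LTO  = 0x4ad2,  /* .rodata length without LTO        */
    RODATA_LTO     = 0x1674,  /* .rodata length with LTO           */
    SEGMENT_NO_LTO = 0x7a4c,  /* enclosing segment length, no LTO  */
    SEGMENT_LTO    = 0x5b84   /* enclosing segment length, LTO     */
};

/* Bytes saved in .rodata by inter-module constant merging. */
uint32_t rodata_savings(void)
{
    return (uint32_t)(RODATA_NO_LTO - RODATA_LTO);
}

/* Net bytes saved across the whole segment. */
uint32_t net_savings(void)
{
    return (uint32_t)(SEGMENT_NO_LTO - SEGMENT_LTO);
}

/* Growth in the rest of the segment that offsets the .rodata saving. */
uint32_t growth_elsewhere(void)
{
    return rodata_savings() - net_savings();
}
```

Under that assumption, the difference between the two savings figures, 13406 - 7880 = 5526 bytes, is the amount by which the rest of the segment grew in the LTO-enabled build.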
9.1.2. Code Size Reduction Due to Use of LTO
Simply enabling LTO in the build of an application can yield significant code size savings, as in the example above, where adding -flto to an -Oz build reduced the size of the segment by 7880 bytes.
9.1.3. Performance Improvement Due to Use of LTO
Enabling LTO during an application build can also provide a significant speedup. Example applications built for C29x devices with LTO enabled ran significantly faster than the same applications built without LTO.
Compile with the -O3 and -flto compiler options to prioritize performance improvement optimizations and enable LTO.
Note
Increased Function Inlining
Using LTO may result in increased function inlining, which generally improves both performance and code size but may result in larger stack frames. The user may then need either to increase the size of the stack or to prevent inlining of certain functions that are known to require large stack frames.
To debug this, use the CCS Stack Usage View to see the static stack usage of each function in the application. See Stack Usage View in CCS for more information. Using the Stack Usage View requires that source code be built with debug enabled. This feature relies on the --call_graph capability provided by the c29ofd - Object File Display Utility.
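One way to keep a stack-hungry function out of its callers is the noinline function attribute supported by clang-based compilers. The sketch below is illustrative only: the function names and the 512-byte buffer are hypothetical, and the attribute is assumed to behave in c29clang as it does in upstream clang.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper that needs a large local buffer. Marking it
 * noinline asks the compiler not to inline it into its callers, even
 * when LTO makes the body visible across modules. */
__attribute__((noinline))
size_t format_report(const char *text)
{
    char scratch[512];                 /* large stack allocation */
    size_t n = strlen(text);

    if (n >= sizeof scratch) {
        n = sizeof scratch - 1;        /* truncate to fit the buffer */
    }
    memcpy(scratch, text, n);
    scratch[n] = '\0';
    return strlen(scratch);            /* number of bytes kept */
}

/* Hypothetical caller. Because the helper is not inlined, the 512-byte
 * buffer is not folded into this function's own stack frame. */
size_t log_status(void)
{
    return format_report("status: ok");
}
```

Whether this trade-off is worthwhile depends on the application: keeping the helper out of line gives up whatever speed or size benefit inlining would have provided in exchange for smaller frames in its callers.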