1.3.8. Optimization Options¶
To enable optimization passes in the tiarmclang compiler, select an optimization level from the -O[0|1|2|3|fast|g|s|z] options. In general, the options below represent various levels of optimization; some are designed to favor smaller compiler-generated code size over performance, while others favor performance at the cost of increased compiler-generated code size.
For a more precise list of optimizations performed for each level, please see Optimizations Performed at Each Level.
1.3.8.1. Optimization Level Options¶
Among the options listed below, -Oz is recommended as the optimization option to use if small compiler generated code size is a priority for an application. Use of -Oz will still retain performance gains from many of the -O2 level optimizations that are performed.
- -O0¶
No optimization. This level minimizes compile time and generates debug-friendly code.
- -O1, -O¶
Enable restricted optimizations, providing a good trade-off between code size and debug-ability.
- -O2¶
Enable most optimizations, but disable those that require significant additional compile time.
- -O3¶
Enable all optimizations available at -O2 plus others that require additional compile time to perform.
- -Ofast¶
Enable all optimizations available at -O3, plus additional aggressive optimizations that may yield further performance gains but are not guaranteed to comply strictly with language standards.
- -Og¶
Enable restricted optimizations while preserving debug-ability.
- -Os¶
Enable all optimizations available at -O2 plus additional optimizations that are designed to reduce code size while mitigating negative impacts on performance.
- -Oz¶
Enable all optimizations available at -O2 plus additional optimizations to further reduce code size with the risk of sacrificing performance.
Note
Optimization Option Recommendations
The -Oz option is recommended for optimizing code size.
The -O3 option is recommended for optimizing performance, but it is likely to increase compiler generated code size.
1.3.8.2. Optimizations Performed at Each Level¶
The following lists describe some examples of the optimizations performed at each optimization level.
- -O0¶
None
- -O1¶
Control Flow Simplification
Merge contiguous icmps into a memcmp
memcpy/memset/memcmp inlining
Constant Hoisting
Partially inline calls to library functions
Inline for always_inline functions
Global Variable Merging
Merge disjoint stack slots
Loop Strength Reduction
Loop Invariant Code Motion
Common Subexpression Elimination
Dead Argument Elimination
Machine code sinking
Peephole optimization
Tail Predication
Tail Duplication
Load/store optimization
Simple Register Coalescing
Copy Propagation
Conditional Constant Propagation
Called Value Propagation
Control Flow optimization
If-conversion
Thumb2 instruction size reduction
Dead Code Elimination
Loop Vectorization
Printf function specialization
Small memcpy/memset function specialization
Conditionally eliminate dead library calls
Loop Rotation
Loop Unrolling
- -O2¶
Performs all (-O1) optimizations, plus:
Function Integration/Inlining
Instruction speculation
Value Propagation
Jump Threading (non-DFA)
Tail Call Elimination
Merged Load/Store Motion
Global Value Numbering
Memory Dependence Analysis
Dead Store Elimination
Superword-Level Parallelism (SLP) vectorization
Combine redundant instructions
Dead Global Elimination
Global Duplicate Constant Merging
Fast memcpy/memset function specialization
Align loop target boundaries to 16 bytes (Cortex-R4/R5)
- -O3¶
Performs all (-O2) optimizations tuned for speed, plus:
Replace functions with supported intrinsics
Additional alias analysis and loop optimization
Aggressive Function Inlining
Call-site splitting
Promote ‘by reference’ arguments to scalars
Combine pattern based expressions
- -Ofast¶
Performs all (-O3) optimizations, plus:
Allow optimizations to treat the sign of a zero argument or result as insignificant
Assumes no Inf values
Assumes no NaN values
Enable optimizations that make unsafe assumptions about IEEE math
Allow reassociation transformations for floating-point instructions
Allow optimizations to use the reciprocal of an argument rather than perform division
Allow more aggressive, lossy floating point math operations that enhance speed
- -Og¶
Performs all (-O2) optimizations, but disables the following:
No Loop Vectorization
No function inlining except for always_inline functions
No instruction speculation
No Jump Threading
No Value Propagation
No Tail Call Elimination
No Merged Load/Store Motion
No Global Value Numbering
No Memory Dependence Analysis
No Superword-Level Parallelism (SLP) Vectorization
No Dead Global Elimination
No Global Duplicate Constant Merging
- -Os¶
Performs all (-O2) optimizations tuned for code size, plus:
Previously enabled optimizations tuned for code size
Small memcpy/memset function specialization
Minimal memcpy/memset/memcmp inlining
Don’t conditionally eliminate dead library calls
Don’t align loop target boundaries to 16 bytes (Cortex-R4/R5)
- -Oz¶
Performs all (-Os) optimizations, plus:
Machine Outlining
No Loop Vectorization
Less aggressive optimizations that impact code size
Don’t align loop target boundaries to 16 bytes (Cortex-R4/R5)
1.3.8.3. More Specialized Optimization Options¶
1.3.8.3.1. Floating-Point Arithmetic¶
- -ffp-model=<precise|strict|fast>¶
-ffp-model is an umbrella option that is used to establish a model of floating-point semantics that the compiler will operate under. The available arguments to the -ffp-model option imply settings for the other, single-purpose floating-point options, including -ffast-math, -ffp-contract, and -frounding-math (described below).
The available arguments to the -ffp-model option are:
precise - with the exception of floating-point contraction optimizations, all optimizations that are not value-safe on floating-point data are disabled (-ffp-contract=on and -fno-fast-math). The tiarmclang compiler assumes this floating-point model by default.
strict - disable floating-point contraction optimizations, honor dynamically-set floating-point rounding modes (-frounding-math), and disable all ‘fast-math’ floating-point optimizations (-fno-fast-math).
fast - enable all ‘fast-math’ floating-point optimizations (-ffast-math) and enable floating-point contraction optimizations across C/C++ statements (-ffp-contract=fast).
- -ffast-math, -fno-fast-math¶
Enable or disable ‘fast-math’ mode during compilation. By default, ‘fast-math’ mode is disabled. Enabling ‘fast-math’ mode allows the compiler to make aggressive, not necessarily value-safe, assumptions about floating-point math, such as:
Assume floating-point math is consistent with regular algebraic rules for real numbers (e.g. addition and multiplication are associative, x/y == x * 1/y, and (a + b) * c == a * c + b * c).
Operands to floating-point operations are never NaNs or Inf values.
+0 and -0 are interchangeable.
Use of the ‘fast-math’ mode also instructs the compiler to predefine the __FAST_MATH__ macro symbol.
- -ffp-contract=<fast|on|off|fast-honor-pragmas>¶
Instruct the compiler whether and to what degree it is allowed to form fused floating-point operations, such as floating-point multiply and add (FMA) instructions. This optimization is also known as floating-point contraction. Fused floating-point operations are permitted to produce more precise results than would be otherwise computed if the operations were performed separately.
The available arguments to the -ffp-contract option are:
fast - allow fusing of floating-point operations across C/C++ statements, and ignore any FP_CONTRACT or clang fp contract pragmas that would otherwise affect the compiler’s ability to apply floating-point contraction optimizations.
on - allow floating-point contraction within a given C/C++ statement. The floating-point contraction behavior can be affected by the use of FP_CONTRACT or clang fp contract pragmas.
off - disable all floating-point contraction optimizations.
fast-honor-pragmas - same as the fast argument, but the user can alter the behavior via the use of the FP_CONTRACT and/or clang fp contract pragmas.
- -frounding-math, -fno-rounding-math¶
By default, the compiler will assume that the -fno-rounding-math option is in effect. This instructs the compiler to assume round-to-nearest for all floating-point operations.
The C standard runtime library provides functions such as fesetround and fesetenv that allow you to dynamically alter the floating-point rounding mode. If the -frounding-math option is specified, the compiler will honor any dynamically-set floating-point rounding mode. This can be used to prevent optimizations that may affect the result of a floating-point operation if the current rounding mode has changed or is different from the default (round-to-nearest). For example, floating-point constant folding may be inhibited if the result is not exactly representable.
- -fsigned-zeros, -fno-signed-zeros¶
Assume the presence of signed floating-point zero values. Use of the -fno-signed-zeros option can improve code if the compiler can assume that it doesn’t need to account for the presence of signed floating-point zero values.
- -fhonor-nans, -fno-honor-nans¶
Instruct the compiler to check for and properly handle floating-point NaN values. Use of the -fno-honor-nans can improve code if the compiler can assume that it doesn’t need to check for and enforce the proper handling of floating-point NaN values.
1.3.8.3.2. Function Outlining¶
- -moutline, -moutline-inter-function, -mno-outline¶
Function outlining (aka “machine outlining”) is an optimization that saves code size by identifying recurring sequences of machine code and replacing each instance of the sequence with a call to a new function that performs the identified sequence of operations.
Function outlining is enabled when the -Oz option is specified on the tiarmclang command-line. There are 3 settings for the function outlining optimization when using the -Oz option:
- -moutline¶
The -moutline option is the default setting and will perform machine outlining within functions. This is less aggressive than -moutline-inter-function, but it is guaranteed to be applied only when doing so will reduce the net code size.
- -moutline-inter-function¶
The -moutline-inter-function option can be specified in combination with the -Oz option to enable inter-function outlining. While this is the more aggressive of the function outlining settings, it does not always guarantee an overall code size reduction. For example, if outlining occurs across multiple functions in a given compilation unit yet only one of those functions is included in the linked application, the application will include the outlined code as well as the additional instructions required to call that code. However, inter-function outlining is likely to be beneficial when all functions defined in a given compilation unit are included in the linked application.
- -mno-outline¶
The -mno-outline option can be used to disable function outlining for a given compilation unit when using the -Oz option.
1.3.8.3.3. Inlining¶
- -finline-functions, -fno-inline-functions¶
Inline suitable functions. The -fno-inline-functions option disables this optimization.
- -finline-hint-functions¶
Inline functions which are explicitly or implicitly marked inline.
- -mllvm -arm-memset-max-stores=<n>¶
When optimization is turned on during a compilation, the tiarmclang compiler will inline calls to runtime support routines memset or memclr if the size of the data is below a certain threshold. For example, in the following source file:
#include <string.h>

struct {
    int t1;
    int t2;
    int t3;
    int t4;
    short t5;
    long t6;
} my_struct_inline;

void func()
{
    memset(&my_struct_inline, 0, sizeof(my_struct_inline));
}
When compiled with -O[1|2|3|fast], the call to memset will be inlined if the clearing of the my_struct_inline data object can be done with <= 8 store instructions:
%> tiarmclang -mcpu=cortex-m0 -O3 -S struct_inline.c
%> cat struct_inline.s
...
func:
        ldr     r0, .LCPI0_0
        movs    r1, #0
        str     r1, [r0]
        str     r1, [r0, #4]
        str     r1, [r0, #8]
        str     r1, [r0, #12]
        str     r1, [r0, #16]
        str     r1, [r0, #20]
        bx      lr
        .p2align 2
.LCPI0_0:
        .long   my_struct_inline
...
However, when compiled with -O[s|z], where the compiler is attempting to generate smaller code, the call to memset will be inlined only if the clearing of the my_struct_inline data object can be done with <= 4 store instructions. Since clearing the 24-byte my_struct_inline requires 6 stores on Cortex-M0, when compiled in combination with the -mcpu=cortex-m0 option the call to memset will not be inlined, but will instead be implemented with a call to __aeabi_memclr4:
%> tiarmclang -mcpu=cortex-m0 -Oz -S struct_inline.c
%> cat struct_inline.s
...
func:
        push    {r7, lr}
        ldr     r0, .LCPI0_0
        movs    r1, #24
        bl      __aeabi_memclr4
        pop     {r7, pc}
        .p2align 2
.LCPI0_0:
        .long   my_struct_inline
The -mllvm -arm-memset-max-stores=<n> option allows you to control the criteria used by the compiler to decide whether or not to inline a call to the memset or memclr function. If the above example is re-compiled with -mcpu=cortex-m0 -Oz -mllvm -arm-memset-max-stores=6, then the call to memset will get inlined since the clearing of my_struct_inline can be accomplished on Cortex-M0 with 6 store instructions:
%> tiarmclang -mcpu=cortex-m0 -Oz -mllvm -arm-memset-max-stores=6 -S struct_inline.c
%> cat struct_inline.s
...
func:
        ldr     r0, .LCPI0_0
        movs    r1, #0
        str     r1, [r0]
        str     r1, [r0, #4]
        str     r1, [r0, #8]
        str     r1, [r0, #12]
        str     r1, [r0, #16]
        str     r1, [r0, #20]
        bx      lr
        .p2align 2
.LCPI0_0:
        .long   my_struct_inline
...
The optimal value for the argument <n> to use with the -mllvm -arm-memset-max-stores=<n> option will vary from one use-case to another; the option allows you to tune the inlining threshold as needed for your application.
Note
Use Caution When Defining Symbols Inside an asm() Statement
Inlining a function that contains an asm() statement that contains a symbol definition when compiling with the tiarmclang compiler can cause a “symbol multiply defined” error.
Please see Inlining Functions that Contain asm() Statements for more details.
1.3.8.3.4. Loop Unrolling¶
- -funroll-loops, -fno-unroll-loops¶
Enable the optimizer to unroll loops. The -fno-unroll-loops option disables this optimization.