2.5. Controlling Optimization

The following command-line options control optimization behavior.

armcl Option (and alias)                     tiarmclang Option
-------------------------------------------  ------------------------
--opt_level=<off|0|1|2|3> (-O<off|0|1|2|3>)  -O<0|1|2|3|fast|s|z|g>
--opt_level=4 (-O4)                          -flto -O<1|2|3|fast|s|z>

The armcl compiler supports six levels of optimization, from --opt_level=off (“no optimization”) up to --opt_level=4 (-O4), which enables armcl’s link-time program optimization capability.

The tiarmclang compiler supports a variety of different optimization options, including:

  • -O0 - no optimization; generates code that is debug-friendly

  • -O1 - restricted optimizations, providing a good trade-off between code size and debuggability

  • -O2 or -O - most optimizations enabled, while keeping compile time reasonable

  • -O3 - in addition to optimizations available at -O2, -O3 enables optimizations that take longer to perform, trading an increase in compile time for potential performance improvements

  • -Ofast - enables all optimizations from -O3 along with other aggressive optimizations that may realize additional performance gains, but also may violate strict compliance with language standards

  • -Os - enables all optimizations from -O2 plus some additional optimizations intended to reduce code size while mitigating negative effects on performance

  • -Oz - enables all optimizations from -Os plus additional optimizations to further reduce code size with the risk of sacrificing performance

  • -Og - enables most optimizations from -O1, but may disable some optimizations to improve debuggability

The armcl -O4 option maps to a combination of the tiarmclang -flto option and one of the tiarmclang optimization level options. The tiarmclang -flto option enables link-time optimization in the tiarmclang linker. Note that the tiarmclang -O0 and -Og options are not allowed in combination with the tiarmclang -flto option.

Expect debugging to become more challenging at higher levels of optimization. This is especially true when tiarmclang’s link-time optimization is enabled via the -flto option.

armcl Option (and alias)                                         tiarmclang Option
---------------------------------------------------------------  --------------------
--opt_for_speed=<0|1|2|3|4|5> (-mf=<0|1|2|3|4|5>)                -O<z|s|3|fast>
--opt_level=4 --opt_for_speed=<0|1|2|3|4|5> (-mf=<0|1|2|3|4|5>)  -flto -O<z|s|3|fast>

The armcl compiler supports an --opt_for_speed=<n> option that selects a code size versus performance “trade-off” level, n. The level tells the compiler how aggressively it may optimize for performance at the risk of increasing code size. The available values for n range from 0, which favors optimizations geared towards reducing code size with a high risk of degrading performance, to 5, which favors optimizations intended to improve performance with a high risk of increasing code size.

Some of the tiarmclang compiler’s optimization options roughly correspond to the intended code size vs. performance trade-off that is embodied in the use of armcl’s --opt_for_speed and --opt_level options. The following is an approximate mapping:

  • -Oz - resembles using armcl’s --opt_for_speed=0-1 in combination with --opt_level=2-3 since it favors code size reducing optimizations even if performance is degraded.

  • -Os - resembles using armcl’s --opt_for_speed=2-3 in combination with --opt_level=2-3 since it favors code size reducing optimizations, but tries to preserve performance while doing so.

  • -O3 - resembles using armcl’s --opt_for_speed=3-4 in combination with --opt_level=2-3 since it favors optimizations intended for improving performance, but tries to avoid increases in code size while doing so.

  • -Ofast - resembles using armcl’s --opt_for_speed=4-5 in combination with --opt_level=2-3 since it favors optimizations intended for improving performance even if code size increases (caution: the use of -Ofast may violate strict compliance with language standards).

If link-time optimization is enabled via armcl’s -O4 optimization level, this maps to the tiarmclang -flto option, which enables link-time optimization in the tiarmclang linker. The armcl --opt_for_speed option then maps to one of four tiarmclang option combinations as follows:

  • -flto -Oz - resembles using armcl’s --opt_for_speed=0-1 in combination with --opt_level=4, favoring code size reducing inter-module optimizations even if performance is degraded.

  • -flto -Os - resembles using armcl’s --opt_for_speed=2-3 in combination with --opt_level=4, favoring code size reducing inter-module optimizations while trying to preserve performance.

  • -flto -O3 - resembles using armcl’s --opt_for_speed=3-4 in combination with --opt_level=4, favoring performance improving inter-module optimizations while trying to avoid significant increases in code size.

  • -flto -Ofast - resembles using armcl’s --opt_for_speed=4-5 in combination with --opt_level=4, favoring performance improving inter-module optimizations even if code size is significantly increased. Note that the use of the tiarmclang -Ofast option may violate strict conformance with language standards.
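When migrating a build script, the approximate mapping above can be written down as a small lookup. The following is only a sketch: suggest_tiarmclang_opt is a hypothetical helper, not part of either toolchain, and because the documented ranges overlap at --opt_for_speed=3 and --opt_for_speed=4, the sketch assumes that both overlaps are resolved toward the less aggressive option. A real migration may well choose differently after measuring code size and performance.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of armcl or tiarmclang).
 *
 * opt_for_speed: the armcl --opt_for_speed value, 0..5.
 * opt_level_4:   nonzero if the armcl build also used --opt_level=4.
 *
 * Returns a suggested tiarmclang option string, or NULL when the
 * --opt_for_speed value is out of range.
 *
 * ASSUMPTION: the approximate ranges overlap at 3 (-Os or -O3) and at
 * 4 (-O3 or -Ofast); this sketch picks the less aggressive option.
 */
const char *suggest_tiarmclang_opt(int opt_for_speed, int opt_level_4)
{
    static const char *const without_lto[] = {
        "-Oz", "-Oz", "-Os", "-Os", "-O3", "-Ofast"
    };
    static const char *const with_lto[] = {
        "-flto -Oz", "-flto -Oz", "-flto -Os",
        "-flto -Os", "-flto -O3", "-flto -Ofast"
    };

    if (opt_for_speed < 0 || opt_for_speed > 5) {
        return NULL;
    }
    return opt_level_4 ? with_lto[opt_for_speed] : without_lto[opt_for_speed];
}
```

For example, an armcl build that used --opt_level=4 --opt_for_speed=4 would be given "-flto -O3" by this sketch, which is a starting point rather than a guaranteed equivalent.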

armcl Option                 tiarmclang Option
---------------------------  -----------------
--sat_reassoc=off (default)  not supported
--sat_reassoc=on

The armcl compiler provides a --sat_reassoc option to enable or disable reassociation of saturating arithmetic. It is off by default.

The tiarmclang compiler does not support an analogous option.
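To see why reassociation of saturating arithmetic is something a compiler has to be told it may do, note that saturating addition is not associative: regrouping the operands can change the result. The sketch below uses a hypothetical sat_add32 helper written in plain C for illustration; it does not model any armcl intrinsic or the code armcl actually generates.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit signed addition that clamps to
 * INT32_MIN/INT32_MAX instead of overflowing. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* exact sum in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}

/* (a + b) + c: the grouping as written in the source. */
int32_t sum_as_written(int32_t a, int32_t b, int32_t c)
{
    return sat_add32(sat_add32(a, b), c);
}

/* a + (b + c): a grouping that reassociation could produce. */
int32_t sum_reassociated(int32_t a, int32_t b, int32_t c)
{
    return sat_add32(a, sat_add32(b, c));
}
```

With a = INT32_MAX, b = 1, and c = -1, sum_as_written saturates on the first addition and returns INT32_MAX - 1, while sum_reassociated never saturates and returns INT32_MAX. Code that depends on a particular grouping should therefore spell it out with explicit intermediate results.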

armcl Option (and alias)          tiarmclang Option
--------------------------------  ---------------------
--auto_inline=<size> (-oi<size>)  -finline-limit=<size>

The armcl compiler provides the --auto_inline option, which, when used in combination with --opt_level=3, allows you to specify a size threshold for automatic inlining of functions that are not explicitly declared as “inline.”

The tiarmclang compiler supports an analogous option, -finline-limit, which allows you to specify a size threshold for functions that can be inlined, where <size> is the number of pseudo instructions.

The tiarmclang compiler also supports the always_inline (“__attribute__((always_inline))”) and noinline ( “__attribute__((noinline))”) function attributes that provide a means for you to control inlining on a function-specific basis. The tiarmclang compiler’s -fno-inline-functions option can be used to disable all inlining.
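As a minimal sketch of the function attributes mentioned above, the following marks one small helper to be inlined at every call site and keeps a rarely taken path out of line. The function names are hypothetical and chosen only for illustration.

```c
/* Small, hot helper: request inlining at every call site. */
static inline __attribute__((always_inline)) int scale_by_four(int x)
{
    return x * 4;
}

/* Rarely executed path: keep it as a real call so it does not
 * grow the code at each call site. */
__attribute__((noinline)) int reject_sample(void)
{
    return -1;
}

int process_sample(int sample)
{
    if (sample < 0) {
        return reject_sample();
    }
    return scale_by_four(sample);
}
```

The attributes act per function, so they can be combined with a global -finline-limit setting to handle the few functions where the compiler’s size-based heuristic is not what you want.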

armcl Option (and alias)         tiarmclang Option
-------------------------------  -----------------
--call_assumptions=<n> (-op<n>)  not supported

The armcl compiler provides the --call_assumptions option, which, when used in combination with --program_level_compile and --opt_level=3, allows you to provide additional information to the compiler about whether the functions defined in a given module are called from other modules and whether global variable definitions in a given module are referenced from other modules.

The tiarmclang compiler does not support an analogous option.

armcl Option (and alias)              tiarmclang Option
------------------------------------  -------------------------------------
--gen_opt_info=<0|1|2> (-on=<0|1|2>)  -fsave-optimization-record
                                      -foptimization-record-file=<filename>
                                      -Rpass=<expr>
                                      -Rpass-missed=<expr>
                                      -Rpass-analysis=<expr>

The armcl compiler provides the --gen_opt_info option, which, when used in combination with --opt_level=3, causes the compiler to emit a human-readable optimization information file. The higher the value of the argument specified, the more verbose the optimization information provided will be.

The tiarmclang compiler does not provide an option that matches the exact behavior of armcl’s --gen_opt_info, but tiarmclang reports optimization information via the following available options:

  • -fsave-optimization-record - writes optimization remarks to a YAML file

  • -foptimization-record-file - identifies the name of the YAML file written when using the -fsave-optimization-record option

  • -Rpass - given a regular expression string argument to identify the optimization pass(es) that you want information about, the -Rpass option writes informative remarks to stdout during compilation about when a specified optimization pass makes a transformation

  • -Rpass-missed - given a regular expression string argument to identify the optimization pass(es) that you want information about, the -Rpass-missed option writes informative remarks to stdout during compilation about when a specified optimization pass fails to make a transformation

  • -Rpass-analysis - given a regular expression string argument to identify the optimization pass(es) that you want information about, the -Rpass-analysis option writes informative remarks to stdout during compilation about why a specified optimization pass does or doesn’t perform a transformation
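A small translation unit is a convenient way to try these options. The sketch below is an assumption-laden example: the file name remarks_demo.c and the YAML file name are hypothetical, the command lines omit the target options your project needs, and "inline" is used as the regular expression on the assumption that you are interested in the inlining pass. Whether and how a remark is reported depends on the compiler version and optimization level.

```c
/*
 * remarks_demo.c (hypothetical file name)
 *
 * Possible invocations, with your usual target options added:
 *
 *   tiarmclang -O2 -c remarks_demo.c -Rpass=inline -Rpass-missed=inline
 *
 *   tiarmclang -O2 -c remarks_demo.c -fsave-optimization-record
 *       -foptimization-record-file=remarks_demo.opt.yaml
 */

/* Small helper that is a natural candidate for inlining. */
static int square(int x)
{
    return x * x;
}

/* Caller with a simple loop; remarks about the call to square(),
 * if any are emitted, refer to source locations in this function. */
int sum_of_squares(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += square(values[i]);
    }
    return total;
}
```

Starting with a narrow regular expression keeps the output manageable; it can be widened once you know which passes matter for the code in question.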

armcl Option (and alias)     tiarmclang Option
---------------------------  -----------------
--optimizer_interlist (-os)  not supported

The armcl compiler provides the --optimizer_interlist option, which tells the compiler to keep a compiler-generated intermediate assembly source file. The file is annotated with interlisted comments that relate the optimized C/C++ source code to the assembly code generated by the compiler.

The tiarmclang compiler does not provide an analogous option. However, you can use tiarmclang’s -Rpass, -Rpass-missed, and -Rpass-analysis options to gain more insight into which optimizations were performed and potential optimizations that were ruled out during compilation.

armcl Option (and alias)       tiarmclang Option
-----------------------------  -----------------
--program_level_compile (-pm)  -flto

The armcl compiler’s --program_level_compile option combines source files into a single compilation unit to enable the compiler’s program-level optimizations.

The tiarmclang -flto option enables inter-module optimizations via link-time optimization. The tiarmclang -flto option can be combined with the tiarmclang -O<1|2|3|fast|s|z> optimization level option to instruct the compiler whether to prioritize improving performance over reducing code size or vice versa.

armcl Option (and alias)   tiarmclang Option
-------------------------  -----------------
--aliased_variables (-ma)  not supported

The armcl compiler’s --aliased_variables option instructs the compiler to assume that called functions are capable of creating hidden aliases. As a result, the compiler must assume worst-case aliasing. For example, the optimizer cannot assume that it knows the value stored in a local object if that local object might be accessed via a separate pointer.

The tiarmclang compiler does not provide an analogous option. However, tiarmclang’s -fstrict-aliasing and -fno-strict-aliasing options can be used to enable or disable optimizations based on type-based alias analysis, but they do not allow the compiler to violate the aliasing rules of C. Some aliasing behavior can also be controlled via tiarmclang’s optimization options.
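Code that was only correct under worst-case aliasing assumptions is best rewritten so that it follows the aliasing rules of C regardless of compiler options. One common case is reading the bit pattern of a float. The sketch below copies the bytes with memcpy instead of dereferencing a cast pointer; float_bits is a hypothetical helper, and the example assumes that float is a 32-bit IEEE-754 type on the target.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: float is 32 bits wide on the target. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float_bits assumes a 32-bit float");

/* Hypothetical helper: returns the object representation of a float.
 *
 * Writing  *(uint32_t *)&value  instead would access a float object
 * through an incompatible type, which type-based alias analysis is
 * allowed to assume never happens. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* byte copy is always permitted */
    return bits;
}
```

Because the copy has a fixed, small size, optimizing compilers typically reduce it to a simple register move, so the well-defined form usually costs nothing at run time.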