TI Arm Clang Compiler Tools - 2.0.0.STS Release Notes

Introduction

Version 2.0.0.STS of the TI Arm Clang Compiler Tools, also known as the tiarmclang compiler, is derived from the open source LLVM/Clang source code base and the LLVM Compiler Infrastructure source base, both of which can be found on GitHub (github.com).

The tiarmclang compiler can be used to compile and link C/C++ and assembly source files to build static executable application files that can be loaded and run on an Arm Cortex processor (m0, m0plus, m3, m4, m33, r4, and r5). Please see the Device Support section below for further information about which compiler options to use when building an application for a particular Arm Cortex processor configuration.

Short-Term Support Release

This is a Short-Term Support (STS) release.

For definitions and explanations of STS, LTS, and the versioning number scheme, please see SDTO Compiler Version Numbers.

Documentation

The TI Arm Clang Compiler Tools User’s Guide is now available online at the following URL:

Since the tiarmclang compiler is derived from the LLVM project’s Clang compiler source base, much of the generic Clang online documentation is also applicable to the tiarmclang compiler. The latest version of the generic Clang documentation can be found here:

TI E2E Community - Where to Get Help

Post compiler related questions to the TI E2E design community forum and select the TI device being used.

The following is the top-level webpage for all of TI’s Code Generation Tools.

If submitting a defect report, please attach a scaled-down test case with command-line options and the compiler version number to allow us to reproduce the issue easily.

Defect Tracking Database

Compiler defect reports can be tracked at the new Development Tools bug database, SIR. SIR is a JIRA-based view into all public tools defects. The old SDOWP tracking database will be retired.

A my.ti.com account is required to access this page. To find an issue in SIR, log in and enter your defect ID in the search box at the top right. Alternatively, from the top red navigation bar, select “Issues” then “Search for Issues”.

To find an old SDOWP issue, place the SDOWP ID in the search box and use double quotes around the SDOWP ID.

What’s New

Beginning with the tiarmclang 2.0.0.STS release, support for whole-program optimization via link-time inter-module optimizations is available.

The -flto Option Turns on LTO

The LTO feature can be enabled using the -flto option on the tiarmclang command-line.

Building an LTO-Enabled Application with tiarmclang from the Command-Line Interface

If compiling and linking from a single tiarmclang command, the -flto option can be inserted among the other compiler options. A typical tiarmclang command-line that turns on the LTO feature will look like this:

```
%> tiarmclang -mcpu=cortex-m4 -Oz -flto hello.c -o hello.out \
       -Wl,-llnk.cmd,-mhello.map
```

If compiling and linking in separate steps, the -flto option should be specified on both the tiarmclang compilation and linking commands, like so:

```
%> tiarmclang -mcpu=cortex-m4 -Oz -flto -c hello.c
%> tiarmclang -mcpu=cortex-m4 -Oz -flto hello.o -o hello.out \
       -Wl,-llnk.cmd,-mhello.map
```

Note that when compiling and linking in separate steps, the -flto option must be specified on both tiarmclang commands.

If you would like to compile using the tiarmclang command and link directly in a separate step with the tiarmlnk command, you will need to specify the -flto option on the tiarmclang compilation command:

```
%> tiarmclang -mcpu=cortex-m4 -Oz -flto -c hello.c
```

You will then need to specify the --llvm_lto=on option and a few other linker options when running the link step of the build with the tiarmlnk command:

```
%> tiarmlnk -I/path/to/installation/lib -I/path/to/linker/command/file \
       -o hello.out hello.o -llnk.cmd -mhello.map --llvm_lto=on \
       --start-group -llibc++.a -llibc++abi.a -llibc.a -llibsys.a \
         -llibsysbm.a -llibclang_rt.builtins.a -llibclang_rt.profile.a \
         --end-group \
       --cg_opt_level=z
```

As you can see, invoking tiarmlnk directly requires that all of the runtime libraries be explicitly referenced in the linker command. When the link step is run from the tiarmclang command, the --llvm_lto=on, --start-group/--end-group, and --cg_opt_level linker options are implicitly passed to the linker.

It is recommended that you use the tiarmclang command to perform the compile and link steps of an application build, even when performing the two steps separately.

Building an LTO-Enabled Application with tiarmclang in a Code Composer Studio Project

A tiarmclang Code Composer Studio (CCS) project that has been imported into or created in a workspace can be built with LTO enabled by inserting the -flto option into both the Build->Arm Compiler and Build->Arm Linker tabs in the Project->Build Settings pop-up dialog box.

For example, given a simple “Hello World!” CCS project as the project in focus in a workspace, you can click on Project->Build Settings to bring up the Properties pop-up dialog box. Assuming that the “TI Clang v2.0.0.STS” compiler has been selected in the General->Compiler version box and other settings besides -flto have been accounted for, then:

  1. Click on Build->Arm Compiler and edit the Command-line pattern contents as follows:

    Before:  ${command} ${flags} ${output_flag}${output} ${inputs}
    After:   ${command} -flto ${flags} ${output_flag}${output} ${inputs}
  2. Similarly, for the link-step, click on Build->Arm Linker and edit the Command-line pattern contents as follows:

    Before:  ${command} ${flags} ${output_flag}${output} ${inputs}
    After:   ${command} -flto ${flags} ${output_flag}${output} ${inputs}

As the current versions of CCS do not handle the -flto option in the normal way that compile and linker options are handled in the Properties dialog, the above method of specifying the -flto option for a CCS project serves as a stopgap until improved support for the -flto option is added in an upcoming CCS release.

LTO Development Flow

There are essentially two steps to employing link-time inter-module optimizations in the build of a given application.

  1. Compile as much C/C++ source code as possible with the -flto option.

    Compiling a C/C++ source file with the -flto option instructs the compiler to embed an intermediate representation (IR) in the object file that it generates. This applies equally to object files that are contained in libraries. In fact, all of the runtime libraries that are shipped with the tiarmclang toolchain are built with the -flto option, so an object file from a runtime library can participate in LTO during the link step if LTO is turned on. If LTO is not turned on during the link step, an object file with embedded IR is interpreted as a normal object file.

  2. Turn on the LTO feature during the link of your application.

    As explained in the above section, LTO can be turned on by specifying the -flto option on the tiarmclang command during compilation and linking, or by specifying the --llvm_lto=on linker option directly to the linker on the tiarmlnk command.

    When LTO is turned on during the link, the linker will:

    a. Extract the embedded IR content from each input object file that contains embedded IR to create a source IR module. This also applies to object files that are pulled in from object libraries to resolve references to undefined symbols.

    b. Link the source IR modules together into a combined IR module.

    c. Present the combined IR module to the compiler to “re-compile” the program with inter-module optimizations enabled.

    d. Link the resulting object file from the “re-compile” with all other input object files that do not contain embedded IR to produce the linked output file.

Benefits of Using LTO - Enabling Inter-Module Optimizations

Let’s consider a simple example application to demonstrate just one of the potential benefits of using LTO to enable inter-module optimization.

Consider a series of source files in which many of the same string constants are referenced repeatedly and across multiple source files.
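To make the scenario concrete, here is a minimal sketch of what such source files might look like. The file names (file_a.c, file_b.c), function names, and strings are hypothetical and are not taken from the actual test case; the two files are shown in a single listing for brevity.

```c
/* Hypothetical sketch: the contents of two separate source files,
   shown together in one listing. All names are illustrative only. */

/* ---- file_a.c ---- */
const char *file_a_status(int ok)
{
    /* When file_a.c is compiled on its own, these string constants
       are defined in the object file generated for file_a.c. */
    return ok ? "operation completed" : "operation failed";
}

/* ---- file_b.c ---- */
const char *file_b_status(int ok)
{
    /* The same two string constants appear again here. Compiled
       separately, the compiler cannot see the copies in file_a.c. */
    return ok ? "operation completed" : "operation failed";
}
```

When each file is compiled separately, the compiler only has visibility into one file at a time, so it can merge duplicate constants within a file but not across files. Seeing both files at once, as the LTO “re-compile” does, is what gives the compiler the opportunity to keep a single copy of each string.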

If we compile and link without LTO turned on:

```
%> tiarmclang -mcpu=cortex-m4 -Oz constant_merge_test.c \
       ic_s10.c ic_s20.c ic_s30.c ic_s40.c s10.c s20.c s30.c s40.c \
       -o no_lto.out -Wl,-llnk.cmd,-mno_lto.map
```

The map file reveals that the size of the .rodata section where all of the string constants are defined is reasonably large:

no_lto.map:

```
...
SEGMENT ALLOCATION MAP

run origin  load origin   length   init length attrs members
----------  ----------- ---------- ----------- ----- -------
00000020    00000020    00007a4c   00007a4c    r-x
  00000020    00000020    00004ad2   00004ad2    r-- .rodata
  ...
...
```

But if we then compile and link with LTO enabled:

```
%> tiarmclang -mcpu=cortex-m4 -flto -Oz constant_merge_test.c \
       ic_s10.c ic_s20.c ic_s30.c ic_s40.c s10.c s20.c s30.c s40.c \
       -o with_lto.out -Wl,-llnk.cmd,-mwith_lto.map
```

Then the map file shows that the .rodata section is significantly smaller in the LTO-enabled build:

with_lto.map:

```
...
SEGMENT ALLOCATION MAP

run origin  load origin   length   init length attrs members
----------  ----------- ---------- ----------- ----- -------
00000020    00000020    00005b84   00005b84    r-x
  ...
  00004530    00004530    00001674   00001674    r-- .rodata
...
```

The use of LTO in this example enables the compiler to perform an inter-module constant merging optimization that saves 0x4ad2 - 0x1674 = 0x345e (13406) bytes in the .rodata section. Note that in this example, the savings in the size of the .rodata section are offset somewhat by increased code size in other sections like .text. The net savings is 0x7a4c - 0x5b84 = 0x1ec8 (7880) bytes.
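The size arithmetic can be reproduced with a few lines of C. This is only a sketch for checking the numbers: the helper name bytes_saved and the enumerator names are hypothetical, while the hexadecimal lengths are copied from the two map file excerpts.

```c
/* Bytes saved when a section or segment shrinks from `before` to `after`.
   The helper name is hypothetical; it is not part of any TI tool. */
unsigned long bytes_saved(unsigned long before, unsigned long after)
{
    return before - after;
}

/* Lengths copied from the no_lto.map and with_lto.map excerpts. */
enum {
    RODATA_NO_LTO    = 0x4ad2,  /* .rodata length without LTO      */
    RODATA_WITH_LTO  = 0x1674,  /* .rodata length with LTO         */
    SEGMENT_NO_LTO   = 0x7a4c,  /* r-x segment length without LTO  */
    SEGMENT_WITH_LTO = 0x5b84   /* r-x segment length with LTO     */
};

/* bytes_saved(RODATA_NO_LTO, RODATA_WITH_LTO)   is 0x345e (13406) */
/* bytes_saved(SEGMENT_NO_LTO, SEGMENT_WITH_LTO) is 0x1ec8 (7880)  */
```

The difference between the two results, 13406 - 7880 = 5526 bytes, is the amount by which the rest of the segment grew in the LTO-enabled build.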

Improved Compiler Generated Debug Information to Enable Use of CCS Stack Usage View

The tiarmclang 2.0.0.STS compiler will emit estimated stack usage debug information for all functions, including functions defined in runtime libraries, to enable the use of the Stack Usage View in Code Composer Studio (CCS). Additionally, functions in the runtime libraries that are sourced in assembly language have been annotated with assembly directives to supply estimated stack usage information for those functions.

Recently Fixed Issues

CODEGEN-6288 : tiarmclang optimizer removes empty loops that don’t have side effects

In tiarmclang releases prior to 1.2.1.STS, the optimizer would remove an empty while loop that contained no side effects. If a function contained only such a loop, then the optimizer would remove references to the function from other functions in the same compilation unit even if the function were annotated with an optnone function attribute.

In tiarmclang releases starting with 1.2.1.STS, you can now mark a function containing an empty loop with no side effects with an optnone function attribute and references to the function will not be removed.

Alternatively, you can specify an asm() statement inside the body of the empty loop to create a side effect that will prevent the loop from being removed. For example:

```
while (1) {
  __asm(" ");
}
```
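The two workarounds might look like this in a complete source file. This is a sketch only: the function names are hypothetical, and the attribute is written in the GNU-style __attribute__((optnone)) spelling that Clang accepts.

```c
/* Hypothetical sketch of the two approaches described above. */

/* Approach 1: mark the function that contains the empty loop with the
   optnone function attribute. */
__attribute__((optnone))
void wait_forever(void)
{
    while (1) {
        /* empty loop with no side effects */
    }
}

/* Approach 2: give the loop body a side effect with an asm() statement. */
void wait_forever_asm(void)
{
    while (1) {
        __asm(" ");
    }
}

/* A caller in the same compilation unit. The reference to wait_forever()
   is the kind of reference that should not be removed. */
int check_or_halt(int ok)
{
    if (!ok) {
        wait_forever();
    }
    return ok;
}
```

Either approach on its own is meant to be sufficient; both are shown here only for comparison.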

Host Support / Dependencies

The following host-specific versions of the 2.0.0.STS tiarmclang compiler are available:

Device Support

The tiarmclang compiler supports development of applications that are to be loaded and run on one of the following Arm Cortex processor variants:

| Arm Processor Variant | Options |
|---|---|
| Cortex-M0 | “-mcpu=cortex-m0” |
| Cortex-M0+ | “-mcpu=cortex-m0plus” |
| Cortex-M3 | “-mcpu=cortex-m3” |
| Cortex-M4 without FPv4SPD16 | “-mcpu=cortex-m4 -mfloat-abi=soft” |
| Cortex-M4 with FPv4SPD16 | “-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16” |
| Cortex-M33 without FPv5SPD16 | “-mcpu=cortex-m33 -mfloat-abi=soft” |
| Cortex-M33 with FPv5SPD16 | “-mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16” |
| Cortex-R4 (Thumb) without VFPv3D16 | “-mcpu=cortex-r4 -mthumb -mfloat-abi=soft” |
| Cortex-R4 (Thumb) with VFPv3D16 | “-mcpu=cortex-r4 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16” |
| Cortex-R4 without VFPv3D16 | “-mcpu=cortex-r4 -mfloat-abi=soft” |
| Cortex-R4 with VFPv3D16 | “-mcpu=cortex-r4 -mfloat-abi=hard -mfpu=vfpv3-d16” |
| Cortex-R5 (Thumb) without VFPv3D16 | “-mcpu=cortex-r5 -mthumb -mfloat-abi=soft” |
| Cortex-R5 (Thumb) with VFPv3D16 | “-mcpu=cortex-r5 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16” |
| Cortex-R5 without VFPv3D16 | “-mcpu=cortex-r5 -mfloat-abi=soft” |
| Cortex-R5 with VFPv3D16 | “-mcpu=cortex-r5 -mfloat-abi=hard -mfpu=vfpv3-d16” |

Resolved Defects

| ID | Summary |
|---|---|
| CODEGEN-9669 | TI Arm Clang mismatch between source code and debugger view with function subsections |
| CODEGEN-9092 | tiarmclang mistakenly documents support for -fpic position independent code |
| CODEGEN-8914 | _enable_IRQ in ti_compatibility.h only supports Cortex-M devices |
| CODEGEN-8899 | tiarmlnk generates cinit record for tiny .init_array section |
| CODEGEN-8887 | Compiler does not support linking code that uses C++ exceptions |
| CODEGEN-8639 | tiarmar.exe is denied permission to create an archive file on Windows 7 |
| CODEGEN-8533 | Use of virtual functions causes many RTS print functions to be linked into the program |
| CODEGEN-8255 | tiarmclang: zero-initialized static and global variables are being defined in .bss |
| CODEGEN-6288 | tiarmclang: optimizer removes empty loops that don't have side effects |

Known Defects

The up-to-date known defects in v2.0.0.STS can be found here (dynamically generated):

Known Defects in v2.0.0.STS
