TI Arm Clang Compiler Tools - 2.0.0.STS Release Notes
Introduction
Version 2.0.0.STS of the TI Arm Clang Compiler Tools, also known as the tiarmclang compiler, is derived from the open source LLVM/Clang source code base and the LLVM Compiler Infrastructure source base, both of which can be found on GitHub (github.com).
The tiarmclang compiler can be used to compile and link C/C++ and assembly source files to build static executable application files that can be loaded and run on an Arm Cortex processor (m0, m0plus, m3, m4, m33, r4, and r5). Please see the Device Support section below for further information about which compiler options to use when building an application for a particular Arm Cortex processor configuration.
Short-Term Support Release
This is a Short-Term Support (STS) release.
For definitions and explanations of STS, LTS, and the versioning number scheme, please see SDTO Compiler Version Numbers.
Documentation
The TI Arm Clang Compiler Tools User’s Guide is now available online at the following URL:
Since the tiarmclang compiler is derived from the LLVM project’s Clang compiler source base, much of the generic Clang online documentation is also applicable to the tiarmclang compiler. The latest version of the generic Clang documentation can be found here:
TI E2E Community - Where to Get Help
Post compiler related questions to the TI E2E design community forum and select the TI device being used.
The following is the top-level webpage for all of TI’s Code Generation Tools.
If submitting a defect report, please attach a scaled-down test case with command-line options and the compiler version number to allow us to reproduce the issue easily.
Defect Tracking Database
Compiler defect reports can be tracked at the new Development Tools bug database, SIR. SIR is a JIRA-based view into all public tools defects. The old SDOWP tracking database will be retired.
A my.ti.com account is required to access this page. To find an issue in SIR, enter your defect ID in the top right search box once logged in. Alternatively, from the top red navigation bar, select “Issues” then “Search for Issues”.
To find an old SDOWP issue, place the SDOWP ID in the search box and use double quotes around the SDOWP ID.
What’s New
Inter-Module Optimizations via Link-Time Optimization (LTO)
Beginning with the tiarmclang 2.0.0.STS release, support for whole-program optimization via link-time inter-module optimizations is available.
The -flto Option Turns on LTO
The LTO feature can be enabled using the -flto option on the tiarmclang command-line.
Building an LTO-Enabled Application with tiarmclang from the Command-Line Interface
- Compiling and Linking an Application from a Single tiarmclang Command
If compiling and linking from a single tiarmclang command, the -flto option can be inserted among the other compiler options. A typical tiarmclang command-line that turns on the LTO feature will look like this:
```
%> tiarmclang -mcpu=cortex-m4 -Oz -flto hello.c -o hello.out \
-Wl,-llnk.cmd,-mhello.map
```
- Compiling and Linking an Application in Separate Steps with tiarmclang
If compiling and linking in separate steps, the -flto option should be specified on both the tiarmclang compilation and linking commands, like so:
```
%> tiarmclang -mcpu=cortex-m4 -Oz -flto -c hello.c
%> tiarmclang -mcpu=cortex-m4 -Oz -flto hello.o -o hello.out \
-Wl,-llnk.cmd,-mhello.map
```
Note that when compiling and linking in separate steps, the -flto option must be specified on both tiarmclang commands.
- Linking Directly with a tiarmlnk Command
If you would like to compile using the tiarmclang command and link directly in a separate step with the tiarmlnk command, you will need to specify the -flto option on the tiarmclang compilation command:
```
%> tiarmclang -mcpu=cortex-m4 -Oz -flto -c hello.c
```
Then use the --llvm_lto=on option and a few other linker options when running the link step of the build with the tiarmlnk command:
```
%> tiarmlnk -I/path/to/installation/lib -I/path/to/linker/command/file \
-o hello.out hello.o -llnk.cmd -mhello.map --llvm_lto=on \
--start-group -llibc++.a -llibc++abi.a -llibc.a -llibsys.a \
-llibsysbm.a -llibclang_rt.builtins.a -llibclang_rt.profile.a \
--end-group \
--cg_opt_level=z
```
As you can see, invoking tiarmlnk directly requires that all of the runtime libraries be explicitly referenced in the linker command. When the link step is run from the tiarmclang command, the --llvm_lto=on, --start-group/--end-group, and --cg_opt_level linker options are implicitly passed to the linker.
It is recommended that you use the tiarmclang command to perform the compile and link steps of an application build, even when performing the two steps separately.
Building an LTO-Enabled Application with tiarmclang in a Code Composer Studio Project
A tiarmclang Code Composer Studio (CCS) project that has been imported into or created in a workspace can be built with LTO enabled by inserting the -flto option into both the Build->Arm Compiler and Build->Arm Linker tabs in the Project->Build Settings pop-up dialog box.
For example, given a simple “Hello World!” CCS project as the project in focus in a workspace, you can click on Project->Build Settings to bring up the Properties pop-up dialog box. Assuming that the “TI Clang v2.0.0.STS” compiler has been selected in the General->Compiler version box and other settings besides -flto have been accounted for, then:
Click on Build->Arm Compiler and edit the Command-line pattern contents as follows:
Before: ${command} ${flags} ${output_flag}${output} ${inputs}
After: ${command} -flto ${flags} ${output_flag}${output} ${inputs}
Similarly, for the link-step, click on Build->Arm Linker and edit the Command-line pattern contents as follows:
Before: ${command} ${flags} ${output_flag}${output} ${inputs}
After: ${command} -flto ${flags} ${output_flag}${output} ${inputs}
As the current versions of CCS do not handle the -flto option in the normal way that compile and linker options are handled in the Properties dialog, the above method of specifying the -flto option for a CCS project serves as a stopgap until improved support for the -flto option is added in an upcoming CCS release.
LTO Development Flow
There are essentially two steps to employing link-time inter-module optimizations in the build of a given application.
1. Compile as much C/C++ source code as possible with the -flto option.
Compiling a C/C++ source file with the -flto option instructs the compiler to embed an intermediate representation (IR) in the object file that it produces. The same applies to object files that are archived into libraries. In fact, all of the runtime libraries that are shipped with the tiarmclang toolchain are built with the -flto option, so an object file from a runtime library can participate in LTO during the link step if LTO is turned on. If LTO is not turned on during the link step, an object file with embedded IR is treated as a normal object file.
2. Turn on the LTO feature during the link of your application.
As explained in the above section, LTO can be turned on by specifying the -flto option on the tiarmclang command during compilation and linking, or by specifying the --llvm_lto=on linker option directly to the linker on the tiarmlnk command.
When LTO is turned on during the link, the linker will:
a. Extract the embedded IR content from each input object file that contains embedded IR to create a source IR module. This also applies to object files that are pulled in from object libraries to resolve references to undefined symbols.
b. Link the source IR modules together into a combined IR module.
c. Present the combined IR module to the compiler to “re-compile” the program with inter-module optimizations enabled.
d. Link the object file that results from the “re-compile” with all other input object files that do not contain embedded IR to produce the linked output file.
Benefits of Using LTO - Enabling Inter-Module Optimizations
Let’s consider a simple example application to demonstrate just one of the potential benefits of using LTO to enable inter-module optimization.
Consider a series of source files in which many of the same string constants are referenced repeatedly and across multiple source files.
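As a minimal sketch of that pattern, two translation units might each carry their own copy of the same string literal. The function names and the message text below are hypothetical and are not the contents of the actual test files; both "files" are shown in one listing for brevity.

```c
#include <string.h>

/* Hypothetical contents of one source file, e.g. s10.c */
const char *s10_status(void)
{
    return "operation completed successfully";
}

/* Hypothetical contents of a second source file, e.g. s20.c */
const char *s20_status(void)
{
    return "operation completed successfully";
}

/* Returns 1 when both modules report identical text. */
int same_status_text(void)
{
    return strcmp(s10_status(), s20_status()) == 0;
}
```

When such functions live in separate source files, each object file may contribute its own copy of the literal to .rodata. A whole-program view gives the compiler the opportunity to keep a single copy.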
If we compile and link without LTO turned on:
```
%> tiarmclang -mcpu=cortex-m4 -Oz constant_merge_test.c \
ic_s10.c ic_s20.c ic_s30.c ic_s40.c s10.c s20.c s30.c s40.c \
-o no_lto.out -Wl,-llnk.cmd,-mno_lto.map
```
The map file reveals that the size of the .rodata section where all of the string constants are defined is reasonably large:
no_lto.map:
```
...
SEGMENT ALLOCATION MAP
run origin load origin length init length attrs members
---------- ----------- ---------- ----------- ----- -------
00000020 00000020 00007a4c 00007a4c r-x
00000020 00000020 00004ad2 00004ad2 r-- .rodata
...
...
```
But if we then compile with LTO enabled:
```
%> tiarmclang -mcpu=cortex-m4 -flto -Oz constant_merge_test.c \
ic_s10.c ic_s20.c ic_s30.c ic_s40.c s10.c s20.c s30.c s40.c \
-o with_lto.out -Wl,-llnk.cmd,-mwith_lto.map
```
Then the map file shows that the .rodata section is significantly smaller in the LTO-enabled build:
```
...
SEGMENT ALLOCATION MAP
run origin load origin length init length attrs members
---------- ----------- ---------- ----------- ----- -------
00000020 00000020 00005b84 00005b84 r-x
...
00004530 00004530 00001674 00001674 r-- .rodata
...
```
The use of LTO in this example enables the compiler to perform an inter-module constant merging optimization that saves 0x4ad2 - 0x1674 = 0x345e (13406) bytes in the .rodata section. Note that in this example, the savings in the .rodata section are offset somewhat by increased code size in other sections such as .text. The net savings are 0x7a4c - 0x5b84 = 0x1ec8 (7880) bytes.
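The arithmetic can be checked with a short C sketch. The four lengths are copied from the two map files above; the macro and function names are illustrative only. The last helper derives how much of the .rodata savings is consumed by growth elsewhere in the segment, a figure implied by the map files rather than stated in them.

```c
#include <stdio.h>

/* Lengths copied from the no_lto.map and with_lto.map excerpts. */
#define NO_LTO_RODATA    0x4ad2u
#define WITH_LTO_RODATA  0x1674u
#define NO_LTO_SEGMENT   0x7a4cu
#define WITH_LTO_SEGMENT 0x5b84u

/* Bytes saved in the .rodata section alone. */
unsigned rodata_savings(void)
{
    return NO_LTO_RODATA - WITH_LTO_RODATA;
}

/* Bytes saved across the whole segment. */
unsigned net_savings(void)
{
    return NO_LTO_SEGMENT - WITH_LTO_SEGMENT;
}

/* Implied growth in the rest of the segment. */
unsigned growth_elsewhere(void)
{
    return rodata_savings() - net_savings();
}

/* Prints each figure in hexadecimal and decimal. */
void print_savings(void)
{
    printf(".rodata savings:  0x%x (%u) bytes\n",
           rodata_savings(), rodata_savings());
    printf("net savings:      0x%x (%u) bytes\n",
           net_savings(), net_savings());
    printf("growth elsewhere: 0x%x (%u) bytes\n",
           growth_elsewhere(), growth_elsewhere());
}
```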
Improved Compiler Generated Debug Information to Enable Use of CCS Stack Usage View
The tiarmclang 2.0.0.STS compiler will emit estimated stack usage debug information for all functions, including functions defined in runtime libraries, to enable the use of the Stack Usage View in Code Composer Studio (CCS). Additionally, functions in the runtime libraries that are sourced in assembly language have been annotated with assembly directives to supply estimated stack usage information for those functions.
Recently Fixed Issues
CODEGEN-6288 : tiarmclang optimizer removes empty loops that don’t have side effects
In tiarmclang releases prior to 1.2.1.STS, the optimizer would remove an empty while loop that contained no side effects. If a function contained only such a loop, then the optimizer would remove references to the function from other functions in the same compilation unit even if the function were annotated with an optnone function attribute.
In tiarmclang releases starting with 1.2.1.STS, you can now mark a function containing an empty loop with no side effects with an optnone function attribute and references to the function will not be removed.
Alternatively, you can specify an asm() statement inside the body of the empty loop to create a side effect that will prevent the loop from being removed. For example:
while (1) {
__asm(" ");
}
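The two approaches can be sketched side by side as follows. The macro and function names are hypothetical. The optnone attribute is Clang-specific, so it is guarded here to let other compilers build the file. The second function is a bounded variant of the loop, added only so that the asm() technique can be exercised on a host machine.

```c
/* optnone is a Clang attribute; other compilers get an empty macro. */
#if defined(__clang__)
#define KEEP_UNOPTIMIZED __attribute__((optnone))
#else
#define KEEP_UNOPTIMIZED
#endif

/* Approach 1: mark a function that only spins with optnone so that
 * references to it are not removed (1.2.1.STS and later). */
KEEP_UNOPTIMIZED void halt_forever(void)
{
    while (1) {
    }
}

/* Approach 2: an asm() statement in the loop body creates a side
 * effect, so each iteration is kept. Bounded here for illustration. */
unsigned delay_iterations(unsigned count)
{
    unsigned done = 0;
    while (done < count) {
        __asm(" ");
        done++;
    }
    return done;
}
```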
Host Support / Dependencies
The following host-specific versions of the 2.0.0.STS tiarmclang compiler are available:
- Linux: Ubuntu, RHEL 7
- Windows: 7, 8, 10
- Mac: OSX
Device Support
The tiarmclang compiler supports development of applications that are to be loaded and run on one of the following Arm Cortex processor variants:
Arm Processor Variant | Options |
---|---|
Cortex-M0 | “-mcpu=cortex-m0” |
Cortex-M0+ | “-mcpu=cortex-m0plus” |
Cortex-M3 | “-mcpu=cortex-m3” |
Cortex-M4 without FPv4SPD16 | “-mcpu=cortex-m4 -mfloat-abi=soft” |
Cortex-M4 with FPv4SPD16 | “-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16” |
Cortex-M33 without FPv5SPD16 | “-mcpu=cortex-m33 -mfloat-abi=soft” |
Cortex-M33 with FPv5SPD16 | “-mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16” |
Cortex-R4 (Thumb) without VFPv3D16 | “-mcpu=cortex-r4 -mthumb -mfloat-abi=soft” |
Cortex-R4 (Thumb) with VFPv3D16 | “-mcpu=cortex-r4 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16” |
Cortex-R4 without VFPv3D16 | “-mcpu=cortex-r4 -mfloat-abi=soft” |
Cortex-R4 with VFPv3D16 | “-mcpu=cortex-r4 -mfloat-abi=hard -mfpu=vfpv3-d16” |
Cortex-R5 (Thumb) without VFPv3D16 | “-mcpu=cortex-r5 -mthumb -mfloat-abi=soft” |
Cortex-R5 (Thumb) with VFPv3D16 | “-mcpu=cortex-r5 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16” |
Cortex-R5 without VFPv3D16 | “-mcpu=cortex-r5 -mfloat-abi=soft” |
Cortex-R5 with VFPv3D16 | “-mcpu=cortex-r5 -mfloat-abi=hard -mfpu=vfpv3-d16” |
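One way to confirm from source code which floating-point configuration a build selected is to test the Arm C Language Extensions (ACLE) predefined macros. The following is a sketch only: it assumes that tiarmclang defines __ARM_PCS_VFP and __ARM_FP the way Clang generally does for 32-bit Arm targets, which these release notes do not state, and the function names are hypothetical.

```c
/* Reports the floating-point calling convention chosen at compile time.
 * __ARM_PCS_VFP is the ACLE macro for the hard-float (VFP) procedure
 * call standard; __arm__ marks a 32-bit Arm target. */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";
#elif defined(__arm__)
    return "soft or softfp";
#else
    return "not a 32-bit Arm target";
#endif
}

/* Returns 1 when hardware single-precision support is enabled.
 * In ACLE, bit 2 (0x4) of __ARM_FP indicates single precision. */
int has_hw_single_precision(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x4)
    return 1;
#else
    return 0;
#endif
}
```

Under these assumptions, a build that uses the "-mfloat-abi=hard" options from the table would be expected to report "hard".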
Resolved Defects
ID | Summary |
---|---|
CODEGEN-9669 | TI Arm Clang mismatch between source code and debugger view with function subsections |
CODEGEN-9092 | tiarmclang mistakenly documents support for -fpic position independent code |
CODEGEN-8914 | _enable_IRQ in ti_compatibility.h only supports Cortex-M devices |
CODEGEN-8899 | tiarmlnk generates cinit record for tiny .init_array section |
CODEGEN-8887 | Compiler does not support linking code that uses C++ exceptions |
CODEGEN-8639 | tiarmar.exe is denied permission to create an archive file on Windows 7 |
CODEGEN-8533 | Use of virtual functions causes many RTS print functions to be linked into the program |
CODEGEN-8255 | tiarmclang: zero-initialized static and global variables are being defined in .bss |
CODEGEN-6288 | tiarmclang: optimizer removes empty loops that don't have side effects |
Known Defects
The up-to-date known defects in v2.0.0.STS can be found here (dynamically generated):