TI Arm Clang Compiler Tools - 3.1.0.STS Release Notes
Table of Contents
- Introduction
- Short-Term Support Release
- Documentation
- TI E2E Community - Where to Get Help
- Defect Tracking Database
- What’s New
- Support for ELF Segment “Blocking” (Alignment and Padding)
- OpTI-Flash Smart Placement and Smart Layout
- Support for C++17
- Support for Generating TI-TXT Hex Format from tiarmobjcopy
- ALPHA LEVEL: Support for Static Shared Objects (SSO)
- Support for C++ Exceptions (-fexceptions)
- Enable Compiler Generation of Execute-Only Code for Cortex-M0/M0+ Functions (-mexecute-only)
- Enable Linker Generation of an XML Function Hash Table for OpTI-Flash/OpTI-SHARE (--gen_xml_func_hash)
- Enable Use of Custom Datapath Extension (CDE) Intrinsics on Cortex-M33
- Cortex-M4 and Cortex-R5 Performance Improvements
- Host Support / Dependencies
- Device Support
- Resolved Defects
- Known Defects
Introduction
Version 3.1.0.STS of the TI Arm Clang Compiler Tools, also known as the tiarmclang compiler, is derived from the open source LLVM/Clang source code base and the LLVM Compiler Infrastructure source base that can be found in GitHub (github.com).
The tiarmclang compiler can be used to compile and link C/C++ and assembly source files to build static executable application files that can be loaded and run on an Arm Cortex processor (m0, m0plus, m3, m4, m33, r4, and r5). Please see the Device Support section below for further information about which compiler options to use when building an application for a particular Arm Cortex processor configuration.
Short-Term Support Release
This is a Short-Term Support (STS) release.
For definitions and explanations of STS, LTS, and the versioning number scheme, please see SDTO Compiler Version Numbers.
Documentation
The TI Arm Clang Compiler Tools User’s Guide is now available online at the following URL:
Since the tiarmclang compiler is derived from the LLVM project’s Clang compiler source base, much of the generic Clang online documentation is also applicable to the tiarmclang compiler. The latest version of the generic Clang documentation can be found here:
TI E2E Community - Where to Get Help
Post compiler related questions to the TI E2E design community forum and select the TI device being used.
The following is the top-level webpage for all of TI’s Code Generation Tools.
If submitting a defect report, please attach a scaled-down test case with command-line options and the compiler version number to allow us to reproduce the issue easily.
Defect Tracking Database
Compiler defect reports can be tracked at the new Development Tools bug database, SIR. SIR is a JIRA-based view into all public tools defects. The old SDOWP tracking database will be retired.
A my.ti.com account is required to access this page. To find an issue in SIR, enter your defect id in the top right search box once logged in. Alternatively, from the top red navigation bar, select “Issues” then “Search for Issues”.
To find an old SDOWP issue, place the SDOWP ID in the search box and use double quotes around the SDOWP ID.
What’s New
Support for ELF Segment “Blocking” (Alignment and Padding)
Users may tell the linker to align and pad initialized ELF segments (also known as program headers, which are collections of sections) to a power-of-two byte boundary using the “--block_init_segments=” linker option. For example, “--block_init_segments=16” forces these segments to be aligned and padded to a 16-byte boundary. This is useful when an entire segment has to be encrypted in 128-bit-aligned blocks.
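The alignment and padding arithmetic can be illustrated with a small C sketch. The helper names below are hypothetical and are not part of the toolchain; they only show how a start address or size is rounded up to a power-of-two block size such as 16.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when `block` is a nonzero power of two. */
static int is_power_of_two(uint32_t block)
{
    return block != 0u && (block & (block - 1u)) == 0u;
}

/* Rounds `value` up to the next multiple of `block` (a power of two). */
static uint32_t block_align_up(uint32_t value, uint32_t block)
{
    assert(is_power_of_two(block));
    return (value + block - 1u) & ~(block - 1u);
}

/* Number of pad bytes needed so that `size` becomes a multiple of `block`. */
static uint32_t block_padding(uint32_t size, uint32_t block)
{
    return block_align_up(size, block) - size;
}
```

With a block size of 16, a 50-byte segment would need 14 bytes of padding to reach 64 bytes, which is four whole 128-bit blocks.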
OpTI-Flash Smart Placement and Smart Layout
The tiarmclang 3.1.0.STS compiler release includes features to support OpTI-Flash. OpTI-Flash enables users to more easily place heavily accessed functions and data objects and also utilize new hardware features available on the AM263P device. The OpTI-Flash tooling support provided by the STS toolchain provides a set of building blocks and can be understood as three categories of functionality:
Smart Placement: Mark critical functions to place in SRAM/TCM
- This requires no special hardware to use, although the RL2 “flash cache” available on the AM263P is useful.
Smart Layout: Place functions in flash and use a fast runtime copy to SRAM for execution (Relies on FLC)
- This relies on the Fast Local Copy (FLC) hardware available on the AM263P.
- It is also possible to order input sections in linear, execution order (i.e. “preorder”) by placing functions in a Smart Layout region or by using the PREORDERED() operator on any output section.
OpTI-Share: Identify shared functions across multiple cores; place in shared memory along with referenced data in a static-shared object (SSO) that is loaded independently of the cores.
- This is available as an alpha-quality feature in the STS. Full use requires use of the Region Address Translation (RAT) hardware on the AM263P, although if read/write sections are not shared in the static-shared object, it can be used with any device.
C/C++ Source-level function attributes:
Smart Placement:
```
__attribute__(({local,onchip,offchip}(priority))) (Corresponding to TCMx, SRAM, FLASH)
```
Smart Layout:
```
__attribute__((fast_local_copy(flcregionid))) (present region limit: 4)
```
Example:
```
__attribute__((local(1))) void func0(void) { .. } // Place in TCM with priority 1
__attribute__((local(2))) void func1(void) { .. } // Place in TCM with priority 2
__attribute__((onchip)) void func2(void) { .. } // Place in SRAM with implied priority 1
__attribute__((fast_local_copy(1))) void func3(void) { .. } // Place in FLC Region 1
```
The attributes can be added to a function definition or a function declaration (if that function is called/referenced in the same compilation unit).
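If the same sources also have to build with a compiler that does not recognize these attributes, one option is to hide them behind macros. The following is a minimal sketch under that assumption: USE_OPTI_FLASH_ATTRS, the PLACE_* macros, and sum_bytes are all hypothetical names, not something the toolchain defines.

```c
#include <assert.h>
#include <stddef.h>

/* USE_OPTI_FLASH_ATTRS is a hypothetical project macro, for example passed
 * with -D only in builds whose compiler supports the placement attributes. */
#if defined(USE_OPTI_FLASH_ATTRS)
#define PLACE_LOCAL(prio)  __attribute__((local(prio)))
#define PLACE_ONCHIP(prio) __attribute__((onchip(prio)))
#define PLACE_FLC(region)  __attribute__((fast_local_copy(region)))
#else
/* Other builds: the annotations expand to nothing. */
#define PLACE_LOCAL(prio)
#define PLACE_ONCHIP(prio)
#define PLACE_FLC(region)
#endif

/* Hot helper: requested for TCM with priority 1 when the attributes are enabled. */
PLACE_LOCAL(1) unsigned sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned total = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}
```

The function body is unaffected by the annotation; only its placement request changes between the two kinds of build.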
Assembly metainfo directives
Functions can also be annotated by adding an assembly metainfo directive, in the following format, to an assembly file that is compiled and linked with the project. This allows users to annotate a function without modifying or recompiling the source file that defines it.
Smart Placement
```
.global <global function symbol>
.sym_meta_info <global function symbol>, "of_placement", {"local","onchip","offchip"}, <priority>
```
Smart Layout
```
.global <global function symbol>
.sym_meta_info <global function symbol>, "of_placement", "fast_local_copy", <regionid>
```
e.g.
```
.global strcmp
.sym_meta_info strcmp, "of_placement", "local", 1
.global memcpy
.sym_meta_info memcpy, "of_placement", "fast_local_copy", 1
```
A simple node.js script called “generate_syms.js” is also included that generates an assembly file based on a CSV text file of the following format:
```
strcmp,local,1
main,onchip,3
memcpy,fast_local_copy,1
```
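The mapping from one CSV record to the corresponding pair of directives can be sketched in C as follows. This is not the logic of the bundled generate_syms.js script, which is not shown in these notes; csv_to_meta_info is a hypothetical helper, and it does not validate the placement keyword.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Converts one CSV record "symbol,placement,value" into a .global line and a
 * .sym_meta_info line. Returns 0 on success, or -1 when the record is
 * malformed or `out` is too small.
 */
static int csv_to_meta_info(const char *record, char *out, size_t out_size)
{
    char symbol[64];
    char placement[32];
    int value;
    int written;

    /* %63[^,] reads up to 63 characters that are not a comma. */
    if (sscanf(record, "%63[^,],%31[^,],%d", symbol, placement, &value) != 3) {
        return -1;
    }

    written = snprintf(out, out_size,
                       ".global %s\n"
                       ".sym_meta_info %s, \"of_placement\", \"%s\", %d\n",
                       symbol, symbol, placement, value);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

For the record memcpy,fast_local_copy,1 this produces a .global memcpy line followed by a .sym_meta_info line that names memcpy, "fast_local_copy", and region 1.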
Smart Placement Linker Aggregation
With the placement described above, the TI link-step will aggregate function and data input sections into documented output sections while also sorting the input sections. For Smart Placement, the input sections are sorted based on the designated priority. The documented output sections for Smart Placement are:
- .TI.local: Code and initialized data designated for local memory (TCMs)
- .TI.bss.local: Uninitialized data designated for local memory
- .TI.onchip: Code and initialized data designated for onchip memory (RAM)
- .TI.bss.onchip: Uninitialized data designated for onchip memory
- .TI.offchip: Code designated for offchip memory (FLASH)
Note that data objects placed in .TI.local or .TI.onchip are always directly initialized according to RAM-model initialization. This means that whatever is responsible for loading that code and data into RAM or TCM will also initialize the data, even if ROM-model auto-initialization is used. Consequently, in ROM-model, CINIT records are not created for this data.
When ROM-model auto-initialization is enabled, zero-initialization CINIT records will be created for the uninitialized memory regions .TI.bss.local and .TI.bss.onchip. When RAM-model initialization is used, it is up to the user to zero-initialize these sections. The linker will export symbols, designating the start and end of these sections, that an initialization routine can link against:
- .TI.bss.local: __start___TI_bss_local and __stop___TI_bss_local
- .TI.bss.onchip: __start___TI_bss_onchip and __stop___TI_bss_onchip
Note that because symbols are defined for these sections, they cannot be split between multiple memory regions.
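For the RAM-model case, a user-supplied zero-initialization routine could look like the following sketch. The zero_region helper is hypothetical, and declaring the linker symbols as unsigned char arrays is an assumption about how an application would reference them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero-fills the half-open byte range [start, stop). */
static void zero_region(unsigned char *start, unsigned char *stop)
{
    assert(start <= stop);
    memset(start, 0, (size_t)(stop - start));
}

/*
 * On the target, the range would come from the linker-exported symbols,
 * for example (assumed declarations):
 *
 *   extern unsigned char __start___TI_bss_local[];
 *   extern unsigned char __stop___TI_bss_local[];
 *   zero_region(__start___TI_bss_local, __stop___TI_bss_local);
 */
```

The same call, with the .TI.bss.onchip symbols, would cover the second section. Such a routine would need to run before any code reads the affected variables.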
A default linker command file needs to place the documented output sections in the corresponding memory regions in both development and deployment flows. This file could be autogenerated by SysConfig based on the memory partition or linked using generic macros. For a development flow, this is straightforward, as in the following example.
```
/* Partitioned memory map */
MEMORY
{
R5F_VECS : ORIGIN = 0x00000000 , LENGTH = 0x00000040
R5F_TCMA : ORIGIN = 0x00000040 , LENGTH = 0x00007FC0
R5F_TCMB : ORIGIN = 0x41010000 , LENGTH = 0x00008000
MSRAM : ORIGIN = 0x70080000 , LENGTH = 0x40000
FLASH : ORIGIN = 0x60100000 , LENGTH = 0x80000
}
SECTIONS
{
/* "local" --> split between TCMs and SRAM */
/* "onchip" --> split between SRAM and FLASH */
/* "offchip" --> FLASH */
.TI.local : {} >> R5F_TCMA | R5F_TCMB | MSRAM
.TI.onchip : {} >> MSRAM | FLASH
.TI.offchip : {} > FLASH
.TI.bss.local : {} > R5F_TCMB; /* Exports symbols __start___TI_bss_local, __stop___TI_bss_local */
.TI.bss.onchip: {} > MSRAM; /* Exports symbols __start___TI_bss_onchip, __stop___TI_bss_onchip */
}
```
By default, section splitting should be used as shown above between memory regions to get the full effect of function prioritization.
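To see why splitting matters for prioritization, consider the following simplified C model. It is not the linker's actual algorithm. It assumes that a lower priority value is placed earlier and that each section goes into the first listed region with enough free space. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    unsigned priority;  /* assumption: lower value is placed earlier */
    unsigned size;      /* bytes */
    int region;         /* output: index of chosen region, -1 if none fits */
} InputSection;

typedef struct {
    const char *name;
    unsigned free_bytes;
} Region;

/* qsort comparator: ascending priority value. */
static int by_priority(const void *a, const void *b)
{
    const InputSection *x = a;
    const InputSection *y = b;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sorts the sections by priority, then places each in the first region that has room. */
static void place_sections(InputSection *secs, size_t nsecs,
                           Region *regions, size_t nregions)
{
    qsort(secs, nsecs, sizeof secs[0], by_priority);
    for (size_t i = 0; i < nsecs; i++) {
        secs[i].region = -1;
        for (size_t r = 0; r < nregions; r++) {
            if (secs[i].size <= regions[r].free_bytes) {
                regions[r].free_bytes -= secs[i].size;
                secs[i].region = (int)r;
                break;
            }
        }
    }
}
```

In this model, the priority 1 function claims space in the first (fastest) region, and lower-ranked functions spill into the later regions once the earlier ones are full. Without splitting, the whole output section would have to fit in a single region.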
Enable Smart Data Collection for Smart Placement
When the “--smart_data_collect” linker option is enabled, the TI link-step will not only include explicitly annotated function and data objects into the appropriate documented output sections, it will also pull in referenced initialized data sections and assign them the same priority as the object that references them. For objects placed in .TI.local, referenced read-write and read-only (constant) data sections are pulled in. For objects placed in .TI.onchip, only referenced read-only (constant) data sections are pulled in. Nothing happens for objects placed in .TI.offchip.
Smart Layout Linker Aggregation
For Smart Layout, the input sections are sorted based on linear execution order (function PREORDER) based on a call graph.
The documented output sections for Smart Layout are:
- .TI.flc.region1: Code designated Fast Local Copy (FLC) Region #1
- .TI.flc.region2: Code designated Fast Local Copy (FLC) Region #2
- .TI.flc.region3: Code designated Fast Local Copy (FLC) Region #3
- .TI.flc.region4: Code designated Fast Local Copy (FLC) Region #4
A default linker command file that places these output sections in FLASH needs to be provided:
```
SECTIONS
{
.TI.flc.region1 : { } > FLASH
.TI.flc.region2 : { } > FLASH
.TI.flc.region3 : { } > FLASH
.TI.flc.region4 : { } > FLASH
}
```
The linker will automatically generate start/stop symbols for each region so that the FLC can be programmed:
```
__start___TI_flc_region1, __stop___TI_flc_region1
__start___TI_flc_region2, __stop___TI_flc_region2
__start___TI_flc_region3, __stop___TI_flc_region3
__start___TI_flc_region4, __stop___TI_flc_region4
```
Users are responsible for ensuring that FLC regions are programmed according to the hardware interface based on the region symbols above.
Users may also create their own output sections in a linker command file with manually specified input sections and sort them based on execution order using the PREORDERED() operator:
```
SECTIONS
{
.outputSection: {
foo.o(.text)
bar.o(.text.myfunc)
} > FLASH, PREORDERED()
}
```
Support for C++17
Starting with the 3.1.0.STS version of the TI Arm Clang Compiler Tools, the tiarmclang compiler supports C++17 language extensions.
The tiarmclang compiler supports gnu17 language extensions for C programs, and now c++17 language extensions for C++ programs by default.
To choose a different C or C++ language dialect when compiling your program, you can use the tiarmclang -std option to choose your desired dialect. Please see C/C++ Language Standard Options (-std) for a full list of available C and C++ language dialects supported by the tiarmclang compiler.
Support for Generating TI-TXT Hex Format from tiarmobjcopy
The 3.1.0.STS version of the tiarmobjcopy utility now supports generating ti-txt hex format.
In addition to the already supported formats like ihex (Intel) and binary, the tiarmobjcopy utility can be used to generate ti-txt format from an ELF object file.
For example, given an ELF executable file, app.out, you can use tiarmobjcopy to generate an output file containing the raw data content of the app.out file in ti-txt format:
```
%> tiarmobjcopy --input-target=elf32-littlearm --output-target=ti-txt app.out app.hex
```
For further information about using the tiarmobjcopy utility, please see tiarmobjcopy - Object Copying and Editing Tool.
For further information about the TI-TXT format, please see TI-TXT Hex Format.
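As a rough illustration of what a TI-TXT file contains, the following C sketch formats one section. It assumes the commonly described layout: an "@" line carrying the hexadecimal start address, then the data as two-digit hexadecimal bytes with 16 bytes per line, and a final line holding the letter q after the last section. The titxt_section helper is hypothetical and is not how tiarmobjcopy is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes one TI-TXT section for `len` bytes at `addr` into `out`:
 * an "@ADDR" line followed by up to 16 hex bytes per line.
 * Returns the number of characters written, or -1 if `out` is too small.
 * The caller appends the terminating "q" line after the last section.
 */
static int titxt_section(unsigned long addr, const unsigned char *data,
                         size_t len, char *out, size_t out_size)
{
    size_t pos;
    int n = snprintf(out, out_size, "@%lX\n", addr);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    pos = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        /* Newline after every 16th byte and after the last byte, space otherwise. */
        char sep = ((i % 16u) == 15u || i + 1u == len) ? '\n' : ' ';
        n = snprintf(out + pos, out_size - pos, "%02X%c", (unsigned)data[i], sep);
        if (n < 0 || (size_t)n >= out_size - pos) {
            return -1;
        }
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Four bytes at address 0xF000 would come out as an @F000 line followed by one line of four space-separated byte values.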
ALPHA LEVEL: Support for Static Shared Objects (SSO)
The tiarmclang 3.1.0.STS compiler tools provide ALPHA LEVEL support for building Static Shared Object (SSO) files, and for linking applications against an SSO file.
This support enables a user of a multicore system to:
- identify functions and RO data objects that are common among the system’s core applications,
- define and allocate placement in shared memory for those objects in an SSO file, then
- re-link their core applications against the SSO file to realize overall system code size savings.
Summary of New SSO Linker Options
--static_shared_object (--sso)
Use the --sso linker option to instruct the linker to build an SSO file from input object files and a linker command file.
Note that the --sso option is a linker option and must be specified with a -Wl, prefix on the tiarmclang command line. You may also specify it without the -Wl, prefix in the linker command file that you are using for the build of the SSO file.
There are rules that must be adhered to in the content of the linker command file that is used with the --sso option:
- All function and data sections that are to be included in the SSO output file must be explicitly specified with input section specifications in the linker command file
- Do not use wildcards in the specifications of the function and data sections to be included in the SSO output file
For example,
```
SECTIONS
{
...
.shared.text: {
*(.text._leaf_*) /* Ignored due to wildcard use */
a.o(.text._leaf_a) /* OK */
b.o(.text._leaf_b) /* OK */
} > SHARED_MEM
...
}
```
--import_sso=SSO File Name
Use the --import_sso=SSO file name option to identify an input SSO file to the linker (only one is allowed). This will instruct the linker to prefer the definitions of function and data objects in the SSO file over the same function and data object definitions in the application’s input object files.
Note that the --import_sso option is a linker option and must be specified with a -Wl, prefix on the tiarmclang command that is used to link an application against an SSO file.
For example,
```
%> tiarmclang -mcpu=cortex-r5 app.o common.o -o app.out -Wl,--import_sso=sso.out,app.cmd,-mapp.map
```
An Example Using an SSO File
This is a simplistic and contrived example to demonstrate how an SSO file can be used in a multicore system.
Suppose you have two nearly identical applications that are intended to run on separate cores in a multicore system …
The top-level source file for app1 looks like this:
```
#include <stdio.h>
extern const char *get_red_fish();
extern const char *get_blue_fish();
extern const char *get_old_fish();
extern const char *get_new_fish();
int main() {
printf("rhyme 1:\n");
printf("\t%s, %s\n", get_red_fish(), get_blue_fish());
printf("\t%s, %s\n", get_old_fish(), get_new_fish());
return 0;
}
```
The top-level source file for app2 looks like this:
```
#include <stdio.h>
extern const char *get_new_fish();
extern const char *get_blue_fish();
extern const char *get_old_fish();
extern const char *get_gold_fish();
int main() {
printf("rhyme 2:\n");
printf("\t%s, %s\n", get_new_fish(), get_blue_fish());
printf("\t%s, %s\n", get_old_fish(), get_gold_fish());
return 0;
}
```
Both applications use the following 2 source files …
get_fishes.c:
```
extern const char *red_fish_str;
extern const char *blue_fish_str;
extern const char *old_fish_str;
extern const char *new_fish_str;
extern const char *gold_fish_str;
const char *get_red_fish() { return red_fish_str; }
const char *get_blue_fish() { return blue_fish_str; }
const char *get_old_fish() { return old_fish_str; }
const char *get_new_fish() { return new_fish_str; }
const char *get_gold_fish() { return gold_fish_str; }
```
fishes.c:
```
__attribute__((section(".rodata.red_fish_str"))) const char *red_fish_str = "red fish";
__attribute__((section(".rodata.blue_fish_str"))) const char *blue_fish_str = "blue fish";
__attribute__((section(".rodata.old_fish_str"))) const char *old_fish_str = "old fish";
__attribute__((section(".rodata.new_fish_str"))) const char *new_fish_str = "new fish";
__attribute__((section(".rodata.gold_fish_str"))) const char *gold_fish_str = "gold fish";
```
The linker command files for each application …
app1.cmd:
```
/****************************************************************************/
/* app1.cmd */
/****************************************************************************/
-c
-stack 0x8000
-heap 0x2000
--args 0x0100
/* SPECIFY THE CORE1 MEMORY MAP */
MEMORY
{
CORE1_PMEM : org = 0x00100000 len = 0x80000
CORE1_DMEM : org = 0x00180000 len = 0x80000
}
/* SPECIFY THE SECTIONS ALLOCATED INTO CORE1 MEMORY */
SECTIONS
{
.bss : {} > CORE1_DMEM
.data : {} > CORE1_DMEM
.sysmem : {} > CORE1_DMEM
.stack : {} > CORE1_DMEM
.text : {} > CORE1_PMEM
.cinit : {} > CORE1_PMEM
.const : {} > CORE1_PMEM
.rodata : {} > CORE1_PMEM, palign(4)
}
```
app2.cmd:
```
/****************************************************************************/
/* app2.cmd */
/****************************************************************************/
-cr
-stack 0x8000
-heap 0x2000
--args 0x0100
/* SPECIFY THE CORE2 MEMORY MAP */
MEMORY
{
CORE2_PMEM : org = 0x00200000 len = 0x80000
CORE2_DMEM : org = 0x00280000 len = 0x80000
}
/* SPECIFY THE SECTIONS ALLOCATED INTO CORE2 MEMORY */
SECTIONS
{
.bss : {} > CORE2_DMEM
.data : {} > CORE2_DMEM
.sysmem : {} > CORE2_DMEM
.stack : {} > CORE2_DMEM
.text : {} > CORE2_PMEM
.cinit : {} > CORE2_PMEM
.const : {} > CORE2_PMEM
.rodata : {} > CORE2_PMEM, palign(4)
}
```
We can build each application independently like so:
```
%> tiarmclang -mcpu=cortex-r5 -c app1.c app2.c get_fishes.c fishes.c
%> tiarmclang -mcpu=cortex-r5 app1.o get_fishes.o fishes.o -o app1.out -Wl,app1.cmd,-mapp1.map
%> tiarmclang -mcpu=cortex-r5 app2.o get_fishes.o fishes.o -o app2.out -Wl,app2.cmd,-mapp2.map
```
Building an SSO File
If we examine the map files app1.map and app2.map from the above builds, we can see that the 2 applications share the following function and RO data sections:
- get_fishes.o (.text.get_blue_fish)
- get_fishes.o (.text.get_new_fish)
- get_fishes.o (.text.get_old_fish)
- fishes.o (.rodata.blue_fish_str)
- fishes.o (.rodata.new_fish_str)
- fishes.o (.rodata.old_fish_str)
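The shared set is simply the intersection of what each application references. As a small illustration, the following C sketch computes that intersection from the functions called in the two main() routines shown earlier. The common_names helper is hypothetical, and in practice the lists would be taken from the map files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies into `common` every name that appears in both `a` and `b`,
 * preserving the order of `a`. Returns the number of names copied.
 * `common` must have room for at least `na` entries.
 */
static size_t common_names(const char *const *a, size_t na,
                           const char *const *b, size_t nb,
                           const char **common)
{
    size_t count = 0;
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (strcmp(a[i], b[j]) == 0) {
                common[count++] = a[i];
                break;
            }
        }
    }
    return count;
}

/* Functions called by each application's main(), taken from the sources above. */
static const char *const app1_calls[] = {
    "get_red_fish", "get_blue_fish", "get_old_fish", "get_new_fish"
};
static const char *const app2_calls[] = {
    "get_new_fish", "get_blue_fish", "get_old_fish", "get_gold_fish"
};
```

Run on these two lists, the intersection holds get_blue_fish, get_old_fish, and get_new_fish, matching the three shared function sections listed above.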
We can then compose a linker command file, sso.cmd, that allocates the common function and RO data sections to shared memory …
sso.cmd:
```
/****************************************************************************/
/* sso.cmd */
/****************************************************************************/
--sso
/* SPECIFY SHARED MEMORY REGION */
MEMORY
{
SHARED_MEM : org = 0x00300000 len = 0x100000
}
/* SPECIFY THE SECTIONS ALLOCATION INTO SHARED MEMORY */
SECTIONS
{
.shared.text {
get_fishes.o(.text.get_new_fish)
get_fishes.o(.text.get_blue_fish)
get_fishes.o(.text.get_old_fish)
} > SHARED_MEM
.shared.rodata {
fishes.o(.rodata.new_fish_str)
fishes.o(.rodata.blue_fish_str)
fishes.o(.rodata.old_fish_str)
} > SHARED_MEM
}
```
Note the use of the --sso option in the above sso.cmd file.
To build the SSO file, use the following command:
```
%> tiarmclang -mcpu=cortex-r5 -o sso.out -Wl,sso.cmd,-msso.map
```
Linking an Application Against an SSO File
We can then relink both app1 and app2 against sso.out as follows:
```
%> tiarmclang -mcpu=cortex-r5 app1.o get_fishes.o fishes.o -o app1.out -Wl,--import_sso=sso.out,app1.cmd,-mapp1.map
%> tiarmclang -mcpu=cortex-r5 app2.o get_fishes.o fishes.o -o app2.out -Wl,--import_sso=sso.out,app2.cmd,-mapp2.map
```
An examination of the app1 and app2 map files after relinking will show that the function sections specified in the above sso.cmd file are not included in either app1 or app2.
There is currently a pending work item to ensure that RO data objects that are defined in the SSO file are not included in the applications that are linked against the SSO file.
NOTE: The SSO File Must Be Loaded in Addition to the Applications that Were Linked Against It
References from application code to function or RO data objects that are defined in the SSO file create a multicore system dependency between a given application that links against the SSO file and the SSO file itself. For this reason, you must load the SSO output file in addition to the core applications that are intended to run on the system before starting execution of any of the core applications.
Support for C++ Exceptions (-fexceptions)
By default, C++ exceptions are disabled.
Beginning with version 3.0.0.STS of the tiarmclang compiler tools, the -fexceptions compiler option can be specified when the compiler is invoked to enable support for C++ exceptions.
If the -fexceptions compiler option is used to compile an application’s source code, then the linker will be instructed to link with runtime support libraries that support C++ exceptions during the link-step of an application build.
Example
Consider the following simple example of utilizing C++ exceptions:
```
#include <iostream>
int main() {
int age;
std::cout << "How old are you? ";
std::cin >> age;
try {
if (age >= 18) {
std::cout << "Please proceed to an open booth to vote ... Thank you!\n";
}
else {
throw (age);
}
}
catch (int input_age) {
std::cout << "I'd like to help you, but you're too young to vote (" << input_age << ")\n";
}
return 0;
}
```
Compile the above C++ source file as follows:
```
%> tiarmclang -mcpu=cortex-m4 -fexceptions check_age.cpp -o check_age.out -Wl,-llnk.cmd
```
When loaded and run, the above application will generate the following output:
```
How old are you? 21
Please proceed to an open booth to vote ... Thank you!
How old are you? 5
I'd like to help you, but you're too young to vote (5)
```
Enable Compiler Generation of Execute-Only Code for Cortex-M0/M0+ Functions (-mexecute-only)
The -mexecute-only compiler option can be used in version 3.0.0.STS (or newer) of the tiarmclang compiler tools to generate “execute-only” code for Cortex-M0/M0+. Use of the -mexecute-only compiler option on the compiler invocation will prevent constant data from being embedded in the code section that the compiler generates for a function.
When an application’s source files are compiled with the -mexecute-only option, the linker will be instructed to link with execute-only versions of the runtime support libraries during the link-step of an application build.
Example
Consider a simple example with a function that contains a switch statement:
```
#include <stdio.h>
void mySwitch(int n) {
switch (n) {
case 1:
printf("Input value is 1\n");
break;
case 2:
printf("Input value is 2\n");
break;
case 3:
printf("Input value is 3\n");
break;
default:
printf("Invalid input\n");
break;
}
}
```
Compile the above C source file as follows, using the -S option to emit compiler-generated assembly:
```
%> tiarmclang -mcpu=cortex-m0plus -mexecute-only -S ex_switch.c
```
The compiler generated assembly file contains the following:
```
%> cat ex_switch.s
...
.section .text.mySwitch,"axy",%progbits,unique,0
.globl mySwitch
.p2align 1
.code 16 @ @mySwitch
.thumb_func
mySwitch:
...
.LBB0_5: @ %sw.bb3
movs r0, :upper8_15:.L.str.2
lsls r0, r0, #8
adds r0, :upper0_7:.L.str.2
lsls r0, r0, #8
adds r0, :lower8_15:.L.str.2
lsls r0, r0, #8
adds r0, :lower0_7:.L.str.2
bl printf
...
.section .rodata.str1.1,"aMS",%progbits,1
.L.str:
.asciz "Input value is 1\n"
.size .L.str, 18
...
.L.str.2:
.asciz "Input value is 3\n"
.size .L.str.2, 18
...
```
Note that in the above compiler-generated code, the address of each string constant in the .rodata.str1.1 section is built from immediate values encoded in the instructions themselves. This takes four 8-bit immediate operations (one movs and three adds, separated by 8-bit left shifts) to assemble the address in a register before the call to printf can be made.
If we then compare this to the code that is generated when execute-only is disabled:
```
%> tiarmclang -mcpu=cortex-m0plus -S ex_switch.c
%> cat ex_switch.s
...
.section .text.mySwitch,"ax",%progbits
.globl mySwitch
.p2align 1
.code 16 @ @mySwitch
.thumb_func
mySwitch:
...
.LBB0_5: @ %sw.bb3
ldr r0, .LCPI0_0
bl printf
...
pop {r7, pc}
.p2align 2
.LCPI0_0:
.long .L.str.2
...
.section .rodata.str1.1,"aMS",%progbits,1
.L.str:
.asciz "Input value is 1\n"
.size .L.str, 18
...
.L.str.2:
.asciz "Input value is 3\n"
.size .L.str.2, 18
...
```
Observe that in the non-execute-only generated code the address of the location where the string constant resides is in a table of constants that is included in the .text section that contains the definition of the mySwitch function. This allows the address to be loaded via PC-relative addressing.
While the non-execute-only compiler-generated code is smaller and more efficient, the execute-only code may reside in special execute-only memory. This can be useful when code security is a concern in your application.
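The address construction in the execute-only listing can be modelled in C. The sketch below mirrors the movs/lsls/adds sequence step by step, assuming that the four relocation operators select bits 31-24, 23-16, 15-8, and 7-0 of the symbol address respectively. The function name build_address is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rebuilds a 32-bit address from its four bytes, most significant first:
 * start with the top byte, then shift left by 8 and add the next byte,
 * three times.
 */
static uint32_t build_address(uint32_t address)
{
    uint32_t r0;
    r0 = (address >> 24) & 0xFFu;   /* movs r0, :upper8_15:sym */
    r0 <<= 8;                       /* lsls r0, r0, #8         */
    r0 += (address >> 16) & 0xFFu;  /* adds r0, :upper0_7:sym  */
    r0 <<= 8;                       /* lsls r0, r0, #8         */
    r0 += (address >> 8) & 0xFFu;   /* adds r0, :lower8_15:sym */
    r0 <<= 8;                       /* lsls r0, r0, #8         */
    r0 += address & 0xFFu;          /* adds r0, :lower0_7:sym  */
    return r0;
}
```

Seven instructions replace the single PC-relative ldr, which is the code size cost of keeping all constants out of the code section.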
Enable Linker Generation of an XML Function Hash Table for OpTI-Flash/OpTI-SHARE (--gen_xml_func_hash)
In the Sitara OpTI-Flash multicore context, the ability to identify common functions across multiple executables is desired in order to allow users to abstract these functions out and place them in shared memory in order to reduce individual executable size. This is also known as “OpTI-SHARE”. In order to identify common functions in a meaningful way (where function name and size are not enough), the tiarmclang version 3.0.0.STS linker can now generate an MD5 hash based on the function’s raw data prior to relocation and emit it within a table of function symbols in the linker-generated XML link info file.
The linker will also generate a list of referenced data sections from each global function uniquely identified by their object component IDs. Common read-only data sections can also be allocated in shared memory. However, writes to read-write data sections from common code must be managed through hardware address translation available on the device (aka “RAT”). These referenced data section lists can also be used in conjunction with “Smart Placement” where fast data access from frequently executed functions is desired.
When linking an application, the aforementioned table and referenced section lists will be generated when the “--xml_link_info” option is used in conjunction with “--gen_xml_func_hash”. The “--xml_link_info” option can be given a specified file name to use for the output.
Example
The generated table is designated by a “func_symbol_table” XML tag, with each global function represented by a “symbol” tag. The associated MD5 hash is indicated by a “value” tag and the referenced data section lists indicated in “refd_ro_sections” and “refd_rw_sections” tags for read-only (constant) data and read-write data, respectively. For example:
```
<func_symbol_table>
<symbol>
<name>func0</name>
<sectname>.text.main</sectname>
<value>b6e5b51736000aef4da6e8afb91846e4</value>
</symbol>
<symbol>
<name>func1</name>
<sectname>.text.foo</sectname>
<value>b1b9d95dd364df1b53f4e8c571ddaf68</value>
</symbol>
<symbol>
<name>func2</name>
<refd_ro_sections>
<object_component_ref idref="oc-92"/>
<object_component_ref idref="oc-99"/>
</refd_ro_sections>
<refd_rw_sections>
<object_component_ref idref="oc-94"/>
<object_component_ref idref="oc-96"/>
<object_component_ref idref="oc-97"/>
<object_component_ref idref="oc-98"/>
</refd_rw_sections>
</symbol>
</func_symbol_table>
```
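Once the name and hash pairs have been extracted from each executable's XML file, finding sharing candidates reduces to a comparison. The following C sketch assumes the XML has already been parsed into arrays. The FuncHash type and has_identical_function helper are hypothetical, and treating a matching name plus a matching hash as "identical" is one possible policy, not something the linker mandates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;  /* contents of the <name> tag */
    const char *md5;   /* contents of the <value> tag, as hex text */
} FuncHash;

/* Returns nonzero when `table` has an entry with the same name and the same MD5 hash as `f`. */
static int has_identical_function(const FuncHash *f,
                                  const FuncHash *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(f->name, table[i].name) == 0 &&
            strcmp(f->md5, table[i].md5) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A function whose name matches but whose hash differs was built differently in the two executables, so it would not be a safe candidate for sharing.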
Enable Use of Custom Datapath Extension (CDE) Intrinsics on Cortex-M33
Support for using Custom Datapath Extension (CDE) intrinsics in source code to be compiled for the TI Arm Cortex-M33 processor has been added to version 3.0.0.STS of the tiarmclang compiler tools.
CDE Intrinsics
The CDE intrinsics are defined in the arm_cde.h header file that is included in the tiarmclang compiler tools installation.
The arm_cde.h header file must be included in any compilation unit that references a CDE intrinsic. For example,
```
#include <arm_cde.h>
void foo(void) {
...
uint32_t my_u32 = __arm_cx2a(0, 20, 30, 40); /* first argument selects coprocessor 0 */
...
}
```
The available CDE intrinsics include the following:
- uint32_t __arm_cx1(int, uint32_t);
- uint32_t __arm_cx1a(int, uint32_t, uint32_t);
- uint64_t __arm_cx1d(int, uint32_t);
- uint64_t __arm_cx1da(int, uint64_t, uint32_t);
- uint32_t __arm_cx2(int, uint32_t, uint32_t);
- uint32_t __arm_cx2a(int, uint32_t, uint32_t, uint32_t);
- uint64_t __arm_cx2d(int, uint32_t, uint32_t);
- uint64_t __arm_cx2da(int, uint64_t, uint32_t, uint32_t);
- uint32_t __arm_cx3(int, uint32_t, uint32_t, uint32_t);
- uint32_t __arm_cx3a(int, uint32_t, uint32_t, uint32_t, uint32_t);
- uint64_t __arm_cx3d(int, uint32_t, uint32_t, uint32_t);
- uint64_t __arm_cx3da(int, uint64_t, uint32_t, uint32_t, uint32_t);
- uint32_t __arm_vcx1_u32(int, uint32_t);
- uint32_t __arm_vcx1a_u32(int, uint32_t, uint32_t);
- uint64_t __arm_vcx1d_u64(int, uint32_t);
- uint64_t __arm_vcx1da_u64(int, uint64_t, uint32_t);
- uint32_t __arm_vcx2_u32(int, uint32_t, uint32_t);
- uint32_t __arm_vcx2a_u32(int, uint32_t, uint32_t, uint32_t);
- uint64_t __arm_vcx2d_u64(int, uint64_t, uint32_t);
- uint64_t __arm_vcx2da_u64(int, uint64_t, uint64_t, uint32_t);
- uint32_t __arm_vcx3_u32(int, uint32_t, uint32_t, uint32_t);
- uint32_t __arm_vcx3a_u32(int, uint32_t, uint32_t, uint32_t, uint32_t);
- uint64_t __arm_vcx3d_u64(int, uint64_t, uint64_t, uint32_t);
- uint64_t __arm_vcx3da_u64(int, uint64_t, uint64_t, uint64_t, uint32_t);
Each of the CDE intrinsics is defined in arm_cde.h as a static inline function and implemented via a compiler runtime built-in function that is defined in the relevant version of the libclang_rt.builtins.a runtime library, which is included in the tiarmclang 3.0.0.STS compiler tools installation.
Specify -march Option to Enable Use of CDE Intrinsics on Cortex-M33
To enable the use of CDE intrinsics during a compilation of a source file, one of the following -march compiler options must be specified when the compiler is invoked:
- -march=armv8.1-m.main+cdecp0
- -march=thumbv8.1-m.main+cdecp0
**NOTE** While the tiarmclang compiler will parse and encode CDE intrinsic instructions for coprocessors other than coprocessor 0 (zero), there are currently no TI Arm Cortex-M33 devices that support CDE intrinsic instructions on coprocessors other than coprocessor 0.
Cortex-M4 and Cortex-R5 Performance Improvements
The 3.0.0.STS version of the tiarmclang compiler tools is capable of generating slightly higher performance Cortex-M4 and Cortex-R5 code versus the tiarmclang 2.1.3.LTS release due to the following improvements:
- Cortex-M4 - add unroll-and-jam optimization pass
- Cortex-R5 - utilize more efficient strcmp() runtime support implementation
tiarmclang 2.1.3.LTS v. tiarmclang 3.0.0.STS Benchmark Scores
The Coremark and Dhrystone performance benchmarks were built and run with both the tiarmclang 2.1.3.LTS and tiarmclang 3.0.0.STS releases. The following tables provide a sense of the performance improvements that can be anticipated when moving an application build from the 2.1.3.LTS compiler tools to the 3.0.0.STS compiler tools.
Cortex-M4:
Benchmark | 2.1.3.LTS Score | 3.0.0.STS Score |
---|---|---|
Coremark (inlining off) | 2.42 | 2.69 |
Coremark (inlining on) | 3.12 | 3.51 |
Dhrystone (inlining off) | 1.00 | 1.13 |
Dhrystone (inlining on) | 1.25 | 1.56 |
Cortex-R5:
Benchmark | 2.1.3.LTS Score | 3.0.0.STS Score |
---|---|---|
Coremark (inlining off) | 2.87 | 2.87 |
Coremark (inlining on) | 3.58 | 3.60 |
Dhrystone (inlining off) | 1.25 | 1.55 |
Dhrystone (inlining on) | 1.61 | 2.05 |
Host Support / Dependencies
The following host-specific versions of the 3.1.0.STS tiarmclang compiler tools are available:
- Linux: Ubuntu, RHEL 7
- Windows: 7, 8, 10
- Mac: OSX
Device Support
The tiarmclang compiler tools support development of applications that are to be loaded and run on one of the following Arm Cortex processor and runtime environment configurations:
Cortex-M0:
Runtime Environment Configuration | Options |
---|---|
Cortex-M0 | “-mcpu=cortex-m0” |
exceptions on | “-mcpu=cortex-m0 -fexceptions” |
execute-only on | “-mcpu=cortex-m0 -mexecute-only” |
execute-only on, exceptions on | “-mcpu=cortex-m0 -mexecute-only -fexceptions” |
Cortex-M0+:
Runtime Environment Configuration | Options |
---|---|
Cortex-M0+ | “-mcpu=cortex-m0plus” |
exceptions on | “-mcpu=cortex-m0plus -fexceptions” |
execute-only on | “-mcpu=cortex-m0plus -mexecute-only” |
execute-only on, exceptions on | “-mcpu=cortex-m0plus -mexecute-only -fexceptions” |
Cortex-M3:
Runtime Environment Configuration | Options |
---|---|
Cortex-M3 | “-mcpu=cortex-m3” |
exceptions on | “-mcpu=cortex-m3 -fexceptions” |
Cortex-M4:
Runtime Environment Configuration | Options |
---|---|
Cortex-M4 (FPv4SPD16 on by default) | “-mcpu=cortex-m4” |
FPv4SPD16 on | “-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16” |
FPv4SPD16 on, exceptions on | “-mcpu=cortex-m4 -fexceptions” |
FPv4SPD16 on, exceptions on | “-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fexceptions” |
FPv4SPD16 off | “-mcpu=cortex-m4 -mfloat-abi=soft” |
FPv4SPD16 off, exceptions on | “-mcpu=cortex-m4 -mfloat-abi=soft -fexceptions” |
Cortex-M33:
Runtime Environment Configuration | Options |
---|---|
Cortex-M33 (FPv5SPD16 on by default) | “-mcpu=cortex-m33” |
FPv5SPD16 on | “-mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16” |
FPv5SPD16 on, exceptions on | “-mcpu=cortex-m33 -fexceptions” |
FPv5SPD16 on, exceptions on | “-mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -fexceptions” |
FPv5SPD16 off | “-mcpu=cortex-m33 -mfloat-abi=soft” |
FPv5SPD16 off, exceptions on | “-mcpu=cortex-m33 -mfloat-abi=soft -fexceptions” |
CDE CP0 on, FPv5SPD16 on | “-mcpu=cortex-m33 -march=armv8.1-m.main+cdecp0” |
CDE CP0 on, FPv5SPD16 on | “-mcpu=cortex-m33 -march=thumbv8.1-m.main+cdecp0” |
CDE CP0 on, FPv5SPD16 on | “-mcpu=cortex-m33 -march=armv8.1-m.main+cdecp0 -mfloat-abi=hard -mfpu=fpv5-sp-d16” |
CDE CP0 on, FPv5SPD16 on | “-mcpu=cortex-m33 -mfloat-abi=hard -march=thumbv8.1-m.main+cdecp0 -mfpu=fpv5-sp-d16” |
CDE CP0 on, FPv5SPD16 on, exceptions on | “-mcpu=cortex-m33 -march=armv8.1-m.main+cdecp0 -fexceptions” |
CDE CP0 on, FPv5SPD16 on, exceptions on | “-mcpu=cortex-m33 -march=thumbv8.1-m.main+cdecp0 -fexceptions” |
CDE CP0 on, FPv5SPD16 on, exceptions on | “-mcpu=cortex-m33 -march=armv8.1-m.main+cdecp0 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -fexceptions” |
CDE CP0 on, FPv5SPD16 on, exceptions on | “-mcpu=cortex-m33 -march=thumbv8.1-m.main+cdecp0 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -fexceptions” |
CDE CP0 on, FPv5SPD16 off | “-mcpu=cortex-m33 -march=armv8.1-m.main+cdecp0 -mfloat-abi=soft” |
CDE CP0 on, FPv5SPD16 off | “-mcpu=cortex-m33 -march=thumbv8.1-m.main+cdecp0 -mfloat-abi=soft” |
CDE CP0 on, FPv5SPD16 off, exceptions on | “-mcpu=cortex-m33 -march=armv8.1-m.main+cdecp0 -mfloat-abi=soft -fexceptions” |
CDE CP0 on, FPv5SPD16 off, exceptions on | “-mcpu=cortex-m33 -march=thumbv8.1-m.main+cdecp0 -mfloat-abi=soft -fexceptions” |
Please Note:
- CDE CP0 refers to support for Custom Datapath Extension intrinsics on Coprocessor 0.
- One of the following -march options must be specified in the tiarmclang command to enable support for the CDE intrinsics on Coprocessor 0:
  - -march=armv8.1-m.main+cdecp0
  - -march=thumbv8.1-m.main+cdecp0
- You should only enable CDE intrinsics on Coprocessor 0 if the target device is known to have a coprocessor unit on board.
Cortex-R4:
Runtime Environment Configuration | Options |
---|---|
Cortex-R4 (default Arm mode, VFPv3D16 off) | “-mcpu=cortex-r4” |
Arm mode, VFPv3D16 off | “-mcpu=cortex-r4 -mfloat-abi=soft” |
Arm mode, VFPv3D16 off, exceptions on | “-mcpu=cortex-r4 -fexceptions” |
Arm mode, VFPv3D16 off, exceptions on | “-mcpu=cortex-r4 -mfloat-abi=soft -fexceptions” |
Arm mode, VFPv3D16 on | “-mcpu=cortex-r4 -mfloat-abi=hard -mfpu=vfpv3-d16” |
Arm mode, VFPv3D16 on, exceptions on | “-mcpu=cortex-r4 -mfloat-abi=hard -mfpu=vfpv3-d16 -fexceptions” |
Thumb mode, VFPv3D16 off | “-mcpu=cortex-r4 -mthumb” |
Thumb mode, VFPv3D16 off | “-mcpu=cortex-r4 -mthumb -mfloat-abi=soft” |
Thumb mode, VFPv3D16 off, exceptions on | “-mcpu=cortex-r4 -mthumb -fexceptions” |
Thumb mode, VFPv3D16 off, exceptions on | “-mcpu=cortex-r4 -mthumb -mfloat-abi=soft -fexceptions” |
Thumb mode, VFPv3D16 on | “-mcpu=cortex-r4 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16” |
Thumb mode, VFPv3D16 on, exceptions on | “-mcpu=cortex-r4 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16 -fexceptions” |
Cortex-R5:
Runtime Environment Configuration | Options |
---|---|
Cortex-R5 (default Arm mode, VFPv3D16 on) | “-mcpu=cortex-r5” |
Arm mode, VFPv3D16 on | “-mcpu=cortex-r5 -mfloat-abi=hard -mfpu=vfpv3-d16” |
Arm mode, VFPv3D16 on, exceptions on | “-mcpu=cortex-r5 -fexceptions” |
Arm mode, VFPv3D16 on, exceptions on | “-mcpu=cortex-r5 -mfloat-abi=hard -mfpu=vfpv3-d16 -fexceptions” |
Arm mode, VFPv3D16 off | “-mcpu=cortex-r5 -mfloat-abi=soft” |
Arm mode, VFPv3D16 off, exceptions on | “-mcpu=cortex-r5 -mfloat-abi=soft -fexceptions” |
Thumb mode, VFPv3D16 on | “-mcpu=cortex-r5 -mthumb” |
Thumb mode, VFPv3D16 on | “-mcpu=cortex-r5 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16” |
Thumb mode, VFPv3D16 on, exceptions on | “-mcpu=cortex-r5 -mthumb -fexceptions” |
Thumb mode, VFPv3D16 on, exceptions on | “-mcpu=cortex-r5 -mthumb -mfloat-abi=hard -mfpu=vfpv3-d16 -fexceptions” |
Thumb mode, VFPv3D16 off | “-mcpu=cortex-r5 -mthumb -mfloat-abi=soft” |
Thumb mode, VFPv3D16 off, exceptions on | “-mcpu=cortex-r5 -mthumb -mfloat-abi=soft -fexceptions” |
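One way to confirm that a set of options from the tables above produced the intended configuration is to inspect the compiler's predefined macros. The sketch below relies on the ACLE/Clang predefined macros __arm__, __thumb__, __ARM_FP, __ARM_PCS_VFP, and __SOFTFP__. It is an assumption that tiarmclang, as a Clang derivative, defines them the same way upstream Clang does, and the function names are hypothetical.

```c
#include <stdio.h>

/* Instruction set selected for this translation unit. */
const char *instruction_set_name(void)
{
#if defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "Arm";
#else
    return "non-Arm host";
#endif
}

/* Floating-point calling convention in effect. */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";    /* FP arguments passed in FP registers */
#elif defined(__SOFTFP__)
    return "soft";    /* FP operations done in software */
#else
    return "unknown"; /* neither macro is defined */
#endif
}

/* Returns 1 if hardware floating-point instructions are available
 * (ACLE macro __ARM_FP is defined), otherwise 0. */
int has_hardware_fp(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the configuration seen by the compiler. */
void print_build_configuration(void)
{
    printf("ISA: %s, float ABI: %s, hardware FP: %s\n",
           instruction_set_name(),
           float_abi_name(),
           has_hardware_fp() ? "yes" : "no");
}
```

Calling print_build_configuration() early in an application (or checking the same macros with #error in a header) gives a quick sanity check that the -mcpu, -mthumb, -mfloat-abi, and -mfpu options reached every source file as expected.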
Resolved Defects
ID | Summary |
---|---|
CODEGEN-11158 | Do not emit the diagnostic: warning: call to function without interrupt attribute could clobber interruptee's VFP registers; consider using the interrupt_save_fp attribute to prevent this behavior |
CODEGEN-11092 | QKIT ToolDefinition.pdf for tiarmclang 1.3.x.LTS and 2.1.x.LTS has inaccurate Tool Version information |
CODEGEN-10988 | tiarmclang documentation fails to clarify that #pragma pack never increases the default alignment |
CODEGEN-10591 | Optimization of Logical NOT on condition yields incorrect MC/DC test vector tracking |
CODEGEN-10444 | Enabling Code Coverage and LTO results in missing profile data section |
CODEGEN-10383 | Document ‘#pragma clang section bss’ to be used for uninitialized variables |
CODEGEN-10251 | Initialization of array of structures is mistakenly filled with 0 |
CODEGEN-10229 | Crash can occur when loading symbols due to self-referencing DIE |
CODEGEN-10067 | LTO: linker should include undefined symbols that are referenced from a static function in the IR symbol table that is passed to the LTO recompile |
CODEGEN-10000 | LTO: Compiling a source file with cortex-r4/r5 with -mthumb and linking with ARM mode cortex-r4/r5 runtime libraries improperly resolves an R_ARM_CALL relocation |
CODEGEN-9997 | tiarmclang: LTO behaves differently than non-LTO with regards to how zero-initialized variables are defined |
CODEGEN-9850 | Unresolved reference to runtime library function when that function is referenced from asm statement |
CODEGEN-9838 | Update tiarmclang documentation to explain that C++ library does not support features related to threads and concurrency |
CODEGEN-9834 | Compiler ignores __attribute__((used)) |
CODEGEN-9779 | Function local static array allocated to .data section, and not .bss |
CODEGEN-9669 | TI Arm Clang mismatch between source code and debugger view with function subsections |
CODEGEN-9415 | Compiler inappropriately generates non-empty ARM.exidx sections |
CODEGEN-9092 | tiarmclang mistakenly documents support for -fpic position independent code |
CODEGEN-8914 | _enable_IRQ in ti_compatibility.h only supports Cortex-M devices |
CODEGEN-8899 | tiarmlnk generates cinit record for tiny .init_array section |
CODEGEN-8887 | Compiler does not support linking code that uses C++ exceptions |
CODEGEN-8639 | tiarmar.exe is denied permission to create an archive file on Windows 7 |
CODEGEN-8533 | Use of virtual functions causes many RTS print functions to be linked into the program |
CODEGEN-8471 | Hex utility, when splitting a section as required by the bootloader, ignores the section alignment for the second part of the split |
CODEGEN-8255 | tiarmclang: zero-initialized static and global variables are being defined in .bss |
CODEGEN-8216 | Code coverage symbols not defined when profile counter section manually placed |
CODEGEN-6288 | tiarmclang: optimizer removes empty loops that don't have side effects |
Known Defects
The up-to-date known defects in v3.1.0.STS can be found here (dynamically generated):