16. Smart Function and Data Placement

The tiarmclang compiler tools support a method for placing heavily accessed functions and data objects in faster memory using annotations, applied either at the C/C++ source level (using attributes) or at link time (using assembly directives). Heavily accessed or critical functions and data can be identified manually by the user or by profiling infrastructure built on top of this method. With this method, users do not need to know function or data subsection names in order to place them explicitly in a linker command file; those names may vary across projects and applications.

The annotations developed correspond to a memory hierarchy: LOCAL memory (Tightly-Coupled Memory/TCM), ONCHIP memory (RAM), or OFFCHIP memory (external flash). A priority can also be assigned to sort functions and data object placement within the same memory at link-time.

The memory regions in which these placements are allocated are controlled using directives in the linker command file, according to the documented output sections described below.

16.1. C/C++ Source-level annotations

The following source-level annotations can be used to explicitly place functions and data objects in the corresponding locations. If the priority is omitted, it is assumed to be ‘1’ (i.e. “high priority”):

__attribute__(({local,onchip,offchip}(priority)))

Example:

__attribute__((local(1))) void func0(void) { .. } // Place in TCM with priority 1
__attribute__((local(2))) void func1(void) { .. } // Place in TCM with priority 2
__attribute__((onchip))   void func2(void) { .. } // Place in SRAM with implied priority 1

The attributes can be added to a function definition or a function declaration (as long as that function is called/referenced in the same compilation unit).
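
As a minimal sketch of the rules above, the following annotates a function declaration (the function is defined and referenced in the same compilation unit) and two data objects. The macro names PLACE_LOCAL and PLACE_ONCHIP and all symbol names are hypothetical. The guard assumes that __ti_version__ is predefined by tiarmclang, so that other compilers see empty macros and the file still builds on a host.

```c
#include <stdint.h>

/* Hypothetical convenience macros. Assumption: __ti_version__ is
   predefined by tiarmclang; other compilers get empty macros. */
#if defined(__ti_version__)
#define PLACE_LOCAL(prio)  __attribute__((local(prio)))
#define PLACE_ONCHIP(prio) __attribute__((onchip(prio)))
#else
#define PLACE_LOCAL(prio)
#define PLACE_ONCHIP(prio)
#endif

/* The declaration carries the annotation; the definition follows below
   in the same compilation unit. */
PLACE_LOCAL(1) uint32_t checksum(const uint8_t *buf, uint32_t len);

/* Annotated data objects: one uninitialized, one initialized. */
PLACE_LOCAL(2)  uint32_t sample_count;
PLACE_ONCHIP(1) uint8_t  frame[4] = {1, 2, 3, 4};

uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    sample_count += len;   /* track how many bytes have been summed */
    return sum;
}
```

Wrapping the attributes in macros is only one possible design choice; it keeps the annotated sources portable to compilers that do not recognize these attributes.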

16.2. Assembly metainfo directives

Functions can also be annotated by adding an assembly metainfo directive, in the following format, to an assembly file that is compiled and linked with the project. This lets users annotate functions without having to recompile other source code:

.global <global function symbol>
.sym_meta_info <global function symbol>, "of_placement", {"local","onchip","offchip"}, <priority>

Example:

.global strcmp
.sym_meta_info strcmp, "of_placement", "local", 1
.global memcpy
.sym_meta_info memcpy, "of_placement", "local", 1

A simple Node.js script called generate_syms.js, included in the toolchain, generates such an assembly file from a CSV text file in the following format:

strcmp,local,1
main,onchip,3
memcpy,fast_local_copy,1
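
The mapping from one CSV record to the directive pair shown in the format above can be sketched as follows. This is not the shipped generate_syms.js script, whose actual behavior and validation are not described here; the function name csv_to_directives and the buffer sizes are hypothetical, and the placement string is passed through unchecked.

```c
#include <stdio.h>

/* Convert one "symbol,placement,priority" CSV record into the
   .global / .sym_meta_info directive pair.
   Returns 0 on success, -1 on malformed input or if 'out' is too small. */
int csv_to_directives(const char *line, char *out, size_t out_size)
{
    char sym[64];
    char place[32];
    int  prio;

    /* Expect exactly three comma-separated fields. */
    if (sscanf(line, " %63[^,],%31[^,],%d", sym, place, &prio) != 3) {
        return -1;
    }

    int n = snprintf(out, out_size,
                     ".global %s\n"
                     ".sym_meta_info %s, \"of_placement\", \"%s\", %d\n",
                     sym, sym, place, prio);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```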

16.3. Smart Placement Linker Aggregation

With the placement described above, the TI link-step will aggregate function and data input sections into documented output sections while also sorting the input sections. For Smart Placement, the input sections are sorted based on the designated priority. The documented output sections for Smart Placement are:

  • .TI.local: Code and initialized data designated for local memory (TCMs)

  • .TI.bss.local: Uninitialized data designated for local memory

  • .TI.onchip: Code and initialized data designated for onchip memory (RAM)

  • .TI.bss.onchip: Uninitialized data designated for onchip memory

  • .TI.offchip: Code designated for offchip memory (FLASH)

Note that data objects placed in .TI.local or .TI.onchip are always directly initialized according to RAM-model initialization. That is, whatever is responsible for loading the code and data into RAM or TCM will also initialize the data, even if ROM-model auto-initialization is used. Consequently, in ROM-model, CINIT records are not created for this data.

When ROM-model auto-initialization is enabled, zero-initialization CINIT records will be created for the uninitialized output sections .TI.bss.local and .TI.bss.onchip. When RAM-model initialization is used, it is up to the user to zero-initialize these sections. The linker will export symbols, which an initialization routine can link against, designating the start and end of these sections:

  • .TI.bss.local: __start___TI_bss_local and __stop___TI_bss_local

  • .TI.bss.onchip: __start___TI_bss_onchip and __stop___TI_bss_onchip

Note that because symbols are defined for these sections, they cannot be split between multiple memory regions.
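
For RAM-model initialization, a user-supplied routine that zeroes these sections could look like the sketch below. The names zero_range and smart_bss_init are hypothetical, declaring the linker-exported symbols as char arrays is an assumption about how they are best referenced from C, and the guard assumes that __ti_version__ is predefined by tiarmclang so that the helper can still be built and exercised on a host.

```c
#include <stddef.h>
#include <string.h>

/* Zero the half-open byte range [start, stop). */
void zero_range(char *start, char *stop)
{
    if (stop > start) {
        memset(start, 0, (size_t)(stop - start));
    }
}

#if defined(__ti_version__)
/* Section bounds exported by the linker. The char[] type is an assumption. */
extern char __start___TI_bss_local[],  __stop___TI_bss_local[];
extern char __start___TI_bss_onchip[], __stop___TI_bss_onchip[];

/* Hypothetical startup hook: call before any annotated
   uninitialized data object is read. */
void smart_bss_init(void)
{
    zero_range(__start___TI_bss_local,  __stop___TI_bss_local);
    zero_range(__start___TI_bss_onchip, __stop___TI_bss_onchip);
}
#endif
```

The routine has to run before any code reads the annotated uninitialized objects, for example early in a user-provided startup sequence.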

A default linker command file needs to place the documented output sections in the corresponding memory regions in both development and deployment flows. This file could be autogenerated by SysConfig based on the memory partition, or written using generic macros. For a development flow, this is straightforward, as in the following example.

/* Partitioned memory map */
MEMORY
{
    R5F_VECS : ORIGIN = 0x00000000 , LENGTH = 0x00000040
    R5F_TCMA : ORIGIN = 0x00000040 , LENGTH = 0x00007FC0
    R5F_TCMB : ORIGIN = 0x41010000 , LENGTH = 0x00008000
    MSRAM    : ORIGIN = 0x70080000 , LENGTH = 0x40000
    FLASH    : ORIGIN = 0x60100000 , LENGTH = 0x80000
}

SECTIONS
{
    /* "local"   --> split between TCMs and RAM                   */
    /* "onchip"  --> split between RAM and FLASH                  */
    /* "offchip" --> FLASH                                        */

    .TI.local   : {} >> R5F_TCMA | R5F_TCMB | MSRAM
    .TI.onchip  : {} >> MSRAM | FLASH
    .TI.offchip : {} > FLASH
    .TI.bss.local : {} > R5F_TCMB; /* Exports symbols __start___TI_bss_local, __stop___TI_bss_local */
    .TI.bss.onchip: {} > MSRAM;    /* Exports symbols __start___TI_bss_onchip, __stop___TI_bss_onchip */
}

By default, section splitting should be used as shown above between memory regions to get the full effect of function prioritization.

16.4. Enable Smart Data Collection for Smart Placement

When the --smart_data_collect linker option is enabled, the TI link-step will not only include explicitly annotated function and data objects into the appropriate documented output sections, it will also pull in referenced initialized data sections and assign them the same priority as the object that references them. For objects placed in .TI.local, referenced read-write and read-only (constant) data sections are pulled in. For objects placed in .TI.onchip, only referenced read-only (constant) data sections are pulled in. Nothing happens for objects placed in .TI.offchip.
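
To illustrate which objects this rule is describing, consider the sketch below, in which only the function is annotated. All names are hypothetical, and the guard assumes that __ti_version__ is predefined by tiarmclang. Under the behavior described above, with smart data collection enabled, both the constant table and the initialized counter would be candidates for collection alongside scale() in .TI.local. If scale() were annotated onchip instead, only the constant table would be. The actual outcome also depends on how the compiler emits these objects into sections.

```c
#include <stdint.h>

/* Hypothetical macro. Assumption: __ti_version__ is predefined by tiarmclang. */
#if defined(__ti_version__)
#define PLACE_LOCAL(prio) __attribute__((local(prio)))
#else
#define PLACE_LOCAL(prio)
#endif

/* Not annotated: read-only data referenced by scale(). */
static const uint8_t gain_table[4] = {1, 2, 4, 8};

/* Not annotated: initialized read-write data referenced by scale(). */
static uint32_t call_count = 1;

/* Only the function carries an explicit placement and priority. */
PLACE_LOCAL(1) uint32_t scale(uint32_t x, uint32_t idx)
{
    call_count++;
    return x * gain_table[idx & 3u];
}
```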