C2000 C/C++ CODE GENERATION TOOLS
Release Notes
6.2.11
January 2015

*******************************************************************************
Table of Contents
*******************************************************************************

1. Soprano Device Support Options
   1.1 TMU
   1.2 VCU Type 2
   1.3 CLA Type 1
   1.4 32-bit Address Intrinsics
2. Compiler Support for Generating DMAC Instructions
   2.1. Automatic Generation
   2.2. Assertions for Data Address Alignment
   2.3. __dmac Intrinsic
3. New C Language Features
   3.1. Hexadecimal Floating-point Numbers
   3.2. Binary Literals
   3.3. Arrays of Variable Length
4. Plink2000 Utility Integrated into Assembler
5. Map File Info on Data Placement
6. ECC Support in Linker
   6.1 Linker Command File Syntax
   6.2 VFILL Specifier
7. RPT MACF32 Generation Enabled By --fp_reassoc option

==============================================================================
1. Soprano Device Support Options
==============================================================================

The 6.2.2 compiler exposes the compiler options for the following new Soprano
device support.

==============================================================================
1.1 TMU
==============================================================================

Compiler support for TMU is enabled with compiler switch --tmu_support=tmu0.
This switch will enable FPU32 support as well. Source-level intrinsics are
available, as well as automatic RTS library call replacement in relaxed
floating point mode.

1.1.1 Source-level Intrinsics

The following intrinsics are available for TMU instructions:

   (double) __divf32(double num, double denom);
   (double) __sqrt(double src);
   (double) __sin(double src);
   (double) __cos(double src);
   (double) __atan(double src);
   (double) __atan2(double src1, double src2);
   (double) __mpy2pif32(double src);
   (double) __div2pif32(double src);
   (double) __sinpuf32(double src);
   (double) __cospuf32(double src);
   (double) __atanpuf32(double src);
   (double) __atan2puf32(double src1, double src2);
   (double) __quadf32(double ratio, double y, double x);

1.1.2 Automatic Library Call Replacement

If the --fp_mode=relaxed compiler option is used, RTS library calls will be
replaced with TMU hardware instructions for the following floating point
operations: floating point division, sqrt, sin, cos, atan, and atan2.

Note that there are algorithmic differences between the hardware instructions
and the library routines, so results may vary.

==============================================================================
1.2 VCU Type 2
==============================================================================

Assembler support is enabled for the new Type 2 VCU with compiler switch
--vcu_support=vcu2. Note that the previous VCU version is vcu0; there is no
vcu1.

==============================================================================
1.3 CLA Type 1
==============================================================================

Compiler support for the new Type 1 CLA is enabled with compiler switch
--cla_support=cla1.

==============================================================================
1.4 32-bit Address Intrinsics
==============================================================================

The following new intrinsics are available to read from and write to memory
using 32-bit addresses. These are for special data placed higher than the
usual 22-bit address range.

   (unsigned short) __addr32_read_uint16(unsigned long addr);
   (unsigned long)  __addr32_read_uint32(unsigned long addr);
   (void) __addr32_write_uint16(unsigned long addr, unsigned short val);
   (void) __addr32_write_uint32(unsigned long addr, unsigned long val);

==============================================================================
2. Compiler Support for Generating DMAC Instructions
==============================================================================

The 6.2.0 version of the C2000 Code Generation Tools introduces compiler
support for DMAC instructions. DMACs perform multiply-accumulates on 2
adjacent signed integers at the same time, optionally shifting the products.
There are 3 levels of support that require different levels of C-source
modification.

==============================================================================
2.1. Automatic Generation
==============================================================================

Automatic generation of DMAC instructions requires that the compiler both
recognize the C-language statements as a DMAC opportunity and verify that the
data addresses being operated upon are 32-bit aligned. This is the best
scenario in that it requires no code modification by the user aside from data
alignment pragmas. The following is an example:

   int src1[N], src2[N];
   #pragma DATA_ALIGN(src1,2);  // int arrays must be 32-bit aligned
   #pragma DATA_ALIGN(src2,2);

   {...}
   int i;
   long res = 0;
   for (i = 0; i < N; i++)            // N must be a known even constant
      res += (long)src1[i] * src2[i]; // Arrays must be accessed via array
                                      // indices

At optimization levels >= -O2, the compiler will generate a RPT || DMAC for
the example code above when N is a known even constant.

Additionally, the product can be shifted left by 1 or right by 1 - 6.
For example:

   for (i = 0; i < N; i++)
      res += (long)src1[i] * src2[i] >> 1;  // product shifted right by 1

==============================================================================
2.2. Assertions for Data Address Alignment
==============================================================================

In some cases the compiler may be able to recognize a DMAC opportunity in the
C-language statements but not be able to verify the necessary alignment of
the data addresses passed to the computation. In this case, assertions placed
in the code can enable the compiler to generate DMAC instructions. The
following is an example:

   int *src1, *src2;  // src1 and src2 are pointers to int arrays of at
                      // least size N. User must ensure that both are
                      // 32-bit aligned addresses.
   {...}
   int i;
   long res = 0;
   _nassert((long)src1 % 2 == 0);
   _nassert((long)src2 % 2 == 0);
   for (i = 0; i < N; i++)            // N must be a known even constant
      res += (long)src1[i] * src2[i]; // src1 and src2 must be accessed via
                                      // array indices

At optimization levels >= -O2, the compiler will generate a RPT || DMAC for
the example code above when N is a known even constant.

The _nassert intrinsics are used to assert that the data addresses
represented by the src1 and src2 pointers are 32-bit aligned. The user is
responsible for ensuring that only 32-bit aligned data addresses are passed
via these pointers. If the asserted conditions are not true, the code will
result in a runtime failure.

Additionally, the product can be shifted left by 1 or right by 1 - 6.
For example:

   for (i = 0; i < N; i++)
      res += (long)src1[i] * src2[i] >> 1;  // product shifted right by 1

==============================================================================
2.3. __dmac Intrinsic
==============================================================================

Alternately, users can use the new __dmac intrinsic. In this case, the user
is fully responsible for ensuring that the data addresses are properly
aligned.

   void __dmac(long *src1, long *src2, long &accum1, long &accum2, int shift);

-- Src1 and src2 must be 32-bit aligned addresses pointing to int arrays.
-- Accum1 and accum2 are pass-by-reference longs for the two temporary
   accumulations. These must be added together after the intrinsic to
   compute the total sum.
-- Shift: The products can optionally be shifted prior to each accumulation.
   Valid shift values are -1 for a left shift by 1, 0 for no shift, and 1-6
   for right shift by 1-6 respectively. Note that this argument is required,
   so 0 must be passed if no shift is desired.

Example 1:

   int src1[2*N], src2[2*N]; // src1 and src2 are int arrays of at least
                             // size 2N. User must ensure that both start
                             // on 32-bit aligned boundaries.
   {...}
   int i;
   long res = 0;
   long temp = 0;
   for (i=0; i < N; i++)     // N does not have to be a known constant
      __dmac(((long *)src1)[i], ((long *)src2)[i], res, temp, 0);
   res += temp;

Example 2:

   int *src1, *src2;  // src1 and src2 are pointers to int arrays of at
                      // least size 2N. User must ensure that both are
                      // 32-bit aligned addresses.
   {...}
   int i;
   long res = 0;
   long temp = 0;
   long *ls1 = (long *)src1;
   long *ls2 = (long *)src2;
   for (i=0; i < N; i++)     // N does not have to be a known constant
      __dmac(*ls1++, *ls2++, res, temp, 0);
   res += temp;

In these examples, res holds the final sum of a multiply-accumulate on int
arrays of length 2*N, with 2 computations being performed at a time.

Additionally, an optimization level >= -O2 must be used to generate efficient
code. Moreover, if there is nothing else in the loop body, as in these
examples, the compiler will generate a RPT || DMAC, further improving
performance.

==============================================================================
3. New C Language Features
==============================================================================

Beginning with release v6.2.0, the C2000 C/C++ compiler supports the
following C language features.

==============================================================================
3.1. Hexadecimal Floating-point Numbers
==============================================================================

ISO C99 added support for floating-point numbers written in a hexadecimal
format. These numbers take the form of a floating point value followed by a
power of 2 to multiply by. For example,

   1.55e1  = 15.5
   0x1.fp3 = (1 15/16) * 8 = 15.5

In the second example, the 0x1.f represents the base value, and the p3 means
"multiply by 2 to the power of 3". Refer to section 6.4.4.2 in the C99
Standard for more information on this feature.

Hexadecimal floating-point numbers are enabled as an extension in all modes,
but standard functions that work on floating point numbers, such as atof and
strtod, as well as the %a and %A formatting specifiers for the printf family
of functions, do not support hexadecimal floating-point numbers.

==============================================================================
3.2. Binary Literals
==============================================================================

As an extension to C89 and C99, integer literals can be represented as a
binary number prefixed by '0b', much like the hexadecimal '0x'. Binary
literals follow all the same rules as other constants and can be suffixed by
modifiers such as 'L'. The following 4 constants are equivalent:

   - 42
   - 0x2a
   - 052
   - 0b101010

This feature is enabled as an extension when the --gcc option is used.

==============================================================================
3.3. Arrays of Variable Length
==============================================================================

ISO C99 added support for arrays whose lengths are not constant. Storage for
these arrays is allocated at the point of declaration and is automatically
deallocated when the array goes out of scope. As such, these arrays cannot be
used in the file scope.

For example, the array 'str' in the following function has a length that is
determined at runtime.
Its storage will be allocated during runtime and be deallocated automatically
when it goes out of scope.

   FILE *concat_fopen(char *s1, char *s2, char *mode)
   {
      char str[strlen (s1) + strlen (s2) + 1];
      strcpy (str, s1);
      strcat (str, s2);
      return fopen (str, mode);
   }

Exiting a scope containing a variable length array will deallocate it, but
jumping into a scope that contains a variable length array from a scope which
does not contain that array is an error. The following is an example of code
which will result in this error.

   void count_by_ten(unsigned int val)
   {
      if (val % 10 == 0 && val != 0)
      {
         int cnt_arr[val/10];
         int i;

         print_it:
         for (i = 0; i < val/10; i++)
            cnt_arr[i] = 10*(i+1);
         for (i = 0; i < val/10; i++)
            printf("%u\n",cnt_arr[i]);
      }
      else
      {
         val = 10;
         goto print_it;
      }
   }

These arrays can also be used as arguments to functions. The lengths of the
array must be declared before the variable length array itself. Consider the
arguments in foo below. The second argument is a variable length array using
the argument 'len' as its dimensions.

   void foo(int len, char data[len][len])
   {
      /* ... */
   }

Refer to section 6.7.5.2 in the C99 spec for more information on variable
length arrays. This feature is currently only enabled when using the --gcc
option.

==============================================================================
4. Plink2000 Utility Integrated into Assembler
==============================================================================

The plink2000 utility has been integrated into the C2000 assembler and is no
longer shipped as a separate executable. A description of the post-link
optimization pass and how it is invoked can be found in the compiler user's
guide.

Additionally, an advice-only option is available for annotating assembly with
comments when changes cannot be made safely due to pipeline considerations,
such as when float support or vcu support is enabled.
The advice-only option is:

   --plink_advice_only

==============================================================================
5. Map File Info on Data Placement
==============================================================================

When generating a map file, the C2000 linker will now, by default, output
user symbols sorted by data page. This is in addition to the normal symbol
output tables (sorted by address and by name).

As with other map file contents, the output can be controlled by the
--mapfile_contents option (invoke the linker with --mapfile_contents=help to
see the options available). The data page output can be enabled or suppressed
using --mapfile_contents=sym_dp or --mapfile_contents=nosym_dp.

==============================================================================
6. ECC Support in Linker
==============================================================================

Beginning with this release, the C2000 compiler supports automatic generation
of Error Correction Codes compatible with the Flash ECC on various TI
microcontrollers. ECC can be generated during the final link step of
compilation. The ECC data is included in the resulting object file, alongside
code and data, as a data section located at the appropriate address.
Therefore, no extra ECC generation step is required after compilation, and
the ECC can be uploaded to the device along with everything else.

The linker is configured to generate ECC using a new syntax in the linker
command file. The command file will specify a separate memory range for the
ECC inside the device's memory map, and it will indicate which other memory
range corresponds to the Flash data memory covered by the ECC. It will also
specify the parameters by which the ECC will be calculated, which can vary
somewhat between devices.

-------------------------------------------------------------------------------
6.1 Linker Command File Syntax
-------------------------------------------------------------------------------

The memory map for a device supporting Flash ECC may look something like
this:

   MEMORY
   {
      PAGE 0 :
         PROG(R)   : origin = 0x3e0000, length = 0x20000
         ECC       : origin= 0x400000, length = 0x08000 ECC={ input_range=PROG }

      PAGE 1 :
         RAM1 (RW) : origin = 0x000402 , length = 0x003FE
         RAM2 (RW) : origin = 0x001000 , length = 0x16000
         RAM3 (RW) : origin = 0x3d0000 , length = 0x10000
   }

The "ECC" specifier attached to the ECC memory ranges indicates the data
memory range that the ECC range covers. The ECC specifier supports the
following parameters:

   input_range=
      The data memory range covered by the ECC range; required.
   input_page =
      The page number of the input range; required only if the input
      range's name is ambiguous.
   algorithm  =
      The name of an ECC algorithm defined later in the command file;
      optional if only one algorithm is defined. (see below)
   fill       =
      Whether to generate ECC data for holes in the initialized data of
      the input range; default value is "true". Using fill=false will
      produce behavior similar to the nowECC tool.

In addition to specifying the ECC memory ranges in the memory map, the
command file must specify parameters for generating the ECC data. This is
done with a new top level directive in the command file:

   ECC
   {
      algo_name : address_mask = 0x003ffff8
                  hamming_mask = FMC
                  parity_mask  = 0xfc
                  mirroring    = F021
   }

This ECC algorithms directive accepts any number of definitions, which
consist of an identifier, a colon, and some number of algorithm attributes.
The following attributes are supported:

   address_mask=<32-bit mask>
      This mask determines what address bits are used in the calculation
      of ECC.
   hamming_mask=
      This determines which data bits each ECC bit encodes the parity for.
   parity_mask =<8-bit mask>
      This determines which ECC bits encode even parity, and which bits
      encode odd parity.
   mirroring   =
      This determines what pattern of duplication of parity information is
      used in the ECC memory for redundancy.

Each TI device supporting Flash ECC will have exactly one valid set of values
for these parameters. The memory maps and ECC algorithm definitions for Flash
ECC devices should be included in subsequent releases of Code Composer
Studio. Users will not generally need to modify the preset values.

-------------------------------------------------------------------------------
6.2 VFILL Specifier
-------------------------------------------------------------------------------

An extension has been made to the fill value specifier for memory ranges in
the linker command file. Normally, specifying a fill value for a memory range
will create initialized data sections to cover any previously uninitialized
areas of memory. To generate ECC for an entire memory range, the linker
either needs to have initialized data for the entire range, or needs to know
what value uninitialized memory areas will have at run time.

To accommodate the case where the user wants to generate ECC for the entire
memory range, but does not want to initialize the entire range by specifying
a fill value, the new "vfill" specifier may be used instead of a "fill"
specifier:

   MEMORY
   {
      FLASH : origin=0x0000 length=0x4000 vfill=0xffff
   }

The vfill specifier is functionally equivalent to omitting a fill specifier,
except that it allows ECC generation to occur for areas of the input memory
range that remain uninitialized. This has the benefit of reducing the size of
the resulting object file.

The "vfill" specifier has no effect other than in ECC generation. It cannot
be specified along with a "fill" specifier, since that would introduce
ambiguity.

==============================================================================
7. RPT MACF32 Generation Enabled By --fp_reassoc
==============================================================================

Floating point operations are not naturally associative like their integral
counterparts. The RPT MACF32 instruction accumulates two partial sums that
are added together afterwards to compute the whole sum. Because this changes
the order of additions, the result can vary in precision from a serial
floating point multiply accumulate.

The compiler flag --fp_reassoc can be used to indicate whether reassociation
of floating point operations is acceptable, trading precision for speed. When
--fp_reassoc=on (the default value), the compiler will generate RPT MACF32
instructions when possible. If --fp_reassoc=off, only the non-repeat versions
of MACF32 instructions will be generated.

In prior versions of the compiler, RPT MACF32 was generated by default and
was not controlled by the --fp_reassoc option. It can also be disabled, along
with all other RPT instructions, using the --no_rpt flag.
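
The effect of reassociation described above can be reproduced in portable C.
The sketch below is illustrative only: the function names are hypothetical,
and splitting the work into even-index and odd-index partial sums is an
assumption made for the illustration, not a statement of how the RPT MACF32
hardware assigns products to its two accumulators. It models the change in
the order of additions, nothing more.

```c
#include <assert.h>
#include <stddef.h>

/* Serial multiply-accumulate: products are added in index order.
   The accumulator is volatile so that every step is rounded to float,
   even on hosts that evaluate expressions in a wider format. */
float mac_serial(const float *a, const float *b, size_t n)
{
    volatile float sum = 0.0f;
    size_t i;
    for (i = 0; i < n; i++)
        sum = sum + a[i] * b[i];
    return sum;
}

/* Reassociated form: two partial sums that are added together afterwards.
   ASSUMPTION: even/odd interleaving is chosen only for illustration. */
float mac_two_partial_sums(const float *a, const float *b, size_t n)
{
    volatile float even = 0.0f;
    volatile float odd = 0.0f;
    size_t i;
    assert(n % 2 == 0);   /* this sketch handles even lengths only */
    for (i = 0; i < n; i += 2) {
        even = even + a[i] * b[i];
        odd = odd + a[i + 1] * b[i + 1];
    }
    return even + odd;
}
```

With a = {1e8f, 1.0f, -1e8f, 1.0f} and b = {1.0f, 1.0f, 1.0f, 1.0f} on an
IEEE 754 single-precision target, mac_serial returns 1.0 because the first
1.0 is lost when it is added to 1e8, while mac_two_partial_sums returns 2.0
because the two large terms cancel inside one partial sum. Code that depends
on the serial result is the kind of code for which --fp_reassoc=off is
intended.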