Readme for C7000 Code Generation Tools v5.0.0

0 Introduction to the C7000 Code Generation Tools v5.0.x LTS
1 Documentation
2 TI E2E Community - Where to get help
3 Defect Tracking Database
4 LUT interface change
5 Restrict advice
6 Predicate-generating comparison intrinsics
7 Automatic use of streaming engine and streaming address generator
8 Removed MMA features
9 Vector typename conflicts
10 Moving data between vector predicate to/from general purpose registers
11 New performance optimization that may affect code size
12 Permute behavior
13 Resolved defects
14 Known defects

0 Introduction to the C7000 Code Generation Tools v5.0.x LTS

This C7000 compiler release is a “Long-Term Support” (LTS) release.

This release supports the C7100, C7120, C7504, and C7524 ISA cores. To compile code for the C7100 core, use the compiler command-line option -mv7100 or equivalently, --silicon_version=7100. To compile code for the C7120 core, use the compiler command-line option -mv7120 or equivalently, --silicon_version=7120. To compile code for the C7504 core, use the compiler command-line option -mv7504 or equivalently, --silicon_version=7504. To compile code for the C7524 core, use the compiler command-line option -mv7524 or equivalently, --silicon_version=7524.

For definitions and explanations of STS, LTS, and the version numbering scheme, please see SDTO Compiler Version Numbers.

1 Documentation

The following documents provide information on how to use, program, and migrate to the C7000 CPU.

C7000 C/C++ Optimizing Compiler User’s Guide (SPRUIG8***.PDF)

C7000 Optimization Guide

C6000-to-C7000 Migration User’s Guide (SPRUIG5***.PDF)

C7000 Host Emulation User’s Guide (SPRUIG6***.PDF)

2 TI E2E Community - Where to get help

Post compiler-related questions to the TI E2E design community forum and select the TI device being used.

The E2E Design Support Forum Website

If submitting a defect report, please attach a scaled-down test case with command-line options and the compiler version number to allow us to reproduce the issue easily.

The following is the top-level webpage for all of TI’s Code Generation Tools.

Code Generation Tools Landing Page

3 Defect Tracking Database

Compiler defect reports can be tracked at the Development Tools bug database, SIR. SIR is a JIRA-based view into all public tools defects.

SIR Development Tools Defect Tracking Website

A my.ti.com account is required to access this page. To find an issue in SIR, log in and enter your defect ID in the search box at the top right. Alternatively, from the top red navigation bar, select “Issues”, then “Search for Issues”.

4 LUT interface change

As of version 4.0.0, the macros __LUT_SET_LTER, __LUT_SET_LTBR, and __LUT_SET_LTCR defined in c7x_luthist.h have been changed so that the definitions do not end with a semicolon. This is in accordance with the best practice for function-like macros: source code which invokes them should treat them just like function calls, in particular by following the macro invocation with a semicolon.

Before this change, if the source code invoked the macro as part of a containing statement such as an if/else statement, it was not allowed to use semicolons, leading to confusing code:

if (test) __LUT_SET_LTER(a)
else __LUT_SET_LTER(b)

After this change, the code must use semicolons as if the macro were a normal function call:

if (test) __LUT_SET_LTER(a);
else __LUT_SET_LTER(b);
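
The same effect can be reproduced with any function-like macro. The sketch below is only an illustration: SET_REG_OLD, SET_REG_NEW, and fake_reg are hypothetical stand-ins, not the definitions from c7x_luthist.h, which are not reproduced in this readme.

```c
/* Hypothetical stand-ins; NOT the definitions from c7x_luthist.h. */
static int fake_reg;

/* Old style: the definition carries its own trailing semicolon. */
#define SET_REG_OLD(v) fake_reg = (v);

/* New style: no trailing semicolon; the caller supplies it. */
#define SET_REG_NEW(v) (fake_reg = (v))

int select_old(int test, int a, int b)
{
    /* Writing ';' after SET_REG_OLD(a) would leave an empty statement
       in front of 'else', which does not compile. */
    if (test) SET_REG_OLD(a)
    else SET_REG_OLD(b)
    return fake_reg;
}

int select_new(int test, int a, int b)
{
    /* Reads and behaves like an ordinary function call. */
    if (test) SET_REG_NEW(a);
    else SET_REG_NEW(b);
    return fake_reg;
}
```

Both functions compute the same result; only the new style lets the call site follow normal statement syntax.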

5 Restrict advice

In version 4.0.0, an advice-severity diagnostic message was added that identifies opportunities for qualifying function parameters with restrict when doing so is likely to improve loop performance. See Section 4.16 of the C7000 C/C++ Optimizing Compiler User’s Guide.

The diagnostic can be disabled with --diag_suppress=35000, which is also supported in #pragma FUNCTION_OPTIONS.
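
As a minimal sketch of the kind of change the advice points toward, the function below qualifies its pointer parameters with restrict. The function name and signature are hypothetical and chosen only for illustration; whether the diagnostic would fire on the unqualified version depends on the compiler's analysis.

```c
#include <stddef.h>

/* restrict promises the compiler that out and in never refer to
   overlapping memory, so loads and stores in the loop can be
   reordered and vectorized without alias checks. */
void scale_by_two(int *restrict out, const int *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * 2;
}
```

The promise is the caller's responsibility: passing overlapping buffers to a function declared this way is undefined behavior in C.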

6 Predicate-generating comparison intrinsics

In version 4.0.0, the __cmp_{ge,gt,le,lt}_{pred,bool} intrinsics are now overloaded to support integer and floating point arguments. Previously, the greater-than versions only supported integer arguments and the less-than versions only supported floating point arguments.

7 Automatic use of streaming engine and streaming address generator

7.1 Overview

Version 4.0.0 of the compiler adds support for automatic use of the streaming engines (SE) and the streaming address generators (SA). This behavior can be controlled with the --auto_stream option:

--auto_stream=off Disables automatic use of the SE and SA.
--auto_stream=saving Enables automatic use of the SE and SA with context saving. This option should be used if an SE or SA may be open when a function call is made. This option is safe, but may be slightly slower than --auto_stream=no_saving and may increase stack usage.
--auto_stream=no_saving Enables automatic use of the SE and SA without context saving. This option should be used if an SE or SA will never be open when a function call is made. This option is less safe than --auto_stream=saving but may be slightly faster and may reduce stack usage.

C7100 and C7120 do not support SE or SA context switching, so on those cores this optimization must be enabled manually with --auto_stream=no_saving. For later parts, such as C7504, --auto_stream=saving is enabled by default.

--auto_stream will convert memory accesses in loop nests with addressing patterns that are guaranteed to fit into an SE or SA configuration template. For example:

void example1(char *in, char *restrict out, int len1, int len2)
{
    for (int i = 0; i < len1; i++)
        for (int j = 0; j < len2; j++)
            out[i*len1 + j] = in[i*len1 + j];
}

will be transformed to be equivalent to the following SE configuration on C7504 after being vectorized:

__SE_TEMPLATE_v1 tmplt = __gen_SE_TEMPLATE_v1();
tmplt.ICNT0 = 32;
tmplt.ICNT1 = (len2>>5)+((len2&0x1f) != 0);
tmplt.DIM1 = 32;
tmplt.ICNT2 = len1;
tmplt.DIM2 = len1;
tmplt.VECLEN = __SE_VECLEN_32ELEMS;
tmplt.DIMFMT = __SE_DIMFMT_3D;
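
The ICNT1 expression above is a round-up division of len2 by the 32-element vector length. The helper below is a hypothetical host-side sketch (not part of the compiler headers) that isolates that arithmetic; it uses unsigned where the example uses int.

```c
/* Hypothetical helper mirroring the ICNT1 expression above: the
   number of 32-element vectors needed to cover len2 elements.
   (len2 >> 5) counts full vectors; the second term adds one more
   when a partial vector remains. */
unsigned icnt1_for(unsigned len2)
{
    return (len2 >> 5) + ((len2 & 0x1f) != 0);
}
```

For instance, 32 elements need one vector, while 33 elements need two.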

7.2 Legality and correctness

The following will not be transformed, because len1 and len2 may not fit in the 32-bit fields of the SE and the loop counters may exceed 32-bit values:

void example2(char *in, char *restrict out, long len1, long len2)
{
    for (long i = 0; i < len1; i++)
        for (long j = 0; j < len2; j++)
            out[i*len1 + j] = in[i*len1 + j];
}

In situations such as the one above, addressing patterns almost always map to a stream in practice, although edge cases are possible. Such cases include, but are not limited to:

ICNT values exceeding the range of unsigned 32 bit.
DIM values exceeding the range of signed 32 bit.
Additions or multiplies in addressing exceeding the range of signed 32 bit.
Addressing exceeding the range of INT_MIN to INT_MAX elements.

The --assume_addresses_ok_for_stream option is available to allow the compiler to ignore edge cases such as those above. Using this option will allow example2 to be transformed in the same way as example1.
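
Before relying on this option, it can help to confirm that the values a loop nest will actually see stay inside the ranges listed above. The function below is a hypothetical host-side check written for this purpose; it is not part of the compiler or its headers, and it covers only a two-level nest indexed as i*stride + j.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check, not a compiler API. Returns true when a nest
   indexed as i*stride + j, with 0 <= i < len1 and 0 <= j < len2,
   stays inside the ranges listed above. */
bool nest_fits_stream(int64_t len1, int64_t len2, int64_t stride)
{
    /* ICNT values must fit in unsigned 32 bit. */
    if (len1 < 0 || len1 > (int64_t)UINT32_MAX) return false;
    if (len2 < 0 || len2 > (int64_t)UINT32_MAX) return false;

    /* DIM values must fit in signed 32 bit. */
    if (stride < INT32_MIN || stride > INT32_MAX) return false;

    /* An empty nest touches no elements. */
    if (len1 == 0 || len2 == 0) return true;

    /* Bound the inner offset first so the sums below cannot
       overflow int64_t. */
    if (len2 - 1 > INT32_MAX) return false;

    /* Smallest and largest element offsets reached by the nest. */
    int64_t row = (len1 - 1) * stride;
    int64_t lo = row < 0 ? row : 0;
    int64_t hi = (row > 0 ? row : 0) + (len2 - 1);
    return lo >= INT32_MIN && hi <= INT32_MAX;
}
```

A nest with len1 = len2 = stride = 1000 passes, while a trip count of 5000000000 fails the ICNT check.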

If the --auto_stream=no_saving option is used and an SE or SA is open when a function call is made, incorrect code may be generated. In this case, the state of the open SE or SA will be lost if that SE or SA is used automatically by the compiler.

--auto_stream may generate incorrect code if L1D is configured and used as SRAM. In this case, attempting to use the SE to access L1D will fail.

7.3 Profitability and tuning

Automatic use of the SE and SA occurs only if the compiler judges the transformation of a loop or loop nest to be profitable, which depends primarily on loop iteration counts. Using #pragma PROB_ITERATE and #pragma MUST_ITERATE therefore helps guide this transformation.

Additionally, the compiler will not use an SE or SA if an SE or SA is already used in a function.

#pragma FUNCTION_OPTIONS may be used to control the behavior of automatic SE and SA on a function-by-function basis. For example, #pragma FUNCTION_OPTIONS("--auto_stream=no_saving --assume_addresses_ok_for_stream") could be used to enable automatic SE and SA for a single function on C7100.
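
A minimal sketch of supplying iteration-count information follows. The function and the trip-count bounds are hypothetical, and the pragmas are guarded by __C7000__ (assumed here to be predefined by the C7000 compiler) so that the sketch also builds with other compilers. Whether the compiler then chooses to use an SE or SA still depends on its own profitability analysis.

```c
/* Hypothetical function; the bounds below are illustrative only.
   The caller must guarantee them, or the generated code may be
   incorrect on the target. */
void copy_rows(const char *in, char *restrict out, int rows, int cols)
{
#ifdef __C7000__
#pragma MUST_ITERATE(1, 64, 1)      /* rows: between 1 and 64 */
#endif
    for (int i = 0; i < rows; i++) {
#ifdef __C7000__
#pragma MUST_ITERATE(64, 1024, 64)  /* cols: 64..1024, multiple of 64 */
#endif
        for (int j = 0; j < cols; j++)
            out[i * cols + j] = in[i * cols + j];
    }
}
```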

8 Removed MMA features

In version 4.0.0, the enum values __MMA_OPEN_FSM_MINRESET and __MMA_OPEN_FSM_MAXRESET, used as the third argument to the __HWAOPEN intrinsic, have been removed from the c7x_mma.h header.

9 Vector typename conflicts

Version 5.0.0.LTS adds 1024 bit vector types: char128, short64, int32, and their unsigned variants, along with float32, cchar64, cshort32 and cint16. Most of these will probably not lead to typename conflicts aside from int32. Some projects may fail to compile due to typename conflicts between the new int32 built-in type and a possibly project-defined int32. The recommended solution to such a problem is to add the --vectypes=off command-line flag to cl7x. This will cause the built-in vector types to be accessible only by __char128 or __int32, rather than char128 or int32.

Additionally, 5.0.0.LTS adds the equivalent of --vectypes=off to host emulation. When TI_VECTYPES_OFF is defined, such as with g++ -DTI_VECTYPES_OFF, vector types will only be accessible by __int32 or similar, rather than int32.
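
For illustration, the sketch below shows the kind of project code that can run into this conflict. The typedef and function are hypothetical examples of a project-defined scalar int32; on a host compiler they build as-is, while on the target the typedef would clash with the built-in int32 vector type unless --vectypes=off is used.

```c
#include <stdint.h>

/* Hypothetical project header content. With the 5.0.0 built-in
   vector type names enabled, this typedef would conflict with the
   built-in int32 vector type. */
typedef int32_t int32;

/* Hypothetical use of the project's scalar int32. */
int32 add_int32(int32 a, int32 b)
{
    return a + b;
}
```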

10 Moving data between vector predicate to/from general purpose registers

The original intrinsics for moving data between vector predicate registers and general purpose registers were vpred_t _mvrp(long) and long _mvpb(vpred). These intrinsics (and their variants) are now deprecated.

As of version 5.0.0, new APIs will use an unsigned char vector type rather than a long. This vector type must have the same number of bits as the vector predicate register for the target in question. That is, uchar4 for 7504; uchar8 for 7100, 7120; etc.

#if __C7X_VEC_SIZE_BITS__ == 256
__uchar4  __OVBIF __vpred_to_uchar_vec(__vpred);
#elif __C7X_VEC_SIZE_BITS__ == 512
__uchar8  __OVBIF __vpred_to_uchar_vec(__vpred);
#endif

11 New performance optimization that may affect code size

The 5.0.0 version of the C7000 compiler adds an optimization that improves performance in some cases, but may increase code size.

This optimization, called “run-time alias disambiguation,” attempts to significantly improve the performance of certain loops when the restrict keyword is not used. However, it may result in duplicated loops, which will increase code size.

Users who observe undesired code size growth may opt to use -mf3 or lower on the affected files. Run-time alias disambiguation is not performed by the compiler at --opt_for_speed=3 or lower. Users may also want to investigate using the restrict keyword on the pointers used in the affected loops, which removes the need for the compiler to perform run-time alias disambiguation. Refer to the C7000 Optimization Guide for more information on using the --opt_for_speed/-mf option to control the tradeoff between code size and performance on C7000.

C7000 Optimization Guide
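
The loop duplication described in this section can be pictured with the conceptual C model below. It is only a sketch of the general idea of checking for overlap at run time and choosing between two copies of a loop; the function is hypothetical, and the code the compiler actually generates is not shown in this readme and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Conceptual model only; not the compiler's actual output. */
void add_one(int *out, const int *in, size_t n)
{
    uintptr_t o = (uintptr_t)out;
    uintptr_t i = (uintptr_t)in;
    size_t bytes = n * sizeof(int);

    if (o + bytes <= i || i + bytes <= o) {
        /* The buffers do not overlap: this copy of the loop can be
           optimized as if the pointers were restrict-qualified. */
        int *restrict ro = out;
        const int *restrict ri = in;
        for (size_t k = 0; k < n; k++)
            ro[k] = ri[k] + 1;
    } else {
        /* The buffers may overlap: keep the original ordering. */
        for (size_t k = 0; k < n; k++)
            out[k] = in[k] + 1;
    }
}
```

The second copy of the loop is the source of the code size growth; qualifying the parameters with restrict in the source makes the run-time check, and therefore the duplicate, unnecessary.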

12 Permute behavior

As of version 5.0.0, permute intrinsics in c7x.h, such as __permute() or __permute_even_even_int(), only specify and guarantee behavior for input index values that are in range for the input data vector. For example, an index of 63 is specified on C7100, but an index of 64 is not. If the out-of-range behavior of an underlying instruction, such as VPERM or VPERMEEW, is needed, then the corresponding direct intrinsic from c7x_direct.h should be used, for example __vperm_yvv() or __vpermeew_yvvv().
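
Code that computes permute indices at run time may want to validate them before calling a c7x.h permute intrinsic. The scalar model below is a hypothetical host-side sketch, not a definition from c7x.h; it assumes a byte-wise permute of the form out[i] = data[index[i]] and rejects any index that is out of range for the data vector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a byte permute; an assumption made
   for illustration, not the intrinsic itself. Returns false without
   writing to out if any index is out of range for an n-element data
   vector, since the result is not specified in that case. */
bool permute_model(uint8_t *out, const uint8_t *index,
                   const uint8_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (index[i] >= n)
            return false;

    for (size_t i = 0; i < n; i++)
        out[i] = data[index[i]];
    return true;
}
```

With n = 64, this matches the C7100 example above: index 63 is accepted and index 64 is rejected.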

13 Resolved defects

Resolved defects in v5.0.0:

ID	Summary
CODEGEN-12663	Compiler may generate incorrect code for ternary max idiom with integers
CODEGEN-12527	Compiler may optimize incorrectly when assigning boolean vector result of a vector compare to a non-variable left hand side (e.g. a dereferenced pointer, or struct field)
CODEGEN-12473	Unaligned vector accessor may lead to abnormal optimizer termination
CODEGEN-10115	Restrict parameter to inlined function causes software pipelined loop to have a loop carried dependency

14 Known defects

The up-to-date known defects in v5.0.0 can be found here (dynamically generated):

Known defects in v5.0.0

End Of File
