Readme for C7000 Code Generation Tools v5.0.0
Table of Contents
- 0 Introduction to the C7000 Code Generation Tools v5.0.x LTS
- 1 Documentation
- 2 TI E2E Community - Where to get help
- 3 Defect Tracking Database
- 4 LUT interface change
- 5 Restrict advice
- 6 Predicate-generating comparison intrinsics
- 7 Automatic use of streaming engine and streaming address generator
- 8 Removed MMA features
- 9 Vector typename conflicts
- 10 Moving data between vector predicate to/from general purpose registers
- 11 New performance optimization that may affect code size
- 12 Permute behavior
- 13 Resolved defects
- 14 Known defects
0 Introduction to the C7000 Code Generation Tools v5.0.x LTS
This C7000 compiler release is a “Long-Term Support” (LTS) release.
This release supports the C7100, C7120, C7504, and C7524 ISA cores. To compile code for the C7100 core, use the compiler command-line option -mv7100 or, equivalently, --silicon_version=7100. To compile code for the C7120 core, use -mv7120 or --silicon_version=7120. To compile code for the C7504 core, use -mv7504 or --silicon_version=7504. To compile code for the C7524 core, use -mv7524 or --silicon_version=7524.
For definitions and explanations of STS, LTS, and the version numbering scheme, see SDTO Compiler Version Numbers.
1 Documentation
The following documents provide information on how to use, program, and migrate to the C7000 CPU.
C7000 C/C++ Optimizing Compiler Users Guide (SPRUIG8***.PDF)
C6000-to-C7000 Migration User’s Guide (SPRUIG5***.PDF)
C7000 Host Emulation User’s Guide (SPRUIG6***.PDF)
2 TI E2E Community - Where to get help
Post compiler-related questions to the TI E2E design community forum and select the TI device being used.
If submitting a defect report, please attach a scaled-down test case with command-line options and the compiler version number to allow us to reproduce the issue easily.
The following is the top-level webpage for all of TI’s Code Generation Tools.
3 Defect Tracking Database
Compiler defect reports can be tracked at the Development Tools bug database, SIR. SIR is a JIRA-based view into all public tools defects.
A my.ti.com account is required to access this page. To find an issue in SIR, log in and enter your defect ID in the search box at the top right. Alternatively, from the top red navigation bar, select "Issues" and then "Search for Issues".
4 LUT interface change
As of version 4.0.0, the macros __LUT_SET_LTER, __LUT_SET_LTBR, and __LUT_SET_LTCR defined in c7x_luthist.h have been changed so that the definitions do not end with a semicolon. This is in accordance with the best practice for function-like macros: source code which invokes them should treat them just like function calls, in particular by following the macro invocation with a semicolon.
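Before this change, each macro expansion already ended in a semicolon. An invocation inside a containing statement such as an if/else therefore could not be followed by another semicolon, because the extra empty statement would separate the if from its else. This led to confusing code: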
if (test) __LUT_SET_LTER(a)
else __LUT_SET_LTER(b)
After this change, the code must use semicolons as if the macro were a normal function call:
if (test) __LUT_SET_LTER(a);
else __LUT_SET_LTER(b);
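The same convention can be shown in ordinary C. The following minimal sketch uses a hypothetical SET_LTER macro and lter_value variable (not the c7x_luthist.h definitions). The macro is defined without a trailing semicolon, so callers supply the semicolon and the if/else form above parses like two function calls.

```c
#include <assert.h>

/* Hypothetical stand-in for a LUT configuration register (illustration only). */
static int lter_value;

/* Function-like macro whose definition does NOT end with a semicolon,
   following the same convention as the 4.0.0 __LUT_SET_* macros. */
#define SET_LTER(v) (lter_value = (v))

/* The caller writes the semicolon, so if/else reads like ordinary calls. */
static void configure(int test, int a, int b)
{
    if (test) SET_LTER(a);
    else SET_LTER(b);
}
```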
5 Restrict advice
In version 4.0.0, an advice-severity diagnostic message was added that identifies opportunities for qualifying function parameters with restrict when doing so is likely to improve loop performance. See Section 4.16 of the C7000 C/C++ Optimizing Compiler Users Guide.
The diagnostic can be disabled with --diag_suppress=35000, which is also supported in #pragma FUNCTION_OPTIONS.
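For reference, restrict-qualified parameters look like the following minimal sketch. The function vec_add is hypothetical. Whether diagnostic 35000 fires for a particular loop depends on the compiler's own analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict, the compiler must assume dst may overlap a or b, which can
   limit loop optimization. restrict promises that, within this function, the
   objects accessed through these pointers do not overlap. */
void vec_add(int *restrict dst, const int *restrict a,
             const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```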
6 Predicate-generating comparison intrinsics
In version 4.0.0, the __cmp_{ge,gt,le,lt}_{pred,bool} intrinsics are overloaded to support both integer and floating-point arguments. Previously, the greater-than versions supported only integer arguments, and the less-than versions supported only floating-point arguments.
7 Automatic use of streaming engine and streaming address generator
7.1 Overview
Version 4.0.0 of the compiler adds support for automatic use of the streaming engines (SE) and the streaming address generators (SA). This behavior is controlled with the --auto_stream option:
--auto_stream=off
    Disables automatic use of the SE and SA.
--auto_stream=saving
    Enables automatic use of the SE and SA with context saving. Use this option if an SE or SA may be open when a function call is made. This option is safe, but it may be slightly slower than --auto_stream=no_saving and may increase stack usage.
--auto_stream=no_saving
    Enables automatic use of the SE and SA without context saving. Use this option only if an SE or SA will never be open when a function call is made. This option is less safe than --auto_stream=saving, but it may be slightly faster and may reduce stack usage.
For C7100 and C7120, this optimization must be enabled manually with --auto_stream=no_saving, because C7100 and C7120 do not support SE or SA context switching. For later parts, such as C7504, --auto_stream=saving is enabled by default.
--auto_stream converts memory accesses in loop nests whose addressing patterns are guaranteed to fit into an SE or SA configuration template. For example:
void example1(char *in, char *restrict out, int len1, int len2)
{
    for (int i = 0; i < len1; i++)
        for (int j = 0; j < len2; j++)
            out[i*len1 + j] = in[i*len1 + j];
}
After vectorization on C7504, this loop nest is transformed into the equivalent of the following SE configuration:
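__SE_TEMPLATE_v1 tmplt = __gen_SE_TEMPLATE_v1();  // template with default fields
tmplt.ICNT0 = 32;                                  // 32 chars per 256-bit vector
tmplt.ICNT1 = (len2>>5)+((len2&0x1f) != 0);        // ceil(len2/32) vectors per row
tmplt.DIM1 = 32;                                   // next vector is 32 elements further
tmplt.ICNT2 = len1;                                // one row per outer-loop iteration
tmplt.DIM2 = len1;                                 // row stride, from i*len1
tmplt.VECLEN = __SE_VECLEN_32ELEMS;                // 32 elements per vector fetch
tmplt.DIMFMT = __SE_DIMFMT_3D;                     // three-dimensional pattern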
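The ICNT1 expression above rounds len2 up to a whole number of 32-element vectors. The following host-side sketch reproduces that arithmetic in plain C. The se_fields struct and example1_fields function are hypothetical and only mirror the field values. They are not the __SE_TEMPLATE_v1 API. DIM values are in elements, which equal bytes here because the data is char.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the template fields used for example1 on C7504. */
typedef struct {
    uint32_t icnt0, icnt1, icnt2;
    int32_t  dim1, dim2;
} se_fields;

static se_fields example1_fields(int len1, int len2)
{
    se_fields f;
    f.icnt0 = 32;                                  /* 32 chars per 256-bit vector */
    f.icnt1 = (len2 >> 5) + ((len2 & 0x1f) != 0);  /* ceil(len2 / 32) vectors per row */
    f.dim1  = 32;                                  /* next vector starts 32 elements later */
    f.icnt2 = len1;                                /* one row per outer iteration */
    f.dim2  = len1;                                /* row stride matches i*len1 */
    return f;
}
```

With len2 = 70, a row needs two full vectors plus one partial vector, so ICNT1 is 3. With len2 = 64, it is exactly 2.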
7.2 Legality and correctness
The following will not be transformed, because len1 and len2 might not fit in the 32-bit fields of the SE template and the loop counters might exceed 32-bit values:
void example2(char *in, char *restrict out, long len1, long len2)
{
    for (long i = 0; i < len1; i++)
        for (long j = 0; j < len2; j++)
            out[i*len1 + j] = in[i*len1 + j];
}
In practice, addressing patterns like the one above almost always map to a stream, but edge cases are possible. These include, but are not limited to:
- ICNT values exceeding the range of unsigned 32-bit.
- DIM values exceeding the range of signed 32-bit.
- Additions or multiplications in addressing exceeding the range of signed 32-bit.
- Addressing exceeding the range of INT_MIN to INT_MAX elements.
The --assume_addresses_ok_for_stream option allows the compiler to ignore edge cases such as those above. Using this option allows example2 to be transformed in the same way as example1.
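The edge cases listed above can be made concrete with a simplified run-time check for example2's addressing. The function example2_fits_stream is hypothetical. It is not the compiler's legality test, and it treats the inner count as len2 rather than the vectorized ICNT1. It shows the kind of condition that --assume_addresses_ok_for_stream lets the compiler assume, because len1 and len2 are not known at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified check that example2's loop nest stays within the
   32-bit limits listed above. Not the compiler's actual analysis. */
static bool example2_fits_stream(int64_t len1, int64_t len2)
{
    if (len1 <= 0 || len2 <= 0)
        return true;                        /* loop body never executes */
    if (len2 > (int64_t)UINT32_MAX)
        return false;                       /* inner count exceeds unsigned 32-bit */
    if (len1 > INT32_MAX)
        return false;                       /* ICNT2/DIM2 = len1 exceeds signed 32-bit */
    /* Both factors are now below 2^32, so this cannot overflow int64_t. */
    int64_t max_index = (len1 - 1) * len1 + (len2 - 1);
    return max_index <= INT32_MAX;          /* largest i*len1 + j fits in INT_MAX elements */
}
```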
If --auto_stream=no_saving is used and an SE or SA is open when a function call is made, incorrect code may be generated. In this case, the state of the open SE or SA is lost if the compiler also uses that SE or SA automatically.
--auto_stream may generate incorrect code if L1D is configured and used as SRAM, because attempts to use the SE to access L1D fail in that configuration.
7.3 Profitability and tuning
The SE and SA are used automatically only if the compiler estimates that transforming a loop or loop nest is profitable, which depends primarily on loop iteration counts. For this reason, #pragma PROB_ITERATE and #pragma MUST_ITERATE can help guide this transformation.
Additionally, the compiler will not use an SE or SA automatically if an SE or SA is already used in the function.
#pragma FUNCTION_OPTIONS may be used to control automatic use of the SE and SA on a function-by-function basis. For example, #pragma FUNCTION_OPTIONS("--auto_stream=no_saving --assume_addresses_ok_for_stream") could be used to enable automatic SE and SA use for a single function on C7100.
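The following is a minimal sketch that combines both mechanisms, under stated assumptions. It uses the C-language form of FUNCTION_OPTIONS, which in the TI compilers names the function as its first argument. The single-argument form shown above applies to the function that follows it, as in C++. Confirm both forms against the Users Guide for your release. The function name copy_rows is hypothetical. The MUST_ITERATE values (32 to 4096 iterations, a multiple of 32) are hypothetical promises that every caller must honor. Compilers that do not recognize these pragmas ignore them.

```c
#include <assert.h>

void copy_rows(char *in, char *restrict out, int len1, int len2);

/* Enable automatic SE/SA use for this one function (C-language pragma form). */
#pragma FUNCTION_OPTIONS(copy_rows, "--auto_stream=no_saving --assume_addresses_ok_for_stream")

void copy_rows(char *in, char *restrict out, int len1, int len2)
{
    for (int i = 0; i < len1; i++) {
        /* Hypothetical trip-count promise: 32..4096 iterations, multiple of 32. */
        #pragma MUST_ITERATE(32, 4096, 32)
        for (int j = 0; j < len2; j++)
            out[i*len1 + j] = in[i*len1 + j];
    }
}
```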
8 Removed MMA features
In version 4.0.0, the enum values __MMA_OPEN_FSM_MINRESET and __MMA_OPEN_FSM_MAXRESET, used as the third argument to the __HWAOPEN intrinsic, have been removed from the c7x_mma.h header.
9 Vector typename conflicts
Version 5.0.0.LTS adds 1024-bit vector types: char128, short64, int32, and their unsigned variants, along with float32, cchar64, cshort32, and cint16. Of these, only int32 is likely to cause typename conflicts. Some projects may fail to compile due to a conflict between the new built-in int32 type and a project-defined int32. The recommended solution is to add the --vectypes=off command-line flag to cl7x. With this flag, the built-in vector types are accessible only by their double-underscore names, such as __char128 or __int32, rather than char128 or int32.
Additionally, 5.0.0.LTS adds the equivalent of --vectypes=off to host emulation. When TI_VECTYPES_OFF is defined, for example with g++ -DTI_VECTYPES_OFF, vector types are accessible only by __int32 or similar names, rather than int32.
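A typical project-side definition that triggers the conflict is a scalar typedef named int32, as in this hypothetical header fragment. Under cl7x with built-in vector typenames enabled, the typedef collides with the 1024-bit int32 vector type. With --vectypes=off, the vector type is spelled __int32 and this code compiles unchanged. The helper add_scalars is also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical project typedef: a 32-bit scalar, not a vector. */
typedef int32_t int32;

static int32 add_scalars(int32 a, int32 b)
{
    return a + b;
}
```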
10 Moving data between vector predicate to/from general purpose registers
The original intrinsics for moving data between vector predicate registers and general purpose registers were vpred_t _mvrp(long) and long _mvpb(vpred). These intrinsics (and their variants) are now deprecated.
As of version 5.0.0, new APIs use an unsigned char vector type rather than a long. This vector type must have the same number of bits as the vector predicate register of the target in question: uchar4 for C7504; uchar8 for C7100 and C7120; and so on.
#if __C7X_VEC_SIZE_BITS__ == 256
__uchar4 __OVBIF __vpred_to_uchar_vec(__vpred);
#elif __C7X_VEC_SIZE_BITS__ == 512
__uchar8 __OVBIF __vpred_to_uchar_vec(__vpred);
#endif
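The uchar4/uchar8 sizes follow from the predicate width. This sketch assumes one predicate bit per byte of a vector, which is consistent with the mapping above, and eight predicate bits per unsigned char. The helper pred_uchar_count is hypothetical and only performs that arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: number of unsigned chars needed to hold the predicate
   for a target whose vectors are vec_size_bits wide. */
static int pred_uchar_count(int vec_size_bits)
{
    int pred_bits = vec_size_bits / 8;  /* one predicate bit per vector byte */
    return pred_bits / 8;               /* eight predicate bits per uchar */
}
```

For 256-bit vectors (C7504) this gives 4, matching __uchar4. For 512-bit vectors (C7100, C7120) it gives 8, matching __uchar8.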
11 New performance optimization that may affect code size
The 5.0.0 version of the C7000 compiler adds an optimization that improves performance in some cases, but may increase code size.
This optimization, called “run-time alias disambiguation,” attempts to significantly improve the performance of certain loops when the restrict keyword is not used. However, it may result in duplicated loops, which will increase code size.
Users who observe undesired code size growth may use -mf3 or lower on the affected files, because the compiler does not perform run-time alias disambiguation at --opt_for_speed=3 or lower. Users may also want to apply the restrict keyword to the pointers used in the affected loops, which removes the need for run-time alias disambiguation. Refer to the C7000 Optimization Guide for more information on using the --opt_for_speed (-mf) option to control the code-size and performance tradeoff on C7000.
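Conceptually, run-time alias disambiguation resembles the hand-written loop versioning below. An overlap test chooses between an optimizable copy of the loop and the original loop, and that duplicated loop is where the extra code size comes from. This is an illustrative sketch with hypothetical names (scale, ranges_overlap), not the compiler's actual output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap test on the n-element ranges starting at p and q. */
static int ranges_overlap(const int *p, const int *q, size_t n)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    uintptr_t bytes = n * sizeof(int);
    return a < b + bytes && b < a + bytes;
}

void scale(int *dst, const int *src, size_t n, int k)
{
    if (!ranges_overlap(dst, src, n)) {
        /* Fast version: no overlap, so it can be optimized as if restrict. */
        int *restrict d = dst;
        const int *restrict s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i] * k;
    } else {
        /* Original version: dst and src may alias, so order is preserved. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }
}
```

Declaring dst and src restrict in the source removes the need for the overlap test and the second copy of the loop.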
12 Permute behavior
As of version 5.0.0, permute intrinsics in c7x.h, such as __permute() or __permute_even_even_int(), specify and guarantee behavior only for input index values that are in range for the input data vector. For example, an index of 63 is specified on C7100, but an index of 64 is not. If the out-of-range behavior of an underlying instruction such as VPERM or VPERMEEW is needed, use the corresponding direct intrinsic from c7x_direct.h, such as __vperm_yvv() or __vpermeew_yvvv().
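The in-range rule can be illustrated with a hypothetical reference model of a single-source byte permute on a 64-byte (512-bit, C7100) vector. In this model, output byte i takes input byte idx[i], and the model asserts that each index is in 0..63, mirroring the c7x.h guarantee. permute_ref and VEC_BYTES are illustrative names. The model is not the definition of __permute(), whose exact semantics are given in c7x.h.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_BYTES 64  /* hypothetical: 512-bit vector on C7100 */

/* Hypothetical model: out[i] = in[idx[i]] for in-range indices only. */
static void permute_ref(const uint8_t in[VEC_BYTES],
                        const uint8_t idx[VEC_BYTES],
                        uint8_t out[VEC_BYTES])
{
    for (int i = 0; i < VEC_BYTES; i++) {
        assert(idx[i] < VEC_BYTES);  /* out-of-range index: not specified by c7x.h */
        out[i] = in[idx[i]];
    }
}
```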
13 Resolved defects
Resolved defects in v5.0.0:
ID | Summary |
---|---|
CODEGEN-12663 | Compiler may generate incorrect code for ternary max idiom with integers |
CODEGEN-12527 | Compiler may optimize incorrectly when assigning boolean vector result of a vector compare to a non-variable left hand side (e.g. a dereferenced pointer, or struct field) |
CODEGEN-12473 | Unaligned vector accessor may lead to abnormal optimizer termination |
CODEGEN-10115 | Restrict parameter to inlined function causes software pipelined loop to have a loop carried dependency |
14 Known defects
The up-to-date known defects in v5.0.0 can be found here (dynamically generated):