0 Introduction to the C7000 Code Generation Tools v1.3 STS Pre-Release
This release is an STS (Short-Term Support) pre-release. As such, it is not feature complete. Aspects of the support, including names of visible C7x intrinsics, the programming model, and the calling convention and EABI, may change over the course of the releases as feature support improves.
Definitions
Active releases have bug fixes applied proactively, and patch releases occur on a semi-regular schedule (every 2-3 months).
Reactive releases only have a subset of bug fixes applied and patch releases occur only when requested or are deemed necessary.
Patch releases only contain bug fixes (no new features)
Short term support (STS) release: All STS branches will be made reactive upon creation. A patch release for this branch will only be created for production stop issues and will only contain fixes for the production stop issues. For all other issues, users are advised to wait for the next STS branch, which may also contain new features. An STS release will occur approximately every 3 months after the first LTS release.
Long term support (LTS) release: The LTS branch will be active upon creation and will remain active for at least 2 years. Production stop bugs will be fixed within 15 days of being reported. Planned patch releases are expected every 2-3 months so that critical bugs are corrected within 60 days of being reported. The LTS release is intended for customers to lock down on tools. We will have no more than one LTS per year.
We normally maintain two active branches at any given time. When a new LTS branch is made, the second-to-last LTS release will be made reactive.
1 Documentation
The following documents are included to provide information on how to use, program, and migrate to the C7000 CPU.
If submitting a defect report, please attach a scaled-down test case with command-line options and the compiler version number to allow us to reproduce the issue easily.
3 Defect Tracking Database
Compiler defect reports can be tracked at the new Development Tools bug database, SIR. SIR is a JIRA-based view into all public tools defects. The old SDOWP tracking database will be retired.
A my.ti.com account is required to access this page. To find an issue in SIR, enter your defect id in the top right search box once logged in. Alternatively from the top red navigation bar, select “Issues” then “Search for Issues”.
To find an old SDOWP issue, place the SDOWP ID in the search box and use double quotes around the SDOWP ID.
4 SE/SA/MMA Interface Changes
This version of the compiler has a different interface for the streaming engine (SE), the streaming address generator (SA), and, to a lesser extent, the MMA. These changes affect how to set up and initialize the SE and SA, the enums used in initializing configuration registers, and some intrinsics.
Existing code that uses the SE, SA, or MMA will need to be changed to use the new interfaces with this version of the compiler. See the c7x_strm.h, c7x_mma.h, c7x_he_strm.h, and c7x_he_mma.h files in the include directory for the current interface.
Here is a summary of some of the interface changes:
The __HWAOP and __HWAOPXFER intrinsics require an enum argument instead of a simple integer constant.
The __HWA_CONFIG_REG name has changed to __HWA_CONFIG_REG_v1 to allow for additional MMA configuration templates in future devices.
Some fields in the __HWA_CONFIG_REG_v1 struct now require enums instead of constants.
It is recommended to use the __gen_HWA_CONFIG_REG_v1 function to set default values for any new instance of the __HWA_CONFIG_REG_v1 struct. There is a similar function, __gen_HWA_OFFSET_REG for initializing instances of the __HWA_OFFSET_REG struct.
SE and SA are now configured using structure types. It is recommended to use __gen_SE_TEMPLATE_v1 and similar to initialize them.
As a temporary migration measure, the option --deprecated_api is available to expose the old API and hide the new API. This option will be removed in a future release.
5 Data Movement Intrinsic Updates
This release implements new data movement intrinsics that break compatibility with release 1.2.0 and previous versions. A data movement intrinsic is defined as an intrinsic that moves data without regard for data format, such as a shuffle or deal operation. These intrinsics have been updated to accept only element types that precisely fit the data movement operation. For example, a shuffle operation that operates on 32-bit boundaries no longer accepts char vectors. In addition, any element type that fits is now available, including complex and floating-point types. For reference, the following intrinsics were affected:
__deal_stride2
__deal_stride4
__duplicate
__pack_consec_high_char
__pack_consec_high_int
__pack_consec_high_long
__pack_consec_high_short
__pack_consec_low_char
__pack_consec_low_int
__pack_consec_low_long
__pack_consec_low_short
__pack_even_cross_short
__pack_even_short
__pack_high_char
__pack_high_low_short
__pack_high_short
__pack_low_char
__pack_low_high_char
__pack_low_high_short
__pack_low_int
__pack_low_short
__reverse
__shuffle_stride2
__shuffle_stride2_even_even
__shuffle_stride2_even_odd
__shuffle_stride2_high_high
__shuffle_stride2_low_high
__shuffle_stride2_low_low
__shuffle_stride2_odd_odd
__shuffle_stride4
__swap
This set of intrinsics has been replaced with the following set:
__deal_stride2
__deal_stride4
__duplicate
__duplicate16
__duplicate2
__duplicate32
__duplicate4
__duplicate8
__pack
__pack_consec_high
__pack_consec_low
__pack_even
__pack_even_cross
__pack_high
__pack_high_low
__pack_low
__pack_low_high
__reverse
__shuffle_stride2
__shuffle_stride2_even_even
__shuffle_stride2_even_odd
__shuffle_stride2_high_high
__shuffle_stride2_low_high
__shuffle_stride2_low_low
__shuffle_stride2_odd_odd
__shuffle_stride4
__swap
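To illustrate why a data movement operation depends only on element width and not on data format, here is a minimal host-side sketch in standard C. It is not the C7x intrinsic: `model_deal_stride2` is a hypothetical name, and the element ordering it uses (even-indexed elements first, then odd-indexed) is an assumption chosen for illustration. The actual semantics of `__deal_stride2` should be taken from c7x.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side model, NOT a C7x intrinsic.
 * De-interleaves n elements of elem_size bytes each: even-indexed
 * elements go to the first half of dst, odd-indexed elements to the
 * second half. n must be even; dst and src must not overlap.
 * The bytes of each element are copied untouched, so the same routine
 * serves int, float, or any other type of the same width. */
static void model_deal_stride2(void *dst, const void *src,
                               size_t n, size_t elem_size)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    size_t half = n / 2;

    for (size_t i = 0; i < half; i++) {
        memcpy(d + i * elem_size,
               s + (2 * i) * elem_size, elem_size);
        memcpy(d + (half + i) * elem_size,
               s + (2 * i + 1) * elem_size, elem_size);
    }
}
```

Calling the model with `sizeof(int)` or `sizeof(float)` as `elem_size` moves both element types in exactly the same way, which mirrors the idea that any element type of the matching width is acceptable.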
6 Supported and Unsupported Features
New Features
Host Emulation Additions
The following updates have NOT yet been added to the C7000 Host Emulation Users Guide (SPRUIG6C***.PDF)
Nested subvector accesses (via .lo, .hi, .even, .odd) are limited to a depth of 2. This means that you cannot nest subvector accesses more than 2 deep; “vect.lo.lo” is the deepest form allowed. This limit was necessary to bound the amount of memory required to support a vector type in C++. As a workaround, if you need to access a subvector deeper than 2, use a temporary vector: “uchar8 tmp = vect.lo.lo; dst = tmp.lo”
Complex vectors require a bit more memory because they have more accessors at each nested depth. “cchar32.lo.r, cchar32.r.lo.lo”, etc
Please see the following table to gauge how much memory (in bytes) is consumed by each vector type.
All C7x loads now have 6-cycle latency in the compiler; this is particularly important in unprotected mode (pipelined loops) where UNPROT is issued with argument “1”.
Syntax change on Streaming Address Access
From now on, a direct access of the SA simply returns a pointer: “uint16 *ptr = __SA0(uint16, base);” To load from the pointer, it is now necessary to dereference the SA pointer just as you would any other pointer type: “uint16 vector = *__SA0(uint16, base);”
It is no longer necessary to use the address-of operator (&) to extract a pointer to SA data.
Added preliminary automatic vector-predication support into optimization to eliminate loop peeling
Early support for user-level boolean vector types
Note: This support is NOT complete and is not presently optimizable. Continue to use the “__vpred” type; intrinsics are documented in c7x_vpred.h
Ensure that speculated scalar loads use the correct mnemonic (e.g. SLDB, SLDH, SLDW, SLDD)
Fix copy-table support
Renamed __convert_booln() intrinsics to __reduce_booln() and __expand_booln(). See c7x.h for details.
Added constant-range checking on all intrinsics that rely on constants
This means that constants supported as arguments on intrinsics must fall within the expected constant range, otherwise a compiler error will be generated.
Added constant folding of vector types
This support also provides minimal support for constexpr vectors when folding occurs.
Reworked directory and file structure for host emulation.
The following header renames have occurred to more closely resemble the toolchain’s header names:
Include directories are now split based on the C7x subtarget. Use the following instead of “include”:
include/C7100
include/C7120
Inclusion of c7x.h now sets subtarget macros, such as C7100 appropriately.
Implementation details for host emulation are now hidden in the ti_he_impl subdirectory to avoid naming conflicts with user applications.
Only c6x_he_migration.h and c7x.h may be included by user applications. This matches the behavior for the toolchain headers.
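The constant-range checking mentioned in the feature list above can be pictured with a small standard C11 sketch. This is only an analogy for how an out-of-range constant can be rejected at compile time. Both macro names are hypothetical, and nothing here reflects how the C7000 compiler implements its own check.

```c
#include <stdint.h>

/* Hypothetical helper, not part of c7x.h: evaluates to `value` when it
 * is an integer constant expression in [lo, hi]; otherwise the
 * _Static_assert fires and compilation fails. */
#define MODEL_CONST_IN_RANGE(value, lo, hi)                      \
    ((void)sizeof(struct {                                       \
         _Static_assert((value) >= (lo) && (value) <= (hi),      \
                        "constant argument out of range");       \
         int unused;                                             \
     }),                                                         \
     (value))

/* Hypothetical "intrinsic" whose shift amount must be a constant
 * in the range 0..31. */
#define model_shift_left_k(x, k) \
    ((uint32_t)(x) << MODEL_CONST_IN_RANGE(k, 0, 31))
```

With these definitions, `model_shift_left_k(1u, 4)` compiles and yields 16, while `model_shift_left_k(1u, 32)` is rejected by the compiler with the "constant argument out of range" message.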
Supported Intrinsics
The included top-level header files c7x.h and c6x_migration.h list the supported intrinsics for both C7x and C6x, respectively. Note that you must include these header files with your source in order to leverage many of the C7x intrinsics and all of the legacy C6x intrinsics. c7x.h includes other useful header files that document/describe supported intrinsics:
c7x_vpred.h: List of intrinsics supporting the low-level “__vpred” vector predicate type.
c7x_direct.h: List of intrinsics that map directly to instructions.
c7x_strm.h: List of intrinsics and flags for C7x Streaming Engine and Stream Address Generator.
c7x_mma.h: List of intrinsics and associated structures and enumerations for the C7x MMA.
c7x_luthist.h: List of intrinsics and flags for C7x Lookup Table / Histogram support.
Unsupported Features
Other performance and codesize optimizations
Automatic leveraging of Streaming Engine and Stream Address Generator
OpenCL-C & LLVM
7 Support for Vector Data Types
The C7000 v1.0.0 STS C/C++ compiler supports the use of OpenCL-like vector data types in C/C++ source files by default.
7.1 Basic Usage
Support for vector data types is available on the C7100 architecture.
Support for vector data types is enabled by default. When the “--vectypes=off” option is specified on the compiler command line, vector types will be interpreted as identifiers rather than types.
Support for vector data types requires the use of the optimizer. That is, the “--vectypes” option must be specified in combination with “-o0”, “-o1”, “-o2”, or “-o3” on the compiler command line.
All of the vector data types and related built-in functions that are supported in the C7x programming model are specified in the “c7x.h” header file that you can find in the “include” sub-directory where your C7000 CGT was installed.
Any C/C++ source file that utilizes vector data types or any of the built-in functions must “#include <c7x.h>” in that source file.
7.2 Vector Data Types and Operations
Vector Data Types
A vector type name is the concatenation of an element type name and a number giving the vector length. A vector of such a type consists of that many elements.
The C7x programming model implementation of vector data types and operations follows the OpenCL C language specification very closely. For a more detailed description of OpenCL vector data types and operations, please see “The OpenCL Specification” version 1.2 which is available from the Khronos OpenCL Working Group:
http://www.khronos.org/opencl/
Chapter 6, section 6.1.2 of “The OpenCL Specification” version 1.2 provides a detailed description of the built-in vector data types supported in the OpenCL C programming language.
The C7x programming model provides the following built-in vector data types:
For example, a “uchar8” is a vector of 8 unsigned chars. Its length is 8 and its size is 64 bits.
The C7x programming model also provides an extension to the OpenCL C programming language for representing vectors of complex type. A prefix of ‘c’ is used to indicate a complex type name. Each complex type vector element contains a real part and an imaginary part with the real part occupying the lower address in memory.
Complex element type names and sizes:
cchar complex char type, 16 bits
cshort complex short type, 32 bits
cint complex int type, 64 bits
clonglong complex long long type, 128 bits
cfloat complex float type, 64 bits
cdouble complex double type, 128 bits
Valid lengths for complex type vectors: 1, 2, 4, 8
For example, a “cfloat2” is a vector of 2 complex floats. Its length is 2 and its size is 128 bits. Each “cfloat2” vector element contains a real float and an imaginary float.
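The layout described above can be modeled on a host with plain C structs. This is a sketch under stated assumptions: `model_cfloat` and `model_cfloat2` are hypothetical stand-ins for the C7x types, and the host is assumed to have 8-bit bytes, a 32-bit `float`, and no struct padding between the members.

```c
#include <stddef.h>

/* Hypothetical host-side stand-ins; these are not the C7x types. */
typedef struct {
    float r;   /* real part, at the lower address */
    float i;   /* imaginary part */
} model_cfloat;

typedef struct {
    model_cfloat s[2];   /* two complex elements */
} model_cfloat2;

/* Compile-time checks of the sizes and ordering quoted in the text. */
_Static_assert(offsetof(model_cfloat, r) < offsetof(model_cfloat, i),
               "real part must occupy the lower address");
_Static_assert(sizeof(model_cfloat) * 8 == 64,
               "one complex float element is 64 bits");
_Static_assert(sizeof(model_cfloat2) * 8 == 128,
               "a 2-element complex float vector is 128 bits");
```

If any of the assumptions does not hold on a given host, the static assertions fail at compile time rather than silently producing a different layout.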
Vector Operations: component access
A component access can occur on the left-hand-side (lhs) or right-hand-side (rhs) of an assignment operator. If specified on the lhs of an assignment, each component must be uniquely identifiable.
The C7x programming model implementation supports OpenCL C like swizzle operators:
A suffix of “.x”, “.y”, “.z”, or “.w” can be used to access an element of a vector whose length is <= 4.
A suffix of “.hi” or “.lo” can be used to access the elements in the upper half of a vector (for “.hi”) or the elements in the lower half of a vector (for “.lo”).
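As a rough host-side picture of what “.lo” and “.hi” select, the following standard C sketch treats a vector as a plain array. The function names are hypothetical and this is not how the compiler or the host emulation package implements the accessors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of ".lo": copy elements 0 .. n/2-1 of an
 * n-element unsigned char vector into dst. */
static void model_lo(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n / 2);
}

/* Hypothetical model of ".hi": copy elements n/2 .. n-1 into dst. */
static void model_hi(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src + n / 2, n / 2);
}
```

Applying `model_hi` to an 8-element array and then `model_lo` to the 4-element result corresponds to a nested access of the form “vect.hi.lo”, with each step halving the number of elements.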
Scalar entities or shorter vectors can be concatenated together to form longer vectors. When all of the components involved are constants, the result is a vector literal. Otherwise, the vector’s value is determined at run-time.
Vector literals
(short4)(1, 2, 3, 4);
(float2)(3.2, -2.3);
Vector concatenation
void foo(int a, int b)
{
int2 myvec = (int2)(a, b);
...
}
Vector Operations: conversion and re-interpretation
The C7x programming model includes functions that can convert or re-interpret the elements of one vector type as another vector type.
convert_() can be used to perform an element by element conversion of one vector type object into another vector type object. The source vector type and the destination vector type must be the same length.
void foo(int a, int b)
{
/* initialize a short2 vector from a converted int2 vector */
short2 svec2 = convert_short2((int2)(a, b));
...
}
as_() can be used to re-interpret the original vector type of an object as another vector type. The source type and destination type must be the same size.
Neither the convert_() nor the as_() is available for use with complex types.
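The difference between converting a value and re-interpreting its bits can be shown for a single element in standard C. This is a host-side sketch only: the function names are hypothetical, and the bit pattern mentioned below assumes an IEEE 754 single-precision `float`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a conversion: the numeric value is preserved
 * (1.0f becomes the integer 1), so the bits generally change. */
static uint32_t model_convert_uint(float f)
{
    return (uint32_t)f;
}

/* Hypothetical model of a re-interpretation: the bits are preserved
 * and only the type changes, which is why both types must be the
 * same size. */
static uint32_t model_as_uint(float f)
{
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "re-interpretation requires equal sizes");
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For the input `1.0f`, the conversion model returns 1, while the re-interpretation model returns the encoding `0x3F800000` on an IEEE 754 host.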
Vector Operations: infix operators
When infix operators are applied to vector type objects, the operator is applied element by element. That is, each element in the result vector is the result of applying the infix operator to the corresponding elements in the source vector(s).
Unary:
Negate: -
Bitwise complement: ~
Logical not (integer vectors only): !
int4 pos_i4 = (int4)(1, 2, 3, 4);
int4 neg_i4 = -pos_i4; /* Use of negate operator initializes
* neg_i4 to (-1, -2, -3, -4)
*/
/* On C7100, the compiler will generate a special instruction to */
/* carry out the complex multiply operation and call a built-in (RTS) */
/* function to carry out the divide operation. */
void foo()
{
cfloat2 va = (cfloat2) (1.0, -2.0, 3.0, -4.0);
cfloat2 vb = (cfloat2) (4.0, -2.0, -4.0, 2.0);
/* For details about the rules for complex multiplication and */
/* division, please see Annex G of the C99 C language */
/* specification. */
/* vc = < (0.0, -10.0), (-4.0, 22.0)> */
cfloat2 vc = va * vb;
/* vd = < (0.4, -0.3), (-1.0, 0.5)> */
cfloat2 vd = va / vb;
...
}
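The values quoted in the comments above can be cross-checked on a host using the C99 complex types. This is a sketch in standard C, not C7x code: the function names are hypothetical, and each two-element array stands in for a cfloat2 vector.

```c
#include <complex.h>

/* Hypothetical host-side model: element-wise complex multiply of two
 * 2-element vectors using C99 complex arithmetic. */
static void model_cmul2(float complex out[2],
                        const float complex a[2],
                        const float complex b[2])
{
    for (int k = 0; k < 2; k++)
        out[k] = a[k] * b[k];
}

/* Hypothetical host-side model: element-wise complex divide. */
static void model_cdiv2(float complex out[2],
                        const float complex a[2],
                        const float complex b[2])
{
    for (int k = 0; k < 2; k++)
        out[k] = a[k] / b[k];
}
```

With va = {1 - 2i, 3 - 4i} and vb = {4 - 2i, -4 + 2i}, the products are 0 - 10i and -4 + 22i, and the quotients are 0.4 - 0.3i and -1 + 0.5i, up to floating-point rounding in the division.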
Vector Operations: built-in functions
Prototypes for all of the vector built-in functions supported in the C7x programming model are listed in the “c7x.h” header file that you can find in the “include” sub-directory where your C7000 CGT package was installed. Please refer to the contents of “c7x.h” for a complete list of the vector built-in functions.
Here is an example which uses vector built-in functions:
8 Removal of MISRA 2004 compiler command-line options
The C7000 C/C++ Compiler does not support MISRA 2004 checking as some other Texas Instruments compilers do. Therefore, the command-line options for MISRA 2004 checking have been removed and are no longer accepted by the compiler.
9 Silicon errata i2117 workaround support
The compiler option “--silicon_errata_i2117” has been added to generate code that automatically works around silicon errata i2117. MMA performance may be negatively impacted by the use of this option in edge cases.