4.12. Vector Programming and the Scalable Vector Programming Model

Although the compiler has a powerful auto-vectorization capability, it cannot auto-vectorize every loop. The compiler may also vectorize some operations in a loop but not others, leading to an inefficient loop. In these instances, it may be best to vectorize the loop by hand using vector types and intrinsics.

If the compiler does not auto-vectorize the code, the programmer may need to vectorize the loop by hand. In this case, the programmer may be able to use vector types, vector predicates, vector operations, and vector intrinsics to dramatically speed up an algorithm.

The C/C++ compiler supports the use of TI vector data types in C/C++ source files. Vector types, vector operations and vector intrinsics allow the programmer to explicitly utilize the vector/SIMD features of the C7000 ISA. By doing so, the programmer can dramatically increase the speed of an algorithm.

4.12.1. Vector Types

Vector data types provide a straightforward way to store multiple data values that then can be used with vector operations and vector intrinsics. The vector operations and vector intrinsics map to vector/SIMD instructions on the C7000 architecture.

A vector data type is similar to an array in that it contains a specified number of elements of a specified type. The allowable number of elements varies with the C7000 variant. For variants with 512-bit vectors, the number of elements must be one of 2, 3, 4, 8, 16, 32, or 64, and the element size multiplied by the number of elements cannot exceed the maximum vector length of the C7000 variant. Wherever possible, operators and intrinsics that act upon vectors are optimized to make use of efficient single instruction, multiple data (SIMD) instructions on the device.
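The two rules above can be expressed as a small compile-time check. The following is a minimal sketch in portable C++ and does not use the TI headers. The names kMaxVectorBits and is_valid_vector_shape are hypothetical, chosen only for illustration, and the sketch models only the 512-bit case described above.

```cpp
#include <cstddef>

// Assumption: a variant with 512-bit vectors, as described in the text.
constexpr std::size_t kMaxVectorBits = 512;

// Hypothetical helper (not part of any TI header): returns true when a
// vector of element_count elements, each element_bits wide, satisfies
// both rules stated above.
constexpr bool is_valid_vector_shape(std::size_t element_bits,
                                     std::size_t element_count)
{
    // Rule 1: the element count must be one of 2, 3, 4, 8, 16, 32, 64.
    // Rule 2: the total size must not exceed the maximum vector length.
    return (element_count == 2 || element_count == 3 || element_count == 4 ||
            element_count == 8 || element_count == 16 || element_count == 32 ||
            element_count == 64) &&
           (element_bits * element_count <= kMaxVectorBits);
}

// 16 elements of 32 bits each is exactly 512 bits.
static_assert(is_valid_vector_shape(32, 16), "16 x 32-bit fits");
// 32 elements of 32 bits each would need 1024 bits.
static_assert(!is_valid_vector_shape(32, 32), "32 x 32-bit does not fit");
```

For example, is_valid_vector_shape(8, 64) is true (64 char-sized elements fill 512 bits), while is_valid_vector_shape(64, 16) is false because it would require 1024 bits.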

As an example, the __int16 type is 512 bits wide and holds 16 int values.

On the 7100 and 7120 variants of the C7000 architecture, the maximum vector size is 512 bits, so an __int16 can reside in one vector register. However, on the 7504 and 7524 variants, the maximum vector size is 256 bits, so an __int8 (256 bits) fits in one vector register but an __int16 does not.

Using __int16 on 7504 and 7524 variants will result in a compiler process called "bisection". During bisection, the compiler splits expressions whose values are too large to be represented in a single vector register. This can result in significantly reduced performance.

Therefore, the user should only use vector types that fit into the vector width of the C7000 variant that is being used.
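As a rough way to see when a vector type is too wide for a variant, the following sketch computes how many vector registers a value of a given size would span. It is plain arithmetic in portable C++ and does not model how the compiler actually performs bisection. The name registers_needed is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: number of vector registers needed to hold a value
// of total_bits when each register holds register_bits (rounded up).
constexpr std::size_t registers_needed(std::size_t total_bits,
                                       std::size_t register_bits)
{
    return (total_bits + register_bits - 1) / register_bits;
}

// A 512-bit __int16 fits in one register on a 512-bit variant...
static_assert(registers_needed(512, 512) == 1, "one 512-bit register");
// ...but spans two registers on a 256-bit variant.
static_assert(registers_needed(512, 256) == 2, "two 256-bit registers");
// A 256-bit __int8 fits in one register on a 256-bit variant.
static_assert(registers_needed(256, 256) == 1, "one 256-bit register");
```

A result greater than 1 indicates a type that does not fit in a single vector register on that variant.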

More information on vector types can be found in the C7000 Optimizing Compiler User's Guide in the section "Vector Data Types".

4.12.2. Scalable Vector Programming Model

Different C7000 variants have different vector lengths. On some variants, a vector can be up to 512 bits and on other variants, a vector can be up to 256 bits. There may be other C7000 variants with different maximum vector sizes in the future.

Therefore, it would be very helpful to have a way to write vector code in a vector-length agnostic way. Put another way, it would be useful if a programmer could write the C++ code for a particular algorithm once, and it would automatically compile and run on each C7000 variant without changes to the C++ code, using the maximum vector size that is possible on that C7000 variant.

To support this paradigm, there is a feature of the C7000 C++ Compiler and C7000 Host Emulation called the Scalable Vector Programming Model. The Scalable Vector Programming Model consists of Scalable Vector Types and associated C++ type traits. This programming model is available from the C7000 Compiler (using C++ source) and C7000 Host Emulation (again, using C++ source). The same C++ source code can be used with Host Emulation and the C7000 C++ Compiler.

4.12.2.1. Scalable Vector Types

Scalable Vector Types, along with associated C++ traits, allow the programmer to write their code in such a way as to ensure the code will compile and run seamlessly on all C7000 variants and on the host computer using Host Emulation. Scalable vector types can only be used in C++ code; they cannot be used in C code.

When a scalable vector type is used, the size of the type will depend on the C7000 variant being compiled for (the element type will stay the same). For example, the c7x::char_vec type will be 64 char elements (512 bits) on 7100 and 7120 variants, but only 32 char elements (256 bits) on 7504 and 7524 variants.

Let's look at a simple example of using a scalable vector type. Below, we show an example of a function accepting two integer vectors, adding them element-wise, and returning an integer vector. C7000 scalable vector types can be accessed by including the c7x_scalable.h file in your source file.

#include <c7x.h>
#include <c7x_scalable.h>

c7x::int_vec add_two_int_vectors(c7x::int_vec a, c7x::int_vec b)
{
    return a + b;
}

In the example above, c7x::int_vec will be 16 elements (512 bits) on 7100 and 7120 variants, and 8 elements (256 bits) on 7504 and 7524 variants.
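The element counts quoted above follow from dividing the vector width by the element size. The sketch below shows that arithmetic in portable C++ without the TI headers. The names kBytes512, kBytes256 and full_vector_elements are hypothetical, and std::int8_t and std::int32_t are used as stand-ins for char and int elements.

```cpp
#include <cstddef>
#include <cstdint>

// Vector widths in bytes, taken from the text above.
constexpr std::size_t kBytes512 = 64;  // 7100 and 7120 variants
constexpr std::size_t kBytes256 = 32;  // 7504 and 7524 variants

// Hypothetical stand-in for the element count of a full-width vector
// whose elements have type T.
template <typename T>
constexpr std::size_t full_vector_elements(std::size_t vector_bytes)
{
    return vector_bytes / sizeof(T);
}

// int elements: 16 on 512-bit variants, 8 on 256-bit variants.
static_assert(full_vector_elements<std::int32_t>(kBytes512) == 16, "");
static_assert(full_vector_elements<std::int32_t>(kBytes256) == 8, "");
// char elements: 64 on 512-bit variants, 32 on 256-bit variants.
static_assert(full_vector_elements<std::int8_t>(kBytes512) == 64, "");
static_assert(full_vector_elements<std::int8_t>(kBytes256) == 32, "");
```

In real C7000 code this calculation is not written by hand; the scalable vector types already have the right size for the variant being compiled for.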

The full vector types are listed below.

  • bool_vec

  • char_vec

  • uchar_vec

  • short_vec

  • ushort_vec

  • int_vec

  • uint_vec

  • long_vec

  • ulong_vec

  • float_vec

  • double_vec

  • cchar_vec

  • cshort_vec

  • cint_vec

  • cfloat_vec

  • clong_vec

  • cdouble_vec

There are also half-width (hvec suffix) and quarter-width (qvec suffix) vector types.

  • char_hvec

  • short_hvec

  • cfloat_hvec

  • etc.

  • char_qvec

  • short_qvec

  • cfloat_qvec

  • etc.

See the c7x_scalable.h file in the include directory in the C7000 compiler installation directory for more details and a full list of the scalable vector types.

4.12.2.2. Vector Type Traits (Queries and Construction)

In addition to scalable vector types, there are also C++ type traits to help the programmer make queries about the nature of a scalable vector type, thereby allowing the programmer to craft loops that adapt to the size of the C7000 variant being used. These traits can also help the programmer craft the appropriate type for the situation.

A full list of scalable vector type traits can be found in the C7000 C/C++ Optimizing Compiler User's Guide, in the section "C7000 Scalable Vector Programming" and also in the c7x_scalable.h file in the include directory of the C7000 compiler installation. These type traits include query and construction type traits.

When using the scalable vector programming model, it is common for the programmer to want to know the number of elements in a given scalable vector type. The c7x::element_count_of<T>::value type trait can be used to obtain this information.

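The following example will report 16 elements for 7100 and 7120 C7000 variants and 8 elements for 7504 and 7524 variants.

#include <cstdio>
#include <c7x_scalable.h>
    ...
    // Cast so the argument matches the %ld conversion regardless of the
    // trait's underlying integer type.
    printf("elements in int_vec: %ld\n",
           static_cast<long>(c7x::element_count_of<c7x::int_vec>::value));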
    ...

The c7x::element_count_of<T>::value trait can be used in the following way to copy elements from one array to another in a C7000-variant agnostic way.

void memcpy_scalable_v1 (const c7x::int_vec *restrict in,
                         c7x::int_vec *restrict out,
                         int len /* number of int elements */)
{
    // Find the number of vector loads/stores needed to copy the
    // buffer. This code assumes that len, the number of int elements
    // to be copied, is evenly divisible by the number of elements in
    // c7x::int_vec!
    int cnt = len / c7x::element_count_of<c7x::int_vec>::value;

    // Perform the copy
    for (int i = 0; i < cnt; i++)
    {
        out[i] = in[i];
    }
}

The development of this example continues in chapter 7 where we use a vector predicate intrinsic to take care of the remainder, and then expand the example to use the capabilities of the C7000's Streaming Address Generator and Streaming Engine.
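Before moving on to the predicate-based version, it can help to see the shape of a copy that also handles leftover elements. The sketch below is portable C++ and does not use the TI headers or a vector predicate; it substitutes a plain scalar loop for the remainder. The constant kLanes is a hypothetical stand-in for c7x::element_count_of<c7x::int_vec>::value, and copy_ints_chunked is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for c7x::element_count_of<c7x::int_vec>::value.
// 16 matches the 512-bit variants; 8 would match the 256-bit variants.
constexpr std::size_t kLanes = 16;

// Copies len int elements from in to out. Each full chunk of kLanes
// elements models one vector load/store; a scalar loop copies whatever
// is left over.
void copy_ints_chunked(const std::int32_t *in, std::int32_t *out,
                       std::size_t len)
{
    const std::size_t full_chunks = len / kLanes;

    for (std::size_t c = 0; c < full_chunks; c++)
    {
        // With real vector types, this inner loop would be a single
        // vector assignment.
        for (std::size_t lane = 0; lane < kLanes; lane++)
        {
            out[c * kLanes + lane] = in[c * kLanes + lane];
        }
    }

    // Scalar tail: elements that do not fill a whole vector.
    for (std::size_t i = full_chunks * kLanes; i < len; i++)
    {
        out[i] = in[i];
    }
}
```

With len = 37 and kLanes = 16, the function performs two full chunks (32 elements) and copies the remaining 5 elements in the scalar tail.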

Construction type traits allow the user to create vector types. For example, the c7x::make_vector<T, int> type trait allows the user to create a vector type from a type and a number of elements:

typedef c7x::make_vector<int, 16>::type my_int16_t; // my_int16_t (illustrative name) is int16

There is also a c7x::make_full_vector<T>::type type trait where the input is simply an element type. The resulting type is a vector sized to the target width:

typedef c7x::make_full_vector<int>::type my_full_int_t; // my_full_int_t (illustrative name) is int16 on 7100 and 7120
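To illustrate the general idea behind construction type traits, the sketch below models the pattern in portable C++. It is not TI's implementation and does not use the TI headers. The names vec, make_vector_model and make_full_vector_model are hypothetical, and the default width of 64 bytes is an assumption that corresponds to a 512-bit variant.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a vector type: N elements of type T.
template <typename T, std::size_t N>
struct vec
{
    T lanes[N];
};

// Model of a construction trait: element type plus element count
// yields a vector type.
template <typename T, std::size_t N>
struct make_vector_model
{
    using type = vec<T, N>;
};

// Model of a full-width construction trait: only the element type is
// given; the element count is derived from an assumed vector width.
template <typename T, std::size_t VectorBytes = 64>
struct make_full_vector_model
{
    using type = vec<T, VectorBytes / sizeof(T)>;
};

// With a 64-byte width, a full vector of 32-bit elements has 16 lanes,
// so both traits produce the same type.
static_assert(std::is_same<make_full_vector_model<std::int32_t>::type,
                           make_vector_model<std::int32_t, 16>::type>::value,
              "full-width model matches the 16-element model");
```

Changing the assumed width to 32 bytes, as in make_full_vector_model<std::int32_t, 32>, yields an 8-element type, which mirrors how the real trait adapts to 256-bit variants.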