7.2. Using the Streaming Address Generator

The Streaming Address Generator was discussed in the section Streaming Address Generator.

A Streaming Address Generator is controlled by a setup structure that contains several fields. The __gen_SA_TEMPLATE_v1() intrinsic returns an instance of this structure populated with default values. The programmer can then modify the fields of the resulting structure to customize the behavior of the Streaming Address Generator. The modified structure is passed to an SA open intrinsic, such as __SA0_OPEN or __SA1_OPEN. Once opened, the Streaming Address Generator (in this case, SA0) is used through the __SA0ADV and __SA0 macros. When using C++ and the scalable vector programming model, strm_agen<0, type>::get_adv(ptr) and strm_agen<0, type>::get(ptr) can be used instead. A Streaming Address Generator is closed with __SA0_CLOSE() (in this case, for SA0).

To obtain a vector predicate from the Streaming Address Generator, the __SA0_VPRED macro can be used. When using C++ and the scalable vector programming model, we suggest using strm_agen<0, type>::get_vpred() instead.

Let us modify our memory copy example from the previous section to use the Streaming Address Generator. We call __gen_SA_TEMPLATE_v1() to get a default __SA_TEMPLATE_v1 structure, modify it, and then use it to set up and open Streaming Address Generator SA0 with __SA0_OPEN.

In the loop, instead of creating a vector predicate with a __mask_int intrinsic as in the previous section, we obtain a vector predicate from SA0 using the C++ function c7x::strm_agen<0, T>::get_vpred(). Next, we obtain the address for a predicated store by calling c7x::strm_agen<0, T>::get_adv(ptr), where ptr is the base pointer of our store location. This call adds the current address offset of SA0 to ptr to produce the store address, and then advances SA0 to its next offset.
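The address arithmetic described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: SaModel1D and its members are hypothetical names invented for illustration and are not part of c7x.h, the model tracks its offset in elements instead of bytes, and it ignores what the hardware does once the configured element count is exhausted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a 1-D address generator.
// This is NOT the C7000 API; it only mimics "base + offset, then advance".
struct SaModel1D {
    std::size_t lanes;   // elements per vector (the quantity VECLEN describes)
    std::size_t offset;  // current offset, counted in elements (assumption)

    // Return base + current offset, then advance the offset by one vector.
    template <typename T>
    T* get_adv(T* base) {
        assert(lanes > 0);
        T* addr = base + offset;
        offset += lanes;
        return addr;
    }
};
```

With four elements per vector and a starting offset of zero, successive calls on the same base pointer yield base, base + 4, base + 8, and so on, which is why the loop never needs to index the destination by hand.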

Then, a __vstore_pred intrinsic is used to conditionally store individual bytes that we have loaded from the source pointer into the store location, based on the vector predicate.

On every iteration of the loop except possibly the last, the vector predicate is all 1's, indicating that all bytes should be stored. On the last iteration, if len is not a multiple of the number of elements in the vector, the vector predicate contains 1's only in the first bits and 0's in the rest, indicating which bytes are to be stored to the copy destination.
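To make the predicate pattern concrete, here is a minimal host-side sketch in portable C++. It does not use any C7000 intrinsics: iteration_count and lane_predicate are hypothetical helper names, and the model works at element granularity, whereas the text above describes the predicate in terms of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of vector iterations needed to cover len elements
// (the same rounding-up computation as cnt in the example).
std::size_t iteration_count(std::size_t len, std::size_t lanes) {
    assert(lanes > 0);
    return len / lanes + (len % lanes > 0 ? 1 : 0);
}

// Modeled predicate for one iteration: a lane is enabled only if the
// element it maps to lies inside the len elements being copied.
std::vector<bool> lane_predicate(std::size_t len, std::size_t lanes,
                                 std::size_t iter) {
    std::vector<bool> pred(lanes, false);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        pred[lane] = (iter * lanes + lane) < len;
    }
    return pred;
}
```

For example, with len = 10 and four elements per vector, three iterations are needed; the first two enable every lane and the third enables only the first two lanes. With len = 8, both iterations enable every lane.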

#include <c7x.h>
#include <c7x_scalable.h>

void memcpy_scalable_v3 (const c7x::int_vec *restrict in,
                         c7x::int_vec *restrict out,
                         int len)
{
    // Find the maximum number of vector loads/stores needed to copy the
    // buffer.
    int cnt = len / c7x::element_count_of<c7x::int_vec>::value;
    cnt += (len % c7x::element_count_of<c7x::int_vec>::value > 0);

    // Generate a Streaming Address Generator setup template with default
    // values
    __SA_TEMPLATE_v1 out_tmplt = __gen_SA_TEMPLATE_v1();

    // Obtain the __SA_VECLEN enumeration value that indicates to the
    // streaming address generator the number of elements in a vector.
    // Use this value to set the VECLEN member of the SA setup record.
    out_tmplt.VECLEN = c7x::sa_veclen<c7x::int_vec>::value;

    // Modify the SA setup record to indicate to the SA how many total
    // elements we want to cover (in the first and only dimension).
    // Note that this does not need to be a multiple of the number of
    // elements in a vector.
    out_tmplt.ICNT0 = len;

    // Tell the streaming address generator the pattern is 1-dimensional
    out_tmplt.DIMFMT = __SA_DIMFMT_1D;

    // Open the streaming address generator 0 (SA0)
    __SA0_OPEN(out_tmplt);

    // Perform the copy, including any remainder
    int i;
    for (i = 0; i < cnt; i++)
    {
        // Load an int vector's worth of data from the array "in"
        c7x::int_vec data = in[i];

        // Obtain a vector predicate from the streaming address generator 0
        // (SA0).
        __vpred pred = c7x::strm_agen<0, c7x::int_vec>::get_vpred();

        // Obtain an address for the location we will store to next
        // by obtaining the offset of the SA0 and adding it to the
        // address "out" by using the strm_agen get_adv() operator.
        // get_adv() also advances SA0 to the next offset.
        c7x::int_vec * addr = c7x::strm_agen<0, c7x::int_vec>::get_adv(out);

        // Store the data into the location in out, possibly predicated
        // based on the addressing pattern in SA0
        __vstore_pred(pred, addr, data);
    }

    __SA0_CLOSE();
}