7.2. Using the Streaming Address Generator
The Streaming Address Generator was discussed in the section Streaming Address Generator.
A Streaming Address Generator is controlled by a structure instance that
contains several fields. A structure instance is populated with default
values by using the __gen_SA_TEMPLATE_v1() intrinsic. The programmer can
then modify the resulting structure record to customize the behavior of the
Streaming Address Generator. The modified structure record is then passed to
an SA open intrinsic, such as __SA0_OPEN or __SA1_OPEN. The Streaming Address
Generator (in this case, SA0) can then be used via the __SA0ADV macro and the
__SA0 macro. When using C++ and the scalable vector programming model, the
strm_agen<0, type>::get_adv(ptr) and strm_agen<0, type>::get(ptr) operators
can be used instead. A Streaming Address Generator is closed via the
__SA0_CLOSE() operator (in this case, for SA0).
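To make the open/use/close sequence concrete, here is a small host-side C++ model of a one-dimensional SA. It is a sketch under stated assumptions, not TI's implementation: SaTemplate1D and SaModel1D are hypothetical names, their defaults are this model's own, and we assume that get() returns the current address without advancing while get_adv() returns it and then advances by VECLEN elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a 1-D Streaming Address Generator.
// This is NOT the C7000 API; the field names only echo __SA_TEMPLATE_v1.
struct SaTemplate1D {
    int VECLEN = 1;  // elements per access (this model's default, not TI's)
    int ICNT0 = 0;   // total elements in dimension 0 (unused by this model)
};

class SaModel1D {
public:
    // Models __SA0_OPEN(tmplt): load the template and reset the offset.
    void open(const SaTemplate1D& t) {
        tmpl_ = t;
        offset_ = 0;
        open_ = true;
    }

    // Models __SA0_CLOSE().
    void close() { open_ = false; }
    bool is_open() const { return open_; }

    // Models get(ptr) (assumed): ptr + current offset, no advance.
    template <typename T>
    T* get(T* ptr) const {
        assert(open_ && "SA must be opened before use");
        return ptr + offset_;
    }

    // Models get_adv(ptr): ptr + current offset, then advance by VECLEN.
    template <typename T>
    T* get_adv(T* ptr) {
        assert(open_ && "SA must be opened before use");
        T* addr = ptr + offset_;
        offset_ += static_cast<std::size_t>(tmpl_.VECLEN);
        return addr;
    }

private:
    SaTemplate1D tmpl_{};
    std::size_t offset_ = 0;  // current offset, in elements of T
    bool open_ = false;
};
```

For example, with VECLEN set to 16 and a base pointer buf, successive get_adv(buf) calls return buf, buf + 16, buf + 32, and so on. get(buf) then returns the next address without consuming it.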
To obtain a vector predicate from the Streaming Address Generator, the
__SA0_VPRED macro can be used. When using C++ and the scalable vector
programming model, we suggest using strm_agen<0, type>::get_vpred()
instead.
Let us modify our memory copy example from the previous section to use
the Streaming Address Generator. We use the __gen_SA_TEMPLATE_v1()
function to get a default __SA_TEMPLATE_v1 structure, which we modify and
then use to set up and open Streaming Address Generator SA0 with
__SA0_OPEN.
In the loop, instead of creating a vector predicate with a __mask_int
intrinsic as in the previous section, we obtain a vector predicate from SA0
using the C++ function c7x::strm_agen<0, T>::get_vpred(). Next, we obtain
the address for a predicated store by calling
c7x::strm_agen<0, T>::get_adv(ptr), where ptr is the base pointer of our
store location. This call adds the current address offset of SA0 to ptr to
produce the store address, and then advances SA0 to its next offset.
Then, a __vstore_pred intrinsic conditionally stores the individual
elements that we loaded from the source pointer into the store location,
based on the vector predicate.
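The effect of a predicated store can be sketched in scalar C++. The function vstore_pred_model() below is hypothetical: for simplicity it uses one predicate bit per element, which may not match the actual __vpred layout. It only illustrates that elements whose predicate bit is 0 leave the destination untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar emulation of a predicated vector store.
// Assumption: one predicate bit per element (bit k controls element k).
template <typename T, std::size_t N>
void vstore_pred_model(std::uint64_t pred, T* addr, const T (&data)[N]) {
    static_assert(N <= 64, "this model supports at most 64 elements");
    for (std::size_t k = 0; k < N; ++k) {
        if ((pred >> k) & 1u) {
            addr[k] = data[k];  // store only where the predicate bit is set
        }
        // Otherwise addr[k] keeps its previous contents.
    }
}
```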
On every iteration except the last, the vector predicate is all 1's,
indicating that all elements should be stored. On the last iteration, if
len is not a multiple of the number of elements N in the vector, the vector
predicate contains 1's only for the first len % N elements and 0's for the
rest, indicating which elements are conditionally stored to the copy
destination.
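The predicate pattern described above follows from simple arithmetic. The sketch below uses hypothetical helper names and assumes one predicate bit per element, with at most 64 elements per vector. For example, with len = 37 and 16 elements per vector there are 3 iterations; the first two predicates are 0xFFFF and the last is 0x1F (five active elements).

```cpp
#include <cassert>
#include <cstdint>

// Number of vector iterations, ceil(len / n), computed the same way as
// cnt in memcpy_scalable_v3.
int iteration_count(int len, int n) {
    return len / n + (len % n > 0);
}

// Hypothetical per-element predicate for iteration i of a 1-D pattern with
// ICNT0 = len and n elements per vector: bit k is set if i*n + k < len.
std::uint64_t predicate_for_iteration(int len, int n, int i) {
    int remaining = len - i * n;
    int active = remaining < n ? remaining : n;
    if (active <= 0) return 0;
    if (active >= 64) return ~std::uint64_t{0};
    return (std::uint64_t{1} << active) - 1;  // low 'active' bits set
}
```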
#include <c7x.h>
#include <c7x_scalable.h>

void memcpy_scalable_v3 (const c7x::int_vec *restrict in,
                         c7x::int_vec *restrict out,
                         int len)
{
    // Find the maximum number of vector loads/stores needed to copy the
    // buffer.
    int cnt = len / c7x::element_count_of<c7x::int_vec>::value;
    cnt += (len % c7x::element_count_of<c7x::int_vec>::value > 0);

    // Generate a Streaming Address Generator setup template with default
    // values
    __SA_TEMPLATE_v1 out_tmplt = __gen_SA_TEMPLATE_v1();

    // Obtain the __SA_VECLEN enumeration value that indicates to the
    // streaming address generator the number of elements in a vector.
    // Use this value to set the VECLEN member of the SA setup record.
    out_tmplt.VECLEN = c7x::sa_veclen<c7x::int_vec>::value;

    // Modify the SA setup record to indicate to the SA how many total
    // elements we want to cover (in the first and only dimension).
    // Note that this does not need to be a multiple of the number of
    // elements in a vector.
    out_tmplt.ICNT0 = len;

    // Tell the streaming address generator the pattern is 1-dimensional
    out_tmplt.DIMFMT = __SA_DIMFMT_1D;

    // Open the streaming address generator 0 (SA0)
    __SA0_OPEN(out_tmplt);

    // Perform the copy, including any remainder
    int i;
    for (i = 0; i < cnt; i++)
    {
        // Load an int vector's worth of data from the array "in".
        // On the last iteration this still reads a full vector, so "in"
        // must be readable up to a whole number of vectors.
        c7x::int_vec data = in[i];

        // Obtain a vector predicate from the streaming address generator 0
        // (SA0).
        __vpred pred = c7x::strm_agen<0, c7x::int_vec>::get_vpred();

        // Obtain an address for the location we will store to next
        // by obtaining the offset of the SA0 and adding it to the
        // address "out" by using the strm_agen get_adv() operator.
        // get_adv() also advances SA0 to the next offset.
        c7x::int_vec * addr = c7x::strm_agen<0, c7x::int_vec>::get_adv(out);

        // Store the data into the location in out, possibly predicated
        // based on the addressing pattern in SA0
        __vstore_pred(pred, addr, data);
    }

    __SA0_CLOSE();
}
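For experimenting off-target, the loop above can be emulated in portable C++. This is a sketch, not the C7000 code path: memcpy_emulated() and kVecLen are hypothetical, kVecLen = 16 assumes 512-bit vectors of 32-bit ints, a plain offset stands in for SA0, and a per-element comparison stands in for get_vpred(). Unlike the original, the emulation reads from in only where the predicate is set, so it never reads past len.

```cpp
#include <cassert>

// Assumed vector width: 16 ints, as for a 512-bit vector of 32-bit ints.
constexpr int kVecLen = 16;

// Portable emulation of memcpy_scalable_v3 (hypothetical helper).
void memcpy_emulated(const int* in, int* out, int len) {
    // Same ceiling division as the original: vector iterations needed.
    int cnt = len / kVecLen + (len % kVecLen > 0);
    int offset = 0;  // models SA0's current offset, in elements
    for (int i = 0; i < cnt; ++i) {
        int* addr = out + offset;  // models get_adv(out): base + offset
        for (int k = 0; k < kVecLen; ++k) {
            // Models get_vpred() for a 1-D pattern with ICNT0 = len.
            bool pred = offset + k < len;
            if (pred) {
                addr[k] = in[offset + k];  // models the predicated store
            }
        }
        offset += kVecLen;  // models get_adv() advancing SA0
    }
}
```

Elements of out beyond len are left untouched, which mirrors what the predicated store achieves on the final, partial iteration.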