6.2. Streaming Address Generator
Computing sequences of multidimensional address offsets with deeply nested loops can cause address-computation code to be generated at each level of the loop nest. This code can prevent certain compiler optimizations and therefore hurt performance. The Streaming Address Generator (SA) is a multidimensional offset-computation engine that the user can program to generate address offsets. The offsets obtained from the Streaming Address Generator are added to a base pointer, and the resulting address is used to load or store data.
Using a Streaming Address Generator can reduce the number of instructions needed to compute load and store addresses in the outer loops of a loop nest. This in turn can allow the compiler to perform loop collapsing or loop coalescing, so that a larger portion of the loop nest may be software pipelined, which can improve the performance of the loop.
A Streaming Address Generator is usually used in conjunction with load/store instructions. The Streaming Address Generator can also generate vector predicates that can be used in vector predicated instructions, such as a vector predicated store.
To get a baseline understanding of the Streaming Address Generator and its default API, and to see example code, the reader is encouraged to read section 4.15, "Streaming Engine and Streaming Address Generator", in the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8). The reader should also peruse the c7x_strm.h file in the include directory of the compiler installation. The C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0) is another useful reference.
At the time of writing, the available C7000 variants have four Streaming Address Generators, named SA0, SA1, SA2, and SA3.
A Streaming Address Generator is controlled by a structure instance that contains several fields. A structure instance populated with default values is obtained with the __gen_SA_TEMPLATE_v1() intrinsic. The programmer can then modify the fields of this structure to customize the behavior of the Streaming Address Generator. The modified structure is then passed to an SA open intrinsic, such as __SA0_OPEN or __SA1_OPEN. The Streaming Address Generator (in this case, SA0) can then be used via the __SA0ADV and __SA0 macros. When using C++ and the scalable vector programming model, strm_agen<0, type>::get_adv(ptr) and strm_agen<0, type>::get(ptr) can be used instead. A Streaming Address Generator is closed via __SA0_CLOSE() (in this case, for SA0).
To obtain a vector predicate from the Streaming Address Generator, the __SA0_VPRED macro can be used. When using C++ and the scalable vector programming model, we suggest using strm_agen<0, type>::get_vpred() instead.
A code example that uses the Streaming Address Generator can be found in the Examples chapter, in the section Using the Streaming Address Generator.
As of v4.0.0 of the C7000 compiler, the compiler may automatically use the Streaming Address Generator, depending on the situation. See section Automatic Use of the Streaming Engine and Streaming Address Generator for more information on what compiler options may be needed to enable automatic use of the Streaming Address Generator. Also see the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8), Section 4.15, for more information about the Streaming Address Generator.