6.2. Streaming Address Generator

Computing sequences of multidimensional address offsets using deeply nested loops can cause address computation code to be created at each level of the nested loop. This code can prevent certain compiler optimizations and therefore negatively affect performance. The Streaming Address Generator (SA) is a multi-dimensional offset computation engine that can be programmed by the user to generate address offsets. The address offsets obtained from the Streaming Address Generator are then added to a base pointer and subsequently used to load or store data.

Use of a Streaming Address Generator can help limit the number of instructions required to calculate an address used for a load or store instruction in the outer loops of a nested loop. This in turn can allow the compiler to perform loop collapsing or loop coalescing optimizations, which may lead to a larger portion of a nested loop being software pipelined and therefore to improved performance of the loop.
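The difference between per-level address arithmetic and a programmed offset generator can be sketched in portable C on a host machine. The sketch below is only a software model written for illustration: the offset_gen type and the gen_open, gen_next, sum_nested and sum_collapsed functions are hypothetical names, not part of the C7000 compiler API, and the real Streaming Address Generator performs this work in hardware.

```c
#include <assert.h>

/* Hypothetical host-side model of a 2-D offset generator.
 * This is NOT the C7000 API; it only mimics the idea of programming
 * iteration counts and a stride once, then pulling offsets one by one. */
typedef struct {
    int icnt0;  /* iteration count of the innermost dimension */
    int icnt1;  /* iteration count of the outer dimension */
    int dim1;   /* stride, in elements, between outer iterations */
    int i0;     /* current innermost counter */
    int i1;     /* current outer counter */
} offset_gen;

/* Program the generator and reset its counters. */
static void gen_open(offset_gen *g, int icnt0, int icnt1, int dim1)
{
    assert(icnt0 > 0 && icnt1 > 0);
    g->icnt0 = icnt0;
    g->icnt1 = icnt1;
    g->dim1 = dim1;
    g->i0 = 0;
    g->i1 = 0;
}

/* Return the current offset, then advance to the next position. */
static int gen_next(offset_gen *g)
{
    int off = g->i1 * g->dim1 + g->i0;
    if (++g->i0 == g->icnt0) {
        g->i0 = 0;
        ++g->i1;
    }
    return off;
}

/* Nested loops: address arithmetic appears at both loop levels. */
static int sum_nested(const int *base, int rows, int cols, int ld)
{
    int sum = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            sum += base[r * ld + c];
    return sum;
}

/* Single collapsed loop: every offset comes from the generator and is
 * added to the base pointer, so the loop body has no index arithmetic. */
static int sum_collapsed(const int *base, int rows, int cols, int ld)
{
    offset_gen g;
    int sum = 0;
    gen_open(&g, cols, rows, ld);
    for (int n = 0; n < rows * cols; n++)
        sum += base[gen_next(&g)];
    return sum;
}
```

Calling both functions on a 3 x 5 tile of a row-major matrix whose leading dimension is 8 visits the same elements in the same order. The collapsed version has a single loop with a fixed trip count, which is the shape that loop collapsing aims for.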

A Streaming Address Generator is usually used in conjunction with load/store instructions. The Streaming Address Generator can also generate vector predicates that can be used in vector predicated instructions, such as a vector predicated store.
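As a rough illustration of why a generated predicate is useful, the following host-side C sketch models a row whose length is not a multiple of the vector length. All names here (LANES, pred_gen, gen_vpred, gen_adv, store_pred, copy_predicated) are hypothetical and chosen for this example, and the 8-lane width is an assumption. The hardware predicate format and the real intrinsics are described in the references listed in this section.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* assumed vector length in elements, for illustration */

/* Hypothetical 1-D model: tracks how far into the row we have advanced. */
typedef struct {
    int icnt0;  /* total number of elements in the row */
    int pos;    /* elements consumed so far */
} pred_gen;

/* Predicate for the current vector: bit k is set when lane k is in bounds. */
static uint8_t gen_vpred(const pred_gen *g)
{
    int remaining = g->icnt0 - g->pos;
    if (remaining >= LANES)
        return 0xFF;
    if (remaining <= 0)
        return 0;
    return (uint8_t)((1u << remaining) - 1u);
}

/* Return the current element offset, then advance by one full vector. */
static int gen_adv(pred_gen *g)
{
    int off = g->pos;
    g->pos += LANES;
    return off;
}

/* Model of a predicated vector store: only enabled lanes are written. */
static void store_pred(uint8_t pred, int *dst, const int *src)
{
    for (int k = 0; k < LANES; k++)
        if (pred & (1u << k))
            dst[k] = src[k];
}

/* Copy n elements in whole-vector steps; the tail lanes are masked off,
 * so no scalar remainder loop is needed. */
static void copy_predicated(int *dst, const int *src, int n)
{
    pred_gen g = { n, 0 };
    int nvec = (n + LANES - 1) / LANES;
    for (int v = 0; v < nvec; v++) {
        uint8_t pred = gen_vpred(&g);  /* read the predicate before advancing */
        int off = gen_adv(&g);
        store_pred(pred, dst + off, src + off);
    }
}
```

In this model the predicate is read before the generator advances, so the mask always describes the vector that is about to be stored. With 11 elements and 8 lanes, the first vector is fully enabled and the second enables only its first three lanes.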

To get a baseline understanding of the Streaming Address Generator and its default API, and to see example code, the reader is encouraged to first read Section 4.15, "Streaming Engine and Streaming Address Generator", in the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8) and to peruse the c7x_strm.h file in the include directory of the compiler installation directory. The user can also reference the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0).

At the time of writing, the available C7000 variants have four Streaming Address Generators, named SA0, SA1, SA2 and SA3.

A Streaming Address Generator is controlled by a structure instance that contains several fields. The __gen_SA_TEMPLATE_v1() intrinsic populates a structure instance with default values. The programmer can then modify the resulting structure to customize the behavior of the Streaming Address Generator. The modified structure is passed to an SA open intrinsic, such as __SA0_OPEN or __SA1_OPEN. The Streaming Address Generator (in this case, SA0) can then be used via the __SA0ADV and __SA0 macros. When using C++ and the scalable vector programming model, strm_agen<0, type>::get_adv(ptr) and strm_agen<0, type>::get(ptr) can be used instead. A Streaming Address Generator is closed via __SA0_CLOSE() (in this case, for SA0).

To obtain a vector predicate from the Streaming Address Generator, the __SA0_VPRED macro can be used. When using C++ and the scalable vector programming model, we suggest using strm_agen<0, type>::get_vpred() instead.

A code example that uses the Streaming Address Generator can be found in the Examples chapter, in the section Using the Streaming Address Generator.

As of v4.0.0 of the C7000 compiler, the compiler may automatically use the Streaming Address Generator, depending on the situation. See section Automatic Use of the Streaming Engine and Streaming Address Generator for more information on what compiler options may be needed to enable automatic use of the Streaming Address Generator. Also see the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8), Section 4.15, for more information about the Streaming Address Generator.