6.1. Streaming Engine
The C7100 CPU has two streaming engines. A streaming engine is a feature of the C7000 CPU cores that aids in loading data from memory to the CPU. The streaming engines can significantly improve the performance of the memory hierarchy by prefetching data from memory to a location near the CPU, which reduces the time needed to bring that data into the CPU. Because data accessed through the streaming engine bypasses the L1 data cache, using a streaming engine may also reduce the number of L1 data cache capacity misses.
The streaming engine supports up to a six-dimensional address access pattern. When the performance bottleneck involves reads from memory (that is, when the D unit resource bound or cache misses dominate), consider using one or both of the streaming engines if the access pattern to the objects in memory is known in advance. Streaming engines have the greatest effect when used in conjunction with loops that are vectorized by hand. For more information on the streaming engine and code examples, see the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0), the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8), and the c7x_strm.h file in the include directory of the compiler's installation directory.
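To make "an access pattern known in advance" concrete, the following is a minimal host-side sketch in portable C of a nested-loop address pattern with up to six levels. It is only an illustrative model, not the streaming engine API: the names se_model_pattern and se_model_offsets are hypothetical and do not come from c7x_strm.h, and the model assumes a per-level iteration count and a per-level stride measured in elements. Refer to SPRUIP0 and c7x_strm.h for the actual template fields and intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: these names are NOT part of c7x_strm.h. */
#define SE_MODEL_DIMS 6

typedef struct {
    uint32_t icnt[SE_MODEL_DIMS];   /* iterations per level, [0] is innermost */
    int32_t  stride[SE_MODEL_DIMS]; /* assumed stride per level, in elements  */
} se_model_pattern;

/*
 * Write the element offsets visited by the pattern into out[], in order,
 * stopping after cap entries. Returns the number of offsets written.
 */
size_t se_model_offsets(const se_model_pattern *p, int64_t *out, size_t cap)
{
    uint32_t idx[SE_MODEL_DIMS] = {0};
    size_t n = 0;

    assert(p != NULL && out != NULL);

    /* A zero iteration count at any level means the pattern is empty. */
    for (int d = 0; d < SE_MODEL_DIMS; d++) {
        if (p->icnt[d] == 0) {
            return 0;
        }
    }

    while (n < cap) {
        /* Offset is the sum of (loop index * stride) over all levels. */
        int64_t off = 0;
        for (int d = 0; d < SE_MODEL_DIMS; d++) {
            off += (int64_t)idx[d] * p->stride[d];
        }
        out[n++] = off;

        /* Advance like an odometer, innermost level first. */
        int d = 0;
        while (d < SE_MODEL_DIMS && ++idx[d] == p->icnt[d]) {
            idx[d] = 0;
            d++;
        }
        if (d == SE_MODEL_DIMS) {
            break; /* every level wrapped: the pattern is complete */
        }
    }
    return n;
}
```

For example, with iteration counts {4, 3, 1, 1, 1, 1} and strides {1, 8, 0, 0, 0, 0}, the model walks a 3-row by 4-column tile of a row-major matrix whose rows are 8 elements apart, producing the offsets 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19. A pattern of this kind is fully determined before the loop runs, which is what makes it a candidate for prefetching.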
As of v4.0.0 of the C7000 compiler, the compiler may automatically use the streaming engine, depending on the situation. See the C7000 Optimizing C/C++ Compiler User's Guide, Section 4.14, for more information.