6.1. Streaming Engine

The C7100 CPU has two streaming engines. A streaming engine is a feature of the C7000 CPU cores that aids in loading data from memory to the CPU. The streaming engines can significantly improve the performance of the memory hierarchy by prefetching data from memory to a location near the CPU, which reduces the time needed to bring that data into the CPU. They may also reduce the number of L1 data cache capacity misses, because data accessed through a streaming engine bypasses the L1 data cache.

The streaming engine supports address access patterns of up to six dimensions. When the performance bottleneck involves reads from memory (that is, when the D unit resource bound or cache misses dominate), consider using one or both of the streaming engines, provided the access pattern to the objects in memory is known in advance. Streaming engines have the greatest effect when used in conjunction with loops that are vectorized by hand. For more information on the streaming engine and for code examples, see the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0), the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8), and the c7x_strm.h file in the include directory of the compiler's installation directory.
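To make "access pattern known in advance" more concrete, the following is a minimal sketch in portable C of how a multi-dimensional access pattern can be described as a set of nested loops. It is a host-side model only and does not use the streaming engine API from c7x_strm.h. The names stream_pattern, stream_offsets, icnt, and dim are hypothetical, and the sketch assumes that each dimension is described by an iteration count and a stride measured in elements, with the innermost dimension walking contiguous elements. Consult the references above for the actual template fields and intrinsics.

```c
#include <assert.h>
#include <stddef.h>

#define SE_MAX_DIMS 6 /* the text above states up to six dimensions */

/* Hypothetical description of one stream; not the template type from c7x_strm.h. */
typedef struct {
    size_t    icnt[SE_MAX_DIMS]; /* iterations per dimension, icnt[0] innermost */
    ptrdiff_t dim[SE_MAX_DIMS];  /* stride per dimension, in elements (assumption) */
} stream_pattern;

/*
 * Write the element offsets of the modeled stream, in fetch order, into out.
 * Returns the number of offsets written (never more than cap).
 */
size_t stream_offsets(const stream_pattern *p, ptrdiff_t *out, size_t cap)
{
    size_t idx[SE_MAX_DIMS] = {0};
    size_t n = 0;

    assert(p != NULL && out != NULL);

    /* In this model, a zero count in any dimension means an empty stream. */
    for (int d = 0; d < SE_MAX_DIMS; d++) {
        if (p->icnt[d] == 0) return 0;
    }

    while (n < cap) {
        /* Offset is the sum of index times stride over all dimensions. */
        ptrdiff_t off = 0;
        for (int d = 0; d < SE_MAX_DIMS; d++) {
            off += (ptrdiff_t)idx[d] * p->dim[d];
        }
        out[n++] = off;

        /* Advance like an odometer: innermost first, carry outward. */
        int d = 0;
        while (d < SE_MAX_DIMS && ++idx[d] == p->icnt[d]) {
            idx[d] = 0;
            d++;
        }
        if (d == SE_MAX_DIMS) break; /* every dimension wrapped: done */
    }
    return n;
}
```

For example, with icnt = {4, 2, 1, 1, 1, 1} and dim = {1, 8, 0, 0, 0, 0}, the model yields offsets 0 through 3 followed by 8 through 11: two rows of four elements read from a buffer whose row pitch is eight elements. A loop whose reads can be written down this way before it runs is the kind of candidate the paragraph above describes.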

As of v4.0.0 of the C7000 compiler, the compiler may automatically use the streaming engine, depending on the situation. See the C7000 Optimizing C/C++ Compiler User's Guide, Section 4.14, for more information.