4.7. Loop unrolling

Loop unrolling is an optimization that improves the performance of small loops. The body of one iteration is replicated a certain number of times inside the loop body, so that each pass through the unrolled loop does the work of several original iterations. The number of times an iteration is replicated is known as the unroll factor.
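As a rough illustration of the unroll factor, the sketch below shows a loop in its original form and the same loop unrolled by hand by a factor of 4. The function names and the scaling operation are hypothetical and are not taken from the listings in this guide. In practice the compiler performs this transformation itself, so hand-written unrolling like this is only meant to show what the transformed loop looks like.

```c
#include <stddef.h>

/* Hypothetical example: original (rolled) loop.
 * One loop-closing branch is executed per element. */
void scale_rolled(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * k;
    }
}

/* Hypothetical example: the same loop unrolled by a factor of 4.
 * One loop-closing branch is executed per four elements. */
void scale_unrolled4(int *dst, const int *src, size_t n, int k)
{
    size_t i = 0;

    /* Main loop: four copies of the original body per pass. */
    for (; i + 3 < n; i += 4) {
        dst[i]     = src[i]     * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
    }

    /* Remainder loop: handles the last n % 4 elements when n is
     * not a multiple of the unroll factor. */
    for (; i < n; i++) {
        dst[i] = src[i] * k;
    }
}
```

The remainder loop is needed only when the trip count is not known to be a multiple of the unroll factor.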

4.7.1. Benefits

  • Reduce branch overhead - This is especially significant for small loops. For example, the loop in Listing 4.10 is unrolled by a factor of 4. From the assembly in Listing 4.11, it is evident that the branch overhead is reduced by a factor of 4 (one BANZ per 4 iterations vs. one BANZ per iteration without unrolling).

There are additional benefits on C28x CPUs with FPU support:

  • Generate RPTB for small loops - loop unrolling increases the number of instructions in the loop body and enables the compiler to meet the minimum block size requirements for the RPTB instruction.

  • Improved floating-point performance - loop unrolling can improve performance by providing the compiler more instructions to schedule across the unrolled iterations. This reduces the number of NOPs generated and also provides the compiler with a greater opportunity to generate parallel instructions.

Note

Loop unrolling will result in a code size increase because the compiler replicates the loop body. #pragma UNROLL(1) can be used to prevent the compiler from unrolling the loop.

4.7.2. Performing unrolling

There are two ways in which loop unrolling can be performed:

  1. The compiler can automatically unroll the loop. Listing 4.12 is an example of a loop that the compiler unrolls by a factor of 4.

  2. The UNROLL pragma can be used to indicate to the compiler that the loop is a candidate for unrolling. Refer to UNROLL for details.
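The second option can be sketched as follows. This is a minimal example, assuming the pragma is placed immediately before the loop it applies to. The function names are hypothetical. The first function asks the compiler to consider an unroll factor of 4; the second uses #pragma UNROLL(1) to prevent unrolling, as described in the note above. The pragma marks the loop as a candidate only, so the compiler may still decide not to unroll it.

```c
#include <stddef.h>

/* Hypothetical example: request unrolling by a factor of 4. */
float sum_unroll4(const float *x, size_t n)
{
    float acc = 0.0f;

    /* The pragma applies to the loop that immediately follows it. */
    #pragma UNROLL(4)
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Hypothetical example: prevent unrolling to keep code size small. */
float sum_no_unroll(const float *x, size_t n)
{
    float acc = 0.0f;

    /* An unroll factor of 1 tells the compiler not to unroll. */
    #pragma UNROLL(1)
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}
```

Compilers that do not recognize the UNROLL pragma typically ignore it, possibly with a warning, so the functions above still compute the same result elsewhere.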