5.5. Loop Collapsing and Loop Coalescing
The compiler attempts to collapse or coalesce nested loops when doing so is legal and can improve performance. A nested loop is a set of two loops where one loop resides inside another, enclosing loop. Both collapsing and coalescing transform a nested loop into a single loop. Collapsing takes place when the outer loop contains no code other than the inner loop. Coalescing takes place when the outer loop contains code of its own.
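As a rough illustration of the two cases, consider the following pair of loop nests. The function names and array layout are hypothetical and chosen only for this sketch. Whether the compiler actually transforms either nest depends on the legality and profitability conditions described in this section.

```c
/* Hypothetical example: the outer loop body holds nothing but the inner
 * loop, so this nest has the shape that loop collapsing applies to. */
void scale_rows(int *out, const int *in, int rows, int cols, int k)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            out[i * cols + j] = in[i * cols + j] * k;
        }
    }
}

/* Hypothetical example: the outer loop has code of its own before and
 * after the inner loop, so this nest has the shape that loop coalescing
 * applies to. */
void row_sums(int *sums, const int *in, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        int acc = 0;                      /* outer-loop code */
        for (int j = 0; j < cols; j++) {
            acc += in[i * cols + j];
        }
        sums[i] = acc;                    /* outer-loop code */
    }
}
```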
After the two nested loops are combined into one loop, any code that was in the body of the outer loop must be transformed so that it executes conditionally, only when necessary. Collapsing and coalescing can improve performance because the combined loop executes only one pipe-up and one pipe-down for the entire loop nest. Without the transformation, the inner loop executes a pipe-up and a pipe-down on every iteration of the outer loop.
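The conditional execution of outer-loop code can be pictured at the source level with a hand-written sketch. This is only a conceptual model: the compiler performs the transformation on its own internal representation, and the function names here are hypothetical. The sketch assumes that cols is at least 1.

```c
/* Original nest: the outer loop has its own code (acc reset and store). */
void row_sums_nested(int *sums, const int *in, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        int acc = 0;
        for (int j = 0; j < cols; j++) {
            acc += in[i * cols + j];
        }
        sums[i] = acc;
    }
}

/* Hand-coalesced form: a single loop over rows * cols iterations.
 * The former outer-loop code runs only when a row has been completed.
 * Assumes cols >= 1. */
void row_sums_coalesced(int *sums, const int *in, int rows, int cols)
{
    int total = rows * cols;
    int acc = 0;
    int i = 0;   /* current row */
    int j = 0;   /* position within the current row */

    for (int n = 0; n < total; n++) {
        acc += in[n];              /* former inner-loop body */
        j++;
        if (j == cols) {           /* former outer-loop code, now conditional */
            sums[i] = acc;
            i++;
            acc = 0;
            j = 0;
        }
    }
}
```

Both functions produce the same results for the same inputs. The single-loop form is the one that would be piped up and piped down only once.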
To perform loop collapsing or loop coalescing, the combined loop must be eligible for software pipelining. This means that the loop nest must not contain function calls. Each loop must have a signed counting iterator that iterates a fixed amount each time. That is, the inner loop must execute the same number of iterations on every iteration of the outer loop. Also, the outer loop must not contain too much code; otherwise, the transformation will not improve performance. If the outer loop carries a memory dependence, loop coalescing and loop collapsing are unlikely to be performed.
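The trip-count requirement can be seen by comparing two hypothetical nests. The first has an inner loop that runs the same number of times on every outer iteration. The second is triangular, so its inner trip count depends on the outer iterator and it does not meet the requirement described above. The function names are assumptions made for this sketch, and meeting the requirement does not by itself guarantee that the compiler performs the transformation.

```c
/* Rectangular nest: signed iterators, unit step, and an inner trip
 * count (m) that is the same on every outer iteration. */
int sum_rect(const int *a, int n, int m)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            sum += a[i * m + j];
        }
    }
    return sum;
}

/* Triangular nest: the inner loop runs i + 1 times, so its trip count
 * changes with the outer iteration. This shape does not satisfy the
 * fixed-trip-count requirement. */
int sum_lower_triangle(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            sum += a[i * n + j];
        }
    }
    return sum;
}
```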
When loop collapsing or loop coalescing takes place, the software pipelined loop indicates the beginning loop source line ("Loop source line") near the top of the software pipeline information comment block. When this source line number references an outer loop, it indicates either that the inner loop has been fully unrolled or that the compiler has performed loop coalescing or collapsing. In cases of loop coalescing, the compiler uses special instructions, such as NLCINIT, TICK, GETP, and BNL. A description of these hardware features, collectively known as the "NLC", is beyond the scope of this document. More details of the NLC may be found in the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0).