5.4. Automatic Inlining

The compiler can automatically inline a function whose definition is visible at the call site, such as a function defined in an included header file. Inlining replaces the call with the body of the function. This eliminates the cost of calling and returning from the function and, when the call is inside a loop, allows the enclosing loop to be software pipelined, which improves performance.

In the following example, the add_and_saturate_to_255() function sums two values and caps the result at 255. The function is defined in inlining.h and is called from saturated_vector_sum() in inlining.cpp, which includes inlining.h via a preprocessor #include directive.

// inlining.cpp
// Compile with "cl7x -mv7100 --opt_level=3
//   --debug_software_pipeline --src_interlist"
#include "inlining.h"

void saturated_vector_sum(int * restrict a, int * restrict b,
                          int * restrict out, int n)
{
    #pragma MUST_ITERATE(1024,,)
    #pragma UNROLL(1)
    for (int i = 0; i < n; i++)
    {
        out[i] = add_and_saturate_to_255(a[i], b[i]);
    }
}

// inlining.h
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;

    return sum;
}

In this case, the compiler inlines the call to add_and_saturate_to_255() so that the loop can be software pipelined. To confirm that inlining has been performed, look at the bottom of the generated assembly file, where the compiler places a comment listing the functions that were inlined. Note that the function's identifier appears in modified form due to C++ name mangling.

;; Inlined function references:
;; [0] _Z23add_and_saturate_to_255ii

The inlining is also visible in the generated assembly code: the loop contains no CALL instruction. Because the call has been eliminated, the loop can be software pipelined; a loop that contains a call to another function cannot be. Note that, because of code size concerns, the compiler does not automatically inline every call that could be inlined. See the C7000 Optimizing Compiler User's Guide for more information on inlining.

;*----------------------------------------------------------------------------*
;*        SINGLE SCHEDULED ITERATION
;*
;*        ||$C$C44||:
;*   0              TICK    ; [A_U]
;*   1              SLDW    .D1     *D1++(4),BL0      ; [A_D1] |5|
;*   2              SLDW    .D2     *D2++(4),BL1      ; [A_D2] |5|
;*   3              NOP     0x5     ; [A_B]
;*   8              ADDW    .L2     BL1,BL0,BL1       ; [B_L2] |5|
;*   9              VMINW   .L2     BL2,BL1,B0        ; [B_L2] |5|
;*  10              STW     .D1X    B0,*D0++(4)       ; [A_D1] |5|
;*     ||           BNL     .B1     ||$C$C44||        ; [A_B] |11|
;*  11              ; BRANCHCC OCCURS {||$C$C44||}    ; [] |11|
;*----------------------------------------------------------------------------*
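
To make the effect of inlining concrete, the following is a minimal hand-written sketch of what the loop looks like once the body of add_and_saturate_to_255() has replaced the call. It is an illustration only, not compiler output. The name saturated_vector_sum_inlined and the file name are hypothetical, and the restrict qualifiers and pragmas from inlining.cpp are left out so that the sketch builds with any standard C++ compiler. Writing the cap as a minimum is an assumption made here for readability; it is consistent with the VMINW instruction in the listing above.

```cpp
// inlined_equivalent.cpp (hypothetical file name, for illustration only)
#include <algorithm>  // std::min
#include <cassert>    // assert

// Hand-inlined sketch of saturated_vector_sum(). The restrict qualifiers and
// the MUST_ITERATE/UNROLL pragmas of the original are omitted for portability.
void saturated_vector_sum_inlined(const int *a, const int *b, int *out, int n)
{
    // The original promises at least 1024 iterations through MUST_ITERATE.
    // This sketch only requires a non-empty range.
    assert(n > 0);

    for (int i = 0; i < n; i++)
    {
        // Body of add_and_saturate_to_255(), written in place of the call.
        int sum = a[i] + b[i];

        // Same result as: if (sum > 255) sum = 255;
        out[i] = std::min(sum, 255);
    }
}
```

With no call left in the loop body, each iteration reduces to two loads, an add, a minimum, and a store, which matches the shape of the single scheduled iteration shown above.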

5.4.1. Automatic Inlining and Code Size

Automatic inlining tends to increase performance at the expense of larger code size. The compiler inlines progressively more aggressively as the --opt_for_speed (-mf) level is increased. Therefore, use a lower --opt_for_speed level for any files or functions where performance is not important or where small code size is desired.
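
As a sketch of how a lower level might be applied to a single function rather than to a whole file, the following uses the FUNCTION_OPTIONS pragma. This rests on assumptions that should be checked against the C7000 Optimizing Compiler User's Guide: that the pragma is available in its C++ form (applying to the next function definition), that it accepts --opt_for_speed, and that 0 is a valid level. The function checksum_config() and the file name are hypothetical. The preprocessor guard keeps the pragma away from compilers that do not define __TI_COMPILER_VERSION__.

```cpp
// code_size.cpp (hypothetical file name, for illustration only)

// Assumption: FUNCTION_OPTIONS applies the listed options to the next
// function definition. The level 0 is an illustrative choice, not a
// recommendation taken from this guide.
#if defined(__TI_COMPILER_VERSION__)
#pragma FUNCTION_OPTIONS("--opt_for_speed=0")
#endif
int checksum_config(const unsigned char *data, int n)  // hypothetical cold function
{
    // Simple byte sum; performance here is assumed not to matter.
    int sum = 0;
    for (int i = 0; i < n; i++)
    {
        sum += data[i];
    }
    return sum;
}
```

For a file in which no function is performance critical, passing the lower --opt_for_speed level on the command line for that file is the simpler choice.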