5.4. Automatic Inlining
The compiler sometimes takes functions defined in header files and places their code directly at the call site. When the call is inside a loop, this can allow the enclosing loop to be software pipelined, which improves performance. The compiler may also inline a function simply to eliminate the overhead of calling and returning from it.
In the following example, the add_and_saturate_to_255() function sums two values and caps the sum at 255 if it exceeds 255. This function is called from a function in inlining.cpp, which includes the inlining.h file via a preprocessor #include directive.
// inlining.cpp
// Compile with "cl7x -mv7100 --opt_level=3
// --debug_software_pipeline --src_interlist"
#include "inlining.h"
void saturated_vector_sum(int * restrict a, int * restrict b,
                          int * restrict out, int n)
{
    // The loop executes at least 1024 times.
    #pragma MUST_ITERATE(1024,,)
    // Do not unroll the loop.
    #pragma UNROLL(1)
    for (int i = 0; i < n; i++)
    {
        out[i] = add_and_saturate_to_255(a[i], b[i]);
    }
}
// inlining.h
// Returns a + b, capped so that the result never exceeds 255.
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;
    return sum;
}
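To make concrete what "placing the code at the call site" means, the following is a hand-written sketch of the loop before and after substituting the body of add_and_saturate_to_255(). It is not compiler output. The names saturated_vector_sum_call(), saturated_vector_sum_inlined() and check_inlined_matches_call() are hypothetical, and restrict and the TI pragmas are omitted so that the sketch builds with any standard host C++ compiler.

```cpp
#include <cassert>

// add_and_saturate_to_255() as defined in inlining.h.
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;
    return sum;
}

// Hypothetical: the loop as written, with one function call per iteration.
void saturated_vector_sum_call(const int *a, const int *b, int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = add_and_saturate_to_255(a[i], b[i]);
}

// Hypothetical hand-inlined form: the callee's body replaces the call,
// so the loop body is straight-line code with no call in it.
void saturated_vector_sum_inlined(const int *a, const int *b, int *out, int n)
{
    for (int i = 0; i < n; i++)
    {
        int sum = a[i] + b[i];
        if (sum > 255) sum = 255;
        out[i] = sum;
    }
}

// Asserts that both forms produce identical output (supports n <= 64).
void check_inlined_matches_call(const int *a, const int *b, int n)
{
    int expected[64];
    int actual[64];
    assert(n <= 64);
    saturated_vector_sum_call(a, b, expected, n);
    saturated_vector_sum_inlined(a, b, actual, n);
    for (int i = 0; i < n; i++)
        assert(expected[i] == actual[i]);
}
```

Both versions compute the same results; the difference that matters to the compiler is that the second loop body contains no call, which is the precondition for software pipelining.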
In this case, the compiler inlines the call to add_and_saturate_to_255() so that software pipelining can be performed. You can confirm that inlining has taken place by looking at the bottom of the generated assembly file, where the compiler places a comment listing add_and_saturate_to_255() as an inlined function. Note that the function's identifier appears modified because of C++ name mangling.
;; Inlined function references:
;; [0] _Z23add_and_saturate_to_255ii
The inlining is also visible in the generated assembly code: the loop contains no CALL instruction. In fact, it is because inlining eliminated the call that the loop can be software pipelined; software pipelining cannot occur if the loop contains a call to another function. Because inlining increases code size, not every call that could be inlined is inlined automatically. See the C7000 Optimizing Compiler User's Guide for more information on inlining.
;*----------------------------------------------------------------------------*
;* SINGLE SCHEDULED ITERATION
;*
;* ||$C$C44||:
;* 0 TICK ; [A_U]
;* 1 SLDW .D1 *D1++(4),BL0 ; [A_D1] |5|
;* 2 SLDW .D2 *D2++(4),BL1 ; [A_D2] |5|
;* 3 NOP 0x5 ; [A_B]
;* 8 ADDW .L2 BL1,BL0,BL1 ; [B_L2] |5|
;* 9 VMINW .L2 BL2,BL1,B0 ; [B_L2] |5|
;* 10 STW .D1X B0,*D0++(4) ; [A_D1] |5|
;* || BNL .B1 ||$C$C44|| ; [A_B] |11|
;* 11 ; BRANCHCC OCCURS {||$C$C44||} ; [] |11|
;*----------------------------------------------------------------------------*
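In the single scheduled iteration above, the sum is computed by ADDW and then passed through VMINW, so the if statement in add_and_saturate_to_255() no longer appears as a branch inside the loop body: capping a value at 255 is the same as taking the minimum of that value and 255. The sketch below expresses this equivalence in portable C++ with std::min. It assumes that BL2 holds the constant 255, which the listing does not show, and the names add_and_saturate_min() and check_forms_agree() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// add_and_saturate_to_255() as defined in inlining.h: add, then cap with a compare.
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;
    return sum;
}

// Hypothetical branch-free form mirroring ADDW followed by VMINW
// (assuming BL2 holds 255): add, then take the minimum with 255.
int add_and_saturate_min(int a, int b)
{
    return std::min(a + b, 255);
}

// Asserts that both forms agree over a grid of small inputs (no int overflow).
void check_forms_agree()
{
    for (int a = -300; a <= 300; a += 7)
        for (int b = -300; b <= 300; b += 11)
            assert(add_and_saturate_to_255(a, b) == add_and_saturate_min(a, b));
}
```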
5.4.1. Automatic Inlining and Code Size
Automatic inlining tends to improve performance at the expense of larger code size. It becomes progressively more aggressive as the --opt_for_speed (-mf) level increases. Therefore, the user is encouraged to use lower --opt_for_speed/-mf levels for any files or functions where performance is not important or small code size is desired.