5. Common issues with optimizations

A potential scenario during development is that the application works when compiler optimizations are disabled (-Ooff), but fails with higher levels of optimization (-O1, -O2, -O3 or -O4). Typical reasons for this include:

  • Access to shared data from main program and Interrupt Service Routines (ISRs)

    • Volatile qualifiers

    • Atomic updates

  • Accessing memory mapped peripheral registers without volatile

  • Calling asm functions from C code without following C conventions

  • Uninitialized variables

5.1. Shared Data

  • Any global variable that is read/written by main() and one or more ISRs must be annotated volatile

  • Volatile indicates to the compiler that the variable might be modified by something external to the obvious flow of the program such as an ISR

  • This ensures the compiler preserves the number of volatile reads and writes to the global variable exactly as written in C/C++ code. The compiler will not:

    • Eliminate redundant reads or writes

    • Re-order accesses

Table 5.1 illustrates the need for volatile when optimizations are enabled. Without the volatile qualifier on flag, the compiler may remove the if block in main(), because its analysis of main() indicates that flag is always 0 and the if condition is therefore always false. The volatile qualifier tells the compiler that something outside of main(), in this case the ISR, can update flag.

Table 5.1 Use of volatile

Main application:

    volatile int flag;
    int x;

    int main()
    {
        flag = 0;
        ...

        if (flag == 1)
            x++;

        ...
    }

Interrupt Service Routine:

    extern volatile int flag;
    interrupt void ISR(void)
    {
        ...
        flag = 1;
        ...
    }

5.2. Peripheral access

  • The volatile keyword must be used when accessing memory locations that represent memory mapped peripherals.

  • Such memory locations might change value in ways that the compiler cannot predict.

  • This ensures the compiler preserves reads and writes to memory exactly as written in the C code.

  • A missing volatile qualifier can result in the compiler incorrectly optimizing away or reordering reads/writes.

Listing 5.1 Using volatile for peripheral register access
    static inline void
    GPIO_writePin(uint32_t pin, uint32_t outVal)
    {
        volatile uint32_t *gpioDataReg;
        uint32_t pinMask;

        //
        // Check the arguments.
        //
        ASSERT(GPIO_isPinValid(pin));

        gpioDataReg = (uint32_t *)GPIODATA_BASE +
                      ((pin / 32U) * GPIO_DATA_REGS_STEP);

        pinMask = (uint32_t)1U << (pin % 32U);

        if(outVal == 0U)
        {
            gpioDataReg[GPIO_GPxCLEAR_INDEX] = pinMask;
        }
        else
        {
            gpioDataReg[GPIO_GPxSET_INDEX] = pinMask;
        }
    }
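
Listing 5.1 covers the write side. The read side matters just as much when polling a status bit, because without volatile the compiler may read the register once and reuse that value. The following is a minimal sketch under stated assumptions: STATUS_TX_READY_MASK, status_isTxReady and status_waitTxReady are hypothetical names for illustration and do not come from C2000Ware, and the register address is passed in by the caller rather than taken from a device header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define STATUS_TX_READY_MASK  (0x0001U)

/* The pointer-to-volatile parameter forces a real read on every call. */
static inline bool status_isTxReady(const volatile uint32_t *statusReg)
{
    return ((*statusReg & STATUS_TX_READY_MASK) != 0U);
}

/* Busy-wait until the (hypothetical) ready bit is set. */
static inline void status_waitTxReady(const volatile uint32_t *statusReg)
{
    while (!status_isTxReady(statusReg))
    {
        /* Each iteration performs a fresh read of the register. */
    }
}
```

If the parameter were a plain const uint32_t *, an optimizing compiler would be free to hoist the read out of the loop, which could turn the wait into an infinite loop.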

5.3. Atomic access

  • 16-bit reads/writes are atomic.

  • 32-bit float reads/writes are atomic, with one exception: writing a 32-bit float constant is atomic only if it is performed with a single opcode.

  • 32-bit integer reads/writes:

    • 32-bit reads/writes that use a single opcode are atomic.

  • Atomic accesses within an ISR: By default, accesses within an ISR are atomic, because the hardware automatically sets the INTM bit (disabling interrupts) during the context switch. The exception is an application that re-enables interrupts within the ISR in order to nest interrupts.

  • If possible, group atomic accesses together or create a function to perform the sequence disable-interrupts/atomic-accesses/enable-interrupts.

  • For writes to global variables larger than 32 bits (64-bit long double, structures), disable and re-enable interrupts around the write. This ensures the writer updates the entire variable before the reader accesses it, and avoids leaving the variable in an inconsistent or incomplete state.

For other atomic operations, there are two recommended approaches:

  • Use an atomic compiler intrinsic if one is available. These are documented in the compiler user’s guide (www.ti.com/lit/SPRU514). The description will say “in an atomic way”.

  • Disable and re-enable interrupts around the atomic operations using the intrinsics below:

    __disable_interrupts();
    __enable_interrupts();

Listing 5.2 is a code snippet from the Digital Control Library in C2000Ware that disables interrupts around updates to a struct so that the entire structure is updated atomically. Refer to the TMS320C28x Optimizing C/C++ Compiler User’s Guide, Table 7-6, TMS320C28x C/C++ Compiler Intrinsics, for details on the __enable_interrupts() and __disable_interrupts() intrinsics.

Listing 5.2 Disable interrupts to ensure atomic struct update
    uint16_t val = __disable_interrupts();

    p->Kp = p->sps->Kp;
    p->Ki = p->sps->Ki;
    p->Kd = p->sps->Kd;
    p->Kr = p->sps->Kr;
    p->c1 = p->sps->c1;
    p->c2 = p->sps->c2;
    p->Umax = p->sps->Umax;
    p->Umin = p->sps->Umin;

    // If interrupts were originally enabled, re-enable them
    if (0U == (val & 0x1))
        __enable_interrupts();
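
The same disable/restore sequence can be wrapped in small helper functions, as suggested earlier in this section for variables larger than 32 bits. The following is a minimal sketch, not code from C2000Ware. The names CRIT_ENTER, CRIT_EXIT, g_timestamp, timestamp_write and timestamp_read are hypothetical, and the sim_* stand-ins exist only so that the sketch also compiles off-target. It assumes the compiler predefines __TMS320C28XX__ on C28x builds and that bit 0 of the value returned by __disable_interrupts() reflects INTM, as in Listing 5.2.

```c
#include <stdint.h>

#if defined(__TMS320C28XX__)
/* On target: use the compiler intrinsics. */
#define CRIT_ENTER()   __disable_interrupts()
#define CRIT_EXIT(st)  do { if (((st) & 0x1U) == 0U) { __enable_interrupts(); } } while (0)
#else
/* Off target: hypothetical stand-ins that only model the INTM bit. */
static uint16_t sim_intm = 0U;          /* 0 = enabled, 1 = disabled */
static uint16_t sim_disable(void)
{
    uint16_t old = sim_intm;
    sim_intm = 1U;
    return old;
}
static void sim_enable(void) { sim_intm = 0U; }
#define CRIT_ENTER()   sim_disable()
#define CRIT_EXIT(st)  do { if (((st) & 0x1U) == 0U) { sim_enable(); } } while (0)
#endif

/* 64-bit value shared between main() and an ISR (hypothetical). */
static volatile uint64_t g_timestamp;

void timestamp_write(uint64_t value)
{
    uint16_t st = CRIT_ENTER();     /* save previous state, disable */
    g_timestamp = value;            /* multi-word write cannot be interrupted */
    CRIT_EXIT(st);                  /* re-enable only if previously enabled */
}

uint64_t timestamp_read(void)
{
    uint16_t st = CRIT_ENTER();
    uint64_t value = g_timestamp;   /* multi-word read cannot be interrupted */
    CRIT_EXIT(st);
    return value;
}
```

Restoring the previous state, rather than unconditionally re-enabling, keeps the helpers safe to call from code that already runs with interrupts disabled.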

5.4. Calling asm functions from C code

Any ASM functions called from C code must follow the C calling and register conventions. Refer to the TMS320C28x Optimizing C/C++ Compiler User’s Guide, Sections 7.2 Register Conventions, 7.3 Function Structure and Calling Conventions and 7.5 Interfacing C and C++ With Assembly Language.

Any violation of these conventions can result in the application passing with -Ooff, but failing at higher optimization levels.

5.5. Uninitialized variables

  • Using variables without initialization can lead to undefined behavior

  • The behavior of an application with uninitialized variables can change with optimization levels, making debugging difficult

  • Local variables

    • Must be explicitly initialized in the application before any use

  • Global variables

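    • The C standard specifies that global (extern) and static variables without explicit initializers must be initialized to 0 before the program begins running

    • C runtime initialization behavior differs between COFF and EABI: under EABI such variables are zero-initialized by default, whereas under COFF the application is responsible for this, so code that relies on implicit zeroes can fail under COFF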

    • Refer to the TMS320C28x Optimizing C/C++ Compiler User’s Guide for details - Sections 7.10.3 Automatic Initialization of Variables for COFF and 7.10.4 Automatic Initialization of Variables for EABI.
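
The guidance above can be sketched as follows. This is a minimal illustration, not code from C2000Ware: g_sampleCount and sum_samples are hypothetical names. The global carries an explicit initializer so that its starting value does not depend on the ABI-specific startup behavior, and the local accumulator is initialized before its first use.

```c
#include <stddef.h>
#include <stdint.h>

/* Explicit initializer: does not rely on startup zero-initialization. */
static uint16_t g_sampleCount = 0U;

/* Sums n samples and records how many were processed. */
uint32_t sum_samples(const uint16_t *samples, size_t n)
{
    uint32_t sum = 0U;      /* local initialized before first use */
    size_t i;

    for (i = 0U; i < n; i++)
    {
        sum += samples[i];
    }

    g_sampleCount = (uint16_t)n;
    return sum;
}
```

If sum were left uninitialized, the result would depend on whatever happened to be in its register or stack slot, which is exactly the kind of value that changes between optimization levels.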

5.6. Interrupts

  • RPT instructions are not interruptible, and can potentially delay or block interrupts from executing

    • For example, if a background function calls memcpy() and the compiler generates RPT instructions for it, that section of code will be uninterruptible

    • If the compiler generates RPT instructions within an ISR, interrupts will be blocked, even if interrupt nesting is enabled

    • To avoid this issue, there are two compiler options available: --no_rpt, which tells the compiler not to generate RPT instructions, and --rpt_threshold=k, which limits generated RPT loops to k or fewer iterations