3. Profiling

Profiling is used to focus optimization efforts on functions that account for a majority of the runtime.

There are different approaches to profiling:

  • Code Composer Studio™ (CCS) Profile clock feature

  • CPUTimer

  • CPUTimer and Function entry/exit hooks

  • Toggle GPIO pin

3.1. CCS Profile Clock

The Code Composer Studio Profile Clock feature can be used to count the number of cycles from one breakpoint to the next. This is a quick way to determine the cycles taken by an arbitrary region of code.

3.2. CPUTimer

Using the CPUTimer is a programmatic approach to determining the number of cycles between any two points in the code. For example, Listing 3.3 illustrates how to use the CPUTimer to determine the number of cycles taken by a loop.

Listing 3.1 CPUTimer header file (cycle_counter.h)
#ifndef _CYCLE_COUNTER_H_
#define _CYCLE_COUNTER_H_

#include <stdint.h>

void     CycleCounter_Init(void);
uint32_t CycleCounter_Read(void);
void     CycleCounter_Stop(void);

#endif
Listing 3.2 CPUTimer source file (cycle_counter.c)
#include "common/inc/cycle_counter.h"
#include "driverlib/cputimer.h"

#define CPUTIMER_MAX_PERIOD (0xFFFFFFFFUL)

void CycleCounter_Init(void)
{
    CPUTimer_clearOverflowFlag(CPUTIMER1_BASE);

    CPUTimer_setPeriod(CPUTIMER1_BASE, CPUTIMER_MAX_PERIOD);

    CPUTimer_setPreScaler(CPUTIMER1_BASE, 0UL);

    CPUTimer_reloadTimerCounter(CPUTIMER1_BASE);

    CPUTimer_stopTimer(CPUTIMER1_BASE);

    CPUTimer_startTimer(CPUTIMER1_BASE);
}

void CycleCounter_Stop(void)
{
    CPUTimer_stopTimer(CPUTIMER1_BASE);
}

// With higher levels of optimization, it’s possible that the compiler moves
// application code before the first call to CycleCounter_Read() or after the
// second call to CycleCounter_Read(). This can result in the reported cycle
// count being lower than the actual cycle count.
// Disabling inlining of CycleCounter_Read prevents this from occurring.
#pragma FUNC_CANNOT_INLINE(CycleCounter_Read)
uint32_t CycleCounter_Read(void)
{
    return CPUTIMER_MAX_PERIOD - CPUTimer_getTimerCount(CPUTIMER1_BASE);
}

Note

  • For simplicity, this implementation does not consider overflow.

  • For details on the DriverLib CPU Timer functions, refer to the DriverLib User’s Guide for the device. For example, the User’s Guide for F28004x is available at <C2000Ware install dir>/device_support/f28004x/docs/F28004x_DriverLib_Users_Guide.pdf
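The overflow caveat in the first note can be narrowed. Because the counter values are uint32_t, their difference is computed modulo 2^32, so a single wrap between two reads still gives the correct elapsed count, assuming the timer reloads and keeps running after it reaches zero. Only intervals of 2^32 cycles or more are ambiguous. The following is a minimal sketch; CycleCounter_Elapsed is a hypothetical helper and is not part of DriverLib or the listings above.

```c
#include <stdint.h>

/* Hypothetical helper (not part of DriverLib or cycle_counter.h):
 * cycles elapsed between two values returned by CycleCounter_Read(). */
static inline uint32_t CycleCounter_Elapsed(uint32_t start, uint32_t end)
{
    /* Unsigned arithmetic is modulo 2^32, so the result stays correct
     * even if the counter wrapped once between the two reads. */
    return (uint32_t)(end - start);
}
```

For example, a start value of 0xFFFFFFF0 and an end value of 0x00000010 give an elapsed count of 0x20.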

Listing 3.3 CPUTimer Example
    CycleCounter_Init();

    uint32_t start    = CycleCounter_Read();
    uint32_t overhead = CycleCounter_Read() - start;

    start = CycleCounter_Read();

    int i;
    for (i=0; i < 100; i++)
        asm("\tNOP;");

    uint32_t time = CycleCounter_Read() - start - overhead;

    // uint32_t is unsigned, so print it with %lu rather than %ld
    printf("Cycles: %lu\n", (unsigned long)time);
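The role of the overhead term in Listing 3.3 can be checked off-target. The following is a minimal host-side sketch, not code from the original. SimCounter_Read, SimWork, MeasureCycles, and the 7-cycle read cost are hypothetical stand-ins chosen for illustration. The sketch suggests that subtracting the cost of one back-to-back read leaves only the cycles spent in the measured region.

```c
#include <stdint.h>

/* Host-side stand-in for CycleCounter_Read(). Every read "costs" a fixed
 * number of simulated cycles; the value 7 is an arbitrary assumption. */
#define SIM_READ_COST (7UL)

static uint32_t sim_cycles = 0;

static uint32_t SimCounter_Read(void)
{
    sim_cycles += SIM_READ_COST;
    return sim_cycles;
}

/* Simulated region of interest that consumes exactly 100 cycles. */
static void SimWork(void)
{
    sim_cycles += 100UL;
}

/* Same measurement pattern as Listing 3.3, with the counter and the
 * measured region passed in so the logic can run on a host. */
static uint32_t MeasureCycles(uint32_t (*read)(void), void (*work)(void))
{
    uint32_t start    = read();
    uint32_t overhead = read() - start;   /* cost of one counter read */

    start = read();
    work();
    return read() - start - overhead;
}
```

On a real target, calling the region through a function pointer adds call overhead to the measurement, so the inline form in Listing 3.3 is likely the better fit there. The indirection is used here only to keep the sketch testable.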

3.3. CPUTimer with Function Entry/Exit Hooks

The CPUTimer can be combined with the Function Entry/Exit Hooks feature available in the compiler to generate a quick profiler.

When the Entry/Exit hooks feature is enabled using the --entry_hook and --exit_hook options, the compiler inserts a call to an entry hook on entry to each function in the program. The compiler also inserts a call to an exit hook on exit from each function. For details, refer to Section 2.14, Enabling Entry Hook and Exit Hook Functions, of the TMS320C28x Optimizing C/C++ Compiler User’s Guide.
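Conceptually, the effect is similar to writing the hook calls by hand at the top and bottom of each function. The sketch below illustrates the resulting call order on a host. demo_entry_hook, demo_exit_hook, outer, and inner are hypothetical names, and a name string is passed instead of an address purely for readability. The code the compiler actually generates may differ in detail.

```c
#define LOG_MAX (16)

/* Recorded sequence of hook calls, in the order they occurred. */
static const char *log_name[LOG_MAX];
static int         log_is_exit[LOG_MAX];
static int         log_count = 0;

static void record(const char *name, int is_exit)
{
    if (log_count >= LOG_MAX) return;   /* drop records once the log is full */
    log_name[log_count]    = name;
    log_is_exit[log_count] = is_exit;
    log_count++;
}

/* Hypothetical stand-ins for the hooks named by --entry_hook/--exit_hook. */
static void demo_entry_hook(const char *name) { record(name, 0); }
static void demo_exit_hook(const char *name)  { record(name, 1); }

static void inner(void)
{
    demo_entry_hook("inner");   /* what the compiler would insert on entry */
    /* ... function body ... */
    demo_exit_hook("inner");    /* what the compiler would insert on exit */
}

static void outer(void)
{
    demo_entry_hook("outer");
    inner();
    inner();
    demo_exit_hook("outer");
}
```

Calling outer() once records six events: the entry and exit of outer bracket two entry/exit pairs for inner. Nested calls therefore appear as nested pairs in the log.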

Listing 3.4 illustrates using the entry and exit hooks along with the CPUTimer to implement a simple profiler.

Listing 3.4 Using entry and exit hooks to implement a profiler
#include "common/inc/cycle_counter.h"
#include "common/inc/profiling_hooks.h"
#include <stdio.h>

#define MAX_ENTRIES (64)

// Indicate if the timestamp is associated with function entry or exit
typedef enum { PD_ENTRY=0, PD_EXIT } PD_Mode;

// Struct for data associated with a single timestamp
typedef struct {
    uint32_t function_address;
    uint32_t timestamp;
    PD_Mode  mode;
} ProfileData;

// Array to store profile data
ProfileData table[MAX_ENTRIES];
int         index = 0;

// Entry hook function used to record cycle count on entry into function
void __entry_hook(void (*addr)())
{
    if (index >= MAX_ENTRIES) return;

    table[index].function_address = (uint32_t)addr;
    table[index].mode             = PD_ENTRY;
    table[index].timestamp        = CycleCounter_Read();
    index++;
}

// Exit hook function used to record cycle count on exit from function
void __exit_hook(void (*addr)())
{
    if (index >= MAX_ENTRIES) return;

    table[index].timestamp        = CycleCounter_Read();
    table[index].function_address = (uint32_t)addr;
    table[index].mode             = PD_EXIT;
    index++;
}

Files with functions that need to be profiled are compiled using the --entry_hook --entry_parm=address --exit_hook --exit_parm=address options.

When an application is built with the entry/exit hooks in Listing 3.4, the table is populated with profile data. For example, the code snippet in Listing 3.5 results in the following table:

Listing 3.5 Example using the hook functions for profiling
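// Forward declarations so that main() and foo() can call functions defined later.
// ProfileData_init() and ProfileData_print() are assumed to be declared in
// common/inc/profiling_hooks.h.
void foo(void);
void bar(void);

int main(void)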
{
    ProfileData_init();

    foo();

    ProfileData_print();

    return 0;
}

void foo()
{
    int i;
    for (i=0; i < 100; i++)
        asm("\tNOP;");

    bar();
    bar();
}

void bar()
{
    int i;
    for (i=0; i < 100; i++)
        asm("\tNOP;");
}

Function address, Mode (0 = entry, 1 = exit), Timestamp (cycles)
0x00a647, 0, 25
0x00a637, 0, 562
0x00a637, 1, 1090
0x00a637, 0, 1134
0x00a637, 1, 1662
0x00a647, 1, 1697
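
The table above can be reduced to per-call cycle counts by pairing each exit record with the most recent unmatched entry record. The following is a minimal sketch of one way to do that, assuming the records are properly nested and the table did not fill up. Profile_inclusiveTimes, CallTime, and MAX_DEPTH are hypothetical names that do not appear in the original listings. The ProfileData type is repeated from Listing 3.4 so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PD_ENTRY = 0, PD_EXIT } PD_Mode;

typedef struct {
    uint32_t function_address;
    uint32_t timestamp;
    PD_Mode  mode;
} ProfileData;

/* One completed call: the function and its entry-to-exit cycle count. */
typedef struct {
    uint32_t function_address;
    uint32_t inclusive_cycles;
} CallTime;

#define MAX_DEPTH (16)   /* assumed maximum call nesting */

/* Hypothetical post-processing helper. Returns the number of completed
 * calls written to out, or -1 if the records are not properly nested. */
static int Profile_inclusiveTimes(const ProfileData *rec, int count,
                                  CallTime *out, int max_out)
{
    uint32_t entry_addr[MAX_DEPTH];
    uint32_t entry_time[MAX_DEPTH];
    int depth = 0, n = 0, i;

    assert(rec != 0 && out != 0);

    for (i = 0; i < count; i++) {
        if (rec[i].mode == PD_ENTRY) {
            if (depth >= MAX_DEPTH) return -1;
            entry_addr[depth] = rec[i].function_address;
            entry_time[depth] = rec[i].timestamp;
            depth++;
        } else {
            if (depth == 0) return -1;            /* exit without entry */
            depth--;
            if (entry_addr[depth] != rec[i].function_address) return -1;
            if (n >= max_out) return -1;
            out[n].function_address = rec[i].function_address;
            out[n].inclusive_cycles = rec[i].timestamp - entry_time[depth];
            n++;
        }
    }
    return n;
}

/* The six records from the table above. */
static const ProfileData example_table[] = {
    { 0x00a647UL,   25UL, PD_ENTRY },
    { 0x00a637UL,  562UL, PD_ENTRY },
    { 0x00a637UL, 1090UL, PD_EXIT  },
    { 0x00a637UL, 1134UL, PD_ENTRY },
    { 0x00a637UL, 1662UL, PD_EXIT  },
    { 0x00a647UL, 1697UL, PD_EXIT  },
};
```

Applied to example_table, this yields 528 cycles for each of the two calls at 0x00a637 and 1672 cycles for the call at 0x00a647. Given the call structure in Listing 3.5, these are presumably bar and foo respectively. The counts are inclusive: they contain the time spent in nested calls and in the hook functions themselves.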