1. Introduction

This chapter introduces a software development flow that can be used to improve the performance of C code executing on the TMS320C28x CPU in C2000™ MCUs.

1.1. Software development flow

Software development for the C28x CPU can be split into the following phases:

Phase 1

Write, compile and debug the application on a C2000 device. During this phase, compiler optimizations are disabled to provide the best debug experience. The focus of this phase is on functionality and correctness. However, there are some rules to keep in mind at this stage to generate efficient C2000 code and avoid later rework. Refer to Initial Development for details.

Phase 2

Profile the application to determine the regions of code where it spends most of its run time. In some cases, it may be clear that the application spends most of its time in one or two ISRs. In this scenario, profiling can help determine which functions in the ISR account for most of the ISR’s run time.

Profiling focuses optimization efforts on the functions that account for most of the run time. There are different approaches to profiling; refer to the Profiling section for details.
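One common approach is to read a free-running counter before and after the code of interest. The following is a minimal sketch of that idea, not a definitive implementation. The names `read_ticks`, `sum_of_squares` and `profile_sum_of_squares` are hypothetical. To keep the sketch portable, `read_ticks` wraps the standard `clock()` function; on a C28x target it would be replaced by a read of a hardware timer.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical tick source (assumption). On a host this wraps clock();
 * on a target it would be replaced by a read of a free-running timer. */
static uint32_t read_ticks(void)
{
    return (uint32_t)clock();
}

/* Placeholder workload standing in for the function being profiled. */
static int32_t sum_of_squares(const int16_t *x, uint16_t n)
{
    int32_t acc = 0;
    uint16_t i;

    for (i = 0; i < n; i++) {
        acc += (int32_t)x[i] * x[i];
    }
    return acc;
}

/* Runs the workload `reps` times and returns the elapsed ticks.
 * The last result is written to *result so the calls are not discarded. */
static uint32_t profile_sum_of_squares(const int16_t *x, uint16_t n,
                                       uint16_t reps, int32_t *result)
{
    volatile int32_t sink = 0;
    uint32_t start;
    uint32_t stop;
    uint16_t r;

    start = read_ticks();
    for (r = 0; r < reps; r++) {
        sink = sum_of_squares(x, n);
    }
    stop = read_ticks();

    *result = sink;

    /* Unsigned subtraction tolerates a single counter wraparound.
     * For a down-counting timer the operands would be swapped. */
    return stop - start;
}
```

Repeating the workload several times and dividing the elapsed ticks by `reps` reduces the influence of the measurement overhead itself.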

Phase 3

Optimize the application to meet performance and code size constraints. Typical steps include:

  • Placing the most commonly executed functions and associated data in RAM

  • Enabling the appropriate compiler options:

    • Options to take advantage of optimization passes within the compiler (optimization levels, inlining, etc.)

    • Options to take advantage of hardware features (FPU, TMU, etc.)

  • Using optimized libraries from TI where possible (e.g. Digital Control Library)

  • Providing more information to the compiler to help its optimizations (pragmas, restrict, etc.)

  • Using the CLA


Fig. 1.1 Software Development - Profiling and Optimization

For details, refer to Improving performance.
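As an illustration of providing more information to the compiler, the sketch below uses the C99 `restrict` qualifier. The function name `scale_accumulate` and its parameters are hypothetical. The qualifier is a promise from the programmer that the two arrays never overlap, which gives the compiler more freedom to reorder loads and stores in the loop; whether that improves the generated code depends on the compiler and the selected options.

```c
#include <stdint.h>

/* Computes y[i] += a * x[i] for n elements.
 * `restrict` asserts that x and y do not alias each other. Calling this
 * function with overlapping arrays breaks that promise and results in
 * undefined behavior. */
void scale_accumulate(float *restrict y, const float *restrict x,
                      float a, uint16_t n)
{
    uint16_t i;

    for (i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

The qualifier changes only what the compiler may assume, not what the function computes, so results should be identical with and without it as long as the no-overlap promise holds.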

Phases 2 and 3 are iterative: try an optimization, measure performance and code size, and repeat. It is advisable to set up a self-checking application so that its correctness can be verified after each optimization.
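One way to make an application self-checking is to keep a simple reference implementation next to the optimized one and compare their outputs on known inputs. The sketch below assumes this approach; the names `dot_reference`, `dot_optimized` and `self_check`, the test vectors, and the tolerance are all illustrative choices rather than anything prescribed by this guide.

```c
#include <stdint.h>

/* Reference implementation: kept simple and left unoptimized. */
static float dot_reference(const float *a, const float *b, uint16_t n)
{
    float acc = 0.0f;
    uint16_t i;

    for (i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Candidate implementation: manually unrolled by two, with two
 * accumulators and a tail step for an odd element count. */
static float dot_optimized(const float *a, const float *b, uint16_t n)
{
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    uint16_t i;

    for (i = 0; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }
    if (i < n) {
        acc0 += a[i] * b[i];
    }
    return acc0 + acc1;
}

/* Returns 1 when the candidate matches the reference within a relative
 * tolerance, 0 otherwise. A tolerance is used because reordering
 * floating-point additions can change the rounding of the result. */
static int self_check(void)
{
    static const float a[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    static const float b[5] = {0.5f, 0.25f, 2.0f, 1.0f, 4.0f};
    float expected = dot_reference(a, b, 5);
    float actual = dot_optimized(a, b, 5);
    float diff = expected - actual;
    float scale = (expected < 0.0f) ? -expected : expected;

    if (diff < 0.0f) {
        diff = -diff;
    }
    return diff <= 1.0e-5f * scale;
}
```

Running a check like `self_check()` after every optimization step makes it easier to attribute a regression to the change that introduced it.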

1.2. Processing elements

C28x CPU

The C28x CPU is a 32-bit fixed-point processor. It incorporates RISC features such as single-cycle instruction execution and register-to-register operations. The modified Harvard architecture of the CPU enables instruction and data fetches to be performed in parallel.

Floating-Point Unit

The FPU extends the capabilities of the C28x fixed-point CPU by adding registers and instructions to support IEEE single-precision floating-point operations.

The FPU64 extends the capabilities of the C28x fixed-point CPU by adding registers and instructions to support both IEEE single-precision and double-precision floating-point operations.

Trigonometric Math Unit

The TMU extends the capabilities of a C28x+FPU by adding instructions and leveraging existing FPU instructions to speed up the execution of common trigonometric and arithmetic operations.

Viterbi, Complex Math and CRC Unit

The VCU extends the capabilities of the C28x CPU by adding registers and instructions to support the following algorithm types: Viterbi decoding, cyclic redundancy check (CRC), and complex math.

Further information about the C28x CPU, FPU, TMU and VCU can be found in the following document(s):

Control Law Accelerator (CLA)

The Control Law Accelerator is a 32-bit floating-point math accelerator that is common on most C2000 MCUs. It aids in the concurrent processing of fast control algorithms. For details on the CLA, refer to the CLA chapter in the device TRM.