Source-Based Code Coverage with tiarmclang

The TI ARM Clang Compiler (tiarmclang) Tools can instrument C/C++ source code to help determine how much of that code is executed by a given suite of tests. This is commonly known as test coverage or code coverage. A program with a high degree of code coverage is likely to contain fewer undetected defects than a program with a low degree of code coverage, so code coverage is useful as one measure of a program’s robustness. In addition to being generally useful for thorough application development, code coverage is required by internal and external developers in the Industrial and Automotive markets for Functional Safety. The LLVM project already supports Code Coverage for Functions (including Function Instantiations), Lines, and Regions.

More specifically, the tiarmclang compiler tools support Source-Based Code Coverage, which operates directly on an internal representation of the source code and is particularly suited to embedded applications. The tiarmclang implementation of Source-Based Code Coverage is derived from the LLVM project implementation (see the LLVM project’s Source-Based Code Coverage page for more information).

Definitions of different types of code coverage information:

  • Function Coverage is the percentage of functions which have been executed at least once. A function is considered to be executed if any of its instantiations are executed.
  • Instantiation Coverage is the percentage of function instantiations which have been executed at least once. Template functions and static inline functions from headers are two kinds of functions which may have multiple instantiations.
  • Line Coverage is the percentage of code lines which have been executed at least once. Only executable lines within function bodies are considered to be code lines.
  • Region Coverage is the percentage of code regions which have been executed at least once. A code region may span multiple lines (e.g., in a large function body with no control flow). However, it is also possible for a single line to contain multiple code regions (e.g., in “return x || y && z”). For the LLVM project, Region Coverage is equivalent to the Statement Coverage provided by other vendors.
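As a rough illustration of how lines and regions differ, the sketch below pairs a straight-line function (several lines, no control flow) with a one-line function built from short-circuit operators. The function names are hypothetical, and the exact region boundaries that tiarmcov reports may differ from the comments.

```c
#include <stdbool.h>

/* Several lines, no control flow: the whole body executes as one unit. */
int sum3(int a, int b, int c)
{
    int s = a;
    s += b;
    s += c;
    return s;
}

/* One line, more than one region: (y && z) is evaluated only when x is
 * false, and z is read only when y is true. Parentheses are added for
 * readability; the meaning is the same as "x || y && z". */
bool any_or_both(bool x, bool y, bool z)
{
    return x || (y && z);
}
```

A test that only ever calls any_or_both() with x set to true would fully cover the line while leaving the conditionally evaluated operands unexecuted, which is the gap that region coverage exposes.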

In addition, the tiarmclang compiler tools support Branch Coverage (also known as Branch Condition Coverage), which provides a finer level of coverage than that provided by other vendors: it tracks coverage across the leaf-level boolean expressions that make up larger boolean expressions. This makes it more informative than the Decision Coverage that some other vendors support, which only tracks execution counts for a single control flow decision point, even when that decision is a boolean expression composed of several conditions and zero or more boolean logical operators.
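To make the difference from Decision Coverage concrete, the following sketch tallies true and false outcomes for each leaf condition of one decision by hand. The tally() helper and the counter arrays are hypothetical stand-ins written for this illustration; the compiler's real instrumentation is generated automatically and is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-written stand-ins for per-condition counters (illustration only). */
enum { NUM_CONDITIONS = 4 };
static uint64_t true_count[NUM_CONDITIONS];
static uint64_t false_count[NUM_CONDITIONS];

/* Record the outcome of leaf condition idx and pass its value through. */
static bool tally(int idx, bool value)
{
    if (value)
        true_count[idx]++;
    else
        false_count[idx]++;
    return value;
}

/* One decision made up of four leaf conditions. Short-circuit evaluation
 * means a condition is tallied only when it is actually evaluated. */
bool decision(int var1, int var2, int var3)
{
    return (tally(0, var1 == 0) && tally(1, var2 == 2))
        || tally(2, var3 == 34)
        || tally(3, var1 == var3);
}

uint64_t times_true(int idx)  { return true_count[idx]; }
uint64_t times_false(int idx) { return false_count[idx]; }
```

Calling decision(0, 1, 5), decision(1, 2, 5), and decision(2, 2, 5) makes the decision false three times, which is all that a decision-level count would record. The per-condition tallies additionally show that the second condition was evaluated only once and that none of the last three conditions was ever true.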

Support for Embedded Use Cases

The tiarmclang compiler tools’ support for source-based code coverage is built upon the code coverage infrastructure implemented in the LLVM project (see LLVM Source-Based Code Coverage for more details) and adapts it to be more amenable to embedded development. As implemented in the LLVM project, the infrastructure presents the following issues, each of which increases code size requirements:

  1. All the coverage information must reside in memory and be written to a file from the target even though only the counters are actually modified during program execution. This means that all additional information is just wasted memory.
  2. The runtime support is very large and includes support for merging counters, using an environment variable to control the output, buffering data for writing, etc.
  3. The counters are 64 bits each, which increases both RAM and ROM usage. Embedded applications can typically make do with 32-bit counters.

The first two issues have been addressed in the tiarmclang compiler tools. Memory space is allocated only for the counters, and all other coverage-related information is kept in non-allocatable sections preserved in the object file itself, so target memory is used only for incrementing counters. In addition, the runtime support has been reduced considerably and now only supports writing counters to a file as part of a “baremetal” profiling model. Support for writing a full raw profile file, merging counters, etc., is not included.

Note that the instrumentation inserted to track the counters introduces cycle performance and code size overhead, the extent of which depends on the size of the program. This is due to the additional instructions needed, the number of counters needed, and the impact on existing code optimization. Reducing the size of the counters will be addressed as a future compiler enhancement to decrease the memory footprint introduced by code coverage and better mitigate the code size overhead.

Effects of Code Optimization

The tiarmclang compiler derives instruction-to-source mappings from the Abstract Syntax Trees during the Code Generation (CodeGen) phase. These are eventually lowered to an intermediate representation (LLVM IR), which is where counter instrumentation occurs. Because counter instrumentation occurs before the optimization passes that operate on LLVM IR, coverage data is very accurate with respect to the source code. Counter increments that would have occurred in an unoptimized program also occur in the optimized variant. For example, counter mapping regions for an inlined function are created with instrumentation prior to inlining. If inlining is performed, the instrumentation is inlined along with it. The resulting execution counts map back to the original source as though the function had never been inlined.
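The inlining case can be pictured with a hand-written counter. In the sketch below, clamp_counter is a hypothetical stand-in for a compiler-inserted counter and the function names are invented for illustration; the point is only that the increment travels with the function body wherever that body ends up.

```c
#include <stdint.h>

/* Hypothetical stand-in for the counter the compiler would insert. */
static uint64_t clamp_counter;

/* A small helper that an optimizer may well inline into its callers. */
static inline int clamp(int v, int lo, int hi)
{
    clamp_counter++;            /* moves with the body if it is inlined */
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

int scale(int v)  { return clamp(v * 2, 0, 100); }
int offset(int v) { return clamp(v + 10, 0, 100); }

/* Total executions of clamp(), whether or not it was inlined. */
uint64_t clamp_executions(void) { return clamp_counter; }
```

Whether or not the optimizer inlines clamp() into scale() and offset(), each call still increments the same counter, so the count attributed to the source lines of clamp() is unchanged.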

While counter instrumentation is not obstructed by optimization, the presence of counter instrumentation may inhibit certain optimizations.

Relevant Tools

In addition to the tiarmclang compiler itself, the tools used to produce and visualize code coverage data are tiarmprofdata and tiarmcov. Please review the supported options for each of these tools using the --help option.

The tiarmclang tools’ versions of tiarmprofdata and tiarmcov are based on the LLVM project implementation. You can find more information about these utilities in the LLVM project’s documentation for llvm-profdata and llvm-cov.

Generating Instrumented Binaries

Source code must be built using tiarmclang with the -fprofile-instr-generate and -fcoverage-mapping options. For example:

tiarmclang -fprofile-instr-generate -fcoverage-mapping foo.cc -o foo

Retrieving the Counters From Memory

Once the executable has been loaded and executed one or more times, the counters should be retrieved from memory and written to a raw profile data file on the host. Counters are stored in an allocated memory section named __llvm_prf_cnts. This section is demarcated by the start and stop symbols __start___llvm_prf_cnts and __stop___llvm_prf_cnts, which allow the host to locate and read the counters in target memory. The data retrieved from memory should be saved to a file; this file is the raw profile counter file.

Retrieving counters from memory can be done in Code Composer Studio (CCS) using the following example script, which can be pasted into the CCS scripting console:

// Connect to the debug server and open a session on the target core.
// The session name depends on the debug probe and device in use.
var scriptEnv = Packages.com.ti.ccstudio.scripting.environment.ScriptingEnvironment.instance();
var server = scriptEnv.getServer("DebugServer.1");
var session = server.openSession("Texas Instruments XDS110 USB DebugProbe_0/CORTEX_M4_0");

// Locate the counter section through its start and stop symbols.
var cntStart = session.symbol.getAddress("__start___llvm_prf_cnts");
var cntStop = session.symbol.getAddress("__stop___llvm_prf_cnts");

// Read the whole section from memory page 0 as 8-bit values.
var cntContent = session.memory.readData(0, cntStart, 8, cntStop - cntStart);

// Write the bytes to "<executable>.cnt", replacing any previous contents.
var executable = session.symbol.getSymbolFileName();
var outFile = new Packages.java.io.RandomAccessFile(executable + ".cnt" , "rw");

outFile.setLength(0);
for each (var val in cntContent) {
    outFile.writeByte(Number(val));
}
outFile.close();

This example script will produce a raw profile counter file named after the executable using the “.cnt” suffix.
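Before processing the file further, it can be useful to sanity-check its contents on the host. The sketch below decodes a raw byte dump into counter values. It assumes that the dump contains nothing but 64-bit counters and that they are stored little-endian (typical for Arm Cortex-M targets, but an assumption here); decode_counters() is a hypothetical helper, not part of the tiarmclang tools.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: each counter occupies 8 bytes, least significant byte first. */
enum { COUNTER_BYTES = 8 };

/* Decode up to max_counters counters from a raw byte dump.
 * Returns the number of complete counters decoded; trailing bytes that do
 * not form a whole counter are ignored. */
size_t decode_counters(const unsigned char *bytes, size_t num_bytes,
                       uint64_t *out, size_t max_counters)
{
    size_t n = num_bytes / COUNTER_BYTES;
    if (n > max_counters)
        n = max_counters;

    for (size_t i = 0; i < n; i++) {
        uint64_t value = 0;
        /* Walk from the most significant byte down to the least. */
        for (int b = COUNTER_BYTES - 1; b >= 0; b--)
            value = (value << 8) | bytes[i * COUNTER_BYTES + (size_t)b];
        out[i] = value;
    }
    return n;
}
```

A file whose size is not a multiple of eight bytes, or whose decoded values look implausibly large, suggests that the wrong address range was read or that one of the assumptions above does not hold for the target.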

Alternatively, the counter data can be retrieved from memory using a function provided as part of the compiler runtime support, __llvm_profile_write_file(). This function writes the counters from the target to the host using runtime routines (fwrite()), producing a raw profile counter file with the default filename default.profraw. Any other means of downloading the data may also be used.
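A minimal sketch of calling the runtime function from application code follows. The prototype is assumed to match the one used by the LLVM project (no arguments, int result, 0 on success); check it against the runtime support shipped with your toolchain. ENABLE_COVERAGE_DUMP and dump_coverage_counters() are hypothetical names: the macro is meant to be defined by the build (for example with -DENABLE_COVERAGE_DUMP) only when the coverage options are in use, so that non-instrumented builds still link.

```c
#ifdef ENABLE_COVERAGE_DUMP
/* Assumed prototype, following the LLVM project's declaration. */
int __llvm_profile_write_file(void);
#endif

/* Write the counters to the host at a point chosen by the application,
 * for example after the last test has run.
 * Returns the runtime's result in a coverage build, or -1 when the
 * program was built without ENABLE_COVERAGE_DUMP. */
int dump_coverage_counters(void)
{
#ifdef ENABLE_COVERAGE_DUMP
    return __llvm_profile_write_file();
#else
    return -1;
#endif
}
```

Wrapping the call this way keeps the decision about when to dump the counters in one place and leaves the rest of the application unchanged between coverage and non-coverage builds.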

Processing the Raw Profile Counter Data Into an Indexed Profile Data File

An indexed profile data file should be produced for each executable that is run; it is produced based on a raw profile counter file that has the runtime counter data retrieved from memory (see “Retrieving the Counters From Memory” section above).

This is done by invoking the tiarmprofdata utility and specifying the raw profile counter file as well as the executable used to produce it. The executable is required because, in order to support embedded use cases, pertinent code coverage information is kept in non-allocatable sections of the executable and must be extracted from there. The result is an indexed profile data file. In the example below, the raw profile counter files used as input are app1.profcnts, app2.profcnts, and app3.profcnts. The resulting indexed profile data files are app1.profdata, app2.profdata, and app3.profdata, respectively.

tiarmprofdata merge -sparse -obj-file=app1.out app1.profcnts -o app1.profdata
tiarmprofdata merge -sparse -obj-file=app2.out app2.profcnts -o app2.profdata
tiarmprofdata merge -sparse -obj-file=app3.out app3.profcnts -o app3.profdata

An indexed profile data file for each executable must be produced before any profile data from multiple executables can be merged. If multiple executables have been run based on the same source code base, the corresponding indexed profile data files for each of the executables can then be merged into a single indexed profile data file.

tiarmprofdata merge -sparse app1.profdata app2.profdata app3.profdata -o app-merged.profdata

Wildcards can be used to identify the range of indexed profile data files used as input.

Visualization

In order to visualize the code coverage, the single merged indexed profile data file along with each of the corresponding executables must be given as input to the tiarmcov visualization tool. The visualization tool can be used to generate a dump of the source file along with a summary report in either HTML or Text format. Each executable must be specified individually using the “--object=<executable>” option.

HTML Format

When generating HTML output, a summary coverage report is also generated at the root of a directory tree that contains coverage data for each of the files. For the source-based coverage views, it is recommended to use the --show-expansions and --show-instantiations options to see the full view of all macro expansions and function template instantiations, respectively. In addition, branch coverage information can be included in the source-based view, and it can be represented in terms of execution count or percentage.

The following example will visualize coverage in HTML with macros and templates expanded; it will also include detailed branch coverage in terms of execution count.

tiarmcov show --format=html --show-expansions --show-instantiations --show-branches=count --object=./app1.out --object=./app2.out --object=./app3.out -instr-profile=app-merged.profdata --output-dir=/example/directory

Text Format

When generating Text output, the summary coverage report is generated using the separate tiarmcov report subcommand. For example, to view the source-based coverage view:

tiarmcov show --show-expansions --show-branches=count --object=./app1.out --object=./app2.out --object=./app3.out -instr-profile=app-merged.profdata

To view the report:

tiarmcov report --object=./app1.out --object=./app2.out --object=./app3.out -instr-profile=app-merged.profdata

Useful Visualization Options

Subcommands

  • export - Export instrprof file to structured format either as text (JSON) or as LCOV.
  • report - Summarize instrprof style coverage information.
  • show - Annotate source files using instrprof style coverage.

Function Filtering Options

  • --ignore-filename-regex=<string> - Skip source code files with file paths that match the given regular expression
  • --line-coverage-gt=<number> - Show code coverage only for functions with line coverage greater than the given threshold
  • --line-coverage-lt=<number> - Show code coverage only for functions with line coverage less than the given threshold
  • --name=<string> - Show code coverage only for functions with the given name
  • --name-regex=<string> - Show code coverage only for functions that match the given regular expression
  • --name-whitelist=<string> - Show code coverage only for functions listed in the given file
  • --region-coverage-gt=<number> - Show code coverage only for functions with region coverage greater than the given threshold
  • --region-coverage-lt=<number> - Show code coverage only for functions with region coverage less than the given threshold

General Options

  • --instr-profile=<string> - File with the profile data obtained after an instrumented run
  • --num-threads=<uint> - Number of merge threads to use (default: autodetect)
  • --object=<string> - Coverage executable or object file
  • --output-dir=<string> - Directory in which coverage information is written out
  • --path-equivalence=<string> - <from>,<to> Map coverage data paths to local source file paths
  • --project-title=<string> - Set project title for the coverage report
  • --show-branch-summary - Show branch condition statistics in summary table
  • --show-instantiation-summary - Show instantiation statistics in summary table
  • --show-region-summary - Show region statistics in summary table
  • --summary-only - Export only summary information for each source file

Source-Based Viewing Options (for tiarmcov show)

  • --show-branches=<value> - Show coverage for branch conditions, where <value> can be one of the following:
    • count - Show True/False counts
    • percent - Show True/False percent
  • --show-expansions - Show expanded source regions
  • --show-instantiations - Show function instantiations
  • --show-line-counts - Show the execution counts for each line
  • --show-line-counts-or-regions - Show the execution counts for each line, or the execution counts for each region on lines that have multiple regions
  • --show-regions - Show the execution counts for each region

Important Considerations for Branch Coverage

As documented, Source-Based Code Coverage as implemented in the LLVM project supports function coverage, line coverage, and region coverage (see Source-Based Code Coverage - Interpreting Reports for more details).

Branch Coverage is a new feature added to Source-Based Code Coverage.

  • Some other vendors define Branch Coverage as only covering Decisions that may include one or more logical operators. However, Branch Coverage in the tiarmclang compiler supports coverage for all leaf-level boolean expressions (expressions that cannot be broken down into simpler boolean expressions). For example, “x = (y == 2) || (z < 10)” is a boolean expression that is comprised of two conditions, each of which evaluates to either TRUE or FALSE. This support is functionally closer to GCC GCOV/LCOV support.
  • When showing branch coverage, each TRUE and FALSE condition represents a branch that is tied to how many times its corresponding condition evaluated to TRUE or FALSE. This can also be shown in terms of percentage.
44|      3|    if ((VAR1 == 0 && VAR2 == 2) || VAR3 == 34 || VAR1 == VAR3)
------------------
|  Branch (44:10): [True: 1, False: 2]
|  Branch (44:20): [True: 0, False: 1]
|  Branch (44:31): [True: 0, False: 3]
|  Branch (44:42): [True: 0, False: 3]
------------------
  • When viewing branch coverage details in a source-based visualization, it is recommended that users show all macro expansions (using the --show-expansions option), particularly since macros may contain hidden boolean expressions. In addition, macro expansions can be nested (macros are often defined in terms of other macros), as demonstrated in the following example. The coverage summary report will always include these macro-based boolean expressions in the overall branch coverage count for a function or source file.
58|      3|        MACRO2;
------------------
|  |    7|      5|#define MACRO2( MACRO)
|  |  ------------------
|  |  |  |    6|      2|#define MACRO (MACRO_CONDITION ? VAR2 : VAR1)
|  |  |  |  ------------------
|  |  |  |  |  |    5|      2|#define MACRO_CONDITION (VAR1 != 9)
|  |  |  |  |  |  ------------------
|  |  |  |  |  |  |  Branch (5:16): [True: 2, False: 0]
|  |  |  |  |  |  ------------------
|  |  |  |  ------------------
|  |  ------------------
|  |  |  Branch (7:17): [True: 2, False: 0]
|  |  ------------------
------------------
  • Coverage is not tracked for branch conditions that the compiler can fold to TRUE or FALSE since for these cases, branches are not generated. This matches the behavior of other code coverage vendors. In the source-based visualization, these branches will be displayed as [Folded - Ignored] so that users are informed about what happened.
38|      2|    if ((VAR1 == 3) && TRUE)
------------------
|  Branch (38:9): [True: 0, False: 2]
|  Branch (38:24): [Folded - Ignored]
------------------
  • Branch coverage is tied directly to branch-generating conditions in the source code. As such (unlike with GCOV), users should not see hidden branches that aren’t actually tied to the source code.
  • For switch statements, a branch region is generated for each switch case, including the default case. If there is no explicitly defined default case, a branch region is generated to correspond to the implicit default case that is generated by the compiler. The implicit branch region is tied to the line and column number of the switch statement condition (since no source code for the implicit case exists). In the example below, no explicit default case exists, and so a corresponding branch region for the implicit default case is created and tied to the switch condition on line 65.
65|      3|    switch (condition)
------------------
|  Branch (65:13): [True: 2, False: 1]
------------------
66|      3|    {
67|      1|        case 0:
------------------
|  Branch (67:9): [True: 1, False: 2]
------------------
68|      1|            printf("case0\n"); // fallthrough
69|      1|        case 2:
------------------
|  Branch (69:9): [True: 0, False: 3]
------------------
70|      1|                               // fallthrough
71|      1|
72|      1|        case 3:
------------------
|  Branch (72:9): [True: 0, False: 3]
------------------
73|      1|            printf("case3\n"); // fallthrough
74|      3|
75|      3|    }
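The switch report above corresponds to source of roughly the following shape. This is a sketch rather than the original test program: the function name and the returned message count are hypothetical additions that make the fallthrough behavior observable.

```c
#include <stdio.h>

/* Returns how many messages were printed for the given condition. */
int report_case(int condition)
{
    int printed = 0;

    switch (condition)      /* no default: an implicit one is generated */
    {
        case 0:
            printf("case0\n");
            printed++;
            /* fallthrough */
        case 2:
            /* fallthrough */
        case 3:
            printf("case3\n");
            printed++;
    }
    return printed;
}
```

Calling report_case() once with 0 and twice with values that match no case (for example 1 and 7) takes case 0 once and the implicit default twice, while case 2 and case 3 are never selected directly, which is the pattern of True and False counts shown in the report.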

Open Issues

  • Code Composer Studio Integration

    • Presently, CCS doesn’t have direct support for tiarmclang compiler Code Coverage, though support will be added soon. This support will make it very straightforward for users to build projects for code coverage, download counter data from memory, and visualize the data.
  • Counter Size

    • Counters are presently 64 bits in size, which may be too large for some embedded use cases.
    • Counters that have large counts may overflow either during execution or when counter data is merged together by the tiarmprofdata tool. When the counter data is merged, tiarmprofdata uses saturating addition, so the final value will reflect the largest possible value. This will affect the accuracy of the visualization.
  • Function Differences

    • Different function definitions across multiple executables that have the same function name will likely be reported as having “mismatched data”. This is a known issue in code coverage for common function names like main(). Care should be taken to filter out cases like this using tiarmcov’s filtering mechanism since each instance clearly represents a different function.
    • Two or more functions that share the same code base but are built differently, such that they contain different macro expansions, will be visualized as multiple instantiations of the same function. This doesn’t impede coverage.
  • Source Filtering

    • The source filtering facility implemented by tiarmcov isn’t as fully featured as that of other tools, like LCOV. Specifically, embedded filter tags aren’t supported (e.g., LCOV_EXCL_[START|STOP]). Please see the filtering options for more information (tiarmcov --help).
  • Branch Coverage

    • Future compiler enhancements will likely be implemented to minimize the number of counters actually used in nested boolean expressions such as ((A || B) && C).
    • Modified Condition/Decision Coverage is not supported.
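The saturating addition mentioned under Counter Size can be sketched as follows. This is an illustration of the technique for 64-bit counters, not the tiarmprofdata source, and saturating_add_u64() is a hypothetical name.

```c
#include <stdint.h>

/* Add two 64-bit counters, clamping at the largest representable value
 * instead of wrapping around to a small number. */
uint64_t saturating_add_u64(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;   /* unsigned overflow wraps and is well defined */
    return (sum < a) ? UINT64_MAX : sum;
}
```

A merged count equal to UINT64_MAX should therefore be read as "at least this many" rather than as an exact execution count.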