# The Programmer's Responsibility
Floating-point arithmetic is inherently trickier than integer or fixed-point
arithmetic. There are a lot more performance and precision gotchas, so the
compiler is not as free to optimize the code automatically. For this reason,
the user must be far more aware of the properties of floating-point arithmetic
to get good performance out of the compiler.
Beware: this topic is much deeper than it would seem; **this page barely
scratches the surface of things that the floating-point programmer needs to know.**
Issues to consider are:
* Rounding modes
* Floating-point exceptions
* Do not expect absolute precision; it is not possible in a finite format
* Negation is an operation: a leading minus sign is not part of a float (or integer) constant, so -1.0f is the negation operator applied to 1.0f
* printf rounds the values it prints
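A short sketch of the last points: 0.1 has no exact binary representation, so a float holds the nearest representable value, and printf rounds that stored value again when it formats it. The helper `prints_as` is a hypothetical name used only for illustration; it relies solely on the standard `snprintf` and `strcmp`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if formatting x with 'digits' significant
   digits ("%.*g") produces exactly the text 'expected', 0 otherwise. */
int prints_as(float x, int digits, const char *expected)
{
    char buf[64];
    /* A float passed to a variadic function such as snprintf is promoted to double. */
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strcmp(buf, expected) == 0;
}
```

For example, `prints_as(0.1f, 6, "0.1")` and `prints_as(0.1f, 9, "0.100000001")` are both true: six significant digits hide the representation error, while nine (enough to distinguish any two IEEE-32 values) reveal it.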
# The Compiler's Responsibility
The compiler must faithfully translate floating-point arithmetic so that the
computed value is exactly what the source code specifies. By default, the compiler is not allowed to
perform any optimization which might affect the result. This behavior can
sometimes severely limit optimization potential. See below for options that
give more aggressive optimization.
## Quality of implementation
The TI compiler strives to provide IEEE-754 support, but there are some
limitations. This is a quality-of-implementation (QoI) issue. In particular,
the compiler's run-time support (RTS) library for some ISAs doesn't always
handle special values, rounding, or accuracy correctly. For devices which have
IEEE-754 floating-point arithmetic support, the compiler can take advantage of
it and it will be as accurate as possible. However, some functions must still
be handled in the library. TI is striving to improve the QoI of the
floating-point handling. At this time, there is no other specific statement of
the correctness or speed of the TI compiler's floating-point capabilities. A
proper characterization of correctness and speed is one of the things to be
# Devices without floating-point arithmetic hardware
Some devices do not have floating-point arithmetic hardware, so C
floating-point types must be emulated in software. The emulation routines are provided
in the compiler's run-time support (RTS) library. These routines are much,
much slower than floating-point hardware, so you will see poor performance if
you use floating-point arithmetic on a device which does not support it in hardware.
Some ISAs have a variety of devices which may or may not support floating-point
arithmetic. If you are using a device which does support it, you must inform
the compiler or it will not be able to take advantage of it. Consult the
[C/C++ Compiler User Guide for your ISA](https://www.ti.com/tool/TI-CGT#technicaldocuments)
for a complete list of options.
Here are some sample FP-enabling options.
ISA | options
--- | ---
ARM | --float_support=VFPv3
ARM | --float_support=VFPv3D16
ARM | --float_support=FPv4SPD16
C2800 | --float_support=fpu32
C6000 | -mv6740
C6000 | -mv6600
# float vs. double vs. long double
Many customers struggle to achieve acceptable performance when using
floating point code. The most common problem is using double precision
operations instead of single precision. There is a significant performance
penalty when using double precision. For instance, on a Cortex-R4, the result
latency of a single precision multiply is 2 whereas a double precision multiply
In the TI ARM compiler (and all other EABI ARM compilers) the C/C++ type double
is used for double precision (64-bit) data and float is used for single
precision (32-bit) data. Some other hardware vendors define double as 32 bits,
so code ported from such a platform can unexpectedly generate slower double
precision operations. To generate single precision floating point instructions,
you must ensure that all data you intend to be single precision is declared as
float.
Once all of your data is defined as float, there are still cases where you may
unknowingly cause the compiler to generate double precision operations. The
most common issue is when floating point literals such as 3.14159 are used. In
C these literals are of type double, and if they are used in an expression
consisting of single precision operands, the operations will be promoted to
double precision. The proper way to specify a single precision literal is to
use an 'f' suffix, 3.14159f.
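A minimal sketch of the difference, assuming a C11 compiler (for `_Generic`). The function and macro names are hypothetical. `IS_FLOAT` and `IS_DOUBLE` report the type of an expression without evaluating it, which makes the promotion visible:

```c
/* Evaluate to 1 if the expression has the named type, 0 otherwise.
   _Generic inspects only the type; the expression is not evaluated. */
#define IS_FLOAT(expr)  _Generic((expr), float: 1, default: 0)
#define IS_DOUBLE(expr) _Generic((expr), double: 1, default: 0)

/* The unsuffixed literal 3.14159 has type double, so r is converted to
   double, both multiplications are done in double precision, and the
   result is converted back to float on return. */
float circle_area_slow(float r)
{
    return 3.14159 * r * r;
}

/* The 'f' suffix makes the literal a float, so the whole expression
   stays in single precision. */
float circle_area_fast(float r)
{
    return 3.14159f * r * r;
}
```

With `float r`, the expression `3.14159 * r` has type double, while `3.14159f * r` has type float.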
The functions declared in math.h such as sin(), cos(), sqrt(), etc. are double
precision routines. Calling them with float arguments converts the arguments to
double and performs the computation in double precision, which results in
significant overhead. The C99 standard specifies single precision versions of
these routines, which are implemented in the TI ARM compiler. Their names are
those of the double precision versions with an 'f' suffix: sinf(), cosf(),
sqrtf(), etc. Note that these routines are not TI specific; they are part of
the C99 standard.
Standard C has three real floating-point types:
* float
* double
* long double
If floating-point precision or speed are important to your application, you
need to be aware of the properties of each type for the ISA you are using, and
you also need to be aware of the type of each expression.
TI ISAs use either IEEE-32 or IEEE-64 to represent these types. On a given
device, IEEE-32 is faster but less precise than IEEE-64.
ISA | float (bits) | double (bits) | long double (bits)
--- | --- | --- | ---
ARM | 32 | 64 | 64
C2800 (COFF) | 32 | 32 | 64
C2800 (EABI) | 32 | 64 | 64
C6000 | 32 | 64 | 64
MSP | 32 | 64 | 64
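Note that some ISAs have 32-bit double or long double for legacy compatibility
reasons. The C standard requires double and long double to provide at least 10
decimal digits of precision (DBL_DIG and LDBL_DIG must be at least 10), which
IEEE-32 cannot provide, so these targets do not conform to the C standard with
respect to these types.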
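If your code relies on double being IEEE-64, a compile-time guard can catch a port to a target (such as C2800 COFF) where it is not. This is a minimal sketch assuming a C11 compiler for `_Static_assert`; the accessor function names are hypothetical, while `FLT_MANT_DIG`, `DBL_MANT_DIG`, and `LDBL_MANT_DIG` are standard `float.h` macros giving the significand width in bits.

```c
#include <float.h>

/* IEEE-64 has a 53-bit significand; IEEE-32 has 24. Refuse to build
   if double is narrower than IEEE-64 on this target. */
_Static_assert(DBL_MANT_DIG >= 53, "double is narrower than IEEE-64 on this target");

/* Hypothetical accessors reporting the significand width of each type. */
int float_mant_bits(void)       { return FLT_MANT_DIG; }
int double_mant_bits(void)      { return DBL_MANT_DIG; }
int long_double_mant_bits(void) { return LDBL_MANT_DIG; }
```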
## Making sure your program doesn't use double precision
The option --float_operations_allowed controls the precision of floating point
operations allowed in the compilation. The arguments are: none, 32, 64, all.
If --float_operations_allowed=32 is specified on the command line, the
compiler will issue an error if a double precision operation will be
generated. This can be used to ensure that double precision operations are not
accidentally introduced into an application.
# Special Values
IEEE floating-point representation has some special values. For a complete
description, see IEEE-754 (or ISO/IEC/IEEE-60559) and C99 (ISO/IEC 9899:1999).
* NaN (not a number)
* Inf (positive infinity)
* -Inf (negative infinity)
* -0.0 (negative zero)
* denormal (aka subnormal) numbers
These values may behave strangely in an arithmetic expression, so it may be
desirable to avoid an expression which will create one.
In particular, avoid generating a NaN value.
NaN (not a number) represents the fact that no information is known about the
value. It is the result of an expression that has no reasonable
interpretation, such as 0/0 or Inf/Inf. When a NaN is involved in an
arithmetic expression, the result is always NaN. When a NaN is compared to
another value, NaN is not equal to anything, including itself. Thus NaN==X is
false, and NaN!=X is true, even if X is NaN.
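A minimal sketch of these comparison rules; `make_nan` and `is_nan_by_compare` are hypothetical names. The self-comparison test works under strict IEEE semantics, but compilers told to assume NaNs never occur (for example GCC's `-ffast-math`) may fold it to a constant, so the standard `isnan` macro is the clearer way to ask the question.

```c
#include <math.h>

/* Produces a NaN at run time: 0/0 has no reasonable interpretation. */
double make_nan(double zero)
{
    return zero / zero;
}

/* NaN is the only value that compares unequal to itself. */
int is_nan_by_compare(double x)
{
    return x != x;
}
```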
# Not all algebraic idioms are valid
Floating-point arithmetic is full of cases where simple algebraic rules like
X==X or (X*Y)*Z==X*(Y*Z) do not hold. Typically, this is because one of the
inputs is a special value, but sometimes even normal numbers can cause this.
For this reason, the compiler is not allowed to perform every algebraic
simplification that might seem obvious. Here is a partial list of algebraic
rules that are true for integer arithmetic, but may not be true for
floating-point arithmetic:
* X==X is not equivalent to true if X could be NaN
* X!=X is not equivalent to false if X could be NaN
* (X*Y)*Z is not equivalent to X*(Y*Z) for some values of X,Y,Z (see below)
* -(X-Y) is not equivalent to Y-X if X and Y could be equal (the results are -0.0 and +0.0)
* X-X is not equivalent to 0.0 if X could be +Inf, -Inf, or NaN
* X/X is not equivalent to 1.0 if X could be +Inf, -Inf, NaN, 0.0, or -0.0
* X*0 is not equivalent to 0.0 if X could be +Inf, -Inf, NaN, or negative (a negative X, including -0.0, gives -0.0)
* X<Y is not equivalent to !(X>=Y) if either X or Y could be NaN
* ((X<0)?-X:X) is not equivalent to fabs(X) if X could be -0.0
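Several of these identities fail only in the sign of zero, which is easy to miss because -0.0 == +0.0 compares true. This sketch uses the standard `signbit` macro to tell the zeros apart; the helper names are hypothetical.

```c
#include <math.h>

/* Naive absolute value: (-0.0 < 0) is false, so -0.0 is returned unchanged,
   whereas fabs(-0.0) returns +0.0. */
double naive_abs(double x) { return (x < 0) ? -x : x; }

/* For equal x and y, x - y is +0.0, so -(x - y) is -0.0 while y - x is +0.0. */
double neg_diff(double x, double y) { return -(x - y); }
double rev_diff(double x, double y) { return y - x; }

/* A negative x times zero yields -0.0, not +0.0. */
double times_zero(double x) { return x * 0.0; }
```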
Some idioms do hold, but only under restricted circumstances. For example,
X/Y is equivalent to X*(1/Y) where Y is a floating-point constant and 1/Y is
exactly representable. 1/2 is exactly representable, but 1/3 is not; thus, the
optimizer will convert X/2 to X*0.5, but it will not convert X/3 to X*0.333333
(but see below).
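A small sketch of why the two cases differ, with hypothetical function names. Multiplying by 0.5 is exact scaling, so it always matches division by 2. The constant 1.0/3 is itself rounded, so x*(1.0/3) rounds twice and can differ from x/3 in the last bit; with IEEE-64 and x = 5.0 the two results are not equal.

```c
/* Safe: 1/2 is exactly representable, so these always agree. */
double half_div(double x) { return x / 2.0; }
double half_mul(double x) { return x * 0.5; }

/* Not safe: 1.0/3.0 is rounded before the multiplication. */
double third_div(double x) { return x / 3.0; }
double third_mul(double x) { return x * (1.0 / 3.0); }
```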
In summary, do not expect the compiler to perform most algebraic
transformations on your floating-point expressions. In many cases, you will
need to write the expression exactly as you expect it to be executed.
Algebraic re-association is changing (X*Y)*Z to X*(Y*Z). Unfortunately, in
floating-point arithmetic, this could change the result.
For example, in IEEE-32 arithmetic:
* (1.0e30f * 1.0e30f) * 1.0e-30f == +Inf
* 1.0e30f * (1.0e30f * 1.0e-30f) is approximately 1.0e30f
In the first expression, the intermediate product 1.0e60 exceeds the range of
IEEE-32 (the largest finite value is about 3.4e38), so it overflows to infinity,
and multiplying infinity by 1.0e-30f cannot bring it back.
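The sketch below groups the same operands two ways, once for multiplication (overflow) and once for addition (rounding). It assumes float expressions are evaluated in single precision (FLT_EVAL_METHOD == 0, as on typical SSE-based x86-64 hosts); with wider intermediate evaluation the overflow may not occur. The function names are hypothetical.

```c
#include <float.h>

/* The same three-operand product, grouped two ways. */
float mul_left(float x, float y, float z)  { return (x * y) * z; }
float mul_right(float x, float y, float z) { return x * (y * z); }

/* The same three-operand sum, grouped two ways. */
float add_left(float x, float y, float z)  { return (x + y) + z; }
float add_right(float x, float y, float z) { return x + (y + z); }
```

With x = 1.0e20f, y = -1.0e20f, z = 1.0f, `add_left` returns 1.0f, but `add_right` returns 0.0f because -1.0e20f + 1.0f rounds back to -1.0e20f.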
By default, the compiler is not allowed to make this transformation because it
could change the result. Unfortunately this means that the compiler is not
generally free to make profitable loop transformations that would effectively
re-associate a floating-point expression. The next section discusses how to
change this default behavior.
# Better Performance vs Strict IEEE Correctness
By default, the compiler is severely limited with respect to floating-point
optimizations, such as re-associating floating-point expressions, because such
optimizations could slightly change the result. For programs which can
tolerate this small change, you can use compiler options to instruct the
compiler to more aggressively optimize your code. You must take care to make
sure that your program will tolerate the loss of precision. Be aware that
small errors can accumulate into larger errors as the result is fed into
further expressions, especially in a loop.
See the [C/C++ Compiler User Guide for your ISA](https://www.ti.com/tool/TI-CGT#technicaldocuments)
for the default settings of these modes.
The main compiler option is the --fp_mode option. This option controls the
compiler's overall floating-point optimization strategy.
Relaxed mode prioritizes speed over strict correctness. In relaxed mode, the
compiler may perform speed optimizations at the expense of reducing the
precision of some calculations, typically a tiny amount. For instance, (X/3)
is not precisely equivalent to (X*(1.0/3)), but in relaxed mode, the compiler
is allowed to make this transformation anyway, as multiplication is much
faster than division.
Strict mode enforces strict IEEE-754 semantics, disabling all unsafe
optimizations. The compiler will still perform optimizations that are provably
safe, such as (X/2) -> (X*0.5). Using --fp_mode=strict sets --fp_reassoc=off.
The --fp_reassoc option controls whether the compiler is allowed to
re-associate floating-point expressions. This is an important optimization for
ISAs which can perform more than one floating-point operation per cycle, such
as those with vector hardware.
In re-association mode, the compiler may freely re-associate floating-point
expressions. However, this can slightly change the result.
Using --fp_reassoc=off prevents the compiler from doing this specific optimization.
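To illustrate what re-association buys on vector or multi-issue hardware, the sketch below writes out by hand the kind of reordering a compiler might apply to a sum reduction: four independent partial sums that can proceed in parallel, combined at the end. This is an illustration under stated assumptions, not TI compiler output; the function names are hypothetical and `sum_reassociated` assumes n is a multiple of 4.

```c
#include <stddef.h>

/* Strict left-to-right sum: the evaluation order the C source specifies. */
float sum_sequential(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Re-associated form: four independent accumulators, combined at the end.
   Assumes n is a multiple of 4 to keep the example short. */
float sum_reassociated(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}
```

For small integers both functions agree, but for the input {1.0e8f, 1, 1, 1, -1.0e8f, 0, 0, 0} the sequential sum is 0.0f (each 1 is lost when added to 1.0e8f) while the re-associated sum is 3.0f.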
# Further Reading
* [Floating point (Wikipedia)](https://en.wikipedia.org/wiki/Floating_point)
* [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://download.oracle.com/docs/cd/E19422-01/819-3693/ncg_goldberg.html)