2.9.1. Arm-Specific Intrinsics

2.9.1.1. ACLE Compiler Intrinsics

The ACLE compiler intrinsics are fully documented in the Arm C Language Extensions - Release ACLE Q3 2020. A summary of each ACLE compiler intrinsic supported by the tiarmclang compiler is provided below. Note that many of the intrinsics documented in the ACLE specification are available only on Arm processors that tiarmclang does not currently support.

Note

#include <arm_acle.h>

The <arm_acle.h> header file should be included in your compilation unit before using any of the ACLE intrinsics.

2.9.1.1.1. Data-Processing Intrinsics

2.9.1.1.1.1. Rotate and Bit-Manipulation Intrinsics

__ror, __rorl, __rorll

Signatures

uint32_t      __ror(uint32_t x, uint32_t y);
unsigned long __rorl(unsigned long x, unsigned long y);
uint64_t      __rorll(uint64_t x, uint64_t y);

Description

Rotate x right by y bits; return the result.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5
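
As a rough illustration of the rotate semantics, the following portable C sketch models a 32-bit rotate right. The name ref_ror32 is hypothetical and is not part of <arm_acle.h>; it assumes, as ACLE describes, that the rotation amount is taken modulo the operand width. On a supported target you would call __ror directly.

```c
#include <stdint.h>

/* Hypothetical portable model of a 32-bit rotate right; this is an
   illustration, not the tiarmclang implementation of __ror. */
static uint32_t ref_ror32(uint32_t x, uint32_t y)
{
    y &= 31u;                 /* rotation amount is taken modulo 32 */
    if (y == 0u) {
        return x;             /* avoid an undefined shift by 32 bits */
    }
    return (x >> y) | (x << (32u - y));
}
```

With this model, rotating 0x12345678 right by 8 bits moves the low byte 0x78 into the top byte.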

__clz, __clzl, __clzll

Signatures

unsigned int __clz(uint32_t x);
unsigned int __clzl(unsigned long x);
unsigned int __clzll(uint64_t x);

Description

Return the number of leading zero bits in x. If x == 0, then the return value is (width of x).

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5
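
The zero-input case is easy to overlook, so a small portable sketch may help. The function ref_clz32 below is a hypothetical model of the 32-bit variant, written only to show the counting rule and the result for x == 0; it is not how the compiler implements __clz.

```c
#include <stdint.h>

/* Hypothetical portable model of __clz for a 32-bit operand: count the
   zero bits above the most significant set bit. */
static unsigned int ref_clz32(uint32_t x)
{
    unsigned int n = 0u;
    if (x == 0u) {
        return 32u;           /* documented result: the width of x */
    }
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;              /* move the next bit into the top position */
        n++;
    }
    return n;
}
```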

__cls, __clsl, __clsll

Signatures

unsigned int __cls(uint32_t x);
unsigned int __clsl(unsigned long x);
unsigned int __clsll(uint64_t x);

Description

Return the number of leading sign bits in x. If x == 0, then the return value is ((width of x) - 1).

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5
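
"Leading sign bits" means the bits directly below the sign bit that have the same value as the sign bit. The sketch below models that rule for a 32-bit operand. The name ref_cls32 is hypothetical and the code is a portable illustration rather than the compiler's implementation.

```c
#include <stdint.h>

/* Hypothetical portable model of __cls for a 32-bit operand: count the
   bits below bit 31 that match the sign bit, stopping at the first
   bit that differs. */
static unsigned int ref_cls32(uint32_t x)
{
    /* Invert negative values so that copies of the sign bit become zeros. */
    uint32_t t = (x & 0x80000000u) ? ~x : x;
    uint32_t mask = 0x40000000u;      /* start just below the sign bit */
    unsigned int n = 0u;

    while (mask != 0u && (t & mask) == 0u) {
        n++;
        mask >>= 1;
    }
    return n;
}
```

Under this model both 0 and 0xffffffff give 31, which matches the ((width of x) - 1) rule for x == 0.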

__rev, __revl, __revll

Signatures

uint32_t      __rev(uint32_t x);
unsigned long __revl(unsigned long x);
uint64_t      __revll(uint64_t x);

Description

Reverse the byte order within the argument x; return the result.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__rev16, __rev16l, __rev16ll

Signatures

uint32_t      __rev16(uint32_t x);
unsigned long __rev16l(unsigned long x);
uint64_t      __rev16ll(uint64_t x);

Description

Reverse the byte order within each 16-bit half-word of the input argument x; return the result. For example,

__rev16(0x12345678) -> 0x34127856
__rev16ll(0x1122334455667788) -> 0x2211443366558877

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5
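
The two examples above can be reproduced with a portable mask-and-shift model. The functions ref_rev16 and ref_rev16ll are hypothetical names used only for this sketch; they are not part of <arm_acle.h>.

```c
#include <stdint.h>

/* Hypothetical portable model of __rev16: swap the two bytes inside
   each 16-bit half-word of a 32-bit value. */
static uint32_t ref_rev16(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* The same operation applied to the four half-words of a 64-bit value. */
static uint64_t ref_rev16ll(uint64_t x)
{
    return ((x & 0x00ff00ff00ff00ffull) << 8) |
           ((x & 0xff00ff00ff00ff00ull) >> 8);
}
```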

__revsh

Signatures

int16_t __revsh(int16_t x);

Description

Reverse the byte order within the 16-bit input argument x; return the signed result.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__rbit, __rbitl, __rbitll

Signatures

uint32_t      __rbit(uint32_t x);
unsigned long __rbitl(unsigned long x);
uint64_t      __rbitll(uint64_t x);

Description

Reverse the bits within the input argument x; return the result.

Availability

Cortex-M0/M0+/R4/R5
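
Bit reversal differs from byte reversal: bit 0 swaps with bit 31, bit 1 with bit 30, and so on. The sketch below is a hypothetical portable model (ref_rbit32 is not a real intrinsic) that makes the difference concrete.

```c
#include <stdint.h>

/* Hypothetical portable model of __rbit for a 32-bit operand: shift
   bits out of the bottom of x and into the bottom of r, so the first
   bit taken from x ends up at the top of r. */
static uint32_t ref_rbit32(uint32_t x)
{
    uint32_t r = 0u;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```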

2.9.1.1.1.2. 16-Bit Multiplication Intrinsics

__smulbb

Signatures

int32_t __smulbb(int32_t x, int32_t y);

Description

result = (x & 0xffff) * (y & 0xffff)

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smulbt

Signatures

int32_t __smulbt(int32_t x, int32_t y);

Description

result = (x & 0xffff) * (y >> 16)

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smultb

Signatures

int32_t __smultb(int32_t x, int32_t y);

Description

result = (x >> 16) * (y & 0xffff)

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smultt

Signatures

int32_t __smultt(int32_t x, int32_t y);

Description

result = (x >> 16) * (y >> 16)

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smulwb

Signatures

int32_t __smulwb(int32_t x, int32_t y);

Description

result = (x * (int16_t)(y & 0xffff)) >> 16

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smulwt

Signatures

int32_t __smulwt(int32_t x, int32_t y);

Description

result = (x * (int16_t)(y >> 16)) >> 16

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5
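
The formulas in this section treat each selected half-word as a signed 16-bit value. The following portable sketch spells that out for three representative cases. All names (ref_bottom, ref_top, ref_smulbb, ref_smultt, ref_smulwb) are hypothetical. The code assumes the two's-complement narrowing conversions and arithmetic right shift of negative values that gcc and clang provide.

```c
#include <stdint.h>

/* Bottom (bits 0..15) and top (bits 16..31) half-words, sign-extended. */
static int32_t ref_bottom(int32_t v)
{
    return (int16_t)(uint16_t)((uint32_t)v & 0xffffu);
}

static int32_t ref_top(int32_t v)
{
    return (int16_t)(uint16_t)((uint32_t)v >> 16);
}

/* bottom * bottom and top * top: a 16x16 product always fits in 32 bits. */
static int32_t ref_smulbb(int32_t x, int32_t y)
{
    return ref_bottom(x) * ref_bottom(y);
}

static int32_t ref_smultt(int32_t x, int32_t y)
{
    return ref_top(x) * ref_top(y);
}

/* 32x16 multiply: form the 48-bit product, then keep its top 32 bits. */
static int32_t ref_smulwb(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * ref_bottom(y);
    return (int32_t)(product >> 16);
}
```

Note that a bottom half-word of 0xffff is read as -1, not 65535, in this model.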

2.9.1.1.1.3. Saturating Intrinsics

__ssat

Signatures

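int32_t __ssat(int32_t x, unsigned int y);

Description

Saturate x to a signed bit width, y, in the range [1..32]; y must be a compile-time constant. The Q bit is set if the operation saturates.

Availability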

Cortex-M0/M0+/M3/M4/M33/R4/R5

__usat

Signatures

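uint32_t __usat(int32_t x, unsigned int y);

Description

Saturate x to an unsigned bit width, y, in the range [0..31]; y must be a compile-time constant. The input value is signed and the output value is non-negative, with all negative inputs going to zero. The Q bit is set if the operation saturates.

Availability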

Cortex-M0/M0+/M3/M4/M33/R4/R5
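
ACLE describes __ssat as clamping a signed value to a signed bit width of 1 to 32 bits, and __usat as clamping a signed value to an unsigned bit width of 0 to 31 bits. The sketch below models those two clamps in portable C. The names ref_ssat and ref_usat are hypothetical, the bit width is an ordinary parameter here rather than a compile-time constant, and the Q bit is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of __ssat: clamp x to the signed range
   [-2^(bits-1) .. 2^(bits-1) - 1], with bits in 1..32. */
static int32_t ref_ssat(int32_t x, unsigned int bits)
{
    int64_t max = ((int64_t)1 << (bits - 1u)) - 1;
    int64_t min = -max - 1;
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* Hypothetical model of __usat: clamp x to the unsigned range
   [0 .. 2^bits - 1], with bits in 0..31. Negative inputs become 0. */
static uint32_t ref_usat(int32_t x, unsigned int bits)
{
    int64_t max = ((int64_t)1 << bits) - 1;
    if (x < 0)   return 0u;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}
```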

__qadd

Signatures

int32_t __qadd(int32_t x, int32_t y);

Description

Add x and y with saturation; return result and set Q bit if the addition saturates.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__qsub

Signatures

int32_t __qsub(int32_t x, int32_t y);

Description

Subtract y from x with saturation; return result and set Q bit if the subtraction saturates.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__qdbl

Signatures

int32_t __qdbl(int32_t x);

Description

Add x to itself with saturation; return result and set Q bit if the addition saturates.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5
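
The saturating add, subtract, and double operations can be sketched in portable C by computing in 64 bits and clamping. Everything named here is hypothetical: ref_q_flag stands in for the sticky Q bit, and ref_qadd, ref_qsub, and ref_qdbl model the intrinsics of the corresponding names. This is an illustration of the arithmetic, not the compiler's implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit: set on saturation and
   never cleared by the operations below. */
static int ref_q_flag = 0;

/* Clamp a 64-bit intermediate to the int32_t range, recording saturation. */
static int32_t ref_sat32(int64_t v)
{
    if (v > INT32_MAX) { ref_q_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { ref_q_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

static int32_t ref_qadd(int32_t x, int32_t y) { return ref_sat32((int64_t)x + y); }
static int32_t ref_qsub(int32_t x, int32_t y) { return ref_sat32((int64_t)x - y); }
static int32_t ref_qdbl(int32_t x)            { return ref_qadd(x, x); }
```

Because the flag is sticky in this model, a sequence of operations can be checked for saturation once at the end rather than after every step.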

__smlabb

Signatures

int32_t __smlabb(int32_t x, int32_t y, int32_t z);

Description

result = ((x & 0xffff) * (y & 0xffff)) + z

Set Q bit if addition overflows.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smlabt

Signatures

int32_t __smlabt(int32_t x, int32_t y, int32_t z);

Description

result = ((x & 0xffff) * (y >> 16)) + z

Set Q bit if addition overflows.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smlatb

Signatures

int32_t __smlatb(int32_t x, int32_t y, int32_t z);

Description

result = ((x >> 16) * (y & 0xffff)) + z

Set Q bit if addition overflows.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smlatt

Signatures

int32_t __smlatt(int32_t x, int32_t y, int32_t z);

Description

result = ((x >> 16) * (y >> 16)) + z

Set Q bit if addition overflows.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smlawb

Signatures

int32_t __smlawb(int32_t x, int32_t y, int32_t z);

Description

result = ((x * (int16_t)(y & 0xffff)) >> 16) + z

Set Q bit if addition overflows.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

__smlawt

Signatures

int32_t __smlawt(int32_t x, int32_t y, int32_t z);

Description

result = ((x * (int16_t)(y >> 16)) >> 16) + z

Set Q bit if addition overflows.

Availability

Cortex-M0/M0+/M3/M4/M33/R4/R5

2.9.1.1.1.4. 32-Bit SIMD Intrinsics

Note

Data Types for 32-Bit SIMD Intrinsics

The <arm_acle.h> header file defines the following types:

typedef int32_t int8x4_t;
typedef int32_t int16x2_t;
typedef uint32_t uint8x4_t;
typedef uint32_t uint16x2_t;

These are used in the descriptions below to indicate that a 32-bit term in an arithmetic operation is being broken into two 16-bit half-words or four 8-bit bytes. In the intrinsic descriptions below, terms refer explicitly to the bits of the 32-bit input argument or return value. For example,

result[16..31] = x[16..31] + (int16_t)y[16..23]
result[0..15] = x[0..15] + (int16_t)y[0..7]

indicates that the top 16 bits of x are added to byte 2 of y (y[16..23]) and the result is placed in the top 16 bits of the return value, while the bottom 16 bits of x are added to byte 0 of y (y[0..7]) and the result is placed in the bottom 16 bits of the return value (see the description of the __sxtab16 intrinsic below).
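
The bit-range notation maps directly onto shifts and masks. The sketch below implements the two-line example above in portable C. The helpers ref_s16, ref_s8, and ref_pack16, and the model function ref_sxtab16, are hypothetical names; plain uint32_t is used for the packed values so that the shifts are well defined, and the narrowing casts assume the usual two's-complement behavior.

```c
#include <stdint.h>

/* Signed half-word starting at bit position shift (0 or 16). */
static int32_t ref_s16(uint32_t v, unsigned int shift)
{
    return (int16_t)(uint16_t)(v >> shift);
}

/* Signed byte starting at bit position shift (0, 8, 16, or 24). */
static int32_t ref_s8(uint32_t v, unsigned int shift)
{
    return (int8_t)(uint8_t)(v >> shift);
}

/* Pack two half-word results, keeping only the low 16 bits of each. */
static uint32_t ref_pack16(int32_t hi, int32_t lo)
{
    return (((uint32_t)hi & 0xffffu) << 16) | ((uint32_t)lo & 0xffffu);
}

/* result[16..31] = x[16..31] + (int16_t)y[16..23]
   result[0..15]  = x[0..15]  + (int16_t)y[0..7]   */
static uint32_t ref_sxtab16(uint32_t x, uint32_t y)
{
    int32_t hi = ref_s16(x, 16u) + ref_s8(y, 16u);
    int32_t lo = ref_s16(x, 0u) + ref_s8(y, 0u);
    return ref_pack16(hi, lo);
}
```

In this model a y byte of 0xff is sign-extended to -1 before the addition, and each half-word sum wraps rather than saturates.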

__ssat16

Signatures

int16x2_t __ssat16(int16x2_t x, unsigned int y);

Description

Saturate x[16..31] and x[0..15] to a bit width, y, in the range [1..16]. The Q bit is set if either operation saturates.

Availability

Cortex-M4/M33/R4/R5

__usat16

Signatures

int16x2_t __usat16(int16x2_t x, unsigned int y);

Description

Saturate x[16..31] and x[0..15] to a bit width, y, in the range [0..15]. The input values are signed and the output values are non-negative, with all negative inputs going to zero. The Q bit is set if either operation saturates.

Availability

Cortex-M4/M33/R4/R5

__sxtab16

Signatures

int16x2_t __sxtab16(int16x2_t x, int8x4_t y);

Description

result[16..31] = x[16..31] + (int16_t)y[16..23]
result[0..15] = x[0..15] + (int16_t)y[0..7]

Availability

Cortex-M4/M33/R4/R5

__sxtb16

Signatures

int16x2_t __sxtb16(int8x4_t x);

Description

result[16..31] = (int16_t)x[16..23]
result[0..15] = (int16_t)x[0..7]

Availability

Cortex-M4/M33/R4/R5

__uxtab16

Signatures

uint16x2_t __uxtab16(uint16x2_t x, uint8x4_t y);

Description

result[16..31] = x[16..31] + (uint16_t)y[16..23]
result[0..15] = x[0..15] + (uint16_t)y[0..7]

Availability

Cortex-M4/M33/R4/R5

__uxtb16

Signatures

uint16x2_t __uxtb16(uint8x4_t x);

Description

result[16..31] = (uint16_t)x[16..23]
result[0..15] = (uint16_t)x[0..7]

Availability

Cortex-M4/M33/R4/R5

__sel

Signatures

uint8x4_t  __sel(uint8x4_t x, uint8x4_t y);
uint16x2_t __sel(uint16x2_t x, uint16x2_t y);

Description

Selects each byte/half-word of the result from either x or y based on the values of the GE bits. For each byte, if the corresponding GE bit is set, then the byte from the first operand (x) is selected. If the corresponding GE bit is clear, then the byte from the second operand (y) is selected.

In the case of half-word operations, two GE bits are set/cleared per half-word operation, so the __sel* intrinsic can be used for either uint8x4_t or uint16x2_t type data.

Availability

Cortex-M4/M33/R4/R5
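
A common use of the GE bits is a per-byte maximum: a byte-wise unsigned subtraction sets each GE bit when the corresponding byte of the first operand is greater than or equal to that of the second, and a following select picks the larger byte. The sketch below models that pairing in portable C. All names are hypothetical (ref_ge stands in for the four GE bits), and the GE rule used is the no-borrow rule from the Arm architecture description of the byte-wise unsigned subtract.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GE flags: bit i belongs to byte lane i. */
static unsigned int ref_ge = 0u;

/* Model of a byte-wise unsigned subtract that records, per lane,
   whether the subtraction did not borrow (x byte >= y byte). */
static uint32_t ref_usub8(uint32_t x, uint32_t y)
{
    uint32_t result = 0u;
    ref_ge = 0u;
    for (unsigned int i = 0u; i < 4u; i++) {
        uint32_t a = (x >> (8u * i)) & 0xffu;
        uint32_t b = (y >> (8u * i)) & 0xffu;
        if (a >= b) {
            ref_ge |= 1u << i;
        }
        result |= ((a - b) & 0xffu) << (8u * i);
    }
    return result;
}

/* Model of the select: take the byte from x where GE is set, else from y. */
static uint32_t ref_sel(uint32_t x, uint32_t y)
{
    uint32_t result = 0u;
    for (unsigned int i = 0u; i < 4u; i++) {
        uint32_t mask = 0xffu << (8u * i);
        result |= ((ref_ge >> i) & 1u) ? (x & mask) : (y & mask);
    }
    return result;
}

/* Per-byte unsigned maximum built from the two models above. */
static uint32_t ref_max_u8x4(uint32_t x, uint32_t y)
{
    (void)ref_usub8(x, y);    /* only the GE side effect is needed */
    return ref_sel(x, y);
}
```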

__qadd8

Signatures

int8x4_t __qadd8(int8x4_t x, int8x4_t y);

Description

result[24..31] = x[24..31] + y[24..31]
result[16..23] = x[16..23] + y[16..23]
result[8..15] = x[8..15] + y[8..15]
result[0..7] = x[0..7] + y[0..7]

Each signed addition result saturates to the range [-128..127].

Availability

Cortex-M4/M33/R4/R5
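
The per-byte saturation can be sketched in portable C as follows. The name ref_qadd8 is hypothetical, uint32_t is used for the packed operands so the shifts are well defined, and the narrowing casts assume two's-complement behavior. Each lane is handled independently, so one lane saturating does not affect its neighbors.

```c
#include <stdint.h>

/* Hypothetical portable model of __qadd8: add the four signed bytes of
   x and y lane by lane, clamping each sum to [-128..127]. */
static uint32_t ref_qadd8(uint32_t x, uint32_t y)
{
    uint32_t result = 0u;
    for (unsigned int shift = 0u; shift < 32u; shift += 8u) {
        int32_t sum = (int8_t)(uint8_t)(x >> shift) +
                      (int8_t)(uint8_t)(y >> shift);
        if (sum > 127)  sum = 127;
        if (sum < -128) sum = -128;
        result |= ((uint32_t)sum & 0xffu) << shift;
    }
    return result;
}
```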

__qsub8

Signatures

int8x4_t __qsub8(int8x4_t x, int8x4_t y);

Description

result[24..31] = x[24..31] - y[24..31]
result[16..23] = x[16..23] - y[16..23]
result[8..15] = x[8..15] - y[8..15]
result[0..7] = x[0..7] - y[0..7]

Each signed subtraction result saturates to the range [-128..127].

Availability

Cortex-M4/M33/R4/R5

__sadd8

Signatures

int8x4_t __sadd8(int8x4_t x, int8x4_t y);

Description

result[24..31] = x[24..31] + y[24..31]
result[16..23] = x[16..23] + y[16..23]
result[8..15] = x[8..15] + y[8..15]
result[0..7] = x[0..7] + y[0..7]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__shadd8

Signatures

int8x4_t __shadd8(int8x4_t x, int8x4_t y);

Description

result[24..31] = (x[24..31] + y[24..31]) / 2
result[16..23] = (x[16..23] + y[16..23]) / 2
result[8..15] = (x[8..15] + y[8..15]) / 2
result[0..7] = (x[0..7] + y[0..7]) / 2

Availability

Cortex-M4/M33/R4/R5

__shsub8

Signatures

int8x4_t __shsub8(int8x4_t x, int8x4_t y);

Description

result[24..31] = (x[24..31] - y[24..31]) / 2
result[16..23] = (x[16..23] - y[16..23]) / 2
result[8..15] = (x[8..15] - y[8..15]) / 2
result[0..7] = (x[0..7] - y[0..7]) / 2

Availability

Cortex-M4/M33/R4/R5

__ssub8

Signatures

int8x4_t __ssub8(int8x4_t x, int8x4_t y);

Description

result[24..31] = x[24..31] - y[24..31]
result[16..23] = x[16..23] - y[16..23]
result[8..15] = x[8..15] - y[8..15]
result[0..7] = x[0..7] - y[0..7]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__uadd8

Signatures

uint8x4_t __uadd8(uint8x4_t x, uint8x4_t y);

Description

result[24..31] = x[24..31] + y[24..31]
result[16..23] = x[16..23] + y[16..23]
result[8..15] = x[8..15] + y[8..15]
result[0..7] = x[0..7] + y[0..7]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__uhadd8

Signatures

uint8x4_t __uhadd8(uint8x4_t x, uint8x4_t y);

Description

result[24..31] = (x[24..31] + y[24..31]) / 2
result[16..23] = (x[16..23] + y[16..23]) / 2
result[8..15] = (x[8..15] + y[8..15]) / 2
result[0..7] = (x[0..7] + y[0..7]) / 2

Availability

Cortex-M4/M33/R4/R5

__uhsub8

Signatures

uint8x4_t __uhsub8(uint8x4_t x, uint8x4_t y);

Description

result[24..31] = (x[24..31] - y[24..31]) / 2
result[16..23] = (x[16..23] - y[16..23]) / 2
result[8..15] = (x[8..15] - y[8..15]) / 2
result[0..7] = (x[0..7] - y[0..7]) / 2

Availability

Cortex-M4/M33/R4/R5

__uqadd8

Signatures

uint8x4_t __uqadd8(uint8x4_t x, uint8x4_t y);

Description

result[24..31] = x[24..31] + y[24..31]
result[16..23] = x[16..23] + y[16..23]
result[8..15] = x[8..15] + y[8..15]
result[0..7] = x[0..7] + y[0..7]

Result of each unsigned addition saturates to the range of [0..0xff].

Availability

Cortex-M4/M33/R4/R5

__uqsub8

Signatures

uint8x4_t __uqsub8(uint8x4_t x, uint8x4_t y);

Description

result[24..31] = x[24..31] - y[24..31]
result[16..23] = x[16..23] - y[16..23]
result[8..15] = x[8..15] - y[8..15]
result[0..7] = x[0..7] - y[0..7]

Result of each unsigned subtraction saturates to the range of [0..0xff].

Availability

Cortex-M4/M33/R4/R5

__usub8

Signatures

uint8x4_t __usub8(uint8x4_t x, uint8x4_t y);

Description

result[24..31] = x[24..31] - y[24..31]
result[16..23] = x[16..23] - y[16..23]
result[8..15] = x[8..15] - y[8..15]
result[0..7] = x[0..7] - y[0..7]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__usad8

Signatures

uint32_t __usad8(uint8x4_t x, uint8x4_t y);

Description

result = abs(x[24..31] - y[24..31]) +
         abs(x[16..23] - y[16..23]) +
         abs(x[8..15] - y[8..15]) +
         abs(x[0..7] - y[0..7])

Availability

Cortex-M4/M33/R4/R5

__usada8

Signatures

uint32_t __usada8(uint8x4_t x, uint8x4_t y, uint32_t z);

Description

result = abs(x[24..31] - y[24..31]) +
         abs(x[16..23] - y[16..23]) +
         abs(x[8..15] - y[8..15]) +
         abs(x[0..7] - y[0..7]) + z

Availability

Cortex-M4/M33/R4/R5
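
The sum-of-absolute-differences operations are often used to compare small blocks of unsigned 8-bit data, such as pixels. A portable model is sketched below; ref_usad8 and ref_usada8 are hypothetical names and are not part of <arm_acle.h>.

```c
#include <stdint.h>

/* Hypothetical portable model of __usad8: sum the absolute differences
   of the four unsigned byte lanes of x and y. */
static uint32_t ref_usad8(uint32_t x, uint32_t y)
{
    uint32_t sum = 0u;
    for (unsigned int shift = 0u; shift < 32u; shift += 8u) {
        uint32_t a = (x >> shift) & 0xffu;
        uint32_t b = (y >> shift) & 0xffu;
        sum += (a > b) ? (a - b) : (b - a);
    }
    return sum;
}

/* Model of __usada8: the same sum added to an accumulator z. */
static uint32_t ref_usada8(uint32_t x, uint32_t y, uint32_t z)
{
    return ref_usad8(x, y) + z;
}
```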

__qadd16

Signatures

int16x2_t __qadd16(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] + y[16..31]
result[0..15] = x[0..15] + y[0..15]

Result of each signed addition saturates to the range [-32768..32767].

Availability

Cortex-M4/M33/R4/R5

__qasx

Signatures

int16x2_t __qasx(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] + y[0..15]
result[0..15] = x[0..15] - y[16..31]

Result of each signed operation saturates to the range [-32768..32767].

Availability

Cortex-M4/M33/R4/R5
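
The "asx" intrinsics exchange the half-words of the second operand before adding and subtracting. The sketch below models __qasx in portable C, assuming each result saturates to the signed 16-bit range [-32768..32767] as ACLE describes for the signed saturating intrinsics. The names ref_lane16, ref_sat16, and ref_qasx are hypothetical.

```c
#include <stdint.h>

/* Signed half-word starting at bit position shift (0 or 16). */
static int32_t ref_lane16(uint32_t v, unsigned int shift)
{
    return (int16_t)(uint16_t)(v >> shift);
}

/* Clamp to the signed 16-bit range and return the 16-bit pattern. */
static uint32_t ref_sat16(int32_t v)
{
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (uint32_t)v & 0xffffu;
}

/* result[16..31] = sat(x[16..31] + y[0..15])
   result[0..15]  = sat(x[0..15]  - y[16..31]) */
static uint32_t ref_qasx(uint32_t x, uint32_t y)
{
    uint32_t hi = ref_sat16(ref_lane16(x, 16u) + ref_lane16(y, 0u));
    uint32_t lo = ref_sat16(ref_lane16(x, 0u) - ref_lane16(y, 16u));
    return (hi << 16) | lo;
}
```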

__qsax

Signatures

int16x2_t __qsax(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] - y[0..15]
result[0..15] = x[0..15] + y[16..31]

Result of each signed operation saturates to the range [-32768..32767].

Availability

Cortex-M4/M33/R4/R5

__qsub16

Signatures

int16x2_t __qsub16(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] - y[16..31]
result[0..15] = x[0..15] - y[0..15]

Result of each signed subtraction saturates to the range [-32768..32767].

Availability

Cortex-M4/M33/R4/R5

__sadd16

Signatures

int16x2_t __sadd16(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] + y[16..31]
result[0..15] = x[0..15] + y[0..15]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__sasx

Signatures

int16x2_t __sasx(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] + y[0..15]
result[0..15] = x[0..15] - y[16..31]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__shadd16

Signatures

int16x2_t __shadd16(int16x2_t x, int16x2_t y);

Description

result[16..31] = (x[16..31] + y[16..31]) / 2
result[0..15] = (x[0..15] + y[0..15]) / 2

Availability

Cortex-M4/M33/R4/R5

__shasx

Signatures

int16x2_t __shasx(int16x2_t x, int16x2_t y);

Description

result[16..31] = (x[16..31] + y[0..15]) / 2
result[0..15] = (x[0..15] - y[16..31]) / 2

Availability

Cortex-M4/M33/R4/R5

__shsax

Signatures

int16x2_t __shsax(int16x2_t x, int16x2_t y);

Description

result[16..31] = (x[16..31] - y[0..15]) / 2
result[0..15] = (x[0..15] + y[16..31]) / 2

Availability

Cortex-M4/M33/R4/R5

__shsub16

Signatures

int16x2_t __shsub16(int16x2_t x, int16x2_t y);

Description

result[16..31] = (x[16..31] - y[16..31]) / 2
result[0..15] = (x[0..15] - y[0..15]) / 2

Availability

Cortex-M4/M33/R4/R5

__ssax

Signatures

int16x2_t __ssax(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] - y[0..15]
result[0..15] = x[0..15] + y[16..31]

The GE bits are set according to the results.

Availability

Cortex-M4/M33/R4/R5

__ssub16

Signatures

int16x2_t __ssub16(int16x2_t x, int16x2_t y);

Description

result[16..31] = x[16..31] - y[16..31]
result[0..15] = x[0..15] - y[0..15]

The GE bits are set according to the results.

Availability

Cortex-M4/M33/R4/R5

__uadd16

Signatures

uint16x2_t __uadd16(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] + y[16..31]
result[0..15] = x[0..15] + y[0..15]

The GE bits are set according to the results.

Availability

Cortex-M4/M33/R4/R5

__uasx

Signatures

uint16x2_t __uasx(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] + y[0..15]
result[0..15] = x[0..15] - y[16..31]

Set GE bits according to the result of the unsigned addition.

Availability

Cortex-M4/M33/R4/R5

__uhadd16

Signatures

uint16x2_t __uhadd16(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = (x[16..31] + y[16..31]) / 2
result[0..15] = (x[0..15] + y[0..15]) / 2

Availability

Cortex-M4/M33/R4/R5

__uhasx

Signatures

uint16x2_t __uhasx(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = (x[16..31] + y[0..15]) / 2
result[0..15] = (x[0..15] - y[16..31]) / 2

Availability

Cortex-M4/M33/R4/R5

__uhsax

Signatures

uint16x2_t __uhsax(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = (x[16..31] - y[0..15]) / 2
result[0..15] = (x[0..15] + y[16..31]) / 2

Availability

Cortex-M4/M33/R4/R5

__uhsub16

Signatures

uint16x2_t __uhsub16(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = (x[16..31] - y[16..31]) / 2
result[0..15] = (x[0..15] - y[0..15]) / 2

Availability

Cortex-M4/M33/R4/R5

__uqadd16

Signatures

uint16x2_t __uqadd16(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] + y[16..31]
result[0..15] = x[0..15] + y[0..15]

Result of each unsigned addition saturates to the range [0..0xffff].

Availability

Cortex-M4/M33/R4/R5

__uqasx

Signatures

uint16x2_t __uqasx(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] + y[0..15]
result[0..15] = x[0..15] - y[16..31]

Result of each unsigned operation saturates to the range [0..0xffff].

Availability

Cortex-M4/M33/R4/R5

__uqsax

Signatures

uint16x2_t __uqsax(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] - y[0..15]
result[0..15] = x[0..15] + y[16..31]

Result of each unsigned operation saturates to the range [0..0xffff].

Availability

Cortex-M4/M33/R4/R5

__uqsub16

Signatures

uint16x2_t __uqsub16(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] - y[16..31]
result[0..15] = x[0..15] - y[0..15]

Result of each unsigned subtraction saturates to the range [0..0xffff].

Availability

Cortex-M4/M33/R4/R5

__usax

Signatures

uint16x2_t __usax(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] - y[0..15]
result[0..15] = x[0..15] + y[16..31]

Set GE bits according to the result of unsigned addition.

Availability

Cortex-M4/M33/R4/R5

__usub16

Signatures

uint16x2_t __usub16(uint16x2_t x, uint16x2_t y);

Description

result[16..31] = x[16..31] - y[16..31]
result[0..15] = x[0..15] - y[0..15]

Set GE bits according to the results.

Availability

Cortex-M4/M33/R4/R5

__smlad

Signatures

int32_t __smlad(int16x2_t x, int16x2_t y, int32_t z);

Description

result = (x[0..15] * y[0..15]) + (x[16..31] * y[16..31]) + z

Set Q bit if addition overflows.

Availability

Cortex-M4/M33/R4/R5
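
The dual multiply-accumulate is a two-element dot product plus an accumulator, which makes it a natural building block for filters over packed 16-bit samples. The sketch below is a portable model under stated assumptions: ref_half, ref_smlad, and ref_smlad_q are hypothetical names, ref_smlad_q stands in for the sticky Q bit, and the final narrowing assumes the two's-complement wrap that gcc and clang provide.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit. */
static int ref_smlad_q = 0;

/* Signed half-word starting at bit position shift (0 or 16). */
static int32_t ref_half(uint32_t v, unsigned int shift)
{
    return (int16_t)(uint16_t)(v >> shift);
}

/* result = (x[0..15] * y[0..15]) + (x[16..31] * y[16..31]) + z
   The sum is formed in 64 bits so overflow can be detected; the
   returned value wraps to 32 bits. */
static int32_t ref_smlad(uint32_t x, uint32_t y, int32_t z)
{
    int64_t sum = (int64_t)ref_half(x, 0u) * ref_half(y, 0u)
                + (int64_t)ref_half(x, 16u) * ref_half(y, 16u)
                + z;
    if (sum > INT32_MAX || sum < INT32_MIN) {
        ref_smlad_q = 1;
    }
    return (int32_t)(uint32_t)sum;
}
```

Unlike the saturating intrinsics, this model does not clamp the result on overflow; it only records that overflow happened.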

__smladx

Signatures

int32_t __smladx(int16x2_t x, int16x2_t y, int32_t z);

Description

result = (x[0..15] * y[16..31]) + (x[16..31] * y[0..15]) + z

Set Q bit if addition overflows.

Availability

Cortex-M4/M33/R4/R5

__smlald

Signatures

int64_t __smlald(int16x2_t x, int16x2_t y, int64_t z);

Description

result = (x[0..15] * y[0..15]) + (x[16..31] * y[16..31]) + z

Availability

Cortex-M4/M33/R4/R5

__smlaldx

Signatures

int64_t __smlaldx(int16x2_t x, int16x2_t y, int64_t z);

Description

result = (x[0..15] * y[16..31]) + (x[16..31] * y[0..15]) + z

Availability

Cortex-M4/M33/R4/R5

__smlsd

Signatures

int32_t __smlsd(int16x2_t x, int16x2_t y, int32_t z);

Description

result = (x[0..15] * y[0..15]) - (x[16..31] * y[16..31]) + z

Set Q bit if the addition overflows.

Availability

Cortex-M4/M33/R4/R5

__smlsdx

Signatures

int32_t __smlsdx(int16x2_t x, int16x2_t y, int32_t z);

Description

result = (x[0..15] * y[16..31]) - (x[16..31] * y[0..15]) + z

Set Q bit if the addition overflows.

Availability

Cortex-M4/M33/R4/R5

__smlsld

Signatures

int64_t __smlsld(int16x2_t x, int16x2_t y, int64_t z);

Description

result = (x[0..15] * y[0..15]) - (x[16..31] * y[16..31]) + z

Availability

Cortex-M4/M33/R4/R5

__smlsldx

Signatures

int64_t __smlsldx(int16x2_t x, int16x2_t y, int64_t z);

Description

result = (x[0..15] * y[16..31]) - (x[16..31] * y[0..15]) + z

Availability

Cortex-M4/M33/R4/R5

__smuad

Signatures

int32_t __smuad(int16x2_t x, int16x2_t y);

Description

result = (x[0..15] * y[0..15]) + (x[16..31] * y[16..31])

Set Q bit if the addition overflows.

Availability

Cortex-M4/M33/R4/R5

__smuadx

Signatures

int32_t __smuadx(int16x2_t x, int16x2_t y);

Description

result = (x[0..15] * y[16..31]) + (x[16..31] * y[0..15])

Set Q bit if the addition overflows.

Availability

Cortex-M4/M33/R4/R5

__smusd

Signatures

int32_t __smusd(int16x2_t x, int16x2_t y);

Description

result = (x[0..15] * y[0..15]) - (x[16..31] * y[16..31])

Availability

Cortex-M4/M33/R4/R5

__smusdx

Signatures

int32_t __smusdx(int16x2_t x, int16x2_t y);

Description

result = (x[0..15] * y[16..31]) - (x[16..31] * y[0..15])

Availability

Cortex-M4/M33/R4/R5

2.9.1.1.2. Floating-Point Data-Processing Intrinsics

__sqrt, __sqrtf

Signatures

double __sqrt(double x);
float  __sqrtf(float x);

Description

result = sqrt(x)

If x < 0, the result is a default NaN value, and a floating-point exception may be raised.

Availability

  • Cortex-M4 w/ FPv4 (-mfpu=fpv4-sp-d16)

  • Cortex-M33 w/ FPv5 (-mfpu=fpv5-sp-d16)

  • Cortex-R4/R5 w/ VFPv3 (-mfpu=vfpv3-d16)

__fma, __fmaf

Signatures

double __fma(double x, double y, double z);
float  __fmaf(float x, float y, float z);

Description

result = (x * y) + z

Availability

  • Cortex-M4 w/ FPv4 (-mfpu=fpv4-sp-d16)

  • Cortex-M33 w/ FPv5 (-mfpu=fpv5-sp-d16)

  • Cortex-R4/R5 w/ VFPv3 (-mfpu=vfpv3-d16)

__rintnf, __rintn

Signatures

double __rintn(double x);
float  __rintnf(float x);

Description

Round the double/float argument x to the nearest integral value, with ties rounding to even. The return type of these intrinsics is double/float, so the result should be cast to an integer type when it is assigned to an integer-type data object. For example,

int my_int = (int)__rintn(1.25);

Availability

  • Cortex-M4 w/ FPv4 (-mfpu=fpv4-sp-d16)

  • Cortex-M33 w/ FPv5 (-mfpu=fpv5-sp-d16)

  • Cortex-R4/R5 w/ VFPv3 (-mfpu=vfpv3-d16)

2.9.1.2. CMSE Support Intrinsics

2.9.1.3. Other Arm-Specific Compiler Intrinsics