3.2. Object Representation

For general information about data types, see Data Types. This section explains how various data objects are sized, aligned, and accessed.

3.2.1. Data Type Storage

The following table lists register and memory storage for various data types:

Table: Data Representation in Registers and Memory

Data Type                   Register Storage                  Memory Storage
--------------------------  --------------------------------  --------------------------------------------------
char, signed char           Bits 0-7 of register (Note 1)     8 bits aligned to 8-bit boundary
unsigned char, bool         Bits 0-7 of register              8 bits aligned to 8-bit boundary
short, signed short         Bits 0-15 of register (Note 1)    16 bits aligned to 16-bit (halfword) boundary
unsigned short, wchar_t     Bits 0-15 of register             16 bits aligned to 16-bit (halfword) boundary
int, signed int             Bits 0-31 of register             32 bits aligned to 32-bit (word) boundary
unsigned int                Bits 0-31 of register             32 bits aligned to 32-bit (word) boundary
long, signed long           Bits 0-31 of register             32 bits aligned to 32-bit (word) boundary
unsigned long               Bits 0-31 of register             32 bits aligned to 32-bit (word) boundary
long long                   Even/odd register pair            64 bits aligned to 32-bit (word) boundary (Note 2)
unsigned long long          Even/odd register pair            64 bits aligned to 32-bit (word) boundary (Note 2)
float                       Bits 0-31 of register             32 bits aligned to 32-bit (word) boundary
double                      Register pair                     64 bits aligned to 32-bit (word) boundary (Note 2)
long double                 Register pair                     64 bits aligned to 32-bit (word) boundary (Note 2)
struct                      Members are stored as their       Members are stored as their individual types
                            individual types require          require; aligned according to the member with
                                                              the most restrictive alignment requirement
array                       Members are stored as their       Members are stored as their individual types
                            individual types require          require; aligned to 32-bit (word) boundary. All
                                                              arrays inside a structure are aligned according
                                                              to the type of each element in the array
pointer to data member      Bits 0-31 of register             32 bits aligned to 32-bit (word) boundary
pointer to member function  Components stored as their        64 bits aligned to 32-bit (word) boundary
                            individual types require

Note 1) Negative values are sign-extended to bit 31.

Note 2) 64-bit data is aligned on a 64-bit boundary.

For details about the size of an enum type, see Enum Type Storage.

3.2.1.1. char and short Data Types (signed and unsigned)

The char, signed char, and unsigned char data types are stored in memory as a single byte and are loaded into and stored from bits 0-7 of a register (see the following figure). Objects defined as short or unsigned short are stored in memory as two bytes at a halfword-aligned (2-byte) address, and they are loaded into and stored from bits 0-15 of a register.

Figure: Char and Short Data Storage Format

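Note 1 of the table says that negative values of the signed types are sign-extended to bit 31 when they are loaded into a register. The following C sketch models, in portable C, what a 32-bit register holds after such a load. The function names are hypothetical and are not part of any compiler API. The unsigned cases are modeled as zero extension, which is what the ARM LDRB and LDRH instructions do (LDRSB and LDRSH are their sign-extending counterparts).

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the 32-bit register image
   produced by loading the given memory object. */

/* Signed byte: bit 7 is copied into bits 8-31 (sign extension). */
uint32_t load_signed_byte(int8_t mem)
{
    return (uint32_t)(int32_t)mem;
}

/* Unsigned byte: bits 8-31 are cleared (zero extension). */
uint32_t load_unsigned_byte(uint8_t mem)
{
    return (uint32_t)mem;
}

/* Signed halfword: bit 15 is copied into bits 16-31. */
uint32_t load_signed_half(int16_t mem)
{
    return (uint32_t)(int32_t)mem;
}

/* Unsigned halfword: bits 16-31 are cleared. */
uint32_t load_unsigned_half(uint16_t mem)
{
    return (uint32_t)mem;
}
```

For example, a signed char holding -1 reads back as 0xFFFFFFFF in the register, while an unsigned char holding 0xFF reads back as 0x000000FF.
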
3.2.1.2. float, int, and long Data Types (signed and unsigned)

The int, unsigned int, float, long and unsigned long data types are stored in memory as 32-bit objects at word (4 byte) aligned addresses. Objects of these types are loaded to and stored from bits 0-31 of a register, as shown in the following figure.

  • In big-endian mode, 4-byte objects are loaded to registers by moving the first byte (that is, the lower address) of memory to bits 24-31 of the register, moving the second byte of memory to bits 16-23, moving the third byte to bits 8-15, and moving the fourth byte to bits 0-7.

  • In little-endian mode, 4-byte objects are loaded to registers by moving the first byte (that is, the lower address) of memory to bits 0-7 of the register, moving the second byte to bits 8-15, moving the third byte to bits 16-23, and moving the fourth byte to bits 24-31.

Figure: 32-Bit Data Storage Format

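The two byte-to-bit mappings in the list above can be written out as shift-and-OR expressions. This is a minimal C sketch, assuming the four bytes are supplied in ascending address order. The names load_be32 and load_le32 are hypothetical helpers, not library functions.

```c
#include <stdint.h>

/* Hypothetical helper. Big-endian: the byte at the lowest address goes
   to bits 24-31, the next to bits 16-23, then 8-15, and the last to 0-7. */
uint32_t load_be32(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) |
           ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |
            (uint32_t)mem[3];
}

/* Hypothetical helper. Little-endian: the byte at the lowest address goes
   to bits 0-7, the next to bits 8-15, then 16-23, and the last to 24-31. */
uint32_t load_le32(const uint8_t mem[4])
{
    return  (uint32_t)mem[0]        |
           ((uint32_t)mem[1] << 8)  |
           ((uint32_t)mem[2] << 16) |
           ((uint32_t)mem[3] << 24);
}
```

With the bytes 0x11, 0x22, 0x33, 0x44 at increasing addresses, load_be32 yields 0x11223344 and load_le32 yields 0x44332211.
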
3.2.1.3. double, long double, and long long Data Types (signed and unsigned)

The double, long double, long long, and unsigned long long data types are stored in a pair of registers and are always referenced as a pair. In memory, these types are stored as 64-bit objects at word (4-byte) aligned addresses. For FPA mode, the word at the lowest address contains the sign bit, the exponent, and the most significant part of the mantissa, and the word at the higher address contains the least significant part of the mantissa. This is true regardless of the endianness of the target. For VFP mode, the words are ordered according to the endianness of the target.

Objects of these types are loaded into and stored from register pairs, as shown in the following figure. The most significant memory word contains the sign bit, the exponent, and the most significant part of the mantissa. The least significant memory word contains the least significant part of the mantissa.

Figure: Double-Precision Floating-Point Data Storage Format

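The word ordering for 64-bit floating-point values can be modeled by splitting the IEEE-754 bit pattern of a double into its high word (sign, exponent, and most significant part of the mantissa) and its low word. The sketch below assumes the host's double is a 64-bit IEEE-754 value; double_to_words and its high_word_first flag are hypothetical. Following the text, the high word comes first in FPA mode regardless of endianness, while in VFP mode the order follows the target's endianness (high word first on a big-endian target, low word first on a little-endian target).

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this model: double is a 64-bit IEEE-754 value. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: split d into the two 32-bit words it occupies in
   memory. words[0] is the word at the lower address.
   high_word_first != 0: FPA mode (any endianness), or VFP big-endian.
   high_word_first == 0: VFP mode on a little-endian target. */
void double_to_words(double d, int high_word_first, uint32_t words[2])
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);          /* raw IEEE-754 bit pattern */
    uint32_t hi = (uint32_t)(bits >> 32);    /* sign, exponent, high mantissa */
    uint32_t lo = (uint32_t)bits;            /* low part of the mantissa */
    words[0] = high_word_first ? hi : lo;
    words[1] = high_word_first ? lo : hi;
}
```

For 1.0 (bit pattern 0x3FF0000000000000), FPA ordering places the words 0x3FF00000 and 0x00000000 at increasing addresses, and VFP ordering on a little-endian target places 0x00000000 first and 0x3FF00000 second.
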
3.2.1.4. Pointer to Data Member Types

Pointer to data member objects are stored in memory like an unsigned int (32-bit) integral type. Their value is the byte offset of the data member within the class, plus 1. The value zero is reserved to represent the NULL pointer to data member.
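
The plus-1 encoding matters because the first data member of a class is at byte offset 0; without the adjustment, a pointer to that member would be indistinguishable from the NULL pointer to data member. The C sketch below models the encoding with offsetof. The point struct and the pdm_* helpers are hypothetical, and because C has no pointer-to-member type, the 32-bit value is represented as a plain uint32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class layout used for illustration. */
struct point {
    int x;   /* offset 0 */
    int y;
};

/* Hypothetical helper: stored value = byte offset + 1. */
uint32_t pdm_encode(size_t member_offset)
{
    return (uint32_t)member_offset + 1u;
}

/* The value 0 is reserved for the NULL pointer to data member. */
int pdm_is_null(uint32_t pdm)
{
    return pdm == 0u;
}

/* Recover the byte offset; valid only when pdm is not NULL. */
size_t pdm_offset(uint32_t pdm)
{
    return (size_t)(pdm - 1u);
}
```

A pointer to point::x is therefore stored as 1, which stays distinct from the NULL value 0.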

3.2.1.5. Pointer to Member Function Types

Pointer to member function objects are stored as a structure with three members, and the layout is equivalent to:

struct {
    short int d;
    short int i;
    union {
        void (*f)();
        long o;
    };
};

The member d is the offset to be added to the beginning of the class object to form the this pointer. The member i is the index into the virtual function table, offset by 1 so that the value 0 can represent the NULL pointer; its value is -1 if the function is non-virtual. The member f is the pointer to the member function when the function is non-virtual (that is, when i is -1). The member o is the offset of the virtual function pointer within the class object.

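Reading the three cases off the description above (i equal to 0 for NULL, -1 for a non-virtual function, and the virtual function table index plus 1 otherwise), a consumer of this layout could classify a pointer to member function roughly as follows. This is a C model only, not the code the compiler generates. The struct tag ptmf, the union name u, the enum, and the helper functions are hypothetical, and treating any negative i as non-virtual is an assumption, since the text defines only the value -1.

```c
/* Model of the documented layout. The union is named u here so the
   code is valid C99; the layout in the text uses an unnamed union. */
struct ptmf {
    short int d;            /* added to the object address to form this */
    short int i;            /* vtable index + 1; 0 = NULL; -1 = non-virtual */
    union {
        void (*f)(void);    /* function address, used when i == -1 */
        long o;             /* offset of the virtual function pointer */
    } u;
};

/* Hypothetical classification of a pointer to member function. */
enum ptmf_kind { PTMF_NULL, PTMF_NONVIRTUAL, PTMF_VIRTUAL };

enum ptmf_kind ptmf_classify(const struct ptmf *p)
{
    if (p->i == 0)
        return PTMF_NULL;        /* 0 is reserved for the NULL pointer */
    if (p->i > 0)
        return PTMF_VIRTUAL;     /* i holds the vtable index + 1 */
    return PTMF_NONVIRTUAL;      /* the text defines -1 for this case */
}

/* Zero-based vtable index; meaningful only for PTMF_VIRTUAL. */
int ptmf_vtable_index(const struct ptmf *p)
{
    return p->i - 1;
}
```
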
3.2.2. Structure and Array Alignment

Structures are aligned according to the member with the most restrictive alignment requirement. Structures are padded so that the size of the structure is a multiple of its alignment. Arrays are always word aligned. Elements of arrays are stored in the same manner as if they were individual objects.
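
Applying these rules to a small struct: per the table, char is byte aligned and int is word aligned, so the struct takes the alignment of int, padding is inserted before the int member, and trailing padding rounds the size up to a multiple of that alignment. On this target that gives offsets 0, 4, and 8 for c, i, and t, and a size of 12 bytes. The sketch uses the hypothetical names mixed, table, and mixed_padding, and its checks use C11 _Alignof and offsetof rather than hard-coded target sizes.

```c
#include <stddef.h>

/* Hypothetical example. char: 1-byte alignment; int: 4-byte (word)
   alignment on this target. */
struct mixed {
    char c;   /* offset 0; padding follows so that i is word aligned */
    int  i;   /* offset 4 on this target */
    char t;   /* offset 8; trailing padding brings the size to 12 */
};

/* Array elements are laid out exactly as individual objects, one after
   another, so the array size is 3 times the element size. */
struct mixed table[3];

/* Bytes of padding inside one struct mixed (between and after members). */
size_t mixed_padding(void)
{
    return sizeof(struct mixed) - (sizeof(char) + sizeof(int) + sizeof(char));
}
```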

3.2.3. Bit Fields

Bit fields are the only objects that are packed within a byte. That is, two bit fields can be stored in the same byte. Bit fields can range in size from 1 to 32 bits, but they never span a 4-byte boundary.

For big-endian mode, bit fields are packed into registers from most significant bit (MSB) to least significant bit (LSB) in the order in which they are defined. Bit fields are packed in memory from most significant byte (MSbyte) to least significant byte (LSbyte). For little-endian mode, bit fields are packed into registers from the LSB to the MSB in the order in which they are defined, and packed in memory from LSbyte to MSbyte.

Here are some details about how bit fields are handled:

  • Plain int bit fields are unsigned. In the following C code, bar() is never called, because bit field a is unsigned. Use signed int if you need a signed bit field.

    struct st
    {
        int a:5;
    } S;
    
    void bar(void);
    
    void foo(void)
    {
        if (S.a < 0)
            bar();
    }
    
  • Bit fields of type long long are supported.

  • Bit fields are treated as the declared type.

  • The size and alignment of the struct containing the bit field depend on the declared type of the bit field. For example, the following struct occupies 4 bytes and is aligned at a 4-byte boundary:

    struct st {int a:4;};
    
  • Unnamed bit fields affect the alignment of the struct or union. For example, the following struct occupies 4 bytes and is aligned at a 4-byte boundary:

    struct st {char a:4; int :22;};
    
  • Bit fields declared volatile are accessed according to the bit field’s declared type. A volatile bit field reference generates exactly one reference to its storage; multiple volatile bit field accesses are not merged.
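
Because this compiler treats a plain int bit field as unsigned, code that needs negative values in a bit field should say signed int explicitly. ISO C leaves the signedness of plain int bit fields implementation-defined, so the explicit form also behaves the same across compilers. A small C sketch, with the hypothetical names st_signed and roundtrip:

```c
/* Hypothetical example: an explicitly signed 5-bit field holds -16..15. */
struct st_signed {
    signed int a : 5;
};

/* Store v into the field and read it back. v must be in -16..15;
   out-of-range values are not handled by this sketch. */
int roundtrip(int v)
{
    struct st_signed s;
    s.a = v;
    return s.a;
}
```

With signed int a : 5, a test such as S.a < 0 can be true, so the call to bar() in the earlier example would be reachable.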

The following figure illustrates bit-field packing, using the following bit field definitions:

struct {
    int A:7;
    int B:10;
    int C:3;
    int D:2;
    int E:9;
} x;

A0 represents the least significant bit of field A, A1 represents the next least significant bit, and so on. Storage of bit fields in memory is done with a byte-by-byte, rather than bit-by-bit, transfer.

Figure: Bit-Field Packing in Big-Endian and Little-Endian Formats

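The packing order can be modeled in C by computing, for each field in declaration order, where it lands in a 32-bit container: from bit 0 upward in little-endian mode, or from bit 31 downward in big-endian mode. The sketch below relies on the five fields of the struct above fitting in one 32-bit word (7 + 10 + 3 + 2 + 9 = 31 bits). pack_fields and widths are hypothetical names, and the model covers only the register image; the bytes of that word are then written to memory in the target's byte order, as described for 32-bit data.

```c
#include <stdint.h>

/* Widths of A, B, C, D, and E from the struct above, in declaration order. */
static const unsigned widths[5] = {7, 10, 3, 2, 9};

/* Hypothetical helper: build the 32-bit register image of the fields.
   little_endian != 0: the first field starts at bit 0 (LSB) and later
     fields occupy successively higher bits.
   little_endian == 0: the first field occupies the most significant bits
     and later fields occupy successively lower bits.
   Assumes the widths sum to at most 32 (here 31). */
uint32_t pack_fields(const uint32_t vals[5], int little_endian)
{
    uint32_t word = 0;
    unsigned used = 0;                              /* bits already placed */
    for (int k = 0; k < 5; k++) {
        unsigned w = widths[k];
        uint32_t v = vals[k] & ((1u << w) - 1u);   /* keep the low w bits */
        unsigned shift = little_endian ? used : 32u - used - w;
        word |= v << shift;
        used += w;
    }
    return word;
}
```

For example, setting only A to all ones (0x7F) yields 0x0000007F in little-endian mode and 0xFE000000 in big-endian mode, because A occupies bits 0-6 in the first case and bits 25-31 in the second.
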
3.2.4. Character String Constants

In C, a character string constant is used in one of the following ways:

To initialize an array of characters. For example:

char s[] = "abc";

When a string is used as an initializer, it is simply treated as an initialized array; each character is a separate initializer. For more information about initialization, see System Initialization.
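
Treating the string as an initialized array means the array receives its own copy of every character, including the terminating 0 byte. This short C sketch (the names s, s2, and patch_s are illustrative) shows that the two declarations are equivalent in size and content, and that the array, unlike a string constant, may be modified.

```c
/* Initialized from a string: 'a', 'b', 'c', and the terminating 0 byte. */
char s[] = "abc";

/* The same initializer written out element by element. */
char s2[] = { 'a', 'b', 'c', '\0' };

/* Writing to s is fine: it is an ordinary array, not the string constant. */
void patch_s(void)
{
    s[1] = 'x';
}
```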

In an expression. For example:

strcpy (s, "abc");

When a string is used in an expression, the string itself is defined in the .const section with the .string assembler directive, along with a unique label that points to the string; the terminating 0 byte is included. For example, the following lines define the string abc and its terminating 0 byte (the label SL5 points to the string):

     .sect ".const"
SL5: .string "abc",0

String labels have the form SLn, where n is a number assigned by the compiler to make the label unique. The number begins at 0 and is increased by 1 for each string defined. All strings used in a source module are defined at the end of the compiled assembly language module.

The label SLn represents the address of the string constant. The compiler uses this label to reference the string expression.

Because strings are stored in the .const section (possibly in ROM) and shared, a program must not modify a string constant; doing so has undefined behavior. The following code is an example of incorrect string use:

char *a = "abc";        /* a points to the string constant in .const */
a[1] = 'x';             /* Incorrect! undefined behavior */
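
If the characters need to change, copy them into writable storage first, and keep pointers to string constants const-qualified so that the compiler diagnoses accidental writes. A C sketch of both patterns; the function names make_modified_copy and constant_abc are hypothetical:

```c
#include <string.h>

/* Safe modification: a is an array initialized from "abc", so the
   write changes the array, not the shared string constant. */
void make_modified_copy(char out[4])
{
    char a[] = "abc";
    a[1] = 'x';
    memcpy(out, a, sizeof a);   /* copies all 4 bytes, including the 0 */
}

/* Read-only use: with a const-qualified return type, a caller that
   writes result[1] = 'x' gets a compile-time diagnostic. */
const char *constant_abc(void)
{
    return "abc";
}
```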