2. GNU-Syntax Arm Assembly Source Anatomy

2.1. Fields of an Assembly Source Line

GNU-syntax Arm assembly language source statements have the following general form:

<label field>: <mnemonic field> <operand list field>

For example, the following Arm instruction is legal in GNU-syntax Arm assembly language:

add_me:   add    r0, r1

where:

  • add_me occupies the label field,

  • add occupies the mnemonic field, and

  • the operand list field consists of registers r0 and r1 with operands separated by a comma.

The following sections provide more details about the rules that govern the content of the label field, mnemonic field, and operand list field of a given GNU-syntax Arm assembly language source statement.
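As a rough illustration of how the three fields combine, the sketch below annotates a handful of statements. The label names start and done are made up for this example, and Thumb state is assumed.

```asm
// Illustrative only: the labels "start" and "done" are hypothetical.
        .text                    // directive: mnemonic field only
        .thumb                   // directive: mnemonic field only
start:  movs    r0, #1           // label, mnemonic, and operand list
        adds    r0, r0, #2       // mnemonic and operand list, no label
done:                            // label field only
        bx      lr               // mnemonic with a single operand
```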

2.2. Labels

An optional label field can be used to associate a value with a symbol. Label symbol names are case sensitive, and a label must begin in the leftmost column of the assembly source line. For GNU-syntax Arm assembly source:

  • Label symbols must begin with a letter, an underscore, or a period (“.”).

  • Label symbols can consist of alphanumeric characters, the dollar sign (“$”), an underscore (“_”), or a period (“.”).

  • Label symbol definitions must be delimited with a terminating colon (“:”), otherwise the tiarmclang assembler tries to interpret the symbol as a mnemonic identifier.

  • A label may be specified on an assembly source line by itself in which case the label symbol’s value will be dependent on the assembly source lines above and below the label specification.
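The naming rules above can be sketched with a few label definitions. Every label name here is hypothetical and chosen only to exercise one rule each.

```asm
// Illustrative only: all label names below are hypothetical.
        .text
        .thumb
_init:                      // begins with an underscore
        nop
.retry:                     // begins with a period
        nop
step$2:                     // letters, a dollar sign, and a digit
        nop
Step$2:                     // distinct from step$2: names are case sensitive
        nop
```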

The value assigned to a symbol defined in a label field varies depending on whether the label occurs within the context of an instruction or a directive.

If a label appears on the same line as a GNU-syntax Arm assembly instruction, or on the line immediately preceding it, the label symbol’s value is the address of the first byte of that instruction’s object encoding. For example, if the following assembly source:

        nop
        nop
        nop

add_me:
        add   r0,r1
        nop
        nop
        nop

is assembled with the following command:

%> tiarmclang -mcpu=cortex-m0 -c add_me.s

then the output of the command tiarmdis add_me.o would look like this:

Disassembly of add_me.o:

TEXT Section .text, 0xe bytes at 0x00000000
000000:               .thumb
000000:              :
000000: 00BF             NOP
000002: 00BF             NOP
000004: 00BF             NOP
000006:              add_me:
000006: 0844             ADD             R0, R1
000008: 00BF             NOP
00000a: 00BF             NOP
00000c: 00BF             NOP

Notice that the value assigned to the add_me label symbol matches the address of the encoding of the first instruction that follows the label.
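A label on the same line as an instruction and a label on a line by itself are resolved the same way. The sketch below assumes 16-bit Thumb encodings and a section that starts at offset 0; the label names are hypothetical.

```asm
// Illustrative only: label names are hypothetical.
        .text
        .thumb
        nop                  // offset 0x0
same_line: add  r0, r1       // same_line = 0x2, the address of this add
alias_a:
alias_b:
        nop                  // alias_a and alias_b both = 0x4
```

Because no encoding is emitted between alias_a and alias_b, both symbols take the address of the instruction that follows them.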

The value of a label symbol can also be influenced by a directive that appears before it in the assembly source file. Consider the following assembly source:

        .text
        nop
        nop
        nop

        .p2align 2
add_me:
        add      r0,r1
        nop
        nop
        nop

Using the same tiarmclang command to assemble the source file, the disassembly output would look like this:

Disassembly of add_me.o:

TEXT Section .text, 0x10 bytes at 0x00000000
000000:               .thumb
000000:              :
000000: 00BF             NOP
000002: 00BF             NOP
000004: 00BF             NOP
000006: C046             MOV             R8, R8
000008:              add_me:
000008: 0844             ADD             R0, R1
00000a: 00BF             NOP
00000c: 00BF             NOP
00000e: 00BF             NOP

In this example, the .p2align 2 directive instructs the assembler to advance the current section counter to the next 4-byte boundary before emitting the object encoding for the next instruction. The label symbol is then given the address of the object encoding for the aligned add instruction as its value. The assembler automatically inserts executable padding between the third nop instruction and the encoding of the add instruction.
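The operand of .p2align is a power of two, so .p2align 2 requests a 2^2 = 4-byte boundary. The sketch below applies the same directive to data rather than instructions; the symbol names flag and count are assumptions made for this example.

```asm
// Illustrative only: "flag" and "count" are hypothetical symbols.
        .data
flag:   .byte   1            // section offset 0
        .p2align 2           // advance to the next multiple of 4
count:  .word   0            // section offset 4; offsets 1-3 are padding
```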

2.2.1. Local Labels

The GNU-syntax Arm assembler that is integrated into the tiarmclang compiler supports the notion of local labels whose scope and effect are temporary. Local labels cannot be declared with global linkage. The syntax for defining and referring to GNU-syntax local labels is as follows:

  • Local label definitions use the form N: in the label field of a line of GNU-syntax assembly code, where N is an integer in the range [0,9].

  • References to the most recently defined local label use the form Nb, where N is the ID of the local label (an integer in [0,9]) and b indicates a backward reference.

  • References to the next definition of a local label use the form Nf, where N is the ID of the local label (an integer in [0,9]) and f indicates a forward reference.

GNU-syntax local labels can be redefined in the same compilation unit. The GNU-syntax assembler associates a unique ordinal ID with every local label definition so that it can distinguish one instance of a local label definition from another that was defined with the same value N.
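A minimal sketch of redefinition, assuming Thumb state: the local label 1 is defined twice, and each reference resolves to the nearest definition in the direction given by its b or f suffix.

```asm
// Illustrative only: a countdown on r0 followed by a return.
        .text
        .thumb
1:      subs    r0, r0, #1   // first definition of local label 1
        bne     1b           // backward: the definition on the previous line
        b       1f           // forward: the definition two lines below
        nop                  // skipped by the forward branch
1:      bx      lr           // second definition of local label 1
```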

Simple Local Label Example

Here is an example of local labels being used in the context of a loop:

// assume external global int "sum_tot"
     .global   sum_tot

// assume incoming r0 has loop limit
     .global   foo
     .section  .text
     .thumb
foo:
     ...
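     MOVS   r1,#0          // loop counter i = 0
     CMP    r0, r1         // compare limit with i
     BLE    1f             // skip the loop if limit <= 0
0:
     LDR    r2, C_CON1     // r2 = address of sum_tot
     LDR    r3, [r2]
     ADDS   r3, r3, r1     // sum_tot += i
     STR    r3, [r2]
     ADDS   r1, r1, #1     // i++
     CMP    r0, r1         // compare limit with i
     BGT    0b             // repeat while limit > i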
1:
     ...
     BX     LR

     .align 4
C_CON1:   .int   sum_tot

where:

  • 0: and 1: are local label definitions,

  • a forward reference to 1: is specified as 1f in the above BLE instruction, and

  • there is a backward reference to 0: specified as 0b in the BGT instruction.

Macro Example Use of Local Labels

The following example shows the use of a local label in the context of a macro definition:

// GNU-syntax implementation of trace_pc macro using local labels
     .macro    trace_pc
\@:
     .section  .trace_scn,"aw",%progbits
     .int      \@b
     .previous
     .endm

     .section  .text
foo:
     nop
     trace_pc
     nop
     trace_pc
     nop
     trace_pc
     nop

In this case, the special \@ syntax will be replaced by an automatically-generated integer when the macro is invoked and expanded. The effect is that each invocation of the macro will contain a local label definition and a backwards reference to that local label:

// GNU-syntax implementation of trace_pc macro using local labels
     .macro    trace_pc
\@:
     .section  .trace_scn,"aw",%progbits
     .int      \@b
     .previous
     .endm

     .section  .text
foo:
     nop
0:
     .section  .trace_scn,"aw",%progbits
     .int      0b
     .previous
     nop
1:
     .section  .trace_scn,"aw",%progbits
     .int      1b
     .previous
     nop
2:
     .section  .trace_scn,"aw",%progbits
     .int      2b
     .previous
     nop

The disassembled object code for the above example looks like this:

Disassembly of try.o:

TEXT Section .text, 0x8 bytes at 0x00000000
000000:              :
000000:              foo:
000000:               .thumb
000000: 00BF             NOP
000002: 00BF             NOP
000004: 00BF             NOP
000006: 00BF             NOP

DATA Section .trace_scn, 0xc bytes at 0x00000000
000000: 00000002         .word 0x00000002
000004: 00000004         .word 0x00000004
000008: 00000006         .word 0x00000006

The .trace_scn section contains the addresses of the last three NOP instructions in the .text section.

If we were to use a normal label like xyz_\@ for the GNU-syntax implementation of the macro, the assembler would report a duplicate label definition for xyz_0. When the GNU-syntax assembler invokes the trace_pc macro using a local label definition, then the local label 0 is auto-generated for each invocation. Since the GNU-syntax assembler assigns a unique ordinal ID to each instance of a local label, it is able to avoid a duplicate label definition when local labels are used in the macro definition.

2.3. Mnemonics

The mnemonic field of a legal line of assembly code contains a pre-defined textual identifier that indicates whether the source line represents an instruction or a directive.

For example, the push mnemonic in the following line of assembly code is recognized as a valid Arm instruction:

// Simple example
     .text
     .thumb
     .global    simple_function

simple_function:
      push {r7,lr}
      ...

The .text, .thumb, and .global mnemonics are recognized as Arm assembly directives.

For GNU-syntax Arm assembly source, the mnemonic field may begin anywhere on a line of assembly source (including the leftmost column) as long as it precedes the operand list field (if one is required) and any comments on the line. As mentioned earlier, an identifier that begins in the leftmost column is interpreted as a mnemonic unless it is delimited with a colon (‘:’) suffix.
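The placement rule can be sketched as follows, assuming Thumb state. The label name entry is hypothetical.

```asm
// Illustrative only: "entry" is a hypothetical label.
        .text
        .thumb
nop                          // column 0, no colon: assembled as the nop instruction
entry:  nop                  // colon suffix: "entry" is a label, nop is the mnemonic
   nop                       // indentation before a mnemonic is also accepted
```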

The GNU-Syntax Arm Assembly Instructions and GNU-Syntax Arm Assembly Directives sections contain further information about specific mnemonics that represent Arm instructions and directives and are recognized by the tiarmclang integrated GNU-syntax assembler.

2.4. Operand List

The syntax rules governing the operand list field depend on the identifier specified in the mnemonic field. For example, in the push instruction shown earlier in this section, the operand list field contains a list of one or more registers enclosed in braces, whereas the operand list field of a .global directive expects a legal symbol identifier.

Some operands must be absolute, which means they may not refer to any external symbols or any registers or memory references. The value of the expression must be knowable at assembly time.

Some operands must be well-defined, which means they must use only symbols or constants that have been declared or defined before the expression in which they appear is encountered by the assembler.
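As a hedged sketch of an operand that is both absolute and well-defined, the example below defines a constant with .equ before using it. The names BUF_LEN and buf are assumptions made for this example, and the .equ and .space directives are used here only for illustration.

```asm
// Illustrative only: "BUF_LEN" and "buf" are hypothetical symbols.
        .equ    BUF_LEN, 4 * 8   // constant expression, known at assembly time
        .data
buf:    .space  BUF_LEN          // size operand evaluates to 32 bytes
        .text
        .thumb
        movs    r0, #BUF_LEN     // immediate built from the same constant
```

BUF_LEN refers to no external symbol, register, or memory location, and it is defined before its first use, so its value is available when the assembler reaches each expression that mentions it.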

More information about GNU-syntax Arm assembler instructions and directives can be found in the GNU-Syntax Arm Assembly Instructions and GNU-Syntax Arm Assembly Directives sections of this user guide.

2.5. Comments

You can insert comments into your assembly source code to enhance readability of your code. In GNU-syntax Arm assembly source, comments can be delimited using:

  • C-style comments; text enclosed between “/*” and “*/” which may span multiple lines.

  • C++-style comments; text appearing after “//” on a line.

  • Text appearing after an at-sign, ‘@’, is interpreted as a comment unless that ‘@’ character appears in a macro definition preceded by a backslash ‘\’. For more details about GNU-syntax macro definitions see the Directives that Affect Macros section.
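The three comment styles can appear side by side in the same source file, as in this short sketch (Thumb state assumed):

```asm
        .text
        .thumb
        movs    r0, #0       @ at-sign comment, runs to the end of the line
        adds    r0, #1       // C++-style comment
        /* C-style comment
           spanning two lines */
        bx      lr
```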

Now consider the following snippet of GNU-syntax Arm assembly code, and note the use of C-style and C++-style comments:

/*
 * Loop entry - comment can span multiple lines
 */
loop_entry:
        bl      ef1               // call ext func 1, ef1
        bl      ef2               // call ext func 2, ef2
        ldr     r0, [sp]
        adds    r0, #1            // I++ (r0)
        str     r0, [sp]
        movw    r1, :lower16:evar
        movt    r1, :upper16:evar
        ldr     r1, [r1]          // load evar (r1)
        cmp     r0, r1            // I < evar?
        blt     loop_entry        // I < evar, go to loop_entry

/* Loop exit */
loop_exit:
        movs    r0, #0
        pop     {r7, PC}