1. GNU-Syntax Arm Assembly Source Anatomy

1.1. Fields of an Assembly Source Line

GNU-syntax Arm assembly language source statements have the following general form:

label field: mnemonic field operand list field

For example, the following Arm instruction is legal in GNU-syntax Arm assembly language:

add_me:   add    r0, r1

where:

  • add_me occupies the label field,

  • add occupies the mnemonic field, and

  • the operand list field consists of registers r0 and r1 with operands separated by a comma.
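Not every field needs to be present on every source line. As a minimal sketch, assuming Thumb code and using the hypothetical labels start and next purely for illustration, the following lines show several combinations of the three fields:

     .text                        // directive mnemonic, no operands
     .thumb
start:                            // label field only
     movs   r0, #1                // mnemonic and operand list, no label
next:     adds   r0, r0, #2       // label, mnemonic, and operand list
     nop                          // mnemonic only; NOP takes no operands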

The following sections provide more details about the rules that govern the content of the label field, mnemonic field, and operand list field of a given GNU-syntax Arm assembly language source statement.

1.2. Labels

An optional label field can be used to associate a value with a symbol. Label symbol names are case sensitive, and a label must begin in the leftmost column of the assembly source line. For GNU-syntax Arm assembly source:

  • Label symbols must start with a letter, an underscore, or a period (“.”).

  • Label symbols can consist of alphanumeric characters, the dollar sign (“$”), an underscore (“_”), or a period (“.”).

  • Label symbol definitions must end with a colon (“:”); otherwise, the tiarmclang assembler tries to interpret the symbol as a mnemonic identifier.

  • A label may be specified on an assembly source line by itself, in which case the label symbol’s value depends on the assembly source lines above and below the label specification.
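The naming rules above can be sketched with a few label definitions. All of the symbol names here are hypothetical and were chosen only to exercise the rules; the surrounding instructions are placeholders:

     .text
     .thumb
_entry:                           // starts with an underscore
     movs   r0, #0
.Lskip:                           // starts with a period
     adds   r0, r0, #1
stage2.done:                      // period inside the name
     adds   r0, r0, #1
retry:                            // label on a line by itself
Retry:    bx     lr               // distinct from retry: names are case sensitive

Each definition begins in the leftmost column and ends with a colon, so none of the names can be mistaken for a mnemonic.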

The value assigned to a symbol defined in a label field varies depending on whether the label occurs within the context of an instruction or a directive. A more detailed discussion of the label field in each of these contexts is provided in the GNU-Syntax Arm Assembly Instructions and the GNU-Syntax Arm Assembly Directives sections.

1.2.1. Local Labels

The GNU-syntax Arm assembler that is integrated into the tiarmclang compiler supports the notion of local labels whose scope and effect are temporary. Local labels cannot be declared with global linkage. The syntax for defining and referring to GNU-syntax local labels is as follows:

  • Local label definitions use the form N: in the label field of a line of GNU-syntax assembly code, where N is an integer in the range [0,9].

  • References to the most recently defined local label use the form Nb, where N is the ID of the local label (an integer in [0,9]) and b indicates a backward reference.

  • References to the next definition of a local label use the form Nf, where N is the ID of the local label (an integer in [0,9]) and f indicates a forward reference.

GNU-syntax local labels can be redefined in the same compilation unit. The GNU-syntax assembler associates a unique ordinal ID with every local label definition, so it can distinguish one instance of a local label definition from another that was defined with the same value N.
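To illustrate redefinition, here is a minimal sketch in which local label 1 is defined twice in the same routine. The routine name count_down_twice and the loop counts are hypothetical, and Thumb code is assumed:

     .text
     .thumb
count_down_twice:
     movs   r0, #4
1:                                // first definition of local label 1
     subs   r0, r0, #1
     bne    1b                    // backward: most recent 1:, the first definition
     movs   r0, #8
     b      1f                    // forward: next 1:, the second definition
     nop                          // skipped by the branch above
1:                                // second definition of local label 1
     subs   r0, r0, #1
     bne    1b                    // backward: now the second definition
     bx     lr

Each 1b resolves to the closest preceding definition and 1f to the closest following one, so the two loops do not interfere with each other even though they share the same label number.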

Simple Local Label Example

Here is an example of local labels being used in the context of a loop:

// assume external global int "sum_tot"
     .global   sum_tot

// assume incoming r0 has loop limit
     .global   foo
     .section  .text
     .thumb
foo:
     ...
     MOVS   r1,#0
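     CMP    r0, r1          // compare loop limit (r0) with counter (r1)
     BLE    1f              // skip the loop if limit <= counter
0:
     LDR    r2, C_CON1      // r2 = address of sum_tot
     LDR    r3, [r2]
     ADDS   r3, r3, r1      // add counter to sum_tot
     STR    r3, [r2]
     ADDS   r1, r1, #1      // increment counter
     CMP    r0, r1          // compare limit with updated counter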
     BGT    0b
1:
     ...
     BX     LR

     .align 4
C_CON1:   .int   sum_tot

where:

  • 0: and 1: are local label definitions,

  • a forward reference to 1: is specified as 1f in the above BLE instruction, and

  • there is a backward reference to 0: specified as 0b in the BGT instruction.

Macro Example Use of Local Labels

The following example shows the use of a local label in the context of a macro definition:

// GNU-syntax implementation of trace_pc macro using local labels
     .macro    trace_pc
\@:
     .section  .trace_scn,"aw",%progbits
     .int      \@b
     .previous
     .endm

     .section  .text
foo:
     nop
     trace_pc
     nop
     trace_pc
     nop
     trace_pc
     nop

In this case, the special \@ syntax is replaced by an automatically generated integer when the macro is invoked and expanded. The effect is that each invocation of the macro contains a local label definition and a backward reference to that local label:

// trace_pc macro definition, followed by the source as seen after macro expansion
     .macro    trace_pc
\@:
     .section  .trace_scn,"aw",%progbits
     .int      \@b
     .previous
     .endm

     .section  .text
foo:
     nop
0:
     .section  .trace_scn,"aw",%progbits
     .int      0b
     .previous
     nop
1:
     .section  .trace_scn,"aw",%progbits
     .int      1b
     .previous
     nop
2:
     .section  .trace_scn,"aw",%progbits
     .int      2b
     .previous
     nop

The disassembled object code for the above example looks like this:

Disassembly of try.o:

TEXT Section .text, 0x8 bytes at 0x00000000
000000:              :
000000:              foo:
000000:               .thumb
000000: 00BF             NOP
000002: 00BF             NOP
000004: 00BF             NOP
000006: 00BF             NOP

DATA Section .trace_scn, 0xc bytes at 0x00000000
000000: 00000002         .word 0x00000002
000004: 00000004         .word 0x00000004
000008: 00000006         .word 0x00000006

The .trace_scn section contains the addresses of the last three NOP instructions in the .text section.

If we were to use a normal label like xyz_@ for the GNU-syntax implementation of the macro, the assembler would report a duplicate label definition for xyz_0. When the trace_pc macro is instead written with a local label definition, a numeric local label is auto-generated for each invocation. Because the GNU-syntax assembler assigns a unique ordinal ID to each instance of a local label, it avoids a duplicate label definition when local labels are used in the macro definition.

1.3. Mnemonics

The mnemonic field of a legal line of assembly code contains a pre-defined textual identifier that indicates whether the source line represents an instruction or a directive.

For example, the push mnemonic in the following line of assembly code is recognized as a valid Arm instruction:

// Simple example
     .text
     .thumb
     .global    simple_function

simple_function:
      push {r7,lr}
      ...

The .text, .thumb, and .global mnemonics are recognized as Arm assembly directives.

For GNU-syntax Arm assembly source, the mnemonic field may begin anywhere on a line of assembly source (including the leftmost column) as long as it precedes the operand list field (if one is required) and any comments on the line. As mentioned earlier, an identifier that begins in the leftmost column is interpreted as a mnemonic unless it is delimited with a colon (‘:’) suffix.

The GNU-Syntax Arm Assembly Instructions and GNU-Syntax Arm Assembly Directives sections contain further information about specific mnemonics that represent Arm instructions and directives and are recognized by the tiarmclang integrated GNU-syntax assembler.

1.4. Operand List

The syntax rules governing the operand list field depend on the identifier specified in the mnemonic field. For example, in the push instruction shown earlier in this section, the operand list field contains a list of one or more registers enclosed in braces, whereas the operand list field of a .global directive expects a legal symbol identifier.

More information about GNU-syntax Arm assembler instructions and directives can be found in the GNU-Syntax Arm Assembly Instructions and GNU-Syntax Arm Assembly Directives sections of this user guide.
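As a rough sketch of how the operand list changes with the mnemonic, the following fragment places a few operand forms side by side. The symbol op_demo is hypothetical, Thumb code is assumed, and the operand forms are the same ones used in other examples in this section:

     .global   op_demo            // directive: operand is a symbol identifier
     .text
     .thumb
op_demo:
     push   {r4, lr}              // register list enclosed in braces
     ldr    r4, [sp]              // destination register, then a bracketed memory operand
     adds   r4, r4, #1            // two registers and an immediate introduced by '#'
     pop    {r4, pc}              // register list; loading pc returns to the caller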

1.5. Comments

You can insert comments into your assembly source code to enhance readability of your code. In GNU-syntax Arm assembly source, comments can be delimited using:

  • C-style comments; text enclosed between “/*” and “*/” which may span multiple lines.

  • C++-style comments; text appearing after “//” on a line.

  • At-sign comments; text appearing after an at-sign (‘@’) on a line, unless that ‘@’ character appears in a macro definition preceded by a backslash (‘\’). For more details about GNU-syntax macro definitions, see the GNU-Syntax Arm Assembly Macros section.

Now consider the following snippet of GNU-syntax assembly code, and note the use of C- and C++-style comments:

/*
 * Loop entry - comment can span multiple lines
 */
loop_entry:
        bl      ef1               // call ext func 1, ef1
        bl      ef2               // call ext func 2, ef2
        ldr     r0, [sp]
        adds    r0, #1            // I++ (r0)
        str     r0, [sp]
        movw    r1, :lower16:evar
        movt    r1, :upper16:evar
        ldr     r1, [r1]          // load evar (r1)
        cmp     r0, r1            // I < evar?
        blt     loop_entry        // I < evar, go to loop_entry

/* Loop exit */
loop_exit:
        movs    r0, #0
        pop     {r7, PC}
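
The snippet above does not use the at-sign form. A minimal sketch of it, assuming Thumb code and a hypothetical function named add_one, might look like this:

        .text
        .thumb
add_one:                          @ hypothetical function, for illustration only
        adds    r0, r0, #1        @ text after the at-sign is ignored
        bx      lr                @ return to caller

Inside a macro definition the same character preceded by a backslash has a different meaning, so at-sign comments are best kept away from macro bodies that rely on that syntax.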