2. GNU-Syntax Arm Assembly Source Anatomy
2.1. Fields of an Assembly Source Line
GNU-syntax Arm assembly language source statements have the following general form:
label field: <mnemonic field> <operand list field>
For example, the following Arm instruction is legal in GNU-syntax Arm assembly language:
add_me: add r0, r1
where:
add_me occupies the label field,
add occupies the mnemonic field, and
the operand list field consists of registers r0 and r1 with operands separated by a comma.
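As a minimal sketch of how the three fields combine, each line below fills a different subset of them. The symbol names start_here and done are hypothetical and chosen only for illustration.

```asm
        .text                       // directive mnemonic only: no label, no operands
        .thumb                      // directive mnemonic only
start_here:                         // label field only
        add     r0, r1              // mnemonic field plus operand list field
done:   bx      lr                  // label, mnemonic, and a single operand
```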
The following sections provide more details about the rules that govern the content of the label field, mnemonic field, and operand list field of a given GNU-syntax Arm assembly language source statement.
2.2. Labels
An optional label field can be used to associate a value with a symbol. Label symbol names are case sensitive, and a label must begin in the leftmost column of the assembly source line. For GNU-syntax Arm assembly source:
Label symbols must begin with a letter, an underscore, or a period (“.”).
Label symbols can consist of alphanumeric characters, the dollar sign (“$”), an underscore (“_”), or a period (“.”).
Label symbol definitions must be terminated with a colon (“:”); otherwise, the tiarmclang assembler tries to interpret the symbol as a mnemonic identifier.
A label may be specified on an assembly source line by itself in which case the label symbol’s value will be dependent on the assembly source lines above and below the label specification.
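The naming rules above can be illustrated with a short sketch. All of the symbol names here are hypothetical; they are intended only to show one label per rule.

```asm
        .text
        .thumb
_entry:                             // begins with an underscore; label on a line by itself
        nop
.Lskip: nop                         // begins with a period
step.2: nop                         // period and digit are allowed after the first character
Step.2: nop                         // a different symbol from step.2: names are case sensitive
```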
The value assigned to a symbol defined in a label field varies depending on whether the label occurs within the context of an instruction or a directive.
If a label is specified on the same line as a GNU-syntax Arm assembly instruction, or on the line immediately preceding one, then the label symbol’s value is the address of the first byte of that instruction’s object encoding. For example, if the following assembly source:
nop
nop
nop
add_me:
add r0,r1
nop
nop
nop
is assembled with the following command:
%> tiarmclang -mcpu=cortex-m0 -c add_me.s
Then the output from %> tiarmdis add_me.o would look like this:
Disassembly of add_me.o:
TEXT Section .text, 0xe bytes at 0x00000000
000000: .thumb
000000: :
000000: 00BF NOP
000002: 00BF NOP
000004: 00BF NOP
000006: add_me:
000006: 0844 ADD R0, R1
000008: 00BF NOP
00000a: 00BF NOP
00000c: 00BF NOP
Notice that the value assigned to the add_me label symbol matches the address of the encoding of the first instruction that follows the label.
The value of a label symbol can also be influenced by a directive that appears before it in the assembly source file. Consider the following assembly source:
.text
nop
nop
nop
.p2align 2
add_me:
add r0,r1
nop
nop
nop
Using the same tiarmclang command to assemble the source file, the disassembly output would look like this:
Disassembly of add_me.o:
TEXT Section .text, 0x10 bytes at 0x00000000
000000: .thumb
000000: :
000000: 00BF NOP
000002: 00BF NOP
000004: 00BF NOP
000006: C046 MOV R8, R8
000008: add_me:
000008: 0844 ADD R0, R1
00000a: 00BF NOP
00000c: 00BF NOP
00000e: 00BF NOP
In this example, the .p2align 2 directive instructs the assembler to advance the current section counter to the next 4-byte boundary before emitting the object encoding for the next instruction. The label symbol is then given the address of the object encoding for the aligned add instruction as its value. The assembler automatically inserts executable padding between the third nop instruction and the encoding of the add instruction.
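The arithmetic behind this result: three 2-byte nop encodings leave the section counter at 0x6, and the next multiple of 2^2 = 4 is 0x8, so 2 bytes of padding are needed. As a hedged sketch, the same request can be written with the .balign directive, which takes the alignment as a byte count rather than as a power of two. This assumes the assembler accepts .balign, as GNU-syntax assemblers generally do.

```asm
        .text
        .thumb
        nop                         // offset 0x0
        nop                         // offset 0x2
        nop                         // offset 0x4; the section counter is now 0x6
        .balign 4                   // byte-count form of .p2align 2: pad to a multiple of 4
add_me:                             // expected to land at offset 0x8
        add     r0, r1
```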
2.2.1. Local Labels
The GNU-syntax Arm assembler that is integrated into the tiarmclang compiler supports the notion of local labels whose scope and effect are temporary. Local labels cannot be declared with global linkage. The syntax for defining and referring to GNU-syntax local labels is as follows:
Local label definitions use the form N: in the label field of a line of GNU-syntax assembly code, where N is an integer in the range [0,9].
References to the most recently defined local label use the form Nb, where N is the ID of the local label (an integer in [0,9]) and b indicates a backward reference.
References to the next definition of a local label use the form Nf, where N is the ID of the local label (an integer in [0,9]) and f indicates a forward reference.
GNU-syntax local labels can be redefined in the same compilation unit. The GNU-syntax assembler associates a unique ordinal ID with every local label definition so that it can distinguish one instance of a local label definition from another that was defined with the same value N.
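To make the redefinition rule concrete, here is a minimal sketch in which local label 1 is defined twice in one function. The function name count_down and the register usage are hypothetical.

```asm
        .text
        .thumb
        .global count_down
count_down:
        cmp     r0, #0
        beq     2f                  // forward reference to the next "2:" definition
1:                                  // first definition of local label 1
        subs    r0, r0, #1
        bne     1b                  // backward reference to the nearest preceding "1:"
2:
        movs    r1, #4
1:                                  // second definition of local label 1
        subs    r1, r1, #1
        bne     1b                  // now resolves to the second "1:", not the first
        bx      lr
```

Each 1b resolves to the closest definition of 1 above it, so the two loops do not interfere even though they share a label number.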
Simple Local Label Example
Here is an example of local labels being used in the context of a loop:
// assume external global int "sum_tot"
.global sum_tot
// assume incoming r0 has loop limit
.global foo
.section .text
.thumb
foo:
...
MOVS r1,#0
CMP r1, r0
BGE 1f // skip the loop if the limit is not greater than 0
0:
LDR r2, C_CON1
LDR r3, [r2]
ADDS r3, r3, r1
STR r3, [r2]
ADDS r1, r1, #1
CMP r1, r0
BLT 0b // repeat while the counter is below the limit
1:
...
BX LR
.align 4
C_CON1: .int sum_tot
where:
0: and 1: are local label definitions,
a forward reference to 1: is specified as 1f in the above BGE instruction, and
there is a backward reference to 0: specified as 0b in the BLT instruction.
Macro Example Use of Local Labels
The following example shows the use of a local label in the context of a macro definition:
// GNU-syntax implementation of trace_pc macro using local labels
.macro trace_pc
\@:
.section .trace_scn,"aw",%progbits
.int \@b
.previous
.endm
.section .text
foo:
nop
trace_pc
nop
trace_pc
nop
trace_pc
nop
In this case, the special \@ syntax will be replaced by an automatically-generated integer when the macro is invoked and expanded. The effect is that each invocation of the macro will contain a local label definition and a backwards reference to that local label:
// GNU-syntax implementation of trace_pc macro using local labels
.macro trace_pc
\@:
.section .trace_scn,"aw",%progbits
.int \@b
.previous
.endm
.section .text
foo:
nop
0:
.section .trace_scn,"aw",%progbits
.int 0b
.previous
nop
1:
.section .trace_scn,"aw",%progbits
.int 1b
.previous
nop
2:
.section .trace_scn,"aw",%progbits
.int 2b
.previous
nop
The disassembled object code for the above example looks like this:
Disassembly of try.o:
TEXT Section .text, 0x8 bytes at 0x00000000
000000: :
000000: foo:
000000: .thumb
000000: 00BF NOP
000002: 00BF NOP
000004: 00BF NOP
000006: 00BF NOP
DATA Section .trace_scn, 0xc bytes at 0x00000000
000000: 00000002 .word 0x00000002
000004: 00000004 .word 0x00000004
000008: 00000006 .word 0x00000006
The .trace_scn section contains the addresses of the last three NOP instructions in the .text section.
If we were to use a normal label like xyz_\@ for the GNU-syntax implementation of the macro, the assembler would report a duplicate label definition for xyz_0. When the GNU-syntax assembler invokes the trace_pc macro using a local label definition, then the local label 0 is auto-generated for each invocation. Since the GNU-syntax assembler assigns a unique ordinal ID to each instance of a local label, it is able to avoid a duplicate label definition when local labels are used in the macro definition.
2.3. Mnemonics
The mnemonic field of a legal line of assembly code contains a pre-defined textual identifier that indicates whether the source line represents an instruction or a directive.
For example, the push mnemonic in the following line of assembly code is recognized as a valid Arm instruction:
// Simple example
.text
.thumb
.global simple_function
simple_function:
push {r7,lr}
...
The .text, .thumb, and .global mnemonics are recognized as Arm assembly directives.
For GNU-syntax Arm assembly source, the mnemonic field may begin anywhere on a line of assembly source (including the left-most column 0) as long as it precedes the operand list field (if one is required) and any comments on the line. As mentioned earlier, an identifier that begins in the leftmost column is interpreted as a mnemonic unless it is delimited with a colon (‘:’) suffix.
The GNU-Syntax Arm Assembly Instructions and GNU-Syntax Arm Assembly Directives sections contain further information about specific mnemonics that represent Arm instructions and directives and are recognized by the tiarmclang integrated GNU-syntax assembler.
2.4. Operand List
The syntax rules governing the operand list field depend on the identifier specified in the mnemonic field. For example, in the push instruction shown earlier in this section, the operand list field contains a list of one or more registers enclosed in braces, whereas the operand list field of a .global directive expects a legal symbol identifier.
Some operands must be absolute, which means they may not refer to any external symbols or any registers or memory references. The value of the expression must be knowable at assembly time.
Some operands must be well-defined, which means they must use only symbols or constants that have been declared or defined before the expression in which they appear is encountered by the assembler.
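As a hedged sketch of both requirements, the example below defines a constant with .equ before using it in the size operand of .space, so the operand is absolute and also well-defined at the point of use. The names BUF_WORDS and scratch_buf, and the choice of these two directives, are illustrative assumptions rather than something taken from the text above.

```asm
        .equ    BUF_WORDS, 8        // constant whose value is known at assembly time
        .data
        .p2align 2
scratch_buf:
        .space  BUF_WORDS * 4       // size expression uses only an already-defined constant
```

By contrast, an operand that names an external symbol has no value the assembler can know at assembly time, so it would not qualify as absolute.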
More information about GNU-syntax Arm assembler instructions and directives can be found in the GNU-Syntax Arm Assembly Instructions and GNU-Syntax Arm Assembly Directives sections of this user guide.
2.5. Comments
You can insert comments into your assembly source code to enhance readability of your code. In GNU-syntax Arm assembly source, comments can be delimited using:
C-style comments; text enclosed between “/*” and “*/” which may span multiple lines.
C++-style comments; text appearing after “//” on a line.
Text appearing after an at-sign, ‘@’, is interpreted as a comment unless that ‘@’ character appears in a macro definition preceded by a backslash ‘\’. For more details about GNU-syntax macro definitions see the Directives that Affect Macros section.
Now consider the following snippet of GNU-syntax Arm assembly code, and note the use of C-style and C++-style comments: