Assembly Source Anatomy

Fields of an Assembly Source Line

Although the legacy TI-syntax and GNU-syntax ARM assembly languages differ in many details, a source line has the same general form in both:

label field: mnemonic field operand list field

For example, the following ARM instruction is legal in both legacy TI-syntax and GNU-syntax ARM assembly language:

add_me:   add    r0, r1

where “add_me” occupies the label field, “add” occupies the mnemonic field, and the operand list field consists of registers r0 and r1, with operands separated by a comma.
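
The three-field layout can be illustrated with a short Python sketch. This is only an approximation under stated assumptions: the helper names `split_operands` and `split_fields` are hypothetical, the label pattern follows the GNU-syntax label rules given in this section, and comments and quoted string operands are not handled.

```python
import re

# Hypothetical helper (not part of any TI or GNU tool): split one GNU-syntax
# source line into label, mnemonic, and operand list. Comments and quoted
# string operands are deliberately not handled in this sketch.
LINE_RE = re.compile(
    r"^\s*"
    r"(?:(?P<label>[A-Za-z_.][A-Za-z0-9_.$]*)\s*:)?"   # optional label plus colon
    r"\s*(?P<mnemonic>\.?[A-Za-z_][A-Za-z0-9_.]*)?"     # instruction or directive
    r"\s*(?P<operands>.*?)\s*$"                         # rest of the line
)

def split_operands(text):
    """Split on commas that are not nested inside {...} or [...]."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts

def split_fields(line):
    """Return (label, mnemonic, operands); absent fields are None or []."""
    m = LINE_RE.match(line)
    return m.group("label"), m.group("mnemonic"), split_operands(m.group("operands"))

print(split_fields("add_me:   add    r0, r1"))
```

Register lists and memory operands are kept whole because commas inside braces or brackets are not treated as operand separators.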

Many of the assembly directives supported in legacy TI-syntax must be converted into their functionally equivalent GNU-syntax counterparts. Most ARM instructions will likely assemble successfully with either the legacy TI-syntax assembler or the tiarmclang GNU-syntax assembler, but the rules governing legal syntax in the label, mnemonic, and operand list fields differ, and you should be aware of those differences when migrating assembly source files:

Labels

An optional label field can be used to associate a value with a symbol. Label symbol names are case sensitive, and a label must begin in the leftmost column of the assembly source line. The rules governing the name of a symbol defined by a label are largely similar for legacy TI-syntax and GNU-syntax assembly source code, but there are some subtle differences:

Legacy TI-Syntax:

  • Label symbols must begin with a letter or an underscore
  • Label symbols can consist of alphanumeric characters, the dollar sign (“$”), and underscores (“_”)
  • Label symbol definitions may be delimited by an optional terminating colon (“:”)

GNU-Syntax:

  • Label symbols must start with a letter, an underscore, or a period (“.”)
  • Label symbols can consist of alphanumeric characters, the dollar sign (“$”), an underscore (“_”), or a period (“.”)
  • Label symbol definitions must be delimited with a terminating colon (“:”), otherwise the tiarmclang assembler tries to interpret the symbol as a mnemonic identifier
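
As a rough aid when auditing label names during migration, the two sets of naming rules above can be transcribed into regular expressions. The following Python sketch encodes only the rules as listed here; the names `TI_LABEL`, `GNU_LABEL`, and `valid_in` are hypothetical, and either assembler may accept or reject names beyond what these patterns capture.

```python
import re

# Patterns transcribed from the two bullet lists above (an approximation,
# not an authoritative grammar for either assembler).
TI_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")     # letter or underscore first
GNU_LABEL = re.compile(r"[A-Za-z_.][A-Za-z0-9_$.]*")  # a period is also allowed

def valid_in(name):
    """Return the set of syntaxes in which `name` satisfies the listed rules."""
    result = set()
    if TI_LABEL.fullmatch(name):
        result.add("TI")
    if GNU_LABEL.fullmatch(name):
        result.add("GNU")
    return result

for name in ("loop_entry", ".Ltmp0", "data$1", "1abc"):
    print(name, sorted(valid_in(name)))
```

Under these patterns, every name that satisfies the legacy TI-syntax rules also satisfies the GNU-syntax rules, so existing label names should rarely need renaming; the mandatory terminating colon is the more likely source of edits.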

The value assigned to a symbol defined in a label field may vary depending on whether the label occurs within the context of an instruction or a directive. A more detailed discussion of the label field in each of these contexts is provided in the Converting TI-Syntax ARM Instructions to GNU-Syntax ARM Instructions and the Converting TI-Syntax Assembly Directives to GNU-Syntax Assembly Directives sections.

Mnemonics

The mnemonic field of a legal line of assembly code contains a pre-defined textual identifier that indicates whether the source line represents an instruction or a directive.

For example, the “push” mnemonic in the following line of assembly code is recognized as a valid ARM instruction:

        .text
        .thumb
        .global    simple_function

simple_function:
        push {r7,lr}
        ...

The “.text”, “.thumb”, and “.global” identifiers in the mnemonic field are recognized as ARM assembly directives.

More information about how to convert legacy TI-syntax ARM instructions and directives into GNU-syntax is provided in the Converting TI-Syntax ARM Instructions to GNU-Syntax ARM Instructions and the Converting TI-Syntax Assembly Directives to GNU-Syntax Assembly Directives sections, respectively.

With regard to the syntax rules governing the mnemonic field, the only major difference between legacy TI-syntax and GNU-syntax is that GNU-syntax mnemonic identifiers may begin in the leftmost column of the assembly source line, whereas legacy TI-syntax mnemonic identifiers must not.
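
Because a legacy TI-syntax mnemonic cannot begin in the leftmost column, an identifier that does begin there in TI-syntax source can be treated as a label, even when its optional colon is omitted. The Python sketch below relies on that assumption to add the terminating colon that GNU-syntax requires. The function name `add_label_colon` is hypothetical, and the sketch works on a single line without understanding macros or conditional assembly.

```python
import re

# An identifier starting in the leftmost column, optionally followed by ":".
# The character classes follow the legacy TI-syntax label rules listed earlier.
TI_LABEL_AT_COL0 = re.compile(r"^([A-Za-z_][A-Za-z0-9_$]*)(:?)")

def add_label_colon(line):
    """Append ':' to a TI-syntax label in the leftmost column if it lacks one."""
    m = TI_LABEL_AT_COL0.match(line)
    if m is None or m.group(2) == ":":
        return line  # no label in the leftmost column, or colon already present
    return m.group(1) + ":" + line[m.end():]

print(add_label_colon("loop_entry  BL      ef1"))
```

Lines that start with whitespace or with a comment delimiter are returned unchanged.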

Operand List

The syntax rules governing the operand list field depend on the identifier specified in the mnemonic field. For example, in the “push” instruction shown earlier in this section, the operand list field contains a list of one or more registers enclosed in braces, whereas the operand list field of a .global directive expects a legal symbol identifier.

You can find more information about the ARM instruction set in the ARM Developer’s Instruction Set Architecture page. Legacy TI-syntax assembler directives are described in the ARM Assembly Language Tools User’s Guide. More information about GNU-syntax ARM assembler directives can be found in an up-to-date description of the GNU as ARM assembler.

Comments

When migrating assembly language source from legacy TI-syntax to GNU-syntax, you’ll need to modify the way that comments are delimited in your code. The syntax rules governing the demarcation of comments are significantly different between legacy TI-syntax and GNU-syntax ARM assembly language.

Legacy TI-Syntax Comment Delimiters

In legacy TI-syntax assembly source, comments can be delimited in two ways:

  • Text appearing after an asterisk, ‘*’, in the leftmost column is interpreted as a comment.
  • Text appearing after a semicolon, ‘;’, in any column is interpreted as a comment.

The following snippet of legacy TI-syntax ARM assembly code demonstrates the use of these two methods of delimiting comments:

* Loop entry
loop_entry:
        BL      ef1           ; call ext func 1, ef1
        BL      ef2           ; call ext func 2, ef2
        LDR     A1, [SP, #0]
        ADDS    A1, A1, #1    ; I++ (A1)
        STR     A1, [SP, #0]
        LDR     A1, $C$CON1
        LDR     A2, [SP, #0]  ; load I (A2)
        LDR     A1, [A1, #0]  ; load ext var, evar (A1)
        CMP     A1, A2        ; I > evar?
        BGT     loop_entry    ; I < evar, go to loop_entry

* Loop exit
loop_exit:
        MOVS    A1, #0
        POP     {A4, PC}

GNU-Syntax Comment Delimiters

In GNU-syntax assembly source, comments can be delimited using:

  • C-style comments; text enclosed between “/*” and “*/” which may span multiple lines.
  • C++-style comments; text appearing after “//” on a line.
  • Text appearing after an at-sign, ‘@’, is interpreted as a comment unless that ‘@’ character appears in a macro definition preceded by a backslash, ‘\’. For more details about how to convert macro definitions from legacy TI-syntax to GNU-syntax, please see the Converting TI-syntax Assembly Macros into GNU-syntax Assembly Macros section.

Now consider the following snippet of GNU-syntax assembly code, which implements the same instructions as the above legacy TI-syntax example. Note the use of C- and C++-style comments compared to the above example:

/*
 * Loop entry - comment can span multiple lines
 */
loop_entry:
        bl      ef1               // call ext func 1, ef1
        bl      ef2               // call ext func 2, ef2
        ldr     r0, [sp]
        adds    r0, #1            // I++ (r0)
        str     r0, [sp]
        movw    r1, :lower16:evar
        movt    r1, :upper16:evar
        ldr     r1, [r1]          // load evar (r1)
        cmp     r0, r1            // I > evar?
        blt     loop_entry        // I < evar, go to loop_entry

/* Loop exit */
loop_exit:
        movs    r0, #0
        pop     {r7, PC}
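
The change of comment delimiters lends itself to a mechanical first pass. The following Python sketch rewrites the two legacy TI-syntax comment forms as C++-style comments, one line at a time. It is a minimal illustration under stated assumptions: the function name `convert_comment` is hypothetical, only double-quoted strings are protected from rewriting, and the instructions and operands themselves are left untouched, so the result still needs the other conversions described in this guide.

```python
def convert_comment(line):
    """Rewrite a legacy TI-syntax comment delimiter as a C++-style '//'."""
    # An asterisk in the leftmost column makes the whole line a comment.
    if line.startswith("*"):
        return "//" + line[1:]
    # Otherwise the first semicolon outside a double-quoted string
    # starts the comment.
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == ";" and not in_string:
            return line[:i] + "//" + line[i + 1:]
    return line  # no comment on this line

ti_source = [
    "* Loop exit",
    "loop_exit:",
    "        MOVS    A1, #0        ; return 0",
]
for ti_line in ti_source:
    print(convert_comment(ti_line))
```

Choosing ‘//’ rather than ‘@’ as the replacement avoids any interaction with the use of ‘@’ inside macro definitions mentioned above.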