2.4.7. Data allocation for instructions with two memory operands
Many instructions on the C2000 ALU take memory operands, meaning they can operate directly on data in memory without first loading it into registers and storing the result back.
For instructions that take two memory operands, the second memory operand (*XAR7) is accessed over the program memory bus. A C2000 RAM block supports only one access per pipeline cycle, so when both operands reside in the same block the two accesses conflict and the pipeline stalls. To avoid these stalls, allocate the two data arrays to different physical RAM blocks. The physical RAM blocks are listed in the memory map in the device's data manual.
The following instructions use the program memory bus for a second memory access via *XAR7:
MAC
IMACL
QMACL
DMAC
MACF32 (FPU only)
PREAD
Table 2.11 shows the C source for multiplying two arrays element by element and accumulating the result. With -O3 --unified_memory, the compiler generates an RPT in parallel with a MAC instruction for the loop. The MAC instruction has two memory operands, corresponding to array1 and array2.
C Source:

    int32_t mac(int16_t* array1, int16_t* array2, int16_t M)
    {
        _nassert(M > 0);
        int j;
        int32_t sum = 0;
        for (j=0; j < M; j++)
            sum += array1[j] * array2[j];
        return sum;
    }

Generated Assembly:

    ||mac||:
            MOVL      XAR7,XAR5
            ADDB      AL,#-1
            MOVZ      AR5,AL
            MOV       P,#0
            MOVB      ACC,#0
            RPT       AR5
         || MAC       P,*XAR4++,*XAR7++
            ADDL      ACC,P
            LRETR
Table 2.12 compares performance on F28004x when the arrays are placed in the same memory block and when they are placed in different memory blocks. Both scenarios use the same compiler options: -v28 --abi=eabi --unified_memory --ramfunc=on -O3 --opt_for_speed=5. Listing 2.1 illustrates how to place the two arrays in different physical memory blocks.
Scenario | Cycles |
---|---|
Linker command file places both arrays in the same memory block | 141 |
Linker command file places both arrays in different memory blocks | 77 |
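The two measurements differ by 141 - 77 = 64 cycles. One possible reading, offered only as a rough sketch, is that each loop iteration pays one extra stall cycle when both operands come from the same RAM block. The source does not state the array length or the fixed call overhead, so the values used here (64 iterations and 13 overhead cycles) are assumptions chosen to match the table, and `model_cycles` is a hypothetical helper rather than anything provided by the toolchain.

```c
#include <stdint.h>

/* Hypothetical cost model (an assumption, not a documented formula):
 * a fixed overhead, plus one cycle per MAC iteration, plus
 * stall_per_iter extra cycles per iteration when both memory operands
 * are located in the same physical RAM block. */
uint32_t model_cycles(uint32_t overhead, uint32_t iterations,
                      uint32_t stall_per_iter)
{
    return overhead + iterations * (1u + stall_per_iter);
}
```

Under these assumptions, `model_cycles(13, 64, 0)` evaluates to 77 and `model_cycles(13, 64, 1)` evaluates to 141, matching the two rows of the table.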
    SECTIONS
    {
        .array_a : > RAMGS2
        .array_b : > RAMGS3
    }
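The linker command file only maps sections that the C source has already assigned the arrays to. A minimal sketch of that C side follows, assuming the arrays are global objects named `array_a` and `array_b` (hypothetical names chosen to match the section names in Listing 2.1) with an assumed length `N` of 64. It uses the TI compiler's `DATA_SECTION` pragma, guarded so that the same file also builds with a host compiler, where `_nassert` is replaced by a plain `assert`.

```c
#include <assert.h>
#include <stdint.h>

#define N 64  /* hypothetical array length; not given in the source */

#if defined(__TI_COMPILER_VERSION__)
/* TI compiler: put each array in the section named in the linker
 * command file. The pragma must appear before the definition. */
#pragma DATA_SECTION(array_a, ".array_a")
#pragma DATA_SECTION(array_b, ".array_b")
#else
/* Host build: _nassert is a TI intrinsic, so fall back to assert. */
#define _nassert(cond) assert(cond)
#endif

int16_t array_a[N];
int16_t array_b[N];

/* Same multiply-accumulate loop as in Table 2.11. */
int32_t mac(int16_t *array1, int16_t *array2, int16_t M)
{
    _nassert(M > 0);
    int j;
    int32_t sum = 0;
    for (j = 0; j < M; j++)
        sum += array1[j] * array2[j];
    return sum;
}
```

Calling `mac(array_a, array_b, N)` then reads the first operand from one RAM block and the second from the other, provided the linker command file maps `.array_a` and `.array_b` to different physical blocks.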