This document describes what OptiShare is and how it can be enabled in a project.
The following image shows what optishare does at a very high level.
As shown, the OptiShare tool takes the application binaries (ELF format) of the different cores as input and produces one binary per core plus one additional binary called the shared object. In the output, all common functions (text) and read-only data are removed from each CPU binary and placed in the shared object.
IPC Notify Echo Example With OptiShare is an example in the SDK that uses OptiShare.
When the projects of more than one core use the same library, that library is stored in memory once per core, which wastes memory.
The solution here is to have a concept of "Shared" code/data.
However, consider the following code:
Assume this code is present on all 4 cores. Without OptiShare, the addresses of the above symbols are:
Symbol | Core 0 | Core 1 | Core 2 | Core 3 |
---|---|---|---|---|
gCounterFreqHz | 0x7005cc78 | 0x7008a1c0 | 0x700ca1c0 | 0x7010a1c0 |
CycleCounterP_init | 0x70054f84 | 0x700866f8 | 0x700c66f8 | 0x701066f8 |
The idea is to place the above function at just one location, such as 0x7007014c.
However, there are some technical challenges with the above technique:
In ti-arm-clang, as in GCC, the basic units of layout are called "sections". A section is a byte array that cannot be split. Because sections are atomic, identifying common functions as shared functions requires each function to be in its own section (in GCC, the -ffunction-sections and -fdata-sections flags do this). However, ti-arm-clang creates a section for each function and data object by default, so no extra work is required.
OptiShare has two parts: compile time and run time.
In this implementation, special flags are passed to the linker so that it generates an XML file. This XML file has
In the above diagram, the green blocks are the usable objects and the red blocks should be discarded. The bottom of the diagram shows the output binaries.
To apply OptiShare in an existing project, it is required to
This implementation of OptiShare uses the Region Address Translation (RAT) hardware.
The RAT hardware does the following in this specific scenario:
In the above flow, the output binary generated by OptiShare, sso.out, contains the shared text and data. Here, text means the functions.
When the OptiShare script runs, it does the following:
For example, consider the CycleCounterP_reset function. This function is in the call graph of the CycleCounterP_init function. So, following the above algorithm, CycleCounterP_init will only be marked as shared if CycleCounterP_reset is also shared across all cores. However, if CycleCounterP_reset itself calls another function that is not shared, then neither CycleCounterP_reset nor CycleCounterP_init will be shared. In other words, the entire call graph of a function must be shared among all the cores for that function to be shared.
In the .projectspec, the -Wl,--gen_xml_func_hash and -Wl,--xml_link_info flags should be present so that CCS can build without any issues.
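The call-graph rule can be illustrated with a small host-side model. This is only a sketch of the rule as stated above, not the OptiShare script itself: the `shareable_functions` helper, the dictionary representation of the call graph, and the `core_local_helper` and `hypothetical_leaf` names are all assumptions made for illustration.

```python
def shareable_functions(call_graph, common_to_all_cores):
    """Return the functions whose entire call graph is common to all cores.

    call_graph: dict mapping a function name to the names it calls (assumed shape).
    common_to_all_cores: set of functions that are identical in every core's binary.
    """
    shared = set(common_to_all_cores)
    changed = True
    while changed:
        changed = False
        # sorted() makes a copy, so removing from `shared` while looping is safe.
        for func in sorted(shared):
            callees = call_graph.get(func, ())
            if any(callee not in shared for callee in callees):
                # One non-shared callee disqualifies the caller as well.
                shared.discard(func)
                changed = True
    return shared


# Hypothetical call graph: init -> reset -> core_local_helper
call_graph = {
    "CycleCounterP_init": ["CycleCounterP_reset"],
    "CycleCounterP_reset": ["core_local_helper"],
    "hypothetical_leaf": [],
}

# core_local_helper is NOT common to all cores in this first scenario.
common = {"CycleCounterP_init", "CycleCounterP_reset", "hypothetical_leaf"}
print(sorted(shareable_functions(call_graph, common)))

# If the helper is common too, the whole chain becomes shareable.
print(sorted(shareable_functions(call_graph, common | {"core_local_helper"})))
```

In the first scenario only `hypothetical_leaf` survives, because the non-shared helper disqualifies both `CycleCounterP_reset` and, through it, `CycleCounterP_init`.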
At runtime, optishare needs special hardware features.
The technical challenges highlighted earlier are solved using a virtual memory region. This means that all the functions in sso.out access .data and .bss through a virtual memory region. Each core has its own RAT hardware, which maps this virtual memory region to a physical memory region that contains core-specific data.
So, at runtime, each core configures its own RAT to map that virtual memory region to a physical memory address in SRAM.
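The effect of the per-core mapping can be modelled on the host. The sketch below is an illustration only and does not use any SDK API: the `RatRegion` class, the virtual base address, the region size, and the per-core physical addresses are all hypothetical values chosen for the example.

```python
class RatRegion:
    """Toy model of one address-translation window (not the real RAT registers)."""

    def __init__(self, virt_base, phys_base, size):
        self.virt_base = virt_base
        self.phys_base = phys_base
        self.size = size

    def translate(self, addr):
        # Addresses inside the window are shifted to the core's physical region.
        if self.virt_base <= addr < self.virt_base + self.size:
            return addr - self.virt_base + self.phys_base
        # Addresses outside the window pass through unchanged.
        return addr


VIRT_BASE = 0x80000000   # hypothetical virtual base used by the shared code
REGION_SIZE = 0x4000     # hypothetical window size

# Hypothetical per-core physical regions holding each core's own copy of the data.
CORE_PHYS_BASE = {0: 0x70180000, 1: 0x70184000, 2: 0x70188000, 3: 0x7018C000}

rat_per_core = {
    core: RatRegion(VIRT_BASE, phys, REGION_SIZE)
    for core, phys in CORE_PHYS_BASE.items()
}

# The shared code issues the same virtual address on every core ...
shared_var_virt = VIRT_BASE + 0x10
for core, rat in rat_per_core.items():
    # ... but each core's window resolves it to that core's private copy.
    print(core, hex(rat.translate(shared_var_virt)))
```

The same instruction stream therefore reads and writes a different physical location on each core, which is what lets one copy of the code serve core-specific data.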
However, this technique imposes one more constraint on the layout. Suppose the shared code assumes the following layout of the .data section:
Offset | Symbol Name |
---|---|
0 | var1 |
10 | var2 |
12 | var3 |
22 | var4 |
Because the RAT hardware simply translates the address, var1 to var4 must be at the same offsets on every core.
The IPC Notify Echo Example With OptiShare is used here.
As described earlier, add new flags to generate the XML file. The following image shows the additional linker flag that is to be added to each core's build.
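The constraint can be expressed as a simple check. The sketch below assumes the layout is available as a symbol-to-offset dictionary per core; the `layout_mismatches` helper and the deliberately broken layout of core 2 are hypothetical, while the offsets of var1 to var4 are taken from the table above.

```python
# Layout assumed by the shared code (offsets from the table above).
SHARED_LAYOUT = {"var1": 0, "var2": 10, "var3": 12, "var4": 22}


def layout_mismatches(expected, core_layouts):
    """Return (core, symbol, expected_offset, actual_offset) for every deviation."""
    problems = []
    for core, layout in sorted(core_layouts.items()):
        for symbol, offset in expected.items():
            actual = layout.get(symbol)  # None if the symbol is missing on this core
            if actual != offset:
                problems.append((core, symbol, offset, actual))
    return problems


core_layouts = {
    0: {"var1": 0, "var2": 10, "var3": 12, "var4": 22},
    1: {"var1": 0, "var2": 10, "var3": 12, "var4": 22},
    # Hypothetical bad layout: var3 and var4 moved, so shared code would
    # read the wrong bytes after address translation on this core.
    2: {"var1": 0, "var2": 10, "var3": 14, "var4": 24},
}

for problem in layout_mismatches(SHARED_LAYOUT, core_layouts):
    print(problem)
```

Any entry reported here means the shared code and that core disagree about where a variable lives, so the layouts must be made identical before the code can be shared.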
This flag generates a .lnkxml file, which contains all the link information in XML format.
In addition, add a new rule that links the application again, this time with the --import_sso flag.
Each core's linker file needs to be changed.
For AM263Px, the last 512KB of the memory is used as the shared memory of OptiShare.
This looks as follows:
Here, in the SECTION of the linker file of core 0, add the following as is:
For core 1, the memory section would be C1_SSO_LCL, and so on.
Make sure that each core marks this 512KB of L2 memory as shared in its MPU.
The following image shows the same:
Selecting Non-Cached marks that region as shared.
Cx_SSO_LCL needs to be non-cached because the data in this region is mostly global variables. When shared code updates such a global variable, it issues a virtual address, and the RAT in effect updates the corresponding physical memory address. If caches were enabled, this would cause coherency issues.
The OptiShare script is then run. In this example, the command is:
Notice the --mem_spec flag. This flag is used to pass the memory specification to the OptiShare script. It is needed because the script currently cannot deduce the shared memory region, so the shared memory specification has to be provided explicitly.
In this example, it is defined as:
device_mem_regions holds the general information about the different memories available in a device. For any device, the device_mem_regions struct should be the same as given here.
shared_mem_regions contains the shared memory specification. Here it splits SSO_SHM into different regions. This is important: the memory must be split into RX, RO and RW sections, and .shared.bss and .shared.data should be placed only in the RW section.
shared_os_placement_instrs specifies the section placement. It should be kept as is and not changed.
The only change required is in shared_mem_regions.
The C code needs to be changed as follows:
As mentioned above, before enabling OptiShare (that is, before programming the RAT), the application should make sure that none of the functions it runs, nor the global variables they access, are shared. This can be ensured by adding the do_not_share attribute to a function:
When the code is relinked with the --import_sso flag, the linker generates symbols that can be used to program the RAT. The following code shows how to do that:
The above code programs the RAT before starting the application.
The compiler ships with another program that reports the memory savings achieved.
The above command compares the link XML of the application built before OptiShare with the one built after OptiShare.
The output is a text file, which in this case has the extension *.ossr (OptiShare Savings Report).
Its contents look like the following:
Section | ipc_notify_echo_optishare.release.lnkxml | ipc_notify_echo_optishare.release.optishare.lnkxml | Saving |
---|---|---|---|
.text.hwi | 2472 | 2360 | 112 |
.text.cache | 1072 | 240 | 832 |
.text.mpu | 520 | 400 | 120 |
.text.boot | 392 | 368 | 24 |
.text:abort | 8 | 0 | 8 |
.text | 30256 | 28496 | 1760 |
.rodata | 5856 | 5152 | 704 |
.data | 1000 | 688 | 312 |
.bss.log_shared_mem | 16384 | 0 | 16384 |
.shared.data | 0 | 4096 | -4096 |
.shared.bss | 0 | 12288 | -12288 |
The above is for core R5F0-0. Run the above script for each core to get the savings for that core.
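To read the report as a single figure, the per-section savings can be summed. The snippet below is a small sketch that only recomputes numbers from the table above; the `ROWS` list and the helper names are introduced here for illustration and are not part of the reporting tool.

```python
# (section, size before OptiShare, size after OptiShare), copied from the report above.
ROWS = [
    (".text.hwi", 2472, 2360),
    (".text.cache", 1072, 240),
    (".text.mpu", 520, 400),
    (".text.boot", 392, 368),
    (".text:abort", 8, 0),
    (".text", 30256, 28496),
    (".rodata", 5856, 5152),
    (".data", 1000, 688),
    (".bss.log_shared_mem", 16384, 0),
    (".shared.data", 0, 4096),
    (".shared.bss", 0, 12288),
]


def savings_per_section(rows):
    """Saving for each section: bytes before minus bytes after."""
    return {section: before - after for section, before, after in rows}


def net_saving(rows):
    """Net bytes saved on this core, counting the new .shared.* sections as a cost."""
    return sum(before - after for _, before, after in rows)


per_section = savings_per_section(ROWS)
print(per_section[".text.cache"])
print(net_saving(ROWS))
```

For this core the net saving is 3872 bytes: the 16384 bytes removed from .bss.log_shared_mem are offset exactly by the 4096 bytes of .shared.data plus the 12288 bytes of .shared.bss, so the net figure comes from the code, read-only data and .data sections.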
MulticoreELF (Understanding Multicore ELF image format) is the image format that the SDK uses to boot from flash. tools/boot/multicore-elf/genimage.py is the Python script that takes the .out files of the cores as input and produces .mcelf and .mcelf_xip as output. The command looks like the following when OptiShare is not enabled:
To enable OptiShare, the --sso flag has to be passed.
sso.out contains the shared code and data.
Implementing OptiShare is somewhat complex, as it requires some understanding of linkers, the ARM Memory Protection Unit (MPU), the ARM assembly addressing model, SoC-level address translation using the RAT, caches, and so on. However, if implemented correctly, it can lead to significant memory savings. In a use case where two OS instances run on different cores, it would make almost all of the OS code shared, leaving more space for the user application.