This example gives a basic overview of applying smart placement and compares the run time of functions with and without smart placement.
The aim of this example is to:
For more details on smart placement, see Smart Placement.
Parameter | Value |
---|---|
CPU + OS | r5fss0-0 nortos |
Toolchain | ti-arm-clang (ti-cgt-armllvm >= 3.2.0 LTS) |
Boards | am263x-cc, am263x-lp |
Example folder | examples/kernel/nortos/basic_smart_placement |
Building this application requires the ti-cgt-armllvm compiler, version 3.2.0 LTS or later. The application can be built using the make command.
A sample output from running the application is shown below.
When the program runs, it first calls the basic_smart_placement_main function, which internally calls function_f1 and annotated_function_f1.
The PMU (performance monitoring unit) of the R5F core is used to profile the execution of function_f1 and annotated_function_f1.
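A minimal sketch of the measurement pattern, assuming a free-running 32-bit cycle counter such as the one the R5F PMU provides. The counter read is passed in as a function pointer so the sketch also runs on a host; read_cycles_fn, measure_cycles, and the fake counter are hypothetical names, not SDK APIs. Unsigned 32-bit subtraction yields the correct delta even if the counter wraps once between the two reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook returning the current value of a free-running 32-bit
 * cycle counter (on target this would read the PMU cycle counter). */
typedef uint32_t (*read_cycles_fn)(void);

/* Hypothetical signature of the function being profiled. */
typedef void (*workload_fn)(void);

/* Cycles elapsed between two samples. uint32_t arithmetic is modulo 2^32,
 * so a single counter wrap between the samples is handled correctly. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Sample the counter, run the workload, then sample again. */
static uint32_t measure_cycles(read_cycles_fn read_cycles, workload_fn work)
{
    assert(read_cycles != NULL && work != NULL);
    uint32_t start = read_cycles();
    work();
    uint32_t end = read_cycles();
    return cycles_elapsed(start, end);
}

/* Host-side stand-in for the hardware counter, for illustration only:
 * every read advances the counter by a fixed step of 100. */
static uint32_t fake_counter = 0;

static uint32_t fake_read_cycles(void)
{
    uint32_t now = fake_counter;
    fake_counter += 100u;
    return now;
}

static void empty_workload(void)
{
}
```

On target, the same measure_cycles call would be made once for function_f1 and once for annotated_function_f1, with the hook reading the real counter.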
Follow these steps to apply smart placement:
Here, critical functions are identified manually by inspection. Because this is a manual process, the criticality of a function is determined purely from knowledge of the software, firmware, and use case.
The priority table for the critical functions is as follows:
Function Name | Priority |
---|---|
annotated_function_f1 | 1 |
annotated_function_f2 | 2 |
annotated_function_f3 | 2 |
annotated_function_f4 | 2 |
Manual inspection identified annotated_function_f1 as more critical than the rest of the functions, so it is given a priority number that is smaller than the other functions' priority numbers (a smaller number means a higher priority). The remaining critical functions are less critical than annotated_function_f1 and roughly equal to one another, so they are grouped together under the same priority number.
Deciding which priority number to assign to each critical function is also a manual process that requires a good understanding of the firmware.
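The priority table can also be kept next to the code as data. This sketch is illustrative only: the critical_fn type and order_by_priority helper are hypothetical and not part of the SDK or toolchain. It orders entries so that a smaller priority number comes first and keeps the table order for equal priorities (insertion sort is stable), matching the rule that annotated_function_f1 (priority 1) outranks the priority-2 functions.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the critical-function priority table (hypothetical type). */
typedef struct {
    const char *name;
    int priority; /* smaller number = more critical */
} critical_fn;

/* Stable insertion sort by ascending priority number, so functions
 * sharing a priority keep their original table order. */
static void order_by_priority(critical_fn *fns, size_t count)
{
    assert(fns != NULL || count == 0);
    for (size_t i = 1; i < count; ++i) {
        critical_fn key = fns[i];
        size_t j = i;
        while (j > 0 && fns[j - 1].priority > key.priority) {
            fns[j] = fns[j - 1];
            --j;
        }
        fns[j] = key;
    }
}
```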
Once the table above has been formed, each function is annotated as shown:
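A hedged sketch of the annotation step, assuming TI Arm Clang 3.2.0 LTS accepts a local(priority) function attribute for smart placement; confirm the exact attribute spelling in the compiler user's guide. The SP_LOCAL macro and the SP_USE_TI_ATTRIBUTES flag are hypothetical, so the same source also builds on a host compiler, where the annotation expands to nothing. The function bodies are placeholders; function_f1 is assumed to share annotated_function_f1's body and is left unannotated as the baseline.

```c
#include <assert.h>

/* Hypothetical build flag: define SP_USE_TI_ATTRIBUTES when building with
 * TI Arm Clang. The attribute spelling local(prio) is an assumption to be
 * checked against the compiler documentation. */
#ifdef SP_USE_TI_ATTRIBUTES
#define SP_LOCAL(prio) __attribute__((local(prio)))
#else
#define SP_LOCAL(prio)
#endif

/* Baseline: no annotation, so it is placed with ordinary code. */
int function_f1(int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}

/* Same placeholder body, annotated with priority 1 (most critical). */
SP_LOCAL(1) int annotated_function_f1(int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}

/* Priority-2 functions from the table; bodies are placeholders. */
SP_LOCAL(2) int annotated_function_f2(int n) { return n * 2; }
SP_LOCAL(2) int annotated_function_f3(int n) { return n * 3; }
SP_LOCAL(2) int annotated_function_f4(int n) { return n * 4; }
```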
Note:
Make sure that the following 3 lines are present in the SECTIONS directive of linker.cmd:
These lines route all functions annotated as local into TCM. If the total size of the functions marked local exceeds the size of R5F_TCMA, as many functions as fit are placed in R5F_TCMA and the rest are moved to R5F_TCMB; if R5F_TCMB also fills up, the remaining functions are moved to MSRAM.
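The spill-over behavior described above can be modeled in a few lines. This is a simplified first-fit model, not the TI linker's actual algorithm: functions are visited in priority order and each goes to the first of R5F_TCMA, R5F_TCMB, and MSRAM with enough free space. The local_fn type and place_local helper are hypothetical, and any sizes used with them are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate regions for local functions, in preference order. */
enum region { R5F_TCMA, R5F_TCMB, MSRAM, REGION_COUNT };

/* Hypothetical record: code size in bytes and the region the model picks. */
typedef struct {
    size_t size;
    enum region placed;
} local_fn;

/* First-fit placement. fns must already be ordered by priority (most
 * critical first); capacity holds free bytes per region and is updated
 * in place. Returns 0 on success, -1 if a function fits in no region. */
static int place_local(local_fn *fns, size_t count, size_t capacity[REGION_COUNT])
{
    assert(fns != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int placed = 0;
        for (int r = R5F_TCMA; r < REGION_COUNT && !placed; ++r) {
            if (fns[i].size <= capacity[r]) {
                capacity[r] -= fns[i].size;
                fns[i].placed = (enum region)r;
                placed = 1;
            }
        }
        if (!placed) {
            return -1;
        }
    }
    return 0;
}
```

Because each function tries R5F_TCMA first, a smaller, lower-priority function can still land in R5F_TCMA after a larger one has spilled to R5F_TCMB, which matches "all the functions that could be placed in R5F_TCMA" being placed there.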
Functions marked onchip are treated similarly; however, they must never be placed in any TCM, because doing so would be logically wrong.
All functions marked offchip should be placed in external FLASH.
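The three placement rules can be summarized as a check on where each annotation kind may end up. In this sketch the enum and function names are hypothetical, and treating MSRAM as the only non-TCM on-chip region is an assumption based on the memories named in this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical annotation kinds and memory regions. */
enum annotation { ANN_LOCAL, ANN_ONCHIP, ANN_OFFCHIP };
enum memory { MEM_TCMA, MEM_TCMB, MEM_MSRAM, MEM_FLASH };

static bool is_tcm(enum memory m)
{
    return m == MEM_TCMA || m == MEM_TCMB;
}

/* True if a function with annotation a may be placed in memory m:
 *   local   -> TCMA, TCMB, or MSRAM as overflow
 *   onchip  -> on-chip memory, but never TCM
 *   offchip -> external FLASH only */
static bool placement_allowed(enum annotation a, enum memory m)
{
    switch (a) {
    case ANN_LOCAL:
        return m != MEM_FLASH;
    case ANN_ONCHIP:
        return !is_tcm(m) && m != MEM_FLASH;
    case ANN_OFFCHIP:
        return m == MEM_FLASH;
    }
    assert(!"unknown annotation");
    return false;
}
```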
In the linker command file, functions marked local are routed to TCMA/TCMB, and TCMs are the fastest memories. Therefore, annotating the critical functions directly in the source places their code in faster memory, which improves runtime performance.
This can be seen in the sample output above: for the same number of instructions executed, the function without smart placement takes 3 times as many CPU cycles as the function with smart placement.
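The comparison can be expressed in cycles per instruction (CPI): CPI = cycles / instructions, and when both runs execute the same number of instructions, the ratio of their CPIs equals the ratio of their cycle counts. The helper names below are hypothetical, and any numbers passed to them are illustrative rather than measurements from this example.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per instruction for one profiled run. */
static double cpi(uint64_t cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)cycles / (double)instructions;
}

/* How many times slower the baseline run is than the optimized run.
 * With equal instruction counts this equals baseline_cycles / optimized_cycles. */
static double slowdown(uint64_t baseline_cycles, uint64_t baseline_instr,
                       uint64_t optimized_cycles, uint64_t optimized_instr)
{
    assert(optimized_cycles > 0);
    return cpi(baseline_cycles, baseline_instr) / cpi(optimized_cycles, optimized_instr);
}
```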
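Also, TCMs are never cached. Therefore, placing critical functions that do not cache well in TCM gives them fast access that does not depend on cache hits and frees cache space for the remaining code, which improves the cache hit rate and ultimately lowers the CPI of those functions.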
This example shows how to apply smart placement manually and how it improves code performance while remaining easy to use, because it allows functions to be placed at the source level.