11. Release Notes
Software Manifest
TI_MCU_2.1.1
Supported devices
F28P55x and C28x core based devices
F29H85x and C29x core based devices
AM13x and Arm Cortex-M33 core based devices
AM26x and Arm Cortex-R5 core based devices
MSPM0G5187 and Arm Cortex-M0+ core based devices
MSPM33C321A and Arm Cortex-M33 core based devices
CC2745 device and Arm Cortex-M33 core based devices
Added a model memory usage summary and layer offloading summary to compiler output.
Optimized NPU setup time for the F28P55x device, resulting in improved overall performance.
Updated the “skip_normalize” option to extract input normalization scale values as floats instead of integers.
Resolved defects
CODEGEN-14807: TI MCU NNC crashes when the NHWC-only padding optimization is applied to an NCHW data layout
CODEGEN-14838: TI MCU NNC crashes when a nonexistent attribute is accessed on a node
CODEGEN-14981: TVM ONNX frontend fails with ONNX 1.20
TI_MCU_2.1.0
Supported devices
F28P55x and C28x core based devices
F29H85x and C29x core based devices
CC2745 device and Arm Cortex-M33 core based devices
Supported all layer configs with 8-bit activations and 8-/4-/2-bit weights that can be offloaded to TI-NPU
Supported all layer configs with 8-bit activations and 8-bit weights that can be accelerated using the M33 Custom Datapath Extension (CDE).
Improved quantized (8-bit) model inference performance on C29x core. Note that floating-point models typically run faster than their quantized counterparts on C29x core, despite requiring a larger memory footprint.
TI_MCU_2.0.0
Supported devices
Expanded layer configs that can be supported on TI-NPU
Added an option to compress TI-NPU layer data
Optimized floating point inference on C29x core
Supported quantized models in QDQ format with integer inference code on CPU
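In QDQ format, quantization is expressed explicitly in the model graph as QuantizeLinear/DequantizeLinear node pairs around tensors. As an illustrative sketch of what those two ONNX operators compute (plain NumPy, not TI tooling), for signed 8-bit per-tensor quantization:

```python
import numpy as np

def quantize_linear(x, scale, zero_point):
    # ONNX QuantizeLinear: saturate(round(x / scale) + zero_point) to int8.
    # np.round uses round-half-to-even, matching the ONNX spec.
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_linear(q, scale, zero_point):
    # ONNX DequantizeLinear: (q - zero_point) * scale, back to float.
    return (q.astype(np.int32) - zero_point) * scale

# Round-trip a small tensor through the Q/DQ pair.
x = np.array([0.0, 0.5, -0.25], dtype=np.float32)
q = quantize_linear(x, scale=0.25, zero_point=0)
xr = dequantize_linear(q, scale=0.25, zero_point=0)
```

A compiler consuming a QDQ model reads the scale and zero-point from these node pairs to emit integer inference code, rather than relying on out-of-band quantization metadata.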
TI_MCU_1.3.0
Windows and Linux support
Multiple model support in the same application
Generic padding support
Optimized performance for motor fault classification models and arc fault detection models
With TI-NPU acceleration, these models saw a 1.2x to 1.5x speedup over the TI_MCU_1.2.0 release
With software-only execution on C28, these models saw a 3.9x to 4.8x speedup over the TI_MCU_1.2.0 release
Added example and documentation on how to apply TI-NPU quantization to an existing PyTorch training script
Simplified command line options
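The TI-NPU quantization flow itself is covered by the toolkit's own example and documentation; purely as a generic illustration of the pattern of retrofitting quantization-aware training onto an existing PyTorch script (using PyTorch's eager-mode torch.ao.quantization API, not TI's), the change amounts to wrapping the model and converting after training. The model and training loop below are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)

class SmallNet(nn.Module):
    """Stand-in for an existing float model, with quant/dequant stubs added."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # quantizes the float input
        self.conv = nn.Conv2d(1, 4, 3)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(4 * 6 * 6, 2)
        self.dequant = DeQuantStub()  # dequantizes the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        x = self.fc(torch.flatten(x, 1))
        return self.dequant(x)

# Pick a quantized backend available in this PyTorch build.
backend = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = backend

model = SmallNet()
model.train()
model.qconfig = get_default_qat_qconfig(backend)
prepare_qat(model, inplace=True)  # inserts fake-quant observers

# The existing training loop runs unchanged on the prepared model.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(2):
    x = torch.randn(8, 1, 8, 8)
    loss = model(x).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
quantized = convert(model)  # produces an int8 model
out = quantized(torch.randn(1, 1, 8, 8))
```

The key property is that the training loop itself is untouched; only model construction (stubs, qconfig, prepare) and post-training conversion are added.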
TI_MCU_1.2.0
Supported devices
F28P55x and C28x core based devices
Supported models
Motor fault classification
Arc fault detection