MMALIB Release Notes
Version: 02.04.00.06
Contents
- Introduction
- Licensing
- Getting Started
- Documentation
- What's New
- Upgrade and Compatibility Information
- Device Support
- Validation Information
- Fixed Issues
- Deprecation
- Known Issues
- Technical Support
- Package Versioning
Introduction
The MMALIB package consists of Texas Instruments optimized kernels for CNN, FFT, LINALG, and DSP algorithms.
Licensing
Licensing information for this library, along with a complete software manifest and export control information, is detailed here [HTML].
Getting Started
The MMALIB User Guide [USER_GUIDE] provides the documentation and references necessary to begin development on TI's platforms.
Documentation
Refer to the following documentation for further details:
Document | Contents | Link |
---|---|---|
MMALIB User Guide | Build instructions, API Guide | [USER_GUIDE] |
Test Reports | MISRA C reports, conformance test reports, TI platform test reports | [TEST_RESULTS] |
Software Manifest | Licenses, terms of use | [HTML] |
What's New
Here are a few of the new features supported in this release for C7120:
CNN
- MMALIB_CNN_convolve_row_ixX_ixX_oxX
- MMALIB_CNN_convolveBias_row_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_1x1stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_3x3stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_3x3stride3_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_5x5stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_7x7stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_11x11stride4_ixX_ixX_oxX
- MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX
- MMALIB_CNN_convolve_col_smallNo_highPrecision
- MMALIB_CNN_convolve_col_smallNo_3x3stride2_ixX_ixX_oxX
- MMALIB_CNN_deconvolve_row_2x2Stride2_ixX_ixX_oxX
- MMALIB_CNN_deconvolve_row_4x4Stride2_ixX_ixX_oxX
- MMALIB_CNN_deconvolve_row_8x8Stride2_ixX_ixX_oxX
- MMALIB_CNN_pixelShuffleUpscale2_row_ixX_ixX_oxX
- MMALIB_CNN_pixelShuffleUpscale4_row_ixX_ixX_oxX
- MMALIB_CNN_pixelShuffleUpscale8_row_ixX_ixX_oxX
- MMALIB_CNN_fullyConnected_ixX_ixX_oxX
- MMALIB_CNN_fullyConnectedBias_ixX_ixX_oxX
- MMALIB_CNN_generateFillSeamPredicateRegisters
- MMALIB_CNN_tensor_convert_ixX_oxX
- MMALIB_CNN_generateFillBiasPredicateRegisters
LINALG
- 8-, 16-, and 32-bit signed integer support
- MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX
- MMALIB_LINALG_pointwiseMatrixMatrixMultiply_ixX_ixX_oxX
- MMALIB_LINALG_matrixMatrixMultiplyAccumulate_ixX_ixX_ixX_oxX
- MMALIB_LINALG_matrixTranspose_ixX_oxX
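The LINALG kernels above are listed by name only. When validating kernel output, a plain natural-C reference for multiply-accumulate and transpose can be useful. The sketch below assumes 8-bit signed inputs, a 32-bit signed accumulator, and row-major storage; the function names are hypothetical and are not the MMALIB API, whose calls and buffer parameters are documented in the user guide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical natural-C reference: C = A * B + C.
 * A is M x K, B is K x N, C is M x N, all row-major (an assumption).
 * Products are widened to 32 bits before accumulation. */
void ref_matmul_acc_i8_i32(const int8_t *A, const int8_t *B, int32_t *C,
                           size_t M, size_t K, size_t N)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            int32_t acc = C[i * N + j]; /* start from the existing C value */
            for (size_t k = 0; k < K; k++) {
                acc += (int32_t)A[i * K + k] * (int32_t)B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

/* Hypothetical natural-C reference transpose:
 * out (N x M) = in (M x N) transposed, both row-major. */
void ref_transpose_i32(const int32_t *in, int32_t *out, size_t M, size_t N)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            out[j * M + i] = in[i * N + j];
        }
    }
}
```

For example, with A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], and C initialized to all ones, the accumulate step yields [[20, 23], [44, 51]].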
FFT
- Support for 16-bit and 32-bit signed complex input/output
- Support for real and imaginary parts of data to be in either interleaved or non-interleaved format. Optimal performance assumes interleaved data format
- Support for FFT sizes from 2 through 8192. FFT sizes are assumed to be powers of 2
- Support for batch processing of multiple FFTs in one kernel call. For optimal performance, input data and output data for the entire batch, and twiddle factor data are assumed to fit in L2 memory
- Support for C7x-only kernels; host-emulation mode support is TBD
- MMALIB_FFT_dftSmall_ixX_cxX_oxX
- MMALIB_FFT_dftLarge_ixX_cxX_oxX
- MMALIB_FFT_highRadixDecompositions_ixX_cxX_oxX
- MMALIB_FFT_fft_ixX_cxX_oxX
- MMALIB_fft1dBatched_i16sc_c16sc_o16sc [C7x only]
- MMALIB_fft1dBatched_i32fc_c32fc_o32fc [C7x only]
- MMALIB_fft1d_i16sc_c16sc_o16sc [C7x only]
- MMALIB_fft1d_i32fc_c32fc_o32fc [C7x only]
- Utility functions (with suffixes _getSizes) to compute sizes required for data and twiddle factor buffers
- Utility functions (with suffixes _twGen) to compute relevant DFT matrices and twiddle factors
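The FFT size constraints above can be checked before any buffers are set up. The sketch below, using hypothetical helper names that are not MMALIB functions, tests whether a size is a power of 2 between 2 and 8192 and estimates the raw bytes for one batch of complex samples. It does not model any padding or alignment the _getSizes utilities may add, so those utilities remain the authority for actual buffer sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: 1 if n is a power of 2 in the range 2..8192. */
int fft_size_is_supported(uint32_t n)
{
    /* n & (n - 1) clears the lowest set bit; it is zero only for powers of 2. */
    return n >= 2u && n <= 8192u && (n & (n - 1u)) == 0u;
}

/* Hypothetical estimate of raw bytes for a batch of complex samples.
 * Each sample has a real and an imaginary part of elemBytes each
 * (2 for 16-bit, 4 for 32-bit). Interleaved and non-interleaved layouts
 * hold the same number of values; only the ordering differs.
 * Library-specific padding or alignment is NOT included. */
size_t fft_batch_bytes(uint32_t fftSize, uint32_t batchSize, size_t elemBytes)
{
    return (size_t)fftSize * batchSize * 2u * elemBytes;
}
```

For instance, a batch of four 1024-point 16-bit complex FFTs needs 1024 * 4 * 2 * 2 = 16384 bytes of raw input data, which is one input to judging whether the batch and its twiddle factors fit in L2.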
DSP
- MMALIB_DSP_fir_ixX_ixX_oxX
- MMALIB_DSP_firSmall_ixX_ixX_oxX
Details
CNN style 2D convolution
CNN style 2D convolution FxG stride by 1
- K = 1: optimized 8 bit, 16 bit
- K = 2: optimized 8 bit, 16 bit
- K = 3: optimized 8 bit, 16 bit
- K > 3: optimized
- Dilation : optimized
- 1x1 : supported as regular convolution
- K = 1: optimized 8 bit, 16 bit
- K = 2: optimized 8 bit, 16 bit
- K = 3: optimized 8 bit, 16 bit
- K > 3: optimized
- K = 1: optimized 8 bit tested, 16 bit
- K = 2: optimized 8 bit tested, 16 bit
- K = 3: optimized 8 bit tested, 16 bit
- K > 3: optimized
- K = 1: optimized 8 bit tested, 16 bit
- K = 2: optimized 8 bit tested, 16 bit
- K = 3: optimized 8 bit tested, 16 bit
- K > 3: optimized
- K = 1: optimized 8 bit tested, 16 bit
- K = 2: optimized 8 bit tested, 16 bit
- K = 3: optimized 8 bit tested, 16 bit
- K > 3: optimized
- K = 1: currently natural C
- K = 2: currently natural C
- K = 3: optimized 8 bit tested, 16 bit
- K > 3: optimized
- K = 1: currently natural C
- K = 2: currently natural C
- K = 3: currently natural C
- K > 3: optimized
- Condition for Mflag = 1: the seam insertion predicate is placed in L1 when (image width + pad) * image height is less than 9 KB; otherwise, an L2 buffer is used
- Generic strided convolution is supported as natural C.
- Strided convolution with dilation is supported as natural C.
- K (KBlocks) is the total number of filter coefficients per output channel divided by the MMA block size (64 for 8-bit, 32 for 16-bit); MBlocks is the number of output channels divided by the MMA block size; NBlocks is the feature map size (width * height) divided by the MMA size
- Mflag = 1 denotes the case where the output channels do not fit in L2
- Refer to the user guide for more details
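As a worked example of the block definitions above: with 8-bit data, a 3x3 filter over 64 input channels has 3 * 3 * 64 = 576 coefficients per output channel, so K = 576 / 64 = 9, which falls under "K > 3". The sketch below computes K, MBlocks, and NBlocks, plus the Mflag = 1 seam-predicate placement rule. All helper names are hypothetical; rounding partial blocks up, using the MMA block size for NBlocks, reading 9 KB as 9 * 1024 bytes, and one byte per predicate element are assumptions.

```c
#include <stdint.h>

/* MMA block size per these notes: 64 for 8-bit data, 32 for 16-bit data.
 * Only 8 and 16 are meaningful here; anything else is treated as 16-bit. */
uint32_t mma_block_size(uint32_t bitWidth)
{
    return (bitWidth == 8u) ? 64u : 32u;
}

/* Ceiling division; rounding partial blocks up is an assumption. */
static uint32_t ceil_div(uint32_t a, uint32_t b)
{
    return (a + b - 1u) / b;
}

/* K (KBlocks): filter coefficients per output channel / MMA block size. */
uint32_t k_blocks(uint32_t Fr, uint32_t Fc, uint32_t numInCh, uint32_t bitWidth)
{
    return ceil_div(Fr * Fc * numInCh, mma_block_size(bitWidth));
}

/* MBlocks: number of output channels / MMA block size. */
uint32_t m_blocks(uint32_t numOutCh, uint32_t bitWidth)
{
    return ceil_div(numOutCh, mma_block_size(bitWidth));
}

/* NBlocks: (width * height) / MMA size (assumed equal to the block size). */
uint32_t n_blocks(uint32_t width, uint32_t height, uint32_t bitWidth)
{
    return ceil_div(width * height, mma_block_size(bitWidth));
}

/* Mflag = 1 seam predicate: 1 if it fits in L1, 0 if L2 must be used.
 * Assumes 9 KB = 9 * 1024 bytes and one byte per predicate element. */
int seam_predicate_fits_l1(uint32_t width, uint32_t pad, uint32_t height)
{
    return (width + pad) * height < 9u * 1024u;
}
```

With these definitions, 128 output channels at 8 bits give MBlocks = 2, and a 32x32 feature map gives NBlocks = 16; a 32-wide image with pad 2 and 32 rows needs 34 * 32 = 1088 bytes of seam predicate, well under the L1 limit.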
CNN style 2D convolution 1x1 stride by 2
CNN style 2D convolution 3x3 stride by 3
CNN style 2D convolution 5x5 stride by 2
CNN style 2D convolution 7x7 stride by 2
CNN style 2D convolution 11x11 stride by 4
All:
CNN style 2D convolution 3x3 5x5 7x7 Ni = No = 1, small
- stride-by-1
- optimized-C and natural-C
- stride-by-1 with dilation
- optimized-C and natural-C
- 3x3 stride-by-2
- optimized-C and natural-C
- dilation with stride is not supported
- optimized-C requires the number of rows in a feature map to be even
- All:
- Bias can be embedded into the kernel coefficients or supplied separately when calling the reorder weights function
- Split processing of a single group across both sides of the MMA
- Multiple groups processed in single kernel call
- Supports multiple rows of bias provided the rows do not push the computation beyond the MMA block size
- Supports 8- and 16-bit datatypes
- Supports signed datatype with kernel matrix
- Only optimized code is supported, for 2x2, 4x4, and 8x8 kernels with stride 2
- Supports multiple bias rows using the 2D DECDIM feature
- The kernel generates the even and odd output rows contiguously and stores them in memory; the application needs to interleave the rows to generate the final output
- Supports 8- and 16-bit datatypes
- Supports signed and unsigned combinations of datatypes with kernel matrix always assumed to be signed
- Support of multiple Bias rows using 2D DECDIM feature
- 16-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
- 32-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
- 16-bit, interleaved/non-interleaved, 64 < FFT_SIZE < 8192, BATCH_SIZE = 1,2,4,8,16,32,64: works
- 32-bit, interleaved/non-interleaved, 32 < FFT_SIZE < 8192, BATCH_SIZE = 1,2,4,8,16,32,64: works
- 16-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
- 32-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
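The deconvolution notes above leave the final row interleave to the application. A minimal sketch, assuming the kernel writes all even output rows as one contiguous block and all odd output rows as another, with every row occupying rowBytes bytes. This layout is an assumption for illustration; the actual strides, padding, and channel ordering are described in the user guide. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical post-processing for a stride-2 deconvolution.
 * ASSUMED layout: evenRows holds output rows 0, 2, 4, ... back to back;
 * oddRows holds output rows 1, 3, 5, ... back to back.
 * out must have room for 2 * rowsPerPhase rows of rowBytes each. */
void interleave_even_odd_rows(const uint8_t *evenRows, const uint8_t *oddRows,
                              uint8_t *out, size_t rowsPerPhase, size_t rowBytes)
{
    for (size_t r = 0; r < rowsPerPhase; r++) {
        /* Output row 2r comes from the even set, row 2r + 1 from the odd set. */
        memcpy(out + (2u * r) * rowBytes, evenRows + r * rowBytes, rowBytes);
        memcpy(out + (2u * r + 1u) * rowBytes, oddRows + r * rowBytes, rowBytes);
    }
}
```

Copying whole rows with memcpy keeps the sketch independent of the element type; if the real layout interleaves channels or pads rows, the offsets would need to change accordingly.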
Upgrade and Compatibility Information
File | Change description | User application change required | User application recompile required |
---|---|---|---|
Device Support
SoC | Host (OS) | Target (OS) | Test Platform |
---|---|---|---|
J7AEP, J7ES C7x, MMA | No OS | No OS |
Validation Information
This release was built and validated using the following tools:
Build Tools (NOT included in MMALIB):
- C7x CGT C7000-CGT-3.0.0 STS
Fixed Issues
- [MMALIB-537] MMALIB shall support convolve row optimization for M < 1, K < 1 and pad > 0 C7100
- [MMALIB-524] MMALIB shall support Tensor convert needed by TVM
- [MMALIB-526] J721E: Fr=3 Fc=1 Convolution fails for the MMALIB_CNN_convolve_row (MMALIB 2.4.0.2)
- [MMALIB-555] Column Flow: 3x3_s1 Kernel driven with ES style padding results in incorrect results in host emulation and gets stuck on target
Deprecation
The following functions will be deprecated after the next MMALIB.02.04.00.00 release for C7120. They remain in the current build to support migration from C7100 to C7120. FFTLIB has been created as a separate package; the FFT kernels in MMALIB use the old interface and will be deprecated.
- MMALIB_CNN_convolve_row_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_1x1stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_3x3stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_3x3stride3_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_5x5stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_7x7stride2_ixX_ixX_oxX
- MMALIB_CNN_convolve_row_11x11stride4_ixX_ixX_oxX
- MMALIB_CNN_convolve_col_smallNo_legacy_ixX_ixX_oxX
- MMALIB_CNN_deconvolve_row_2x2Stride2_ixX_ixX_oxX
- MMALIB_CNN_deconvolve_row_4x4Stride2_ixX_ixX_oxX
- MMALIB_CNN_deconvolve_row_8x8Stride2_ixX_ixX_oxX
- MMALIB_CNN_pixelShuffleUpscale2_row_ixX_ixX_oxX
- MMALIB_CNN_pixelShuffleUpscale4_row_ixX_ixX_oxX
- MMALIB_CNN_pixelShuffleUpscale8_row_ixX_ixX_oxX
- MMALIB_CNN_fullyConnected_ixX_ixX_oxX
- MMALIB_CNN_generateFillSeamPredicateRegisters
- MMALIB_CNN_generateFillBiasPredicateRegisters
- MMALIB_FFT_dftSmall_ixX_cxX_oxX
- MMALIB_FFT_dftLarge_ixX_cxX_oxX
- MMALIB_FFT_highRadixDecompositions_ixX_cxX_oxX
- MMALIB_FFT_fft_ixX_cxX_oxX
- MMALIB_fft1dBatched_i16sc_c16sc_o16sc [C7x only]
- MMALIB_fft1dBatched_i32fc_c32fc_o32fc [C7x only]
- MMALIB_fft1d_i16sc_c16sc_o16sc [C7x only]
- MMALIB_fft1d_i32fc_c32fc_o32fc [C7x only]
Known Issues
None
Technical Support
For technical support, please post your questions on TI E2E Forum for Automotive ADAS SoCs.
For additional assistance, contact your local TI Field Application Engineer.
Package Versioning
Each package version is composed of four period-delimited numbers, represented here by the letters M, m, p, and b [M.m.p.b]. The table below describes the meaning of each number.
Digit | Meaning | Description |
---|---|---|
1 (M=Major) | Major revision | Incremented when the new version is substantially different from the previous one. For example, a new module is added or an existing module's algorithm is significantly altered. |
2 (m=minor) | Minor revision | Incremented when the new version has changed but not in a major way. For example, some minor changes in the API or feature set. |
3 (p=patch) | Patch number | Incremented for all other source code changes. This includes any packaging support code. |
4 (b=build) | Build number | Incremented for each release delivery to CM. Reset for any change to M, m or p |
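Because the four numbers are ordered from most to least significant, two versions can be compared field by field. The sketch below uses hypothetical names and is not part of MMALIB; it parses a string such as 02.04.00.06 and compares it with another version.

```c
#include <stdio.h>

/* Hypothetical container for the four [M.m.p.b] fields. */
typedef struct {
    unsigned major, minor, patch, build;
} pkg_version;

/* Parses "M.m.p.b"; leading zeros are accepted because %u reads base 10.
 * Returns 1 on success, 0 if fewer than four fields are found. */
int parse_pkg_version(const char *s, pkg_version *v)
{
    return sscanf(s, "%u.%u.%u.%u",
                  &v->major, &v->minor, &v->patch, &v->build) == 4;
}

/* Compares most significant field first; returns -1, 0, or 1. */
int compare_pkg_version(const pkg_version *a, const pkg_version *b)
{
    if (a->major != b->major) return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor) return a->minor < b->minor ? -1 : 1;
    if (a->patch != b->patch) return a->patch < b->patch ? -1 : 1;
    if (a->build != b->build) return a->build < b->build ? -1 : 1;
    return 0;
}
```

For example, 02.04.00.06 parses to M = 2, m = 4, p = 0, b = 6 and compares greater than 02.04.00.00 because only the build number differs and it is higher.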
Copyright 2018, Texas Instruments Incorporated