MMALIB Release Notes

Version: 00.08.00.00


Contents

  1. Introduction
  2. Licensing
  3. Getting Started
  4. Documentation
  5. What's New
  6. Upgrade and Compatibility Information
  7. Device Support
  8. Validation Information
  9. Fixed Issues
  10. Known Issues
  11. Technical Support
  12. Package Versioning


Introduction

The MMALIB package consists of Texas Instruments optimized kernels for CNN, FFT, LINALG, and DSP algorithms.


Licensing

The licensing information for this library, a complete manifest, and export control information are detailed in the Software Manifest [HTML].


Getting Started

The MMALIB User Guide [USER_GUIDE] provides the documentation and references necessary to begin development on TI's platforms.


Documentation

Refer to the following documentation for further details:

MMALIB User Guide    Build instructions, API Guide                                          [USER_GUIDE]
Test Reports         MISRA C reports, conformance test reports, TI platform test reports    [TBD]
Software Manifest    Licenses, terms of use                                                 [HTML]


What's New

Here are a few of the new features supported in this release:

CNN

  • [new] Support for small feature maps and K blocks < 3
  • [MMALIB-5, 45] Row convolution optimized
  • [MMALIB-70, 71] Column convolution optimized
  • MMALIB_CNN_convolve_row_ixX_ixX_oxX
  • MMALIB_CNN_convolve_row_1x1stride2_ixX_ixX_oxX
  • MMALIB_CNN_convolve_row_3x3stride2_ixX_ixX_oxX
  • MMALIB_CNN_convolve_row_5x5stride2_ixX_ixX_oxX
  • MMALIB_CNN_convolve_row_7x7stride2_ixX_ixX_oxX
  • MMALIB_CNN_convolve_row_11x11stride4_ixX_ixX_oxX
  • MMALIB_CNN_convolve_col_smallNo_ixX_ixX_oxX
  • MMALIB_CNN_fullyConnected_ixX_ixX_oxX
  • MMALIB_CNN_generateFillSeamPredicateRegisters
  • MMALIB_CNN_generateFillBiasPredicateRegisters

LINALG

  • 8, 16 and 32 bit signed integer support
  • [MMALIB-129, 130, 131, 148] Matrix multiply optimized
  • [MMALIB-132, 133, 134, 149] Matrix multiply accumulate optimized
  • [MMALIB-135, 136, 137, 150] Pointwise matrix multiply optimized
  • [MMALIB-138, 139, 140, 151] Matrix transpose optimized
  • MMALIB_LINALG_matrixMatrixMultiply_ixX_ixX_oxX
  • MMALIB_LINALG_pointwiseMatrixMatrixMultiply_ixX_ixX_oxX
  • MMALIB_LINALG_matrixMatrixMultiplyAccumulate_ixX_ixX_ixX_oxX
  • MMALIB_LINALG_matrixTranspose_ixX_oxX
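As a rough illustration of what the four LINALG kernel families compute, the sketch below gives plain-Python reference versions. It is not the MMALIB API: the function names are hypothetical, "pointwise" is assumed to mean element-wise multiplication, "multiply accumulate" is assumed to mean A x B + C, and the overflow, saturation, and scaling behaviour of the optimized integer kernels is not modelled.

```python
def mat_mul(a, b):
    """Return the matrix product A x B, with matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_mul_acc(a, b, c):
    """Return A x B + C (assumed meaning of 'multiply accumulate')."""
    prod = mat_mul(a, b)
    return [[p + q for p, q in zip(prod_row, c_row)]
            for prod_row, c_row in zip(prod, c)]


def pointwise_mul(a, b):
    """Return the element-wise product of two equally sized matrices (assumption)."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def transpose(a):
    """Return the transpose of A."""
    return [list(col) for col in zip(*a)]
```

A reference of this kind is mainly useful for checking results of an optimized kernel on small inputs, for example comparing against mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]).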

FFT

  • Support for 16-bit and 32-bit signed complex input/output
  • [MMALIB-141, 142, 143, 144, 145, 146, 152, 153, 154, 155, 156] Optimized FFT support
  • Support for real and imaginary parts of data to be in either interleaved or non-interleaved format. Optimal performance assumes interleaved data format
  • Support for FFT sizes from 2 through 8192. FFT sizes are assumed to be powers of 2
  • Support for batch processing of multiple FFTs in one kernel call. For optimal performance, input data and output data for the entire batch, and twiddle factor data are assumed to fit in L2 memory
  • MMALIB_FFT_dftSmall_ixX_cxX_oxX
  • MMALIB_FFT_dftLarge_ixX_cxX_oxX
  • MMALIB_FFT_highRadixDecompositions_ixX_cxX_oxX
  • MMALIB_FFT_fft_ixX_cxX_oxX
  • Utility functions (with suffixes _getSizes) to compute sizes required for data and twiddle factor buffers
  • Utility functions (with suffixes _twGen) to compute relevant DFT matrices and twiddle factors
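The size and data-format notes above can be sketched in a few lines of Python. This is only an illustration and not part of MMALIB: the function names are hypothetical, and the interleaved layout is assumed to store each sample as a real value followed by its imaginary value.

```python
MIN_FFT_SIZE = 2      # smallest FFT size named in these notes
MAX_FFT_SIZE = 8192   # largest FFT size named in these notes


def is_supported_fft_size(n):
    """True if n is a power of two within the size range named in these notes."""
    return MIN_FFT_SIZE <= n <= MAX_FFT_SIZE and (n & (n - 1)) == 0


def interleave(real, imag):
    """Pack separate real/imag lists as [re0, im0, re1, im1, ...] (assumed order)."""
    assert len(real) == len(imag), "real and imaginary parts must have equal length"
    out = []
    for re, im in zip(real, imag):
        out.extend((re, im))
    return out


def deinterleave(data):
    """Split [re0, im0, re1, im1, ...] back into separate real and imag lists."""
    assert len(data) % 2 == 0, "interleaved data must hold re/im pairs"
    return data[0::2], data[1::2]
```

Since optimal performance is stated for interleaved data, a caller holding non-interleaved buffers might convert once up front with a helper like interleave and convert the result back with deinterleave.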

DSP

  • [MMALIB-157, 158, 159, 160] FIR optimized
  • MMALIB_DSP_fir_ixX_ixX_oxX
  • MMALIB_DSP_firSmall_ixX_ixX_oxX
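For readers unfamiliar with FIR filtering, the following is a minimal reference sketch in Python. It is not the MMALIB implementation: the function name is hypothetical, and the convention (true convolution over fully overlapping positions only, so the output has len(x) - len(h) + 1 samples) is an assumption; the data layout and output length of the optimized kernels may differ.

```python
def fir_valid(x, h):
    """Filter signal x with taps h, keeping only fully overlapping outputs.

    Computes y[n] = sum over k of h[k] * x[n + K - 1 - k], where K = len(h),
    so h[0] multiplies the newest sample in each window.
    """
    taps = len(h)
    assert 0 < taps <= len(x), "need at least as many samples as taps"
    n_out = len(x) - taps + 1
    return [sum(h[k] * x[n + taps - 1 - k] for k in range(taps))
            for n in range(n_out)]
```

With x = [1, 2, 3, 4] and h = [1, 1], each output is the sum of two neighbouring samples.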

Details

    Dilation

  • D = 1: works
  • D > 1: untested
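To make the roles of dilation D, filter size F, and stride concrete, here is a small Python sketch of the conventional output-size formula for a 2D convolution along one dimension. These notes do not state the formula MMALIB uses, so this is an assumption based on common CNN practice; the function name and the pad parameter are hypothetical.

```python
def conv_output_size(in_size, filter_size, stride=1, pad=0, dilation=1):
    """Output length along one dimension of a strided, dilated convolution.

    Uses the conventional formula:
        out = floor((in + 2*pad - dilation*(filter - 1) - 1) / stride) + 1
    """
    effective_filter = dilation * (filter_size - 1) + 1
    padded = in_size + 2 * pad
    assert effective_filter <= padded, "filter does not fit in the padded input"
    return (padded - effective_filter) // stride + 1
```

For example, a 7x7 filter with stride 2 and pad 3 on a 224-wide input gives 112 outputs, and setting dilation to 2 widens a 3-tap filter to an effective 5 taps.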

    CNN style 2D convolution FxF stride by 1

  • K = 1: optimized 8 bit, 16 bit
  • K = 2: optimized 8 bit, 16 bit
  • K = 3: optimized 8 bit, 16 bit
  • K > 3: optimized
  • 1x1: supported as regular convolution

    CNN style 2D convolution 1x1 stride by 2

  • K = 1: optimized 8 bit, 16 bit
  • K = 2: optimized 8 bit, 16 bit
  • K = 3: optimized 8 bit, 16 bit
  • K > 3: optimized

    CNN style 2D convolution 3x3 stride by 2

  • K = 1: optimized 8 bit tested, 16 bit
  • K = 2: optimized 8 bit tested, 16 bit
  • K = 3: optimized 8 bit tested, 16 bit
  • K > 3: optimized

    CNN style 2D convolution 5x5 stride by 2

  • K = 1: optimized 8 bit tested, 16 bit
  • K = 2: optimized 8 bit tested, 16 bit
  • K = 3: optimized 8 bit tested, 16 bit
  • K > 3: optimized

    CNN style 2D convolution 7x7 stride by 2

  • K = 1: currently natural C
  • K = 2: currently natural C
  • K = 3: optimized 8 bit tested, 16 bit
  • K > 3: optimized

    CNN style 2D convolution 11x11 stride by 4

  • K = 1: currently natural C
  • K = 2: currently natural C
  • K = 3: currently natural C
  • K > 3: optimized

    Restrictions for Row Convolution

  • MBlock * NBlock < 3
  • Optimized support for K > 3 in all cases except 1x1 stride, for which the restriction is K > 4

    CNN style 2D convolution 3x3 5x5 7x7 Ni = No = 1

    All:

  • [new] Bias can be embedded into the kernel coefficients or kept separate when calling the reorder weights function
  • [new] Split processing of a single group across both sides of the MMA
  • Multiple groups processed in a single kernel call
  • Supports multiple rows of bias, provided the rows do not push the computation beyond the MMA block size

    CNN style 2D convolution 3x3 5x5 7x7 Ni, No = small

  • All: not yet implemented

    Fully Connected

  • Supports 8- and 16-bit datatypes
  • Supports signed and unsigned combinations of datatypes with kernel matrix always assumed to be signed
  • Support of multiple Bias rows using 2D DECDIM feature

    DFT building blocks

  • 16-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
  • 32-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works

    FFT building blocks

  • 16-bit, interleaved/non-interleaved, 64 < FFT_SIZE < 8192, BATCH_SIZE = 1,2,4,8,16,32,64: works
  • 32-bit, interleaved/non-interleaved, 32 < FFT_SIZE < 8192, BATCH_SIZE = 1,2,4,8,16,32,64: works

    FFT coordination

  • 16-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
  • 32-bit, interleaved/non-interleaved, any FFT_SIZE, any BATCH_SIZE: works
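The FFT building block limits listed above can be captured in a small lookup, sketched here in Python. The names are hypothetical and not part of MMALIB. The bounds are transcribed literally from these notes, including the strict inequalities, and the power-of-two check reflects the statement elsewhere in these notes that FFT sizes are assumed to be powers of 2.

```python
# Exclusive (low, high) FFT_SIZE bounds per data width, as listed in these notes.
FFT_BLOCK_LIMITS = {16: (64, 8192), 32: (32, 8192)}

# Batch sizes listed as working for the FFT building blocks.
FFT_BLOCK_BATCH_SIZES = (1, 2, 4, 8, 16, 32, 64)


def fft_block_config_listed(bits, fft_size, batch_size):
    """True if the configuration falls inside the ranges listed as working."""
    if bits not in FFT_BLOCK_LIMITS:
        return False
    low, high = FFT_BLOCK_LIMITS[bits]
    power_of_two = fft_size > 0 and (fft_size & (fft_size - 1)) == 0
    return (power_of_two
            and low < fft_size < high
            and batch_size in FFT_BLOCK_BATCH_SIZES)
```

A False result only means the configuration is outside the ranges these notes list for the building blocks; the DFT building blocks and FFT coordination entries are listed for any FFT_SIZE and any BATCH_SIZE.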


Upgrade and Compatibility Information

File    Change description    User application change required    User application recompile required


Device Support

SoC Host (OS) Target (OS) Test Platform
J7ES C7x, MMA No OS No OS Loki


Validation Information

This release was built and validated using the following tools:

Build Tools (NOT included in MMALIB):


Fixed Issues

1. [MMALIB-210] Row Flow: Strided convolution output is incorrect for last block
2. [MMALIB-215] ColumnFlow: NATC mode is not bit matching with ref for odd panel case
3. [MMALIB-216] ColumnFlow: NATC mode is not bit matching with ref for stride 2
4. [MMALIB-217] Row Flow: Not working when MN blocks is <= 2
5. [MMALIB-218] ColumnFlow: Re-ordering function is not working correctly
6. [MMALIB-219] Seam insertion logic is not working for Mchannels != SubMChannels case
7. [MMALIB-220] Row Flow: NATC output is not matching with optimized output for K = 2 and pad = 1


Known Issues

1. None


Technical Support

For technical support, please post your questions on the TI E2E Forum for Automotive ADAS SoCs.

For additional assistance, contact your local TI Field Application Engineer.


Package Versioning

Each package version is composed of four period-delimited numbers, represented here by the letters M, m, p and b [M.m.p.b]. The table below provides a descriptive reference regarding package version numbering.

Digit          Meaning           Description
1 (M=Major)    Major revision    Incremented when the new version is substantially different from the previous. For example, a new module added or an existing module's algorithm significantly altered.
2 (m=minor)    Minor revision    Incremented when the new version has changed but not in a major way. For example, some minor changes in the API or feature set.
3 (p=patch)    Patch number      Incremented for all other source code changes. This includes any packaging support code.
4 (b=build)    Build number      Incremented for each release delivery to CM. Reset for any change to M, m or p.
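The M.m.p.b scheme described above lends itself to simple tooling. The Python sketch below parses a version string such as the 00.08.00.00 of this release into its four fields. The class and function names are hypothetical and are not shipped with MMALIB.

```python
from typing import NamedTuple


class PackageVersion(NamedTuple):
    """The four fields of an M.m.p.b package version."""
    major: int
    minor: int
    patch: int
    build: int


def parse_version(text):
    """Parse a string like '00.08.00.00' into a PackageVersion."""
    parts = text.strip().split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four period-delimited numbers, got {text!r}")
    return PackageVersion(*(int(part) for part in parts))
```

Because PackageVersion is a tuple, two parsed versions compare field by field from major to build, so a release check can be written as parse_version(installed) >= parse_version(required).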

Copyright 2018, Texas Instruments Incorporated