7. Glossary
- Apache TVM
An open source machine learning compiler framework for CPUs, GPUs, and other devices. It enables optimizing and running machine learning computations on such hardware.
- C7™ NPU
The C7™ Neural Processing Unit combines TI’s C7x DSP with the Matrix Multiplication Accelerator (MMA). It provides highly parallel deep learning instructions optimized for neural network inference. Also referred to as C7x+MMA in older documentation.
- calibration
The process of determining quantization parameters (scale and zero-point) for converting floating-point tensors to fixed-point representation. Calibration data should be representative of actual inference inputs.
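As an illustration of what calibration produces, the scale and zero-point for asymmetric 8-bit quantization can be derived from a tensor's observed value range. The sketch below is plain Python for exposition only, not TIDL's actual calibration code; all function names are hypothetical:

```python
def calibrate(samples):
    """Derive (scale, zero_point) for asymmetric uint8 quantization
    from representative calibration data (a list of float lists)."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must include 0.0
    scale = (hi - lo) / 255.0             # float units per integer step
    zero_point = round(-lo / scale)       # integer that represents 0.0
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map a float value to its fixed-point (uint8) representation."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))            # clamp to the uint8 range

# Calibration data spanning [-1.0, 1.0]:
scale, zp = calibrate([[-1.0, 0.5], [0.2, 1.0]])
```

The quality of the resulting scale and zero-point depends directly on how well the calibration samples cover the value ranges seen at inference time, which is why representative data matters.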
- compute
In TVM, a mathematical expression that specifies what an operator calculates, independent of how that calculation is carried out on hardware.
- DMA
Direct Memory Access. Used by the C7™ NPU to efficiently transfer data between memory and processing units without CPU involvement.
- fat binary
A deployable module (.so file) that contains both TVM-generated code and embedded TIDL artifacts, allowing a single file to run inference.
- inference
The act of using a trained deep learning model to produce a prediction from input data.
- MMA
The Matrix Multiplication Accelerator (MMA) is a key hardware accelerator on TDA4/AM6x processors. It provides highly parallel deep learning instructions and is architected to optimize data flow for deep learning while minimizing power consumption and external memory traffic. The MMA is accessed as an extension of the C7x instruction set; together with the C7x DSP, it forms the C7™ NPU.
- OpenVX
A cross-platform API for computer vision applications. TI TVM uses OpenVX to dispatch TIDL subgraphs to the C7™ NPU during inference.
- Relay IR
TVM's high-level intermediate representation, used as a common internal format for machine learning models imported from different frameworks.
- schedule
In TVM, a schedule specifies how to realize a computation via loop nests and data movement. Schedules can be tuned to optimize performance for specific hardware.
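The compute/schedule split can be illustrated without any TVM API: the compute fixes the formula, while different schedules realize it with different loop nests. A plain-Python sketch (illustrative only; the function names are hypothetical):

```python
# The compute fixes WHAT is calculated: C[i][j] = sum_k A[i][k] * B[k][j].
# The two loop nests below are two "schedules" of that same compute:
# they produce identical results but traverse memory differently.

def matmul_ijk(A, B):          # schedule 1: naive i-j-k loop order
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def matmul_ikj(A, B):          # schedule 2: reordered i-k-j loop order,
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]
            for j in range(m):  # reads B row by row: better locality
                C[i][j] += a * B[p][j]
    return C
```

Schedule tuning in TVM explores exactly this kind of transformation (loop reordering, tiling, vectorization) automatically, searching for the variant that runs fastest on the target hardware.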
- Streaming Engine
A C7x hardware feature that prefetches data in predictable patterns, improving memory bandwidth utilization for vector operations.
- subgraphs
A subset of nodes in a model graph. In TI TVM, the model is partitioned into subgraphs: those that can be accelerated by TIDL run on the C7™ NPU, and the remainder execute via TVM code generation.
- TDA4 and AM6x families
The TDA4 and AM6x families are SoCs that combine dual- or quad-core Arm Cortex-A72/Cortex-A53 CPUs with a C7™ NPU. They are designed for deep learning, vision, and multimedia applications.
- TIDL
The TI Deep Learning (TIDL) library is TI's software ecosystem for accelerating deep learning algorithms such as CNNs. It contains highly optimized implementations of common layers for the C7™ NPU. TI TVM can offload supported computations to TIDL.
- TVM
Tensor Virtual Machine (TVM) is a compiler stack that compiles deep learning models from various frameworks to specialized CPU, GPU, and other accelerator architectures.