The UDMA driver provides an API to program the DMA module of the DMSS to set up and initiate DMA transfers.
The primary goal of the Data Movement Subsystem (DMSS) is to ensure that data can be transferred efficiently from a producer to a consumer so that the real-time requirements of the system can be met. The Data Movement architecture aims to facilitate Direct Memory Access (DMA) and to provide a consistent Application Programming Interface (API) to the host software. Data movement tasks are commonly offloaded from the host processor to peripheral hardware to increase system performance. Significant performance gains may result from careful design of the interface between the host software and the underlying acceleration hardware. In networking applications, packet transmission and reception are critical tasks. In general-purpose compute, ping-pong buffer pre-fetch and store are critical tasks, as are general misaligned block copy operations.
The block diagram provides a high-level picture of the two interconnect fabrics and of the key standard data movement components that have been defined and placed in the various parts of the low-cost compliant SoC. Packet DMA (PKTDMA) and Block Copy DMA (BCDMA) are the two instances of the DMSS specification, each serving different use cases.
The PKTDMA is intended to perform functions similar to those of a packet-oriented DMA. The PKTDMA module supports the transmission and reception of various packet types. It is architected to facilitate the segmentation and reassembly of DMA data structure compliant packets to and from smaller data blocks that are natively compatible with the specific requirements of each connected peripheral. Multiple TX and RX channels are provided within the DMA, which allows multiple segmentation or reassembly operations to be in progress at once. The DMA controller maintains state information for each channel, which allows packet segmentation and reassembly operations to be time-division multiplexed between channels in order to share the underlying DMA hardware. An internal DMA scheduler controls the ordering and rate at which this multiplexing occurs for transmit operations. The ordering and rate of receive operations are indirectly controlled by the order in which blocks are pushed into the DMA on the RX PSI-L interface.
The Block Copy DMA is intended to perform functions similar to those of the EDMA or the UDMA-P/UTC. The BCDMA module moves data from a memory-mapped source address set to a corresponding memory-mapped destination address set. The BCDMA maintains state information for each channel, which allows data copy operations to be time-division multiplexed between channels in order to share the underlying DMA hardware. An internal DMA scheduler controls the ordering and rate at which this multiplexing occurs.
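To make the idea of copying between two memory-mapped address sets more concrete, the following is a minimal software model of a 2D block copy, written as a sketch rather than as driver code. The `BlockCopy2D` structure, its field names, and `block_copy_2d()` are hypothetical and do not correspond to any UDMA driver type or hardware register; the real BCDMA performs this kind of movement in hardware without CPU involvement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a 2D copy: 'rows' rows of 'row_bytes' bytes.
 * Consecutive rows are 'src_pitch' bytes apart in the source and
 * 'dst_pitch' bytes apart in the destination. */
typedef struct {
    const uint8_t *src;       /* first byte of the source block      */
    uint8_t       *dst;       /* first byte of the destination block */
    size_t         row_bytes; /* bytes copied per row                */
    size_t         rows;      /* number of rows                      */
    size_t         src_pitch; /* byte distance between source rows   */
    size_t         dst_pitch; /* byte distance between dest rows     */
} BlockCopy2D;

/* CPU-side model of the copy; source and destination must not overlap. */
static void block_copy_2d(const BlockCopy2D *bc)
{
    for (size_t r = 0; r < bc->rows; r++) {
        memcpy(bc->dst + r * bc->dst_pitch,
               bc->src + r * bc->src_pitch,
               bc->row_bytes);
    }
}
```

With a source pitch larger than the row length, the same routine extracts a sub-block from a larger buffer, which is the kind of strided access that multi-dimensional transfers are meant to cover.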
The following section describes the high-level flow of the driver for a data transfer.
The transfer configuration is specified in the Transfer Request (TR) record. The size of a TR varies from 16 bytes to 64 bytes, depending on the TR type, which is specified in the FLAGS field of the record.
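As a rough illustration of how a TR type might be written into and read back from the FLAGS word, the sketch below assumes that the type occupies the low 4 bits of a 32-bit FLAGS field. That bit position, the macro names, and both helper functions are assumptions made for this example and are not part of the UDMA driver API; the device reference manual is the authority on the actual layout.

```c
#include <stdint.h>

/* Assumption: TR type lives in bits 3:0 of the 32-bit FLAGS word. */
#define TR_FLAGS_TYPE_SHIFT (0U)
#define TR_FLAGS_TYPE_MASK  (0xFU << TR_FLAGS_TYPE_SHIFT)

/* Extract the TR type from a FLAGS word. */
static inline uint32_t tr_flags_get_type(uint32_t flags)
{
    return (flags & TR_FLAGS_TYPE_MASK) >> TR_FLAGS_TYPE_SHIFT;
}

/* Return 'flags' with its TR type field replaced by 'type';
 * all other bits are left unchanged. */
static inline uint32_t tr_flags_set_type(uint32_t flags, uint32_t type)
{
    flags &= ~TR_FLAGS_TYPE_MASK;
    flags |= (type << TR_FLAGS_TYPE_SHIFT) & TR_FLAGS_TYPE_MASK;
    return flags;
}
```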
The table below summarizes the TR types and the kind of transfer each one is used for.
TR Type | Description |
---|---|
Type 0 | 1D (word0-3) |
Type 1 | 2D (word0-4) |
Type 2 | 3D (word0-6) |
Type 3 | 4D (word0-8) |
Type 5 | Cache warm (word0-15) (MSMC DRU ONLY) |
Type 8 | 4D Block Copy (word0-15) |
Type 9 | 4D Block Copy with reformatting (word0-15) (MSMC DRU ONLY) |
Type 10 | 2D Block Copy (word0-15) |
Type 11 | 2D Block Copy with reformatting (word0-15) (MSMC DRU ONLY) |
Type 15 | 4D Block Copy with reformatting and indirection (word0-15) (MSMC DRU ONLY) |
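The word ranges in the table can be turned into record sizes with a small lookup. The sketch below assumes that each TR word is 32 bits wide, which is consistent with the 16-byte to 64-byte range stated above (words 0-3 through words 0-15). The function name `tr_used_bytes()` is hypothetical, and the result is the number of bytes covered by the listed words, not necessarily the size the hardware or driver reserves in memory for the record.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the number of bytes covered by the words listed for a TR type
 * in the table above, or 0 if the type is not listed.
 * Assumption: one TR word is 32 bits (4 bytes). */
static size_t tr_used_bytes(uint32_t tr_type)
{
    uint32_t last_word;

    switch (tr_type) {
    case 0U:  last_word = 3U;  break;  /* 1D: word0-3 */
    case 1U:  last_word = 4U;  break;  /* 2D: word0-4 */
    case 2U:  last_word = 6U;  break;  /* 3D: word0-6 */
    case 3U:  last_word = 8U;  break;  /* 4D: word0-8 */
    case 5U:                           /* cache warm            */
    case 8U:                           /* 4D block copy         */
    case 9U:                           /* 4D copy, reformatting */
    case 10U:                          /* 2D block copy         */
    case 11U:                          /* 2D copy, reformatting */
    case 15U:                          /* 4D copy, indirection  */
        last_word = 15U;               /* word0-15 */
        break;
    default:
        return 0U;                     /* type not in the table */
    }
    return ((size_t)last_word + 1U) * sizeof(uint32_t);
}
```

Under these assumptions a Type 0 record covers 16 bytes and every word0-15 record covers 64 bytes, matching the minimum and maximum TR sizes given earlier.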
The diagram below shows the high-level flow of transfer requests between the application and the driver.
The diagram below shows the UDMA transfer API flow.
Include the file below to access the APIs.
Channel Open Example
Channel Close Example