5. PSDK QNX Components

5.1. QNX BSP release

The QNX BSP package must be downloaded from the QNX Software Center or obtained by contacting QNX.

Note

Refer to the software dependencies section of the Release Notes for more information.

To consolidate the build process and allow the supporting scripts to create the SD card content, the BSP needs to be extracted to the folder within the PSDK QNX build environment as shown below.

For QNX SDP 800:

# Unzip the BSP
mkdir -p ${PSDK_RTOS_PATH}/psdkqa/qnx/bsp
cd ${PSDK_RTOS_PATH}/psdkqa/qnx/bsp
unzip ${QNX_BASE}/bsp/BSP_ti-j784s4-evm_br-hw-rel_be-800_<version>.zip
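
The unzip step fails in confusing ways when a variable is unset or the archive has not been downloaded yet. A minimal pre-flight sketch follows; the helper name check_bsp_inputs is hypothetical and not part of the PSDK, and the archive name must be filled in from the actual download.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PSDK): validate the inputs of the
# unzip step and print the extraction directory used above.
# Arguments: <psdk_rtos_path> <bsp_zip>
check_bsp_inputs() {
    if [ -z "$1" ]; then
        echo "PSDK_RTOS_PATH is not set" >&2
        return 1
    fi
    if [ ! -f "$2" ]; then
        echo "BSP archive not found: $2" >&2
        return 1
    fi
    echo "$1/psdkqa/qnx/bsp"
}

# Possible usage (the archive name is a placeholder):
#   dest=$(check_bsp_inputs "${PSDK_RTOS_PATH}" "${QNX_BASE}/bsp/<archive>.zip") &&
#       mkdir -p "$dest" && cd "$dest" && unzip "${QNX_BASE}/bsp/<archive>.zip"
```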

5.1.1. TI Modifications to the BSP

Memory Carveout

Memory sections with pre-defined physical addresses must be set aside in the QNX BSP IFS build file so that the memory is not given to other programs and can be used solely by the remote cores and video codec.

Specify a section to be set aside by modifying the startup line to use the “-r” option. For example, to reserve 0x60000000 bytes (1.5GB), starting at physical address 0xA0000000, on J784S4 QNX BSP, the build file arguments would be (highlighted below, along with other memory carveout reservations):

[+keeplinked] startup-j784s4-evm -v -r0xA0000000,0x60000000,1 -r0x880000000,0x80000000,1 -r0x900000000,0x3C000000,1 -r0x940000000,0x60000000,1 -d -e

Note

Start addresses for carveouts are at 0x90000000 for low-mem carveouts, and at 0x880000000 for high-mem carveouts.

  • For vision apps remote cores, the memory carveouts are as follows:

    • The first remote core carveout of 0x60000000 (1536 MB) is in the lower 2GB memory range starting at address 0xA0000000.

    • The second remote core carveout of 0x80000000 (2 GB) is in the higher 30GB memory range starting at address 0x880000000.

    • The third remote core carveout of 0x3C000000 (960 MB) is in the higher 30GB memory range starting at address 0x900000000.

  • For video codecs, the memory carveouts are as follows:

    • The high-mem codec carveout of 0x60000000 (1536 MB) is in the higher 30GB memory range starting at address 0x940000000.
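
The sizes and ranges listed above can be cross-checked with shell arithmetic. The sketch below assumes a shell with 64-bit arithmetic; the function name describe_carveout is illustrative only. It prints the "-r" argument for each carveout together with its last address and its size in MB.

```shell
#!/bin/sh
# Print the startup "-r" argument, the last address and the size in MB of one
# carveout. Arguments: <start_address> <size_in_bytes>, in hex as in the build file.
describe_carveout() {
    cv_end=$(( $1 + $2 - 1 ))
    cv_mb=$(( $2 / 1048576 ))
    printf '%s ends at 0x%X (%d MB)\n' "-r$1,$2,1" "$cv_end" "$cv_mb"
}

describe_carveout 0xA0000000  0x60000000   # first remote core carveout
describe_carveout 0x880000000 0x80000000   # second remote core carveout
describe_carveout 0x900000000 0x3C000000   # third remote core carveout
describe_carveout 0x940000000 0x60000000   # video codec carveout
```

Comparing each printed end address with the start address of the next carveout is a quick way to confirm that the reservations do not overlap.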

Reference TI build file

The reference TI build file is provided in the qnx/scripts/bsp/<BSP_REVISION> directory.

# Copy in TI specific build scripts and settings to allow building of a QNX-IFS which supports
# vision_apps demos and video codec demos
cp ${PSDK_RTOS_PATH}/psdkqa/qnx/scripts/bsp/<BSP_REVISION>/j784s4-evm-ti.build ${PSDK_RTOS_PATH}/psdkqa/qnx/bsp/images/

Note

Other build file differences between the QNX and TI PSDK QNX environments may be present. TI modifies the build scripts for ease of use on the TI EVM and to run the demonstration software.

5.2. SCI Client Resource Manager

The SCI Client Resource Manager (tisci-mgr) allows multiple users to use the sciclient library from PSDK RTOS without interfering with each other’s requests. It provides a mechanism to serialize the transactions to the DMSC.

Note

The tisci-mgr logs the SYSFW information to the slogger.

Note

Refer to PSDK RTOS Components for more information about each individual component.

5.3. IPC Resource Manager

5.3.1. Overview

The IPC resource manager (tiipc-mgr) provides a processor-agnostic API which can be used for communication between processors in a multi-processor environment.

The IPC resource manager provides a user library (tiipc-usr) that exposes the IPC LLD API to QNX applications for communication through the IPC resource manager. The IPC LLD API is described in detail in the PDK documentation.

Note

The size of the input buffer passed to the RPMessage_recv() API MUST be 528 (IPC_RPMESSAGE_MSG_BUFFER_SIZE) bytes.

5.3.2. Example Application

An IPC example application (ipc_test) is provided to test IPC communication with remote cores running the IPC echo test remote core firmware images. To use this application, the appropriate remote core firmware images need to be built and placed on the target filesystem. The following remote core firmware images are needed:

Firmware Name                  Core(s)
ipc_qnx_echo_testb_freertos    mcu1_0
ipc_qnx_echo_test_freertos     mcu2_0, mcu2_1, mcu3_0, mcu3_1, mcu4_0, mcu4_1, C7x_1, C7x_2, C7x_3 and C7x_4

For remote core firmware build instructions, please refer to the IPC LLD PDK documentation.

Note

The step below, which copies the remote core firmware to the rootfs partition of the target filesystem, applies only to the SPL-UBOOT bootflow. For the SBL-BootApp bootflow, the IPC echo test images need to be bundled into the Appimage. Please refer to the MCUSW documentation for details.

Once the remote core firmware is built, copy the images to the target filesystem in the rootfs partition. Existing firmware binaries should be backed up as required. The ex02* firmware binaries should be renamed to the firmware name expected by the bootloader. For example:

cp ipc_qnx_echo_test_freertos_c7x_1_release.xe71 ${ROOTFS}/lib/firmware/j784s4-c71_0-fw
cp ipc_qnx_echo_test_freertos_c7x_2_release.xe71 ${ROOTFS}/lib/firmware/j784s4-c71_1-fw
cp ipc_qnx_echo_test_freertos_c7x_3_release.xe71 ${ROOTFS}/lib/firmware/j784s4-c71_2-fw
cp ipc_qnx_echo_test_freertos_c7x_4_release.xe71 ${ROOTFS}/lib/firmware/j784s4-c71_3-fw
cp ipc_qnx_echo_test_freertos_mcu2_1_release.xer5f ${ROOTFS}/lib/firmware/j784s4-main-r5f0_1-fw
cp ipc_qnx_echo_test_freertos_mcu2_0_release.xer5f ${ROOTFS}/lib/firmware/j784s4-main-r5f0_0-fw
cp ipc_qnx_echo_test_freertos_mcu3_1_release.xer5f ${ROOTFS}/lib/firmware/j784s4-main-r5f1_1-fw
cp ipc_qnx_echo_test_freertos_mcu3_0_release.xer5f ${ROOTFS}/lib/firmware/j784s4-main-r5f1_0-fw
cp ipc_qnx_echo_test_freertos_mcu4_1_release.xer5f ${ROOTFS}/lib/firmware/j784s4-main-r5f2_1-fw
cp ipc_qnx_echo_test_freertos_mcu4_0_release.xer5f ${ROOTFS}/lib/firmware/j784s4-main-r5f2_0-fw
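
The renaming in the copy commands appears to follow a regular pattern: C7x_N maps to c71_(N-1), and mcuM_K maps to main-r5f(M-2)_K. The sketch below encodes that pattern as inferred from the commands above, not from a documented rule, and the function name fw_target_name is hypothetical.

```shell
#!/bin/sh
# Map an RTOS core name (as used in the image file names) to the firmware
# file name used under /lib/firmware. The mapping is inferred from the copy
# commands above and is an assumption, not a documented rule.
fw_target_name() {
    case "$1" in
        c7x_[1-4])
            echo "j784s4-c71_$(( ${1#c7x_} - 1 ))-fw" ;;
        mcu[2-4]_[01])
            fw_cluster=${1#mcu}
            fw_cluster=${fw_cluster%_*}
            echo "j784s4-main-r5f$(( fw_cluster - 2 ))_${1#*_}-fw" ;;
        *)
            echo "unknown core: $1" >&2
            return 1 ;;
    esac
}

for core in c7x_1 c7x_2 c7x_3 c7x_4 mcu2_0 mcu2_1 mcu3_0 mcu3_1 mcu4_0 mcu4_1; do
    echo "$core -> $(fw_target_name "$core")"
done
```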

Note

As seen above, the mcu1_0 firmware image is not copied to the rootfs partition of the target filesystem. Instead, it needs to be built into tispl.bin as part of the SPL-UBOOT boot binaries.

Testing IPC with the mcu1_0 image requires the PSDK Linux package to be installed. The default SPL-UBOOT binaries include an IPC test mcu1_0 image that works with Linux only. Hence, for QNX, SPL-UBOOT must be rebuilt to include the “ipc_qnx_echo_testb_freertos_mcu1_0_release” image.

Below are the steps to do this:

  1. Build “ipc_qnx_echo_testb_freertos_mcu1_0_release” image.

cd ${PSDK_RTOS_PATH}/pdk_j784s4_{version}/packages/ti/build
make -s ipc_qnx_echo_testb_freertos BOARD=j784s4_evm CORE=mcu1_0 -j2

  2. Copy the generated mcu1_0 firmware image to the PSDK Linux path mentioned below and rebuild UBOOT.

cp ${PSDK_RTOS_PATH}/pdk_j784s4_{version}/packages/ti/binary/ipc_qnx_echo_testb_freertos/bin/j784s4_evm/ipc_qnx_echo_testb_freertos_mcu1_0_release_strip.xer5f ${PSDK_LINUX_PATH}/board-support/prebuilt-images/ti-dm/j784s4/ipc_echo_testb_mcu1_0_release_strip.xer5f
cd ${PSDK_LINUX_PATH}
make u-boot_clean
make u-boot

  3. Copy the newly generated UBOOT files to the SD card boot partition.

cp ${PSDK_LINUX_PATH}/board-support/u-boot_build/r5/tiboot3.bin ${BOOTFS}
cp ${PSDK_LINUX_PATH}/board-support/u-boot_build/a72/tispl.bin ${BOOTFS}
cp ${PSDK_LINUX_PATH}/board-support/u-boot_build/a72/u-boot.img ${BOOTFS}
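
A failed or partial UBOOT rebuild is easy to miss, so it can help to confirm that all three files exist before touching the boot partition. The sketch below is one possible way to do that; the function name copy_boot_binaries is hypothetical.

```shell
#!/bin/sh
# Copy the three rebuilt boot binaries, refusing to copy anything if one of
# them is missing. Arguments: <u-boot_build_dir> <boot_partition_mount_point>
copy_boot_binaries() {
    for bb_file in r5/tiboot3.bin a72/tispl.bin a72/u-boot.img; do
        if [ ! -f "$1/$bb_file" ]; then
            echo "missing: $1/$bb_file" >&2
            return 1
        fi
    done
    cp "$1/r5/tiboot3.bin" "$1/a72/tispl.bin" "$1/a72/u-boot.img" "$2"/
}

# Possible usage:
#   copy_boot_binaries "${PSDK_LINUX_PATH}/board-support/u-boot_build" "${BOOTFS}"
```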

After copying the firmware and booting the target, the ipc_test can be run from the command line (example output given below):

J7EVM@QNX:/# /sd/tibin/ipc_test
IPC_echo_test (core : mpu1_0) .....
responderFxn will stay active. Please use ctrl-c to exit the test when finished.
SendTask7: mpu1_0 <--> C7X_1, Ping- 10, pong - 10 completed
SendTask1: mpu1_0 <--> mcu1_0, Ping- 10, pong - 10 completed
SendTask6: mpu1_0 <--> mcu3_1, Ping- 10, pong - 10 completed
SendTask5: mpu1_0 <--> mcu3_0, Ping- 10, pong - 10 completed
SendTask3: mpu1_0 <--> mcu2_0, Ping- 10, pong - 10 completed
SendTask4: mpu1_0 <--> mcu2_1, Ping- 10, pong - 10 completed
SendTask8: mpu1_0 <--> C7X_2, Ping- 10, pong - 10 completed

Make sure that tiipc-mgr is running before executing ipc_test. Note that the test application does not exit on its own. Press “ctrl+c” to exit.

Note

Run “ipc_test -s” to avoid waiting for the user to exit the test using “ctrl+c”.

Note

If the SPL-UBOOT boot flow is used, ipc_test will not be able to communicate with mcu1_1, because mcu1_1 is not loaded with any firmware image. All other cores will work.

Note

If the SBL / BootApp boot flow is used, ipc_test will not be able to communicate with mcu1_0 and mcu1_1. All other cores will work.

Note

Refer to PSDK RTOS Components for more information about each individual component.

5.4. UDMA Resource Manager

The UDMA resource manager (tiudma-mgr) allows multiple users to use the UDMA functionality without interfering with each other’s requests.

Note

Refer to PSDK RTOS Components for more information about each individual component.

5.5. Shared Memory Allocator

The Shared Memory Allocator resource manager (shmemallocator) provides support for multiple users to allocate memory from the shared memory region. This shared memory region is carved out of the QNX memory as part of the QNX Startup parameters.

5.6. CPSW2G DEVNP driver

5.6.1. Overview

The CPSW2G DEVNP network driver can be viewed as the “glue” between the underlying cpsw2g low-level driver, and the software infrastructure of io-pkt, the protocol stack above it. The “bottom half” of the driver is coded specifically to interact with the PDK’s cpsw & udma low-level drivers, and the “top half” of the driver is coded specifically for io-pkt.

Note

Refer to PSDK RTOS Components for more information about each individual component.

Note

The cpsw2g driver’s CPTS interrupt thread must always have a higher priority than the driver’s RX and TX threads. By default, the network stack thread runs at priority 21, so the RX and TX threads are created at this priority level and the CPTS interrupt thread is created at priority 22. These priority values can be changed by providing them as command-line parameters. See the driver’s use command output.
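
When overriding the priorities, the ordering rule in the note above can be checked before the driver is started. The sketch below builds the option string from the rx_intr_prio, tx_intr_prio and cpts_intr_prio options listed in the driver's use output; the function name cpsw2g_prio_opts is hypothetical.

```shell
#!/bin/sh
# Build the priority options for devnp-cpsw2g.so, rejecting combinations in
# which the CPTS thread is not strictly above both the RX and TX threads.
# Arguments: <rx_prio> <tx_prio> <cpts_prio>
cpsw2g_prio_opts() {
    if [ "$3" -le "$1" ] || [ "$3" -le "$2" ]; then
        echo "cpts_intr_prio ($3) must be higher than rx ($1) and tx ($2)" >&2
        return 1
    fi
    echo "rx_intr_prio=$1,tx_intr_prio=$2,cpts_intr_prio=$3"
}

# The default values pass the check:
cpsw2g_prio_opts 21 21 22
```

The printed string can be appended to the driver options, for example `io-pkt-v6-hc -d cpsw2g rx_intr_prio=21,tx_intr_prio=21,cpts_intr_prio=22`.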

5.6.2. Running

Boot the board with the SD card. At the QNX prompt, run the commands below to mount the SD card and then launch the executables:

tisci-mgr
tiudma-mgr
io-pkt-v6-hc -d cpsw2g
dhclient -nw am0

By default, the cpsw2g driver sets the run mask of all driver-created threads to 0x0, so that all of them run on core 0 of the A72. This can be changed with the run_mask_cpu= command-line option. See the use command output or the “Usage of devnp-cpsw2g.so” section for further details.

For debug traces, run the commands below before starting the DEVNP driver, then start the driver with an increased verbose parameter (for example, verbose=0x3ff). This shows all of the driver’s slog messages.

slog2info -c
slog2info -w &

5.6.3. Additional steps

  • Run “if_up -p am0” to check if the interface is ready

  • Run “ifconfig am0 up” to bring UP the link.

  • Run “dhclient -nw am0” to obtain an IP address from the DHCP server

  • Run “ifconfig -v” to check the assigned IP address and status

  • Run “tcpdump -e” to inspect the network traffic (including link-level headers)
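
The first four steps above can be collected into a small script. The sketch below is an illustration only: the run wrapper and the DRY_RUN variable are hypothetical conveniences that print the commands instead of executing them, so the sequence can be reviewed on a host that has no QNX tools.

```shell
#!/bin/sh
DRY_RUN=1   # remove this line on the target to actually execute the commands

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Bring up a DEVNP interface using the steps listed above.
# Argument: interface name (for example am0).
bring_up_iface() {
    run if_up -p "$1"     || return 1   # check that the interface is ready
    run ifconfig "$1" up  || return 1   # bring the link up
    run dhclient -nw "$1" || return 1   # request an address from the DHCP server
    run ifconfig -v                     # show the assigned address and status
}

bring_up_iface am0
```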

5.6.4. Usage of devnp-cpsw2g.so

We can run the below command to get the usage of the DEVNP driver:

$ use devnp-cpsw2g.so

devnp-cpsw2g.so mcu domain cpsw2g ethernet driver based on enet low level driver

Syntax:
  io-pkt-v6-hc -d cpsw2g [option[,option ...]] ...

Options (to override autodetected defaults):
  verbose=num             Set verbosity level (default: 0).
  mac-to-mac=1            Set for mac-to-mac mode (default: 0)
  speed=100|1000          Media data rate for port 0 in Mb/s. (default: 1000)
  p0mac=XXXXXXXXXXXX      Custom MAC address to use on port 0.
  ptp=0|1                 1 to enable PTP support (default: 0)
  promiscuous=0|1         1 to enable promiscuous mode (default: 0)
  typed_mem=name          Set the typed memory
  udma_chnum=val          Set the preferred udma channel to use (default: dynamic allocation)
  tx_freeq_threshold=val  Set the tx free Q threshold value (default: 120), must be less than no of TX descriptors.
  tx_descriptor_cnt=val   Set the tx descriptors count value (default: 128, Max: 256)
  rx_descriptor_cnt=val   Set the rx descriptors count value (default: 128, Max: 256)
  run_mask_cpu=val        Set the run mask - cpu core where threads are scheduled (default:0) (0 - core0, 1 - core1, 2 - both core)
  poll_phy_ms=val         Set the frequency in ms, to poll the phy status, and poll for management of resources (default: 2000)
  cache_ops=val           Set the cache operation to on/off (default:0  -> cache-coherency set), 1 - Turn on cache off -> cache-coherency off)
  smmu=0|1                1 to enable smmu support (default: 0)
  virt_id=val             Set the virt_id to use for the dma channel when smmu is enabled
  hw_csum=1               1 to enable hw csum (default: 0)
  joinvlan="1;2;3..."     List of VLANs to join
  rx_intr_prio=val        Set the rx interrupt thread priority (default: 21)
  tx_intr_prio=val        Set the tx interrupt thread priority (default: 21)
  cpts_intr_prio=val      Set the cpts interrupt thread priority (default: 22)
  no_stack_thread=val     1 to disable using the stack thread needed for bridge and fastforward to work  (default: 0)
  rx_pacing=0|1           1 to enable rx interrupt pacing (default:0, use rx interrupt)
  rx_pacing_msec=val      Set the rx interrupt pacing internal in msec (default: 1 msec)

Examples:
  # Start io-pkt using the driver and with static IP address
  io-pkt-v6-hc -d cpsw2g
  ifconfig am0 192.0.2.1

  # Start io-pkt using the driver and with DHCP IP address
  io-pkt-v6-hc -d cpsw2g
  dhclient -nw am0

  # Start io-pkt using the driver for mac-to-mac mode in gitbit speed
  io-pkt-v6-hc -d cpsw2g verbose=1,mac-to-mac=1,speed=1000
  ifconfig am0 192.0.2.1

  # Start io-pkt using the driver with custom mac address
  io-pkt-v6-hc -d cpsw2g p0mac=001122334455
  ifconfig am0 192.0.2.1

  # Start io-pkt using the driver with typed memory "ram":
  io-pkt-v6-hc -d cpsw2g typed_mem=ram -ptcpip pkt_typed_mem=ram
  ifconfig am0 192.0.2.1

  # Start io-pkt using the driver with typed memory "ram", preferred udma channel "24" and smmu enabled:
  io-pkt-v6-hc -d cpsw2g ptp=1,typed_mem=ram,udma_chnum=24,smmu=1 -ptcpip pkt_typed_mem=ram
  ifconfig am0 192.0.2.1

  # Start io-pkt using the driver with hw csum enabled
  io-pkt-v6-hc -d cpsw2g hw_csum=1
  ifconfig am0 tcp4csum udp4csum tcp6csum udp6csum
  ifconfig am0 192.0.2.1
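
The use output above states two constraints on the TX descriptor options: tx_descriptor_cnt has a maximum of 256, and tx_freeq_threshold must be less than the number of TX descriptors. A minimal sketch that checks both before building the option string follows; the function name cpsw_tx_opts is hypothetical, and the lower bound of 1 is an assumption that the use output does not state.

```shell
#!/bin/sh
# Build the TX descriptor options, enforcing the limits from the use output.
# Arguments: <tx_descriptor_cnt> <tx_freeq_threshold>
cpsw_tx_opts() {
    # The lower bound of 1 is an assumption; the documented maximum is 256.
    if [ "$1" -lt 1 ] || [ "$1" -gt 256 ]; then
        echo "tx_descriptor_cnt must be between 1 and 256" >&2
        return 1
    fi
    if [ "$2" -ge "$1" ]; then
        echo "tx_freeq_threshold must be less than tx_descriptor_cnt" >&2
        return 1
    fi
    echo "tx_descriptor_cnt=$1,tx_freeq_threshold=$2"
}

cpsw_tx_opts 256 240
```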

5.6.5. Starting driver with other options

To run the cpsw2g devnp driver with full debug log (run “slog2info” to see the debug log):

io-pkt-v6-hc -d cpsw2g verbose=0xff

To run the cpsw2g devnp driver in mac-to-mac mode with 1Gbps:

io-pkt-v6-hc -d cpsw2g mac-to-mac=1,speed=1000

To run the cpsw2g devnp driver in mac-to-mac mode with 1Gbps with gPTP:

io-pkt-v6-hc -d cpsw2g ptp=1,mac-to-mac=1,speed=1000

To get a dynamic IP address for the cpsw2g port:

dhclient -nw am0

To get a static IP address for the cpsw2g port:

ifconfig am0 up
ifconfig am0 <static_ip_address>

To enable HW CSUM offloading support:

io-pkt-v6-hc -d cpsw2g hw_csum=1
ifconfig am0 tcp4csum udp4csum tcp6csum udp6csum
dhclient -nw am0
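
All of the variants above share one shape: the driver name follows -d, and the driver options form a single comma-separated argument with no spaces. The sketch below assembles such a command line from separate option arguments; the function name cpsw_cmdline is hypothetical, and the result is only printed, not executed.

```shell
#!/bin/sh
# Join driver options with commas and print the resulting io-pkt command line.
# Arguments: <driver> [option=value ...]
cpsw_cmdline() {
    cl_driver=$1
    shift
    cl_opts=
    for cl_opt in "$@"; do
        cl_opts="${cl_opts:+$cl_opts,}$cl_opt"
    done
    echo "io-pkt-v6-hc -d $cl_driver${cl_opts:+ $cl_opts}"
}

cpsw_cmdline cpsw2g ptp=1 mac-to-mac=1 speed=1000
```

Generating the line this way also avoids typographic dashes, which the shell does not accept as option prefixes when commands are pasted from formatted documents.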

5.7. CPSW9G DEVNP driver

5.7.1. Overview

The CPSW9G DEVNP network driver is implemented as a “virtual” driver that communicates with the Ethernet Firmware (EthFW) switch firmware running on the R5 core. Control messages are transferred via IPC. The RX and TX data packets are passed to the CPSW9G port using UDMA.

The CPSW9G firmware on MCU2_0 implements a TimeSync module that uses the CPTS timer for PTP support, to synchronize with the master clock on the network. The TimeSync module configures the PTP stack with the following default properties:

  • Ordinary Clock

  • IEEE 802.3 Transport

  • Peer Delay Mechanism (P2P)

For the TimeSync module design, please refer to the EthFW documentation. The CPSW9G DEVNP driver includes an option to return the CPTS timer timestamp via devctl() with the PTP_GET_TIME command. Please refer to the ptp_test example to see how to use the PTP_GET_TIME command to get the CPTS timer timestamp from the cpsw9g devnp driver.

Note

Refer to PSDK RTOS Components for more information about the EthFW component.

5.7.2. Running

Boot the board with the SD card. At the QNX prompt, run the commands below to mount the SD card and then launch the executables:

tisci-mgr
tiipc-mgr
tiudma-mgr
io-pkt-v6-hc -d cpsw9g
dhclient -nw an0

By default, the cpsw9g driver sets the run mask of all driver-created threads to 0x0, so that all of them run on core 0 of the A72. This can be changed with the run_mask_cpu= command-line option. See the use command output or the “Usage of devnp-cpsw9g.so” section for further details.

For debug traces, run the commands below before starting the DEVNP driver, then start the driver with an increased verbose parameter (for example, verbose=0x3ff). This shows all of the driver’s slog messages.

slog2info -c
slog2info -w &

5.7.3. Additional steps

  • Run “if_up -p an0” to check if the interface is ready

  • Run “ifconfig an0 up” to bring UP the link.

  • Run “dhclient -nw an0” to obtain an IP address from the DHCP server

  • Run “ifconfig -v” to check the assigned IP address and status

  • Run “tcpdump -e” to inspect the network traffic (including link-level headers)

5.7.4. Usage of devnp-cpsw9g.so

We can run the below command to get the usage of the DEVNP driver:

$ use devnp-cpsw9g.so

devnp-cpsw9g.so j721e cpsw 9G virtual ethernet driver based on enet low level driver

Syntax:
  io-pkt-v6-hc -d cpsw9g [option[,option ...]] ...

Options (to override autodetected defaults):
  verbose=num             Set verbosity level (default: 0).
  ptp=0|1                 1 to enable PTP support (default: 0)
  typed_mem=name          Set the typed memory
  udma_chnum=val          Set the preferred udma channel to use (default: dynamic allocation)
  tx_freeq_threshold=val  Set the tx free Q threshold value (default: 120), must be less than no of TX descriptors.
  tx_descriptor_cnt=val   Set the tx descriptors count value (default: 128, Max: 256)
  rx_descriptor_cnt=val   Set the rx descriptors count value (default: 128, Max: 256)
  run_mask_cpu=val        Set the run mask - cpu core where threads are scheduled (default:0) (0 - core0, 1 - core1, 2 - both core)
  poll_phy_ms=val         Set the frequency in ms, to poll the phy status, and poll for management of resources (default: 10000)
  cache_ops=val           Set the cache operation to on/off (default:0  -> cache-coherency set), 1 - Turn on cache off -> cache-coherency off)
  smmu=0|1                1 to enable smmu support (default: 0)
  virt_id=val             Set the virt_id to use for the dma channel when smmu is enabled
  mac-to-mac=1            Set for mac-to-mac mode (default: 0) (only when EthFW is setup for mac-to-mac mode)
  speed=100|1000          Media data rate for link in Mb/s. (default: 1000)
  joinvlan="1;2;3..."     List of VLANs to join
  rx_intr_prio=val        Set the rx interrupt thread priority (default: 21)
  tx_intr_prio=val        Set the tx interrupt thread priority (default: 21)
  no_stack_thread=val     1 to disable using the stack thread needed for bridge and fastforward to work  (default: 0)
  hw_csum=1               1 to enable hw csum (default: 0)
  p0mac=XXXXXXXXXXXX      Custom MAC address to use on virtual port
  rx_pacing=0|1           1 to enable rx interrupt pacing (default:0, use rx interrupt)
  rx_pacing_msec=val      Set the rx interrupt pacing internal in msec (default: 1 msec)

Examples:
  # Start io-pkt using the driver:
  io-pkt-v6-hc -d cpsw9g
  ifconfig an0 192.0.2.1

  # Start io-pkt using the driver with typed memory "ram":
  io-pkt-v6-hc -d cpsw9g typed_mem=ram -ptcpip pkt_typed_mem=ram
  ifconfig an0 192.0.2.1

  # Start io-pkt using the driver with typed memory "ram", preferred udma channel "24" and smmu enabled:
  io-pkt-v6-hc -d cpsw9g ptp=1,typed_mem=ram,udma_chnum=24,smmu=1 -ptcpip pkt_typed_mem=ram
  ifconfig an0 192.0.2.1

  # Start io-pkt using the driver for mac-to-mac mode in gitbit speed
  io-pkt-v6-hc -d cpsw9g mac-to-mac=1,speed=1000
  ifconfig am0 192.0.2.1

5.7.5. Starting driver with other options

To run the cpsw9g devnp driver with gPTP:

io-pkt-v6-hc -d cpsw9g ptp=1

Run the "ptp_test" utility to verify/get the synchronized time from EthFW.

To run the cpsw9g devnp driver in mac-to-mac mode with 1Gbps:

io-pkt-v6-hc -d cpsw9g mac-to-mac=1

5.8. K3conf QNX utility

K3CONF is a QNX port of a standalone application designed to provide a quick and easy way to dynamically diagnose Texas Instruments’ K3 architecture based processors. K3CONF is intended to provide an experience similar to that of OMAPCONF, which runs on legacy TI platforms.

Note

WARNING: This is work in progress! Please don’t expect things to be complete in any dimension. Use at your own risk. And keep the reset button in reach.

To get more details on how to use the k3conf utility, run the following:

k3conf --help

5.9. VPU Video Codec

5.9.1. Overview

The WAVE5 Video Processing Unit (VPU) is a 4K Codec that supports both HEVC and H.264/AVC video formats. It provides high performance encode and decode capability for 8-bit YUV video up to 4K @60fps. The VPU is highly optimized for memory bandwidth loading and it has excellent power management.

Encoder:

  • Capable of encoding H.265/HEVC Main and Main Still Picture Profiles @ L5.1 High tier.

  • Capable of encoding H.264/AVC Baseline/Constrained Baseline/Main/High Profiles @ L5.2.

Decoder:

  • Capable of decoding H.265/HEVC Main and Main Still Picture Profiles @ L5.1 High tier.

  • Capable of decoding H.264/AVC Baseline/Constrained Baseline/Main/High Profiles @ L5.2.

Maximum Resolutions Supported:

  • Encoder Maximum resolution: 8192x8192

  • Decoder Maximum resolution: 8192x4320

    • Note: the VPU can handle the high resolutions above, but frame-rate performance will be limited

Multiple concurrent encode/decode streams:

  • Number of concurrent streams is dependent on the resolutions and frame rates required

There are 2 instances of the VPU codec present in the SoC and they can perform operations independent of one another.

Resource Manager for VPU codec:

The codec is managed by a QNX Resource Manager driver. It is responsible for managing access to the VPU hardware and supports parallel encode and decode operations, with both encode and decode running either as multi-channel or multi-instance.

Current Software Limitations:

Note

All VPU codec software is currently configured to work with memory in the high-memory region of DDR (using a memory carveout, specifically at 0x940000000). A codec memory carveout must currently be specified in this DDR region to use the codec test examples given in the usage instructions below.

The entire codec carveout is currently managed by the TI Shared Memory Allocator resource manager (shmemallocator). This means that neither the OMX components nor the codec resource manager handles the management of the memory carveout. The only part of the resource manager that is aware of the high memory is the WAVE5_PROC_AXI_EXT_ADDR value, which is configured to be 0x9 for the high memory. If you relocate the high-memory carveout to a location other than 0x9 xxxx xxxx, make the corresponding change to this base address and update the shared memory allocator blocks that track the codec high memory.

Multi-instance Support: The resource manager supports 16 parallel instances of encode and/or decode on each core simultaneously. The test vector resolution and the codec performance are intentionally left unspecified here, since they depend on a number of factors such as bitrate, fps, and number of channels. For instance, in our test setup with D1 resolution test vectors, running 8 encodes and 8 decodes simultaneously requires an internal buffer carveout for the VPU of at least 310 MB.
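
Judging from the "0x9 xxxx xxxx" notation, the WAVE5_PROC_AXI_EXT_ADDR value appears to be the part of the physical address above bit 31. Under that assumption, which is an inference from the text rather than a statement from the driver source, the value for a relocated carveout could be derived as sketched below. The function name axi_ext_addr is hypothetical.

```shell
#!/bin/sh
# Print the address bits above bit 31 of a physical address.
# Assumption: this is what WAVE5_PROC_AXI_EXT_ADDR holds.
axi_ext_addr() {
    printf '0x%X\n' $(( $1 >> 32 ))
}

axi_ext_addr 0x940000000   # start address of the codec carveout
```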

Notable Hardware Limitations

  • Decoder has a minimum resolution of 8x8 for H.265/HEVC and 32x32 for H.264/AVC

  • Decoder does not support non-aligned resolutions (pixel height & width not divisible by 8)

While H.264 decoding may work with the presence of some pixel anomalies, this feature is not supported by the hardware or this SDK

  • Decoder does not support 10-bit color

  • Encoder has a minimum resolution of 256x128

  • Encoder is not capable of producing YUV 4:2:2 output

YUV 4:2:2 input is acceptable, but will be downsampled to YUV 4:2:0 as a part of the encoding process. The pseudo-422 option in the test app can be used to mimic YUV 4:2:2 chroma plane format without the increased quality.

  • Encoder does not support non-aligned resolutions (pixel height & width not divisible by 8)

While H.264 encoding may work with the presence of some pixel anomalies, this feature is not supported by the hardware or this SDK

  • Encoder is not capable of using GOP presets with B frames

Encoding may only be done with I & P frames. When using P frames, only a single reference frame is supported within that GOP. Note that the test app along with the encoder_parameters.conf file can be used as a reference for injecting IDR at desired intervals.

  • Encoder does not support 10-bit color
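
The resolution limits listed in this section (minimum size, maximum size, and divisibility by 8) can be checked before a stream is queued. The sketch below encodes only the numbers stated in this document; the function name vpu_resolution_ok and its mode keywords are hypothetical.

```shell
#!/bin/sh
# Succeed if <width> x <height> satisfies the documented limits.
# Arguments: <enc|dec-hevc|dec-avc> <width> <height>
vpu_resolution_ok() {
    case "$1" in
        enc)      r_minw=256; r_minh=128; r_maxw=8192; r_maxh=8192 ;;
        dec-hevc) r_minw=8;   r_minh=8;   r_maxw=8192; r_maxh=4320 ;;
        dec-avc)  r_minw=32;  r_minh=32;  r_maxw=8192; r_maxh=4320 ;;
        *)        return 2 ;;
    esac
    [ "$2" -ge "$r_minw" ] && [ "$3" -ge "$r_minh" ] &&
    [ "$2" -le "$r_maxw" ] && [ "$3" -le "$r_maxh" ] &&
    [ $(( $2 % 8 )) -eq 0 ] && [ $(( $3 % 8 )) -eq 0 ]
}

vpu_resolution_ok enc 1920 1080 && echo "1920x1080 is accepted for encode"
vpu_resolution_ok enc 1920 1084 || echo "1920x1084 is rejected (height not divisible by 8)"
```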

Decoder: Handling high bit rate (>100 Mbps) 4K streams

The decoder currently has 2 fixed size input buffers that it expects to be configured as part of the OMX component initialization. These buffers are of 5MB size each and are physically contiguous and adjacent to each other. This configuration lets the VPU hardware see these 2 input buffers as a single 10MB buffer.

When handling high bit rate input streams, the size of these buffers is not enough for the decoder to decode the first full frame. For such scenarios, it is advised to increase the input buffer size. For a 220 Mbps bitstream, decoding works with an input buffer size of 10 MB. The following changes set the input buffer size to 10 MB.

diff --git a/codec/vpu/OpenMAXIL/components/common/omxil_dec_interface.h b/codec/vpu/OpenMAXIL/components/common/omxil_dec_interface.h
index 3e914af9..f55ed0a5 100644
--- a/codec/vpu/OpenMAXIL/components/common/omxil_dec_interface.h
+++ b/codec/vpu/OpenMAXIL/components/common/omxil_dec_interface.h
@@ -34,7 +34,7 @@

 #include "tivpu_dec.h"

-#define VDEC_INPUT_BUF_SIZE (5*1024*1024) // Input Buffer size - set to: ((1 / NUM_IN_BUFFERS) * 10MB)
+#define VDEC_INPUT_BUF_SIZE (10*1024*1024) // Input Buffer size - set to: ((1 / NUM_IN_BUFFERS) * 10MB)

 /**
  *  Event types of callback
diff --git a/codec/vpu/OpenMAXIL/test/dec/input.h b/codec/vpu/OpenMAXIL/test/dec/input.h
index 5a874947..36374a92 100644
--- a/codec/vpu/OpenMAXIL/test/dec/input.h
+++ b/codec/vpu/OpenMAXIL/test/dec/input.h
@@ -29,7 +29,7 @@

 #define CONFIG_DATA_BUFFER_SIZE 8096
 #define DEFAULT_BUFFER_SIZE (2*1024*1024)
-#define INPUT_BUFFER_SIZE  (5*1024*1024) // Input Buffer size - set to: ((1 / NUM_IN_BUFFERS) * 10MB)
+#define INPUT_BUFFER_SIZE  (10*1024*1024) // Input Buffer size - set to: ((1 / NUM_IN_BUFFERS) * 10MB)


 class OmxilVideoDecInput {
diff --git a/codec/vpu/tivpucodec/decoder/tivpu_dec.h b/codec/vpu/tivpucodec/decoder/tivpu_dec.h
index 1234b9c7..0fc59413 100644
--- a/codec/vpu/tivpucodec/decoder/tivpu_dec.h
+++ b/codec/vpu/tivpucodec/decoder/tivpu_dec.h
@@ -24,7 +24,7 @@
 #include "main_helper.h"

 #define STREAM_BUF_SIZE_DEFAULT (4*1024*1024)
-#define STREAM_BUF_SIZE_HEVC   (10*1024*1024)  // bitstream size(HEVC:10MB)
+#define STREAM_BUF_SIZE_HEVC   (20*1024*1024)  // bitstream size(HEVC:10MB)
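
In the patch above, the stream buffer size stays at twice the per-buffer input size, both before (5 MB and 10 MB) and after (10 MB and 20 MB) the change. The sketch below reproduces that arithmetic. It assumes NUM_IN_BUFFERS is 2, based on the two input buffers described in the text, and the function names are illustrative only.

```shell
#!/bin/sh
NUM_IN_BUFFERS=2   # assumption: the text describes two adjacent input buffers

# Size in bytes of one input buffer of <mb> megabytes.
input_buf_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Total size in bytes seen by the VPU when all input buffers are adjacent.
stream_buf_bytes() {
    echo $(( NUM_IN_BUFFERS * $1 * 1024 * 1024 ))
}

for mb in 5 10; do
    echo "per-buffer ${mb} MB: input=$(input_buf_bytes "$mb") stream=$(stream_buf_bytes "$mb")"
done
```

If a different per-buffer size is chosen, all three defines in the patch would presumably need to be scaled together in the same way.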

5.9.2. Usage

OMX IL Components for the VPU video encoder & decoder, and file-based OMX encoder & decoder test applications are provided.

Note that running ti-vpu-codec-mgr (the resource manager for the VPU) is a prerequisite for running the OMX test apps for the encoder and decoder. Currently, it is launched as part of the startup scripts.

For more details, run the use command for: omxil_video_enc and omxil_video_dec

$ use omxil_video_enc

This is a test application for OpenMAX IL video encode.
It takes input frames from a .yuv file and encodes them to a H.264 bitstream file.

Syntax:
    omxil_video_enc <options>

Options:
    -C: VPU Core to choose (0 only for j721s2, 0,1 for j784s4)
    -v: increase verbosity, max 7
    -n: stdin not used (no command-line inputs taken)
    -i: input file
    -o: output file
    -s: coding standard (0 = AVC, 1 = HEVC)
    -c: config file for encoder parameters
    -d: disable display
    -a: multi instance mode:
         0 or no option - Use the default memory layout
         1 or 2 - Use either one of the memory layout
    -f: input format for raw input (.yuv/.rgba/etc.)
       e.g.: nv12,1920x1080@30
          nv12 is the input color format
          1920x1080 is resolution(width x height)
          30 is frame rate.
    Supported input format: nv12.
    -L: Enable lossless encoding
    -G: Select GOP preset
          0 - custom_GOP (default / user defined structure)
          1 - all I frames
          9 - consecutive P frames, with single reference I frame
    -I: Enable IDR header information (encode only)
          e.g.: -I 5 - This will set IDR-period to 5

Examples:
    # Basic built-in help:
    omxil_video_enc -h

    # AVC file-to-file encode:
    omxil_video_enc -vv -i /ti_fs/codec_test/vpu/nv12/8bit_BQTerrace_720x128_9frame_nv12.yuv -o /ti_fs/codec_test/vpu/output/8bit_BQTerrace_720x128.264 -d -f nv12,720x128@30

    # AVC file-to-file encode, with config file specified to override default bitrate, etc. :
    cp ti_fs/codec_test/vpu/nv12/mix_1920x1080_8b_10frm_nv12.yuv /tmp/
    omxil_video_enc -vv -i /tmp/mix_1920x1080_8b_10frm_nv12.yuv -c /ti_fs/codec_test/vpu/cfg/encoder_parameters.conf -o /tmp/mix_1920x1080.264 -d -f nv12,1920x1080@30


Additional Options Info:
   -G: Select GOP preset
      Defines frame sequence. Current limitations of the encoder hardware prevent from using GOP
      structures that utilize more than a single reference frame. This allows for the use of consecutive
      I frame and consecutive P frame (w/ single ref I frame) preset which is shown below in the usage
      output. If used w/o GOP preset option, default is 0 (custom_GOP) which follows the consecutive P
      frame (w/ single ref I frame) preset.

   -I: Enable IDR header information
      Determines the frequency of IDR frames within the encoded video. This includes SPS and PPS NAL units.

$ use omxil_video_dec

 This is a test application for OpenMAX IL video decode.
 It takes H.264/HEVC frames from a file, decodes and displays them,
 or decodes and saves the decoded frames in another file.

 Syntax:
     omxil_video_dec <options>


 Options:
     -i: input file
     -o: output file, save output to file
     -L: number of buffers for decode, less than 32
         - default value decided by codec hardware
     -M: number of output buffers, between 3 and 64 (default 12)
     -C: VPU Core to choose (0 only for j721s2, 0,1 for j784s4)
     -v: increase verbosity, max 7
     -n: use the second instance of the carveout. Used for multi-instance testing
     -p: pseudo-YUV422 output using YUV420 source
     -E: spatial & temporal error concealment

 Examples:
     # AVC file-to-file decode:
     omxil_video_dec -v -i /ti_fs/codec_test/bitstream/example.264 -o /ti_fs/codec_test/output/example_nv12.yuv

     # HEVC file-to-file decode:
     omxil_video_dec -v -i /ti_fs/codec_test/bitstream/example.265 -o /ti_fs/codec_test/output/example_hevc_nv12.yuv


 Additional Options Info:
   -L: number of buffers for decode
      This option allows you to specify the number of non-linear frame buffers used by the decoder. Typically,
      the VPU will determine the minimum number needed and pass this value to decoder initialization when allocating
      frame buffers. This results in the most decoder picture buffering and memory savings. If the option passes
      a value lower than this minimum, it will be ignored. By setting this value higher than the minimum, the
      chance of decoder picture buffering is decreased at the cost of higher memmory usage.

   -p: pseudo-YUV422 output using YUV420 source
      Mimics the chroma plane dimensions of YUV 4:2:2 format for easier downstream usage by tools that
      expect this format. This feature is implemented at the application level.

   -E: spatial & temporal error concealment
      Our hardware is capable of block-level error concealment for spatial & temporal frames, which can
      help combat the effect of errors in video transmitted over unreliable channels. Picture-level,
      Slice-level, and Block row-level conceal unit types are also available.
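
One practical way to sanity-check a decoded file, with or without -p, is to compare its size against the expected per-frame size. The sketch below assumes 8-bit samples and no row padding or alignment, which this guide does not confirm for omxil_video_dec, so treat the results as estimates. The helper names are hypothetical. A 4:2:0 frame holds width x height luma bytes plus half as many chroma bytes; a 4:2:2 layout holds as many chroma bytes as luma bytes.

```shell
# Hypothetical helpers: estimated bytes per frame for 8-bit YUV, no padding.
# Usage: yuv420_frame_bytes <width> <height>
yuv420_frame_bytes() {
    # Luma (w*h) plus chroma subsampled in both directions (w*h/2 in total).
    echo $(( $1 * $2 * 3 / 2 ))
}

# Usage: yuv422_frame_bytes <width> <height>
yuv422_frame_bytes() {
    # Luma (w*h) plus chroma subsampled horizontally only (w*h in total).
    echo $(( $1 * $2 * 2 ))
}
```

For a 720x416 stream, `yuv420_frame_bytes 720 416` gives 449280 and `yuv422_frame_bytes 720 416` gives 599040. Dividing the output file size by the matching value should give a whole number of frames if these assumptions hold.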

The OMX layer component can be configured to use either instance of the VPU hardware. This is still a work in progress, and the OMX application will be updated to test this functionality once it is implemented. The resource manager is capable of handling both instances of the VPU hardware.

An initial version of the VPU Encoder/Decoder driver is provided. It is built as a library and is accompanied by a set of unit-test applications that run in file-to-file mode.

For more details, run the use command for vpu_decoder_test, vpu_encoder_test, and vpu_multi_inst_test.

$ use vpu_decoder_test

vpu_decoder_test - This is a VPU Decoder unit-test for file-to-file mode

Syntax:
    vpu_decoder_test <options>

Options (describes primary options, run with -h for more details)
    h           Help
    codec       Codec format (mandatory if using non-HEVC encoded input files). Use 0 for AVC, and 12 for HEVC
    input       Input encoded file to decode (mandatory)
    output      Output decoded file (mandatory)

Examples:
    # Basic help built into the executable
    vpu_decoder_test -h

    # AVC file-to-file decode
    vpu_decoder_test --codec 0 --input /ti_fs/codec_test/bitstream/HistoryOfTI-480p.264 --output /ti_fs/codec_test/vpu/output/HistoryOfTI-480p.264-720x416.yuv

    # HEVC file-to-file decode
    vpu_decoder_test --input /ti_fs/codec_test/bitstream/TearOfSteel-Short-1280x720.265 --output /ti_fs/codec_test/vpu/output/TearOfSteel-Short-1280x720.265.yuv

Notes:
    - This test is applicable only on J784S4 SoCs

$ use vpu_encoder_test

vpu_encoder_test - This is a VPU Encoder unit-test for file-to-file mode

Syntax:
    vpu_encoder_test <options>

Options (describes primary options, run with -h for more details)
    h              Help
    cfgFileName    Encoder Config parameters (mandatory)
    codec          Codec format (mandatory if using non-HEVC encoded input files). Use 0 for AVC, and 12 for HEVC
    input          Input YUV file to encode (optional). Overrides 'InputFile' in cfg file if defined
    output         Output encoded file (mandatory to store encoded binary into a file)

Examples:
    # Basic help built into the executable
    vpu_encoder_test -h

    # AVC file-to-file encode
    vpu_encoder_test --codec 0 --cfgFileName /ti_fs/codec_test/vpu/cfg/avc_inter_8b_02.cfg --output /ti_fs/codec_test/vpu/output/avc_inter_8b_02.cfg.264

    # HEVC file-to-file encode
    vpu_encoder_test --cfgFileName /ti_fs/codec_test/vpu/cfg/hevc_bg_8b_01.cfg --input /ti_fs/codec_test/vpu/yuv/8bit_BQTerrace_720x128_9frame.yuv --output /ti_fs/codec_test/vpu/output/bg_8b_01.cfg.265

Notes:
    - This test is applicable only on J784S4 SoCs

$ use vpu_multi_inst_test

vpu_multi_inst_test - This is a VPU Multi-instance Decode/Encode unit-test (file-to-file mode)

Syntax:
    vpu_multi_inst_test --instance-num=<N> -e <test-1>,..,<test-N> --codec=<codec-1>,..,<codec-N> --input <in-1>,..,<in-N> --output <out-1>,..,<out-N>

Options (describes primary options, run with -h for more details)
    h              Help
    instance-num   Total number of test instances to run
    e              0: decode, 1: encode (per test)
    codec          Codec format. Use 0 for AVC, and 12 for HEVC (per test)
    enable-wtl     Enables WTL option (per test), if set to 1, for decoded outputs to be written in linear fashion for a framebuffer
    input          Input bitstream for decode, or Input .cfg file for encode (per test)
    output         Output decoded YUV file, or Output encoded bitstream file (per test)

Examples:
    # Basic help built into the executable
    vpu_multi_inst_test -h

    # AVC decode + HEVC encode, and write the decoded output to a single YUV file
    vpu_multi_inst_test --instance-num=2 -e 0,1 --codec=0,12 --enable-wtl=1 \
    --input /ti_fs/codec_test/bitstream/HistoryOfTI-480p.264,/ti_fs/codec_test/vpu/cfg/hevc_bg_8b_01.cfg \
    --output /ti_fs/codec_test/vpu/output/HistoryOfTI-480p.264-720x416.yuv,/ti_fs/codec_test/vpu/output/bg_8b_01.cfg.265

Notes:
    - This test is applicable only on J784S4 SoCs

5.9.3. Performance

The resource manager supports performance measurements for the decoder and encoder. Once enabled, it creates a slog-based performance log per core and per channel in the directory pointed to by the environment variable VPU_PERFORMANCE_LOG_DIR. The log contains the per-frame decode time, the total decode time, the number of frames decoded, and the average frame rate (fps).

To enable this feature, follow these steps:

  • Set the environment variable VPU_PERFORMANCE_LOG_DIR (for example: export VPU_PERFORMANCE_LOG_DIR=/tmp/)

  • Slay the resource manager if already running (slay ti-vpu-codec-mgr)

  • Relaunch the resource manager from the same console, so that it inherits the VPU_PERFORMANCE_LOG_DIR value.

  • Run the codec use case. A log file such as performance_log_0_ch_0.log is created in that directory.

To disable this feature, unset the environment variable, then slay and relaunch the resource manager.
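
The enable steps above can be collected into a small shell function. This is only a sketch: the function name vpu_perf_enable and the VPU_MGR_CMD override are hypothetical, and it assumes the resource manager is started as ti-vpu-codec-mgr with no arguments and does not need to stay in the foreground. If your startup script launches it differently, reuse that exact launch line instead.

```shell
# Hypothetical helper: restart the VPU resource manager with performance logging enabled.
# Usage: vpu_perf_enable [log_dir]   (log_dir defaults to /tmp)
vpu_perf_enable() {
    log_dir="${1:-/tmp}"
    # Assumption: the manager needs no arguments. Set VPU_MGR_CMD to override.
    mgr_cmd="${VPU_MGR_CMD:-ti-vpu-codec-mgr}"

    # Step 1: export the log directory so the relaunched manager inherits it.
    export VPU_PERFORMANCE_LOG_DIR="$log_dir"

    # Step 2: stop the running instance, if any. Errors are ignored so the
    # function still works when no instance is running.
    slay ti-vpu-codec-mgr 2>/dev/null || true

    # Step 3: relaunch from this same shell, in the background.
    $mgr_cmd &
}
```

Call it from the console that will run the codec use case, for example `vpu_perf_enable /tmp`. Because a shell function runs in the current shell, the exported variable stays set for later commands in that console.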

Sample output for the performance log is shown below:

For encoder:

1970-01-01 19:48:50.224 [OpenMAXIL.] started encoding

1970-01-01 19:49:01.149 [OpenMAXIL.] Total encoding time 10925 ms.
1970-01-01 19:49:01.149 [OpenMAXIL.] Number of encoded frames 601
1970-01-01 19:49:01.149 [OpenMAXIL.] Encoding frame rate is 55.01 fps.

For decoder:

1970-01-01 00:50:48.327 [OpenMAXIL.] [0] done decoding frame

1970-01-01 00:50:48.327 [OpenMAXIL.] Decoding time for frame = 0 ms

1970-01-01 00:50:48.327 [OpenMAXIL.] Total decoding time 1195 ms.
1970-01-01 00:50:48.327 [OpenMAXIL.] Number of decoded frames 42
1970-01-01 00:50:48.327 [OpenMAXIL.] Decoding frame rate is 35.15 fps.
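
The reported frame rate is consistent with the frame count divided by the total time in seconds: 601 frames in 10.925 s is about 55.01 fps for the encoder sample, and 42 frames in 1.195 s is about 35.15 fps for the decoder sample. The sketch below recomputes that value from a log file. It assumes the "Total ... time N ms." and "Number of ... frames N" lines have exactly the format shown in the samples above; the function name vpu_perf_fps is hypothetical.

```shell
# Hypothetical helper: recompute the average frame rate from a performance log.
# Usage: vpu_perf_fps <log_file>
vpu_perf_fps() {
    awk '
        # "... Total encoding time 10925 ms." -> the value is the second-to-last field.
        / Total (encoding|decoding) time /     { ms = $(NF - 1) }
        # "... Number of encoded frames 601" -> the value is the last field.
        / Number of (encoded|decoded) frames / { frames = $NF }
        END {
            # Print nothing if no total time was found, to avoid dividing by zero.
            if (ms > 0) printf "%.2f\n", frames * 1000 / ms
        }
    ' "$1"
}
```

For example, `vpu_perf_fps "$VPU_PERFORMANCE_LOG_DIR/performance_log_0_ch_0.log"` prints the recomputed rate, which can be compared against the "frame rate is" line in the same file. If a log contains several runs, the last totals in the file are used.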