3.7. Multimedia Video Codec

3.7.1. Introduction

The video encoder/decoder (VENC/VDEC) is a stateful, combined H.264 and H.265 encoder/decoder found on the Texas Instruments AM62Ax SoC.

Hardware capabilities:
  • Maximum resolution: 8192x8192. The hardware can handle this resolution, but not necessarily in real time.
  • Minimum resolution: 256x128
Constraints:
  • The picture width shall be a multiple of 8.
  • The picture height shall be a multiple of 8.
Multiple concurrent encode/decode streams:
  • The number of concurrent streams depends on resolution and frame rate.
Encoder:
  • Capable of encoding H.265 Main and Main Still Picture Profiles @ L5.1 High tier.
  • Capable of encoding H.264 Baseline/Constrained Baseline/Main/High Profiles @ L5.2.
Decoder:
  • Capable of decoding H.265 Main and Main Still Picture Profiles @ L5.1 High tier.
  • Capable of decoding H.264 Baseline/Constrained Baseline/Main/High Profiles @ L5.2.
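
The resolution limits and alignment constraints listed above can be checked before a stream is configured. The following is a minimal sketch, not part of the driver or the SDK; the function name check_resolution is hypothetical and the limits are copied from the list above.

```python
# Limits copied from the hardware capability list in this section.
MIN_WIDTH, MIN_HEIGHT = 256, 128
MAX_WIDTH, MAX_HEIGHT = 8192, 8192
ALIGNMENT = 8  # width and height must both be multiples of 8


def check_resolution(width, height):
    """Return a list of violated constraints (empty list means acceptable).

    Hypothetical helper for illustration only.
    """
    problems = []
    if not MIN_WIDTH <= width <= MAX_WIDTH:
        problems.append(f"width {width} outside {MIN_WIDTH}..{MAX_WIDTH}")
    if not MIN_HEIGHT <= height <= MAX_HEIGHT:
        problems.append(f"height {height} outside {MIN_HEIGHT}..{MAX_HEIGHT}")
    if width % ALIGNMENT:
        problems.append(f"width {width} is not a multiple of {ALIGNMENT}")
    if height % ALIGNMENT:
        problems.append(f"height {height} is not a multiple of {ALIGNMENT}")
    return problems


if __name__ == "__main__":
    for size in [(1920, 1080), (1366, 768), (100, 100)]:
        print(size, check_resolution(*size) or "ok")
```

Returning a list of messages rather than a boolean makes it easier to report every violated constraint at once.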

The V4L2 compliance report lists the ioctls and controls that the driver supports. For the decoder, the report can be generated with the following command:

  • v4l2-compliance -d0
Driver Info:
       Driver name      : vpu-dec
       Card type        : vpu-dec
       Bus info         : platform:vpu-dec
       Driver version   : 5.14.0
       Capabilities     : 0x84204000
               Video Memory-to-Memory Multiplanar
               Streaming
               Extended Pix Format
               Device Capabilities
       Device Caps      : 0x04204000
               Video Memory-to-Memory Multiplanar
               Streaming
               Extended Pix Format
       Detected Stateful Decoder
Required ioctls:
       test VIDIOC_QUERYCAP: OK
Allow for multiple opens:
       test second /dev/video0 open: OK
       test VIDIOC_QUERYCAP: OK
       test VIDIOC_G/S_PRIORITY: OK
       test for unlimited opens: OK
       test invalid ioctls: OK
Debug ioctls:
       test VIDIOC_DBG_G/S_REGISTER: OK (Not Supported)
       test VIDIOC_LOG_STATUS: OK (Not Supported)
Input ioctls:
       test VIDIOC_G/S_TUNER/ENUM_FREQ_BANDS: OK (Not Supported)
       test VIDIOC_G/S_FREQUENCY: OK (Not Supported)
       test VIDIOC_S_HW_FREQ_SEEK: OK (Not Supported)
       test VIDIOC_ENUMAUDIO: OK (Not Supported)
       test VIDIOC_G/S/ENUMINPUT: OK (Not Supported)
       test VIDIOC_G/S_AUDIO: OK (Not Supported)
       Inputs: 0 Audio Inputs: 0 Tuners: 0
Output ioctls:
       test VIDIOC_G/S_MODULATOR: OK (Not Supported)
       test VIDIOC_G/S_FREQUENCY: OK (Not Supported)
       test VIDIOC_ENUMAUDOUT: OK (Not Supported)
       test VIDIOC_G/S/ENUMOUTPUT: OK (Not Supported)
       test VIDIOC_G/S_AUDOUT: OK (Not Supported)
       Outputs: 0 Audio Outputs: 0 Modulators: 0
Input/Output configuration ioctls:
       test VIDIOC_ENUM/G/S/QUERY_STD: OK (Not Supported)
       test VIDIOC_ENUM/G/S/QUERY_DV_TIMINGS: OK (Not Supported)
       test VIDIOC_DV_TIMINGS_CAP: OK (Not Supported)
       test VIDIOC_G/S_EDID: OK (Not Supported)
Control ioctls:
       test VIDIOC_QUERY_EXT_CTRL/QUERYMENU: OK
       test VIDIOC_QUERYCTRL: OK
       test VIDIOC_G/S_CTRL: OK
       test VIDIOC_G/S/TRY_EXT_CTRLS: OK
       test VIDIOC_(UN)SUBSCRIBE_EVENT/DQEVENT: OK
       test VIDIOC_G/S_JPEGCOMP: OK (Not Supported)
       Standard Controls: 2 Private Controls: 1
Format ioctls:
       test VIDIOC_ENUM_FMT/FRAMESIZES/FRAMEINTERVALS: OK
       test VIDIOC_G/S_PARM: OK (Not Supported)
       test VIDIOC_G_FBUF: OK (Not Supported)
       test VIDIOC_G_FMT: OK
       test VIDIOC_TRY_FMT: OK
       test VIDIOC_S_FMT: OK
       test VIDIOC_G_SLICED_VBI_CAP: OK (Not Supported)
       test Cropping: OK (Not Supported)
       test Composing: OK
       test Scaling: OK (Not Supported)
Codec ioctls:
       test VIDIOC_(TRY_)ENCODER_CMD: OK (Not Supported)
       test VIDIOC_G_ENC_INDEX: OK (Not Supported)
       test VIDIOC_(TRY_)DECODER_CMD: OK
Buffer ioctls:
       test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
       test VIDIOC_EXPBUF: OK
       test Requests: OK (Not Supported)

Similarly, the V4L2 compliance report for the encoder can be generated with the following command:

  • v4l2-compliance -d1

3.7.2. Software Architecture

3.7.2.1. Software Stack of Accelerated Codec Encoding/Decoding

As shown in the figure below, the software stack for accelerated encoding and decoding has two parts:

  • A V4L2 (Video4Linux version 2) software driver running on Linux on the A53 MPU subsystem
  • The firmware running on the DECODER and ENCODER

The driver communicates with the firmware running on the ENCODER/DECODER through its own IPC (inter-processor communication).

For the DECODER, at the highest level in the MPU subsystem on the A53, there is a Linux user space application which is based on GStreamer. GStreamer is an open source framework that simplifies the development of multimedia applications. The GStreamer library loads and interfaces with the GStreamer plugin (V4L2 plugin), which handles all the details specific to the use of the hardware accelerator. Specifically, the GStreamer plugin interfaces with the V4L2 decoder kernel driver interface.

Fig. 3.3 CODEC Software Stack


3.7.2.2. Linux Kernel Drivers

TI-Provided V4L2 Drivers for Multimedia

Video4Linux version 2 (V4L2) is an open source framework that provides a media interface to all Linux-based applications. V4L2 is a collection of device drivers and an API for supporting realtime video capture and video memory-to-memory operations on Linux systems.

Video encode and decode on the ENCODER and DECODER hardware, respectively, are exposed as V4L2 drivers. A thin layer integrates V4L2 with the ENCODER and DECODER drivers: it implements the V4L2 node ioctls and translates the V4L2 data structures into those understood by the ENCODER/DECODER.

3.7.2.3. GStreamer Plugins for Multimedia

Open Source GStreamer Overview

GStreamer is an open source framework that simplifies the development of multimedia applications, such as media players and capture encoders. It encapsulates existing multimedia software components, such as codecs, filters, and platform-specific I/O operations, by using a standard interface and providing a uniform framework across applications.

The modular nature of GStreamer facilitates the addition of new functionality and the transparent inclusion of component advancements, and allows for flexibility in application development and testing. Processing nodes are implemented via GStreamer plugins with several sink and/or source pads. Many plugins run as ARM software implementations, but on more complex SoCs certain functions are better executed on hardware-accelerated IPs like wave5 (DECODER and ENCODER).

GStreamer is a multimedia framework based on the data flow paradigm. It allows easy plugin registration just by deploying new shared objects to the /usr/lib/gstreamer-1.0 folder. The shared libraries in this folder are scanned for reserved data structures identifying the capabilities of individual plugins. Individual processing nodes can be interconnected as a pipeline at run time, creating complex topologies. Node interfacing compatibility is verified at that time, before the pipeline is started.

GStreamer brings a lot of value-added features to Processor SDK Linux AM62Ax, including audio encoding/decoding, audio/video synchronization, and interaction with a wide variety of open source plugins (muxers, demuxers, codecs, and filters). New GStreamer features are continuously being added, and the core libraries are actively supported by participants in the GStreamer community. Additional information about the GStreamer framework is available on the GStreamer project site: http://gstreamer.freedesktop.org/.

Hardware-Accelerated GStreamer Plugins

One benefit of using GStreamer as a multimedia framework is that the core libraries already build and run on ARM Linux. Only a GStreamer plugin is required to enable the additional hardware features of TI’s embedded processors, which combine ARM cores with hardware accelerators for multimedia. The open source GStreamer plugins provide elements for GStreamer pipelines that enable hardware-accelerated video encoding and decoding through the V4L2 GStreamer plugin.

Below is a list of GStreamer plugins that utilize the hardware-accelerated video decoding/encoding in the AM62Ax.

  • ENCODER

    1. v4l2h264enc
    2. v4l2h265enc
  • DECODER

    1. v4l2h264dec
    2. v4l2h265dec

3.7.2.3.1. V4L2 Video Encoder/Decoder

The V4L2 encoder/decoder driver supports the following bitstream formats:

  • V4L2_PIX_FMT_H264
  • V4L2_PIX_FMT_HEVC
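
In the V4L2 API each pixel format is identified by a four-character code (FourCC) packed into a 32-bit integer. The sketch below reproduces that packing for the two formats listed above. It assumes the character codes 'H264' and 'HEVC' used by the kernel's videodev2.h header; the helper names are written here for illustration and are not imported from any library.

```python
def v4l2_fourcc(a, b, c, d):
    """Pack four ASCII characters into a little-endian 32-bit code,
    mirroring the v4l2_fourcc() macro from the kernel header."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


def fourcc_to_str(code):
    """Inverse of v4l2_fourcc(): recover the four characters."""
    return "".join(chr((code >> shift) & 0xFF) for shift in (0, 8, 16, 24))


# Assumed character codes for the two bitstream formats listed above.
V4L2_PIX_FMT_H264 = v4l2_fourcc("H", "2", "6", "4")
V4L2_PIX_FMT_HEVC = v4l2_fourcc("H", "E", "V", "C")

if __name__ == "__main__":
    for name, code in [("V4L2_PIX_FMT_H264", V4L2_PIX_FMT_H264),
                       ("V4L2_PIX_FMT_HEVC", V4L2_PIX_FMT_HEVC)]:
        print(f"{name} = 0x{code:08x} ('{fourcc_to_str(code)}')")
```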

3.7.3. GStreamer Pipelines

H.264 encode:
    target # gst-launch-1.0 filesrc location=/<path_to_file> ! rawvideoparse width=1920 height=1080 format=i420 framerate=30/1 ! v4l2h264enc ! filesink location=/<path_to_file> sync=true

H.265 encode:
    target # gst-launch-1.0 filesrc location=/<path_to_file> ! rawvideoparse width=1920 height=1080 format=i420 framerate=30/1 ! v4l2h265enc ! filesink location=/<path_to_file> sync=true

H.264 decode:
    target # gst-launch-1.0 filesrc location=/<path_to_file> ! matroskademux ! h264parse ! queue ! v4l2h264dec ! filesink location=/<path_to_file>

H.265 decode:
    target # gst-launch-1.0 filesrc location=/<path_to_file> ! matroskademux ! h265parse ! queue ! v4l2h265dec ! filesink location=/<path_to_file>

Video only file playback:
    target # gst-launch-1.0 filesrc location=./bbb_1080p60_30s.h264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! kmssink driver-name=tidss -v

Audio/Video file playback (h264/aac muxed file as example):
    target # gst-launch-1.0 filesrc location=bbb_1080p_aac.mp4 ! qtdemux name=demux demux.video_0 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! queue ! kmssink driver-name=tidss demux.audio_0 ! queue ! faad ! audioconvert ! audioresample ! audio/x-raw, channels=2, rate=48000 ! autoaudiosink

Transcode use-case (h264->h265 conversion as example):
    target # gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2h265enc output-io-mode=5 ! filesink location=./output.265

Video Streaming use-case:

Server (imx219 rawcamera->isp->encode->streamout):
    target # gst-launch-1.0 v4l2src device=/dev/video2 io-mode=dmabuf ! video/x-bayer,width=1920,height=1080, framerate=30/1, format=bggr ! tiovxisp sensor-name=SENSOR_SONY_IMX219_RPI dcc-isp-file=/opt/imaging/imx219/dcc_viss.bin sink_0::dcc-2a-file=/opt/imaging/imx219/dcc_2a.bin sink_0::device=/dev/v4l-subdev2 ! video/x-raw,format=NV12 ! v4l2h264enc output-io-mode=dmabuf-import extra-controls="controls,h264_i_frame_period=60" ! rtph264pay ! udpsink port=5000 host=<ip_address>

Client (streamin->decode->display):
    target # gst-launch-1.0 -v udpsrc port=5000 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtpjitterbuffer latency=50 ! rtph264depay ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! queue ! fpsdisplaysink text-overlay=false name=fpssink video-sink="kmssink driver-name=tidss sync=true show-preroll-frame=false" sync=true -v
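
The raw encode pipelines above read headerless I420 frames, so rawvideoparse relies entirely on the width, height and format given on the command line. A quick sanity check is to confirm that the input file holds a whole number of frames. The sketch below assumes tightly packed I420 data with even dimensions (no row padding); the function names are hypothetical.

```python
def i420_frame_size(width, height):
    """Bytes in one tightly packed I420 frame: a full-size Y plane plus
    U and V planes subsampled by 2 in each direction (assumes even sizes)."""
    if width % 2 or height % 2:
        raise ValueError("this sketch only handles even dimensions")
    return width * height * 3 // 2


def whole_frames(file_size, width, height):
    """Return (frame_count, leftover_bytes) for a raw I420 file of file_size bytes."""
    return divmod(file_size, i420_frame_size(width, height))


if __name__ == "__main__":
    size_1080p = i420_frame_size(1920, 1080)
    print("bytes per 1080p I420 frame:", size_1080p)
    # A one-second clip at 30 fps, as used in the encode examples.
    print("frames, leftover:", whole_frames(30 * size_1080p, 1920, 1080))
```

A non-zero leftover usually means the width, height or format passed to rawvideoparse does not match the file.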

3.7.4. Memory Requirement

The following memory footprints were measured with vmstat for a single-channel 1080p 30 fps stream.

  • Encoder
    1. v4l2h264enc : 43.19 MB
    2. v4l2h265enc : 43.31 MB
  • Decoder
    1. v4l2h264dec : 62.77 MB
    2. v4l2h265dec : 31.91 MB

Note

The actual memory footprint may vary depending on the input stream.
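
For planning multi-stream use cases, the per-stream figures above can be combined into a rough first-order estimate. The sketch below assumes that the footprint scales linearly with the number of streams, which is an assumption made here and is not stated by the measurements; the names PER_STREAM_MB and estimate_total_mb are hypothetical.

```python
# Per-stream footprints (MB) for 1080p30, copied from the list above.
PER_STREAM_MB = {
    "v4l2h264enc": 43.19,
    "v4l2h265enc": 43.31,
    "v4l2h264dec": 62.77,
    "v4l2h265dec": 31.91,
}


def estimate_total_mb(stream_counts):
    """Sum per-stream footprints for a mix of elements.

    stream_counts maps an element name to the number of concurrent streams.
    Assumes linear scaling, so treat the result as a rough estimate only.
    """
    return sum(PER_STREAM_MB[name] * count for name, count in stream_counts.items())


if __name__ == "__main__":
    mix = {"v4l2h264dec": 2, "v4l2h264enc": 1}
    print(f"estimated footprint: {estimate_total_mb(mix):.2f} MB")
```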



3.7.5. Performance metrics

The following table lists the theoretical latency of the IP at different resolutions.

Codec (H.265/H.264)   Resolution   Latency
Encoder               4K           33.3 ms
Encoder               1080p        8.3 ms
Encoder               720p         3.7 ms
Encoder               480p         1.2 ms
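
If frames are processed strictly one after another, a per-frame latency also implies an upper bound on the frame rate of roughly 1000 / latency_ms. The sketch below applies that relation to the latencies above. The sequential-processing assumption is made here for illustration and the resulting frame rates are not figures from this document.

```python
# Theoretical encoder latencies in milliseconds, copied from the table above.
LATENCY_MS = {"4K": 33.3, "1080p": 8.3, "720p": 3.7, "480p": 1.2}


def max_fps(latency_ms):
    """Upper bound on frames per second when each frame takes latency_ms
    and frames are handled one at a time (an assumption of this sketch)."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for resolution, latency in LATENCY_MS.items():
        print(f"{resolution:>6}: {latency:5.1f} ms -> about {max_fps(latency):.0f} fps")
```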

3.7.6. Calculation of Performance metrics using native driver API

The firmware reports tick information, and the wave5 driver can print the cycle count for each frame. Refer to the source code below.

wave5_vpu_dec.c
static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
{
    ...
    dev_dbg(inst->dev->dev, "frame_cycle %8d\n", dec_output_info.frame_cycle);
    ...
}

wave5_vpu_enc.c

static void wave5_vpu_enc_finish_encode(struct vpu_instance *inst)
{
    ...
    dev_dbg(inst->dev->dev, "frame_cycle %8d\n", enc_output_info.frame_cycle);
    ...
}

Dividing the cycle count by the CPU Hz value gives the frame time in seconds; multiplying by 1000 converts it to milliseconds. For example,

Test environment : CPU 400MHz

#1 frame_cycle 489472 => 489472 / 400000000 = 0.00122368 second = 1.22368 milliseconds

#2 frame_cycle 442368 => 442368 / 400000000 = 0.00110592 second = 1.10592 milliseconds

#3 frame_cycle 429824 => 429824 / 400000000 = 0.00107456 second = 1.07456 milliseconds
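
The same conversion can be scripted so that the units stay explicit. This is a minimal sketch that assumes the 400 MHz clock of the test environment above; cycles_to_ms is a hypothetical helper name, and the frame_cycle values are the three samples from the example.

```python
CLOCK_HZ = 400_000_000  # 400 MHz, taken from the test environment above


def cycles_to_ms(frame_cycle, clock_hz=CLOCK_HZ):
    """Convert a frame_cycle count to milliseconds.

    cycles / Hz gives seconds; multiply by 1000 for milliseconds.
    """
    return frame_cycle / clock_hz * 1000.0


if __name__ == "__main__":
    for index, cycles in enumerate([489472, 442368, 429824], start=1):
        print(f"#{index} frame_cycle {cycles} => {cycles_to_ms(cycles):.5f} ms")
```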

3.7.7. Calculation of Performance metrics using GStreamer

3.7.8. Latency

The instantaneous pipeline and encoder latency can be calculated using the GStreamer latency tracer, which reports latency in nanoseconds, as described in the following link: GStreamer latency tracer

Example:

Measuring pipeline latency: this measures the total latency of the pipeline.

target # GST_TRACERS="latency" GST_DEBUG=GST_TRACER:7 GST_DEBUG_FILE="/run/latency.txt" gst-launch-1.0 videotestsrc ! v4l2h264enc ! fakesink sync=true -v

Note

The per-frame instantaneous latency is printed as “time=(guint64)<latency_in_ns>” in /run/latency.txt.

Measuring Per Element latency:

This is useful when the pipeline has multiple elements after the source element and you only want to measure the latency impact of a particular element. The example below shows how to measure encoder and decoder latencies in the streaming pipeline described above.

#Measuring encoder latency in server pipeline
target # GST_TRACERS="latency(flags=pipeline+element)" GST_DEBUG=GST_TRACER:7 GST_DEBUG_FILE="/run/latency_server.txt" gst-launch-1.0 v4l2src io-mode=dmabuf device=/dev/video2 ! video/x-bayer,width=1920,height=1080,format=bggr ! tiovxisp sensor-name=SENSOR_SONY_IMX219_RPI dcc-isp-file=/opt/imaging/imx219/dcc_viss.bin sink_0::dcc-2a-file=/opt/imaging/imx219/dcc_2a.bin sink_0::device=/dev/v4l-subdev2 ! video/x-raw,format=NV12 ! v4l2h264enc output-io-mode=dmabuf-import extra-controls="controls,h264_i_frame_period=60" ! rtph264pay ! udpsink port=5000 host=<ip_address>

#Instantaneous encoder latency (ns)
target # grep v4l2h264enc /run/latency_server.txt
         GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)8493225, ts=(guint64)927133155
         GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)5777835, ts=(guint64)957085270
         GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)6741725, ts=(guint64)992160910;

Note

The per-frame instantaneous latency of the video encoder can be found by searching for the element name, i.e. v4l2h264enc0; it is printed as “time=(guint64)<latency_in_ns>”, as shown above.

#Average encoder latency (ns)
target # cat /run/latency_server.txt | grep v4l2h264enc | awk -F"guint64)" '{print $2}' | awk -F"," '{total +=$1; count++} END { print total/count }'
8.30307e+06

Note

The average latency of the video encoder (in nanoseconds) can be found by taking the average of the instantaneous latencies of each frame, as shown above.
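
The grep/awk averaging above can also be done in a short script, which makes it easier to filter by element and to convert units. The following is a sketch under the assumption that the tracer lines have the element-latency layout shown above; the function name element_latencies_ns is hypothetical, and the sample lines are the three encoder entries from the log excerpt.

```python
import re
from statistics import mean

# Matches the element name and the latency value of an element-latency record.
ELEMENT_LATENCY = re.compile(
    r"element-latency,.*?element=\(string\)(?P<element>[^,]+),"
    r".*?time=\(guint64\)(?P<ns>\d+)"
)

SAMPLE = [
    "GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)8493225, ts=(guint64)927133155",
    "GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)5777835, ts=(guint64)957085270",
    "GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)6741725, ts=(guint64)992160910;",
]


def element_latencies_ns(lines, element_prefix):
    """Collect per-frame latencies (ns) for elements whose name starts with element_prefix."""
    values = []
    for line in lines:
        match = ELEMENT_LATENCY.search(line)
        if match and match.group("element").startswith(element_prefix):
            values.append(int(match.group("ns")))
    return values


if __name__ == "__main__":
    latencies = element_latencies_ns(SAMPLE, "v4l2h264enc")
    print(f"frames: {len(latencies)}, average: {mean(latencies) / 1e6:.3f} ms")
```

To process a real trace, pass an open file such as open("/run/latency_server.txt") instead of SAMPLE.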

#Measuring decoder latency in client pipeline
target # GST_TRACERS="latency(flags=pipeline+element)" GST_DEBUG=GST_TRACER:7 GST_DEBUG_FILE="/run/latency_client.txt" gst-launch-1.0 -v udpsrc port=5000 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtpjitterbuffer latency=50 ! rtph264depay ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! queue ! fpsdisplaysink text-overlay=false name=fpssink video-sink="kmssink driver-name=tidss sync=true show-preroll-frame=false" sync=true -v > /run/client.txt 2>&1&

#Instantaneous decoder latency (ns)
target # grep v4l2h264dec /run/latency_client.txt
         GST_TRACER :0:: element-latency, element-id=(string)0x3c290540, element=(string)v4l2h264dec0, src=(string)src, time=(guint64)72057650, ts=(guint64)5330984535;
         GST_TRACER :0:: element-latency, element-id=(string)0x3c290540, element=(string)v4l2h264dec0, src=(string)src, time=(guint64)72092165, ts=(guint64)5396039490;
         ...

#Average decoder latency (ns)
target # cat /run/latency_client.txt | grep v4l2h264dec | awk -F"guint64)" '{print $2}' | awk -F"," '{total +=$1; count++} END { print total/count }'
7.70918e+07

Summary of the average latencies measured above:

Codec (H.264)   Resolution   Latency
Encoder         1080p        8.3 ms
Decoder         1080p        77 ms

3.7.9. Performance

The maximum throughput of the encoder and decoder elements can be measured using the fpsdisplaysink element, as shown below:

Example:

Encoder framerate :
     target # gst-launch-1.0 filesrc location=/<path_to_file> ! rawvideoparse width=1920 height=1080 format=i420 framerate=30/1 ! v4l2h264enc ! fpsdisplaysink name=fpssink text-overlay=false video-sink="fakesink sync=false" sync=false -v

Decoder framerate :
     target # gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! fpsdisplaysink name=fpssink text-overlay=false video-sink="fakevideosink sync=false" sync=false -v

Note

The frames per second achieved by the pipeline are shown in the console log, as seen below:

/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink/GstFakeVideoSink:fakevideosink0/GstFakeSink:sink: sync = false
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 102, dropped: 0, current: 202.05, average: 202.05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 203, dropped: 0, current: 200.04, average: 201.04
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 303, dropped: 0, current: 199.99, average: 200.69
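
When the console output is captured to a file, the frame rate figures can be extracted from the last-message lines automatically. The sketch below assumes the "rendered / dropped / current / average" layout shown in the log above; parse_fps_messages is a hypothetical helper, and SAMPLE_LOG repeats the three lines from the example.

```python
import re

# Layout of the fpsdisplaysink last-message text as shown in the log above.
FPS_MESSAGE = re.compile(
    r"rendered: (\d+), dropped: (\d+), current: ([\d.]+), average: ([\d.]+)"
)

SAMPLE_LOG = """\
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 102, dropped: 0, current: 202.05, average: 202.05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 203, dropped: 0, current: 200.04, average: 201.04
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 303, dropped: 0, current: 199.99, average: 200.69
"""


def parse_fps_messages(log_text):
    """Return a list of (rendered, dropped, current_fps, average_fps) tuples."""
    return [
        (int(rendered), int(dropped), float(current), float(average))
        for rendered, dropped, current, average in FPS_MESSAGE.findall(log_text)
    ]


if __name__ == "__main__":
    samples = parse_fps_messages(SAMPLE_LOG)
    rendered, dropped, _, average = samples[-1]
    print(f"rendered={rendered} dropped={dropped} average={average} fps")
```

The last sample carries the running average over the whole run, so it is usually the figure to report.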

3.7.10. DMA Buffer Import/Export

Buffer import on the encoder can be tested by setting output-io-mode to ‘5’ or ‘dmabuf-import’. Both forms are shown below.

gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2h264enc output-io-mode=5 ! filesink location=./output.264
gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2h264enc output-io-mode=dmabuf-import ! filesink location=./output.264

Note

DMA buffer import is currently supported only on the encoder.

Buffer export on the decoder can be tested by setting capture-io-mode to ‘4’ or ‘dmabuf’. An example is shown below.

gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! kmssink driver-name="tidss" -v
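
The numeric and named io-mode values used above are interchangeable. The sketch below maps between them and builds the transcode pipeline description from the import example. The full table of values (auto=0, rw=1, mmap=2, userptr=3, dmabuf=4, dmabuf-import=5) is an assumption based on the io-mode enumeration of the GStreamer V4L2 elements; only 4 and 5 are confirmed by this section. The helper names are hypothetical.

```python
# Assumed io-mode enumeration; only dmabuf (4) and dmabuf-import (5)
# are confirmed by the examples in this section.
IO_MODES = {"auto": 0, "rw": 1, "mmap": 2, "userptr": 3, "dmabuf": 4, "dmabuf-import": 5}


def normalize_io_mode(value):
    """Accept a name ('dmabuf') or a number (4 or '4') and return the name."""
    if isinstance(value, str) and value in IO_MODES:
        return value
    by_number = {number: name for name, number in IO_MODES.items()}
    return by_number[int(value)]


def transcode_description(src, dst, capture_mode="dmabuf", output_mode="dmabuf-import"):
    """Build the H.264 decode -> encode pipeline description used above."""
    return (
        f"filesrc location={src} ! h264parse ! "
        f"v4l2h264dec capture-io-mode={normalize_io_mode(capture_mode)} ! "
        f"v4l2h264enc output-io-mode={normalize_io_mode(output_mode)} ! "
        f"filesink location={dst}"
    )


if __name__ == "__main__":
    # Numeric modes are normalized to their names before printing.
    print("gst-launch-1.0", transcode_description("./sample_file.264", "./output.264", 4, 5))
```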

Note

Known Limitations:

  1. The full set of encoder configurations is not currently exposed through the V4L2 interface. See the compliance data for what is available and what is not.
  2. The current driver supports 8-channel 1080p encode but only 7-channel 1080p decode, because the memory required for 8-channel decode exceeds the available memory. Memory optimization is in progress and is expected to save around 20 percent of the current memory requirement.

3.7.11. Configuration of CMA Size

The CMA size can be increased or decreased depending on the requirement and on the memory map usage of other components.

The configuration option that specifies the CMA size is CONFIG_CMA_SIZE_MBYTES, set in the file arch/arm64/configs/tisdk_am62axx-evm_defconfig in the linux directory of the SDK. The default value is 512 MB. The value can be increased according to the space available in the DDR memory map.

To change the CMA size without recompiling, stop at the U-Boot prompt during boot, update the cma argument as shown below, and then boot:

target# setenv args_all $args_all cma=1000M
target# boot
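
After booting, the effective CMA size can be confirmed from the CmaTotal and CmaFree fields of /proc/meminfo. The sketch below parses those two fields; the function name parse_cma_kb is hypothetical, and the sample values are made up for illustration (524288 kB corresponds to the 512 MB default).

```python
import os

# Made-up sample in /proc/meminfo format, for illustration only.
SAMPLE_MEMINFO = """\
MemTotal:        3884436 kB
CmaTotal:         524288 kB
CmaFree:          401234 kB
"""


def parse_cma_kb(meminfo_text):
    """Return {'CmaTotal': kB, 'CmaFree': kB} for the fields that are present."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("CmaTotal", "CmaFree"):
            fields[name] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    # Use the live file on a Linux target, otherwise fall back to the sample.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            text = handle.read()
    else:
        text = SAMPLE_MEMINFO
    for name, kb in parse_cma_kb(text).items():
        print(f"{name}: {kb / 1024:.0f} MB")
```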