3.7. Multimedia Video Codec
3.7.1. Introduction
The video encoder/decoder (VENC/VDEC) on the Texas Instruments AM68A SoC is a stateful, combined H.264 and H.265 encoder/decoder.
- Hardware capabilities:
Maximum resolution: 8192x8192. The hardware can handle this resolution, but not necessarily in real time.
Minimum resolution: 256x128
- Constraints :
The picture width shall be a multiple of 8.
The picture height shall be a multiple of 8.
- Multiple concurrent encode/decode streams :
The number of concurrent streams depends on resolution and frame rate.
- Encoder :
Capable of encoding H.265 Main and Main Still Picture Profile @ L5.1 High tier.
Capable of encoding H.264 Baseline/Constrained Baseline/Main/High Profiles @ L5.2.
- Decoder :
Capable of decoding H.265 Main and Main Still Picture Profile @ L5.1 High tier.
Capable of decoding H.264 Baseline/Constrained Baseline/Main/High Profiles @ L5.2.
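The resolution limits and alignment constraints listed above can be checked before a stream is configured. The following is a minimal sketch in Python; the function names `check_resolution` and `align_up` are hypothetical helpers for illustration and are not part of the driver or the SDK.

```python
# Limits taken from the hardware capabilities listed above.
MIN_WIDTH, MIN_HEIGHT = 256, 128
MAX_WIDTH, MAX_HEIGHT = 8192, 8192
ALIGNMENT = 8  # width and height shall be a multiple of 8


def align_up(value, alignment=ALIGNMENT):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def check_resolution(width, height):
    """Return a list of constraint violations (empty if the size is acceptable)."""
    problems = []
    if not MIN_WIDTH <= width <= MAX_WIDTH:
        problems.append(f"width {width} outside {MIN_WIDTH}..{MAX_WIDTH}")
    if not MIN_HEIGHT <= height <= MAX_HEIGHT:
        problems.append(f"height {height} outside {MIN_HEIGHT}..{MAX_HEIGHT}")
    if width % ALIGNMENT:
        problems.append(f"width {width} is not a multiple of {ALIGNMENT}")
    if height % ALIGNMENT:
        problems.append(f"height {height} is not a multiple of {ALIGNMENT}")
    return problems
```

For instance, `check_resolution(1920, 1080)` returns an empty list, while a height of 1084 is reported as misaligned and `align_up(1084)` suggests 1088 as the next aligned value.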
The V4L2 compliance report for the decoder, which shows the supported ioctls and the number of available controls, can be generated with the following command:
v4l2-compliance -d0
Driver Info:
Driver name : vpu-dec
Card type : vpu-dec
Bus info : platform:vpu-dec
Driver version : 5.14.0
Capabilities : 0x84204000
Video Memory-to-Memory Multiplanar
Streaming
Extended Pix Format
Device Capabilities
Device Caps : 0x04204000
Video Memory-to-Memory Multiplanar
Streaming
Extended Pix Format
Detected Stateful Decoder
Required ioctls:
test VIDIOC_QUERYCAP: OK
Allow for multiple opens:
test second /dev/video0 open: OK
test VIDIOC_QUERYCAP: OK
test VIDIOC_G/S_PRIORITY: OK
test for unlimited opens: OK
test invalid ioctls: OK
Debug ioctls:
test VIDIOC_DBG_G/S_REGISTER: OK (Not Supported)
test VIDIOC_LOG_STATUS: OK (Not Supported)
Input ioctls:
test VIDIOC_G/S_TUNER/ENUM_FREQ_BANDS: OK (Not Supported)
test VIDIOC_G/S_FREQUENCY: OK (Not Supported)
test VIDIOC_S_HW_FREQ_SEEK: OK (Not Supported)
test VIDIOC_ENUMAUDIO: OK (Not Supported)
test VIDIOC_G/S/ENUMINPUT: OK (Not Supported)
test VIDIOC_G/S_AUDIO: OK (Not Supported)
Inputs: 0 Audio Inputs: 0 Tuners: 0
Output ioctls:
test VIDIOC_G/S_MODULATOR: OK (Not Supported)
test VIDIOC_G/S_FREQUENCY: OK (Not Supported)
test VIDIOC_ENUMAUDOUT: OK (Not Supported)
test VIDIOC_G/S/ENUMOUTPUT: OK (Not Supported)
test VIDIOC_G/S_AUDOUT: OK (Not Supported)
Outputs: 0 Audio Outputs: 0 Modulators: 0
Input/Output configuration ioctls:
test VIDIOC_ENUM/G/S/QUERY_STD: OK (Not Supported)
test VIDIOC_ENUM/G/S/QUERY_DV_TIMINGS: OK (Not Supported)
test VIDIOC_DV_TIMINGS_CAP: OK (Not Supported)
test VIDIOC_G/S_EDID: OK (Not Supported)
Control ioctls:
test VIDIOC_QUERY_EXT_CTRL/QUERYMENU: OK
test VIDIOC_QUERYCTRL: OK
test VIDIOC_G/S_CTRL: OK
test VIDIOC_G/S/TRY_EXT_CTRLS: OK
test VIDIOC_(UN)SUBSCRIBE_EVENT/DQEVENT: OK
test VIDIOC_G/S_JPEGCOMP: OK (Not Supported)
Standard Controls: 2 Private Controls: 1
Format ioctls:
test VIDIOC_ENUM_FMT/FRAMESIZES/FRAMEINTERVALS: OK
test VIDIOC_G/S_PARM: OK (Not Supported)
test VIDIOC_G_FBUF: OK (Not Supported)
test VIDIOC_G_FMT: OK
test VIDIOC_TRY_FMT: OK
test VIDIOC_S_FMT: OK
test VIDIOC_G_SLICED_VBI_CAP: OK (Not Supported)
test Cropping: OK (Not Supported)
test Composing: OK
test Scaling: OK (Not Supported)
Codec ioctls:
test VIDIOC_(TRY_)ENCODER_CMD: OK (Not Supported)
test VIDIOC_G_ENC_INDEX: OK (Not Supported)
test VIDIOC_(TRY_)DECODER_CMD: OK
Buffer ioctls:
test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
test VIDIOC_EXPBUF: OK
test Requests: OK (Not Supported)
Similarly, the V4L2 compliance report for the encoder can be generated with the following command:
v4l2-compliance -d1
3.7.2. Software Architecture
3.7.2.1. Software Stack of Accelerated Codec Encoding/Decoding
As shown in the figures below, the software stack of the accelerated encoding and decoding has two parts:
A V4L2 (Video4Linux version 2) software driver running on Linux on the A72 MPU subsystem
The firmware running on the DECODER and ENCODER
The driver communicates with the firmware running on the ENCODER/DECODER through its own IPC (inter-processor communication).
For the DECODER, at the highest level in the MPU subsystem on the A72, there is a Linux user space application which is based on GStreamer. GStreamer is an open source framework that simplifies the development of multimedia applications. The GStreamer library loads and interfaces with the GStreamer plugin (V4L2 plugin), which handles all the details specific to the use of the hardware accelerator. Specifically, the GStreamer plugin interfaces with the V4L2 decoder kernel driver interface.
3.7.2.2. Linux Kernel Drivers
TI-Provided V4L2 Drivers for Multimedia
Video4Linux version 2 (V4L2) is an open source framework that provides a media interface to all Linux-based applications. V4L2 is a collection of device drivers and an API for supporting realtime video capture and video memory-to-memory operations on Linux systems.
Video encode and decode using the ENCODER and DECODER hardware, respectively, are enabled as V4L2 drivers. The V4L2 is integrated with the ENCODER and DECODER drivers by a thin layer that implements the V4L2 node ioctls and translates the V4L2 data structures to those understood by the ENCODER/DECODER.
3.7.2.3. GStreamer Plugins for Multimedia
Open Source GStreamer Overview
GStreamer is an open source framework that simplifies the development of multimedia applications, such as media players and capture encoders. It encapsulates existing multimedia software components, such as codecs, filters, and platform-specific I/O operations, by using a standard interface and providing a uniform framework across applications.
The modular nature of GStreamer facilitates the addition of new functionality, transparent inclusion of component advancements and allows for flexibility in application development and testing. Processing nodes are implemented via GStreamer plugins with several sink and/or source pads. Many plugins are running as ARM software implementations, but for more complex SoCs, certain functions are better executed on hardware-accelerated IPs like wave5 (DECODER and ENCODER).
GStreamer is a multimedia framework based on a data-flow paradigm. New plugins are registered simply by deploying shared objects to the /usr/lib/gstreamer-1.0 folder. The shared libraries in this folder are scanned for reserved data structures identifying the capabilities of individual plugins. Individual processing nodes can be interconnected as a pipeline at run-time, creating complex topologies. Node interfacing compatibility is verified at that time, before the pipeline is started.
GStreamer brings a lot of value-added features to Processor SDK Linux AM68A, including audio encoding/decoding, audio/video synchronization, and interaction with a wide variety of open source plugins (muxers, demuxers, codecs, and filters). New GStreamer features are continuously being added, and the core libraries are actively supported by participants in the GStreamer community. Additional information about the GStreamer framework is available on the GStreamer project site: http://gstreamer.freedesktop.org/.
Hardware-Accelerated GStreamer Plugins
One benefit of using GStreamer as a multimedia framework is that the core libraries already build and run on ARM Linux. Only a GStreamer plugin is required to enable additional hardware features on TI’s embedded processors with both ARM and hardware accelerators for multimedia. The open source GStreamer plugins provide elements for GStreamer pipelines that enable the use of hardware-accelerated video decoding through the V4L2 GStreamer plugin.
Below is a list of GStreamer plugins that utilize the hardware-accelerated video decoding/encoding in the AM68A.
ENCODER
v4l2h264enc
v4l2h265enc
DECODER
v4l2h264dec
v4l2h265dec
3.7.2.3.1. V4L2 Video Encoder/Decoder
The V4L2 encoder/decoder driver supports the following bitstream formats:
V4L2_PIX_FMT_H264
V4L2_PIX_FMT_HEVC
3.7.3. Encoder and Decoder Capabilities
The maximum capability of the encoder/decoder is a load equivalent to 4K at 60 fps.
A maximum of 32 instances is supported, counting encode and decode instances together.
For example, the 32 instances can be:
(16 Enc + 16 Dec) OR (32 Enc) OR (32 Dec).
(32 Enc + 32 Dec) is not possible.
Note
The number of instances is bound to the available CMA Memory.
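The instance limit and the 4K60 load limit can be combined into a quick feasibility check. The sketch below is illustrative only. It assumes that "4K60fps equivalent load" means a total pixel rate of 3840 x 2160 x 60 pixels per second shared by all encode and decode instances. That is an interpretation, not something stated by the driver. The helper name `fits_codec_budget` is hypothetical, and CMA availability is not modeled.

```python
MAX_INSTANCES = 32  # encode and decode instances combined

# Assumption: "4K60 equivalent load" is read as this total pixel rate.
MAX_PIXEL_RATE = 3840 * 2160 * 60


def fits_codec_budget(streams):
    """Check a list of (width, height, fps) streams against both limits.

    Returns True only if the number of streams does not exceed
    MAX_INSTANCES and the summed pixel rate does not exceed MAX_PIXEL_RATE.
    """
    streams = list(streams)
    total_rate = sum(width * height * fps for width, height, fps in streams)
    return len(streams) <= MAX_INSTANCES and total_rate <= MAX_PIXEL_RATE
```

Under this assumption, eight 1080p30 streams add up to exactly the 4K60 pixel rate, so `fits_codec_budget([(1920, 1080, 30)] * 8)` is true and a ninth stream exceeds the budget.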
The controls exposed by the encoder and decoder can be listed with the commands below.
Encoder: v4l2-ctl -d 1 -l
Decoder: v4l2-ctl -d 0 -l
3.7.4. GStreamer Pipelines
H.264 encode:
target # gst-launch-1.0 filesrc location=/<path_to_file> ! rawvideoparse width=1920 height=1080 format=i420 framerate=30/1 colorimetry=bt709 ! v4l2h264enc ! filesink location=/<path_to_file> sync=true
H.265 encode:
target # gst-launch-1.0 filesrc location=/<path_to_file> ! rawvideoparse width=1920 height=1080 format=i420 framerate=30/1 colorimetry=bt709 ! v4l2h265enc ! filesink location=/<path_to_file> sync=true
H.264 decode:
target # gst-launch-1.0 filesrc location=/<path_to_file> ! h264parse ! queue ! v4l2h264dec ! filesink location=/<path_to_file>
H.265 decode:
target # gst-launch-1.0 filesrc location=/<path_to_file> ! h265parse ! queue ! v4l2h265dec ! filesink location=/<path_to_file>
Video only file playback:
target $gst-launch-1.0 filesrc location=./bbb_1080p60_30s.h264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! kmssink driver-name=tidss -v
Audio/Video file playback (h264/aac muxed file as example)
target $gst-launch-1.0 filesrc location=bbb_1080p_aac.mp4 ! qtdemux name=demux demux.video_0 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! queue ! kmssink driver-name=tidss demux.audio_0 ! queue ! faad ! audioconvert ! audioresample ! audio/x-raw, channels=2, rate=48000 ! autoaudiosink
Transcode use-case (h264->h265 conversion as example)
target $gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2h265enc output-io-mode=5 ! filesink location=./output.265
Video Streaming use-case
Server (imx219 rawcamera->isp->encode->streamout) :
target $gst-launch-1.0 v4l2src device=/dev/video2 io-mode=dmabuf ! video/x-bayer,width=1920,height=1080, framerate=30/1, format=bggr ! tiovxisp sensor-name=SENSOR_SONY_IMX219_RPI dcc-isp-file=/opt/imaging/imx219/dcc_viss.bin sink_0::dcc-2a-file=/opt/imaging/imx219/dcc_2a.bin sink_0::device=/dev/v4l-subdev2 ! video/x-raw,format=NV12 ! v4l2h264enc output-io-mode=dmabuf-import extra-controls="controls,h264_i_frame_period=60" ! rtph264pay ! udpsink port=5000 host=<ip_address>
Client (streamin->decode->display) :
target $gst-launch-1.0 -v udpsrc port=5000 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtpjitterbuffer latency=50 ! rtph264depay ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! queue ! fpsdisplaysink text-overlay=false name=fpssink video-sink="kmssink driver-name=tidss sync=true show-preroll-frame=false" sync=true -v
Note
In encode test cases, “colorimetry” should be specified to avoid negotiation failures.
For example: gst-launch-1.0 filesrc location=sample_1072.yuv blocksize=3087360 ! rawvideoparse width=1920 height=1072 framerate=30/1 format=nv12 colorimetry=bt709 ! v4l2h264enc ! h264parse ! fakesink
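When the encode pipelines above are launched from a script, the argument list can be assembled programmatically so that colorimetry is never left out. The following Python sketch builds the same file-to-file encode pipeline shown earlier in this section. The function name `build_encode_args` and its parameters are hypothetical, and only elements and properties that already appear in the pipelines above are used.

```python
import shlex

# Encoder elements listed in this document, keyed by codec name.
ENCODERS = {"h264": "v4l2h264enc", "h265": "v4l2h265enc"}


def build_encode_args(src, dst, width, height, codec="h264",
                      pixel_format="i420", framerate=30, colorimetry="bt709"):
    """Return the gst-launch-1.0 argument list for a raw-file encode."""
    if codec not in ENCODERS:
        raise ValueError(f"unsupported codec: {codec}")
    return [
        "gst-launch-1.0",
        "filesrc", f"location={src}", "!",
        "rawvideoparse", f"width={width}", f"height={height}",
        f"format={pixel_format}", f"framerate={framerate}/1",
        f"colorimetry={colorimetry}", "!",  # always set, see the note above
        ENCODERS[codec], "!",
        "filesink", f"location={dst}", "sync=true",
    ]


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run() on the target.
    print(shlex.join(build_encode_args("input.yuv", "output.h264", 1920, 1080)))
```

Returning a list rather than a single string avoids shell quoting problems when the result is passed to `subprocess.run()`.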
3.7.5. Memory Requirement
The following figures were measured with vmstat for a single-channel 1080p stream at 30 fps.
Encoder
v4l2h264enc : 31.78 MB
v4l2h265enc : 31.90 MB
Decoder
v4l2h264dec : 51.47 MB
v4l2h265dec : 39.59 MB
Note
The actual memory footprint may vary depending on the input stream.
3.7.6. Performance metrics
The following table lists the theoretical latency of the IP for different resolutions.
Codec (H265/264)    Resolution    Latency
Encoder             4K            33.3 ms
Encoder             1080p         8.3 ms
Encoder             720p          3.7 ms
Encoder             480p          1.2 ms
3.7.7. Calculation of Performance metrics using native driver API
The firmware reports tick information, and the wave5 driver can print the cycle count for each frame. Refer to the source code below.
wave5_vpu_dec.c
static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
{
...
dev_dbg(inst->dev->dev, "frame_cycle %8d\n", dec_output_info.frame_cycle);
...
}
wave5_vpu_enc.c
static void wave5_vpu_enc_finish_encode(struct vpu_instance *inst)
{
...
dev_dbg(inst->dev->dev, "frame_cycle %8d\n", enc_output_info.frame_cycle);
...
}
Dividing the cycle count by the clock frequency in Hz gives the time in seconds; multiplying by 1000 gives the value in milliseconds. For example,
Test environment : CPU 400MHz
#1 frame_cycle 489472 => 489472 / 400000000 = 0.00122368 second = 1.22368 millisecond
#2 frame_cycle 442368 => 442368 / 400000000 = 0.00110592 second = 1.10592 millisecond
#3 frame_cycle 429824 => 429824 / 400000000 = 0.00107456 second = 1.07456 millisecond
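The conversion can be scripted when many frames are logged. Note that cycles divided by a frequency in Hz yields seconds, so a factor of 1000 is needed to obtain milliseconds. The Python sketch below extracts the `frame_cycle` values printed by the `dev_dbg` statements above from a kernel log and converts them. The function names are hypothetical, and the 400 MHz default simply mirrors the test environment of this example; the actual clock should be confirmed for the board in use.

```python
import re

# Matches the driver's "frame_cycle %8d" debug output.
FRAME_CYCLE = re.compile(r"frame_cycle\s+(\d+)")


def cycles_to_ms(frame_cycle, clock_hz=400_000_000):
    """Convert a per-frame cycle count to milliseconds."""
    return frame_cycle / clock_hz * 1000.0


def frame_times_ms(log_text, clock_hz=400_000_000):
    """Return the per-frame time in ms for every frame_cycle entry in a log."""
    return [cycles_to_ms(int(c), clock_hz) for c in FRAME_CYCLE.findall(log_text)]
```

With the values from the example, `cycles_to_ms(489472)` evaluates to about 1.22 ms.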
3.7.8. Calculation of Performance metrics using gstreamer
3.7.9. Latency
The instantaneous pipeline and encoder latency can be calculated using the GStreamer latency tracer, which reports latency in nanoseconds, as described in the GStreamer latency tracer documentation.
Example:
Measuring Pipeline latency: This is to measure total pipeline latency.
target # GST_TRACERS="latency" GST_DEBUG=GST_TRACER:7 GST_DEBUG_FILE="/run/latency.txt" gst-launch-1.0 videotestsrc ! v4l2h264enc ! fakesink sync=true -v
Note
The per-frame instantaneous latency is printed as “time=(guint64)<latency_in_ns>” in latency.txt.
Measuring Per Element latency:
This is useful when the pipeline has multiple elements after the source element and only the latency impact of a particular element is of interest. The example below shows how to measure encoder and decoder latencies in the streaming pipeline described above.
#Measuring encoder latency in server pipeline
target # GST_TRACERS="latency(flags=pipeline+element)" GST_DEBUG=GST_TRACER:7 GST_DEBUG_FILE="/run/latency_server.txt" gst-launch-1.0 v4l2src io-mode=dmabuf device=/dev/video2 ! video/x-bayer,width=1920,height=1080,format=bggr ! tiovxisp sensor-name=SENSOR_SONY_IMX219_RPI dcc-isp-file=/opt/imaging/imx219/dcc_viss.bin sink_0::dcc-2a-file=/opt/imaging/imx219/dcc_2a.bin sink_0::device=/dev/v4l-subdev2 ! video/x-raw,format=NV12 ! v4l2h264enc output-io-mode=dmabuf-import extra-controls="controls,h264_i_frame_period=60" ! rtph264pay ! udpsink port=5000 host=<ip_address>
#Instantaneous encoder latency (ns)
target # grep v4l2h264enc /run/latency_server.txt
GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)8493225, ts=(guint64)927133155
GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)5777835, ts=(guint64)957085270
GST_TRACER :0:: element-latency, element-id=(string)0x901c90, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)6741725, ts=(guint64)992160910;
Note
The per-frame instantaneous latency of the video encoder can be found by searching for the element name, i.e. v4l2h264enc0. It is printed as “time=(guint64)<latency_in_ns>”, as shown above.
#Average encoder latency (ns)
target # cat /run/latency_server.txt | grep v4l2h264enc | awk -F"guint64)" '{print $2}' | awk -F"," '{total +=$1; count++} END { print total/count }'
target #8.30307e+06
Note
The average latency of video encoder (in nanoseconds) can be found by taking the average of instantaneous latencies for each frame as shown above.
#Measuring decoder latency in client pipeline
target # GST_TRACERS="latency(flags=pipeline+element)" GST_DEBUG=GST_TRACER:7 GST_DEBUG_FILE="/run/latency_client.txt" gst-launch-1.0 -v udpsrc port=5000 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtpjitterbuffer latency=50 ! rtph264depay ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! queue ! fpsdisplaysink text-overlay=false name=fpssink video-sink="kmssink driver-name=tidss sync=true show-preroll-frame=false" sync=true -v > /run/client.txt 2>&1&
#Instantaneous decoder latency (ns)
target # grep v4l2h264dec /run/latency_client.txt
GST_TRACER :0:: element-latency, element-id=(string)0x3c290540, element=(string)v4l2h264dec0, src=(string)src, time=(guint64)72057650, ts=(guint64)5330984535;
GST_TRACER :0:: element-latency, element-id=(string)0x3c290540, element=(string)v4l2h264dec0, src=(string)src, time=(guint64)72092165, ts=(guint64)5396039490;
...
#Average decoder latency (ns)
target # cat /run/latency_client.txt | grep v4l2h264dec | awk -F"guint64)" '{print $2}' | awk -F"," '{total +=$1; count++} END { print total/count }'
target # 7.70918e+07
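The same averaging can be done for every element in one pass instead of running grep and awk per element. The Python sketch below parses `element-latency` lines in the format shown above and reports the mean latency per element in milliseconds. The function name `average_element_latency_ms` is hypothetical, and the regular expression assumes the tracer line layout matches the samples in this section.

```python
import re
from collections import defaultdict

# Captures the element name and the latency in ns from an element-latency line.
ELEMENT_LATENCY = re.compile(
    r"element-latency,.*?element=\(string\)(?P<element>[^,]+),"
    r".*?time=\(guint64\)(?P<ns>\d+)"
)


def average_element_latency_ms(lines):
    """Return {element_name: mean latency in ms} for all element-latency lines."""
    totals = defaultdict(lambda: [0, 0])  # element -> [sum_ns, frame_count]
    for line in lines:
        match = ELEMENT_LATENCY.search(line)
        if match:
            entry = totals[match["element"]]
            entry[0] += int(match["ns"])
            entry[1] += 1
    return {name: total / count / 1e6 for name, (total, count) in totals.items()}
```

On the target, the function can be fed an open file directly, for example `average_element_latency_ms(open("/run/latency_server.txt"))`.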
3.7.10. Performance
The maximum throughput of the encoder and decoder elements can be measured using the fpsdisplaysink element, as shown below:
Example:
Encoder framerate :
target # gst-launch-1.0 filesrc location=/<path_to_file> ! rawvideoparse width=1920 height=1080 format=i420 framerate=30/1 colorimetry=bt709 ! v4l2h264enc ! fpsdisplaysink text-overlay=false name=fpssink video-sink="fakesink sync=false" sync=false -v
Decoder framerate :
target # gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! fpsdisplaysink name=fpssink text-overlay=false video-sink="fakevideosink sync=false" sync=false -v
Note
The frames per second achieved by the pipeline are shown in the console log, as seen below:
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink/GstFakeVideoSink:fakevideosink0/GstFakeSink:sink: sync = false
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 102, dropped: 0, current: 202.05, average: 202.05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 203, dropped: 0, current: 200.04, average: 201.04
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink: last-message = rendered: 303, dropped: 0, current: 199.99, average: 200.69
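For automated runs, the final frame-rate figures can be pulled out of the captured console log. The Python sketch below is one way to do this, assuming the `last-message` format shown above. The function name `final_fps_report` is hypothetical.

```python
import re

# Matches the fpsdisplaysink "last-message" property notifications.
FPS_MESSAGE = re.compile(
    r"last-message = rendered: (\d+), dropped: (\d+), "
    r"current: ([\d.]+), average: ([\d.]+)"
)


def final_fps_report(log_text):
    """Return the last fpsdisplaysink report in the log, or None if absent."""
    matches = FPS_MESSAGE.findall(log_text)
    if not matches:
        return None
    rendered, dropped, current, average = matches[-1]
    return {
        "rendered": int(rendered),
        "dropped": int(dropped),
        "current_fps": float(current),
        "average_fps": float(average),
    }
```

The last report is used because the running average stabilizes as more frames are rendered.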
3.7.11. DMA Buffer Import/Export
Buffer import on the encoder can be tested by setting output-io-mode to ‘5’ or ‘dmabuf-import’, as in the examples below.
gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2h264enc output-io-mode=5 ! filesink location=./output.264
gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2h264enc output-io-mode=dmabuf-import ! filesink location=./output.264
Buffer export on the decoder can be tested by setting capture-io-mode to ‘4’ or ‘dmabuf’, as in the example below.
gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! kmssink driver-name="tidss" -v
Buffer import on the decoder can be tested by setting capture-io-mode to ‘5’ or ‘dmabuf-import’, as in the example below.
gst-launch-1.0 filesrc location=./sample_file.264 ! h264parse ! v4l2h264dec capture-io-mode=5 ! kmssink driver-name="tidss" -v
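Because the io-mode properties accept either a number or a name, scripts that generate pipelines may want to normalize the value. The Python sketch below maps between the two forms. The values 4 (dmabuf) and 5 (dmabuf-import) come from the text above, while the remaining entries follow the io-mode enumeration of the GStreamer V4L2 elements as commonly documented and should be confirmed with `gst-inspect-1.0 v4l2h264dec` on the target. The helper name `io_mode_name` is hypothetical.

```python
# Numeric value for each io-mode name of the GStreamer V4L2 elements.
IO_MODES = {
    "auto": 0,
    "rw": 1,
    "mmap": 2,
    "userptr": 3,
    "dmabuf": 4,         # buffer export
    "dmabuf-import": 5,  # buffer import
}
IO_MODE_NAMES = {number: name for name, number in IO_MODES.items()}


def io_mode_name(value):
    """Normalize an io-mode given as a number, digit string, or name to its name."""
    text = str(value).strip()
    if text.isdigit():
        number = int(text)
        if number not in IO_MODE_NAMES:
            raise ValueError(f"unknown io-mode number: {number}")
        return IO_MODE_NAMES[number]
    if text not in IO_MODES:
        raise ValueError(f"unknown io-mode name: {text}")
    return text
```

Using the name form in generated pipelines, for example `capture-io-mode=dmabuf`, tends to be easier to read than the numeric form.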
Note
Known Limitations:
The full set of encoder configurations is not currently exposed through the V4L2 interface. See the compliance data for what is available and what is not.
The current driver supports 8-channel 1080p encode and 8-channel 1080p decode, owing to the default CMA memory configuration.
3.7.12. Configuration of CMA Size
The CMA size can be increased or decreased depending on the requirement and the memory map usage by other components.
The kernel configuration option that specifies the CMA size is CONFIG_CMA_SIZE_MBYTES, set in the file arch/arm64/configs/tisdk_j721s2-evm_defconfig in the linux directory of the SDK. The default value is 896 MB.
The value can be increased according to the available space in the DDR memory map. To change the CMA size without recompiling, stop at the U-Boot prompt during boot, update the cma argument as shown below, and then boot:
target# setenv args_all $args_all cma=1000M
target# boot
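After booting with a new CMA size, the effective value can be verified from Linux. On kernels built with CMA support, /proc/meminfo reports `CmaTotal` and `CmaFree` in kB. The Python sketch below parses those two fields. The function name `cma_usage_mb` is hypothetical.

```python
def cma_usage_mb(meminfo_text):
    """Return the CmaTotal and CmaFree fields of /proc/meminfo in MB.

    Fields that are missing (for example on a kernel without CMA)
    are simply absent from the returned dictionary.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key in ("CmaTotal", "CmaFree") and parts:
            fields[key] = int(parts[0]) / 1024  # kB to MB
    return fields


if __name__ == "__main__":
    # On the target: read the live values.
    with open("/proc/meminfo") as meminfo:
        print(cma_usage_mb(meminfo.read()))
```

With the default configuration described above, `CmaTotal` would be expected to read 896 MB; watching `CmaFree` while streams are running gives a rough view of how much headroom remains for additional instances.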