Voice over BLE

There is no standard way of transmitting voice over BLE so a custom profile must be used. TI’s custom profile utilizes the GATT layer of the BLE5-Stack to transmit voice frames. This is known as a Voice over GATT Profile approach (VoGP).

TI Voice Profile (VoGP)

The audio data is transmitted using a proprietary service with UUID F000B000-0451-4000-B000-000000000000. This service is composed of the following 2 characteristics:

Note

The characteristics below use the 128-bit TI base UUID of the format F000XXXX-0451-4000-B000-000000000000, where XXXX is their shortened 16-bit UUID. For brevity, this document refers to the characteristics by their 16-bit short UUID.
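As a minimal sketch of the expansion described in the note, the function below builds the full 128-bit UUID string from a 16-bit short UUID. The helper name ti_uuid_to_string and the UUID_STR_LEN macro are hypothetical and are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Length of a canonical UUID string, excluding the terminating NUL. */
#define UUID_STR_LEN 36

/* Hypothetical helper (not an SDK function): writes the TI base UUID
 * F000XXXX-0451-4000-B000-000000000000 with XXXX replaced by short_uuid. */
void ti_uuid_to_string(uint16_t short_uuid, char out[UUID_STR_LEN + 1])
{
    int n = snprintf(out, UUID_STR_LEN + 1,
                     "F000%04X-0451-4000-B000-000000000000",
                     (unsigned)short_uuid);

    /* The format always produces exactly 36 characters. */
    assert(n == UUID_STR_LEN);
    (void)n;
}
```

Passing 0xB000 yields the service UUID given above, and 0xB001 or 0xB002 yields the full UUIDs of the two characteristics.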

Name | UUID | Description | GATT Properties
AUDIOPROFILE_START | 0xB001 | The start characteristic is used to transmit a start command before the streaming starts and a stop command as the last packet of a stream. | GATT_PROP_READ, GATT_PROP_NOTIFY
AUDIOPROFILE_AUDIO | 0xB002 | AUDIOPROFILE_AUDIO is used as the audio stream characteristic; all audio frames will be transmitted using this characteristic. | GATT_PROP_READ, GATT_PROP_NOTIFY

GATT_Notification() was selected as the primary vehicle for transmitting voice data over BLE in the voice profile implementation.

Notifications were selected because they have low packet overhead and are asynchronous in nature. These qualities make notifications ideal for voice streaming applications. Before the voice stream begins, the peer device must enable notifications by writing 01:00 to the CCCD of both AUDIOPROFILE_START and AUDIOPROFILE_AUDIO. If notifications are not enabled, the remote will not stream voice data.
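The value 01:00 is the little-endian encoding of 0x0001, where bit 0 of the CCCD enables notifications. The sketch below shows one way to decode the two bytes and check that both characteristics are enabled before streaming. The function names and the CCCD_NOTIFY_BIT macro are illustrative assumptions, not SDK identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of the CCCD value enables notifications. */
#define CCCD_NOTIFY_BIT 0x0001u

/* Decode the 2-byte CCCD value as written over the air (little-endian). */
uint16_t cccd_decode(const uint8_t value[2])
{
    assert(value != NULL);
    return (uint16_t)(value[0] | ((uint16_t)value[1] << 8));
}

/* Hypothetical check: streaming is only allowed when notifications are
 * enabled on both AUDIOPROFILE_START and AUDIOPROFILE_AUDIO. */
bool voice_stream_allowed(const uint8_t start_cccd[2],
                          const uint8_t audio_cccd[2])
{
    return (cccd_decode(start_cccd) & CCCD_NOTIFY_BIT) != 0u &&
           (cccd_decode(audio_cccd) & CCCD_NOTIFY_BIT) != 0u;
}
```

Calling voice_stream_allowed() with 01:00 for both characteristics returns true; if either CCCD still holds 00:00, it returns false.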

The basic flow of a voice transmission is:

  1. CC2640R2F sends a start command (0x04) notification (if enabled) on the AUDIOPROFILE_START characteristic.
  2. CC2640R2F starts streaming voice data. See Sequence diagram for Voice Transmission for more details.
  3. CC2640R2F sends a stop command (0x00) notification on the AUDIOPROFILE_START characteristic.

See the figure below for an illustration of voice transmission over BLE. See BLE Voice Frame Data for more information about the contents of the BLE voice frame.

@startuml
Receiver <- Transmitter: Advertisements
Receiver -> Transmitter: Connect Req
Receiver <-> Transmitter: Voice Service Discovery

Receiver -> Transmitter: Enable Notifications on AUDIOPROFILE_START Char
Receiver -> Transmitter: Enable Notifications on AUDIOPROFILE_AUDIO Char

...Wait until transmitter begins streaming...


Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_START - start command (0x04)

group Repeat For Each frame in voice stream


Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_AUDIO -  Metadata + voice data
Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_AUDIO -  voice data
Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_AUDIO -  voice data
Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_AUDIO -  voice data
Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_AUDIO -  voice data

end

Receiver <- Transmitter: GATT_Notification - AUDIOPROFILE_START - stop command (0x00)


@enduml

Figure 96. Sequence diagram for Voice Transmission
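The receiver side of this sequence can be sketched as a small state machine. It is only an illustration under stated assumptions: the type voice_rx_t, the function voice_rx_on_notification, and the macros are hypothetical, and the 20-byte notification and 100-byte frame sizes are the defaults described in BLE Voice Frame Data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_UUID_START 0xB001u  /* AUDIOPROFILE_START */
#define SHORT_UUID_AUDIO 0xB002u  /* AUDIOPROFILE_AUDIO */
#define CMD_START        0x04u
#define CMD_STOP         0x00u
#define NOTIF_LEN        20u      /* default bytes per notification */
#define FRAME_LEN        100u     /* default frame length, incl. metadata */

/* Hypothetical receiver state. */
typedef struct {
    bool     streaming;           /* true between start and stop commands */
    uint8_t  frame[FRAME_LEN];    /* frame being reassembled */
    uint32_t filled;              /* bytes of the current frame received */
    uint32_t frames_done;         /* complete frames received so far */
} voice_rx_t;

/* Feed one received notification; returns true when a frame completes. */
bool voice_rx_on_notification(voice_rx_t *rx, uint16_t short_uuid,
                              const uint8_t *data, uint32_t len)
{
    assert(rx != NULL && data != NULL);

    if (short_uuid == SHORT_UUID_START && len >= 1u) {
        if (data[0] == CMD_START) {
            rx->streaming = true;
            rx->filled = 0;
        } else if (data[0] == CMD_STOP) {
            rx->streaming = false;
            rx->filled = 0;       /* discard any partial frame */
        }
        return false;
    }

    /* Ignore audio outside a stream and notifications of unexpected size. */
    if (short_uuid != SHORT_UUID_AUDIO || !rx->streaming || len != NOTIF_LEN) {
        return false;
    }

    memcpy(&rx->frame[rx->filled], data, NOTIF_LEN);
    rx->filled += NOTIF_LEN;
    if (rx->filled < FRAME_LEN) {
        return false;
    }

    rx->filled = 0;
    rx->frames_done++;
    return true;                  /* rx->frame now holds one full frame */
}
```

When the function returns true, the first notification's bytes at the start of rx->frame carry the metadata, followed by the voice data.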

BLE Voice Frame Data

By default, the voice profile sends 20 bytes of application data per notification. It is therefore ideal to choose a PDM driver frame length that is a multiple of 20 bytes.

Recall from PDM Driver Metadata that each frame should also contain 4 bytes of metadata. There is a compromise between frame duration and overhead; the optimum was found at a total frame length of 100 bytes, including the 4 bytes of metadata.

The numbered headers in the voice frame below are the metadata fields provided by the PDM driver. See PDM Driver Metadata for an explanation of the metadata fields.

../_images/Audio_packetformat.jpg

Figure 97. One audio packet.

../_images/aafig-b8f0328669d9938be8e2ead0794d3deef9ddca85.png

When transmitted over the air, the audio frames are fragmented into 20-byte notifications, so each 100-byte audio frame is sent as 5 notifications:

../_images/Audio_packedIn5Notification.jpg

Figure 98. One audio frame sent over the air as 5 notifications.
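The fragmentation can be sketched as a loop that hands 20-byte slices of a frame to a transport hook. This is an illustration only: notify_fn, voice_send_frame, and log_notification are hypothetical names, and the hook stands in for whatever actually queues the notification in the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOTIF_PAYLOAD_LEN 20u  /* default application bytes per notification */

/* Hypothetical transport hook: returns true if the payload was queued. */
typedef bool (*notify_fn)(const uint8_t *payload, size_t len, void *ctx);

/* Split one frame into notification-sized chunks; returns how many
 * chunks were queued successfully. */
size_t voice_send_frame(const uint8_t *frame, size_t frame_len,
                        notify_fn notify, void *ctx)
{
    size_t sent = 0;

    assert(frame != NULL && notify != NULL);
    for (size_t off = 0; off < frame_len; off += NOTIF_PAYLOAD_LEN) {
        size_t remaining = frame_len - off;
        size_t n = remaining < NOTIF_PAYLOAD_LEN ? remaining
                                                 : NOTIF_PAYLOAD_LEN;
        if (!notify(&frame[off], n, ctx)) {
            break;  /* stop at the first failure so the caller can retry */
        }
        sent++;
    }
    return sent;
}

/* Example hook for illustration: records what would have been sent. */
typedef struct {
    size_t count;  /* notifications queued */
    size_t bytes;  /* total payload bytes queued */
} notify_log_t;

bool log_notification(const uint8_t *payload, size_t len, void *ctx)
{
    notify_log_t *log = ctx;

    (void)payload;
    log->count++;
    log->bytes += len;
    return true;
}
```

With a 100-byte frame and the logging hook, voice_send_frame() reports 5 notifications carrying 100 bytes in total, matching the figure above.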

Modifying the Latency

The built-in flow control of the Bluetooth low energy protocol is used to ensure delivery of full audio frames during streaming.

Since the header of each audio frame contains the information required to decode that frame independently, the safest way to discard data, for example in a noisy environment, is to discard the full frame.

The PDM driver will drop full audio frames when there are no available buffers, and the application will handle one frame at a time until it has been successfully queued up in the TX FIFO within the BLE5-Stack.

As mentioned above, the application must service a PDM buffer every 2 ms. If the application requires a longer contiguous chunk of processing time, or a marginal RF environment is causing many retry events, the number of PDM buffers can be adjusted by modifying MINIMUM_PDM_BUFFER_QUEUE_DEPTH.

Each increment of MINIMUM_PDM_BUFFER_QUEUE_DEPTH adds 2 ms of latency, which gives the application more time to process. The cost of the increased latency is increased RAM usage.

The user application should be profiled to find the optimal tradeoff between the expected RF conditions, RAM usage, and latency.

Throughput Requirement for BLE

The general throughput required for sending audio frames is covered in Throughput Requirement. This section covers the required throughput when BLE-specific headers are taken into consideration.

Data | Calculation | Rate
L2CAP and ATT header | (7 * 5) B / 12 ms | 23.33 kbps
Complete packets overhead | (21 * 5) B / 12 ms | 70 kbps

From Throughput Requirement, we learned that the required throughput for audio frames is 66.67 kbps. After adding the overhead from BLE headers, the required throughput is 70 + 66.67 = 136.67 kbps.
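The arithmetic above can be reproduced with a short calculation. This sketch assumes the figures used in this section (a 100-byte audio frame every 12 ms, sent as 5 notifications with 21 bytes of complete packet overhead each); the function name throughput_kbps is hypothetical.

```c
#include <assert.h>

/* Bits per millisecond is numerically equal to kilobits per second. */
double throughput_kbps(unsigned bytes_per_frame, double frame_period_ms)
{
    assert(frame_period_ms > 0.0);
    return (bytes_per_frame * 8.0) / frame_period_ms;
}

/* Total over-the-air requirement: audio payload plus per-packet overhead. */
double total_throughput_kbps(unsigned frame_bytes,
                             unsigned notifications_per_frame,
                             unsigned overhead_bytes_per_packet,
                             double frame_period_ms)
{
    unsigned overhead = notifications_per_frame * overhead_bytes_per_packet;

    return throughput_kbps(frame_bytes, frame_period_ms) +
           throughput_kbps(overhead, frame_period_ms);
}
```

Calling total_throughput_kbps(100, 5, 21, 12.0) gives the audio rate of about 66.67 kbps plus the 70 kbps overhead, that is, about 136.67 kbps.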

The hid_adv_remote application tries to transmit as many of the available audio notifications as possible in every connection event. This means that the required throughput can be obtained with different connection interval settings, as long as enough packets can be transmitted in each connection event to sustain the required rate of ~417 notifications per second.

1 notification = 20 B audio data = 160 bits audio data.

Required audio data throughput = 66670bps.

66670 / 160 ~= 417 notifications per second

Typically, a connection interval of 10 ms can be used, with 3-5 notifications transmitted in every connection event.
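To relate the notification rate to a connection interval, the sketch below computes how many notifications per connection event are needed on average, rounded up. The function names are hypothetical, and the connection interval is taken in whole milliseconds for simplicity.

```c
#include <assert.h>

/* Notifications needed per second for a given audio bit rate, rounded up. */
unsigned notifications_per_second(unsigned audio_bps, unsigned payload_bytes)
{
    unsigned bits = payload_bytes * 8u;

    assert(bits > 0u);
    return (audio_bps + bits - 1u) / bits;
}

/* Average notifications per connection event, rounded up to a whole
 * notification. */
unsigned notifications_per_event(unsigned notif_per_second,
                                 unsigned conn_interval_ms)
{
    return (notif_per_second * conn_interval_ms + 999u) / 1000u;
}
```

With 66670 bps of audio data and 20-byte payloads, this gives 417 notifications per second. At a 10 ms connection interval the average is about 4.17 per event, so some events must carry 5 notifications, which is the upper end of the typical range quoted above.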