
Audio in SDI: Divided and Reunited


(2002)

 


 


Many things became simpler with the Serial Digital Interface (SDI). Quite a few things, however, have become more complex. This is true, for example, of audio routing, where the requirements are entirely different.

The world of digital TV-studio technology has an international standard for transmitting video, audio, and ancillary data: SDI (Serial Digital Interface). SDI not only simplifies studio wiring but is also a compact transmission standard, especially when sending a video signal to the broadcasting center via satellite. Image and sound use the same transmission path, which reduces costs and prevents delays between video and audio. Moreover, SDI provides up to 16 audio tracks for each video signal, so audio can easily be routed along with the video stream by SDI video routers. That eliminates the need for separate audio routing and simplifies signal handling within large systems. The benefits of SDI are numerous. However, because it has evolved since the first days of digital video, it also creates some issues, particularly in audio routing and sound processing.

Embedded Audio

SDI is a digital format, essentially a digitized version of the analog line signal. Additional data may be inserted into the blanking intervals of the composite signal. Basically, the SDI signal has three elements: the line signal, which holds the actual image information; the HANC (Horizontal Ancillary Data) at the end of each line (horizontal blanking); and the VANC (Vertical Ancillary Data) between two video fields (vertical blanking). While VANC is used, for example, for time-code information, digital audio is transmitted exclusively within the HANC, so the audio information has to be split into data packets that are integrated, or embedded, into the line signals. A total bandwidth of approx. 42.2 Mbps (NTSC, 525 lines) or 43.8 Mbps (PAL, 625 lines) can be achieved. These data rates are impressive compared to the 2-channel AES/EBU interface with its rate of 3.072 Mbps.
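The AES/EBU figure can be checked with a little arithmetic. The sketch below assumes the usual AES/EBU frame layout of two 32-bit subframes per sample period at 48 kHz (the frame layout is not spelled out in this article) and compares the result with the HANC capacities quoted above. All names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000
SUBFRAMES_PER_FRAME = 2   # assumption: one subframe per channel
BITS_PER_SUBFRAME = 32    # assumption: audio bits plus preamble and status bits

# HANC capacities as quoted in the article, in bits per second.
HANC_CAPACITY_BPS = {"525-line": 42_200_000, "625-line": 43_800_000}


def aes_ebu_bit_rate(sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Bit rate of one 2-channel AES/EBU stream."""
    return sample_rate_hz * SUBFRAMES_PER_FRAME * BITS_PER_SUBFRAME


def capacity_ratio(system: str) -> float:
    """How many times the AES/EBU bit rate fits into the quoted HANC capacity."""
    return HANC_CAPACITY_BPS[system] / aes_ebu_bit_rate()


if __name__ == "__main__":
    print("AES/EBU bit rate:", aes_ebu_bit_rate(), "bps")
    for system in HANC_CAPACITY_BPS:
        print(system, "capacity ratio:", round(capacity_ratio(system), 1))
```

The ratio is only a rough comparison of raw bit rates; packet headers and other overhead mean it does not translate directly into a channel count.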

Packet Distribution…

Three different types of data packets are provided for HANC: audio-data, ancillary-data, and control-data packets, distinguished by IDs contained in the packet header. All three types have a similar structure and one special feature in common: their length is variable, depending on the contents. The audio data is encapsulated in packets, each carrying four channels (or two stereo pairs) plus extra data (the C, U, and V bits from the AES/EBU signal). This is the simplest form of audio embedding. However, such an audio packet can only carry signals with a resolution of up to 20 bits. If 24-bit audio is to be transmitted, the ancillary-data packets carry the remaining four bits. The control-data packets may hold additional information such as the sample rate or a delay setting. These three packet types make up a “group” in this transmission scheme, and a maximum of four groups can be embedded into an SDI signal, which allows 16 audio tracks with a sample rate of 48 kHz each to be transmitted within one SDI signal.
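The split of a 24-bit sample across two packet types can be illustrated as follows. This is a minimal sketch that assumes the four extra bits are the least significant bits of the sample and treats samples as unsigned 24-bit words; the real packet bit layout is not described in this article, and the function names are hypothetical.

```python
CHANNELS_PER_GROUP = 4
MAX_GROUPS = 4


def split_24_bit(sample: int) -> tuple:
    """Split an unsigned 24-bit word into (upper 20 bits, lower 4 bits)."""
    if not 0 <= sample < (1 << 24):
        raise ValueError("sample must fit into 24 bits")
    main_20 = sample >> 4    # would travel in the audio-data packet
    extra_4 = sample & 0xF   # would travel in the ancillary-data packet
    return main_20, extra_4


def join_24_bit(main_20: int, extra_4: int) -> int:
    """Rebuild the 24-bit word from its two parts at the receiver."""
    return (main_20 << 4) | extra_4


def max_channels() -> int:
    """Four groups of four channels each."""
    return CHANNELS_PER_GROUP * MAX_GROUPS


if __name__ == "__main__":
    main_20, extra_4 = split_24_bit(0xABCDEF)
    print(hex(main_20), hex(extra_4), hex(join_24_bit(main_20, extra_4)))
    print("channels per SDI signal:", max_channels())
```

A receiver that reads only the audio-data packets would, under this assumption, still recover a valid 20-bit signal and simply lose the four finest bits.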

…Delayed

Video and audio are digitized at different rates that have no convenient common divisor. The relation between the two signal types at a 48 kHz sample rate becomes obvious in the table below:

                        Audio samples per line    Audio samples per frame
525 lines / 29.97 Hz    3.05066                   1,601.6
625 lines / 25 Hz       3.072                     1,920
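The table values follow directly from the audio sample rate, the frame rate, and the number of lines. The sketch below reproduces them with exact fractions; it assumes that the nominal 29.97 Hz frame rate is exactly 30000/1001 Hz, which is the usual definition but is not stated in this article.

```python
from fractions import Fraction

AUDIO_RATE_HZ = 48_000

# (lines per frame, frame rate in Hz); 29.97 Hz is taken as 30000/1001.
SYSTEMS = {
    "525 lines / 29.97 Hz": (525, Fraction(30_000, 1_001)),
    "625 lines / 25 Hz": (625, Fraction(25)),
}


def samples_per_frame(frame_rate: Fraction) -> Fraction:
    """Audio samples per channel that fall into one video frame."""
    return Fraction(AUDIO_RATE_HZ) / frame_rate


def samples_per_line(lines: int, frame_rate: Fraction) -> Fraction:
    """Audio samples per channel that fall into one video line."""
    return samples_per_frame(frame_rate) / lines


if __name__ == "__main__":
    for name, (lines, rate) in SYSTEMS.items():
        print(name,
              float(samples_per_line(lines, rate)),
              float(samples_per_frame(rate)))
```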

This means each video line can hold a little more than three audio samples per channel. That is generally considered unsuitable, since a fixed, integral allocation of audio samples to lines is preferable. To solve this, audio packets can be of variable length: some packets carry three samples and others carry four. To accommodate the maximum number of channels, four of these packets must be inserted at the end of each line in such a way that the entire available data rate per line is exploited. This distribution scheme repeats with every frame in a 625-line system; in a 525-line system, it recurs only every fifth frame. The receiver uses a buffer to rebuild a continuous audio stream from the data distributed across the packets. This buffer causes an audio-signal latency of the kind familiar from other areas of digital technology. Delays of 40 to 64 samples are standard; however, this value can be decreased by using intelligent techniques for distributing packets to video lines. The packet distribution is not defined in the standard, so it is up to the designer of the SDI embedder to determine the optimum solution. This can cause compatibility issues between devices of different makes!
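One possible way to spread the samples over the lines is a simple running-total scheme, sketched below. Since the distribution is left to the embedder designer, this is only one illustrative choice and not the method of any particular device; it also ignores any lines that a real embedder might leave free of audio. The function names are hypothetical.

```python
def samples_per_line_schedule(total_samples: int, lines: int) -> list:
    """Spread total_samples over the given number of lines as evenly as possible."""
    schedule = []
    sent = 0
    for line in range(1, lines + 1):
        # Number of samples that should have been sent once this line is done.
        due = (total_samples * line) // lines
        schedule.append(due - sent)
        sent = due
    return schedule


def per_frame_totals(schedule: list, lines_per_frame: int) -> list:
    """Sum the per-line sample counts frame by frame."""
    return [sum(schedule[i:i + lines_per_frame])
            for i in range(0, len(schedule), lines_per_frame)]


if __name__ == "__main__":
    # 625-line system: 1,920 samples per frame, pattern repeats every frame.
    pal = samples_per_line_schedule(1920, 625)
    print("625-line:", pal.count(3), "lines with 3 samples,",
          pal.count(4), "lines with 4 samples")

    # 525-line system: 1,601.6 samples per frame, so the pattern only
    # closes after five frames (8,008 samples over 2,625 lines).
    ntsc = samples_per_line_schedule(8008, 5 * 525)
    print("525-line frame totals:", per_frame_totals(ntsc, 525))
```

In this scheme the 525-line frames alternate between 1,601 and 1,602 samples, which is why the pattern needs five frames before it repeats.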

Three Against Four

Even if the embedding at the sender and the demultiplexing at the receiver are matched to one another, the resulting latency must always be taken into account. This applies all the more if the audio is extracted from the SDI signal, processed externally, and then reintegrated into the SDI stream. Another issue is a surround signal with five or more audio channels. As each group contains only four audio channels, surround sound requires two groups, and these must not show any shift in time relative to each other. An even more difficult problem, caused by the different packet lengths of three or four audio samples, occurs if one track is to be removed or added. Removing one block creates a gap in the data flow and wastes bandwidth. An even more drastic situation occurs if packets are overwritten. If a packet with three samples is overwritten with a four-sample packet, data is lost!

Proper Organization

Therefore, many de-embedders (devices that extract audio tracks from the SDI signal) process only a single group. The other three specified groups are simply lost! This means that only four audio tracks are extracted instead of 16. This is insufficient for a single 5.1 surround signal, not to mention more extensive productions (e.g. multilingual soundtracks). The possible compatibility problems are legion. For example, what if you are using a limited embedding device but want to integrate an audio group into an SDI signal that already carries two audio-data groups? An intelligent system that processes all four groups will check the received data stream for existing audio data before trying to add a new group. In addition, devices that exploit the maximum channel count will relocate the packets, reorganizing the entire structure. This is especially true if audio data is to be routed within the SDI stream or between different SDI signals.
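The check for existing audio groups described above might look like the following sketch. It models only the bookkeeping (which of the four groups are occupied and how many groups a signal needs), not the actual parsing of an SDI stream; all names are hypothetical.

```python
from math import ceil
from typing import Optional

CHANNELS_PER_GROUP = 4
MAX_GROUPS = 4


def groups_needed(channels: int) -> int:
    """Number of groups required for a given channel count (5.1 needs two)."""
    return ceil(channels / CHANNELS_PER_GROUP)


def free_groups(occupied: set) -> list:
    """Group numbers (1 to 4) that carry no audio yet."""
    return [g for g in range(1, MAX_GROUPS + 1) if g not in occupied]


def allocate(occupied: set, channels: int) -> Optional[list]:
    """Pick free groups for a new signal, or None if there is not enough room."""
    needed = groups_needed(channels)
    free = free_groups(occupied)
    if needed > len(free):
        return None
    return free[:needed]


if __name__ == "__main__":
    # The incoming SDI signal already carries audio in groups 1 and 2.
    print(allocate({1, 2}, 4))     # one more four-channel group
    print(allocate({1, 2}, 6))     # a 5.1 signal
    print(allocate({1, 2, 3}, 6))  # not enough free groups
```

A device limited to a single fixed group would skip this check and risk overwriting audio that is already embedded.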

The Simple Solution

As soon as you start working creatively with SDI, you will feel the need for such flexible, compatible audio routing. For example, when preparing for the Olympic Winter Games in Salt Lake City, Switzerland’s SF-DRS faced the usual dilemma of broadcasting to a multilingual country: they had to transmit each video signal with German, French, and Italian commentary plus the international sound. When the signal reached Switzerland, the audio signals had to be processed differently for each broadcasting station, since the broadcast commentary had to be in the language of that particular region. This created a demand for external de-embedders and embedders that extract and reintegrate the signal quickly. All of the devices had to be compatible with one another as well as with the broadcaster’s OB truck fleet and numerous studios, which are fully based on CANTUS and NEXUS systems. This example points to an obvious solution: an SDI board for NEXUS, the expert in audio routing!*

* An SDI version of the NEXUS board for extracting and embedding 16 audio tracks in each SDI signal is currently being developed.

What is SDI?

SDI (or, to be more precise, SMPTE 259M) is a standard for serial transmission of digital video. In addition to the plain video data, it provides space for ancillary data including time-code data, user-specific control data, and digitized audio. SMPTE 291M defines the generic format and location of ancillary data. Audio embedding is specified in SMPTE 272M. The image shows the various data areas of a composite video signal. A data rate of 270 Mbps is standard for traditional TV formats (4:2:2 video). This rate is the same for 525-line systems (29.97 Hz) and 625-line systems (25 Hz). New standards with higher data rates have since been developed following the introduction of new image formats and HDTV.
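That both systems arrive at the same 270 Mbps can be verified with a short calculation. The sketch below assumes 10-bit words and total line lengths of 1,716 words (525-line) and 1,728 words (625-line), including blanking; these figures come from common 4:2:2 practice and are not given in this article.

```python
from fractions import Fraction

BITS_PER_WORD = 10  # assumption: 10-bit serial words

# (words per total line incl. blanking, lines per frame, frame rate in Hz)
# The line lengths are assumptions; 29.97 Hz is taken as 30000/1001.
SYSTEMS = {
    "525-line": (1716, 525, Fraction(30_000, 1_001)),
    "625-line": (1728, 625, Fraction(25)),
}


def serial_bit_rate(words_per_line: int, lines: int, frame_rate: Fraction) -> Fraction:
    """Serial bit rate in bits per second for one video system."""
    return words_per_line * lines * frame_rate * BITS_PER_WORD


if __name__ == "__main__":
    for name, params in SYSTEMS.items():
        print(name, float(serial_bit_rate(*params)) / 1e6, "Mbps")
```

Both systems work out to 27 million words per second, so the serial rate is identical even though line count and frame rate differ.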


Stage Tec Entwicklungsgesellschaft für professionelle Audiotechnik mbH, D-12459 Berlin, Tabbertstr. 10
Phone: +49 30 639902-0, Fax: +49 30 639902-32, © 2002-2008 Stage Tec Berlin
