(2002)
see also:
Dual-clock Optical Sound

Many things became
simpler with Serial Digital
Interconnect (SDI). Quite
a few things, however,
have become more
complex. This is true, for
example, for audio
routing, where the
requirements are entirely
different.
The world of digital TV-studio technology
has an international standard for
transmitting video, audio, and ancillary
data: SDI (Serial Digital Interconnect).
SDI not only makes studio wiring simpler
but is also a compact transmission
standard, especially when sending a
video signal to the broadcasting center
via satellite. Image and sound use the
same transmission path, which not only
reduces costs but also prevents
delays between video and audio.
Moreover, SDI provides up to 16 audio
tracks for each video signal, so audio
can easily be routed together with the
video stream by SDI video routers. That
eliminates the need for separate audio
routing and simplifies signal handling
within large systems. The benefits of
SDI are numerous. However, because the
standard dates back to the first days of
digital video, it also creates some
issues, particularly in audio routing
and sound processing.
Embedded Audio
SDI is a digital format, essentially a digitized
version of the analog line signal.
Additional data may be inserted into the
blanking intervals of the composite signal.
Basically, the SDI signal has three elements:
the line signal, which holds the
actual image information; the HANC
(Horizontal Ancillary Data) at the end of
each line (horizontal blanking); and the
VANC (Vertical Ancillary Data) between
two video fields (vertical blanking).
While VANC is used, for example, for
time-code information, digital audio is
exclusively transmitted within the
HANC, so the audio information has to
be split into data packets that must be
integrated or embedded into the line
signals.
A total bandwidth of approx. 42.2 Mbps
(NTSC, 525 lines) or 43.8 Mbps (PAL,
625 lines) can be achieved. These data
rates are impressive, compared to the
2-channel AES/EBU interface with a
rate of 3.072 Mbps.
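The AES/EBU figure can be checked with a little arithmetic. The sketch below assumes the usual AES3 framing of one 32-bit subframe per channel per sample (a detail not stated in the article) and compares the result with the embedded-audio capacities quoted above.

```python
# Assumption: AES3 carries one 32-bit subframe per channel per sample.
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
BITS_PER_SUBFRAME = 32

aes_rate_mbps = SAMPLE_RATE_HZ * CHANNELS * BITS_PER_SUBFRAME / 1_000_000
print(aes_rate_mbps)  # 3.072

# Embedded-audio capacities quoted in the text, in Mbps.
quoted_capacity_mbps = {"525 lines": 42.2, "625 lines": 43.8}
ratios = {
    system: round(capacity / aes_rate_mbps, 1)
    for system, capacity in quoted_capacity_mbps.items()
}
print(ratios)  # how many AES/EBU links' worth of data fit in each system
```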
Packet Distribution…
Three different types of data packets
are defined for HANC: audio data,
ancillary data, and control data,
distinguished by IDs contained in the
packet header.
All three types have a similar structure
and all have one special feature in common:
their length is variable, depending
on the contents.
The audio data is encapsulated in packets,
each carrying four channels or two stereo
channels plus extra data (C, U, V from
the AES/EBU signal). This is the simplest
form of audio embedding. However,
the capacity of such an audio packet is
sufficient only for signals with a maximum
resolution of 20 bits. If 24-bit
audio is to be transmitted, the ancillary-data
packets are set up to carry the
remaining four bits.
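As an illustration of the 20-bit/24-bit split, here is a minimal sketch. It assumes that the four bits moved to the ancillary-data packet are the least significant ones, which the article does not state explicitly; the function names `split_sample` and `join_sample` are hypothetical.

```python
def split_sample(sample_24: int) -> tuple[int, int]:
    """Split an unsigned 24-bit word into a 20-bit part and a 4-bit part.

    Assumption: the upper 20 bits travel in the audio packet and the
    lower 4 bits in the ancillary-data packet.
    """
    if not 0 <= sample_24 < (1 << 24):
        raise ValueError("expected an unsigned 24-bit value")
    return sample_24 >> 4, sample_24 & 0xF


def join_sample(main_20: int, extra_4: int) -> int:
    """Rebuild the 24-bit word at the receiver."""
    return (main_20 << 4) | extra_4


main, extra = split_sample(0xABCDEF)
print(hex(main), hex(extra))  # 0xabcde 0xf
restored = join_sample(main, extra)
```

A receiver that ignores the ancillary-data packet would still recover `main`, that is, the signal at 20-bit resolution.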
The control-data packets may hold additional
information such as the sample
rate or a delay setting.
These three packet types make up a
“group” in this transmission scheme,
and a maximum of four groups can be
embedded into an SDI signal, allowing
for transmitting 16 audio tracks with a
sample rate of 48 kHz each within an
SDI signal.
…Delayed
Video and audio are digitized at different
rates that lack a common divisor.
The relation between the two signal types
at a 48 kHz sample rate becomes
obvious in the table below:
| | Audio samples per line | Audio samples per frame |
| 525 lines/29.97 Hz | 3.05066 | 1,601.6 |
| 625 lines/25 Hz | 3.072 | 1,920 |
This means each video line can hold a
little more than three audio samples per
channel. That is generally considered
unsuitable, since a fixed whole-number
allocation of audio samples to lines is
preferable. To solve this, audio packets
can be of variable length: a certain
number of audio packets carry three
samples and a certain number carry four
samples. To accommodate the maximum
number of channels, four of these
packets must be inserted at the end of
each line so that the entire available
data rate per line is exploited. This
distribution scheme repeats with
every frame in a 625-line system; in a
525-line system, it repeats only with
every fifth frame.
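The figures in the table and the repetition periods can be reproduced with exact fractions. The sketch below treats 29.97 Hz as exactly 30000/1001 Hz and spreads the samples as evenly as possible over all lines. That even spread is only one conceivable distribution, chosen here for illustration, and it ignores any lines a real embedder may have to leave free; the names `SYSTEMS`, `samples_per_frame`, `frames_per_cycle`, and `line_counts` are hypothetical.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000

# (lines per frame, frame rate in Hz); 29.97 Hz is taken as 30000/1001.
SYSTEMS = {
    "525": (525, Fraction(30_000, 1_001)),
    "625": (625, Fraction(25)),
}


def samples_per_frame(system: str) -> Fraction:
    _, frame_rate = SYSTEMS[system]
    return Fraction(SAMPLE_RATE) / frame_rate


def frames_per_cycle(system: str) -> int:
    """Smallest number of frames that holds a whole number of samples."""
    return samples_per_frame(system).denominator


def line_counts(system: str) -> list[int]:
    """Samples per line over one full cycle, spread as evenly as possible."""
    lines, _ = SYSTEMS[system]
    cycle = frames_per_cycle(system)
    total_lines = lines * cycle
    total_samples = int(samples_per_frame(system) * cycle)
    # Line i gets the samples that become due between line i and line i + 1.
    return [
        (i + 1) * total_samples // total_lines - i * total_samples // total_lines
        for i in range(total_lines)
    ]


for name in SYSTEMS:
    counts = line_counts(name)
    print(name, float(samples_per_frame(name)), frames_per_cycle(name),
          counts.count(3), counts.count(4))
```

Every line ends up with either three or four samples, and the pattern closes after one frame (625 lines) or five frames (525 lines), matching the text.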
The receiver uses a buffer to rebuild a
continuous audio stream from the data
distributed across the packets. This
buffer introduces an audio-signal
latency familiar from other areas of digital
technology. Delays of 40 to 64 samples
are standard; however, this value can be
decreased using intelligent techniques
for distributing packets to video
lines.
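To put these buffer delays into perspective, the following lines convert them to milliseconds at the 48 kHz sample rate used throughout the article; the helper name `samples_to_ms` is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000


def samples_to_ms(samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a delay given in audio samples to milliseconds."""
    return samples / sample_rate_hz * 1_000


for delay in (40, 64):
    print(delay, "samples =", round(samples_to_ms(delay), 2), "ms")
# 40 samples = 0.83 ms
# 64 samples = 1.33 ms
```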
The packet distribution is not defined in
the standard, so it is up to the designer
of the SDI embedder to determine the
optimum solution. This can cause compatibility
issues between devices from different
manufacturers!
Three Against Four
Even if the embedding at the sender and
the demultiplexing at the receiver match
each other, the resulting latency
must be taken into account in any case.
This is all the more true if the audio is
extracted from the SDI signal, externally
processed, and then reintegrated into the
SDI stream.
Another issue is a surround signal with
five or more audio channels. As each
group contains only four audio channels,
surround sound requires two groups,
and these must not show any
shift in time relative to each
other.
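The group arithmetic behind this can be sketched as follows. The code assumes that tracks are numbered 1 to 16 and fill the groups consecutively, a numbering convention chosen here for illustration; `locate` and `groups_needed` are hypothetical helper names.

```python
CHANNELS_PER_GROUP = 4
MAX_GROUPS = 4


def locate(track: int) -> tuple[int, int]:
    """Map track 1..16 to (group 1..4, channel within group 1..4)."""
    if not 1 <= track <= CHANNELS_PER_GROUP * MAX_GROUPS:
        raise ValueError("track must be in the range 1..16")
    group, channel = divmod(track - 1, CHANNELS_PER_GROUP)
    return group + 1, channel + 1


def groups_needed(first_track: int, channel_count: int) -> set[int]:
    """Groups touched by a signal occupying consecutive tracks."""
    return {
        locate(track)[0]
        for track in range(first_track, first_track + channel_count)
    }


# A six-channel (5.1) signal starting at track 1 spans two groups.
print(sorted(groups_needed(1, 6)))  # [1, 2]
```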
An even more difficult problem, caused
by the different packet lengths (three
or four audio samples respectively),
occurs if one track is to be removed or
added. Removing one block creates a
gap in the data flow and wastes bandwidth.
An even more drastic situation
occurs if packets are overwritten. If
a packet with three samples
is overwritten with a four-sample packet,
data is lost!
Proper Organization
Therefore, many de-embedders, the devices
that extract audio tracks from the
SDI signal, process only a single group.
The other three specified groups are
simply lost! This means that only four
audio tracks are extracted
instead of 16. This is insufficient
for a single 5.1 surround signal, not to
mention more extensive productions
(e.g. multilingual soundtracks). The
possible compatibility problems
are legion. For example, what if
you’re using a limited embedding device
but want to integrate an audio group
into an SDI signal that already carries
two audio-data groups?
An intelligent system processing all four
groups will check the received data
stream for existing audio data before
trying to add a new group. In addition,
devices designed to exploit the maximum
channel count will relocate the packets,
reorganizing the entire structure. This
matters especially if audio data are to
be routed within the SDI stream, or
between different SDI signals.
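The check described here can be sketched in a few lines. The example assumes that the set of occupied groups has already been determined, for instance by inspecting the packet IDs in the incoming stream, a step not shown; `pick_free_group` is a hypothetical name and not part of any product API.

```python
ALL_GROUPS = (1, 2, 3, 4)


def pick_free_group(occupied: set[int]) -> int:
    """Return the lowest-numbered audio group that is not yet in use."""
    for group in ALL_GROUPS:
        if group not in occupied:
            return group
    raise RuntimeError("all four audio groups are already in use")


# An SDI signal that already carries two audio-data groups:
print(pick_free_group({1, 2}))  # 3
```

A limited embedder that always writes to the first group, without such a check, would overwrite audio that is already present.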
The Simple Solution
As soon as you start working creatively
with SDI, you will feel the need for such
flexible, compatible audio routing. For
example, when preparing for the
Olympic Winter Games in Salt Lake City,
Switzerland’s SF-DRS faced the usual
dilemma of broadcasting to a multilingual
country: they had to transmit each
video signal with German, French and
Italian commentary plus the international
sound. When the signal reached
Switzerland, the audio signals had
to be processed differently for each
broadcasting station, since the broadcast
commentary had to be in the language
of that particular region.
The demand arose for external de-embedders
and embedders to extract
and reintegrate the signal quickly. All
of the devices had to be compatible with
one another as well as with the
broadcaster’s O.B. truck fleet and numerous
studios, which are fully based on CANTUS
and NEXUS systems. This example points to
an obvious solution: an SDI board for
NEXUS, the expert in audio routing!*
* An SDI version of the NEXUS board for extracting
and embedding 16 audio tracks in each
SDI signal is currently being developed.
What is SDI?
SDI (or, to be more precise, SMPTE
259M) is a standard for serial transmission
of digital video. In addition
to the plain video data, it provides
space for ancillary data including
time-code data, user-specific control
data, and digitized audio.
SMPTE 291M defines the generic
format and location of ancillary
data. Audio embedding is specified
in SMPTE 272M.
The image shows the various data
areas of a composite video signal. A
data rate of 270 Mbps is standard
for traditional TV formats (4:2:2
Video). This rate is equal for 525-
line systems (29.97 Hz) and 625-
line systems (25 Hz). New standards
with higher data rates have
been developed after the introduction
of new image formats and
HDTV.