Communication of Musical Gesture using the AES/EBU
Digital Audio Standard
Adrian Freed and David Wessel
CNMAT, 1750 Arch Street, Berkeley, CA 94709
(510) 643 9990, {adrian,wessel}@cnmat.berkeley.edu
- Abstract
-
We have adapted the AES/EBU digital audio standard to the
coding and transmission of transduced gestures. We discuss
the advantages of the AES/EBU standard over MIDI and other
candidate methods and describe alternative mappings of
gestural data to the audio streams of the AES/EBU protocol.
We conclude with a description of a reactive glove system and
a continuous position-sensing keyboard controller using
AES/EBU communications.
- Musical Gesture Communications Requirements
-
- Introduction
-
The following score card rates various standards for
data communications according to requirements for
gestural controllers for live musical performance:
| Requirement                   | AES | MIDI | USB | 1394 | Ethernet | SCSI | Parallel | ADAT |
|-------------------------------|-----|------|-----|------|----------|------|----------|------|
| Isochronous                   | x   |      | x   | x    |          |      |          | x    |
| Electrical Isolation          | x   | x    |     |      | x        |      |          | x    |
| Distance > 50m                | x   |      |     |      | x        |      |          |      |
| Gesture samples/second > 100k | x   |      | x   | x    | x        | x    | x        | x    |
| Synchronous clock             | x   |      | x   | x    |          |      |          | x    |
| Connector Insertions > 1000   | x   | x    |     |      | x        | x    | x        | x    |
| Complexity < 1000 gates       | x   | x    |     |      |          |      | x        | x    |
| Unoriented connector          | x   |      |     |      |          |      |          |      |
| Locking Connector             | x   |      |     |      | x        | x    | x        |      |
| Robust Cable                  | x   |      |     |      |          |      |          |      |
- Repeatability
-
One requirement for successful musical expression is
that the sonic response to a gesture must be predictable.
This implies repeatable data for the same gesture and
predictable delivery to the synthesis system. These needs
are best met with digital communications, which avoid the
corrupting effects of inductive pickup, radio frequency
interference (RFI), ground loops, connector contact noise,
impedance mismatch, and "microphonic" cables. Precision
and accuracy are optimized by converting to digital form
as close as possible to the transducer and using as few
shielded conductors as possible.
- Throughput
-
Multimodal, multidimensional gesture arrays can result
in effective data rates of more than 100,000 gesture
samples/second. This clearly exceeds the performance
available from MIDI.
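A rough calculation illustrates the size of the gap. The sketch below uses the MIDI serial rate of 31,250 bits/second with 10 bits on the wire per byte. It assumes, purely for illustration, that each gesture sample is sent as one 3-byte message (for example a control change) without running status; the message type is our assumption, not something prescribed above.

```python
# Back-of-the-envelope MIDI throughput. Assumption (illustrative only):
# one 3-byte message per gesture sample, no running status.
MIDI_BITS_PER_SECOND = 31_250   # MIDI serial rate
BITS_PER_WIRE_BYTE = 10         # start bit + 8 data bits + stop bit
BYTES_PER_MESSAGE = 3           # status byte + 2 data bytes (assumed)

def midi_messages_per_second(bytes_per_message=BYTES_PER_MESSAGE):
    """Upper bound on messages/second on a single MIDI cable."""
    bytes_per_second = MIDI_BITS_PER_SECOND / BITS_PER_WIRE_BYTE
    return bytes_per_second / bytes_per_message

required = 100_000                      # gesture samples/second
available = midi_messages_per_second()  # roughly a thousand per second
print(f"MIDI: about {available:.0f} messages/s, required: {required}")
print(f"shortfall: about {required / available:.0f}x")
```

Even with running status reducing each message to 2 bytes, the bound stays roughly two orders of magnitude below the requirement.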
- Reliability
-
Reliability during the rigors of live musical
performance is critical. This constraint immediately
eliminates many possible communications technologies. USB
and IEEE 1394 do not provide for data transmission without
repeaters over the distances required in most performance
venues. In addition, USB and IEEE 1394 connectors are
designed for a limited number of insertions and are
oriented, making them hard to connect in the dark or
under the difficult lighting typical of stage
performance. Also, cables with many thin wires are
inherently less reliable than thicker cables with a few
wires. Optical cables are fragile, and those used in
S/PDIF and ADAT are short (6 m). We particularly favor
the robustness inherent in the coaxial cables that may be
used in AES-3 links.
Reliability is also critical in audio, video and
lighting device control; virtual reality; and medical
monitoring applications, so these applications may also
benefit from adapting the AES/EBU digital audio
standard.
- Latency
-
Controlled latency is essential for music synthesis
applications that respond reactively to gestures. An
advantage of using a digital audio standard for gestural
transmission is that the AES/EBU and S/PDIF interface
cards available for most personal computers and
workstations are optimized for reliable,
controlled-latency reception of audio. We have achieved
our latency goal of 10 ± 1 ms with the AES-3 input of
the SGI O2 and Octane. Although latencies are much longer
on Mac OS 8.0 and Windows 98 systems, we expect
competitive pressures to improve results in future
operating system and hardware products.
It is interesting to contrast these results with data
acquisition cards, an obvious alternative for gesture
transduction. The driver software for these boards is
often a source of debilitating latency and jitter,
because the primary focus in their construction is
reliable transfer of data to disk for later, non-real
time analysis.
Ethernet is an interesting alternative to explore
because it is so widely available. Although timely delivery
of packets is a consideration in modern networking,
there is no widely adopted, transport-independent,
application-layer protocol for using such networks in
reactive systems applications. Of particular concern is
the observation that the overhead of packet preparation,
assembly and disassembly now dominates transit latency.
We have achieved good latency and reliability results
with our Open Sound Control protocol over 10BaseT and
Fast Ethernet. We use this protocol primarily for
interprocess communication between synthesizer control
clients and sound servers. The main problem with Ethernet
for gesture transduction is the cost and complexity of
the hardware and software needed to support the protocol stack.
- Isochrony
-
An important feature of this work is the decision to
communicate continuous measurements of gestures to the
synthesizing device. This provides great flexibility,
lacking in MIDI, to experiment with different
gestural interpretation computations. Gestural signals
can operate directly on synthesis parameters or be
analyzed and parsed into events. This sampled-signal view
of gestures requires a stable local clock at the source
and an accurate clock recovery scheme at the destination.
These requirements are easily satisfied by AES/EBU links
because they use biphase signaling and are built to
satisfy stringent clock jitter specifications.
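To illustrate why biphase signaling helps clock recovery, here is a minimal sketch of biphase-mark coding, the line code used by AES/EBU. Every bit cell begins with a transition, and a 1 adds a second transition in the middle of the cell, so the receiver sees at least one edge per bit regardless of the data. The function names are ours, and the sketch ignores the subframe preambles, which deliberately break this rule to mark subframe boundaries.

```python
def bmc_encode(bits, level=0):
    """Encode bits as biphase-mark half-cells (two line levels per bit)."""
    halves = []
    for bit in bits:
        level ^= 1              # every bit cell starts with a transition
        halves.append(level)
        if bit:
            level ^= 1          # a 1 adds a mid-cell transition
        halves.append(level)
    return halves

def bmc_decode(halves):
    """A bit is 1 when the two halves of its cell differ."""
    return [int(halves[i] != halves[i + 1])
            for i in range(0, len(halves), 2)]

data = [1, 0, 1, 1, 0, 0, 1]
line = bmc_encode(data)
decoded = bmc_decode(line)
# Inverting the line (e.g. swapped conductors) leaves the data unchanged.
decoded_inverted = bmc_decode([h ^ 1 for h in line])
print(decoded == data, decoded_inverted == data)
```

Because the data is carried by transitions rather than absolute levels, the code is also insensitive to polarity and has no DC component, which suits transformer-coupled links.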
- Cost
-
Although parts cost is a factor, the exploratory
nature of our work leads us to minimize development cost.
Evaluation boards from Crystal (www.crystal.com) and AKM
(www.akm.com) contain a
clock, A/D converter and AES/EBU transmitter with coaxial
and optical outputs. Many gestural transduction
applications require significant signal pre-processing
for transducer calibration, linearization, smoothing and
noise reduction. This is easily provided using
development boards for DSP chips. Some recent DSP chips,
e.g., the Motorola DSP56011, integrate an AES/EBU
interface.
- Gesture Formatting
-
AES/EBU and S/PDIF transmit 2 channels of 24-bit samples
at frame rates between 32 kHz and 48 kHz. Two additional bits,
the user and channel status bits, are also sent with each
sample. Since few drivers give access to the user and channel
status information, we encode gestures within the 24-bit
sample data.
Although not necessarily optimal for any given
application, this mapping covers many musical
applications:

The "F" bit is set once every 64 sample frames. "A" is a 7-bit
field for low-resolution devices. "BLO" and "BHI" are 8 bits
each and can be combined into a single 16-bit
high-resolution field. Together these fields fill the 24 bits
of the right channel sample. The left channel is reserved
for audio values to support the common situation in which the
performer's gestures are combined with an instrument or vocal
source.
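The fields can be packed into and recovered from a 24-bit sample word with shifts and masks. The sketch below is illustrative only: the bit positions (F in the most significant bit, then A, BHI and BLO) are an assumed ordering chosen for the example, and the function names are hypothetical. Only the field widths come from the description above.

```python
# Hypothetical bit layout, assumed for illustration:
#   bit 23: F | bits 22-16: A | bits 15-8: BHI | bits 7-0: BLO
def pack_gesture(f, a, bhi, blo):
    """Pack the four gesture fields into one 24-bit sample word."""
    if not (0 <= f <= 1 and 0 <= a < 128 and 0 <= bhi < 256 and 0 <= blo < 256):
        raise ValueError("field out of range")
    return (f << 23) | (a << 16) | (bhi << 8) | blo

def unpack_gesture(word):
    """Recover the fields; B16 is BHI and BLO read as one 16-bit value."""
    word &= 0xFFFFFF                      # keep only the 24 sample bits
    return {
        "F": (word >> 23) & 0x1,
        "A": (word >> 16) & 0x7F,
        "BHI": (word >> 8) & 0xFF,
        "BLO": word & 0xFF,
        "B16": word & 0xFFFF,
    }

word = pack_gesture(1, 0x55, 0x12, 0x34)
fields = unpack_gesture(word)
print(hex(word), fields)
```

Masking to 24 bits on the receiving side matters in practice because audio drivers often deliver samples as sign-extended 32-bit integers.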
- AES/EBU Gesture Acquisition System
-

Our prototype gesture acquisition system is built around an
audio A/D converter evaluation board. Serial data from the
converter to the AES/EBU transmitter is intercepted by a
multiplexer. Left channel bits from the converter are passed
through. Right channel bits are derived from a latch and
shift register that stores and serializes results from the
gesture A/D converters. A 6-bit counter provides the frame
count bit and the multiplex control for the gesture A/D
converters.
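The role of the 6-bit counter can be modeled in a few lines. This is a behavioral sketch rather than a description of the actual hardware: it assumes the frame bit is asserted when the counter wraps to zero and that the counter value is used directly as the multiplexer address, and the function name is ours.

```python
def frame_controls(num_frames, start=0):
    """Yield (frame_bit, mux_select) for successive sample frames."""
    counter = start & 0x3F                 # 6-bit counter: 0..63
    for _ in range(num_frames):
        frame_bit = int(counter == 0)      # assumed: F marks the counter wrap
        yield frame_bit, counter           # assumed: counter addresses the mux
        counter = (counter + 1) & 0x3F     # wraps after 63

controls = list(frame_controls(128))
frame_bits_set = sum(f for f, _ in controls)
print(frame_bits_set, controls[63], controls[64])
```

A receiver that sees the frame bit knows the multiplexer address of every following word until the next frame bit, so no address needs to be transmitted.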
- Example Applications
-
- Expressive Keyboard
-
Most electronic keyboards sense key position at only
the top and bottom of each key's travel. This is
sufficient to establish key up and down velocity to be
encoded and transmitted with MIDI. Unfortunately this
cheap sensing strategy does not adequately capture the
nuance available on acoustic instruments. Musicians use
fine control of key position to control the timbre of
sounds from mechanical tracker organs, harpsichords and
pianos. We are experimenting with different sensor
technologies to measure continuous key position including
ones based on an interrupted light beam, a reflected
light beam, and a bending resistive strip. Data from the
pedals and stops, the lower manual, and the upper manual are
mapped to A, BLO, and BHI respectively, yielding continuous
position estimates approximately every millisecond.
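The update interval follows from the frame rate. The sketch below assumes, for illustration, that each sample frame carries one key from each manual and that a full scan of the keyboard takes one 64-frame period of the "F" bit; the scan order is our assumption.

```python
FRAMES_PER_SCAN = 64   # one period of the "F" bit (assumed to be one full scan)

def scan_period_ms(frame_rate_hz, frames_per_scan=FRAMES_PER_SCAN):
    """Time between successive position estimates for the same key."""
    return 1000.0 * frames_per_scan / frame_rate_hz

periods = {rate: scan_period_ms(rate) for rate in (32_000, 44_100, 48_000)}
for rate, period in periods.items():
    print(f"{rate} Hz: {period:.2f} ms per scan")
```

Under these assumptions, each key is revisited between about 1.3 ms (48 kHz) and 2 ms (32 kHz).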
- Reactive Glove
-
In an initial experiment carried out in collaboration
with Butch Rovan, we mounted a force-sensing resistor
(FSR) on the tip of a finger. With a simple conditioning
circuit we obtained a 0 to 5 volt signal that we sampled at
an audio rate with a converter that did not eliminate, as
most audio conversion systems do, the DC and very low
frequency components. We then used this audio signal
interpretation of the gesture data to control various
synthesis parameters, such as the envelope and modulation
index of an FM patch. These initial experiments were carried
out on an ISPW card running FTS from IRCAM, which has a very
low and stable latency (< 4 ms). Striking gestures like
those of a hand drummer were effective in producing
expressive synthetic sounds. The results showed great
potential for musical expressivity and led to the
construction of a series of lightweight, flexible, and
custom-fitted gloves with FSRs mounted on the tips of the
fingers and thumbs. This FSR glove technology was combined
with an additional three dimensions of spatial location
technology to accurately locate the position of the index
finger of each hand. Three dimensions of index fingertip
location and five FSRs per hand produced 16 analog signals,
each sampled at 3 kHz and multiplexed into the single 48 kHz
channel. This combination of FSR and spatial location
sensing provides a flexible poly-point continuous
controller capable of considerable musical expressiveness
but requiring, as with most instruments, a lot of
practice.
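The channel arithmetic and the receiving side's demultiplexing can be sketched as follows. The round-robin ordering and the assumption that the stream starts on signal 0 are ours, made for illustration; a real receiver would first align itself to the stream.

```python
FRAME_RATE_HZ = 48_000
SIGNALS = 2 * (3 + 5)          # two hands: 3 position axes + 5 FSRs each

def deinterleave(stream, num_signals=SIGNALS):
    """Split a round-robin stream into one list per signal.

    Assumes the first element of the stream belongs to signal 0.
    """
    return [stream[i::num_signals] for i in range(num_signals)]

per_signal_rate = FRAME_RATE_HZ / SIGNALS   # samples/second for each signal
stream = list(range(64))                    # stand-in for received samples
channels = deinterleave(stream)
print(SIGNALS, per_signal_rate, channels[0], channels[15])
```

Sixteen signals sharing a 48 kHz channel in strict rotation each receive one slot in every 16 frames, which gives the 3 kHz rate per signal.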
- Future Work
-
We are exploring ways to add bidirectionality, higher
bandwidth and power delivery to the recognized strengths of
the AES/EBU digital audio standard.
| Feature       | AES/EBU | MIDI | USB | 1394 | Ethernet | SCSI | Parallel | ADAT |
|---------------|---------|------|-----|------|----------|------|----------|------|
| Bidirectional |         |      | x   | x    |          |      | IEEE1284 |      |
| Power         |         |      | x   | x    | x        | x    |          |      |
| Audio+Gesture |         |      | x   | x    | x        | x    | x        |      |
- Acknowledgement
-
We gratefully acknowledge the support of Gibson Guitar and
the Edmund O'Neill foundation.
- References
AES (1985). "AES recommended practice for digital audio
engineering-serial transmission format for linearly represented
digital audio data." Journal of the Audio Engineering Society
33(12): 975-84.
Hand, C. (1997). "A survey of 3D interaction techniques."
Computer Graphics Forum 16(5): 269-81.
Kim, J. H. and A. A. Chien (1996). Rotating Combined
Queueing (RCQ): bandwidth and latency guarantees in low-cost,
high-performance networks. ISCA '96: The 23rd Annual
International Conference on Computer Architecture,
Philadelphia, PA, USA.
Lee, M., A. Freed, and D. Wessel (1991). Real-Time Neural
Network Processing of Gestural and Acoustic Signals.
Proceedings of the 17th International Computer Music
Conference, Montreal, Computer Music Association.
Lee, M., A. Freed, et al. (1992). Neural networks for
simultaneous classification and parameter estimation in musical
instrument control. Adaptive and Learning Systems, Orlando, FL,
USA.
Lee, M. A. and D. Wessel (1992). Connectionist models for
real-time control of synthesis and compositional algorithms.
Proceedings of the International Computer Music Conference,
Computer Music Association.
Lindemann, E., F. Dechelle, et al. (1991). "The architecture
of the IRCAM musical workstation." Computer Music Journal
15(3): 41-9.
Roads, C. (1996). Musical Input Devices. The Computer Music
Tutorial. Cambridge, MIT Press: 619-658.
Rodrigues, S. (1997). High-Performance Local-Area
Communication With Fast Sockets. Usenix 1997.
Rovan, J. B., M. M. Wanderley, et al. (1997). Instrumental
Gestural Mapping Strategies as Expressivity Determinants in
Computer Music Performance. KANSEI - The Technology of
Emotion.
Wessel, D. (1991). Improvisation with highly interactive
real-time performance systems. Proceedings of the International
Computer Music Conference, Montreal, Computer Music
Association.
Wright, M. (1998). Implementation and Performance Issues
with OpenSound Control. International Computer Music
Conference, Ann Arbor, Michigan, ICMA.
Wright, M. and A. Freed (1997). Open Sound Control: A New
Protocol for Communicating with Sound Synthesizers.
International Computer Music Conference, Thessaloniki, Greece,
ICMA.