Instruments That Learn
David Wessel
extract from Computer Music Journal, Vol. 15, No. 4, Winter 1991
Musicians often speak of a rather special and very personal
relationship with their instrument. Indeed, many instrumentalists
adapt the instrument physically to particularities of their
playing style by choosing the bridge, string, bow, or the
mouthpiece-reed combination, and so on. On more poetic occasions
a musician will speak as if the instrument has come to know
something of its player. It would seem quite natural then to think
about intelligent instruments that could adapt in some automated
way to a personal playing style.
Before letting the intelligent instrument go too far towards
fantasy, I would like to present some learning technologies that
are now ripe enough for real applications and that might prove
suggestive for musical use. Let us look for a moment at the use
of adaptive neural networks for handwritten character
recognition. While I describe this handwriting classifier I would
like the reader to keep in mind conducting gestures and
signals.
At AT&T Bell Laboratories, Isabelle Guyon and her coworkers
(Guyon et al. 1991) have developed a system that exploits Time
Delay Neural Networks (Waibel et al. 1989) for writer-independent
and writer-adaptive on-line character recognition. Their
recognition system is targeted for use in touch terminals and for
signature verification. One of the essential features of their
approach that distinguishes it from much of the work on character
recognition is that the sensing and encoding of the writer's
gesture sequence plays a fundamental role. A character is not
just a pixel map but a sequence of sample points in a plane.
Guyon represents a character at the output of the preprocessing
stage as a sampled sequence of features that describe the state
of the pen (up or down), the coordinates of the pen,
and the slope and curvature of the pen's trajectory. With some
additional features like velocity, this sort of representation
scheme would seem the right sort of thing for conducting
gestures.
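As a rough illustration of such a representation, the following Python sketch turns a sampled pen or baton trajectory into per-sample feature vectors. It is not the preprocessing used by Guyon and her coworkers: the function name `gesture_features`, the use of a direction angle in place of slope (to avoid dividing by zero on vertical strokes), the turning angle as a stand-in for curvature, and the per-sample speed are all assumptions made for this example.

```python
import math

def gesture_features(points):
    """Turn sampled (x, y, pen_down) points into per-sample feature vectors.

    Each output row is (pen_down, x, y, direction, turn, speed):
    direction is the trajectory angle in radians, turn is the change in
    direction since the previous segment (a crude curvature estimate),
    and speed is the distance travelled per sample.
    """
    features = []
    prev_angle = None
    for i in range(1, len(points)):
        x0, y0, _ = points[i - 1]
        x1, y1, pen_down = points[i]
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy)
        # A stationary sample has no direction; reuse the previous one.
        angle = math.atan2(dy, dx) if speed > 0 else (prev_angle or 0.0)
        if prev_angle is None:
            turn = 0.0
        else:
            # Wrap the angle difference into [-pi, pi].
            diff = angle - prev_angle
            turn = math.atan2(math.sin(diff), math.cos(diff))
        features.append((float(pen_down), x1, y1, angle, turn, speed))
        prev_angle = angle
    return features

# Hypothetical usage: a stroke to the right, then a lift while moving up.
for row in gesture_features([(0, 0, 1), (1, 0, 1), (1, 1, 0)]):
    print(row)
```

A sequence of such rows is the kind of input a time-based recognizer could scan, whether the samples come from a pen tablet or from a spatial controller.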
The sequence of feature vectors at the output of the
preprocessing stage is scanned by a Time Delay Neural Network
recognizer. This network is constructed so that simple local
topological features are combined through successive layers of
units into more complex and global features until the output
layer. Each of the units in the output layer identifies a
character. This network is trained on a large set of characters
from a large group of writers using the back propagation learning
algorithm (Rumelhart, Hinton, and Williams 1986), and a scheme
for emphasizing the learning of atypical writing styles. Guyon
and her colleagues have further developed their system so that
personalized characters and writing styles can be added to the
core of the writer-independent neural network. With these
techniques they have obtained classification accuracy of better
than 96 percent on test examples, a result far superior to
state-of-the-art optical character recognition applied to the same
materials.
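The central idea of a time-delay layer can be sketched in a few lines of Python: each unit looks at a short window of consecutive feature vectors, and the same weights are reused at every position in time. This is only a minimal sketch of one such layer, not the network of Guyon et al.; the function name `time_delay_layer`, the tanh activation, and the `stride` parameter are assumptions made for this example, and training by back propagation is not shown.

```python
import math

def time_delay_layer(frames, weights, bias, stride=1):
    """Apply one time-delay layer to a sequence of feature vectors.

    frames:  list of T feature vectors, each of length F
    weights: weights[u][d][f] for unit u, delay d, input feature f
    bias:    one bias value per unit
    Returns a shorter sequence of activation vectors. Because the same
    weights are applied at every time step, a pattern produces the same
    response wherever it occurs in the sequence.
    """
    window = len(weights[0])
    out = []
    for t in range(0, len(frames) - window + 1, stride):
        row = []
        for u, unit_w in enumerate(weights):
            total = bias[u]
            for d in range(window):
                for f, w in enumerate(unit_w[d]):
                    total += w * frames[t + d][f]
            row.append(math.tanh(total))
        out.append(row)
    return out

# Hypothetical usage: one unit, window of two frames, one input feature.
edge_detector = [[[1.0], [-1.0]]]
early = time_delay_layer([[0.0], [1.0], [0.0], [0.0], [0.0]], edge_detector, [0.0])
late = time_delay_layer([[0.0], [0.0], [1.0], [0.0], [0.0]], edge_detector, [0.0])
print(early)
print(late)  # the same response, one step later
```

Stacking several such layers, each with a wider effective view of the input, is one way to combine local features into more global ones before a final layer of per-class output units.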
At CNMAT, Mike Lee, Adrian Freed, and I have been exploring
the use of neural networks in conjunction with musical instrument
controllers like the Zeta Guitar and with alternate controllers
like Max Mathews's Radio Baton and Don Buchla's Lightning. These
latter devices can supply real-time spatial coordinate
sequences.
We have added neural-network objects to the MAX programming
environment (Puckette and Zicarelli 1990) so that our
experimentation can be carried out in a live-performance context.
Our quite preliminary results are encouraging. We have obtained
reliable recognition of complex guitar strumming gestures and
limited numbers of spatial gestures. In both of these preliminary
experiments the musician supplied a set of personal gestures,
each of which was to provoke a specific response, thus providing
the training materials for the back-propagation learning
algorithm.
With such procedures and much more research, we might conceivably
move towards adaptive, personalizable instruments. There is a
special and intriguing dilemma here. As suggested earlier, some
types of music, like jazz, emphasize the development of personal
playing styles. Others, like those based on traditional Western
performance practice, emphasize a standardization of playing
style. With adaptive instruments there will be a new twist: one
will have to decide when to standardize or fix the instrument and
let the musician learn the appropriate gesture and when to let
the instrument adapt to the specialized approach of a player. How
to rig the training harnesses on ourselves as players and on our
instruments as expressively responsive musical tools will be a
question of scientific, aesthetic, and social concern.
References
- Guyon, I. et al. 1991. "Design of a Neural Character
Recognizer for a Touch Terminal." Pattern Recognition
24(2): 105-119.
- Puckette, M., and D. Zicarelli. 1990. MAX: An
Interactive Graphic Programming Environment. Menlo Park,
CA: Opcode Systems, Inc.
- Rumelhart, D.E., G.E. Hinton, and R.J. Williams. 1986.
"Learning Internal Representations by Error Propagation." In
D.E. Rumelhart and J. McClelland, eds. Parallel Distributed
Processing: Explorations in the Microstructure of
Cognition, vol.1. Cambridge, MA: MIT Press, pp.
318-362.
- Waibel, A. et al. 1989. "Phoneme Recognition Using Time-Delay
Neural Networks." IEEE Transactions on Acoustics, Speech,
and Signal Processing 37(3): 328-339.
A Refined Palate Controller
David Wessel
adapted by Adrian Freed from Computer Music Journal, Vol. 15,
No. 4, Winter 1991
Alternatives to keyboard controllers, such as
Buchla's Thunder and Mathews's Radio Baton, are innovative, but
they exist in very small numbers. The use of traditional
instruments as controllers shows promise, but there are still
problems with pitch extraction, the keyboard bias of the MIDI
specification and its bottleneck data rate, and the fact that it
is still very difficult to readily outfit a musician's preferred
instrument, be it a Stradivarius or a Gibson guitar, with the
acoustic and positional gesture sensors that are required to make
it a refined controller.
In the hope of inspiring research and development I would like to
briefly describe some new and exciting developments in sensor
technologies that may be the basis for new generations of
alternate controllers. These technologies may also help solve
some of the problems of adapting traditional acoustic instruments
to be effective controllers.
The micromechanics research group led by R.S. Muller and R.M.
White at the Berkeley Sensor and Actuator Center has made some
astonishing advances in the construction of mechanical structures
using microfabrication techniques derived from integrated circuit
processes (Muller et al. 1990). These microdynamic silicon
structures with moving parts show promise for the design of
high-performance sensors and can be combined with on-chip
circuits for the processing of the sensor data. The Berkeley
group is producing pressure sensors as well as accelerometers
with this technology. These may be used in the construction of
very small and unobtrusive musical instrument transducer
systems.
With these tiny sensor technologies in mind, I would now like to
make a proposal for a controller that would sense the position of
the tongue. This may seem outrageous, but after all, the tongue
can be manipulated in an extremely refined manner. It is perhaps
the most precise voluntary motor control mechanism we have in our
bodies.
I imagine that this tongue controller could be realized in the
following way. The custom-fitted sensor system would use a
noninvasive dental retainer. Such retainers have been used for
decades in orthodontics. On the surface of the dental retainer
there would be placed an array of silicon-based pressure sensors
that would sense the planar image of the tongue as it came into
contact with the surface. Preprocessing of this tongue image
could be carried out by the on-chip circuits in the sensors.
Sensor data would be transmitted to the outside of the mouth in a
wireless manner.
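To make the preprocessing step a little more concrete, here is a minimal Python sketch that reduces one frame of a pressure-sensor grid to a contact area and a pressure-weighted centroid, two values that could serve as control signals. Everything specific in it is an assumption made for illustration: the function name `tongue_contact_summary`, the grid layout, the normalized readings, and the threshold value are not taken from any real sensor.

```python
def tongue_contact_summary(pressure, threshold=0.1):
    """Reduce a 2-D grid of pressure readings to contact area and centroid.

    pressure:  list of rows, each a list of non-negative sensor readings
    threshold: readings at or below this value count as no contact
    Returns (active_cells, (row_centroid, col_centroid)), or (0, None)
    when no sensor is pressed.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    active = 0
    for r, row in enumerate(pressure):
        for c, value in enumerate(row):
            if value > threshold:
                active += 1
                total += value
                row_sum += r * value
                col_sum += c * value
    if active == 0:
        return 0, None
    # Pressure-weighted mean position of the contact patch.
    return active, (row_sum / total, col_sum / total)

# Hypothetical 3 x 3 frame with the tongue touching two cells on the right.
frame = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
print(tongue_contact_summary(frame))
```

Sending only a few such summary values per frame, rather than the whole image, would also ease the demands on a wireless link.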
One of the advantages of this tongue controller is that it leaves
our other very finely tuned manipulators, the hands, completely
free, and it is, without a doubt, a very personal
controller.
References
- Muller, R.S. et al. 1990. Microsensors. New York: IEEE Press.
Source Model Loudspeakers
David Wessel
adapted by Adrian Freed from Computer Music Journal, Vol. 15,
No. 4, Winter 1991
Instead of placing a number of loudspeakers
in separate locations to achieve stereophony, surround sound, or
simulations of room information, I propose the development of a
loudspeaker system that is concentrated in a small and single
location and that has a programmable radiation pattern. It would
be the computer instrumentalist's sound source.
One motivation for this idea comes from a number of frustrating
experiences with chamber music that mixes electroacoustic and
traditional acoustical instrumental sources. Here the acoustical
instruments are unamplified but the electronic part is passed
through a stereo or quad system. The usual result is that the
electronic part and the acoustical instruments sound as if they
are operating in entirely different acoustical spaces.
The system I am proposing would be an amalgam of digital signal
processing and loudspeaker technologies. The system would allow
for time-variant control of the polar radiation pattern as a
function of frequency. It should be capable of simulating the
radiation characteristics of acoustic instruments that vary in
dramatic ways with the musical materials.
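The text does not say how such a programmable pattern would be produced. One possible approach is a compact cluster of drivers, each fed through its own frequency-dependent complex weight by the signal processor. The Python sketch below computes the far-field polar pattern of such a cluster under strong simplifying assumptions: ideal omnidirectional point sources, free-field conditions, and a listener far from the cluster. The function names `far_field_gain` and `polar_pattern` and the two-driver layout are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, air at room temperature

def far_field_gain(positions, weights, freq_hz, angle_rad):
    """Magnitude of the summed far-field response of ideal point sources.

    positions: (x, y) driver coordinates in metres
    weights:   one complex drive weight per driver at this frequency
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    total = 0j
    for (x, y), w in zip(positions, weights):
        # Phase offset of each driver as seen from the listening direction.
        total += w * cmath.exp(1j * k * (x * ux + y * uy))
    return abs(total)

def polar_pattern(positions, weights, freq_hz, steps=36):
    """Sample the gain at evenly spaced angles around the cluster."""
    return [
        far_field_gain(positions, weights, freq_hz, 2.0 * math.pi * i / steps)
        for i in range(steps)
    ]

# Hypothetical cluster: two drivers 0.5 m apart, driven in phase.
drivers = [(-0.25, 0.0), (0.25, 0.0)]
in_phase = [1.0, 1.0]
for freq in (34.3, 343.0):
    on_axis = far_field_gain(drivers, in_phase, freq, 0.0)
    broadside = far_field_gain(drivers, in_phase, freq, math.pi / 2)
    print(freq, round(on_axis, 3), round(broadside, 3))
```

Even this two-driver case is nearly omnidirectional at low frequencies and strongly directional where the spacing reaches half a wavelength. Choosing a different set of weights for each frequency band, and changing them over time, is what would make the pattern programmable.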
A look at the literature on acoustic instrument radiation (for a
summary see Fletcher and Rossing 1991) shows that the polar
radiation patterns vary considerably as a function of frequency.
Furthermore, the nature of this variability is different across
the various instruments. This is true not only for musical
instruments but also for other natural sound sources. One is also
struck by the fact that conventional loudspeaker radiation
patterns are much more homogeneous than those of the acoustic
sources. Indeed, much effort has gone into the design of speakers
so that they behave in this more neutral manner. Furthermore,
conventional loudspeakers, when viewed across the spectrum, are
much more directional than the majority of natural sound
sources.
I am not proposing this type of speaker system as a replacement
for the stereo or surround-the-listener approach. I view it as a
complement to these approaches with some special advantages. Such
systems would allow the computer musician to perform in an
acoustically compatible manner with traditional instrumentalists.
If manufactured with portability in mind, they would let performers take
fuller responsibility for their sound, facilitating that more
personalized sound that I have argued for earlier. The synthesist
could specify the time-variant radiation pattern behaviors in a
way that is tightly coupled with the synthesis algorithm.
Finally, there is a social implication. If such systems prove
satisfactory, they would help calm the religious wars between
"acoustic" and "electronic" performers and stimulate the
development and performance of an intimate and mixed music that
would thrive in small private and public spaces.
References
- Fletcher, N.H., and T.D. Rossing. 1991. The Physics of Musical Instruments. New York: Springer-Verlag.