The human experience of music is complex. The brain recruits multiple neural processes to convert signals from the outside world into a multidimensional musical experience. These processes enable the
perception and memory of pitch, rhythm, melody, harmony,
tonality, and timbre, as well as knowledge of musical
structures, preference for features and items, and expertise in
the perception and production of music. To ask about the nature
of musical experience, then, is to ask about the confluence of
various sub-disciplines in psychology and neuroscience.
Findings in music perception and cognition could have
widespread generalizability across various fields including
perception, attention, language, learning and memory, cognitive
neuroscience, and emotional function (Schmuckler, 1997).

A Model of Musical Experience

This dissertation comes from the interdisciplinary field of music perception and cognition, which seeks to account scientifically for the experience of music in the human mind. A model of musical experience is presented in Figure 1.

Figure 1. A model of musical
experience.

As is laid out in the model, musical
experience incorporates perception, cognition, and emotion,
which can be investigated via converging methods in psychology
and neuroscience. The perception of music requires multiple
brain processes that convert the auditory signal, with its
psychoacoustic features such as frequency and duration, into
attributes of the auditory experience such as pitch and rhythm.
Music cognition, which deals primarily with the acquisition and
interpretation of musical knowledge, includes attention,
memory, the acquisition of expertise (learning), and the
formation of expectations (predictions, anticipations) for
sounds in the immediate future. Emotional function in music
includes preference, valence, and arousal, and may arise from
individual differences or as a function of experience.

The central point of this dissertation
is that humans can acquire many aspects of musical experience
via exposure. Within the scope of this thesis, I investigate
the roles of expectation, familiarity, arousal, preference, and
the psychoacoustical factors of harmony and timbre in the
learning of a new musical system. 

Grammatical Structures in Music

Generally speaking, music consists of a series of sounds. The sequential presentation of pitches gives
rise to melody, which is thought of as the horizontal dimension
of music (Piston & DeVoto, 1987; Tramo, Cariani, Delgutte,
& Braida, 2001). Learning melodies involves auditory stream
formation (Bregman, 1990) and Gestalt factors (Meyer, 1956),
which may be innate or learned at a very early age (Meyer,
1956; Huron, 2004).

In addition to pitches
presented sequentially, music can also consist of pitches
presented simultaneously. This simultaneous presentation of
pitches gives rise to chords and chord progressions, which can
be construed as harmony, or the vertical dimension of music
(Piston & DeVoto, 1987; Tramo et al., 2001).

It has been proposed that the
vertical and horizontal aspects of organization in music
together form a set of grammatical rules (Lerdahl, 1992;
Bernstein, 1973) which a listener must use in order to
understand and appreciate music as an art form. The importance
of grammatical structure also applies to the composer, who
selects material for a composition according to grammatical
constraints. Thus, music for both the listener and the composer
relies on rules and preferences which can be construed as
musical grammars (Lerdahl & Jackendoff, 1983). A true
musical experience, according to Lerdahl (1992), requires the
alignment of the compositional grammar with the listening
grammar.

The Human Ability to Learn Music

The human brain has repeatedly been shown to possess knowledge of grammatical structures in music. For instance, knowledge of musical harmony has been demonstrated in both musically trained and untrained listeners (e.g. Krumhansl & Kessler, 1983). Reaction time studies have shown that performance on a variety of tasks, including consonance detection (Bharucha & Stoeckig, 1986), timbre discrimination (Tillmann, Bigand, Escoffier, & Lalitte, 2006), contour discrimination (Loui & Wessel, 2007), and even phoneme identification (Poulin-Charronnat, Bigand, Madurell, & Peereman, 2005) and visual target identification (Escoffier & Tillmann, 2006), is superior when the target is accompanied by expected harmonies as dictated by common-practice Western music theory. In the electrophysiological literature, grammatically unexpected notes and chords have been shown to elicit several components of event-related brain potentials. The Late Positive Component, a parietally-centered positive waveform, is largest around 600 ms after the onset of a melodically unexpected note (Besson & Faïta, 1995). In contrast, harmonically unexpected chords elicit an Early Anterior Negativity at around 200 ms, followed by a Late Negativity that is largest prefrontally at around 500 ms (Koelsch, Gunter, Friederici, & Schröger, 2000). Individuals without musical training show similar neural responses to unexpected harmonic and melodic endings (harmonic endings: Koelsch et al., 2000; melodic endings: Besson & Faïta, 1995), although these effects may be diminished or delayed in nonmusicians (Besson & Faïta, 1995).

From a developmental standpoint,
efforts have been made to understand the knowledge of musical
harmony in children and infants. Starting as young as 9 months
of age, infants prefer consonant harmonic intervals over
dissonant ones (Schellenberg & Trehub, 1996). Children as
young as 4-5 years of age show the same event-related potentials as adults in response to unexpected chords (Koelsch et al., 2005). At six years of age, children are faster and more accurate at perceptual tasks when the target co-occurs with an expected chord in a harmonic progression (Schellenberg et al., 2005). As a whole, these developmental
studies suggest that musical knowledge and expectations are
already well-formed early in life.

Musical knowledge and expectations have
also been examined cross-culturally. A number of researchers
have compared findings in the Western harmonic system against
musical systems of other cultures such as Indonesian scales
(Castellano et al., 1986) and the North Sami Yoiks of Finland (Krumhansl et al., 2000). These findings confirm that individuals from other cultures demonstrate knowledge of the underlying statistics of their own musical culture, much as individuals exposed to Western music do.

Taken together, results from the literature demonstrate that, from a young age, humans have robust knowledge of and sensitivity to the music of their culture. These results have led researchers to ask about the source of musical knowledge. One possibility is the existence of innate psychoacoustical principles (such as sensory consonance and dissonance) that underlie the perception of musical rules and structures. Another possibility is that knowledge of musical harmony is learned via exposure to the music of the culture. Musical training often includes explicit instruction in the principles of traditional music theory, resulting in explicit knowledge of Western musical harmony. However, studies have shown that humans without explicit musical training also evince strong behavioral and physiological responses to common-practice Western musical harmony (see e.g. Bharucha & Stoeckig, 1986; Koelsch et al., 2000), demonstrating that explicit training is not required. It would seem, therefore, that knowledge of the principles of musical grammar can be implicitly acquired via exposure to music (Meyer, 1956; Huron, 2006). While nature and nurture probably both contribute to the music we enjoy today, their relative contributions to musical systems remain unknown. To address the question of the source of musical knowledge, we propose a new approach to studying music learning: the learning of artificial musical systems.

The present dissertation describes the design and implementation of a new musical system, and a series of studies in which we use this system to investigate the human ability to learn, especially to learn music. The hope is that the approach used here will advance understanding of the human experience of music, as well as of the perceptual, cognitive, and emotional constraints on learning in a more domain-general sense. Before detailing the artificial musical system, however, the rationale for these experiments must be laid out; thus I begin with a review of the literature to date on the learning of artificial systems outside of music, which forms the background to our own music learning studies.

Studies on Artificial Grammar and Statistical
Learning

The central point of this thesis is that humans can rapidly learn a novel musical system. One rationale for this comes from the recent field of statistical learning in language acquisition. Starting with Saffran, Aslin, and Newport's (1996) finding that eight-month-old infants demarcate boundaries between artificial words after two minutes of exposure to an artificial speech stream, statistical learning has grown into a substantial body of literature demonstrating that humans can acquire various aspects of language by computing the statistical properties of sounds, words, and word categories. Aspects of language that have been shown to be acquired statistically include word boundaries (Saffran, Aslin, & Newport, 1996), phonotactic probabilities (Thiessen & Saffran, 2003), phoneme categories (Maye, Werker, & Gerken, 2002), and word meaning (Hudson Kam & Newport). The statistical learning mechanism is sensitive to transitional probability (Aslin, Saffran, & Newport, 1998), but the items learned need not be adjacent in time: statistical learning has been shown to operate over nonadjacent but statistically dependent word segments (Newport & Aslin, 2004), as well as over nonadjacent dependencies between word-like units when there is sufficient variability in the intervening units (Gomez & Schvaneveldt, 2002).
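To make the notion of transitional probability concrete, the following sketch (an illustrative example written for this exposition, not code or stimuli from the cited studies) computes first-order transitional probabilities over a toy syllable stream. Within-word transitions come out more probable than transitions across word boundaries, which is the statistical cue infants appear to exploit.

```python
# Minimal illustrative sketch: first-order transitional probabilities over a
# toy syllable stream, in the spirit of Saffran et al. (1996). The "words"
# bidaku and padoti are arbitrary placeholders, not the original stimuli.
from collections import Counter

def transitional_probabilities(seq):
    """Return P(next | current) for each adjacent pair of items in seq."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# A continuous stream formed by concatenating the two made-up words.
stream = "bidaku padoti bidaku bidaku padoti padoti bidaku padoti".replace(" ", "")
syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

tps = transitional_probabilities(syllables)
print(tps[("bi", "da")])  # within-word pair: transitional probability = 1.0
print(tps[("ku", "pa")])  # across a word boundary: transitional probability < 1.0
```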

Data suggest that statistical learning
may not be tied specifically to language, but may reflect a
domain-general mechanism. Statistical learning has been shown
to operate for tone groups (Saffran, Johnson, Aslin, &
Newport, 1999), visual patterns (Kirkham, Slemmer, &
Johnson, 2002), and motor sequences (Hunt & Aslin, 1999).
Also suggestive is the finding that similar learning occurs in non-human primates (Hauser, Newport, & Aslin, 2001).

In music, learning has been
demonstrated with sequences of tones varying in pitch (Saffran
et al., 1999) and timbre (Tillmann & McAdams, 2004).
However, much of the grammar of common-practice Western music relies on harmony and chord progressions (e.g. see Janata et al., 2002), and artificial grammars of harmony and chord progressions remain relatively unexplored (although, based on personal communication, several ongoing efforts treat musical harmonies as artificial grammars; e.g. McMullen & Saffran, 2006).

Regarding the relationship between statistical learning and other non-linguistic domains, it has been proposed (Perruchet & Pacton, 2006) that statistical learning studies and implicit learning studies, such as those using the Serial Reaction Time task (e.g. Cohen, Ivry, & Keele, 1991), fundamentally reflect the same underlying mental processes, namely the computation of frequencies and transitional probabilities. A similar view is reflected in Reber's (1989) claim that artificial grammar learning must rely on implicit learning processes, since explicit knowledge of the artificial grammar only seems to aid learning when it is presented along the same relevant dimension as the grammar itself. Among existing grammar-learning studies, finite-state grammars constitute one of the most thoroughly studied paradigms in the laboratory setting.

A finite-state grammar typically consists of a set of nodes connected by possible paths (e.g. Reber, 1967, 1989; Gomez & Gerken, 1999). A legal string in a given grammar consists of a series of items generated by following one of the possible pathways leading from the first node to the last. Learning is a process leading to the successful identification (recognition) of grammatical strings, and especially to the successful identification of new grammatical strings as legal in the grammar (generalization). The learning of finite-state grammars has been demonstrated with various surface instantiations of the grammar, including letter strings (Reber, Walkenfeld, & Hernstadt, 1991; Gomez & Schvaneveldt, 1994), syllables (Gomez & Gerken, 1999), keyboard symbols, and Chinese and Japanese characters (Zizak & Reber, 2004).
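As a concrete illustration, the sketch below generates legal strings from a small, hypothetical finite-state grammar constructed for this exposition (it is not one of the grammars used in the cited studies): each node lists its outgoing transitions, and a grammatical string is produced by walking a random path from the start node to the terminal node.

```python
# Hypothetical finite-state grammar for illustration only (not taken from
# Reber's studies). Each node maps to its available transitions, given as
# (emitted symbol, next node) pairs.
import random

GRAMMAR = {
    "S0": [("T", "S1"), ("P", "S2")],
    "S1": [("S", "S1"), ("X", "S2")],
    "S2": [("V", "S3"), ("X", "S1")],
    "S3": [],  # terminal node: no outgoing paths
}

def generate_legal_string(grammar, start="S0", end="S3"):
    """Generate one grammatical string by following a random path from start to end."""
    node, symbols = start, []
    while node != end:
        symbol, node = random.choice(grammar[node])
        symbols.append(symbol)
    return "".join(symbols)

print(generate_legal_string(GRAMMAR))  # e.g. "TXV" or "PXSXV"
```

A recognition test would then present strings generated in this way alongside foil strings that violate the pathways, while a generalization test would use new grammatical strings not encountered during exposure.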
Interestingly, some experiments have reported not only recognition and generalization of grammatical strings, but also an increased preference for grammatical items (Zizak & Reber, 2004). This effect of exposure on preference, coined the Mere Exposure Effect by Zajonc (1968), was originally demonstrated as an increased preference for previously encountered, familiar items, and has subsequently been replicated with various stimuli (reviewed in Zajonc, 2001), including music (Tan et al., 2006). Furthermore, the Mere Exposure Effect is stronger when stimuli are presented subliminally (i.e. for brief time periods below the threshold of conscious awareness; see Bornstein & D'Agostino, 1992), suggesting that it taps into implicit rather than explicit processes.

A New Musical System for Studying
Learning

In the present dissertation we
investigate the acquisition of artificial musical grammars,
with the specific intentions of 1) examining behavioral and
neural indices of the human ability to learn and like new
music, and 2) demarcating the conditions, both of the sound
signal and of the individual's cognitive state, that optimize
learning and preference change. We designed two new musical
grammars based on a microtonal tuning system, very different
from that used in standard Western music. The musical grammars
were constructed such that they obeyed some psychoacoustic and
Gestalt principles, but were completely novel to all of our participants. We presented adult participants with melodies
generated according to the grammars, and examined their
learning of the principles underlying those melodies. Using an artificial musical system to investigate knowledge acquisition offers advantages over existing developmental and cross-cultural approaches, as it allows for a high degree of control. By providing an organizing
system of sounds that is novel to all experimental
participants, we can investigate learning in a controlled
environment where we can systematically manipulate the auditory
input and observe the extent to which musical knowledge is
acquired.
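The specific tuning system and grammars are detailed later in this dissertation; as a purely hypothetical illustration of how a microtonal tuning can be specified, the sketch below derives scale pitches by dividing a reference interval into equal logarithmic steps. The parameters shown (11 steps per octave, a 220 Hz reference) are arbitrary and are not the parameters of the system used in this work.

```python
# Purely hypothetical illustration of specifying a microtonal tuning: divide a
# reference interval into equal logarithmic steps. The parameter values below
# are arbitrary and are not those of the tuning system used in this work.
def equal_tempered_scale(base_hz=220.0, steps_per_interval=11, interval_ratio=2.0):
    """Return the frequencies (Hz) of one span of an equal-tempered scale."""
    return [base_hz * interval_ratio ** (k / steps_per_interval)
            for k in range(steps_per_interval + 1)]

for degree, freq in enumerate(equal_tempered_scale()):
    print(f"scale degree {degree}: {freq:.2f} Hz")
```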