Similarity may seem to be an irreducible psychological primitive, like
“red”, but various theorists have tried to show how it relates to
other fundamental considerations. Let us assume, as do most writers,
that stimulus objects are internally represented, and that similarity between
objects comes from some sort of comparison between their representations.
One then confronts several interlocking questions, for example: (1) What
information is carried in the stimulus representations? (2)
How is this information combined or structured within the representation?
(3) How are representations compared in arriving at a “similarity”?
(4) Given a set of stimuli, how are their similarities determined
and best represented?
The rest of this section summarizes five different theoretical approaches to
similarity. It is not a comprehensive review of this large and complex
subject; rather, it highlights aspects that one may encounter in the
avian literature or that, as with feature models, seem to offer promising
lines for future research with animals. Different approaches
stress one or the other of the above questions, making their appearance
in a single list somewhat problematic. Consideration of the last question
is postponed to the measurement section of this chapter.
Common Elements Approach
Figure 2. The elements of two stimuli are represented by "x"s.
The proportion of elements common to the stimuli (in red)
determines their similarity.
One of the best known attempts to suggest processes underlying similarity
or generalization follows from the representation of stimuli as collections
of elements (e.g. Estes, 1955). In applications to conditioning (e.g.
Wagner, 1981), the elements typically carry excitation or inhibition, and
they mediate transfer by appearing in more than one stimulus, as suggested
in Figure 2. If desired, similarity can be
calculated by counting up numbers of common elements relative to other
elements and/or by summing their values.
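The core calculation is simple enough to sketch in code. A minimal version, assuming stimuli are modeled as sets of hypothetical element labels, computes similarity as the proportion of shared elements:

```python
# Sketch of the common-elements idea: a stimulus is a set of elements,
# and similarity is the proportion of elements the two stimuli share.

def common_element_similarity(a, b):
    """Shared elements divided by all distinct elements (one simple choice)."""
    shared = a & b
    total = a | b
    return len(shared) / len(total) if total else 0.0

# Two hypothetical stimuli that share two of their six distinct elements:
s1 = {"e1", "e2", "e3", "e4"}
s2 = {"e3", "e4", "e5", "e6"}
sim = common_element_similarity(s1, s2)   # 2 shared of 6 distinct -> 1/3
```

One could equally well sum element values rather than count elements, as the text notes; the set-based version is just the most transparent case.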
This scheme was developed primarily to model associative processes,
and it is ill suited to handle similarity among stimuli of any complexity.
However, it can yield useful quantitative predictions of generalization
phenomena along a simple continuum by assuming a series of stimuli each
of which shares elements with its neighbors. Thus, reinforcement
strengthens elements contained in the reinforced stimulus, and responding
generalizes to other stimuli to the extent that they contain elements in
common with the reinforced stimulus.
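This mechanism can be sketched directly. In the toy version below, stimulus i activates elements i-1, i, i+1, so neighbors share two elements; reinforcing one stimulus and probing the others yields a graded generalization gradient. The learning rate, element layout, and trial count are illustrative assumptions, not values from any published model.

```python
# Generalization via shared elements: stimulus i activates elements
# i-1, i, i+1, so adjacent stimuli overlap. Parameters are illustrative.
N_ELEMENTS = 11
ALPHA = 0.3                   # learning rate
V = [0.0] * N_ELEMENTS        # associative strength of each element

def active(i):
    """Elements activated by stimulus i (shared with its neighbors)."""
    return [e for e in (i - 1, i, i + 1) if 0 <= e < N_ELEMENTS]

def train(i, reinforced, trials=100):
    """Rescorla-Wagner style updates on the elements stimulus i activates."""
    lam = 1.0 if reinforced else 0.0
    for _ in range(trials):
        total = sum(V[e] for e in active(i))
        delta = ALPHA * (lam - total)
        for e in active(i):
            V[e] += delta

def response(i):
    """Response strength: summed strength of the stimulus's elements."""
    return sum(V[e] for e in active(i))

train(5, reinforced=True)
# Responding falls off with distance from the reinforced stimulus in
# proportion to element overlap:
gradient = [round(response(i), 2) for i in range(3, 8)]
```

After training, the reinforced stimulus commands the strongest responding, its immediate neighbors (sharing two elements) about two thirds as much, the next stimuli (sharing one element) about one third, and more distant stimuli none.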
Appropriately conjoined with other theoretical elements, this scheme
applies beyond simple generalization. For example, I used a variation
of this approach, coupled with the basic associative process of the Rescorla-Wagner
model (Rescorla & Wagner, 1972) to predict the remarkable phenomenon
of dimensional contrast (D. Blough, 1975; see also D. Blough, 1983). The phenomenon is a behavioral "edge effect"; it arises, for example,
when all stimuli on a continuum are reinforced except one "negative" stimulus,
which appears frequently without reinforcement. Paradoxically, subjects
then respond more strongly to stimuli fairly similar to the negative stimulus
than they do to more distant stimuli. Data from pigeons that exemplify
this effect appear in Figure 3. The
model that generates this dimensional contrast effect is diagrammed in Figure 4.
Here, stimuli adjacent on the abscissa share common elements. The
series of graphs suggests how the presentation of successive unreinforced
and reinforced stimuli alters the associative strength of these elements,
resulting in the development of the contrast “edge” effect.
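A stripped-down simulation in this spirit can be sketched as follows. The element layout, learning rate, and trial schedule are all assumptions chosen for illustration; the toy version shows how frequent unreinforced presentations carve a dip into the gradient, though it does not reproduce every detail of the published model.

```python
# Toy element-sharing simulation of a dimensional-contrast schedule.
# All parameter values and the trial schedule are illustrative assumptions.
N = 11                       # stimulus positions 0..10 on a continuum
S_MINUS = 5                  # the frequent, unreinforced stimulus
ALPHA = 0.15                 # learning rate

V = [0.0] * N                # one element per stimulus position

def active(i):
    """Stimulus i activates its own element and its neighbors' elements."""
    return [e for e in (i - 1, i, i + 1) if 0 <= e < N]

def trial(i, lam):
    """One Rescorla-Wagner style update; lam is the reinforcement target."""
    total = sum(V[e] for e in active(i))
    delta = ALPHA * (lam - total)
    for e in active(i):
        V[e] += delta

for epoch in range(300):
    for i in range(N):
        if i != S_MINUS:
            trial(i, 1.0)    # every other stimulus is reinforced
    for _ in range(4):
        trial(S_MINUS, 0.0)  # the negative stimulus appears frequently

responses = [sum(V[e] for e in active(i)) for i in range(N)]
# responses dips sharply at S_MINUS; in the full model, the interactions
# among shared elements during training also elevate responding to the
# stimuli flanking the dip.
```

Because elements shared with the negative stimulus are repeatedly driven down, reinforced trials on neighboring stimuli must push their unique elements up further than elsewhere on the continuum, which is the seed of the contrast "edge."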
As just illustrated, stimuli defined in terms of undifferentiated elements
may help to clarify some aspects of learning and stimulus discrimination.
For most purposes, however, this simple scheme does not adequately represent
the similarities among complex stimuli.
We turn next to stimulus representations that carry more information.
Template Approach
Figure 5. Template
models use a point to point correspondence check (green
lines) between image-like representations. The
chance that the representations will be judged similar
may be improved by using a fuzzy representation (here,
filtered to remove high spatial frequencies), among
other transformations. In this figure, the sharp
representation might be from current input, the fuzzy
one a representation drawn from memory.
Template models were developed as an answer to the problem of object
recognition, and they incorporate at least implicitly the idea of similarity
comparison. The representations assumed by template models carry
much more detailed information about stimulus structure than do the element
representations just described. These models are usually applied
to spatially extended visual objects, and their representation can be thought
of as being spatially organized. Similarity is based on the degree
of correspondence in a point-for-point spatial comparison between the representations
being compared, as suggested in Figure 5. Such models have often
been dismissed because they seemed incapable of detecting similarities
among forms that are displaced, rotated, or enlarged. However, such
objections have been countered by evidence for preprocessing operations
that may transform forms to comparable orientation or size; there
is also evidence that when training is controlled, subjects may not, after
all, generalize very easily to objects that are expanded, contracted
or rotated (e.g. Tarr & Bulthoff, 1998).
I used a template scheme in a modestly successful attempt to predict
data on pigeon alphabetic letter and random-dot form similarities from
superposition of fuzzy representations (D. Blough, 1985). As the
upper form in Figure 5 suggests, a fuzzy remembered representation
of a reinforced target was compared with a representation of the stimulus
input. As in many such schemes, the forms were first brought into
correspondence around their centers of gravity. The “fuzziness” then
allowed the comparator to register degrees of correspondence even when
an exact match did not occur at a given point of comparison.
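The gist of this fuzzy-template comparison can be sketched in code. Forms are sets of points; each form is centered on its center of gravity, blurred by spreading weight to neighboring locations, and then compared point by point. The falloff function and the overlap measure below are assumptions for illustration, not the actual choices of D. Blough (1985).

```python
# Sketch of a fuzzy-template comparison between two point-set forms.
# The blur falloff and overlap measure are illustrative assumptions.

def centered(points):
    """Shift a form so its center of gravity sits at the origin (rounded)."""
    n = len(points)
    cx = round(sum(x for x, _ in points) / n)
    cy = round(sum(y for _, y in points) / n)
    return {(x - cx, y - cy) for x, y in points}

def fuzzy_map(points, spread=1):
    """Spread each point's weight over a neighborhood: a crude 'blur'."""
    m = {}
    for x, y in points:
        for dx in range(-spread, spread + 1):
            for dy in range(-spread, spread + 1):
                w = 1.0 / (1 + abs(dx) + abs(dy))    # assumed falloff
                m[(x + dx, y + dy)] = max(m.get((x + dx, y + dy), 0.0), w)
    return m

def template_similarity(a, b, spread=1):
    """Point-for-point overlap of the two fuzzy maps after centering."""
    fa, fb = fuzzy_map(centered(a), spread), fuzzy_map(centered(b), spread)
    keys = set(fa) | set(fb)
    overlap = sum(min(fa.get(k, 0.0), fb.get(k, 0.0)) for k in keys)
    total = sum(max(fa.get(k, 0.0), fb.get(k, 0.0)) for k in keys)
    return overlap / total

T = {(0, 0), (1, 0), (2, 0), (1, 1), (1, 2)}      # a small "T"-like form
shifted = {(x + 5, y + 3) for x, y in T}          # same form, displaced
L = {(0, 0), (0, 1), (0, 2), (1, 0)}              # a different form
```

Because the comparison happens after centering, displacement alone leaves the match perfect, while the fuzziness lets a genuinely different form still register a partial match rather than zero.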
A more general and sophisticated example in the avian literature is
the template matching that appears as a component of the theory of pattern
recognition proposed by Heinemann & Chase (1990), which is an extension
of a general model for learning and behavior (e.g. Heinemann, 1983; Chase
& Heinemann, 2001).
This model represents stimulus objects by pixels, each of which is characterized
by its spatial coordinates, as well as by hue, saturation and brightness.
Such representations exist both as current input and as exemplars of previous
inputs that are stored in memory. Simulations with the model
involve the calculation of the correspondence between a current input and
a sample of exemplars drawn from memory. The memory representations
are fuzzy in that pixels in the memory representation are represented by
bivariate Gaussian distributions, so that points near to, but not exactly
corresponding with, input pixels can contribute to a match between input
and memory representations. Variability in the similarity computation
also arises from Gaussian noise added to the memory items at the time of
storage, and from variation in both input and stored items due to stimulus
distance and similar factors in the images.
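The two key ingredients, storage noise and Gaussian fuzziness around stored pixels, can be sketched as follows. The noise level and Gaussian width are illustrative assumptions, and spatial coordinates stand in for the full pixel description (which also includes hue, saturation, and brightness).

```python
import math
import random

# Sketch of pixel-level matching against a fuzzy memory representation.
# Noise level and Gaussian width are illustrative assumptions.

def stored_exemplar(pixels, noise_sd=0.5, seed=0):
    """Store pixel coordinates with Gaussian noise added at storage time."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0, noise_sd), y + rng.gauss(0, noise_sd))
            for x, y in pixels]

def correspondence(input_pixels, memory_pixels, sd=1.0):
    """Mean best-match score: each memory pixel acts as an isotropic
    bivariate Gaussian, so near misses still contribute to the match."""
    def g(p, q):
        d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return math.exp(-d2 / (2.0 * sd * sd))
    return (sum(max(g(p, m) for m in memory_pixels) for p in input_pixels)
            / len(input_pixels))

shape = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
memory = stored_exemplar(shape)        # a noisy stored version of the shape
score = correspondence(shape, memory)  # high, but imperfect due to noise
```

A simulation along Heinemann and Chase's lines would compare the current input against a sample of such stored exemplars rather than a single one.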
The Heinemann and Chase template model has provided good fits
to a variety of data, including generalization to 2-dimensional forms altered
in size, position, and rotation (Heinemann & Chase, 1990) and alterations
of visual context (Donis, Heinemann & Chase, 1994). It was about
as successful in predicting letter and random-dot form similarities as
my fuzzy template model (D. Blough, 1985; Heinemann & Chase, 1990).
All in all, this seems a promising approach well worth exploring in future research.
Geometric Approach
Figure 6. This set of birds is arranged in
a similarity space of two dimensions. Small birds go near the
top, big ones near the bottom. Red birds go to the left,
yellow ones to the right. Intermediate values on either
dimension find their appropriate places. In the simplest case,
the length of a straight line drawn between any two birds would
correspond to their empirically determined similarity. The
section on measurement
goes into that matter in some detail.
The geometric approach stresses the representation of similarity relationships
among the members of a set of objects. An individual stimulus object
is represented simply by its coordinates in a "similarity space."
Similarity is given by distance between objects in this space; the closer
together two objects are, the more similar they are. The approach
assumes (1) that objects can be represented by values on a few continuous
dimensions, and (2) that similarity can be represented by distance in a
coordinate space. Figure 6 shows an example of stimulus
objects placed in a space defined by size and color dimensions.
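In code, this representation is nothing more than coordinates plus a distance function; the coordinate values below are invented for illustration.

```python
import math

# Hypothetical coordinates on (size, color) dimensions; values are invented.
birds = {
    "small_red":    (1.0, 0.0),
    "small_orange": (1.0, 0.5),
    "large_red":    (3.0, 0.0),
}

def euclidean(a, b):
    """Dissimilarity as straight-line distance in the similarity space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The closer two birds lie in the space, the more similar they are:
d_color = euclidean(birds["small_red"], birds["small_orange"])  # 0.5
d_size = euclidean(birds["small_red"], birds["large_red"])      # 2.0
```

On this toy layout, the small red and small orange birds are much closer, hence more similar, than the small red and large red birds.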
The geometric approach to similarity does not go beyond the simplest
representation of objects themselves, and in itself has little to say about
the cognitive processes through which similarity relationships are determined
(but see Shepard, 1987). However, the outcomes of scaling procedures
based on the geometric idea can suggest the qualitative nature of the dimensions
on which stimulus representations may vary and they can also suggest how
that information is combined. Further, such procedures yield information
on the relationship between similarity and the data input to scaling algorithms.
The most notable example is Shepard's Universal Law of Generalization,
which states that the probability of generalization decays exponentially
with dissimilarity, and does so in accordance with one of two metrics.
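The law is easy to state in code: pick a metric (Euclidean or city-block), then let generalization fall off exponentially with distance. The scale parameter is an assumption.

```python
import math

def minkowski(a, b, p=2):
    """Distance between two points: p=2 is Euclidean, p=1 is city-block."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def generalization(a, b, p=2, scale=1.0):
    """Shepard's law: generalization decays exponentially with distance."""
    return math.exp(-minkowski(a, b, p) / scale)

# The two metrics can disagree: for points (0, 0) and (3, 4), Euclidean
# distance is 5 while city-block distance is 7, so predicted
# generalization is lower under the city-block metric.
```

Which metric fits depends on the stimuli; integral dimensions are often associated with the Euclidean metric and separable dimensions with the city-block metric.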
A more detailed look at these matters is given below in the section on measurement.
Although the geometric approach has theoretical beauty and practical
advantages, its assumptions limit its applicability. In a classic
article, Tversky (1977) pointed out that both of the major assumptions
of the geometric approach are open to question. First, if dissimilarity
is to be represented as a metric distance, it must follow the three metric
axioms of minimality, symmetry, and the triangle inequality (see Notes
on the Metric Axioms), but data contrary to these axioms have arisen
in various experimental situations with humans. Secondly, few stimuli
differ from each other in only a few continuous dimensions such as size
and color. Most stimuli seem to be more effectively described by
the presence or absence of qualitative features. We consider these
objections in turn.
Most of Tversky’s examples of the failure of metric axioms are
based on human judgments involving abstract relations among objects.
For example, minimality implies that an object is most similar to itself,
but sometimes an object is identified more often as another object than
as itself. Also, the probability that two identical objects are judged
“same” varies with the objects judged. Symmetry implies that object A is
as similar to object B as B is to A, but this often fails; for example
North Korea is judged more similar to China than is China to North Korea.
The failure of these axioms in avian data could be helpful in tests
of theoretical accounts, but few examples seem to be available.
Minimality seems to be violated when, as sometimes happens in a generalization
test, a pigeon significantly and repeatedly responds more strongly
to a novel stimulus than to the training stimulus. Symmetry
failed in a discrimination task in which pairs of letters appeared on the
display screen, one letter as target, the other as distractor. For
some letter pairs, performance was distinctly better when one of the two
letters was the target than when the other letter was the target (D. Blough,
1985). In that case the asymmetry could arise from a preference
or bias; Tversky lists various other sources.
Feature Approach
As mentioned above, a second problem with the geometric model is that
even with its metric assumptions intact the approach seems inappropriate
for objects that seem to differ in a number of qualitative ways rather
than in a few ways that correspond to continuous dimensions.
For this reason, Tversky and others have assumed that an object is represented
by a set of features or attributes. Usually these are binary variables
(e.g., voiced or unvoiced consonant) or parts that are present or not (e.g.,
eyes; tail; horizontal bar), but they may be ordered sets of properties
like color or size.
Figure 7. Representation of two objects
that each contains its own unique features and also contains
common features. An important aspect of Tversky's model is
that similarity depends not only on the proportion of features
common to the two objects but also on their unique
features. Each letter here represents a feature.
Tversky's “Contrast Model” (1977) systematizes this feature approach.
A central assumption of the model is that the similarity of object
a to object b is a function of the features common to a and b
("A and B"), those in a but not in b (symbolized
"A-B"), and those in b but not in a ("B-A"). A diagram
exemplifying this appears in Figure 7. Note especially that
similarity is not just a function of common features, but depends also
on features that are unique to each object, and that the relative importance
of these varies with the parameters y and z.
Based on this and several other assumptions, Tversky derived the following:
(1)   S(a,b) = x·f(A and B) − y·f(A-B) − z·f(B-A)
Here, S is an interval scale of similarity, f is
an interval scale that reflects the salience of the various features,
and x, y and z are parameters that provide
for differences in focus on the different components.
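Equation (1) is straightforward to render in code. In the sketch below, the salience scale f and the x, y, z values are illustrative assumptions; with y > z, the measure is asymmetric in just the way Tversky's North Korea/China example suggests.

```python
def contrast_similarity(a, b, salience, x=1.0, y=0.5, z=0.2):
    """Tversky's contrast model, equation (1):
    S(a,b) = x*f(A and B) - y*f(A-B) - z*f(B-A).
    The salience scale f and the x, y, z values are illustrative."""
    def f(s):
        return sum(salience[feat] for feat in s)
    return x * f(a & b) - y * f(a - b) - z * f(b - a)

# With y > z, a sparsely featured object is judged more similar to a
# richly featured one than the reverse (hypothetical features, equal salience):
salience = {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f4": 1.0, "f5": 1.0}
sparse = {"f1", "f2", "f3"}
rich = {"f1", "f2", "f3", "f4", "f5"}
s_sparse_to_rich = contrast_similarity(sparse, rich, salience)  # 3 - 0 - 0.4
s_rich_to_sparse = contrast_similarity(rich, sparse, salience)  # 3 - 1.0 - 0
```

The asymmetry comes entirely from the distinct weights on the two sets of unique features: the subject's unique features (weighted by y) count against similarity more than the referent's (weighted by z).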
This formulation makes principled sense of several characteristics of
similarity data that contradict the metric assumptions discussed above.
The most troubling is probably asymmetry. This often goes along with
task asymmetry; for example, "how similar is A to B"
may give a different answer than "how similar is B to A". Avian
examples of task asymmetry are the generalization test, where a visible
test stimulus is compared with a remembered training stimulus, and the
search task, where a remembered, searched-for target is compared with irrelevant
distractors. Tversky suggests that when the subject focuses on a
particular stimulus, such as the search target, the features of that stimulus
are weighted more heavily than the features of alternative comparison stimuli.
Thus, in Figure 7, if Object a is the focus
of attention, its features (shown in red and green) will tend to
be heavily weighted; those unique to it (green) are the ones that
asymmetrically affect the similarity computation, for the parameter y
in equation (1) is larger than
z. If, instead, Object
b becomes primary, its features are more heavily weighted and a different
similarity S is computed in equation (1).
Beyond attention or the role of the stimulus in the task,
the number and salience of unique features can also affect the computation,
as suggested by the size and number of features allotted to Object b
in Figure 7. Apart from the examples suggested above, little note has been
taken in the non-human literature of the considerable analytic possibilities
that Tversky’s approach may suggest for avian cognition.
Geon Approach
Like template theory, Biederman’s geon theory (e.g. Biederman, 1987)
relates primarily to object recognition and centers on the representation
of visual form. According to geon theory, stimulus objects
are represented by primitive shapes or elementary parts, like cylinders,
bricks, or cones, that stand in particular relations to one another.
According to the theory, generalization between two objects will occur
if the same parts and relations are visible in both, even if details of
the images of the various parts change considerably. For example,
if an object is rotated but none of its parts or relations is obscured, the
object is still recognizable and, presumably, the rotated and unrotated
images are similar. Wasserman and his colleagues have made a notable
attempt to apply this theory to pigeon discrimination and transfer among
visual stimuli, and an account that compares it with other approaches,
particularly template theory, may be found in Wasserman et al., 1996. Though it is clearly relevant to accounts of similarity,
geon theory does not pretend to be a general theory of similarity. (For an extensive
discussion of geon theory,
see Kirkpatrick, 2001.)