Every day we recognize a multitude of familiar and novel objects.
We do this with little effort, despite the fact that these objects may vary
somewhat in form, color, texture, etc. Objects are recognized from many
different vantage points (from the front, side, or back), in many different places,
and in different sizes. Objects can even be recognized when they
are partially obstructed from view.
While it may be obvious that people are capable
of recognizing objects under many variations in conditions, it has been
thought that pigeons may not possess the same range of capabilities. It has been
proposed that pigeons act as "perceptrons," analyzing simple
features of objects and using those features to recognize objects. If the pigeon
were a perceptron, then it would not be able to recognize
an object that varied slightly in form or was seen from a novel viewpoint
because the features would be altered. Moreover, a pigeon would be
unable to discriminate between two objects that contained the same features,
but with a different organization.
This chapter addresses a number of fundamental
issues relating to object recognition, concentrating particularly on an
avian species, the pigeon. The task is to determine whether the basic
process of object recognition in pigeons is at all similar to the most
probable process that has been proposed for humans. In order to demonstrate
the conditions under which object recognition may or may not occur, a number
of illustrated examples will be provided.
This section presents general-level background information, discusses
key theoretical concepts, and provides a short statement of the significant
findings of the specific experiments. More detailed descriptions can be
found in the following sections, to which links are provided throughout.
Readers who are well-versed in the basics of object recognition may wish
to proceed directly to the "Experiments" section.
One view of object recognition in pigeons
Cerella (1986) proposed that pigeons recognize objects via "particulate
perception." That is, pigeons perceive only local features of objects and
use those features to recognize specific patterns. He based these conclusions
on the results from a series of investigations which indicated that pigeons
were responding to local features only.
In one experiment, Cerella (1980) trained pigeons to
discriminate intact drawings of Charlie Brown from normal drawings of other
Peanuts characters. Then, Cerella reorganized Charlie Brown by altering the
relations between the head, torso, and legs. He discovered that the pigeons
responded to scrambled versions of Charlie Brown in the same manner as the
original, intact drawings. Therefore, he concluded that the pigeon must be
insensitive to global organizational properties of objects. Insensitivity to
global object properties is one attribute of a particulate perceiver.
How could a particulate perceiver survive in the
world? A particulate perceiver would have to rely entirely on differences
in local features in order to discriminate and classify objects. Emergent
properties of objects such as overall form, spatial organization, and three-dimensional
structure would not have an impact on perception. It is difficult to imagine
how an organism that flies about in the world could successfully navigate
without using any information about the spatial organization of the surrounding
environment. The pigeon actually possesses two perceptual systems:
(1) a long-range guidance system; and (2) a shorter-range (food) detection
system. It is possible that a particulate mechanism may operate on the
grain-seeking system, which is invoked when closer range, smaller objects
are being detected and identified. Perhaps the long-range guidance system
does make use of global organizational properties of the surrounding environment.
If the near foveal system of the pigeon does operate
using particulate features, then it is possible that the mechanisms of
avian visual perception differ substantially from the mechanisms of human
visual perception. The avian visual system does differ significantly from
the primate visual system in its underlying neuroanatomy and neurophysiology.
However, there are analogous pathways and structures. It seems
somewhat premature to accept the unparsimonious assumption that the avian
(near foveal) visual system differs vastly from our own in terms of the
mechanisms of object recognition.
In order to address the differences in mechanisms
offered by Cerella's Particulate feature theory (PFT) and theories of human
object recognition, I will first describe a prominent account of object
recognition in humans, consider its predictions and compare them to the
predictions of Particulate feature theory, and then present a series of
experiments designed to specifically address any differences in the predictions
of the two theories.
A theory of object recognition in humans
Recognition-by-components (RBC; Biederman, 1987) is a theory
of object recognition in humans that accounts for the successful identification
of objects despite changes in the size or orientation of the image. Moreover,
RBC explains how moderately occluded or degraded images, as well as novel
examples of objects, are successfully recognized by the visual system.
The major contribution of RBC is the proposal that the
visual system extracts geons (or geometric ions) and uses them to identify
objects. Geons are simple volumes such as cubes,
spheres, cylinders, and wedges. RBC proposes that representations of objects
are stored in the brain as structural descriptions. A structural description
contains a specification of the object's geons and their interrelations
(e.g., the cube is above the cylinder). A perceived object is analyzed
by the visual system, which parses the object into its constituent geons.
Then, the interrelations are determined, which include aspects such as
relative location and size (e.g., the lamp shade is left-of, below, and
larger-than the fixture). The geons and interrelations of the perceived
object are matched against stored structural descriptions. If a reasonably
good match is found, then successful object recognition will occur.
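The matching step described above can be sketched in code. This is a minimal illustration under stated assumptions, not Biederman's implementation: the names `StructuralDescription`, `describe`, `match_score`, and `recognize`, the two stored objects, the relation labels, and the 0.6 threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralDescription:
    """Hypothetical container: geon labels plus (geon, relation, geon) triples."""
    geons: frozenset
    relations: frozenset


def describe(geons, relations):
    """Build a description from any iterables of geons and relation triples."""
    return StructuralDescription(frozenset(geons), frozenset(relations))


def match_score(perceived, stored):
    """Fraction of the stored geons and relations that appear in the percept."""
    stored_items = stored.geons | stored.relations
    if not stored_items:
        return 0.0
    perceived_items = perceived.geons | perceived.relations
    return len(stored_items & perceived_items) / len(stored_items)


def recognize(perceived, memory, threshold=0.6):
    """Return the best-matching stored name, or None if no match is good enough.

    The threshold is an illustrative stand-in for a "reasonably good match";
    the text gives no numeric value.
    """
    best_name, best_score = None, 0.0
    for name, stored in memory.items():
        score = match_score(perceived, stored)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Two made-up stored objects that share one geon but differ otherwise.
memory = {
    "lamp": describe({"cone", "cylinder"}, {("cone", "above", "cylinder")}),
    "table": describe({"slab", "cylinder"}, {("slab", "above", "cylinder")}),
}

percept = describe({"cone", "cylinder"}, {("cone", "above", "cylinder")})
print(recognize(percept, memory))
print(recognize(describe({"sphere"}, set()), memory))
```

Because the score is a fraction rather than an all-or-nothing test, a percept that is missing some geons can still exceed the threshold, which is the property that lets this kind of scheme tolerate moderate occlusion.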
The RBC view of object recognition is analogous to speech perception. A
small set of phonemes is combined using organizational rules to produce
millions of different words. In RBC, the geons serve as phonemes and the
spatial interrelations serve as organizational rules. Biederman (1987)
estimated that as few as 36 geons could produce millions of unique objects.
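A rough calculation shows why a small geon vocabulary can yield so many objects. The sketch below is a simplification and does not reproduce Biederman's own estimate: the function `count_arrangements` is hypothetical, it assumes each added geon is joined to the previous one by exactly one relation, and the number of relations (10) is an illustrative assumption rather than a value from the source.

```python
def count_arrangements(n_geons, n_relations, parts_per_object):
    """Count ordered chains of geons, each joined to the previous by one relation."""
    return n_geons ** parts_per_object * n_relations ** (parts_per_object - 1)


GEONS = 36       # figure cited in the text
RELATIONS = 10   # illustrative assumption, not a value from the source

for parts in (2, 3):
    print(parts, count_arrangements(GEONS, RELATIONS, parts))
```

Under these assumptions, objects built from only three geons already number in the millions, and the count grows multiplicatively with every additional geon or relation.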
RBC was developed to account for primal recognition of objects; primal
recognition is fast-acting and does not utilize higher-level cognitive
processes. Higher-level processing may involve the use of shading, texture,
or color in finer discriminations of objects. Additional top-down processing
may also occur when environmental cues such as context are used to identify
particularly difficult instances of objects (e.g., a pencil would be easier
to recognize if it were partially occluded by a stack of papers on a desk
than by a pile of leaves in the yard).
There are three major facts of object recognition in humans that are
predicted correctly by RBC, but are at odds with Particulate feature theory.
The next sections review these predictions for humans, and the subsequent colored
tables highlight recent results examining these issues with pigeons.
1. The correct
spatial organization is essential for picture recognition in humans
(Biederman, 1972; Biederman, Glass, & Stacey, 1973; Biederman, Rabinowitz,
Glass, & Stacey, 1974). Because RBC is based on the assumption that
a small set of geons is the basis for millions of objects, organizational
rules must play a large role in object recognition. It is possible to have
different objects made up of the same parts, so discriminating between
those objects necessarily involves a sensitivity to spatial interrelations.
This prediction of RBC stands in greatest contrast to Particulate feature
theory, because PFT predicts no role for spatial organization.
If pigeons recognize objects using local
features alone, then variations in the arrangement of those features would
have little or no impact on the accuracy of recognition. Thus, unlike humans,
the pigeon would be incapable of discriminating between the cup and the
pail. The cup and the pail are each composed of the same two components: a cylinder
and a curved handle. However, the orientation and position of the handle
relative to the cylinder differ between the objects. In order to discriminate
the cup from the pail, one must be able to recognize the differences in
the organization of the components, a more global property of objects. Several
experiments by Kirkpatrick-Steger, Wasserman, and Biederman have demonstrated
that pigeons can discriminate changes in spatial organization, and that
spatial organization plays a key role in picture recognition in pigeons.
There is, however, one difference in
the local features of the cup and pail -- the points of contact (intersections)
between the handle and cylinder differ slightly. If pigeons were attentive
to fine variations in local features (as PFT argues), then the differences
in contact points could prove sufficient to differentiate between these
objects. Kirkpatrick-Steger, Wasserman, and Biederman
(1998) ruled out the contact points as a significant
contributor to object recognition in pigeons.
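The contrast between the two theories for the cup and the pail can be made concrete with a small sketch. Everything here is an illustrative assumption: the dictionary layout, the relation labels "side-of" and "above", and the two function names are hypothetical, and the contact-point differences discussed above are deliberately left out so that the objects differ only in spatial organization.

```python
# Both objects contain the same two parts; only the handle's relation differs.
cup = {
    "parts": frozenset({"cylinder", "curved handle"}),
    "relations": frozenset({("curved handle", "side-of", "cylinder")}),
}
pail = {
    "parts": frozenset({"cylinder", "curved handle"}),
    "relations": frozenset({("curved handle", "above", "cylinder")}),
}


def particulate_discriminates(a, b):
    """A feature-only perceiver compares the parts and ignores their arrangement."""
    return a["parts"] != b["parts"]


def structural_discriminates(a, b):
    """A structural perceiver compares the parts and the relations among them."""
    return a["parts"] != b["parts"] or a["relations"] != b["relations"]


print(particulate_discriminates(cup, pail))
print(structural_discriminates(cup, pail))
```

In this toy setting the feature-only comparison treats the cup and the pail as identical, while the structural comparison separates them, which mirrors the differing predictions of PFT and RBC.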
2. If only two or three of an object's
geons are available and they are in the correct spatial organization, then
successful object recognition will occur. RBC predicts this
result because object recognition does not require an exact match between
the perceived object and stored structural description. In contrast,
PFT predicts a detrimental effect of deletion of parts, because the parts
are the only means available for recognizing the object.
Biederman, Ju, and Clapper (1985) presented
objects lacking some of their components. Human participants correctly
identified objects when only 2 or 3 components were available, but not
when only 1 component was presented. It is easy to identify the sailboat
when only one of the sails is missing. One could also imagine that the
hull and mast alone might produce moderately accurate recognition, but
it is unlikely that the sailboat would be identified from the mast alone.
PFT would predict that the loss of any component leads to a decrement
in recognition accuracy. Kirkpatrick-Steger, Wasserman, and Biederman
(1998) discovered that pigeons could recognize objects at high levels of
accuracy when three of four components were available, but not when only
one component was present. They also discovered that some components were
recognized better than others, a result that is also consistent with research
using human participants.
3. Object recognition in humans is largely
invariant with regard to changes in the size, position, and viewpoint of
the object. The visual information falling on the retina when a
particular object is viewed varies drastically from occasion to occasion,
depending on the distance from the object (which affects the size of the
image on the retina), the vantage point from which the object is viewed,
and the location of the object relative to the viewer (which affects the
part of the retina that is stimulated). One of the most fundamental and
essential properties of the visual system is the ability to recognize a
particular object, despite great variations in the images that fall on
the retina. RBC accounts for all three types of invariances. Invariances
in viewpoint (rotational invariance) provide the greatest challenge to PFT.
People are capable of recognizing objects from many different vantage points,
even views that have never before been seen (Biederman & Gerhardstein,
1993). Notice that some views of the airplane involve the display of different
parts than others. Because a change in viewpoint changes which local features
are visible, a model of object recognition that relies on local features alone,
such as PFT, predicts that rotational
invariance would not be observed. A series of experiments by Wasserman
et al. (1996) demonstrated substantial, but not complete, rotational
invariance in pigeons.
Objects can be recognized despite variations in actual or apparent size.
Because the size of an object, such as the sailboat, does not change the
structural description of an object (the geons and their spatial organization),
RBC predicts that recognition should be size invariant. Kirkpatrick-Steger
and Wasserman (unpublished data) demonstrated generalization of responding
to sizes on either side (smaller or larger) of a training size in pigeons,
but there was a generalization decrement at extreme sizes. Successful recognition
of objects by pigeons, despite changes in size, further suggests that the
mechanism of object recognition in the pigeon is similar to the mechanism
of object recognition in humans. However, the finding does not discriminate
between RBC and PFT, because PFT also predicts size invariance.
When an object is moved to a new position in the environment, a different
portion of the retina is stimulated. Nonetheless, modest changes in position
do not disrupt recognition accuracy in human subjects; that is, object
recognition is translationally invariant. Translational invariance
indicates that people do not learn to recognize an object on the basis
of the absolute position in the environment or its position relative to
other objects (e.g., the desk is right of the bookshelf). Kirkpatrick-Steger,
Wasserman, & Biederman (1998) discovered that pigeons performed at
high levels of accuracy when an object, such as the Watering Can, was displayed
in a new position on the viewing screen. Thus, object recognition in both
pigeons and humans appears to be translationally invariant. However,
successful translational invariance does not discriminate between RBC and
PFT because the object features (and their organization) are unchanged.
There are many similarities in the properties of object recognition
in pigeons and humans, suggesting that similar mechanisms may be employed.
For example, both pigeons and humans are sensitive to object components
and their spatial organization. However, not all of the components
must be present in order for successful recognition to occur; only a subset
of two or three components is needed, provided that they appear in the
correct spatial organization. This explains why object recognition
can occur even when objects are partially occluded. Finally, there
is good evidence for rotational, size, and translational invariances in
both pigeons and people. These broad similarities suggest that a
common theory may be applied in explaining object recognition in both species.
PFT clearly cannot account for the pattern of results. RBC does correctly
predict all of the major findings, but other theories, such as the new
generation of template models, offer similar predictions. Further
experiments will undoubtedly be needed in order to differentiate between these theories.
Next Section: Experiments