Book notes: How Children Learn the Meanings of Words

How Children Learn the Meanings of Words. Paul Bloom.

Read December 2017–January 2018, on recommendation from JBT.

My brief review, cross-posted from Goodreads:

A very useful review of our understanding of child language acquisition circa 2000, still relevant today. (That it’s still relevant today says as much about Bloom’s presentation and writing style as it does about how enormously difficult it is to make progress in this field.) The contents of the book are about what you’d expect from an undergraduate introduction to language acquisition.

For non-specialists, it’s worth knowing that Bloom stakes out an unconventional view (or at least a non-consensus view) in this book, attributing many different observed phenomena in language acquisition to a single underlying faculty of domain-general social reasoning. This view has gained strength in the 18 years since this book’s publication, but cannot yet be called a consensus.

In some chapters, the book goes into a perfect amount of detail, presenting a coherent narrative punctuated by bits of interesting and convincing evidence. Other chapters are practically evidence-free, leaning more toward philosophy than science. We can’t exactly fault Bloom for this: as a research field, we just don’t have a clear understanding of, for example, how language and conceptual knowledge relate to one another in the first years of a child’s life. I only wish Bloom had done a bit more epistemic signaling here, explicitly marking those places where the text contains far more opinion than fact.

Fast mapping and the course of word learning

Main questions:

  1. What is the nature of the word learning process?
    1. How much input do children need to learn a new word?
    2. Do children learn words better than adults?
    3. To what extent does word learning differ from other types of learning?
  2. What is the time course of word learning?
  3. What individual differences exist in the word learning process? What causes them?

The nature of the word learning process: fast mapping1

Carey and Bartlett (1978) first introduced the phenomenon of fast mapping in an experiment with 3- and 4-year-old children. See also Carey (1978), Heibeck and Markman (1987)2.

Markson & Bloom (1997) follow up to address the three sub-questions above, sorta:

  1. Fast mapping applies to object names (artifacts/instruments) as well as color names.
  2. Adults and children are equally able to fast-map new words and recall them a month later.
  3. Recall accuracy of fast-mapped word meanings and linguistically presented facts is roughly the same for each tested time interval.

Bloom believes there is no evidence which can separate the mechanism underlying word meaning fast-mapping from more general fact learning. (NB that this skill has two distinct elements: encoding/learning and storage/recall.)

But then he gets more precise: “fast mapping emerges from a general capacity to learn socially transmitted information–including, but not limited to, the meanings of words” (34–35). He cites results from an unpublished (?) study in which subjects were told the external and internal colors of a solid box that they saw in person; later they recalled the internal color better than the external color. This fits the hypothesis because the internal color mapping engaged the mechanism in question, whereas the visible external color did not.3

The course of word learning

Bloom’s claims:

  1. Children’s word meaning representations are just like those of adults.
  2. There is no sudden acceleration in word learning. “The word spurt is a myth.”

Children start using words at about 10 to 14 months … they sound funny. They are seen as corresponding to words in the adult language largely because of the contexts in which they are used. (35–36)

On meaning representations

The conventional view: bizarre early word uses => bizarre early word meanings

There is some sort of discontinuity in word learning: children’s early use of words often betrays bizarre misunderstandings / generalizations. Examples from literature:

  • car refers only to cars moving down the street when watching from the living room window (L. Bloom, 1973)
  • moon may also refer to half grapefruit, the dial of a dishwasher, a hangnail (Bowerman, 1978)
  • apple may refer to a doorknob (Clark, 1973)

First words “blur the semantic distinctions between objects, properties, and actions.”

Bloom’s view: bizarre early word uses => rational use of minimal resources

Bloom believes that such behavior is not evidence of incomplete or incorrect meanings, but is simply rational behavior given a limited vocabulary and immature production apparatus. Explanations:

  1. Productive speech is difficult, and distinct from comprehension ability! The above observations may be the result of articulation issues, lexical retrieval issues.

  2. These examples might not be errors at all, given that the children don’t know many words and have no productive syntax.

    It has often been pointed out that when children call a doorknob “apple,” it could mean that they are observing that the doorknob is like an apple. (37)

[My synthesis]

What’s the difference here? The conventional view attributes such bizarre usage to competence issues, while PB believes it is induced by performance constraints and a lack of auxiliary competence-knowledge.

From a cosmic perspective, this all seems a bit silly, to be honest. We’re arguing about whether these examples support the claim that children have the right meanings. Where do we look to find the meanings again?4

In other words, overextensions do occur, but they are typically reasonable ones, honest mistakes. For instance, a child who calls a cat “dog” might genuinely think that the cat belongs to the category of dogs. This is wrong, but not perverse. It is the sort of mistake that an adult who had very limited experience with dogs and cats might make. (38)

The defenders of the “conventional view” would not disagree here! Early word meanings are incorrect as a matter of competence, but of course are on the way to being correct.

The substantive disagreement really seems to be competence/lexicon vs. performance/rational-inference-in-context. E. Clark would assert that the meanings are incomplete but accurate generalizations stored in the lexicon; Bloom seems to believe that children have perfect meanings in the lexicon, but that the weird generalizations etc. come out as rational behavior in context, given constraints of available words, syntax, and so on.

The word spurt

Conventional view: word spurt exists

“After the appearance of the first few words used consistently with meaning in appropriate situations, there occurs a rapid increase in vocabulary” (McCarthy, 1954, p. 526).

Sometimes cast as a phenomenon specific to names, sometimes as one affecting vocabulary as a whole, occurring at between 16 and 19 months. Hypothesized causes:

  • realization that language is symbolic (Dore, 1978; McShane, 1979)
  • maturation of categorization ability (Gopnik & Meltzoff, 1986)
  • onset of word-learning constraints (Behrend, 1990)

A “word spurt” is usually formally characterized as a sudden increase in rate of vocabulary growth.

Bloom’s view: word spurt does not exist

Consider that an adult has roughly 60,000 words in her vocabulary, and that a 16 month old has 50. By the definition given above, there will necessarily be a word spurt at some point between 16 months and adulthood.

We can’t conclude a “spurt” by measuring the first derivative – especially when we are averaging over time intervals of months.5

It would not be difficult to see if [a word spurt] did occur. One could graph the child’s vocabulary growth and look for a dramatic … change in the slope of the line denoting rate of [vocabulary] growth. (43)
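[My sketch, not from the book.] To make the first-derivative point concrete, here is a toy calculation with made-up vocabulary counts. A smoothly accelerating curve shows a rising rate in every interval, so "the rate went up" cannot by itself pick out a spurt; a spurt in the sense of the quote above would have to be a localized jump in slope. The `ratio` threshold and both data series are my own assumptions, not a criterion Bloom proposes.

```python
def monthly_gains(vocab_sizes):
    """First differences: words gained in each interval."""
    return [later - earlier for earlier, later in zip(vocab_sizes, vocab_sizes[1:])]


def abrupt_changes(vocab_sizes, ratio=2.0):
    """Indices of intervals whose gain is at least `ratio` times the previous gain.

    `ratio` is a hypothetical threshold chosen for illustration only.
    """
    gains = monthly_gains(vocab_sizes)
    return [
        i + 1
        for i, (prev, cur) in enumerate(zip(gains, gains[1:]))
        if prev > 0 and cur >= ratio * prev
    ]


# Made-up data: smooth acceleration vs. a sudden change in slope.
smooth = [(m + 1) ** 2 for m in range(10)]   # 1, 4, 9, ..., 100
stepped = [0, 2, 4, 6, 8, 18, 28, 38]        # slope jumps from 2 to 10

print(monthly_gains(smooth))    # gains rise in every interval
print(abrupt_changes(smooth))   # -> []
print(abrupt_changes(stepped))  # -> [4]
```

The smooth curve gets faster every month but never trips the detector; only the stepped curve does. Averaging over multi-month intervals would blur even the stepped case.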

Explaining the course of word learning

“Nobody knows why word learning starts at about 12 months and not at six months or three years.” But we can reject:

  • lack of syntactic knowledge earlier (first words acquired without help of syntax; children seem oblivious to syntax but still acquire meanings)
  • environmental factors, e.g. how parents speak to their children (what’s the special thing that regularly shows up in environment around 12 months?)
  • development of necessary motor control (word onset is the same across spoken and signed languages; only explains production anyway, not comprehension)

More plausible dependencies which might cause the 12-month requirement:

  • phonological knowledge
  • [long-term?] memory support
  • conceptual development (see e.g. Xu & Carey, 1996)
  • understanding of referential intent (Bloom!)

Bloom believes the last one, about referential intent. A consequence is that there should be an association between theory-of-mind (ToM) development and vocabulary growth. Suggestive evidence appears in Morales, Mundy, & Rojas (1998).

And why does word learning begin so slowly, and eventually accelerate? Candidate answer is that children spend the first months of word learning “getting better at figuring out the communicative intentions of other people.”6

Why does the rate of word learning slow down? PB says it’s because we have no need to learn any more words! (i.e., it’s not due to some internal mechanistic change):

The reason that the [vocabulary growth rate] slows down in adulthood is that adults have learned most of the words the immediate environment has to offer. (47)

Individual differences

Types of variation: rate of word learning, which words are learned, end state of vocabulary.

Rate

Easiest measure, and also highly variable across children of the same age! By 2 years some children know 668 of the words on the MBCDI, and some know just 7. What produces this variation? Some suggestions:

  • parental vocabulary scores (best $r^2$ is 10–20%)
  • amount of interaction with parents (Smolak & Weinraub, 1983; Tomasello, Mannle & Kruger, 1986; Huttenlocher et al., 1991)

Such correlations might be due to environmental influences, genetic factors, or both.

But the effect sizes discovered are really minuscule relative to the amount of observed variation. We simply don’t know what’s producing such correlations, though there are hints that both accounts are important. Ganger, Pinker & Wallis (1997) suggest genetic influences by examining dizygotic twins, and Huttenlocher et al. (1991) show a concrete link between environmental use of a word and a child’s likelihood of knowing that word.
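[My arithmetic.] For a sense of scale on the 10–20% figure for parental vocabulary scores, this small calculation converts $r^2$ into the corresponding correlation magnitude and the share of variance left unexplained. It assumes a simple one-predictor linear relationship; the helper name `summarize` is mine.

```python
import math


def summarize(r_squared):
    """Return (|r|, unexplained share of variance) for a given r^2."""
    r = math.sqrt(r_squared)
    unexplained = 1.0 - r_squared
    return r, unexplained


for r2 in (0.10, 0.20):
    r, unexplained = summarize(r2)
    print(f"r^2 = {r2:.2f} -> |r| = {r:.2f}, unexplained variance = {unexplained:.0%}")
```

So even the best predictor mentioned corresponds to a correlation of roughly 0.3 to 0.45 and leaves 80–90% of the variance unaccounted for.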

Which words

Variation exists! Bates et al. (1994). [Also recall that sampling paper from Eve’s class.]

PB alludes to different “styles” of early language use: a referential style (lots of concrete nouns, later combined into phrases and sentences), and an expressive style (lots of memorized strings of words, often for social or instrumental purposes). [This might be from Nelson (1973)?]

Outcome of word learning

Sternberg (1987): best predictor of IQ test results is a vocabulary test. The direction of causality here is not clear, of course!

Summary

Things that do seem remarkably consistent across children (p. 52):

  • the sorts of words that children learn first
  • developmental changes in the rate at which children learn nouns vs verbs vs determiners
  • correlations between vocabulary growth and syntactic development

In the end, the descriptive facts about variation are interesting enough, but they exist in a theoretical vacuum. We have little idea as to why young children differ in how they acquire language. It’s not just that we can’t explain this variation; we don’t even know what correlates with it. … my hunch is that more progress will be made from the other direction. Individual differences will be best understood, and appreciated, through a theory of how word learning works in general. (52–53)

Word learning and theory of mind

Claim: “theory of mind underlies how children learn the entities to which words refer, intuit how words relate to one another, and understand how words can serve as communicative signs” (55).

Issues with associationist view:

  1. “Any associationist procedure requires that the right correlations are present in the environment. In the case of word learning, this entails that the words are presented at the same time that children are attending to what the words refer to.” (58) “about 30 percent to 50 percent of the time that a word is used, young children are not attending to the object that the adult is talking about” (Collins, 1977; Harris, Jones & Grant, 1983).
  2. Ostensive naming is not a culturally universal practice.
  3. Doesn’t account for word learning when referents cannot be seen / touched (numbers, geometrical forms, ideas, mistakes). (Blind children learn words, often at the same rate as sighted children! See Landau & Gleitman, 1985.)
  4. Statistical covariation between word and percept is “neither necessary nor sufficient for word learning.”

General evidence for ToM word learning account

I won’t include Bloom’s detailed list of citations on ToM in young children. But I’ll repeat the language-relevant citations:

  • O’Neill (1996) – babies more likely to point and name object out of reach when parent was not present to see it placed in a particular location
  • Baldwin (1991, 1993) – word learning based on experimenter gaze at time of utterance, not child gaze
  • Tomasello & Barton (1994); Tomasello, Strosberg & Akhtar (1996) – 18- and 24-month-olds learn words taking into account the goal of their counterpart. “Let’s look for the X.” Present multiple objects and eventually find one, look excited, “Ah!”, give to child. The child learns the correct mapping without ostensive naming.

Baldwin (1993) ⇒ word–percept contingency is not necessary for word learning

Baldwin et al. (1996) — disembodied voice naming a novel object does not trigger word learning ⇒ word–percept contingency is not sufficient for word learning
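[My sketch.] One way to see what the discrepant-labeling result in Baldwin (1991, 1993) rules out is to compare two toy learners on the same trials: one maps a word to whatever the child was attending to when it was heard, the other maps it to whatever the speaker was looking at. Everything here (the `NamingEvent` fields, the learner names, the word "dax", the toy labels) is hypothetical illustration, not the actual experimental materials, and it only covers the "not necessary" half of the argument.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingEvent:
    word: str
    child_attends_to: str   # what the child is looking at when the word is heard
    speaker_looks_at: str   # what the speaker is looking at while saying the word


def learn(events, cue):
    """Map each word to the object most often picked out by `cue`."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event.word][cue(event)] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def associative_learner(events):
    # Pure word-percept contingency: pair the word with the child's own focus.
    return learn(events, lambda e: e.child_attends_to)


def intent_tracking_learner(events):
    # Referential-intent account: pair the word with the speaker's focus.
    return learn(events, lambda e: e.speaker_looks_at)


# Discrepant labeling: the child looks at toy_A while the speaker names toy_B.
discrepant_trials = [NamingEvent("dax", "toy_A", "toy_B")] * 3

print(associative_learner(discrepant_trials))      # -> {'dax': 'toy_A'}
print(intent_tracking_learner(discrepant_trials))  # -> {'dax': 'toy_B'}
```

Per the notes above, children pattern with the second learner: they learn the word based on the experimenter's gaze, not their own.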

Lexical contrast / the mutual exclusivity bias

Children are biased to think that novel words in environments with objects which have known words do not share referents with the known words (see e.g. Markman & Wachtel (1988)7).

This bias could be a specifically lexical phenomenon, a fact about how words work that is either innate or acquired in the course of language development. Or it could be a special case of a general principle of learning, one guiding children to prefer one-to-one mappings as part of a general tendency to exaggerate regularities. A third possibility, which I explore here, is that mutual exclusivity is a product of children’s theory of mind.

[What follows, not recorded here, is a standard ToM account of mutual exclusivity, in Eve Clark’s “principle of contrast” language.]

We can therefore distinguish these theories in terms of their scope. A strictly lexical theory predicts that contrast should apply only to words, a simplicity-of-mapping theory predicts that it should apply to all domains, and the theory-of-mind proposal predicts that it should apply only to communicative situations.

Diesendruck and Markson went on to test a further prediction —that if children are using pragmatic reasoning about the adult’s intentions in using the new fact, they should be less inclined to produce such a response in a two-speaker scenario, where the second speaker lacks mutual knowledge with the child. That is, if one speaker tells the child ‘‘My sister gave this to me’’ about one object, and then a different speaker, new to the discourse context, enters the room and asks ‘‘Can you give me the one that dogs like to play with?,’’ the prediction is that children should now choose each of the objects with equal frequency. This is precisely what occurred. (69–70)

An important difference between words and facts, however, was discovered when Diesendruck and Markson did the two-speaker condition with novel words. In this condition, one speaker tells the child ‘‘This is a mep,’’ and a different speaker enters the room and asks ‘‘Can you show me the jop?’’ Here children chose the object that wasn’t originally labeled as the mep, the same as they did in the one-speaker condition with novel words—but different from their behavior in the two-speaker fact condition.

This suggests that children know something about words that isn’t true about facts. Words have public meanings.

So PB claims, with EC, that a principle of lexical contrast is key to the task of word learning. It is an empirically validated behavioral pattern — one that makes statistical sense given the task (see p. 73 for hand-wavy statistical argument).

The origin of words

So far we’ve assumed that children know what words are in the first place — that, upon observing an act of reference, they understand that the noises emerging from someone’s mouth constitute a Saussurean sign.

Words are generally “symmetric” signals within a speech community, but this is not necessarily the case:

Suppose a child observes her father react to a wasp by gasping. It would be mistaken for her to assume that if on another occasion she gasped, her father would think there was a wasp present. Some dogs come to their owner when they are called, but no dogs make the inference that if they were to produce the same sound, their owner will obediently run to them. … These are all non-Saussurian systems.

PB argues that children have some bias for Saussurian signals, derived from a more general bias of imitating the intents and actions of others.

The shared pattern between learning from instances of reference and goal-oriented behavior such as in Meltzoff (1988, 1995): observe someone else performing some behavior while inferring their intent and goal; induce a mapping from intent (not behavior) to goal. Meltzoff’s experiments show that this goes through even when the behavior fails, meaning it really is an intent–goal mapping. And we know the same is true of words from earlier mentioned studies (e.g. Tomasello, Strosberg & Akhtar, 1996).

This idea makes a prediction: that any communicative act should be treated by the child as a potential “word.” Namy & Waxman (1998); Woodward & Hoyne (1999) tested this with hand gestures and noisemaker squeaks as alternative signals. Both studies found that younger children learned the sign–referent associations while older children did not.

an understanding of words develops along two tracks. One is a notion of word that corresponds to a phonological unit … Another notion of word corresponds to a Saussurian sign. This emerges from theory of mind, and hence, initially, any intentional communicative act is treated as a Saussurian sign … not until some time after 18 months do these two notions come together [when] children realize that only the phonological word is typically used as a Saussurian sign. (77–78)

Autism

Observed behavior: autistic children seem to often learn according to “associative learning mechanisms.”

Baron-Cohen, Baldwin, Crowson (1997) repeated Baldwin (1991) procedure but with autistic, mentally challenged, and normal children. (Experimenter looks at one object and utters its name while child looks elsewhere.) Normal and mentally challenged children learned mapping according to experimenter’s gaze; autistic children learned “associative” mapping according to their own gaze.

This supports the view that these autistic children’s difficulties in word learning are due to their deficit in theory of mind; they lack the inferential capacities that come naturally to normal children who are younger than two. (79)

PB suggests that the spectrum of language ability in autistic children results from the spectrum of theory-of-mind ability.

A severe theory-of-mind deficit might leave children without the ability to orient preferentially to speech, share attention, or follow eye gaze, and they might never be able to grasp the notion of an arbitrary sign, leading to no word learning at all.

Limitations of the ToM account

ToM won’t help a child learn words if they don’t have / can’t construct the relevant concepts. Some learning patterns also seem to be unaffected by pragmatic factors – e.g. the whole-object bias (see next chapter).

One way to look at it is that children use inferences about the referential intentions of others to create arrows, or pointers, from words to the world. (87)

So ToM explains only how children can determine referential intent – not the underlying general meaning of the sign used.

Object names and other common nouns

Why are early vocabularies so similar across language and circumstance?

Explaining early vocabularies

Accessibility of the form

Children’s environments provide them with certain words – milk, spoon, etc. – and some words at much higher frequency than others. This is obviously a factor in learning!

But: “It is not how often the adult says the word that matters; it is how often the child processes it.”

Accessibility of the concept

Basic-level bias (Brown, 1958) likely partly a result of conceptual development factors.

Accessibility of the mapping

Some mappings are more salient than others. Recall the Gillette et al. study and others, where adults watched videos of adult–child interactions with the audio muted. Determining reference is often difficult, especially with non-concrete referents!

Object names

It has long been observed that names for objects have a special place in child language. This point is often overstated. Not all or even most of children’s words are object names. …

Nonetheless, object names really are special. They constitute a much larger proportion of children’s early vocabularies than they do of the vocabularies of older children and adults.

Korean may be an interesting exception here: it is verb-final and allows for noun ellipsis, so children experience a lot of emphasized verbs (Choi and Gopnik, 1995; but see Au, Dapretto & Song, 1994).


Whole-object bias exists not only in children but in adults, too. Potential causes: conceptual bias (learned or innate?), ToM bias with a cooperativity assumption (learned or innate?).

What is an object?

PB appeals to Spelke-objects (e.g. Spelke, 1994) which follow principles:

  • principle of cohesion: “connected and bounded region of matter that maintains its connectedness and boundaries when it is in motion”
  • principle of continuity: “expect objects to follow a continuous pathway through space; they do not disappear from one point and reappear at another”
  • principle of solidity: “objects do not pass through each other” (Baillargeon, Spelke & Wasserman, 1985)
  • principle of contact (inanimate objects): inanimate objects move iff they are touched

These are part of Spelke’s claimed innate “core knowledge.”

PB also claims that we can reliably infer the satisfaction of these principles without always manually testing everything for a new object – just from visual inspection, and perhaps a few other perceptual cues.

NB that cohesion seems to be an essential property of objects on introspection, whereas the rest are not. (If the first is violated, you revise your idea of the world – “I thought it was one object, but it’s actually two.”)

The object bias

So this evidence supports that there is some conceptual bias at work:

we are predisposed to think about the world as containing whole objects … In other words, babies see the world as adults do (98).

There are alternatives: syntactic biases, pragmatic inference, etc.

PB cites Shipley & Shepperson (1990) as support for the conceptual-bias account. This perception-of-the-world-as-sets-of-discrete-objects strategy seems to apply outside of word learning, e.g. in numerical cognition.

It is more economical to posit a single discrete-object bias than multiple biases applying in different domains!

Cross-linguistic complications

Imai & Gentner (1997): name a novel object, either simple (kidney-shaped piece of paraffin), complex (wood whisk), or substance (sand in an S-shape). “Look at the dax!” Then show two new entities sharing either substance or shape with the first and ask to “point at the dax.”

English-speaking children generalize by shape in both the simple and complex cases; Japanese-speaking children do so only in the complex case.

Imai & Gentner attribute this difference to the lack of a count–mass distinction in Japanese. They augment an earlier whole-object bias proposal from Gentner, saying that the “individuability” of an object will increase its likelihood of admitting the bias. [Presumably complex objects like whisks are more “individuable” to Japanese children because of the syntactic difference. Somehow.]

Overcoming the object bias

Children clearly learn words for referents that are not concrete objects. How do they learn water, tail, hopping, white?

Soja et al. [1991] found that children extended words referring to solid objects to objects of the same shape, ignoring substance, but extended words referring to nonsolid substances to portions of the same substance, ignoring shape. (105)

Claim: words for nonsolid referents are easily interpreted as substance terms; words for solid referents as object names. Why? PB candidate explanation:

children can learn water without the distraction of a salient object, but learning wood requires that children actively focus on a bounded object and think of it not just as an object but as a portion of solid stuff. (106)

Potential defeating factors:

  • Candidate referents (from above): we take the word to be a substance term because there isn’t a high-ranking candidate object to be mapped.8
  • Pragmatics: Also recall Markman & Wachtel (1988), who showed that pragmatics matters for learning non-basic-level object terms: novel words with known referents can be mapped to parts or object textures.9
  • Discourse context (Tomasello & Akhtar, 1995): new term is mapped to object or action if observed with several different actions on the same object vs. same action on several objects, respectively.

Individuals that are not objects

Children learn words for individuals (proper names, count nouns) which are not objects (as in Spelke-objects): finger, eye; sneeze, cough, laugh, kiss; minute, hour; sound, noise; etc. etc.

So while being a Spelke-object may be a sufficient condition for being a nameable individual, it is plainly not a necessary one.

How do these sorts of referents even become candidates in the child’s mind?

The generalization hypothesis: Spelke’s principle of cohesion generalizes across modalities, making parts, negative spaces, sounds, etc. plausible candidate referents due to partial/full satisfaction of cohesion, possibly in a different modality.

This doesn’t work for everything, of course: consider terms which refer to entities derived from knowledge about goals, intentions, desires of others.

PB has several studies on teaching the meanings of collective nouns, reviewed in the chapter (112–114).10

The considerations reviewed above are consistent with the view that many of the candidate referents for common nouns and proper names emerge through one of two distinct cognitive systems. The first is an object system with an eye toward portions of matter that satisfy the Spelke-principles, particularly cohesion. This gives us dogs and bricks, whisks and chunks of wood. It might also provide us with—as a result of the extension of these principles—individuals such as fingers, toes, holes, shadows, and jumps. The second system is theory of mind, which parses motion and matter through an understanding of goal, function, and intent, giving rise to individuals such as games, parties, chapters, families, and church.

The ToM claim at the end of this quote seems totally unsupported by the content of the section. Lots of phenomena are listed, but only brief nods to ToM / knowledge about others’ mental states are made.

Finding the right words

How do we solve the word–referent alignment problem?

Nouns don’t have reference – NPs do. Nouns contribute to the meaning of an NP, along with determiners and other modifiers. So why are nouns learned most quickly?

For nouns vs. determiners, PB explains that nouns are 1) more phonologically salient in adult speech, and have 2) semantically more salient “ground-truth” referents. (Children may also have some open-class bias – bootstrapping on syntax, they can impose a bias that open-class words are typically the ones which do the bulk of the work of reference.)

[A discussion of verbs is promised later – in the legendary chapter 8.]

We kind of end up with a cop-out para here:

In sum, it might be that all cultures will use some NPs that refer to individuals in isolation, allowing children to learn their first object names. If not, then children must be capable of somehow learning the meanings of words that are embedded in sentences, by extracting the referential NPs from such sentences through a constrained distributional analysis. An adequate theory of word learning must assume either strong extrinsic constraints (all cultures use some nominals in isolation) or a powerful learning mechanism (one that can learn words not presented in isolation) (119).

Pronouns and proper names

How do children learn names for individuals—pronouns and proper names11— along with common nouns? More generally, how do children understand the relationship between individuals and kinds?

Pronouns

Pronouns are the first deictic expressions learned, and the demonstrative pronouns this and that are typically found among children’s first words.12

The personal pronouns I, me, and you are understood by children some time after they have learned the deictic pronouns, by about the age of 18 months.

Learning cues for pronouns

Syntax (in English) is a candidate:

  • pronouns do not admit quantifiers/adjectives/determiners
  • PB (1990): 1- and 2-yo productions follow this rule (they use quant/adj/det with common nouns, but not pronouns/proper names)
  • but this is not universal: e.g. Japanese, German admit articles before proper nouns

Another possible cue, though I’m not sure how children would exploit it: the range of items to which pronouns are applied is much larger than the range of any common noun.

Oshima-Takane (1988, 1999) suggests that overheard speech is critical for learning the meanings of I/you – otherwise, as PB shows on p. 124, there are weird indeterminacies such that the correct meanings of these words cannot be derived from parent-child speech alone.

Proper names

Learning cues for proper names

Similar results on determiners with proper names, testing on 17-month-olds (Katz, Baker & Macnamara, 1974).

From Hall (1999):

  • Children seem to understand that objects only have one proper name (Hall & Graham, 1997; note the cue in this study is something like “This dog is named Zavy.”)
  • Lexical contrast is another candidate: Hall (1991) found words more often treated as proper names when they were used to refer to objects whose common terms the children already knew.13
  • Only some entities normally get proper names: people, dogs, etc.; not bricks. Some experimental evidence supports that children rely on such cues (Katz, Baker & Macnamara, 1974; Gelman & Taylor, 1984; Hall, 1994).
  • The range of a proper name is typically just one thing (compare with pronouns and common nouns). Some experimental evidence for this (Hall, 1996b).

Names for kinds and individuals

Anecdotal evidence suggests that the “default hypothesis” for the meaning of a determiner-free NP is a proper name, not a pronoun. Kids “start small.”

The anecdotal evidence here is that kids have an uncanny ability to use proper names, but often fail to generalize pronouns successfully at the beginning.

Thinking about kinds and individuals

Experimental evidence (based on surprisal) shows that babies are able to individuate and count perceptually identical play objects (Wynn, 1992a).

[Several paragraphs on the evolutionary benefit of this individuation skill (which has been experimentally replicated with macaques).]

At some level, individuation (awareness of identity and ability to count distinct individuals) must rely on kind-knowledge – one must know the sortal associated with the thing being enumerated!

But PB argues [and I agree] that this doesn’t mean children need kind terms in order to individuate over that kind. My take on his argument is that we instead exploit the same conceptual knowledge which allows us to distinguish kinds (recall Spelke-objects?) to pick out individuals. (Children individuate elements for which they don’t have terms/concepts; see Xu & Carey, 1996.)14

[Some more philosophical paragraphs on personhood and identity.]

Concepts and categories

What are concepts for? PB’s answer, through my lens: for more efficiently dealing with our environment. I drink orange juice, I like it. I drink oil, I don’t like it. Learning rules involving such concepts allows us to make inductive inferences that help us in the future.

But how are concepts acquired?

Perception, properties, and essences

At the start, we just have perceptual properties. Or do we?

Babies attend to different properties during categorization based on the type of the referent, as we’ve already seen. Categorizing objects vs. nonsolid substances requires attention to different properties, and young children follow our expectations (Soja, Carey & Spelke, 1991).

“Minimal” proposal: people are born with high-dimensional similarity spaces, and project from that space in order to make similarity predictions / category membership predictions.
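A minimal sketch of what such a proposal could look like, assuming a toy three-dimensional perceptual space and a nearest-prototype rule. The feature axes, the prototype values, and the function names are all made up for illustration; none of them come from PB.

```python
import math

# Hypothetical perceptual axes: (roundness, size, redness), each in [0, 1].
# These prototypes are invented for illustration only.
PROTOTYPES = {
    "apple": (0.9, 0.3, 0.8),
    "banana": (0.1, 0.4, 0.1),
}


def distance(a, b):
    """Euclidean distance between two points in the similarity space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(percept, prototypes=PROTOTYPES):
    """Predict category membership as the nearest stored prototype."""
    return min(prototypes, key=lambda name: distance(percept, prototypes[name]))


# A novel object that is roundish, smallish, and fairly red.
novel_object = (0.8, 0.35, 0.7)
print(categorize(novel_object))
```

Every axis in this sketch is perceptual, which is exactly the restriction the minimal proposal commits to.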

Two objections from PB: novel properties and intuitions about essences.

Properties

Where does non-perceptual knowledge fit into these similarity spaces? How is the difference between a stockbroker and a banker encoded? Where does our knowledge of them come from?

Traditional “empiricist” solution: abstract properties emerge through perceptual+linguistic experience, and are somehow merged into the built-in perceptual similarity space.

the extent of progress [in the empiricist account] has not been impressive. … There might be a principled reason for this failure. Perhaps concepts are not statistical abstractions from perceptual experience. Instead, they might be constituted, at least in part, in terms of their role in naive theories of the world (e.g., Carey, 1985, 1988; Gopnik & Meltzoff, 1997; Keil, 1989; Murphy & Medin, 1985). Our concept of stockbroker isn’t a vector in some multidimensional perceptual state-space, then; it is instead rooted in our implicit understanding of society, money, jobs, and so on.15

This is the theory theory. PB defines: “causal and explanatory considerations underlie categorization.” The relevant conceptual features of some object in the world are those determined by our theory of how that object behaves.16

Essences

Psychological essentialism: view of objects as having essential properties.

An essentialist should be able to entertain the possibility that something might resemble water but not actually be water (because it lacks the essence) or not resemble water but be water nonetheless (because it has the essence).

Not a metaphysical claim! Opponents say that the essentialist worldview may be a cultural construct. Fodor:

of course Homer had no notion that water has a hidden essence, or a characteristic microstructure … a fortiori, he had no notion that the hidden essence of water is causally responsible for its phenomenal properties. (1998, p. 55)

To be an essentialist in the Lockean sense, you must believe that these hidden properties are causally responsible for the superficial properties of an entity and determine the category that it belongs to.

Evidence:

  • children generalize based on category membership when provided with conflicting cues of category membership + perceptual resemblance
  • children are aware that removing essential properties of an object O makes it become non-O

On the shape bias

brute-shape theory (Linda Smith et al.): children learn to generalize based on shape for names which are given via ostensive naming (“This is a __”). For this reason, what seems like a weirdly consistent bias is just an adaptation to a particular linguistic frame.

shape-as-cue theory (Bloom, Gelman, et al.): “shape is important because it is seen as a cue to category membership.”

Different predictions: brute-shape theory holds that shape determines naming, whereas shape-as-cue holds that shape is one of many cues to reference.

Brute-shape also only describes word learning, whereas shape-as-cue is a hypothesis about concepts.

[evidence that children do not extend object names given in the ostensive-naming frame solely based on shape]

[evidence that the shape-bias can be defeated, triggering children to generalize based on observed object function: e.g. Bloom, Markson & Diesendruck, 1998]

The structure of concepts

If an essentialist perspective on concepts is correct, it would entail that many words correspond to concepts that do not exhaustively decompose into simpler notions. Although concepts might be associated with prototypes or sets of exemplars, they do not reduce to them.

[I don’t follow this. The “essential” conceptual meanings are still defined in terms of other concepts.]

[what follows is a re-hashing of an earlier allusion to a Kripke-style argument: Shakespeare is still Shakespeare even if he didn’t write Hamlet.]

This argument (and much of this chapter) seems to confuse general debate about conceptual structure – which philosophers have engaged in for a long time – with debate about acquisition of that structure via language. Beliefs about one do not support beliefs about the other, I think.

[another post-hoc evolutionary argument for the ability to separate contingent from essential properties; not going to recount here]

Essentialism lite

“Essentialism does not entail the theory theory.”

i.e., essentialism is compatible with a relaxed similarity-space sort of conceptual structure. All we need is that such a space have some axes not generated by contingent perceptual features.

[I’m not sure I buy this. If the similarity space encodes essential features which are separable from perceptual features, then those extra axes constitute some implicit “theory theory.” I suppose this could be bought if we restrict “theory theory” to apply only to discrete, linguistically stated relations between objects.]

[The essentialist view] simply entails that children will have the implicit assumption that some deeper fact relates to the matter [of water being liquid, transparent, drinkable, etc.]—that these properties of water are the result of some deeper essence. (168)

Naming representations

In some instances, children’s appreciation of pictures is parasitic on an understanding of the external world (as when a child looks at a photograph of his mother and knows who it represents), but often it is the other way around: the understanding of the representation comes first. Most children will see a picture of a gorilla before seeing an actual gorilla, and much of the mature understanding of everything from planets to popes comes not from experience with the actual entities but through experience with visual representations. (171)

The literature usually conflates referents which are representations of objects (e.g. pictures of objects) with the objects themselves:

A paper on word learning (e.g., Bloom & Kelemen, 1995) will typically begin with some claims of how children learn names for objects, the description of the methods will note that drawings of objects were used, and the paper will end with some conclusions about how children learn names for objects, without any mention that these weren’t what they were actually tested on.

Merge this in with the general theory, of course! We can explain the more complex cases in terms of intuitive theories of intents and goals:

What underlies the naming in all these examples is the understood intent of the creator. What makes us call an oval “an egg” in one context and “a football” in another is our assumption about what the oval was intended to represent, and the same for whether a picture is described as a generic dog, Fido, Bingo, or a robot, or as a murderer, or as me. The naming of the representation draws on our understanding of the context in which it was created—what it was intended to be.

And to cover the general case:

It is a reasonable inference that something is a picture of a dog if it looks like a dog because we believe that someone who intends to represent a dog will try to create something that will be recognized as one. (176)

Presumably such an inference is not performed online all the time, but is some cached learned behavior.

PB claims that this “intentionalist” view is what supports picture naming ability in adults. [Young children / autistic people?]

What about the children?

Anecdotal [?] evidence supports that babies don’t treat pictures as representations. (iconic realism / childhood realism)

[Babies] show certain signs of being confused about their status. One-year-olds will often grab at pictures, trying to pick up the depicted object.

PB: This is a fair description of child behavior before their third birthday. After that, though, they have an intentionalist view of pictures.17,18

Evidence for two- and three-year-old intentionalism w.r.t. picture naming: Bloom & Markson, 1998; Gelman & Ebeling, 1998.

PB argues this is another nail in the coffin of the “brute-shape” proposal from earlier: children clearly name things according to cues other than those relating to shape. (Shape is useful only insofar as it helps determine intent!)

Learning words through linguistic context19

PB general claim: function words are learned from distributional evidence, exploiting both syntactic and non-syntactic cues. Simple enough.

Without syntactic cues

Not much to say here.

Sternberg (1987) theory of word learning from context has three processes:

  1. selective encoding: distinguish between information relevant/irrelevant to meaning
  2. selective combination: combine cues into workable meaning of a word
  3. selective comparison: relate new information to background knowledge

With syntactic cues

Brown (1957): morphosyntactic cues bias whether children interpret as being an action verb / count noun / mass noun. Claim: morphosyntax can act as a cue to meaning.

Nominals

Recall noun–NP contrast in learning pronouns and proper names from chapter 5. That won’t be repeated here.

Soja (1992): count/mass noun syntax can cause children to construe a novel word for a heterogeneous collection as denoting a collection (pile, puddle) vs. the substance itself.20

Prasada (1993): two- and three-year-olds can learn solid-substance names when presented with mass syntax (“X is made of Y”), but only when the objects were familiar / already had names.21

Why can’t they learn solid-substance names for unfamiliar objects? Recall the earlier proposed bias to prefer individuals as referents.

Verbs

Naigles (1990): children attend to transitivity of verb to match observed actions with a heard novel word. Extension in Naigles & Kako (1993) suggests that transitivity cues bias children to pick actions which involve physical contact.

[laundry list of different syntactic-cue work for verb learning which I won’t copy over here]

Adjectives and prepositions

[why are these two things presented together?]

“Adjectives draw children’s attention toward properties or subkinds.”

Taylor & Gelman (1988): adjective syntactic context biases children to interpret word as referring to color, pattern, texture, etc.22 Smith (1992): shape bias which holds for count nouns does not apply to adjectives.

Waxman (1990): novel adjectives “facilitate categorization” (?) at the level of subordinate kinds.

PB: These two different claims rest on different functions of the adjective. In some contexts, adjectives act as restrictive modifiers; in others, they are used to predicate properties of the subject. These two functions have different syntactic patterns in English. PB suggests that the varying results are a function of which syntactic structure was used in the experiment.23

Prasada (1997): children more likely to give an adjective a restrictive-modifier interpretation after observing it prenominally vs. as a predicate.


Landau & Stecker (1990): can (morpho)syntactic contrast between nouns and prepositions cue children to object referents vs. spatial relation referents? “This is a corp” vs. “This is acorp the box.”24

In the count noun condition, both three-year-olds and adults generalized the application of the word to objects of the same shape regardless of location, while in the preposition condition, they generalized the word to objects in the same location (or class of locations), regardless of object shape.

Landau (1996): prepositional syntax can be used to teach English-speaking children spatial relations which don’t exist in English.

How do children learn about syntactic cues in the first place?

“The relationship between syntax and semantics is not an arbitrary one.” Arguments:

  1. “The number of NP arguments that a verb takes is related to the number of entities involved in the action that it refers to.” This is no accident; it reflects “an isomorphism between the conceptual structure of a predicate and its syntactic structure.”

  2. Count/mass distinction:

    But it is not that children note, over the fullness of time, that some kinds of words go with a and another and other kinds of words go with much. Such learning would be superfluous because the knowledge follows from what the determiners mean. Part of knowing a and another is knowing that they interact with nouns that refer to kinds of individuals to form NPs that refer to specific individuals. Part of knowing what much means is knowing that it interacts with nouns that refer to kinds of stuff to form NPs that refer to portions of that stuff.

    [This argument seems a bit confused. PB claims that part of the meaning of words like much is to know that it determines a property of its complement (its mass-ness), and so no co-occurrences need to be tracked. But where does the determiner meaning originally come from? PB refers to chapter 4, but the content there wouldn’t arbitrate this issue nicely. The content there suggests the child needs to note, for example, that the range of much is large in order to see that it does not have e.g. a basic-level object kind as referent. Such inference would require observing co-occurrences, naturally.]

Conclusion [coming far too soon, methinks; not well supported]:

In general, then, the knowledge necessary to use syntactic cues to word meaning can be explained either in terms of other properties of language, such as universal relationships between meaning and form (as in the verb example) or the meanings of specific closed-class items (as in the count-mass example).

This seems far too shallow. What about, say, the bias to interpret prenominal adjectives as restrictive modifiers? Adjectives are an open-class category, and prenominal modification is in no way linguistically universal.25

The importance of syntax

Earlier parts of this book showed how different sorts of word meanings can be acquired without alluding to syntactic cues. So how much does syntax actually matter for acquiring words?

Gleitman “hard words” argument for verbs. Why would verbs be good candidates?

  1. Nouns can be taught through ostensive naming, and/or have referents possibly immediately present to the observer. Verbs do not satisfy either condition.
  2. Object nouns correspond to entities that humans universally see as distinct individuals; the same is not true for verbs. [Satellite-framing vs. verb-framing is used as an example of how verb meanings are partitioned differently across languages.]
  3. More generally, “events are cognitively ambiguous in a way that objects are not.”

So the arguments here come from the nature of the referents of verbs, not verbs as syntactic objects, etc.

If you believe that syntax is a reliable cue to conceptual structure, syntax can certainly help. PB suggests that “Fred is __ the thing to Mary” indicates a giving relation and “Mary is __ the thing from Fred” indicates a receiving relation. (The property PB imagines children exploiting, I assume, is the surface order of Fred and Mary, related to some observed scene.)

But abstract nouns are also “hard” (???), e.g. nightmare, and PB asserts these are learned without similar syntactic support. So maybe syntax isn’t necessary after all.

I agree, then, with the thrust of Gleitman’s argument: simply “observing the events” does not suffice for the learning of most verbs. What is less clear is whether it is syntax that fills the gap, as opposed to the other information that sentences convey.

I would add (and I think PB would agree?) the opposing contender is also the information that the situation conveys, not just linguistic cues.

The role of syntax in a theory of word learning

PB ends at a middle ground, suggesting that syntax is an “important informational source as to the meanings of words.” It does neither very little nor very much.

Number words

Number words are curious: their referents have no material extension, and are not even psychological states. They refer to an abstract property of a collection.

This abstractness ought to make them special cases of interest: how are they learned?

Early number sense

Nonhuman animals and babies both have some ability to discriminate between sets of different number, and to count sequences of actions.

Long before language learning, then, they have the main prerequisite for learning the smaller number words: they have the concepts of oneness, twoness, and threeness. Their problem is simply figuring out the names that go with these concepts.

[missing here: anthro data on number in different cultures?]

Number word learning

How do children learn that number words refer to abstract set properties rather than some property of the quantified object?

Proposal: explicit counting

May observe the explicit physical counting of entities (“one apple,” “two apples,” …). If children have some prior knowledge of the counting task, then it should be possible for them to exploit these situations. BUT:

  1. Without prior knowledge of the task, no reason to take one and two to denote number vs. properties of the original objects.
  2. Equally reasonable to take one and two as demonstratives.

Why don’t children get tripped up by number?

Wynn (1992)

Longitudinal study of children’s knowledge of number words. Two conditions:

  1. “Can you show me the four fish?” (potential referents: image of one fish, image of four fish)
  2. “Can you show me the four fish?” (potential referents: image of four fish, image of five fish)

#1 is potentially soluble by principle of contrast, if children know that “four” is a number word (even if they don’t know exactly what it means). #2 requires an accurate understanding of “four” vs. “five.”

2.5-year-olds understand number in the sense of #1; most children didn’t learn number in the sense of #2 until a year later.
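The difference between the two conditions can be made concrete with a toy simulation. This is only a sketch under made-up assumptions, not Wynn's analysis: the simulated child knows the exact meaning of some number words (here just “one”), treats every other number word as naming some quantity that does not already have a known word (contrast), and otherwise guesses. The function names and the guessing rule are hypothetical.

```python
import random


def choose(word, options, known_meanings, rng):
    """Return the set size of the picture a simulated child points to.

    word: the number word in the request, e.g. "four"
    options: set sizes shown in the two pictures, e.g. (1, 4)
    known_meanings: number words whose exact meaning the child knows
    """
    if word in known_meanings:
        matches = [n for n in options if n == known_meanings[word]]
        if matches:
            return matches[0]
    # Contrast: an unknown number word should not name a quantity
    # that already has a known word.
    known_quantities = set(known_meanings.values())
    candidates = [n for n in options if n not in known_quantities]
    if not candidates:
        candidates = list(options)
    return rng.choice(candidates)


def accuracy(word, target, options, known_meanings, trials=1000, seed=0):
    """Proportion of trials on which the simulated child picks the target."""
    rng = random.Random(seed)
    hits = sum(
        choose(word, options, known_meanings, rng) == target
        for _ in range(trials)
    )
    return hits / trials


child = {"one": 1}  # knows only the exact meaning of "one"
print(accuracy("four", 4, (1, 4), child))  # condition #1
print(accuracy("four", 4, (4, 5), child))  # condition #2
```

Under these assumptions the child is always right in condition #1 and at roughly chance in condition #2, which is the qualitative pattern described for the 2.5-year-olds; adding `"four": 4` to `child` makes condition #2 succeed as well.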

Children go through a lengthy developmental stage in which they know that words like “two” and “three” refer to numbers but do not know which numbers. (218)

TODO Ankify above

Linguistic cues

PB suggests that there are several linguistic cues demonstrating that numbers are a distinct closed class:

  1. appear prenominally like adjectives, but only modify count nouns
  2. cannot be attenuated with very, too, etc.
  3. precede all adjectives in NPs
  4. occur in partitive constructions (“[two/three/…] of the X”)

NB, learning problem: acquiring negative rules like #1 and #2 is difficult!

Above claims mostly corroborated by a corpus study (Bloom & Wynn).

PB claims that these cues are what enable the 2.5-year-olds from Wynn (1992) to know that number words form a distinct class, before they understand the precise meanings of each word.

Learning the whole system

PB proposes: numerical cognition bootstraps off of language. Understanding of counting, of discrete infinity, etc. comes from exposure to a numerically enabled language/culture.

When children are exposed to the language of number, this causes a dramatic restructuring of their numerical knowledge. … [Nonhuman primates] lack a generative numerical system just because they lack the capacity to develop a generative communication system.

–> discrete infinity understanding comes from discrete infinity of number words.

Words and concepts

Previous chapter: first proposal in this book in which linguistic knowledge influences conceptual understanding (as opposed to reverse direction).

  1. Note that this section aims to address the “nature of word learning,” but actually only discusses the fast mapping phenomenon. 

  2. Heibeck and Markman (1987) demonstrate that two-year-olds also are capable of fast mapping color terms, shape terms, and texture terms (at differing levels of accuracy). 

  3. Not sure what to take from this, though, since the study presented (table 2.4, p. 34) uses known color terms with three-year-olds and adults. There’s no necessary link between the results and fast mapping, methinks. 

  4. Perhaps I have just not matured enough as a cognitive scientist to admit this meaning-talk. But wow, is it tiring. :) 

  5. See sympathetic views in Elman et al. (1996), Bates & Carnevale (1993). 

  6. Plus the collection of usual suspects for accelerating vocabulary growth, e.g. the ability to bootstrap from linguistic context. 

  7. I think I remember the results in this study being rather meh. I should link back to my own notes from Eve’s class. (Also, M&W used the results to argue for a mutual exclusivity constraint.) 

  8. NB, this is something that could/should be modeled and tested! 

  9. But also remember that the texture results were pretty fuzzy. I think there were some experimental design issues as well – IIRC, the experimenter explicitly pointed at the intended object part in the trials aimed to teach part terms. 

  10. Something here could be modeled and tested. 

  11. NB that the definition of “individual” here is different than in the previous chapter. 

  12. … contradicting the predictions of the reasoning from the previous chapter? 

  13. Fits in with the results from Markman & Wachtel (1988). 

  14. I think this chapter is pretty weak. There isn’t a whole lot of experimentally sound argumentation going on. That’s not PB’s fault – we just don’t have enough prior knowledge about the relevant concepts. But what is dangerous is that this chapter seems to rock back and forth between two concepts of “individual” – something which has personhood (as introduced in this chapter), or more general entities which can be tracked and counted, and have Spelke-objecthood (as introduced in the previous chapter). FWIW, some of the more gooey philosophical content which I skimmed may have addressed this distinction, and I may have missed it. 

  15. This seems perfectly compatible with an empiricist view, broadly construed. This was my understanding of the view briefly presented here as “empiricist.” 

  16. I don’t think this is correct. I just made it up, in fact. It sounds circular. 

  17. Isn’t this later than the claimed ToM emergence? 

  18. NB, this is in conflict with mainstream developmental literature, which holds that the bias persists until the school years. 

  19. The fabled chapter 8.. I’m so excited! 

  20. NB, could be modeled. 

  21. Recall Markman & Wachtel (1988). 

  22. NB, could be modeled. Critical Q: were the names of the object kinds themselves already known? 

  23. Hard to follow without looking at the original papers. 

  24. Relevant – TODO read this. 

  25. This is a big gaping hole that needs to be addressed.