Infants' word segmentation (idea) by Chris Hook

By the end of their first year, infants are precocious in their speech processing. It has been extensively shown that infants can categorise phonemes from a very early age (e.g. Eimas et al., 1971), a skill that is crucial in distinguishing between words in their native language. However, in order to discriminate words, an infant must first be able to segment words from fluent speech input. This is not a trivial ability. Written language has distinct gaps between words, but in speech there is not necessarily a gap in acoustic energy between words, and often there is such a gap within a word. Infants first display some ability to segment words at around 7.5 months of age, but these initial attempts only approximate those of fluent speakers of the language (Jusczyk, 1999). Infants appear to rely first on the predominant stress pattern of their native language, and then to add statistical regularities, allophonic cues, and phonotactic patterns as further cues to word boundaries. During their second year, infants develop segmentation skills that rival those of adults, and begin to link sound patterns with meanings.

Jusczyk and Aslin (1995) demonstrated, using a preferential listening paradigm, that infants' ability to segment words emerges between the ages of 6 and 7.5 months. They familiarised infants for 30 seconds to a list of two words (e.g. "dog" and "cat"), then presented the infants with test passages of fluent speech that either did or did not contain the familiarised words. At 6 months, infants showed no preference between passages with and without the familiarised targets; by 7.5 months, however, infants preferred to listen to passages containing the targets. The effect also held in the reverse direction, when infants were familiarised on fluent passages and then tested on lists of words that did or did not contain the targets. Furthermore, infants showed no preference if they were familiarised on similar nonwords like "gup" and "tog", showing that even at this young age infants are quite precise in what they are willing to accept as a target.

However, this demonstration only tells us that infants can segment by this age, not what cues they may use to segment words from speech. Cutler and colleagues (Cutler and Carter, 1987; Cutler and Norris, 1988) have proposed a "Metrical Segmentation Strategy", in which the predominant stress pattern of the native language is used as a cue to the location of a word boundary. In English, the major prosodic feature at the word level is the stress pattern placed on syllables. About 80-90% of the words used in English show the predominant strong/weak stress pattern, and this may be highly salient to infants, since infants have been shown to be sensitive to the prosodic structure of language. For instance, Mehler et al. (1988) demonstrated that infants show a preference for low-pass filtered native speech as opposed to low-pass filtered foreign speech (low-pass filtering removes all cues apart from underlying prosody). Jusczyk, Cutler, and Redanz (1993) showed that 9 month old English-learning infants, but not 6 month olds, familiarised on strong/weak words (e.g. "kingdom" and "hamlet") showed a listening preference for passages of fluent speech that contained these words. However, if familiarised on weak/strong words (e.g. "device" and "guitar"), neither 6 nor 9 month olds could segment these from fluent speech. When the same test stimuli were used but infants were familiarised on "vice" and "tar" instead, they did show a listening preference (Jusczyk et al., 1999). This demonstrates that by 9 months English-learning infants are using strongly stressed syllables as a marker of the start of a new word.
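The stress-based strategy described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each syllable is hand-labelled as strong or weak, and the function name metrical_segment is hypothetical rather than taken from any of the cited studies.

```python
def metrical_segment(syllables):
    """Group syllables into word candidates, opening a new one at each strong syllable.

    `syllables` is a list of (text, is_strong) pairs. The stress labels are
    assumed to be given; nothing here detects stress from audio.
    """
    chunks = []
    for text, is_strong in syllables:
        if is_strong or not chunks:
            # A strong syllable (or the very first syllable) opens a new candidate.
            chunks.append([text])
        else:
            # A weak syllable attaches to the candidate already in progress.
            chunks[-1].append(text)
    return ["".join(chunk) for chunk in chunks]


# Strong/weak words come out intact.
print(metrical_segment([("king", True), ("dom", False), ("ham", True), ("let", False)]))

# A weak/strong word followed by a weak syllable is split at the strong syllable.
print(metrical_segment([("gui", False), ("tar", True), ("is", False)]))
```

The first call returns the two strong/weak words whole, while the second breaks "guitar" before "tar". That is the kind of error the strategy predicts for weak/strong words, which is why it cannot be the only cue.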

However, relying on stress patterns alone to segment words is not adequate - after all, 10-20% of words used in English do not conform to the predominant pattern. The infant can get quite far using the Metrical Segmentation Strategy, but in order to attain full segmentational competence an infant will need to develop other strategies to work in combination with it. By 10.5 months, infants can segment weak/strong patterns, so by this age they must be relying on other cues besides prosody (Jusczyk et al., 1999). One supplementary strategy is to use phonotactic cues. The phonotactics of a language can be thought of as the combinations of phonemes that are "legal" within the words of the language. So, an English-learning infant will need to learn that [θ] may be followed by [r] (as in "thrifty"), but not by [l], [n], or [m]. However, these sequences can occur across word boundaries in speech - for instance, [θ] is followed by [m] in "fifth member". By extracting the regularities in the sequences of phonemes within syllables, an infant could use an illegal combination to infer a word boundary. Jusczyk et al. (1993) used the fact that English and Dutch have almost identical prosodic characteristics, but some differences in phonotactics, to show that 9 month olds, but not 6 month olds, are sensitive to phonotactic cues. For instance, Dutch allows /kn/, /zw/, and /vl/ within syllables, whereas English does not. Jusczyk et al. found that American infants listened significantly longer to lists of words with English sound patterns than Dutch sound patterns at 9 months, whereas Dutch 9 month olds showed the opposite preference. No preference was found for 6 month olds. The fact that 9 month olds showed no preference when the speech stimuli they heard were low-pass filtered indicates that phonotactic cues, rather than prosody, were being attended to. This is further supported by the finding that 6 month old American infants did show a preference for English over Norwegian after low-pass filtering - two languages with different prosodic features.
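As a rough illustration of how an illegal phoneme combination could signal a word boundary, the following Python sketch marks a boundary wherever two adjacent phonemes form a pair that English does not allow. The ILLEGAL_PAIRS set contains only the combinations mentioned above and is nowhere near a full phonotactic grammar; the function name and the hand-written phoneme transcription are assumptions made for the example.

```python
# Phoneme pairs the text describes as not permitted within English syllables.
# Illustrative only: a real inventory would be far larger.
ILLEGAL_PAIRS = {
    ("θ", "l"), ("θ", "n"), ("θ", "m"),
    ("k", "n"), ("z", "w"), ("v", "l"),
}


def phonotactic_boundaries(phonemes):
    """Return the indices at which a word boundary is inferred.

    An index i means "a boundary falls just before phonemes[i]".
    """
    return [
        i + 1
        for i, pair in enumerate(zip(phonemes, phonemes[1:]))
        if pair in ILLEGAL_PAIRS
    ]


# "fifth member", roughly transcribed by hand (the transcription is an assumption).
utterance = ["f", "ɪ", "f", "θ", "m", "ɛ", "m", "b", "ə"]
cut = phonotactic_boundaries(utterance)
print(cut)
print("".join(utterance[:cut[0]]), "|", "".join(utterance[cut[0]:]))
```

The only boundary found falls between [θ] and [m], which is where "fifth" ends and "member" begins. Because the sketch treats every listed pair as illegal anywhere, it would wrongly split a word in which such a pair straddles a syllable boundary.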

Jusczyk, Hohne, and Bauman have also found that infants can use allophonic cues (acoustic variants of the same phoneme) to distinguish between speech stimuli such as "nitrates" and "night rates", which differ subtly in the way they are pronounced. Although Hohne and Jusczyk (1994) found that 2 month olds could detect such differences, the newer study with Bauman appears to supersede it, finding that this form of segmentation develops later than the others mentioned so far - 9 month olds could not use allophonic cues, whereas 10.5 month olds could.

Saffran et al. (1996) found that 8 month olds could use statistical regularities in speech input as word segmentation cues. They showed this by exposing infants to a two minute string of continuous artificial speech which consisted of four different three syllable sequences produced with flat stress (so as to remove the effects of prosody). The order of the syllables was fixed so that transitional probabilities (i.e., the probability that one syllable is followed by another) were 1.0 between some syllables and 0.33 between others. This was done by presenting the syllables as nonwords, such as "tibudo", "pigola", or "pabiku". Thus, "ti" would predict "bu", and "bu" would predict "do". However, "do" would not predict "ti", since any of the other nonwords could follow. Infants were then tested on isolated versions of the nonwords, plus "part-words" consisting of the last syllable of one nonword with the first two syllables of another nonword - e.g. "dopigo" or "lapabi". Saffran et al.'s data indicated that their 8 month old infants distinguished the nonwords from part-words, suggesting they used statistical regularities to segment. Given real speech input, then, an infant would seem to be able, by 8 months of age, to use the statistical regularities in the sound patterns of the language to segment words. For example, given "pretty baby", in English "pre" is more likely to be followed by "ty" than "ty" is to be followed by "ba", so the dip in transitional probability can be used to infer the boundary between the words.
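The transitional probability idea can be made concrete with a small simulation. The Python sketch below builds a continuous syllable stream from four nonwords, estimates the probability of each syllable given the one before it, and places a boundary wherever that probability drops below a threshold. Three of the nonwords are the ones quoted above; the fourth ("tudaro"), the stream length, the threshold of 0.5, and all function names are assumptions made for the example, not details of Saffran et al.'s procedure.

```python
import random
from collections import Counter

# Three nonwords from the text plus one hypothetical filler ("tudaro").
WORDS = ["tibudo", "pigola", "pabiku", "tudaro"]


def syllables(word):
    """Split a nonword into its two-letter syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def make_stream(n_words, seed=0):
    """Concatenate randomly chosen nonwords, never repeating one immediately."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        stream.extend(syllables(word))
        prev = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from adjacent-pair counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(stream, tp, threshold=0.5):
    """Cut the stream wherever the transitional probability falls below threshold."""
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


stream = make_stream(3000)
tp = transitional_probabilities(stream)
print(tp[("ti", "bu")])            # within a nonword: 1.0
print(round(tp[("do", "pi")], 2))  # across a boundary: close to one third
print(sorted(set(segment(stream, tp))))
```

With four nonwords and no immediate repetition, each word-final syllable is followed by one of three possible onsets, which is where the figure of roughly 0.33 comes from. The fixed threshold is a convenience of the sketch; looking for local minima in transitional probability would be an equally reasonable choice.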

However, Saffran et al.'s data did not show a listening preference in the direction that would have been expected from Jusczyk's body of work. Jusczyk's work has shown a familiarity preference, whereas Saffran et al. found a novelty preference. This may be because Saffran et al.'s repetitive, monotone familiarisation stimuli had a habituation effect rather than a familiarisation one. However, if we are to understand how these segmentation strategies are used, we need to know when to expect familiarity preferences and when to expect novelty preferences.

For an infant to develop full competence in segmentation, the important thing is to be able to use all the cues in combination. No single cue is sufficient alone. For instance, relying on prosodic cues alone would cause an English listener to miss all weak/strong words. Relying on statistical regularities alone would cause a listener who knows "candle" to missegment "can deliver". Relying solely on phonotactic cues would be problematic given words such as "business", which contains the illegal combination /zn/. Norris et al. (1997) have proposed that multiple cues are used to rule out possible alternative parses of the speech signal.

The studies by Jusczyk and others suffer from rather small effect sizes, even though the effects are statistically significant: the differences in listening preference are small compared with overall listening times. They also do not really tell us how infants segment, only what cues they seem to latch onto. A connectionist model by Elman (1990) was able to discover word boundaries in speech, although it has been criticised. Now that we appear to know which cues are useful, the connectionist enterprise may provide answers as to how infants use them to segment.

References:

Jusczyk, P. W., Friederici, A. D., Wessels, J., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32, 402-420.
Jusczyk, P. W., Cutler, A., & Redanz, N. (1993). Preference for the predominant stress patterns of English words. Child Development, 64, 675-687.
Jusczyk, P. W. (1999). How infants begin to extract words from speech. Trends in Cognitive Sciences, 3(9), 323-328.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
Marcus, G. F., Vijayan, S., Bandi Rao, S., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77-80.
Correspondence in Trends in Cognitive Sciences, 3, 288-291
