Differing strategies in English and Japanese word segmentation: A computational-psycholinguistic approach to bootstrapping the lexicon

Hagen Peukert
Published/Copyright: December 18, 2013

Abstract

How can six- to eight-month-olds find out where a word begins and where it ends in a continuous speech stream? A computer simulation shows that the information needed to segment word-like units is present, though hidden, in English child-directed speech (CDS). This holds even when the range of possible segmentation algorithms is constrained by the cognitive abilities of eight-month-olds. Applying transitional probability calculations to the incoming speech stream yields segmented chunks, most of which are nonce formations. The key finding, however, is that the most frequent of these chunks are indeed words or phrases. Provided that infants prefer and recognize frequent items, it can be shown that these sound chains gradually accumulate into a pseudo-lexicon of some eighty to one hundred entries. It can further be assumed that these entries are mapped onto new speech material, and that these mappings locate previously undetected word boundaries. In addition, other cues useful for segmentation (phonotactic constructions, prosody, or allophonic variants) could be unambiguously derived from the pseudo-lexicon and used for complete segmentation before meanings are assigned to these chains.

This segmentation mechanism, however, does not appear to be universal. A second series of computer simulations on Japanese provides some indirect evidence against a universal segmentation mechanism based on transitional probabilities. Because representative samples of Japanese CDS are lacking, the segmentation algorithm was applied to Japanese adult speech (the Corpus of Spontaneous Japanese, CSJ). To allow a valid comparison, the simulation was also run on English adult speech (the British component of the International Corpus of English, ICE-GB). While English adult speech still yields a viable lexicon, albeit with significantly lower performance than English CDS, Japanese adult speech produces an error function that, although above chance, is insufficient for building a lexicon.
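The abstract outlines a pipeline: compute transitional probabilities (TPs) over the speech stream, cut it into chunks, keep the most frequent chunks as a pseudo-lexicon, and map that lexicon onto new material to find further boundaries. The Python below is a minimal sketch of such a pipeline, not the author's simulation. Several details are assumptions not stated in the abstract: syllables as the unit, forward TP, a boundary wherever TP is a strict local minimum, and greedy longest-match lexicon mapping. The function names, toy "words", and corpus are all hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Syllables = Tuple[str, ...]  # an utterance or chunk as a tuple of syllables
TP = Dict[Tuple[str, str], float]


def transitional_probabilities(corpus: Sequence[Syllables]) -> TP:
    """Forward TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts: Counter = Counter()
    first_counts: Counter = Counter()
    for utt in corpus:
        for x, y in zip(utt, utt[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment(utt: Syllables, tp: TP) -> List[Syllables]:
    """Assumed rule: place a boundary wherever the TP between adjacent syllables is a local minimum."""
    probs = [tp.get(pair, 0.0) for pair in zip(utt, utt[1:])]  # unseen pairs get 0.0
    chunks: List[Syllables] = []
    start = 0
    for i, p in enumerate(probs):
        neighbours = probs[max(i - 1, 0):i] + probs[i + 1:i + 2]
        if neighbours and all(p < q for q in neighbours):
            chunks.append(utt[start:i + 1])  # boundary falls after syllable i
            start = i + 1
    chunks.append(utt[start:])
    return chunks


def build_pseudo_lexicon(corpus: Sequence[Syllables], tp: TP, size: int) -> List[Syllables]:
    """Keep only the `size` most frequent chunks; less frequent chunks are discarded."""
    counts = Counter(chunk for utt in corpus for chunk in segment(utt, tp))
    return [chunk for chunk, _ in counts.most_common(size)]


def map_lexicon(utt: Syllables, lexicon: Sequence[Syllables]) -> List[int]:
    """Assumed mapping: greedy longest match of lexicon entries over new material.

    Returns boundary indices k, meaning a boundary between utt[k-1] and utt[k].
    """
    entries = sorted({e for e in lexicon if e}, key=len, reverse=True)  # skip empty chunks
    boundaries = set()
    i = 0
    while i < len(utt):
        for entry in entries:
            if utt[i:i + len(entry)] == entry:
                boundaries.update((i, i + len(entry)))  # both edges of the known item
                i += len(entry)
                break
        else:
            i += 1  # no entry starts here: skip one syllable of unknown material
    return sorted(k for k in boundaries if 0 < k < len(utt))


# Hypothetical toy input: three trisyllabic "words" uttered in every possible order.
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
corpus = [sum(order, ()) for order in permutations(WORDS)]

tp = transitional_probabilities(corpus)
lexicon = build_pseudo_lexicon(corpus, tp, size=3)

new_utt = ("go", "la", "bu", "ne", "mo", "tu", "pi", "ro")  # "nemo" never occurred
print(segment(new_utt, tp))           # -> [('go', 'la', 'bu', 'ne', 'mo', 'tu', 'pi', 'ro')]
print(map_lexicon(new_utt, lexicon))  # -> [3, 5]
```

On this idealized input the TP rule recovers every toy word, so the three-entry pseudo-lexicon contains exactly those words. On the new utterance, however, the TP rule inserts no boundary at all: the three unseen syllable pairs all receive a TP of 0.0, so none is a strict local minimum. Matching the pseudo-lexicon instead places boundaries at positions 3 and 5, isolating the unknown chunk "nemo". Real CDS is far noisier, which is consistent with the abstract's observation that most TP chunks are nonce formations and only the frequent ones are words.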

Published Online: 2013-12-18
Published in Print: 2013-12-1

© 2013 by Walter de Gruyter GmbH & Co.

DOI: 10.1515/gcla-2013-0006 (https://www.degruyterbrill.com/document/doi/10.1515/gcla-2013-0006/html)