Differing strategies in English and Japanese word segmentation: A computational-psycholinguistic approach to bootstrapping the lexicon

Hagen Peukert

doi:10.1515/gcla-2013-0006

Published/Copyright: December 18, 2013

From the journal Yearbook of the German Cognitive Linguistics Association Volume 1 Issue 1

Abstract

How can six- to eight-month-olds find out where a word begins and where it ends in a continuous speech stream? A computer simulation reveals that the information necessary for segmenting word-like units is present, though hidden, in English child-directed speech (CDS). This holds even if the cognitive abilities of eight-month-olds constrain the range of possible segmentation algorithms. Applying transitional probability calculations to the incoming speech stream yields segmented chunks, most of which correspond to nonce formations. The key finding, however, is that the most frequent of these chunks are indeed words or phrases. Provided that infants prefer and recognize frequent items, it can be shown that a list of sound chains gradually augments a pseudo-lexicon of some eighty to one hundred entries. It can further be assumed that these entries are mapped onto new speech material, and that these mappings locate previously undetected word boundaries. In addition, other cues useful for segmentation (phonotactic constructions, prosody, or allophonic variants) could be unambiguously derived from the pseudo-lexicon and used for complete segmentation before meanings are assigned to these chains. This segmentation mechanism does not, however, seem to be universal. A second line of computer simulations on Japanese provides indirect evidence against a universal segmentation mechanism based on transitional probabilities. To cope with the lack of representative samples of Japanese CDS, the segmentation algorithm was applied to Japanese adult speech (CSJ). For a valid comparison, the simulation was also run on English adult speech (ICE-GB). While English adult speech still generates a viable lexicon, although at a significantly lower performance than English CDS, Japanese adult speech produces an error function that, although above chance segmentation, is insufficient for producing a lexicon.
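The abstract does not spell out the segmentation procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: the input is a list of syllables, a boundary is placed wherever the forward transitional probability is a strict local minimum, and the most frequent resulting chunks form the pseudo-lexicon. The function names, the local-minimum rule, and the toy syllable stream are all illustrative assumptions and are not taken from the article.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


def segment(syllables, tp):
    """Split the stream wherever the TP is a strict local minimum (assumed rule)."""
    tps = [tp[(x, y)] for x, y in zip(syllables, syllables[1:])]
    chunks, current = [], [syllables[0]]
    for i, syl in enumerate(syllables[1:]):
        # tps[i] is the TP between the previous syllable and `syl`.
        left = tps[i - 1] if i > 0 else float("inf")
        right = tps[i + 1] if i + 1 < len(tps) else float("inf")
        if tps[i] < left and tps[i] < right:
            chunks.append("".join(current))
            current = []
        current.append(syl)
    chunks.append("".join(current))
    return chunks


def pseudo_lexicon(chunks, size=100):
    """Keep the `size` most frequent chunks as candidate lexical entries."""
    return [chunk for chunk, _ in Counter(chunks).most_common(size)]


# Invented toy stream: three two-syllable "words" in varying order.
stream = "pre tty ba by pre tty do ggy ba by do ggy pre tty ba by".split()
tp = transitional_probabilities(stream)
chunks = segment(stream, tp)
lexicon = pseudo_lexicon(chunks, size=2)
```

In this toy stream the within-word transitions have a probability of 1.0 and the between-word transitions fall below that, so the local-minimum rule recovers the three words. Real child-directed speech is far noisier, which is consistent with the abstract's observation that most chunks are nonce formations and only the frequent ones are reliable.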

Keywords: usage-based word boundary detection; English/Japanese; infant word segmentation; statistical learning; computer simulation

Published Online: 2013-12-18

Published in Print: 2013-12-1

https://doi.org/10.1515/gcla-2013-0006
