Distributional learning is error-driven: the role of surprise in the acquisition of phonetic categories

Paul Olejarczuk; Vsevolod Kapatsinski; R. Harald Baayen

doi:10.1515/lingvan-2017-0020

Article

Published/Copyright: August 15, 2018

From the journal Linguistics Vanguard Volume 4 Issue s2

Abstract

Much previous research on distributional learning and phonetic categorization assumes that categories are either faithful reproductions or parametric summaries of experienced frequency distributions, acquired through a Hebbian learning process in which every experience contributes equally to the category representation. We suggest that category representations may instead be formed via error-driven predictive learning. Rather than passively storing tagged category exemplars or updating parametric summaries of token counts, learners actively anticipate upcoming events and update their beliefs in proportion to how surprising these events turn out to be. As a result, rare category members exert a disproportionate influence on the category representation. We present evidence for this hypothesis from a distributional learning experiment on acquiring a novel phonetic category, and show that the results are well described by a classic error-driven learning model (Rescorla & Wagner 1972).
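The contrast drawn in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation: the cue names, the learning rate, and the trial counts below are hypothetical, and the Hebbian learner is reduced to a simple co-occurrence counter. Each cue stands in for an acoustic value that is followed by the category label. The Rescorla-Wagner learner updates a cue's weight in proportion to the prediction error, so a rarely heard value, which is still poorly predicted, gains more per exposure than a frequently heard one.

```python
def hebbian(trials, rate=0.1):
    """Count-like learner: every cue-outcome pairing adds the same amount."""
    weights = {}
    for cues, outcome_present in trials:
        if outcome_present:
            for cue in cues:
                weights[cue] = weights.get(cue, 0.0) + rate
    return weights


def rescorla_wagner(trials, rate=0.1, lam=1.0):
    """Error-driven learner: each update is scaled by (target - prediction)."""
    weights = {}
    for cues, outcome_present in trials:
        # The prediction is the summed weight of all cues present on this trial.
        prediction = sum(weights.get(cue, 0.0) for cue in cues)
        target = lam if outcome_present else 0.0
        error = target - prediction  # the "surprise" on this trial
        for cue in cues:
            weights[cue] = weights.get(cue, 0.0) + rate * error
    return weights


# Hypothetical exposure: a frequent acoustic value heard 20 times and a
# rare one heard twice, each followed by the category label.
trials = ([(("frequent",), True)] * 10 + [(("rare",), True)]) * 2

hebb_weights = hebbian(trials)
rw_weights = rescorla_wagner(trials)

# Relative standing of the rare value under each learner.
print("Hebbian rare/frequent:", hebb_weights["rare"] / hebb_weights["frequent"])
print("Rescorla-Wagner rare/frequent:", rw_weights["rare"] / rw_weights["frequent"])
```

Under these assumptions the Hebbian ratio simply mirrors the 2:20 token counts, whereas the error-driven ratio is larger, because the frequent value approaches its asymptote and stops generating much surprise.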

Keywords: predictability; categorization; phonetic categories; distributional learning; error-driven learning; Rescorla-Wagner model

Funding source: Alexander von Humboldt-Stiftung

Award Identifier / Grant number: 1141527

References

Arnold, D., F. Tomaschek, K. Sering, F. Lopez & R. H. Baayen. 2017. Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLoS One 12(4). e0174623. doi:10.1371/journal.pone.0174623

Ashby, F. G. & E. M. Waldron. 1999. On the nature of implicit categorization. Psychonomic Bulletin & Review 6(3). 363–378. doi:10.3758/BF03210826

Aslin, R. N., J. R. Saffran & E. L. Newport. 1998. Computation of conditional probability statistics by 8-month-old infants. Psychological Science 9(4). 321–324. doi:10.1111/1467-9280.00063

Ayton, P. & I. Fischer. 2004. The hot hand fallacy and the gambler’s fallacy: Two faces of subjective randomness? Memory & Cognition 32(8). 1369–1378. doi:10.3758/BF03206327

Baayen, R. H., P. Milin, D. F. Đurđević, P. Hendrix & M. Marelli. 2011. An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review 118(3). 438–481. doi:10.1037/a0023851

Bates, D., M. Mächler, B. Bolker & S. Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48. doi:10.18637/jss.v067.i01

Boersma, P. 2001. Praat, a system for doing phonetics by computer. Glot International 5. 341–345.

Boersma, P. & B. Hayes. 2001. Empirical tests of the gradual learning algorithm. Linguistic Inquiry 32(1). 45–86. doi:10.1162/002438901554586

Bryan, W. L. & N. Harter. 1899. Studies on the telegraphic language: The acquisition of a hierarchy of habits. Psychological Review 6(4). 345–375. doi:10.1037/h0073117

Buz, E., M. K. Tanenhaus & T. F. Jaeger. 2016. Dynamically adapted context-specific hyper-articulation: Feedback from interlocutors affects speakers’ subsequent pronunciations. Journal of Memory and Language 89. 68–86. doi:10.1016/j.jml.2015.12.009

de Laplace, P. S. 1951. A philosophical essay on probabilities. New York, NY: Dover (Original work published 1796).

Dutoit, T., V. Pagel, N. Pierret, F. Bataille & O. Van Der Vrecken. 1996. The MBROLA project: Towards a set of high quality speech synthesizers free of use for non commercial purposes. In H. T. Bunnell & W. Idsardi (eds.), Proceeding of fourth international conference on spoken language processing ICSLP 96, 1393–1396. New York: IEEE. doi:10.21437/ICSLP.1996-356

Dye, M., M. Jones, D. Yarlett & M. Ramscar. 2017. Refining the distributional hypothesis: A role for time and context in semantic representation. Proceedings of the Annual Meeting of the Cognitive Science Society 39. 313–318.

Escudero, P. & P. Boersma. 2004. Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition 26(4). 551–585. doi:10.1017/S0272263104040021

Feldman, N. H., T. L. Griffiths & J. L. Morgan. 2009. The influence of categories on perception: Explaining the perceptual magnet effect as optimal statistical inference. Psychological Review 116(4). 752–782. doi:10.1037/a0017196

Flannagan, M. J., L. S. Fried & K. J. Holyoak. 1986. Distributional expectations and the induction of category structure. Journal of Experimental Psychology: Learning, Memory, and Cognition 12(2). 241–256. doi:10.1037/0278-7393.12.2.241

Forster, K. I. & C. Davis. 1984. Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition 10. 680–698. doi:10.1037/0278-7393.10.4.680

Gilovich, T., R. Vallone & A. Tversky. 1985. The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology 17(3). 295–314. doi:10.1016/0010-0285(85)90010-6

Hebb, D. O. 1949. The organization of behavior: A neuropsychological approach. Hoboken, NJ: John Wiley & Sons.

Howes, D. 1957. On the relation between the intelligibility and frequency of occurrence of English words. The Journal of the Acoustical Society of America 29(2). 296–305. doi:10.1121/1.1908862

Hume, E. & F. Mailhot. 2013. The role of entropy and surprisal in phonologization and language change. In A. C. L. Yu (ed.), Origins of sound patterns: Approaches to phonologization, 29–47. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199573745.003.0002

Iverson, P. & P. K. Kuhl. 1996. Influences of phonetic identification and category goodness on American listeners’ perception of /r/ and /l/. Journal of the Acoustical Society of America 99(2). 1130–1140. doi:10.1121/1.415234

Johnson, K. 1997. Speech perception without speaker normalization: An exemplar model. In K. Johnson & J. W. Mullennix (eds.), Talker variability in speech processing, 145–165. Burlington, MA: Morgan Kaufmann.

Keuleers, E., M. Stevens, P. Mandera & M. Brysbaert. 2015. Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. The Quarterly Journal of Experimental Psychology 68(8). 1665–1692. doi:10.1080/17470218.2015.1022560

Kleinschmidt, D. F. 2016. Perception in a variable but structured world: The case of speech perception. PhD dissertation, University of Rochester. doi:10.31237/osf.io/zwves

Kleinschmidt, D. F. & T. F. Jaeger. 2015. Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel. Psychological Review 122(2). 148–203. doi:10.1037/a0038695

Kreuz, R. J. 1987. The subjective familiarity of English homophones. Memory & Cognition 15(2). 154–168. doi:10.3758/BF03197027

Kuhl, P. K. 1991. Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception and Psychophysics 50. 93–107. doi:10.3758/BF03212211

Mandera, P., E. Keuleers & M. Brysbaert. 2016. Changes in the word frequency effect as a function of language exposure. Paper presented at the 10th International Conference on the Mental Lexicon, Ottawa, Canada, October 19–21.

Maye, J. & L. Gerken. 2000. Learning phonemes without minimal pairs. Proceedings of the Annual Boston University Conference on Language Development 24(2). 522–533.

Maye, J., D. J. Weiss & R. N. Aslin. 2008. Statistical phonetic learning in infants: Facilitation and feature generalization. Developmental Science 11(1). 122–134. doi:10.1111/j.1467-7687.2007.00653.x

Maye, J., J. F. Werker & L. Gerken. 2002. Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 82(3). B101–B111. doi:10.1016/S0010-0277(01)00157-3

McClelland, J. L., B. L. McNaughton & R. C. O’Reilly. 1995. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review 102(3). 419–457. doi:10.1037/0033-295X.102.3.419

McMurray, B., M. K. Tanenhaus & R. N. Aslin. 2002. Gradient effects of within-category phonetic variation on lexical access. Cognition 86(2). B33–B42. doi:10.1016/S0010-0277(02)00157-9

Milin, P., L. B. Feldman, M. Ramscar, P. Hendrix & R. H. Baayen. 2017. Discrimination in lexical decision. PLoS One 12(2). e0171935. doi:10.1371/journal.pone.0171935

Miller, J. L. 1994. On the internal structure of phonetic categories: A progress report. Cognition 50(1–3). 271–285. doi:10.1016/0010-0277(94)90031-0

Nosofsky, R. M. 1986. Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General 115(1). 39–57. doi:10.1037/0096-3445.115.1.39

Pierrehumbert, J. 2001. Word frequency, lenition and contrast. In J. L. Bybee & P. J. Hopper (eds.), Frequency and the emergence of linguistic structure, 137–158. Amsterdam: John Benjamins. doi:10.1075/tsl.45.08pie

Pinker, S. 1997. How the mind works. New York, NY: Norton.

Pisoni, D. B. & J. Tash. 1974. Reaction times to comparisons within and across phonetic categories. Perception & Psychophysics 15(2). 285–290. doi:10.3758/BF03213946

Ramscar, M., D. Yarlett, M. Dye, K. Denny & K. Thorpe. 2010. The effects of feature-label-order and their implications for symbolic learning. Cognitive Science 34(6). 909–957. doi:10.1111/j.1551-6709.2009.01092.x

Rescorla, R. A. & A. R. Wagner. 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (eds.), Classical conditioning II: Current research and theory, 64–99. New York, NY: Appleton-Century-Crofts.

White, H. 1989. Learning in artificial neural networks: A statistical perspective. Neural Computation 1(4). 425–464. doi:10.1162/neco.1989.1.4.425

Zhao, Y. 2010. Statistical inference in the learning of novel phonetic categories. PhD dissertation, Stanford University.

Received: 2017-05-11

Accepted: 2017-08-18

Published Online: 2018-08-15
