
Learning and generalizing stress patterns with a sequence-to-sequence neural network

Brandon Prickett and Joe Pater
Published/Copyright: September 26, 2025
From the journal Linguistics Vanguard

Abstract

We present the first application of modern neural networks to the well-studied task of learning word stress systems. We tested our adaptation of a sequence-to-sequence network on the Tesar and Smolensky (2000. Learnability in optimality theory. Cambridge, MA: MIT Press) test set of 124 “languages”, showing that it acquires generalizable representations of stress patterns in a very high proportion of runs. We also show that the neural network can learn lexically specified patterns of stress, something constraint-based approaches to stress acquisition require extra mechanisms to accomplish. Finally, we demonstrate that the model, in an agent-based simulation, is biased toward systematic patterns of stress, despite having the expressive power to memorize its training data.
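As a purely hypothetical illustration of the task (not the paper's actual data encoding or model), a stress "language" can be viewed as a mapping from strings of light (L) and heavy (H) syllables to stress assignments, and a learner generalizes only if it assigns the correct stress to weight strings absent from its training data. The sketch below contrasts rote memorization with a systematic pattern, assuming a simple quantity-insensitive initial-stress language:

```python
from itertools import product

# Hypothetical illustration: a stress "language" maps syllable-weight
# strings to stress marks (1 = primary stress, 0 = unstressed).
# Assumes a quantity-insensitive initial-stress pattern; this is a
# sketch of the task, not the authors' actual encoding.

def initial_stress(weights):
    """Assign primary stress to the first syllable, e.g. 'LLH' -> '100'."""
    return "1" + "0" * (len(weights) - 1)

# Training data: every weight string of length 2-3 over {L, H}.
train = {"".join(w): initial_stress(w)
         for n in (2, 3) for w in product("LH", repeat=n)}

# A memorizing "learner" stores the training pairs verbatim ...
memorizer = dict(train)

# ... and fails on an unseen 4-syllable word, whereas the systematic
# pattern extends to words of any length.
novel = "LHLL"
print(memorizer.get(novel))   # None: memorization alone cannot extend
print(initial_stress(novel))  # '1000': the systematic pattern can
```

A neural learner that behaves like the memorizer has fit its training data without acquiring the pattern; the paper's test of "generalizable representations" is, in these terms, whether the network behaves like `initial_stress` on held-out forms.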


Corresponding author: Joe Pater, University of Massachusetts, Amherst, USA

Award Identifier / Grant number: BCS-2140826

Acknowledgments

We would like to thank the UMass Sound Workshop, as well as the audiences of the 2019 Manchester Phonology Meeting, 2022 Society for Computation in Linguistics, UNC’s Computational Linguistics Brown Bag Seminar, and the 2024 Annual Meeting on Phonology for helpful discussion of topics related to this paper.

  1. Research funding: This research was supported by the National Science Foundation grant BCS-2140826 to the University of Massachusetts Amherst.

References

Beguš, Gašper. 2020. Modeling unsupervised phonetic and phonological learning in generative adversarial phonology. Proceedings of the Society for Computation in Linguistics 3(1). 138–148. https://doi.org/10.3389/frai.2020.00044.

Berent, Iris. 2013. The phonological mind. Trends in Cognitive Sciences 17(7). 319–327. https://doi.org/10.1016/j.tics.2013.05.004.

Berko, Jean. 1958. The child’s learning of English morphology. Word 14(2–3). 150–177. https://doi.org/10.1080/00437956.1958.11659661.

Boersma, Paul & Joe Pater. 2016. Convergence properties of a gradual learner in Harmonic Grammar. In John J. McCarthy & Joe Pater (eds.), Harmonic Grammar and Harmonic Serialism, 389–434. Bristol, CT: Equinox.

Carpenter, Angela C. 2016. The role of a domain-specific language mechanism in learning natural and unnatural stress. Open Linguistics 2(1). 105–131. https://doi.org/10.1515/opli-2016-0006.

Chollet, François. 2015. Keras [API]. Available at: https://keras.io/getting_started/faq/#how-should-i-cite-keras.

Corkery, Maria, Yevgen Matusevych & Sharon Goldwater. 2019. Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection. In Anna Korhonen, David Traum & Lluís Màrquez (eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3868–3877. Florence: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1376.

Dempster, Arthur P., Nan M. Laird & Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39(1). 1–22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x.

Dresher, B. Elan & Jonathan D. Kaye. 1990. A computational learning model for metrical phonology. Cognition 34(2). 137–195. https://doi.org/10.1016/0010-0277(90)90042-i.

Ernestus, Mirjam & R. Harald Baayen. 2003. Predicting the unpredictable: Interpreting neutralized segments in Dutch. Language 79(1). 5–38. https://doi.org/10.1353/lan.2003.0076.

Ferdinand, Vanessa, Simon Kirby & Kenny Smith. 2019. The cognitive roots of regularization in language. Cognition 184. 53–68. https://doi.org/10.1016/j.cognition.2018.12.002.

Gupta, Prahlad & David S. Touretzky. 1994. Connectionist models and linguistic theory: Investigations of stress systems in language. Cognitive Science 18(1). 1–50. https://doi.org/10.1207/s15516709cog1801_1.

Hare, Mary. 1990. The role of trigger-target similarity in the vowel harmony process. Annual Meeting of the Berkeley Linguistics Society 16. 140–152. https://doi.org/10.3765/bls.v16i0.1724.

Hayes, Bruce & James White. 2013. Phonological naturalness and phonotactic learning. Linguistic Inquiry 44(1). 45–75. https://doi.org/10.1162/ling_a_00119.

Hayes, Bruce & Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39(3). 379–440. https://doi.org/10.1162/ling.2008.39.3.379.

Hughto, Coral. 2020. Emergent typological effects of agent-based learning models in maximum entropy grammar. Amherst: University of Massachusetts Amherst PhD dissertation.

Hughto, Coral, Andrew Lamont, Brandon Prickett & Gaja Jarosz. 2019. Learning exceptionality and variation with lexically scaled MaxEnt. Proceedings of the Society for Computation in Linguistics 2(1). 91–101. https://doi.org/10.7275/y68s-kh12.

Jarosz, Gaja. 2013. Learning with hidden structure in Optimality Theory and Harmonic Grammar: Beyond robust interpretive parsing. Phonology 30(1). 27–71. https://doi.org/10.1017/s0952675713000031.

Jarosz, Gaja. 2015. Expectation driven learning of phonology. Unpublished manuscript, University of Massachusetts Amherst.

Kager, René. 2012. Stress in windows: Language typology and factorial typology. Lingua 122(13). 1454–1493. https://doi.org/10.1016/j.lingua.2012.06.005.

Kirby, Simon & James R. Hurford. 2002. The emergence of linguistic structure: An overview of the iterated learning model. In Angelo Cangelosi & Domenico Parisi (eds.), Simulating the evolution of language, 121–147. London: Springer. https://doi.org/10.1007/978-1-4471-0663-0_6.

Kirov, Christo & Ryan Cotterell. 2018. Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics 6. 651–665. https://doi.org/10.1162/tacl_a_00247.

Lee, Seung Suk, Joe Pater & Brandon Prickett. 2024. Representing and learning stress: A MaxEnt framework for comparing learning across grammatical theories. Unpublished manuscript, University of Massachusetts Amherst. Available at: https://websites.umass.edu/pater/papers/ (accessed 20 August 2025).

Linzen, Tal, Emmanuel Dupoux & Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4. 521–535. https://doi.org/10.1162/tacl_a_00115.

Linzen, Tal, Sofya Kasyanenko & Maria Gouskova. 2013. Lexical and phonological variation in Russian prepositions. Phonology 30(3). 453–515. https://doi.org/10.1017/s0952675713000225.

Marcus, Gary. 2001. The algebraic mind. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/1187.001.0001.

Mayer, Connor & Max Nelson. 2020. Phonotactic learning with neural language models. Proceedings of the Society for Computation in Linguistics 3(1). 149–159. https://doi.org/10.7275/g3y2-fx06.

McCurdy, Kate, Sharon Goldwater & Adam Lopez. 2020. Inflecting when there’s no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals. In Dan Jurafsky, Joyce Chai, Natalie Schluter & Joel Tetreault (eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1745–1756. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.159.

Mielke, Jeff. 2008. The emergence of distinctive features. New York: Oxford University Press. https://doi.org/10.1093/oso/9780199207916.001.0001.

Moore-Cantwell, Claire & Joe Pater. 2016. Gradient exceptionality in maximum entropy grammar with lexically specific constraints. Catalan Journal of Linguistics 15. 53–66. https://doi.org/10.5565/rev/catjl.183.

Moreton, Elliott. 2012. Inter- and intra-dimensional dependencies in implicit phonotactic learning. Journal of Memory and Language 67(1). 165–183. https://doi.org/10.1016/j.jml.2011.12.003.

Nazarov, Aleksei. 2021. Learnability of indexed constraint analyses of phonological opacity. Proceedings of the Society for Computation in Linguistics 4(1). 158–166. https://doi.org/10.7275/f1zb-5s89.

Pater, Joe. 2010. Morpheme-specific phonology: Constraint indexation and inconsistency resolution. In Steve Parker (ed.), Phonological argumentation: Essays on evidence and motivation, 123–154. Sheffield, UK: Equinox.

Pater, Joe. 2019. Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language 95(1). e41–e74. https://doi.org/10.1353/lan.2019.0009.

Pater, Joe & Elliott Moreton. 2012. Structurally biased phonology: Complexity in learning and typology. Journal of the English and Foreign Languages University, Hyderabad 3(2). 1–44.

Pater, Joe & Robert Staubs. 2015. Emergent contrast in agent-based modeling of language. In Short ’schrift for Alan Prince, a collection of items for Alan Prince on the occasion of his retirement, May/June 2015, organized by Eric Baković. Available at: https://hdl.handle.net/20.500.14394/32429.

Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language 60(2). 320–342. https://doi.org/10.2307/413643.

Pinker, Steven & Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28(1–2). 73–193. https://doi.org/10.1016/0010-0277(88)90032-7.

Prickett, Brandon. 2019. Learning biases in opaque interactions. Phonology 36(4). 627–653. https://doi.org/10.1017/s0952675719000320.

Prickett, Brandon. 2021. Modelling a subregular bias in phonological learning with recurrent neural networks. Journal of Language Modelling 9(1). 67–96. https://doi.org/10.15398/jlm.v9i1.251.

Prickett, Brandon, Aaron Traylor & Joe Pater. 2018. Seq2seq models with dropout can learn generalizable reduplication. In Sandra Kuebler & Garrett Nicolai (eds.), Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, 93–100. Brussels: Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5810.

Prince, Alan & Paul Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Malden, MA: Blackwell. https://doi.org/10.1002/9780470759400.

Rumelhart, David E. & James L. McClelland. 1986. On learning the past tenses of English verbs. In James L. McClelland & David E. Rumelhart (eds.), Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2: Psychological and biological models, 216–271. Cambridge, MA: MIT Press.

Sonderegger, Morgan & Partha Niyogi. 2013. Variation and change in English noun/verb pair stress: Data, dynamical systems models, and their interaction. In A. C. L. Yu (ed.), Origins of sound patterns: Approaches to phonologization, 262–284. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199573745.003.0013.

Sutskever, Ilya, Oriol Vinyals & Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence & K. Q. Weinberger (eds.), Advances in neural information processing systems, 3104–3112. Cambridge, MA: MIT Press.

Tesar, Bruce & Paul Smolensky. 2000. Learnability in optimality theory. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/4159.001.0001.

Wedel, Andrew B. 2007. Feedback and regularity in the lexicon. Phonology 24(1). 147–185. https://doi.org/10.1017/s0952675707001145.

Wilson, Colin. 2006. Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science 30(5). 945–982. https://doi.org/10.1207/s15516709cog0000_89.

Received: 2024-12-20
Accepted: 2025-07-15
Published Online: 2025-09-26

© 2025 Walter de Gruyter GmbH, Berlin/Boston
