Abstract
We present the first application of modern neural networks to the well-studied task of learning word stress systems. We tested our adaptation of a sequence-to-sequence network on the Tesar and Smolensky (2000. Learnability in optimality theory. Cambridge, MA: MIT Press) test set of 124 “languages”, showing that it acquires generalizable representations of stress patterns in a very high proportion of runs. We also show that the neural network can learn lexically specified patterns of stress, something that constraint-based approaches to stress acquisition require extra mechanisms to accomplish. Finally, we demonstrate in an agent-based simulation that the model is biased toward systematic patterns of stress, despite having the expressive power to memorize its training data.
Acknowledgments
We would like to thank the UMass Sound Workshop, as well as the audiences of the 2019 Manchester Phonology Meeting, 2022 Society for Computation in Linguistics, UNC’s Computational Linguistics Brown Bag Seminar, and the 2024 Annual Meeting on Phonology for helpful discussion of topics related to this paper.
Research funding: This research was supported by the National Science Foundation grant BCS-2140826 to the University of Massachusetts Amherst.
References
Beguš, Gašper. 2020. Modeling unsupervised phonetic and phonological learning in generative adversarial phonology. Proceedings of the Society for Computation in Linguistics 3(1). 138–148. https://doi.org/10.3389/frai.2020.00044.
Berent, Iris. 2013. The phonological mind. Trends in Cognitive Sciences 17(7). 319–327. https://doi.org/10.1016/j.tics.2013.05.004.
Berko, Jean. 1958. The child’s learning of English morphology. Word 14(2–3). 150–177. https://doi.org/10.1080/00437956.1958.11659661.
Boersma, Paul & Joe Pater. 2016. Convergence properties of a gradual learner in Harmonic Grammar. In John J. McCarthy & Joe Pater (eds.), Harmonic Grammar and Harmonic Serialism, 389–434. Bristol, CT: Equinox.
Carpenter, Angela C. 2016. The role of a domain-specific language mechanism in learning natural and unnatural stress. Open Linguistics 2(1). 105–131. https://doi.org/10.1515/opli-2016-0006.
Chollet, François. 2015. Keras [API]. Available at: https://keras.io/getting_started/faq/#how-should-i-cite-keras.
Corkery, Maria, Yevgen Matusevych & Sharon Goldwater. 2019. Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection. In Anna Korhonen, David Traum & Lluís Màrquez (eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3868–3877. Florence: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1376.
Dempster, Arthur P., Nan M. Laird & Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39(1). 1–22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x.
Dresher, B. Elan & Jonathan D. Kaye. 1990. A computational learning model for metrical phonology. Cognition 34(2). 137–195. https://doi.org/10.1016/0010-0277(90)90042-i.
Ernestus, Mirjam & R. Harald Baayen. 2003. Predicting the unpredictable: Interpreting neutralized segments in Dutch. Language 79(1). 5–38. https://doi.org/10.1353/lan.2003.0076.
Ferdinand, Vanessa, Simon Kirby & Kenny Smith. 2019. The cognitive roots of regularization in language. Cognition 184. 53–68. https://doi.org/10.1016/j.cognition.2018.12.002.
Gupta, Prahlad & David S. Touretzky. 1994. Connectionist models and linguistic theory: Investigations of stress systems in language. Cognitive Science 18(1). 1–50. https://doi.org/10.1207/s15516709cog1801_1.
Hare, Mary. 1990. The role of trigger-target similarity in the vowel harmony process. Annual Meeting of the Berkeley Linguistics Society 16. 140–152. https://doi.org/10.3765/bls.v16i0.1724.
Hayes, Bruce & James White. 2013. Phonological naturalness and phonotactic learning. Linguistic Inquiry 44(1). 45–75. https://doi.org/10.1162/ling_a_00119.
Hayes, Bruce & Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39(3). 379–440. https://doi.org/10.1162/ling.2008.39.3.379.
Hughto, Coral. 2020. Emergent typological effects of agent-based learning models in maximum entropy grammar. Amherst: University of Massachusetts Amherst PhD dissertation.
Hughto, Coral, Andrew Lamont, Brandon Prickett & Gaja Jarosz. 2019. Learning exceptionality and variation with lexically scaled MaxEnt. Proceedings of the Society for Computation in Linguistics 2(1). 91–101. https://doi.org/10.7275/y68s-kh12.
Jarosz, Gaja. 2013. Learning with hidden structure in Optimality Theory and Harmonic Grammar: Beyond robust interpretive parsing. Phonology 30(1). 27–71. https://doi.org/10.1017/s0952675713000031.
Jarosz, Gaja. 2015. Expectation driven learning of phonology. Unpublished manuscript, University of Massachusetts Amherst.
Kager, René. 2012. Stress in windows: Language typology and factorial typology. Lingua 122(13). 1454–1493. https://doi.org/10.1016/j.lingua.2012.06.005.
Kirby, Simon & James R. Hurford. 2002. The emergence of linguistic structure: An overview of the iterated learning model. In Angelo Cangelosi & Domenico Parisi (eds.), Simulating the evolution of language, 121–147. London: Springer. https://doi.org/10.1007/978-1-4471-0663-0_6.
Kirov, Christo & Ryan Cotterell. 2018. Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics 6. 651–665. https://doi.org/10.1162/tacl_a_00247.
Lee, Seung Suk, Joe Pater & Brandon Prickett. 2024. Representing and learning stress: A MaxEnt framework for comparing learning across grammatical theories. Unpublished manuscript, University of Massachusetts Amherst. Available at: https://websites.umass.edu/pater/papers/ (accessed 20 August 2025).
Linzen, Tal, Emmanuel Dupoux & Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4. 521–535. https://doi.org/10.1162/tacl_a_00115.
Linzen, Tal, Sofya Kasyanenko & Maria Gouskova. 2013. Lexical and phonological variation in Russian prepositions. Phonology 30(3). 453–515. https://doi.org/10.1017/s0952675713000225.
Marcus, Gary. 2001. The algebraic mind. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/1187.001.0001.
Mayer, Connor & Max Nelson. 2020. Phonotactic learning with neural language models. Proceedings of the Society for Computation in Linguistics 3(1). 149–159. https://doi.org/10.7275/g3y2-fx06.
McCurdy, Kate, Sharon Goldwater & Adam Lopez. 2020. Inflecting when there’s no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals. In Dan Jurafsky, Joyce Chai, Natalie Schluter & Joel Tetreault (eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1745–1756. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.159.
Mielke, Jeff. 2008. The emergence of distinctive features. New York: Oxford University Press. https://doi.org/10.1093/oso/9780199207916.001.0001.
Moore-Cantwell, Claire & Joe Pater. 2016. Gradient exceptionality in maximum entropy grammar with lexically specific constraints. Catalan Journal of Linguistics 15. 53–66. https://doi.org/10.5565/rev/catjl.183.
Moreton, Elliott. 2012. Inter- and intra-dimensional dependencies in implicit phonotactic learning. Journal of Memory and Language 67(1). 165–183. https://doi.org/10.1016/j.jml.2011.12.003.
Nazarov, Aleksei. 2021. Learnability of indexed constraint analyses of phonological opacity. Proceedings of the Society for Computation in Linguistics 4(1). 158–166. https://doi.org/10.7275/f1zb-5s89.
Pater, Joe. 2010. Morpheme-specific phonology: Constraint indexation and inconsistency resolution. In Steve Parker (ed.), Phonological argumentation: Essays on evidence and motivation, 123–154. Sheffield, UK: Equinox.
Pater, Joe. 2019. Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language 95(1). e41–e74. https://doi.org/10.1353/lan.2019.0009.
Pater, Joe & Elliott Moreton. 2012. Structurally biased phonology: Complexity in learning and typology. Journal of the English and Foreign Languages University, Hyderabad 3(2). 1–44.
Pater, Joe & Robert Staubs. 2015. Emergent contrast in agent-based modeling of language. In Short ’schrift for Alan Prince, a collection of items for Alan Prince on the occasion of his retirement, May/June 2015, organized by Eric Baković. Available at: https://hdl.handle.net/20.500.14394/32429.
Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language 60(2). 320–342. https://doi.org/10.2307/413643.
Pinker, Steven & Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28(1–2). 73–193. https://doi.org/10.1016/0010-0277(88)90032-7.
Prickett, Brandon. 2019. Learning biases in opaque interactions. Phonology 36(4). 627–653. https://doi.org/10.1017/s0952675719000320.
Prickett, Brandon. 2021. Modelling a subregular bias in phonological learning with recurrent neural networks. Journal of Language Modelling 9(1). 67–96. https://doi.org/10.15398/jlm.v9i1.251.
Prickett, Brandon, Aaron Traylor & Joe Pater. 2018. Seq2seq models with dropout can learn generalizable reduplication. In Sandra Kuebler & Garrett Nicolai (eds.), Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, 93–100. Brussels: Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5810.
Prince, Alan & Paul Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Malden, MA: Blackwell. https://doi.org/10.1002/9780470759400.
Rumelhart, David E. & James L. McClelland. 1986. On learning the past tenses of English verbs. In James L. McClelland & David E. Rumelhart (eds.), Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2: Psychological and biological models, 216–271. Cambridge, MA: MIT Press.
Sonderegger, Morgan & Partha Niyogi. 2013. Variation and change in English noun/verb pair stress: Data, dynamical systems models, and their interaction. In Alan C. L. Yu (ed.), Origins of sound patterns: Approaches to phonologization, 262–284. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199573745.003.0013.
Sutskever, Ilya, Oriol Vinyals & Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence & K. Q. Weinberger (eds.), Advances in neural information processing systems, 3104–3112. Cambridge, MA: MIT Press.
Tesar, Bruce & Paul Smolensky. 2000. Learnability in optimality theory. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/4159.001.0001.
Wedel, Andrew B. 2007. Feedback and regularity in the lexicon. Phonology 24(1). 147–185. https://doi.org/10.1017/s0952675707001145.
Wilson, Colin. 2006. Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science 30(5). 945–982. https://doi.org/10.1207/s15516709cog0000_89.
© 2025 Walter de Gruyter GmbH, Berlin/Boston