Less is more: why all paradigms are defective, and why that is a good thing

Laura A. Janda; Francis M. Tyers

doi:10.1515/cllt-2018-0031

Published/Copyright: June 5, 2018

From the journal Corpus Linguistics and Linguistic Theory, Volume 17, Issue 1

Abstract

Only a fraction of lexemes are encountered in all their paradigm forms in any corpus, or even in the lifetime of any speaker. This raises the question of how native speakers confidently produce and comprehend word forms that they have never witnessed. We present the results of an experiment using a recurrent neural network as a computational learning model. In particular, we compare the model’s production of unencountered forms under two types of training data for Russian nouns, verbs, and adjectives: full paradigms vs. single word forms. In the long run, the model performs better when exposed to the more naturalistic training on single word forms, even though the full-paradigm training data is much larger, since it includes every paradigm form of every word. We discuss why “defective” paradigms may be better for human learners as well.

Keywords: morphology; paradigm; Russian; corpus; computational experiment

About the authors

Laura A. Janda

Laura A. Janda (born 1957, Ph.D., UCLA, 1984) is Professor of Russian Linguistics at UiT the Arctic University of Norway. Her special areas of interest are the complex factors associated with the grammatical categories of case and aspect and how these can be investigated using corpus data and experiments.

Francis M. Tyers

Francis M. Tyers (born 1983, Ph.D., Universitat d’Alacant, 2013) is Assistant Professor of Linguistics at Higher School of Economics in Moscow. He is passionate about language technology for lesser-resourced languages and has co-organised workshops on machine translation in a number of countries including Russia and Finland.

Published Online: 2018-06-05

Published in Print: 2021-05-26
