Less is more: why all paradigms are defective, and why that is a good thing

Laura A. Janda; Francis M. Tyers

doi:10.1515/cllt-2018-0031

Published/Copyright: June 5, 2018

From the journal Corpus Linguistics and Linguistic Theory, Volume 17, Issue 1

Abstract

Only a fraction of lexemes are encountered in all their paradigm forms in any corpus or even in the lifetime of any speaker. This raises a question as to how it is that native speakers confidently produce and comprehend word forms that they have never witnessed. We present the results of an experiment using a recurrent neural network computational learning model. In particular, we compare the model’s production of unencountered forms using two types of training data: full paradigms vs. single word forms for Russian nouns, verbs, and adjectives. In the long run, the model displays better performance when exposed to the more naturalistic training on single word forms, even though the other training data is much larger as it includes full paradigms for each and every word. We discuss why “defective” paradigms may be better for human learners as well.

Keywords: morphology; paradigm; Russian; corpus; computational experiment
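
The contrast between the two training regimes described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the toy lexicon, the frequency counts, and the function names are all hypothetical, and the rule of keeping only the single most frequent form per lexeme is an assumption made for illustration. The sketch shows only how the two training sets differ in size and which forms remain unencountered, not the neural network itself.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lemma -> {tag: (word form, invented corpus frequency)}.
Paradigm = Dict[str, Tuple[str, int]]
Example = Tuple[str, str, str]  # (lemma, tag, word form)

TOY_LEXICON: Dict[str, Paradigm] = {
    "kniga": {
        "Nom.Sg": ("kniga", 120),
        "Gen.Sg": ("knigi", 80),
        "Acc.Sg": ("knigu", 95),
        "Ins.Sg": ("knigoj", 12),
    },
    "zavtrak": {
        "Nom.Sg": ("zavtrak", 30),
        "Gen.Sg": ("zavtraka", 25),
        "Acc.Sg": ("zavtrak", 28),
        "Ins.Sg": ("zavtrakom", 41),
    },
}


def full_paradigm_set(lexicon: Dict[str, Paradigm]) -> List[Example]:
    """Training regime 1: every paradigm cell of every lexeme."""
    return [
        (lemma, tag, form)
        for lemma, cells in lexicon.items()
        for tag, (form, _freq) in cells.items()
    ]


def single_form_set(lexicon: Dict[str, Paradigm]) -> List[Example]:
    """Training regime 2 (assumed rule): only the most frequent form per lexeme."""
    examples = []
    for lemma, cells in lexicon.items():
        tag, (form, _freq) = max(cells.items(), key=lambda item: item[1][1])
        examples.append((lemma, tag, form))
    return examples


def unencountered(lexicon: Dict[str, Paradigm], training: List[Example]) -> List[Example]:
    """Paradigm cells that never occur in the given training set."""
    seen = set(training)
    return [ex for ex in full_paradigm_set(lexicon) if ex not in seen]


if __name__ == "__main__":
    full = full_paradigm_set(TOY_LEXICON)
    single = single_form_set(TOY_LEXICON)
    print(len(full), "examples in the full-paradigm set")
    print(len(single), "examples in the single-form set")
    print(len(unencountered(TOY_LEXICON, single)), "forms left for the model to predict")
```

In this toy setting the single-form set is far smaller than the full-paradigm set, which mirrors the size asymmetry the abstract mentions. Any learner trained on it has to generalise across lexemes to fill the remaining cells.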

About the authors

Laura A. Janda

Laura A. Janda (born 1957, Ph.D., UCLA, 1984) is Professor of Russian Linguistics at UiT the Arctic University of Norway. Her special areas of interest are the complex factors associated with the grammatical categories of case and aspect and how these can be investigated using corpus data and experiments.

Francis M. Tyers

Francis M. Tyers (born 1983, Ph.D., Universitat d’Alacant, 2013) is Assistant Professor of Linguistics at Higher School of Economics in Moscow. He is passionate about language technology for lesser-resourced languages and has co-organised workshops on machine translation in a number of countries including Russia and Finland.

Published Online: 2018-06-05

Published in Print: 2021-05-26

