Linguistic typology in natural language processing

Emily M. Bender
Published/Copyright: December 23, 2016

Abstract

This paper explores the ways in which the field of natural language processing (NLP) can and does benefit from work in linguistic typology. I describe the recent increase in interest in multilingual natural language processing and give a high-level overview of the field. I then turn to a discussion of how linguistic knowledge in general is incorporated in NLP technology before describing how typological results in particular are used. I consider both rule-based and machine learning approaches to NLP and review literature on predicting typological features as well as literature that leverages such features.

Acknowledgments

I would like to thank Antske Fokkens, Gina-Anne Levow, and Olga Zamaraeva for helpful discussion in the preparation of this paper. All remaining errors and infelicities are my own.

Received: 2016-8-3
Revised: 2016-9-6
Published Online: 2016-12-23
Published in Print: 2016-12-1

©2016 by De Gruyter Mouton

DOI: 10.1515/lingty-2016-0035