Truth be told: a corpus-based study of the cross-linguistic colexification of representational and (inter)subjective meanings
Abstract
The study of crosslinguistic variation in word meaning often focuses on representational and concrete meanings. We argue that other kinds of word meanings (e.g., abstract and (inter)subjective meanings) can be fruitfully studied in translation corpora, and present a quantitative procedure for doing so. We focus on the cross-linguistic patterns for lemmas pertaining to truth and reality (English true and real), as these abstract meanings have been found to frequently colexify with particular (inter)subjective meanings. Applying our method to a corpus of translated subtitles of TED talks, we show that (1) the abstract-representational meanings are colexified in patterned ways that are nonetheless more complex than previously observed (some languages do not split ‘true’-like from ‘real’-like terms; many languages display further splits of representational meanings); and (2) some non-representational meanings strongly colexify with representational meanings of ‘truth’ and ‘reality’, while others also often colexify with other fields.
Funding source: Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada
Award Identifier / Grant number: RGPIN-2019-06917
Funding source: Jackman Humanities Institute, University of Toronto
Award Identifier / Grant number: Scholars in Residence program 2020
Acknowledgments
We would like to thank Lawrence Ora for contributing to the preparation of the data, Mah Noor Amir for her contributions to the initial stages of this project, as well as Songül Gündoğdu and Arsalan Kahnemuyipour for linguistic consultation. Needless to say, they are by no means responsible for the interpretation of the data presented in this paper.
Research funding: This paper emerged from a Jackman Humanities Institute Scholars in Residence project, run in May 2020. The further development of the paper was sponsored by a Connaught New Researcher Award and an NSERC Discovery Grant (RGPIN-2019-06917) to Barend Beekhuizen.
© 2023 Walter de Gruyter GmbH, Berlin/Boston