Revisiting the automatic prediction of lexical errors in Mandarin

Marc Allassonnière-Tang; I-Ping Wan

doi:10.1515/lingvan-2023-0036

Published/Copyright: June 25, 2024

From the journal Linguistics Vanguard Volume 10 Issue 1

Abstract

Speech errors provide cues for explaining the process of word retrieval. For example, speech errors are less likely to occur with high-frequency words since these words already receive a high level of activation. The current analysis further develops existing findings in two ways. First, instead of considering the overall frequency of the words in the entire corpus, we consider the gap in frequency between sequential pairs of words. We hypothesize that speech errors are more likely to occur if the target has a much lower frequency than its preceding word. Second, we use word embedding methods to quantify the semantic distance between sequential pairs of words. We hypothesize that speech errors are more likely to occur with words that have a large semantic distance from their preceding context. We also consider the potential effects of the phonetic distance between sequential pairs of words and of the position of words within utterances. The results from a Mandarin corpus of speech errors show that word frequency and semantic distance between sequential pairs of words can be used to predict the occurrence of speech errors with an accuracy above the majority baseline.
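The predictors named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the log-ratio definition of the frequency gap, the use of cosine distance over embedding vectors, and the string-based transcriptions are all assumptions made for illustration; the abstract only states that a frequency gap, an embedding-based semantic distance, a phonetic distance (Levenshtein edit distance, per the keywords), and a majority baseline are involved.

```python
import math
from collections import Counter


def frequency_gap(freq_prev, freq_target):
    """Assumed definition: log-ratio of the preceding word's corpus
    frequency to the target word's corpus frequency. Positive values
    mean the target is rarer than the word before it."""
    return math.log(freq_prev) - math.log(freq_target)


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two embedding
    vectors given as equal-length sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def levenshtein(a, b):
    """Levenshtein edit distance between two sequences of symbols
    (here, hypothetical phonetic transcriptions given as strings)."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)
```

Under these assumptions, each sequential word pair yields one feature row (frequency gap, semantic distance, phonetic distance, and position), and a classifier is considered informative only if its accuracy exceeds the value returned by `majority_baseline` for the same labels.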

Keywords: speech errors; Mandarin; frequency gap; position-in-utterance; Levenshtein edit distance

Corresponding author: I-Ping Wan, Phonetics and Psycholinguistics Laboratory, Graduate Institute of Linguistics/Research Center for Mind, Brain, and Learning/Program in Teaching Chinese as a Second Language, National Chengchi University, Taipei, Taiwan, E-mail: ipwan@nccu.edu.tw

Funding source: National Science Council

Award Identifier / Grant number: MOST 98-2410-H-004-103-MY2

Acknowledgments

We appreciate the valuable and constructive comments we received from two anonymous reviewers and the editor. This paper is an extended version of a research paper that has been published in the post-conference proceedings for the 22nd Chinese Lexical Semantics Workshop in Hong Kong by Springer. We would like to thank the participants at that conference for their comments and questions. Thanks also go to Professor Wei-yun Ma for releasing the coding of the CKIP parser, and Professor Li-hsin Ning and Professor Jiahong Yuan for the traditional HMM coding for the phonetic forced alignment in Mandarin. Our deepest appreciation goes to Dr. Chain-wu Lee for his ongoing cutting-edge high-tech programming support in constructing all the corpora in the Phonetics and Psycholinguistics Laboratory. All remaining errors of analysis or interpretation are our own. This research was supported in part by a three-year grant from the National Science and Technology Council to the corresponding author in Taiwan (MOST 98-2410-H-004-103-MY2).

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/lingvan-2023-0036).

Received: 2023-03-04

Accepted: 2023-12-11

Published Online: 2024-06-25
