Revisiting the automatic prediction of lexical errors in Mandarin

Marc Allassonnière-Tang and I-Ping Wan
Published/Copyright: June 25, 2024

Abstract

Speech errors provide cues for explaining the process of word retrieval. For example, speech errors are less likely to occur with high-frequency words, since these words already receive a high level of activation. The current analysis develops existing findings in two ways. First, instead of considering the overall frequency of words in the entire corpus, we consider the gap in frequency between sequential pairs of words. We hypothesize that speech errors are more likely to occur if the target has a much lower frequency than its preceding word. Second, we use word embedding methods to quantify the semantic distance between sequential pairs of words. We hypothesize that speech errors are more likely to occur with words that have a large semantic distance from their preceding context. We also consider the potential effects of the phonetic distance between sequential pairs of words and of the position of words in their utterances. The results from a Mandarin corpus of speech errors show that word frequency and semantic distance between sequential pairs of words can be used to predict the occurrence of speech errors with an accuracy above the majority baseline.
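The two pairwise predictors named in the abstract can be illustrated with a minimal sketch. It assumes a log-ratio of corpus counts for the frequency gap and cosine distance between embedding vectors for the semantic distance; the abstract does not state either formula, so both are assumptions. The function names, toy counts, and two-dimensional vectors are hypothetical and are not taken from the article.

```python
import math

def frequency_gap(prev_count, target_count):
    """Log-ratio of counts (assumed measure): positive when the target is rarer."""
    return math.log(prev_count) - math.log(target_count)

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def pair_features(words, counts, vectors):
    """Build one feature row per sequential pair (preceding word, target)."""
    rows = []
    for prev, target in zip(words, words[1:]):
        rows.append({
            "prev": prev,
            "target": target,
            "freq_gap": frequency_gap(counts[prev], counts[target]),
            "sem_dist": cosine_distance(vectors[prev], vectors[target]),
        })
    return rows

# Hypothetical toy data: romanized tokens, invented counts and vectors.
counts = {"wo": 1000, "xihuan": 100, "mao": 10}
vectors = {"wo": [1.0, 0.0], "xihuan": [1.0, 1.0], "mao": [0.0, 1.0]}
rows = pair_features(["wo", "xihuan", "mao"], counts, vectors)
```

Under these assumptions, a target that is much rarer than its preceding word receives a large positive `freq_gap`, and a target whose vector points away from the preceding word's vector receives a `sem_dist` close to 1. Rows of this kind could then be passed to any classifier and compared against a majority baseline.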


Corresponding author: I-Ping Wan, Phonetics and Psycholinguistics Laboratory, Graduate Institute of Linguistics/Research Center for Mind, Brain, and Learning/Program in Teaching Chinese as a Second Language, National Chengchi University, Taipei, Taiwan

Funding source: National Science Council

Award Identifier / Grant number: MOST 98-2410-H-004-103-MY2

Acknowledgments

We appreciate the valuable and constructive comments we received from two anonymous reviewers and the editor. This paper is an extended version of a research paper published by Springer in the post-conference proceedings of the 22nd Chinese Lexical Semantics Workshop in Hong Kong. We would like to thank the participants at that conference for their comments and questions. Thanks also go to Professor Wei-yun Ma for releasing the code of the CKIP parser, and to Professor Li-hsin Ning and Professor Jiahong Yuan for the traditional HMM code for phonetic forced alignment in Mandarin. Our deepest appreciation goes to Dr. Chain-wu Lee for his ongoing cutting-edge programming support in constructing all the corpora in the Phonetics and Psycholinguistics Laboratory. All remaining errors of analysis or interpretation are our own. This research was supported in part by a three-year grant from the National Science and Technology Council in Taiwan to the corresponding author (MOST 98-2410-H-004-103-MY2).


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/lingvan-2023-0036).


Received: 2023-03-04
Accepted: 2023-12-11
Published Online: 2024-06-25

© 2024 Walter de Gruyter GmbH, Berlin/Boston
