Abstract
This paper explores the proposed benefits of ΔP (delta P) as a measure of collocation strength. It contrasts ΔP with other, more commonly used association measures, particularly transitional probabilities, but also mutual information and Lexical Gravity G. To this end, the strong correlation between ΔP and transitional probability is first illustrated using two example corpora. This is followed by an analysis of hesitation placement in spontaneous spoken English, based on the assumption that hesitations are not placed within strong collocations. Results show that, despite the strong similarity between the two measures, ΔP is more predictive of hesitation placement than transitional probability in some contexts. Yet neither ΔP nor any of the other association measures emerges as the universally best predictor. On the basis of these results, it is suggested that studies should always rely on several association measures.
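As a rough illustration of how the measures contrasted above relate to one another, the sketch below computes transitional probability, ΔP, and mutual information from a 2×2 table of bigram counts. The formulas are the standard textbook definitions rather than anything stated in this abstract, and the function name, the cell labels `a` to `d`, and the counts are hypothetical. Lexical Gravity G is omitted because it additionally needs type counts of a word's neighbours.

```python
import math


def association_measures(a, b, c, d):
    """Association measures for a bigram "w1 w2" from a 2x2 contingency table.

    Hypothetical cell labels (assumed, not from the paper):
      a = bigrams with w1 followed by w2
      b = bigrams with w1 followed by some other word
      c = bigrams with some other word followed by w2
      d = bigrams containing neither w1 (first slot) nor w2 (second slot)
    Assumes a > 0 and that no row or column total is zero.
    """
    n = a + b + c + d

    # Forward direction: how well does w1 predict the following w2?
    tp_forward = a / (a + b)                     # P(w2 | w1)
    delta_p_forward = tp_forward - c / (c + d)   # P(w2 | w1) - P(w2 | not w1)

    # Backward direction: how well does w2 predict the preceding w1?
    tp_backward = a / (a + c)                    # P(w1 | w2)
    delta_p_backward = tp_backward - b / (b + d) # P(w1 | w2) - P(w1 | not w2)

    # Pointwise mutual information: log2 of observed over expected frequency.
    mi = math.log2(a * n / ((a + b) * (a + c)))

    return {
        "tp_forward": tp_forward,
        "delta_p_forward": delta_p_forward,
        "tp_backward": tp_backward,
        "delta_p_backward": delta_p_backward,
        "mi": mi,
    }


# Made-up counts, for illustration only.
measures = association_measures(a=80, b=20, c=120, d=9780)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

ΔP differs from transitional probability only by the subtracted term, the probability of the outcome when the cue is absent. When that term is small, the two measures are nearly identical, which is consistent with the strong correlation the paper reports.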
© 2020 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Annotating Speaker Stance in Discourse: The Brexit Blog Corpus
- ΔP as a measure of collocation strength
- Assessing theory with practice: an evaluation of two aspectual-semantic classification models of gerundive nominalizations
- Verb-argument constructions in advanced L2 English learner production: Insights from corpora and verbal fluency tasks
- Discourse functions of always progressives: Beyond complaining
- Pitting corpus-based classification models against each other: a case study for predicting constructional choice in written Estonian
- Using token-based semantic vector spaces for corpus-linguistic analyses: From practical applications to tests of theoretical claims