Generating data as a proxy for unavailable corpus data: the contextualized sentence completion task

Marilyn Ford and Joan Bresnan

Published/Copyright: April 25, 2015

Abstract

There is much interest in using large corpora to explore predictors of the probability of higher-level linguistic structures, but suitable corpora are not available for all languages and their varieties. We explore a task that uses discourse contexts from an existing corpus as prompts for sentence completion, and we investigate the usefulness of this method for generating data as a proxy for unavailable corpus data. Mini databases of dative and genitive structures were obtained with the method using American and Australian participants. The results show that the databases are indeed a good proxy for corpus data.

About the authors

Marilyn Ford

Marilyn Ford is an Associate Professor in the School of Information and Communication Technology at Griffith University in Australia and a member of the Spoken Syntax Lab at Stanford’s Center for the Study of Language and Information. She is also a member of the Institute for Integrated and Intelligent Systems at Griffith University.

Joan Bresnan

Joan Bresnan, Sadie Dernham Patek Professor Emerita in Humanities at Stanford University, is Senior Researcher of the Spoken Syntax Lab at Stanford’s Center for the Study of Language and Information.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. BCS-1025602.

Appendix 1

Instructions for the contextualized completion task (for dative elicitation)

In the following passages, one or two speakers talk informally about different topics. The final sentence in each item is left unfinished. Your task is to read each passage and then complete the unfinished sentence. Complete it in the way that feels most natural to you. So, you need not spend a lot of time deciding how to complete it. Just write down what seems natural.

Please read the passages carefully and write completions that seem most natural to you.

Appendix 2

Instructions for the contextualized completion task (for genitive elicitation)

In the following passages, one or two real speakers talk informally about different topics. The final sentence in each item is left unfinished. Here is an example for you:

_________________________________________________________________

Speaker A:

I really use my computer a lot at home. I am an accountant but I work at home. So I use it for that quite often. We have, you know, used some of it for some personal things. We keep track of personal budgets and things like that on it. Since it’s tax season, I’m doing a lot of taxes, so I do a lot of that work on it as well.

Speaker B:

I was amazed when I took ……………………………………………………………

_________________________________________________________________

Your task is to read each passage and then complete the unfinished sentence. Complete it in the way that feels most natural to you. However, for each completion, you are given two words or phrases and you are to incorporate the two words or phrases in your completion. For example, after the passage above, you might get something like this:

[a new accountant]

[tax]

You would incorporate these into your completion. You might write something like the following, for example:

“I was amazed when I took our taxes to a new accountant that he did not have a computer on his desk.”

Please note:

You can put the given words or phrases in any order. Grammatical modifications of the words can be used. Thus, for example, if you are given the word “tax” you could use “taxes” or “taxed” if you like.

Please read the passages carefully and write completions containing the required words given in any order. There is no need to spend a lot of time working out what to write. Just write what seems most natural to you.

Published Online: 2015-4-25
Published in Print: 2015-5-1

©2015 by De Gruyter Mouton

Downloaded on 26.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/cllt-2015-0018/html