Abstract
A central task in empirical and quantitative language studies is the extraction of linguistic constructions important to linguistic theory and application. The great number and variety of such constructions increasingly necessitate computer-assisted extraction, which often proves challenging because it entails the simultaneous analysis of multiple layers of linguistic information latent in large-scale corpora. To address this, we present Constraction, an open-source tool for the automatic extraction and interactive exploration of linguistic constructions from arbitrary textual corpora. Constraction features a generic algorithm that integrates customizable layers of linguistic annotation (e.g., lexical, syntactic, and semantic) to identify constructional patterns of varying sizes and abstraction levels. Its browser-based interface allows users to configure extraction parameters and to explore the extracted patterns visually and interactively. We demonstrate the utility of Constraction through case studies and discuss its potential applications in language research and pedagogy.
Acknowledgments
We would like to thank the editors of Linguistics Vanguard, as well as the anonymous reviewers, for their constructive comments and suggestions.
Research funding: This study was supported by the Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant No. 21YJC740068), and the Bilingual Cognition and Development Lab, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies (Grant No. BCD202203).
© 2023 Walter de Gruyter GmbH, Berlin/Boston