Constraction: a tool for the automatic extraction and interactive exploration of linguistic constructions

  • Hengbin Yan and Yinghui Li
Published/Copyright: December 12, 2023

Abstract

A central task in empirical and quantitative language studies is the extraction of linguistic constructions important to linguistic theory and application. The great number and variety of such constructions increasingly necessitates computer-assisted extraction, which often proves challenging as it entails a simultaneous analysis of multiple layers of linguistic information latent in large-scale corpora. To address this, we present Constraction, an open-source tool for the automatic extraction and interactive exploration of linguistic constructions from arbitrary textual corpora. Constraction features a generic algorithm that integrates customizable layers of linguistic annotation (e.g., lexical, syntactic, and semantic) to identify constructional patterns of varying sizes and abstraction levels. Its browser-based interface allows users to configure various extraction parameters and enables visual, interactive exploration of the extracted patterns. We demonstrate the utility of Constraction through case studies and discuss its potential applications in language research and pedagogy.
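To make the idea of integrating several annotation layers concrete, the following is a minimal illustrative sketch, not Constraction's actual algorithm or API. It assumes a toy corpus in which every token already carries `lemma` and `pos` annotations, and it enumerates contiguous patterns in which each slot is filled from one of the chosen layers, keeping those that recur. The function name `extract_patterns`, the layer names, and all parameters are hypothetical.

```python
from collections import Counter
from itertools import product

# Toy annotated corpus (assumed format): one dict of layer -> value per token.
CORPUS = [
    [{"word": "gave", "lemma": "give", "pos": "VERB"},
     {"word": "her", "lemma": "she", "pos": "PRON"},
     {"word": "a", "lemma": "a", "pos": "DET"},
     {"word": "book", "lemma": "book", "pos": "NOUN"}],
    [{"word": "gives", "lemma": "give", "pos": "VERB"},
     {"word": "him", "lemma": "he", "pos": "PRON"},
     {"word": "a", "lemma": "a", "pos": "DET"},
     {"word": "pen", "lemma": "pen", "pos": "NOUN"}],
]


def extract_patterns(corpus, layers=("lemma", "pos"), min_n=2, max_n=4, min_freq=2):
    """Count mixed-layer n-grams and keep those seen at least min_freq times.

    Hypothetical helper: each slot of a pattern is drawn from one annotation
    layer, so a single window yields patterns at several abstraction levels.
    """
    counts = Counter()
    for sentence in corpus:
        for n in range(min_n, max_n + 1):
            for start in range(len(sentence) - n + 1):
                window = sentence[start:start + n]
                # Try every assignment of a layer to each slot in the window.
                for choice in product(layers, repeat=n):
                    pattern = tuple(
                        f"{layer}:{token[layer]}"
                        for layer, token in zip(choice, window)
                    )
                    counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_freq}


patterns = extract_patterns(CORPUS)
# Show the longest recurring patterns first.
for pattern, freq in sorted(patterns.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(freq, " ".join(pattern))
```

In this toy setting, the two sentences share no fully lexical sequence longer than the verb lemma alone, but they do share partially abstract patterns such as `lemma:give pos:PRON lemma:a pos:NOUN`. Exhaustive enumeration grows exponentially with pattern length, so a practical tool would need pruning or an association measure; this sketch leaves both out.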


Corresponding author: Yinghui Li, Bilingual Cognition and Development Lab, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou, China

Acknowledgments

We would like to thank the editors of Linguistics Vanguard, as well as the anonymous reviewers for their constructive comments and suggestions.

  1. Research funding: This study was supported by the Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant No. 21YJC740068), and the Bilingual Cognition and Development Lab, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies (Grant No. BCD202203).

Received: 2022-10-14
Accepted: 2023-08-10
Published Online: 2023-12-12

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 7.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/lingvan-2022-0122/html