Abstract
This article showcases elastic net regression as a means to build fairer models of morphosyntactic variation. Elastic net allows lexical items to appear on the same level as traditional, high-level predictors, enabling fuller models of variation. We apply elastic net regression to 1,296,574 Dutch verbal cluster tokens from the SoNaR corpus, analysing a morphosyntactic alternance in Dutch subordinate clauses. Our results show morphosyntactic preferences among verbs, indicating that semantic effects are indeed at play. Further analysis shows that semantic patterns for either word order exist, though it remains difficult to glean any semantic generalisations. Still, the elastic net technique shows that the inclusion of lexical items as full predictors in a model is useful, as much of the variation left unexplained by high-level predictors can be explained in lexical terms.
Funding source: Fonds Wetenschappelijk Onderzoek
Award Identifier / Grant number: G059922N
Acknowledgements
We would like to thank Jelke Bloem for providing the odds ratio data from his study and Jeroen van Craenenbroeck for his advice on formal syntax. We truly appreciate their help.
Research funding: This work was supported by Fonds Wetenschappelijk Onderzoek (https://doi.org/10.13039/501100003130, grant no. G059922N).
A.1 Try it yourself
To aid the analysis of the elastic net coefficients, a JavaScript-based interactive analysis tool, Rekker, was developed (Sevenants 2023e). You can explore the dataset and the elastic net results yourself by visiting anthesevenants.github.io/Rekker/.
A.2 Corpus and querying
To compute the semantic pull of the different verbs in the red and green word order, we collected all red and green verb clusters in subordinate clauses in the SoNaR corpus (Oostdijk et al. 2008) and the SoNaR New Media corpus (Oostdijk et al. 2014). Since we are interested in syntactic alternances, we needed a syntactically informed corpus format (“treebank”) in order to reliably find the attestations we needed. While the SoNaR corpus does not ship with syntactic information, much of the corpus material from SoNaR is also included in the Lassy corpus (van Noord et al. 2013), which is syntactically annotated using Alpino (van Noord 2006). We retrieved the syntactic information from SoNaR available in Lassy, and parsed the remaining sentences with Alpino ourselves. This left us with a fully syntactically informed SoNaR corpus, ready to be queried for red and green word orders.
In order to query the syntactic information of the entire SoNaR corpus, we used mattenklopper (Sevenants 2023c), a treebank search engine tailor-made for this study. While several Alpino search engines are available, many of which are much more user-friendly and faster than the custom search engine used here (e.g. GrETEL by Augustinus et al. 2012, PaQu by Kleiweg 2023), these engines all have specific limitations which meant they could not be used for this study. In both GrETEL and PaQu, it is only possible to retrieve entire sentences; one cannot additionally retrieve the participles or auxiliaries in a verb cluster – this must be done manually. GrETEL only supports searching through subsections of SoNaR[9] and PaQu does not offer the SoNaR corpus for querying at all. Finally, GrETEL results are limited to 500 sentences due to copyright concerns, which is not enough for a sophisticated analysis. The custom mattenklopper engine was developed as a solution to all these problems. It is available online and can be used for future alternance studies of Dutch using Alpino-based corpora. The XPath queries used to search the corpus are included in Subsection A.7. The mattenklopper search engine returned 1,604,412 attestations of either the red or green word order.
A.3 Filtering and enriching
The mattenklopper results were further filtered in order to guarantee the quality of the attestations. In short, duplicates were removed, tokenisation errors were fixed (i.e. superfluous punctuation was removed from participles) and obvious tagging mistakes were removed (e.g. words such as zgn ‘so-called’ and gemiddeld ‘average’ were removed). In addition, wrong participle endings (e.g. gebeurt instead of gebeurd ‘happened’) were corrected using naive-dt-fix (Sevenants 2023d), a library for the R language designed for this study. This library automatically corrects wrong participle endings by relying on the relative frequencies of all possible spellings: the most frequent spelling is taken to be the correct one and is used as the correction.[10] Inflected forms were also removed (e.g. geplaatste). Past participles cannot be inflected in Dutch, so all inflected forms in the corpus tagged as participles are, in fact, mis-tagged adjectives. We also removed all verb clusters with an auxiliary other than hebben, zijn or worden, and removed all attestations without a sentence ID (which we need to compute priming). In addition, all types occurring fewer than 10 times were removed in order to guarantee a stable estimate of the semantic pull of each type. As a result of these operations, 177,014 attestations were removed.
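As an illustration of the frequency-based correction idea behind naive-dt-fix, consider the following minimal sketch; the function and object names here are hypothetical and do not reflect the library's actual interface.

```r
# Hypothetical sketch: pick the most frequent of the two possible d/t spellings.
# 'freq_table' is an assumed named vector of corpus frequencies per word form.
fix_dt_ending <- function(word, freq_table) {
  stem <- sub("[dt]$", "", word)          # strip the final d/t
  variants <- paste0(stem, c("d", "t"))   # e.g. "gebeurd", "gebeurt"
  freqs <- freq_table[variants]
  freqs[is.na(freqs)] <- 0                # unseen spellings count as 0
  variants[which.max(freqs)]              # most frequent spelling wins
}

freq_table <- c(gebeurd = 5000, gebeurt = 1200)
fix_dt_ending("gebeurt", freq_table)      # "gebeurd"
```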
Furthermore, the attestations were enriched with additional information to be used in the multifactorial elastic net regression. Firstly, regional information was added for each attestation. SoNaR comes with contextual information about its documents, such as country of origin. Since region is an important influence on the red-green word order, this variable is vital for multifactorial control.
We also used the subcorpus division in SoNaR (e.g. WR-P-E-A_discussion_lists, WR-P-E-F_press_releases) to distinguish between edited and unedited genres. We decided to focus on an edited-unedited dichotomy, because it is difficult to assess the formality of certain genres in the corpus (e.g. websites and blogs). By focussing on whether a genre is typically edited or not, we sidestep these issues, but we are still able to include some form of formality distinction. Refer to Table 8 for an overview of our judgements.
An overview of the SoNaR subcorpora and our edited-unedited judgement.
| Subcorpus | Contents | Degree of editing |
|---|---|---|
| WR-P-E-A | Discussion lists | Unedited |
| WR-P-E-C | e-magazines | Edited |
| WR-P-E-E | Newsletters | No attestations* |
| WR-P-E-F | Press releases | Edited |
| WR-P-E-G | Subtitles | Edited |
| WR-P-E-H | Teletext pages | Edited |
| WR-P-E-I | Websites | Edited |
| WR-P-E-J | Wikipedia | Edited |
| WR-P-E-K | Blogs | Edited |
| WR-P-P-B | Books | Edited |
| WR-P-P-C | Brochures | Edited |
| WR-P-P-D | Newsletters | Edited |
| WR-P-P-E | Guides, manuals | Edited |
| WR-P-P-F | Legal texts | Edited |
| WR-P-P-G | Newspapers | Edited |
| WR-P-P-H | Periodicals, magazines | Edited |
| WR-P-P-I | Policy documents | Edited |
| WR-P-P-J | Proceedings | Edited |
| WR-P-P-K | Reports | Edited |
| WR-U-E-E | Written assignments | Edited |
| WS-U-E-A | Auto cues | Edited |
| WS-U-T-B | Texts for the visually impaired | Edited |
| WR-P-E-L | Tweets | Unedited |
| WR-U-E-A | Chats | Unedited |
| WR-U-E-D | SMS | Unedited |
*The newsletters subcorpus is extremely small, which is why it yields no attestations.
Adjectiveness information was added for all participles. Adjectiveness is expressed as a ratio denoting how often a participle functions as an adjective in language use: 0 denotes no adjectival use, 1 denotes maximal adjectival use. We computed adjectiveness on the entire Lassy corpus (van Noord et al. 2013).[11]
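In other words, the ratio amounts to the share of adjectival uses among all uses of a participle. The following sketch illustrates the idea with made-up counts; it is not the actual extraction code.

```r
# Hypothetical sketch: adjectiveness as the share of adjectival uses of a participle.
adjectiveness <- function(n_adjectival, n_participial) {
  n_adjectival / (n_adjectival + n_participial)
}

adjectiveness(n_adjectival = 0,  n_participial = 480)  # 0.0: purely verbal use
adjectiveness(n_adjectival = 60, n_participial = 540)  # 0.1: occasional adjectival use
```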
Because the Alpino syntactic parser marks separable verbs by infixing an underscore (_) between the preposition and verb root, we can exploit this behaviour to automatically infer whether a verb cluster contains a separable verb.
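A minimal sketch of this test is given below; the lemmas are toy examples.

```r
# Alpino lemmas of separable verbs contain an underscore between particle and
# root (e.g. "op_heffen" for opgeheven), so a simple string test flags them.
lemmas <- c("breken", "op_heffen", "terug_vinden", "gebeuren")
is_separable <- grepl("_", lemmas, fixed = TRUE)
is_separable  # FALSE TRUE TRUE FALSE
```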
To compute the length of the middle field, we calculated the number of words between the start of the clause and the verbal cluster itself. This information is based on the tokenisation of the SoNaR corpus.
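A small sketch of this count on a toy clause follows; the exact operationalisation (e.g. whether the complementiser itself is counted) is an assumption here.

```r
# Toy subordinate clause: "dat hij het boek gisteren heeft gelezen"
# ('that he read the book yesterday'), with 0-based token positions as in Alpino.
tokens          <- c("dat", "hij", "het", "boek", "gisteren", "heeft", "gelezen")
clause_start    <- 0                                   # position of "dat"
cluster_start   <- 5                                   # position of "heeft", first cluster verb
midfield_length <- cluster_start - clause_start - 1    # "hij het boek gisteren" -> 4
midfield_length
```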
We included frequency information from the SUBTLEX dataset (Keuleers et al. 2010) in order to be able to assess the effect of frequency. Because frequency is typically Zipfian (Zipf 1965), we transformed the frequency information using the natural logarithm, for several reasons: (1) to compress the frequency variation among the types in our dataset; (2) to make the frequency distribution more normal; (3) because it makes the distribution more psychologically real.
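A tiny illustration of the transform, with made-up frequencies:

```r
# Raw SUBTLEX-style frequencies spanning several orders of magnitude
freq <- c(gebeurd = 12000, mislukt = 900, gerept = 40)
log(freq)   # natural log compresses the Zipfian spread: 9.39, 6.80, 3.69
```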
Priming information is also important to include. To obtain priming information, we relied on the sentence IDs included in the SoNaR corpus. Consider the following example:
WR-P-P-B-0000000103.p.37.s.4
The ID refers to document 103 of the WR-P-P-B component of the SoNaR corpus (“books”). Within that document, it refers to the 4th sentence of the 37th paragraph. The window we chose for priming is one paragraph: this means that in our example, we would consider all attestations from paragraph 36 and all sentences leading up to sentence 4 of paragraph 37 to be possible prime sources. It was not possible to work on the sentence level, since paragraphs can have a variable number of sentences and not all sentences have red-green attestations in the dataset.
We included priming in our model by using a corrected log-odds measure, which we will call the “priming ratio”. For every attestation, we computed the following equation:

$$\text{priming ratio} = \ln\left(\frac{n_{\text{red}} + 1}{n_{\text{green}} + 1}\right)$$

where $n_{\text{red}}$ and $n_{\text{green}}$ are the numbers of red and green primes in the priming window. In other words, we computed the ratio between the number of red and green primes, using Laplace smoothing (Brysbaert and Diependaele 2013) to prevent division by zero. The natural logarithm attenuates large disparities between red and green and turns our priming ratio into a continuous variable ranging from −∞ to +∞.
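A minimal sketch of this measure in R, with hypothetical prime counts:

```r
# Laplace-smoothed log ratio of red to green primes in the preceding window
priming_ratio <- function(n_red, n_green) {
  log((n_red + 1) / (n_green + 1))
}

priming_ratio(3, 0)   #  1.39: mostly red primes
priming_ratio(0, 3)   # -1.39: mostly green primes
priming_ratio(0, 0)   #  0.00: no primes at all
```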
As a final step, we removed all participles for which no adjectiveness value was defined, as these were found not to be participles but mis-taggings. For the same reason, participles with an adjectiveness value of over 0.9 were removed. In addition, all attestations for which no region information was defined were also removed, because they lack the information required for the multifactorial analysis. As a result of these two steps, another 256,112 items were removed.
A.4 Converting the dataset
To compute the semantic preference of the participles found in our attestations, we used elastic net regression, the technique detailed in Section 2. Unlike regular regression techniques, elastic net cannot use a tabular dataset “as-is”; instead, the dataset has to be supplied in matrix form. Consider the toy example in Table 9.
Toy example dataset to illustrate the workings of elastic net regression.
| Word order | Participle | Country | Adjectiveness |
|---|---|---|---|
| Green | gebroken | Belgium | 0.5 |
| Red | mislukt | The Netherlands | 0.4 |
| Green | gebeurd | Belgium | 0.1 |
In the matrix form, each categorical column is converted so that each unique value of that column becomes its own predictor. In our case, every unique value of the column participle becomes a binary predictor indicating whether that participle occurs in the verb cluster or not. This means our matrix will be inherently sparse, since each verbal cluster can only feature one participle. Binary predictors such as country are also converted to a binary column in the matrix, and simply indicate a deviation from the reference level. For example, if is_BE is a binary column, a Belgian attestation will be encoded as 1, and a Netherlandic attestation as 0. The adjectiveness column is numeric and can be adopted as-is. The response variable word order is also encoded as a binary variable, as is typical in logistic regression, but it is not part of the input matrix. The input matrix for our toy example is given in Table 10. The response variable would be encoded as [0, 1, 0], with the red order coded as 1.
Example input matrix to illustrate the workings of elastic net regression.
| is_gebroken | is_mislukt | is_gebeurd | is_BE | Adjectiveness |
|---|---|---|---|---|
| 1 | 0 | 0 | 1 | 0.5 |
| 0 | 1 | 0 | 0 | 0.4 |
| 0 | 0 | 1 | 1 | 0.1 |
To facilitate the conversion process, we used ElasticToolsR (Sevenants 2023b), an R library written for this study. It can automatically convert “traditional” datasets to the matrix format detailed above in seconds.
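The same conversion and model fit can also be sketched with the Matrix and glmnet packages. The snippet below is an illustration only, not the ElasticToolsR code used in the study: it repeats the toy rows of Table 9 a few times so that a binomial model can be fitted at all, and it picks an arbitrary penalty value instead of tuning it.

```r
library(Matrix)   # sparse.model.matrix()
library(glmnet)   # glmnet()

# Toy data mirroring Table 9, repeated so the binomial fit can run
toy <- data.frame(
  word_order    = rep(c("green", "red", "green"), times = 4),
  participle    = rep(c("gebroken", "mislukt", "gebeurd"), times = 4),
  country       = rep(c("BE", "NL", "BE"), times = 4),
  adjectiveness = rep(c(0.5, 0.4, 0.1), times = 4)
)

# One indicator column per participle, a binary country column, adjectiveness
# as-is; '- 1' drops the intercept so every participle gets its own column
x <- sparse.model.matrix(~ participle + country + adjectiveness - 1, data = toy)
y <- as.integer(toy$word_order == "red")   # red = 1, green = 0

# Elastic net logistic regression; alpha mixes ridge (0) and lasso (1) penalties.
# On real data, lambda would be tuned with cv.glmnet() instead.
fit <- glmnet(x, y, family = "binomial", alpha = 0.5)
coef(fit, s = 0.01)   # coefficients at one (arbitrary) value of lambda
```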
A.5 Bayesian correlations
To provide more robust evidence for the correlation between our elastic net coefficients and the results of previous studies, we also computed the Bayesian correlations between the two, complete with their credible intervals (CI), following Van Doorn et al. (2018). The results are given in Tables 11 and 12; a sketch of this type of computation follows the tables.
The Bayesian correlation results for the comparison between De Sutter’s LLR values and our elastic net coefficients.
| Measure | Estimate | CI |
|---|---|---|
| Pearson | 0.386 | 0.172–0.561 |
| Kendall | 0.457 | 0.289–0.587 |
The Bayesian correlation results for the comparison between Bloem’s OR values and our elastic net coefficients.
| Measure | Estimate | CI |
|---|---|---|
| Pearson | 0.494 | 0.449–0.537 |
| Kendall | 0.315 | 0.273–0.353 |
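As an illustration of the kind of computation involved, the sketch below estimates a Bayesian Pearson correlation on simulated data with the BayesFactor R package. This is an assumption for illustration only: the values reported above were obtained following Van Doorn et al. (2018), whose approach also covers Kendall's tau.

```r
library(BayesFactor)  # correlationBF()

# Simulated stand-ins for two coefficient vectors (not the actual study data)
set.seed(1)
llr   <- rnorm(100)
coefs <- 0.4 * llr + rnorm(100)

bf   <- correlationBF(y = coefs, x = llr)       # Bayes factor for rho != 0
post <- posterior(bf, iterations = 10000)       # posterior samples of rho
quantile(post[, "rho"], c(0.025, 0.5, 0.975))   # median and 95% credible interval
```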
A.6 Comparison tables
Tables 13–16 show the inconsistencies between our results and those of De Sutter et al. (2005) and Bloem (2021).
Overview of all participles which have significant LLR values, but were eliminated in our elastic net regression model.
| Participle | LLR | Elastic net coefficient |
|---|---|---|
| doorzocht | −6.66 | 0 |
| geïnspireerd | −6.66 | 0 |
| gerept | −4.67 | 0 |
| vergemakkelijkt | −4.44 | 0 |
| opgeheven | 4.01 | 0 |
| teruggevonden | 4.01 | 0 |
Overview of all participles which have significant LLR values, but do not appear in our dataset and therefore do not have coefficients.
| Participle | LLR | Elastic net coefficient |
|---|---|---|
| bereid | −15.56 | None |
| bevoegd | −8.88 | None |
| bewust | −6.66 | None |
| gekant | −6.49 | None |
| geneigd | −5.93 | None |
| geoorloofd | −4.67 | None |
| geschikt | −4.44 | None |
| gezond | −4.44 | None |
| verkeerd | −4.44 | None |
Sample of the participles which have OR values, but were eliminated in our elastic net regression model. ORs converted to logits.
| Participle | Logit | Elastic net coefficient |
|---|---|---|
| aanbeden | −0.2884802 | 0 |
| aangehaald | 0.3161285 | 0 |
| aangeduid | 0.7527511 | 0 |
| aangeklaagd | 0.7649804 | 0 |
| aangemeld | 1.1174773 | 0 |
| aangekocht | 1.1233365 | 0 |
| aangebroken | 1.2100116 | 0 |
| aangeleverd | 1.3297969 | 0 |
Sample of the participles which have OR values, but do not appear in our dataset and therefore do not have coefficients. ORs converted to logits.
| Participle | Logit | Elastic net coefficient |
|---|---|---|
| baseren | −0.8770771 | None |
| afkorten | −0.2571632 | None |
| afstemmen | 0.3392527 | None |
| aanwennen | 0.4346616 | None |
| aanbouwen | 0.4984842 | None |
| aflopen | 0.7253951 | None |
| aanplanten | 1.2493800 | None |
| aanhangen | 1.6617019 | None |
A.7 XPath queries
A.7.1 XPath queries for identifying eligible clauses
Red order
//node[(@cat="cp" or @cat="rel" or @cat="inf") and //node[(@wvorm="pv" or @wvorm="inf") and @begin < ./preceding-sibling::node/node[@wvorm="vd"]/@begin | ./following-sibling::node/node[@wvorm="vd"]/@begin]]
Green order
//node[(@cat="cp" or @cat="rel" or @cat="inf") and //node[(@wvorm="pv" or @wvorm="inf") and @begin > ./preceding-sibling::node/node[@wvorm="vd"]/@begin | ./following-sibling::node/node[@wvorm="vd"]/@begin]]
A.7.2 XPath queries for retrieving the verb cluster participle
.//node[@rel="hd" and @wvorm="vd" and @begin $SIGN$ ../../node[@rel="hd" and @pt="ww"]/@begin and not(../../@cat="smain") and ../../../../node[@id="$ID$"]]
with $SIGN$ = > for the red order, < for the green order, and $ID$ = the ID of the parent sentence
A.7.3 XPath queries for retrieving the verb cluster auxiliary
.//node[@rel="hd" and @pt="ww" and @begin $SIGN$ ../node/node[@rel="hd" and @wvorm="vd"]/@begin and not(../@cat="smain") and ../../../node[@id="$ID$"]]
with $SIGN$ = < for the red order, > for the green order, and $ID$ = the ID of the parent sentence
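For reference, the sketch below shows how such a query can be run against a single Alpino parse with the xml2 R package. The file name is hypothetical; the study itself used mattenklopper for querying.

```r
library(xml2)  # read_xml(), xml_find_all()

# Red-order clause query from A.7.1, as a single string
red_query <- paste0(
  '//node[(@cat="cp" or @cat="rel" or @cat="inf") and ',
  '//node[(@wvorm="pv" or @wvorm="inf") and ',
  '@begin < ./preceding-sibling::node/node[@wvorm="vd"]/@begin ',
  '| ./following-sibling::node/node[@wvorm="vd"]/@begin]]'
)

parse   <- read_xml("WR-P-P-B-0000000103.p.37.s.4.xml")  # hypothetical file name
matches <- xml_find_all(parse, red_query)
length(matches)  # number of eligible red-order clauses in this parse
```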
References
Adger, David & Graeme Trousdale. 2007. Variation in English syntax: Theoretical implications. English Language and Linguistics 11(2). 261–278. https://doi.org/10.1017/S1360674307002250.
Augustinus, Liesbeth. 2015. Complement raising and cluster formation in Dutch. PhD thesis. https://www.lotpublications.nl/complement-raising-and-cluster-formation-in-dutch (Accessed 18 June 2024).
Augustinus, Liesbeth, Vincent Vandeghinste & Frank Van Eynde. 2012. Example-based treebank querying. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the 8th international conference on language resources and evaluation (LREC 2012), 3161–3167. Paris: ELRA. https://aclanthology.org/L12-1442/ (Accessed 17 May 2023).
Barbiers, Sjef, Hans Bennis & Lotte Dros-Hendriks. 2018. Merging verb cluster variation. Linguistic Variation 18(1). 144–196. https://doi.org/10.1075/lv.00008.bar.
Bloem, Jelke. 2021. Processing verb clusters. LOT international series, vol. 586. Amsterdam: LOT. https://doi.org/10.48273/LOT0586.
Bossuyt, Tom. 2019. Oppassen geblazen*: Over vormelijke, semantische en historische aspecten van de Nederlandse geblazen-constructie [Oppassen geblazen*: About formal, semantic and historical aspects of the Dutch geblazen-construction]. Nederlandse Taalkunde 24(3). 259–290. https://doi.org/10.5117/NEDTAA2019.3.001.BOSS.
Brysbaert, Marc & Kevin Diependaele. 2013. Dealing with zero word frequencies: A review of the existing rules of thumb and a suggestion for an evidence-based choice. Behavior Research Methods 45(2). 422–430. https://doi.org/10.3758/s13428-012-0270-5.
Colleman, Timothy. 2009. Verb disposition in argument structure alternations: A corpus study of the dative alternation in Dutch. Language Sciences 31(5). 593–611. https://doi.org/10.1016/j.langsci.2008.01.001.
Croft, William. 2010. Construction grammar. In The Oxford handbook of cognitive linguistics, 463–508. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199738632.013.0018.
De Sutter, Gert, Dirk Geeraerts & Dirk Speelman. 2005. Rood, groen, corpus! Een taalgebruiksgebaseerde analyse van woordvolgordevariatie in tweeledige werkwoordelijke eindgroepen [Red, green, corpus! A usage-based analysis of word order variation in two-part verbal clusters]. Leuven: KU Leuven PhD thesis.
Evers, Arnold. 1975. The transformational cycle in Dutch and German. Utrecht: Utrecht University PhD thesis.
Friedman, Jerome, Robert Tibshirani & Trevor Hastie. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1). 1–22. https://doi.org/10.18637/jss.v033.i01.
Geeraerts, Dirk. 2005. Lectal variation and empirical data in cognitive linguistics. In Cognitive linguistics: Internal dynamics and interdisciplinary interaction (Cognitive Linguistics Research 32), 163–189. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110197716.2.163.
Grafmiller, Jason, Benedikt Szmrecsanyi, Melanie Röthlisberger & Benedikt Heller. 2018. General introduction: A comparative perspective on probabilistic variation in grammar. Glossa: A Journal of General Linguistics 3(1). https://doi.org/10.5334/gjgl.690.
Gries, Stefan Thomas. 2015. The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora 10(1). 95–125. https://doi.org/10.3366/cor.2015.0068.
Haeseryn, Walter, Kirsten Romijn, Guido Geerts, Jaap de Rooij & Maarten van den Toorn. 1997. 30.3.2.1 Het werkwoord [30.3.2.1 The verb]. https://e-ans.ivdnt.org/topics/pid/ans30030201lingtopic (Accessed 28 March 2024).
Haiman, John. 1980. The iconicity of grammar: Isomorphism and motivation. Language 56(3). 515–540. https://doi.org/10.2307/414448.
Hartigan, John A. 1975. Clustering algorithms. New York: John Wiley & Sons.
Hoffmann, Thomas & Graeme Trousdale. 2013. Construction grammar: Introduction. In The Oxford handbook of construction grammar, 1–9. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195396683.001.0001.
Hurford, James R. 2014. The origins of language: A slim guide (Oxford Linguistics), 173. Oxford: Oxford University Press.
Israel, Michael. 1996. The way constructions grow. In Adele Goldberg (ed.), Conceptual structure, discourse and language, 217–230. Stanford: Stanford University Press.
Kaufman, Leonard & Peter J. Rousseeuw. 1990. Partitioning around medoids (Program PAM). In Finding groups in data, 68–125. New York: John Wiley & Sons. https://doi.org/10.1002/9780470316801.ch2.
Keuleers, Emmanuel, Marc Brysbaert & Boris New. 2010. SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods 42(3). 643–650. https://doi.org/10.3758/BRM.42.3.643.
Kleiweg, Peter. 2023. PaQu. https://github.com/rug-compling/paqu (Accessed 17 May 2023).
Labov, William. 1972. Sociolinguistic patterns (Conduct and communication). Philadelphia: University of Pennsylvania Press.
Lander, Jared P., Nicholas Galasinao, Joshua Kraut & Daniel Chen. 2023. useful: A collection of handy, useful functions. https://cran.r-project.org/web/packages/useful/index.html (Accessed 18 April 2024).
Lenth, Russell V. 2024. emmeans: Estimated marginal means, aka least-squares means. R package version 1.10.1. Available at: https://github.com/rvlenth/emmeans.
Levshina, Natalia & Kris Heylen. 2014. A radically data-driven construction grammar: Experiments with Dutch causative constructions. Extending the Scope of Construction Grammar 54. 17. https://doi.org/10.1515/9783110366273.17.
Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert & Kurt Hornik. 2022. cluster: Cluster analysis basics and extensions. Available at: https://CRAN.R-project.org/package=cluster.
Mikolov, Tomas, Kai Chen, Greg Corrado & Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv: 1301.3781 [cs.CL].
Montes, Mariana. 2021. Cloudspotting: Visual analytics for distributional semantics. Leuven: KU Leuven PhD dissertation. https://lirias.kuleuven.be/retrieve/630179 (Accessed 30 November 2021).
Nettle, Daniel & Robin Dunbar. 1997. Social markers and the evolution of reciprocal exchange. Current Anthropology 38(1). 93–99. https://doi.org/10.1086/204588.
Oostdijk, Nelleke, Martin Reynaert, Paola Monachesi, Gertjan van Noord, Roeland Ordelman, Ineke Schuurman & Vincent Vandeghinste. 2008. From D-Coi to SoNaR: A reference corpus for Dutch. Available at: https://aclanthology.org/L08-1226/.
Oostdijk, Nelleke, Martin Reynaert, Veronique Hoste, Henk van den Heuvel, Orphee de Clercq, Ewoud Sanders & Creative Computing. 2014. SoNaR nieuw media corpus [SoNaR new media corpus]. https://research.tilburguniversity.edu/en/publications/ac128452-d97c-4290-8e65-12a1462ba47d (Accessed 17 May 2023).
Pardoen, Justine. 1991. De interpretatie van zinnen met de rode en de groene volgorde [The interpretation of sentences in the red and green order]. Forum der Letteren 32. 22.
Pijpops, Dirk, Isabeau De Smet & Freek Van de Velde. 2018. Constructional contamination in morphology and syntax: Four case studies. Constructions and Frames 10(2). 269–305. https://doi.org/10.1075/cf.00021.pij.
Schäfer, Roland & Felix Bildhauer. 2012. Building large corpora from the web using a new efficient tool chain. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12), 486–493. Istanbul, Turkey: European Language Resources Association (ELRA). Available at: https://www.lrec-conf.org/proceedings/lrec2012/pdf/834_Paper.pdf.
Sevenants, Anthe. 2023a. Adjectiveness dataset for past participles in Dutch. Leuven. https://doi.org/10.5281/zenodo.7753211.
Sevenants, Anthe. 2023b. ElasticToolsR. Version 1.3. Leuven. Available at: https://github.com/AntheSevenants/ElasticToolsR/tree/v1.3.
Sevenants, Anthe. 2023c. Mattenklopper. Version 1.0. Leuven. Available at: https://github.com/AntheSevenants/mattenklopper/releases/tag/v1.0.
Sevenants, Anthe. 2023d. naive-dt-fix. Version 1.2. Leuven. Available at: https://github.com/AntheSevenants/naive-dt-fix/tree/v1.2.
Sevenants, Anthe. 2023e. Rekker. Version 1.0. Leuven. Available at: https://github.com/AntheSevenants/Rekker.
Speed, Laura J. & Marc Brysbaert. 2023. Ratings of valence, arousal, happiness, anger, fear, sadness, disgust, and surprise for 24,000 Dutch words. Behavior Research Methods 56. 5023–5039. https://doi.org/10.3758/s13428-023-02239-6.
Stefanowitsch, Anatol. 2013. Collostructional analysis. In Thomas Hoffmann & Graeme Trousdale (eds.), The Oxford handbook of construction grammar. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195396683.013.0016.
Tulkens, Stephan, Chris Emmery & Walter Daelemans. 2016. Evaluating unsupervised Dutch word embeddings as a linguistic resource. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the tenth international conference on language resources and evaluation (LREC 2016). Portorož, Slovenia: European Language Resources Association (ELRA).
van Craenenbroeck, Jeroen, Marjo van Koppen & Antal van den Bosch. 2019. A quantitative-theoretical analysis of syntactic microvariation: Word order in Dutch verb clusters. Language 95(2). 333–370. https://doi.org/10.1353/lan.2019.0033.
Van de Velde, Freek & Dirk Pijpops. 2019. Investigating lexical effects in syntax with regularized regression (Lasso). Journal of Research Design and Statistics in Linguistics and Communication Science 6(2). 166–199. https://doi.org/10.1558/jrds.18964.
Van Doorn, Johnny, Alexander Ly, Maarten Marsman & Eric-Jan Wagenmakers. 2018. Bayesian inference for Kendall’s rank correlation coefficient. The American Statistician 72(4). 303–308. https://doi.org/10.1080/00031305.2016.1264998.
van Noord, Gertjan. 2006. At last parsing is now operational. In Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles: Conférences invitées, 20–42. Leuven: ATALA. https://aclanthology.org/2006.jeptalnrecital-invite.2 (Accessed 15 April 2023).
van Noord, Gertjan, Gosse Bouma, Frank Van Eynde, Daniel De Kok, Jelmer Van der Linde, Ineke Schuurman, Erik Tjong Kim Sang & Vincent Vandeghinste. 2013. Large scale syntactic annotation of written Dutch: Lassy. In Essential speech and language technology for Dutch: Resources, tools and applications, 147–164. Berlin & Heidelberg: Springer. https://doi.org/10.1007/978-3-642-30910-6_9.
Vossen, Piek, Attila Görög, Rubén Izquierdo & Antal van den Bosch. 2012. DutchSemCor: Targeting the ideal sense-tagged corpus. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12), 584–589. Istanbul, Turkey: European Language Resources Association (ELRA). Available at: http://www.lrec-conf.org/proceedings/lrec2012/pdf/187_Paper.pdf.
Vossen, Piek, Isa Maks, Roxane Segers, Hennie van der Vliet, Marie-Francine Moens, Katja Hofmann, Erik Tjong Kim Sang & Maarten de Rijke. 2013. Cornetto: A combinatorial lexical semantic database for Dutch. In Essential speech and language technology for Dutch: Resources, tools and applications, 165–184. Berlin & Heidelberg: Springer. https://doi.org/10.1007/978-3-642-30910-6_10.
Wurmbrand, Susi. 2004. Syntactic vs. post-syntactic movement. In Proceedings of the 2003 annual meeting of the Canadian Linguistic Association (CLA), 284–295.
Wurmbrand, Susi. 2017. Verb clusters, verb raising, and restructuring. In The Wiley Blackwell companion to syntax, Vol. 109. Wiley Online Library. https://doi.org/10.1002/9780470996591.ch75.
Zipf, George Kingsley. 1965. The psycho-biology of language. Cambridge, MA: MIT Press.
Zwart, Cornelius. 1993. Dutch syntax: A minimalist approach. Groningen: University of Groningen PhD thesis.
© 2024 Walter de Gruyter GmbH, Berlin/Boston