Verb influence on French wh-placement: a parallel corpus study

Jan Fliessbach; Johanna Rockstroh

doi:10.1515/cllt-2024-0001

Published/Copyright: September 3, 2024

From the journal Corpus Linguistics and Linguistic Theory Volume 21 Issue 2

Abstract

Our study investigates the effect of French verb lemmata on the preverbal (QV) or postverbal (VQ) positioning of interrogative forms equivalent to English ‘what’ (que, quoi, and related forms) within a French–Spanish parallel corpus of subtitles. We highlight and illustrate the corpus’s utility for studying less frequent verbs in combination with specific wh-forms. Our findings suggest that less frequent French verbs exhibit weaker associations with QV compared to their more frequent counterparts. A post-hoc study using Spanish translations reveals that French verbs correlated with QV often denote observable actions involving directly accessible Q-referents. We hypothesise that queries concerning ‘situationally accessible’ referents are predominantly utilised for non-standard, evaluative, or challenging questions, which are typically QV in French.

Keywords: Wh-questions; French; interrogatives; deixis; zipf; in-situ

1 Introduction

In French, there are several wh-elements (Q from now on) that correspond to English what. The interrogative object clitic^[1] que ‘what’ can occur preverbally (QV) (1) but is replaced by quoi ‘what’ postverbally (VQ) (2).

(1)

Que vont dire les gens ?	QVS
‘What are people going to say?’

(2)

Les gens vont dire quoi ?	SVQ
‘What are people going to say?’

When combined with prepositions (3), que is replaced by quoi also preverbally (as in à quoi ‘to/for what’, de quoi ‘from/about what’, en quoi ‘in/to what way/extent’, and pour quoi ‘for/of what’^[2]).

(3)

De quoi vont parler les gens ?	QVS

De quoi les gens vont parler ?	QSV
‘What are people going to talk about?’

The construction qu’est-ce que ‘what’ (4) is sometimes seen as fixed and restricted to QV (Zumwald Küster 2018).^[3]

(4)

Qu’est-ce qu’ils vont dire ?	QSV
‘What are they going to say?’

As shown in (5), preverbal Q forms are, stricto sensu, not always sentence-initial (5a), and postverbal Q forms not always sentence-final (5b) (see Adli 2015: 178 for a more detailed overview of several ex situ forms corresponding to QV and in situ forms corresponding to VQ).

(5)

a. Mais de quoi ils parlent ?	QSV
‘But what are they talking about?’

b. Les gens vont dire quoi exactement ?	SVQ
‘What exactly are people going to say?’

In this article, we investigate these partial interrogative forms^[4] to test the more general hypothesis, suggested but not investigated in previous research (Baunaz 2015; Baunaz and Puskás 2014; Coveney 1995; Myers 2007), that the preverbal and postverbal placement of the wh-phrase is sensitive to the lexical identity (or lemma) of the verb.^[5]

Our first research question is whether some verbs are particularly associated with QV or VQ. Generally, we find that QV is much more frequent than VQ in our data.^[6] This tendency, however, is not equally strong across the different verbs that occur in the interrogatives under investigation. Instead, a group consisting primarily of high-frequency verbs is skewed towards QV much more strongly than the rest of the sample. Other verbs, many from the mid-frequency range, have a relative tendency towards VQ.

After answering our first research question in the affirmative, we propose that these frequency patterns related to verb lemmata should be further investigated and explained. We therefore formulate and preliminarily investigate a second research question (Sections 3.4 and 3.5): Why do certain verbs pattern together regarding Q-placement? Based on a post-hoc study on Spanish translations of the French interrogatives in our sample from parallel corpus data, we argue that preverbal Q-placement is more frequent in interrogatives that contain so-called “situational verbs” (such as dire ‘say’, faire ‘do’, raconter ‘tell’, s’agir ‘be the matter/be about’). When these verbs are used in interrogatives (sometimes in a way that could be seen as a light verb use^[7]), they often refer to an action or event “that is evident in the situational context”^[8] (Ehmer and Rosemeyer 2018: 80). This account, while still in need of formalisation, is explored in Section 3.5 and found to be of explanatory value. If future research corroborates it, it would constitute a caveat to any general, verb-insensitive Q-placement pattern for French interrogatives. It also sheds new light on sociolinguistic accounts of French Q-placement variants. Instead of suggesting that these variants are “two ways of saying the same thing” (Labov 2004: 7), it seems that using these different syntactic patterns often amounts to saying and doing different things. In Sections 3.6 and 4, we conclude this article with a discussion of our findings and with some desiderata for future studies on the precise nature of these effects.

2 Previous literature on French partial interrogative variability

In the following review of previous literature, we start with approaches that propose structural differences between QV and VQ (2.1) and then turn to the vast literature on sociolinguistic factors that have been shown to influence Q-placement in French partial interrogatives (2.2). The review continues with a brief discussion of whether QV and VQ can be considered part of one “grammar” (2.3), which we answer in the affirmative. We end this section with a discussion of the empirical challenges for investigating verb influence on Q-placement (2.4), with special reference to the Zipfian nature of lemma distribution in most corpus data.

2.1 Structural factors for Q-placement in French

French is sometimes considered a QV language (Dryer 2013). According to the QV-view on French, VQ is seen as marked and requires explanation, which is what a significant part of the literature on French interrogatives argues for in one form or another. Some explanations rely on specific syntactic or pragmatic features that ‘license’ VQ, such as the givenness or uninformativeness of the material preceding Q (Coveney 1995; Hamlaoui 2010, 2011), marked intonation at the end of the intonational phrase (Baunaz 2011; Cheng and Rooryck 2000; Déprez et al. 2013), non-specific Q-readings in VQ (Mathieu 2004), “strong” presupposition of the answer set (Boeckx et al. 2001; Chang 1997; Cheng and Rooryck 2000), and a higher degree of discourse activation for the propositional content of VQ (Garassino 2022).

Some remarks on the importance of verbs for interrogative syntax in French are scattered throughout the literature. Coveney (1995: 161) finds VQ to be more frequent with copula verbs than with lexical verbs, and Palasis et al. (2023) also find the difference between forms of être on the one hand and lexical verbs, on the other hand, to be of importance for the use of VQ or QV in child French. Myers (2007: 78) hypothesises that specific lexical verbs might correlate with QV or VQ. For a small group of verbs of speech and thought, Baunaz and Puskás (2014) and Baunaz (2015) argue that the presence of an emotive–cognitive property licenses QV.^[9] A recent study by Guryev and Delafontaine (2023: 318) on a SwissSMS corpus has been able to distinguish between être ‘be’, faire ‘do’, and a group of movement verbs, and found that all three favoured VQ, both in absolute terms and relative to all other verbs in their sample (which they did not distinguish).

The Q-form is another crucial factor in the choice of QV versus VQ interrogatives (Behnstedt 1973; Myers 2007: 72–75). Pourquoi ‘why’ is sometimes seen as categorically associated with QV (Coveney 1995: 162),^[10] and longer Q-forms such as comment ‘how’ are positively correlated with QV in general (Guryev and Delafontaine 2023; Lefeuvre and Rossi-Gensane 2015), a tendency less present for adverbs such as quand ‘when’ and où ‘where’ (Coveney 1995: 161). The object interrogative forms que and quoi have been found to favour QV (Coveney 1995: 162). The interrogative construction qu’est-ce que ‘what’, often seen as fixed in colloquial French (Zumwald Küster 2018), is categorically restricted to QV. Argument Q-forms, in general, are associated with QV compared to adjuncts (Adli 2015: 182), with the caveat that qui ‘who’ seems to favour VQ (Guryev and Delafontaine 2023: 313). As mentioned in Section 1, we will limit our subsequent corpus study to object interrogatives with que, quoi, qu’est-ce que, and their combinations with prepositions (from now on Q_que/quoi) to reduce the variability induced by the partial interrogative forms in our sample.

2.2 Sociolinguistic factors and the debate about syntactic variants

Numerous variationist studies relate VQ to sociolinguistic factors such as informality or proximity (Thiberge 2020: 217–218), educational degrees (Adli 2013), and many others (Coveney 2011b; Elsig 2009). According to this view, QV and VQ form part of the same envelope of variation (Labov 2004) in which “les francophones peuvent construire des phrases syntaxiquement assez différentes pour demander une même information manquante sur le monde” [speakers of French can construct syntactically quite different sentences to request the same missing information about the world] (Thiberge 2020: 217). As summed up by Blanche-Benveniste (1997: 19), “les discussions sur la notion de variation syntaxique ressemblaient aux anciens débats sur la synonymie” [the debates about the notion of syntactic variation resemble the old debates about synonymy]. Blanche-Benveniste (1997: 20) further states that these debates hinge on whether “à toute différence de forme correspondait une différence de sens” [every difference in form corresponds to a difference in meaning]. As will become clear in Section 3.5, the present contribution argues for different meanings associated with different forms (combinations of Q and V serialisations with specific V lemmata). However, our stance does not amount to the view that “en élargissant la définition de la variation [à la syntaxe], on met en péril son intérêt” [by extending the definition of variation to syntax, one jeopardises its relevance] (Gadet 1997: 10). Rather, we consider our empirical contribution, which is based on genre-homogeneous and therefore only partially representative data (see Section 3.1), as a step towards modelling functional and sociolinguistic variability together. 
Such an endeavour could then show that syntactic variants which are used synonymously in some contexts (Aaron 2010: 3), or which might appear synonymous when placed out of context, can still correlate with different functions in investigations that consider a large number of uses (see also Quillard 2000; Guryev and Delafontaine 2015 for a similar view).

2.3 How many French grammars?

Some authors have proposed two distinct French grammars in a diglossia, Standard French and Colloquial French ^[11] (references listed in Faure and Palasis 2021: 61). Under this view, Standard French is seen as a QV language, and Colloquial French as a VQ language that allows for QV “under specific conditions, like other in-situ languages” (Faure and Palasis 2021: 57). This is a highly arguable position, not only because it assumes two grammars for speakers that are exposed to a longstanding process of language making (Dufter et al. 2019: 3; Krämer et al. 2022) aimed at framing French as one linguistic entity, but also because corpus studies on colloquial French consistently find relatively high numbers of QV. QV interrogatives constitute the majority of colloquial French interrogatives found in studies conducted in the 20th century (Coveney 2011b: 126) and still reach 40 % of all partial interrogatives even in those studies that found the highest percentages of VQ we are aware of (Adli 2015; Guryev and Delafontaine 2023; Hamlaoui 2011). This means that whatever conditions license QV must be met in at least a third of all cases, even in the corpora that come closest to representing Colloquial French. Given this amount of variability in terms of partial interrogative syntax, even within colloquial varieties of French, we will follow Coveney (2011a) as well as Villeneuve and Auger (2013) and proceed under the assumption that contemporary Hexagonal French can be reasonably conceptualised as one linguistic entity towards which the usage of speakers from the French mainland converges in many colloquial interactions.^[12] Since speakers conceptualise a particular interrogative as pertaining to a system that would, in principle, allow both QV and VQ, they have to consider different factors to guide their choice, be they social, lexical, or grammatical.

2.4 Verb influence and empirical challenges

Before turning to our corpus study, we want to reflect on the empirical challenges that might explain why the sporadic remarks in the literature on the importance of verbs for interrogative syntax in French have not been supported by quantitative corpus evidence so far. One reason seems to be the number of observations necessary. Even with the simplified QV–VQ distinction, an investigation of the effect of n verbs leads to 2n categories (n verbs × two placements), each of which requires a sufficient number of observations for statistical analysis. Among corpus-based investigations of French interrogatives, the study by Behnstedt (1973: 295–305) has the highest number of observations for partial interrogatives explored in terms of Q-V-order with N = 3,492 (with 956 VQ). If these were uniformly distributed among the several thousand French verbs (approx. 5,000 verbs feature in Lexique 2, New et al. 2004; 12,000 are conjugated in Bescherelle 2005), we would not expect to find even one interrogative per verb. But the frequency distribution of verbs in interrogative sentences is, of course, not uniform. As we would expect for any corpus of natural speech, Behnstedt’s data instead follow an approximately Zipfian distribution (Figure 1). Let us compare this distribution to the frequencies of verb lemmata in Lexique 2 (Figure 2), which include not only partial interrogatives but all kinds of utterances. We see some similarities and some differences. In both cases, the first few ranks dominate the overall distribution, a well-known property of a Zipf distribution (Zipf 1935, 1949). This property is crucial for corpus research because any comparison of QV versus VQ that does not differentiate by verb will likely be, to a large extent, based on sentences with être ‘be’, avoir ‘have’, faire ‘do’, aller ‘go’, pouvoir ‘can’, and dire ‘say’.^[13] We also see that some verbs rank differently between the two figures. Se passer ‘happen’ is ranked higher in Behnstedt’s interrogatives (6th vs. 18th), while savoir ‘know’ and suivre ‘follow’ are not present in his sample and are ranked 8th and 12th in Lexique 2. The relative frequency of être is also lower in his sample of interrogatives.

Figure 1:

Counts of verb lemmata (frequency ranks 1:28) in Behnstedt (1973: 296).

Figure 2:

Frequencies per million of verb lemmata in Lexique 2 (frequency ranks 1:40).

The distributions in Figures 1 and 2 indicate that the frequency of occurrence of a verb is approximately inversely proportional to its rank so that the first few ranks (high-frequency verbs) account for a large proportion of the observations, and the first few hundred verbs (mid-frequency verbs) make up almost the entire total. This, in turn, will leave few to no observations for mid- and low-frequency verbs (the remaining several thousand ranks). To find an informative number of observations, even for verbs that rank among the 100 or 200 most frequent French verbs, the number of interrogatives needs to increase by one or two orders of magnitude relative to all previous studies we know. The following section describes how we achieved this and what patterns we found in the data we obtained.
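The arithmetic behind this argument can be made explicit with a small sketch. It is an illustration only: it assumes an idealised Zipf distribution with exponent 1 (frequency exactly proportional to 1/rank), which the real data only approximate, and the function names are hypothetical. The two constants are the figures cited above for Behnstedt (1973) and Lexique 2.

```python
# Illustrative sketch: expected observations per verb under a uniform
# and an idealised Zipfian distribution (exponent 1 is an assumption).

N_INTERROGATIVES = 3492  # partial interrogatives in Behnstedt (1973)
N_VERBS = 5000           # approximate number of verbs in Lexique 2


def cells_needed(n_verbs: int, n_orders: int = 2) -> int:
    """Number of verb-by-placement categories (QV and VQ per verb)."""
    return n_verbs * n_orders


def uniform_expectation(n_obs: int, n_types: int) -> float:
    """Expected observations per verb if all verbs were equally frequent."""
    return n_obs / n_types


def zipf_expectation(n_obs: int, n_types: int, rank: int,
                     exponent: float = 1.0) -> float:
    """Expected observations at a given rank if frequency ~ 1 / rank**exponent."""
    normaliser = sum(1.0 / r ** exponent for r in range(1, n_types + 1))
    return n_obs / (rank ** exponent * normaliser)


print("uniform:", round(uniform_expectation(N_INTERROGATIVES, N_VERBS), 2))
for rank in (1, 6, 20, 100, 200):
    expected = zipf_expectation(N_INTERROGATIVES, N_VERBS, rank)
    print(f"rank {rank:>3}: {expected:.1f} expected interrogatives")
```

Under these assumptions, a sample of this size leaves only a handful of expected observations for verbs around ranks 100 to 200, and these still have to be split between QV and VQ.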

3 Corpus study

This section presents a corpus study on the relation between the lexical identity of different verbs and Q-placement in French interrogatives. As mentioned above, we will abbreviate the different Q-forms with Q_que/quoi to keep in mind that we do not cover the full range of Q forms. Section 3.1 introduces and justifies our database, a French–Spanish parallel subtitle corpus extracted from the OPUS OpenSubtitles corpus (Lison and Tiedemann 2016). Given the relatively scarce use it has received for studying linguistic structures outside of data-driven conversation modelling and translation, we provide a detailed argument for why it is a more suitable starting point for our investigations than other available corpora. Section 3.2 details our methodology, describing the techniques used to extract, annotate, and analyse the corpus data. The results for the first research question (whether some verbs stand out as particularly associated with either QV or VQ) are presented in Section 3.3 and discussed in Section 3.4. This section also introduces the concept of “situational verbs”, as described by Ehmer and Rosemeyer (2018), setting the stage for an exploration of our second research question (why certain verbs pattern together regarding Q-placement) in Section 3.5 via Spanish translations. The preliminary results of this post-hoc study are discussed in Section 3.6.

3.1 Database

As argued in Section 2.4, an investigation of the interaction between specific partial interrogative constructions and those verb lemmata that are not part of the highest frequency ranks requires mega- or gigatoken corpora to provide a sufficiently high number of observations. The OPUS OpenSubtitles corpus is one of the few French corpora that satisfy this requirement. We consider the Corpus de Référence du Français Contemporain (CRFC, Siepmann et al. 2016) the primary alternative of comparable size. We will provide a brief explanation as to why we chose OPUS OpenSubtitles. We start with a qualitative introduction of the nature of the parallel subtitle data and then continue with quantitative arguments regarding its comparative genre-homogeneity and token frequency.

The OPUS OpenSubtitles corpus has been compiled by Lison and Tiedemann (2016) from the OpenSubtitles.org repository, an extensive database of movie and TV subtitles comprising 3.36 million subtitle files covering more than 60 languages. It has been used in a broad range of fields and applications (Baheti et al. 2018; Beeching 2013; Dömötör 2019; Rajeg and Rajeg 2022) and is one of the few corpora to have directly reached an audience of tens of millions of users per month via its integration in ReversoContext (Hoffenberg 1999, user counts according to similarweb.com). The French subcorpus consists of 90.6 million (M) sentences or 700 million tokens (0.7G) (Lison and Tiedemann 2016: 925).

Most movies and TV shows include dialogical and conceptually oral speech (Koch and Oesterreicher 1985). Subtitles nonetheless remain a kind of “oral représenté”, similar to what has been described in Lefeuvre (2020) and Lefeuvre and Parussa (2020). Even so, Levshina (2017) has shown, based on several statistical measures, that subtitles from open online repositories constitute a close approximation of the data we would find in corpora of naturally occurring informal conversations.^[14] Moreover, the OPUS OpenSubtitles corpus has been further cleaned of unreliable and noisy data with the steps outlined in Lison and Tiedemann (2016: 923–927).

The size and the non-commercial, freely available nature of the data on the OpenSubtitles.org website are, in part, the result of “fans subtitling for fellow fans” (Massidda 2020: 189) during the heyday of online filesharing.^[15] As summed up by Massidda (2020: 191–193), such fansubbing can also be called “abusive subtitling” (Nornes 1999) because it often does not obey the restrictions of mainstream subtitling and uses “norm-defying translation strategies” (O’Hagan 2009: 100) to counter “domesticated, manipulated and over-edited official subtitled and dubbed versions” (Massidda 2020: 191; see also Casarini and Massidda 2017). Fansubs are “not bound by industry or institutional modes of regulation and rationalisation” (Dwyer 2017: 136) and instead represent the result of a search by communities of fans for ‘authentic text’ (Cubbison 2005; O’Hagan 2009). The OPUS OpenSubtitles corpus, while not controlled in terms of the professionality of its authors,^[16] has still benefited from this striving for authenticity.

As shown in Section 3.5, the parallel structure of our subtitles corpus is another advantage it has over other corpora (such as the CRFC). Alignments of subtitles in different languages for identical interactive sequences allow for post-hoc tests to explain the distributional patterns found in one language by accessing cues only available in another.

The final and possibly crucial argument in favour of using the OpenSubtitles corpus is that it is comparatively homogeneous with regard to the frequency of Q_que/quoi interrogatives throughout its subcomponents. Figure 3 shows the frequency of such interrogatives (only those ending in a question mark) in the CRFC by position in the corpus, extracted via the Corpus Query Language (CQL, Jakubíček et al. 2010) search in Appendix A. The twenty bars evenly divide the corpus into 5 % bins. Approximately one third of the corpus (six bins) contains most of the relevant tokens. This seems to be the case because the CRFC is genre-diverse, and some parts are far more interactional than others. Figure 4 shows the evenly distributed frequency of the relevant object interrogatives in the OPUS OpenSubtitles 2018 French corpus, which can be linked to its genre-homogeneity. Figure 5 shows that the larger OPUS French corpus (Tiedemann 2012), which contains eleven genre-diverse subcorpora including a 2011 version of OPUS OpenSubtitles (see Table 5 in Appendix B), is even less homogeneous in terms of token frequency than the CRFC. This is because the interrogatives under study are several orders of magnitude more frequent in the OPUS OpenSubtitles 2011 subcorpus, which makes up the bars around 100 k frequency per bin, than in the other ten subcorpora, which are barely visible at the same scale.

Figure 3:

Frequency of interrogatives with que/quoi in different portions of the CRFC.

Figure 4:

Frequency of interrogatives with que/quoi in different portions of the OPUS OpenSubtitles 2018 French corpus.

Figure 5:

Frequency of interrogatives with que/quoi across all subcorpora of OPUS French.
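The binning behind Figures 3 to 5 can be sketched as follows. This is a minimal illustration, assuming that each corpus hit is available as a token position; the function names and the toy positions are hypothetical and do not reproduce the CQL query in Appendix A or the actual corpus data.

```python
# Illustrative sketch: count hits per 5 % bin of a corpus and measure
# how strongly the hits are concentrated in the fullest bins.

def bin_counts(hit_positions, corpus_size, n_bins=20):
    """Count hits in n_bins equally sized portions of the corpus."""
    counts = [0] * n_bins
    for pos in hit_positions:
        if not 0 <= pos < corpus_size:
            raise ValueError(f"position {pos} lies outside the corpus")
        counts[pos * n_bins // corpus_size] += 1
    return counts


def share_of_top_bins(counts, k):
    """Proportion of all hits that fall into the k fullest bins."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(sorted(counts, reverse=True)[:k]) / total


# Toy data (hypothetical): evenly spread hits vs. hits confined to the
# first 30 % of a 1,000-token corpus.
homogeneous_hits = list(range(0, 1000, 10))
skewed_hits = list(range(0, 300, 3))

for label, hits in (("homogeneous", homogeneous_hits), ("skewed", skewed_hits)):
    counts = bin_counts(hits, corpus_size=1000)
    print(label, counts, share_of_top_bins(counts, k=6))
```

In a homogeneous corpus, the six fullest bins hold roughly 30 % of the hits (six of twenty bins); a share far above that indicates the kind of concentration described for the CRFC.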

In sum, we see that among the very large corpora of French that include conceptually oral data, only the OPUS OpenSubtitles 2018 is genre-homogeneous. Reducing genre-induced variability is paramount for an investigation of verb-induced effects. We, therefore, choose OPUS OpenSubtitles 2018 as our database. Nevertheless, we are aware that future work on natural dialogue corpora, ideally combined with experimental methods, is needed to triangulate our results.

3.2 Methodology

To access the parallel data without the restrictions imposed by SketchEngine (Kilgarriff et al. 2004) (only 10 k concordance lines downloadable), we created a workflow based on the Python package OpusTools (Aulamo et al. 2020). It is built to use the parallel data structure of the corpus, so the process starts by extracting subtitles not only for the language under investigation (source, src) but also for another language (target, trg) for which parallel data is available. We set French as the source language. Due to the large corpus size and comparability with previous research, we set Spanish as the target language. Important insights into the relation between “situational” uses of verbs and partial interrogatives, discussed and further investigated in Sections 3.4 and 3.5, have been made by Ehmer and Rosemeyer (2018) based on Spanish data. Therefore, setting Spanish as the target language links our results to their observations.

In the following, we will refer to the Spanish alignments as translations. We acknowledge that this might be seen as controversial because “translation proper [is] only adapting the verbal signs of the audio channel [whereas] respeaking and fansubbing transform the whole acoustic channel of the source text providing a […] linguistic translation of its verbal signs, at the same time verbalizing its most relevant nonverbal signs for the benefit of a niche target audience” (Perego and Pacinotti 2020: 36). Nevertheless, respeaking and fansubbing are increasingly accepted and studied as a kind of translation (Bogucki and Díaz-Cintas 2020: 11–16). Moreover, we consider the tendency to verbalise nonverbal signs a benefit for our research because it allows us to search for indicators of non-canonical (i.e. not merely inquisitive, but expressive or rhetorical; Farkas 2022; Ciardelli et al. 2019; Celle et al. 2021) and “situational” uses of interrogatives in the Spanish data aligned with French interrogatives, which might not contain such indicators.

The Spanish parallel subcorpus is approximately twice the size of the French subcorpus (179 million sentences, 1.3 billion tokens). We extracted 9 million alignments, which resulted in 53.5 million French tokens (247,517 types), or approximately 7.7 % of the total corpus. We used only a relatively small part of the corpus to develop our analytical tools in R (R Core Team 2024) on a dataset that would not cause excessive runtimes for non-optimised computer code when running on a personal laptop. Given the homogeneity of the data illustrated in Figure 4, we expect our observations to generalise sufficiently to the entire corpus.

Example (6) illustrates an alignment created from a French source file and a Spanish target file. Note that the quoi ‘what’ in source line 309 is verbless, which is an example of a match that was filtered out for our subsequent analysis of verb effects (see below).

(6)

(src)=“305”>Passons !

(trg)=“341”>Terminamos aquí .

================================

(src)=“306”>Tout le monde !

(src)=“307”>Nous allons tous au prochain lieu de tournage !

(trg)=“342”>Nos vamos a la otra localización .

================================

(src)=“308”>Ici le script suivant .

(trg)=“343”>Salió el guión .

================================

(src)=“309”>Quoi … ?

(trg)=“344”>¿ Qué es esto ?

================================

(src)=“310”>Le dénouement final ?

(trg)=“345”>¿ Dónde está la última escena ?
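Alignments in the layout shown in (6) can be read into a simple data structure. The following sketch assumes exactly that layout (one `(src)` or `(trg)` line per sentence, blocks separated by a line of equals signs); the function name `parse_alignments` is hypothetical, and the pattern accepts straight as well as typographic quotation marks because the quotation marks in (6) may have been altered in typesetting.

```python
# Illustrative sketch: parse aligned subtitle blocks in the layout of (6).
import re

LINE_RE = re.compile(r'^\((src|trg)\)=["“”](\d+)["“”]>(.*)$')
SEPARATOR_RE = re.compile(r'^=+$')


def parse_alignments(text):
    """Return a list of alignments, each with its src and trg sentences."""
    alignments = []
    current = {"src": [], "trg": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if SEPARATOR_RE.match(line):
            # A separator closes the current alignment block.
            if current["src"] or current["trg"]:
                alignments.append(current)
            current = {"src": [], "trg": []}
            continue
        match = LINE_RE.match(line)
        if match:
            side, sentence_id, sentence = match.groups()
            current[side].append((int(sentence_id), sentence.strip()))
    if current["src"] or current["trg"]:
        alignments.append(current)
    return alignments


sample = """
(src)="306">Tout le monde !
(src)="307">Nous allons tous au prochain lieu de tournage !
(trg)="342">Nos vamos a la otra localización .
================================
(src)="309">Quoi … ?
(trg)="344">¿ Qué es esto ?
================================
(src)="310">Le dénouement final ?
(trg)="345">¿ Dónde está la última escena ?
"""

alignments = parse_alignments(sample)
for alignment in alignments:
    print(alignment["src"], "->", alignment["trg"])
```

Keeping all source sentences of a block together, as in the first alignment of the sample, makes it possible to discard alignments that span several sentences at a later stage.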

We used two regular expressions (see Appendix C) in our Python script to search for QV and VQ interrogatives followed by question marks^[17] and POS-tagged them with the TreeTagger by Schmid (1994) using the R package koRpus (Michalke 2021).^[18] We then applied a series of cleaning steps in R to the resulting datasets to filter out erroneous matches. These steps all aimed at creating a dataset in which each observation would contain one Q form and one inflected verb, a restriction we considered necessary for the subsequent evaluation of possible correlations between specific verbs and QV/VQ serialisation. In particular, we removed postposed, non-integrated discourse marker uses of quoi that we considered a deviation from using quoi as an interrogative pronoun (Delafontaine 2020; Hölker 2010).^[19] We also removed utterance-final sequences of ou quoi ‘or what’, which we considered tag questions with no syntactic relation to the preceding verb, as well as instances of the multi-word expressions n’importe quoi ‘anything/whatever/nonsense’, pas de quoi ‘you’re welcome’, and quoi que ce soit ‘anything/whatever’. Aiming for the sentence as our minimal unit, we further removed cases without verbs, cases with multiple verbs, and cases that only contained verbs in the infinitive. Finally, we removed cases with prefinal punctuation marks to avoid alignments that would include several sentences, leaving open the possibility for pre- and postpositions as well as adjacent, yet not syntactically integrated connectives and particles.
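Some of the string-based cleaning steps described above can be approximated with a few regular expressions. The sketch below is not the search in Appendix C: all patterns and the function name `keep_candidate` are simplified assumptions, the set of punctuation marks treated as prefinal is a guess, and the verb-based filters (no verb, several verbs, infinitive only) as well as the discourse marker uses of quoi are left out because they require POS tags or manual annotation.

```python
# Illustrative sketch: rule-based pre-filter for que/quoi interrogatives.
# All patterns are simplified assumptions, not the authors' expressions.
import re

# que, quoi, or elided qu' (straight or typographic apostrophe)
Q_FORM = re.compile(r"\b(?:que|quoi)\b|\bqu['’]", re.IGNORECASE)
# multi-word expressions named in the text; tokenised subtitles may
# contain a space after the apostrophe
MULTI_WORD = re.compile(
    r"n['’]\s?importe quoi|pas de quoi|quoi que ce soit", re.IGNORECASE
)
# utterance-final tag 'ou quoi ?'
TAG_OU_QUOI = re.compile(r"\bou quoi\s*\?$", re.IGNORECASE)
# sentence-level punctuation before the final question mark (assumed set)
PREFINAL_PUNCT = re.compile(r"[.!?;:…]")


def keep_candidate(sentence: str) -> bool:
    """Return True if the sentence passes the string-based filters."""
    stripped = sentence.strip()
    if not stripped.endswith("?"):
        return False
    body = stripped[:-1].strip()
    if not Q_FORM.search(body):
        return False
    if MULTI_WORD.search(body) or TAG_OU_QUOI.search(stripped):
        return False
    if PREFINAL_PUNCT.search(body):
        return False
    return True


examples = [
    "Que vont dire les gens ?",
    "Les gens vont dire quoi ?",
    "Tu es fou ou quoi ?",
    "Tu racontes n' importe quoi ?",
    "Quoi … ?",  # rejected here because of the ellipsis
    "Attends . Tu fais quoi ?",
]
for example in examples:
    print(keep_candidate(example), example)
```

A filter of this kind overgenerates (it also accepts questions in which que is a complementiser), so it can only serve as a first pass before POS-based checks.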

We then selected the 20 verbs in Table 1 for investigation. Any observation that did not contain one of these verbs was grouped as ‘other’. Note that the rank column in Table 1 starts with ranks 1 and 2 from Figure 2 but then starts skipping ranks and continues to do so throughout in ever larger steps.^[20] This ensures that our 20-verb sample includes a selection of high- and mid-frequency verbs, (i) to relate our findings to previous research and observations (which mainly concentrate on high-frequency verbs), and (ii) to present novel insights on the (so far largely neglected) class of mid-frequency verbs. Table 1 gives several English translations for most verbs. The cases in which it doesn’t, such as changer, décider, choisir, and prouver, happen to be those in which the English verb covers all relevant meanings. It is probably no coincidence that these are cognates (besides the semantically vacuous être). The emphasis of Table 1 on different meanings foreshadows aspects of our post-hoc study laid out in Section 3.5. In an attempt to develop an explanation for the different Q-placement patterns for different verbs (Sections 3.3 and 3.4), our post-hoc investigation uses Spanish parallel data to assess whether the Q-placements of interrogatives with specific verbs correlate with different meanings of these verbs to the extent that this becomes measurable in the frequency of different Spanish verbs.

Table 1:

Verbs under investigation by frequency rank in Lexique 2 (New et al. 2004)

Rank	Lemma	Translation(s)
1	être	‘be’
2	avoir	‘have’, ‘get’, ‘be the matter with’
4	faire	‘make’, ‘do’
5	dire	‘say’, ‘tell’
7	vouloir	‘want’, ‘like’, ‘wish’
8	savoir	‘know’, ‘be aware of’
13	parler	‘speak’, ‘talk’
18	se passer	‘happen’, ‘take place’
19	penser	‘think’, ‘believe’, ‘reflect’
20	attendre	‘wait’, ‘expect’
26	regarder	‘watch’, ‘look at’, ‘concern’
38	chercher	‘search (around)’, ‘look for’, ‘try’
50	tenir	‘keep’, ‘hold’
61	changer	‘change’
93	raconter	‘tell’, ‘describe’, ‘spin’, ‘blather’
110	décider	‘decide’
116	s’agir	‘be about’, ‘be the matter’, ‘must’
135	choisir	‘choose’
138	chanter	‘sing’, ‘harp/be on about’
219	prouver	‘prove’

3.3 Results

Our first research question is whether some verbs stand out as particularly associated with either QV or VQ. Table 2 displays the frequencies of QV and VQ patterns for the 20 verbs studied, plus an “other” category for interrogatives with other verbs.^[21] With approximately 85.75 %, QV is much more frequent in our data than VQ (14.25 %). In this regard, our sample deviates from several previous studies on informal registers (Adli 2015; Guryev and Delafontaine 2023; Hamlaoui 2011). While this could reflect the written mode^[22] or the absence of the VQ-associated adverbs quand ‘when’ and où ‘where’ in our sample, our data also differ in their high degree of “situationality” from the corpora used in other studies (see Section 3.5).^[23]

Table 2:

Number of observations per Q-placement relative to verbs.

		QV	VQ	Row total
1	other	33,253	3,911	37,164
2	faire	29,738	2,776	32,514
3	être	8,377	6,708	15,085
4	dire	10,060	1,057	11,117
5	passer	6,761	95	6,856
6	vouloir	5,712	575	6,287
7	avoir	4,491	616	5,107
8	penser	3,702	304	4,006
9	savoir	1,771	1,765	3,536
10	parler	2,748	95	2,843
11	raconter	1,640	31	1,671
12	attendre	1,399	254	1,653
13	regarder	750	110	860
14	agir	703	18	721
15	chercher	563	139	702
16	changer	237	109	346
17	prouver	102	29	131
18	chanter	104	12	116
19	décider	87	24	111
20	choisir	52	22	74
21	tenir	53	3	56
Column total		112,303	18,653	130,956

Whatever the reason for the QV tendency may be, our first research question is whether it holds equally for all the verbs in our sample. We applied a Chi-Square test for independence (Field et al. 2012: 815) to test this. The test showed a highly significant result (χ² = 18,388.33, d.f. = 20, p < 0.0001), indicating differences between the verbs in terms of the frequency with which they follow or precede the Q-constituents. To assess the strength and direction of these differences, we used the adjusted standardised residuals (ASRs) for the individual cells (Agresti 2002: 81). These are shown in Figure 6, with the direction of the bars showing their tendencies towards (positive ASR) or against (negative ASR) VQ and bar width indicating their count proportions.^[24] The upper half of the verbs in Figure 6 (faire to raconter) mostly point towards QV, with the notable exception of être and savoir. Faire and se passer^[25] stand out as particularly skewed towards QV, with parler, raconter, dire, vouloir, and penser also clearly following this trend. Turning our attention to the lower (less frequent) ranks of Figure 6, we see a change in the distributions. Only the defective verb s’agir clearly points towards QV, whereas chercher, changer, and choisir significantly favour VQ. Six of the ten less frequent verbs show frequencies of QV and VQ that would be expected from the overall distribution (no significant positive or negative ASR). Still, they point more towards VQ than a number of the particularly frequent verbs, as will be discussed in more detail in the subsequent sections.

Figure 6:

Adjusted standardised residuals (ASR) of chi-square test on Q-placement by verb.
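The adjusted standardised residual of a cell depends only on the cell count, its row total, its column total, and the grand total, so the direction of the bars in Figure 6 can be approximated for individual verbs from the verb table and its column totals. The following Python sketch assumes the standard formula ASR = (O − E) / √(E · (1 − row/N) · (1 − column/N)) with E = row · column / N. The function name `asr_vq` and the selection of verbs are illustrative choices, not part of the original analysis, and the values need not match Figure 6 to the last digit.

```python
from math import sqrt

# Column totals and grand total, taken from the "Column total" row of the verb table.
QV_TOTAL, VQ_TOTAL, N = 112_303, 18_653, 130_956

# (QV, VQ) counts for a few of the verbs listed in the table.
counts = {
    "savoir": (1_771, 1_765),
    "raconter": (1_640, 31),
    "chercher": (563, 139),
    "choisir": (52, 22),
    "tenir": (53, 3),
}


def asr_vq(qv, vq):
    """Adjusted standardised residual of the VQ cell for one verb."""
    row_total = qv + vq
    expected = row_total * VQ_TOTAL / N
    variance = expected * (1 - row_total / N) * (1 - VQ_TOTAL / N)
    return (vq - expected) / sqrt(variance)


residuals = {verb: asr_vq(qv, vq) for verb, (qv, vq) in counts.items()}

for verb, (qv, vq) in counts.items():
    qv_share = qv / (qv + vq)
    print(f"{verb:9s} QV share {qv_share:6.1%}   ASR(VQ) {residuals[verb]:+7.2f}")
```

Positive values point towards VQ and negative values towards QV, as in Figure 6. Under these assumptions, savoir, chercher, and choisir come out positive, raconter comes out strongly negative, and tenir stays below 2 in absolute value.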

3.4 Discussion of first research question on verb influence

The strong tendency for être towards VQ confirms what we would expect from the literature discussed in Section 2.1. The strong VQ tendency for savoir can be mainly attributed to the high frequency of the non-inquisitive construction Tu sais quoi ?/Vous savez quoi ? ‘You know what?’. It is restricted to the second person (#On/Il/Je sais quoi ?) and has constructionalised to the point at which it has lost its “conditional relevance”, which is the degree to which it requires a reaction by the listener (see Rosemeyer 2021: 125 and references there). Instead, it introduces novel or unexpected information that is about to be provided by the speaker to the hearer(s) (Coveney 2011b: 121; Guryev 2021). French still has the option to use Tu sais quoi ? as the inquiry ‘What do you know?’, and a small number of the observations in our corpus suggest such a use. Any subsequent modification with a prepositional phrase (Tu sais quoi d’autre ? ‘What more do you know?’) enforces an inquisitive reading. However, our data suggest that non-inquisitive uses far outnumber these question uses in conversations.^[26]

The general picture in Figure 6 poses the question why faire, dire, se passer, vouloir, avoir, penser, parler, raconter, and s’agir pattern together (despite their syntactic heterogeneity), and differ particularly from verbs such as chercher, changer, and choisir, but also from prouver, décider, and attendre. As indicated above, we propose that the former verbs fall into the category of “situational verbs”, which Ehmer and Rosemeyer (2018: 80) describe as “verbs that refer to an action that is evident in the situational context.”^[27] Ehmer and Rosemeyer (2018) mention Spanish hacer ‘make/do’, pasar ‘happen’, and decir ‘say’ as instances of verbs that are prone to situational uses and argue that these often acquire a “challenge function” when used in Spanish interrogatives, which signals that the speaker considers an action or utterance (usually by the hearer) as inappropriate. This hypothesis, which we call the “situational verb hypothesis”, will be further explored in a post-hoc study in Section 3.5 and discussed in Section 3.6.

3.5 Post-hoc investigation of second research question on situational uses of verbs

Before we present the evidence for effects of “situationality” found in our post-hoc investigation of the French verbs correlating with QV, we wish to illustrate how readings of French wh-interrogatives can be more or less “situational”. In the illustrative examples (7) to (9), the respective example a) differs from b) by Q-placement but also in terms of the situational accessibility of what is being inquired about. While (7a) seems apt in a situation in which the speaker observes an activity by the hearer, (7b) would be odd if posed to someone whose occupation is known or accessible (e.g., in uniform and on duty). Similarly, (8a) seems prone to be interpreted as a reaction to a previous statement, whereas (8b) could be interpreted as a request for the interlocutor’s (so far unknown) opinion. Finally, (9a) can be interpreted as inquiring about someone who has shown signs of being ill or in a bad mood, whereas (9b) could be used to inquire about something hidden in a bag. Note that (9b) would not be well suited to evaluating something that is in the bag and already known to the speaker.

(7)

a. Qu’est-ce que tu fais là ?
‘What are you doing there?’

b. Tu fais quoi dans la vie ?
‘What do you do for a living?’

(8)

a. Qu’est-ce que tu dis là ?
‘What are you saying there?’

b. Tu dis quoi de ma nouvelle robe ?
‘What do you think of my new dress?’

(9)

a. Qu’est-ce que tu as tout d’un coup ?
‘What’s the matter/wrong with you all of a sudden?’

b. T’as quoi dans ton sac ?
‘What do you have in your bag?’

The English translation for dire in (8) changes between ‘say’ and ‘think’. Avoir is translated as ‘have’ in (9b) but as ‘be the matter/wrong with’ in (9a). Alternations such as those illustrated here have been previously discussed under the light versus heavy verb dichotomy (Brugman 2001; Wittenberg 2016). However, the special status of light verbs is not only disputed (Bruening 2016), but their role in interrogative syntax and pragmatics has to our knowledge not been studied comprehensively. We will therefore pursue an investigation of the “situational verb hypothesis” based on the observations by Ehmer and Rosemeyer (2018), and will leave a systematic account of verb lemma related effects to future research, be it in terms of light and heavy verbs or in terms of specificity (Mathieu 2004), anti-givenness (Hamlaoui 2010), or exclusivity (Faure and Palasis 2021).

Empirically, the parallel datasets in our corpus facilitate an evaluation of whether the placement of interrogative forms with specific French verbs correlates with the frequency of different verbs in the Spanish alignments. While this should, in principle, hold for all the available languages aligned with the French subcorpus, we chose to use the Spanish subcorpus because the concept and the examples of “situational” uses of verbs have initially been proposed in work on Spanish interrogatives (Ehmer and Rosemeyer 2018). The following sections present the main results we obtained for French faire (3.5.1), avoir (3.5.2), and the three verbs of speech raconter, dire, and parler (3.5.3).

3.5.1 French faire as Spanish hacer or dedicarse/trabajar

A typical Spanish translation for French faire is hacer ‘do’,^[28] which is also one of the examples given by Ehmer and Rosemeyer (2018) for a verb that is often used “situationally” in Spanish interrogatives. Table 3 shows the frequency of hacer in the translations for the different French Q-placements in interrogatives with faire in our sample. We see several relations. We already know from Figure 6 that faire strongly favours QV. We now also see that faire is predominantly translated with Spanish hacer, which is also what we would expect. Yet translations that do not contain hacer account for roughly a quarter of all VQ cases (716 of 2,776), compared with less than a fifth of the QV cases (5,334 of 29,738). A chi-square test for independence shows that this effect is significant (χ² = 103.46, d.f. = 1, p < 0.0001). Figure 7 shows the adjusted standardised residuals per category, with bar direction indicative of the tendency towards (positive ASR) or against (negative ASR) alignment with French VQ. We see that French VQ interrogatives with faire tend to be translated with interrogatives that do not contain hacer (p < 0.0001).

Table 3:

French Q-placement with ‘faire’ by +/− Spanish ‘hacer’

	QV	VQ	Total
hacer	24,404	2,060	26,464
other	5,334	716	6,050
Total	29,738	2,776	32,514

Figure 7:

ASR for French Q-placement with ‘faire’ by +/− Spanish ‘hacer’
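The test on Table 3 can be retraced with a few lines of Python. The sketch below is a minimal illustration, assuming a Pearson chi-square statistic without continuity correction and the standard adjusted-residual formula; the helper name `chi_square_and_asr` is hypothetical and is not taken from the original analysis scripts.

```python
from math import sqrt


def chi_square_and_asr(table):
    """Return (chi2, asr) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    asr = []
    for i, row in enumerate(table):
        asr_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted standardised residual of this cell.
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            asr_row.append((observed - expected) / sqrt(variance))
        asr.append(asr_row)
    return chi2, asr


# Table 3: rows = Spanish hacer / other, columns = French QV / VQ.
table3 = [[24_404, 2_060], [5_334, 716]]
chi2, asr = chi_square_and_asr(table3)

print(round(chi2, 2))  # 103.46, the value reported in the text
print([[round(x, 2) for x in row] for row in asr])
```

In a 2 × 2 table, all four residuals share the same absolute value, which equals the square root of the chi-square statistic; only the sign differs. The positive residual in the other/VQ cell corresponds to the association between VQ and translations without hacer described above.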

Many verbs in the “other” category have a more specific meaning than hacer. The lemmata are relatively infrequent compared to hacer, but some are sufficiently frequent for bivariate tests, such as dedicarse ‘do for a living’ (N = 260) and trabajar ‘work’ (N = 89). As shown in Figures 8 and 9, both significantly correlate with VQ.^[29] We interpret this as an indication that inquiring about a situationally accessible activity more readily lends itself to the use of an unspecific verb. On the other hand, inquiring about an activity outside the temporal and spatial deixis of the conversation requires a verb that specifies what the question is about.

Figure 8:

ASR for French Q-placement with ‘faire’ by +/− Spanish ‘dedicarse’

Figure 9:

ASR for French Q-placement with ‘faire’ by +/− Spanish ‘trabajar’

To illustrate the different uses, the example in (10) shows Spanish translations that would correspond to the QV and VQ variants in (7). The French QV interrogative with faire corresponds to (10a), whereas the VQ variant corresponds to (10b).

(10)

	Spanish	French
a.	¿Qué haces?	Qu’est-ce que tu fais (là) ?
	‘What are you doing (there)?’
b.	¿A qué te dedicas?	Tu fais quoi (dans la vie) ?
	‘What do you do (for a living)?’

3.5.2 French avoir as Spanish tener or pasar/suceder

A typical Spanish translation for French avoir is Spanish tener ‘have/possess/hold’. Table 4 shows the frequency of tener in the translations for the different French Q-placements in interrogatives with avoir in our sample. We know from Figure 6 that avoir correlates with QV, but much less so than faire. We also see that French avoir is translated by Spanish tener in only 21 % of all cases. Nevertheless, Table 4 shows that almost a third of all French VQ sentences with avoir are translated with Spanish tener (χ² = 28.45, d.f. = 1, p < 0.0001). This means that while the total number of translations for V_avoirQ_que/quoi without tener is higher than the number of those that contain tener, we still find many more cases of tener than we would expect from the overall frequency of tener in the translations of interrogatives with avoir. This results in the statistical association between tener and VQ, which is visible in Figure 10.

Table 4:

French Q-placement with ‘avoir’ by +/− Spanish ‘tener’

Avoir	QV	VQ	Total
tener	893	180	1,073
other	3,598	436	4,034
Total	4,491	616	5,107

Figure 10:

ASR for French Q-placement with ‘avoir’ by +/− Spanish ‘tener’
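The phrase “more cases of tener than we would expect” can be made concrete with the counts in Table 4. The following worked calculation is a sketch that assumes the standard definitions of the expected count and the adjusted standardised residual; the subscript notation is introduced here for illustration only.

```latex
\begin{align*}
E_{\textit{tener},\mathrm{VQ}}
  &= \frac{1{,}073 \times 616}{5{,}107} \approx 129.4 \\[4pt]
\mathrm{ASR}_{\textit{tener},\mathrm{VQ}}
  &= \frac{180 - 129.4}
          {\sqrt{129.4\,\bigl(1 - \tfrac{1{,}073}{5{,}107}\bigr)\bigl(1 - \tfrac{616}{5{,}107}\bigr)}}
   \approx 5.33
\end{align*}
```

The observed 180 VQ cases with tener thus exceed the roughly 129 cases expected under independence. Because the table is 2 × 2, the square of this residual approximately equals the reported χ² value of 28.45.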

The relatively small number of translations of French avoir with Spanish tener indicates that some other Spanish verbs must be associated with avoir, particularly in the QV cases. It turns out that pasar used in the sense of ‘happen/be the matter’ is one of them. This is particularly relevant for us because pasar is one of the verbs mentioned by Ehmer and Rosemeyer (2018). With N = 2,003, avoir-interrogatives are translated with pasar almost twice as frequently as with tener. However, only 41 of these are aligned with VQ, which leads to a robust association of pasar with QV, as shown in Figure 11 (Table 8 in Appendix D).

Figure 11:

ASR for French Q-placement with ‘avoir’ by +/− Spanish ‘pasar’

With N = 213, the semantically similar Spanish verb suceder ‘happen’ also almost categorically aligns with Q_que/quoiV_avoir (Table 9 in Appendix D and Figure 12). Taken together, verbs of occurrence (Levin 1993: 260),^[30] such as suceder and pasar, make up approximately half of all verbs in the Spanish translations of Q_que/quoiV_avoir.

Figure 12:

ASR for French Q-placement with ‘avoir’ by +/− Spanish ‘suceder’

As with French faire, the overall picture is that avoir is more likely to be used situationally in QV interrogatives because the speaker is inquiring about something that is visibly or notably happening either to the interlocutor (pasar) or more generally (suceder). On the other hand, avoir is more likely to be used with its possessive meaning in VQ interrogatives. The examples in (11) exemplify Spanish translations that would correspond to the QV and VQ variants in (9). The French QV interrogative with avoir corresponds to (11a), whereas the VQ variant with avoir corresponds to (11b).

(11)

	Spanish	French
a.	¿Qué te pasa?	Qu’est-ce que t’as ?
	‘What’s the matter with you?’
b.	¿Qué tienes?	T’as quoi ?
	‘What have you got?’

3.5.3 Verbs of speech: raconter, dire, parler

A group of verbs that stand out in Figure 6 as particularly associated with QV are the verbs of speech raconter, parler, and dire. Of the 1,671 interrogatives with raconter, 1,640 (98.1 %) are QV. At first glance, raconter seems almost categorically associated with QV. There are, however, 31 (1.9 %) instances of raconter as part of a VQ interrogative. These are too few observations to allow for an investigation of any patterns in the Spanish parallel data. However, of the 1,640 Q_que/quoiV_raconter interrogatives, 1,378 (84 %) include a second-person pronoun (tu/vous ‘you’). In contrast, only 8 (26 %) of the 31 V_raconterQ_que/quoi interrogatives contain this pronoun. Thus, speakers are more likely to use the addressee as the clause’s pronominalised subject in Q_que/quoiV_raconter interrogatives. This is also visible in the associations between QV and the presence of a second-person pronoun (SecPersPro) in Figure 13.

Figure 13:

ASR for French Q-placement with ‘raconter’ with(out) second-person pronoun.

Another interesting observation concerns the pronoun ça. While 5 (16 %) of the V_raconterQ_que/quoi interrogatives feature ça as a subject, this is the case in only 6 (0.37 %) Q_que/quoiV_raconter interrogatives. A manual inspection of all 11 examples revealed that ça always referred to textual or audio-visual sources.^[31] This again suggests that, while both Q_que/quoiV_raconter and V_raconterQ_que/quoi interrogatives can ask about the content of textual or audio-visual sources, such questions make up a much higher percentage of the V_raconterQ_que/quoi cases. In contrast, Q_que/quoiV_raconter typically targets something a person (usually the hearer or one of the hearers) has said in the current dialogue. We find exceptions such as (12b), but they are three orders of magnitude less frequent than examples such as (12a) in our data. Instead, V_raconterQ_que/quoi seems prone to be used as in (12c). These examples also indicate the importance of the finding by Ehmer and Rosemeyer that interrogatives with situationally used verbs are more likely to be used in challenging questions. Example (12a), which might even be a prefabricated interactional phrase (Pausé et al. 2022; Tutin 2019),^[32] seems more likely to be understood as an interactional challenge than (12c) simply because a third person usually cannot accept or take up a challenge. Future studies will need to test whether (12b) is also less likely to be understood as an interactional challenge.

(12)

a. Qu’est-ce que tu racontes ?
‘What are you talking about?’

b. Tu racontes quoi ?
‘What (subject) are you talking about?’

c. Ça raconte quoi ?
‘What is it about?’

We know from Figure 6 that dire favours QV. Of the 11,117 interrogatives with dire, 10,060 (90.5 %) are QV. However, the remaining 1,057 (9.5 %) V_direQ_que/quoi interrogatives show that VQ is not uncommon in our data. While present, statistical associations of Q-placement patterns with different Spanish translations were weaker for dire than for faire and avoir. Q_que/quoiV_dire slightly favours Spanish hablar ‘talk’ with an ASR of 2.5, and V_direQ_que/quoi favours Spanish decir ‘say’ with an ASR of 4.75. Since both Spanish verbs lend themselves to challenging uses (Ehmer and Rosemeyer 2018: 81), we consider these tendencies less informative than the results for faire and avoir.^[33]

Nevertheless, a closer look at the prevalence of second-person pronouns reveals a tendency similar to the one we observed for raconter. Figure 14 shows that second-person pronouns (tu/vous ‘you’) are far more prevalent in Q_que/quoiV_dire than in V_direQ_que/quoi. This indicates a more general tendency for verbs of speech to favour QV in interrogatives with the hearer(s) as a subject.

Figure 14:

ASR for French Q-placement with ‘dire’ with(out) second-person pronoun.

We know from Figure 6 that parler favours QV. Similar to what we have already seen for raconter, the fact that 2,748 (96.7 %) out of the 2,843 interrogatives with parler are QV comes close to a categorical association. Still, the remaining 95 (3.3 %) V_parlerQ_que/quoi interrogatives allow for some comparisons. Two possible Spanish translations for parler in our data are hablar ‘talk’ and tratar(se) ‘be about’, as illustrated in (13) and (14).

(13)

De quoi tu parles ? (French)

‘What are you talking about?’

¿De qué estás hablando? (Spanish)

‘What are you talking about?’

(14)

Tu parles de quoi ? (French)

‘You are talking about what?’

¿De qué se trata? (Spanish)

‘What is this about?’

As mentioned above, Spanish hablar is documented for challenging uses. We are unaware of similar uses for tratar(se) ‘be about’. Therefore, we consider differences in the frequency of the two translations indicative of differences in the frequency of situational uses of interrogatives with parler. Figure 15 shows the association of Q_que/quoiV_parler with hablar. Figure 16 shows the association of V_parlerQ_que/quoi with tratar(se).^[34] Together, these findings again link QV to uses of parler in which the situation restricts the set of possible Q-referents to a singleton, thereby rendering an evaluative reading more likely.

Figure 15:

ASR for French Q-placement with ‘parler’ by +/− Spanish ‘hablar’

Figure 16:

ASR for French Q-placement with ‘parler’ by +/− Spanish ‘tratar(se)’

As with raconter and dire, we find a strong association between second-person pronouns and Q_que/quoiV_parler (Figure 17). The general picture for verbs of speech seems similar to the one we already saw for faire and avoir: when used in QV interrogatives, these verbs are more likely to take on a situational meaning by inquiring about something the hearer has said. However, when used as part of a VQ interrogative, they more often serve to inquire about information not directly accessible within the communicative context.

Figure 17:

ASR for French Q-placement with ‘parler’ with(out) second-person pronoun.
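The shared pattern across the three verbs of speech can be read directly off the counts in Tables 10, 11, and 14 (Appendix D). The short Python sketch below computes the share of interrogatives with a second-person pronoun for each verb and Q-placement; the variable and function names are illustrative and not taken from the original analysis.

```python
# (with second-person pronoun, without) per Q-placement,
# copied from Tables 10 (raconter), 14 (dire), and 11 (parler).
tables = {
    "raconter": {"QV": (1_378, 262), "VQ": (8, 23)},
    "dire": {"QV": (5_365, 4_695), "VQ": (250, 807)},
    "parler": {"QV": (2_283, 465), "VQ": (33, 62)},
}


def second_person_share(with_pronoun, without_pronoun):
    """Proportion of interrogatives that contain tu/vous."""
    return with_pronoun / (with_pronoun + without_pronoun)


shares = {
    verb: {order: second_person_share(*cells) for order, cells in by_order.items()}
    for verb, by_order in tables.items()
}

for verb, by_order in shares.items():
    print(f"{verb:9s} QV {by_order['QV']:6.1%}   VQ {by_order['VQ']:6.1%}")
```

For all three verbs, the share of second-person pronouns is higher in QV than in VQ interrogatives, which is the descriptive counterpart of the associations shown in Figures 13, 14, and 17.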

3.6 Discussion of second research question

After finding significant differences between the 20 verbs in our sample regarding their associations with QV or VQ, our second research question, developed in Section 3.4, asked why some very frequent verbs, such as faire ‘do’, dire ‘say’, and se passer ‘happen’, pattern together and differ from a series of other, less frequent verbs in our sample. We pointed to the fact that Ehmer and Rosemeyer (2018: 80) identify the Spanish verbs hacer ‘do’, decir ‘say’, and pasar ‘happen’ as instances of “verbs that refer to an action that is evident in the situational context” and show that these often acquire a non-canonical “challenge function” in Spanish interrogatives. We formulated the “situational verb hypothesis”, according to which certain situational verbs are used more frequently in non-canonical object interrogatives, which in turn are known to favour QV in French (Celle et al. 2021; Munaro and Obenauer 1999: 205–206).

In Section 3.5, we ran post-hoc tests on the Spanish alignments of the French interrogatives in our sample and presented evidence for differences in the likelihood of situational and non-situational verbs being used in translations for French interrogatives with faire and avoir. We found that faire was more often translated as hacer ‘do’ in QV interrogatives and as dedicarse ‘do in life/for a living’ and trabajar ‘work’ in VQ interrogatives. We interpreted questions about ‘doing something’ as more situational than questions about ‘doing something in life/for a living’ because we consider a general activity less likely to be evident to both interlocutors at the time of utterance of the interrogative. Similarly, we interpreted the fact that avoir was translated more frequently with pasar ‘be the matter with’ in QV and with tener ‘have’ in VQ interrogatives as an indication of the situationality of QV, because ‘something being the matter with someone’ is more likely to be the target of a question if the speaker observes something unexpected or possibly problematic about the person, whereas a question about the possession of an object seems to require less justification by direct evidence.

As a final case in point, we showed that the verbs of speech raconter ‘tell’, dire ‘say’, and parler ‘talk’ are much more prone to be used with second-person pronouns in QV than in VQ interrogatives. We argued that questions with the hearer(s) as subject are indicative of situational uses, particularly given the fact that French parler is more frequently translated as Spanish hablar ‘speak’ in QV and as tratar(se) ‘be about’ in VQ, the latter being the more idiomatic way to ask about the content of messages or media. We acknowledge that these observations only constitute circumstantial evidence in favour of the “situational verb hypothesis”. Nevertheless, they can serve as a starting point for future, in-depth studies on the factors that determine verb-induced Q-placement patterns, in French and cross-linguistically.

In light of these findings, we can now return to the debate about syntactic variants (Section 2.3). We have been able to show associations between Q-placement and certain verb meanings. This does not, however, preclude the possibility that there are many concrete instances in which QV and VQ could be used interchangeably (Guryev and Delafontaine 2015; Quillard 2000). And while we have argued for the “situational verb hypothesis” as a possible account for why some verbs favour QV more than others, we are aware that further research is needed to accept or reject it. A guiding principle for such research, which can also be seen as a desideratum following from our work, is the awareness that the co-presence of interlocutors in a shared situation (or deixis) matters for the meaning of verbs as used in interrogatives. By extension, it seems likely that it matters for the discourse functions of the interrogatives as a whole, too. We have proposed that non-inquisitive, evaluative, or challenging (non-canonical) uses become more likely when a speaker inquires about a state of affairs that is situationally accessible to both interlocutors. Such uses, in turn, might also be a feature of certain sociolinguistic styles. In formal interaction, a speaker might prefer strategies other than interrogatives to signal that an action or utterance by the hearer is inappropriate.^[35] This intertwines the lexical effects we have shown with broader aspects of socio-pragmatic variation.

4 Summary and conclusions

In this paper, we have argued that the variability in the placement of French partial interrogatives is influenced by the lexical identity of the verbs that occur in those interrogatives. A brief review of previous approaches that have proposed syntactic or sociolinguistic explanations for this variability has revealed that lexical factors have not been the focus of investigation so far. We have shown that previous corpus-based studies could not have performed a detailed investigation of lexical effects simply due to their sample size and the Zipfian distribution of lexical items. With two illustrations of the Zipfian nature of verb frequencies in French interrogatives, we have further underlined the necessity of considering the dominant role played by the first few frequency ranks in any corpus-based study.

We proposed the OPUS OpenSubtitles corpus as a useful and valid tool for investigating the effects of verb lexical identity on French partial interrogatives. We argued that it fulfils several necessary conditions: i) it is sufficiently large to allow quantitative investigations of verbs that are not among the most frequent verbs overall, ii) it is an approximation of informal conversations (Bednarek 2018; Levshina 2017) because many of its authors strive for ‘authentic text’ that represents face-to-face interactions (Cubbison 2005; O’Hagan 2009), iii) it gives access to translations in other languages that can serve as a tertium comparationis when searching for possible explanations for the patterns we observe (see Section 3.5), and iv) it is more genre-homogeneous than other corpora of sufficient size. One caveat remains: the lack of control over the precise variety under investigation. We can assume that only a minority of French speakers of the early 21st century would have accessed web repositories to download subtitles, and even fewer contributors would have had the technical skills to create and upload them. We can expect that not all age groups participated similarly in this activity, and the written mode of communication might have reduced the prevalence of non-standard features. Future research will have to test the ecological validity of the observations we draw from our database, be it via more naturalistic corpora or experiments.

In our corpus study, we investigated the effects of 20 verbs from the high- and mid-frequency rank range on the placement of Q_que/quoi. We found that a syntactically heterogeneous group of nine verbs tended towards QV relative to the overall distribution: faire, dire, vouloir, avoir, penser, parler, raconter, se passer, and s’agir. Six verbs did not deviate significantly from the overall pattern: attendre, regarder, prouver, chanter, décider, and tenir. The five VQ-associated verbs were être, savoir, chercher, changer, and choisir. Our finding for être corroborates previous research (Coveney 1995: 161; Guryev and Delafontaine 2023; Palasis et al. 2023). For savoir, we attributed this effect to one specific construction that has taken on a non-inquisitive use, introduces novel information, and only occurs with the pronominalised hearer(s) as the interrogative’s subject.

To our knowledge, our study is the first comprehensive corpus study on the influence of a large number of verb lemmata on Q-placement in French interrogatives. Our theoretical proposal to account for the nine QV associations relied on the notion of “situational verbs” by Ehmer and Rosemeyer (2018). These verbs, when used in interrogatives, often refer to an observable activity within the speech situation, which can lead to a challenging reading for the interrogative. We hypothesised that the verbs favouring QV in our sample fall under the notion of situational verbs. In a post-hoc study aimed at assessing this hypothesis, we made use of the Spanish translations available in the parallel subtitle corpus OPUS OpenSubtitles 2018 to search for differences in the association between QV and VQ interrogatives with French “situational verbs” (faire, avoir, parler) on the one hand, and Spanish “situational verbs” (hacer, pasar, hablar) and “non-situational verbs” (dedicarse/trabajar, tener, tratarse) on the other. We showed that French faire, when used in V_faireQ_que/quoi, correlates with Spanish verbs that encode an activity which is less likely to be accessible in the speech situation, such as trabajar and dedicarse. In contrast, the more general activity reading is associated with Q_que/quoiV_faire. We further showed that avoir correlates with verbs of occurrence (Levin 1993) when used in QV. In contrast, the possessive meaning of avoir captured by Spanish tener can be found primarily in French VQ. Additional findings on the frequency of second-person pronouns in the French interrogatives also supported the “situational verb hypothesis”.

The overall picture supports Palasis et al.’s (2019: 211) statement that French does not generally favour either QV or VQ. Instead, it is a “mixed” QV–VQ language in the sense of Dryer (2013). This “mix” is a highly multifactorial phenomenon. Apart from the sociolinguistic and syntactic factors already explored in the literature, our findings suggest that future studies on this phenomenon should consider the impact of verb lemmata and the specific meanings that arise from their interplay with the placement of specific partial interrogative forms.

Corresponding author: Jan Fliessbach, Department of Romance Philology, University of Potsdam, Potsdam, Germany, E-mail: jan.fliessbach@uni-potsdam.de

Funding source: Deutscher Akademischer Austauschdienst

Award Identifier / Grant number: 57701768

Funding source: Klaus Tschira Stiftung

Award Identifier / Grant number: 40300928

Appendix A

CQL query for searches via SketchEngine. Lower-case que is only included when preceded by connectives without complementiser que (mais, car, donc, et, or, pourtant, puis, sinon). Matches are fully contained within a sentence and hard-coded to end in a question mark.

(<s>[!word="<s>"&!word="\?"]{0,80})([lemma="quoi"]|[word="Que"]|[word="Qu’"]|(([lemma="mais"]|[lemma="car"]|[lemma="donc"]|[word="Et"]|[lemma="or"]|[lemma="pourtant"]|[lemma="puis"]|[lemma="sinon"])[lemma="que"]))([!word="\?"]{0,80}[word="\?"]) within <s/>

Appendix B

See Table 5.

Table 5:

Frequency of investigated interrogatives by subcorpus in OPUS French parallel (SketchEngine).

Rank	Subcorpus	Frequency
1	OpenSubtitles2011	554,730
2	Europarl3	3,189
3	MultiUN	2,099
4	Tatoeba	1,640
5	EMEA	316
6	ECB	146
7	MBS	111
8	KDE4	101
9	OfisPublik	51
10	OpenOffice3	2
11	OpenOffice	1

Appendix C

Regular expressions used in the Python programme OpuSearch (Rockstroh and Fliessbach 2024) for first extraction of interrogatives (input for tagging, filtering, analysis in R). The resulting data are available on OSF.

i) (\w{1,15}\s,\s)?(\w{1,15}\s,\s)?(\w{1,15}\s)?(>(Q|q)u(e|’|\s’)\s|(Q|q)u.{0,3}est.{0,3}ce\squ|((D|\sd)e\squoi|(P|\sp)ar\squoi|(À|\sà)\squoi|(E|\se)n\squoi|(P|\sp)our\squoi)).[^,].{1,40}\s(\?)

ii) \s(Q|q)uoi\s(\w{1,15}\s)?(\w{1,15}\s)?(\w{1,15}\s)?(\?)
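To illustrate what expression ii) captures, the following Python sketch applies it to a few of the example sentences from Section 3.5. It is a minimal illustration using the standard `re` module; how OpuSearch itself applies the expressions is not specified here, so the use of `re.search` and the names `POSTVERBAL_QUOI`, `examples`, and `matches` are assumptions made for this example.

```python
import re

# Regular expression ii) from Appendix C, copied verbatim.
POSTVERBAL_QUOI = re.compile(
    r"\s(Q|q)uoi\s(\w{1,15}\s)?(\w{1,15}\s)?(\w{1,15}\s)?(\?)"
)

examples = [
    "Tu fais quoi dans la vie ?",         # quoi + three words + "?"
    "T’as quoi dans ton sac ?",           # quoi + three words + "?"
    "Tu dis quoi de ma nouvelle robe ?",  # quoi + four words + "?"
    "Qu’est-ce que tu fais là ?",         # preverbal, no quoi
]

# True if the sentence contains a span matched by expression ii).
matches = {sentence: bool(POSTVERBAL_QUOI.search(sentence)) for sentence in examples}

for sentence, found in matches.items():
    print(f"{found!s:5s} {sentence}")
```

The expression allows at most three word tokens between quoi and the question mark, so the first two sentences match, whereas the third (four tokens after quoi) and the fourth (no quoi) do not. It also requires whitespace before the question mark, in line with French typography.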

Appendix D

See Tables 6–14.

Table 6:

French Q-placement with ‘faire’ by +/− Spanish ‘dedicarse’

Faire	QV	VQ	Total
dedicarse	218	42	260
other	29,520	2,734	32,254
Total	29,738	2,776	32,514

Table 7:

French Q-placement with ‘faire’ by +/− Spanish ‘trabajar’

Faire	QV	VQ	Total
trabajar	65	24	89
other	29,673	2,752	32,425
Total	29,738	2,776	32,514

Table 8:

French Q-placement with ‘avoir’ by +/− Spanish ‘pasar’

Avoir	QV	VQ	Total
pasar	1,962	41	2,003
other	2,529	575	3,104
Total	4,491	616	5,107

Table 9:

French Q-placement with ‘avoir’ by +/− Spanish ‘suceder’

Avoir	QV	VQ	Total
suceder	211	2	213
other	4,280	614	4,894
Total	4,491	616	5,107

Table 10:

French Q-placement with ‘raconter’ with(out) second-person pronoun.

Raconter	QV	VQ	Total
SecPersPro	1,378	8	1,386
other	262	23	285
Total	1,640	31	1,671

Table 11:

French Q-placement with ‘parler’ with(out) second-person pronoun.

Parler	QV	VQ	Total
SecPersPro	2,283	33	2,316
other	465	62	527
Total	2,748	95	2,843

Table 12:

French Q-placement with ‘parler’ by +/− Spanish ‘hablar’

Parler	QV	VQ	Total
hablar	1,918	39	1,957
other	830	56	886
Total	2,748	95	2,843

Table 13:

French Q-placement with ‘parler’ by +/− Spanish ‘tratar(se)’

Parler	QV	VQ	Total
tratar(se)	47	16	63
other	2,701	79	2,780
Total	2,748	95	2,843

Table 14:

French Q-placement with ‘dire’ with(out) second-person pronoun.

Dire	QV	VQ	Total
SecPersPro	5,365	250	5,615
other	4,695	807	5,502
Total	10,060	1,057	11,117

References

Aaron, Jessi E. 2010. Pushing the envelope: Looking beyond the variable context. Language Variation and Change 22(1). 1–36. https://doi.org/10.1017/S0954394509990226.

Adli, Aria. 2005–2019. sgs [spontaneous speech, grammaticality judgments, social data]: Multilingual database of Persian, French, Spanish and Catalan containing syntactically annotated transcriptions of spontaneous speech. Köln: Universität zu Köln.

Adli, Aria. 2013. Syntactic variation in French Wh-questions: A quantitative study from the angle of Bourdieu’s sociocultural theory. Linguistics 51(3). 473–515. https://doi.org/10.1515/ling-2013-0018.

Adli, Aria. 2015. What you like is not what you do: Acceptability and frequency in syntactic variation. In Aria Adli, Marco García García & Göz Kaufmann (eds.), Variation in language: System- and usage-based approaches, 173–200. Berlin, München, Boston: De Gruyter. https://doi.org/10.1515/9783110346855-008.

Agresti, Alan. 2002. Categorical data analysis, 2nd edn. (Wiley Series in Probability and Statistics). New York, Chichester: Wiley-Interscience. https://doi.org/10.1002/0471249688.

Armstrong, Richard A. 2014. When to use the Bonferroni correction. Ophthalmic and Physiological Optics: The Journal of the British College of Ophthalmic Opticians (Optometrists) 34(5). 502–508. https://doi.org/10.1111/opo.12131.

Aulamo, Mikko, Umut Sulubacak, Sami Virpioja & Jörg Tiedemann. 2020. OpusTools and parallel corpus diagnostics. In LREC’12, 3782–3789. ELRA.

Baheti, Ashutosh, Alan Ritter, Jiwei Li & Bill Dolan. 2018. Generating more interesting responses in neural conversation models with distributional constraints. In Ellen Riloff, David Chiang, Julia Hockenmaier & Jun’ichi Tsujii (eds.), Proceedings of the 2018 conference on empirical methods in natural language processing, 3970–3980. Stroudsburg, PA, USA: Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1431.

Baunaz, Lena. 2011. The grammar of French quantification (Studies in Natural Language and Linguistic Theory 83). Dordrecht, Netherlands: Springer. https://doi.org/10.1007/978-94-007-0621-7.

Baunaz, Lena. 2015. On the various sizes of complementizers. Probus 27(2). 193–236. https://doi.org/10.1515/probus-2014-0001.

Baunaz, Lena & Genoveva Puskás. 2014. On subjunctives and islandhood. In Marie-Hélène Côté & Éric Mathieu (eds.), Variation within and across romance languages: Selected papers from the 41st linguistic symposium on romance languages (LSRL), Ottawa, 5–7 May 2011, 233–253. Amsterdam, Philadelphia: John Benjamins Publishing Company. https://doi.org/10.1075/cilt.333.16bau.

Bednarek, Monika. 2018. Language and television series: A linguistic approach to TV dialogue (The Cambridge applied linguistics series). Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108559553.

Beeching, Kate. 2013. A parallel corpus approach to investigating semantic change. In Karin Aijmer & Bengt Altenberg (eds.), Advances in corpus-based contrastive linguistics (Studies in corpus linguistics 54), 103–126. Amsterdam: Benjamins. https://doi.org/10.1075/scl.54.07bee.

Behnstedt, Peter. 1973. Viens-tu ? Est-ce que tu viens ? Tu viens ? Formen und Strukturen des direkten Fragesatzes im Französischen (Tübinger Beiträge zur Linguistik 41). Tübingen: Gunter Narr.

Bescherelle. 2005. La conjugaison pour tous: dictionnaire de 12 000 verbes (Bescherelle). Paris: Hatier.

Blanche-Benveniste, Claire. 1997. La notion de variation syntaxique dans la langue parlée. Langue française 115(1). 19–29. https://doi.org/10.3406/lfr.1997.6219.

Boeckx, Cedric, Penka Stateva & Arthur Stepanov. 2001. Optionality, presupposition, and Wh-in situ in French. In Joaquim Camps & Caroline R. Wiltshire (eds.), Romance syntax, semantics and L2 acquisition: Selected papers from the 30th linguistic symposium on romance languages: Gainesville, Florida, February 2000 (Amsterdam studies in the Theory and History of Linguistic Science 216), 57–71. Philadelphia, PA: John Benjamins Pub. Co. https://doi.org/10.1075/cilt.216.07boe.

Bogucki, Łukasz & Jorge Díaz-Cintas. 2020. An excursus on audiovisual translation. In Łukasz Bogucki & Mikołaj Deckert (eds.), The Palgrave handbook of audiovisual translation and media accessibility (Palgrave Studies in Translating and Interpreting), 11–32. Cham: Springer International Publishing.10.1007/978-3-030-42105-2_2Search in Google Scholar

Bouchard, Denis & Paul Hirschbühler. 1987. French QUOI and its clitic allomorph QUE. In Carol Neidle & Rafael A. Nuñez (eds.), Studies in Romance Languages, 39–60. Dordrecht: Foris.Search in Google Scholar

Briel, Holger. 2023. The question concerning piracy. In Holger Briel, Michael High & Markus Heidingsfelder (eds.), The piracy years: Internet file sharing in a global context, 9–94. Liverpool: XJTLU Imprint.10.3828/liverpool/9781802070545.003.0002Search in Google Scholar

Bruening, Benjamin. 2016. Light verbs are just regular verbs. In GradLingS (ed.), Proceedings of the 39th annual Penn linguistics conference (University of Pennsylvania Working Papers in Linguistics 22), 51–60.Search in Google Scholar

Brugman, Claudia. 2001. Light verbs and polysemy. Language Sciences 23(4–5). 551–578. https://doi.org/10.1016/S0388-0001(00)00036-X.Search in Google Scholar

Casarini, Alice & Serenella Massidda. 2017. Sub Me Do – The development of fansubbing in traditional dubbing countries. In David Orrego-Carmona & Yvonne Lee (eds.), Non-professional subtitling. Newcastle upon Tyne: Cambridge Scholars Publishing.Search in Google Scholar

Celle, Agnès, Anne Jugnet & Laure Lansari. 2021. Expressive questions in English and French: What the hell versus Mais qu’est-ce que. In Andreas Trotzke & Xavier Villalba (eds.), Expressive meaning across linguistic levels and frameworks (Oxford scholarship online), 138–166. Oxford: Oxford University Press.10.1093/oso/9780198871217.003.0008Search in Google Scholar

Chang, Lisa. 1997. Wh-in-situ phenomena in French. The University of British Columbia PhD.Search in Google Scholar

Cheng, Lisa L.-S. & Johan Rooryck. 2000. Licensing Wh-in-situ. Syntax 3(1). 1–19. https://doi.org/10.1111/1467-9612.00022.Search in Google Scholar

Ciardelli, Ivano, Jeroen Groenendijk & Floris Roelofsen. 2019. Inquisitive semantics (Oxford surveys in semantics and pragmatics 6). Oxford, United Kingdom: Oxford University Press.10.1093/oso/9780198814788.001.0001Search in Google Scholar

Coveney, Aidan. 1995. The use of the QU-final interrogative structure in spoken French. Journal of French Language Studies 5(2). 143–171. https://doi.org/10.1017/S0959269500002738.Search in Google Scholar

Coveney, Aidan. 2011a. A language divided against itself? Diglossia, code-switching and variation in French. In France Martineau (ed.), Le francais en contact: Hommages à Raymond Mougeon (Les voies du français), 51–85. Quebec: Presses de l’Univ. Laval.Search in Google Scholar

Coveney, Aidan. 2011b. L’interrogation directe. Travaux de linguistique 63(2). 112–145. https://doi.org/10.3917/tl.063.0112.Search in Google Scholar

Crisp, Virginia. 2015. Film Distribution in the digital age. London: Palgrave Macmillan UK.10.1057/9781137406613Search in Google Scholar

Cubbison, Laurie. 2005. Anime fans, DVDs, and the authentic text. The Velvet Light Trap 56(1). 45–57. https://doi.org/10.1353/vlt.2006.0004.Search in Google Scholar

Delafontaine, François. 2020. Unités grammaticales et particule discursive quoi. Studia linguistica romanica 4. https://doi.org/10.25364/19.2020.4.4.Search in Google Scholar

Déprez, Viviane, Kristen Syrett & Shigeto Kawahara. 2013. The interaction of syntax, prosody, and discourse in licensing French wh-in-situ questions. Lingua 124. 4–19. https://doi.org/10.1016/j.lingua.2012.03.002.Search in Google Scholar

Dömötör, Andrea. 2019. Syntax is clearer on the other side – Using parallel corpus to extract monolingual data. In Marie Candito, Kilian Evang, Stephan Oepen & Djamé Seddah (eds.), Proceedings of the 18th international workshop on Treebanks and linguistic theories (TLT, SyntaxFest 2019), 118–125. Stroudsburg, PA, USA: Association for Computational Linguistics.10.18653/v1/W19-7813Search in Google Scholar

Dryer, Matthew S. 2013. Position of interrogative phrases in content questions (v2020.3). In Matthew S. Dryer & Martin Haspelmath (eds.), The world Atlas of language structures online. Zenodo.Search in Google Scholar

Dufter, Andreas, Klaus Grübl & Thomas Scharinger. 2019. Des parlers d’oïl à la francophonie : Réflexions autour de l’expansion historique du français. In Andreas Dufter, Klaus Grübl & Thomas Scharinger (eds.), Des parlers d’oïl à la francophonie: contact, variation et changement linguistiques (Zeitschrift für romanische Philologie. Beihefte zur Zeitschrift für romanische Philologie), 1–16. Berlin: De Gruyter.10.1515/9783110541816-001Search in Google Scholar

Dwyer, Tessa. 2017. Speaking in subtitles: Revaluing screen translation. Edinburgh: Edinburgh University Press.10.3366/edinburgh/9781474410946.001.0001Search in Google Scholar

Ehmer, Oliver & Malte Rosemeyer. 2018. When “Questions” are not questions. Inferences and conventionalization in Spanish but-prefaced partial interrogatives. Open Linguistics 4(1). 70–100. https://doi.org/10.1515/opli-2018-0005.Search in Google Scholar

Elsig, Martin. 2009. Grammatical variation across space and time: The French interrogative system (Studies in Language Variation 3). Amsterdam: John Benjamins.10.1075/silv.3Search in Google Scholar

Farkas, Donka F. 2022. Non-intrusive questions as a special type of non-canonical questions. Journal of Semantics 39(2). 295–337. https://doi.org/10.1093/jos/ffac001.Search in Google Scholar

Faure, Richard & Katerina Palasis. 2021. Exclusivity! Wh-fronting is not optional wh-movement in Colloquial French. Natural Language and Linguistic Theory 39(1). 57–95. https://doi.org/10.1007/s11049-020-09476-w.Search in Google Scholar

Field, Andy P., Jeremy Miles & Zoë Field. 2012. Discovering statistics using R. London, Thousand Oaks: Sage.Search in Google Scholar

Gadet, Françoise. 1997. La variation, plus qu’une écume. Langue française 115(1). 5–18. https://doi.org/10.3406/lfr.1997.6218.Search in Google Scholar

Garassino, Davide. 2022. Discourse-pragmatic perspectives on interrogatives. Functions of Language 29(1). 25–57. https://doi.org/10.1075/fol.00037.gar.Search in Google Scholar

Grevisse, Maurice & André Goose (eds.). 2008. Le bon usage, 14th edn. Bruxelles: De Boeck/Duculot.Search in Google Scholar

Guryev, Alexander. 2017. La forme des interrogatives dans le Corpus suisse de SMS en français : étude multidimensionnelle. Sorbonne Paris Cité en cotutelle avec Université de Neuchâtel (Neuchâtel, Suisse) Thèse de doctorat.Search in Google Scholar

Guryev, Alexander. 2021. « Car je suis un… tu sais quoi ? Rappeur » : Étude pragma-syntaxique de la séquence Tu sais quoi ? Lexique 29. 97–115. https://doi.org/10.54563/lexique.771.Search in Google Scholar

Guryev, Alexander & François Delafontaine. 2015. La variabilité formelle des questions dans les écrits SMS. Travaux neuchâtelois de linguistique(63). 129–152. https://doi.org/10.26034/tranel.2015.2973.Search in Google Scholar

Guryev, Alexander & François Delafontaine. 2023. L’interrogative in situ à la lumière des principes de ‘End-Weight’ et ‘End-Focus’. Journal of French Language Studies 33(3). 299–323. https://doi.org/10.1017/S0959269523000145.Search in Google Scholar

Hamlaoui, Fatima. 2010. Anti-givenness, prosodic structure and “intervention effects”. The Linguistic Review 27(3). 347–364. https://doi.org/10.1515/tlir.2010.013.Search in Google Scholar

Hamlaoui, Fatima. 2011. On the role of phonology and discourse in Francilian French wh-questions. Journal of Linguistics 47(1). 129–162. https://doi.org/10.1017/S0022226710000198.Search in Google Scholar

Hoffenberg, Théo. 1999. Reverso: A new generation of machine translation software for English-French-English, German-French-German, etc. In EAMT workshop: EU and the new languages. Prague, Czech Republic: European Association for Machine Translation.Search in Google Scholar

Hölker, Klaus. 2010. Frz. quoi als Diskursmarker. Linguistik Online 44(4). https://doi.org/10.13092/lo.44.405.Search in Google Scholar

Huang, Yan. 2017. Conversational implicature. In Yan Huang (ed.), The Oxford handbook of pragmatics (Oxford handbooks in linguistics), 156–179. Oxford: Oxford University Press.10.1093/oxfordhb/9780199697960.013.7Search in Google Scholar

Jakubíček, Miloš, Adam Kilgarriff, Diana McCarthy & Pavel Rychlý. 2010. Fast syntactic searching in very large Corpora for many languages. PACLIC. 741–747.Search in Google Scholar

Kalouli, Aikaterini-Lida, Katharina Kaiser, Annette Hautli-Janisz, Georg A. Kaiser & Miriam Butt. 2018. A multilingual approach to question classification. In Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA).Search in Google Scholar

Kilgarriff, Adam, Pavel Rychlý, Pavel Smrž & David Tugwell. 2004. The Sketch engine. In Proceedings of the 11th EURALEX international congress, 105–116.Search in Google Scholar

Koch, Peter & Wulf Oesterreicher. 1985. Sprache der Nähe – Sprache der Distanz: Mündlichkeit und Schriftlichkeit im Spannungsfeld von Sprachtheorie und Sprachgeschichte. Romanistisches Jahrbuch 36. 15–43. https://doi.org/10.1515/9783110244922.15.Search in Google Scholar

Krämer, Philipp, Ulrike Vogl & Leena Kolehmainen. 2022. What is “language making”? International Journal of the Sociology of Language 274. 1–27. https://doi.org/10.1515/ijsl-2021-0016.Search in Google Scholar

Labov, William. 2004. Quantitative analysis of linguistic variation. In Ulrich Ammon, Norbert Dittmar, Klaus J. Mattheier & Peter Trudgill (eds.), Sociolinguistics: An international handbook of the science of language and society, 2nd edn. (Handbücher zur Sprach- und Kommunikationswissenschaft 3.1). 6–21. Berlin: de Gruyter Reference Global.Search in Google Scholar

Larrivée, Pierre & Alexander Guryev. 2021a. Variantes formelles de l’interrogation. Présentation. Langue française 212(4). 9–24. https://doi.org/10.3917/lf.212.0009.Search in Google Scholar

Larrivée, Pierre & Alexander Guryev (eds.). 2021b. Variantes formelles de l’interrogation (Langue française 212). Paris: Armand Colin.10.3917/lf.212.0009Search in Google Scholar

Lefeuvre, Florence. 2001. Pour quoi. Travaux de linguistique 42-43(1). 199–210. https://doi.org/10.3917/tl.042.199.Search in Google Scholar

Lefeuvre, Florence. 2020. Les interrogatives partielles dans un corpus de théâtre contemporain. Langages 217(1). 23–38. https://doi.org/10.3917/lang.217.0023.Search in Google Scholar

Lefeuvre, Florence & Gabriella Parussa. 2020. L’oral représenté en diachronie et en synchronie : une voie d’accès à l’oral spontané ? Langages 217. 9–21. https://doi.org/10.3917/lang.217.0009.Search in Google Scholar

Lefeuvre, Florence & Nathalie Rossi-Gensane. 2015. Interrogation. Project report. Université Sorbonne Nouvelle – Paris 3. http://www.univ-paris3.fr/medias/fichier/fiche-interrogation_1425994815933.pdf (accessed 15 August 2024).Search in Google Scholar

Levin, Beth. 1993. English verb classes and alternations: A preliminary investigation. Chicago, London: University of Chicago Press.Search in Google Scholar

Levshina, Natalia. 2017. Online film subtitles as a corpus: An n-gram approach. Corpora 12(3). 311–338. https://doi.org/10.3366/cor.2017.0123.Search in Google Scholar

Lison, Pierre & Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In LREC’16, 923–929. ELRA.Search in Google Scholar

Massidda, Serenella. 2020. Fansubbing: Latest trends and future prospects. In Łukasz Bogucki & Mikołaj Deckert (eds.), The Palgrave handbook of audiovisual translation and media accessibility (Palgrave Studies in Translating and Interpreting), 189–208. Cham: Springer International Publishing.10.1007/978-3-030-42105-2_10Search in Google Scholar

Massot, Benjamin & Paul Rowlett. 2013. Le débat sur la diglossie en France: aspects scientifiques et politiques. Journal of French Language Studies 23(1). 1–16. https://doi.org/10.1017/S0959269512000336.Search in Google Scholar

Mathieu, Eric. 2004. The mapping of form and interpretation: The case of optional wh-movement in French. Lingua 114(9–10). 1090–1132. https://doi.org/10.1016/j.lingua.2003.07.002.Search in Google Scholar

Michalke, Meik. 2021. koRpus: Text analysis with emphasis on POS tagging, readability, and lexical diversity (0.13-8). https://reaktanz.de/?c=hacking&s=koRpus (accessed 23 December 2022).Search in Google Scholar

Munaro, Nicola & Hans-Georg Obenauer. 1999. On underspecified wh-elements in pseudointerrogatives. Working Papers in Linguistics 9(1–2). 181–253.Search in Google Scholar

Myers, Lindsy L. 2007. WH-interrogatives in Spoken French: A corpus-based analysis of their form and function. Austin, Texas: The University of Texas Dissertation.Search in Google Scholar

New, Boris, Christophe Pallier, Marc Brysbaert & Ludovic Ferrand. 2004. Lexique 2: A new French lexical database. Behavior Research Methods, Instruments, & Computers: A Journal of the Psychonomic Society, Inc 36(3). 516–524. https://doi.org/10.3758/BF03195598.Search in Google Scholar

Nornes, Abé M. 1999. For an abusive subtitling. Film Quarterly 52(3). 17–34. https://doi.org/10.2307/1213822.Search in Google Scholar

O’Hagan, Minako. 2009. Evolution of user-generated translation. The Journal of Internationalization and Localization. 94–121. https://doi.org/10.1075/jial.1.04hag.Search in Google Scholar

Ortmann, Katrin, Adam Roussel & Stefanie Dipper. 2019. Evaluating off-the-shelf NLP tools for German. In Proceedings of the conference on natural language processing (KONVENS), 212–222. Erlangen, Germany.Search in Google Scholar

Palasis, Katerina, Richard Faure & Frédéric Lavigne. 2019. Explaining variation in wh -position in child French: A statistical analysis of new seminaturalistic data. Language Acquisition 26(2). 210–234. https://doi.org/10.1080/10489223.2018.1513004.Search in Google Scholar

Palasis, Katerina, Richard Faure & Fanny Meunier. 2023. Wh-in-situ in child French: Deictic triggers at the syntax-semantics interface. Journal of French Language Studies. 1–26. https://doi.org/10.1017/S0959269523000030.Search in Google Scholar

Pausé, Marie-Sophie, Agnès Tutin, Olivier Kraif & Maximin Coavoux. 2022. Extraction de Phrases Préfabriquées des Interactions à partir d’un corpus arboré du français parlé : une étude exploratoire. SHS Web of Conferences 138. 10002. https://doi.org/10.1051/shsconf/202213810002.Search in Google Scholar

Perego, Elisa & Ralph Pacinotti. 2020. Audiovisual translation through the ages. In Łukasz Bogucki & Mikołaj Deckert (eds.), The Palgrave handbook of audiovisual translation and media accessibility (Palgrave Studies in Translating and Interpreting), 33–56. Cham: Springer International Publishing.10.1007/978-3-030-42105-2_3Search in Google Scholar

Poletto, Cecilia & Jean-Yves Pollock. 2009. Another look at wh-questions in Romance: The case of Mendrisiotto and its consequences for the analysis of French wh-in situ and embedded interrogatives. In Danièle Torck & W. Leo Wetzels (eds.), Romance languages and linguistic theory 2006, 303, 199–258. Amsterdam: John Benjamins.10.1075/cilt.303.12polSearch in Google Scholar

Quillard, Virginie. 2000. Interroger en français parlé: Etudes syntaxique,pragmatique et sociolinguistique. France: Université de Tours PhD Thesis.Search in Google Scholar

R Core Team. 2024. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/ (accessed 1 February 2024).Search in Google Scholar

Rajeg, Gede P. W. & I. M. Rajeg. 2022. A corpus linguistic study of constructional equivalence for the Indonesian translation of ROB and STEAL based on the Open Subtitles Parallel Corpus. Journal of Linguistics and Education 12(2). 28–48. https://doi.org/10.14710/parole.v12i2.177-197.Search in Google Scholar

Rockstroh, Johanna & Jan Fliessbach. 2024. OpuSearch: Application and GUI to generate and search alignments from OPUS OpenSubtitles. Geneva: zenodo. https://doi.org/10.5281/zenodo.12742554 (accessed 15 August 2024).Search in Google Scholar

Rosemeyer, Malte. 2021. Two types of constructionalization processes in Spanish and Portuguese Clefted wh-interrogatives. Studies in Hispanic and Lusophone Linguistics 14(1). 117–160. https://doi.org/10.1515/shll-2021-2042.Search in Google Scholar

Schmid, Helmut. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing. Manchester, UK.Search in Google Scholar

Siepmann, Dirk, Christoph Bürgel & Sascha Diwersy. 2016. Le Corpus de référence du français contemporain (CRFC), un corpus massif du français largement diversifié par genres. SHS Web of Conferences 27. 11002. https://doi.org/10.1051/shsconf/20162711002.Search in Google Scholar

Thiberge, Gabriel. 2020. Acquisition et maîtrise des interrogatives partielles en français: La variation sociolinguistique comme outil interactionnel. Paris: Université de Paris Doctoral Thesis.Search in Google Scholar

Tiedemann, Jörg. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings of the eighth international conference on language resources and evaluation (LREC’12), 2214–2218. Istanbul, Turkey: European Language Resources Association (ELRA).Search in Google Scholar

Tutin, Agnès. 2019. Phrases préfabriquées des interactions: quelques observations sur le corpus CLAPI. Cahiers de lexicologie 114. 63–91.Search in Google Scholar

Villeneuve, Anne-José & Julie Auger. 2013. ‘chtileu qu’i m’freumereu m’bouque i n’est point coér au monne’: Grammatical variation and diglossia in Picardie. Journal of French Language Studies 23(1). 109–133. https://doi.org/10.1017/S0959269512000385.Search in Google Scholar

Wittenberg, Eva. 2016. With light verb constructions from syntax to concepts (Potsdam Cognitive Science Series 7). Potsdam: Univ.-Verl.Search in Google Scholar

Zipf, George K. 1935. The psycho-biology of language. Oxford, England: Houghton, Mifflin.Search in Google Scholar

Zipf, George K. 1949. Human behavior and the principle of least effort. Cambridge: Addison-Wesley Press.Search in Google Scholar

Zumwald Küster, Géraldine. 2018. Est-ce que et ses concurrents. In Marie-José Béguelin, Aidan Coveney & Alexander Guryev (eds.), L’interrogative en français (Sciences pour la communication 124), 95–118. Bern: Peter Lang.Search in Google Scholar

Received: 2024-01-03

Accepted: 2024-08-05

Published Online: 2024-09-03

Published in Print: 2025-05-26

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/cllt-2024-0001
