Discourse connectives and their arguments: an experiment on anaphoricity in German

Yulia Clausen; Manfred Stede

doi:10.1515/lingvan-2021-0102

Open Access article

Published/Copyright: November 3, 2022

From the journal Linguistics Vanguard, Volume 8, Issue 1

Abstract

Adverbial connectives like therefore, which link a preceding ‘external’ to an ‘internal’ argument, can be regarded as anaphoric: The external argument is selected by an interpretation process akin to that of an event anaphor, and intervening material can appear between both arguments. We report on a crowdsourcing experiment on the German connectives trotzdem and dennoch that studies factors that lead readers to assume such long-distance arguments: semantic plausibility of intervening material, ‘subjective’ versus ‘objective’ content, and the presence of an anaphoric morpheme in the connective. We find that the type and content of the intervening material play an important role in argument choice.

Keywords: connective; discourse anaphora; discourse structure

1 Introduction

Discourse connectives are a closed class of lexical items that indicate a semantic or pragmatic relation between units of text (e.g., Danlos et al. 2018). The relation can for instance be causal or temporal:

(1)

This year the storm was particularly heavy. As a result, 52 buildings lost their roofs.

(2)

While the storm was still underway, local politicians started devising relief packages.

Following Webber et al. (2003), we call the related text spans arguments of the connective, evoking the analogy to the verb-arguments complex. According to the Handbook of German Connectives (Pasch et al. 2003), a lexical item X is a connective if

X is not inflectable,
X does not assign case to its syntactic environment,
X expresses some two-place semantic relation,
the arguments of the relational meaning of X are propositional structures,
the verbalizations of the arguments are or could be clauses.

This definition covers coordinating and subordinating conjunctions as well as certain adverbials. All connectives have exactly two arguments; we call the argument syntactically associated with the connective internal, and the other one external. Borrowing notation from Prasad et al. (2008), we use italics for the external argument, boldface for the internal one, and underlining for the connective; cf. examples (1) and (2) above.

In this paper, we present a study on the anaphoric behavior of connectives, designed to uncover some factors influencing the ‘tolerated’ linear distance between an anaphoric connective and its antecedent statement. In the following, we discuss the notion of anaphoricity and afterwards specify our goals in detail.

1.1 Structural versus anaphoric connectives

We focus specifically on adverbial connectives. Here, the common pattern of realization is:

[external argument] [internal argument with connective]

Mostly, there is a sentence boundary between the two arguments, and the internal argument consists of the single sentence that contains the adverbial connective. The external argument can be a sentence, but can also be longer or shorter. Furthermore, sometimes there may be intervening material between the two arguments:

[external argument] [other] [internal argument with connective]

This situation prompted Webber et al. (2003) to explicitly distinguish structural and anaphoric connectives. The former provide a syntactic link between adjacent clauses, as with preposed adverbial clauses (ex. (3)) and with paired coordinating conjunctions (ex. (4)):^[1]

(3)

Although John is generous, he is hard to find.

(4)

On the one hand, Fred likes beans. On the other hand, he is allergic to them.

Webber et al. point out that structural connectives do not admit crossing of their predicate-argument dependencies:

(5)

a. On the one hand, Fred likes beans.

b. Not only does he eat them for dinner.

c. On the other hand, he’s allergic to them.

d. (??) But he also eats them for breakfast and snacks.

In contrast, adverbial connectives do allow crossing dependencies, and are thus seen as anaphoric, selecting their antecedent from the context in the same way as event anaphors do.

(6)

a. John loves Barolo.

b. So he ordered three cases of the ’97.

c. But he had to cancel the order

d. because then he discovered he was broke.

The combination of a structural connective (because) and an adverbial (then) allows the discovery event (d) to be interpreted as temporally following the ordering event (b), while the causal relation holds between the discovery (d) and the cancellation (c).

In German, the same dichotomy exists, and Pasch et al. (2003: 522) also call the relation between connective and external argument anaphoric. Furthermore, certain German adverbials contain an explicitly anaphoric morpheme, e.g., da- in daraufhin (‘thereupon’) or -dem in trotzdem (‘however’/‘nonetheless’), while others do not, e.g., später (‘later’) or dennoch (‘however’/‘nonetheless’). Stede and Grishina (2016) reported that 79 connectives (29%) from the lexicon ‘DiMLex’^[2] have an anaphoric morpheme (prefix or suffix).

Here is another example from Webber et al. (2003), illustrating ‘long-distance arguments’:

(7)

Although John is generous – for example, he gives money to anyone who asks him for it – he’s hard to find.

So far, not much is known about the conditions under which readers prefer a long-distance over an adjacent argument, for instance about the discourse function of the intervening material (e.g., parenthetical, as in ex. (7)) and about the distance admissible for an argument to be picked up. In the absence of appropriate annotated German corpora, these questions are to be tackled by carefully designed experiments.

1.2 Goals and structure of the paper

We take an experimental approach (specifically, ‘crowdsourcing’) to obtain insights about the anaphoric behavior of German connectives. In particular, we study the connection between readers’ readiness to accept a long-distance argument and its semantic ‘likelihood’. Thus we vary semantic conditions, and determine readers’ preferences on argument selection by testing inferences triggered by a concessive connective. In the test items, participants mark those statements (a to d) that they understand as being entailed by the text:

(8)

In June, we were often out in the pinewoods. Nonetheless we haven’t found any shaggy parasols.

a. In early summer, shaggy parasols grow in the pinewoods.

b. We didn’t have our glasses on.

c. No shaggy parasols grew in early summer this year.

d. None of the specified options applies.

From the answers we infer what external argument the participant has constructed for the connective. Our assumption is that normally, readers prefer adjacency of external argument and adverbial connective, but that this preference can be overridden by semantic features making the adjacent sentence less likely to function as external argument.

In the following, we discuss related work on the anaphoricity of connectives and pronominal abstract anaphora (Section 2). Then we introduce the rationale of our approach and our specific hypotheses (Section 3), present our experiment (Section 4) and results (Section 5). Finally, we discuss the outcome (Section 6) and draw conclusions (Section 7).

2 Previous research

2.1 Connectives and anaphoricity

The largest text corpus annotated with connectives and their arguments is the English-language Penn Discourse Treebank or PDTB (Prasad et al. 2008), which contains texts from the Wall Street Journal with a total of 18,459 connectives. Of the 18,459 external arguments, 1,666 (9%) are in a non-adjacent previous sentence, and 736 of those 1,666 connectives are adverbials. Lee et al. (2008) computed some statistics of the syntactic types of external arguments. Prasad et al. (2010) found that paragraph boundaries are a strong barrier for external arguments of adverbials to stretch across, unless the sentence with the adverbial is paragraph-initial.

A related project is the discourse-level annotation in the Czech-language Prague Dependency Treebank,^[3] whose size is similar to that of the PDTB. Regarding long-distance connective arguments, Poláková and Mírovský (2019) reported that 5,455 relations range over more than one sentence, and for 11.7% of inter-sentential relations (3.5% of all coherence relations) the external and internal arguments of a connective are non-adjacent. Manual analysis of 245 instances of frequent connectives with non-adjacent arguments revealed two types of material:

The external argument provides a general statement/claim; the intervening material is an elaboration (e.g., detail or background information); the internal argument is strongly linked to the topic of the external one.
The intervening material constitutes a digression, which can be marked as parenthetical by brackets or dashes.

Further, the study by Poláková and Mírovský is to our knowledge the only one that looked for a potential effect of explicit anaphoric morphemes in antecedent selection (in the Czech data), but no evidence was found.

For German, there is PDTB-style data in the Potsdam Commentary Corpus (Stede and Neumann 2014), but with roughly 1,100 connectives it is too small to study aspects of argument distance.

2.2 Pronominal abstract anaphora

For the related phenomenon of pronominal event anaphora, a comprehensive survey is provided by Kolhatkar et al. (2018). Corpus-based work on German is to our knowledge limited to Dipper and Zinsmeister (2012). In parliamentary debate text, they annotated 225 instances of abstract anaphors (demonstratives das/dies ‘this’ and personal pronoun es ‘it’). For the antecedents, they used a set of semantic types, which are potentially relevant for resolving the anaphors, as their role in the verb complex may trigger semantic selectional restrictions on the antecedent; this is an important difference from anaphoric connectives.

Of the three pronouns, das is the one that most frequently occurs in adjacency to its antecedent, but for all pronouns there are many instances of non-adjacency (exact numbers are not given in the paper).

2.3 Other features of connectives

One of the factors we investigate in our experiment is the role of subjectivity in the statements joined by the connective. So far, the subjective/objective parameter has been studied predominantly for the connectives themselves, such as the difference between Dutch want and omdat (Canestrelli et al. 2013) or similar contrasts in Spanish (Santana et al. 2021). The influence of non-/subjective contexts on a single connective, on the other hand, has received little attention. Likewise, we are not aware of earlier work that investigated the tradeoff between the semantic plausibility of an external argument versus its linear distance from the connective.

3 Approach

The corpus work described above shows that non-adjacent external arguments are by no means rare in authentic text. But, as no suitable corpora are available for German, we study readers’ preferences on non-/adjacent external arguments by means of an experimental approach. Specifically, we use crowdsourcing, which allows us to efficiently recruit large numbers of participants working online. For research questions such as ours it can be a good alternative to a tightly-controlled lab experiment with a small number of subjects. To our knowledge, such studies have so far not been conducted on readers’ selection of connectives’ arguments.^[4]

In designing the experiment, we aim at a ‘natural reading’ scenario and try to reconstruct the reader’s antecedent selection by means of a follow-up question. Specifically, we opt for a multiple-choice question that offers a number of propositions, for which readers should indicate whether they are entailed by the preceding text.

The anaphoric connective used in the items conveys a coherence relation between its internal and external arguments. We use only adverbial connectives and fix their position as sentence-initial, so the task boils down to finding the portion of the preceding context that fits the coherence relation to the sentence starting with the connective. To keep ‘portions’ simple, we choose to compose the context of short main-clause sentences with an occasional embedded clause; the reader’s task essentially is to pick one (or more) sentences as external argument. Using the notation from Section 1.1, and calling the connective-initial sentence the ‘trigger sentence’, our items look like this:

{[external argument sentence] | [other sentence]} [trigger sentence]

Curly brackets indicate that different orderings of the two elements are possible.

The next decision is what coherence relation(s) to use for triggering the external argument selection. For example, if the connective is then, we ask the reader to select the sentence(s) expressing the event that temporally precedes the one described in the internal argument. This seems difficult to design, though, because essentially any event can temporally precede any other event, so testing for long-distance arguments is not likely to be fruitful. The same problem holds for a purely additive relation, as signaled by also or furthermore.

A causal relation seems to be a better choice: Readers are invited to infer the most likely reason for the event in the trigger sentence, when it starts for example with thus or therefore. We decided to select a specific variant, viz. the Concession relation, whose semantics is generally described as an “overwritten default causality” (see, for example, König 1988). A statement ‘A. Nonetheless, B.’ conveys that A would usually imply not-B, but in this particular case, B holds ‘anyway’. When embedded in a discourse, the connective prompts the reader to select an A from the left context that most likely implies not-B. Hence, A can immediately precede the trigger sentence (as in ex. (9)), but it need not (as in ex. (10)). In both examples, the external argument (in italics) corresponds to A and implies ‘one typically does not jump into lakes’, i.e., the negation of the internal argument (in bold) B.

(9)

The forecasters had projected winter weather. Indeed, temperatures were below zero. Nonetheless we jumped into the lake, like every morning.

(10)

Temperatures were below zero. Just like the forecasters had projected. Nonetheless we jumped into the lake, like every morning.

The concessive connective thus prompts the reader to construct the complete ‘overwritten default statement’ (here: ‘People don’t jump into lakes when temperatures are freezing.’). In our experiment items, this reconstruction is what we test for as a proposition to be judged as implied by the text (or not); see Section 4.1 for examples.

Using this setting, we study three factors as potentially affecting readers’ judgments, and formulate corresponding hypotheses:

(H1) Plausibility versus distance: Starting from the assumption that readers prefer adjacency for connective arguments, we hypothesize that semantic implausibility of the inference involving the adjacent sentence makes the selection of a distant argument more likely.
(H2) Factuality status of the candidate inference: We hypothesize that material presented as a fact is more easily accepted as a potential implication than a subjective judgment is.
(H3) Connective choice: Specifically for two German concessive connectives, we hypothesize that for trotzdem, which has an explicitly anaphoric morpheme (-dem), a distant argument is more easily constructed than for dennoch, which has no such morpheme.^[5]
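The tradeoff stated in H1 can be made concrete with a toy decision rule. The following Python sketch is purely illustrative and is not a model proposed or tested in this study: the plausibility values, the `adjacency_bonus` parameter and the function name `pick_external_argument` are all hypothetical.

```python
def pick_external_argument(candidates, adjacency_bonus=0.2):
    """Return the index of the preferred external-argument sentence.

    candidates: hypothetical plausibility values in [0, 1], ordered as in
    the text, so the last entry is the sentence adjacent to the connective.
    adjacency_bonus: hypothetical preference for the adjacent sentence.
    """
    if not candidates:
        raise ValueError("need at least one candidate sentence")
    last = len(candidates) - 1
    scores = [p + (adjacency_bonus if i == last else 0.0)
              for i, p in enumerate(candidates)]
    # On ties, max() keeps the earliest (most distant) candidate.
    return max(range(len(scores)), key=lambda i: scores[i])


# Plausible adjacent sentence: adjacency wins.
print(pick_external_argument([0.7, 0.8]))  # 1
# Implausible adjacent sentence: the distant sentence is selected.
print(pick_external_argument([0.7, 0.1]))  # 0
```

The additive bonus is only one of many conceivable ways to encode an adjacency preference; it is used here because it keeps the override condition easy to see.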

4 Experiment setup

4.1 Design

Every participant received a questionnaire containing eight items that we are interested in, and eight fillers. Items and fillers are short texts of two or three simple sentences,^[6] followed by three statements that are possibly implied by the text meaning (see Supplementary Material for a complete list of experiment items). The participants were instructed to carefully read the text and mark those statements that they could infer from the content of the given text, without relying solely on their world knowledge or opinions. As the three options are semantically not mutually exclusive, participants could select as many as they wished out of the three, or opt for ‘none of the specified options applies’. The instructions came along with an example of a text with possible inferences and explanations (see Supplementary Material).

Items in individual questionnaires come from a pool of eight item groups of four items each, i.e., sets of items that cover the same topic, but vary the parameters we study. One parameter concerns the connective with/out anaphoric morpheme (H3): Four of the eight item groups contain trotzdem, the other four dennoch. For logistical reasons, the study was split into two experiments, henceforth exp1 and exp2. Both contained the same items except that we swapped the occurrences of trotzdem and dennoch between the experiments.

To study the factuality parameter (H2), we used two types of response statements in our experiment conditions II, III and IV: In half of the item groups – two with trotzdem and two with dennoch – the response statements b, which represent an inference from the second sentence and the concessive connective, were conceived as facts (‘Cabrera’, ‘Mexico’, ‘Parasol’, ‘Vaccine’; see Supplementary Material), whereas in the other half, b expressed a subjective opinion (‘Barbecue’, ‘Birthday’, ‘Farm’, ‘Sea’). To assess our third parameter – the tradeoff between semantic plausibility and anaphoric distance (H1) – we formulate the second sentence of the short texts in variants that make it semantically more or less likely to be regarded as part of the external argument (whose core is always the first sentence).

An example of an item group (‘Cabrera’) is given in examples (11)–(14) below. The first item of each group (11) contains two sentences: a sentence with the external argument and the trigger sentence starting with trotzdem/dennoch. We use this kind of item (adjacent external argument and connective) as a control to test whether the candidate inference induced by the concessive connective can actually be drawn by participants (condition I). Texts in items (12)–(14) comprise three sentences: the same two as in condition I plus an intervening sentence in between. Readers can regard this intervening sentence either as part of the external argument of the connective or not. We vary the degree of semantic incompatibility between the intervening sentence and the sentence with the connective: In (12), the intervening sentence coheres easily with the trigger sentence (condition II); in (13), the relation is less plausible (condition III); and in (14), the compatibility is the lowest (condition IV). Taken together, these items test how an inference based on a long-distance argument is drawn in the presence of varying amounts of intervening information between an anaphoric connective and its antecedent.

The options offered to participants (a to d) for judging plausible inferences were selected for the four conditions as follows: a is the target inference (based on the adjacent argument in condition I, but on the long-distance argument in conditions II – IV); it stays unchanged throughout the group. Option b is either an invalid inference (condition I) or represents a statement implied by the respective intervening sentence, i.e., corresponds to adjacency between the connective and external argument (conditions II – IV). For this statement, we contrast versions of items with a factual statement on the one hand, and a subjective opinion on the other, in order to test our hypothesis on the role of subjectivity (H2).^[7] Option c serves as distraction material that is kept unchanged throughout the group. Option d can be selected if none of the above seems applicable.

(11)

Condition I: [Sent1 TriggerSent]

Ich kenne mich mit der Literatur des 17. Jahrhunderts ziemlich gut aus. Trotzdem habe ich noch nie von Juan de Cabrera gehört.

‘I am quite familiar with the literature of the 17th century. Nonetheless I have never heard of Juan de Cabrera.’

a. Juan de Cabrera war ein Schriftsteller des 17. Jahrhunderts.

‘Juan de Cabrera was a writer of the 17th century.’

b. Spanische Literatur ist nicht mein Arbeitsgebiet.

‘Spanish literature is not my field of work.’

c. Juan de Cabrera war kein richtiger Schriftsteller.

‘Juan de Cabrera was not a real writer.’

d. Keine der genannten Optionen gilt.

‘None of the specified options applies.’

(12)

Condition II: [Sent1 Sent2highLinkToTrigger TriggerSent]

Ich kenne mich mit der Literatur des 17. Jahrhunderts ziemlich gut aus. Ich habe mich vor allem mit spanischen Autoren befasst. Trotzdem habe ich noch nie von Juan de Cabrera gehört.

‘I am quite familiar with the literature of the 17th century. I mostly focused on the Spanish authors. Nonetheless I have never heard of Juan de Cabrera.’

a. Juan de Cabrera war ein Schriftsteller des 17. Jahrhunderts.

‘Juan de Cabrera was a writer of the 17th century.’

b. Juan de Cabrera stammt aus Spanien.

‘Juan de Cabrera is from Spain.’

c. Juan de Cabrera war kein richtiger Schriftsteller.

‘Juan de Cabrera was not a real writer.’

d. Keine der genannten Optionen gilt.

‘None of the specified options applies.’

(13)

Condition III: [Sent1 Sent2mediumLinkToTrigger TriggerSent]

Ich kenne mich mit der Literatur des 17. Jahrhunderts ziemlich gut aus. Letztes Jahr habe ich auch einen Vortrag darüber gehalten. Trotzdem habe ich noch nie von Juan de Cabrera gehört.

‘I am quite familiar with the literature of the 17th century. Last year, I also gave a lecture on this. Nonetheless I have never heard of Juan de Cabrera.’

a. Juan de Cabrera war ein Schriftsteller des 17. Jahrhunderts.

‘Juan de Cabrera was a writer of the 17th century.’

b. Wenn man einen Vortrag über die Literatur des 17. Jahrhunderts hält, kennt man alle Autoren dieser Zeit.

‘If you give a lecture on the literature of the 17th century, you know all the authors of that period.’

c. Juan de Cabrera war kein richtiger Schriftsteller.

‘Juan de Cabrera was not a real writer.’

d. Keine der genannten Optionen gilt.

‘None of the specified options applies.’

(14)

Condition IV: [Sent1 Sent2lowLinkToTrigger TriggerSent]

Ich kenne mich mit der Literatur des 17. Jahrhunderts ziemlich gut aus. Darum besitze ich auch viele Bücher der Schriftsteller aus dieser Zeit. Trotzdem habe ich noch nie von Juan de Cabrera gehört.

‘I am quite familiar with the literature of the 17th century. That’s why I own many books by writers from that period. Nonetheless I have never heard of Juan de Cabrera.’

a. Juan de Cabrera war ein Schriftsteller des 17. Jahrhunderts.

‘Juan de Cabrera was a writer of the 17th century.’

b. Wenn man viele Bücher der Schriftsteller aus dem 17. Jahrhundert besitzt, kennt man alle Autoren dieser Zeit.

‘If you own many books of the writers from the 17th century, you know all the authors of that time.’

c. Juan de Cabrera war kein richtiger Schriftsteller.

‘Juan de Cabrera was not a real writer.’

d. Keine der genannten Optionen gilt.

‘None of the specified options applies.’

4.2 Technical implementation

We recruited participants via the crowdsourcing platform Prolific^[8] and implemented our questionnaires using _magpie.^[9] The condition number (I – IV) from each group was assigned to the participants automatically at random, and the order of presentation of the items and fillers was shuffled.
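The assignment procedure can be sketched as follows. This is a minimal Python illustration, not the actual _magpie implementation: it assumes that each participant sees exactly one item per item group, that conditions are drawn uniformly at random, and that fillers are identified by placeholder names; the function name `build_questionnaire` is hypothetical.

```python
import random

# Item group names as listed in Section 4.1.
ITEM_GROUPS = ["Barbecue", "Birthday", "Cabrera", "Farm",
               "Mexico", "Parasol", "Sea", "Vaccine"]
CONDITIONS = ["I", "II", "III", "IV"]


def build_questionnaire(rng, n_fillers=8):
    """Return a shuffled list of (kind, name, condition) trials."""
    # One item per group, each with a randomly drawn condition (assumed uniform).
    items = [("item", group, rng.choice(CONDITIONS)) for group in ITEM_GROUPS]
    # Placeholder fillers; fillers carry no condition.
    fillers = [("filler", f"filler_{i + 1}", None) for i in range(n_fillers)]
    trials = items + fillers
    rng.shuffle(trials)  # randomize presentation order of items and fillers
    return trials


questionnaire = build_questionnaire(random.Random(42))
print(len(questionnaire))  # 16
```

Because conditions are drawn independently per participant, the number of judgments per item and condition varies, which is consistent with the uneven judgment counts reported below.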

For both exp1 and exp2, we initially collected judgments from 100 participants per experiment. After a manual check of all submissions, we rejected those with low-effort results, e.g., no correct Prolific ID submitted or very short completion time (under 4 min for reading the instructions and judging all items). Those participants were automatically replaced by Prolific with new candidates.

As a further measure of quality control, we used MACE (Hovy et al. 2013) to detect unreliable participants on the basis of the content of their submissions.^[10] We excluded 33 submissions with a MACE score below 0.5, which left us with 78 submissions in exp1 and 89 in exp2. The resulting number of judgments per item ranges from 12 to 23 in exp1, and from 13 to 34 in exp2, due to random assignment of items to participants and MACE filtering.
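The thresholding step can be illustrated with a short Python sketch. It does not compute MACE scores, which come from the external tool; the participant IDs and score values are invented, and the function name `keep_reliable` is hypothetical.

```python
def keep_reliable(scores, threshold=0.5):
    """Keep participants whose reliability score is at least `threshold`."""
    return {pid: s for pid, s in scores.items() if s >= threshold}


# Hypothetical scores for four made-up participant IDs.
scores = {"p01": 0.91, "p02": 0.34, "p03": 0.50, "p04": 0.12}
kept = keep_reliable(scores)
print(sorted(kept))  # ['p01', 'p03']

# Consistency check on the reported counts, assuming 100 submissions per
# experiment remained after the manual check: 200 - 33 = 78 + 89.
assert 2 * 100 - 33 == 78 + 89
```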

5 Results

In the following, we present the results of exp1 and exp2 grouped by condition. An additional cross-conditional comparison of the results can be found in Appendix A.

We discuss two kinds of inference:

target inference based on the first sentence of the item text, which represents the long-distance argument of the concessive connective in conditions II – IV, and adjacent argument in condition I
adjacent inference based on the sentence adjacent to the connective; only relevant in conditions II – IV
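The mapping from selected options to the external argument a participant presumably constructed can be sketched in Python. This is one possible coding scheme under the option design of Section 4.1, not necessarily the one used in the analysis; the function name `inferred_arguments` and the labels `"adjacent"` and `"long-distance"` are hypothetical.

```python
def inferred_arguments(selected, condition):
    """Map a set of selected options to the argument readings they indicate.

    selected: set of option letters from {"a", "b", "c", "d"}.
    condition: one of "I", "II", "III", "IV".
    An empty result means no argument-based inference was selected.
    """
    if condition not in {"I", "II", "III", "IV"}:
        raise ValueError(f"unknown condition: {condition}")
    readings = set()
    if "a" in selected:
        # The first sentence is adjacent in condition I, distant otherwise.
        readings.add("adjacent" if condition == "I" else "long-distance")
    if "b" in selected and condition != "I":
        # In conditions II-IV, b follows from the intervening sentence.
        readings.add("adjacent")
    return readings


print(inferred_arguments({"a"}, "I"))    # {'adjacent'}
print(inferred_arguments({"d"}, "IV"))   # set()
```

Since options are not mutually exclusive, a response that includes both a and b in conditions II to IV yields both readings rather than forcing a single choice.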

5.1 Condition I

In exp1 (Figure 1), the target inference (a) was selected by the majority of participants (>60% in most items). The invalid inferences b and c, by contrast, were selected only occasionally (‘Farm’, ‘Vaccine’ and ‘Parasol’).^[11] This demonstrates that, in general, the candidate inference induced by the concessive connective was readily recognized by readers. However, option d (no inference applies) was selected quite frequently, especially in the subjective items (ca. 40–70% per item). We regard this as a first indication that material presented as subjective opinion poses a greater challenge for the reader, thus generating more heterogeneous judgments than facts do.
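Per-option percentages of this kind can be computed from multi-select responses as in the following Python sketch. The response data are invented for illustration, and the function name `selection_rates` is hypothetical; the sketch assumes each rate is the share of participants who selected that option.

```python
from collections import Counter


def selection_rates(responses):
    """Return the share of participants selecting each option a-d.

    responses: list of sets of option letters, one set per participant.
    """
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(opt for selected in responses for opt in set(selected))
    n = len(responses)
    return {opt: counts[opt] / n for opt in "abcd"}


# Four hypothetical participants judging one item.
rates = selection_rates([{"a"}, {"a", "b"}, {"d"}, {"a"}])
print(rates)  # {'a': 0.75, 'b': 0.25, 'c': 0.0, 'd': 0.25}
```

Because participants may mark several options, the rates for one item can sum to more than 100%, as they do in this example.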

Figure 1: Results for condition I in exp1 across the item groups: a (target inference), b and c (distraction material), d (no inference selected). Factual items are marked in bold in this and all subsequent figures.

The results for exp2 (Figure 2) show a similar pattern, although with somewhat greater overall uncertainty: option d is more frequent than in exp1 in all but two items. In half of the items the target inference predominates (>50% per item). In the other half, either options a and d are tied (‘Parasol’) or the majority of participants did not consider any of the offered inferences to be valid. A statistical model fitted to the relevant subset of the experimental data, i.e., condition I, revealed a significant preference for the target inference over option d in exp1 (Est. = 1.104, SE = 0.235, Z = 4.696, p < 0.001). In exp2, however, the preference was not significant (Est. = 0.261, SE = 0.209, Z = 1.250, p = 0.211). Full models can be found in Appendix B (Table 2).^[12]
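The reported test statistics can be re-derived from the estimates and standard errors. The Python sketch below assumes that Z is a Wald statistic (estimate divided by standard error) and that p is a two-sided p-value under the standard normal distribution; this is an assumption about the models in Appendix B, and the function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p) for a Wald test under a standard normal."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z1, p1 = wald_test(1.104, 0.235)  # exp1, target inference vs. option d
z2, p2 = wald_test(0.261, 0.209)  # exp2, target inference vs. option d
print(f"exp1: z = {z1:.2f}, p = {p1:.2g}")
print(f"exp2: z = {z2:.2f}, p = {p2:.3f}")
```

Under these assumptions the recomputed values agree with the reported ones up to rounding of the inputs: for exp1 the p-value lies far below 0.001, and for exp2 it is close to 0.21.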

Figure 2: Results for condition I in exp2 across the item groups: a (target inference), b and c (distraction material), d (no inference selected).

The generalization that factual items appear more straightforward than subjective ones is confirmed by the models reporting a significantly higher probability of selecting the target inference in factual than in subjective items in exp1 (Est. = 1.164, SE = 0.417, Z = 2.789, p = 0.005) and exp2 (Est. = 1.257, SE = 0.453, Z = 2.774, p = 0.006); for the full models see Appendix B (Table 4). The other way around, option d is significantly more frequent in subjective than in factual items in exp1 (Est. = 1.533, SE = 0.626, Z = 2.449, p = 0.014) and in exp2 (Est. = 1.344, SE = 0.407, Z = 3.30, p = 0.001); for the full models see Appendix B (Table 5).^[13]

5.2 Condition II

The results for condition II (high coherence of the intervening sentence with the trigger sentence) make the contrast between factual and subjective items even more prominent (Figures 3 and 4). Both the target (a) and adjacent (b) inference were selected in all items (except for ‘Farm’ in exp2); they both prevail in factual items, whereas in the subjective items option d (>60% per item) predominates. Furthermore, as in condition I, exp2 seems to be associated with higher uncertainty compared to exp1.

Figure 3: Results for condition II in exp1 across the item groups: a (target inference), b (adjacent inference), c (distraction material), d (no inference selected).

Figure 4: Results for condition II in exp2 across the item groups: a (target inference), b (adjacent inference), c (distraction material), d (no inference selected).

In exp1, in three out of four factual items, most participants preferred the target inference over the adjacent inference, although the difference is small. In exp2, the picture is similar. In both experiments, ‘Parasol’ triggered the adjacent inference noticeably more frequently than the target inference, but this item sets itself apart from the rest, as mentioned above.

The probability of selecting the adjacent inference is significantly higher in factual than in subjective items in exp1 (Est. = 2.694, SE = 0.684, Z = 3.936, p < 0.001) and exp2 (Est. = 2.641, SE = 0.852, Z = 3.099, p = 0.002); for the full models see Appendix B (Table 3). Since in this condition, option b is inferred from the coherent intervening material, its frequent selection is not surprising. Interestingly, its presence did not prevent the participants from making the target inference; rather, they deemed both implications possible.^[14]

5.3 Condition III

In this condition (less plausible relation between the intervening and trigger sentence compared to condition II), all items in exp1 (Figure 5), except for ‘Barbecue’ and ‘Farm’, show a sharp reduction in the frequency of the adjacent inference compared to condition II. The same holds for exp2 (Figure 6), except for ‘Farm’. This is in line with our prediction that an inference based on the long-distance argument becomes more apparent in the absence of a plausible adjacent alternative. Similarly to previously discussed conditions, the gap between the factual and subjective items is apparent here as well.

Figure 5: Results for condition III in exp1 across the item groups: a (target inference), b (adjacent inference), c (distraction material), d (no inference selected).

Figure 6: Results for condition III in exp2 across the item groups: a (target inference), b (adjacent inference), c (distraction material), d (no inference selected).

5.4 Condition IV

In condition IV (lowest compatibility between the intervening and trigger sentence), the items ‘Cabrera’ and ‘Vaccine’ show a clear preference for the target inference: It was chosen almost exclusively (>95%) in exp1 (Figure 7) and in more than 75% of cases in exp2 (Figure 8), whereas the adjacent inference is completely absent. The items ‘Mexico’ and ‘Parasol’ show rather mixed results, but the target inference prevails over the adjacent inference there as well. In the subjective items, the target inference predominates over the adjacent inference for the most part, especially in exp1. However, in general, option d is most frequent.

Figure 7: Results for condition IV in exp1 across the item groups: a (target inference), b (adjacent inference), c (distraction material), d (no inference selected).

Figure 8: Results for condition IV in exp2 across the item groups: a (target inference), b (adjacent inference), c (distraction material), d (no inference selected).

6 Discussion

The results presented in the previous section reveal a clear pattern in several items: In condition I (test), the target inference was easily drawn with visible consensus among participants. In condition II (coherent intervening sentence), both the target and adjacent inferences were drawn by the majority of participants, with the target inference slightly predominating in most items. The fact that the adjacent inference was selected frequently confirms our hypothesis H1 that a semantically coherent adjacent sentence is readily picked as external argument. In conditions III (less coherent intervening sentence) and IV (incoherent intervening sentence), the target inference clearly predominates, which is even more pronounced in condition IV. These results prove to be statistically significant.

The described pattern primarily holds for the factual items, whereas the subjective items worked consistently less well: Participants often remained unsure and chose none of the listed inferences (option d) across conditions. Thus, across all experiments, the majority of option d choices occurred in the subjective items. Specifically, the items ‘Barbecue’, ‘Farm’ and ‘Parasol’ appeared to be confusing. The participants were encouraged to leave comments after the experiment; 12 (in exp1) and 14 (in exp2) provided constructive feedback, indicating that the main difficulty in making judgments was separating world knowledge from inferences based solely on the presented texts.

In sum, both experiments showed that the parameter ‘fact versus subjective judgment’ (H2) plays an important role in readers’ decision making. Factual items were consistently more easily accepted as valid implications in the given contexts, which confirms our second hypothesis. However, we did not find any differences related to the choice of connective (trotzdem, dennoch) (H3). Rather, the observed differences seem to originate from individual variations in participants’ judgments and heterogeneity of the experiment items.

7 Conclusion

We investigated experimentally readers’ preferences in selecting adjacent versus non-adjacent external arguments of the adverbial anaphoric connectives trotzdem and dennoch. The link between the external and internal arguments of these connectives is established by an interpretation process similar to that of anaphora resolution; accordingly, the external argument can be separated from the internal argument by intervening material.

In a crowdsourcing experiment, we varied the semantic conditions under which readers prefer a long-distance over an adjacent argument. We explored the factors of semantic plausibility of the intervening material, subjective versus objective content, and the type of connective. Our approach was to assess these factors by presenting a short text followed by a multiple-choice question designed to reveal the reader’s inference triggered by the concessive connective; we conclude that this approach is generally suitable for our research questions.

We found semantic coherence to be a strong factor in participants’ inference-making decisions: Semantically incoherent intervening material contributes to more frequent selection of the inference based on a long-distance argument. However, in the latter case, the inference based on the adjacent argument is often co-selected. Factual items generally seem to be clearer to participants, as they select the target inference there more readily, whereas with the subjective material, participants tend to select no inferences at all. This higher uncertainty aligns with the result found by Canestrelli et al. (2013): subjective connectives trigger processing disadvantages compared to objective ones. The presence of an anaphoric morpheme in a connective was not a decisive factor in our data; this confirms the corresponding result of the Czech corpus study by Poláková and Mírovský (2019).

We used a crowdsourcing approach instead of a more controlled lab experiment; its main advantage, in our view, is that data can be gathered very quickly from a large number of participants. Our task involves semantic plausibility judgments, which are by nature subjective, and thus a larger number of participants is required for drawing generalizations. Furthermore, the ease of adding more participants facilitates exploring different conditions, which can be tested with differently composed item sets. Nonetheless, as with any crowdsourcing experiment, quality control is of great importance; we used both automatic filtering and the MACE method (Hovy et al. 2013).

Possible improvements of the method include stricter controls of readers’ understanding of the instructions and experiment items, e.g. via a pretest with trial items and feedback; furthermore, the potential role of different modalities in the experiment items (e.g., the presence of Abtönungspartikeln, i.e., modal particles) can be investigated. Another direction for future research is to test how the inference processes are influenced by the length of the intervening material and by its type, e.g. in terms of discourse relations, as studied by Poláková and Mírovský (2019) on Czech data.

Corresponding author: Yulia Clausen, Department of Linguistics, University of Potsdam, Potsdam, Germany, E-mail: yulia.clausen@uni-potsdam.de

Funding source: Deutsche Forschungsgemeinschaft

Award Identifier / Grant number: 323949969

Acknowledgments

We thank Anna Laurinavichyute for help with statistics, and Robin Schäfer and the anonymous reviewers for valuable comments and suggestions. This work was funded by Deutsche Forschungsgemeinschaft (DFG) – Project ID 323949969.

Appendix A: Cross-conditional comparison

In Table 1, we summarize the average per-participant frequencies of selecting the target inference, the adjacent inference, and both of these inferences concurrently in conditions II, III and IV.^[15] Whereas the results discussed so far are based on the complete dataset, in this section we focus only on the cases where each of these options was selected without simultaneously selecting any other alternatives.

Table 1:

Average frequency of a certain inference type (and no other inference type simultaneously) selected by a participant per experiment and condition across all seen items.

	Target	Adjacent	Target + adjacent
exp1
Condition II – IV	32%	9%	10%
Condition II	17%	14%	21%
Condition III	38%	8%	4%
Condition IV	43%	5%	6%
exp2
Condition II – IV	31%	11%	8%
Condition II	18%	15%	15%
Condition III	38%	8%	2%
Condition IV	38%	9%	7%

The combined results of conditions II – IV show the same overall trend in both experiments: The target inference was selected about three times as often as the adjacent inference or both inferences together. According to the model fitted to the data from conditions II – IV, the probability of selecting the target inference is significantly higher than that of the other two options in exp1 (adjacent: Est. = −1.641, SE = 0.193, Z = −8.523, p < 0.001; target + adjacent: Est. = −1.515, SE = 0.186, Z = −8.140, p < 0.001) and exp2 (adjacent: Est. = −1.362, SE = 0.170, Z = −8.018, p < 0.001; target + adjacent: Est. = −1.751, SE = 0.190, Z = −9.240, p < 0.001). Full models are provided in Appendix B (Table 6). The frequencies of the adjacent inference and of the joint selection of both inference types are similar and were not found to differ significantly.
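To make the size of these estimates easier to read, the following minimal sketch converts the reported coefficients into odds ratios. It assumes that the estimates are log-odds from a logistic model with target as the reference level, as the caption of Table 6 suggests; the variable and function names are illustrative and not taken from the authors' analysis code.

```python
import math

# Fixed-effect estimates copied from Table 6, rows "Condition II–IV".
# Assumption: these are log-odds relative to the reference level "target".
estimates = {
    "exp1": {"adjacent": -1.641, "target + adjacent": -1.515},
    "exp2": {"adjacent": -1.362, "target + adjacent": -1.751},
}


def odds_ratio(log_odds: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(log_odds)


# Odds of each competing option relative to the odds of the target inference.
odds_ratios = {
    exp: {name: round(odds_ratio(beta), 3) for name, beta in coefs.items()}
    for exp, coefs in estimates.items()
}

for exp, ratios in odds_ratios.items():
    for name, ratio in ratios.items():
        print(f"{exp} {name}: odds ratio vs. target = {ratio}")
```

Under this reading, all four odds ratios fall between roughly 0.17 and 0.26, i.e. the odds of the competing options are about a fifth to a quarter of the odds of the target inference.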

A comparison of the results for each condition reveals a difference between condition II on the one side and conditions III and IV on the other. In condition II, the three possible options are distributed fairly evenly, and we find no significant difference among them. In conditions III and IV, the target inference has the highest frequency, which is at least four times the frequency of the other options. These results are significant in both conditions and experiments; see Appendix B (Table 6). This confirms our hypothesis H1 that readers favor adjacency with concessive connective arguments, but only as long as semantic plausibility is assured. Higher semantic implausibility of the intervening material makes the inference based on the long-distance argument more readily accepted.
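The frequency comparisons in this paragraph can be checked directly against Table 1. The sketch below is only an arithmetic illustration on the rounded percentages from the table, not a reproduction of the statistical models; the helper name `min_ratio` is hypothetical.

```python
# Percentages copied from Table 1 as (target, adjacent, target + adjacent).
table1 = {
    "exp1": {"II-IV": (32, 9, 10), "II": (17, 14, 21), "III": (38, 8, 4), "IV": (43, 5, 6)},
    "exp2": {"II-IV": (31, 11, 8), "II": (18, 15, 15), "III": (38, 8, 2), "IV": (38, 9, 7)},
}


def min_ratio(target: float, adjacent: float, both: float) -> float:
    """Smallest ratio of the target frequency to either competing option."""
    return min(target / adjacent, target / both)


# One ratio per experiment and condition, rounded for display.
ratios = {
    (exp, cond): round(min_ratio(*row), 2)
    for exp, conds in table1.items()
    for cond, row in conds.items()
}

for (exp, cond), ratio in ratios.items():
    print(f"{exp}, condition {cond}: target is at least {ratio}x as frequent")
```

On these rounded values, the ratio stays above four in conditions III and IV of both experiments, while in condition II it is close to or below one.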

The target inference frequency in exp1 is more than twice as high in conditions III and IV as in condition II. This observation is corroborated by the model, which reports a significant effect of the condition on the target inference: it is more likely to be selected in condition III (Est. = 1.220, SE = 0.305, Z = 4.002, p < 0.001) or IV (Est. = 1.537, SE = 0.309, Z = 4.973, p < 0.001) than in condition II. The effect is also significant in exp2. The adjacent inference as well as the combination of the adjacent and target inferences are most frequent in condition II. The models report a lower probability of the adjacent inference being selected in condition III or IV; however, these results are only partially significant (for condition IV in exp1 and condition III in exp2). For the option target + adjacent, the effect is significant in both conditions III and IV. Full models are provided in Appendix B (Table 7).
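As an illustration of what the condition effect means on the probability scale, the sketch below applies the inverse logit to the fixed effects for the target inference in Table 7. It assumes a logistic link and sets the random intercepts to zero, so the values describe a hypothetical typical participant and item rather than the sample averages in Table 1; all names are illustrative.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Fixed effects for the target inference copied from Table 7.
# Condition II is the reference level, so its effect is 0 by construction.
target_models = {
    "exp1": {"intercept": -1.863, "II": 0.0, "III": 1.220, "IV": 1.537},
    "exp2": {"intercept": -1.834, "II": 0.0, "III": 1.175, "IV": 1.201},
}

# Fixed-effects-only probabilities (random intercepts assumed to be zero).
probs = {
    exp: {
        cond: inv_logit(model["intercept"] + model[cond])
        for cond in ("II", "III", "IV")
    }
    for exp, model in target_models.items()
}

for exp, by_cond in probs.items():
    for cond, p in by_cond.items():
        print(f"{exp}, condition {cond}: P(target) = {p:.3f}")
```

Because the random effects are ignored, these probabilities are only a rough guide, but they show the same ordering as the observed frequencies: lowest in condition II and clearly higher in conditions III and IV.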

Appendix B: Summary of fixed effects

All models posit by-item and by-participant random intercepts unless stated otherwise.
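As a reading aid for the tables below, the model structure described here can be written out explicitly. This is a sketch under the assumption that the models are logistic mixed-effects regressions with a single categorical predictor, which the reported Z statistics and selection probabilities suggest but which this appendix does not state; the symbols are illustrative. Here $y_{pi}$ is the binary response of participant $p$ on item $i$, $x_{pi}$ is the dummy-coded predictor (inference type, item type, or condition), $u_p$ is the by-participant random intercept, and $w_i$ is the by-item random intercept.

```latex
\operatorname{logit} P(y_{pi} = 1) = \beta_0 + \beta_1 x_{pi} + u_p + w_i,
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2),
\quad w_i \sim \mathcal{N}(0, \sigma_w^2)
```

In this notation, the "Intercept" rows of the tables correspond to $\beta_0$ (the log-odds at the reference level) and the remaining rows to $\beta_1$; predictors with three levels, as in Tables 6 and 7, would need one coefficient per non-reference level.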

Table 2:

Target inference versus option d (reference level for the independent variable inference type) in condition I.

	exp1				exp2
	Est.	SE	Z	p	Est.	SE	Z	p
Intercept	−0.580	0.167	−3.474	0.001	−0.174	0.148	−1.178	0.239
Target	1.104	0.235	4.696	0.000	0.261	0.209	1.250	0.211
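The Z and p columns in this and the following tables can be approximately recomputed from the estimates and standard errors. The sketch below assumes that Z is a Wald statistic (estimate divided by standard error) and that p is a two-sided value from the standard normal distribution; small deviations from the table are expected because the inputs are rounded to three decimals.

```python
import math


def wald_z(estimate: float, se: float) -> float:
    """Wald statistic: estimate divided by its standard error."""
    return estimate / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# (Est., SE) pairs copied from Table 2.
rows = {
    ("exp1", "Intercept"): (-0.580, 0.167),
    ("exp1", "Target"): (1.104, 0.235),
    ("exp2", "Intercept"): (-0.174, 0.148),
    ("exp2", "Target"): (0.261, 0.209),
}

# Recomputed (Z, p) per row.
recomputed = {}
for key, (est, se) in rows.items():
    z = wald_z(est, se)
    recomputed[key] = (z, two_sided_p(z))

for (exp, term), (z, p) in recomputed.items():
    print(f"{exp} {term}: Z = {z:.3f}, p = {p:.4f}")
```

A p-value printed as 0.000 in the tables is a value rounded to three decimals, which is why it is more informative to read such entries as p < 0.001.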

Table 3:

Probability of the adjacent inference (dependent variable) with subjective as the reference level for the independent variable item type in condition II.

	exp1				exp2
	Est.	SE	Z	p	Est.	SE	Z	p
Intercept	−2.108	0.519	−4.062	0.000	−2.862	0.781	−3.665	0.000
Factual	2.694	0.684	3.936	0.000	2.641	0.852	3.099	0.002

Table 4:

Probability of selecting the target inference (dependent variable) with subjective as the reference level for the independent variable item type. For exp1 condition I, the by-item random intercept had to be dropped due to convergence issues.

	exp1				exp2
	Est.	SE	Z	p	Est.	SE	Z	p
Condition I
Intercept	0.009	0.260	0.036	0.972	−0.531	0.312	−1.704	0.088
Factual	1.164	0.417	2.789	0.005	1.257	0.453	2.774	0.006
Condition II
Intercept	−1.662	0.369	−4.511	0.000	−1.776	0.436	−4.072	0.000
Factual	2.188	0.497	4.407	0.000	1.730	0.540	3.206	0.001
Condition III
Intercept	−1.380	0.379	−3.640	0.000	−1.233	0.280	−4.403	0.000
Factual	2.022	0.524	3.860	0.000	1.658	0.385	4.308	0.000
Condition IV
Intercept	−1.118	0.571	−1.958	0.050	−1.519	0.505	−3.006	0.003
Factual	2.342	0.843	2.778	0.005	2.285	0.705	3.241	0.001

Table 5:

Probability of selecting option d (dependent variable) with factual as the reference level for the independent variable item type. For exp2 condition IV, the by-participant random intercept had to be dropped due to convergence issues.

	exp1				exp2
	Est.	SE	Z	p	Est.	SE	Z	p
Condition I
Intercept	−1.501	0.486	−3.087	0.002	−0.870	0.291	−2.990	0.003
Subjective	1.533	0.626	2.449	0.014	1.344	0.407	3.300	0.001
Condition II
Intercept	−2.349	0.575	−4.087	0.000	−1.097	0.387	−2.836	0.005
Subjective	3.682	0.833	4.419	0.000	2.559	0.677	3.777	0.000
Condition III
Intercept	−0.867	0.296	−2.932	0.003	−0.626	0.277	−2.258	0.024
Subjective	1.353	0.415	3.260	0.001	1.220	0.360	3.389	0.001
Condition IV
Intercept	−1.850	0.532	−3.476	0.001	−0.947	0.278	−3.409	0.001
Subjective	2.655	0.728	3.648	0.000	1.593	0.392	4.060	0.000

Table 6:

Probability of selecting a particular type of inference with target as the reference level for the independent variable inference type.

	exp1				exp2
	Est.	SE	Z	p	Est.	SE	Z	p
Condition II–IV
Intercept	−0.786	0.247	−3.184	0.001	−0.882	0.185	−4.767	0.000
Adjacent	−1.641	0.193	−8.523	0.000	−1.362	0.170	−8.018	0.000
Target + adjacent	−1.515	0.186	−8.140	0.000	−1.751	0.190	−9.240	0.000
Condition II
Intercept	−1.681	0.314	−5.345	0.000	−1.618	0.251	−6.448	0.000
Adjacent	−0.207	0.320	−0.648	0.517	−0.206	0.287	−0.720	0.472
Target + adjacent	0.270	0.299	0.904	0.366	−0.251	0.289	−0.869	0.385
Condition III
Intercept	−0.542	0.227	−2.384	0.017	−0.563	0.175	−3.212	0.001
Adjacent	−1.947	0.338	−5.768	0.000	−1.879	0.312	−6.029	0.000
Target + adjacent	−2.613	0.424	−6.170	0.000	−3.560	0.603	−5.908	0.000
Condition IV
Intercept	−0.320	0.311	−1.028	0.304	−0.559	0.231	−2.416	0.016
Adjacent	−2.824	0.413	−6.842	0.000	−1.884	0.309	−6.107	0.000
Target + adjacent	−2.696	0.395	−6.818	0.000	−2.201	0.340	−6.463	0.000

Table 7:

Effect of condition (independent variable with condition II as the reference level) on the selection frequency of the target inference, adjacent inference and target + adjacent option (each being a dependent variable in the respective model).

	exp1				exp2
	Est.	SE	Z	p	Est.	SE	Z	p
Target
Intercept	−1.863	0.437	−4.259	0.000	−1.834	0.393	−4.667	0.000
Condition III	1.220	0.305	4.002	0.000	1.175	0.278	4.233	0.000
Condition IV	1.537	0.309	4.973	0.000	1.201	0.274	4.376	0.000
Adjacent
Intercept	−1.842	0.299	−6.164	0.000	−2.346	0.351	−6.678	0.000
Condition III	−0.670	0.374	−1.789	0.074	−0.748	0.376	−1.986	0.047
Condition IV	−1.223	0.435	−2.812	0.005	−0.678	0.367	−1.846	0.065
Target + adjacent
Intercept	−2.022	0.601	−3.363	0.001	−2.631	0.528	−4.986	0.000
Condition III	−2.064	0.498	−4.140	0.000	−2.474	0.667	−3.707	0.000
Condition IV	−1.570	0.454	−3.460	0.001	−0.992	0.417	−2.380	0.017

References

Beck, Sigrid. 2020. Readings of scalar particles: Noch/still. Linguistics and Philosophy 43(1). 1–67. https://doi.org/10.1007/s10988-018-09256-1.

Canestrelli, Anneloes R., Willem M. Mak & Ted J. M. Sanders. 2013. Causal connectives in discourse processing: How differences in subjectivity are reflected in eye movements. Language and Cognitive Processes 28(9). 1394–1413. https://doi.org/10.1080/01690965.2012.685885.

Danlos, Laurence, Katerina Rysova, Magdalena Rysova & Manfred Stede. 2018. Primary and secondary discourse connectives: Definitions and lexicons. Dialogue and Discourse 9(1). 50–78. https://doi.org/10.5087/dad.2018.102.

Dipper, Stefanie & Heike Zinsmeister. 2012. Annotating abstract anaphora. Language Resources and Evaluation 46(1). 37–52. https://doi.org/10.1007/s10579-011-9160-1.

Hovy, Dirk, Taylor Berg-Kirkpatrick, Ashish Vaswani & Eduard Hovy. 2013. Learning whom to trust with MACE. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, 1120–1130. Atlanta, Georgia: Association for Computational Linguistics. Available at: https://www.aclweb.org/anthology/N13-1132.

Jolly, Eshin. 2018. Pymer4: Connecting R and Python for linear mixed modeling. Journal of Open Source Software 3(31). 862. https://doi.org/10.21105/joss.00862.

Köhne-Fuetterer, Judith, Heiner Drenhaus, Francesca Delogu & Vera Demberg. 2021. The online processing of causal and concessive discourse connectives. Linguistics 59(2). 417–448. https://doi.org/10.1515/ling-2021-0011.

Kolhatkar, Varada, Adam Roussel, Stefanie Dipper & Heike Zinsmeister. 2018. Anaphora with non-nominal antecedents in computational linguistics: A survey. Computational Linguistics 44(3). 547–612. https://doi.org/10.1162/coli_a_00327.

König, Ekkehard. 1988. Concessive connectives and concessive sentences: Cross-linguistic regularities and pragmatic principles. In John A. Hawkins (ed.), Explaining language universals, 145–166. London: Blackwell.

Lee, Alan, Rashmi Prasad, Aravind Joshi & Bonnie Webber. 2008. Departures from tree structures in discourse: Shared arguments in the Penn Discourse Treebank. In Proceedings of the constraints in discourse III workshop (CID), 61–68. Potsdam, Germany. https://www.researchgate.net/publication/228579529_Departures_from_tree_structures_in_discourse_Shared_arguments_in_the_penn_discourse_treebank.

Pasch, Renate, Ursula Brauße, Eva Breindl & Ulrich Herrmann Waßner. 2003. Handbuch der deutschen Konnektoren. Berlin/New York: Walter de Gruyter. https://doi.org/10.1515/9783110201666.

Poláková, Lucie & Jiří Mírovský. 2019. Anaphoric connectives and long-distance discourse relations in Czech. Computación y Sistemas 23(3). 711–717. https://doi.org/10.13053/CyS-23-3-3274.

Prasad, Rashmi, Aravind Joshi & Bonnie Webber. 2010. Exploiting scope for shallow discourse parsing. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10). Valletta, Malta: European Language Resources Association (ELRA). Available at: http://www.lrec-conf.org/proceedings/lrec2010/pdf/935_Paper.pdf.

Prasad, Rashmi, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi & Bonnie Webber. 2008. The Penn Discourse Treebank 2.0. In Proceedings of the sixth international conference on language resources and evaluation (LREC’08), 2961–2968. Marrakech, Morocco: European Language Resources Association (ELRA). Available at: http://www.lrec-conf.org/proceedings/lrec2008/pdf/754_paper.pdf.

Santana, Andrea, Wilbert Spooren, Dorien Nieuwenhuijsen & Ted J. M. Sanders. 2021. Do Spanish causal connectives vary in subjectivity? What crowdsourcing data reveal about native speakers’ preferences. Text & Talk 41(2). 211–237. https://doi.org/10.1515/text-2019-0102.

Schwarz, Florian. 2009. Two types of definites in natural language. University of Massachusetts Amherst: Open Access Dissertations. Available at: https://scholarworks.umass.edu/open_access_dissertations/122/.

Stede, Manfred & Arne Neumann. 2014. Potsdam commentary corpus 2.0: Annotation for discourse research. In Proceedings of the international conference on language resources and evaluation (LREC), 925–929. Reykjavik, Iceland: European Language Resources Association (ELRA). Available at: http://www.lrec-conf.org/proceedings/lrec2014/pdf/579_Paper.pdf.

Stede, Manfred & Yulia Grishina. 2016. Anaphoricity in connectives: A case study on German. In Proceedings of the workshop on coreference resolution beyond OntoNotes (CORBON 2016), 41–46. San Diego, California. Available at: https://aclanthology.org/W16-0706.pdf. https://doi.org/10.18653/v1/W16-0706.

Tonhauser, Judith, David Beaver, Craige Roberts & Mandy Simons. 2013. Toward a taxonomy of projective content. Language 89(1). 66–109. https://doi.org/10.1353/lan.2013.0001.

Webber, Bonnie, Matthew Stone, Aravind Joshi & Alistair Knott. 2003. Anaphora and discourse structure. Computational Linguistics 29(4). 545–587. https://doi.org/10.1162/089120103322753347.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/lingvan-2021-0102).

Received: 2021-07-09

Accepted: 2022-03-21

Published Online: 2022-11-03

This work is licensed under the Creative Commons Attribution 4.0 International License.
