Open Access Article

Extraction from NP, frequency, and minimalist gradient harmonic grammar

  • Gereon Müller, Johannes Englisch and Andreas Opitz
Published/Copyright: 21 July 2022

Abstract

Extraction of a PP from an NP in German is possible only if the head noun and the governing verb together form a natural predicate. We show that this corresponds to collocational frequency of the verb-noun combinations in corpora, based on the metric of ΔP. From this we conclude that frequency should be conceived of as a language-external grammatical building block that can directly interact with language-internal grammatical building blocks (like triggers for movement and economy constraints blocking movement) in excitatory and inhibitory ways. Integrating frequency directly into the syntax is not an option in most current grammatical theories. However, things are different in Gradient Harmonic Grammar, a version of Optimality Theory where linguistic objects of various kinds can be assigned strength in the form of numerical values (weights). We show that by combining a Minimalist approach to syntactic derivations with a Gradient Harmonic Grammar approach to constraint evaluation, the role of frequency in licensing extraction of PP from NP in German can be integrated straightforwardly, the only additional prerequisite being that (verb-noun) dependencies qualify as linguistic objects that can be assigned strength (based on their frequency).

1 Extraction from NP

It has often been noted that extraction from NP in German is subject both to structural and to lexical restrictions; cf. Fanselow (1987: Ch. 2), Grewendorf (1989: Ch. 2.8), Webelhuth (1988, 1992), Müller (1991, 1995, 2011), Sauerland (1995), De Kuthy and Meurers (2001), Schmellentin (2006), Ott (2011), and Frey (2015); also see Cattell (1976), Bach and Horn (1976), Chomsky (1977), Davies and Dubinsky (2003) and Koster (1987: Ch. 4) for English and Dutch, respectively.[1] The examples in (1) illustrate extraction from NP in German. As shown by (1a)–(1b) and (1c)–(1d), wh-movement and scrambling can bring about extraction from NP; more generally, the operation is not confined to specific movement types. Furthermore, the operation can involve either complete PP complements of N, as in (1a)–(1c), or R-pronouns that act as complements of the P heads of complements of N, as in (1b)–(1d); the latter option is restricted to varieties of German that allow postposition stranding more generally (and, in the examples here, with a bare vocalic onset of the preposition in particular; see Riemsdijk (1978), Trissler (1993), Müller (2000), and Hein and Barnickel (2018), among others).[2]

(1)
a.
[pp 1 Worüber ] hat der Fritz [np ein Buch t1 ] gelesen ?
about.what has the Fritz nom a book acc read
b.
[dp 1 Wo ] hat die Maria [np ein Buch [pp über t1 ]] gelesen ?
what has the Maria nom a book acc about read
c.
dass [pp 1 darüber ] keiner je [np ein Buch t1 ] gelesen hat
that about that no-one nom ever a book acc read has
d.
dass [dp 1 da ] keiner je [np ein Buch [pp t1 über ]] gelesen hat
that that no-one nom ever a book acc about read has

We take the structural factors restricting the operation to be the following. First, extraction from NP is not possible with external arguments (of transitive or unergative verbs); cf. (2).

(2)
*[pp 1 Worüber ] hat [np ein Buch t1 ] den Fritz beeindruckt ?
about.what has a book nom the Fritz acc impressed

Next, extraction from NP cannot take place with indirect objects bearing dative case (cf. (3a)), even if the verb as such allows extraction from NP (cf. (3b), where extraction from the direct object occurs in a ditransitive, dative-accusative environment).

(3)
a.
*[pp 1 Worüber ]    hat man   [np einem    Buch t1 ] einen    Preis
about what    has one nom a    book dat an    award acc
gegeben ?
given
b.
[pp 1 Worüber ] hat man der Maria [np ein Buch t1 ] gegeben ?
about what has one nom the Maria dat a book acc given

Third, extraction from a definite NP typically yields degraded results; this specificity effect (cf. Mahajan 1992; Webelhuth 1992) is shown in (4), which forms a minimal pair with the non-specific example in (1a).

(4)
?*[pp 1 Worüber ] hat der Fritz [np das Buch t1 ] gelesen ?
about what has the Fritz nom the book acc read

A fourth observation is that extraction from NP is blocked when there is a possessor NP present (either pre-nominally or post-nominally); see (5).

(5)
*[pp 1 Worüber ] hat die Maria [np Fritzens/eines Mannes Buch t1 ]
about what has the Maria nom Fritz gen /a man gen book acc
gelesen ?
read

Finally, freezing effects occur if a direct object which as such licenses extraction undergoes movement itself. Thus, (6) illustrates that an NP blocks extraction if it is scrambled; compare (6a) with (6b).[3]

(6)
a.
*[pp 1 Worüber ] hat [np 2 ein Buch t1 ] keiner t2 gelesen ?
about what has a book acc no-one nom read
b.
[pp 1 Worüber ] hat keiner [np 2 ein Buch t1 ] gelesen ?
about.what has no-one nom a book acc read

All of these structural restrictions on extraction from NP can be derived without too much ado under current approaches to movement, based on (whatever derives) the Condition on Extraction Domain (Chomsky 1986; Huang 1982) and the Minimal Link Condition (Chomsky 2001, 2008); see, e.g., Müller (2011) for an account of these phenomena that relies on Chomsky’s (2001) Phase Impenetrability Condition (PIC).[4]

In addition to these structural factors, extraction from NP in German is conditioned by lexical factors. Thus, whereas a verb like lesen ‘read’ in (1a) (repeated here as (7a)) permits extraction from the NP headed by Buch ‘book’, a verb like stehlen ‘steal’ does not, in an identical environment (see (7b)). Note that syntactically, the two verbs otherwise behave the same (they take an internal theme argument as a direct object and an external agent argument as a subject, they assign accusative to the direct object, etc.). What is more, as observed by Sauerland (1995), not only is the nature of the verb relevant: by keeping the verb identical and modulating the head noun of the object, extraction can also become impossible; see (7c), where Verlautbarung ‘official statement’ replaces Buch ‘book’ in the presence of lesen ‘read’. As one might expect, a combination as in (7d) will also block extraction from NP: Here Verlautbarung is the head noun and stehlen is the governing verb.

(7)
a.
[pp 1 Worüber ] hat der Fritz [np ein Buch t1 ] gelesen ?
about.what has the Fritz nom a book acc read
b.
*[pp 1 Worüber ] hat der Fritz [np ein Buch t1 ] gestohlen ?
about.what has the Fritz nom a book acc stolen
c.
?*[pp 1 Worüber ] hat der Fritz [np eine Verlautbarung t1 ]
about.what has the Fritz nom an official.statement acc
gelesen ?
read
d.
*[pp 1 Worüber ] hat der Fritz [np eine Verlautbarung t1 ]
about.what has the Fritz nom an official.statement acc
gestohlen ?
stolen

This effect is not movement type-specific. As shown in (8) (where (8a) = (1c)), scrambling of a PP (or of a bare R-pronoun, in the varieties of German that permit this, as in (1d)) instantiates the same pattern (see Müller 1995; Webelhuth 1992).

(8)
a.
dass [pp 1 darüber ] keiner je [np ein Buch t1 ] gelesen hat
that about that no-one nom ever a book acc read has
b.
*dass [pp 1 darüber ] keiner je [np ein Buch t1 ] gestohlen hat
that about that no-one nom ever a book acc stolen has
c.
?*dass [pp 1 darüber ] keiner je [np eine Verlautbarung
that about that no-one nom ever an official.statement acc
t1 ] gelesen hat
read has
d.
*dass [pp 1 darüber ] keiner je [np eine Verlautbarung
that about that no-one nom ever an official.statement acc
t1 ] gestohlen hat
stolen has

The conclusion that suggests itself in view of this kind of evidence is that for extraction from NP of a PP complement (or an R-pronoun contained in it) to be legitimate in German, V and N must enter a tight relationship; they must form a natural predicate, i.e., a dependency of two lexical items that qualifies as entrenched.

It is not a priori clear how this condition can be implemented in grammatical theory. Following Bach and Horn’s (1976) proposal for English, Fanselow (1987) assumes that extraction from NP is in fact never possible in German; rather, data of the kind in (8a) are the result of a pre-syntactic reanalysis rule that makes it possible for the verb to take not just NP, but also PP directly as arguments, so that PP does not have to leave NP in (8a) in the first place. Whereas a reanalysis approach along these lines has sometimes been adopted by subsequent studies (cf., e.g., De Kuthy 2001; De Kuthy and Meurers 2001), severe problems have been pointed out for it that, in our view, make such an analysis untenable (see Webelhuth 1988; Fanselow 1991; Müller 1998; and Schmellentin 2006, among others). For one thing, in the absence of a theory of general restrictions on reanalysis rules, it is completely unclear why reanalysis cannot involve a verb and agent (subject) or goal (indirect object) arguments; recall (2), (3a). Furthermore, on this view, it is a mystery why specificity and possessor intervention effects should arise if there is no extraction from NP in the first place; see (4), (5). Next, if PP does not have to undergo extraction from NP in the well-formed examples discussed so far, how can it be that NP scrambling creates a typical freezing effect (as in (6a) versus (6b))?

Now, it is known that verbs like lesen ‘read’ in (7a), in contrast to verbs like stehlen (‘steal’) in (7b), may occur in constructions in which the PP is present but the NP is either completely absent or realized only as a pronoun. This is generally taken to be the strongest argument in support of the base-generation approach to extraction from NP; see (9a) versus (9b).

(9)
a.
dass Fritz (?es) [pp über die Liebe ] gelesen hat
that Fritz it about the love read has
b.
*dass Fritz (es) [pp über die Liebe ] geklaut hat
that Fritz it about the love stolen has

However, verbs like geben (‘give’) in German behave like lesen in that they permit extraction from a direct object NP (cf. (3b)), but behave like stehlen (‘steal’) in that they do not allow the NP to be pronominal or dropped (cf. (10a)). What is more, as shown in (7c), lesen (‘read’) does not permit extraction if the head noun of its complement is Verlautbarung ‘official.statement’, but NP can be pronominal or zero in this context; see (10b). Thus, the correlation breaks down, in both directions (there is the option of extraction without the option of pronominal/zero realization of NP, and there is the option of pronominal/zero realization of NP without the option of extraction); and with it goes the argument for reanalysis.[5]

(10)
a.
*dass man (es) der Maria [pp über die Liebe ] gegeben hat
that one nom it acc the Maria dat about the love given has
b.
dass der Fritz (sie) [pp über die Liebe ] gelesen hat
that the Fritz nom she acc about the love read has

To conclude, reanalysis as a tool to account for extraction from NP is problematic from an empirical point of view. Furthermore, as noted above, there is no theory of what a reanalysis rule can and cannot look like; more generally, the concept emerges as dubious from a conceptual point of view, too (see, e.g., Baltin and Postal 1996: 135–141).

At this point, two basic questions need to be addressed as regards the influence of lexical factors on extraction from NP. The first question is how it can be determined whether a V and an N can form a natural predicate; i.e., how this lexical factor can be measured. And the second question is how this information then licenses or blocks the grammatical process of extraction, i.e., how the lexical factor, once its nature is determined, can interact with the building blocks of grammar that are involved in syntactic movement.[6] In a nutshell, the answers we will give are that the concept of a natural predicate corresponds to collocational frequency, which can be encoded as a numerical value for V–N dependencies (Section 2); and that an approach to syntax that combines Minimalist derivations with constraint interaction in a Gradient Harmonic Grammar approach makes it possible to implement the lexical factor, by letting the numerical values capturing different collocational strengths of V–N dependencies interact with constraints that trigger and block extraction (Section 3).

2 Frequency

2.1 ΔP

In what follows, we will pursue the hypothesis that frequency is the decisive factor in establishing a natural predicate, i.e., an entrenched V–N dependency, in the cases of extraction from NP that we are interested in. A basic premise is that the absolute frequencies of individual lexical items in corpora will not be particularly informative in this context, and that the same goes for the absolute frequencies of V–N collocations. Rather, what is needed is a more fine-grained approach to frequency that is based on how well the two lexical items in a V–N dependency predict each other. One such measure that has been proposed is collostructional strength (see Gries et al. 2005; Gries and Stefanowitsch 2004; Stefanowitsch 2009). More recently, Gries (2013) has suggested employing the measure of ΔP, and it is this concept that we will make use of in what follows.[7] ΔP X|Y measures how well the presence of some item Y predicts the presence or absence of some other item X. ΔP is defined as in (11).

(11)
ΔP (Gries 2013: 143):
ΔP = p(outcome|cue = present) – p(outcome|cue = absent)

Here, p(X|Y = present) captures the probability of the outcome X in the presence of the cue Y; p(X|Y = absent) is the probability of the outcome X in the absence of the cue Y; and to determine ΔP X|Y , the latter is subtracted from the former. The values of ΔP range from −1.0 to 1.0; they are interpreted as follows:

  1. ΔP X|Y approaching 1.0: Y is a good cue for the presence of X

  2. ΔP X|Y approaching −1.0: Y is a good cue for the absence of X

  3. ΔP X|Y approaching 0.0: Y is not a good cue for the presence or absence of X

Note that this relationship is asymmetric. An element predicting another element well is not necessarily well-predicted by that element. This means that for every pair of X and Y, there are two values ΔP X|Y and ΔP Y|X . As an illustration, let us look at how ΔPs are determined for a V–N dependency involving kaufen ‘buy’ and Buch (‘book’) on the basis of the frequencies of the co-occurrences. To calculate ΔP X|Y , we first search the corpus for the number of all cases where X and Y co-occur, where only one of the elements occurs, and where none of the elements occur. (12) shows such a co-occurrence table for the pair Buch kaufen ‘book buy’.[8]

(12)
Co-occurrences of Buch ‘book’ and kaufen ‘buy’:

                 kaufen present   kaufen absent   total
Buch present     144              27063           27207
Buch absent      8573             5783793         5792366
total            8717             5810856         5819573

This kind of information can be used to calculate ΔP X|Y by taking the difference between the probability of X given the presence of Y and the probability of X given the absence of Y. Suppose that X = Buch and Y = kaufen. ΔP X|Y (= ΔP Buch|kaufen) is then determined as shown in (13); it shows how well kaufen predicts Buch in the corpus.

(13)
ΔP Buch|kaufen = p(Buch | kaufen present) − p(Buch | kaufen absent)
= (Buch and kaufen present)/(kaufen present) − (Buch present and kaufen absent)/(kaufen absent)
= 144/8717 − 27063/5810856
≈ 0.01186

In the same way, ΔP Y|X (= ΔP kaufen|Buch) based on the data in (12) is calculated as shown in (14). The resulting value indicates how well Buch predicts kaufen.

(14)
ΔP kaufen|Buch = p(kaufen | Buch present) − p(kaufen | Buch absent)
= (kaufen and Buch present)/(Buch present) − (kaufen present and Buch absent)/(Buch absent)
= 144/27207 − 8573/5792366
≈ 0.00381

By comparing the two ΔPs, it becomes evident that kaufen is a somewhat better predictor for Buch than Buch is for kaufen: The likelihood of a buying event involving books (rather than, say, bikes or guitars) is greater (ΔP = 0.01186) than the likelihood that a book is involved in a buying event (rather than, say, a reading or burning event, or some other scenario in which books may show up; ΔP = 0.00381).[9]
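The computations in (13) and (14) can be reproduced from the co-occurrence counts with a short Python sketch; the function and variable names are ours, purely for illustration:

```python
# Delta-P from a 2x2 co-occurrence table, following the definition in (11):
# delta_p(outcome|cue) = p(outcome | cue present) - p(outcome | cue absent)
def delta_p(both, outcome_only, cue_only, neither):
    p_given_cue = both / (both + cue_only)                   # outcome rate when cue is present
    p_without_cue = outcome_only / (outcome_only + neither)  # outcome rate when cue is absent
    return p_given_cue - p_without_cue

# counts for Buch/kaufen from (13)-(14)
both = 144          # Buch and kaufen co-occur
buch_only = 27063   # Buch present, kaufen absent
kaufen_only = 8573  # kaufen present, Buch absent
neither = 5783793   # neither present

dp_buch_given_kaufen = delta_p(both, buch_only, kaufen_only, neither)
dp_kaufen_given_buch = delta_p(both, kaufen_only, buch_only, neither)
print(round(dp_buch_given_kaufen, 5))  # 0.01186, as in (13)
print(round(dp_kaufen_given_buch, 5))  # 0.00381, as in (14)
```

Note how the asymmetry of ΔP surfaces directly: swapping which item counts as cue and which as outcome changes the denominators, and hence the value.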

2.2 Corpus

The data in our survey come from the core corpus of Digitales Wörterbuch der deutschen Sprache (DWDS; see Geyken 2007). The DWDS is a freely searchable corpus of German consisting of about 5.8 million sentences. It contains a balanced mix of fictional, scientific, functional, and newspaper texts from the twentieth century.

The list in (15) shows the queries used to obtain the counts for nouns, verbs, and noun–verb pairs. Ideally, one would like to query the corpus for every instance where a given noun is the direct object of a given verb (recall that this is the only environment in which extraction from NP can be possible, given our characterization of the empirical evidence in the previous section). However, while the corpus is lemmatised and tagged for part-of-speech, it does not encode dependencies. Hence, without an additional step of dependency parsing applied to corpora, the queries can only ever be approximations.

(15)
a.
Query: Buch with $p=NN
Searches for the lemma Buch with the part-of-speech tag NN (common nouns)
b.
Query: kaufen with $p=VV*
Searches for the lemma kaufen with a part-of-speech tag starting with VV (verbs)
c.
Query: near (Buch with $p=NN, kaufen with $p=VV*, 3)
Searches for a sentence with the noun Buch and the verb kaufen with zero to three tokens between them.

The query in (15c) attempts to find noun-verb pairs by looking for sentences where the noun and the verb are close to each other. This avoids false positives as in (16a) (where Buch ‘book’ and gekauft ‘bought’ are clause-mates in a VP coordination construction, but Buch is the (head of the) object of gelesen ‘read’, not of gekauft). However, it also introduces false negatives as in (16b), where Buch is the (head of the) object of gelesen, but is separated from it by more than three items as a consequence of having undergone topicalization to the clause-initial (‘Vorfeld’) position.

(16)
a.
Fritz hat ein Buch gelesen und ein Lesezeichen gekauft.
Fritz has a book read and a bookmark bought
‘Fritz read a book and bought a bookmark.’
b.
Das Buch hat der Fritz in der Innenstadt gekauft.
the book has the Fritz in the city.centre bought
‘Fritz bought the book in the city centre.’

Cases like (16b) can only pose a potential problem if there is reason to assume that object topicalization also (i.e., like extraction from NP) shows asymmetries depending on how close the relation between the verb and the object’s head noun is, such that, e.g., an object headed by Buch ‘book’ tends to undergo topicalization more often, or more easily (or, in fact, less often, or less easily) in the presence of lesen ‘read’ than in the presence of stehlen ‘steal’. We are not aware of any claims in the literature that would go in this direction, and will assume, here and henceforth, that there is no such effect. Thus, false negatives like (16b) generated by object movement can be ignored, assuming that they affect all kinds of V–N dependencies in the same way.[10]
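The effect of the proximity filter in (15c) can be emulated on tokenized sentences. The mini-corpus and the window function below are our own toy stand-ins for the DWDS query engine (which matches lemmas and part-of-speech tags rather than surface forms):

```python
# Toy emulation of the near(..., 3) query in (15c): does a sentence contain
# the two target word forms with at most three tokens between them?
def near(tokens, a, b, gap=3):
    positions_a = [i for i, tok in enumerate(tokens) if tok == a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == b]
    return any(abs(i - j) - 1 <= gap for i in positions_a for j in positions_b)

corpus = [
    # Buch is the object of gekauft: counted as a hit
    ["Fritz", "hat", "ein", "Buch", "gekauft"],
    # (16a): Buch is the object of gelesen, not gekauft; the distance
    # filter correctly excludes this false positive
    ["Fritz", "hat", "ein", "Buch", "gelesen", "und",
     "ein", "Lesezeichen", "gekauft"],
]
hits = sum(near(sentence, "Buch", "gekauft") for sentence in corpus)
print(hits)  # 1
```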

2.3 Results

We have determined both ΔP values for every V–N pair where N is a noun in (17a) and V is a verb in (17b) (see the Appendix for the raw data). This results in high-frequency collocations like Buch lesen ‘book read’, combinations of low-frequency pairs where this intuitively seems to be ‘the noun’s fault’, like Verlautbarung lesen ‘official.statement read’, and combinations where it is the verb that is responsible for the low frequency, as in Buch werfen ‘book throw’.

(17)
a.
Nouns
Bericht ‘report’, Buch ‘book’, Geschichte ‘story/history’, Roman ‘novel’, Verlautbarung ‘official statement’
b.
Verbs
aufschlagen ‘open (book)’, kaufen ‘buy’, klauen ‘steal (coll.)’, lesen ‘read’, öffnen ‘open’, schreiben ‘write’, stehlen ‘steal’, verfassen ‘write (book)’, verkaufen ‘sell’, vorlesen ‘read (to sb.)’, weglegen ‘put away’, werfen ‘throw’

As shown in (18), the ΔPs for Buch lesen are both higher than the ΔPs for Buch stehlen.

(18)
ΔPs for two V–N pairs:
N             V                extraction from NP   ΔP V|N    ΔP N|V
Buch ‘book’   lesen ‘read’     yes                  0.02580   0.03441
Buch ‘book’   stehlen ‘steal’  no                   0.00007   0.00093

This is in full accordance with the fact that the V–N dependency Buch lesen permits extraction from the NP whereas the V–N dependency Buch stehlen does not; recall the examples in (7a) and (7b) above. As shown by the ΔPs for some other V–N combinations in (19), this result can be generalized: The higher a ΔP is, the more likely it is that extraction is possible.

(19)
ΔPs for more V–N pairs:
N                        V                    extr./NP   ΔP V|N     ΔP N|V
Buch ‘book’              schreiben ‘write’    yes         0.02154    0.01589
Buch ‘book’              kaufen ‘buy’         yes         0.00381    0.01186
Bericht ‘report’         schreiben ‘write’    yes         0.00148    0.00055
Buch ‘book’              weglegen ‘put away’  no          0.00031    0.08271
Buch ‘book’              öffnen ‘open’        no         −0.00124   −0.00283
Bericht ‘report’         werfen ‘throw’       no         −0.00278   −0.00218
Verlautbarung ‘off.st.’  stehlen ‘steal’      no         −0.00037   −0.00007

These data also shed some light on which ΔP value may be most relevant for establishing the strength of a V–N dependency (and, consequently, for determining the option of extraction from NP). A priori, three options suggest themselves: ΔP V|N , ΔP N|V , and the arithmetic mean of these two values. Closer inspection reveals that ΔP N|V is not fully reliable. On the one hand, there are cases like Buch weglegen ‘book put.away’ where ΔP N|V is fairly high (i.e., weglegen ‘put.away’ is a reasonably good predictor for the presence of Buch ‘book’), but extraction is not straightforwardly possible in this environment (cf. *Worüber hat der Fritz ein Buch weggelegt? ‘about.what has the Fritz nom a book acc put.away’). On the other hand, there are also cases like Bericht schreiben ‘report write’ where ΔP N|V is quite low (i.e., schreiben ‘write’ is not a good predictor for the presence of Bericht ‘report’), but extraction is easily possible (cf. Worüber hat der Fritz einen Bericht geschrieben? ‘about.what has the Fritz nom a report acc written’). In contrast, ΔP V|N makes the right predictions in these cases: Bericht ‘report’ is a good predictor for schreiben ‘write’, and Buch ‘book’ is not such a good predictor for weglegen ‘put.away’. This leaves ΔP V|N and the arithmetic mean of the values as the remaining options. In what follows, we will settle for ΔP V|N alone. Note that this introduces an asymmetry: Whether a V–N dependency qualifies as a natural predicate or not depends on how well the noun can predict the verb.[11]

2.4 Scaling

In the next section, we will implement the frequency-based approach to extraction from NP in German in a version of Gradient Harmonic Grammar (see Smolensky and Goldrick 2016). Standardly, numerical strength values assigned to linguistic objects in this grammatical theory are taken to be within the interval [0, 1].[12] We will therefore rescale numerical values of the type found for ΔP in (18) and (19) by min-max normalization (feature scaling), so that they end up squarely in the [0, 1] interval, using the formula X′ = (X − min(X)) / (max(X) − min(X)). For the V–N dependencies in (18) and (19), this produces the values in (20). We will adopt these normalized values for the theoretical modelling in the next section.
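Min-max normalization itself is a one-liner. The raw ΔP list below is a small made-up sample rather than the full data set from the Appendix, so its outputs do not reproduce the exact values in (20):

```python
# Min-max normalization (feature scaling) into [0, 1]:
# x' = (x - min(X)) / (max(X) - min(X))
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [-0.00278, 0.00007, 0.00381, 0.02580]  # sample delta-P values
scaled = min_max_normalize(raw)
print(scaled[0], scaled[-1])  # 0.0 1.0: the extremes map to the interval ends
```

Because the extremes of whatever sample is normalized are mapped to 0 and 1, the normalized values in (20) depend on the minimum and maximum of the complete data set, not just the pairs listed there.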

(20)
Strength assignments for V–N dependencies:
N V extr./NP ΔP V|N normalized
Buch ‘book’ lesen ‘read’ yes 0.02580 0.6272
Buch ‘book’ schreiben ‘write’ yes 0.02154 0.5441
Buch ‘book’ kaufen ‘buy’ yes 0.00381 0.1982
Bericht ‘report’ schreiben ‘write’ yes 0.00148 0.1527
Buch ‘book’ weglegen ‘put away’ no 0.00031 0.1300
Buch ‘book’ stehlen ‘steal’ no 0.00007 0.1253
Verlautbarung ‘off.st.’ stehlen ‘steal’ no −0.00037 0.1167
Buch ‘book’ öffnen ‘open’ no −0.00124 0.0996
Bericht ‘report’ werfen ‘throw’ no −0.00278 0.0695

(20) shows that there is a correlation between a higher normalized ΔP value and the option of extraction. In addition, the plot in (21) reveals that the cut-off point with respect to extraction is not so much between high-frequency and low-frequency pairs of N and V, but rather within the low-frequency area, at a strength of 0.14 (or thereabout). This picture persists when the complete set of data is taken into account (cf. the Appendix).

(21)
[Plot of the normalized ΔP V|N values for the V–N pairs in (20), with pairs that do and do not permit extraction marked; the cut-off falls at a strength of about 0.14.]
3 Minimalist gradient harmonic grammar

3.1 The gist of the analysis

In this section, we show how the different strength values of V–N dependencies correctly predict the options of extraction from direct object NPs in German, assuming (i) a gradient harmonic grammar approach where both violable syntactic constraints and linguistic expressions (like V–N dependencies) are associated with weights, (ii) a minimalist approach to syntactic derivations in which both intermediate and final movement steps target the left edge of a verbal phase (a specifier of v), and (iii) an approach to iterative optimization based on harmonic serialism, where optimization domains are small and the amount of information that can be taken into account during each optimization is limited. However, before we address these issues in detail, let us focus on the gist of the analysis.

In all cases of extraction from NP, there is a dependency between a verbal head X and a noun Y that intervenes between the base position (α i+1) and (what is typically) the target position (intermediate or final) at the left edge of the verbal phase (α i ); see (22).

(22)
α i … (Y) … X … (Y) … α i+1

At the heart of the analysis is a well-established locality constraint, the Condition on Extraction Domain (CED), which we take to be violable and weighted. If YP (the maximal projection of Y) is not a complement of X, ungrammaticality invariably arises with extraction from NP, because the resulting CED violation always emerges as fatal; this covers the structural restrictions discussed in Section 1. If, however, YP is a complement of X, the CED can be satisfied, and it is at this point that the weight of the X–Y dependency becomes crucial. CED satisfaction can bring about a reward, and every case of extraction from NP requires this reward: since the general constraint blocking movement (Economy Condition) as such has slightly more weight than the general constraint forcing movement (Merge Condition), the derivation cannot do without a reward from the CED for the scales to be tipped in favor of the movement candidate. Only if the reward for CED satisfaction generated by the X–Y dependency’s weight is sufficiently high will extraction from NP (i.e., movement from α i+1 to α i ) be legitimate. This covers the lexical variation with extraction from NP, i.e., the natural predicate effect. In what follows, we flesh out this analysis, starting with Gradient Harmonic Grammar.
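The arithmetic behind this trade-off can be sketched numerically. The weights below are illustrative choices of ours (not values from the analysis itself), set so that the decision threshold lands near the 0.14 cut-off observed in Section 2; the strength values are the normalized ΔPs from (20):

```python
# Illustrative harmony comparison: movement violates the Economy Condition
# (EC) but can earn a CED reward scaled by the V-N dependency's strength;
# staying in situ violates the Merge Condition (MC). Weights are made up.
W_MC = 1.0   # weight of MC (forces movement)
W_EC = 1.14  # weight of EC (blocks movement; slightly heavier than MC)

def movement_wins(dependency_strength):
    h_move = -W_EC + dependency_strength  # EC violated, CED reward earned
    h_in_situ = -W_MC                     # MC violated
    return h_move > h_in_situ

print(movement_wins(0.6272))  # True: Buch lesen permits extraction
print(movement_wins(0.1253))  # False: Buch stehlen blocks extraction
```

With these toy weights, movement is more harmonic than staying in situ exactly when the dependency strength exceeds W_EC − W_MC = 0.14.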

3.2 Gradient Harmonic Grammar

Harmonic Grammar (see Smolensky and Legendre 2006; Pater 2016) is a version of Optimality Theory (see Prince and Smolensky 1993) that abandons the strict domination property (according to which no number of violations of lower-ranked constraints can outweigh a single violation of a higher-ranked constraint) and replaces harmony evaluation by constraint ranking with harmony evaluation based on weight assignment to constraints. The central concept of harmony is defined in (23) (see Pater 2009).

(23)
Harmony:
H = Σ (k = 1, …, K) s k  · w k   (w k  = weight of a constraint; s k  = violation score of a candidate)

According to (23), the weight of a constraint is multiplied with the violation score of a candidate for that constraint, and all the resulting numbers are added up, thereby determining the harmony score of a candidate. An output qualifies as optimal if it is the candidate with maximal harmony in its candidate set; i.e., if it has the highest harmony value.
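Harmony evaluation per (23) is straightforward to compute; the constraint weights and violation scores below are invented for illustration:

```python
# Harmony per (23): H = sum of (violation score * constraint weight).
# Violation scores are negative; the candidate with maximal H is optimal.
def harmony(scores, weights):
    return sum(s * w for s, w in zip(scores, weights))

weights = [1.5, 1.0]  # hypothetical weights for two constraints C1, C2
candidates = {
    "cand_a": [-1, 0],  # one violation of the heavier constraint C1
    "cand_b": [0, -1],  # one violation of the lighter constraint C2
}
optimal = max(candidates, key=lambda c: harmony(candidates[c], weights))
print(optimal)  # cand_b: harmony -1.0 beats -1.5
```

In contrast to strict-domination Optimality Theory, several violations of the lighter constraint could also jointly outweigh one violation of the heavier constraint here (e.g., a score of [0, −2] yields −2.0, worse than −1.5).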

Gradient Harmonic Grammar (see Smolensky and Goldrick 2016), in turn, is an extension of Harmonic Grammar where it is not just the constraints that are given weights; rather, symbols in linguistic representations are also assigned weights (between 0 and 1). This gives rise to a very straightforward way of associating strength with linguistic objects. So far, most of the work on Gradient Harmonic Grammar has been in phonology; but cf. Smolensky (2017), Putnam and Schwarz (2017), Lee (2018), Müller (2019), and Schwarz (2020) for recent applications in syntax.[13]

3.3 Minimalist derivations

We adopt a minimalist setting (cf. Chomsky 2001), according to which syntactic structure is created incrementally by external and internal Merge operations, where the former are responsible for basic structure-building and the latter bring about structure-building by movement. We assume that syntactic movement is restricted by the inviolable Phase Impenetrability Condition (PIC; cf. Chomsky 2001, 2008) in (24).[14]

(24)
Phase Impenetrability Condition (PIC; inviolable):
The domain of a head X of a phase XP is not accessible to operations outside XP; only X and its edge are accessible to such operations.

This implies that movement must take place successive-cyclically, via intermediate edge domains (i.e., specifiers) of phases, where the clausal spine is composed of CP, TP, vP, and VP, of which CP and vP qualify as phases. (We follow Chomsky in assuming that NP/DP is not a phase). Next, suppose that all Merge operations, including movement steps to intermediate phase edges, are triggered by designated features (cf. Chomsky 1995, 2001; Collins and Stabler 2016; Georgi 2017; Pesetsky and Torrego 2006; Urk 2015); this can be enforced by the Merge Condition (MC) in (25) (see Heck and Müller (2013) for the [•F•] notation for features that trigger external or internal Merge), which we assume to be a violable, weighted constraint (in contrast to the PIC).

(25)
Merge Condition (MC: violable, weighted):
For all features [•F•], [•F•] triggers Merge of an XP with a matching [F].

Next, there is a counteracting constraint that prohibits structure-building; for present purposes, it can be assumed that this role is played by the Economy Condition (EC) in (26) (see Grimshaw 1997; Legendre et al. 2006; also see Grimshaw (2006) for an attempt at a yet more principled approach). Like MC, EC is violable, and associated with a weight.

(26)
Economy Condition (EC: violable, weighted):
Merge is prohibited.

Given this state of affairs, for now it looks as though the relative weights of MC and EC decide on whether Merge can apply or not. In a pure Harmonic Grammar approach, this may indeed be true (abstracting away from the potential influence of other constraints for the time being). However, in Gradient Harmonic Grammar, things are somewhat more flexible, since the varying strength of the [•F•] features that MC regulates leads to different degrees of violation of this constraint. A [•F•] feature with a weight of 0.2 will trigger a less severe violation of MC in an output where movement does not take place than a [•F•] feature with a weight of 0.6, and this may distinguish between a violation of MC (in a candidate that does not carry out movement) that is optimal and one that is not. As shown in Müller (2019), asymmetries between different kinds of Merge operations – in particular, between different types of movement – can be derived in such an MC/EC-based approach by postulating different weights (like 0.2 vs. 0.6) for the individual [•F•] features that trigger the operations. With stronger features, an MC violation that is tolerable with weaker features may become fatal; stronger features may thus ensure that structure-building (or movement) takes place where weaker features do not. This way, it can be derived that, e.g., wh-movement (with a strong trigger [•wh•] on C) can leave a CP in German whereas scrambling (with a weak trigger [•Σ•] on v) cannot do so. That said, as shown in Section 1, extraction from NP in German does not distinguish between wh-movement and scrambling (or, for that matter, topicalization, relativization, or other movement types that exist in German); cf. Webelhuth (1992), Müller (1995). For this reason, to keep things simple, we will postulate in what follows that a violation of MC is always of strength −1.0, independently of which movement type is involved.

Against this background, two questions need to be answered to provide an account of extraction from NP in German. First, how does optimization of Merge operations proceed technically? And second, how can the (frequency-based) weights assigned to V–N dependencies be integrated as a factor that may enable or block extraction from NP in the presence of MC and EC? We address the two issues in turn.

3.4 Optimization

There are two general possibilities to model the interaction of minimalist derivations and harmony evaluation. A first option is that all syntactic operations (which, by assumption, take place in the Gen component of the grammar) precede a single, parallel step of harmony evaluation (H-Eval). This then qualifies as a standard case of harmonic parallelism (see Prince and Smolensky 2004), and it has been explicitly pursued by, e.g., Broekhuis (2006) and Broekhuis and Woolford (2013). Another option is that Merge operations (Gen) and harmony evaluation (H-Eval) alternate constantly. On this view, syntactic operations and selection of the most harmonic (optimal) output are intertwined. This model is an instance of harmonic serialism (see Prince and Smolensky 2004). It has been adopted in, e.g., Heck and Müller (2013) and Murphy (2017) (also see McCarthy [2010] and contributions in McCarthy and Pater [2016] for some applications in phonology). In what follows, we adopt an approach based on harmonic serialism. Harmonic serialism in syntax can be viewed as a procedure that is actually little more than a reasonably precise specification of standard minimalist approaches that incorporate a concept of the best next step at any given stage of the derivation (see, e.g., Chomsky [1995, 2001] on Merge over Move). The mechanics of harmonic serialism are laid out in (27).

(27)
Harmonic serialism:
a.
Given some input I i , the candidate set CS i  = {O i1 , O i2 , … O in } is generated by applying at most one operation to I i .
b.
The output O ij with the best constraint profile is selected as optimal.
c.
O ij forms the input I ij for the next generation step, producing a new candidate set CS ij  = {O ij1 , O ij2 , … O ijn }.
d.
The output O ijk with the best constraint profile is selected as optimal.
e.
Candidate set generation stops (i.e. the derivation converges) when the output of an optimization procedure is identical to the input (i.e. when the constraint profile cannot be improved anymore).
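The procedure in (27) can be summarized as a simple loop. The following Python sketch uses toy states and a toy harmony function of our own invention (a real Gen component would manipulate syntactic representations, not integers); it only illustrates the control flow of generation, selection, and convergence:

```python
# Minimal, runnable rendering of the harmonic-serialism loop in (27).
# States and operations are toy stand-ins for syntactic representations.

def gen(state):
    """(27a): the candidate set contains the unchanged input plus
    the results of applying at most one (toy) operation."""
    return [state, state + 1, state - 1]

def harmony(state):
    """Toy harmony evaluation: states closer to 3 have a better profile."""
    return -abs(state - 3)

def derive(state):
    while True:
        best = max(gen(state), key=harmony)  # (27b/d): select the optimal output
        if best == state:                    # (27e): output = input, so converge
            return state
        state = best                         # (27c): optimal output is next input

assert derive(0) == 3  # the derivation improves stepwise until no gain remains
```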

In the present context, the main reason for adopting a harmonic serialist approach is that, in interaction with the PIC, it directly implements strict locality of constraint interaction: Since all competing outputs are separated from the input by at most one elementary operation, it can be ensured that there is no danger that processes taking place in potentially radically different areas of the sentence can interact with the process at issue in unwanted and unforeseen ways; in line with this, harmony evaluation based on weights assigned to constraints and to linguistic expressions remains feasible throughout since the number of interacting weights remains small.

3.5 Integrating dependencies

Finally, it needs to be clarified how the optimization of structures involving extraction from NP can be made sensitive to ΔP V|N -based weight assignments to V–N dependencies. To this end, we postulate that X–Y dependencies relating two heads can function as syntactic primitives that constraints can refer to (and that they can restrict). This assumption has been made earlier in a number of otherwise quite different approaches, and sometimes with a different label attached to X–Y (like chains, catenae, or selections instead of dependencies); see, e.g., O’Grady (1998), Osborne et al. (2012), Manzini (1995), Bowers (2017), and Bruening (2020a, 2020b). For present purposes, we assume that dependencies (in this technical sense) are always two-membered (X–Y), and that they are characterized by a selection relation (X selects Y).[15] As detailed above, we assume that ΔP X|Y determines the strength of an X–Y dependency. And we would like to suggest that the constraint where strength of dependencies plays a crucial role in the theory of extraction is the Condition on Extraction Domain (CED; see Huang 1982; Chomsky 1986; Cinque 1990) in (28).

(28)
Condition on Extraction Domain (CED; violable, weighted):
For all X–Y dependencies, if X–Y intervenes between two adjacent members of a movement chain, X is a sister of the phrase headed by Y.

According to earlier versions of the CED, an XP blocks movement across it if it is not governed (see Huang 1982), or not L(exically)-marked (see Chomsky 1986), or not a complement (Cinque 1990). It is this latter version that we adopt in (28). Furthermore, (28) formulates the CED as a constraint on X–Y dependencies intervening in a movement chain (rather than as a constraint on movement, or on adjacent members of movement chains, as in the original versions). This is so as to ensure that it is the strength of the intervening X–Y dependency (rather than, say, the strength of the moved item, or of the movement chain that it is a part of) that determines CED satisfaction. Assuming the concept of intervention in (29), this change is innocuous.

(29)
Intervention:
An X–Y dependency intervenes between two members of a movement chain α i and α i+1 iff (a), (b), and (c) hold.
a. α i m-commands X.[16]
b. Y m-commands α i+1.
c. It is not the case that X m-commands α i and c-commands α i+1.

Given (29), all but the most local instances of movement to either intermediate phase edges or final landing sites will cross an X–Y dependency. Let us illustrate the concept of intervention in (29) by looking at some of the relevant configurations. Consider first the case of extraction from a direct object NP to the Specv position; cf. (30).

(30)
Dependency intervention with extraction from direct object NP to Specv:

There are three relevant X–Y dependencies to be considered in (30), viz., V–N2 (V selects the head of a direct object NP2), v-V (v selects the head of its complement VP), and v-N1 (the head of the external argument NP1 is selected by v). Of these, only the V–N2 dependency intervenes between α i and α i+1: α i m-commands V; N2 m-commands α i+1; and it is not the case that V both m-commands α i and c-commands α i+1 (the latter is true but the former is not). In contrast, the v-V dependency does not intervene between α i and α i+1: α i m-commands v; V m-commands α i+1; but it is the case that v both m-commands α i and c-commands α i+1.[17] Third, the v-N1 dependency does not intervene either: α i m-commands v; but N1 does not m-command α i+1; furthermore, as we have just seen, v as an X makes clause (c) of (29) false. There are further dependencies that eventually need to be taken into account, but they will fail to intervene between α i and α i+1 because one of their members is too deeply embedded to carry out m-command (for instance, this holds for the D head of DP, assuming that N selects D); so we can conclude that there is a unique intervening V–N2 dependency with extraction from a direct object NP.

Consider next extraction from a subject NP in Specv, to a higher Specv, as in (31).[18]

(31)
Dependency intervention with extraction from subject NP to Specv:

Let us focus again on the three X–Y dependencies V–N2, v-V, and v-N1. This time, the V–N2 dependency does not intervene between α i and α i+1; the reason is that N2 does not m-command α i+1. As before, the v-V dependency does not intervene either, but for a different reason: α i m-commands v, and clause (c) of (29) holds (v fails to simultaneously m-command α i and c-command α i+1); however, clause (b) fails because V does not m-command α i+1. Still, there is again a unique intervening dependency, viz., v-N1: α i m-commands v; N1 m-commands α i+1; and, as we have just seen, v m-commands α i but does not c-command α i+1 (whereas v m-commands α i+1, thus supporting the use of c-command rather than m-command in the second subclause of (29c)).

For present purposes, (30) and (31) are the core contexts of extraction from NP. However, more generally, it can be verified that there is an intervening X–Y dependency (often a unique one) in other extraction from NP scenarios as well. For instance, with extraction from an indirect object NP in SpecV, there is a unique intervening V–N3 dependency; see (32a). With extraction from a direct object NP scrambled to Specv, there will be two intervening dependencies, viz., v-N2 and V–N2; cf. (32b).

(32)
a.
Dependency intervention with extraction from indirect object NP to Specv:
b.
Dependency intervention with extraction to Specv from direct object NP scrambled to Specv:

Thus, we can conclude that in all these scenarios where extraction takes place from an NP in a specifier or complement position, there is a dependency intervening between the moved item and its base position.[19]

Based on these assumptions, we postulate that the CED, as a constraint on X–Y dependencies, plays a dual role in harmony evaluation. On the one hand, it is a negative constraint, just like MC and EC are: The CED registers a violation in an output that violates it (and the strength of the violation depends on the strength of the X–Y dependency that gives rise to it). On the other hand, however, the CED is also a positive constraint, unlike MC and EC: It assigns a reward if it is satisfied. Positive constraints of this type are difficult to implement in standard parallel optimality theory (because of an Infinite Goodness problem: one could in principle carry out an infinite number of processes, each yielding a reward from a given constraint), but as noted by Kimper (2016), this problem vanishes under harmonic serialism, where input and output can be separated by at most one operation. Kimper observes that adopting positive constraint evaluation is empirically advantageous in the area of autosegmental spreading in phonology; and it turns out to also give rise to a much simpler account of the natural predicate effect with extraction from NP than would otherwise be available. Positive evaluation of the CED has the consequence that if an X–Y dependency satisfies the constraint, it can yield an additional reward, depending on the weight assigned to the X–Y dependency via ΔP X|Y .

3.6 Analyses

Let us look at some consequences. Suppose that MC is associated with a weight of 4.0, and EC with a weight of 5.0. Based on just these two constraints, the default consequence is that movement (or, in fact, any other kind of structure-building) is not possible: An output that carries out movement (in the presence of a designated feature [•F•]) will incur a violation (−1) of EC, and end up with a harmony value of −5.0. In contrast, a competing output that fails to apply movement will only trigger a violation (−1) of MC, therefore has an overall harmony value of −4.0 (other things being equal), and will thus always be selected. On this view, to bring about movement (i.e., to make the output with movement optimal), it is necessary to get a reward from the remaining constraint, CED.[20] We take the CED to be associated with a weight of 7.5.
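With just MC (weight 4.0) and EC (weight 5.0), the default preference for the in-situ candidate can be verified directly; the following is a minimal sketch of this two-constraint competition (the function names are ours):

```python
# Two-candidate competition with only MC and EC, using the weights
# assumed in the text: absent any reward, in situ always wins.

W_MC, W_EC = 4.0, 5.0

def harmony_movement():
    return -W_EC * 1.0   # movement applies Merge: one EC violation

def harmony_in_situ():
    return -W_MC * 1.0   # [•F•] stays undischarged: one MC violation

assert harmony_in_situ() > harmony_movement()  # -4.0 > -5.0: no movement
```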

Under these assumptions, a first prediction is that NP specifiers (subjects, indirect objects, and moved NPs) are invariably islands. Movement from a position within NP to the next edge of a phase will always violate the CED, and thus the bias against the movement-inducing MC will actually be strengthened. As we have seen, there are intervening dependencies in these environments: There is an intervening v-N dependency with extraction from subject NPs (see (31)), there is an intervening V–N dependency with extraction from indirect object NPs (see (32a)), and there is an intervening v-N dependency (plus an intervening V–N dependency) with extraction from scrambled objects (see (32b)). Consequently, the CED springs into action here, and rules out extraction. This is shown for the case of extraction from a subject NP in (33).

(33)
Optimization of extraction from subject NP:

In (33), output O2 leaves XP1 in situ, within the subject NP in Specv, even though, by assumption, there is a featural trigger for it. This gives rise to a −1 violation of the MC with weight 4.0, and to a harmony score of −4.0. On the other hand, O1 extracts XP1 out of the subject NP in Specv, to an outer Specv position, as required by MC (and ultimately by the PIC). This violates EC, yielding a violation score of −5.0. However, in addition, the CED is also violated since there is an intervening v-N dependency, and NP is not a sister of v. It is clear that, whatever the weight of the v-N dependency is, the constraint profile of the output that employs movement is thereby further worsened. For the sake of concreteness, we have registered a −1 violation of CED with O1, yielding an overall harmony score of −12.5; but essentially the same result would have been obtained if the v-N dependency had a weight of, say, 0.01 (with −5.075 as the overall harmony score). The fact that the in-situ candidate O2 wins this competition is, as such, not yet fatal. However, it is clear that XP1 movement to the eventual target position later in the derivation (unless this already is the final landing site, as with local scrambling) will now eventually give rise to a fatal violation of the inviolable PIC.

Consider next the consequences that arise for extractions from NPs that are complements of V, i.e., direct objects. In this scenario, the CED is not violated. However, this does not yet suffice to permit extraction from the complement domain of N to the phase edge of v; in addition, there must be a sufficient reward from the CED (with weight 7.5) generated by an intervening V–N dependency. This reward may then render fatal the MC violation incurred by the output that does not apply movement, by lessening the EC violation incurred by the output that does. The reward is big enough in the well-formed cases of extraction from NP (i.e., where a natural predicate is involved, with a strength >0.133), and too small in the ill-formed cases of extraction from NP (where V and N do not enter a tight relation, with a strength <0.133).[21]

To illustrate this, we will focus on two weights assigned to V–N dependencies that are close to the dividing line between V–N dependencies that permit extraction and V–N dependencies that do not; recall (20). Suppose first that the V–N dependency is equipped with a numerical value of 0.12 (roughly the strength associated with Buch stehlen ‘book steal’). As shown in (34), this leads to a reward of 0.9 provided by the CED. Thus, the harmony score of the output that employs movement (i.e., O1) is improved. However, (34) also shows that this does not yet suffice to license movement; the EC violation incurred by movement is still too strong, and leaving XP1 in situ, as in O2, remains the most harmonic strategy.

(34)
Optimization of extraction from direct object, ΔP V|N  → 0.12:

Things are different when the V–N dependency has a weight of 0.15, though (approximately the strength associated with Bericht schreiben ‘report write’). As shown in (35), in this case the reduction effect brought about by the 1.125 reward for CED satisfaction is sufficiently large to permit the unavoidable violation of EC in the movement candidate O1; and the MC violation incurred by the in-situ candidate O2 becomes fatal.

(35)
Optimization of extraction from direct object, ΔP V|N  → 0.15:

It is clear that all V–N dependencies with a weight higher than 0.15 (i.e., with higher ΔP V|N values, as with, e.g., Buch lesen ‘book read’ or Buch schreiben ‘book write’, which have normalized ΔP values in the 0.5–0.6 range) will ceteris paribus also permit extraction from a complement NP, and that all V–N dependencies with a weight smaller than 0.12 will invariably block it. Thus, by assuming frequency-based ΔP values to act as weights associated with V–N dependencies, the concept of a natural predicate can be given a precise characterization, and asymmetries arising with extractions from NP in German can be derived.
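The tableaux in (33)–(35) can be recomputed mechanically. The following sketch (our own encoding, with the weights assumed above: MC = 4.0, EC = 5.0, CED = 7.5) also derives the 0.133 threshold for the strength of an intervening V–N dependency:

```python
# Recomputing the competitions in (33)-(35). A satisfied CED yields a
# reward of W_CED times the dependency's ΔP-based weight; a violated
# CED yields the corresponding penalty.

W_MC, W_EC, W_CED = 4.0, 5.0, 7.5

def harmony_movement(dep_weight, ced_satisfied=True):
    """Movement candidate: one EC violation, plus a CED reward (extraction
    from a complement NP) or a CED penalty (extraction from a specifier NP,
    where the intervening dependency violates the CED)."""
    ced = W_CED * dep_weight if ced_satisfied else -W_CED * dep_weight
    return -W_EC + ced

HARMONY_IN_SITU = -W_MC  # the in-situ candidate only violates MC: -4.0

# (33) Extraction from subject NP: CED violated (-1), movement hopeless.
assert harmony_movement(1.0, ced_satisfied=False) == -12.5

# (34) Direct object, ΔP = 0.12: reward 0.9 is too small; in situ wins.
assert harmony_movement(0.12) < HARMONY_IN_SITU   # -4.1 < -4.0

# (35) Direct object, ΔP = 0.15: reward 1.125 suffices; movement wins.
assert harmony_movement(0.15) > HARMONY_IN_SITU   # -3.875 > -4.0

# Break-even dependency strength: (W_EC - W_MC) / W_CED ≈ 0.133.
assert abs((W_EC - W_MC) / W_CED - 0.1333) < 0.001
```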

3.7 Consequences

Needless to say, the present analysis makes a lot of further predictions, and raises several new questions. One obvious consequence is that not just extraction from NP, but in fact all instances of movement that are not extremely local will depend on an intervening head-head dependency giving rise to a CED reward that sufficiently reduces the negative harmony value incurred by the EC violation inherent to movement, so as to make the output that carries out movement more harmonic than the output that does not (and that thereby violates MC). For instance, given (29), a movement step from Specv to SpecC (as in standard cases of wh-movement) crosses an intervening T-v dependency: In (36), α i m-commands T, v m-commands α i+1, and whereas T c-commands α i+1, it is not the case that T m-commands α i .

(36)

In contrast, the C-T dependency does not intervene in (36) since C m-commands α i and c-commands α i+1. If nothing more is said, this dependency must be strong enough to bring about a sufficient CED reward to license the movement step, i.e., T-v must be associated with a weight >0.133. We will assume that, more generally, when a head-head dependency involves two functional categories, or one functional category and one lexical category, the weight associated with it is typically very high; this follows naturally once the ΔP values are determined: A category like v is an extremely good predictor for a category like T, even if the particular phonological realizations of v and T (rather than the abstract functional category labels) are taken to be decisive; the reason is that the number of different manifestations of both v and T is very small (and v and T usually co-occur).

A further consequence of the analysis concerns EPP-driven movement of subject NPs to SpecT, which we take to be optional in German. Given a clause structure as in (37), there is no head-head dependency intervening between two members of a movement chain α i and α i+1 (T m-commands α i and c-commands α i+1).

(37)
[TP α i [ T [vP α i+1 [ v [VP … ] v ]] T ]]

Consequently, the CED cannot be violated in (37), but there is also no reward since there is no dependency that satisfies the constraint non-trivially (in general, trivial constraint satisfaction by dependencies must not be able to generate a reward). This means that movement should ceteris paribus be blocked in (37) (with the in-situ candidate violating MC being more harmonic than the movement candidate violating the constraint EC, which has a greater weight than MC). Several options suggest themselves to solve this problem. A simple solution would be that the EPP feature triggering (internal or external) Merge with T has more strength than other features triggering movement.[22]

Next, recall that varieties of German allow for the option of moving an R-pronoun wo (‘where’) or da (‘there’) as the pronominal argument of a preposition, and this may further involve extraction from an object NP, as in (1b) and (1d). R-pronoun extraction from NP is determined by exactly the same structural and lexical factors that PP extraction from NP is determined by; in the present context, this implies that the N-P dependency does not directly interact with the V–N dependency in the same optimization, neither by contributing additional weight, nor by reducing weight. The facts fall into place if it is assumed (i) that PP is accompanied by a functional projection pP on top of it, (ii) that pP qualifies as a phase, and (iii) that N continues to select P (deviating from strict locality in this environment; see Footnote 1) but does not select p (cf. Riemsdijk 1978; Koopman 2000; Abels 2012 for discussion of relevant proposals); cf. (38) (compare (30)).

(38)

Under these assumptions, an R-pronoun needs to reach Specp before moving on; and since there is no N-p dependency and the P item of the N-P dependency fails to m-command α i+1 in the specifier of pP, there is no additional intervening dependency to consider.

Finally, it can be noted that the present approach opens up the possibility of implementing Featherston’s (2004) findings regarding the role of frequency in extraction from CPs in German in a very direct way. In German (and many other languages), the legitimacy of extraction from an embedded declarative clause headed by dass (‘that’) depends both on the grammatical function (direct object: yes, subject: no) and, more importantly in the present context, on the choice of matrix verb; only bridge verbs allow extraction. Two examples illustrating this are given in (39a) (with bridge verbs) versus (39b) (with non-bridge verbs).

(39)
a.
(Ich weiß nicht) [CP1 wen4 [vP t 4 sie meint/glaubt/sagt [CP2 t 4
 I know not whom she thinks/believes/says
dass [vP t 4 du t4 getroffen hast ]]]]
that you met have
b.
?*(Ich weiß nicht) [CP1 wen4 [vP t 4 sie bereut/weiß/bezweifelt
  I know not whom she regrets/knows/doubts
  [CP2 t 4 dass [vP t 4 du t4 getroffen hast ]]]
that you met have

Featherston’s (2004) observation is that bridge verbs are more frequent than non-bridge verbs. The relevant measure is the mean log frequency of a CP-embedding verb: the verb’s absolute frequencies are collected in four different corpora, each number is converted by applying a logarithm function, and the four resulting values are summed and divided by four. These mean log frequencies strongly correlate with the option of extraction from CP (which was determined by experiments involving grammaticality judgements). This is shown for the verbs sagen ‘say’, glauben ‘believe’, and bezweifeln ‘doubt’ in (40).

(40)

Interestingly, even though Featherston (2004) has, in our view, convincingly identified frequency of the matrix verb as a factor determining the option of extraction from CP in German, the grammatical theory he employs (the Decathlon Model; see Featherston 2005, 2019), while designed to predict frequencies in outputs, does in fact not incorporate frequency as a grammatical building block that may interact with other building blocks (like MC, EC, or CED in the present approach) to license or block extraction. Accordingly, Featherston (2004) remains silent on how to actually account for the frequency effect with the bridge verb phenomenon in grammatical theory. In contrast, it seems clear how the effect of frequency on extraction from CP could be modelled in the present approach. First, instead of bare V frequencies, ΔP V|C values for V–C dependencies that intervene between a movement chain member α i+1 in SpecC and its immediate chain antecedent α i in the matrix Specv have to be determined (we have not done this but are reasonably confident that the results will be very similar to Featherston’s results). And second, normalized versions of these numbers are then predicted to bring about CED-based rewards that permit extraction from CP with highly frequent V–C dependencies (i.e., V–C dependencies that form a bridge).
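Featherston’s frequency measure is easy to state procedurally. In the following sketch, the corpus counts are invented placeholders (we do not reproduce his data here), and the base-10 logarithm is an assumption on our part:

```python
# Sketch of the mean-log-frequency measure: collect a verb's absolute
# frequencies in four corpora, take the log of each, and average.
# The counts below are invented for illustration only.

import math

def mean_log_frequency(counts):
    """counts: absolute frequencies of one verb in four corpora."""
    return sum(math.log10(c) for c in counts) / len(counts)

bridge = mean_log_frequency([12000, 9500, 15000, 11000])   # e.g., a bridge verb
non_bridge = mean_log_frequency([300, 450, 280, 390])      # e.g., a non-bridge verb

# Bridge verbs are observed to be more frequent than non-bridge verbs.
assert bridge > non_bridge
```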

4 Concluding remarks

It is a standard observation that extraction of PPs and R-pronouns from direct object NPs in German is dependent on V and N forming a natural predicate. In this article, we have argued that this can and should be conceived of as a frequency effect: Only those V–N dependencies permit extraction from a direct object NP that have a sufficiently high ΔP V|N value. In other words: Frequency can act as a language-external grammatical building block that transparently and directly interacts with language-internal grammatical building blocks regulating syntactic movement. We would like to contend that such a finding is difficult to reconcile with virtually all of the more widely adopted grammatical theories. It seems that the best one can do in standard approaches in order to implement the generalization is to view frequency as a factor determining the learning of syntactic operations, or rules. On such a view, highly frequent V–N dependencies could have become equipped with a special diacritic in the course of language acquisition, and the decision on whether movement can or cannot apply could then be made sensitive in the grammar to the presence or absence of this diacritic.[23] We take it to be uncontroversial that such a use of ad hoc diacritics whose sole purpose is to encode some other well-defined, independently existing piece of information that cannot be available in the grammar for systematic reasons is to be avoided if at all possible. As we have tried to show, Gradient Harmonic Grammar is unique among current theories of grammar in postulating that linguistic objects are associated with numerical weights that then interact with the weights assigned to the language-internal grammatical constraints, and that therefore make implementing frequency values a straightforward option. 
Our approach combines standard constraint evaluation of Gradient Harmonic Grammar with standard Minimalist derivations and standard Harmonic Serialism (which independently suggests itself for Minimalist derivations due to its inherently derivational nature). The only innovative assumption that we had to make is that the weights of V–N dependencies (as well as of other head-head dependencies) are determined by frequency.[24]

In addition to this substantive conceptual difference, a diacritic-based approach where frequency only plays a role in language acquisition and an approach where frequency acts as a language-external building block in the grammar itself are also not extensionally equivalent. At least in principle, they make different empirical predictions when it comes to variation in the domain of extraction from NP. Indeed, there seems to be quite a bit of variation with extraction from NP. In Gradient Harmonic Grammar, there are two natural sources for this: First, different weights of constraints (MC, EC, or CED, in the case at hand) can produce different optimal outputs. This implies that speakers with slightly different weights assigned to crucial constraints may simply have different thresholds for accepting or rejecting extraction from direct object NPs, without there being any weight differences with respect to V–N dependencies. Second, different weights of V–N dependencies can of course also produce different optimal outputs. To end this article, it is this latter consequence that we would briefly like to focus on.[25]

Corpora like the DWDS core corpus can only approximate the frequency of V–N dependencies in the external and internal linguistic inputs accessible to speakers. If the external linguistic input (i.e., the body of linguistic data outside of a speaker which are accessible by hearing or reading) is vastly different, different outputs may become grammatical. To give a concrete example: Suppose that a speaker is immersed in a culture which is just like that of a prototypical German-speaking community, except that there is a tradition of throwing books in the air after reading them. In that case, Buch ‘book’ will be a much better predictor for werfen ‘throw’ than it is in (19), and ΔP werfen|Buch will be much higher. Here we may then expect that sentences like Worüber hat Fritz ein Buch (in die Luft) geworfen? ‘about what has Fritz a book (in the air) thrown’ will become well formed. The same conclusion can be drawn for internal linguistic inputs (i.e., all the acts of thinking in terms of language without ever externalizing it, conducting inner monologues, and the like). Suppose, for instance, that some Nazi speaker fantasizes about burning books all the time and very clearly distinguishes between authors, or between topics, of the books that he wants to burn. In this scenario, ΔP verbrennen|Buch will go up, and it would seem to be likely that this speaker will accept sentences like Über wen soll ich heute ein Buch verbrennen? ‘about whom should I today a book burn’, which are certainly not well formed otherwise for most speakers (unless they have extremely reduced thresholds). These two thought experiments make it possible to distinguish empirically between the diacritic-based approach to frequency effects in extraction from NP and the purely frequency-based approach that we have pursued. 
In the former approach, frequency determines language acquisition and ceases to be active afterwards, whereas frequency stays active as a factor in the latter approach, and a change in frequency is expected to potentially lead to a change in the application of grammatical operations. Therefore, a change of the external linguistic input or of the internal linguistic input at any point in time is predicted to result in different extraction options under the direct approach to frequency effects advocated in the present paper, but not under the indirect approach that confines the role of frequency to language acquisition. Effects of the type hypothesized in this paragraph may then be taken as a further possible argument in support of the idea that frequency is directly active as a building block of grammar.[26]


Corresponding author: Gereon Müller, Institut für Linguistik, Universität Leipzig, D-04081 Leipzig, Germany, E-mail:

Acknowledgments

For helpful comments and discussion, we are grateful to audiences at Universität Leipzig (various courses and colloquia) and Universität Hamburg (DGfS 42, Workshop Modelling Gradient Variability in Grammar), and to two careful reviewers for Linguistics.

Research funding: Research for this article was supported by a DFG Reinhart Koselleck grant (MU 1444/14-1, Structure Removal in Syntax).

Appendix

The goal of this appendix is twofold. On the one hand, we provide both the raw DWDS corpus data underlying the ΔPs and the complete normalizations; on the other hand, we present the results under alternative measures of collocational strength.

Let us start with the former. The raw corpus data are given in Table 1, together with the two ΔP evaluations.

Table 1:

Raw DWDS corpus data: ΔPs.

Noun Verb N.Count V.Count Pair.Count ΔP.NV ΔP.VN
Roman schreiben 4,705 36,935 241 0.00575 0.04491
Buch lesen 27,207 20,375 794 0.03441 0.02580
Roman lesen 4,705 20,375 121 0.00515 0.02223
Buch schreiben 27,207 36,935 756 0.01589 0.02154
Buch kaufen 27,207 8,717 144 0.01186 0.00381
Bericht lesen 13,674 20,375 96 0.00237 0.00353
Verlautbarung verfassen 429 1,793 1 0.00048 0.00202
Buch aufschlagen 27,207 1,005 55 0.05006 0.00186
Roman verfassen 4,705 1,793 9 0.00421 0.00161
Bericht schreiben 13,674 36,935 107 0.00055 0.00148
Geschichte schreiben 34,246 36,935 267 0.00135 0.00146
Buch verkaufen 27,207 6,201 66 0.00597 0.00137
Verlautbarung lesen 429 20,375 2 0.00002 0.00116
Buch vorlesen 27,207 1,343 36 0.02214 0.00110
Buch verfassen 27,207 1,793 33 0.01373 0.00091
Bericht verfassen 13,674 1,793 14 0.00546 0.00072
Roman kaufen 4,705 8,717 10 0.00034 0.00063
Geschichte vorlesen 34,246 1,343 20 0.00901 0.00036
Buch weglegen 27,207 103 9 0.08271 0.00031
Geschichte lesen 34,246 20,375 125 0.00025 0.00015
Geschichte aufschlagen 34,246 1,005 10 0.00407 0.00012
Geschichte verfassen 34,246 1,793 14 0.00192 0.00010
Buch stehlen 27,207 2,140 12 0.00093 0.00007
Roman aufschlagen 4,705 1,005 1 0.00019 0.00004
Geschichte weglegen 34,246 103 1 0.00382 0.00001
Buch klauen 27,207 421 2 0.00008 0.00000
Bericht vorlesen 13,674 1,343 3 −0.00012 −0.00001
Verlautbarung weglegen 429 103 0 −0.00007 −0.00002
Roman weglegen 4,705 103 0 −0.00081 −0.00002
Bericht weglegen 13,674 103 0 −0.00235 −0.00002
Roman vorlesen 4,705 1,343 1 −0.00006 −0.00002
Verlautbarung klauen 429 421 0 −0.00007 −0.00007
Roman klauen 4,705 421 0 −0.00081 −0.00007
Bericht klauen 13,674 421 0 −0.00235 −0.00007
Geschichte klauen 34,246 421 0 −0.00589 −0.00007
Verlautbarung aufschlagen 429 1,005 0 −0.00007 −0.00017
Bericht aufschlagen 13,674 1,005 0 −0.00235 −0.00017
Verlautbarung vorlesen 429 1,343 0 −0.00007 −0.00023
Geschichte stehlen 34,246 2,140 1 −0.00542 −0.00034
Verlautbarung stehlen 429 2,140 0 −0.00007 −0.00037
Roman stehlen 4,705 2,140 0 −0.00081 −0.00037
Bericht stehlen 13,674 2,140 0 −0.00235 −0.00037
Buch werfen 27,207 17,441 67 −0.00084 −0.00054
Roman verkaufen 4,705 6,201 1 −0.00065 −0.00085
Geschichte verkaufen 34,246 6,201 1 −0.00573 −0.00104
Verlautbarung verkaufen 429 6,201 0 −0.00007 −0.00107
Bericht verkaufen 13,674 6,201 0 −0.00235 −0.00107
Buch öffnen 27,207 11,887 22 −0.00283 −0.00124
Roman öffnen 4,705 11,887 3 −0.00056 −0.00141
Verlautbarung kaufen 429 8,717 0 −0.00007 −0.00150
Bericht kaufen 13,674 8,717 0 −0.00235 −0.00150
Geschichte kaufen 34,246 8,717 0 −0.00589 −0.00151
Geschichte öffnen 34,246 11,887 7 −0.00531 −0.00185
Bericht öffnen 13,674 11,887 1 −0.00227 −0.00197
Verlautbarung öffnen 429 11,887 0 −0.00007 −0.00204
Roman werfen 4,705 17,441 4 −0.00058 −0.00215
Geschichte werfen 34,246 17,441 23 −0.00458 −0.00234
Bericht werfen 13,674 17,441 3 −0.00218 −0.00278
Verlautbarung werfen 429 17,441 0 −0.00007 −0.00300
Verlautbarung schreiben 429 36,935 0 −0.00007 −0.00635
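
The ΔP values in Table 1 follow the standard definition: for an outcome O and a cue C, ΔP = P(O | C) − P(O | ¬C), computed from the four cells of a 2×2 contingency table. A minimal Python sketch using the Roman/schreiben counts from Table 1; the value of `total` below is a hypothetical stand-in for the overall corpus size, for illustration only:

```python
# ΔP of an outcome given a cue, from 2x2 contingency counts:
#   ΔP = P(outcome | cue) - P(outcome | not-cue)

def delta_p(cue_count, outcome_count, pair_count, total):
    """ΔP of the outcome given the cue (cf. Gries 2013)."""
    p_given_cue = pair_count / cue_count
    p_given_other = (outcome_count - pair_count) / (total - cue_count)
    return p_given_cue - p_given_other

# Counts for Roman/schreiben from Table 1 (N.Count, V.Count, Pair.Count).
n_count, v_count, pair = 4_705, 36_935, 241
total = 5_815_000  # hypothetical corpus size, for illustration only

dp_nv = delta_p(v_count, n_count, pair, total)  # noun as outcome, verb as cue
dp_vn = delta_p(n_count, v_count, pair, total)  # verb as outcome, noun as cue
print(round(dp_nv, 5), round(dp_vn, 5))
```

The asymmetry of the measure is what distinguishes the two columns of Table 1: ΔP.NV and ΔP.VN differ only in which member of the pair is treated as the cue.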

The normalized values for all collocations are shown in Table 2.

Table 2:

DWDS corpus data: Normalized ΔP values.

Noun Verb ΔP.NV.norm ΔP.VN.norm
Roman schreiben 0.1315 1.0000
Buch lesen 0.4550 0.6272
Roman lesen 0.1246 0.5576
Buch schreiben 0.2459 0.5441
Buch kaufen 0.2004 0.1982
Bericht lesen 0.0933 0.1926
Verlautbarung verfassen 0.0720 0.1633
Buch aufschlagen 0.6315 0.1601
Roman verfassen 0.1141 0.1552
Bericht schreiben 0.0727 0.1527
Geschichte schreiben 0.0818 0.1523
Buch verkaufen 0.1340 0.1505
Verlautbarung lesen 0.0668 0.1465
Buch vorlesen 0.3164 0.1452
Buch verfassen 0.2215 0.1416
Bericht verfassen 0.1281 0.1378
Roman kaufen 0.0703 0.1361
Geschichte vorlesen 0.1682 0.1308
Buch weglegen 1.0000 0.1300
Geschichte lesen 0.0694 0.1267
Geschichte aufschlagen 0.1124 0.1262
Geschichte verfassen 0.0882 0.1258
Buch stehlen 0.0770 0.1253
Roman aufschlagen 0.0686 0.1246
Geschichte weglegen 0.1097 0.1241
Buch klauen 0.0674 0.1238
Bericht vorlesen 0.0652 0.1236
Verlautbarung weglegen 0.0657 0.1235
Roman weglegen 0.0574 0.1235
Bericht weglegen 0.0400 0.1235
Roman vorlesen 0.0658 0.1235
Verlautbarung klauen 0.0657 0.1224
Roman klauen 0.0574 0.1224
Bericht klauen 0.0400 0.1224
Geschichte klauen 0.0001 0.1224
Verlautbarung aufschlagen 0.0657 0.1205
Bericht aufschlagen 0.0400 0.1204
Verlautbarung vorlesen 0.0657 0.1193
Geschichte stehlen 0.0054 0.1172
Verlautbarung stehlen 0.0657 0.1167
Roman stehlen 0.0574 0.1166
Bericht stehlen 0.0400 0.1166
Buch werfen 0.0571 0.1134
Roman verkaufen 0.0592 0.1072
Geschichte verkaufen 0.0019 0.1035
Verlautbarung verkaufen 0.0657 0.1030
Bericht verkaufen 0.0400 0.1030
Buch öffnen 0.0346 0.0996
Roman öffnen 0.0602 0.0964
Verlautbarung kaufen 0.0657 0.0946
Bericht kaufen 0.0400 0.0945
Geschichte kaufen 0.0000 0.0944
Geschichte öffnen 0.0066 0.0878
Bericht öffnen 0.0409 0.0853
Verlautbarung öffnen 0.0657 0.0840
Roman werfen 0.0600 0.0819
Geschichte werfen 0.0148 0.0782
Bericht werfen 0.0419 0.0695
Verlautbarung werfen 0.0657 0.0654
Verlautbarung schreiben 0.0657 0.0000

By and large, these data accord with the theoretical modelling we have suggested, where extraction is predicted to be possible if the normalized ΔP-VN value lies above a cut-off point in the 0.13–0.14 area. The few obvious discrepancies (as with Verlautbarung verfassen ‘official statement write’ and Verlautbarung lesen ‘official statement read’, which do not permit extraction from NP) seem traceable to independent causes; in particular, the extreme overall rarity of a V–N dependency looks like an obvious additional factor.
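
The normalized values in Table 2 are consistent with per-column min-max scaling of the raw ΔP values from Table 1, after which the cut-off in the 0.13–0.14 area can be applied directly. A sketch, using a handful of raw ΔP-VN values from Table 1 (in practice the full column would be scaled; the column minimum and maximum are included here so the scaling matches Table 2):

```python
# Min-max normalization: map each column linearly onto [0, 1].

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Selected raw ΔP-VN values from Table 1.
raw = {
    "Roman schreiben": 0.04491,           # column maximum
    "Buch lesen": 0.02580,
    "Buch weglegen": 0.00031,
    "Verlautbarung schreiben": -0.00635,  # column minimum
}
norm = dict(zip(raw, min_max(list(raw.values()))))

CUTOFF = 0.135  # illustrative midpoint of the 0.13-0.14 area
extractable = [pair for pair, v in norm.items() if v > CUTOFF]
print(extractable)  # V-N pairs predicted to allow extraction from NP
```

With this subset, Roman schreiben and Buch lesen end up above the cut-off, while Buch weglegen lands just below it, mirroring its borderline position in Table 2.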

Next, as noted in Footnote 7, we have investigated three alternative measures of collocational strength, in addition to normalized ΔP values: first, Mutual Information (MI); second, the t-score; and third, an approach to determining (asymmetrical) collocational strength that we will refer to as Alt. Let us begin with Mutual Information (cf. Church and Hanks 1990). This measure yields high values for low-frequency W1–W2 combinations if W1 and W2 are very faithful to each other. If a word occurs only once in a corpus, it will have high MI values for the preceding (and following) word, whatever those words are. Thus, Mutual Information rewards low-frequency collocations (as long as at least one of the members does not occur with many other words, which is trivially true for a word count of 1, for example). Second, the t-score (cf. Church et al. 1991) is sensitive to the overall frequency of the collocation W1–W2 in the corpus; it produces high values even if W1 or W2 occurs frequently with other words. And third, as yet another variation, a reviewer has suggested an asymmetrical indicator of collocational strength based on the frequency of O given C, relative to the overall frequency of O, accompanied by log-transformed and scaled values.
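
The first two alternative measures have simple closed forms: MI compares observed to expected co-occurrence on a log scale, and the t-score measures the same observed-minus-expected difference in units of the standard deviation of the observed count. A sketch under these standard definitions, again with a hypothetical corpus size (the real total is not repeated in this excerpt):

```python
import math

def mutual_information(w1_count, w2_count, pair_count, total):
    """Pointwise MI (Church & Hanks 1990): log2 of observed over expected."""
    expected = w1_count * w2_count / total
    return math.log2(pair_count / expected)

def t_score(w1_count, w2_count, pair_count, total):
    """t-score (Church et al. 1991): (observed - expected) / sqrt(observed)."""
    expected = w1_count * w2_count / total
    return (pair_count - expected) / math.sqrt(pair_count)

# Roman/schreiben counts from Table 3; `total` is a hypothetical corpus size.
n_count, v_count, pair, total = 4_705, 36_935, 241, 5_815_000
mi_val = mutual_information(n_count, v_count, pair, total)
t_val = t_score(n_count, v_count, pair, total)
print(round(mi_val, 2), round(t_val, 2))
```

Note that both functions are undefined for a pair count of 0, which is why the zero-count rows in Table 3 show NA and -Inf values for these measures.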

Table 3 shows that, while the results obtained with these measures differ from the results under ΔP, and also from one another, in several respects, the basic conclusions carry over unchanged (under normalization), and (with the possible exception of MI) these alternative approaches could in principle also have been employed in the present analysis.[27] Still, it turns out that none of the alternatives manages to establish the near-perfect match with extraction options that is achieved by (normalized) ΔP.

Table 3:

Different measures of collocational strength.

Noun Verb N.Count V.Count Pair.Count ΔP.NV.norm ΔP.VN.norm Alt.N_V Alt.V_N MI T.Score T.norm MI.norm
Roman schreiben 4,705 36,935 241 0.1315 1.0000 1.0000 0.6264 3.0127 13.6006 0.8143 0.8713
Buch lesen 27,207 20,375 794 0.4550 0.6272 0.9247 0.8837 3.0593 24.7975 1.0000 0.8762
Roman lesen 4,705 20,375 121 0.1246 0.5576 0.9078 0.6129 2.8769 9.5025 0.7463 0.8569
Buch schreiben 27,207 36,935 756 0.2459 0.5441 0.9181 0.7910 2.1303 21.2154 0.9406 0.7776
Buch kaufen 27,207 8,717 144 0.2004 0.1982 0.6961 0.7602 1.8211 8.6039 0.7314 0.7447
Bericht lesen 13,674 20,375 96 0.0933 0.1926 0.7340 0.5796 1.0038 4.9118 0.6702 0.6579
Verlautbarung verfassen 429 1,793 1 0.0720 0.1633 0.5863 0.2723 2.9195 0.8678 0.6031 0.8614
Buch aufschlagen 27,207 1,005 55 0.6315 0.1601 0.5673 0.9326 3.5492 6.7827 0.7012 0.9283
Roman verfassen 4,705 1,793 9 0.1141 0.1552 0.5599 0.5887 2.6343 2.5168 0.6304 0.8311
Bericht schreiben 13,674 36,935 107 0.0727 0.1527 0.7485 0.5095 0.3021 1.9543 0.6211 0.5834
Geschichte schreiben 34,246 36,935 267 0.0818 0.1523 0.7480 0.6412 0.2968 3.0386 0.6391 0.5828
Buch verkaufen 27,207 6,201 66 0.1340 0.1505 0.5917 0.6969 1.1869 4.5556 0.6642 0.6774
Verlautbarung lesen 429 20,375 2 0.0668 0.1465 0.6791 0.0222 0.4131 0.3522 0.5945 0.5952
Buch vorlesen 27,207 1,343 36 0.3164 0.1452 0.5105 0.8299 2.5195 4.9536 0.6708 0.8189
Buch verfassen 27,207 1,793 33 0.2215 0.1416 0.4989 0.7757 1.9770 4.2854 0.6598 0.7613
Bericht verfassen 13,674 1,793 14 0.1281 0.1378 0.4762 0.6523 1.7325 2.6157 0.6321 0.7353
Roman kaufen 4,705 8,717 10 0.0703 0.1361 0.5740 0.3762 0.5048 0.9337 0.6042 0.6049
Geschichte vorlesen 34,246 1,343 20 0.1682 0.1308 0.4010 0.7452 1.3395 2.7050 0.6335 0.6936
Buch weglegen 27,207 103 9 1.0000 0.1300 0.3250 1.0000 4.2242 2.8395 0.6358 1.0000
Geschichte lesen 34,246 20,375 125 0.0694 0.1267 0.6464 0.6176 0.0601 0.4562 0.5963 0.5577
Geschichte aufschlagen 34,246 1,005 10 0.1124 0.1262 0.3083 0.6872 0.7578 1.2921 0.6101 0.6318
Geschichte verfassen 34,246 1,793 14 0.0882 0.1258 0.3533 0.6523 0.4080 0.9217 0.6040 0.5946
Buch stehlen 27,207 2,140 12 0.0770 0.1253 0.3635 0.6046 0.2624 0.5760 0.5982 0.5791
Roman aufschlagen 4,705 1,005 1 0.0686 0.1246 0.2657 0.3557 0.2995 0.1875 0.5918 0.5831
Geschichte weglegen 34,246 103 1 0.1097 0.1241 0.0000 0.6837 0.7223 0.3939 0.5952 0.6280
Buch klauen 27,207 421 2 0.0674 0.1238 0.1236 0.5808 0.0231 0.0225 0.5891 0.5537
Bericht vorlesen 13,674 1,343 3 0.0652 0.1236 0.2700 0.4721 −0.0729 −0.0898 0.5872 0.5435
Verlautbarung weglegen 429 103 0 0.0657 0.1235 NA NA -Inf -Inf NA NA
Roman weglegen 4,705 103 0 0.0574 0.1235 NA NA -Inf -Inf NA NA
Bericht weglegen 13,674 103 0 0.0400 0.1235 NA NA -Inf -Inf NA NA
Roman vorlesen 4,705 1,343 1 0.0658 0.1235 0.2657 0.3139 −0.1187 −0.0858 0.5873 0.5387
Verlautbarung klauen 429 421 0 0.0657 0.1224 NA NA -Inf -Inf NA NA
Roman klauen 4,705 421 0 0.0574 0.1224 NA NA -Inf -Inf NA NA
Bericht klauen 13,674 421 0 0.0400 0.1224 NA NA -Inf -Inf NA NA
Geschichte klauen 34,246 421 0 0.0001 0.1224 NA NA -Inf -Inf NA NA
Verlautbarung aufschlagen 429 1,005 0 0.0657 0.1205 NA NA -Inf -Inf NA NA
Bericht aufschlagen 13,674 1,005 0 0.0400 0.1204 NA NA -Inf -Inf NA NA
Verlautbarung vorlesen 429 1,343 0 0.0657 0.1193 NA NA -Inf -Inf NA NA
Geschichte stehlen 34,246 2,140 1 0.0054 0.1172 0.0000 0.2469 −3.6546 −11.5931 0.3964 0.1630
Verlautbarung stehlen 429 2,140 0 0.0657 0.1167 NA NA -Inf -Inf NA NA
Roman stehlen 4,705 2,140 0 0.0574 0.1166 NA NA -Inf -Inf NA NA
Bericht stehlen 13,674 2,140 0 0.0400 0.1166 NA NA -Inf -Inf NA NA
Buch werfen 27,207 17,441 67 0.0571 0.1134 0.5937 0.5502 −0.2833 −1.7761 0.5592 0.5212
Roman verkaufen 4,705 6,201 1 0.0592 0.1072 0.2657 0.0937 −2.3258 −4.0134 0.5221 0.3042
Geschichte verkaufen 34,246 6,201 1 0.0019 0.1035 0.0000 0.0937 −5.1895 −35.4906 0.0000 0.0000
Verlautbarung verkaufen 429 6,201 0 0.0657 0.1030 NA NA -Inf -Inf NA NA
Bericht verkaufen 13,674 6,201 0 0.0400 0.1030 NA NA -Inf -Inf NA NA
Buch öffnen 27,207 11,887 22 0.0346 0.0996 0.4446 0.4450 −1.3369 −7.1577 0.4700 0.4093
Roman öffnen 4,705 11,887 3 0.0602 0.0964 0.4128 0.1582 −1.6796 −3.8165 0.5254 0.3728
Verlautbarung kaufen 429 8,717 0 0.0657 0.0946 NA NA -Inf -Inf NA NA
Bericht kaufen 13,674 8,717 0 0.0400 0.0945 NA NA -Inf -Inf NA NA
Geschichte kaufen 34,246 8,717 0 0.0000 0.0944 NA NA -Inf -Inf NA NA
Geschichte öffnen 34,246 11,887 7 0.0066 0.0878 0.2605 0.2802 −3.3209 −23.7931 0.1940 0.1985
Bericht öffnen 13,674 11,887 1 0.0409 0.0853 0.1229 0.0000 −4.8038 −26.9304 0.1420 0.0410
Verlautbarung öffnen 429 11,887 0 0.0657 0.0840 NA NA -Inf -Inf NA NA
Roman werfen 4,705 17,441 4 0.0600 0.0819 0.4513 0.1444 −1.8177 −5.0503 0.5049 0.3582
Geschichte werfen 34,246 17,441 23 0.0148 0.0782 0.4198 0.3962 −2.1578 −16.6048 0.3133 0.3220
Bericht werfen 13,674 17,441 3 0.0419 0.0695 0.2700 0.1030 −3.7719 −21.9280 0.2250 0.1506
Verlautbarung werfen 429 17,441 0 0.0657 0.0654 NA NA -Inf -Inf NA NA
Verlautbarung schreiben 429 36,935 0 0.0657 0.0000 NA NA -Inf -Inf NA NA

References

Abels, Klaus. 2012. Phases. An essay on cyclicity in syntax (Linguistische Arbeiten 543). Berlin & Boston: De Gruyter. https://doi.org/10.1515/9783110284225.

Alexiadou, Artemis, Elena Anagnostopoulou & Florian Schäfer. 2015. External arguments in transitivity alternations. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199571949.001.0001.

Bach, Emmon & George Horn. 1976. Remarks on ‘Conditions on transformations’. Linguistic Inquiry 7(2). 265–299.

Baker, Mark. 1988. Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press.

Baltin, Mark & Paul Postal. 1996. More on reanalysis hypotheses. Linguistic Inquiry 27. 127–145.

Barbiers, Sjef. 2002. Remnant stranding and the theory of movement. In Artemis Alexiadou, Elena Anagnostopoulou, Sjef Barbiers & Hans-Martin Gärtner (eds.), Dimensions of movement, 47–67. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/la.48.04bar.

Boersma, Paul & Joe Pater. 2016. Convergence properties of a gradual learning algorithm for Harmonic Grammar. In John McCarthy & Joe Pater (eds.), Harmonic grammar and harmonic serialism, 289–434. London: Equinox.

Bowers, John. 2017. Deriving syntactic relations. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316156414.

Bresnan, Joan, Shipra Dingare & Christopher Manning. 2001. Soft constraints mirror hard constraints: Voice and person in English and Lummi. In Miriam Butt & Tracy Holloway King (eds.), Proceedings of the LFG 01 conference. Stanford, CA: CSLI Publications.

Broekhuis, Hans. 2006. Derivations (MP) and evaluations (OT). In Ralf Vogel & Hans Broekhuis (eds.), Optimality theory and minimalism: A possible convergence? (Linguistics in Potsdam 25), 137–193. Potsdam: Universitätsverlag Potsdam.

Broekhuis, Hans & Ellen Woolford. 2013. Minimalism and optimality theory. In Marcel den Dikken (ed.), Cambridge handbook of generative syntax, 122–161. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511804571.008.

Bruening, Benjamin. 2009. Selectional asymmetries between CP and DP suggest that the DP hypothesis is wrong. University of Pennsylvania Working Papers in Linguistics 15(1). Article 5. Available at: https://repository.upenn.edu/pwpl/vol15/iss1/5.

Bruening, Benjamin. 2020a. Idioms, collocations, and structure: Syntactic constraints on conventionalized expressions. Natural Language and Linguistic Theory 38(2). 365–424. https://doi.org/10.1007/s11049-019-09451-0.

Bruening, Benjamin. 2020b. The head of the nominal is N, not D: N-to-D movement, hybrid agreement, and conventionalized expressions. Glossa: A Journal of General Linguistics 5(1): 15. 1–19. https://doi.org/10.5334/gjgl.1031.

Bruening, Benjamin, Xuyen Dinh & Kim Lan. 2018. Selection, idioms, and the structure of nominal phrases with and without classifiers. Glossa: A Journal of General Linguistics 3(1): 42. 1–46. https://doi.org/10.5334/gjgl.288.

Cattell, Ray. 1976. Constraints on movement rules. Language 52. 18–50. https://doi.org/10.2307/413206.

Chaves, Rui. 2012. On the grammar of extraction and coordination. Natural Language and Linguistic Theory 30. 465–512. https://doi.org/10.1007/s11049-011-9164-y.

Chaves, Rui & Jeruen Dery. 2019. Frequency effects in subject islands. Journal of Linguistics 55. 475–521. https://doi.org/10.1017/S0022226718000294.

Chomsky, Noam. 1977. On wh-movement. In Peter Culicover, Thomas Wasow & Adrian Akmajian (eds.), Formal syntax, 71–132. New York: Academic Press.

Chomsky, Noam. 1986. Barriers. Cambridge, MA: MIT Press.

Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.

Chomsky, Noam. 2001. Derivation by phase. In Michael Kenstowicz (ed.), Ken Hale: A life in language, 1–52. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/4056.003.0004.

Chomsky, Noam. 2008. On phases. In Robert Freidin, Carlos Otero & Maria Luisa Zubizarreta (eds.), Foundational issues in linguistic theory, 133–166. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/7713.003.0009.

Church, Kenneth Ward & Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16(1). 22–29.

Church, Kenneth, William Gale, Patrick Hanks & Donald Hindle. 1991. Using statistics in lexical analysis. In Uri Zernik (ed.), Lexical acquisition: Exploiting on-line resources to build a lexicon, 115–164. New York: Taylor & Francis. https://doi.org/10.4324/9781315785387-8.

Cinque, Guglielmo. 1990. Types of A-bar dependencies. Cambridge, MA: MIT Press.

Collins, Chris & Edward Stabler. 2016. A formalization of minimalist syntax. Syntax 19(1). 43–78. https://doi.org/10.1111/synt.12117.

Davies, William & Stanley Dubinsky. 2003. On extraction from NPs. Natural Language and Linguistic Theory 21. 1–37. https://doi.org/10.1023/A:1021891610437.

De Kuthy, Kordula. 2001. Splitting PPs from NPs. In Walt Detmar Meurers & Tibor Kiss (eds.), Constraint-based approaches to Germanic syntax, 25–70. Stanford, CA: CSLI Publications.

De Kuthy, Kordula & Walt Detmar Meurers. 2001. On partial constituent fronting in German. Journal of Comparative Germanic Linguistics 3(3). 143–205. https://doi.org/10.1023/A:1011926510300.

Diesing, Molly. 1992. Indefinites. Cambridge, MA: MIT Press.

Fanselow, Gisbert. 1987. Konfigurationalität. Tübingen: Narr.

Fanselow, Gisbert. 1991. Minimale Syntax. Passau: Universität Passau Habilitation thesis.

Fanselow, Gisbert. 2001. Features, theta-roles, and free constituent order. Linguistic Inquiry 32. 405–436. https://doi.org/10.1162/002438901750372513.

Featherston, Sam. 2004. Bridge verbs and V2 verbs: The same thing in spades? Zeitschrift für Sprachwissenschaft 23. 181–210. https://doi.org/10.1515/zfsw.2004.23.2.181.

Featherston, Sam. 2005. The decathlon model of empirical syntax. In Stephen Kepser & Marga Reis (eds.), Linguistic evidence, 187–208. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110197549.187.

Featherston, Sam. 2019. The decathlon model. In András Kertész, Csilla Rákosi & Edith Moravcsik (eds.), Current approaches to syntax – A comparative handbook, 155–186. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110540253-006.

Frey, Werner. 2015. NP-incorporation in German. In Olga Borik & Berit Gehrke (eds.), The syntax and semantics of pseudo-incorporation, 225–261. Leiden: Brill. https://doi.org/10.1163/9789004291089_008.

Gallego, Ángel. 2007. Phase theory and parametric variation. Barcelona: Universitat Autònoma de Barcelona dissertation.

Gazdar, Gerald & Ewan Klein. 1978. Review of Formal semantics of natural language: Papers from a colloquium sponsored by the King’s College Research Centre, Cambridge, edited by Edward L. Keenan. Language 54(3). 663–667. https://doi.org/10.1353/lan.1978.0022.

Georgi, Doreen. 2017. Patterns of movement reflexes as the result of the order of Merge and Agree. Linguistic Inquiry 48(4). 585–626. https://doi.org/10.1162/LING_a_00255.

Georgi, Doreen & Gereon Müller. 2010. Noun phrase structure by reprojection. Syntax 13(1). 1–36. https://doi.org/10.1111/j.1467-9612.2009.00132.x.

Geyken, Alexander. 2007. The DWDS corpus: A reference corpus for the German language of the 20th century. In Christiane Fellbaum (ed.), Collocations and idioms: Linguistic, lexicographic, and computational aspects, 23–41. London: Continuum Press.

Grewendorf, Günther. 1989. Ergativity in German. Dordrecht: Foris. https://doi.org/10.1515/9783110859256.

Gries, Stefan. 2013. 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics 18(1). 137–165. https://doi.org/10.1075/ijcl.18.1.09gri.

Gries, Stefan, Beate Hampe & Doris Schönefeld. 2005. Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics 16. 635–676. https://doi.org/10.1515/cogl.2005.16.4.635.

Gries, Stefan & Anatol Stefanowitsch. 2004. Extending collostructional analysis: A corpus-based perspective on “alternations”. International Journal of Corpus Linguistics 9. 97–129. https://doi.org/10.1075/ijcl.9.1.06gri.

Grimshaw, Jane. 1997. Projection, heads, and optimality. Linguistic Inquiry 28. 373–422.

Grimshaw, Jane. 2006. Chains as unfaithful optima. In Eric Baković, Junko Ito & John J. McCarthy (eds.), Wondering at the natural fecundity of things: Essays in honor of Alan Prince, 97–109. Santa Cruz, CA: Linguistics Research Center.

Haider, Hubert. 1983. Connectedness effects in German. Groninger Arbeiten zur Germanistischen Linguistik 23. 82–119.

Haider, Hubert. 1992. The basic branching conjecture. Ms., Universität Stuttgart.

Haider, Hubert. 1993. Deutsche Syntax – generativ. Tübingen: Narr.

Halle, Morris & Alec Marantz. 1993. Distributed morphology and the pieces of inflection. In Kenneth Hale & Samuel Jay Keyser (eds.), The view from building 20, 111–176. Cambridge, MA: MIT Press.

Hawkins, John. 1999. Processing complexity and filler-gap dependencies across grammars. Language 75. 244–285. https://doi.org/10.2307/417261.

Hayes, Bruce. 2001. Gradient well-formedness in optimality theory. In Joost Dekkers, Frank van der Leeuw & Jeroen van de Weijer (eds.), Optimality theory: Phonology, syntax, and acquisition, 88–120. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198238430.003.0003.

Heck, Fabian & Gereon Müller. 2013. Extremely local optimization. In Hans Broekhuis & Ralf Vogel (eds.), Linguistic derivations and filtering, 135–166. Sheffield: Equinox.

Hein, Johannes & Katja Barnickel. 2018. Replication of R-pronouns in German dialects. Zeitschrift für Sprachwissenschaft 37. 171–204. https://doi.org/10.1515/zfs-2018-0009.

Hofmeister, Philip, Peter Culicover & Susanne Winkler. 2015. Effects of processing on the acceptability of frozen extraposed constituents. Syntax 18(4). 464–483. https://doi.org/10.1111/synt.12036.

Huang, Cheng-Teh James. 1982. Logical relations in Chinese and the theory of grammar. Cambridge, MA: MIT dissertation.

Kimper, Wendell. 2016. Positive constraints and finite goodness in harmonic serialism. In John McCarthy & Joe Pater (eds.), Harmonic grammar and harmonic serialism, 221–235. London: Equinox.

Kluender, Robert. 2004. Are subject islands subject to a processing account? In Vineeta Chand, Ann Kelleher, Angelo J. Rodríguez & Benjamin Schmeiser (eds.), Proceedings of the 23rd West Coast Conference on Formal Linguistics, 101–125. Somerville, MA: Cascadilla Press.

Koopman, Hilda. 2000. The syntax of specifiers and heads. London: Routledge. https://doi.org/10.4324/9780203171608.

Koster, Jan. 1987. Domains and dynasties. Dordrecht: Foris. https://doi.org/10.1515/9783110808520.

Lee, Hyunjung. 2018. Generalized complementizer-trace effects in Gradient Harmonic Grammar: Deriving extraction asymmetries. Paper presented at the 40th DGfS Meeting, University of Stuttgart, 7–9 March 2018.

Legate, Julie Anne. 2014. Voice and v: Lessons from Acehnese. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/9780262028141.001.0001.

Legendre, Géraldine, Colin Wilson, Paul Smolensky, Kristin Homer & William Raymond. 2006. Optimality in syntax II: Wh-questions. In Paul Smolensky & Géraldine Legendre (eds.), The harmonic mind, vol. 2, 183–230. Cambridge, MA: MIT Press.

Mahajan, Anoop. 1992. The specificity condition and the CED. Linguistic Inquiry 23. 510–516.

Manzini, Rita. 1995. From merge and move to form dependency. UCLWPL 7. 323–345.

Marantz, Alec. 1995. ‘Cat’ as a phrasal idiom: Consequences of late insertion in distributed morphology. Ms., Cambridge, MA: MIT.

McCarthy, John. 2010. An introduction to Harmonic Serialism. Language and Linguistics Compass 4. 1001–1018. https://doi.org/10.1111/j.1749-818X.2010.00240.x.

McCarthy, John & Joe Pater (eds.). 2016. Harmonic grammar and Harmonic Serialism. Sheffield: Equinox.

Müller, Gereon. 1991. Abstrakte Inkorporation. In Susan Olsen & Gisbert Fanselow (eds.), DET, COMP und INFL, 155–202. Tübingen: Niemeyer. https://doi.org/10.1515/9783111353838.155.

Müller, Gereon. 1995. A-bar syntax. Berlin & New York: Mouton de Gruyter.

Müller, Gereon. 1998. Incomplete category fronting. Dordrecht: Kluwer. https://doi.org/10.1007/978-94-017-1864-6.

Müller, Gereon. 2000. Das Pronominaladverb als Reparaturphänomen. Linguistische Berichte 182. 139–178.

Müller, Gereon. 2011. Constraints on displacement: A phase-based approach (Language Faculty and Beyond 7). Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/lfab.7.

Müller, Gereon. 2019. The third construction and strength of C: A Gradient Harmonic Grammar approach. In Ken Ramshøj Christensen, Johanna Wood & Henrik Jørgensen (eds.), The sign of the V – Papers in honour of Sten Vikner, 419–448. Aarhus: Aarhus University. https://doi.org/10.7146/aul.348.107.

Murphy, Andrew. 2017. Cumulativity in syntactic derivations. Leipzig: Universität Leipzig dissertation.

Newmeyer, Frederick. 1986. Linguistic theory in America. Chicago: The University of Chicago Press. https://doi.org/10.1163/9789004454040.

Newmeyer, Frederick. 2000. Language form and language function. Cambridge, MA: MIT Press.

O’Grady, William. 1998. The syntax of idioms. Natural Language and Linguistic Theory 16. 279–312. https://doi.org/10.1023/A:1005932710202.

Osborne, Timothy, Michael Putnam & Thomas Groß. 2012. Catenae: Introducing a novel unit of syntactic analysis. Syntax 15. 354–396. https://doi.org/10.1111/j.1467-9612.2012.00172.x.

Ott, Dennis. 2011. A note on free relative clauses in the theory of phases. Linguistic Inquiry 42. 183–192. https://doi.org/10.1162/LING_a_00036.

Pater, Joe. 2009. Weighted constraints in generative linguistics. Cognitive Science 33. 999–1035. https://doi.org/10.1111/j.1551-6709.2009.01047.x.

Pater, Joe. 2016. Universal grammar with weighted constraints. In John McCarthy & Joe Pater (eds.), Harmonic grammar and harmonic serialism, 1–46. Sheffield: Equinox.

Pesetsky, David & Esther Torrego. 2006. Probes, goals and syntactic categories. In Yukio Otsu (ed.), Proceedings of the seventh Tokyo conference on psycholinguistics, 25–60. Tokyo: Hituzi Syobo.

Prince, Alan & Paul Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. Book ms., Rutgers University.

Prince, Alan & Paul Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell. https://doi.org/10.1002/9780470759400.

Pustejovsky, James. 1995. The generative lexicon. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/3225.001.0001.

Putnam, Michael & Lara Schwarz. 2017. Predicting the well-formedness of hybrid representations in emergent production. Ms., Penn State University.

Riemsdijk, Henk van. 1978. A case study in syntactic markedness: The binding nature of prepositional phrases. Dordrecht: Foris.

Ross, John. 1973a. A fake NP squish. In Charles-James N. Bailey & Roger Shuy (eds.), New ways of analyzing variation in English, 96–140. Washington, DC: Georgetown University Press.

Ross, John. 1973b. Nouniness. In Osamu Fujimura (ed.), Three dimensions of linguistic research, 137–257. Tokyo: TEC Company Ltd.

Ross, John. 1975. Clausematiness. In Edward Keenan (ed.), Formal semantics of natural language, 422–475. New York & Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511897696.027.

Sag, Ivan, Philip Hofmeister & Neal Snider. 2009. Processing complexity in subjacency violations: The complex noun phrase constraint. CLS 43(1). 213–227.

Sauerland, Uli. 1995. Review of ‘A-bar Syntax’ by Gereon Müller. Glot International 1(8). 14–15.

Schmellentin, Claudia. 2006. PP-Extraktionen: Eine Untersuchung zum Verhältnis von Grammatik und Pragmatik. Tübingen: Niemeyer. https://doi.org/10.1515/9783110921229.

Schwarz, Lara. 2020. Accepting our mistakes: How variation completes the linguistic puzzle. Paper presented at the 42nd Annual Conference of the German Linguistic Society (DGfS), 4–6 March 2020, Hamburg.

Smolensky, Paul. 2017. Gradient representations. Tutorial given at Universität Leipzig, November 10–12, 2017.

Smolensky, Paul & Géraldine Legendre. 2006. The harmonic mind. Cambridge, MA: MIT Press.

Smolensky, Paul & Matthew Goldrick. 2016. Gradient symbolic representations in grammar: The case of French liaison. Rutgers Optimality Archive (ROA) 1286. http://roa.rutgers.edu/article/view/1552.

Staudacher, Peter. 1990. Long movement from verb-second-complements in German. In Günther Grewendorf & Wolfgang Sternefeld (eds.), Scrambling and barriers, 319–339. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/la.5.17sta.

Stefanowitsch, Anatol. 2009. Bedeutung und Gebrauch in der Konstruktionsgrammatik: Wie kompositionell sind modale Infinitive im Deutschen? Zeitschrift für Germanistische Linguistik 37. 565–592. https://doi.org/10.1515/ZGL.2009.036.

Sternefeld, Wolfgang. 1991. Syntaktische Grenzen: Chomskys Barrierentheorie und ihre Weiterentwicklungen. Opladen: Westdeutscher Verlag. https://doi.org/10.1007/978-3-322-97025-1.

Trissler, Susanne. 1993. P-Stranding im Deutschen. In Franz-Josef d’Avis (ed.), Extraktion im Deutschen I (Arbeitspapiere des SFB 340, Nr. 34), 247–291. Stuttgart & Tübingen: SFB 340.

Urk, Coppe van. 2015. A uniform syntax for phrasal movement: A Dinka Bor case study. Cambridge, MA: MIT dissertation.

Webelhuth, Gert. 1988. A universal theory of scrambling. In Victoria Rosen (ed.), Papers from the 10th Scandinavian conference on linguistics, vol. II, 284–298. Bergen: Department of Linguistics and Phonetics, University of Bergen.

Webelhuth, Gert. 1992. Principles and parameters of syntactic saturation. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780195070415.001.0001.

Received: 2020-03-12
Accepted: 2021-08-12
Published Online: 2022-07-21
Published in Print: 2022-09-27

© 2022 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
