Nonparametric Identification in Nonseparable Duration Models with Unobserved Heterogeneity

Petyo Bonev

doi:10.1515/jbnst-2024-0001

Open Access Article

Published/Copyright: September 1, 2025

From the journal Jahrbücher für Nationalökonomie und Statistik

Abstract

This paper studies nonparametric identification of nonseparable duration models with unobserved heterogeneity. The models considered here are nonseparable in two ways. First, genuine duration dependence is allowed to depend on observed covariates. Second, observed and unobserved characteristics may interact in an arbitrary way. Identification is shown for a comprehensive account of settings. In particular, identification is shown in single-spell models with and without time-varying covariates, in multiple-spells models with shared frailty and lagged duration dependence, in single-spell and multiple-spell competing risks models, and in treatment effects models where treatment is assigned during the individual spell in the state of interest.

Keywords: duration models; identification; unobserved treatment heterogeneity; nonseparable models; competing risks; treatment effect

JEL Classification: C14; C41; J64

1 Introduction

A major strategy for identification in duration models is to impose that the hazard function of the duration variable is multiplicatively separable with respect to genuine duration dependence, observed and unobserved covariates. The mixed proportional hazard (MPH) model, which incorporates this assumption, is the most commonly used duration model in the econometric literature, Hausman and Woutersen (2008). Separability is a powerful source of identifying variation. It can be used to distinguish between genuine duration dependence and spurious duration dependence caused by dynamic selection. Furthermore, in models with competing risks, separability has been used to break the nonidentification result of Tsiatis (1975).

Despite these appealing properties, multiplicative separability has two major drawbacks. First, separating covariates from duration dependence is difficult to justify with economic theory. In job search models, for example, this is only justified with myopic agents and under very particular parametric assumptions, see Van den Berg (2001). These theoretical arguments are reinforced by recent experimental studies that find that duration dependence varies in a complex, nonproportional way with observed covariates, see Eriksson and Rooth (2014), Farber et al. (2016) and Kroft et al. (2013). Second, separating observed and unobserved characteristics implies that the effect of an observed covariate is the same for all individuals with identical observed characteristics. This assumption, however, is often too restrictive, Jun et al. (2016). While unobserved effect heterogeneity has been addressed by numerous studies in the context of regression models (see e.g. Matzkin 2007), it has not been considered in the duration context.

The objective of this paper is to address the drawbacks of fully separable models. I study identification of hazard models of the type

(1)  $\theta(t \mid X, V) = \lambda(t, X)\, r(X, V),$

where θ is the structural hazard function of a duration random variable and X and V represent observed and unobserved individual characteristics, respectively. These models are nonseparable in the sense that they allow for (i) arbitrary observed heterogeneity in the “true” duration dependence (through the function λ) and (ii) arbitrary interaction between observed and unobserved covariates (through the function r). The functions λ and r are unknown to the econometrician and are not assumed to belong to a parametric family.

This paper shows identification strategies for a comprehensive account of empirically relevant settings. In particular, identification is shown in single-spell models with and without time-varying covariates, in multiple-spell models with shared frailty and lagged duration dependence, in single-spell and multiple-spell competing risks models, as well as in treatment effects models where treatment is assigned during the individual spell in the state of interest.

Identification in this study relies on two main strategies. The first one is based on the assumption that there is one special regressor whose effect on the individual hazard is homogeneous w.r.t. unobserved characteristics. No assumption is imposed on the genesis of this regressor. Furthermore, this regressor is allowed to impact the weeding out process (i.e. the change of the conditional distribution of V given X over time) in an unrestricted way. This paper is the first to show identification in single-spell hazard models with interaction between observed and unobserved characteristics. Other single-spell nonparametric studies rely either on the fully separable structure of the MPH model, Elbers and Ridder (1982), Heckman and Singer (1984), Ridder (1986), Horowitz (1999) and Chiappori et al. (2015), or at least on the multiplicative separability of the unobserved heterogeneity as in the Mixed Hazard (MH) model, Brinch (2007) and McCall (1994).

The second strategy is based on multiple spells and a mild fixed-effects assumption that restricts unobserved heterogeneity to enter the hazard of at least two subsequent spells in the same way. The model is very general and does not require a finite mean of the unobserved heterogeneity. Moreover, previous spells are allowed to impact the hazard of subsequent spells (so-called lagged duration dependence). Thus, these multiple-spell models can be interpreted as a hazard version of a dynamic panel model with fixed effects. Importantly, the lagged duration effect is allowed to be arbitrarily heterogeneous with respect to observed characteristics. This assumption is motivated by recent empirical findings in the unemployment duration literature, see e.g. Cockx and Picchio (2013). In the context of unemployment duration, lagged duration dependence, sometimes also referred to as scarring effects, is an object of interest in its own right and its analysis has a long tradition, Heckman and Borjas (1980). The models in this paper also nest a shared frailty model, which is widely used in demographics and appropriate when multiple individuals from the same group share unobserved characteristics, Hougaard (2000). Existing multiple-spell studies either rely on fully separable variation in the observed heterogeneity, Honoré (1993), or at least impose separability of the unobservables, Frijters (2002) and Picchio (2012).^[1],^[2] I show that the fixed-effects assumption is sufficient to avoid these restrictive assumptions. In particular, the fixed-effects assumption is used to derive a hazard analogue of a difference-in-differences approach.

Next, identification of nonseparable competing risks models is studied. Competing risks models arise naturally in the unemployment duration context when more than one exit destination is possible, see e.g. Kyyrä and Ollikainen (2008) for an empirical example. Since only the first exit is observed, the marginal distributions are in general not identified. This paper shows that the nonseparable hazard structure (1) leads to identification when either there are multiple spells and the fixed-effects assumption is satisfied or when the researcher has access to a special regressor of the type discussed above. Previous studies have largely relied on the MPH structure, see Heckman and Honoré (1989), Abbring and Van Den Berg (2003a), Colby and Rilstone (2004) and Horny and Picchio (2010). One exception is the paper by Lee and Lewbel (2013), which nonparametrically identifies a bivariate accelerated failure time competing risks model.

This paper also contributes to the literature on treatment effects in the context of duration models. Much of this literature is discussed by Beyhum et al. (2024). In many studies, treatment is modeled as a binary variable, and it is assumed that the treatment status realizes upon individuals entering into the state of interest (Blanco et al. 2020; Kastoryano and Beyhum 2020; Sant’Anna 2021). This assumption precludes many problems that are endemic to duration models. Most importantly, it implies that the treatment status is observed. This may be too restrictive in many cases, such as unemployment search assistance, in which the treatment may realize at any time of the elapsed duration and thus will be censored if an individual exits unemployment prior to treatment. Identification is further complicated by the fact that this censoring is endogenous, since the exit is co-determined by individual unobserved characteristics. Recent papers that deal with endogenous censoring of a dynamic treatment are for example Van den Berg et al. (2020a, 2020b) and Beyhum et al. (2024). A common assumption of these papers, as well as of the literature that assumes the treatment to be binary and realized upon inflow, is that the effect of the treatment does not depend on the duration of exposure to the treatment. In that sense, the effect is assumed to realize instantaneously. This assumption is technically important. Relaxing it requires identifying both the hazard function of the time to treatment from endogenously censored data and the duration dependence on the time spent in treatment. A binary treatment that realizes right at the beginning of a spell avoids these challenges. However, this assumption may also be restrictive in many important cases. The success of an Active Labor Market Policy such as training will depend on how long an individual is trained (Vooren et al. 2019). The success of many psychological therapies may depend on the length of exposure to the therapy (American Psychological Association 2017). The seminal paper by Abbring and Van Den Berg (2003b) allows the time to treatment to be endogenously censored and also allows the treatment effect to depend on the exposure to the treatment. The cost of doing so is that it is necessary to identify the full model, i.e. the underlying hazard functions of two duration variables (main duration and time to treatment). To that end, Abbring and Van Den Berg (2003b) have relied on the separability of the MPH model. This paper generalizes their results to a nonseparable context. Specifically, drawing on the results for nonseparable competing risks models shown here, the paper demonstrates that the treatment is identified if the time to treatment and the time to exit follow an augmented bivariate nonseparable duration model based on (1). The major cost of this generality is that the paper falls short of suggesting estimation procedures.

In all models discussed in this paper, identification of the function r draws on insights from the literature on nonadditive random functions, see Matzkin (2003) and Chesher (2007). In particular, r is assumed to be strictly increasing in the unobserved component. This is a natural generalization of the MPH assumption r(x, v) = id(v). Important MPH features such as spurious negative duration dependence in the empirical hazard translate to the general nonseparable setup in a straightforward way. Thus, this paper provides a link between the literature on identification in duration mixture models and the literature on identification in nonseparable regression models.

Nonparametric estimation of nonseparable duration models combines two challenges. First, nonparametric estimation of duration models, in particular the mixed proportional hazard model, is nontrivial. Second, nonseparable econometric models in general are hard to estimate (references are given below). This paper restricts its focus to identification, which is also the major limitation of the paper.

The paper is structured as follows. In Section 2, the nonseparable model (1) is formally introduced and motivated. Identification results in single- and multiple-spell models are presented in Section 3. Section 4 presents identification of competing risks models and Section 5 presents identification of duration treatment effect models. Section 6 concludes. All proofs and further supplementary results are in the Online Appendix.

2 Model and Motivation

For illustrative purposes, we build our exposition on a labor market example.^[3] Suppose that n unemployed individuals are searching for a job. Denote by T_i, i = 1, …, n the duration of unemployment of individual i, with T₁, …, T_n assumed to be independent and identically distributed. Let θ(t) be the unconditional hazard of T_i at elapsed length of unemployment t ≥ 0, $\theta(t) = \lim_{h \to 0} P\{T \in [t, t+h) \mid T \ge t\}/h$ (the index i is omitted whenever this is possible). Let the random vector X_i represent observed characteristics of individual i that impact the duration T_i. X_i is assumed to have realized (just) prior to entry into unemployment, so that its value is determined at t = 0. Typical examples are wage and experience in the preceding job spell, highest degree of education obtained until the moment of inflow into unemployment, and gender. The realizations of X_i, denoted by small letters x, are assumed to be elements of a set $\mathcal{X} \subset \mathbb{R}^k$, where k is a positive integer. By definition, X_i is time-constant. Time-varying covariates are considered in Section 3.2. Further, let V_i be a one-dimensional nonnegative unobserved random variable. V_i represents a single index of all unobserved individual characteristics that impact T_i. A typical example of factors contained in V_i is noncognitive skills. Analogously to the definition of X_i, V_i is required to be time-constant and determined before the start of the individual spell.

Henceforth, θ(t|X, V) denotes the individual hazard with notation in analogy to conditional probabilities. Conditionally on X and V, it is assumed to be fully specified. The empirical (or observed) hazard is denoted by θ(t|X). An important property of mixture hazard models (i.e. of hazard models allowing for unobserved heterogeneity V) is that T is a random variable even conditionally on realizations of X and V. The residual randomness, referred to as “the effect of luck” by Lancaster (1979), has been only scarcely discussed in the literature and has no established meaning. Lancaster (1990) interprets it as some intrinsic individual uncertainty, whereas Heckman (1991) distinguishes between characteristics known to the individual, which are captured by V, and characteristics that are unknown to the individual and subsumed by the residual randomness. Decisions of agents are therefore based solely on the value of X, V. Heckman (1991) acknowledges, however, that this distinction has an arbitrary character. This paper follows both Lancaster (1979) and Heckman (1991) and assumes that the residual randomness is due to idiosyncratic factors that are independent of the factors determining the decisions of the agents. This interpretation is compatible with the well-known fact that the transformed duration $\Theta(T \mid X, V) := \int_0^T \theta(t \mid X, V)\, dt$ is independent of X and V, see e.g. Lancaster (1990).

With these preliminaries, consider the following model, which is referred to here as the generalized mixed hazard (GMH) model:

(2)  $\theta(t \mid X, V) = \lambda(t, X)\, r(X, V)$

The functions λ and r are unknown to the econometrician. Two important special cases are the mixed proportional hazard (MPH) model,

(3)  $\theta(t \mid X, V) = \lambda_{MPH}(t)\, \theta_0(X)\, V$

and the mixed hazard (MH) model,

(4)  $\theta(t \mid X, V) = \lambda_{MH}(t, X)\, V.$

Both the MPH and MH models impose multiplicative separability of V. As observed by Lancaster (1985) and Chesher (2002), multiplicative separability of V effectively reduces the number of sources of stochastic variation from 2 to 1. The GMH model, on the contrary, allows X and V to interact through the function r in an arbitrary way. Furthermore, similarly to the MH model, the GMH model allows the duration dependence λ to depend on observed covariates.

Section A in the Appendix briefly discusses the relation of the GMH model to other existing models. In a nutshell, the accelerated failure time (AFT) model and its generalizations, the generalized AFT by Ridder (1990) and the extended generalized AFT by Brinch (2011), are not nested in the GMH model. Conversely, the GMH model is also not nested in those models.

The following examples motivate the GMH model.

Example 1: unobserved treatment heterogeneity. Van den Berg and Van der Klaauw (2006) evaluate the effect of counseling and monitoring on the re-employment chances of unemployed workers within a social experiment in the Netherlands. At inflow into unemployment, workers are either assigned at random to counseling and monitoring (Z = 1) or to a control group with no such services (Z = 0). The model estimated in their paper is the MPH model

$\theta(t \mid X, Z, V) = \lambda(t) \exp\{X\beta + \delta Z + \ln V\}$

where X denotes a list of controls. Separability of Z and V implies that the effect of the treatment does not depend on unobserved noncognitive abilities such as locus of control or on intrinsic motivation. However, recent empirical and theoretical evidence suggests that individuals with higher levels of locus of control tend to exert more effort when searching for a job (Caliendo et al. 2015).^[4] This finding suggests that individuals with higher locus of control may benefit more from the counseling and monitoring process through more active participation. The simplest model that can capture this relationship is

(5)  $\theta(t \mid X, Z, V) = \lambda(t) \exp\{X\beta + \delta Z + \gamma Z \ln V + \ln V\}.$

This is a hazard version of the location-scale model in quantile regression, see He (1997). The coefficient γ captures the unobserved effect heterogeneity of Z.
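To make the role of γ concrete, the following minimal sketch evaluates the treatment hazard ratio implied by model (5), which equals exp{δ + γ ln V} = e^δ V^γ. The function names and the parameter values used in the checks are illustrative assumptions and are not taken from Van den Berg and Van der Klaauw (2006).

```python
import math

def hazard(baseline, xb, z, v, delta, gamma):
    # Model (5): lambda(t) * exp{X beta + delta Z + gamma Z ln V + ln V}.
    # `baseline` stands for lambda(t) and `xb` for the index X beta.
    return baseline * math.exp(xb + delta * z + gamma * z * math.log(v) + math.log(v))

def treatment_hazard_ratio(v, delta, gamma):
    # Ratio of treated (Z = 1) to untreated (Z = 0) hazard for a given v.
    # The baseline and X beta cancel, so arbitrary values are plugged in.
    return hazard(1.0, 0.0, 1, v, delta, gamma) / hazard(1.0, 0.0, 0, v, delta, gamma)
```

With γ = 0 the ratio is e^δ for every v, which is the MPH case. With γ > 0 the ratio increases in v, so individuals with higher V benefit more from the treatment.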

Example 2: heterogeneous duration dependence. The baseline hazard function λ_MPH in model (3) represents the genuine duration dependence of the hazard. A commonly discussed reason for negative duration dependence of the unemployment hazard is stigma. Stigma occurs when the willingness of employers to hire unemployed individuals decreases as the unemployment spell lengthens (Nüß 2018). For example, screening models predict that employers use the length of the current unemployment spell as a productivity signal, see e.g. Lockwood (1991). Recent experimental studies use fictitious job applications to actual job postings to analyze the effect of unemployment duration on the call-back rate. These studies provide evidence that the stigma-driven duration dependence might depend in a complex way on individual characteristics, in particular on the level of education and experience, as well as on the type of occupation. As an example, Eriksson and Rooth (2014) find large drops of call-back rates over time for low- and medium-skilled occupations but little to no effect for high-skilled jobs. Similar effects are found by Weber (2014). Farber et al. (2016) find that longer employment histories compensate for long ongoing unemployment spells, thus reducing or even eliminating stigma effects. Kroft et al. (2013) find significant heterogeneity of the duration dependence across levels of tightness of the local labor market. These empirical findings cannot be incorporated in a separable model. In particular, separability of λ and X implies that individual characteristics only shift the level of duration dependence.

The nonseparable duration dependence λ(t, X) in model (2), on the contrary, allows for a flexible interaction between time and observed characteristics. As an example, consider the Weibull specification

(6)  $\lambda(t, x) = \lambda_0\, \lambda_1(x)\, t^{\lambda_1(x) - 1}$

with a scale parameter λ₀ and a shape parameter λ₁ whose value is now allowed to depend on the value of the observed characteristics.^[5] The findings of Farber et al. (2016) can be modeled as λ₁ < 1 for shorter employment histories and λ₁ = 1 for long employment histories. Alternatively, flexibility can be achieved with a piecewise-constant baseline hazard in which the coefficients depend on x.
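As an illustration, the Weibull specification (6) with a covariate-dependent shape can be sketched as follows. The threshold of 10 years, the shape value 0.6 and the scale 0.2 are hypothetical choices made only for this example.

```python
def shape(x):
    # Hypothetical shape function lambda_1(x), with x the length of the
    # employment history in years: short histories get negative duration
    # dependence (shape < 1), long histories get none (shape = 1).
    return 0.6 if x < 10 else 1.0

def weibull_hazard(t, x, scale=0.2):
    # Equation (6): lambda(t, x) = lambda_0 * lambda_1(x) * t**(lambda_1(x) - 1)
    a = shape(x)
    return scale * a * t ** (a - 1.0)
```

For a short employment history the hazard declines with elapsed duration, whereas for a long history it stays flat at the scale parameter, which mimics the absence of stigma effects.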

Example 3: heterogeneous measurement error. Consider a case in which the duration variable is measured with error, see e.g. Abrevaya and Hausman (1999). Common reasons for measurement errors in the duration context are time aggregation, Bergström and Edin (1992), as well as under-reporting in retrospective surveys, Mathiowetz and Duncan (1988) and Aït-Sahalia (2007). Lancaster (1985) shows that a multiplicative measurement error in the duration variable, together with a Weibull specification of the latent baseline hazard, leads to an MPH model, with the multiplicative V resulting from the measurement error. His model imposes that the measurement error does not depend on observed covariates. In general, however, the measurement error will depend on demographic characteristics, Bound et al. (1989). Accounting for that possibility naturally leads to a nonseparable model, which is now demonstrated. Denote by T* the latent duration variable, which might be measured imprecisely. Assume that the observed duration T satisfies

$T = g(T^*, \eta) = T^* \cdot (1/\eta), \qquad \eta = k(X, V),$

where η is a measurement error, k is some unknown function and $V \perp (X, T^*)$. Let the conditional distribution of the latent variable follow a Weibull specification, $P\{T^* \le t \mid X = x\} = 1 - \exp\{-t^{\alpha}\psi(x)\}$, with α being an (unknown) scalar and ψ an unknown function. Then the individual hazard of T satisfies $\theta(t \mid x, v) = \alpha t^{\alpha-1}\psi(x)\, k(x, v)^{\alpha}$, which is a special case of model (2) with $\lambda(t, x) = \alpha t^{\alpha-1}\psi(x)$ and $r(x, v) = k(x, v)^{\alpha}$.
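The step from the Weibull specification to the individual hazard can be spelled out as follows. This is a short sketch that additionally assumes k(x, v) > 0 and α > 0; the second equality uses the independence of V and (X, T*).

```latex
\begin{align*}
P\{T > t \mid X = x, V = v\}
  &= P\{T^* > t\,k(x,v) \mid X = x, V = v\} \\
  &= P\{T^* > t\,k(x,v) \mid X = x\} \\
  &= \exp\{-(t\,k(x,v))^{\alpha}\,\psi(x)\}, \\
\theta(t \mid x, v)
  &= -\frac{\partial}{\partial t}\ln P\{T > t \mid X = x, V = v\}
   = \alpha t^{\alpha-1}\,\psi(x)\,k(x,v)^{\alpha}.
\end{align*}
```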

Roadmap. The intuition behind the choice of models/settings in this paper is the following. The ultimate goal of the paper is to show identification of nonseparable treatment effect models, in which a treatment S is administered at random points in time and may be censored by the outcome T. It will thus be unobserved for those individuals who exit the state of interest before being treated. A major insight here is that before the treatment kicks in (i.e., for realizations t, s of T, S with t < s), the bivariate hazard model of T, S is a competing risks model. Thus, identification in competing risks models needs to be studied before identification in treatment effects models. In turn, the building blocks of a competing risks model are hazard models for each of the competing risks. The paper thus first presents two different strategies for identification of hazard models of the type (2). The first strategy consists of a special regressor that is modeled separably in a certain sense. The second strategy relies on multiple spells. Then the paper applies these strategies to competing risks models and, ultimately, applies the result for competing risks models to identify treatment effect models. This procedure is depicted in Figure 1.

Figure 1: Overview of identification models and strategies.

3 Identification Under Random Right Censoring

3.1 Two Basic Assumptions

Denote by F_T,X and F_X,V the distributions of (T, X) and (X, V), respectively, and by G the marginal distribution of V. In this section, T is assumed to be observed in the sense that its survival function S(t|x) = P(T > t|X = x) can be consistently estimated. This definition of observability allows for random right-censoring. Furthermore, following a convention in the analysis of identification, the paper ignores sampling error: the distribution of the observables F_T,X is assumed to be known to the econometrician, see e.g. Lewbel (2019). We call a triple $S = (\lambda, r, F_{X,V})$ that satisfies (2) a model structure. Each structure implies exactly one distribution F_T,X. Two structures are observationally equivalent if they imply the same distribution F_T,X. A feature of a structure, say λ, is identified under a set of assumptions if, under these assumptions, the value of the feature does not vary among any set of observationally equivalent structures, Chesher (2003). We also say that model (2) is identified under a given set of assumptions if no two structures $S, S'$ with $S \neq S'$ that satisfy these assumptions are observationally equivalent.

Consider the following two assumptions.

Assumption A1: the function $r: \mathcal{X} \times \mathbb{R}_+ \to \mathbb{R}$ is (i) nonnegative and (ii) strictly increasing in its second component.

A1(i) ensures that the hazard function is nonnegative. A1(ii) is borrowed from the literature on nonseparable regression models, Matzkin (2003), where it simply implies a monotonic relationship between unobservables and outcomes. In the case of duration models, however, it has an important additional implication. Good risks (individuals with high values of V) have a higher exit rate out of unemployment for every fixed t and X than bad risks (low values of V). Thus, the distribution of V changes over time t spent in the state of interest (e.g., unemployment, a certain disease, etc.), with the proportion of bad risks increasing. This process, called weeding out, Lancaster (1979), or dynamic selection, Van den Berg (2001), creates a spurious negative duration dependence of the hazard. Formally, the semi-elasticity of the observed hazard of the GMH model with respect to time fulfills

$\frac{\partial \ln \theta(t \mid x)}{\partial t} = \frac{\partial \ln \lambda(t, x)}{\partial t} - \frac{\operatorname{Var}[V_x \mid T \ge t, X = x]}{E[V_x \mid T \ge t, X = x]}\, \lambda(t, x),$

where we set V_x ≔ r(x, V) and Var denotes the variance of a random variable. The above equality implies that $\frac{\partial \ln \theta(t \mid x)}{\partial t} < \frac{\partial \ln \lambda(t, x)}{\partial t}$. Put in words, for each subgroup of individuals characterized by a given value x, observed duration dependence is more negative than the true duration dependence. A1 is trivially fulfilled for the MPH and MH models.
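The weeding-out formula can be checked numerically for a fixed x. The sketch below assumes a hypothetical duration dependence λ(t) = 1 + 0.5t, a hypothetical r(v) = 0.5 + 2v and V uniform on [0, 1], approximated on a midpoint grid. It compares a finite-difference derivative of the log observed hazard with the right-hand side of the formula.

```python
import math

N_GRID = 2000
V_GRID = [(i + 0.5) / N_GRID for i in range(N_GRID)]  # midpoint grid on [0, 1]

def lam(t):
    return 1.0 + 0.5 * t            # hypothetical lambda(t, x) for a fixed x

def big_lam(t):
    return t + 0.25 * t * t         # its integral from 0 to t

def r(v):
    return 0.5 + 2.0 * v            # hypothetical r(x, v), strictly increasing in v

def survivor_moments(t):
    # Mean and variance of V_x = r(V) among survivors (T >= t). Survivors are
    # reweighted by the conditional survival probability exp(-Lambda(t) * V_x).
    w = [math.exp(-big_lam(t) * r(v)) for v in V_GRID]
    total = sum(w)
    m1 = sum(wi * r(v) for wi, v in zip(w, V_GRID)) / total
    m2 = sum(wi * r(v) ** 2 for wi, v in zip(w, V_GRID)) / total
    return m1, m2 - m1 * m1

def log_observed_hazard(t):
    # theta(t | x) = lambda(t, x) * E[V_x | T >= t, X = x]
    m1, _ = survivor_moments(t)
    return math.log(lam(t) * m1)

def check(t, h=1e-4):
    # Left-hand side by central differences, right-hand side from the formula.
    lhs = (log_observed_hazard(t + h) - log_observed_hazard(t - h)) / (2 * h)
    m1, var = survivor_moments(t)
    true_slope = 0.5 / lam(t)       # d ln lambda / dt
    rhs = true_slope - var / m1 * lam(t)
    return lhs, rhs, true_slope
```

Calling `check(1.0)` returns the two sides of the equality together with the slope of the true duration dependence, so the downward bias of the observed slope can be read off directly.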

To describe the set of observationally equivalent models under A1, assume that the distribution G is strictly increasing and denote by G⁻¹ its inverse. Define the function $\bar{r}: \mathcal{X} \times \mathbb{R}_+ \to \mathbb{R}$, $\bar{r}(x, y) = r(x, G^{-1}(y))$. Furthermore, let $\bar{V} = G(V)$. Then $\bar{V}$ is uniformly distributed on [0, 1], $\bar{r}$ is nonnegative and strictly increasing in its second argument, and it holds that

(7)  $\bar{r}(x, \bar{V}) = r(x, V)$

for each $x \in \mathcal{X}$. Thus, a normalization assumption is necessary.
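The need for a normalization can be illustrated with a small sketch. It assumes, purely for illustration, that V is unit exponential, so that G(v) = 1 − exp(−v), and that r(x, v) = x + v² with x ≥ 0. The re-parameterized pair consisting of r̄ and V̄ = G(V) then produces exactly the same values as the original pair.

```python
import math

def G(v):
    # Hypothetical distribution function of V (unit exponential).
    return 1.0 - math.exp(-v)

def G_inv(y):
    # Inverse of G on [0, 1).
    return -math.log(1.0 - y)

def r(x, v):
    # Hypothetical r, nonnegative and strictly increasing in v for x, v >= 0.
    return x + v ** 2

def r_bar(x, y):
    # Re-parameterized function from equation (7): r_bar(x, y) = r(x, G^{-1}(y)).
    return r(x, G_inv(y))
```

Since `r_bar(x, G(v))` coincides with `r(x, v)` for every v, the data cannot distinguish the two parameterizations.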

Assumption A2: the unobserved random variable V is (i) uniformly distributed on the interval [0, 1] and (ii) independent of X.

Assumption A2 (i) is similar to normalization assumptions in the nonseparable regression context, see Matzkin (2003). Assumption A2 (ii) is a standard assumption in the literature on hazard mixture models. Existing single-spell identification results crucially depend on it. Honoré (1993) shows that multiple-spell MH models are identified without imposing A2 (ii). This point is discussed in Section 3.3 below.

Example 1 (unobserved treatment heterogeneity) continued. The location-scale model (5) satisfies assumption A1 under the restriction γZ ≥ 0.

Under A1 and A2, identification boils down to retrieving λ and r from the data. It is clear, however, that this cannot be achieved without further assumptions. There are two reasons for this. First, without further restrictions on λ and r, it is possible to shift separable components depending on X between λ and r without changing the DGP. As an example, the structures (λ₁, r₁), (λ₂, r₂) with

(8)  $\lambda_1(t, x) = \psi(t) \exp\{\beta_1 x_1 + \beta_2 x_2\}, \qquad r_1(x, v) = \exp\{\beta_3 x_3 + v\}$

(9)  $\lambda_2(t, x) = \psi(t) \exp\{\beta_1 x_1\}, \qquad r_2(x, v) = \exp\{\beta_2 x_2 + \beta_3 x_3 + v\}$

are observationally equivalent. Second, assume for the moment that r does not depend on x, so that the problem of shifting components of x does not exist. The following lemma states the (well understood) fact that the MH model is not identified without further assumptions.

Lemma 1.

Assume that r(x₁, V) = r(x₂, V) = r(V) for any two $x_1, x_2 \in \mathcal{X}$ and define $\bar{V} = r(V)$. Denote by $\bar{G}$ the distribution of $\bar{V}$. Then there exists a generalized hazard $\tilde{\lambda}$ and a distribution $\tilde{G}$ of a nonnegative random variable $\tilde{V}$, such that the structures $(\lambda, \bar{G})$ and $(\tilde{\lambda}, \tilde{G})$ are observationally equivalent.

Thus, identification is hampered by two distinct problems. The first one arises from the interplay of λ and r and the second one from the MH model. We present solutions to these problems in the next sections.
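The first problem, the shifting of a separable component between λ and r in structures (8) and (9), can be verified directly. In the sketch below, the function ψ and the coefficient values are hypothetical and serve only to show that both structures imply the same individual hazard.

```python
import math

B1, B2, B3 = 0.4, -0.7, 1.1         # hypothetical coefficients beta_1..beta_3

def psi(t):
    return 1.0 + t                   # hypothetical common time component

def hazard_structure_1(t, x, v):
    # Structure (8): x_2 enters through lambda_1.
    lam1 = psi(t) * math.exp(B1 * x[0] + B2 * x[1])
    r1 = math.exp(B3 * x[2] + v)
    return lam1 * r1

def hazard_structure_2(t, x, v):
    # Structure (9): x_2 enters through r_2 instead.
    lam2 = psi(t) * math.exp(B1 * x[0])
    r2 = math.exp(B2 * x[1] + B3 * x[2] + v)
    return lam2 * r2
```

Because the two hazards agree for every (t, x, v), the implied distributions of (T, X) agree as well, and the data cannot reveal whether x₂ belongs to λ or to r.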

3.2 Nonparametric Identification in Single-Spell Models

3.2.1 Identification Under a Special Regressor Assumption

At the core of the single-spell approach presented in this paper is the assumption that at least one regressor, say X₁, can be excluded from the function r.

Assumption A3. There exists a random subvector X₁ with dimension d₁ ≥ 1 and domain $\mathcal{X}_1$, such that X = (X₁, X₂) has realizations in $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \subset \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$ and

(10)  $r(x_1, x_2, v) = r(x_1^*, x_2, v)$

for all $x_1, x_1^* \in \mathcal{X}_1$, $x_2 \in \mathcal{X}_2$ and all $v \in \mathbb{R}_+$.

A3 justifies the notation r(x₂, V). Unlike the special regressor assumption of Lewbel (1998), the special treatment here relates to the effect heterogeneity and not to the joint distribution of observables and unobservables. Therefore, knowledge of the selection process alone is not sufficient for motivating A3. One approach to discipline the model choice is to rely on evidence on treatment effect heterogeneity provided by studies with particularly rich numbers of covariates. The following example illustrates this point.

Example 1 (unobserved treatment heterogeneity) continued. In the context of example 1, suppose the researcher assumes that the major unobservable characteristics behind V are some characteristics of the case worker assigned to the individual. Evidence for interaction of labor market treatments and characteristics of the case worker can be found for example in Knaus et al. (2017). They use flexible estimators and a rich administrative dataset with Swiss unemployed to study the effect heterogeneity of job search programs. The study finds no heterogeneity of the effect with respect to most of the case worker characteristics. For example, the estimated effects are homogeneous with respect to tenure, age and gender of the case worker, as well as with respect to whether case worker and the unemployed have the same gender. These results could be used to make an informed model choice when case worker characteristics are unobserved.

For the further discussion, the following additional notation is necessary. For a fixed x₂, define $\Lambda_{x_2} := \Lambda(\cdot, \cdot, x_2)$ and $V_{x_2} := r(x_2, V)$. Further, let $G_{x_2}$ and $\mathcal{L}_{x_2}$ be the distribution of $V_{x_2}$ and the corresponding Laplace transform.

To characterize identification under assumption A3, consider the x₂-strata model $(\Lambda_{x_2}, G_{x_2})$, where x₂ is a given value of X₂. This sub-model can be identified using known results for the identification of the MH model. Such a procedure is common in empirical studies and not specific to duration models: models for subpopulations, indexed by the values of some regressor, are identified and estimated separately. As an example, researchers almost always estimate unemployment duration models separately for men and women.

However, it is not clear that this procedure says anything about the causal effect of the variable used to index the subpopulations. As an example, studies on stigma effects within groups with different education levels are silent on the effect of education on stigma, see example 2 above for references. The following proposition states that under assumptions A1, A2 and A3 (and under additional regularity assumptions), identifying the submodels $(\Lambda_{x_2}, G_{x_2})$ can be used to identify the function r, and thus the causal effect of the indexing variable X₂.

Proposition 1:

(Identification with a special regressor). Suppose that model (2) holds.

(i) If λ is known and the moment generating function of r(x, V) is finite for all $x \in \mathcal{X}$, then under assumptions A1 and A2, the function r is identified.

(ii) Suppose that assumptions A1–A3 are satisfied and that the moment generating function of r(x₂, V) is finite for all $x_2 \in \mathcal{X}_2$. Then model (2) is nonparametrically identified with single spells if and only if for each $x_2 \in \mathcal{X}_2$ the corresponding single-spell x₂-strata MH model structure $(\lambda_{x_2}, G_{x_2})$ is nonparametrically identified.

To interpret part (i) of Proposition 1, transform model (2) in the following way

(11) $\ln \int_0^T \theta(t \mid X, V)\,dt = \ln \int_0^T \lambda(t, X)\,dt + \ln r(X, V).$

By rearranging, we obtain

(12) $Y = k(X, V) + \varepsilon,$

where $Y := \ln \int_0^T \lambda(t, X)\,dt$, $k(X, V) := -\ln r(X, V)$ and $\varepsilon := \ln \epsilon$ with $\epsilon := \int_0^T \theta(t \mid X, V)\,dt$. Y is observed because λ is known by assumption. The error term ϵ is a unit exponential random variable and is independent of X and V.^[6] Note that the identification result remains valid under any known distribution of ϵ as long as its moment generating function is finite. Thus, Proposition 1 implies that the identification result of Matzkin (2003) continues to hold if we add a second source of unobserved stochastic variation with a known distribution. Hence, the doubly-stochastic regression model (12) with a known (or identified) transformation behaves as a standard nonseparable regression model. The assumptions needed for identification are identical in those two cases. The distribution of k(X, V) can be deduced from the distributions of Y and ϵ. Under the normalization assumption A2 (i), knowledge of the distribution of k identifies k.
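The step from the distribution of Y to the distribution of r(x, V) can be illustrated with a small simulation. Since exp(Y) equals ϵ / r(X, V) with ϵ unit exponential, the conditional survival function of exp(Y) at u is the Laplace transform of r(x, V) evaluated at u. The following is a minimal sketch under assumptions that are not part of the paper: the integrated hazard is taken to be Λ(t, x) = t^(1 + x), V is a fair coin on {0, 1}, and r(x, v) = 1 + x v. All function names are hypothetical.

```python
import math
import random

def laplace_r(s, x):
    """Laplace transform of r(x, V) for the assumed r(x, v) = 1 + x * v, V in {0, 1}."""
    return 0.5 * math.exp(-s) + 0.5 * math.exp(-(1.0 + x) * s)

def simulate_exp_y(x, n=20000, seed=0):
    """Draw exp(Y) = Lambda(T, x) for durations T with hazard lambda(t, x) * r(x, V)."""
    rng = random.Random(seed)
    shape = 1.0 + x                          # assumed: Lambda(t, x) = t ** (1 + x)
    draws = []
    for _ in range(n):
        v = rng.choice((0.0, 1.0))           # unobserved heterogeneity
        r = 1.0 + x * v                      # not multiplicatively separable in (x, v)
        unit_exp = rng.expovariate(1.0)      # integrated hazard is unit exponential
        t = (unit_exp / r) ** (1.0 / shape)  # duration solving r * Lambda(t, x) = unit_exp
        draws.append(t ** shape)             # exp(Y), computable because Lambda is known
    return draws

x, u = 1.0, 0.5
draws = simulate_exp_y(x)
empirical = sum(d > u for d in draws) / len(draws)
print(empirical, laplace_r(u, x))            # the two numbers should be close
```

The sketch only shows that the observable distribution of Y pins down the Laplace transform of r(x, V); recovering r itself additionally uses the normalization in A2 (i).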

The important statement behind part (ii) of Proposition 1 is that only strategies that identify each submodel indexed by x₂ on its own lead (under assumption A3) to identification of λ. One implication of this result is the following. Since each strategy to identify the nonseparable model must be able to identify each strata MPH model on its own, and since each strata MPH model requires a normalization assumption, the nonseparable model requires a set of normalization assumptions, one for each x₂-stratum. I discuss this in detail in the next section.

3.2.2 Two Identification Results Using the Special Regressor Assumption

Results (i) and (ii) of Proposition 1 require the moment generating functions of $(V_{x_2})_{x_2 \in \mathcal{X}_2}$ to be finite. This considerably restricts the class of models for which these results hold. As an example, it rules out the log-normal distribution and the Gamma distributions for particular parameter values.^[7]

We now present two strategies that complement assumption A3 and drop the requirement that the moment generating functions are finite. The choice of these strategies is motivated by two reasons. First, they involve well-known sources of identification. The first one utilizes separable variation of at least one covariate. The second one uses time-varying covariates. Thus, practically any strategy to identify the MH model can be used to identify the nonseparable GMH model. Second, these two strategies differ in their success in identifying a competing risks model, which is discussed in Section 4 below.

Assumption A4. (i) The special regressor X₁ from assumption A3 is separable from the genuine duration dependence, that is

(13) $\lambda(t, X) = \mu(t, X_2)\,\phi(X_1, X_2).$

(ii) For every $x_2 \in \mathcal{X}_2$, the set $\{\phi(x_1, x_2) : x_1 \in \mathcal{X}_1\}$ contains a non-empty open subset of (0, ∞).

Under assumptions A3 and A4 (i), model (2) can be written as

(14) $\theta(t \mid X, V) = \theta(t \mid X_1, X_2, V) = \mu(t, X_2)\,\phi(X_1, X_2)\,r(X_2, V).$

For a fixed X₂ = x₂, (14) reduces to an MPH model. Equation (14) can therefore be interpreted as a generalized MPH model. A4 (ii) requires that there is sufficient variation in the separable regressor X₁. This is a modification of a standard assumption in the MPH context, see e.g. assumption 3 in Elbers and Ridder (1982) and assumption 6b in Van den Berg (2001).^[8] The variation in ϕ is required for every element x₂. This requires that there is variation in X₁ for each value of X₂. In addition, A4 (ii) implies that (at least one element of) X₁ is continuous. A4 (ii) can be replaced with the following weaker assumption: for every $x_2 \in \mathcal{X}_2$, the set $\mathcal{X}_1$ contains at least two elements $x_1, x_1'$, which may depend on x₂, such that $\phi(x_1, x_2) \neq \phi(x_1', x_2)$.
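To see what "MPH within each stratum" means in practice, the following minimal sketch evaluates a hazard of the form (14) and checks that the hazard ratio between two values of x₁ is constant in t and v for a fixed x₂, while it is free to differ across strata. The functional forms of μ, ϕ and r below are hypothetical choices made only for illustration.

```python
import math

def mu(t, x2):
    """Assumed Weibull-type duration dependence whose shape depends on x2."""
    return (1.0 + x2) * t ** x2

def phi(x1, x2):
    """Assumed effect of the special regressor x1, allowed to vary with x2."""
    return math.exp(x1 * (0.5 + x2))

def r(x2, v):
    """Assumed nonseparable interaction of x2 and the unobservable v."""
    return math.exp(x2 * v)

def theta(t, x1, x2, v):
    """Hazard of the generalized MPH form (14)."""
    return mu(t, x2) * phi(x1, x2) * r(x2, v)

def hazard_ratio(t, x1_a, x1_b, x2, v):
    return theta(t, x1_a, x2, v) / theta(t, x1_b, x2, v)

# Within the stratum x2 = 1 the ratio does not move with t or v ...
print(hazard_ratio(0.5, 1.0, 0.0, 1.0, 0.3), hazard_ratio(2.0, 1.0, 0.0, 1.0, 0.9))
# ... but it takes a different value in the stratum x2 = 0.
print(hazard_ratio(0.5, 1.0, 0.0, 0.0, 0.3))
```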

The alternative assumption A4′ requires the separable covariate X₁ to be time-varying, i.e. the value of X₁ depends on time, and $X_1 = (X_1(t))_{t \in \mathbb{R}_+}$ is a stochastic process. X₂ and V are, as before, time-constant random variables. x₁ denotes a path of X₁ and is a deterministic function of time.

Assumption A4′. (i) X₁ is a predictable process. (ii) The hazard of T at each t depends only on the value X₁(t). (iii) For each x₂, there are two paths of X₁, z₁, z₂ with z₁(t) = z₂(t) for all t in some fixed open interval (t₀, t₁) and S{t₀∣z₁, x₂} ≠ S{t₀∣z₂, x₂}, where S(t∣x₁, x₂) is the observed survival function.

The predictability of X₁ is commonly invoked in proportional models, see e.g. Kalbfleisch and Prentice (1980) and the discussion in Section 4.2 in Van den Berg (2001). The value of X₁(t) must be known just before t. This precludes anticipation by the individual which is not observed by the econometrician. A typical example is when an individual expects a child, which might affect the current job search. If this is not observed by the econometrician, then the process is not predictable. Assumption A2 (ii) now means that $V \perp\!\!\!\perp (X_1(t), X_2)$ for each t. Together with predictability, this implies that the current value of X₁ “only depends on past and outside random variation”, see Van den Berg (2001) p. 3399 as well as Andersen et al. (1996). “Outside variation” means that X₁ is not determined within the model. A4′ (iii) generalizes Condition 1 in Brinch (2007).^[9] Here, both the paths z₁ and z₂, as well as the interval (t₀, t₁), are allowed to depend on x₂. A4′ (iii) excludes processes that are constant over time (such as initial endowments), or do not vary across individuals (such as calendar time), or are deterministic functions of time (such as years of experience or age, measured throughout the spell). These requirements make it possible to distinguish between the genuine duration dependence and the time-varying effect of X₁. In addition, processes X₁ that are fully determined by the value of X₂ are also excluded. With A4′, the hazard can be written as

(15) $\theta(t \mid X, V) = \theta(t \mid X_1(t), X_2, V) = \lambda(t, X_1(t), X_2)\,r(X_2, V).$

There are two important implications of (15). First, at some elapsed duration t₀, past variation of X₁ (i.e. variation at t < t₀) impacts the hazard only through dynamic selection, that is, through the distribution of r(x₂, V) among survivors at t₀. This insight provides a source of identifying variation. Second, note that the hazard function at t does not depend on the total path of X₁ but only on its value X₁(t). Thus, information about future values $X_1(\tilde{t})$, $\tilde{t} > t$, is not allowed to impact the hazard at t, unless it is accounted for by X₁(t). This can be viewed as a no-anticipation assumption (Abbring and Van Den Berg 2003b).
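The first implication can be made concrete with a small numerical sketch. Two covariate paths agree after t₀ = 1 but differ before, so the structural hazard (15) is the same on both paths at t = 1.5, yet the hazard observed without conditioning on V differs because the surviving populations differ. The sketch assumes, purely for illustration, a fixed x₂, λ(t, x₁(t), x₂) = exp(β x₁(t)), and Gamma-distributed r(x₂, V) with shape k and scale α, for which the mean of r(x₂, V) among survivors at t is kα / (1 + αΛ(t)).

```python
import math

def integrated_hazard(t, path, beta, step=1e-3):
    """Midpoint-rule integral of the assumed lambda(s, x1(s)) = exp(beta * x1(s)) over [0, t]."""
    n = int(round(t / step))
    return sum(math.exp(beta * path((i + 0.5) * step)) for i in range(n)) * step

def observed_hazard(t, path, beta, k, alpha):
    """Hazard given the covariate path only, under Gamma(k, scale alpha) heterogeneity."""
    mean_r_among_survivors = k * alpha / (1.0 + alpha * integrated_hazard(t, path, beta))
    return math.exp(beta * path(t)) * mean_r_among_survivors

def z1(s):
    return 0.0                       # covariate equal to 0 throughout

def z2(s):
    return 1.0 if s < 1.0 else 0.0   # differs from z1 only before t0 = 1

h1 = observed_hazard(1.5, z1, beta=1.0, k=1.0, alpha=1.0)
h2 = observed_hazard(1.5, z2, beta=1.0, k=1.0, alpha=1.0)
print(h1, h2)                        # same current covariate value, different observed hazards
```

The gap between the two observed hazards is entirely due to selection before t₀, which is the variation that a condition such as A4′ (iii) requires.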

Example 4: Job search. To give an example of a case with predictability and no anticipation, consider again the case in which an individual is searching for a job, and let T be the duration until a new job is found while searching on the job and X₁(t) the number of children. In certain cases, X₁(t) might be random and depend on some extrinsic uncertainty that is beyond the domain of impact of the individual. As an example, this might be random variation in the number of children caused by the randomness of an in vitro fertilization (IVF) process.^[10] It holds that $X_1(t) \perp\!\!\!\perp V$ since the IVF process is idiosyncratic and hence not dependent on factors influencing the job market history. In addition, it is plausible to assume that only current values of X₁ impact the search outcome. The past number of children impacts the current search outcome only through the current number of children. Moreover, both predictability and no anticipation can be defended on the grounds that the IVF outcome is equally uncertain for the individual and the econometrician. The intentions of the individual to have additional children are known to the econometrician through inflow into the IVF register.

In addition to assumption A4 (A4′), we need the following regularity and normalization assumptions.

Assumption A5. The function λ takes only nonnegative values. For each t ∈ [0, ∞) and each $x \in \mathcal{X}$, $\Lambda(t, x) := \int_0^t \lambda(w, x)\,dw$ exists and is finite.

Assumption A6. For every $x_2 \in \mathcal{X}_2$, there is a known $x_1^* = x_1^*(x_2) \in \mathcal{X}_1$ and a known t* = t*(x₂) ∈ [0, ∞) such that $\phi(x_1^*, x_2) = 1$ and $\Lambda(t^*, x_2) = 1$.

Assumption A6′. For every $x_2 \in \mathcal{X}_2$, there is a known t* = t*(x₂) and a known $x_1^* = x_1^*(x_2)$ such that $\lambda(t^*, x_1^*(t^*), x_2) = 1$.

Assumption A7. For every $x \in \mathcal{X}$, the random variable r(x, V) has a finite mean.

In the context of assumption A4, assumption A5 requires μ to be integrable for each $x_2 \in \mathcal{X}_2$. In the context of assumption A4′, it requires the integral $\int_0^t \lambda(s, x_1(s), x_2)\,ds$ to exist for every finite positive t.

A6 and A6′ are scale normalization assumptions needed under A4 and A4′, respectively. Consider first A6. For each X₂ = x₂, the conditional model is a standard MPH model. A6 reduces to a standard normalization assumption, see e.g. assumption 7 in Van den Berg (2001). Without A6, μ and ϕ would be identified only up to a scale for each value x₂. The following example from Abbring (2002) demonstrates this.

Example 2 (heterogeneous duration dependence) continued. Consider the Weibull specification in equation (6) and let X₂ be equal to some specific value x₂. Under assumption A4, this specification is now equal to

(16) $\lambda(t, x_1, x_2) = \lambda_0\,\lambda_1(x_2)\,t^{\lambda_1(x_2) - 1}.$

Assume that the function ϕ(x₁, x₂) is equal to exp{x₁β₁ + x₂β₂} with β₁, β₂ being deterministic vectors. Also, assume that for each x₂, the random variable $V_{x_2} := r(x_2, V)$ follows a Gamma distribution with parameters that depend on the value x₂. Fixing X₂ to a given value reduces the model to an MPH model. It can be easily shown (see derivations for the MPH model in Abbring 2002) that in this case, the observed survival function S(t|X₁, X₂ = x₂) depends on the quantity $\lambda_0 \exp\{x_2 \beta_2\}\,\alpha_{x_2}$, where $\alpha_{x_2}$ is the scale parameter of the Gamma distribution corresponding to the value X₂ = x₂. In this case, the parameters $(\lambda_0, \beta_2, \alpha_{x_2})$ cannot be identified separately, only the total quantity $\lambda_0 \exp\{x_2 \beta_2\}\,\alpha_{x_2}$. Alternatively, to identify any of the three parameters, it is necessary to make a normalization assumption on the other two quantities. This is precisely what assumption A6 does. As an analogy, this normalization is similar to the one in a separable regression model, which assumes that the error term has an expectation equal to 0. The value 0 is set for convenience and can be replaced by any arbitrary value. In the nonseparable hazard model, the researcher might be willing to choose $t^*, x_1^*$ independently of x₂, for example $t^* = x_1^* = 0$.
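The lack of identification in this example can be checked numerically. For a Gamma distribution with shape k and scale α, the Laplace transform is (1 + αs)^(−k), so the observed survival function in the stratum is (1 + α λ₀ exp{x₁β₁ + x₂β₂} t^λ₁)^(−k). The sketch below, with scalar covariates and hypothetical parameter values, evaluates three parameter vectors that share the same product λ₀ exp{x₂β₂} α and therefore produce the same survival function.

```python
import math

def survival(t, x1, x2, lam0, lam1, beta1, beta2, alpha, k):
    """Observed survival in a Weibull stratum with Gamma(k, scale alpha) heterogeneity."""
    integrated = lam0 * t ** lam1 * math.exp(x1 * beta1 + x2 * beta2)
    return (1.0 + alpha * integrated) ** (-k)

common = dict(t=0.8, x1=0.3, x2=1.0, lam1=1.5, beta1=0.7, k=2.0)

# Three hypothetical structures with lam0 * exp(x2 * beta2) * alpha = 2 in each case.
s_a = survival(lam0=2.0, beta2=0.0, alpha=1.0, **common)
s_b = survival(lam0=1.0, beta2=math.log(2.0), alpha=1.0, **common)
s_c = survival(lam0=1.0, beta2=0.0, alpha=2.0, **common)
print(s_a, s_b, s_c)   # identical up to floating-point error
```

Fixing two of the three quantities, as a normalization of the A6 type does, removes this indeterminacy within the stratum.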

The merit of assumption A6′ is similar. If A6′ is dropped, the class of observationally equivalent structures (Λ, r) can be described in the following way. For any c > 0, it holds

$S(t \mid x_1, x_2) = L_{x_2}\bigl(\Lambda_{x_2}(t, x_1)\bigr) = L_{x_2}\bigl(\tfrac{1}{c}\,(c\,\Lambda_{x_2}(t, x_1))\bigr).$

Therefore, for any value x₂, the strata MH models $(\Lambda_{x_2}, L_{x_2})$ and $(\tilde{\Lambda}_{x_2}, \tilde{L}_{x_2})$ are observationally equivalent if there exists a constant c > 0 such that $\tilde{\Lambda}_{x_2} = c\,\Lambda_{x_2}$ and $\tilde{L}_{x_2}(s) = L_{x_2}(\tfrac{1}{c} s)$ for every s ∈ [0, ∞). Assumption A6′ normalizes the strata MH models corresponding to different values x₂.

Finally, A7 is also a normalization assumption. Without A7, it is no longer possible to distinguish between duration dependence (baseline hazard) and unobserved heterogeneity in the strata models, see Ridder (1990). Yet, in many cases, this assumption can be justified with economic theory. As an example, in job search models, individuals with very low hazard rates (i.e., those who are exceptionally bad at finding jobs) are eventually forced to exit the labor market, go on long-term welfare, or adjust their search strategies. This prevents an infinitely persistent pool of unemployed individuals, justifying a finite mean for heterogeneity.

With these assumptions, we can state the following results.

Proposition 2.

Under assumptions A1–A3, A4, A5, A6 and A7, the GMH model (2) is identified.

Proposition 3.

Under assumptions A1–A3, A4′, A5 and A6′, the GMH model (2) is identified.

While detailed proofs are provided in the Appendix, let us give some intuition on these results. For a given x₂, define $\theta_{x_2}(t \mid x_1, v_{x_2}) := \theta(t \mid x_1, x_2, v)$ and $\lambda_{x_2}$ analogously. Each of the assumptions A4 and A4′ ensures that for each x₂, the corresponding x₂-strata MH model $\theta_{x_2}(t \mid x_1, v_{x_2}) = \lambda_{x_2}(t \mid x_1)\,v_{x_2}$ is identified (that is, that the pair $(\lambda_{x_2}, G_{x_2})$ is identified). A4 uses the separability of x₁ as in the MPH model, while A4′ uses the time variation of the covariates as in Brinch (2007).^[11] The function r is then identified from the quantiles of the identified distributions $(G_{x_2})_{x_2 \in \mathcal{X}_2}$.
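The last step, recovering r from the quantiles of the stratum distributions, can be sketched as follows. The normalization A2 (i) is not restated in this section, so the sketch assumes, purely for illustration, a Matzkin-type normalization in which V is uniform on (0, 1) and r(x₂, ·) is strictly increasing. Under that assumption r(x₂, v) is the v-quantile of the distribution of r(x₂, V). The stratum distribution used below is hypothetical and corresponds to r(x₂, v) = v^(1/(1 + x₂)).

```python
def invert_cdf(cdf, p, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the p-quantile of a continuous, strictly increasing CDF on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def stratum_cdf(x2):
    """Hypothetical identified distribution G_{x2} of r(x2, V), supported on [0, 1]."""
    return lambda u: u ** (1.0 + x2)

def recover_r(x2, v):
    """r(x2, v) as the v-quantile of G_{x2}, assuming V ~ Uniform(0, 1) and r increasing in v."""
    return invert_cdf(stratum_cdf(x2), v)

print(recover_r(1.0, 0.25))   # close to 0.25 ** (1 / 2)
print(recover_r(0.0, 0.25))   # close to 0.25
```

Because each stratum distribution is inverted separately, the recovered function may depend on x₂ and v in an arbitrary, nonseparable way.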

Remark 1.

Note that under A4′, the finite mean assumption A7 is not necessary for identification. This is not surprising in view of the identification results by Honoré (1990) for the MPH model and by Brinch (2007) for the MH model.^[12]

3.3 Nonparametric Identification in Multiple-Spells Models

The identification strategy in this section relies on availability of multiple spells and on a fixed-effects assumption. Let T₁ and T₂ be two duration variables. The hazard functions of T₁ and T₂ are modeled as

(17) $\theta_1(t \mid X, V) = \lambda_1(t, X)\,r(X, V)$

(18) $\theta_2(t \mid X, T_1 = t_1, V) = \lambda_2(t, X)\,\psi(t_1, X)\,r(X, V)$

The hazard model for the first duration is the GMH model (2). The hazard model for the second duration is an augmented GMH model which allows T₁ to have an effect on the hazard of T₂. This effect may depend on X in an arbitrary way. Notably, λ₁, λ₂ are allowed to be different. V and the function r, on the contrary, are restricted to be the same in both spells. In the following, this restriction is referred to as “fixed effects”.

We discuss two distinct setups of interest which differ with respect to the a priori assumption on ψ. In both setups, both T₁ and T₂ are assumed to be fully observed.

Setup 1. Suppose that T₁, T₂ describe the random length of two spells of the same individual. The first spell is finished before the beginning of the second spell. As an example, T₁, T₂ might represent the durations of two (consecutive) unemployment spells. Alternatively, T₁ might be the length of the last employment spell and T₂ the length of the current unemployment spell. Thus, the spells are not required to be of the same type. The notation implies that X and V realize prior to the beginning of the first spell. The function ψ captures the so called lagged duration effect. Hence, model (17) and (18) can be interpreted as a hazard version of a dynamic fixed-effects panel data model (the analogy is not entirely correct though, since here T₁ and T₂ could be outcomes of different types). When T₁, T₂ represent unemployment spells, ψ is referred to as “scarring effects”. Because of the consecutive character of the spells, T₁ is fully known to the individual (and the potential employer) throughout the second spell and is thus fully “anticipated”. As a result, T₁ has an impact on θ₂ right from the beginning of the second spell.^[13] Note that random censoring of T₂ does not change the identification results below.

Identification of lagged duration models has been first considered by Heckman and Borjas (1980), Honoré (1993), and more recently by Horny and Picchio (2010) and Picchio (2012). Recent empirical studies of lagged duration dependence can be found in Doiron and Gørgens (2008), Cockx and Picchio (2013), Dorsett and Lucchino (2018), among others.

Setup 2. Suppose now that T₁, T₂ describe duration variables of two distinct individuals. The individuals are assumed to share (observed and) unobserved characteristics. Such a context may arise when T₁, T₂ describe duration outcomes of twins, of employees in the same firm, or of individuals from some common background, see Hougaard (2000) for an overview and Frederiksen et al. (2007) for an application in a job duration context. Importantly, since the spells are assumed “parallel”, i.e. one does not require sequential realizations, it must also be assumed that there are no cross-effects, ψ = 1. In Section 5, this assumption is relaxed. Note that random censoring of either T₁ or T₂ or both does not change the identification results below.

Following the conventions in the literature, the two setups are referred to as lagged duration dependence model (setup 1) and shared frailty model (setup 2).

Remark 2:

(A third setup). The following combination of the two setups is also considered. As in Setup 1, let T₁, T₂ represent two unemployment spells of the same individual. The experimental study of Eriksson and Rooth (2014) suggests that in certain cases employment experienced after T₁ might offset the lagged duration dependence between T₁ and T₂. In such cases, ψ can be restricted to be equal to 1. A similar empirical finding holds for the case when T₁ represents past employment spells and T₂ subsequent unemployment spells. Specifically, Doiron and Gørgens (2008) find that the length of the past employment spells T₁ does not matter for subsequent unemployment spells T₂, as long as one conditions on the dummy variable “being employed”. In our model, this can be done automatically by the sample choice when only individuals with T₁ > 0 are considered.

In the following, Λ_j(t, x) denotes the integrated hazard of T_j, $\Lambda_j(t, x) := \int_0^t \lambda_j(s, x)\,ds$, j = 1, 2. Consider the following assumptions.

Assumption MS4. (i) For each j = 1, 2, the function λ_j obtains nonnegative values. λ_j(t, x) is continuous in t for each x ∈ X . Λ_j(t, x) is finite for each x. (ii) The function ψ takes only positive values and is differentiable in t for each x ∈ X .

Assumption MS5. For each $x \in \mathcal{X}$ and j = 1, 2 there exists a known $t_j^* = t_j^*(x) \in \mathbb{R}_+$ such that (i) $\lambda_1(t_1^*, x)$ and $\Lambda_2(t_2^*, x)$ are known and (ii) $\psi(t_1^*, x)$ and $\partial_t \psi(t_1^*, x)$ are also known, where ∂_t denotes the derivative with respect to t.

Assumption MS4 (i) is an innocuous regularity assumption. It generalizes a standard assumption in mixture hazard models, see e.g. assumption 2 in Elbers and Ridder (1982). MS4 (i) can be relaxed by requiring that t can be varied such that Λ₂(t, x) obtains all values in an open interval. MS4 (ii) requires smoothness of ψ. This is a mild assumption. In particular, consider the case in which T₁ is modeled as any other covariate in the hazard function of T₂ (i.e. T₁ is an element of the vector X). Then, under the standard MPH specification λ(t, x) = h(t) exp(xβ), assumption MS4 (ii) is fulfilled.

MS5 (i) is a generalization of a standard scale normalization assumption. Specifically, under an MPH specification, MS5 (i) is equivalent to assumption 7 in Van den Berg (2001). It corresponds to assumption A6 in the context of single spell models. Assumption MS5 (ii) is a scale normalization assumption similar to the one made in Honoré (1993). A convenient normalization is ψ(0, x) = 1 for all x, for which case no normalization of Λ₂ is needed.

We can now state the main result of this section.

Proposition 4.

(i) Under assumptions A1, MS4 and MS5, the functions λ₁, λ₂, ψ are identified. (ii) If in addition A2 (i) is satisfied, then also the function r is identified.

The intuition of this result can be stated as follows. Identification of ψ can be interpreted as a hazard version of a difference-in-differences approach, which proceeds in two steps. In a first step, the fixed-effects assumption is used to show that the difference of the marginal exit rates $\ln \rho(t_1, t_2 \mid x) = \ln \partial_{t_1} P(T_1 > t_1, T_2 > t_2 \mid x) - \ln \partial_{t_2} P(T_1 > t_1, T_2 > t_2 \mid x)$ does not depend on V and r. This step corresponds to first-differencing in panel data models. Thus, the first difference ln ρ is taken with respect to different periods and eliminates the impact of the unobserved heterogeneity. The second difference ρ(t, .|.) − ρ(0, .|.) is taken with respect to different points in time of the first spell and eliminates the impact of the duration dependence λ₁ of the first hazard.
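The first-differencing step is easiest to see in the shared-frailty special case ψ = 1 with T₁ and T₂ conditionally independent given (x, V). This is a simplification of the general model and not the argument of the proof. In that case the joint survival function is L_x(Λ₁(t₁, x) + Λ₂(t₂, x)), so the ratio of its two partial derivatives equals λ₁(t₁, x) / λ₂(t₂, x) whatever the distribution of r(x, V). Both partial derivatives are negative, so the sketch works with their ratio rather than with separate logarithms. The integrated hazards Λ₁(t) = t² and Λ₂(t) = 3t and the Gamma heterogeneity are hypothetical choices.

```python
def joint_survival(t1, t2, k, alpha):
    """P(T1 > t1, T2 > t2 | x) with psi = 1 and shared Gamma(k, scale alpha) heterogeneity."""
    total = t1 ** 2 + 3.0 * t2          # assumed Lambda_1(t1) + Lambda_2(t2)
    return (1.0 + alpha * total) ** (-k)

def derivative_ratio(t1, t2, k, alpha, h=1e-5):
    """Ratio of the partial derivatives of the joint survival function (central differences)."""
    d1 = (joint_survival(t1 + h, t2, k, alpha) - joint_survival(t1 - h, t2, k, alpha)) / (2 * h)
    d2 = (joint_survival(t1, t2 + h, k, alpha) - joint_survival(t1, t2 - h, k, alpha)) / (2 * h)
    return d1 / d2

# Two very different heterogeneity distributions give the same ratio,
# namely lambda_1(0.7) / lambda_2(0.4) = (2 * 0.7) / 3.
ratio_a = derivative_ratio(0.7, 0.4, k=0.5, alpha=2.0)
ratio_b = derivative_ratio(0.7, 0.4, k=3.0, alpha=0.2)
print(ratio_a, ratio_b)
```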

Proposition 4 is now compared to the existing literature on identification of duration models with multiple spells. I compare the following features of my result: nonseparability, heterogeneous duration dependence, whether fixed effects are assumed, whether independence between X and V is required, whether a finite mean of V is assumed, and how many spells are necessary for identification.

First and most importantly, the only other study that allows observed and unobserved factors to enter the hazard in a nonseparable way is Evdokimov (2010). His model, however, rules out lagged duration dependence.

Second, as in Picchio (2012), the lagged duration dependence ψ in model (18) is allowed to depend in an arbitrary way on observed characteristics X.

Third, a fixed-effects assumption is adopted also by Picchio (2012) and in the shared frailty model (model 1) in Honoré (1993), while Evdokimov (2010) and model 3 in Honoré (1993) allow for spell-specific unobservables.

Fourth, Proposition 4 allows X and V to be dependent. This result is of particular importance since independence is often hard to defend. Furthermore, no finite mean assumption on the unobservables V_x is required. These two features – allowing for dependence and for infinite mean of V_x – are not surprising. They follow from the fixed-effects assumptions and are shared by the other two papers that assume fixed effects. However, model 1 in Honoré (1993) does not allow for lagged duration dependence and Picchio (2012) assumes that there are recurrent data for each outcome (i.e. at least two observations for each T_i, i = 1, 2). Independence and finite mean of unobservables are also imposed by the lagged duration model (model 3) in Honoré (1993). Note that the paper of Evdokimov (2010) allows for dependence and an infinite mean of V without imposing fixed effects. The price for the generality of his model is that his identification result relies on access to at least three spells for each individual and that (as noted above) lagged duration dependence is precluded. Finally, both the model of Evdokimov (2010) and model 3 in Honoré (1993) allow the unobserved heterogeneity to differ between spells. Thus, their models are richer than a standard shared-frailty model.

Table 1 provides a summary of this discussion. Each row corresponds to a given paper and each column corresponds to a distinct feature/assumption. As an example, column 1, row 1 states that the model in Evdokimov (2010) is allowed to be nonseparable in X and V.

Table 1: Comparison of assumptions in the literature.

Paper	(a)	(b)	(c)	(d)	(e)	(f)	(g)
Evdokimov (2010)	Yes	Yes	Yes	No	No	No	Yes
Model 1, Honoré (1993)	No	Yes	Yes	No	No	Yes	No
Model 3, Honoré (1993)	No	No	No	Yes	No	No	No
Picchio (2012)	No	Yes	Yes	Yes	Yes	Yes	Yes
This paper	Yes	Yes	Yes	Yes	Yes	Yes	No

Each column represents one assumption/aspect of identification. (a) Model is nonseparable in X and V. (b) Independence of X and V not needed. (c) No finite mean assumption on the unobservables needed. (d) Model allows for lagged duration dependence. (e) Lagged duration dependence depends on x. (f) Fixed-effects assumption adopted. (g) Identification relies on more than two spells per individual.

4 Identification Under Endogenous Censoring: The Case of Competing Risks Models

An individual might consider several destinations of the transition out of unemployment. As an example, elderly unemployed might choose to search for a new job or to withdraw from the labor force, Kyyrä and Ollikainen (2008). In addition, the researcher might want to distinguish between full-time employment, part-time employment, employment on a short-term/long-term contract, and other forms of employment. These destinations are typically modeled as competing risks. The main feature of a competing-risks model is that the duration under each risk is latent and only the duration until the first cause of exit is observed. Put differently, the duration in each destination is potentially censored by the exit in a different destination. This censoring case is more involved than the case of random right censoring considered in the previous section. Specifically, since the distribution functions of each of the competing risks might depend on the same unobserved features of the individual, censoring is potentially endogenous. Without further structure, the “observed” joint (bivariate) survival function S(t₁, t₂) is nonparametrically nonidentified, Tsiatis (1975), which greatly complicates identification of the underlying structural model. This section studies identification of competing risks models when the hazard of each of the latent variables follows the GMH model (2).

Single-spell setting. Consider a setting with two risks, A and B. A generalization to a case with more than two risks is straightforward. Denote by T_i the latent duration under risk i = A, B. For each individual in the sample, the researcher observes $\tilde{T} := \min\{T_A, T_B\}$ and the indicator function $1\{T_A < T_B\}$, which informs about the cause of exit. Thus, instead of the joint survival function P{T_A > t_A, T_B > t_B}, the econometrician “knows” the crude survival functions P{T_i > t, T_j > T_i}, i, j = A, B, i ≠ j. Denote by V_A and V_B the risk-specific unobserved characteristics that impact T_A and T_B, respectively. We assume that (X, V_i) fully determine the distribution of T_i and that T_A and T_B are conditionally independent given (X, V_A, V_B). V_A and V_B are allowed to be dependent, while $X \perp\!\!\!\perp (V_A, V_B)$. This is the standard setup in competing risks models with covariates, see Heckman and Honoré (1989). The hazards θ_A and θ_B of T_A and T_B, respectively, are modeled as

(19) $\theta_A(t \mid X, V_A, V_B) = \theta_A(t \mid X, V_A) = \lambda_A(t, X)\,r_A(X, V_A)$

(20) $\theta_B(t \mid X, V_A, V_B) = \theta_B(t \mid X, V_B) = \lambda_B(t, X)\,r_B(X, V_B).$

Notably, r_i and λ_i are allowed to be risk-specific, i.e. it is not required that λ_A = λ_B or r_A = r_B.
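The observation scheme can be made concrete with a short simulation of models (19) and (20). The sketch draws dependent risk-specific unobservables, generates the two latent durations, and keeps only the minimum and the exit indicator, from which the crude survival functions are estimated. All functional forms (Λ_A(t, x) = t^(1 + x), Λ_B(t, x) = t, r_A(x, v) = exp(x v), r_B(x, v) = 1 + x v) and the way dependence between V_A and V_B is generated are hypothetical.

```python
import math
import random

def draw_competing_risks(n, x, seed=0):
    """Return observed pairs (min(T_A, T_B), 1{T_A < T_B}) for a fixed covariate value x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        common = rng.random()
        v_a = common + 0.5 * rng.random()        # V_A and V_B share a common component
        v_b = common + 0.5 * rng.random()
        r_a = math.exp(x * v_a)                  # assumed r_A(x, V_A)
        r_b = 1.0 + x * v_b                      # assumed r_B(x, V_B)
        # Latent durations by inversion of the integrated hazards.
        t_a = (rng.expovariate(1.0) / r_a) ** (1.0 / (1.0 + x))
        t_b = rng.expovariate(1.0) / r_b
        rows.append((min(t_a, t_b), 1 if t_a < t_b else 0))
    return rows

def crude_survival(rows, t, exit_a):
    """Empirical P(T_i > t, T_j > T_i): share of spells exiting through risk i after t."""
    flag = 1 if exit_a else 0
    return sum(1 for duration, is_a in rows if is_a == flag and duration > t) / len(rows)

rows = draw_competing_risks(5000, x=1.0)
print(crude_survival(rows, 0.2, True), crude_survival(rows, 0.2, False))
```

The latent pair (T_A, T_B) never appears in the output, which is why the joint survival function has to be recovered from the two crude survival functions and the model structure.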

The following result states that the GMH structure identifies the single-spell competing risk model when there is a special regressor (assumption A3) which is common for both risks A, B and when in addition this special regressor is separable from the duration dependence function λ (assumption A4).

Proposition 5.

Suppose that V_A, V_B fulfill assumptions A2 and A7, and let each of the hazards θ_A, θ_B satisfy assumptions A1, A3–A6, where A4 (ii) is replaced by the condition

A4 (ii)′: for each $x_2 \in \mathcal{X}_2$, the set $\{(\phi_A(x, x_2), \phi_B(x, x_2)) : x \in \mathcal{X}_1\}$ contains a nonempty open subset of $\mathbb{R}_+ \times \mathbb{R}_+$.

Then, μ_i, θ_i, r_i, i = A, B are identified from the data.

Assumption A4 (ii)′ adapts assumption A4 (ii) to the competing risks setting. It can be relaxed along the lines of the discussion of assumption A4 (ii) above. Note that identification of ϕ_A, ϕ_B requires neither the variation condition A4 (ii)′ nor independence of X and V_A, V_B.

Discussion on nonidentification. Unlike assumption A4, assumption A4′ cannot be used to obtain identification of the competing risks model. The following discussion should provide the intuition. Due to the censoring problem, identification relies on the identified crude survival functions. These functions vary along a single time argument t. Identification with two risk-specific unobservables, however, requires either variation in two time arguments on an open subset of $\mathbb{R}_+ \times \mathbb{R}_+$ or separable variation in x. Hence, the bivariate version of the MH model considered by Brinch (2007) (and in general, of any single-spell MH model with risk-specific unobservables but without separable x-variation) is not identified. Since the MH model is nested in the GMH model, the single-spell GMH model is not identified with competing risks under assumption A4′. This implies that when there is only one spell per individual, assumptions A3 and A4 are necessary for identification.

Multiple-spells setting. Consider now a setting with competing risks and multiple spells. For i = A, B, T_i1, T_i2 denote two consecutive spells of the latent variable under risk i. The hazards of the latent variables are modeled as

(21) $\theta_{i1}(t \mid X, V_i) = \lambda_{i1}(t, X)\,r_i(X, V_i)$

(22) $\theta_{i2}(t \mid X, T_{i1} = t_{i1}^*, V_i) = \lambda_{i2}(t, X)\,\psi_i(t_{i1}^*, X)\,r_i(X, V_i).$

The researcher observes the triple $(X, \tilde{T}_k = \min\{T_{Ak}, T_{Bk}\}, 1\{T_{Ak} < T_{Bk}\})$, k = 1, 2. (T_A1, T_A2) are assumed conditionally independent from (T_B1, T_B2) given X, V_A, V_B. However, conditionally on (X, V_A, V_B), T_i1 may impact T_i2. Specification (21) and (22) contains a risk-specific fixed-effects assumption. In particular, for each risk i, the unobservables V_i and the generalized error function r_i are the same for both spells. However, both the unobservables and the function r are allowed to differ between risks. Furthermore, as in the single-spell competing-risks setup, V_A and V_B are allowed to be dependent.

The following proposition states that the fixed effects assumption used to identify the multiple-spells model in the previous section leads to identification in a competing risk setting.

Proposition 6.

Assume that both (T_A1, T_A2, X, V_A) and (T_B1, T_B2, X, V_B) satisfy the conditions of Proposition 4. (i) Then for i = A, B, k = 1, 2, λ_ik and ψ_i are identified from the data. (ii) If in addition the following condition holds:

MS4 (iii): for each x ∈ X , the set

$\{(\Lambda_{A1}(t, x) + \psi_A(t, x)\,\Lambda_{A2}(\tilde{t}, x),\ \Lambda_{B1}(t, x) + \psi_B(t, x)\,\Lambda_{B2}(\tilde{t}, x)) : t, \tilde{t} \in \mathbb{R}_+\}$

contains a nonempty open subset of $\mathbb{R}_+ \times \mathbb{R}_+$, then also r_A, r_B are identified.

Proposition 6 is based on similar assumptions and a similar difference-in-differences approach as Proposition 4 and so the intuition is common for both results. In particular, the use of the fixed-effects assumption is that the ratio of derivatives of crude survival functions is free of r and V. Furthermore, distinguishing between λ_i1 and ψ_i is possible, because the ratio of the subsurvival functions of the two spells depends on ψ_i in a way that varies with the elapsed duration in the second spell, which is not the case for λ_i1. Identification of r_i is achieved after the identification of λ_i1 and ψ_i.

The closest setup to the one considered in Proposition 6 is the one studied in Horny and Picchio (2010). They use multiple spells to prove identification in a competing risk model with a lagged duration dependence. Their result generalizes the single-risk results by Honoré (1993). In the models of Honoré (1993) and Horny and Picchio (2010), separable variation in X is crucial to identify the joint distribution of the unobservables. Identification of this distribution is necessary to distinguish between the lagged duration dependence ψ_i and the genuine duration dependence in the first spell λ_i1. The separable variation of X is provided by the structure of the MPH model. Model (21) and (22) does not require such variation.

A model that uses a fixed-effects assumption in a multiple-spell MPH context is developed in Abbring and Van Den Berg (2003a). The precise assumption there is V_i1 = V_i2. However, model (21) and (22) is substantially richer. In particular, X and V can interact nonseparably and a lagged duration dependence is allowed. Both these aspects are not allowed in Abbring and Van Den Berg (2003a).

5 Nonparametric Identification of Treatment Effects when the Treatment is Assigned During the Spell

Consider a setup in which a treatment is assigned during the ongoing spell in the state of interest. As an example, ALMPs such as subsidized temporary work in the public sector are administered during unemployment. Related setups arise in epidemiology when a given medical treatment, such as kidney transplantation, is administered during an ongoing health condition.^[14] In such cases, the effect might vary with the time to exposure and with the duration of exposure.

To formalize this setup, let the random variable S denote the time from entry into the state of interest (e.g. unemployment) until the start of treatment exposure. Contrary to X and V, which realize prior to or at the beginning of the spell in the state of interest, S realizes during the spell. As before, T denotes the duration of the underlying condition. In the example above, T is the duration of unemployment and S is the random time from the beginning of unemployment until the individual starts temporary work as part of an ALMP. Define the vector V ≔ (V_T, V_S), where V_T, V_S are two unobserved scalar random variables. Consider the following bivariate hazard model:

(23) $\theta_T(t \mid S = s, X = x, V) = \begin{cases} \lambda_T(t, x)\, r_T(x, V_T) & \text{if } t < s, \\ \lambda_T(t, x)\, \delta(t, s, x)\, r_T(x, V_T) & \text{if } t \ge s. \end{cases}$

(24) $\theta_S(t \mid X = x, V) = \lambda_S(t, x)\, r_S(x, V_S).$

In equation (24), the hazard θ_S(t|X = x, V) of the treatment variable S is specified as the nonseparable model (2). θ_S depends on V only through the S-specific unobservable V_S. Equation (23) specifies the hazard of T. It is an augmented GMH model. Before the treatment is administered (t < s), this is simply the GMH model (2). After the start of exposure, the hazard function is modified by the component δ, which can be interpreted as a treatment effect. In particular, conditionally on X and V, S can impact T (only) through δ. δ is allowed to depend on the time to exposure s and on the duration of exposure. The latter is inferred from the difference between t and s. δ is also allowed to depend on observed heterogeneity x. V_S and V_T might be dependent, so that the model allows for endogenous selection into the treatment. Furthermore, the functions r_T, r_S can be different (and analogously for λ_T, λ_S).
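For reference, integrating the hazard in (23) gives the survival function of T conditional on S = s, X = x and V. The display below is a direct consequence of (23) and is not a result stated in the text; the symbol Λ_T is introduced here only as shorthand for the integrated duration dependence, and integrability of λ_T and λ_T δ is assumed.

```latex
\begin{aligned}
\Lambda_T(t \mid s, x) &= \int_0^{\min(t,s)} \lambda_T(u, x)\,du
  + \mathbf{1}\{t \ge s\} \int_s^{t} \lambda_T(u, x)\,\delta(u, s, x)\,du, \\
P(T > t \mid S = s, X = x, V) &= \exp\bigl(-r_T(x, V_T)\,\Lambda_T(t \mid s, x)\bigr).
\end{aligned}
```

For t < s the second integral vanishes, so the conditional survival function does not depend on s or on δ.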

Model (23) and (24) is a generalization of the widely used treatment effect model by Abbring and Van Den Berg (2003b). Identification in Abbring and Van Den Berg (2003b) relies on the separable MPH structure of each of the two hazards θ_T, θ_S, an assumption that is relaxed here.

For the discussion that follows, it is informative to review the differences between the treatment effect model (23), (24) and the lagged duration model (17) and (18). The latter imposes that the two durations T₁, T₂ realize sequentially. This implies that both durations are observed. In the treatment effect model, S is potentially censored by T. Moreover, in model (17) and (18), the hazard of T₂ depends on T₁ through ψ for potentially all values t₁, t₂ (i.e. ψ might differ from 1 for all t₁, t₂). In the treatment effect model, on the other hand, δ in (23) is equal to 1 when s > t. This condition is plausible when the treatment is not anticipated by the individuals. As an example, ALMPs often have random unanticipated variation that arises from the random order in which case workers treat the unemployed, see e.g. Sianesi (2004). Alternatively, individuals might be aware of the future time of the treatment but might not be able to act upon this information. In the kidney example, patients might have little choice but to wait for their turn.
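The two features just discussed, censoring of S by T and the absence of anticipation, can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not part of the model: λ_T = λ_S = 1, a constant treatment effect δ, the hypothetical forms r_T(x, V_T) = exp(0.5 x V_T) and r_S(x, V_S) = exp(0.5 x V_S), and normally distributed unobservables with a shared component. The function names are illustrative only.

```python
import math
import random


def draw_t(e, a, s, delta):
    """Invert the integrated hazard of T for a unit-exponential draw e.

    Hypothetical simplification: lambda_T = 1 and delta constant, so the
    hazard of T is a before s and a * delta from s onwards, where a
    stands for r_T(x, V_T).
    """
    if e < a * s:
        # Exit occurs before treatment starts; delta plays no role (no anticipation).
        return e / a
    # Exit occurs after treatment starts; the remaining hazard is scaled by delta.
    return s + (e - a * s) / (a * delta)


def observe(t, s):
    """T is observed in full; S is observed only if treatment starts before exit."""
    return (t, s if s < t else None)


def simulate(n, delta, x=1.0, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # A shared component makes V_S and V_T dependent (endogenous selection).
        common = rng.gauss(0.0, 1.0)
        v_t = common + rng.gauss(0.0, 1.0)
        v_s = common + rng.gauss(0.0, 1.0)
        a_t = math.exp(0.5 * x * v_t)  # assumed r_T: x and V_T interact nonseparably
        a_s = math.exp(0.5 * x * v_s)  # assumed r_S
        s = rng.expovariate(1.0) / a_s  # lambda_S = 1: S given V is exponential with rate a_s
        t = draw_t(rng.expovariate(1.0), a_t, s, delta)
        data.append(observe(t, s))
    return data
```

Calling `simulate(1000, delta=2.0)` returns pairs `(t, s)` in which `s` is `None` whenever the spell ended before treatment could start. With `delta=1.0` the draw of T no longer depends on s at all.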

Identification of the treatment effect δ proceeds in two steps. First, consider model (23) and (24) for values t < s. For such values, the model represents a GMH competing risks model. Under the assumptions from Section 4, λ_j, r_j, j = T, S and the joint distribution of r_T(X, V_T), r_S(X, V_S) can be identified. This is stated in the following proposition.

Proposition 7.

Suppose that for t < s, the bivariate GMH competing-risks model (23) and (24) is characterized by the assumptions of either Proposition 5 or Proposition 6. Then, λ_j, r_j, j = S, T and the joint distribution of r_T(X, V_T) and r_S(X, V_S) are identified from the data.

In a second step, knowledge of the components of the competing risks model is used to identify the treatment effect δ. The following result takes a general standpoint and directly assumes knowledge of these components (i.e. the source of this knowledge is ignored).

Proposition 8.

Assume that the functions λ_j, r_j, j = S, T and the joint distribution of r_T(X, V_T) and r_S(X, V_S) are known. Then, the treatment effect δ is identified from the data.

Corollary 1.

Suppose that the assumptions of Proposition 7 are satisfied. Then, the treatment effect δ is identified.

The proof of Proposition 8 is novel. Intuitively, when the competing risks model is identified, the derivative of the joint survival function of S and T with respect to t (which can be identified through its relationship to the crude survival functions) can be linked through a differential equation to the treatment effect δ. This differential equation is shown to have a unique solution, which leads to identification. Thus, it is the change in the marginal hazard ∂P(T > t, S > s)/∂t over elapsed durations t that identifies the treatment effect. In contrast, Abbring and Van Den Berg (2003b) use variation of the marginal exit rate with respect to s, ∂P(T > t, S > s)/∂s. Their strategy requires smoothness of the treatment effect. This assumption is not necessary here.

6 Discussion

This paper has provided identification results for nonseparable duration models. These models are nonseparable in two ways. First, genuine duration dependence is allowed to depend on observed covariates. Second, observed and unobserved characteristics may interact in an arbitrary way. The models considered in the paper cover a comprehensive set of settings studied in theoretical and applied duration analysis. In particular, identification has been shown in single-spell models with and without time-varying covariates, in multiple-spell models with shared frailty and lagged duration dependence, in single-spell and multiple-spell competing risks models, and in treatment effects models where treatment is assigned during the individual spell in the state of interest. The last contribution is also the paper’s most important one. Proposition 8 and its Corollary 1 relax common assumptions on the values and timing of the treatment. Specifically, while most papers require the treatment to be binary and to realize at the beginning of the duration spell, this paper allows the treatment both to be continuous and to realize at later points in time. The former aspect is important when the duration of exposure to treatment matters. The latter aspect is relevant for most practical applications: hardly any treatment is applied right at the onset of a disease, and no job search program is administered on day 1 of unemployment. Building on Abbring and Van den Berg (2003b), who allow for endogenous treatment timing and exposure-dependent effects under separability, this paper extends identification results to a nonseparable setting. Specifically, it shows that treatment effects can still be identified in a bivariate nonseparable duration model.

A natural follow-up of these results would be to develop estimation techniques and to additionally allow unobserved heterogeneity to interact with duration dependence. These topics remain for future research.

Corresponding author: Petyo Bonev, Agroscope and University of St. Gallen, St. Gallen, Switzerland, E-mail: petyo.bonev@unisg.ch

Acknowledgments

I thank Gerard J. van den Berg, Richard W. Blundell, Christoph Breunig, Christian Brinch, Andrew Chesher, Dennis Kristensen, Michael Lechner and participants at seminars at University College London, ETH Zürich and the Humboldt University of Berlin for their helpful and critical comments. Special thanks to Georgios Effraimidis for his helpful ideas and suggestions.

Conflict of interest: The author reports there are no competing interests to declare.

References

Abbring, J. H. 2002. Econometric Duration and Event-History Analysis. Lecture Notes, University of Chicago.

Abbring, J. H., and G. J. Van Den Berg. 2003a. “The Identifiability of the Mixed Proportional Hazards Competing Risks Model.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65 (3): 701–10. https://doi.org/10.1111/1467-9868.00410.

Abbring, J. H., and G. J. Van Den Berg. 2003b. “The Nonparametric Identification of Treatment Effects in Duration Models.” Econometrica 71 (5): 1491–517. https://doi.org/10.1111/1468-0262.00456.

Abrevaya, J., and J. A. Hausman. 1999. “Semiparametric Estimation with Mismeasured Dependent Variables: An Application to Duration Models for Unemployment Spells.” Annals of Economics and Statistics (55–56): 243–75. https://doi.org/10.2307/20076198.

Aït-Sahalia, Y. 2007. Estimating Continuous-Time Models with Discretely Sampled Data, volume 3 of Econometric Society Monographs, 261–327. Cambridge University Press. https://doi.org/10.1017/CCOL0521871549.009.

American Psychological Association. 2017. “How Long Will it Take for Treatment to Work.” American Psychological Association. http://www.apa.org/ptsd-guideline/patients-and-families/length-treatment (accessed April 22, 2023).

Andersen, P., O. Borgan, R. Gill, and N. Keiding. 1996. “Statistical Models Based on Counting Processes.” In Springer Series in Statistics. Springer New York.

Bergström, R., and P.-A. Edin. 1992. “Time Aggregation and the Distributional Shape of Unemployment Duration.” Journal of Applied Econometrics 7 (1): 5–30. https://doi.org/10.1002/jae.3950070104.

Beyhum, J., S. Centorrino, J.-P. Florens, and I. Van Keilegom. 2024. “Instrumental Variable Estimation of Dynamic Treatment Effects on a Duration Outcome.” Journal of Business & Economic Statistics 42 (2): 732–42. https://doi.org/10.1080/07350015.2023.2231053.

Blanco, G., X. Chen, C. A. Flores, and A. Flores-Lagunes. 2020. “Bounds on Average and Quantile Treatment Effects on Duration Outcomes Under Censoring, Selection, and Noncompliance.” Journal of Business & Economic Statistics 38 (4): 901–20. https://doi.org/10.1080/07350015.2019.1609975.

Bound, J., C. Brown, G. J. Duncan, and W. L. Rodgers. 1989. “Measurement Error in Cross-Sectional and Longitudinal Labor Market Surveys: Results from Two Validation Studies.” Working Paper 2884, National Bureau of Economic Research. https://doi.org/10.3386/w2884.

Brinch, C. 2007. “Nonparametric Identification of the Mixed Hazards Model with Time-Varying Covariates.” Econometric Theory 23: 349–54. https://doi.org/10.1017/S0266466607070144.

Brinch, C. N. 2011. “Non-Parametric Identification of the Mixed Proportional Hazards Model with Interval-Censored Durations.” The Econometrics Journal 14 (2): 343–50. https://doi.org/10.1111/j.1368-423X.2011.00347.x.

Caliendo, M., D. A. Cobb-Clark, and A. Uhlendorff. 2015. “Locus of Control and Job Search Strategies.” The Review of Economics and Statistics 97 (1): 88–103. https://doi.org/10.1162/REST_a_00459.

Chesher, A. 2002. “Semiparametric Identification in Duration Models.” CeMMAP working paper CWP20/02. London: Centre for Microdata Methods and Practice. https://doi.org/10.1920/wp.cem.2002.2002.

Chesher, A. 2003. “Identification in Nonseparable Models.” Econometrica 71 (5): 1405–41. https://doi.org/10.1111/1468-0262.00454.

Chesher, A. 2007. Identification of Nonadditive Structural Functions, volume 3 of Econometric Society Monographs, 1–16. Cambridge University Press. https://doi.org/10.1017/CCOL0521871549.001.

Chiappori, P.-A., I. Komunjer, and D. Kristensen. 2015. “Nonparametric Identification and Estimation of Transformation Models.” Journal of Econometrics 188 (1): 22–39. https://doi.org/10.1016/j.jeconom.2015.01.001.

Cockx, B., and M. Picchio. 2013. “Scarring Effects of Remaining Unemployed for Long-Term Unemployed School-Leavers.” Journal of the Royal Statistical Society. Series A (Statistics in Society) 176 (4): 951–80. https://doi.org/10.1111/j.1467-985X.2012.01086.x.

Colby, G., and P. Rilstone. 2004. “Nonparametric Identification of Latent Competing Risks Models.” Econometric Theory 20 (5): 883–90. https://doi.org/10.1017/S0266466604205035.

De Luna, X., and P. Johansson. 2010. “Non-Parametric Inference for the Effect of a Treatment on Survival Times with Application in the Health and Social Sciences.” Journal of Statistical Planning and Inference 140 (7): 2122–37. https://doi.org/10.1016/j.jspi.2010.02.012.

Doiron, D., and T. Gørgens. 2008. “State Dependence in Youth Labor Market Experiences, and the Evaluation of Policy Interventions.” Journal of Econometrics 145 (1): 81–97. The use of econometrics in informing public policy makers. https://doi.org/10.1016/j.jeconom.2008.05.010.

Dorsett, R., and P. Lucchino. 2018. “Young People’s Labour Market Transitions: The Role of Early Experiences.” Labour Economics 54: 29–46. https://doi.org/10.1016/j.labeco.2018.06.002.

Elbers, C., and G. Ridder. 1982. “True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model.” The Review of Economic Studies 49: 403–9. https://doi.org/10.2307/2297364.

Eriksson, S., and D.-O. Rooth. 2014. “Do Employers Use Unemployment as a Sorting Criterion when Hiring? Evidence from a Field Experiment.” The American Economic Review 104 (3): 1014–39. https://doi.org/10.1257/aer.104.3.1014.

Evdokimov, K. 2010. “Nonparametric Identification of a Nonlinear Panel Model with Application to Duration Analysis with Multiple Spells.” Discussion Paper.

Farber, H. S., D. Silverman, and T. von Wachter. 2016. “Determinants of Callbacks to Job Applications: An Audit Study.” The American Economic Review 106 (5): 314–8. https://doi.org/10.1257/aer.p20161010.

Frederiksen, A., B. E. Honoré, and L. Hu. 2007. “Discrete Time Duration Models with Group-Level Heterogeneity.” Journal of Econometrics 141 (2): 1014–43. https://doi.org/10.1016/j.jeconom.2006.12.003.

Frijters, P. 2002. “The Non-Parametric Identification of Lagged Duration Dependence.” Economics Letters 75 (3): 289–92. https://doi.org/10.1016/S0165-1765(02)00004-6.

Hausman, J. A., and T. M. Woutersen. 2008. “The Proportional Hazard Model.” In The New Palgrave Dictionary of Economics, edited by S. N. Durlauf, and L. E. Blume. Basingstoke: Palgrave Macmillan. https://doi.org/10.1057/978-1-349-95121-5_2625-1.

He, X. 1997. “Quantile Curves Without Crossing.” The American Statistician 51 (2): 186–92. https://doi.org/10.1080/00031305.1997.10473959.

Heckman, J. J. 1991. “Identifying the Hand of the Past: Distinguishing State Dependence from Heterogeneity.” The American Economic Review 81: 75–9.

Heckman, J., and G. Borjas. 1980. “Does Unemployment Cause Future Unemployment? Definitions, Questions and Answers from a Continuous Time Model of Heterogeneity and State Dependence.” Economica 47 (187): 247–83. https://doi.org/10.2307/2553150.

Heckman, J. J., and B. E. Honoré. 1989. “The Identifiability of the Competing Risks Model.” Biometrika 76 (2): 325–30. https://doi.org/10.1093/biomet/76.2.325.

Heckman, J. J., and B. Singer. 1984. “The Identifiability of the Proportional Hazard Model.” The Review of Economic Studies 51: 231–41. https://doi.org/10.2307/2297689.

Honoré, B. E. 1990. “Identification Results for Duration Models with Multiple Spells or Time-Varying Covariates.” Part of this paper was later published in the 1993 Review of Economic Studies paper.

Honoré, B. E. 1993. “Identification Results for Duration Models with Multiple Spells.” The Review of Economic Studies 60 (1): 241–6. https://doi.org/10.2307/2297821.

Horny, G., and M. Picchio. 2010. “Identification of Lagged Duration Dependence in Multiple-Spell Competing Risks Models.” Economics Letters 106 (3): 241–3. https://doi.org/10.1016/j.econlet.2009.12.010.

Horowitz, J. L. 1999. “Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity.” Econometrica 67: 1001–28. https://doi.org/10.1111/1468-0262.00068.

Hougaard, P. 2000. Analysis of Multivariate Survival Data. Heidelberg: Springer. https://doi.org/10.1007/978-1-4612-1304-8.

Jun, S. J., Y. Lee, and Y. Shin. 2016. “Treatment Effects with Unobserved Heterogeneity: A Set Identification Approach.” Journal of Business & Economic Statistics 34 (2): 302–11. https://doi.org/10.1080/07350015.2015.1044008.

Kalbfleisch, J., and R. Prentice. 1980. “The Statistical Analysis of Failure Time Data.” In Wiley series in probability and mathematical statistics: Applied probability and statistics. Wiley.

Kastoryano, S., and J. Beyhum. 2020. “Decomposing Causal Mechanisms in Duration Models with Unobserved Heterogeneity.”

Knaus, M., M. Lechner, and A. Strittmatter. 2017. “Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach.” arXiv e-prints: arXiv:1709.10279. https://doi.org/10.2139/ssrn.3029832.

Kroft, K., F. Lange, and M. J. Notowidigdo. 2013. “Duration Dependence and Labor Market Conditions: Evidence from a Field Experiment.” The Quarterly Journal of Economics 128 (3): 1123–67. https://doi.org/10.1093/qje/qjt015.

Kyyrä, T., and V. Ollikainen. 2008. “To Search or Not to Search? The Effects of UI Benefit Extension for the Older Unemployed.” Journal of Public Economics 92 (10-11): 2048–70. https://doi.org/10.1016/j.jpubeco.2008.03.004.

Lancaster, T. 1979. “Econometric Methods for the Duration of Unemployment.” Econometrica 47: 939–56. https://doi.org/10.2307/1914140.

Lancaster, T. 1985. “Simultaneous Equations Models in Applied Search Theory.” Journal of Econometrics 28 (1): 113–26. https://doi.org/10.1016/0304-4076(85)90070-3.

Lancaster, T. 1990. The Econometric Analysis of Transition Data. Econometric Society Monographs. Cambridge: Cambridge University Press. https://doi.org/10.1017/CCOL0521265967.

Lee, S., and A. Lewbel. 2013. “Nonparametric Identification of Accelerated Failure Time Competing Risks Models.” Econometric Theory 29 (5): 905–19. https://doi.org/10.1017/S0266466612000795.

Lewbel, A. 1998. “Semiparametric Latent Variable Model Estimation with Endogenous or Mismeasured Regressors.” Econometrica 66 (1): 105–21. https://doi.org/10.2307/2998542.

Lewbel, A. 2019. “The Identification Zoo: Meanings of Identification in Econometrics.” Journal of Economic Literature 57 (4): 835–903. https://doi.org/10.1257/jel.20181361.

Lockwood, B. 1991. “Information Externalities in the Labour Market and the Duration of Unemployment.” The Review of Economic Studies 58 (4): 733–53. https://doi.org/10.2307/2297830.

Lundborg, P., E. Plug, and A. W. Rasmussen. 2017. “Can Women Have Children and a Career? IV Evidence from IVF Treatments.” The American Economic Review 107 (6): 1611–37. https://doi.org/10.1257/aer.20141467.

Mathiowetz, N. A., and G. J. Duncan. 1988. “Out of Work, Out of Mind: Response Errors in Retrospective Reports of Unemployment.” Journal of Business & Economic Statistics 6 (2): 221–9. https://doi.org/10.1080/07350015.1988.10509656.

Matzkin, R. L. 2003. “Nonparametric Estimation of Nonadditive Random Functions.” Econometrica 71 (5): 1339–75. https://doi.org/10.1111/1468-0262.00452.

Matzkin, R. L. 2007. “Chapter 73 Nonparametric Identification.” Handbook of Econometrics 6: 5307–68. https://doi.org/10.1016/S1573-4412(07)06073-4.

McCall, B. P. 1994. “Identifying State Dependence in Duration Models. American Statistical Association 1994.” Proceedings of the Business and Economics Section 14.

Nüß, P. 2018. “Duration Dependence as an Unemployment Stigma: Evidence from a Field Experiment in Germany.” Technical report, Economics Working Paper.

Picchio, M. 2012. “Lagged Duration Dependence in Mixed Proportional Hazard Models.” Economics Letters 115 (1): 108–10. https://doi.org/10.1016/j.econlet.2011.12.005.

Ridder, G. 1986. “The Sensitivity of Duration Models to Misspecified Unobserved Heterogeneity and Duration Dependence.” Mimeo, University of Amsterdam.

Ridder, G. 1990. “The Non-Parametric Identification of Generalized Accelerated Failure-Time Models.” The Review of Economic Studies 57 (2): 167–81. https://doi.org/10.2307/2297376.

Ruf, J., and J. L. Wolter. 2019. “Nonparametric Identification of the Mixed Hazard Model Using Martingale-Based Moments.” Econometric Theory 1–16. https://doi.org/10.1017/S0266466619000033.

Sant’Anna, P. H. 2021. “Nonparametric Tests for Treatment Effect Heterogeneity with Duration Outcomes.” Journal of Business & Economic Statistics 39 (3): 816–32. https://doi.org/10.1080/07350015.2020.1737080.

Sianesi, B. 2004. “An Evaluation of the Swedish System of Active Labor Market Programs in the 1990s.” The Review of Economics and Statistics 86 (1): 133–55. https://doi.org/10.1162/003465304323023723.

Tsiatis, G. 1975. “A Nonidentifiability Aspect of the Problem of Competing Risks.” Proceedings of the National Academy of Sciences 72 (1): 20–2. https://doi.org/10.1073/pnas.72.1.20.

Van den Berg, G. J. 2001. “Duration Models: Specification, Identification, and Multiple Durations.” In Handbook of Econometrics, Vol. 5, chapter 55, edited by J. Heckman, and E. Leamer, 3381–460. Elsevier. https://doi.org/10.1016/S1573-4412(01)05008-5.

Van den Berg, G. J., and B. Van der Klaauw. 2006. “Counseling and Monitoring of Unemployed Workers: Theory and Evidence from a Controlled Social Experiment.” International Economic Review 47 (3): 895–936. https://doi.org/10.1111/j.1468-2354.2006.00399.x.

Van den Berg, G. J., P. Bonev, and E. Mammen. 2020a. “Nonparametric Instrumental Variable Methods for Dynamic Treatment Evaluation.” The Review of Economics and Statistics 102 (2): 355–67. https://doi.org/10.1162/rest_a_00843.

Van den Berg, G. J., A. Bozio, and M. Costa Dias. 2020b. “Policy Discontinuity and Duration Outcomes.” Quantitative Economics 11 (3): 871–916. https://doi.org/10.3982/QE639.

Vooren, M., C. Haelermans, W. Groot, and H. Maassen van den Brink. 2019. “The Effectiveness of Active Labor Market Policies: A Meta-Analysis.” Journal of Economic Surveys 33 (1): 125–49. https://doi.org/10.1111/joes.12269.

Weber, S. 2014. “Human Capital Depreciation and Education Level.” International Journal of Manpower 35 (5): 613–42. https://doi.org/10.1108/IJM-05-2014-0122.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/jbnst-2024-0001).

Received: 2024-01-04

Accepted: 2025-07-10

Published Online: 2025-09-01

This work is licensed under the Creative Commons Attribution 4.0 International License.
