Difference in Differences, Ratio in Ratios, and Ratio in Odds Ratios for Limited Dependent Variables: A Review and More

Sanghyeok Lee; Myoung-jae Lee

doi:10.1515/snde-2024-0125

Article Open Access

Published/Copyright: August 19, 2025

From the journal Studies in Nonlinear Dynamics & Econometrics

Abstract

Difference in differences (DD) is widely used to find policy/treatment effects with observational data, but applying DD to limited dependent variables (LDV’s) Y has been problematic. This paper addresses how to apply DD and related approaches, such as “ratio in ratios” and “ratio in odds ratios”, to LDV’s under the unifying framework of ‘generalized linear models with link functions’. We review and evaluate DD and the related approaches with simulation and empirical studies, and recommend ‘Poisson Quasi-MLE’ for non-negative (such as count or zero-censored) Y and (multinomial) logit MLE for binary, fractional or multinomial Y.

Keywords: difference in differences; limited dependent variable; ratio in odds ratios; ratio in ratios

JEL Classification: C31; C34; C35

1 Introduction

Difference in Differences (DD) is one of the most popular research designs in the social sciences, and it has also been gaining popularity in the natural sciences; see, e.g. Jena et al. (2015), Cataife and Pagano (2017), and McGrath et al. (2019). There are various references for DD: Lee (2005, 2016a), Angrist and Pischke (2009), Blundell and Dias (2009), Lechner (2011), Lee and Kim (2014), Kim and Lee (2017), Lee and Sawada (2020), Ariella and Lang (2020), Roth et al. (2023) and De Chaisemartin and d’Haultfoeuille (2020, 2023).

DD is basically for linear models with additive components, which makes applying DD to limited dependent variables (LDV’s) problematic. This paper provides answers to this problem, using the unifying idea of ‘generalized linear models (GLM) with link functions’. An earlier version (arXiv: 2111.12948) of this paper presented the GLM-based approach first in 2021, which has been cited by Callaway (2023), McMichael (2023), Markowitz and Smith (2024) and Freeman et al. (2025), among others. The approach subsequently appeared also in Taddeo et al. (2022) and Wooldridge (2023).

The GLM approach specifies the conditional mean regression, but two approaches requiring weaker assumptions appeared recently. One is Tchetgen et al. (2024) who specify an “extended propensity score function”, instead of the conditional mean regression. The other is Kim and Lee (2025), who use a “causal-reduced-form” and apply the usual DD to LDV’s to find a non-negatively weighted average of heterogeneous effects. The goal of this paper is to explain the GLM-based approach in detail so that it can be readily applied by practitioners, and review the literature.

Consider an outcome Y_it for unit i at time t = 2, 3, a time-constant treatment qualification dummy Q_i, and a binary treatment D_it; we set t = 2, 3 to avoid the confusion with dummy variable values 0, 1. The hallmark of DD is that D_it is the interaction of Q_i and 1[t = 3]: D_it = Q_i1[t = 3], where 1[A] ≡ 1 if A holds and 0 otherwise; only the Q_i = 1 group is treated at t = 3, and untreated otherwise.

DD can be implemented with panel data or repeated cross-sections (RCS). We focus on RCS in this paper, while relating RCS to panel data whenever necessary, because RCS are easier to deal with as they can be handled as cross-section iid data; also, our empirical analysis uses RCS. If one desires to apply an RCS method in our paper to panel data, then the simplest way is to pool the panel data and regard the time-series data for unit i as a cluster/group with related observations. Inference with clustering in RCS can be done by block bootstrap or a clustering-robust variance.

With a sampling dummy S_i independent of the other random variables, RCS have

$$D_i = Q_i S_i \quad \text{where} \quad S_i \equiv 1[\text{unit } i \text{ is sampled at } t = 3].$$

In RCS, there is a huge reservoir of units, and random sampling is done each period. Hence, we can safely assume that each unit appears only in one period. Let Y_it^d be the potential version of Y_it for D_it = d, and Y_i^d = (1 − S_i)Y_i2^d + S_iY_i3^d be the RCS potential outcome. Let W_it denote covariates, and W_i ≡ (1 − S_i)W_i2 + S_iW_i3 be the RCS covariates; RCS variables are derived from the underlying panel model variables. Henceforth, unless unclear, we often omit the subscript i indexing units.

As a preliminary, ignoring the covariates W for a while, define for RCS:

$$\mu_{QS} \equiv E(Y \mid Q, S) = \lambda^{-1}(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D), \qquad \beta_\tau \equiv \beta_3 - \beta_2, \tag{1.1}$$

where λ(⋅) is a ‘link function’ as in GLM (Nelder and Wedderburn 1972), (β₂, β₃) are the period-(2,3) intercepts, β_τ is the time effect of t = 3 relative to t = 2, β_q is the group effect of Q = 1, and β_d is the desired treatment effect.

Since (Q, S) generates four cells for four parameters (β₂, β_τ, β_q, β_d) in (1.1), there seems no loss of generality in (1.1). However, (1.1) does include a restriction: QS should not appear separately from the treatment D. If the group effect of Q changes across time, then QS becomes relevant other than through D. This restriction – no change in the group effect over time – is the DD parallel/common trend assumption.

For continuous Y, λ(⋅) in (1.1) is the identity, in which case DD is defined as

$$\mu_{11} - \mu_{10} - (\mu_{01} - \mu_{00}) = \beta_2 + \beta_\tau + \beta_q + \beta_d - (\beta_2 + \beta_q) - \{(\beta_2 + \beta_\tau) - \beta_2\} = \beta_d:$$

DD removes β₂ + β_τS + β_qQ to leave β_dD that changes across both times and groups. In practice, to account for the covariates W, a linear model such as

$$E(Y \mid Q, S, W) = \beta_2 + \beta_\tau S + \beta_q Q + \beta_d D + \beta_w' W \tag{1.2}$$

is typically estimated to find the slope of D as the treatment effect.
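As an illustration of the identity-link case, the following minimal sketch simulates RCS data from the no-covariate version of (1.1) and computes DD from the four (Q, S) cell means. The parameter values, the sample size and the helper name `dd_from_cell_means` are hypothetical choices made only for this example.

```python
import numpy as np

def dd_from_cell_means(y, q, s):
    """Linear-model DD: (mu_11 - mu_10) - (mu_01 - mu_00) from (Q, S) cell means."""
    mu = {(a, b): y[(q == a) & (s == b)].mean() for a in (0, 1) for b in (0, 1)}
    return (mu[(1, 1)] - mu[(1, 0)]) - (mu[(0, 1)] - mu[(0, 0)])

rng = np.random.default_rng(0)
n = 200_000
q = rng.integers(0, 2, n)          # treatment qualification dummy Q
s = rng.integers(0, 2, n)          # sampling dummy S (1 if sampled at t = 3)
d = q * s                          # treatment D = QS

# Hypothetical parameter values (beta_2, beta_tau, beta_q, beta_d).
b2, b_tau, b_q, b_d = 1.0, 0.5, -0.3, 2.0
y = b2 + b_tau * s + b_q * q + b_d * d + rng.normal(size=n)

dd_hat = dd_from_cell_means(y, q, s)
print(round(dd_hat, 3))            # should be close to b_d
```

With covariates, one would instead run OLS on (1, S, Q, D, W) as in (1.2); the cell-mean version is shown only because it mirrors the displayed DD formula term by term.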

For LDV’s, the story changes considerably. E.g., consider Y = 1[0 ≤ Y*] where Y* is the latent continuous outcome. With the N(0, 1) distribution function Φ(⋅), the probit is

$$E(Y \mid Q, S) = P(Y = 1 \mid Q, S) = \Phi(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D). \tag{1.3}$$

One way to stick to DD is estimating (1.3) to interpret β_d as the effect on Y*, not on Y. E.g., if β_d = 2, then D shifts Y* by twice the standard deviation (SD) of Y*. However, many practitioners desire the effect as a change in P(Y = 1|Q, S), not in Y*.

The ‘marginal effect’ that is a change of P(Y = 1|Q, S) in (1.3) due to D is

$$\Phi(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d) - \Phi(\beta_2 + \beta_\tau S + \beta_q Q). \tag{1.4}$$

Ai and Norton (2003) noted that this is not the correct effect, but their criticism applies to the case of an interaction treatment, where both Q and S are genuine treatments and the interest is in the effect of taking both treatments (e.g. drugs) together. Differently from this, Q and S are not treatments per se in the usual DD; D = QS just happened to be the way the treatment was implemented. Indeed, Puhani (2012, eq. (10)) showed that (1.4) with S = Q = 1 is a legitimate treatment effect of interest.
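To make the distinction concrete, the sketch below evaluates (1.4) at S = Q = 1 and, for comparison, the cross-difference of the four probit cell probabilities. The parameter values are hypothetical and chosen only so that the two quantities differ visibly.

```python
from math import erf, sqrt

def Phi(x):
    """N(0, 1) distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical probit parameters (beta_2, beta_tau, beta_q, beta_d).
b2, b_tau, b_q, b_d = -1.5, 1.0, 1.0, 0.5

def mu(q, s):
    """Probit cell probability P(Y = 1 | Q = q, S = s) as in (1.3), with D = QS."""
    return Phi(b2 + b_tau * s + b_q * q + b_d * q * s)

# (1.4) at S = Q = 1: treated probability minus its untreated counterfactual.
effect_on_treated = mu(1, 1) - Phi(b2 + b_tau + b_q)

# Cross-difference of the four cells (the linear-DD formula applied to probabilities).
cross_difference = (mu(1, 1) - mu(1, 0)) - (mu(0, 1) - mu(0, 0))

print(round(effect_on_treated, 4), round(cross_difference, 4))
```

Under these values the cross-difference is roughly twice the effect in (1.4), which illustrates why the two should not be used interchangeably in a nonlinear model.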

The complication involving (1.3) and (1.4) arises because DD is applied to a nonlinear model, although DD is designed basically for linear models. To drive home our point, consider the ‘log link’ λ(⋅) = ln(⋅) ⇔ λ⁻¹(⋅) = exp(⋅), with which (1.1) becomes

$$\mu_{QS} \equiv E(Y \mid Q, S) = \exp(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D). \tag{1.5}$$

This is appropriate for non-negative Y due to the lower bound μ_QS > 0. For (1.5), ‘ratio in ratios (RR)’ removes the time and group effects, and ‘RR minus one’ renders

$$\text{‘proportional effect’:} \quad \frac{\mu_{11}/\mu_{10}}{\mu_{01}/\mu_{00}} - 1 = \frac{E(Y_3^1 - Y_3^0 \mid Q = 1)}{E(Y_3^0 \mid Q = 1)} = \exp(\beta_d) - 1, \tag{1.6}$$

which is proven below. Just as the linear model is estimated instead of the DD to find β_d, the exponential regression model (1.5) is usually estimated instead of the RR.
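A quick numerical check of (1.6) under the exponential model (1.5) can be sketched as follows; the parameter values are hypothetical. The same cell means are also fed into the linear DD formula to show that, under (1.5), it does not reproduce the difference effect on the treated.

```python
import numpy as np

# Hypothetical parameters (beta_2, beta_tau, beta_q, beta_d) of the log-link model (1.5).
b2, b_tau, b_q, b_d = 0.2, 0.3, 0.5, 0.4

def mu(q, s):
    """Cell mean E(Y | Q = q, S = s) under (1.5), with D = QS."""
    return np.exp(b2 + b_tau * s + b_q * q + b_d * q * s)

# Ratio in ratios (RR) minus one: time and group effects cancel multiplicatively.
rr = (mu(1, 1) / mu(1, 0)) / (mu(0, 1) / mu(0, 0))
proportional_effect = rr - 1.0

# Linear DD of the same cell means versus the true difference effect on the treated.
linear_dd = (mu(1, 1) - mu(1, 0)) - (mu(0, 1) - mu(0, 0))
true_difference_effect = mu(1, 1) - np.exp(b2 + b_tau + b_q)

print(proportional_effect, np.exp(b_d) - 1.0)
print(linear_dd, true_difference_effect)
```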

For positive outcomes, one might consider DD with lnY, which raises the issue of transforming Y before applying DD or RR. For this issue, Athey and Imbens (2006) proposed “change in changes” based on Y-transformation-invariant distributional effects, neither an additive (for DD) nor multiplicative (for RR) effect. Also, Roth and Sant’Anna (2023) considered distribution-function-based DD that is invariant/insensitive to any strictly monotonic transformation of Y to propose a moment inequality test for parallel trends. Athey and Imbens (2006) and Roth and Sant’Anna (2023) are not exactly practical, because W should be controlled conditionally.

For binary or multinomial outcomes, ‘logit link’ can be used:

$$\mu_{QS} \equiv E(Y \mid Q, S) = \frac{\exp(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D)}{1 + \exp(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D)}; \tag{1.7}$$

the logit link is λ(⋅) ≡ ln{⋅/(1 − ⋅)} ⇔ λ⁻¹(⋅) = exp(⋅)/{1 + exp(⋅)}. For (1.7), “ratio in odds ratios (ROR)” similar to RR removes the time and group effects, and ROR minus one renders a “proportional odds effect” similar to (1.6). For binary outcomes, instead of estimating ROR directly, the logistic model in (1.7) is typically estimated, and for multinomial outcomes, a multinomial logistic model is estimated.

In the remainder of this paper, Section 2 reviews the literature of DD with LDV, and lists this paper’s contributions near the end. Sections 3 and 4 examine RR and ROR, where W is explicitly controlled in addition to (Q, S) and estimation is done with Maximum Likelihood Estimator (MLE) or Quasi-MLE (QMLE). Section 5 presents an empirical analysis. Finally, Section 6 concludes this paper. The Appendix examines ROR with multinomial Y, and provides a simulation study showing that the usual linear-model DD can be misleading for LDV’s, and that RR or ROR should be used instead.

2 Literature for DD with LDV

2.1 Early Papers on DD with LDV

Despite the frequent appearance of LDV’s in reality, studies on DD with LDV are fairly scarce. The first highly cited paper is Ai and Norton (2003), who suggested using the cross-derivative as the treatment effect: with our notation,

$$\frac{\partial^2 E(Y \mid Q = q, S = s, W)}{\partial q \, \partial s} \quad (\text{or to use the analogous “cross-difference”}). \tag{2.1}$$

Ai and Norton (2003) has been cited about 7,500 times in Google Scholar.

Ai and Norton (2003) is correct when the interest is in the interaction of two genuine treatments. However, QS itself is not a treatment in DD. Rather, QS just represents how the single, not double, treatment D is administered: only to the Q = 1 group at t = 3. Hence, as was noted in Section 1, Puhani (2012, eq. (10)) showed that Φ(β₂ + β_τ + β_q + β_d) − Φ(β₂ + β_τ + β_q) is an appropriate treatment effect. Puhani (2012) has been cited about 880 times in Google Scholar.

Wooldridge (2023, p. C39) concurred with Puhani: “Puhani’s definition is the correct one for identifying τ₂ when the PT assumption is stated in terms of the linear index”, where τ₂ is the average effect on the treated at the post-treatment period and ‘PT’ means parallel trends. Puhani (2012) addressed only the effect identification, whereas Ai and Norton (2003) discussed how to estimate (2.1) as well, and showed with an empirical example that (2.1) can differ from the slope of D = QS in sign.

Unaware of the early version of our paper in 2021, Taddeo et al. (2022) applied GLM to DD with LDV, as they proposed to use RR for count outcomes, and ROR for binary outcomes. Differently from our paper, however, Taddeo et al. (2022) considered neither zero-censored nor fractional outcomes, although they mentioned categorical outcomes briefly. Also, the word ‘QMLE’ never appeared, and the types of covariates were not discussed although they matter much for categorical outcomes.

Taddeo et al. (2022, p. 404) also suggested a sensitivity analysis for the parallel trend assumption. E.g., as will be seen later, the multiplicative form of parallel trends requires an RR form with only Y_t⁰’s (analogous to the RR in (1.6)) to be equal to one. Replacing the constant one with another number can reveal how the treatment effect is affected when the number deviates from one. For instance, if an empirical finding gets reversed only when the constant is 1.5, then it takes a 50 % violation of the parallel trends to overturn it, and thus the initial finding may be deemed robust to violations of parallel trends.
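One way to read this sensitivity analysis is sketched below. It assumes that the observed RR factors into the proportional effect plus one times the untreated "trend ratio", so that fixing the trend ratio at c instead of one gives the effect RR/c − 1. The RR value, the grid of c values and the function name are hypothetical, and this is not necessarily the exact procedure of Taddeo et al. (2022).

```python
import numpy as np

def proportional_effect(rr, trend_ratio=1.0):
    """Proportional effect implied by an observed RR when the untreated
    multiplicative trend ratio is assumed to equal `trend_ratio` instead of one."""
    return rr / trend_ratio - 1.0

rr_hat = 1.3                                   # hypothetical estimated RR
trend_grid = np.array([0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.5])
effects = proportional_effect(rr_hat, trend_grid)

# The sign of the finding flips once the assumed trend ratio exceeds rr_hat.
breakdown_ratio = rr_hat

for c, e in zip(trend_grid, effects):
    print(f"assumed trend ratio {c:.1f}: proportional effect {e:+.3f}")
```

In this hypothetical case a 30 % violation of unity multiplicative trends would be needed to reverse the sign of the estimated effect.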

2.2 DD and Staggered DD with LDV

The paper that overlaps most with our paper is Wooldridge (2023), which was written to generalize Wooldridge (2021) for linear DD models to nonlinear ones; for this reason, the working paper version of Wooldridge (2023) was available only in 2022. Wooldridge (2023) initially examined the canonical two-group two-period DD, and then moved on to staggered DD, which has been gaining popularity these days; see, e.g. Callaway and Sant’Anna (2021), Goodman-Bacon (2021), Sun and Abraham (2021), Athey and Imbens (2022), Liu et al. (2024), and Borusyak et al. (2024).

Table 1 at the end of Section 2 of Wooldridge (2023) summarizes the recommendation for the appropriate estimator for each type of LDV: Poisson QMLE for count, zero-censored and non-negative outcomes; logit for binary and fractional outcomes; and multinomial logit for multinomial/categorical outcomes. These recommendations agree with our paper, which is why Wooldridge (2023) overlaps much with our paper. A word of caution for those trying to read our paper and Wooldridge’s: our (Q_i, D_it, W_it) is (D_i, W_it, X_it) in the notation of Wooldridge (2023), which can be confusing.

Given the similarity between our paper and Wooldridge (2023), the reader may wonder what the differences are between the two papers. Wooldridge (2023) is more general than our paper, as it addresses staggered DD with LDV in multi-period settings. Wooldridge (2023) also compares ‘pooled estimators using all observations together’ to ‘imputation estimators using only the untreated observations to impute the counterfactual untreated outcomes of the Q = 1 group at t = 3’. In imputation estimators, the imputed entity is subtracted from the outcome average of the Q = 1 group at t = 3, which our paper using all observations together does not examine. Wooldridge (2023) further provides conditions to make the two types of estimators the same. In the following, we show what our paper has that Wooldridge (2023) does not.

First, when the adopted model is estimated for the LDV Y generated by its latent continuous version Y* with a linear index β₂ + β_τS + β_qQ + β_dD + β_w′W, our paper shows that β_d can be viewed either as the effect of D on Y*, or as the proportional effect on Y for the log link as in (1.6) or the proportional-odds effect for the logit link. Even if Y* is inconceivable, still the proportional effect interpretation holds. In contrast to this, Wooldridge (2023) takes one extra step to estimate E(Y₃¹ − Y₃⁰|Q = 1, W) (or E(Y₃¹ − Y₃⁰|W)), which is not a necessity; e.g.,

$$\frac{1}{\sum_i Q_i} \sum_i Q_i \left\{ \exp(\hat\beta_2 + \hat\beta_\tau + \hat\beta_q + \hat\beta_d + \hat\beta_w' W_i) - \exp(\hat\beta_2 + \hat\beta_\tau + \hat\beta_q + \hat\beta_w' W_i) \right\}$$

is an estimator with the log link and QMLE estimates (β̂₂, β̂_τ, β̂_q, β̂_d, β̂_w′)′.
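The averaged difference effect above is simple to compute once the log-link estimates are available. A minimal sketch follows, in which the function name, the toy covariate matrix and the coefficient values are all hypothetical.

```python
import numpy as np

def avg_difference_effect_on_treated(b2, b_tau, b_q, b_d, b_w, W, Q):
    """Average over the Q = 1 units of
    exp(b2 + b_tau + b_q + b_d + W_i'b_w) - exp(b2 + b_tau + b_q + W_i'b_w)."""
    untreated_index = b2 + b_tau + b_q + W @ b_w
    diff = np.exp(untreated_index + b_d) - np.exp(untreated_index)
    return (Q * diff).sum() / Q.sum()

# Toy inputs: three units, one covariate, hypothetical coefficient values.
W = np.array([[0.0], [1.0], [2.0]])
Q = np.array([1.0, 0.0, 1.0])
effect = avg_difference_effect_on_treated(
    b2=0.0, b_tau=0.0, b_q=0.0, b_d=np.log(2.0), b_w=np.array([0.1]), W=W, Q=Q
)
print(effect)
```

Because exp(b_d) − 1 = 1 in this toy case, the result is just the average of exp(0.1 W_i) over the two Q = 1 units, which makes the dependence of the difference effect on W explicit.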

Second, we also review the more recent literature on DD with LDV in 2024 and 2025: Tchetgen et al. (2024) and Kim and Lee (2025), who tried to reduce the regression function specification errors. Kim and Lee (2025) is reviewed shortly below, whereas Tchetgen et al. (2024) is reviewed in Section 4 after ROR is introduced because their approach is applicable only with ROR.

Third, covariates are often “brushed aside” in the DD literature. That is, a theory is developed without W first, and when W appears, a remark such as “The theory developed without W holds with W, either by conditioning on W or by controlling W linearly” is made. Wooldridge (2023) also assumes time-constant W, and either conditions on W or controls W linearly. Handling W “lightly” in this way can be problematic, because the nature of W matters much in panel data (i.e. unit-constant/varying and time-constant/varying), and it is more so for categorical/multinomial outcomes, as the nature of W is three-dimensional (unit-constant/varying, time-constant/varying, and category-constant/varying), where normalizations matter much even for RCS as can be seen in Lee (2015) and our Appendix for multinomial outcomes.

Fourth, in addition to the above three main contributions, there are also some minor ones: (i) DD identification conditions are usually stated and verified for panel data models, but since we use QMLE/MLE with RCS, we verify them with RCS; (ii) “triple ratios” are used to deal with non-parallel multiplicative trends; (iii) we provide the details on justifying RR for zero-censored outcomes; (iv) for proportional odds effect which is difficult to grasp, we show a ‘rare event condition’ which turns the proportional odds effect into a proportional effect; and (v) we present the exact maximand of the QMLE/MLE, so that it can be easily understood and implemented by practitioners, going well beyond presenting a summary table as in Table 1 of Wooldridge (2023).

2.3 Causal Reduced Form for OLS to DD with LDV

Recently, Kim and Lee (2025) showed that, for any form of Y (binary, count, continuous, …) with meaningful Y¹ − Y⁰, a linear model holds always for ΔY ≡ Y₃ − Y₂, to which ordinary least squares (OLS) can be applied. To see how, recall D_it = Q_i1[t = 3] and observe:

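$$Y_2 = Y_2^0, \quad Y_3 = Y_3^0 + (Y_3^1 - Y_3^0) Q \;\Longrightarrow\; \Delta Y \equiv Y_3 - Y_2 = \Delta Y^0 + (Y_3^1 - Y_3^0) Q.$$

Under PT ΔY⁰⨿Q|W, take E(⋅|Q, W) on ΔY where ‘⨿’ stands for independence:

$$E(\Delta Y \mid Q, W) = E(\Delta Y^0 \mid W) + E(Y_3^1 - Y_3^0 \mid Q = 1, W) \, Q.$$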

The ΔY equation linear in Q holds for any Y, and it is a “causal reduced form (CRF)” in the sense that it is a reduced/derived form with a causal parameter τ₁(W):

$$\Delta Y = E(\Delta Y^0 \mid W) + \tau_1(W) Q + U, \quad \tau_1(W) \equiv E(Y_3^1 - Y_3^0 \mid Q = 1, W), \quad E(U \mid Q, W) = 0. \tag{CRF}$$

CRF may sound strange, but it has been used fruitfully in Lee (2018) and Lee et al. (2025) for binary exogenous D; Lee (2021), Lee et al. (2023), Choi et al. (2023), and Kim and Lee (2024) for binary endogenous D; Lee (2024) for binary exogenous D with a mediator; and Kim (2025) for network/spillover effect.

To remove E(ΔY⁰|W) from the CRF, take E(⋅|W) on the CRF:

$$E(\Delta Y \mid W) = E(\Delta Y^0 \mid W) + \tau_1(W) \cdot p_W \quad \text{where} \quad p_W \equiv E(Q \mid W).$$

Subtract this from the CRF to get ΔY − E(ΔY|W) = τ₁(W) ⋅ (Q − p_W) + U. Now, OLS of ΔY − E(ΔY|W) on Q − p_W can be done, which is equivalent to OLS of ΔY on Q − p_W because Q − p_W is orthogonal to E(ΔY|W). The propensity score p_W can be estimated with probit or logit.

Kim and Lee (2025) showed that OLS of ΔY on Q − p_W is consistent for:

$$E\{w_{ow}(W) \tau_1(W)\} \quad \text{where} \quad w_{ow}(W) \equiv \frac{p_W(1 - p_W)}{E\{p_W(1 - p_W)\}} \ge 0 \quad \text{with} \quad E\{w_{ow}(W)\} = 1;$$

OW stands for “overlap weight”. CRF is nonparametric as no parametric assumption is invoked, and OLS to the CRF is consistent for E{w_ow(W)τ₁(W)} for any LDV, as long as Y¹ − Y⁰ makes sense. Extending this finding to more general cases, such as multi-periods, non-binary D and endogenous Q, remains to be explored.

The fact that OLS estimates an OW-average of heterogeneous effects is well known (e.g. Angrist 1998; Angrist and Pischke 2009) under the “saturated model” assumption – i.e., p_W is equal to the linear projection of Q on W – which is, however, not necessary in the above OLS. See Choi and Lee (2023) for details on OW and its advantages. OW has been gaining popularity in statistics, epidemiology, and medical science; see, e.g. Li et al. (2018), Mao et al. (2019), Li (2019), Zhou et al. (2020), Thomas et al. (2020), Aminian et al. (2021), Cheng et al. (2022), Anderson et al. (2023), Wei et al. (2023), Matsouaka and Zhou (2024), and Xu et al. (2025).
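The OW-average result can be checked in a small simulation. The sketch below is only illustrative: it uses a single binary covariate, the true propensity score p_W (in practice p_W would be estimated with probit or logit), an integer-valued ΔY⁰ that is independent of Q given W, and hypothetical values of τ₁(W).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

w = rng.integers(0, 2, n)                        # binary covariate W
p_w = np.where(w == 1, 0.5, 0.2)                 # true propensity score E(Q | W)
q = (rng.random(n) < p_w).astype(float)

# Heterogeneous individual effects Y_3^1 - Y_3^0 with mean tau_1(W), independent of Q given W.
tau_w = np.where(w == 1, 0.6, 0.1)
effect = (rng.random(n) < tau_w).astype(float)

# Untreated change Delta Y^0: integer-valued and independent of Q given W (PT holds).
dy0 = rng.integers(-1, 2, n) + w
dy = dy0 + effect * q                            # Delta Y = Delta Y^0 + (Y_3^1 - Y_3^0) Q

# OLS of Delta Y on Q - p_W (no intercept needed since E(Q - p_W) = 0).
x = q - p_w
ols_slope = (x @ dy) / (x @ x)

# Overlap-weighted average of tau_1(W).
ow = p_w * (1.0 - p_w)
ow_average = (ow * tau_w).mean() / ow.mean()

print(round(ols_slope, 3), round(ow_average, 3))
```

Here the OW average puts more weight on the W = 1 group, where p_W is closer to 0.5, than on the W = 0 group.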

3 Ratio in Ratios (RR) for Non-Negative Outcome

This section studies RR for non-negative outcomes including count and zero-censored outcomes. First, the identification aspect is examined. Second, although RR can be done by nonparametrically estimating the conditional means in RR, this is not how RR is done in practice; instead, a practical semiparametric estimator for RR using an exponential regression model is advocated. Third, several remarks are made.

3.1 Proportional Effect Identification with RR

To simplify notation when covariates W are allowed for, define

$$\mu_{QS}(w) \equiv E(Y \mid w, Q, S) \equiv E(Y \mid W = w, Q, S);$$

note μ_Q0(w) = E(Y₂|W₂ = w, Q) and μ_Q1(w) = E(Y₃|W₃ = w, Q). Analogously to (1.6), define RR or ‘proportional-effect +1 given W = w’:

$$RR(w) \equiv \frac{\mu_{11}(w)/\mu_{10}(w)}{\mu_{01}(w)/\mu_{00}(w)}.$$

The identification condition for RR(w) is “unity multiplicative trends”:

$$\frac{E(Y_3^0 \mid w, Q = 1)/E(Y_2^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 0)/E(Y_2^0 \mid w, Q = 0)} = 1; \tag{$ID_{RR}$}$$

keep in mind that S is independent of the other random variables. In ID_RR, E(Y₃⁰|w, Q = 1) is the counterfactual, because only Y₃¹ is realized for Q = 1 at t = 3.

ID_RR is analogous to the usual linear-model DD parallel trend condition:

$$E(Y_3^0 \mid w, Q = 1) - E(Y_2^0 \mid w, Q = 1) - \{E(Y_3^0 \mid w, Q = 0) - E(Y_2^0 \mid w, Q = 0)\} = 0, \tag{$ID_{DD}$}$$

which dictates that the counterfactual E(Y₃⁰|w, Q = 1) be constructed as E(Y₂⁰|w, Q = 1) + E(Y₃⁰|w, Q = 0) − E(Y₂⁰|w, Q = 0). In the same vein, ID_RR dictates that the counterfactual be constructed multiplicatively as the baseline untreated outcome of the Q = 1 group times the ratio of the control group means at t = 3 to t = 2:

$$E(Y_3^0 \mid w, Q = 1) = E(Y_2^0 \mid w, Q = 1) \times \frac{E(Y_3^0 \mid w, Q = 0)}{E(Y_2^0 \mid w, Q = 0)}. \tag{3.1}$$

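The main point is that RR(w) − 1 is the ‘proportional effect on the treated at the post-treatment period’, as the first and last terms in the following show:

$$\begin{aligned} RR(w) - 1 &= \frac{\mu_{11}(w)/\mu_{10}(w)}{\mu_{01}(w)/\mu_{00}(w)} - 1 = \frac{E(Y_3^1 \mid w, Q = 1)/E(Y_2^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 0)/E(Y_2^0 \mid w, Q = 0)} - 1 \\ &= \frac{E(Y_3^1 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 1)} \cdot \frac{E(Y_3^0 \mid w, Q = 1)/E(Y_2^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 0)/E(Y_2^0 \mid w, Q = 0)} - 1 \\ &= \frac{E(Y_3^1 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 1)} - 1 = \frac{E(Y_3^1 - Y_3^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 1)} \quad (\text{under } ID_{RR}). \end{aligned} \tag{3.2}$$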

Instead of the difference effect E(Y₃¹ − Y₃⁰|w, Q = 1), examining the proportional effect can be beneficial (Yadlowsky et al. 2021). E.g., suppose E(Y₃⁰|w, Q = 1) = G(w) for a function G(⋅) and the proportional effect is a constant β̆_d. Then the difference effect E(Y₃¹ − Y₃⁰|w, Q = 1) = β̆_dG(w) introduces effect heterogeneity unnecessarily, compared with the simple β̆_d. Proportional effects for exponential models have been advocated in many studies: Lee and Kobayashi (2001), Dukes and Vansteelandt (2018) and Ciani and Fisher (2019), among others.

If the dimension of W is low (or if W is discrete), RR(w) can be estimated nonparametrically, but typically the dimension of W is high in practice, and thus we explore a simpler semiparametric exponential regression next – semiparametric because only E(Y|W, Q, S) is specified, not the full distribution of Y|(W, Q, S).

3.2 Poisson Quasi-MLE (QMLE)

In view of (1.5), suppose that a panel data exponential model holds for Y_it^d:

$$E(Y_t^d \mid W_2, W_3, Q) = E(Y_t^d \mid W_t, Q) = \exp(\beta_t + \beta_q Q + \beta_d d + \beta_w' W_t). \tag{3.3}$$

If periods 1, …, T were available, then W₁, …, W_T would appear in the first term. Then ID_RR holds with (3.3):

$$\frac{E(Y_3^0 \mid w, Q = 1)/E(Y_2^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 0)/E(Y_2^0 \mid w, Q = 0)} = \frac{\exp(\beta_3 + \beta_q + \beta_w' w)/\exp(\beta_2 + \beta_q + \beta_w' w)}{\exp(\beta_3 + \beta_w' w)/\exp(\beta_2 + \beta_w' w)} = 1. \tag{3.4}$$

For the corresponding RCS model, the observed Y in RCS is: due to D = QS,

$$Y = (1 - S) Y_2 + S Y_3 = (1 - S) Y_2^0 + S \{(1 - Q) Y_3^0 + Q Y_3^1\} = (1 - S) Y_2^0 + S (1 - Q) Y_3^0 + D Y_3^1.$$

Take E(⋅|W₂, W₃, Q, S) on this Y and invoke (3.3): recalling β_τ ≡ β₃ − β₂ in (1.1),

$$\begin{aligned} E(Y \mid W_2, W_3, Q, S) &= (1 - S) \exp(\beta_2 + \beta_q Q + \beta_w' W_2) + S (1 - Q) \exp(\beta_3 + \beta_q Q + \beta_w' W_3) + D \exp(\beta_3 + \beta_q Q + \beta_d + \beta_w' W_3) \\ &= \exp(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D + \beta_w' W), \qquad W \equiv (1 - S) W_2 + S W_3 \end{aligned} \tag{3.5}$$

$$\Longrightarrow \; RR(w) - 1 = \exp(\beta_d) - 1; \tag{3.6}$$

the second equality can be verified by substituting Q, S = 0, 1 into both sides. If β_d is near zero, then β_d is an approximate proportional effect. RCS data and the model (3.5) can be used to estimate β_d along with the other parameters.

The heterogeneous effect can be easily allowed by replacing β_d with β_d(W_t) = β_d0 + β_dw′W_t. Then we have β_d(W) = β_d0 + β_dw′W in RCS, and (3.5) becomes

$$E(Y \mid W, Q, S) = \exp\{\beta_2 + \beta_\tau S + \beta_q Q + \beta_d(W) D + \beta_w' W\} \;\Longrightarrow\; RR(w) - 1 = \exp\{\beta_d(w)\} - 1.$$

It is often believed that if a treatment effect is random but a constant effect is assumed, then its mean effect would be estimated. However, this is false as the following explains.

The staggered DD literature revealed that, if a constant effect is assumed when the true effect is heterogeneous, then the popular two-way fixed effect estimator is consistent for a weighted average of heterogeneous effects with some weights negative. This paints a rather pessimistic picture, but positive findings also exist: if OLS is applied to a constant-effect model when the effect is actually heterogeneous in W, then the OLS is consistent for an OW average of heterogeneous effects; see Lee and Han (2024) and references therein. This kind of positive scenario was already noted when Kim and Lee (2025) was reviewed. It would take another full paper to investigate the analogous question for proportional effect in general settings, which we thus eschew.

The simplest estimator for (3.5) is Poisson QMLE, whose maximand is:

$$\sum_i \{Y_i X_i' b - \exp(X_i' b)\}, \qquad X_i \equiv (1, S_i, Q_i, D_i, W_i')', \quad b = (b_2, b_\tau, b_q, b_d, b_w')'$$

as in Poisson MLE. The first order condition at b = β is Σ_i{Y_i − exp(X_i′β)}X_i = 0 for β ≡ (β₂, β_τ, β_q, β_d, β_w′)′, and the second order derivative −Σ_iX_iX_i′exp(X_i′b) is negative definite for all b. Hence, just under E(Y|X) = exp(X′β), Poisson QMLE is consistent for β. For heterogeneous effects, we may use β_d(W) = β_d0 + β_dw′W in Poisson QMLE.

The difference between Poisson QMLE and Poisson MLE is that the first-order condition is taken just as a moment condition to estimate β in the former. Hence, the asymptotic variance of Poisson QMLE is estimated with a “sandwich-form” estimator. Since Poisson QMLE requires only E(Y|X) = exp(X′β), not the full distribution of Y|X, it is advocated by Santos Silva and Tenreyro (2006), which has been cited about 9,300 times in Google Scholar as of this writing.
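The maximand and the sandwich variance can be coded directly. The following is a minimal sketch, not a replacement for a tested statistics package: `poisson_qmle` is a hypothetical helper that runs Newton iterations on the Poisson quasi-log-likelihood, and the simulated outcome is deliberately continuous (a mean-one lognormal error multiplying exp(X′β)) to illustrate that only the conditional mean needs to be correct. All parameter values are assumptions made for the example.

```python
import numpy as np

def poisson_qmle(y, X, n_iter=50, tol=1e-10):
    """Maximize sum_i {y_i x_i'b - exp(x_i'b)} by Newton's method and
    return (b_hat, sandwich standard errors)."""
    b = np.zeros(X.shape[1])
    b[0] = np.log(y.mean())                     # start from the intercept-only fit
    for _ in range(n_iter):
        m = np.exp(X @ b)
        score = X.T @ (y - m)                   # first-order condition
        neg_hess = (X * m[:, None]).T @ X       # minus the second derivative
        step = np.linalg.solve(neg_hess, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    m = np.exp(X @ b)
    bread = np.linalg.inv((X * m[:, None]).T @ X)
    meat = (X * ((y - m) ** 2)[:, None]).T @ X
    cov = bread @ meat @ bread                  # "sandwich-form" variance estimator
    return b, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 100_000
q = rng.integers(0, 2, n)
s = rng.integers(0, 2, n)
w = rng.normal(size=n)
X = np.column_stack([np.ones(n), s, q, q * s, w])   # (1, S, Q, D, W)

beta = np.array([0.5, 0.2, 0.3, 0.4, 0.1])          # hypothetical true values
sigma = 0.5
# Non-Poisson, continuous outcome with E(Y | X) = exp(X'beta).
y = np.exp(X @ beta) * np.exp(sigma * rng.normal(size=n) - 0.5 * sigma**2)

b_hat, se = poisson_qmle(y, X)
prop_effect_hat = np.exp(b_hat[3]) - 1.0            # estimate of exp(beta_d) - 1
print(np.round(b_hat, 3), np.round(se, 4), round(prop_effect_hat, 3))
```

The slope on D = QS is recovered even though Y is not a count, which is the practical appeal of QMLE over full Poisson MLE.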

3.3 Remarks

Here we make remarks on the applicability of the above RR identification and Poisson QMLE to count and zero-censored outcomes. The semiparametric exponential model (3.5) is appropriate for these, as it requires only the lower bound 0 ≤ Y.

3.3.1 Remarks on Count Outcomes

Suppose (3.5) holds under

$$Y = \exp(Y^*), \quad Y^* \equiv \beta_2 + \beta_\tau S + \beta_q Q + \beta_d D + \beta_w' W + U, \quad E\{\exp(U) \mid W_2, W_3, Q, S\} = E\{\exp(U) \mid W_2, W_3, Q\} = 1.$$

Then we can interpret β_d as the DD effect on Y*, whereas exp(β_d) − 1 is the proportional effect on Y. However, if Y is a count outcome with P(Y = y|X) = {exp(X′β)}^y exp{−exp(X′β)}/y! (i.e. Poisson distribution), then there is no Y*, and the proportional effect interpretation on the observed Y with RR is the only way to interpret the slope β_d of D in the exponential model. This statement applies also to count outcomes based on other distributions such as Negative Binomial.

If β_qτtQ with β_qτ ≠ 0 appears as a regressor in (3.3), then ID_RR fails due to β_qτtQ:

$$\frac{\exp(\beta_3 + \beta_q + 3\beta_{q\tau} + w'\beta_w)/\exp(\beta_2 + \beta_q + 2\beta_{q\tau} + w'\beta_w)}{\exp(\beta_3 + w'\beta_w)/\exp(\beta_2 + w'\beta_w)} = \exp(\beta_{q\tau}) \ne 1; \tag{3.7}$$

call this “non-unity multiplicative trends” – an RR analog for non-parallel (i.e. additive) trends in DD. Using tQ as a regressor is an easy way to test/allow for non-unity multiplicative trends, but tQ cannot be used if only two periods are available, because using tQ is equivalent to using QS = D.

One way to allow for non-unity multiplicative trends with more than two waves is using tQ as an extra regressor as just noted, and using tQ in the usual linear DD appeared in, e.g. Goodman-Bacon (2018), Dobkin et al. (2018), and Hwang and Lee (2020). For panel data, tQ can be used as such, but for RCS, Q_iτ ≡ Q_iΣ_tS_it t should be used where S_it = 1 if i is sampled in period t and 0 otherwise. Then, the untreated group ratio/difference is allowed to change systematically with tQ over time, and any deviation from the systematic change is taken as the treatment effect. Using tQ can be generalized to using (tQ, t²Q) as in Friedberg (1998) and Wolfers (2006), and going further, (tQ, t²Q, t³Q, …) as in Lee (2016b).

Another way to deal with this kind of trend discrepancy is using triple ratios, or “ratio in ratios in ratios (RRR)” generalizing RR, analogously to triple differences (Lee 2016b) to allow for non-parallel trends in DD. With t = 1, 2, 3 available, let:

$$\frac{E(Y_t^0 \mid w, Q = 1)/E(Y_{t-1}^0 \mid w, Q = 1)}{E(Y_t^0 \mid w, Q = 0)/E(Y_{t-1}^0 \mid w, Q = 0)} = \gamma \quad \text{for } t = 2, 3, \tag{$ID_{RRR}$}$$

which allows ID_RR to be violated when γ ≠ 1 as follows.

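With m_Qt(w) ≡ E(Y|w, Q, sampled at t), observe

$$RRR(w) \equiv \frac{\{m_{13}(w)/m_{12}(w)\}/\{m_{03}(w)/m_{02}(w)\}}{\{m_{12}(w)/m_{11}(w)\}/\{m_{02}(w)/m_{01}(w)\}} = \frac{E(Y_3^1 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 1)} \times \left[ \frac{E(Y_3^0 \mid w, Q = 1)/E(Y_2^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 0)/E(Y_2^0 \mid w, Q = 0)} \Big/ \frac{E(Y_2^0 \mid w, Q = 1)/E(Y_1^0 \mid w, Q = 1)}{E(Y_2^0 \mid w, Q = 0)/E(Y_1^0 \mid w, Q = 0)} \right].$$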

The last two terms in [⋅] are both equal to γ to cancel each other. Hence, under ID_RRR, RRR identifies the same effect as RR identifies, even when ID_RR fails.
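The cancellation of γ can be verified numerically. The sketch below assumes a three-period exponential model with an extra tQ term, so that γ = exp(β_qτ) as in (3.7); all parameter values are hypothetical, and covariates are omitted for brevity.

```python
import numpy as np

# Hypothetical parameters: period intercepts, group effect, tQ slope, treatment effect.
beta_t = {1: 0.1, 2: 0.3, 3: 0.6}
b_q, b_qtau, b_d = 0.5, 0.2, 0.4

def m(q, t):
    """Cell mean for group q sampled at period t; treatment only for Q = 1 at t = 3."""
    d = q * (t == 3)
    return np.exp(beta_t[t] + b_q * q + b_qtau * t * q + b_d * d)

def ratio_in_ratios(t):
    """RR comparing periods t and t - 1 across the two groups."""
    return (m(1, t) / m(1, t - 1)) / (m(0, t) / m(0, t - 1))

rr = ratio_in_ratios(3)                          # contaminated by gamma = exp(b_qtau)
rrr = ratio_in_ratios(3) / ratio_in_ratios(2)    # gamma cancels

print(rr - 1.0, rrr - 1.0, np.exp(b_d) - 1.0)
```

RR − 1 overstates the proportional effect here because the Q = 1 group was already growing faster, while RRR − 1 recovers exp(β_d) − 1.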

3.3.2 Remarks on Zero-Censored Outcomes

Consider an RCS zero-censored model:

$$Y = \max(0, Y^*) = Y^* 1[0 < Y^*], \quad Y^* \equiv \beta_2 + \beta_\tau S + \beta_q Q + \beta_d D + \beta_w' W + U \;\Longrightarrow\; E(Y \mid X) = E(Y^* 1[0 < Y^*] \mid X). \tag{3.8}$$

Since E(Y|X) = E(Y*1[0 < Y*]|X) is non-negative without any upper bound, the exponential regression model (3.5) can be adopted, although it may not be as appealing as for count outcomes because the transformation max(0, ⋅) is not smooth.

Santos Silva and Tenreyro (2011) showed that the exponential regression holds for (3.8) if Y_i = Σ_{j=1}^{M_i} Z_ij, where M_i is a non-negative integer random variable such as Poisson count, and (Z_i1, Z_i2, …) are iid positive random variables with Z_ij⨿M_i|X_i; Y = 0 occurs if M = 0. Then,

It is not clear what Y* is here, but the interpretation of exp(β_d) − 1 as a proportional effect on Y still holds regardless of what Y* might be.

An example for Y_i = Σ_{j=1}^{M_i} Z_ij is that Y_i is the person-i expenditure on unfrozen fish in a year, Z_ij is the expenditure on unfrozen fish in month j, M_i is the number of the unfrozen-fish-purchasing months, Q_i = 1 if living close enough to sea to access unfrozen fish, and D_i = Q_iS_i is a policy to increase unfrozen fish prices. Here, M_i is how frequently unfrozen fish are purchased, which is unlikely to be affected by the policy because one cannot stock up on unfrozen fish, whereas Z_ij would be affected.
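A small simulation can illustrate why such a random sum fits the exponential regression. The sketch assumes, purely for illustration, that M is Poisson with mean exp(a₀ + a₁D), that the Z's are exponential with mean exp(c₀ + c₁D), and that the only regressor is D; then E(Y|D) = E(M|D)E(Z|D) = exp{(a₀ + c₀) + (a₁ + c₁)D} by iterated expectations. Mirroring the fish example, the frequency is left unaffected (a₁ = 0) and only the amount per purchase responds. These distributional choices and parameter names are assumptions of the sketch, not taken from Santos Silva and Tenreyro (2011).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
d = rng.integers(0, 2, n)                 # single binary regressor standing in for D

a0, a1 = 1.0, 0.0                         # frequency M: unaffected by D (a1 = 0)
c0, c1 = 0.0, 0.3                         # amount Z: mean shifts by exp(c1) under D

m = rng.poisson(np.exp(a0 + a1 * d))      # number of purchases M_i
z_mean = np.exp(c0 + c1 * d)              # E(Z_ij | D)

# A sum of M iid exponential(mean z) draws is Gamma(shape=M, scale=z); Y = 0 if M = 0.
y = np.where(m > 0, rng.gamma(np.maximum(m, 1), z_mean), 0.0)

log_ratio = np.log(y[d == 1].mean() / y[d == 0].mean())
share_zero = (y == 0).mean()
print(round(log_ratio, 3), round(share_zero, 3))
```

The outcome has a point mass at zero, yet the log ratio of the two cell means is close to a₁ + c₁, so exp(a₁ + c₁) − 1 keeps its proportional-effect interpretation.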

4 Ratio in Odds Ratios (ROR)

This section studies ROR. We examine the identification aspect first, followed by logit-based estimation for binary and fractional outcomes, and then by a review of Tchetgen et al. (2024). ROR is also applicable to multinomial outcomes, but it is presented (along with a simulation study) in the Appendix due to the complexity involving multiple equations and additional notation.

4.1 Proportional Odds Effect Identification with ROR

For binary Y, define the ‘odds conditional on (W = w, Q = q, S = s)’ for RCS as

$$R_{qs}(Y; w) \equiv \frac{P(Y = 1 \mid w, Q = q, S = s)}{P(Y = 0 \mid w, Q = q, S = s)} \quad \text{which leads to} \tag{4.1}$$

$$R_{11}(Y; w) = R_{11}(Y_3^1; w), \quad R_{01}(Y; w) = R_{01}(Y_3^0; w), \quad R_{10}(Y; w) = R_{10}(Y_2^0; w), \quad R_{00}(Y; w) = R_{00}(Y_2^0; w).$$

Also define ‘Ratio in Odds Ratios (ROR) conditional on W = w’:

$$ROR(Y; w) \equiv \frac{R_{11}(Y; w)/R_{10}(Y; w)}{R_{01}(Y; w)/R_{00}(Y; w)}.$$

The identification condition to be invoked for ROR is

$$ROR(Y^0; w) = \frac{R_{11}(Y_3^0; w)/R_{10}(Y_2^0; w)}{R_{01}(Y_3^0; w)/R_{00}(Y_2^0; w)} = 1, \tag{$ID_{ROR}$}$$

where R₁₁(Y₃⁰; w) is a counterfactual. Just as in (3.1), ID_ROR dictates that the counterfactual R₁₁(Y₃⁰; w) be constructed multiplicatively:

$$R_{11}(Y_3^0; w) = R_{10}(Y_2^0; w) \times \frac{R_{01}(Y_3^0; w)}{R_{00}(Y_2^0; w)}.$$

Doing analogously to (3.2), ROR(Y; w) − 1 is the ‘proportional odds effect on the treated at the post-treatment period t = 3’:

$$\begin{aligned} ROR(Y; w) - 1 &= \frac{R_{11}(Y; w)/R_{10}(Y; w)}{R_{01}(Y; w)/R_{00}(Y; w)} - 1 = \frac{R_{11}(Y_3^1; w)/R_{10}(Y_2^0; w)}{R_{01}(Y_3^0; w)/R_{00}(Y_2^0; w)} - 1 \\ &= \frac{R_{11}(Y_3^1; w)}{R_{11}(Y_3^0; w)} \cdot \frac{R_{11}(Y_3^0; w)/R_{10}(Y_2^0; w)}{R_{01}(Y_3^0; w)/R_{00}(Y_2^0; w)} - 1 \\ &= \frac{R_{11}(Y_3^1; w)}{R_{11}(Y_3^0; w)} - 1 = \frac{R_{11}(Y_3^1; w) - R_{11}(Y_3^0; w)}{R_{11}(Y_3^0; w)} \quad (\text{under } ID_{ROR}). \end{aligned} \tag{4.2}$$

Although the odds ratio is used extensively in the medical and epidemiological literature, the proportional odds effect is not necessarily easy to interpret. To aid interpretation, suppose Y = 1 is a rare event in the sense

$$\frac{P(Y_3^1 = 0 \mid w, Q = 1)}{P(Y_3^0 = 0 \mid w, Q = 1)} \simeq 1 \quad \text{for all } w; \tag{4.3}$$

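e.g. Y = 1 is a rare cancer occurrence such that $P(Y_3^d = 0 \mid w, Q = 1) \simeq 1$ for all w and d = 0, 1. Under (4.3), $\mathrm{ROR}(Y; w) = R_{11}(Y_3^1; w) / R_{11}(Y_3^0; w)$ in (4.2) becomes

$$\frac{P(Y_3^1 = 1 \mid w, Q = 1)}{P(Y_3^0 = 1 \mid w, Q = 1)} \cdot \frac{P(Y_3^0 = 0 \mid w, Q = 1)}{P(Y_3^1 = 0 \mid w, Q = 1)} \simeq \frac{P(Y_3^1 = 1 \mid w, Q = 1)}{P(Y_3^0 = 1 \mid w, Q = 1)} = \frac{E(Y_3^1 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 1)}.$$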

Hence, the proportional odds effect (4.2) becomes the RR proportional effect (3.2):

$$\mathrm{ROR}(Y; w) - 1 \simeq \frac{E(Y_3^1 - Y_3^0 \mid w, Q = 1)}{E(Y_3^0 \mid w, Q = 1)} \quad \text{under the rare event condition (4.3)}.$$
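To see how good the rare-event approximation is, the sketch below compares the odds ratio with the risk ratio for one rare and one common outcome. The probability pairs are hypothetical; they stand for P(Y_3^1 = 1 | w, Q = 1) and P(Y_3^0 = 1 | w, Q = 1).

```python
def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)


def odds_ratio(p_treated, p_untreated):
    """Odds of the treated response over odds of the untreated response."""
    return odds(p_treated) / odds(p_untreated)


def risk_ratio(p_treated, p_untreated):
    """Ratio of the two probabilities, i.e. the RR-type quantity in (3.2) plus one."""
    return p_treated / p_untreated


# Hypothetical probability pairs (treated, untreated); not taken from the paper.
rare = (0.012, 0.010)
common = (0.60, 0.50)

gap_rare = odds_ratio(*rare) - risk_ratio(*rare)
gap_common = odds_ratio(*common) - risk_ratio(*common)

print(f"rare event:   OR - RR = {gap_rare:.4f}")
print(f"common event: OR - RR = {gap_common:.4f}")
```

For the rare pair the two ratios differ only in the third decimal, whereas for the common pair the odds ratio (1.5) is far from the risk ratio (1.2), which is why the rare event condition matters for interpretation.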

ROR(Y; w) can be estimated nonparametrically by substituting sample analogs into the components of ROR(Y; w). However, as was the case for DD and RR(w), this is not what practitioners would do. Instead, we apply logistic regression next.

4.2 Logit for Binary Outcome

Analogously to (3.3), suppose that a panel data logistic model holds for $Y_{it}^d$:

$$E(Y_t^d \mid W_2, W_3, Q) = E(Y_t^d \mid W_t, Q) = \frac{\exp(\beta_t + \beta_q Q + \beta_d d + \beta_w' W_t)}{1 + \exp(\beta_t + \beta_q Q + \beta_d d + \beta_w' W_t)}. \tag{4.4}$$

The logistic panel data model for $Y_{it}^0$ renders, with β_τ ≡ β₃ − β₂,

$$\begin{aligned}
R_{11}(Y_3^0; w) &= \exp(\beta_2 + \beta_\tau + \beta_q + \beta_w' w), & R_{01}(Y_3^0; w) &= \exp(\beta_2 + \beta_\tau + \beta_w' w), \\
R_{10}(Y_2^0; w) &= \exp(\beta_2 + \beta_q + \beta_w' w), & R_{00}(Y_2^0; w) &= \exp(\beta_2 + \beta_w' w).
\end{aligned} \tag{4.5}$$

Hence, ID_ROR holds analogously to (3.4), because (4.5) yields

$$\mathrm{ROR}(Y^0; w) = \frac{R_{11}(Y_3^0; w)}{R_{10}(Y_2^0; w)} \Big/ \frac{R_{01}(Y_3^0; w)}{R_{00}(Y_2^0; w)} = 1.$$

For the corresponding RCS, $Y = (1 - S) Y_2^0 + S(1 - Q) Y_3^0 + D Y_3^1$ holds. Analogously to (3.5) and (3.6), take E(⋅|W₂, W₃, Q, S) on Y and invoke (4.4) to get

$$E(Y \mid W_2, W_3, Q, S) = \frac{\exp(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D + \beta_w' W)}{1 + \exp(\beta_2 + \beta_\tau S + \beta_q Q + \beta_d D + \beta_w' W)}, \quad W \equiv (1 - S) W_2 + S W_3. \tag{4.6}$$

Also, $R_{11}(Y_3^1; w) = \exp(\beta_2 + \beta_\tau + \beta_q + \beta_d + \beta_w' w)$ and $R_{11}(Y_3^0; w)$ in (4.5) yield

$$\mathrm{ROR}(Y; w) - 1 = \frac{R_{11}(Y_3^1; w)}{R_{11}(Y_3^0; w)} - 1 = \exp(\beta_d) - 1.$$

Estimate β_d by the MLE with (4.6) to use exp(β_d) − 1 as the proportional odds effect, which is also the RR proportional effect when Y = 1 is a rare event as in (4.3).
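The estimation step can be sketched on simulated RCS data. The sketch assumes no covariates W, in which case the logit model (4.6) with regressors (1, S, Q, D) is saturated and its MLE of β_d coincides with the log of the sample ROR, so no iterative optimizer is needed. All parameter values and function names below are hypothetical choices for illustration, not the paper's design.

```python
import math
import random


def simulate_rcs(n, b2, b_tau, b_q, b_d, seed=0):
    """Draw (Y, Q, S) from the logit model (4.6) without covariates W."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        q = rng.randint(0, 1)  # treatment-group dummy Q
        s = rng.randint(0, 1)  # S = 1 if sampled in the post-treatment period
        index = b2 + b_tau * s + b_q * q + b_d * q * s  # D = Q * S
        p = 1.0 / (1.0 + math.exp(-index))
        rows.append((1 if rng.random() < p else 0, q, s))
    return rows


def log_sample_ror(rows):
    """Log of the sample ROR; equals the logit MLE of beta_d in this saturated design."""
    ones = {(q, s): 0 for q in (0, 1) for s in (0, 1)}
    total = {(q, s): 0 for q in (0, 1) for s in (0, 1)}
    for y, q, s in rows:
        ones[(q, s)] += y
        total[(q, s)] += 1
    log_odds = {c: math.log(ones[c] / (total[c] - ones[c])) for c in ones}
    return (log_odds[(1, 1)] - log_odds[(1, 0)]) - (log_odds[(0, 1)] - log_odds[(0, 0)])


TRUE_BETA_D = 0.5  # hypothetical true effect on the logit scale
rows = simulate_rcs(200_000, b2=-0.5, b_tau=0.3, b_q=0.4, b_d=TRUE_BETA_D)
beta_d_hat = log_sample_ror(rows)
prop_odds_effect_hat = math.exp(beta_d_hat) - 1.0

print(round(beta_d_hat, 3), round(prop_odds_effect_hat, 3))
```

With covariates W in the model, the closed form no longer applies and a standard logit routine would be used on (1, S, Q, D, W) instead.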

Suppose β_qτtQ with β_qτ ≠ 0 appears as an extra regressor in (4.4). Then the parallel trends do not hold for the latent Y*. The appearance of β_qτtQ also ruins ID_ROR for binary Y because ID_ROR becomes (3.7), just as β_qτtQ ruins ID_RR in (3.7). As in (3.7), using tQ is an easy way to test or allow for non-parallel trends in Y*. Overall, the comments made for (3.7) hold more or less the same for ROR as well.

To allow for heterogeneous effect, suppose now that the slope of d in (4.4) is β_d(W_t); e.g. $\beta_d(W_t) = \beta_{d0} + \beta_{dw}' W_t$. Then (4.6) and ROR(Y; w) − 1 become, respectively,

$$E(Y \mid W, Q, S) = \frac{\exp\{\beta_2 + \beta_\tau S + \beta_q Q + \beta_d(W) D + \beta_w' W\}}{1 + \exp\{\beta_2 + \beta_\tau S + \beta_q Q + \beta_d(W) D + \beta_w' W\}}, \qquad \mathrm{ROR}(Y; w) - 1 = \exp\{\beta_d(w)\} - 1.$$

ROR is not applicable to ordinal outcomes, but they can be reduced to binary in multiple ways, and then the overlapping information in those ways can be combined with minimum distance estimation (see, e.g. Lee 2005, 2015).

4.3 Logit for Fractional Outcome

When Y takes on a value in [0,1], Y is a fractional outcome; e.g. the share of asset invested in stocks. There are two types of fractional outcomes: (i) P(Y = 0 or Y = 1) = 0 and (ii) P(Y = 0 or Y = 1) > 0. Since logistic regression always gives a value in (0,1), it can be adopted for type-(i) fractional outcome. As for type (ii), analogously to max(0, Y*), we can use Y = max{0, min(Y*, 1)}. Since max{0, min(⋅, 1)} is not smooth, one may object to adopting a logistic model for type (ii), but the following provides a justification under 0 < E(Y|X) < 1 for all X.

For a function B_X of X with 0 < B_X < 1 for all X, consider maximizing

$$E\{Y \ln B_X + (1 - Y) \ln(1 - B_X)\} = E[E(Y \mid X) \ln B_X + \{1 - E(Y \mid X)\} \ln(1 - B_X)] \tag{4.7}$$

with respect to (wrt) B_X. Differentiate (4.7) wrt B_X to obtain

$$E\left[E(Y \mid X) B_X^{-1} - \{1 - E(Y \mid X)\}(1 - B_X)^{-1}\right] \quad \{= 0 \text{ when } B_X = E(Y \mid X)\}.$$

Further differentiate this wrt B_X to see that B_X = E(Y|X) is the unique maximizer:

$$E\left[-E(Y \mid X) B_X^{-2} - \{1 - E(Y \mid X)\}(1 - B_X)^{-2}\right] < 0.$$

Using this fact, Papke and Wooldridge (1996) maximized wrt b:

$$\sum_i \left\{ Y_i \ln \frac{\exp(X_i' b)}{1 + \exp(X_i' b)} + (1 - Y_i) \ln \frac{1}{1 + \exp(X_i' b)} \right\} \quad \text{under } E(Y \mid X) = \frac{\exp(X' \beta)}{1 + \exp(X' \beta)};$$

X_i and b were defined for the Poisson QMLE. The first-order condition is

$$\sum_i \left\{ Y_i - \frac{\exp(X_i' b)}{1 + \exp(X_i' b)} \right\} X_i = 0 \quad (\text{satisfied at } b = \beta).$$

As in Poisson QMLE, a “sandwich form” variance estimator should be used.
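The maximization argument behind (4.7) can be checked numerically in the simplest case of an intercept only (X = 1), where B_X is a single number b. The sketch below evaluates the Bernoulli quasi-log-likelihood on a grid and confirms that the maximizer is the sample mean of Y, even though Y is fractional and includes the corner values 0 and 1. The data are hypothetical.

```python
import math


def bernoulli_quasi_loglik(y_values, b):
    """Sample analog of (4.7) for a constant B_X = b in (0, 1)."""
    total = sum(y * math.log(b) + (1.0 - y) * math.log(1.0 - b) for y in y_values)
    return total / len(y_values)


# Hypothetical fractional outcomes, including the corners 0 and 1 (type (ii)).
y_values = [0.0, 0.1, 0.25, 0.4, 0.75, 1.0]

# Grid search over b in (0, 1); the objective is strictly concave in b.
grid = [k / 1000.0 for k in range(1, 1000)]
b_star = max(grid, key=lambda b: bernoulli_quasi_loglik(y_values, b))
mean_y = sum(y_values) / len(y_values)

print(b_star, round(mean_y, 4))
```

The grid maximizer agrees with the sample mean up to the grid resolution, which is the intercept-only counterpart of B_X = E(Y|X) being the unique maximizer.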

4.4 ROR and Extended Propensity Score

A disadvantage of the proportional odds effect is that it is difficult to interpret unless the rare event condition (4.3) holds. To address this, Tchetgen et al. (2024; “TPR”) proposed to find $E(Y_3^1 - Y_3^0 \mid W, Q = 1)$ under a condition close to ID_ROR, as explained in this subsection. Although we assume discrete Y to ease exposition here, since each value of y ≠ 0 is paired with y = 0, one might think of Y just as binary.

Define: for y ≠ 0,

$$R_{qsy}(Y; w) \equiv \frac{P(Y = y \mid w, Q = q, S = s)}{P(Y = 0 \mid w, Q = q, S = s)} \tag{4.8}$$

which leads to

$$R_{11y}(Y; w) = R_{11y}(Y_3^1; w), \quad R_{01y}(Y; w) = R_{01y}(Y_3^0; w), \quad R_{10y}(Y; w) = R_{10y}(Y_2^0; w), \quad R_{00y}(Y; w) = R_{00y}(Y_2^0; w).$$

Also, assume “odds ratio equi-confounding” which generalizes ID_ROR:

$$\beta_2(w, y) = \beta_3(w, y) \quad \text{where} \quad \beta_t(w, y) \equiv \ln \frac{P(Y_t^0 = y \mid w, Q = 1) / P(Y_t^0 = 0 \mid w, Q = 1)}{P(Y_t^0 = y \mid w, Q = 0) / P(Y_t^0 = 0 \mid w, Q = 0)};$$

with y = 1, ‘β₂(w, 1) = β₃(w, 1) for all w’ is equivalent to ID_ROR.

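Using Bayes’ rule, rewrite β_t(w, y) as:

$$\beta_t(w, y) = \ln \frac{P(Q = 1 \mid w, Y_t^0 = y)}{1 - P(Q = 1 \mid w, Y_t^0 = y)} - \ln \frac{P(Q = 1 \mid w, Y_t^0 = 0)}{P(Q = 0 \mid w, Y_t^0 = 0)} \tag{4.9}$$

$$\Longleftrightarrow \quad \ln \frac{P(Q = 1 \mid w, Y_t^0 = y)}{1 - P(Q = 1 \mid w, Y_t^0 = y)} = \ln \frac{P(Q = 1 \mid w, Y_t^0 = 0)}{P(Q = 0 \mid w, Y_t^0 = 0)} + \beta_t(w, y) \tag{4.10}$$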

putting the last term of (4.9) on the opposite side. TPR calls $P(Q = 1 \mid w, Y_t^0 = y)$ “the extended propensity score function (given y)”.
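The Bayes-rule step can be verified numerically: the log odds ratio computed from the outcome distribution given Q equals the difference of logits of the extended propensity score. The joint probabilities below are hypothetical and refer to one fixed w and one period t; the helper names are ours.

```python
import math

# Hypothetical joint probabilities P(Y0 = y, Q = q) at a fixed w; keys are (y, q).
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.15, (1, 1): 0.20,
    (2, 0): 0.05, (2, 1): 0.10,
}


def p_y_given_q(y, q):
    """P(Y0 = y | Q = q) from the joint table."""
    return joint[(y, q)] / sum(p for (_, qq), p in joint.items() if qq == q)


def p_q_given_y(q, y):
    """Extended propensity score P(Q = q | Y0 = y) from the joint table."""
    return joint[(y, q)] / sum(p for (yy, _), p in joint.items() if yy == y)


def beta_from_outcome_odds(y):
    """beta_t(w, y) as defined through the odds of Y0 = y versus Y0 = 0 in each group."""
    treated_odds = p_y_given_q(y, 1) / p_y_given_q(0, 1)
    control_odds = p_y_given_q(y, 0) / p_y_given_q(0, 0)
    return math.log(treated_odds / control_odds)


def beta_from_extended_ps(y):
    """The same beta_t(w, y), written as a difference of logits of P(Q = 1 | Y0)."""
    logit_y = math.log(p_q_given_y(1, y) / p_q_given_y(0, y))
    logit_0 = math.log(p_q_given_y(1, 0) / p_q_given_y(0, 0))
    return logit_y - logit_0


for y in (1, 2):
    print(y, round(beta_from_outcome_odds(y), 6), round(beta_from_extended_ps(y), 6))
```

Both routes give the same number for every y ≠ 0, which is why a model for P(Q = 1 | w, Y_t^0) can replace a model for the outcome odds.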

TPR approximates the first term on the right-hand side of (4.10) linearly with $\eta_t' w$, and β_t(w, y) with $\alpha_w' w + \alpha_y y$ (no t in α’s, due to β₂(w, y) = β₃(w, y)), where η_t and α’s are parameters; TPR also allows the interaction term $\alpha_{wy}' w y$, which is omitted here. The logit of $P(Q = 1 \mid w, Y_t^0 = y)$ in the left-hand side of (4.10) renders:

$$P(Q = 1 \mid W, Y_t^0) = \frac{\exp(\eta_t' W + \alpha_w' W + \alpha_y Y_t^0)}{1 + \exp(\eta_t' W + \alpha_w' W + \alpha_y Y_t^0)} = \frac{\exp\{(\eta_t + \alpha_w)' W + \alpha_y Y_t^0\}}{1 + \exp\{(\eta_t + \alpha_w)' W + \alpha_y Y_t^0\}}.$$

Now, $(\eta_2' + \alpha_w', \alpha_y)'$ can be estimated with the logistic MLE using only the t = 2 observations due to $Y_2 = Y_2^0$. However, η₃ + α_w cannot be estimated in the analogous way because the Q = 1 group does not have $Y_3^0$, which is addressed next.

Observe:

$$P(Q = 1 \mid W, Y_3^0) = P(Q = 0 \mid W, Y_3^0) \cdot \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3^0\}$$

which implies

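$$\begin{aligned}
& E[(1 - Q) \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3\} - Q \mid W] = 0 \\
\Rightarrow\ & E\big(E[(1 - Q) \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3^0\} - Q \mid W, Y_3^0] \mid W\big) = 0 \\
\Rightarrow\ & E\big(E[(1 - Q) \exp\{(\eta_3 - \eta_2)' W + (\eta_2 + \alpha_w)' W + \alpha_y Y_3^0\} - Q \mid W, Y_3^0] \mid W\big) = 0 \\
\Rightarrow\ & E\big(E[(1 - Q) \exp\{(\eta_3 - \eta_2)' W + (\eta_2 + \alpha_w)' W + \alpha_y Y_3^0\} - Q \mid W, Y_3^0] \cdot W\big) = 0.
\end{aligned} \tag{4.11}$$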

The last unconditional moment condition identifies η₃ − η₂ as the solution, if η₂ + α_w and α_y are known, for which β_t(w, y) should be time-constant, so that η₂ + α_w and α_y found at t = 2 can be substituted into the moment condition at t = 3 to identify η₃ − η₂. Henceforth, write (η₃ − η₂)′W + (η₂ + α_w)′W as (η₃ + α_w)′W as in (4.11).
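A minimal sketch of solving the sample version of the last moment condition, under the simplifying assumption that W contains only a constant, so that (η₃ + α_w)′W reduces to a scalar intercept (called `c3` below, a name of ours). With α_y taken as known from the t = 2 logit, the sample moment has a closed-form solution. The data and the value of α_y are hypothetical.

```python
import math


def solve_intercept(q_list, y3_list, alpha_y):
    """Solve mean[(1 - Q) * exp(c3 + alpha_y * Y3) - Q] = 0 for the scalar c3."""
    n_treated = sum(q_list)
    weighted_controls = sum(
        (1 - q) * math.exp(alpha_y * y) for q, y in zip(q_list, y3_list)
    )
    return math.log(n_treated / weighted_controls)


def sample_moment(c3, q_list, y3_list, alpha_y):
    """Sample analog of the moment condition in (4.11) with W = 1."""
    terms = [(1 - q) * math.exp(c3 + alpha_y * y) - q for q, y in zip(q_list, y3_list)]
    return sum(terms) / len(terms)


# Hypothetical t = 3 sample: Q and the observed Y3.
q_list = [0, 0, 0, 0, 0, 1, 1, 1]
y3_list = [0, 1, 0, 2, 1, 1, 2, 0]
alpha_y = 0.6  # hypothetical value, treated as already estimated at t = 2

c3_hat = solve_intercept(q_list, y3_list, alpha_y)
print(round(c3_hat, 4), sample_moment(c3_hat, q_list, y3_list, alpha_y))
```

Only the outcomes of the Q = 0 units enter the exponential term, because (1 − Q) is zero for the treated; this is how the moment condition avoids the unobserved Y_3^0 of the treated group. With a vector W, the same condition would be solved numerically.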

With all parameters identified, the DD effect $E(Y_3^1 \mid Q = 1) - E(Y_3^0 \mid Q = 1)$ is:

$$E(Y_3 \mid Q = 1) - \frac{E[(1 - Q) Y_3 \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3\}]}{E[(1 - Q) \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3\}]}. \tag{4.12}$$

To understand the second term, observe that its numerator is equal to:

$$\begin{aligned}
& E\big(E[(1 - Q) Y_3^0 \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3^0\} \mid W, Y_3^0]\big) \\
&= E\big(Y_3^0 \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3^0\} \cdot E[(1 - Q) \mid W, Y_3^0]\big) \\
&= E\left(Y_3^0 \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3^0\} \cdot \frac{1}{1 + \exp\{(\eta_3 + \alpha_w)' W + \alpha_y Y_3^0\}}\right) \\
&= E\{Y_3^0 P(Q = 1 \mid W, Y_3^0)\} = E\{E(Y_3^0 Q \mid W, Y_3^0)\} = E(Y_3^0 Q).
\end{aligned}$$

Analogously, the denominator of the second term in (4.12) is E(Q), which thus makes the second term of (4.12) equal to $E(Y_3^0 \mid Q = 1)$.
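The weighting identity behind the second term of (4.12) can be checked by exact enumeration in a small population. The sketch assumes no covariates (W is a constant), a three-valued Y_3^0, and an extended propensity score of logit form; the distribution and the parameters `c3` and `alpha_y` are hypothetical.

```python
import math


def expit(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical distribution of the untreated potential outcome Y3^0.
p_y0 = {0: 0.5, 1: 0.3, 2: 0.2}
c3, alpha_y = -0.4, 0.6  # hypothetical intercept and slope of the extended propensity score


def eps(y):
    """Extended propensity score P(Q = 1 | Y3^0 = y)."""
    return expit(c3 + alpha_y * y)


# Target: E(Y3^0 | Q = 1), which is not directly observable for the treated.
target = sum(y * p * eps(y) for y, p in p_y0.items()) / sum(
    p * eps(y) for y, p in p_y0.items()
)

# Second term of (4.12): uses only Q = 0 units, reweighted by exp(c3 + alpha_y * Y3).
numerator = sum(p * (1 - eps(y)) * y * math.exp(c3 + alpha_y * y) for y, p in p_y0.items())
denominator = sum(p * (1 - eps(y)) * math.exp(c3 + alpha_y * y) for y, p in p_y0.items())
weighted_controls = numerator / denominator

print(round(target, 6), round(weighted_controls, 6))
```

The two numbers agree because P(Q = 0 | Y_3^0) times the exponential weight equals P(Q = 1 | Y_3^0) under the logit form, exactly as in the derivation above.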

To evaluate the contribution of TPR: they turn ROR into something that involves only P(Q|⋅), using Bayes’ rule. This way, one has to specify a binary model for Q, not for Y, which offers no advantage for binary Y. For non-binary Y, such as ordinal, multinomial or even continuous, not specifying the outcome probability/density would be an advantage. However, it is not clear whether β₂(w, y) = β₃(w, y) holds or not for a general Y, as ID_ROR and its generalization β₂(w, y) = β₃(w, y) were motivated by the (multinomial) logit form of the $Y_t^0$ probability.

This problem notwithstanding, the main contribution of TPR is that, given the difficulty of interpreting proportional odds effect unless the rare event condition is invoked, TPR showed a way to find the usual DD effect under the ID_ROR-type condition β₂(w, y) = β₃(w, y), avoiding the interpretation difficulty of proportional odds effect.

5 Empirical Analysis

In this section, we estimate the effects of the Affordable Care Act Dependent Coverage Provision (‘DCP’) on various health outcomes. Under the DCP that went into effect in September 2010, dependents can remain on the parent’s private health plan until age 26. The treatment group is dependents aged 23–25, and the control group is dependents aged 27–29; age 26 was excluded because its treatment status is ambiguous. Our data are from the Behavioral Risk Factor Surveillance System for 2007–2013, a system of health-related telephone surveys in the U.S. Almost the same data were used in Barbaresco et al. (2015) (‘BCQ’, henceforth), with small differences occurring due to updates, imputed values, data cleaning, etc.

BCQ considered 18 outcomes, of which we use 12. Each outcome variable has a different sample size, as we replaced “Don’t Know” and “Refused” with missing values. With the sample size in brackets [⋅], the 12 outcome variables are: (1) ‘any health insurance’ [127,618], (2) ‘any primary (care) doctor’ [127,533], (3) needed medical care in the past year not taken due to cost (‘cost blocked care’) [108,433], (4) current smoker [126,557], (5) ‘risky drinker (in the past 30 days)’ [122,035], (6) ‘obese (BMI ≥ 30)’ [121,294], (7) ‘pregnant (while) unmarried’ [40,006], (8) ‘(alcoholic) drinks (in the past) 30 days’ [121,845], (9) BMI [121,290], (10) days of the last 30 not in good mental health (‘days poor mental’) [125,681], (11) days of the last 30 not in good physical health (‘days poor physical’) [125,766], and (12) days of the last 30 with health-related limitations (‘days health limits’) [71,079]. The first seven outcomes are binary, and the remaining five are non-negative (counts or continuous).

Table 1 presents summary statistics on covariates: age, gender, race, marital status, education, state unemployment rate, ‘any DCP’ for any state mandate on DCP although the dependent may not be covered, household income, the number of children, ‘cell phones only’ (vs. cell phone plus landline), student, and unemployed. Because the treatment group is younger than the control group by 2–6 years, the treatment group has fewer married persons, fewer college graduates, lower household income, fewer children, more students, and more unemployed persons. Also, the treatment group has a lower state unemployment rate, a higher ‘any DCP’ share, and a higher ‘cell phone only’ share.

Table 1:

Summary statistics of covariates: mean & standard deviation (SD).

Covariates	Treated	Control	Covariates	Treated	Control
	Mean (SD)	Mean (SD)		Mean (SD)	Mean (SD)
Age (age 23 omitted)			Household income (less than $10 K omitted)
Age 24	0.35 (0.48)	–	$10 K–$15 K	0.07 (0.26)	0.05 (0.22)
Age 25	0.32 (0.47)	–	$15 K–$20 K	0.10 (0.30)	0.08 (0.27)
Age 27	–	0.31 (0.46)	$20 K–$25 K	0.12 (0.32)	0.10 (0.30)
Age 28	–	0.34 (0.48)	$25 K–$35 K	0.14 (0.35)	0.13 (0.33)
Age 29	–	0.35 (0.48)	$35 K–$50 K	0.16 (0.37)	0.16 (0.37)
Female	0.51 (0.50)	0.51 (0.50)	$50 K–$75 K	0.14 (0.35)	0.18 (0.39)
Race (non-Hispanic whites omitted)			$75 K and over	0.19 (0.39)	0.24 (0.43)
Black	0.11 (0.31)	0.11 (0.32)	Number of children
Hispanic	0.23 (0.42)	0.22 (0.41)	1	0.23 (0.42)	0.23 (0.42)
Others	0.09 (0.28)	0.08 (0.27)	2	0.16 (0.36)	0.23 (0.42)
Married	0.30 (0.46)	0.56 (0.50)	3	0.05 (0.23)	0.11 (0.31)
Education (less than HS degree omitted)			4	0.02 (0.13)	0.04 (0.19)
High school (HS)	0.28 (0.45)	0.26 (0.44)	5 or more	0.01 (0.09)	0.02 (0.12)
Non-4-yr college	0.30 (0.46)	0.27 (0.44)
College graduate	0.31 (0.46)	0.36 (0.48)	Cell phone only	0.70 (0.46)	0.67 (0.47)
State unemp. rate	7.23 (2.72)	7.37 (2.73)	Student	0.11 (0.31)	0.05 (0.23)
Any DCP	0.26 (0.44)	0.04 (0.21)	Unemployed	0.13 (0.34)	0.12 (0.33)

Any DCP: the state has some DCP mandate, even though the person may not be covered by it.

Let ‘Lin-DD’ stand for the linear model DD using (1.2). Table 2 shows β_qτ (non-parallel/multiplicative trends) and β_d (effect) estimates, although the effect of interest is the proportional effect exp(β_d) − 1 due to Y being an LDV. The RR (Poisson QMLE) and ROR (logit MLE) estimates are β̃_qτ and β̃_d, and the Lin-DD estimates ignoring the LDV nature are β̂_qτ and β̂_d. In Table 2, age-clustering-robust standard errors are computed using the survey weights.

Table 2:

Non-parallel/multiplicative trend β_qτ & effect β_d: estimate (t-value).

Outcome variable	RR and ROR		Lin-DD (linear model DD)
	β̃_qτ	β̃_d	DD β̂_qτ	DD β̂_d
	Estimate (tv)	Estimate (tv)	Estimate (tv)	Estimate (tv)
Binary outcome
Any health insurance	−0.009 (−0.27)	0.407 (3.06)	−0.002 (−0.34)	0.068 (2.88)
Any primary doctor	0.010 (0.39)	0.145 (1.28)	0.002 (0.34)	0.028 (1.18)
Cost blocked care	0.014 (2.78)	−0.167 (−1.82)	0.003 (3.54)	−0.029 (−1.97)
Current smoker	−0.053 (−2.70)	0.204 (2.89)	−0.009 (−2.86)	0.035 (3.15)
Risky drinker	0.027 (1.63)	−0.092 (−2.44)	0.005 (1.90)	−0.019 (−2.88)
Obese	0.032 (1.90)	−0.081 (−1.07)	0.007 (1.86)	−0.018 (−1.02)
Pregnant unmarried	0.008 (0.12)	−0.079 (−0.36)	0.000 (0.11)	−0.003 (−0.39)
Non-negative outcome
Drinks 30 days	−0.085 (−3.48)	0.263 (3.05)	−1.346 (−4.03)	4.242 (3.04)
BMI	−0.001 (−0.83)	−0.004 (−0.73)	−0.035 (−0.86)	−0.104 (−0.66)
Days poor mental	−0.004 (−0.15)	0.050 (0.43)	−0.019 (−0.18)	0.203 (0.43)
Days poor physical	−0.037 (−0.94)	0.189 (1.18)	−0.087 (−0.96)	0.423 (1.13)
Days health limits	0.019 (0.47)	0.052 (0.30)	0.050 (0.40)	0.149 (0.28)

Three main findings emerge from Table 2, which are also seen in the simulation part of the Appendix: (i) RR and ROR estimates differ much from Lin-DD estimates; (ii) ROR estimates are overall greater than Lin-DD whereas RR estimates are overall smaller than Lin-DD; and (iii) tests for β_qτ = 0 in Lin-DD and in RR or ROR give the same conclusions at the 5 % significance level. Also, comparing RR, ROR and Lin-DD in their qualitative conclusions on testing for β_d = 0, they lead to the same conclusions at the 5 % significance level, except for ‘cost blocked care’.

Turning to effect magnitudes, proportional odds effects for binary outcomes are a little difficult to interpret; e.g. DCP increases the odds of ‘any health insurance’ by about 41 %, an approximation to exp(0.41) − 1 ≃ 51 %. This should not be taken as a drastic effect, because odds ratios can easily take on large values, which is, in fact, one of the reasons why some researchers prefer odds ratios. Compared with the overall large magnitudes of the proportional odds effects for binary outcomes, the proportional effect magnitudes for non-negative outcomes are on a smaller scale and easier to interpret, with β_d estimates ranging from −0.004 to 0.263; e.g. DCP increases ‘drinks 30 days’ by about 26 %, an approximation to exp(0.26) − 1 ≃ 30 %. As an example of proportional odds effects becoming proportional effects for rare events, unmarried pregnancies are fairly rare (4–5 %) in our data, and consequently, we can interpret the ROR estimate −0.079 for ‘pregnant unmarried’ as an 8 % decrease due to DCP.
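The conversion from a coefficient to a proportional (odds) effect can be sketched as follows, using three β_d estimates copied from the RR and ROR column of Table 2. The function name is ours; only the arithmetic exp(β_d) − 1 is from the text.

```python
import math


def proportional_effect(beta_d):
    """Proportional (odds) effect exp(beta_d) - 1 implied by a coefficient beta_d."""
    return math.exp(beta_d) - 1.0


# beta_d estimates taken from the RR and ROR column of Table 2.
table2_beta_d = {
    "any health insurance": 0.407,
    "drinks 30 days": 0.263,
    "pregnant unmarried": -0.079,
}

effects = {name: proportional_effect(b) for name, b in table2_beta_d.items()}
for name, effect in effects.items():
    # The coefficient itself is only a first-order approximation to the effect.
    print(f"{name}: beta_d = {table2_beta_d[name]:+.3f}, exp(beta_d) - 1 = {100 * effect:+.1f} %")
```

The gap between β_d and exp(β_d) − 1 grows with the size of the coefficient: it is negligible for ‘pregnant unmarried’ but about ten percentage points for ‘any health insurance’.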

BCQ checked the parallel trend assumption with graphs plotting the pre-treatment trends across the treatment and control groups. BCQ also estimated their models using different time periods or using more aggregated data. Whereas these are informal ways of testing for parallel trends, our approach of using tQ as an extra regressor provides a formal yet simple way of testing parallel/unity-multiplicative trends. The β̃_qτ estimates reveal that the parallel trend assumption in Y* and the analogous ID_RR/ID_ROR assumptions do not hold at least for ‘cost blocked care’, ‘current smoker’, and ‘drinks 30 days’, while tests for β_qτ = 0 in Lin-DD give almost the same conclusions.
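The formal test amounts to comparing the t-value of the tQ coefficient with a critical value. The sketch below applies a two-sided 5 % test, using the standard normal critical value 1.96 (our choice of approximation), to the β̃_qτ t-values copied from the RR and ROR column of Table 2.

```python
# t-values of the tQ coefficient, copied from the RR and ROR column of Table 2.
t_values_beta_q_tau = {
    "any health insurance": -0.27,
    "any primary doctor": 0.39,
    "cost blocked care": 2.78,
    "current smoker": -2.70,
    "risky drinker": 1.63,
    "obese": 1.90,
    "pregnant unmarried": 0.12,
    "drinks 30 days": -3.48,
    "BMI": -0.83,
    "days poor mental": -0.15,
    "days poor physical": -0.94,
    "days health limits": 0.47,
}

CRITICAL_5PCT = 1.96  # two-sided 5 % critical value of the standard normal

# Outcomes for which beta_q_tau = 0 (parallel / unity-multiplicative trends) is rejected.
rejected = [name for name, t in t_values_beta_q_tau.items() if abs(t) > CRITICAL_5PCT]
print(rejected)
```

The rejected outcomes are exactly the three named in the text; for the remaining nine outcomes the restriction β_qτ = 0 is not rejected at this level.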

To appreciate better how much difference allowing β_qτ ≠ 0 makes, Table 3 repeats Table 2 under the restriction β_qτ = 0 (i.e. without using tQ as a regressor). The differences between Tables 2 and 3 are substantial both in terms of effect magnitude and t-value. In RR and ROR, only ‘any health insurance’ maintains its statistical significance, whereas ‘current smoker’, ‘risky drinker’ and ‘drinks 30 days’ become misleadingly insignificant when β_qτ = 0 is falsely imposed. Also, ‘any primary doctor’, BMI, ‘days poor physical’ and ‘days health limits’ become significant when β_qτ = 0 is imposed unnecessarily. In Lin-DD as well, only ‘any health insurance’ maintains its statistical significance in Tables 2 and 3, whereas the statistical significance of seven other outcomes is switched.

Table 3:

Effects under parallel/multiplicative trends.

Outcome variable	RR and ROR	Lin-DD
	Estimate (tv)	Estimate (tv)
Binary outcome
Any health insurance	0.375 (4.57)	0.061 (4.41)
Any primary doctor	0.178 (5.26)	0.035 (4.76)
Cost blocked care	−0.113 (−1.34)	−0.019 (−1.38)
Current smoker	0.016 (0.49)	0.003 (0.49)
Risky drinker	0.006 (0.23)	0.001 (0.13)
Obese	0.033 (1.06)	0.008 (1.10)
Pregnant unmarried	−0.049 (−0.58)	−0.002 (−0.57)
Non-negative outcome
Drinks 30 days	−0.037 (−0.50)	−0.569 (−0.39)
BMI	−0.009 (−3.21)	−0.227 (−3.19)
Days poor mental	0.036 (0.82)	0.136 (0.76)
Days poor physical	0.057 (2.32)	0.113 (1.90)
Days health limits	0.121 (3.05)	0.329 (2.54)

The main finding in BCQ is that DCP increases ‘any health insurance’, ‘any primary doctor’ and ‘risky drinker’, but decreases BMI. This finding is similar to that of the RR and ROR column in Table 3, except for ‘risky drinker’ that is insignificant in Table 3. This similarity is due to β_qτ = 0 assumed in both BCQ and Table 3.

Since Table 3 imposes the unnecessary restriction β_qτ = 0, it is interesting to compare the finding in BCQ to that in Table 2. The RR and ROR column of Table 2 reveals significantly increasing effects on ‘any health insurance’, ‘current smoker’ and ‘drinks 30 days’, and a significantly decreasing effect on ‘risky drinker’. Hence, only the increasing effect on ‘any health insurance’ is shared by BCQ and the RR and ROR column of Table 2; the sign of ‘risky drinker’ changes across BCQ and the RR and ROR column of Table 2. Overall, the differences due to allowing β_qτ ≠ 0 are large.

6 Conclusions

Difference in Differences (DD) is one of the most popular approaches in finding the effect of a binary treatment D on an outcome Y. However, DD is suitable for linear models, and consequently, applying DD to limited dependent variables (LDV’s), or more generally to nonlinear models, has been problematic. Many researchers with LDV’s simply ignore the LDV nature to use a linear model.

The goal of this paper is to explore what can be done in this case, adopting the framework of generalized linear models (GLM) with link functions. Because several papers have been published on GLM for DD with LDV since an early version of our paper in 2021, we reviewed the literature including recent studies not covered by the other reviews. The main recommendation is using Poisson QMLE for non-negative (such as count or zero-censored) Y and (multinomial) logit MLE for binary, fractional or multinomial Y. This agrees with the other studies adopting GLM. Despite the overlap, however, this paper differs from the other reviews as follows.

First, we focused on proportional (odds) effect using ‘ratio in ratios (RR)’ or ‘ratio in odds ratios (ROR)’, instead of trying to obtain the usual average effect on the treated (or on the population) which requires an unnecessary extra step. Second, as was noted just above, we reviewed the recent literature on DD with LDV that has not been covered by the other reviews. Third, we paid more attention to covariate (W) types, instead of simply assuming a time-constant W. Fourth, we made further minor contributions, such as verifying the identification condition for repeated cross sections (RCS) and proposing triple ratios to deal with non-parallel multiplicative trends.

Our simulation study revealed that RR and ROR can give findings much different from those of DD, although our empirical study showed that the DD findings need not always differ much from those of RR and ROR. Also, using a power function of t times the policy qualification dummy to account for non-parallel/multiplicative trends made big differences in our empirical findings, compared to imposing the parallel/multiplicative trend restriction unnecessarily from the outset.

Corresponding author: Myoung-jae Lee, Department of Economics, Korea University, Seoul 02841, South Korea; and Department of Finance, Accounting & Economics, University of Nottingham Ningbo China, Ningbo 315100, China, E-mail: myoungjae@korea.ac.kr

Funding source: National Research Foundation of Korea

Award Identifier / Grant number: RS-2024-00337766.

Acknowledgments

The authors are grateful to the Editor and a reviewer for their helpful comments and directing the authors’ attention to the relevant literature.

Data and program availability: The data and programs used in this paper will be made publicly available on a public depository, once the paper is accepted.
Compliance with ethical standard, no conflict of interest, and no AI usage: No human/animal subject is involved in this research, and there is no conflict of interest to disclose. Also, no generative AI-related technology has been used for this paper.
Funding information: The research of Myoung-jae Lee has been supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2024-00337766).

Appendix

A.1 Simulation Study

Our simulation study addresses four types of LDV’s: (i) positive continuous, (ii) count, (iii) zero-censored, and (iv) binary. Poisson QMLE is applied to (i), (ii) and (iii), and logistic MLE to (iv); their estimates are compared with the linear model DD (‘Lin-DD’). Fractional Y is not tried because it is not yet clear how to generate fractional Y subject to the exponential regression, and multinomial Y is addressed separately below because it is inconceivable to apply Lin-DD to multinomial outcome.

In the following, we explain (i) and Table A1 in detail, from which it will be clear how (ii), (iii) and (iv) are dealt with and how to interpret the other tables. In all cases, the effect of interest is exp(β_d) − 1, which is the proportional (odds) effect, but we examine β_d mainly, because knowing β_d is equivalent to knowing exp(β_d) − 1.

For (i) positive continuous outcome, we generate Y_it for t = 0, 1, 2, 3:

$$Y_{it} = \exp(\beta_t + \beta_q Q_i + \beta_{q\tau} t Q_i + \beta_d D_{it} + U_{it}) \tag{A.1}$$

where

$$P(Q_i = 0) = P(Q_i = 1) = 0.5, \quad D_{it} = Q_i 1[t = 3], \quad U_{i0}, U_{i1}, U_{i2}, U_{i3} \ \text{iid} \ N(0, 1) \amalg Q_i,$$

$$\beta_0 = -2, \ \beta_1 = -2, \ \beta_2 = -1, \ \beta_3 = -1, \ \beta_q = 0.5, \ \beta_{q\tau} = 0, 0.5, \ \beta_d = 0, 0.5;$$

β_qτ = 0 makes the parallel trends hold in Y* and ID_RR hold in Y, but not β_qτ = 0.5.

From the Y_it’s in (A.1), the RCS outcome Y_i and its regressor X_i are obtained:

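$$\begin{aligned}
& S_i \text{ is the sampled period for } i, \quad S_{it} \equiv 1[S_i = t], \quad P(S_{it} = 1) = 0.25 \text{ for all } t, \\
& Y_i = \sum_{t=0}^{3} S_{it} Y_{it}, \quad Q_{i\tau} \equiv Q_i \sum_{t=1}^{3} S_{it} t, \quad X_i \equiv (1, S_{i1}, S_{i2}, S_{i3}, Q_i, Q_{i\tau}, D_i)' \\
& \text{for parameters } \{\beta_0 + \ln(1.64), \ \beta_1 - \beta_0, \ \beta_2 - \beta_0, \ \beta_3 - \beta_0, \ \beta_q, \ \beta_{q\tau}, \ \beta_d\};
\end{aligned} \tag{A.2}$$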

1.64 comes from E{exp(U_it)} = 1.64 with U_it ∼ N(0, 1), which appears due to

$$E(Y_{it} \mid Q_i, S_i) = \exp(\beta_t + \beta_q Q_i + \beta_{q\tau} t Q_i + \beta_d D_{it}) \cdot E\{\exp(U_{it})\}.$$

That is, having E{exp(U_it)} = 1.64 and β₀ is equivalent to having E{exp(U_it)} = 1 and β₀ + ln(1.64). For the other LDV models, the RCS data are generated analogously.
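A simplified sketch of this data generating process, restricted to two periods t = 2, 3 with β_qτ = 0 (a simplification of ours, not the paper's four-period design). In that case the exponential model with regressors (1, S, Q, D) is saturated, so the Poisson QMLE of β_d equals the log of the ratio in ratios of the four cell means and can be computed without an optimizer. The Lin-DD counterpart is the difference in differences of the same cell means.

```python
import math
import random


def cell_means(n, beta, seed=0):
    """Simulate RCS draws of Y = exp(index + U), U ~ N(0, 1); return means by (Q, t)."""
    rng = random.Random(seed)
    sums = {(q, t): 0.0 for q in (0, 1) for t in (2, 3)}
    counts = {(q, t): 0 for q in (0, 1) for t in (2, 3)}
    for _ in range(n):
        q = rng.randint(0, 1)
        t = rng.choice((2, 3))  # each unit is observed in one period only (RCS)
        d = q if t == 3 else 0  # D = Q * 1[t = 3]
        y = math.exp(beta[t] + beta["q"] * q + beta["d"] * d + rng.gauss(0.0, 1.0))
        sums[(q, t)] += y
        counts[(q, t)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# beta_2, beta_3, beta_q and beta_d as in (A.1), with beta_q_tau = 0.
beta = {2: -1.0, 3: -1.0, "q": 0.5, "d": 0.5}
m = cell_means(200_000, beta)

# Log ratio in ratios: the Poisson QMLE of beta_d in this saturated two-period design.
log_rr = math.log(m[(1, 3)] / m[(1, 2)]) - math.log(m[(0, 3)] / m[(0, 2)])
# Linear-model DD on the same cell means.
lin_dd = (m[(1, 3)] - m[(1, 2)]) - (m[(0, 3)] - m[(0, 2)])

print(round(log_rr, 3), round(lin_dd, 3))
```

In the population, the log ratio in ratios equals β_d = 0.5, while the linear DD equals exp(0.5) − 1 ≈ 0.65 under these parameter values; the latter is a difference in mean levels and changes with β_t and β_q even when β_d is held fixed.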

Table A1 presents the results from 5,000 repetitions with N = 250 and 10,000; each entry shows the absolute bias (|Bias|), SD, and Root Mean Squared Error (RMSE). With N = 250, the Lin-DD estimates β̂_qτ and β̂_d sometimes do better than the Poisson QMLE β̃_qτ and β̃_d, but this is due to the low SD’s; the |Bias| of Lin-DD β̂_d is huge in several cases. With N = 10,000, the |Bias|’s for the Lin-DD estimates remain almost the same as those with N = 250 whereas the gaps in SD between Lin-DD and Poisson QMLE are reduced, and consequently, Poisson QMLE does better than Lin-DD. Using tQ solves the problem of ID_RR violation for Poisson QMLE, but not for Lin-DD; β̂_qτ in Lin-DD is heavily biased even when β_qτ = 0. In short, Table A1 demonstrates that Lin-DD is highly biased when the true model is exponential for positive Y.

Table A1:

Positive Y: |Bias|, SD and (RMSE).

N = 250	β_qτ, β_d: 0, 0	β_qτ, β_d: 0.5, 0	β_qτ, β_d: 0, 0.5	β_qτ, β_d: 0.5, 0.5
β̃_qτ	0.00 0.23 (0.23)	0.00 0.24 (0.24)	0.00 0.23 (0.23)	0.00 0.24 (0.24)
β̃_d	0.01 0.60 (0.60)	0.01 0.60 (0.60)	0.01 0.60 (0.60)	0.01 0.60 (0.60)
DD β̂_qτ	0.12 0.15 (0.19)	0.48 0.33 (0.59)	0.12 0.15 (0.19)	0.48 0.33 (0.59)
DD β̂_d	0.08 0.46 (0.47)	1.05 1.38 (1.73)	0.08 0.56 (0.57)	3.48 1.96 (4.00)
N = 10,000
β̃_qτ	0.00 0.04 (0.04)	0.00 0.04 (0.04)	0.00 0.04 (0.04)	0.00 0.04 (0.04)
β̃_d	0.00 0.10 (0.10)	0.00 0.10 (0.10)	0.00 0.10 (0.10)	0.00 0.10 (0.10)
DD β̂_qτ	0.12 0.02 (0.13)	0.48 0.05 (0.49)	0.12 0.02 (0.13)	0.48 0.05 (0.49)
DD β̂_d	0.08 0.07 (0.11)	1.03 0.22 (1.05)	0.07 0.09 (0.11)	3.44 0.30 (3.45)

β_qτ = 0 for parallel trends in Y* & ID_RR in Y; β_d (exp(β_d) − 1) is the desired effect; β̃_qτ, β̃_d: Poisson QMLE; β̂_qτ, β̂_d: linear-model DD.

For (ii) count outcome, similarly to (A.1), Y_it is generated from the Poisson distribution with parameter exp(β_t + β_qQ_i + β_qτtQ_i + β_dD_it) for t = 0, 1, 2, 3. Then Y_i and X_i are generated as in (A.2), and Poisson QMLE is implemented. The same parameters as in (A.2) are estimated except for the intercept because ln(1.64) is no longer present. Table A2 presents the simulation results, and what was mentioned for Table A1 applies to Table A2 almost word for word.

Table A2:

Poisson count Y: |Bias|, SD and (RMSE).

N = 250	β_qτ, β_d: 0, 0	β_qτ, β_d: 0.5, 0	β_qτ, β_d: 0, 0.5	β_qτ, β_d: 0.5, 0.5
β̃_qτ	0.01 0.40 (0.40)	0.00 0.38 (0.38)	0.01 0.40 (0.40)	0.00 0.38 (0.38)
β̃_d	0.00 0.85 (0.85)	0.00 0.78 (0.78)	0.01 0.84 (0.84)	0.00 0.78 (0.78)
DD β̂_qτ	0.08 0.10 (0.13)	0.10 0.14 (0.17)	0.08 0.10 (0.13)	0.10 0.14 (0.17)
DD β̂_d	0.06 0.31 (0.31)	0.61 0.47 (0.77)	0.17 0.33 (0.37)	1.88 0.53 (1.95)
N = 10,000
β̃_qτ	0.00 0.06 (0.06)	0.00 0.05 (0.05)	0.00 0.06 (0.06)	0.00 0.05 (0.05)
β̃_d	0.00 0.12 (0.12)	0.00 0.10 (0.10)	0.00 0.11 (0.11)	0.00 0.10 (0.10)
DD β̂_qτ	0.08 0.02 (0.08)	0.10 0.02 (0.10)	0.08 0.02 (0.08)	0.10 0.02 (0.10)
DD β̂_d	0.05 0.05 (0.07)	0.62 0.07 (0.63)	0.16 0.05 (0.16)	1.89 0.08 (1.89)

β_qτ = 0 for parallel trends in Y* & ID_RR in Y; β_d (exp(β_d) − 1) is the desired effect; β̃_qτ, β̃_d: Poisson QMLE; β̂_qτ, β̂_d: linear-model DD.

Table A3:

Zero-censored Y: |Bias|, SD and (RMSE).

N = 250	β_qτ, β_d: 0, 0	β_qτ, β_d: 0.5, 0	β_qτ, β_d: 0, 0.5	β_qτ, β_d: 0.5, 0.5
β̃_qτ	0.01 0.31 (0.31)	0.01 0.32 (0.32)	0.01 0.31 (0.31)	0.01 0.32 (0.32)
β̃_d	0.01 0.80 (0.80)	0.02 0.80 (0.80)	0.01 0.80 (0.80)	0.02 0.80 (0.80)
DD β̂_qτ	0.12 0.19 (0.22)	0.47 0.42 (0.63)	0.12 0.19 (0.22)	0.47 0.42 (0.63)
DD β̂_d	0.07 0.59 (0.60)	1.04 1.76 (2.04)	0.07 0.71 (0.72)	3.44 2.48 (4.24)
N = 10,000
β̃_qτ	0.00 0.05 (0.05)	0.00 0.05 (0.05)	0.00 0.05 (0.05)	0.00 0.05 (0.05)
β̃_d	0.00 0.12 (0.12)	0.00 0.13 (0.13)	0.00 0.12 (0.12)	0.00 0.13 (0.13)
DD β̂_qτ	0.12 0.03 (0.13)	0.48 0.07 (0.49)	0.12 0.03 (0.13)	0.48 0.07 (0.49)
DD β̂_d	0.08 0.09 (0.12)	1.03 0.28 (1.06)	0.07 0.11 (0.13)	3.43 0.39 (3.46)

β_qτ = 0 for parallel trends in Y* & ID_RR in Y; β_d (exp(β_d) − 1) is the desired effect; β̃_qτ, β̃_d: Poisson QMLE; β̂_qτ, β̂_d: linear-model DD.

For (iii) zero-censored outcome, we use (3.9) where M_i ∼ Poisson(1) with P(M_i = 0) = 0.37 and $Y_{it} = \sum_{j=1}^{M_i} Z_{ijt}$ with Z_ijt = exp{β_t + β_qQ_i + β_qτtQ_i + β_dD_it + N(0, 1)}. The same parameters as in (A.2) are estimated except for the intercept because exp(1) from E(M) is added to β₀ in view of (3.9). Despite the big difference in the data generating processes, Table A3 differs little from Tables A1 and A2, and all comments made for Tables A1 and A2 apply to Table A3 as well. The similarities in the findings from Tables A1–A3 seem to stem from the common exponential regression specification.

For (iv) binary outcome, Y_it is generated with Logistic U_it:

$$Y_{it} = 1[0 < \beta_t + \beta_q Q_i + \beta_{q\tau} t Q_i + \beta_d D_{it} + U_{it}], \quad U_{i0}, U_{i1}, U_{i2}, U_{i3} \text{ are iid Logistic}.$$

β_qτ = 0 makes the parallel trends hold for Y*, and makes ID_ROR hold for Y.

Table A4:

Binary Y: |Bias|, SD and (RMSE).

N = 250	β_qτ, β_d: 0, 0	β_qτ, β_d: 0.5, 0	β_qτ, β_d: 0, 0.5	β_qτ, β_d: 0.5, 0.5
β̃_qτ	0.01 0.50 (0.50)	0.01 0.51 (0.51)	0.01 0.50 (0.50)	0.01 0.51 (0.51)
β̃_d	0.02 1.16 (1.16)	0.04 1.18 (1.19)	0.04 1.15 (1.15)	0.13 1.54 (1.55)
DD β̂_qτ	0.02 0.07 (0.08)	0.35 0.08 (0.36)	0.02 0.07 (0.08)	0.35 0.08 (0.36)
DD β̂_d	0.02 0.21 (0.21)	0.02 0.21 (0.21)	0.39 0.21 (0.45)	0.43 0.20 (0.48)
N = 10,000
β̃_qτ	0.00 0.07 (0.07)	0.00 0.07 (0.07)	0.00 0.07 (0.07)	0.00 0.07 (0.07)
β̃_d	0.00 0.17 (0.17)	0.00 0.17 (0.17)	0.00 0.17 (0.17)	0.00 0.17 (0.17)
DD β̂_qτ	0.02 0.01 (0.03)	0.35 0.01 (0.35)	0.02 0.01 (0.03)	0.35 0.01 (0.35)
DD β̂_d	0.02 0.03 (0.04)	0.02 0.03 (0.04)	0.39 0.03 (0.39)	0.43 0.03 (0.43)

β_qτ = 0 for parallel trends in Y* & ID_ROR in Y; β_d (exp(β_d) − 1) is the desired effect; β̃_qτ, β̃_d: logit estimates; β̂_qτ, β̂_d: linear-model DD.

Since the logistic regression is used in Table A4, the results in Table A4 differ much from those in Tables A1–A3. First, the overall magnitude of |Bias| is much smaller than in Tables A1–A3. Second, surprisingly, when β_qτ = β_d = 0, the Lin-DD estimates with almost zero bias do several times better than the logistic MLE estimates. Third, biases in Lin-DD are persistent even when N increases to 10,000, which implies that Lin-DD will be eventually dominated by logistic MLE for a large enough N. Nevertheless, less harm is seen in using Lin-DD for binary Y, compared with the other LDV’s.

A.2 Multinomial Logit for DD with Multinomial Outcome

A.2.1 Identification

For multinomial outcome Y taking on a value among 0, 1, …, C classes, define the ‘class-c odds’ (with the base class 0) conditional on (W = w, Q = q, S = s) as

$$R_{qs}^{c}(Y; w) \equiv \frac{P(Y = c \mid w, Q = q, S = s)}{P(Y = 0 \mid w, Q = q, S = s)}$$

which implies

$$R_{11}^{c}(Y; w) = R_{11}^{c}(Y_3^1; w), \quad R_{01}^{c}(Y; w) = R_{01}^{c}(Y_3^0; w), \quad R_{10}^{c}(Y; w) = R_{10}^{c}(Y_2^0; w), \quad R_{00}^{c}(Y; w) = R_{00}^{c}(Y_2^0; w),$$

analogously to (4.1). Also define ‘class-c ROR conditional on W = w’:

$$\mathrm{ROR}^{c}(Y; w) \equiv \frac{R_{11}^{c}(Y; w)}{R_{10}^{c}(Y; w)} \Big/ \frac{R_{01}^{c}(Y; w)}{R_{00}^{c}(Y; w)}.$$

The identification condition for ROR^c with multinomial outcome is

$$\mathrm{ROR}^{c}(Y^0; w) = \frac{R_{11}^{c}(Y_3^0; w)}{R_{10}^{c}(Y_2^0; w)} \Big/ \frac{R_{01}^{c}(Y_3^0; w)}{R_{00}^{c}(Y_2^0; w)} = 1. \qquad (\mathrm{ID}_{\mathrm{ROR}^c})$$

As in (4.2), ROR^c(Y; w) − 1 is equal to the ‘class-c proportional odds effect on the treated at the post-treatment period t = 3’:

$$\mathrm{ROR}^c(Y; w) - 1 = \frac{R^c_{11}(Y_3^1; w) - R^c_{11}(Y_3^0; w)}{R^c_{11}(Y_3^0; w)} \quad \text{under } \mathrm{ID}_{\mathrm{ROR}c}.$$
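The equality rests on one step: under ID_RORc, dividing ROR^c(Y; w) by ROR^c(Y^0; w) = 1 leaves its value unchanged, and the three untreated odds then cancel. A sketch of that step, using only the definitions above:

$$\mathrm{ROR}^c(Y; w) = \frac{\mathrm{ROR}^c(Y; w)}{\mathrm{ROR}^c(Y^0; w)} = \frac{R^c_{11}(Y_3^1; w) / R^c_{10}(Y_2^0; w)}{R^c_{01}(Y_3^0; w) / R^c_{00}(Y_2^0; w)} \times \frac{R^c_{01}(Y_3^0; w) / R^c_{00}(Y_2^0; w)}{R^c_{11}(Y_3^0; w) / R^c_{10}(Y_2^0; w)} = \frac{R^c_{11}(Y_3^1; w)}{R^c_{11}(Y_3^0; w)}.$$

Subtracting one from both sides gives the proportional odds effect.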

Also, as in (4.3), if Y = c ≠ 0 is a rare event in the sense of (4.3), then

$$\mathrm{ROR}^c(Y; w) - 1 \simeq \frac{P(Y_3^1 = c \mid w, Q = 1) - P(Y_3^0 = c \mid w, Q = 1)}{P(Y_3^0 = c \mid w, Q = 1)} \tag{A.3}$$

which is the class-c proportional effect on the treated at the post-treatment period.
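As a numerical illustration of these definitions, the following minimal sketch computes class-c odds and ROR^c from cell probabilities. The baseline odds, the group and period factors, and the treatment multiplier `theta` are hypothetical values chosen only for illustration. The untreated odds are built multiplicatively in q and s so that ID_RORc holds by construction.

```python
# Hypothetical illustration of class-c odds, ROR^c, and the rare-event approximation.
# All numbers below are made up for illustration; they are not taken from the paper.

def class_probs(odds):
    """Turn odds of classes 1..C (relative to base class 0) into probabilities of classes 0..C."""
    denom = 1.0 + sum(odds)
    return [1.0 / denom] + [r / denom for r in odds]

def class_odds(probs, c):
    """Class-c odds R^c = P(Y = c) / P(Y = 0)."""
    return probs[c] / probs[0]

def ratio_in_odds_ratios(p11, p10, p01, p00, c):
    """ROR^c = (R^c_11 / R^c_10) / (R^c_01 / R^c_00); p_qs holds the class probabilities in cell (q, s)."""
    return (class_odds(p11, c) / class_odds(p10, c)) / (class_odds(p01, c) / class_odds(p00, c))

base, group, period, theta = 0.02, 0.6, 0.4, 1.5  # hypothetical odds components

def untreated_odds(q, s):
    # Multiplicative in q and s, so the untreated ROR^c equals one (ID_RORc holds).
    r = base * group**q * period**s
    return [r, r]  # same odds for classes 1 and 2

p00 = class_probs(untreated_odds(0, 0))
p01 = class_probs(untreated_odds(0, 1))
p10 = class_probs(untreated_odds(1, 0))
p11_untreated = class_probs(untreated_odds(1, 1))                     # counterfactual cell
p11_treated = class_probs([theta * r for r in untreated_odds(1, 1)])  # observed cell

c = 1
ror_untreated = ratio_in_odds_ratios(p11_untreated, p10, p01, p00, c)
ror_observed = ratio_in_odds_ratios(p11_treated, p10, p01, p00, c)
odds_effect = ror_observed - 1.0                       # proportional odds effect
prob_effect = p11_treated[c] / p11_untreated[c] - 1.0  # proportional effect on the probability

print(f"untreated ROR^c         = {ror_untreated:.4f}")
print(f"ROR^c - 1 (odds effect) = {odds_effect:.4f}")
print(f"proportional effect     = {prob_effect:.4f}")
```

Because classes 1 and 2 are rare in this hypothetical setup, the proportional effect on the probability comes out close to the proportional odds effect `theta - 1`.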

A.2.2 Estimation

In panel multinomial choice with classes c = 0, 1, …, C, there are a few possibilities for regressors, depending on whether they vary across units, classes or times. Here, we consider three types of regressors: A_i varying only across units (e.g. race), H_it varying only across units and times (e.g. income), and W_ict varying across units, classes and times (e.g. expense from choosing class c). Let the ‘latent utility from class c’ of unit i at period t = 2, 3 be

$$L^d_{itc} \equiv \beta_{tc} + \beta_{qc} Q_i + \beta_{dc} d + \beta_{ac}' A_i + \beta_{hc}' H_{it} + \beta_{wc}' W_{itc} + U_{itc}, \quad c = 0, 1, \ldots, C \tag{A.4}$$

where the error terms (U_i20, …, U_i2C, U_i30, …, U_i3C) are iid with the type-I extreme value distribution, and independent of all regressors at all times (‘strict exogeneity’).

The potential choice Y_it^d with D = d is

$$Y_{it}^d = \sum_{j=0}^{C} j \times 1\left[ L^d_{itj} > L^d_{itk} \text{ for all } k \neq j \right];$$

Y_it^d takes on 0, 1, …, C, depending on which class gives the maximum utility. Using (A.4), the choice probabilities for the untreated Y_it^0 = c ∈ {0, 1, …, C} are:

$$\begin{aligned}
& P(Y_{it}^0 = c \mid Q_i, A_i, H_{it}, W_{i20}, \ldots, W_{i2C}, W_{i30}, \ldots, W_{i3C}) = P(Y_{it}^0 = c \mid Q_i, A_i, H_{it}, W_{it0}, \ldots, W_{itC}) \\
&\quad = \frac{\exp(\beta_{tc} + \beta_{qc} Q_i + \beta_{ac}' A_i + \beta_{hc}' H_{it} + \beta_{wc}' W_{itc})}{\sum_{j=0}^{C} \exp(\beta_{tj} + \beta_{qj} Q_i + \beta_{aj}' A_i + \beta_{hj}' H_{it} + \beta_{wj}' W_{itj})} \\
&\quad = \frac{\exp(\Delta\beta_{tc} + \Delta\beta_{qc} Q_i + \Delta\beta_{ac}' A_i + \Delta\beta_{hc}' H_{it} - \beta_{w0}' W_{it0} + \beta_{wc}' W_{itc})}{1 + \sum_{j=1}^{C} \exp(\Delta\beta_{tj} + \Delta\beta_{qj} Q_i + \Delta\beta_{aj}' A_i + \Delta\beta_{hj}' H_{it} - \beta_{w0}' W_{it0} + \beta_{wj}' W_{itj})}, \\
& \Delta\beta_{tj} \equiv \beta_{tj} - \beta_{t0}, \quad \Delta\beta_{qj} \equiv \beta_{qj} - \beta_{q0}, \quad \Delta\beta_{aj} \equiv \beta_{aj} - \beta_{a0}, \quad \Delta\beta_{hj} \equiv \beta_{hj} - \beta_{h0};
\end{aligned}$$

the last equality holds, dividing through by exp(β_t0 + β_q0 Q_i + β_a0′A_i + β_h0′H_it + β_w0′W_it0) for the base class c = 0. The numerator of the last ratio is one for the base class 0. ID_RORc holds for P(Y_it^0 = c | ·), whose proof is similar to the proof for ID_ROR.
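The base-class normalization can be checked numerically. The sketch below uses hypothetical index values for three classes and confirms that writing the probabilities in terms of differences relative to class 0 leaves them unchanged, so only those differences are identified.

```python
import math

def choice_probs(index):
    """Multinomial logit probabilities exp(v_c) / sum_j exp(v_j) for class indices v_0..v_C."""
    expv = [math.exp(v) for v in index]
    total = sum(expv)
    return [e / total for e in expv]

def choice_probs_normalized(index):
    """Same probabilities written in terms of differences v_c - v_0 (base class 0)."""
    diffs = [v - index[0] for v in index]               # the base-class difference is zero
    denom = 1.0 + sum(math.exp(d) for d in diffs[1:])
    return [math.exp(d) / denom for d in diffs]         # numerator is exp(0) = 1 for class 0

# Hypothetical index values for classes 0, 1, 2 (illustration only).
v = [0.3, -3.7, -4.2]
p_raw = choice_probs(v)
p_norm = choice_probs_normalized(v)

# Adding the same constant to every class index does not change the probabilities.
p_shifted = choice_probs([x + 10.0 for x in v])

print(p_raw)
print(p_norm)
print(p_shifted)
```

The three printed vectors agree up to floating-point error, which is why the model is stated in terms of Δβ rather than the class-specific β.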

Analogously derive the model for P(Y_it^1 = c | ·), which then gives (i omitted)

$$R^c_{11}(Y^d; w) = \exp(\Delta\beta_{3c} + \Delta\beta_{qc} + \Delta\beta_{dc} d + \Delta\beta_{ac}' A + \Delta\beta_{hc}' H_3 - \beta_{w0}' W_{30} + \beta_{wc}' W_{3c}) \quad \text{where } \Delta\beta_{dc} \equiv \beta_{dc} - \beta_{d0}.$$

Since P(Y_it^1 = c | ·) differs from P(Y_it^0 = c | ·) only in the extra term Δβ_dc, we get the class-c proportional odds effect, which is also the class-c proportional effect if the rare event condition (4.3) holds:

$$R^c_{11}(Y_3^1; w) / R^c_{11}(Y_3^0; w) - 1 = \exp(\Delta\beta_{dc}) - 1.$$
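The step behind this display is the cancellation of every term other than Δβ_dc. A short sketch, evaluating the expression for the class-c odds of the treated group at t = 3 given above at d = 1 and d = 0:

$$\frac{R^c_{11}(Y_3^1; w)}{R^c_{11}(Y_3^0; w)} = \frac{\exp(\Delta\beta_{3c} + \Delta\beta_{qc} + \Delta\beta_{dc} + \Delta\beta_{ac}' A + \Delta\beta_{hc}' H_3 - \beta_{w0}' W_{30} + \beta_{wc}' W_{3c})}{\exp(\Delta\beta_{3c} + \Delta\beta_{qc} + \Delta\beta_{ac}' A + \Delta\beta_{hc}' H_3 - \beta_{w0}' W_{30} + \beta_{wc}' W_{3c})} = \exp(\Delta\beta_{dc}).$$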

In RCS, we observe Y ≡ (1 − S)Y₂ + SY₃ where Y_t = 0, …, C, along with Q, S and

$$A, \quad H \equiv (1 - S) H_2 + S H_3, \quad W_0 \equiv (1 - S) W_{20} + S W_{30}, \; \ldots, \; W_C \equiv (1 - S) W_{2C} + S W_{3C};$$

with i omitted. The choice probabilities are, with Δ²β_3j ≡ Δβ_3j − Δβ_2j, j = 0, 1, …, C,

$$P(Y = c \mid Q, S, A, H, W_0, \ldots, W_C) = \frac{\exp(\Delta\beta_{2c} + \Delta^2\beta_{3c} S + \Delta\beta_{qc} Q + \Delta\beta_{dc} D + \Delta\beta_{ac}' A + \Delta\beta_{hc}' H - \beta_{w0}' W_0 + \beta_{wc}' W_c)}{1 + \sum_{j=1}^{C} \exp(\Delta\beta_{2j} + \Delta^2\beta_{3j} S + \Delta\beta_{qj} Q + \Delta\beta_{dj} D + \Delta\beta_{aj}' A + \Delta\beta_{hj}' H - \beta_{w0}' W_0 + \beta_{wj}' W_j)}.$$

The numerator is 1 for c = 0, because all differences are relative to c = 0.
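As a quick check of the notation, the differenced time effect reproduces the period-specific intercept Δβ_tc in each period, since S = 1 marks the observations drawn at t = 3:

$$\Delta\beta_{2c} + \Delta^2\beta_{3c} S = \begin{cases} \Delta\beta_{2c} & \text{if } S = 0 \ (t = 2), \\ \Delta\beta_{2c} + (\Delta\beta_{3c} - \Delta\beta_{2c}) = \Delta\beta_{3c} & \text{if } S = 1 \ (t = 3). \end{cases}$$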

Because D shifts the latent utility of class c by β_dc and that of the base class 0 by β_d0, the “net increase” in the propensity to choose class c relative to the base class 0 is Δβ_dc ≡ β_dc − β_d0, not β_dc. Estimate Δβ_d1, …, Δβ_dC with cross-section multinomial logit using the last display. Then, exp(Δβ_dc) − 1 is the class-c proportional odds effect relative to class 0, and the class-c proportional effect as well when Y = c is a rare event in the sense of (4.3).
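Once Δβ_dc has been estimated, the effect follows from a simple transformation. The sketch below is illustrative only: the point estimate and standard error are hypothetical numbers, the function name is made up, and the interval is obtained by exponentiating the endpoints of a normal-approximation interval for Δβ_dc, which is one common choice and not a procedure stated in the paper.

```python
import math

def proportional_odds_effect(delta_beta_d, std_err, z=1.96):
    """Return exp(b) - 1, a delta-method SE, and an interval from transformed endpoints.

    delta_beta_d : estimate of the treatment coefficient of class c relative to class 0
    std_err      : its estimated standard error
    z            : normal critical value (1.96 for a nominal 95% interval)
    """
    effect = math.exp(delta_beta_d) - 1.0
    # Delta method: d/db [exp(b) - 1] = exp(b), so SE(effect) is about exp(b) * SE(b).
    effect_se = math.exp(delta_beta_d) * std_err
    lower = math.exp(delta_beta_d - z * std_err) - 1.0
    upper = math.exp(delta_beta_d + z * std_err) - 1.0
    return effect, effect_se, (lower, upper)

# Hypothetical estimate and standard error (not results from the paper).
effect, effect_se, (lower, upper) = proportional_odds_effect(0.5, 0.2)
print(f"effect = {effect:.3f}, SE = {effect_se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Transforming the endpoints keeps the interval above −1, the lower bound of any proportional effect, which a symmetric interval around exp(Δβ_dc) − 1 would not guarantee.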

A.2.3 Simple Simulation Study for Multinomial Outcome

Table A5 presents the results from a simulation study for multinomial outcome.

Table A5:

Multinomial Y: 3 classes, N = 10,000, 5,000 repetitions.

	Class c = 1			Class c = 2
	True, |Bias|	SD, RMSE	AvgSD	True, |Bias|	SD, RMSE	AvgSD
Δβ_2c	−4.0, 0.019	0.15, 0.023	0.15	−4.0, 0.030	0.15, 0.024	0.15
Δ²β_3c	−1.0, 0.017	0.28, 0.078	0.28	−1.0, 0.011	0.28, 0.077	0.28
Δβ_qc	−0.5, 0.004	0.23, 0.055	0.23	−0.5, 0.006	0.23, 0.054	0.23
Δβ_dc	0.5, 0.004	0.41, 0.166	0.41	0.5, 0.007	0.41, 0.168	0.41
Δβ_ac	0.5, 0.003	0.16, 0.026	0.17	0.5, 0.005	0.17, 0.027	0.17
β_w0	0.0, 0.001	0.07, 0.004	0.07
β_wc	0.5, 0.000	0.09, 0.008	0.10	0.5, 0.001	0.09, 0.008	0.10

AvgSD is the average of the asymptotic SD estimates.

Our simulation study using the above P(Y = c|Q, S, A, H, W₀, …, W_C) with C = 2 has the following design (the error terms generated as in (A.4), and H_it excluded):

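$$\begin{aligned}
& A \sim \mathrm{Uniform}(-1, 1), \quad W_{2c}, W_{3c} \text{ for } c = 0, 1, 2 \text{ are iid } N(0, 1), \quad P(Q = 1) = 0.5, \quad P(S = 1) = 0.5, \\
& \beta_{20} = \beta_{30} = \beta_{q0} = \beta_{d0} = \beta_{a0} = \beta_{w0} = 0 \quad (\text{for class } 0), \\
& \beta_{21} = \beta_{22} = -4, \quad \beta_{31} = \beta_{32} = -5, \quad \beta_{q1} = \beta_{q2} = -0.5 \quad (\text{for classes } 1, 2), \\
& \beta_{d1} = \beta_{d2} = 0.5, \quad \beta_{a1} = \beta_{a2} = 0.5, \quad \beta_{w1} = \beta_{w2} = 0.5 \quad (\text{for classes } 1, 2).
\end{aligned}$$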

That is, the class-0 parameters are all zero, and the parameters of classes 1 and 2 are the same. Due to β₂₀ = β₃₀ = 0 but β₂₁ = β₂₂ = −4 and β₃₁ = β₃₂ = −5 (much smaller intercepts for classes 1 and 2 relative to class 0), the events Y = 1, 2 are rare.

Table A5 presents the simulation results, where each entry consists of true values (True), |Bias|, SD, RMSE, and the average of the asymptotic SD estimates (AvgSD). Overall, biases are very small, and AvgSD’s are almost the same as the (simulation) SD’s. The multinomial logit with RCS works well even for rare events Y = 1, 2.
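A minimal sketch of the data-generating process in this design is given below, using only the Python standard library. It assumes D = Q·S (treated group observed in the post-treatment period), the usual two-period DD treatment indicator, which is not restated in this appendix. The function names and the seed are hypothetical. The sketch only generates the RCS data and tabulates the class shares; it does not carry out the multinomial logit estimation.

```python
import math
import random

# Parameters from the simulation design; list position 0, 1, 2 is the class.
BETA_2 = [0.0, -4.0, -4.0]   # period-2 intercepts
BETA_3 = [0.0, -5.0, -5.0]   # period-3 intercepts
BETA_Q = [0.0, -0.5, -0.5]
BETA_D = [0.0, 0.5, 0.5]
BETA_A = [0.0, 0.5, 0.5]
BETA_W = [0.0, 0.5, 0.5]

def gumbel(rng):
    """Draw a standard type-I extreme value (Gumbel) variate as -log(E), E ~ Exp(1)."""
    e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
    return -math.log(e)

def draw_unit(rng):
    """Draw one repeated-cross-section observation (Y, Q, S, D)."""
    q = 1 if rng.random() < 0.5 else 0
    s = 1 if rng.random() < 0.5 else 0
    d = q * s                                  # assumption: treated only if Q = 1 and S = 1
    a = rng.uniform(-1.0, 1.0)
    utilities = []
    for c in range(3):
        w_c = rng.gauss(0.0, 1.0)              # class-specific regressor in the observed period
        intercept = BETA_3[c] if s == 1 else BETA_2[c]
        utilities.append(intercept + BETA_Q[c] * q + BETA_D[c] * d
                         + BETA_A[c] * a + BETA_W[c] * w_c + gumbel(rng))
    y = max(range(3), key=lambda c: utilities[c])  # class with the maximum latent utility
    return y, q, s, d

def simulate(n, seed=0):
    """Generate n independent observations with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [draw_unit(rng) for _ in range(n)]

data = simulate(10_000)
shares = [sum(1 for y, _, _, _ in data if y == c) / len(data) for c in range(3)]
print("class shares:", shares)
```

With intercepts of −4 and −5 for classes 1 and 2, almost all observations fall in class 0, which is the rare-event setting the design is meant to produce.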


Received: 2024-11-13

Accepted: 2025-07-31

Published Online: 2025-08-19

This work is licensed under the Creative Commons Attribution 4.0 International License.
