Abstract
We consider a difference-in-differences setting with a continuous outcome. The standard practice is to take its logarithm and then interpret the results as an approximation of the multiplicative treatment effect on the original outcome. We argue that a researcher should rather focus on the non-transformed outcome when discussing causal inference. The first step should be to decide whether the time trend is more likely to hold in multiplicative or level form. If the former, it is preferable to estimate an exponential model by Poisson Pseudo Maximum Likelihood, which does not require statistical independence of the error term. Running OLS on the log-linearised model might instead lead to confounding distributional and mean changes. We illustrate the argument with a simulation exercise.
1 Introduction
In applied empirical research, it is common to replace continuous outcomes, such as earnings or expenditure, with their logarithm. Often, the choice is motivated by distributional features, like skewness.[1] In the difference-in-differences (dif-in-dif) setting, the desire to give a causal interpretation to the estimates complicates the choice. The model the researcher has in mind is usually one with multiplicative effects, which is linearised by taking logs. If this is the case, the assumptions needed for causal inference refer to the non-transformed model. In general, this is not explicitly discussed.
We reviewed papers published in the Quarterly Journal of Economics (QJE) between 2001 and 2011. For our main literature review, we found 25 papers using a dif-in-dif estimator with continuous outcomes. A table with complete references is available in Appendix A. In 9 cases, the outcome is not transformed and an additive model is estimated. We found 16 papers in which at least one outcome is expressed in logarithmic form. The variables most commonly log-transformed are earnings and productivity, followed by a group of monetary quantities including expenditure, land value, exports and loans. In only 5 out of 16 cases is an explicit reason for the log-transformation given. For example, Nunn and Qian (2011) refer to concerns about skewness, whereas DellaVigna and Kaplan (2007) state that they wish to account for percentage changes in the control variables. In general, no discussion of the impact of the log transformation on the causal interpretation is given. Only Finkelstein (2007) notes that the OLS estimates for the log of the dependent variable relate to E(ln(y)|x), and not to ln(E(y|x)). To provide estimates of ln(E(y|x)), Finkelstein (2007) estimates a generalised linear model (GLM) with a log link.
Previous theoretical literature on non-linear dif-in-dif mostly focused on the interpretation of the interaction effect (Mullahy 1999; Ai and Norton 2003; Puhani 2012). A separate stream of research, not directly related to dif-in-dif, focused on the estimation of exponential models (Mullahy 1997; Manning 1998; Manning and Mullahy 2001; Ai and Norton 2008; Santos Silva and Tenreyro 2006; Blackburn 2007). In this paper we reconcile the two streams of research for the dif-in-dif case. Using a potential outcome framework, we reinterpret and review previous findings to argue that the choice between a multiplicative and an additive model is fundamental to the causal interpretation of the estimands. This choice should be taken before deciding whether or not to take logs, which should be understood as an estimation strategy rather than a matter of model specification.
Specifically, in Section 2 we compare and contrast the additive and multiplicative models. We then point out that using OLS on the log-linearised model may give biased estimates of the true multiplicative effect. This problem can arise if the treatment causes not only a shift in the mean, but also other distributional changes, for instance an increase in the variance for the treated group. Fortunately, a simple and robust non-linear estimator (Poisson Pseudo Maximum Likelihood) is available. In Section 3 we present a simulation to illustrate our main arguments. Section 4 concludes with a guideline for practitioners.
2 Model Specification and Inference
A practitioner estimating a dif-in-dif model with a continuous outcome usually faces two main issues:
Shall I model the time trend in additive or in multiplicative form? And shall I report the treatment effect as a difference in levels, or as a percentage change?
How can I estimate the multiplicative model? Shall I take logs?
We argue that these points should be addressed independently, in order to correctly separate model specification from estimation. The next subsections are dedicated to these issues.
2.1 Multiplicative or Additive Effects?
In this section we compare multiplicative and additive dif-in-dif models. First, we highlight that the key difference between them is the specification of the time trend. Second, we show that the two models are related, but that, crucially, one cannot give a causal interpretation to both.
We start with the simplest, though quite popular, dif-in-dif setting, involving two groups (g ∈ {control, treated}) and two time periods (t ∈ {pre, post}), with only one group actually receiving the treatment in the second period. We analyse the case of a continuous outcome y, such as earnings or consumption.[2] First, we specify a model for the expected value of y when untreated (y0igt), conditional on g and t. The second step is to assume how the expected value of the potential outcome when treated (y1igt) is related to that of y0igt.
The standard dif-in-dif in levels (Angrist and Pischke 2009) combines an additive common trends assumption with an additive treatment effect:

E(y0igt | g, t) = μg + λt,  E(y1igt | g, t) = E(y0igt | g, t) + δ,

which together lead to the familiar linear regression for the observed outcome,

yit = b0 + b1 treatedit + b2 postit + b3 treatedit × postit + εit,  with E(εit | treatedit, postit) = 0, (2)

where treatedit is a dummy for the treatment group and postit for the second period. If the correct model for the counterfactuals is additive, then b0 = μcontrol + λpre, b1 = μtreated − μcontrol, b2 = λpost − λpre and b3 = δ, so the coefficient on the interaction term identifies the average treatment effect on the treated.
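As a concrete check on the additive case, the sketch below (with illustrative parameter values of our own, not taken from the paper) simulates the four group-period cells from an additive model and recovers the treatment effect as the difference-in-differences of the cell means:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for the additive counterfactual model:
# E[y0 | g, t] = mu_g + lambda_t, plus an additive effect delta on the treated.
mu = {"control": 10.0, "treated": 8.0}
lam = {"pre": 0.0, "post": 2.0}
delta = 3.0
n = 50_000  # observations per group-period cell

cell_mean = {}
for g in ("control", "treated"):
    for t in ("pre", "post"):
        m = mu[g] + lam[t] + delta * (g == "treated" and t == "post")
        cell_mean[(g, t)] = (m + rng.normal(0.0, 1.0, n)).mean()

# In the saturated levels regression, the interaction coefficient equals the
# difference-in-differences of the four cell means and identifies delta.
b3 = (cell_mean[("treated", "post")] - cell_mean[("treated", "pre")]) \
   - (cell_mean[("control", "post")] - cell_mean[("control", "pre")])
print(round(b3, 2))  # close to delta = 3
```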
By contrast, one might specify an exponential model

E(y0igt | g, t) = exp(μg + λt),

where the assumption of common trends is in multiplicative form (see Mullahy 1997 for a discussion of IV estimation of an exponential model). Over time, the outcome in the absence of treatment would increase by the same percentage in both groups. We can assume a proportional treatment effect:

[E(y1igt | treated, post) − E(y0igt | treated, post)] / E(y0igt | treated, post) = exp(δ) − 1, (5)

which implies that the effect is expressed as a proportional change with respect to the counterfactual scenario in the absence of the treatment. This comes naturally in applications involving continuous variables such as consumption and wages, where changes are commonly expressed in proportional terms, and it is consistent with the (proportional) specification of the time trend and the group difference. The multiplicative model is therefore:

E(yigt | g, t) = exp(μg + λt + δ × treadedit × postit). (6)
Intuitively, the total percentage change in the expected outcome of the treated group is composed of a percentage change due to time (call it %time) and the percentage effect of the treatment (call it %effect), so that (1 + %change) = (1 + %time) × (1 + %effect). For the control group, instead, (1 + %change) = (1 + %time). In this case the counterfactual model leads to an exponential model for the observed outcomes:

yit = exp(β0 + β1 treatedit + β2 postit + β3 treatedit × postit) ηit, (7)

where ηit is a multiplicative error term that satisfies a mean independence assumption:

E(ηit | treatedit, postit) = 1.

If the correct model for the counterfactuals is multiplicative, then β0 = μcontrol + λpre, β1 = μtreated − μcontrol and β2 = λpost − λpre. More importantly, the exponentiated coefficient on the interaction term, which is a ratio of ratios (ROR, see Mullahy 1999; Buis 2010), is the quantity of interest to the researcher because it is directly related to the proportional treatment effect:

exp(β3) = [E(yit | treated = 1, post = 1) / E(yit | treated = 1, post = 0)] / [E(yit | treated = 0, post = 1) / E(yit | treated = 0, post = 0)] = exp(δ),

so that exp(β3) − 1 coincides with the proportional effect in eq. (5).
The researcher can also calculate the impact of the treatment (on the treated) during the post period in levels:

E(y1igt | treated, post) − E(y0igt | treated, post) = exp(β0 + β1 + β2)(exp(β3) − 1). (10)
The well-known suggestion by Ai and Norton (2003) for non-linear models would instead lead us to calculate the cross difference (Mullahy 1999), which is not equal to the treatment effect:

[E(yit | 1, 1) − E(yit | 1, 0)] − [E(yit | 0, 1) − E(yit | 0, 0)] = exp(β0 + β1 + β2 + β3) − exp(β0 + β1) − exp(β0 + β2) + exp(β0).
Indeed, as argued by Puhani (2012) (and reprised in Karaca-Mandic, Norton, and Dowd 2012), in any non-linear dif-in-dif model with an index structure and a strictly monotonic transformation function, the treatment effect is not equal to the cross-difference of the observed outcome, but rather to the difference between two cross-differences. Unlike in the general non-linear case of Puhani (2012), however, here the exponentiated interaction coefficient is directly interpretable as the proportional effect.
The treatment effect can therefore be easily expressed in levels in both models (using eq. 10). This highlights that the key difference between them is how the common trends assumption is specified. The reason is that the counterfactual y0igt is identified by looking at the change in the control group over time. Once this counterfactual is correctly modelled, we can use it to understand which share of the average change in the treated group has to be attributed to the treatment. To clarify, Figure 1 is generated with an exponential model as in eq. (6). In this case the treated group starts from a lower position. Given the multiplicative trend, in the absence of the treatment the increase in this group over time would be smaller in absolute value. Therefore a standard dif-in-dif in levels would underestimate the share of the change that has to be attributed to the treatment. The bias will be larger the larger the pre-treatment difference between the groups and the larger the proportional time effect. By contrast, once we account correctly for the multiplicative time trend, it does not matter whether we express the treatment effect as a percentage difference or as a level difference: the former is the fraction on the left-hand side of eq. (5), while the latter is simply its numerator. Nevertheless, once the time trend is in multiplicative form, having a multiplicative treatment effect leads to an exponential model, which is clearer and easier to estimate. Furthermore, the effect in levels as measured in eq. (10) is specific to the post period. If we want to predict how the policy will affect future outcomes, and we believe that the true treatment effect is multiplicative, then it is more appropriate to focus on the percentage change.
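To make the link between the percentage and the level effect concrete, the following snippet evaluates the level effect of eq. (10) from the multiplicative parameters, borrowing for illustration the parameter values later used in the simulations of Section 3:

```python
import math

# Multiplicative parameters from the Section 3 simulation design.
beta0, beta1, beta2, delta = 3.5, -0.4, 0.03, 0.2

prop_effect = math.exp(delta) - 1                      # proportional effect, eq. (5)
counterfactual_mean = math.exp(beta0 + beta1 + beta2)  # E[y0 | treated, post]
level_effect = counterfactual_mean * prop_effect       # effect in levels, eq. (10)

print(round(prop_effect, 3))   # 0.221: a 22.1% increase
print(round(level_effect, 2))  # 5.06: the level effect quoted in Section 3
```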

In a standard dif-in-dif setting with two periods and two groups, the practitioner may still be tempted to treat the two models as fully equivalent. This is actually true with respect to the model for observed outcomes, because both the exponential model (7) and the linear one (2) are saturated: the four parameters perfectly fit the four averages given by the combinations of treatedit and postit. Indeed, the exponential model is just a reparametrisation of the linear one: denoting by b0, …, b3 the coefficients of the linear specification,

exp(β0) = b0, exp(β0 + β1) = b0 + b1, exp(β0 + β2) = b0 + b2, exp(β0 + β1 + β2 + β3) = b0 + b1 + b2 + b3. (12)
This was noted by Gregg, Waldfogel, and Washbrook (2006), who showed that we can estimate eq. (2) and then recover both the level and the percentage (multiplicative) effect.[4]
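The recovery of the multiplicative effect from the fitted cell means of the levels regression (eq. 12) can be sketched as follows; the parameter values are illustrative, and in the noiseless 2 × 2 case the ratio of ratios reproduces exp(β3) exactly:

```python
import math

beta0, beta1, beta2, beta3 = 3.5, -0.4, 0.03, 0.2  # illustrative values

# Expected cell means implied by the exponential model: these are exactly the
# quantities the saturated levels regression fits.
m = {(g, t): math.exp(beta0 + beta1 * g + beta2 * t + beta3 * g * t)
     for g in (0, 1) for t in (0, 1)}

# The ratio of ratios computed from the cell means recovers exp(beta3).
ror = (m[1, 1] / m[1, 0]) / (m[0, 1] / m[0, 0])
print(round(ror - 1, 4))  # 0.2214: the proportional treatment effect
```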
In spite of (12), if the true model for counterfactuals is multiplicative, the coefficients of the linear model cannot be given a causal interpretation: the OLS interaction coefficient (call it b3) equals the cross-difference of the observed cell means, which confounds the treatment effect with the differential trend implied by the multiplicative model, and differs from the effect in levels in eq. (10).
The discussion of how the different specifications of the counterfactual are crucial for causal interpretation is related to the observation in Angrist and Pischke (2009, p. 230) that the assumption of common trends can hold either in logs or in levels, but not in both. We find it more natural to look at the choice between multiplicative and additive effects, rather than at whether or not to take logs. This perspective has the advantage of stressing the distinction between specification and estimation. More importantly, in the next section we show that the multiplicative model and the log-linearised one are equivalent only under a strong restriction.
2.2 Estimation of a Multiplicative Dif-in-Dif Model
A popular solution for estimating a multiplicative model is to log-linearise it. As is well known, this practice may lead to biased estimates. However, this issue is often neglected in the dif-in-dif case (see Section 1). To understand how it applies in this context, we follow the discussion in Santos Silva and Tenreyro (2006), but adapt it to the dif-in-dif setting. Log-linearising the model for the observed outcome (eq. 7) we obtain:

ln yit = β0 + β1 treatedit + β2 postit + δ treatedit × postit + ln ηit,

where we used the fact that β3 = δ if the counterfactual model is multiplicative. The OLS estimator for the coefficient on the interaction term consistently estimates a quantity, say δlog-OLS:

δlog-OLS = δ + {E(ln ηit | treated = 1, post = 1) − E(ln ηit | treated = 1, post = 0)} − {E(ln ηit | treated = 0, post = 1) − E(ln ηit | treated = 0, post = 0)}. (14)

The mean-independence assumption E(ηit | treatedit, postit) = 1 restricts the conditional mean of ηit, but not that of ln ηit: by Jensen's inequality, E(ln ηit | treatedit, postit) depends on the whole conditional distribution of ηit, so the terms in curly brackets in eq. (14) do not vanish in general.
Three interesting cases can be discussed. The first and most restrictive is when ηit is statistically independent of treatedit and postit. In this case E(ln ηit | treatedit, postit) is constant, both terms in curly brackets in eq. (14) are zero, and log-OLS is consistent for δ. Statistical independence, however, also restricts the higher moments: the conditional variance of the outcome must be proportional to the square of its conditional mean,

V(yit | treatedit, postit) = σ²η [E(yit | treatedit, postit)]², (15)

where σ²η = V(ηit). In short, the ratio of variances between different groups or time periods should be directly related to the differences in the conditional mean. Hence the time trend must not only shift the conditional mean, but also multiply the conditional variance by a factor equal to the square of exp(β2). Similarly, the treatment must also shift the variance by a factor equal to the square of exp(δ). This pattern of variance does not necessarily hold under the weaker condition of mean independence (E(ηit | treatedit, postit) = 1).
In the second case of interest, statistical independence holds within groups, but not across groups (this is the cross-sectional 2-group case in Blackburn 2007). It follows that E(ln ηit | treatedit, postit) may differ between the treated and control groups, but is constant over time within each group: both terms in curly brackets in eq. (14) are zero, so log-OLS still consistently estimates δ, although the coefficient on treatedit is biased by the difference in E(ln ηit) across groups.
In the third case of interest, only mean independence holds, hence the expectation of ln ηit may change over time within the same group. The terms in curly brackets in eq. (14) would then no longer cancel and log-OLS would give a biased estimate of δ. Focusing only on the second moment, this implies that there is an additional change in the variance within groups, apart from the shifts induced by the time trend and the treatment. This can happen, for instance, if a shock to the control group increases the dispersion, but not the average, in a way that violates (15). This mean-preserving shock would not violate the parallel trends assumption, which only refers to the expected values, but would still lead to a bias for log-OLS. A similar situation arises if there is an additional shock to the dispersion in the treated group that violates (15).[6]
This last case is likely to occur if treatment effects are heterogeneous across individuals. If all treated individuals show a response to the treatment equal to δ, then condition (15) is likely to be respected (in the absence of other distributional changes), because all values are shifted by the same proportional factor. But if treatment effects are heterogeneous, then this is not necessarily true, because the distribution may change in different directions. A similar situation arises in settings in which we define the treatment group in terms of eligibility for a policy, but not all the individuals in it actually receive the treatment. Usually in these cases we wish to estimate the intention-to-treat effect (that is, the effect of being eligible). By construction, in the eligible group only some individuals (those actually treated) experience a change due to the policy, hence there may be distributional changes beyond the shift in the mean.
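The third case can be illustrated with a small simulation (the parameter values are our own, chosen only for illustration): a log-normal error with E(η) = 1 satisfies mean independence whatever its variance, but inflating its dispersion in the treated-post cell shifts E(ln η) there and biases log-OLS, while the cell means — and hence the multiplicative estimates based on them, as PPML in the saturated case — are unaffected:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000   # observations per cell
delta = 0.2   # true multiplicative treatment effect

def cell(mean_log, s):
    # ln(eta) ~ N(-s^2/2, s^2) keeps E[eta] = 1 for any s (mean independence),
    # but E[ln eta] = -s^2/2 moves with the dispersion s.
    eta = np.exp(rng.normal(-s**2 / 2, s, n))
    return np.exp(mean_log) * eta

s_base, s_shock = 0.5, 0.9  # dispersion rises only in the treated-post cell
y = {(0, 0): cell(3.5, s_base), (0, 1): cell(3.53, s_base),
     (1, 0): cell(3.1, s_base), (1, 1): cell(3.13 + delta, s_shock)}

# Log-OLS interaction: dif-in-dif of the cell means of log(y).
logols = (np.log(y[1, 1]).mean() - np.log(y[1, 0]).mean()) \
       - (np.log(y[0, 1]).mean() - np.log(y[0, 0]).mean())

# PPML interaction (saturated case): dif-in-dif of the log cell means.
ppml = (np.log(y[1, 1].mean()) - np.log(y[1, 0].mean())) \
     - (np.log(y[0, 1].mean()) - np.log(y[0, 0].mean()))

print(round(logols, 2))  # biased towards 0.2 - (0.9**2 - 0.5**2)/2 = -0.08
print(round(ppml, 2))    # close to the true 0.2
```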
Given that using OLS on the log-linearised model may lead to biased estimates, what are the available alternatives? If the error term is log-normal, one can work out the analytical formula for the bias and estimate the model by Maximum Likelihood (in Appendix B we derive the analytical formulas). However, practitioners using dif-in-dif for the estimation of treatment effects usually avoid introducing distributional assumptions beyond mean independence, so that MLE cannot be employed. Fortunately, we know from the literature that we can directly estimate the exponential model assuming only mean independence (Santos Silva and Tenreyro 2006; Blackburn 2007), by using either Non-Linear Least Squares (NLS) or Poisson Pseudo Maximum Likelihood (PPML). Santos Silva and Tenreyro (2006) argued in favour of the latter, because NLS is likely to be less efficient. PPML simply estimates the model as if it were a Poisson regression, by maximising the corresponding likelihood.[7] The fact that the outcome is continuous rather than a count does not hinder the consistency of the estimator, because PPML is consistent as long as the conditional mean is correctly specified (as exponential).[8] As the other properties of the Poisson distribution are not respected, a robust covariance matrix should be used. More generally, there are known difficulties in getting standard errors right in dif-in-dif designs. We discuss these in the context of the multiplicative setting in Section 3.2.
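Since PPML rests only on the Poisson first-order conditions, it can be implemented in a few lines of Newton iterations; the sketch below is a minimal self-contained version (the function name and data generating process are ours, for illustration — in practice one would use a packaged Poisson routine with robust standard errors):

```python
import numpy as np

def ppml(X, y, iters=25):
    """Poisson pseudo-maximum likelihood via Newton steps: solves the moment
    condition X'(y - exp(X b)) = 0, which only requires the conditional mean
    to be correctly specified as exponential (no Poisson distribution needed)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # sensible starting value
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])  # Poisson working weights
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(2)
n = 100_000
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), treated, post, treated * post])

# Continuous (non-count) outcome with a multiplicative error, E[eta] = 1.
eta = rng.lognormal(-0.125, 0.5, n)
y = np.exp(X @ np.array([3.5, -0.4, 0.03, 0.2])) * eta

beta_hat = ppml(X, y)
print(np.round(beta_hat, 2))  # close to [3.5, -0.4, 0.03, 0.2]
```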
Instead of moving directly to PPML, one could prefer to test whether statistical independence of the error term is likely to hold, in order to justify OLS on the log-linearised model. One option is a Park test (Manning and Mullahy 2001; Santos Silva and Tenreyro 2006) of whether the conditional variance of y is proportional to the square of the conditional mean, using consistent estimates from PPML. A more standard alternative is the Breusch-Pagan (BP) test of whether the estimated variance of the residuals from log-OLS depends on the value of the treated × post variable.
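A minimal version of the BP-style check on the log-OLS residuals can be written as follows (the statistic is n·R² from regressing the squared residuals on the interaction dummy, approximately χ²(1) under homoskedasticity; the data below are an illustrative toy example, not the paper's design):

```python
import numpy as np

def bp_lm_stat(resid, dummy):
    """LM statistic n * R^2 from regressing squared residuals on a single
    dummy; approximately chi2(1) under the null of homoskedasticity."""
    u2 = resid**2
    x = dummy - dummy.mean()
    u2c = u2 - u2.mean()
    return len(u2) * (x @ u2c) ** 2 / ((x @ x) * (u2c @ u2c))

rng = np.random.default_rng(3)
n = 100_000
interaction = (rng.integers(0, 2, n) * rng.integers(0, 2, n)).astype(float)

# Homoskedastic log residuals vs. extra dispersion in the treated-post cell.
resid_hom = rng.normal(0.0, 0.5, n)
resid_het = rng.normal(0.0, np.where(interaction == 1, 0.7, 0.5), n)

print(bp_lm_stat(resid_hom, interaction))  # small: roughly a chi2(1) draw
print(bp_lm_stat(resid_het, interaction))  # very large: clear rejection
```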
A researcher may also want to test whether the multiplicative specification is appropriate. This is not possible in the standard 2-period, 2-group setting, because the additive and multiplicative models are observationally equivalent. With more time periods or groups (provided the model is not saturated), one could use Ramsey's RESET test (Ramsey 1969) for misspecification of the conditional mean, or the robust Lagrange Multiplier (LM) test proposed by Wooldridge (1992). However, as discussed, the log-linearised model can be correctly specified as linear even if its coefficients do not identify the true treatment effect on the untransformed outcome, hence the tests may not be informative if they fail to reject both specifications. Specification testing is further discussed in Appendix B.
Finally, we argued that the presence of heterogeneous treatment effects in the multiplicative model is a reason to avoid log-OLS and prefer alternatives, such as PPML, that do not require statistical independence. However, one important question is what the multiplicative model actually identifies in this circumstance. In the level model, given the additive nature of the effects, the well-known result is that the dif-in-dif estimand identifies the average treatment effect on the treated. One may expect this interpretation to apply to the multiplicative case as well. Going back to the counterfactual model (eq. 5), we know that the quantity identified by the empirical model, exp(δ) − 1, captures the ratio between the average difference y1i − y0i and the average outcome in the absence of treatment. Similarly to the linear case, we do not need to impose the constraint that the treatment effect is constant across all individuals. However, if it is heterogeneous in the population, exp(δ) − 1 captures the multiplicative treatment effect on the average, and not the average multiplicative effect (Angrist 2001 makes a similar point for the identification of treatment effects in IV estimation of exponential models). Further details can be found in Appendix B.
In the next section we illustrate our arguments and the use of PPML with a simulation. We also show that, in the dif-in-dif case, the BP test seems to have more power than the Park test in detecting deviations from homoskedasticity. In Appendix C, we further illustrate our arguments with an original empirical analysis of the effect of a UK educational grant (the Educational Maintenance Allowance) on households' expenditure. In that analysis, take-up is less than 100 percent in the treated group, so treatment effects are heterogeneous. It also covers more than two time periods, which allows us to illustrate the use of the RESET and LM tests for misspecification.
3 Simulations
3.1 Simulation of a Standard Dif-in-Dif Setting with Two Groups and Two Periods
We consider a setting similar to that of Figure 1 (see Appendix D for a graph using the simulation parameters). The outcome of interest is generated according to eq. (7), with i = 1, …, 2631.
Each replication is generated with β0 = 3.5, β1 = −0.4, β2 = 0.03 and with the hypothetical reform having a constant multiplicative treatment effect equal to δ = 0.2. The group sizes are 1090 for the treated and 1541 for the control (so as to match the applied example in Appendix C). Simulations with a negative time trend and a positive mean difference between groups lead to the same conclusions (available on request).
Each individual observation is generated according to
where 1 is an indicator function and α a parameter that determines the degree of heteroskedasticity in ηit. It follows that:
To assess the performance of the three estimation strategies outlined above, simulations are reported for four key values of α. Table 1 reports results from 10,000 replications of the simulation procedure.[9]
Table 1: Simulation Results.

| | (1) Log-OLS | (2) PPML | (3) Level-OLS |
|---|---|---|---|
| α = 0 | | | |
| Treated × Post | 0.1999924 | 0.1984138 | 4.689477 |
| | (0.0711943) | (0.0842905) | (2.510064) |
| Treated | −0.399073 | −0.3976382 | −10.84626 |
| | (0.0398905) | (0.0474284) | (1.290385) |
| Park Test | 0.052 | 0.014 | – |
| Breusch-Pagan | 0.049 | – | – |
| α = 0.1 | | | |
| Treated × Post | 0.1710568 | 0.1951523 | 4.585499 |
| | (0.0731679) | (0.0882464) | (2.62745) |
| Treated | −0.4002561 | −0.4000494 | −10.90821 |
| | (0.0404467) | (0.0470304) | (1.277552) |
| Park Test | 0.04 | 0.005 | – |
| Breusch-Pagan | 0.154 | – | – |
| α = 0.2 | | | |
| Treated × Post | 0.1506154 | 0.2026701 | 4.814591 |
| | (0.0727646) | (0.0887447) | (2.625155) |
| Treated | −0.4010721 | −0.401937 | −10.96927 |
| | (0.0406465) | (0.0496038) | (1.345574) |
| Park Test | 0.046 | 0.017 | – |
| Breusch-Pagan | 0.441 | – | – |
| α = 0.4 | | | |
| Treated × Post | 0.0861068 | 0.1960358 | 4.632431 |
| | (0.0776181) | (0.096877) | (2.870258) |
| Treated | −0.399795 | −0.3997392 | −10.90645 |
| | (0.0404998) | (0.0484243) | (1.310166) |
| Park Test | 0.064 | 0.022 | – |
| Breusch-Pagan | 0.945 | – | – |

Note: Results from 10,000 replications. Means of the estimated coefficients reported, with standard deviations in parentheses. The Park and Breusch-Pagan rows report rejection rates at the 5% level.
The first special case of interest is α = 0, implying that ηit is statistically independent of the treatment dummies and the other regressors. Here, OLS estimates from the log-linear model provide consistent estimates of the multiplicative treatment effect. As expected, both the log-OLS (column 1) and PPML (column 2) estimates are close to the true multiplicative treatment effect of 0.2. Whilst the difference between the two estimates is negligible, the log-OLS estimates are less dispersed, confirming the greater efficiency of the OLS estimator under statistical independence of the error term.
With α = 0.1 we introduce heteroskedasticity, as in the third case of interest of Section 2.2, where in the post period there is a change in the dispersion of the treated group that violates condition (15). The analytical bias can be calculated as (see equation 14 and Appendix B)
Therefore, we expect that an increase in variance in the post-treatment period for the treated group (due to α > 0) should induce a negative bias. Accordingly, we observe that the log-OLS procedure now performs less well. The distance from the true effect is around 2.9 percentage points, similar to the 2.6 point difference that can be calculated using formula (19). As expected, the bias increases with α, even though the variance of the estimated effects remains small: the mean of the estimated treatment effects is only 43% of the true effect when α = 0.4. On the other hand, the PPML estimator performs well under all values of α, giving estimates close to the true treatment effect in all cases.
It is worth pointing out that the parameter values considered above imply an independent effect of treatment on the conditional variance of y that deviates only slightly from statistical independence. For example, under the strongest pattern of heteroskedasticity considered (α = 0.4), the independent effect of treatment is to increase the conditional standard deviation of y by only 22%, whereas when α = 0.1 the increase in the standard deviation is just 5%. Even when very small distributional effects of treatment are introduced, the estimates from the log-linearised model are strongly biased. Moreover, equation (19) is independent of δ, so the bias as a proportion of the treatment effect would be larger if the treatment effect were smaller (indeed, in much applied work treatment effects are considerably more modest than the one we consider here).
One could test whether log-linearisation is likely to give consistent estimates. Table 1 reports rejection rates at the 5% level for both the Park and the BP tests. The Park test is conducted both on the log-linearised estimates and directly on the PPML ones. The results are not promising: in all cases the Park test fails to detect the mild pattern of heteroskedasticity that treatment introduces into the model, with rejection rates around 5% for all values of α.[10] The BP test, instead, detects the heteroskedasticity introduced into ηit with reasonable power. For example, in the case where α = 0.4 the test detects the inadequacy of the log-linearised specification 94.5% of the time.
Another important question is what would happen if, ignoring the multiplicative structure, we estimated a standard additive dif-in-dif regression. Column 3 of Table 1 presents such estimates. We observe that the estimated treatment effects consistently underestimate the true reform impact. The estimate is £4.69 in the baseline case, in contrast to the change in levels implied by the multiplicative model (£5.06), which can be calculated from eq. (10) using the true parameters.[11] The bias is independent of α. So although the regression for yi is saturated and therefore correctly specified, the level estimates confound the treatment and trend effects.
The discussion above has mainly focused on the consistency properties of PPML, but we know that it may not be the most efficient estimator. If we do indeed have statistical independence of the error term, then log-OLS will be both consistent and efficient. If the error is truly log-normal, one could also use the MLE estimator, even when α ≠ 0. In Appendix B we show that, for this simulation, the efficiency loss from using PPML is quite modest.
In Appendix D we analyse the case in which the treatment has no distributional effect, but (as in the second case of interest of Section 2.2) the pattern of variance across the treated and control groups does not respect the proportional structure from eq. (15). As discussed above, the log-OLS estimator for the treatment effect is consistent, while the treated-control difference is biased. We also analysed the case with a constant variance of y. Again here log-OLS performs poorly. Results are available from the authors.
3.2 Standard Errors in DID Designs: Simulation with Autocorrelated Errors
Procedures to correct for the fact that conventional standard errors may overstate the precision of estimated treatment effects in DID regressions have been the subject of much debate, and the literature is still unsettled (Bertrand, Duflo, and Mullainathan 2004; Donald and Lang 2007; Wooldridge 2003; 2006). Conventional standard errors are derived under an iid assumption, which is violated when many years of (grouped) data are analysed in the presence of serially correlated outcomes. By means of a set of simulations, Bertrand, Duflo, and Mullainathan (2004) show that, in this context, conventional standard errors tend to over-reject the null of a zero reform effect.
To examine the relative performance of PPML and log-OLS standard errors, we performed a different set of Monte Carlo simulations in which the data generating processes are multiplicative AR(1) models with log-normal errors and varying degrees of autocorrelation. Further details and results are provided in Appendix D. We found that rejection rates for a zero reform effect increase with the amount of serial correlation (as in Bertrand, Duflo, and Mullainathan 2004) but, importantly, that they are comparable for the two estimators. There was an indication that log-OLS performs slightly better under the most extreme pattern of autocorrelation we considered (ρ = 0.8), but only marginally so. Put together, we conclude that PPML standard errors appear to be no more biased than the log-OLS ones.
We also examined the performance of two common solutions (originally proposed by Bertrand, Duflo, and Mullainathan 2004): (1) restrict analysis to a short panel and (2) cluster standard errors at the group level (for large g).[12] Both solutions worked well in our simulations. When the sample size is very small (either small g or t) PPML performed relatively worse, but once again the differences were fairly small.
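For reference, the group-level clustering correction is a standard sandwich estimator, which can be sketched as follows (the data generating process below is our own toy example, with a regressor varying only at the cluster level so that iid standard errors are badly overstated):

```python
import numpy as np

def cluster_robust_vcov(X, resid, clusters):
    """Sandwich covariance clustered at the group level, in the spirit of
    Bertrand, Duflo and Mullainathan (2004)."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(s, s)
    return bread @ meat @ bread

rng = np.random.default_rng(4)
G, m = 50, 100                            # 50 clusters of 100 observations
clusters = np.repeat(np.arange(G), m)
x = rng.normal(0.0, 1.0, G)[clusters]     # regressor varies only across clusters
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, G)[clusters] \
    + rng.normal(0.0, 1.0, G * m)         # cluster-level shock plus noise

X = np.column_stack([np.ones(G * m), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

naive_se = np.sqrt(resid @ resid / (G * m - 2) * np.linalg.inv(X.T @ X)[1, 1])
cluster_se = np.sqrt(cluster_robust_vcov(X, resid, clusters)[1, 1])
print(cluster_se / naive_se)  # well above 1: iid standard errors overstate precision
```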
We conclude that the issues due to autocorrelation in DID designs are generally no more of a problem for the PPML estimator than for the log-OLS one. Furthermore, while clustering standard errors may solve this issue, it does not address the inconsistency of log-OLS when the multiplicative error violates statistical independence, which we discussed in Section 2.2. If, in addition to autocorrelation, we also introduce patterns of heteroskedasticity as in Section 3.1 (Table 1), log-OLS rejection rates for the placebo reform are well above the PPML ones, irrespective of whether we correct the standard errors or not.
4 Conclusion
We critically assessed the standard practice of log-linearising in a dif-in-dif setting. We argued that a researcher should first decide whether a multiplicative or additive effect model is appropriate for the non-transformed outcome, because we cannot give a causal interpretation to both. If the multiplicative model is chosen and the researcher makes only a standard mean independence assumption, using PPML on the non-transformed variable can be preferable to using OLS on the log-linearisation. The reason is that the latter might give biased estimates of the multiplicative effect if there are changes in the higher moments of the outcome distribution that make the log-linearised error not mean independent. In particular, this bias may cause the OLS estimator to confound other distributional effects with the treatment effect on the mean.
In summary, we think that the best practice for an applied researcher wishing to estimate a dif-in-dif model with a continuous outcome is the following (a summary table can be found at the end of Appendix D):
Decide whether the time trend is more likely to hold in multiplicative or in level form.
If in levels, the best solution would be to use the standard level model and estimate it through OLS. The coefficient on the interaction term could be interpreted as an average treatment effect for the treated.
If in multiplicative form, the most coherent solution is to use and estimate an exponential model, with a multiplicative treatment effect.
In the presence of heterogeneous effects we can identify the multiplicative effect on the average for the treated group, and not an average multiplicative effect.
Without covariates, the multiplicative treatment effect can be recovered from OLS estimates of the standard dif-in-dif regression in levels (eq. 12).
Estimating the exponential model with PPML allows for covariates and for the presence of zeros in the dependent variable, and does not require statistical independence of the error term.
The researcher can test for heteroskedasticity using a BP test with respect to the treated × post variable. If the test fails to reject the null of homoskedasticity, and the researcher is willing to assume statistical independence, OLS on the log-linearised model is unbiased and efficient. This method, however, requires eliminating or censoring any zeros in the outcome, which may introduce another source of bias.
Acknowledgments
We wish to thank João Santos Silva, Marco Francesconi, Mike Brewer, Susan Harkness, Jonathan James, Iva Tasseva, Vincenzo Mariani, Juan Hernandez, Roberto Nisticó, Massimo Baldini, Ben Etheridge, Ludovica Giua, seminar participants at Essex, two anonymous referees and the editor for useful comments. Financial support from the ESRC (Fisher and Ciani) and from the Royal Economic Society Junior Fellowship (Ciani) is gratefully acknowledged. The views expressed in this paper are those of the authors and do not necessarily reflect those of the Bank of Italy. Data from the Expenditure and Food Survey have been accessed through the UK Data Archive.
References
Ai, C., and E. C. Norton. 2003. “Interaction Terms in Logit and Probit Models.” Economics Letters 80: 123–129.
Ai, C., and E. C. Norton. 2008. “A Semiparametric Derivative Estimator in Log Transformation Models.” Econometrics Journal 11: 538–553.
Angrist, J. D. 2001. “Estimation of Limited Dependent Variable Models with Dummy Endogenous Regressors: Simple Strategies for Empirical Practice.” Journal of Business and Economic Statistics 19: 2–16.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics. Princeton, NJ: Princeton University Press.
Athey, S., and G. W. Imbens. 2006. “Identification and Inference in Nonlinear Difference-in-Differences Models.” Econometrica 74: 431–497.
Bertrand, M., E. Duflo, and S. Mullainathan. 2004. “How Much Should We Trust Differences-in-Differences Estimates?” The Quarterly Journal of Economics 119: 249–275.
Blackburn, M. L. 2007. “Estimating Wage Differentials Without Logarithms.” Labour Economics 14: 73–98.
Buis, M. L. 2010. “Stata Tip 87: Interpretation of Interactions in Non-Linear Models.” The Stata Journal 10: 305–308.
DellaVigna, S., and E. Kaplan. 2007. “The Fox News Effect: Media Bias and Voting.” The Quarterly Journal of Economics 122: 1187–1234.
Donald, S. G., and K. Lang. 2007. “Inference with Difference-in-Differences and Other Panel Data.” The Review of Economics and Statistics 89: 221–233.
Finkelstein, A. 2007. “The Aggregate Effects of Health Insurance: Evidence from the Introduction of Medicare.” The Quarterly Journal of Economics 122: 1–37.
Gregg, P., J. Waldfogel, and E. Washbrook. 2006. “Family Expenditures Post-Welfare Reform in the UK: Are Low-Income Families Starting to Catch Up?” Labour Economics 13: 721–746.
Karaca-Mandic, P., E. C. Norton, and B. Dowd. 2012. “Interaction Terms in Nonlinear Models.” Health Services Research 47: 255–274.
Manning, W. G. 1998. “The Logged Dependent Variable, Heteroscedasticity, and the Retransformation Problem.” Journal of Health Economics 17: 283–295.
Manning, W. G., and J. Mullahy. 2001. “Estimating Log Models: To Transform or Not to Transform?” Journal of Health Economics 20: 461–494.
Mullahy, J. 1997. “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior.” The Review of Economics and Statistics 79: 586–593.
Mullahy, J. 1999. “Interaction Effects and Difference-in-Difference Estimation in Log-Linear Models.” NBER Technical Working Paper 245.
Nunn, N., and N. Qian. 2011. “The Potato’s Contribution to Population and Urbanization: Evidence from a Historical Experiment.” The Quarterly Journal of Economics 126: 593–650.
Puhani, P. A. 2012. “The Treatment Effect, the Cross Difference, and the Interaction Term in Nonlinear ‘Difference-in-Differences’ Models.” Economics Letters 115: 85–87.
Ramsey, J. B. 1969. “Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis.” Journal of the Royal Statistical Society. Series B (Methodological) 31: 350–371.
Santos Silva, J. M. C., and S. Tenreyro. 2006. “The Log of Gravity.” The Review of Economics and Statistics 88: 641–658.
Wooldridge, J. 1992. “Some Alternatives to the Box-Cox Regression Model.” International Economic Review 33: 935–955.
Wooldridge, J. M. 2003. “Cluster-Sample Methods in Applied Econometrics.” American Economic Review 93: 133–138.
Wooldridge, J. M. 2006. “Cluster-Sample Methods in Applied Econometrics: An Extended Analysis.” Working Paper, Michigan State University, Department of Economics.
Supplemental Material
The online version of this article offers supplementary material (https://doi.org/10.1515/jem-2016-0011).
©2019 Paul Fisher et al., published by DeGruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Research Articles
- Testing Spatial Dependence in Spatial Models with Endogenous Weights Matrices
- Uniformity and the Delta Method
- On the Size Distortion of a Test for Equality between the ATE and FE Estimands
- Nonparametric estimation of natural direct and indirect effects based on inverse probability weighting
- Regression Discontinuity and Heteroskedasticity Robust Standard Errors: Evidence from a Fixed-Bandwidth Approximation
- Testing for a Functional Form of Mean Regression in a Fully Parametric Environment
- Dif-in-Dif Estimators of Multiplicative Treatment Effects
- Broken or Fixed Effects?
- Misspecified Discrete Choice Models and Huber-White Standard Errors
- Review Article
- Local Average and Quantile Treatment Effects Under Endogeneity: A Review