Article · Open Access

Regression Discontinuity and Heteroskedasticity Robust Standard Errors: Evidence from a Fixed-Bandwidth Approximation

  • Otávio Bartalotti
Published/Copyright: February 21, 2018

Abstract

In regression discontinuity designs (RD), for a given bandwidth, researchers can estimate standard errors based on different variance formulas obtained under different asymptotic frameworks. In the traditional approach the bandwidth shrinks to zero as sample size increases; alternatively, the bandwidth could be treated as fixed. The main theoretical results for RD rely on the former, while most applications in the literature treat the estimates as parametric, implementing the usual heteroskedasticity-robust standard errors. This paper develops the “fixed-bandwidth” alternative asymptotic theory for RD designs, which sheds light on the connection between both approaches. I provide alternative formulas (approximations) for the bias and variance of common RD estimators, and conditions under which both approximations are equivalent. Simulations document the improvements in test coverage that fixed-bandwidth approximations achieve relative to traditional approximations, especially when there is local heteroskedasticity. Feasible estimators of fixed-bandwidth standard errors are easy to implement and are akin to treating RD estimators as locally parametric, validating the common empirical practice of using heteroskedasticity-robust standard errors in RD settings. Bias mitigation approaches are discussed and a novel bootstrap higher-order bias correction procedure based on the fixed bandwidth asymptotics is suggested.

JEL Classification: C12; C21

1 Introduction

Regression discontinuity (RD) designs have been propelled to the spotlight of economic analysis in recent years,[1] especially in the policy and treatment evaluation literatures, as a form of estimating treatment effects in a non-experimental setting. In this design, treatment is assigned based on values of an observed characteristic, with the probability of receiving treatment jumping discontinuously at a known threshold. RD’s appeal stems from the relatively weak assumptions necessary for the nonparametric identification of treatment effects, specifically smoothness of the conditional expectation of outcome for treated and untreated individuals at the cutoff. This paper examines the effect of the additional assumptions that are explicitly and implicitly made to justify estimation and inference and shows that those assumptions can be weakened by using a heteroskedasticity-robust variance estimator, which is a common approach in empirical research but has not been supported by previous theoretical research.

Local polynomial estimators are the most common choice in empirical and theoretical work due to their ease of implementation and flexibility. These kernel-based estimators rely on fitting a polynomial function to a range of the data just around the threshold, the size of which is determined by a bandwidth, h. To perform inference in this setting, additional assumptions are usually imposed. In particular, asymptotic approximations assume smoothness of the conditional variance of the outcome, σ²(x). That assumption is somewhat restrictive and, as will be discussed below, not necessary. Implementation of the local polynomial estimators also depends on the choice of the bandwidth, which can be benchmarked against several bandwidth selectors available in the literature.

The traditional nonparametric asymptotic approximations (“small-h”) provide formulas for the variance of such estimators as the bandwidth shrinks towards zero asymptotically. Those formulas reflect the fact that the distribution of the kernel-based estimator is asymptotically equivalent to its distribution under homoskedastic data when the smoothing parameter shrinks, which may have led researchers to overlook the need to develop robust inference in small samples. A similar point has been made recently by Kim, Sun, and Yang (2017) when discussing time series dependence. Moreover, in practice, variance estimators that rely on homoskedasticity are discouraged even though they are supported by the theory. Hence, researchers often use a different “pre-asymptotics” variance formula to compute standard errors, based on the White-Huber-Eicker robust standard errors or some nearest-neighbor alternative.[2] This paper provides an analysis of when and why the common practice of using White-Huber-Eicker robust standard errors in the RD context is appropriate.

I focus on presenting an alternative asymptotic approximation for the standard RD treatment effects estimator using a fixed bandwidth that relaxes the smoothness conditions imposed on σ²(x). This “fixed-h” approximation provides expressions for the estimator’s asymptotic variance that incorporate the bandwidth used by the researcher and lead naturally to the standard error formulas used in the applied literature, clarifying the assumptions behind their use. The results show that heteroskedasticity-robust standard errors continue to provide valid inference under these relaxed assumptions. Furthermore, these standard errors are appropriate for any specific bandwidth chosen.

This paper shows that this form of the variance estimator is valid both under an asymptotic nesting where the bandwidth parameter, h, is fixed and under conventional RD asymptotics where h → 0. However, the traditional asymptotic variance estimator is not consistent for fixed bandwidths. Consequently, the common empirical strategy of using robust standard errors in RD studies is appropriate, especially when heteroskedasticity is a concern.

While I focus on a setting where the bias is “small” and does not affect the asymptotic distribution of the RD estimator, an alternative approximation for the estimator’s asymptotic bias that differs from the usual approximations in the literature is presented and provides additional intuition on the robustness-precision trade-off facing the researcher in RD designs. A discussion on how to address bias in practice is presented.

I first consider the procedure proposed by Calonico, Cattaneo, and Titiunik (2014), which implements an analytical bias correction based on a first-order expansion of the bias, reducing the order of the leading bias term. The framework proposed here is compatible with such bias reduction procedures. Furthermore, I explore a novel approach to bias mitigation based on a higher-order correction inspired by the fixed-bandwidth asymptotic bias approximations. That strategy is easily implemented by modifying the wild bootstrap RD procedure in Bartalotti, Calhoun, and He (2017). Simulation exercises attest that such an approach can significantly improve test performance relative to first-order corrections.

This work contributes to the emerging literature on inference for treatment effects in the context of RD designs. Hahn, Todd, and van der Klaauw (1999 and 2001) and Lee (2008) presented the conditions for identification and estimation in RD designs. Porter (2003) provided results on the asymptotic properties of the estimators for the treatment effect of interest, obtaining limiting distributions for estimators based on local polynomial regression and partially linear estimation. Calonico, Cattaneo, and Titiunik (2014) studied asymptotic approximations for the bias-corrected local polynomial RD estimator as described above. Other studies about estimation, inference, and bandwidth choice in RD designs include Bartalotti and Brummet (2017), Calonico, Cattaneo, and Farrell (Forthcoming), Cattaneo, Frandsen, and Titiunik (2015), and Imbens and Kalyanaraman (2012) among others. McCrary (2008) studied specification testing. Finally, a broad review of the theoretical and applied literature, with emphasis on the identification of the parameter of interest and its potential interpretations, can be found in Imbens and Lemieux (2008) and Lee and Lemieux (2009).

The asymptotic framework using fixed bandwidths follows a growing literature that recognizes the potential for practical improvements in inference procedures in nonparametric methods. Notably, Neave (1970), in the context of spectral density estimation, obtained more accurate approximations to the variance of nonparametric spectral estimates. Neave’s work was later extended by Hashimzade and Vogelsang (2008). Similarly, Fan (1998) provided an alternative approximation for goodness-of-fit tests for density function estimates, obtaining improved approximations to the asymptotic behavior of the test and critical values for inference. More recent studies, in the context of time-series or spatial dependence, analyze and justify the use of asymptotic variance formulas based on fixed-bandwidth approximations when pursuing heteroskedasticity, autocorrelation, and spatial dependence robust inference (Bester et al. 2016; Chen, Liao, and Sun 2014; Kim, Sun, and Yang 2017; Sun 2014). This paper contributes to that literature in the context of local polynomial estimators and RD designs.

Section 5 provides simulation evidence that the standard errors used by practitioners work well relative to the ones based on formulas suggested by the traditional approach (h → 0). The robust standard errors based on the fixed-bandwidth formulas perform well for larger bandwidths, especially when heteroskedasticity is present. The intuition is that, for larger bandwidths, the homoskedasticity condition becomes less likely to hold, as it would need to be valid over a larger support of the running variable. Finally, I provide an empirical application using Lee (2008), illustrating with actual data the improvements obtained.

2 Model and Estimator

The interest lies in estimating the average treatment effect, τ, of a certain policy affecting part of a population of interest.[3] There are two types of RD designs: sharp and fuzzy. They differ in regard to the assignment of treatment and to the impact of the discontinuity on the assignment process. At first, I focus on sharp RD designs. The discussion and extensions for the fuzzy design are presented afterwards.

In the sharp design, the treatment status, d, is a deterministic function of a so-called “running” variable, x, such that,

$$d_i = \begin{cases} 1 & \text{if } x_i \geq \bar{x} \\ 0 & \text{if } x_i < \bar{x}, \end{cases}$$

where x¯ is the known cut-off point. Then let Y1 and Y0 be the potential outcomes corresponding to the two possible treatment assignments. As usual, we cannot observe both potential outcomes, having access only to Y = dY1 + (1 − d)Y0. As described by Hahn, Todd, and van der Klaauw (2001) and Porter (2003), under some weak smoothness assumptions, the average treatment effect can be estimated by comparing points just above and just below the discontinuity. If the running variable is continuous, as well as the conditional expectations of Y0 and Y1, the discontinuity in treatment assignment at x¯ provides the opportunity to identify the average treatment effect at the cutoff without additional parametric functional form restrictions on the conditional expectations of the outcome variable. The average causal effect of the treatment at the discontinuity is

$$\tau \equiv E[Y_1 - Y_0 \mid X = \bar{x}] = \lim_{x \downarrow \bar{x}} E[Y \mid X = x] - \lim_{x \uparrow \bar{x}} E[Y \mid X = x].$$

The sharp RD design uses the discontinuity in the conditional expectation of Y given X to uncover the average treatment effect at the cutoff. For a comprehensive review of RD designs and their applications and interpretation, see Lee and Lemieux (2009).

I focus on estimates of τ obtained using local polynomial estimation, which is the most common approach in applied work. The order-p local polynomial estimator is defined as follows. In the sharp design case, given data $(y_i, x_i)$, $i = 1, 2, \ldots, n$, let $d_i = 1[x_i \geq \bar{x}]$, let $k(\cdot)$ be a kernel function, and let h denote a bandwidth that controls the size of the local neighborhood to be averaged over. Also, define the $(p+1) \times 1$ vector $Z(x) = \left(1, \frac{x - \bar{x}}{h}, \left(\frac{x - \bar{x}}{h}\right)^2, \ldots, \left(\frac{x - \bar{x}}{h}\right)^p\right)'$ and let $(\hat{\alpha}_p^+, \hat{\beta}_p^+)$ be the solution to the minimization problem:

$$\min_{a, b_1, \ldots, b_p} \frac{1}{n} \sum_{i=1}^n \frac{1}{h}\, k\!\left(\frac{x_i - \bar{x}}{h}\right) d_i \left[ y_i - a - b_1\!\left(\frac{x_i - \bar{x}}{h}\right) - \cdots - b_p\!\left(\frac{x_i - \bar{x}}{h}\right)^p \right]^2,$$

while, similarly, $(\hat{\alpha}_p^-, \hat{\beta}_p^-)$ minimizes the same objective function, but with $1 - d_i$ replacing $d_i$. The estimator of the parameter of interest is given by

$$\hat{\tau} \equiv \hat{\alpha}_p = \hat{\alpha}_p^+ - \hat{\alpha}_p^-.$$
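To make the estimator concrete: on each side of the cutoff, the minimization above is just a kernel-weighted least squares fit of y on powers of the scaled running variable, with the intercept estimating the one-sided limit. The sketch below (the function name and simulated design are my own illustration, not from the paper) implements the sharp RD estimator with a triangular kernel:

```python
import numpy as np

def local_poly_rd(y, x, cutoff, h, p=1):
    """Sharp RD estimate: order-p local polynomial with a triangular kernel."""
    u = (x - cutoff) / h
    w = np.maximum(1 - np.abs(u), 0)          # triangular kernel weights
    def one_side(mask):
        Z = np.vander(u[mask], p + 1, increasing=True)  # columns 1, u, ..., u^p
        ZtW = Z.T * w[mask]
        # weighted least squares; the intercept is the one-sided limit at the cutoff
        return np.linalg.solve(ZtW @ Z, ZtW @ y[mask])[0]
    alpha_plus = one_side((x >= cutoff) & (w > 0))
    alpha_minus = one_side((x < cutoff) & (w > 0))
    return alpha_plus - alpha_minus

# simulated sharp design: true effect tau = 2 at cutoff 0
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.3, n)
est = local_poly_rd(y, x, cutoff=0.0, h=0.4)
print(est)
```

The bandwidth h only controls which observations receive positive weight; with the locally linear conditional mean above, the intercept difference recovers the treatment effect at the cutoff.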

In the fuzzy design the probability of receiving treatment still changes discontinuously at the threshold, but is not required to go from 0 to 1,

$$\lim_{x \downarrow \bar{x}} \Pr(d \mid X = x) \neq \lim_{x \uparrow \bar{x}} \Pr(d \mid X = x).$$

This framework allows for a greater range of applications since it includes cases in which the incentives to receive treatment change discontinuously at the threshold, but are not strong enough to induce all individuals above it to be treated. The average treatment effect at the cutoff can be identified by the ratio of the change in the conditional expectation for the outcome variable to the change in the conditional probability of receiving treatment (Imbens and Lemieux 2008):

$$\tau_F \equiv \frac{\lim_{x \downarrow \bar{x}} E[Y \mid X = x] - \lim_{x \uparrow \bar{x}} E[Y \mid X = x]}{\lim_{x \downarrow \bar{x}} E[d \mid X = x] - \lim_{x \uparrow \bar{x}} E[d \mid X = x]}.$$

The estimator of the parameter of interest is given by the ratio

$$\hat{\tau}_F = \frac{\hat{\alpha}_p}{\hat{\theta}_p},$$

where $\hat{\alpha}_p$ is the estimator described above and $\hat{\theta}_p$ is the estimator for the change in the probability of being in the treated group at the cutoff, obtained by applying the same estimators with the treatment assignment variable, $D_i$, as the dependent variable.
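A minimal sketch of the fuzzy estimator under the same conventions (assumed, illustrative code: np.polyfit performs the one-sided local linear fits, so the kernel weights enter through their square roots, since polyfit weights multiply the residuals):

```python
import numpy as np

def one_sided_intercept(y, xc, h):
    """Local linear fit at the cutoff (xc is the centered running variable);
    np.polyfit weights multiply residuals, so pass sqrt kernel weights."""
    k = np.maximum(1 - np.abs(xc / h), 0)     # triangular kernel
    m = k > 0
    return np.polyfit(xc[m], y[m], 1, w=np.sqrt(k[m]))[-1]  # constant term

def fuzzy_rd(y, d, x, cutoff, h):
    xc = x - cutoff
    above, below = xc >= 0, xc < 0
    jump_y = one_sided_intercept(y[above], xc[above], h) - \
             one_sided_intercept(y[below], xc[below], h)
    jump_d = one_sided_intercept(d[above].astype(float), xc[above], h) - \
             one_sided_intercept(d[below].astype(float), xc[below], h)
    return jump_y / jump_d                    # ratio of the two jumps

# imperfect compliance: treatment probability jumps from 0.2 to 0.8 at the cutoff
rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-1, 1, n)
d = rng.random(n) < np.where(x >= 0, 0.8, 0.2)
y = 1.5 * d + 0.3 * x + rng.normal(0, 0.2, n)  # true effect 1.5
tau_f = fuzzy_rd(y, d, x, cutoff=0.0, h=0.3)
print(tau_f)
```

The numerator alone would estimate only the "intent-to-treat" jump of about 1.5 × 0.6; dividing by the jump in treatment probability rescales it to the treatment effect at the cutoff.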

In practice, implementing RD designs is not without challenges, even in simple cases; it requires implicit assumptions or choices by the researcher that are not fully reflected in this basic setup. Importantly for the aims of this paper, several of these situations can be better understood by relying on an alternative asymptotic approximation based on a fixed bandwidth. That not only sheds light on the assumptions and issues dealt with implicitly by practitioners, but also naturally allows for the careful use of tools from parametric analysis to approach these challenges.

To develop this framework, I consider the following assumptions, which are more flexible than the ones used in the literature on RD designs, e.g., Hahn, Todd, and van der Klaauw (2001) and Porter (2003). Let $F_o(x)$ denote the distribution function of x, let $m(x) = E[y \mid x]$, and let $\bar{x}$ be the cutoff that determines treatment. Finally, define $\varepsilon = y - E[y \mid X = x] = y - m(x)$ and $\sigma^2(x) = E[\varepsilon^2 \mid X = x]$. Importantly, the conditional variance, $\sigma^2(x)$, is allowed to be discontinuous and non-smooth in a region around the cutoff.

Assumption 1

$k(\cdot)$ is a symmetric, bounded, Lipschitz function, zero outside a bounded set, with $\int k(u)\, du = 1$.

Assumption 2

Suppose the data $(y_i, x_i)$, $i = 1, 2, \ldots, n$, are i.i.d. and α is defined by

$$\alpha = \lim_{x \downarrow \bar{x}} E[y \mid X = x] - \lim_{x \uparrow \bar{x}} E[y \mid X = x].$$

For the compact interval $[\bar{x} - h, \bar{x} + h]$: (a) $F_o(\bar{x} + h) - F_o(\bar{x})$ and $F_o(\bar{x}) - F_o(\bar{x} - h)$ are bounded away from zero; (b) $m(x)$ is $l_m$ times continuously differentiable for $x \neq \bar{x}$, and m is continuous at $\bar{x}$ with finite right- and left-hand derivatives up to order $l_m$; and (c) the running variable, x, has enough variation within the interval that we can identify $(\hat{\alpha}_p^+, \hat{\beta}_p^+)$ and $(\hat{\alpha}_p^-, \hat{\beta}_p^-)$.

Assumption 3

(a) $\sigma^2(x) = E[\varepsilon^2 \mid X = x]$ exists and is well defined on $[\bar{x} - h, \bar{x} + h]$.

(b) For some $\zeta > 0$, $E[|\varepsilon|^{2+\zeta} \mid X = x]$ is uniformly bounded on $[\bar{x} - h, \bar{x} + h]$.

Assumption 1 is very standard in the RD literature and, for most applications, does not significantly restrict the kernels used in estimation. Assumption 2 and Assumption 3 relax some of the usual smoothness conditions present in the literature. A relevant feature of the analysis performed in the next section is that fixed-bandwidth asymptotics allow the researcher to directly incorporate heteroskedasticity into the asymptotic distribution of the estimates of interest, including more complex dependence problems that are generally intractable in a shrinking-bandwidth framework, e.g., clustering (Bartalotti and Brummet 2017). A similar point has been made by Kim, Sun, and Yang (2017) in the time series context. Most importantly, the conditional variance is allowed to be discontinuous within the bandwidth around the cutoff. Hence, the framework developed here can be used to analyze cases in which the running variable might exhibit discreteness, whether due to the nature of the running variable, as described by Lee and Card (2008), heaping (Barreca, Lindo, and Waddell 2016), measurement error (Bartalotti, Brummet, and Dieterle 2017; Dong 2015), or some other characteristic of the application at hand.

3 Asymptotic Distributions

To derive the asymptotic properties of the estimator for τ, the usual regularity and smoothness conditions in the literature are sufficient for both the traditional and fixed-h approximations, e.g., Hahn, Todd, and van der Klaauw (2001) and Porter (2003). As discussed in Section 2, we are able to relax some of the smoothness conditions by using a fixed-bandwidth approach, which is particularly useful when dealing with scenarios in which the usual nonparametric approach is not feasible. In the following, let $F_o(x)$ denote the distribution function of x, let $m(x) = E[y \mid x]$ denote the conditional expectation of y given x, and let $\bar{x}$ be the value of the running variable at which the discontinuity occurs. Finally, define $\varepsilon = y - E[y \mid X = x] = y - m(x)$ and $\sigma^2(x) = E[\varepsilon^2 \mid X = x]$.

Using the traditional nonparametric asymptotic approximations (small-h), Hahn, Todd, and van der Klaauw (2001), Porter (2003), and Imbens and Lemieux (2008), among others, obtain the following asymptotic variance formula for the (scaled) $\hat{\tau}$,

(1) $$V_{\text{small-}h} = \frac{\sigma^{2+}(\bar{x}) + \sigma^{2-}(\bar{x})}{f_o(\bar{x})}\, e_1' \Gamma^{-1} \Delta \Gamma^{-1} e_1,$$

where $e_1' \Gamma^{-1} \Delta \Gamma^{-1} e_1$ is a constant scalar which depends only on the order of the local polynomial, p, and the kernel used.[4] Crucially for practical purposes, it does not depend on the bandwidth (h) by construction. That is natural, since the approximation is based on a shrinking bandwidth around the cutoff. However, it does not capture heteroskedasticity or changes in the density of x over the range of data actually used in practice. That issue has been recognized in the literature, e.g., by Imbens and Lemieux (2008) and Calonico, Cattaneo, and Titiunik (2014), and by practitioners, who commonly implement the usual heteroskedasticity-robust White-Huber-Eicker standard errors (or weighted analogues depending on the choice of kernel) based on intuitive or “pre-asymptotics” arguments.

Interestingly, these heteroskedasticity-robust standard errors can also be justified by an approximation under the asymptotic nesting where the bandwidth parameter, h, is fixed. The fixed-h asymptotic approximation can be summarized in the following result.

Theorem 1

Suppose Assumption 1, Assumption 2, and Assumption 3 hold. If h is fixed and positive, then as n → ∞,

$$\mathrm{Avar}\left[\sqrt{nh}\,(\hat{\alpha}_p - \alpha)\right] \to_p V_{\text{fixed-}h},$$

where

$$V_{\text{fixed-}h} = e_1' \left[ (\Gamma^+)^{-1} \Delta^+ (\Gamma^+)^{-1} + (\Gamma^-)^{-1} \Delta^- (\Gamma^-)^{-1} \right] e_1.$$

Additionally, if $\mathrm{Bias}[\sqrt{nh}\,(\hat{\alpha}_p - \alpha)] = o_p(1)$, then

(2) $$\sqrt{nh}\,(\hat{\alpha}_p - \alpha) \to_d N(0, V_{\text{fixed-}h}),$$

where $\Gamma^{+(-)}$, $\Delta^{+(-)}$, $e_1$, $\gamma_j^+$, $\gamma_j^-$, $\delta_j^+$, and $\delta_j^-$ are defined in Appendix A.

The fixed-h asymptotic variance formula is given by $V_{\text{fixed-}h}$ and depends directly on the behavior of $\sigma^2(x)$ and $f_o(x)$ inside the bandwidth around the cutoff, on p, and on the kernel function through $(\Gamma^+)^{-1}\Delta^+(\Gamma^+)^{-1}$ and $(\Gamma^-)^{-1}\Delta^-(\Gamma^-)^{-1}$. Hence, it captures the impact of h on the asymptotic variance, accounting for potential local heteroskedasticity. This asymptotic approximation provides a natural and intuitive way to understand the sources of improvement in inference observed in practice when using these standard errors, as discussed below.

There are two noteworthy cases in which the formulas for the fixed-h asymptotic variance (and bias) simplify to the small-h formulas. First, if $f_o(x)$, $\sigma^2(x)$, and $m(x)$ are continuous in the bandwidth around the cutoff, then, when h → 0,[5]

$$\lim_{h \to 0} V_{\text{fixed-}h} = V_{\text{small-}h}.$$

Hence, if h is small, fixed-h and small-h provide similar approximations to the asymptotic behavior of $\hat{\alpha}_p$. This is intuitive, since the fixed-bandwidth approximation is valid for any value of h, and the difference between the formulas, arising from the behavior of $\sigma^2(x)$ and $f_o(x)$ inside the bandwidth, will likely be small as the support of the running variable on which they are evaluated becomes smaller.

Second, if $f_o(x)$ and $\sigma^2(x)$ are constant around the cutoff, the fixed-h asymptotic variance formula simplifies to the small-h formula, i.e., $V_{\text{fixed-}h} = V_{\text{small-}h}$.[6] This fact makes clear that the differences between the fixed-bandwidth and traditional approximations are due to incorporating the behavior of $f_o(x)$ and $\sigma^2(x)$ over the ranges around the cutoff in the first case, while considering only their values at the cutoff, $f_o(\bar{x})$ and $\sigma^2(\bar{x})$, in the second.

3.1 Addressing the Asymptotic Bias

The asymptotic normality described in Theorem 1 holds even if the bias is non-negligible, in the sense that $\sqrt{nh}\,(\hat{\alpha}_p - E[\hat{\alpha}_p]) \to_d N(0, V_{\text{fixed-}h})$, similar to the results presented by Pagan and Ullah (1999). Even though it is not used directly in the asymptotic variance discussion that is the focus of this paper, an ancillary result is the asymptotic bias formula under the fixed-bandwidth asymptotics (see Appendix A),

(3) $$\mathrm{Bias}\left[\sqrt{nh}\,(\hat{\alpha}_p - \alpha)\right] = e_1' \left\{ (\Gamma^+)^{-1} \left[ \int_0^\infty k(u)\, Z(\bar{x} + uh)\, m(\bar{x} + uh)\, dF_o(\bar{x} + uh) \right] - (\Gamma^-)^{-1} \left[ \int_0^\infty k(u)\, Z(\bar{x} - uh)\, m(\bar{x} - uh)\, dF_o(\bar{x} - uh) \right] \right\}.$$

A more familiar representation of the asymptotic bias term can be obtained if we are willing to assume differentiability of m(x) and $f_o(x)$ up to order (p + 2) on the local support $[\bar{x} - h, \bar{x} + h]$ and approximate m(x) by a polynomial of order p + 2. One can then rewrite the asymptotic bias term above using a (cruder) approximation as

(4) $$B_{\text{small-}h} = \frac{h^{p+1}}{(p+1)!} \left[ m^{(p+1)+}(\bar{x}) - (-1)^{p+1} m^{(p+1)-}(\bar{x}) \right] e_1' \Gamma^{-1} (\gamma_{p+1}, \ldots, \gamma_{2p+1})' + \frac{h^{p+2}}{(p+2)!} \left[ m^{(p+2)+}(\bar{x}) - (-1)^{p+2} m^{(p+2)-}(\bar{x}) \right] e_1' \Gamma^{-1} (\gamma_{p+2}, \ldots, \gamma_{2p+2})' + o_p(h^{p+2}),$$

which are the formulas obtained in the traditional asymptotics where h → 0 (Calonico, Cattaneo, and Titiunik 2014; Porter 2003).[7]

Both equations provide some intuition for the conditions under which the negligible asymptotic bias condition in Theorem 1, $\mathrm{Bias}[\sqrt{nh}\,(\hat{\alpha}_p - \alpha)] = o_p(1)$, holds.

Under the more general conditions discussed in Section 2 with a fixed bandwidth, the bias will be asymptotically negligible if the local polynomial of order p correctly captures the relevant features of m(x) in the neighborhood of the cutoff. That could be because the model is correctly specified (which is often implicitly assumed by practitioners when implementing RD), in which case the bias vanishes asymptotically as n → ∞. More interestingly, that condition will hold with h fixed if the researcher treats p as embedded in an increasing sequence as n increases, as in a series estimator, at the cost of imposing additional smoothness conditions on m(x) in the neighborhood around the cutoff. Details on these conditions can be found in Pagan and Ullah (1999, Section 3.9) and Andrews (1991). It is important to notice that, in the RD case, the restrictions on m(x) need only hold on the smaller support $[\bar{x} - h, \bar{x} + h]$, since our focus is to improve the polynomial fit only on this neighborhood, hence maintaining a substantial amount of the flexibility afforded by the RD design.

In the case where the running variable is continuous and the usual smoothness conditions hold, the asymptotic bias described above will be negligible if $nh^{2p+3} \to 0$ (Hahn, Todd, and van der Klaauw 2001; Porter 2003). As Calonico, Cattaneo, and Titiunik (2014) point out, in that case the usual MSE-optimal bandwidths proposed by Imbens and Kalyanaraman (2012) are too “large” because they do not satisfy this condition. They propose an analytical bias correction that estimates the first term in equation (4), reducing the order of the remaining asymptotic bias. Therefore, if such a bias correction procedure were implemented, the condition $\mathrm{Bias}[\sqrt{nh}\,(\hat{\alpha}_p - \alpha)] = o_p(1)$ would hold if $nh^{2p+5} \to 0$. This bias correction procedure has become standard in the literature and is readily available in Stata and R via the rdrobust package (Calonico et al. 2017).

It is well known that, unless the true specification of the population model is known, the local polynomial estimator is inconsistent. Even though the treatment effect is nonparametrically identified, in practice the local polynomial estimator of the RD design will potentially provide a biased estimate of the average treatment effect, unless the polynomial used correctly specifies the conditional expectation of Y in the bandwidth around the cutoff. Nevertheless, the formula for $B_{\text{fixed-}h}$ provides important additional intuition relative to $B_{\text{small-}h}$. While the latter depends on a first-order (or second-order) approximation of the bias term, the former directly captures the effect of the bandwidth used and its impact on the approximation provided by a local polynomial of order p over the whole support within the bandwidth.

It is worth noting that both fixed-h and small-h asymptotic approximations are based on the same estimator, $\hat{\alpha}$. For a given bandwidth used in practice, the initial bias present in the estimate is given. Using the small-h approximation for standard errors does nothing to eliminate potential bias in practice, and any potential bias should be treated as a separate empirical issue. When implementing RD, the researcher can consider some potential bias mitigation strategies. The analytical bias correction procedure proposed by Calonico, Cattaneo, and Titiunik (2014) and its bootstrap equivalent in Bartalotti, Calhoun, and He (2017) could be used; both rely on first-order approximations, and their performance depends on the curvature of m(x) and the bandwidth used. Both adjustments require an additional correction to the standard errors due to the estimation of the first-order bias term.

The key insight obtained from the fixed-bandwidth asymptotic bias presented above is that a first-order bias adjustment might not be sufficient to capture the complexity of m(x). Especially for larger bandwidths, one would expect that approach to become less reliable in controlling the asymptotic bias. Hence, the fixed-bandwidth asymptotic framework suggests two related empirical strategies for bias mitigation.

First, as described above, one could implement a series estimator within the fixed bandwidth, such that the complexity of the polynomial fitted locally (its order p) increases as more data accumulate around the cutoff.

Second, the wild bootstrap procedure in Bartalotti, Calhoun, and He (2017) can easily be adjusted to provide higher-order bias correction and inference, by estimating a higher-order local polynomial, e.g., of order p + 2, in step 1 of Algorithm 1 in Bartalotti, Calhoun, and He (2017, p. 431). Simulations comparing coverage rates obtained by the usual analytical first-order bias mitigation (Calonico, Cattaneo, and Titiunik 2014) and the higher-order bootstrap approach are presented in Appendix Section A.1.
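The idea behind this higher-order correction can be sketched as follows. This is a stylized illustration only, with my own function names and simulated design; the full algorithm, including the studentization used for inference and the exact residual construction, is specified in Bartalotti, Calhoun, and He (2017). A pilot local polynomial of order p + 2 is fitted, wild bootstrap samples are generated around that fit, the bias of the order-p estimator is measured across the bootstrap samples, and the original estimate is shifted accordingly:

```python
import numpy as np

def rd_estimate(y, x, h, p):
    """Sharp RD at cutoff 0: order-p local polynomial, triangular kernel.
    Returns the estimate and fitted values (extrapolated where weight is 0)."""
    fitted = np.empty_like(y, dtype=float)
    intercepts = {}
    for label, side in (("+", x >= 0), ("-", x < 0)):
        u = x[side] / h
        k = np.maximum(1 - np.abs(u), 0)
        Z = np.vander(u, p + 1, increasing=True)
        ZtW = Z.T * k
        b = np.linalg.solve(ZtW @ Z, ZtW @ y[side])
        intercepts[label] = b[0]
        fitted[side] = Z @ b
    return intercepts["+"] - intercepts["-"], fitted

def bias_corrected_rd(y, x, h, p=1, B=500, seed=0):
    """Wild-bootstrap higher-order bias correction (sketch)."""
    rng = np.random.default_rng(seed)
    tau_p, _ = rd_estimate(y, x, h, p)            # original order-p estimate
    tau_hi, fit_hi = rd_estimate(y, x, h, p + 2)  # order p+2 pilot fit
    resid = y - fit_hi
    boot = np.empty(B)
    for b in range(B):
        # Rademacher wild bootstrap: resample outcomes around the pilot fit
        y_star = fit_hi + resid * rng.choice([-1.0, 1.0], size=y.size)
        boot[b], _ = rd_estimate(y_star, x, h, p)
    bias = boot.mean() - tau_hi   # bias of the order-p estimator under the pilot model
    return tau_p - bias

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, n)
# curvature makes the local linear (p = 1) fit biased at a wide bandwidth
y = 2.0 * (x >= 0) + 1.5 * np.sign(x) * x**2 + rng.normal(0, 0.2, n)
naive, _ = rd_estimate(y, x, h=0.8, p=1)
corrected = bias_corrected_rd(y, x, h=0.8, p=1)
print(naive, corrected)
```

In this design the true effect is 2; the uncorrected local linear estimate is pulled away by the curvature at the wide bandwidth, and the bootstrap correction largely removes that distortion.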

Intuitively, the potential bias is due to local model misspecification, since the local polynomial used might not be able to correctly capture m(x)’s features within the bandwidth. On that point, it is interesting to draw a parallel between $B_{\text{fixed-}h}$ and the issue of model misspecification in parametric models (White 1982, 1996). The problem of estimating the average treatment effect at the cutoff can be seen as one of correctly estimating $E[Y \mid X]$ on both sides of the cutoff. By using a relatively small bandwidth, we fit the conditional expectation on a restricted support, and hence expect a polynomial of order p to produce a better fit than if we were trying to fit $E[Y \mid X]$ globally. This better fit is the benefit associated with a local approach, since it allows the conditional expectation to be unrestricted outside the bandwidth.

3.2 Fuzzy Regression Discontinuity

The fuzzy RD case follows from the delta method applied to $\hat{\tau}_F = \hat{\alpha}_p / \hat{\theta}_p$ (Porter 2003),

$$\mathrm{Avar}\left[\sqrt{nh}\,(\hat{\tau}_F - \tau_F)\right] \to_p V_{\text{FRD-fixed-}h} = \frac{1}{\theta^2} V_\alpha - \frac{2\alpha}{\theta^3} C_{\alpha\theta} + \frac{\alpha^2}{\theta^4} V_\theta,$$

where $V_\alpha$ and $V_\theta$ are the asymptotic variances presented in Theorem 1, with the outcome and the treatment status, respectively, as the variable of interest. If, additionally, $\mathrm{Bias}[\sqrt{nh}\,(\hat{\tau}_F - \tau_F)] = o_p(1)$ and a multivariate CLT holds for $\left(\sqrt{nh}\,(\hat{\alpha} - \alpha), \sqrt{nh}\,(\hat{\theta} - \theta)\right)'$ (Pagan and Ullah 1999), then

(5) $$\sqrt{nh}\,(\hat{\tau}_F - \tau_F) \to_d N(0, V_{\text{FRD-fixed-}h}).$$

The proof follows directly from the delta method and is omitted.

All the terms that appear in the asymptotic distribution above, except for Cαθ, can be obtained from Theorem 1.

Theorem 2

Let $\sigma_{\varepsilon\eta}(x) = E[\varepsilon\eta \mid X = x]$. If $\hat{\alpha}$ and $\hat{\theta}$ are the local polynomial estimators and the conditions of Theorem 1 hold for both estimators, then

$$C_{\alpha\theta} = e_1' \left[ (\Gamma^+)^{-1} \Delta_\rho^+ (\Gamma^+)^{-1} + (\Gamma^-)^{-1} \Delta_\rho^- (\Gamma^-)^{-1} \right] e_1,$$

where

$$\Delta_\rho^{+(-)} = \begin{bmatrix} \rho_0^{+(-)} & \cdots & \rho_p^{+(-)} \\ \vdots & \ddots & \vdots \\ \rho_p^{+(-)} & \cdots & \rho_{2p}^{+(-)} \end{bmatrix},$$

$$\rho_j^+ = \int_0^\infty k^2(u)\, u^j\, \sigma_{\varepsilon\eta}(\bar{x} + uh)\, dF_o(\bar{x} + uh),$$

$$\rho_j^- = (-1)^j \int_0^\infty k^2(u)\, u^j\, \sigma_{\varepsilon\eta}(\bar{x} - uh)\, dF_o(\bar{x} - uh),$$

and $\Gamma^+$ and $\Gamma^-$ are defined as in previous results.

The proof is presented in Appendix A. Similar to the results for the sharp RD, the asymptotic covariance formula converges to the small-h asymptotic covariance both as h → 0 and, for fixed h, if $f_o(x)$ and $\sigma_{\varepsilon\eta}(x)$ are constant in the bandwidth around the cutoff.[8]

4 Variance Estimators

To perform inference about α, appropriate estimates for the asymptotic variance formulas from Theorem 1 are necessary. The components of Vfixed-h, Δ+ and Γ+, have typical elements given by

$$\gamma_j^+ = \int_0^\infty k(u)\, u^j\, dF_o(\bar{x} + uh) = E\left[ h^{-1} k\!\left(\tfrac{x - \bar{x}}{h}\right) \left(\tfrac{x - \bar{x}}{h}\right)^j d \right],$$

$$\delta_j^+ = \int_0^\infty k^2(u)\, u^j\, \sigma^2(\bar{x} + uh)\, dF_o(\bar{x} + uh) = E\left[ h^{-1} k\!\left(\tfrac{x - \bar{x}}{h}\right)^2 \left(\tfrac{x - \bar{x}}{h}\right)^j d\, \varepsilon^2 \right],$$

and similarly for $\gamma_j^-$ and $\delta_j^-$. A natural estimator for the asymptotic variance is given by their sample analogues, using the estimated residuals $\hat{\varepsilon} = y - \hat{m}(x)$,

$$\hat{\gamma}_j^+ = (nh)^{-1} \sum_{i=1}^n k\!\left(\tfrac{x_i - \bar{x}}{h}\right) \left(\tfrac{x_i - \bar{x}}{h}\right)^j d_i, \qquad \hat{\delta}_j^+ = (nh)^{-1} \sum_{i=1}^n k\!\left(\tfrac{x_i - \bar{x}}{h}\right)^2 \left(\tfrac{x_i - \bar{x}}{h}\right)^j d_i\, \hat{\varepsilon}_i^2,$$

which are consistent by standard arguments in both asymptotic frameworks.

Then, the natural plug-in estimator of the fixed-h variance-covariance matrix is given by

(6) $$\left[ (\hat{\Gamma}^+)^{-1} \hat{\Delta}^+ (\hat{\Gamma}^+)^{-1} + (\hat{\Gamma}^-)^{-1} \hat{\Delta}^- (\hat{\Gamma}^-)^{-1} \right].$$

These are simple averages of the data and kernel weights, and they have the familiar “sandwich” form (Fan and Gijbels 1996). This estimator is analogous to the usual heteroskedasticity-robust standard errors in a general weighted least squares framework, and it arises naturally from the fixed-h framework developed above. The fixed-h approach provides a framework that justifies the use of such estimators by practitioners. Intuitively, the variance estimators are “robust to the choice of bandwidth,” since they are valid for any finite h, take into consideration the impact of higher-order polynomials on the estimator’s variance, and are flexible regarding the conditional variance and density of X around the cutoff. Note that $\left[ (\hat{\Gamma}^+)^{-1} \hat{\Delta}^+ (\hat{\Gamma}^+)^{-1} + (\hat{\Gamma}^-)^{-1} \hat{\Delta}^- (\hat{\Gamma}^-)^{-1} \right] \to_p \left[ (\Gamma^+)^{-1} \Delta^+ (\Gamma^+)^{-1} + (\Gamma^-)^{-1} \Delta^- (\Gamma^-)^{-1} \right]$, which yields $V_{\text{fixed-}h}$.

In the rectangular kernel case, the variance estimator in equation (6) simplifies to the usual heteroskedasticity-robust variance estimator applied to the data just above and below the cutoff.
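This equivalence is straightforward to verify numerically. The sketch below (variable names and the simulated design are my own) computes the plug-in sandwich formula in equation (6) with a rectangular kernel and checks that the implied standard error coincides with the White-Huber-Eicker (HC0) standard error from side-by-side least squares on the observations within the bandwidth:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 8000, 0.5
x = rng.uniform(-1, 1, n)
# local heteroskedasticity: error variance grows with distance from the cutoff
y = 1.0 * (x >= 0) + 0.5 * x + rng.normal(0, 0.2 + 0.6 * np.abs(x), n)

def sandwich_side(side, p=1):
    """(Gamma-hat)^{-1} Delta-hat (Gamma-hat)^{-1} for one side,
    rectangular kernel k(u) = 1/2 on [-1, 1]."""
    u = x[side] / h
    keep = np.abs(u) <= 1
    u, ys = u[keep], y[side][keep]
    k = np.full(u.size, 0.5)
    Z = np.vander(u, p + 1, increasing=True)
    b = np.linalg.solve((Z.T * k) @ Z, (Z.T * k) @ ys)   # local WLS fit
    e = ys - Z @ b                                       # residuals
    Gamma = (Z.T * k) @ Z / (n * h)
    Delta = (Z.T * (k**2 * e**2)) @ Z / (n * h)
    Gi = np.linalg.inv(Gamma)
    return Gi @ Delta @ Gi

V = (sandwich_side(x >= 0) + sandwich_side(x < 0))[0, 0]  # e1' [...] e1
se_fixed_h = np.sqrt(V / (n * h))

def hc0_intercept_var(side):
    """HC0 variance of the intercept from unweighted OLS within the bandwidth."""
    m = side & (np.abs(x / h) <= 1)
    X = np.column_stack([np.ones(m.sum()), x[m]])
    b = np.linalg.solve(X.T @ X, X.T @ y[m])
    e = y[m] - X @ b
    XtXi = np.linalg.inv(X.T @ X)
    return (XtXi @ ((X.T * e**2) @ X) @ XtXi)[0, 0]

se_hc0 = np.sqrt(hc0_intercept_var(x >= 0) + hc0_intercept_var(x < 0))
print(se_fixed_h, se_hc0)  # identical up to floating point
```

With a constant kernel weight, the weighting cancels in the sandwich, so the fixed-h plug-in formula reduces algebraically to the HC0 estimator on the within-bandwidth subsample.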

Variance estimators based on small-h asymptotic variance formulas, as proposed by Porter (2003) and Lee and Lemieux (2009), are not fully robust to local heteroskedasticity. For example, Porter (2003) suggested an estimator for the variance of $\hat{\alpha}$ based on the small-h approximation that requires only the estimation of the density of x and the conditional variance of the errors at the cutoff. Let

(7) $$\hat{\sigma}^{2+}(\bar{x}) = \frac{(nh)^{-1} \sum_{i=1}^n k\!\left(\tfrac{\bar{x} - x_i}{h}\right) d_i\, \hat{\varepsilon}_i^2}{\tfrac{1}{2} \hat{f}_o(\bar{x})},$$

(8) $$\hat{\sigma}^{2-}(\bar{x}) = \frac{(nh)^{-1} \sum_{i=1}^n k\!\left(\tfrac{\bar{x} - x_i}{h}\right) (1 - d_i)\, \hat{\varepsilon}_i^2}{\tfrac{1}{2} \hat{f}_o(\bar{x})},$$

(9) $$\hat{f}_o(\bar{x}) = (nh)^{-1} \sum_{i=1}^n k\!\left(\tfrac{\bar{x} - x_i}{h}\right),$$

and

(10) $$\frac{\hat{\sigma}^{2+}(\bar{x}) + \hat{\sigma}^{2-}(\bar{x})}{\hat{f}_o(\bar{x})}\, e_1' \Gamma^{-1} \Delta \Gamma^{-1} e_1$$

is the estimator for the traditional asymptotic variance matrix. The matrix $\Gamma^{-1}\Delta\Gamma^{-1}$ can be calculated directly because it is a deterministic function of the kernel and local polynomial order.
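To illustrate, this constant can be computed from kernel moments. The snippet below (an illustrative sketch, not code from the paper) uses the rectangular kernel on [0, 1] with p = 1; ignoring the density and variance factors that formula (10) scales out, the entries of $\Gamma$ and $\Delta$ reduce to $\gamma_j = \delta_j = \int_0^1 u^j\,du$, so $\Delta = \Gamma$ and $e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$ works out to 4 (a value unaffected by rescaling the kernel, since $\Gamma^{-1}\Delta\Gamma^{-1}$ is invariant to $k \mapsto ck$):

```python
import numpy as np

# Gamma and Delta entries for the rectangular kernel on [0, 1] with p = 1:
# gamma_j = int_0^1 u^j du = 1 / (j + 1), and since k^2 = k, Delta = Gamma.
p = 1
j = np.arange(2 * p + 1)
gamma = 1.0 / (j + 1)
Gamma = gamma[np.add.outer(np.arange(p + 1), np.arange(p + 1))]
Delta = Gamma                                  # k^2 = k for this kernel
M = np.linalg.inv(Gamma) @ Delta @ np.linalg.inv(Gamma)
print(M[0, 0])                                 # the (1,1) element is 4 up to rounding
```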

An additional drawback of the variance estimator in formula (10) is the need to estimate $f_o(\bar{x})$, which is sidestepped if formula (6) is used. To obtain $\hat{f}_o(\bar{x})$, we need to choose a kernel and a bandwidth for the density estimator, increasing the number of tuning parameters to be chosen.[9]
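For comparison with the fixed-h sketch above, here is a hedged implementation of the small-h estimator in formulas (7)–(10), again for the rectangular kernel (normalized here as $k(u) = \tfrac{1}{2}\mathbf{1}\{|u| \le 1\}$, an assumption of this sketch) and p = 1, for which $e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1 = 4$. Function names are illustrative; note that the only sample quantities entering the formula are residual averages and the density estimate at the cutoff:

```python
import numpy as np

def small_h_se(y, x, cutoff=0.0, h=0.5):
    """Small-h standard error per formulas (7)-(10), sketched for the
    rectangular kernel k(u) = (1/2) 1{|u| <= 1} and p = 1, for which
    e1' Gamma^{-1} Delta Gamma^{-1} e1 = 4."""
    n = len(y)
    w = np.abs(x - cutoff) <= h
    d = x >= cutoff
    k = 0.5 * w                                 # rectangular kernel weights
    eps = np.zeros(n)                           # local linear residuals, by side
    for side in (w & d, w & ~d):
        X = np.column_stack([np.ones(side.sum()), (x[side] - cutoff) / h])
        b = np.linalg.lstsq(X, y[side], rcond=None)[0]
        eps[side] = y[side] - X @ b
    f_o = k.sum() / (n * h)                                  # eq. (9)
    s2p = np.sum(k * d * eps ** 2) / (n * h) / (0.5 * f_o)   # eq. (7)
    s2m = np.sum(k * ~d * eps ** 2) / (n * h) / (0.5 * f_o)  # eq. (8)
    var = (s2p + s2m) / f_o * 4.0                            # eq. (10)
    return np.sqrt(var / (n * h))               # s.e. = sqrt of (10) over nh
```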

5 Simulations

This section presents simulation evidence displaying the empirical coverage of a standard t-statistic used to perform inference about the treatment effect of interest. The objective of the simulations is to determine the extent to which the heteroskedasticity robust variance estimator discussed in Section 4 improves on the small-h, homoskedastic variance estimator when the bias introduced by local misspecification is small. As shown previously, it is expected that both approaches yield similar test performance when the bandwidths are small, and differences in empirical coverage should be of greater importance when local heteroskedasticity is present around the cutoff.

To evaluate the relative performance of tests based on the fixed-h and small-h asymptotic variance approximations and their respective estimators, in this section I present evidence from two data generating processes for which the local polynomial estimator has negligible or mild asymptotic bias (in the sense that the bias does not overwhelm inference completely), in line with the condition imposed in Theorem 1. Of course, if the bias in the local linear estimator is substantial, inference under both approaches suffers equally, since both rely on the same point estimator.

Evidence from the simulations presented below indicates that, both in the theoretical (unfeasible) and feasible cases, inference using the heteroskedasticity robust standard errors performs well, especially for larger bandwidths and when local heteroskedasticity is present.

The simulations are based on a sharp RD design and consist of 2,000 replications with sample size n equal to 1,000 observations; the effective sample size depends on the bandwidth used.[10] For the DGPs used, the running variable is drawn as $X \sim 2\,\mathrm{Beta}(2,4) - 1$, with $\sigma(x) = 0.1295$ in the homoskedastic case, $\sigma(x) = 0.1295 + x^2$ in the first (mild) heteroskedastic case, and $\sigma(x) = 0.1295 + (5x)^2$ in the second (acute) case. DGP 1 is linear in X and will serve as a benchmark for the refinements offered by the robust asymptotic approximation developed in Section 3. DGP 2 follows Calonico, Cattaneo, and Titiunik (2014) and is based on an empirical RD problem, corresponding to the regression function fitted to Lee’s (2008) data both above and below the cutoff. Since this DGP introduces bias for some bandwidths, it will allow us to compare the respective coverage obtained by both approaches in the presence of some bias.

The bandwidths used range from 0.1 to 1. While bandwidth selection algorithms have been proposed in the literature, in practice implementation requires the researcher to use a particular set of bandwidths. For example, with a discrete running variable the bandwidth cannot shrink to zero and must remain strictly positive. Even with a continuous running variable, sample sizes are often small enough that concerns about precision impel researchers to use a relatively large bandwidth. As pointed out by Calonico, Cattaneo, and Titiunik (2014), bandwidth selectors in the literature “typically lead to bandwidth choices that are too large for the usual distributional assumptions to be valid.” Furthermore, as pointed out in several recent studies, these optimal bandwidth selectors can change significantly from setting to setting in ways not captured by the bandwidth selection algorithm [see Calonico et al. (2017) or Bartalotti, Calhoun, and He (2017) for cluster cases]. Hence, most researchers use data-driven bandwidth selectors to benchmark their results while also reporting results for a set of ad hoc bandwidths, which is a sensible practice. The simulations presented here cover a wide range of fixed bandwidth values, including those suggested by the selectors, and are therefore representative of practice in the applied literature, highlighting the robustness of the fixed bandwidth approximation regardless of bandwidth choice.

In Appendix A.1 I present two additional DGPs which have high bias and compare two potential bias mitigation approaches, one based on small-h bias approximation in equation 4 (Calonico, Cattaneo, and Titiunik 2014) and a novel approach inspired by the fixed-h bias approximation in equation 3, which is based on a modified version of the bias correction bootstrap described in Bartalotti, Calhoun, and He (2017).

The empirical coverages presented are the fraction of rejections in the 2,000 repetitions for a two-sided test of nominal size 5%. More specifically, the DGPs are:

  • DGP 1: $y_i = \begin{cases} 0.48 + 1.27x_i + u_i & \text{if } x < 0, \\ 0.52 + 0.84x_i + u_i & \text{if } x \geq 0. \end{cases}$

  • DGP 2: $y_i = \begin{cases} 0.48 + 1.27x_i + 7.18x_i^2 + 20.21x_i^3 + 21.54x_i^4 + 7.33x_i^5 + u_i & \text{if } x < 0, \\ 0.52 + 0.84x_i - 3.00x_i^2 + 7.99x_i^3 - 9.01x_i^4 + 3.56x_i^5 + u_i & \text{if } x \geq 0. \end{cases}$

The simulations use the local linear estimator (p = 1), since it is the preferred choice in applied work.[11] The next subsection compares the test coverages obtained by the theoretical fixed-h and small-h asymptotic distributions derived in Section 3. Subsection 5.2 compares the empirical coverages obtained with (feasible) estimated standard errors.
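A condensed version of this Monte Carlo exercise can be sketched as follows (Python; a scaled-down 200-replication illustration using the rectangular kernel and feasible fixed-h standard errors, with illustrative names rather than the paper's exact code):

```python
import numpy as np

rng = np.random.default_rng(42)

def dgp1(n, rng):
    """Draw one sample from DGP 1 (homoskedastic): x ~ 2*Beta(2,4) - 1."""
    x = 2 * rng.beta(2, 4, n) - 1
    u = rng.normal(0, 0.1295, n)
    y = np.where(x < 0, 0.48 + 1.27 * x, 0.52 + 0.84 * x) + u
    return x, y

def ll_rd(x, y, h):
    """Local linear RD estimate and fixed-h (HC0) s.e., rectangular kernel."""
    est, var = 0.0, 0.0
    for sign, side in ((1, (x >= 0) & (x <= h)), (-1, (x < 0) & (x >= -h))):
        X = np.column_stack([np.ones(side.sum()), x[side]])
        XtXi = np.linalg.inv(X.T @ X)
        b = XtXi @ X.T @ y[side]
        e = y[side] - X @ b
        V = XtXi @ (X * e[:, None] ** 2).T @ X @ XtXi    # HC0 sandwich
        est += sign * b[0]
        var += V[0, 0]
    return est, np.sqrt(var)

# small Monte Carlo (200 draws instead of the paper's 2,000)
hits = 0
for _ in range(200):
    x, y = dgp1(1000, rng)
    a, se = ll_rd(x, y, h=0.5)
    hits += abs(a - 0.04) <= 1.96 * se           # true effect is 0.04
print(hits / 200)   # empirical coverage; should be close to the 95% nominal level
```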

5.1 Simulations for Infeasible Inference

This subsection documents the differences in test coverages based on the theoretical fixed-h and small-h asymptotic variance formulas. The tests are infeasible since they depend on knowledge of $f_o(x)$ and $\sigma^2(x)$ around the cutoff, and they are intended to highlight the improvements achieved by the fixed-h asymptotic variance approximation in describing the behavior of the estimators of interest.

These comparisons illustrate how conventional small-h approximations, while valid for small bandwidths, become unreliable as we move away from the cutoff, even in the absence of bias. That finding is natural, since the small-h asymptotic approximation’s derivation is based on the boundary variance and density and should not be expected to adequately describe the estimator’s behavior away from the threshold.

The empirical coverages for DGP 1 in the cases studied are presented in Panel A of Table 1. For smaller bandwidths, both asymptotic variances generate similar empirical coverages, as expected, but the small-h coverage decreases significantly as the bandwidth increases, while fixed-h inference increasingly outperforms the standard approximation.

Table 1:

Infeasible Inference – Empirical Coverage of t-test (%) – Nominal Test Size: 5%.

            Homoskedastic        Heteroskedastic 1    Heteroskedastic 2
Bandwidth   Fixed-h   Small-h    Fixed-h   Small-h    Fixed-h   Small-h
Panel A: DGP 1 (Linear)
0.1 94.4 94.2 94.3 94.0 94.7 84.9
0.2 94.6 94.5 94.5 93.4 95.0 50.4
0.3 95.2 94.8 95.3 92.0 94.3 26.0
0.4 95.5 94.8 95.0 89.2 95.1 15.4
0.5 95.5 94.3 95.3 84.2 94.5 10.3
0.6 95.3 93.3 95.3 77.8 94.5 7.4
0.7 94.6 92.2 95.2 70.0 94.0 5.0
0.8 94.9 91.1 94.8 62.2 95.2 4.3
0.9 95.0 89.7 94.3 57.3 94.0 4.1
1.0 95.2 87.9 94.0 52.5 94.0 3.5
Panel B: DGP 2 [Lee (2008)]
0.1 93.8 93.8 93.8 93.4 93.8 83.5
0.2 83.2 82.8 84.0 81.4 92.1 49.5
0.3 62.3 60.9 67.7 59.7 94.2 27.1
0.4 53.2 51.2 64.9 49.9 94.8 15.8
0.5 66.6 63.0 79.7 59.5 94.9 10.7
0.6 84.1 80.8 89.5 69.3 94.8 7.4
0.7 91.8 88.3 94.1 68.2 95.8 5.0
0.8 87.8 81.7 93.3 59.7 95.0 3.9
0.9 56.1 42.4 86.2 43.5 94.3 2.9
1.0 22.7 13.4 77.8 27.2 95.8 2.8

DGP 2, based on data in Lee (2008), presents qualitatively similar results, as can be seen in Panel B of Table 1. Note that the coverage varies sharply with the bandwidth used, due to the bias implied by each choice. Nevertheless, the bias is small enough in this case not to overwhelm the tests completely, and it is clear that tests based on the fixed-h asymptotic variance produce more adequate coverage even in the presence of bias. Furthermore, the improvements increase with bandwidth size, as predicted.

As described in Section 3, the refinements obtained by the fixed-h approach are due to considering the behavior of $f_o(x)$ and $\sigma^2(x)$ inside the bandwidth. Hence, in the presence of heteroskedasticity those refinements should be even more important.

The coverages for the first (mild) and second (acute) heteroskedastic cases are presented in the last four columns of Table 1. The (infeasible) tests based on fixed-h asymptotics behave well in all cases, highlighting the robustness of the approach. In the first case, the small-h asymptotic approximation presents a more pronounced pattern of decreasing coverage as the bandwidth increases than in the homoskedastic case, as expected. In the acute heteroskedasticity case, the small-h based test suffers a steep decline in coverage as the bandwidth increases, since it is not able to properly capture the effect of the heteroskedasticity for larger bandwidths.

The heteroskedasticity described in this simulation is a “worst case scenario” for small-h asymptotics since the conditional variance of the error at the cutoff, $\sigma^2(\bar{x})$, is at the extreme of the range of values assumed by $\sigma^2(x)$ in any given bandwidth. As can be seen from formula (1), the small-h and fixed-h asymptotic variances will be more similar if $\sigma^2(\bar{x})$ is close to the “weighted average” of $\sigma^2(x)$ inside the bandwidth.

Some points are worth emphasizing. First, the general pattern is that the empirical coverages obtained using fixed-h results from Theorem 1 outperform those from the small-h approximations, especially for larger bandwidths. Second, for smaller bandwidths, small-h asymptotics provide similar coverages to the fixed-h approach, making it clear that the core difference is due to the suitability of the restrictions imposed on $f_o(x)$ and $\sigma^2(x)$ as the bandwidth increases (Corollary 2). Naturally, those restrictions tend to be less realistic for larger bandwidths. Third, in the presence of heteroskedasticity, the small-h approximation can have poor performance, while the fixed-h asymptotic variance still provides a reliable approximation for the estimator’s behavior.

5.2 Simulations for Feasible Inference

This subsection presents simulations for the empirical coverage of the tests using two different estimated standard errors. The first is based on the fixed-h asymptotic variance and is given by formula (6), which is akin to treating the estimates as locally parametric as discussed above. The second is analogous to the estimators proposed by Porter (2003) and Imbens and Lemieux (2008), described by formula (10). Even though it is now common practice to implement variance estimators similar to the usual heteroskedasticity-robust standard errors in RD applications, these exercises are helpful: they provide evidence of the shortcomings of relying on standard error formulas based on small-h asymptotics.

For locally homoskedastic errors, the heteroskedasticity robust standard errors’ estimator incorporates the gains of improved inference as described in the theory and infeasible simulations. These results are seen in the first two columns of Table 2. Perhaps surprisingly, tests obtained using small-h standard error estimators behave very similarly to those of fixed-h even at relatively larger bandwidths, for which one would expect a significantly smaller coverage considering the evidence in subsection 5.1. Essentially, the small-h variance estimators benefit from the fact that, by using data on $x_i$ and $\hat{\varepsilon}_i$ in practice, the estimator for the standard errors partially captures the behavior of $f_o(x)$ and $\sigma^2(x)$ in the range around the cutoff – even though the theoretical small-h asymptotic approximation ignores it.

Table 2:

Feasible Inference – Empirical Coverage of t-test (%) – Nominal Test Size: 5%.

            Homoskedastic        Heteroskedastic 1    Heteroskedastic 2
Bandwidth   Fixed-h   Small-h    Fixed-h   Small-h    Fixed-h   Small-h
Panel A: DGP 1 (Linear)
0.1 94.0 93.8 94.2 94.0 94.3 98.0
0.2 94.3 94.3 94.3 95.5 94.2 99.4
0.3 94.8 94.2 94.9 97.0 94.2 99.5
0.4 95.1 94.8 95.0 97.8 94.6 99.7
0.5 95.8 94.3 95.1 98.8 94.9 99.7
0.6 94.8 93.8 94.6 98.8 94.5 99.2
0.7 94.5 93.0 94.8 99.0 94.2 99.4
0.8 94.4 92.3 94.2 99.2 94.3 99.5
0.9 94.5 91.6 93.9 99.4 93.5 99.4
1.0 94.3 91.8 93.4 99.2 93.3 99.4
Panel B: DGP 2 [Lee (2008)]
0.1 93.2 93.1 93.5 93.5 93.7 98.0
0.2 82.8 82.5 84.0 85.7 92.0 99.0
0.3 61.7 60.2 67.6 74.5 94.0 99.1
0.4 52.8 50.8 63.9 76.0 94.2 99.5
0.5 66.5 63.7 78.8 90.3 94.7 99.7
0.6 84.6 81.5 89.0 96.5 94.0 99.5
0.7 92.0 89.5 93.5 98.8 94.6 99.6
0.8 87.9 83.9 92.7 98.2 93.9 99.1
0.9 55.9 50.0 85.5 97.0 93.5 99.5
1.0 23.2 20.1 76.5 94.0 94.5 99.7

To see this point, note that the researcher is not able to estimate $f_o(\bar{x})$ and $\sigma^2(\bar{x})$ exactly from a given dataset, as the theoretical small-h asymptotic variance formula would suggest. By being “forced” to estimate the variance and the density within the bandwidth, the small-h variance estimator is able to partially capture the local behavior of those terms.

In the heteroskedastic cases, the fixed-h variance estimator produces tests with coverage very close to the test’s nominal size for DGP 1, while the coverage for small-h rapidly increases towards 1 as the bandwidth increases. Note that the empirical coverage shows under-rejection in this case since, first, $\sigma^2(x)$ increases away from the cutoff and, second, $\hat{\sigma}^{2+}$ and $\hat{\sigma}^{2-}$ will significantly overestimate the true weighted average of the conditional variance within the bandwidth.

Hence, there is evidence that heteroskedasticity can be accurately captured by tests based on fixed-h approximations, while small-h based standard errors can produce tests with significant size distortions. Furthermore, the distortion in coverage encountered for this particular DGP under heteroskedasticity is not even in the same direction that the infeasible (theoretical) tests would predict!

Therefore, heteroskedasticity-robust standard errors seem a “safer choice” for practitioners, since they are based on an asymptotic approximation that is “robust to bandwidth choice” and are easy to compute once a kernel and bandwidth are chosen. Furthermore, the robust variance estimator has the advantage of not requiring the estimation of $f_o(\bar{x})$. That estimation would entail the choice of a (potentially different) kernel and bandwidth for $\hat{f}_o(\bar{x})$, two tuning parameters that might significantly alter the empirical size of the tests and that depend on the discretion of the researcher.

To the empirical researcher, a useful conclusion can be drawn from these simulations. By performing inference using fixed-h based standard errors, which is akin to treating the estimates as locally parametric and simplifies to the standard heteroskedastic robust standard errors for rectangular kernels, one can feel confident about the standard errors for any bandwidth used.

The researcher can then focus attention on choosing a bandwidth to deal with the bias at hand.[12] As pointed out in Section 3 and in the simulations above, this issue affects the empirical test coverage similarly regardless of the standard errors used. However, the fixed-h approximation has the benefit of clarifying that, even under the validity of the RD design, local misspecification can be an important factor. Taking advantage of RD to estimate a treatment effect of interest is the exercise of estimating the conditional expectation of the outcome variable inside the bandwidth. Naturally, for larger bandwidths one would expect the likelihood of misspecification to increase, requiring higher-order bias corrections. I discuss potential approaches to address the bias in Section 3.1. Appendix A presents simulation evidence that higher-order bias mitigation procedures motivated by the fixed-h asymptotics provide improved coverage relative to first-order bias correction based on the small-h asymptotics, especially for larger bandwidths.

6 Empirical Example

To illustrate the potential differences in the small-h and fixed-h approximations discussed above, this section uses data from Lee’s (2008) study of the electoral advantage of incumbency in the United States.[13] Lee (2008) argues that the U.S. Congressional electoral system has a “built-in” RD design. That is, being the incumbent party in a congressional district is a deterministic function of the candidate-party’s vote share in the district during the last electoral cycle. This feature can be described in the following model:

$$v_{i2} = \alpha w_{i1} + \beta v_{i1} + \gamma d_{i2} + e_{i2}, \qquad d_{i2} = \mathbf{1}\left[v_{i1} \geq \tfrac{1}{2}\right],$$

where $v_{it}$ is the Democratic candidate’s vote share in congressional district i in election year t, $w_{it}$ is a vector of characteristics or agents’ choices (potentially unobserved) as of election day in period t, and $d_{it}$ indicates whether the Democratic party is the incumbent in district i at election period t. We also assume that $f_{i1}(v \mid w)$, the density of $v_{i1}$ conditional on $w_{i1}$, is continuous in v. The main issue in the analysis, as discussed in detail by Lee (2008), is that $w_{it}$ is potentially unobserved and would likely be correlated with incumbency in a given district. For example, $w_{i1}$ would include party resources, demographic characteristics, and the political leaning of districts, all of which could affect the vote share in both periods 1 and 2, thus biasing the estimates of the causal effect of incumbency.

The thought experiment to find the causal effect of incumbency would be to randomly allocate incumbency in districts to Democrats and Republicans while keeping all other characteristics constant. This clearly cannot be done, but by looking at closely contested elections, we can consider that the incumbents in those districts, whether Democrats or Republicans, were decided randomly, given the inherent uncertainty regarding the outcome of such events.[14] This can provide reasonable estimates of the causal effect of incumbency in closely contested elections.

The data includes U.S. congressional election returns from 1946 to 1998, excluding the years ending in ‘0’ and ‘2’ due to the decennial redistricting which characterizes the U.S. congressional electoral system. The running variable is defined as the difference in vote share between the Democratic candidate and the strongest opponent. Hence, the Democrat wins the election when this variable crosses the 0 threshold – i.e. there is a positive difference in vote share, indicating that the Democrat has received more votes.

Table 3 shows the estimated advantage of incumbency and the estimated standard errors given by formulas based on a fixed bandwidth approximation (6), which in the rectangular kernel case is the usual heteroskedasticity robust s.e., and a small-h approximation (10), respectively.

Table 3:

Incumbency Effects and Estimated Standard Errors – Lee (2008).

Dependent Variable: Democrat Vote Share – Election t + 1
All   |Margin| ≤ 0.5   |Margin| ≤ 0.05
Panel A: Nadaraya-Watson Estimator (p = 0)
Estimated Effect 0.351 0.257 0.096
(Fixed-h Standard Errors) (0.0041) (0.0038) (0.0090)
[Small-h Standard Errors] [0.0041] [0.0038] [0.0090]
Difference (%) 1.6% 0.5% 0.1%
Panel B: Local Linear Estimator (p = 1)
Estimated Effect 0.118 0.090 0.048
(Fixed-h Standard Errors) (0.0056) (0.0062) (0.0159)
[Small-h Standard Errors] [0.0068] [0.0071] [0.0180]
Difference (%) 21.4% 14.5% 13.2%
Panel C: Local Polynomial Estimator (p = 4)
Estimated Effect 0.077 0.066 0.105
(Fixed-h Standard Errors) (0.0113) (0.0144) (0.0312)
[Small-h Standard Errors] [0.0167] [0.0179] [0.0447]
Difference (%) 46.5% 24.3% 42.4%
Observations 6558 4900 610

Panel A presents estimates of the Nadaraya–Watson estimator (p = 0) for different bandwidths. Column 1 uses all the data available, column 2 looks only at elections for which the margin of victory in period t − 1 was within 50% of the total votes, and column 3 uses only elections with margins lower than 5%. Panel B presents similar estimates and standard errors obtained by a local linear estimator (p = 1), which is the preferred specification in several RD applications in the literature and is expected to significantly reduce bias in the estimates of the incumbency advantage effect.

Panel C presents, in the first column, the results Lee (2008) called the “parametric fit,” which uses a polynomial of order 4 to fit the whole data. The other two columns use smaller bandwidths to emphasize how the order of the polynomial chosen to fit the data can significantly impact estimates and standard errors, especially in small samples, by over-fitting the data.

The point estimates are exactly the same as presented by Lee (2008) for panel A and the first column in panel C. The remaining estimates are new. The results indicate a significant incumbent advantage in U.S. congressional races, even when comparing districts that had close elections in the previous electoral cycle, for which the determination of incumbent status can be considered “as good as randomized.”[15] See Table 5 in Appendix A for similar results that include the pre-determined variables that Lee (2008) uses and the comparisons for fixed-h and small-h standard errors. The observed differences in pre-determined variables between incumbents and challengers vanish as we compare districts that previously had competitive races, lending credibility to the RD as an identification strategy for the incumbency effect. More relevant to this paper are the differences between the competing standard error estimates.

The estimated standard errors, shown in Table 3, differ significantly, with the fixed-h standard errors being smaller in most of the cases – as one would expect, given the simulations in Section 5. Also as expected, using smaller bandwidths comes at a large cost in terms of precision of the estimates due to the smaller amount of data available, negatively affecting both standard error estimators. The two last columns in all three panels show an increase of the estimated standard errors when the data is restricted to districts that had victory margins smaller than 5% in the previous election.

Interestingly, it is usual for the relative gap between the two standard errors estimates to decline as the bandwidth shrinks, as predicted in Section 3. For panels A and B and for the first two columns of panel C, this pattern is confirmed as we compare the percent difference between the standard errors within panels.[16] For the third entry in panel C, note that both standard errors become larger than those for the wider bandwidth in column 2, but the relative gap increases. That can be due to the fact that the fixed-h standard error estimate requires the calculation of 4(2p + 1) terms (see equation 6). Hence, it is more susceptible to the combination of large polynomial orders and small sample sizes induced by the smaller bandwidth. Nevertheless, it still provides tighter confidence intervals than the small-h estimated standard error.

Finally, note that for a given bandwidth the differences between standard errors increase with the order of the polynomial used to fit the data. This is because the small-h standard error estimator is based on a fixed scaling term for a given kernel and polynomial order choice, $e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$. Intuitively, the higher the polynomial order, the larger the distortion is likely to be between this approximation and the more refined formula implied by the fixed-h approach.

7 Conclusion

This paper focuses on presenting an alternative asymptotic approximation for the standard RD treatment effects estimator using a fixed bandwidth. This approximation provides expressions for the estimator’s asymptotic variance that incorporate the bandwidth used by the researcher, and leads naturally to the standard error formulas used in the applied literature, i.e., White-Huber-Eicker robust standard errors when a rectangular kernel is implemented.

The proposed approximation is valid under an asymptotic framework where the bandwidth parameter, h, is fixed, as well as under conventional RD asymptotics where h → 0. In contrast, the traditional asymptotic approximations based on a shrinking bandwidth, suggested for example by Imbens and Lemieux (2008) and Porter (2003), are not adequate for the relatively larger bandwidths used in practice, especially in the presence of heteroskedasticity. Furthermore, the approximations provided in this study relax the assumption of continuity of the conditional variance of the outcome at the cutoff, providing a theoretical framework for that case. The common empirical strategy of using robust standard errors in RD studies is appropriate when the misspecification is small and heteroskedasticity is a concern.

Simulations document the theoretical refinements provided by relying on asymptotic variances based on fixed bandwidth asymptotics, and illustrate that tests using heteroskedasticity-robust standard errors produce significantly more reliable coverage than their counterparts using standard errors based on asymptotic approximations where h → 0.

Furthermore, an alternative, fixed-h, approximation for the estimator’s potential asymptotic bias is obtained and provides additional intuition on the robustness-precision trade-off facing the researcher in RD designs. I investigate two alternatives to address the bias in practice. The first is the widely used analytical approach based on the traditional asymptotics suggested by Calonico, Cattaneo, and Titiunik (2014). The second follows more naturally from the fixed bandwidth asymptotics developed here and is based on an iterative bootstrap approach that provides higher-order bias mitigation by building upon the wild bootstrap algorithm in Bartalotti, Calhoun, and He (2017).

Acknowledgement

I would like to thank Gray Calhoun, Jeff Wooldridge, Tim Vogelsang, Gary Solon, Matias Cattaneo, Peter Schmidt, Steve Dieterle, Quentin Brummet, Valentin Verdier, Ilya Rahkovsky, Keith Finlay, Michael Darden, Brigham Frandsen, two anonymous referees, and participants at several seminars and conferences where earlier versions of this paper were presented for valuable comments and discussions. I also thank Yang He for his helpful work as research assistant for this paper.

A Appendix

A.1 Simulations for Bias Control

Below I present simulation evidence on the small-sample performance of the first-order bias correction proposed by Calonico, Cattaneo, and Titiunik (2014), as implemented in the rdrobust package in R, and of the modified Bartalotti, Calhoun, and He (2017) wild bootstrap procedure, which easily allows for higher-order bias mitigation, as discussed in Section 3. To emphasize the bias mitigation aspect, the DGPs in this exercise are chosen to exhibit moderate to severe bias. Specifically, I implement the DGPs described in Calonico, Cattaneo, and Titiunik (2014) and used by Bartalotti, Calhoun, and He (2017).

The first DGP is the same as described in Section 5 as DGP 2

  • DGP Bias 1: $y_i = \begin{cases} 0.48 + 1.27x_i + 7.18x_i^2 + 20.21x_i^3 + 21.54x_i^4 + 7.33x_i^5 + u_i & \text{if } x < 0, \\ 0.52 + 0.84x_i - 3.00x_i^2 + 7.99x_i^3 - 9.01x_i^4 + 3.56x_i^5 + u_i & \text{if } x \geq 0. \end{cases}$ The population ATE for this DGP is 0.04 (= 0.52 − 0.48).

The second DGP is based on Ludwig and Miller’s (2007) analysis of the Head Start program. In this case, eligibility to grant-writing assistance was determined at the county level using historical poverty rate, with a sharp threshold that determines the provision of services.

  • DGP Bias 2: $y_i = \begin{cases} 3.71 + 2.30x_i + 3.28x_i^2 + 1.45x_i^3 + 0.23x_i^4 + 0.03x_i^5 + u_i & \text{if } x < 0, \\ 0.26 + 18.49x_i - 54.81x_i^2 + 74.30x_i^3 - 45.02x_i^4 + 9.83x_i^5 + u_i & \text{if } x \geq 0. \end{cases}$

and the population ATE is −3.45 (= 0.26 − 3.71).

Finally, for the third DGP, we use CCT’s modification of the first DGP, given by

  • DGP Bias 3: $y_i = \begin{cases} 0.48 + 1.27x_i + 3.59x_i^2 + 14.147x_i^3 + 23.694x_i^4 + x_i^5 + u_i & \text{if } x < 0, \\ 0.52 + 0.84x_i - 0.30x_i^2 + 2.397x_i^3 - 0.901x_i^4 + x_i^5 + u_i & \text{if } x \geq 0. \end{cases}$

and the population ATE is again 0.04. CCT introduce this DGP because it has high curvature and local linear models are likely to exhibit high bias.

As the focus of this set of simulations is to compare the bias mitigation achieved by the first-order bias correction (Calonico, Cattaneo, and Titiunik 2014), inspired by the usual shrinking bandwidth asymptotics, with the higher-order bias correction using the modified bootstrap procedure prompted by the fixed bandwidth approach, all reported results are based on standard errors compatible with the fixed bandwidth asymptotics described in Section 3.

I follow an approach similar to the simulations described in Bartalotti, Calhoun, and He (2017). I simulate 5,000 samples from each of the three DGPs and calculate nominal 95% two-sided confidence intervals. I use 999 bootstrap replications to calculate the asymptotic distribution of the bias corrected estimator, and each of those replications uses an additional 500 replications to estimate the bias. The bandwidth is set equal to h and held fixed within each Monte Carlo replication, in line with the fixed-h thought experiment.
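A stylized, single-layer version of the bias-estimation step can be sketched as follows (Python). The Rademacher weights, the fixed higher order q = 5, and all names are illustrative assumptions of this sketch; the actual procedure in Bartalotti, Calhoun, and He (2017) nests this bias estimation inside an outer bootstrap, which is omitted here for brevity:

```python
import numpy as np

def wild_bootstrap_bias(x, y, h, p=1, q=5, reps=200, rng=None):
    """Illustrative sketch, not the paper's code: resample residuals from a
    higher-order (q) fit on each side of the cutoff via the wild bootstrap
    and measure the average bias of the order-p RD estimate."""
    if rng is None:
        rng = np.random.default_rng(0)
    above = (x >= 0) & (x <= h)
    below = (x < 0) & (x >= -h)

    def fit(xs, ys, order):
        X = np.column_stack([xs ** j for j in range(order + 1)])
        return X, np.linalg.lstsq(X, ys, rcond=None)[0]

    def rd_estimate(ys, order):
        est = 0.0
        for sign, side in ((1, above), (-1, below)):
            _, b = fit(x[side], ys[side], order)
            est += sign * b[0]                 # intercept gap at the cutoff
        return est

    fitted = np.zeros(len(y))                  # order-q "truth" inside the window
    resid = np.zeros(len(y))
    for side in (above, below):
        X, b = fit(x[side], y[side], q)
        fitted[side] = X @ b
        resid[side] = y[side] - fitted[side]
    alpha_q = rd_estimate(y, q)                # bootstrap DGP's true effect
    draws = []
    for _ in range(reps):
        w = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher weights
        y_star = fitted + w * resid
        draws.append(rd_estimate(y_star, p) - alpha_q)
    return float(np.mean(draws))
```

The returned quantity estimates the bias of the order-p estimate under the bootstrap DGP and would be subtracted from the order-p point estimate.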

As can be seen in Table 4 below, the first-order bias correction becomes less capable of capturing the bias distortion as the bandwidth moves away from zero. This is in line with the discussion in Section 3.1, since for larger bandwidths the bias captured by the expansion in equation 4 becomes a less reliable approximation of the bias in the local support of the running variable. It is important to note that in the exercise described below I used the correct polynomial order (p = 5) for the bias estimation in the modified bootstrap, so it reflects a best case scenario in terms of approximating the bias in the local support of the running variable.

Table 4:

Bias Corrected Inference – Empirical Coverage of t-test (%) – Nominal Test Size: 5%.

      DGP Bias 1                            DGP Bias 2                            DGP Bias 3
      Homoskedastic     Heteroskedastic     Homoskedastic     Heteroskedastic     Homoskedastic     Heteroskedastic
h     Bootstrap   CCT   Bootstrap   CCT     Bootstrap   CCT   Bootstrap   CCT     Bootstrap   CCT   Bootstrap   CCT
0.1 95.7 92.1 96.1 92.5 95.7 94.0 93.1 94.2 95.7 92.0 95.9 92.8
0.2 95.2 93.5 94.4 92.2 95.2 93.6 91.8 93.5 95.2 93.7 94.6 92.6
0.3 93.8 92.5 94.8 93.0 93.8 81.2 92.1 79.7 93.8 94.3 95.0 94.1
0.4 95.3 87.2 94.4 90.0 95.3 36.2 92.2 25.1 95.3 92.8 94.7 94.0
0.5 93.7 79.2 94.5 81.7 93.7 2.1 93.2 1.2 93.7 93.2 95.2 94.2
0.6 95.2 72.0 95.5 77.3 95.2 0.1 93.2 0.0 95.2 87.3 96.9 93.2
0.7 95.0 73.1 94.9 80.2 95.0 0.0 93.8 0.0 95.0 66.5 96.4 92.8
0.8 94.8 87.2 95.6 88.3 94.8 0.0 94.2 0.0 94.8 27.9 96.7 92.7
0.9 94.3 93.3 95.8 93.5 94.3 0.0 95.0 0.0 94.3 9.6 96.8 92.5
1.0 95.1 90.9 95.8 93.1 95.1 0.0 94.6 0.0 95.1 4.3 96.4 92.0

These simulations reinforce the main point of this paper that the fixed-h asymptotic framework provides a robust and reliable alternative to the usual asymptotic approximations that rely on shrinking the bandwidth when dealing with practical implementation of RD designs. Furthermore, these approximations provide insight on the properties of the estimator that can lead to improved empirical practices, as exemplified by the modified wild bootstrap bias mitigation suggested in this section.

A.2 Additional Results for Empirical Example: Lee (2008)

Table 5:

Electoral outcomes and pre-determined election characteristics: 1948–1996 – Lee (2008).

Variable                     Nadaraya-Watson Estimator (p = 0)        Local Linear Estimator (p = 1)           Local Polynomial Estimator (p = 4)
                             All   Margin ≤ 0.5   Margin ≤ 0.05       All   Margin ≤ 0.5   Margin ≤ 0.05       All   Margin ≤ 0.5   Margin ≤ 0.05
Democrat vote share t + 1 0.351 0.257 0.096 0.118 0.090 0.048 0.077 0.066 0.105
(fixed-h std. error) (0.0041) (0.0038) (0.0090) (0.0056) (0.0062) (0.0159) (0.0114) (0.0144) (0.0312)
[small-h std. error] [0.0041] [0.0038] [0.0090] [0.0068] [0.0071] [0.0180] [0.0167] [0.0179] [0.0447]
Democrat vote share t − 1 0.313 0.215 0.027 0.066 0.032 0.010 −0.004 0.002 0.104
(fixed-h std. error) (0.0042) (0.0040) (0.0106) (0.0059) (0.0070) (0.0214) (0.0134) (0.0177) (0.0431)
[small-h std. error] [0.0043] [0.0040] [0.0106] [0.0070] [0.0074] [0.0211] [0.0173] [0.0184] [0.0525]
Democrat win prob. t − 1 0.780 0.724 0.137 0.562 0.335 0.160 −0.0116 0.004 0.545
(fixed-h std. error) (0.0077) (0.0097) (0.0392) (0.0159) (0.0228) (0.0802) (0.0436) (0.0609) (0.1809)
[small-h std. error] [0.0076] [0.0097] [0.0391] [0.0146] [0.0181] [0.0782] [0.0342] [0.0440] [0.1944]
Democrat Political Exper. 3.551 3.246 0.673 2.459 1.101 0.584 0.035 0.223 1.627
(fixed-h std. error) (0.0657) (0.0796) (0.2070) (0.1073) (0.1412) (0.4202) (0.2537) (0.3491) (0.8309)
[small-h std. error] [0.0739] [0.0820] [0.2102] [0.1459] [0.1576] [0.4203] [0.3570] [0.3924] [1.0448]
Opposition Political Exper. −2.631 −2.458 −0.162 −1.676 −0.729 −0.417 0.131 0.145 −0.365
(fixed-h std. error) (0.0563) (0.0840) (0.1648) (0.0897) (0.1153) (0.3643) (0.2072) (0.2840) (0.8260)
[small-h std. error] [0.0492] [0.0608] [0.1651] [0.0966] [0.1171] [0.3291] [0.2363] [0.2913] [0.8189]
Democrat Electoral Exper. 3.480 3.200 0.673 2.405 1.083 0.552 0.0149 0.240 1.557
(fixed-h std. error) (0.0673) (0.0813) (0.2118) (0.1098) (0.1446) (0.4309) (0.2593) (0.3566) (0.8755)
[small-h std. error] [0.0750] [0.0835] [0.214] [0.1485] [0.1614] [0.4289] [0.3641] [0.4021] [1.0656]
Opposition Electoral Exper. −2.607 −2.415 −0.154 −1.623 −0.713 −0.440 0.122 0.120 −0.054
(fixed-h std. error) (0.0575) (0.0638) (0.1692) (0.0915) (0.1179) (0.3753) (0.2126) (0.2917) (0.8539)
[small-h std. error] [0.0505] [0.0622] [0.1693] [0.0993] [0.1201] [0.3372] [0.2434] [0.2989] [0.8394]
Observations 6558 4900 610 6558 4900 610 6558 4900 610

A.3 Technical Details

A.3.1 Additional Notation

In this appendix, all integrals are defined as Stieltjes integrals to accommodate potentially discrete and hybrid running variables (Ramanathan 1993). In the results presented in the main text, the following notation is employed.

Let
$$\Gamma_+=\begin{bmatrix}\gamma_0^+&\cdots&\gamma_p^+\\\vdots&\ddots&\vdots\\\gamma_p^+&\cdots&\gamma_{2p}^+\end{bmatrix},\qquad
\Delta_+=\begin{bmatrix}\delta_0^+&\cdots&\delta_p^+\\\vdots&\ddots&\vdots\\\delta_p^+&\cdots&\delta_{2p}^+\end{bmatrix},\qquad
e_1=\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix};$$
$$\gamma_j^+=\int_0^\infty k(u)\,u^j\,dF_o(\bar{x}+uh);\qquad
\gamma_j^-=(-1)^j\int_0^\infty k(u)\,u^j\,dF_o(\bar{x}-uh);$$
$$\delta_j^+=\int_0^\infty k^2(u)\,u^j\,\sigma^2(\bar{x}+uh)\,dF_o(\bar{x}+uh);\qquad
\delta_j^-=(-1)^j\int_0^\infty k^2(u)\,u^j\,\sigma^2(\bar{x}-uh)\,dF_o(\bar{x}-uh).$$

Also, for the results when $nh\to\infty$, $h\to 0$,
$$\Gamma=\begin{bmatrix}\gamma_0&\cdots&\gamma_p\\\vdots&\ddots&\vdots\\\gamma_p&\cdots&\gamma_{2p}\end{bmatrix},\qquad
\Delta=\begin{bmatrix}\delta_0&\cdots&\delta_p\\\vdots&\ddots&\vdots\\\delta_p&\cdots&\delta_{2p}\end{bmatrix},\qquad
\gamma_j=\int_0^\infty k(u)\,u^j\,du,\qquad
\delta_j=\int_0^\infty k^2(u)\,u^j\,du.$$
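When $F_o$ is absolutely continuous, the Stieltjes integrals above reduce to ordinary integrals and the $\gamma_j^+$, $\delta_j^+$ building blocks can be evaluated by simple quadrature. A small illustrative sketch (the triangular kernel, uniform density, and variance function below are assumed inputs, not taken from the paper):

```python
# Numerically build Gamma_+ and Delta_+ for a local polynomial of order p,
# assuming a kernel supported on [0, 1] and an absolutely continuous F_o.
import numpy as np

def trap(y, u):  # simple trapezoid rule, avoids version-specific numpy names
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def gamma_delta_plus(p, h, xbar, f_o, sigma2, kernel, ngrid=20001):
    u = np.linspace(0.0, 1.0, ngrid)
    k = kernel(u)
    fu = f_o(xbar + u * h)
    gam = [trap(k * u**j * fu, u) for j in range(2 * p + 1)]
    dlt = [trap(k**2 * u**j * sigma2(xbar + u * h) * fu, u)
           for j in range(2 * p + 1)]
    # (j, l) element of Gamma_+ / Delta_+ is gamma^+_{j+l-2} / delta^+_{j+l-2}
    # (1-based), i.e. gam[j + l] with 0-based indices below.
    Gamma = np.array([[gam[j + l] for l in range(p + 1)] for j in range(p + 1)])
    Delta = np.array([[dlt[j + l] for l in range(p + 1)] for j in range(p + 1)])
    return Gamma, Delta

tri = lambda u: np.maximum(1 - np.abs(u), 0.0)      # triangular kernel
G, D = gamma_delta_plus(p=1, h=0.5, xbar=0.0,
                        f_o=lambda x: np.full_like(x, 0.5),   # uniform on [-1, 1]
                        sigma2=lambda x: 0.04 + x**2, kernel=tri)
print(G[0, 0])   # gamma_0^+ = 0.5 * integral of k(u) on [0,1] = 0.25
```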

A.3.2 Main Results

The proofs presented below closely follow the insights and arguments in Porter (2003).

Proof of Theorem 1.

The local polynomial estimator is given by

$$\hat{\alpha}_p=\hat{\alpha}_p^+-\hat{\alpha}_p^-$$

note that,

$$\hat{\alpha}_p^+=e_1'\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iZ_i'\right]^{-1}\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iy_i\right]=e_1'D_n^+\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iy_i\right]$$

where $D_n^+=\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iZ_i'\right]^{-1}$. Similarly, with $D_n^-=\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_iZ_i'\right]^{-1}$,

$$\hat{\alpha}_p^-=e_1'D_n^-\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_iy_i\right]$$

Then,

$$\hat{\alpha}_p^+=e_1'D_n^+\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\left[m(x_i)+\alpha d_i+\varepsilon_i\right]\right]
=e_1'D_n^+\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i m(x_i)+\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\alpha+\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\right]$$

note that $Z_i=Z_iZ_i'e_1$ and $e_1'e_1=1$; then

$$\hat{\alpha}_p^+-\alpha=e_1'D_n^+\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i m(x_i)\right]+e_1'D_n^+\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\right]$$

and similarly for $\hat{\alpha}_p^-$. Then

$$\sqrt{nh}\,(\hat{\alpha}_p-\alpha)=\sqrt{nh}\,(\hat{\alpha}_p^+-\alpha-\hat{\alpha}_p^-)
=e_1'D_n^+\left\{\sqrt{nh}\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i m(x_i)\right]+\left[\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\right]\right\}
-e_1'D_n^-\left\{\sqrt{nh}\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_i m(x_i)\right]+\left[\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_i\varepsilon_i\right]\right\}$$

For the denominator terms Dn+ and Dn−,

$$\left(D_n^+\right)^{-1}=\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iZ_i'$$

and each element of this matrix is given by

$$\left[\left(D_n^+\right)^{-1}\right]_{l,j}=\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_i\left(\frac{x_i-\bar{x}}{h}\right)^{j+l-2}$$

which has asymptotic variance converging to zero since

$$\mathrm{Var}\!\left(\left[\left(D_n^+\right)^{-1}\right]_{j,l}\right)=\frac{1}{(nh)^2}\,\mathrm{Var}\!\left(\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_i\left(\frac{x_i-\bar{x}}{h}\right)^{j+l-2}\right)
=\frac{1}{nh^2}\,\mathrm{Var}\!\left(k\!\left(\frac{x_i-\bar{x}}{h}\right)d_i\left(\frac{x_i-\bar{x}}{h}\right)^{j+l-2}\right)
\leq\frac{1}{nh}\int_{\bar{x}}^{\bar{x}+h}\frac{1}{h}\,k^2\!\left(\frac{x-\bar{x}}{h}\right)\left(\frac{x-\bar{x}}{h}\right)^{2(j+l-2)}dF_o(x)$$

Note that the terms in the integral and the integral itself are $O(1)$ while $\frac{1}{nh}=o(1)$. Hence, $\mathrm{Var}\big(\big[(D_n^+)^{-1}\big]_{l,j}\big)\to 0$. Now, using the usual transformation $u=\frac{x-\bar{x}}{h}$,

$$\left[\left(D_n^+\right)^{-1}\right]_{l,j}=E\!\left\{\left[\left(D_n^+\right)^{-1}\right]_{l,j}\right\}+o_p(1)
=\frac{1}{nh}E\!\left[\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_i\left(\frac{x_i-\bar{x}}{h}\right)^{j+l-2}\right]+o_p(1)
=\int_0^\infty k(u)\,u^{j+l-2}\,dF_o(\bar{x}+uh)+o_p(1)$$

Let $\gamma_j^+=\int_0^\infty k(u)u^j\,dF_o(\bar{x}+uh)$ and let $\Gamma_+$ be the $(p+1)\times(p+1)$ matrix with $(j,l)$ element $\gamma_{j+l-2}^+$ for $j,l=1,\dots,p+1$. Then $D_n^+\overset{p}{\to}(\Gamma_+)^{-1}$ and $D_n^-\overset{p}{\to}(\Gamma_-)^{-1}$, where $\Gamma_-$ is the $(p+1)\times(p+1)$ matrix with $(j,l)$ element $\gamma_{j+l-2}^-$ for $j,l=1,\dots,p+1$, and $\gamma_j^-=(-1)^j\int_0^\infty k(u)u^j\,dF_o(\bar{x}-uh)$.

Now we derive the asymptotic distribution of $\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\big(\frac{x_i-\bar{x}}{h}\big)d_iZ_i\varepsilon_i$. Following Porter (2003), I use the Cramér–Wold device. Let $\lambda$ be a nonzero, finite vector. Then,

$$E\left[\left|\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_i\lambda'Z_i\varepsilon_i\right|^{2+\zeta}\right]
=\left(\frac{1}{nh}\right)^{\zeta/2}\frac{1}{h}\,E\!\left[\left|k\!\left(\frac{x-\bar{x}}{h}\right)\right|^{2+\zeta}d\,\Big|\sum_{l=1}^{p}\lambda_l\left(\frac{x-\bar{x}}{h}\right)^{l}\Big|^{2+\zeta}E\!\left[|\varepsilon|^{2+\zeta}\mid X=x\right]\right]$$
$$\leq\left(\frac{1}{nh}\right)^{\zeta/2}\frac{1}{h}\,\sup_x\left\{E\!\left[|\varepsilon|^{2+\zeta}\mid X=x\right]\right\}E\!\left[\left|k\!\left(\frac{x-\bar{x}}{h}\right)\right|^{2+\zeta}d\sum_{l=1}^{p}\left|\lambda_l\left(\frac{x-\bar{x}}{h}\right)^{l}\right|^{2+\zeta}\right]$$
$$=\left(\frac{1}{nh}\right)^{\zeta/2}\frac{1}{h}\,\sup_x\left\{E\!\left[|\varepsilon|^{2+\zeta}\mid X=x\right]\right\}\int_{\bar{x}}^{\bar{x}+h}\left|k\!\left(\frac{x-\bar{x}}{h}\right)\right|^{2+\zeta}\sum_{l=1}^{p}\left|\lambda_l\left(\frac{x-\bar{x}}{h}\right)^{l}\right|^{2+\zeta}dF_o(x)
=\left(\frac{1}{nh}\right)^{\zeta/2}O(1)\,O(1)=o(1)$$

then $\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\big(\frac{x_i-\bar{x}}{h}\big)d_iZ_i\varepsilon_i$ satisfies a CLT. Note that

$$E\left[\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\right]=0$$

and

$$\mathrm{Var}\left[\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\right]
=\frac{1}{h}E\!\left[k^2\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iZ_i'\varepsilon_i^2\right]
=\int_{\bar{x}}^{\bar{x}+h}\frac{1}{h}\,k^2\!\left(\frac{x-\bar{x}}{h}\right)ZZ'\sigma^2(x)\,dF_o(x)$$

It helps to remember that $Z_iZ_i'$ is a function of $x_i$:

$$Z_iZ_i'=\begin{bmatrix}
1&\left(\frac{x_i-\bar{x}}{h}\right)&\cdots&\left(\frac{x_i-\bar{x}}{h}\right)^{p}\\
\left(\frac{x_i-\bar{x}}{h}\right)&\left(\frac{x_i-\bar{x}}{h}\right)^{2}&\cdots&\left(\frac{x_i-\bar{x}}{h}\right)^{p+1}\\
\vdots&\vdots&\ddots&\vdots\\
\left(\frac{x_i-\bar{x}}{h}\right)^{p}&\left(\frac{x_i-\bar{x}}{h}\right)^{p+1}&\cdots&\left(\frac{x_i-\bar{x}}{h}\right)^{2p}
\end{bmatrix}$$

Let $\delta_j^+=\int_{\bar{x}}^{\bar{x}+h}\frac{1}{h}k^2\big(\frac{x-\bar{x}}{h}\big)\big(\frac{x-\bar{x}}{h}\big)^j\sigma^2(x)\,dF_o(x)=\int_0^\infty k^2(u)u^j\sigma^2(\bar{x}+uh)\,dF_o(\bar{x}+uh)$ and let $\Delta_+$ be the $(p+1)\times(p+1)$ matrix with $(j,l)$ element $\delta_{j+l-2}^+$ for $j,l=1,\dots,p+1$. Then,

$$\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\overset{d}{\to}N(0,\Delta_+)$$

Similarly we can show that

$$\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_i\varepsilon_i\overset{d}{\to}N(0,\Delta_-)$$

where $\Delta_-$ is the $(p+1)\times(p+1)$ matrix with $(j,l)$ element $\delta_{j+l-2}^-$ for $j,l=1,\dots,p+1$, and $\delta_j^-=\int_{\bar{x}-h}^{\bar{x}}\frac{1}{h}k^2\big(\frac{x-\bar{x}}{h}\big)\big(\frac{x-\bar{x}}{h}\big)^j\sigma^2(x)\,dF_o(x)=(-1)^j\int_0^\infty k^2(u)u^j\sigma^2(\bar{x}-uh)\,dF_o(\bar{x}-uh)$.

The bias term is given by

$$\sqrt{nh}\,e_1'\left\{D_n^+\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i m(x_i)\right]-D_n^-\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_i m(x_i)\right]\right\}$$

Note that,

$$E\left[\frac{1}{nh}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i m(x_i)\right]
=E\left[\frac{1}{h}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i m(x_i)\right]
=\int_0^\infty k(u)\,Z(\bar{x}+uh)\,m(\bar{x}+uh)\,dF_o(\bar{x}+uh)$$

and similarly below the cutoff. Hence, the bias term can be approximated by

$$e_1'\left\{(\Gamma_+)^{-1}\left[\int_0^\infty k(u)Z(\bar{x}+uh)m(\bar{x}+uh)\,dF_o(\bar{x}+uh)\right]-(\Gamma_-)^{-1}\left[\int_0^\infty k(u)Z(\bar{x}-uh)m(\bar{x}-uh)\,dF_o(\bar{x}-uh)\right]\right\}$$
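The bias expression above can be evaluated numerically for a known conditional mean. The sketch below is illustrative only: it assumes an absolutely continuous $F_o$, a triangular kernel, and an $m(x)$ with different curvature on the two sides of the cutoff, so that the local linear fit ($p=1$) is biased:

```python
# Numerical sketch of the fixed-h bias term:
#   e1'{ (G+)^-1 [int k Z m f du]  -  (G-)^-1 [int k Z m f du] }
# for an assumed smooth density and a known m(x). Illustrative only.
import numpy as np

def trap(y, u):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def fixed_h_bias(p, h, f_o, m, k, u):
    terms = []
    for side in (+1, -1):                  # above / below the cutoff (xbar = 0)
        x = side * u * h
        # Z(xbar + side*u*h) has j-th element (side*u)^j, j = 0..p
        G = np.array([[trap(k * (side * u)**(j + l) * f_o(x), u)
                       for l in range(p + 1)] for j in range(p + 1)])
        b = np.array([trap(k * (side * u)**j * m(x) * f_o(x), u)
                      for j in range(p + 1)])
        terms.append(np.linalg.solve(G, b)[0])   # intercept of the projection
    return terms[0] - terms[1]

p, h = 1, 0.5
u = np.linspace(0.0, 1.0, 20001)
k = np.maximum(1 - u, 0.0)                       # triangular kernel
f_o = lambda x: np.full_like(x, 0.5)             # uniform density (assumed)
# Different second derivatives above (4) and below (-2) the cutoff, no jump
m = lambda x: np.where(x >= 0, 0.3 * x + 2.0 * x**2, 0.3 * x - x**2)
bias = fixed_h_bias(p, h, f_o, m, k, u)
print(bias)
```

For this $m(x)$ the leading-term formula $\frac{h^{p+1}}{(p+1)!}[m''_+(\bar{x})-m''_-(\bar{x})]\,e_1'\Gamma^{-1}[\gamma_2\;\gamma_3]'$ gives $-0.3h^2=-0.075$ at $h=0.5$, matching the numerical value; with equal curvature on both sides the two intercepts cancel and the bias is zero.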

There are two noteworthy cases in which the formulas for the fixed-h asymptotic variance and bias simplify to those of the small-h approximation. First, when h → 0, the fixed-h formulas for the asymptotic variance and bias approach the small-h approximation formulas.

Corollary 1

If, in the bandwidth around the cutoff, fo(x), σ2(x) and m(x) are continuous, then

$$\lim_{h\to 0}V_{\text{fixed-}h}=V_{\text{small-}h}\qquad\qquad \lim_{h\to 0}B_{\text{fixed-}h}=B_{\text{small-}h}$$

Proof of Corollary 1.

First, note that, if h → 0,

$$\gamma_j^+=\lim_{h\to 0}\int_0^\infty k(u)\,u^{j}\,f_o(\bar{x}+uh)\,du=f_o(\bar{x})\int_0^\infty k(u)\,u^j\,du=f_o(\bar{x})\,\gamma_j \tag{11}$$

and

$$\delta_j^+=\lim_{h\to 0}\int_0^\infty k^2(u)\,u^{j}\,\sigma^2(\bar{x}+uh)\,f_o(\bar{x}+uh)\,du=\sigma^2_+(\bar{x})\,f_o(\bar{x})\int_0^\infty k^2(u)\,u^j\,du=\sigma^2_+(\bar{x})\,f_o(\bar{x})\,\delta_j \tag{12}$$

and similarly for $\gamma_j^-$ and $\delta_j^-$. Then, for the variance,

$$\lim_{h\to 0}\,e_1'\left[(\Gamma_+)^{-1}\Delta_+(\Gamma_+)^{-1}+(\Gamma_-)^{-1}\Delta_-(\Gamma_-)^{-1}\right]e_1=\frac{\sigma^2_+(\bar{x})+\sigma^2_-(\bar{x})}{f_o(\bar{x})}\,e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$$

For the bias, we approximate $m(x)=m(\bar{x}+uh)$ just above the cutoff by

$$m(x)=m(\bar{x})+m'_+(\bar{x})(x-\bar{x})+\cdots+\frac{1}{p!}m^{(p)}_+(\bar{x})(x-\bar{x})^{p}+\frac{1}{(p+1)!}m^{(p+1)}_+(\bar{x})(x-\bar{x})^{p+1}+o\!\left(h^{p+1}\right)$$

and similarly for approximating m(x) just below the cutoff. When we evaluate the linear projection of m(x) on Z(x) at x¯, we get the intercept m(x¯) and the “residual” as described above. A helpful fact is that, by the definition of Z(x),

$$\int_0^\infty k(u)\,Z(\bar{x}+uh)\,u^{p+1}\,du=\begin{bmatrix}\gamma_{p+1}\\\vdots\\\gamma_{2p+1}\end{bmatrix} \tag{13}$$

$$\int_0^\infty k(u)\,Z(\bar{x}-uh)\,u^{p+1}\,du=\begin{bmatrix}\gamma_{p+1}\\-\gamma_{p+2}\\\vdots\\(-1)^{p}\gamma_{2p+1}\end{bmatrix} \tag{14}$$

Note that $\Gamma^{-1}\big[\gamma_{p+1}\;\cdots\;\gamma_{2p+1}\big]'$ is equal both above and below the cutoff. The bias formula in Theorem 1 is given by

$$e_1'\left\{(\Gamma_+)^{-1}\left[\int_0^\infty k(u)Z(\bar{x}+uh)m(\bar{x}+uh)f_o(\bar{x}+uh)\,du\right]-(\Gamma_-)^{-1}\left[\int_0^\infty k(u)Z(\bar{x}-uh)m(\bar{x}-uh)f_o(\bar{x}-uh)\,du\right]\right\} \tag{15}$$

as discussed in Section 3 of the main text, the main term is just the difference between the intercepts of the linear projections of $k(u)m(x)$ on $k(u)Z(x)$ in the bandwidth above and below the cutoff, which equals the linear projections evaluated at $\bar{x}$. Hence, plugging in the bias formula for the linear projection, formula (15) can be written as

$$e_1'\left\{(\Gamma_+)^{-1}\int_0^\infty k(u)Z(\bar{x}+uh)\left(\frac{1}{(p+1)!}m^{(p+1)}_+(\bar{x})(uh)^{p+1}+o(h^{p+1})\right)f_o(\bar{x}+uh)\,du
-(\Gamma_-)^{-1}\int_0^\infty k(u)Z(\bar{x}-uh)\left(\frac{(-1)^{p+1}}{(p+1)!}m^{(p+1)}_-(\bar{x})(uh)^{p+1}+o(h^{p+1})\right)f_o(\bar{x}-uh)\,du\right\}$$
$$=\frac{h^{p+1}}{(p+1)!}\,e_1'\left\{m^{(p+1)}_+(\bar{x})\left[(\Gamma_+)^{-1}\int_0^\infty k(u)Z(\bar{x}+uh)u^{p+1}f_o(\bar{x}+uh)\,du\right]
-(-1)^{p+1}m^{(p+1)}_-(\bar{x})\left[(\Gamma_-)^{-1}\int_0^\infty k(u)Z(\bar{x}-uh)u^{p+1}f_o(\bar{x}-uh)\,du\right]+o(1)\right\}$$

If h → 0, using the equalities in equations (11), (12), (13) and (14),

$$=\lim_{h\to 0}\frac{h^{p+1}}{(p+1)!}\left[m^{(p+1)}_+(\bar{x})-(-1)^{p+1}m^{(p+1)}_-(\bar{x})\right]e_1'\Gamma^{-1}\begin{bmatrix}\gamma_{p+1}\\\vdots\\\gamma_{2p+1}\end{bmatrix}$$

which is the same limit as that of the bias term in the small-h approximation. ■

Hence, if h is small, the fixed-h and small-h frameworks provide similar approximations to the asymptotic behavior of $\hat{\alpha}_p$.
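Corollary 1 can also be checked numerically: under a smooth density and variance function, the fixed-h variance approaches the small-h limit $\frac{\sigma^2_+(\bar{x})+\sigma^2_-(\bar{x})}{f_o(\bar{x})}e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$ as $h$ shrinks. A sketch with assumed, purely illustrative $f_o$ and $\sigma^2$:

```python
# Check that the fixed-h variance converges to the small-h limit as h -> 0,
# for an assumed smooth density f_o and variance sigma2. Illustrative only.
import numpy as np

def trap(y, u):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def sandwich00(G, D):
    Gi = np.linalg.inv(G)
    return (Gi @ D @ Gi)[0, 0]           # e1' G^-1 D G^-1 e1

def fixed_h_V(p, h, f_o, sigma2, k, u):
    V = 0.0
    for side in (+1, -1):                # above / below the cutoff (xbar = 0)
        x = side * u * h
        G = np.array([[trap(k * (side * u)**(j + l) * f_o(x), u)
                       for l in range(p + 1)] for j in range(p + 1)])
        D = np.array([[trap(k**2 * (side * u)**(j + l) * sigma2(x) * f_o(x), u)
                       for l in range(p + 1)] for j in range(p + 1)])
        V += sandwich00(G, D)
    return V

p = 1
u = np.linspace(0.0, 1.0, 20001)
k = np.maximum(1 - u, 0.0)                         # triangular kernel
f_o = lambda x: 0.5 + 0.25 * x                     # smooth density (assumed)
sigma2 = lambda x: 0.04 + x**2                     # smooth heteroskedasticity

# Small-h limit: kernel-only Gamma and Delta, scaled by sigma2 and 1/f_o
G0 = np.array([[trap(k * u**(j + l), u) for l in range(p + 1)] for j in range(p + 1)])
D0 = np.array([[trap(k**2 * u**(j + l), u) for l in range(p + 1)] for j in range(p + 1)])
V_small = (sigma2(0.0) + sigma2(0.0)) / f_o(0.0) * sandwich00(G0, D0)

errs = [abs(fixed_h_V(p, h, f_o, sigma2, k, u) - V_small) for h in (0.5, 0.1, 0.02)]
print(errs)   # gap shrinks as h shrinks
```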

Second, if $f_o(x)$ and $\sigma^2(x)$ are constant around the cutoff and $m(x)$ is exactly a polynomial of order p + 1, the fixed-h asymptotic variance and bias approximations simplify to the small-h asymptotic formulas.

Corollary 2

If, in the bandwidth around the cutoff, $f_o(x)$ and $\sigma^2(x)$ are constant and $m(x)$ is exactly a polynomial of order p + 1, then the asymptotic variance and bias of $\sqrt{nh}(\hat{\alpha}_p-\alpha)$ obtained under fixed-h (Theorem 1) and small-h (Porter 2003) asymptotics are the same.

$$V_{\text{fixed-}h}=V_{\text{small-}h}\qquad\qquad B_{\text{fixed-}h}=B_{\text{small-}h}$$

Proof of Corollary 2.

First, note that, if $h>0$ and, in the bandwidth around the cutoff, $f_o(x)=f_o(\bar{x})$, $\sigma^2(x)=\sigma^2(\bar{x})$, and

$$m(x)=m(\bar{x})+m'_+(\bar{x})(x-\bar{x})+\cdots+\frac{1}{p!}m^{(p)}_+(\bar{x})(x-\bar{x})^{p}+\frac{1}{(p+1)!}m^{(p+1)}_+(\bar{x})(x-\bar{x})^{p+1}$$

then,

$$\gamma_j^+=\int_0^\infty k(u)\,u^{j}\,f_o(\bar{x}+uh)\,du=f_o(\bar{x})\int_0^\infty k(u)\,u^j\,du=f_o(\bar{x})\,\gamma_j \tag{16}$$

and

$$\delta_j^+=\int_0^\infty k^2(u)\,u^{j}\,\sigma^2(\bar{x}+uh)\,f_o(\bar{x}+uh)\,du=\sigma^2_+(\bar{x})\,f_o(\bar{x})\int_0^\infty k^2(u)\,u^j\,du=\sigma^2_+(\bar{x})\,f_o(\bar{x})\,\delta_j \tag{17}$$

and similarly for $\gamma_j^-$ and $\delta_j^-$. Then, for the variance,

$$e_1'\left[(\Gamma_+)^{-1}\Delta_+(\Gamma_+)^{-1}+(\Gamma_-)^{-1}\Delta_-(\Gamma_-)^{-1}\right]e_1=\frac{\sigma^2_+(\bar{x})+\sigma^2_-(\bar{x})}{f_o(\bar{x})}\,e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$$

For the bias, the strategy is essentially the same as in the proof of Corollary 1:

$$m(x)=m(\bar{x})+m'_+(\bar{x})(x-\bar{x})+\cdots+\frac{1}{p!}m^{(p)}_+(\bar{x})(x-\bar{x})^{p}+\frac{1}{(p+1)!}m^{(p+1)}_+(\bar{x})(x-\bar{x})^{p+1}$$

Once again the bias formula in Theorem 1 is given by

$$e_1'\left\{(\Gamma_+)^{-1}\left[\int_0^\infty k(u)Z(\bar{x}+uh)m(\bar{x}+uh)f_o(\bar{x}+uh)\,du\right]-(\Gamma_-)^{-1}\left[\int_0^\infty k(u)Z(\bar{x}-uh)m(\bar{x}-uh)f_o(\bar{x}-uh)\,du\right]\right\}$$

Plugging the bias formula for the linear projection:

$$e_1'\left\{(\Gamma_+)^{-1}\int_0^\infty k(u)Z(\bar{x}+uh)\left(\frac{1}{(p+1)!}m^{(p+1)}_+(\bar{x})(uh)^{p+1}\right)f_o(\bar{x}+uh)\,du
-(\Gamma_-)^{-1}\int_0^\infty k(u)Z(\bar{x}-uh)\left(\frac{(-1)^{p+1}}{(p+1)!}m^{(p+1)}_-(\bar{x})(uh)^{p+1}\right)f_o(\bar{x}-uh)\,du\right\}$$
$$=\frac{h^{p+1}}{(p+1)!}\,e_1'\left\{m^{(p+1)}_+(\bar{x})\left[(\Gamma_+)^{-1}\int_0^\infty k(u)Z(\bar{x}+uh)u^{p+1}f_o(\bar{x}+uh)\,du\right]
-(-1)^{p+1}m^{(p+1)}_-(\bar{x})\left[(\Gamma_-)^{-1}\int_0^\infty k(u)Z(\bar{x}-uh)u^{p+1}f_o(\bar{x}-uh)\,du\right]\right\}$$

Using fo(x)=fo(x¯) and the equalities in formulas (16), (17), (13) and (14),

$$=\frac{h^{p+1}}{(p+1)!}\left[m^{(p+1)}_+(\bar{x})-(-1)^{p+1}m^{(p+1)}_-(\bar{x})\right]e_1'\Gamma^{-1}\begin{bmatrix}\gamma_{p+1}\\\vdots\\\gamma_{2p+1}\end{bmatrix}$$ ■

A.3.3 Fuzzy Regression Discontinuity Design

Proof of Theorem 2.

The covariance between the estimators for the outcome of interest and the treatment probability will be given by two independent terms, one for each side of the threshold. The upper side is given by

$$E\left\{e_1'D_n^+\left[\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iy_i\right]\left[\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_it_i\right]'D_n^+e_1\right\}$$

where $t_i$ is the dummy variable indicating that the observation received treatment. In obtaining the asymptotic covariance, the bias term of the estimator can be ignored; hence

$$E\left[e_1'D_n^+\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\varepsilon_i\right)\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_i\eta_i\right)'D_n^+e_1\right]
=E\left[e_1'D_n^+\left(\frac{1}{nh}\sum_{i=1}^{n}k^2\!\left(\frac{x_i-\bar{x}}{h}\right)d_iZ_iZ_i'E[\varepsilon_i\eta_i\mid X=x_i]\right)D_n^+e_1\right]
=e_1'D_n^+\left[\int_{\bar{x}}^{\bar{x}+h}\frac{1}{h}k^2\!\left(\frac{x-\bar{x}}{h}\right)ZZ'\sigma_{\varepsilon\eta}(x)\,dF_o(x)\right]D_n^+e_1$$

where I used the assumption that $E[\varepsilon_i\eta_j\mid X=x]=0$ for $j\neq i$.

Similarly for the second term,

$$E\left[e_1'D_n^-\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_i\varepsilon_i\right)\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}k\!\left(\frac{x_i-\bar{x}}{h}\right)(1-d_i)Z_i\eta_i\right)'D_n^-e_1\right]
=e_1'D_n^-\left[\int_{\bar{x}-h}^{\bar{x}}\frac{1}{h}k^2\!\left(\frac{x-\bar{x}}{h}\right)ZZ'\sigma_{\varepsilon\eta}(x)\,dF_o(x)\right]D_n^-e_1$$

Let $\rho_j^+=\int_{\bar{x}}^{\bar{x}+h}\frac{1}{h}k^2\big(\frac{x-\bar{x}}{h}\big)\big(\frac{x-\bar{x}}{h}\big)^j\sigma_{\varepsilon\eta}(x)\,dF_o(x)=\int_0^\infty k^2(u)u^j\sigma_{\varepsilon\eta}(\bar{x}+uh)\,dF_o(\bar{x}+uh)$ and let $\Delta_+^\rho$ be the $(p+1)\times(p+1)$ matrix with $(j,l)$ element $\rho^+_{j+l-2}$ for $j,l=1,\dots,p+1$; let $\rho_j^-=\int_{\bar{x}-h}^{\bar{x}}\frac{1}{h}k^2\big(\frac{x-\bar{x}}{h}\big)\big(\frac{x-\bar{x}}{h}\big)^j\sigma_{\varepsilon\eta}(x)\,dF_o(x)=(-1)^j\int_0^\infty k^2(u)u^j\sigma_{\varepsilon\eta}(\bar{x}-uh)\,dF_o(\bar{x}-uh)$ and let $\Delta_-^\rho$ be the $(p+1)\times(p+1)$ matrix with $(j,l)$ element $\rho^-_{j+l-2}$ for $j,l=1,\dots,p+1$. Then the asymptotic covariance is given by

$$C_{\alpha\theta}=e_1'\left[(\Gamma_+)^{-1}\Delta_+^\rho(\Gamma_+)^{-1}+(\Gamma_-)^{-1}\Delta_-^\rho(\Gamma_-)^{-1}\right]e_1$$
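The covariance $C_{\alpha\theta}$ has the same sandwich form as the variance in Theorem 1, with $\sigma_{\varepsilon\eta}(x)$ in place of $\sigma^2(x)$, so the same quadrature sketch applies (the inputs below are assumed and purely illustrative; for smooth $F_o$ the Stieltjes integrals become ordinary ones):

```python
# Numerical sketch of the fixed-h sandwich e1'[(G+)^-1 M+ (G+)^-1 + ...]e1,
# used for C_alphatheta with M built from sigma_{eps,eta}(x). Illustrative only.
import numpy as np

def trap(y, u):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def fixed_h_sandwich(p, h, f_o, cov_fn, k, u):
    out = 0.0
    for side in (+1, -1):                 # above / below the cutoff (xbar = 0)
        x = side * u * h
        G = np.array([[trap(k * (side * u)**(j + l) * f_o(x), u)
                       for l in range(p + 1)] for j in range(p + 1)])
        M = np.array([[trap(k**2 * (side * u)**(j + l) * cov_fn(x) * f_o(x), u)
                       for l in range(p + 1)] for j in range(p + 1)])
        Gi = np.linalg.inv(G)
        out += (Gi @ M @ Gi)[0, 0]        # e1' (.) e1
    return out

p, h = 1, 0.5
u = np.linspace(0.0, 1.0, 20001)
k = np.maximum(1 - u, 0.0)
f_o = lambda x: np.full_like(x, 0.5)
sig_eps_eta = lambda x: 0.01 + 0.02 * np.abs(x)    # assumed local covariance

C = fixed_h_sandwich(p, h, f_o, sig_eps_eta, k, u)
# With sigma_{eps,eta} replaced by sigma^2, the same routine returns the variance
V = fixed_h_sandwich(p, h, f_o, lambda x: 0.04 + x**2, k, u)
print(C, V)
```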

Similarly to the result in Corollary 1, as h → 0 the fixed-h covariance formula converges to the small-h asymptotic covariance.

Corollary 3

If, in the bandwidth around the cutoff, σεη(x) is continuous and h ⟶ 0, then the asymptotic covariance, Cαθ, obtained by fixed-h (Theorem 2) and small-h (Porter 2003) are the same:

$$\lim_{h\to 0}C_{\alpha\theta}=\frac{\sigma_{\varepsilon\eta}^+(\bar{x})+\sigma_{\varepsilon\eta}^-(\bar{x})}{f_o(\bar{x})}\,e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$$

Proof of Corollary 3.

Using the results in equation 11 and noting that, if h → 0

$$\rho_j^+=\lim_{h\to 0}\int_0^\infty k^2(u)\,u^{j}\,\sigma_{\varepsilon\eta}(\bar{x}+uh)\,f_o(\bar{x}+uh)\,du=\sigma_{\varepsilon\eta}^+(\bar{x})\,f_o(\bar{x})\,\delta_j$$

and similarly for $\rho_j^-$. Then,

$$\lim_{h\to 0}e_1'\left[(\Gamma_+)^{-1}\Delta_+^\rho(\Gamma_+)^{-1}+(\Gamma_-)^{-1}\Delta_-^\rho(\Gamma_-)^{-1}\right]e_1
=e_1'\left[\big(f_o(\bar{x})\Gamma\big)^{-1}\sigma_{\varepsilon\eta}^+(\bar{x})f_o(\bar{x})\Delta\big(f_o(\bar{x})\Gamma\big)^{-1}+\big(f_o(\bar{x})\Gamma\big)^{-1}\sigma_{\varepsilon\eta}^-(\bar{x})f_o(\bar{x})\Delta\big(f_o(\bar{x})\Gamma\big)^{-1}\right]e_1
=\frac{\sigma_{\varepsilon\eta}^+(\bar{x})+\sigma_{\varepsilon\eta}^-(\bar{x})}{f_o(\bar{x})}\,e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$$ ■

Also, a result similar to Corollary 2 is readily available.

Corollary 4

If in the bandwidth around the cutoff, fo(x) and σεη(x) are constant, then the asymptotic covariance, Cαθ, obtained by fixed-h (Theorem 2) and small-h (Porter 2003) are the same.

$$C_{\alpha\theta}=\frac{\sigma_{\varepsilon\eta}^+(\bar{x})+\sigma_{\varepsilon\eta}^-(\bar{x})}{f_o(\bar{x})}\,e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$$

Proof of Corollary 4.

The proof follows that of Corollary 3 very closely. Using the results in equation (16) and noting that, if $h>0$, $f_o(x)=f_o(\bar{x})$, and $\sigma_{\varepsilon\eta}(x)=\sigma_{\varepsilon\eta}(\bar{x})$ for any $x$ in the bandwidth around the cutoff,

$$\rho_j^+=\int_0^\infty k^2(u)\,u^{j}\,\sigma_{\varepsilon\eta}(\bar{x}+uh)\,f_o(\bar{x}+uh)\,du=\sigma_{\varepsilon\eta}^+(\bar{x})\,f_o(\bar{x})\,\delta_j$$

and similarly for $\rho_j^-$. Then,

$$e_1'\left[(\Gamma_+)^{-1}\Delta_+^\rho(\Gamma_+)^{-1}+(\Gamma_-)^{-1}\Delta_-^\rho(\Gamma_-)^{-1}\right]e_1
=e_1'\left[\big(f_o(\bar{x})\Gamma\big)^{-1}\sigma_{\varepsilon\eta}^+(\bar{x})f_o(\bar{x})\Delta\big(f_o(\bar{x})\Gamma\big)^{-1}+\big(f_o(\bar{x})\Gamma\big)^{-1}\sigma_{\varepsilon\eta}^-(\bar{x})f_o(\bar{x})\Delta\big(f_o(\bar{x})\Gamma\big)^{-1}\right]e_1
=\frac{\sigma_{\varepsilon\eta}^+(\bar{x})+\sigma_{\varepsilon\eta}^-(\bar{x})}{f_o(\bar{x})}\,e_1'\Gamma^{-1}\Delta\Gamma^{-1}e_1$$ ■

References

Andrews, D. 1991. “Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Regression Models.” Econometrica 59 (2): 307–345. doi:10.2307/2938259.

Angrist, J., and J. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press. doi:10.1515/9781400829828.

Barreca, A., J. Lindo, and G. Waddell. 2016. “Heaping-Induced Bias in Regression Discontinuity Designs.” Economic Inquiry 54 (1): 268–293. doi:10.3386/w17408.

Bartalotti, O., and Q. Brummet. 2017. “Regression Discontinuity Designs with Clustered Data.” In Regression Discontinuity Designs: Theory and Applications. Advances in Econometrics, Vol. 38, edited by M. Cattaneo and J. Escanciano, 383–420. Bingley: Emerald Publishing. doi:10.1108/S0731-905320170000038017.

Bartalotti, O., Q. Brummet, and S. Dieterle. 2017. “A Correction for Regression Discontinuity Designs with Group-Specific Mismeasurement of the Running Variable.” Working Paper, Department of Economics, Iowa State University.

Bartalotti, O., G. Calhoun, and Y. He. 2017. “Bootstrap Confidence Intervals for Sharp Regression Discontinuity Designs.” In Regression Discontinuity Designs: Theory and Applications. Advances in Econometrics, Vol. 38, edited by M. Cattaneo and J. Escanciano, 421–453. Bingley: Emerald Publishing. doi:10.1108/S0731-905320170000038018.

Bester, C. A., T. G. Conley, C. B. Hansen, and T. J. Vogelsang. 2016. “Fixed-b Asymptotics for Spatially Dependent Robust Nonparametric Covariance Matrix Estimators.” Econometric Theory 32: 154–186. doi:10.1017/S0266466614000814.

Calonico, S., M. Cattaneo, and M. Farrell. Forthcoming. “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference.” Journal of the American Statistical Association. doi:10.1080/01621459.2017.1285776.

Calonico, S., M. Cattaneo, and R. Titiunik. 2014. “Robust Nonparametric Confidence Intervals for Regression Discontinuity Designs.” Econometrica 82 (6): 2295–2326. doi:10.3982/ECTA11757.

Calonico, S., M. Cattaneo, M. Farrell, and R. Titiunik. 2017. “rdrobust: Software for Regression-Discontinuity Designs.” Stata Journal 17 (2): 372–404. doi:10.1177/1536867X1701700208.

Cattaneo, M., B. Frandsen, and R. Titiunik. 2015. “Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate.” Journal of Causal Inference 3 (1): 1–24. doi:10.1515/jci-2013-0010.

Caughey, D., and J. Sekhon. 2011. “Elections and the Regression Discontinuity Design: Lessons from Close U.S. House Races, 1942–2008.” Political Analysis 19: 385–408. doi:10.1093/pan/mpr032.

Chen, X., Z. Liao, and Y. Sun. 2014. “Sieve Inference on Possibly Misspecified Semi-nonparametric Time Series Models.” Journal of Econometrics 178 (3): 639–658. doi:10.1016/j.jeconom.2013.10.002.

Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. New York, NY: Chapman & Hall.

Dong, Y. 2015. “Regression Discontinuity Applications with Rounding Errors in the Running Variable.” Journal of Applied Econometrics 30 (3): 422–446. doi:10.1002/jae.2369.

Fan, Y. 1998. “Goodness-of-Fit Tests Based on Kernel Density Estimators with Fixed Smoothing Parameters.” Econometric Theory 14: 604–621. doi:10.1017/S0266466698145036.

Hahn, J., P. Todd, and W. van der Klaauw. 1999. “Evaluating the Effect of an Antidiscrimination Law Using a Regression-Discontinuity Design.” NBER Working Paper Series, no. 7131. doi:10.3386/w7131.

Hahn, J., P. Todd, and W. van der Klaauw. 2001. “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design.” Econometrica 69: 201–209. doi:10.1111/1468-0262.00183.

Hashimzade, N., and T. Vogelsang. 2008. “Fixed-b Asymptotic Approximation of the Sampling Behaviour of Nonparametric Spectral Density Estimators.” Journal of Time Series Analysis 29 (1): 142–162. doi:10.1111/j.1467-9892.2007.00548.x.

Imbens, G., and T. Lemieux. 2008. “Regression Discontinuity Designs: A Guide to Practice.” Journal of Econometrics 142 (2): 615–635. doi:10.1016/j.jeconom.2007.05.001.

Imbens, G., and K. Kalyanaraman. 2012. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” Review of Economic Studies 79 (3): 933–959. doi:10.1093/restud/rdr043.

Kim, M., Y. Sun, and J. Yang. 2017. “A Fixed-Bandwidth View of the Pre-asymptotic Inference for Kernel Smoothing with Time Series Data.” Journal of Econometrics 197 (2): 298–322. doi:10.1016/j.jeconom.2016.11.008.

Lee, D. 2008. “Randomized Experiments from Non-random Selection in U.S. House Elections.” Journal of Econometrics 142 (2): 675–697. doi:10.1016/j.jeconom.2007.05.004.

Lee, D., and D. Card. 2008. “Regression Discontinuity Inference with Specification Error.” Journal of Econometrics 142 (2): 655–674. doi:10.3386/t0322.

Lee, D., and T. Lemieux. 2009. “Regression Discontinuity Designs in Economics.” NBER Working Paper Series, no. 14723. doi:10.3386/w14723.

Ludwig, J., and D. L. Miller. 2007. “Does Head Start Improve Children’s Life Chances? Evidence from a Regression Discontinuity Design.” The Quarterly Journal of Economics 122 (1): 159–208. doi:10.3386/w11702.

McCrary, J. 2008. “Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test.” Journal of Econometrics 142 (2): 698–714. doi:10.3386/t0334.

Neave, H. 1970. “An Improved Formula for the Asymptotic Variance of Spectrum Estimates.” The Annals of Mathematical Statistics 41: 70–77. doi:10.1214/aoms/1177697189.

Pagan, A., and A. Ullah. 1999. Nonparametric Econometrics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511612503.

Porter, J. 2003. “Estimation in the Regression Discontinuity Model.” Unpublished Working Paper. Department of Economics, University of Wisconsin. Manuscript date: May 7, 2003.

Ramanathan, R. 1993. Statistical Methods in Econometrics. San Diego, CA: Academic Press.

Stock, J., and M. Yogo. 2005. “Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, 1st ed., edited by D. Andrews and J. Stock, 109–120. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511614491.007.

Sun, Y. 2014. “Let’s Fix It: Fixed-b Asymptotics Versus Small-b Asymptotics in Heteroskedasticity and Autocorrelation Robust Inference.” Journal of Econometrics 178 (3): 659–677. doi:10.1016/j.jeconom.2013.10.001.

White, H. 1982. “Maximum Likelihood Estimation of Misspecified Models.” Econometrica 50: 1–24. doi:10.2307/1912526.

White, H. 1996. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press.

Published Online: 2018-02-21

©2019 Otávio Bartalotti, published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
