
Data-Adaptive Causal Effects and Superefficiency

Peter M. Aronow
Published/Copyright: November 11, 2016

Abstract

Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-n consistent estimator of the population average causal effect is superefficient for a data-adaptive local average causal effect.

1 Introduction

When causal effects are heterogeneous, inferences depend on the population for which causal effects are estimated. Although population average causal effects have traditionally been the inferential targets, recent results have focused on estimating average causal effects that are local to some subpopulation for reasons of efficiency. These approaches include trimming observations based on the distribution of the propensity score [1], using regression adjustment to estimate reweighted causal effects [2, 3, 4], and implementing calipers for propensity-score matching [5, 6]. In some cases, the target parameter depends on the empirical distribution of the data, including cases where the researcher is explicitly conducting inference on, e. g., the average treatment effect among the treated conditional on the observed covariate distribution [7], or other causal sample functionals [8, 9], without revision to the estimator being used.

These approaches privilege efficiency in estimation over targeting population average causal effects, and often allow the target to be defined on the basis of the observed data. We provide an example of how these approaches, taken to their extreme, can provide extraordinary gains in statistical certainty. We consider the case of a data-adaptive target parameter [10] that is allowed to vary with the data depending on which subpopulation's local average causal effect is best estimated. When treatment effects are heterogeneous, adaptively changing the target parameter on the basis of efficiency yields an unusual result: if the population average causal effect can be consistently estimated with a root-$n$ consistent and asymptotically normal estimator $\hat\theta$, then the same estimator $\hat\theta$ is always superefficient (i. e., faster than root-$n$ consistent) for a data-adaptive local average causal effect. Furthermore, with an additional regularity condition on mean square convergence, we show that the mean square error of $\hat\theta$ for a data-adaptive local average causal effect is $o(n^{-1})$.

2 Results

Consider a full data probability distribution $G$ with an associated causal effect distribution $\tau$ with finite expectation $\mathrm{E}_G[\tau]$, where $\mathrm{E}_G[\cdot]$ denotes the expectation over the distribution $G$. Further denote the support of the distribution of $\tau$ as $\mathrm{Supp}_G[\tau]$. We impose a regularity condition on $\tau$ establishing non-degeneracy of $\tau$.

Assumption 1:

(Effect heterogeneity). $\min\{\sup \mathrm{Supp}_G[\tau] - \mathrm{E}_G[\tau],\ \mathrm{E}_G[\tau] - \inf \mathrm{Supp}_G[\tau]\} = c > 0$.

Assumption 1 is equivalent to assuming that causal effects are not constant across observations in the distribution G; i. e., causal effects are heterogeneous.
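As an illustration of Assumption 1, the sketch below computes $c$ from the mean and the endpoints of the support for a few textbook effect distributions. The helper name `effect_margin` and the chosen distributions are hypothetical; an infinite support endpoint is represented by `math.inf`, and a constant effect gives $c = 0$, so it violates the assumption.

```python
import math


def effect_margin(inf_supp: float, sup_supp: float, mean: float) -> float:
    """Return c = min(sup Supp[tau] - E[tau], E[tau] - inf Supp[tau]).

    Hypothetical helper for illustration; infinite endpoints are allowed.
    """
    return min(sup_supp - mean, mean - inf_supp)


# Hypothetical Uniform(0, 1) effects: support [0, 1], mean 0.5.
c_uniform = effect_margin(0.0, 1.0, 0.5)
# Hypothetical Exponential(1) effects: support [0, inf), mean 1.
c_expo = effect_margin(0.0, math.inf, 1.0)
# Hypothetical Bernoulli(0.2) effects: support {0, 1}, mean 0.2.
c_bern = effect_margin(0.0, 1.0, 0.2)
# Constant effect of 3: support {3}, so Assumption 1 fails (c = 0).
c_const = effect_margin(3.0, 3.0, 3.0)

for name, c in [("uniform", c_uniform), ("exponential", c_expo),
                ("bernoulli", c_bern), ("constant", c_const)]:
    print(f"{name:12s} c = {c:.3g}  heterogeneous: {c > 0}")
```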

We do not observe the full data probability distribution $G$; instead, we observe an empirical distribution $F_n$. Suppose that, using $F_n$, we have an estimator $\hat\theta$ of the average causal effect $\mathrm{E}_G[\tau]$ that is root-$n$ consistent and asymptotically normal.

Definition 1:

An estimator $\hat\theta$ is root-$n$ consistent and asymptotically normal for $\theta_0$ if $\sqrt{n}(\hat\theta - \theta_0) = N(0, \sigma^2) + o_p(1)$, for some $0 < \sigma^2 < \infty$.
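To make Definition 1 concrete, the simulation below is an illustrative sketch (not part of the original analysis) in which $\hat\theta$ is taken to be the sample mean of i.i.d. Exponential(1) draws, so that $\theta_0 = 1$ and $\sigma^2 = 1$. If $\hat\theta$ is root-$n$ consistent and asymptotically normal, the empirical spread of $\sqrt{n}(\hat\theta - \theta_0)$ should stay near $\sigma = 1$ rather than shrinking or growing with $n$. The function `scaled_errors` and the chosen sample sizes are hypothetical.

```python
import math
import random
import statistics


def scaled_errors(n: int, reps: int, seed: int = 1) -> list[float]:
    """Return reps draws of sqrt(n) * (theta_hat - theta_0).

    Hypothetical setup: theta_hat is the mean of n Exponential(1) draws,
    so theta_0 = 1 and sigma^2 = 1.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        theta_hat = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (theta_hat - 1.0))
    return out


for n in (25, 100, 400):
    errs = scaled_errors(n, reps=2000)
    # The mean should be near 0 and the standard deviation near sigma = 1.
    print(f"n = {n:3d}  mean {statistics.mean(errs):+.3f}  sd {statistics.stdev(errs):.3f}")
```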

We now define the target parameter, $\theta_{F_n}$.

Definition 2:

Let the target parameter

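$$\theta_{F_n} = \begin{cases} \hat\theta & \text{if } |\hat\theta - \mathrm{E}_G[\tau]| \le c, \\ \mathrm{E}_G[\tau] + c & \text{if } \hat\theta - \mathrm{E}_G[\tau] > c, \\ \mathrm{E}_G[\tau] - c & \text{if } \hat\theta - \mathrm{E}_G[\tau] < -c, \end{cases}$$

where, as in Assumption 1, $c = \min\{\sup \mathrm{Supp}_G[\tau] - \mathrm{E}_G[\tau],\ \mathrm{E}_G[\tau] - \inf \mathrm{Supp}_G[\tau]\}$.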

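The target parameter thus equals $\hat\theta$ whenever $\hat\theta$ lies in the interval $[\mathrm{E}_G[\tau] - c, \mathrm{E}_G[\tau] + c]$, and otherwise equals the endpoint of that interval closest to $\hat\theta$; the width of the interval is determined by the support of $\tau$. We formalize how each $\theta_{F_n}$ is a local average treatment effect.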
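Definition 2 amounts to clipping $\hat\theta$ to the interval $[\mathrm{E}_G[\tau] - c, \mathrm{E}_G[\tau] + c]$. The sketch below writes that rule out directly; `target_parameter` is a hypothetical name, and it takes $\mathrm{E}_G[\tau]$ and $c$ as inputs even though both are unknown to a researcher in practice. The example values assume hypothetical Uniform(0, 1) effects, so that $\mathrm{E}_G[\tau] = 0.5$ and $c = 0.5$.

```python
def target_parameter(theta_hat: float, mean: float, c: float) -> float:
    """Data-adaptive target of Definition 2 (hypothetical helper name).

    mean stands for E_G[tau] and c for the margin from Assumption 1.
    """
    if theta_hat - mean > c:
        return mean + c          # estimate above the interval: upper endpoint
    if theta_hat - mean < -c:
        return mean - c          # estimate below the interval: lower endpoint
    return theta_hat             # estimate inside the interval: target equals it


# Hypothetical Uniform(0, 1) effects: mean 0.5 and c = 0.5, so the interval is [0, 1].
print(target_parameter(0.63, 0.5, 0.5))   # inside the interval
print(target_parameter(1.20, 0.5, 0.5))   # above the interval
print(target_parameter(-0.1, 0.5, 0.5))   # below the interval
```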

Proposition 1:

There exists a nonnegative weighting $w_{F_n}$ associated with each empirical distribution $F_n$ such that, across all $F_n$, $\theta_{F_n} = \mathrm{E}_G[w_{F_n}\tau] / \mathrm{E}_G[w_{F_n}]$.

A proof of Proposition 1 follows directly from the fact that a weighted mean can attain any value in the interval defined by the infimum and supremum of its distribution's support. Proposition 1 asserts that, across all realizations, the target parameter $\theta_{F_n}$ corresponds to an average causal effect for at least one subpopulation. (In fact, there may be infinitely many subpopulations to which $\theta_{F_n}$ corresponds.) The composition of the subpopulation(s) associated with each $\theta_{F_n}$ is not directly knowable by the researcher and may vary across realizations of the data.
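For a finite population of unit-level effects, the argument behind Proposition 1 can be made constructive: mixing the smallest and largest effects in the right proportion reaches any value between them. The sketch below assumes a hypothetical list of effects standing in for $G$ and uses a hypothetical helper `local_weights`; it returns one valid nonnegative weighting among the many that may exist.

```python
def local_weights(tau: list[float], target: float) -> list[float]:
    """Nonnegative weights w with sum(w * tau) / sum(w) == target (hypothetical helper).

    All weight goes to one unit with the smallest effect and one with the
    largest effect, mixed in proportion lam = (target - lo) / (hi - lo).
    Requires heterogeneous effects and min(tau) <= target <= max(tau).
    """
    lo, hi = min(tau), max(tau)
    if lo == hi or not (lo <= target <= hi):
        raise ValueError("need min(tau) < max(tau) and target in [min(tau), max(tau)]")
    lam = (target - lo) / (hi - lo)
    w = [0.0] * len(tau)
    w[tau.index(hi)] += lam
    w[tau.index(lo)] += 1.0 - lam
    return w


tau = [-1.0, 0.5, 2.0, 3.0]          # hypothetical unit-level effects
w = local_weights(tau, target=1.7)
weighted_mean = sum(wi * ti for wi, ti in zip(w, tau)) / sum(w)
print(w, weighted_mean)
```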

However, mirroring results on other data-adaptive parameters under random sampling, including the sample average causal effect, the target parameter $\theta_{F_n}$ converges to the average causal effect $\mathrm{E}_G[\tau]$ at the root-$n$ rate. Proposition 2 establishes that the data-adaptive local average causal effect is asymptotically equivalent to the average causal effect and gives its rate of convergence.

Proposition 2:

Suppose that $\hat\theta$ is a root-$n$ consistent and asymptotically normal estimator of $\mathrm{E}_G[\tau]$. Then $\sqrt{n}(\theta_{F_n} - \mathrm{E}_G[\tau]) = O_p(1)$.

A proof of Proposition 2 follows by noting that $\sqrt{n}(\hat\theta - \mathrm{E}_G[\tau]) = O_p(1)$ and that, for every realization, $|\theta_{F_n} - \mathrm{E}_G[\tau]| \le |\hat\theta - \mathrm{E}_G[\tau]|$.
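The inequality used in this proof holds because Definition 2 only ever moves $\hat\theta$ toward $\mathrm{E}_G[\tau]$. The check below is a minimal sketch with hypothetical values $\mathrm{E}_G[\tau] = 0.5$ and $c = 0.5$, and with arbitrary Gaussian draws standing in for $\hat\theta$; it counts how often the inequality fails, which should be never.

```python
import random


def target_parameter(theta_hat: float, mean: float, c: float) -> float:
    """Definition 2 written as clipping theta_hat to [mean - c, mean + c]."""
    return min(max(theta_hat, mean - c), mean + c)


rng = random.Random(0)
mean, c = 0.5, 0.5                     # hypothetical values of E_G[tau] and c
violations = 0
for _ in range(100_000):
    theta_hat = rng.gauss(mean, 1.0)   # arbitrary draws standing in for the estimator
    theta_f = target_parameter(theta_hat, mean, c)
    # Key step of Proposition 2: clipping toward the mean never increases the distance to it.
    if abs(theta_f - mean) > abs(theta_hat - mean):
        violations += 1
print("violations:", violations)
```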

We now turn to our primary result, proving the superefficiency of $\hat\theta$ in estimating $\theta_{F_n}$.

Proposition 3:

Suppose that Assumption 1 holds and that $\hat\theta$ is a root-$n$ consistent and asymptotically normal estimator of $\mathrm{E}_G[\tau]$. Then $\sqrt{n}(\hat\theta - \theta_{F_n}) = o_p(1)$.

Proof:

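Decompose $\hat\theta$ into $\tilde\theta \sim N(\mathrm{E}_G[\tau], \sigma^2/n)$ and $u = o_p(n^{-1/2})$, so that $\hat\theta = \tilde\theta + u$. We show below that $(\tilde\theta - \theta_{F_n})$ is $o_p(a_n)$ for any positive sequence $(a_n)$, so the rate of convergence of $\hat\theta$ is at worst governed by the bound ensured by the $o_p(n^{-1/2})$ convergence of $u$. To prove the claim, note that for any $\varepsilon > 0$, $\Pr[|\tilde\theta - \theta_{F_n}|/a_n \ge \varepsilon] \le \Pr[\tilde\theta - \theta_{F_n} \ne 0] = 2\Phi(-c\sqrt{n}/\sigma)$, where $\Phi(\cdot)$ denotes the standard normal CDF. Since $\lim_{n\to\infty} 2\Phi(-c\sqrt{n}/\sigma) = 0$, $(\tilde\theta - \theta_{F_n})$ is $o_p(a_n)$. Taking $a_n = n^{-1/2}$ gives $\hat\theta - \theta_{F_n} = o_p(a_n) + o_p(n^{-1/2}) = o_p(n^{-1/2})$, yielding the result. □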
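The bound in this proof is easy to tabulate. The sketch below evaluates $2\Phi(-c\sqrt{n}/\sigma)$, the probability that the idealized normal estimator $\tilde\theta$ lands outside $[\mathrm{E}_G[\tau] - c, \mathrm{E}_G[\tau] + c]$, using the identity $2\Phi(-x) = \operatorname{erfc}(x/\sqrt{2})$ and Python's `math.erfc`. The function name `prob_clipping_binds` and the values $c = 0.5$, $\sigma = 2$ are hypothetical. The bound does not depend on $a_n$, and it decays exponentially in $n$.

```python
import math


def prob_clipping_binds(c: float, sigma: float, n: int) -> float:
    """2 * Phi(-c * sqrt(n) / sigma) for theta_tilde ~ N(E, sigma^2 / n).

    Hypothetical helper; uses the identity 2 * Phi(-x) = erfc(x / sqrt(2)).
    """
    x = c * math.sqrt(n) / sigma
    return math.erfc(x / math.sqrt(2.0))


for n in (1, 10, 100, 1000):
    p = prob_clipping_binds(c=0.5, sigma=2.0, n=n)
    print(f"n = {n:4d}   Pr(estimate differs from target) = {p:.3e}")
```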

In short, Proposition 3 demonstrates that the probability that $\hat\theta$ falls inside the interval $[\mathrm{E}_G[\tau] - c, \mathrm{E}_G[\tau] + c]$, which lies within the range of the support of the effect distribution, converges to one quickly; conditional on this event, the estimation error is exactly zero, because the target parameter then takes the same value as the estimator. To illustrate this result, we can consider a case where the interval defined by the support of the effect distribution encompasses the support of the estimator's sampling distribution.
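A Monte Carlo version of the same statement: the sketch below draws an idealized estimator $\hat\theta \sim N(\mathrm{E}_G[\tau], \sigma^2/n)$, applies Definition 2, and records how often the estimate differs from the target. The settings ($\mathrm{E}_G[\tau] = 0$, $c = 0.5$, $\sigma = 2$) and the function name `simulate_error_rate` are hypothetical; the simulated rates should track $2\Phi(-c\sqrt{n}/\sigma)$ and fall toward zero as $n$ grows.

```python
import math
import random


def simulate_error_rate(mean: float, c: float, sigma: float, n: int,
                        reps: int, seed: int = 0) -> float:
    """Fraction of replicates in which theta_hat != theta_F (hypothetical helper)."""
    rng = random.Random(seed)
    sd = sigma / math.sqrt(n)
    binds = 0
    for _ in range(reps):
        theta_hat = rng.gauss(mean, sd)                     # idealized N(E, sigma^2/n) estimator
        theta_f = min(max(theta_hat, mean - c), mean + c)   # Definition 2
        if theta_hat != theta_f:
            binds += 1
    return binds / reps


for n in (4, 16, 64):
    sim = simulate_error_rate(mean=0.0, c=0.5, sigma=2.0, n=n, reps=100_000)
    exact = math.erfc(0.5 * math.sqrt(n) / (2.0 * math.sqrt(2.0)))   # 2 * Phi(-c sqrt(n) / sigma)
    print(f"n = {n:2d}   simulated {sim:.4f}   2*Phi(-c*sqrt(n)/sigma) = {exact:.4f}")
```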

Corollary 1:

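Suppose that $c \ge \max\{\sup \mathrm{Supp}[\hat\theta] - \mathrm{E}_G[\tau],\ \mathrm{E}_G[\tau] - \inf \mathrm{Supp}[\hat\theta]\}$. Then $\Pr(\hat\theta = \theta_{F_n}) = 1$.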

A proof of Corollary 1 follows by noting that $\Pr(|\hat\theta - \mathrm{E}_G[\tau]| > c) = 0$ and applying Definition 2. In other words, if the support of the estimator being used lies entirely within the interval $[\mathrm{E}_G[\tau] - c, \mathrm{E}_G[\tau] + c]$, then the estimation error is always zero. This condition necessarily holds if $\mathrm{Supp}_G[\tau] = \mathbb{R}$: in that case $c = \infty$, and the value that any estimator $\hat\theta$ takes must coincide with a local average causal effect. But note that Corollary 1 would not hold if $\mathrm{Supp}_G[\tau] = \mathbb{R}^+$ and $\Pr(\hat\theta < 0) > 0$.
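The contrast drawn after Corollary 1 can be checked by simulation. The sketch below is illustrative only and assumes two hypothetical settings: a sample proportion of Bernoulli(0.5) effects (support $\{0, 1\}$, so $c = 0.5$ and the interval is $[0, 1]$), and a normal estimator with standard deviation 1 for Exponential(1) effects (support $[0, \infty)$, so $c = 1$ and the interval is $[0, 2]$). The helper `clip` is a hypothetical name for the rule in Definition 2.

```python
import random


def clip(theta_hat: float, lo: float, hi: float) -> float:
    """Definition 2 with lo = E_G[tau] - c and hi = E_G[tau] + c (hypothetical helper)."""
    return min(max(theta_hat, lo), hi)


rng = random.Random(2)
reps = 10_000

# Hypothetical case 1: Bernoulli(0.5) effects, so E = 0.5, c = 0.5 and the interval is [0, 1].
# A sample proportion of n = 20 effects can only take values in [0, 1].
props = [sum(rng.random() < 0.5 for _ in range(20)) / 20 for _ in range(reps)]
mismatch_bounded = sum(p != clip(p, 0.0, 1.0) for p in props)

# Hypothetical case 2: Exponential(1) effects, so Supp = [0, inf), E = 1, c = 1 and the
# interval is [0, 2]. A N(1, 1) estimator has Pr(theta_hat < 0) > 0, so clipping can bind.
draws = [rng.gauss(1.0, 1.0) for _ in range(reps)]
mismatch_normal = sum(d != clip(d, 0.0, 2.0) for d in draws)
below_zero = sum(d < 0.0 for d in draws)

print("bounded estimator: theta_hat != theta_F in", mismatch_bounded, "of", reps, "draws")
print("normal estimator:  theta_hat != theta_F in", mismatch_normal, "of", reps,
      f"draws ({below_zero} of them below zero)")
```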

Our results can be straightforwardly strengthened. When a regularity condition is imposed on the rate at which $\hat\theta$ converges to normality, a stronger result can be obtained about the rate of mean square convergence.

Proposition 4:

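Suppose that Assumption 1 holds and that $\hat\theta$ obeys $\sqrt{n}(\hat\theta - \mathrm{E}_G[\tau]) = N(0, \sigma^2) + \varepsilon$, where $\mathrm{E}[\varepsilon^2] = o(n^{-1/2})$. Then $\mathrm{E}[(\hat\theta - \theta_{F_n})^2] = o(n^{-1})$.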

Proof:

We will show that the mean square error of $(\tilde\theta - \theta_{F_n})$ converges to zero sufficiently quickly, implying that the rate of convergence of $\hat\theta$ is at worst governed by the mean square error bound ensured by the convergence rate of $\varepsilon$. To obtain the rate of convergence of the mean square error of $\tilde\theta$, we integrate its squared deviation from the target parameter over the sampling distribution of $\tilde\theta$. Within $c$ of $\mathrm{E}_G[\tau]$, the squared deviation is zero, so we need only integrate the squared deviation over the tails of the normal distribution. To ease calculations, we obtain an upper bound by integrating the squared deviation from $\mathrm{E}_G[\tau]$, rather than from $\theta_{F_n}$, writing $x = \tilde\theta - \mathrm{E}_G[\tau]$:

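$$\mathrm{E}_G[(\tilde\theta - \theta_{F_n})^2] \le 2\int_c^\infty x^2\,\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2 n}{2\sigma^2}}\,dx = c\sigma\sqrt{\frac{2}{\pi}}\,\frac{e^{-\frac{c^2 n}{2\sigma^2}}}{\sqrt{n}} + \frac{2\sigma^2\,\Phi\!\left(-\frac{c\sqrt{n}}{\sigma}\right)}{n} = o(n^{-1}).$$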

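Since $\mathrm{E}[(\tilde\theta - \theta_{F_n})^2] = o(n^{-1})$ and $n^{-1/2}\mathrm{E}[\varepsilon^2] = o(n^{-1})$, the Cauchy-Schwarz inequality ensures that $\mathrm{E}[(\hat\theta - \theta_{F_n})^2] = o(n^{-1}) + o(n^{-1}) = o(n^{-1})$. □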
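The closed form in the proof of Proposition 4 can be checked numerically. The integration step uses the standard normal identity $\int_a^\infty z^2\phi(z)\,dz = a\phi(a) + 1 - \Phi(a)$ after the substitution $z = x\sqrt{n}/\sigma$. The sketch below compares a trapezoidal approximation of $2\int_c^\infty x^2\,\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}e^{-x^2 n/(2\sigma^2)}\,dx$ with that closed form and prints $n$ times the bound, which should shrink toward zero in line with the $o(n^{-1})$ claim. The function names, the truncation point, the step count, and the values $c = 0.5$, $\sigma = 1$ are hypothetical choices.

```python
import math


def tail_second_moment_numeric(c: float, sigma: float, n: int,
                               steps: int = 100_000, width_sd: float = 40.0) -> float:
    """Trapezoidal approximation of 2 * int_c^inf x^2 * N(0, sigma^2/n) density dx.

    The infinite upper limit is truncated at c + width_sd standard deviations.
    """
    s = sigma / math.sqrt(n)                 # standard deviation of theta_tilde

    def f(x: float) -> float:
        return x * x * math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

    upper = c + width_sd * s
    h = (upper - c) / steps
    total = 0.5 * (f(c) + f(upper)) + sum(f(c + k * h) for k in range(1, steps))
    return 2.0 * h * total


def tail_second_moment_closed(c: float, sigma: float, n: int) -> float:
    """c*sigma*sqrt(2/pi)*exp(-c^2 n/(2 sigma^2))/sqrt(n) + 2 sigma^2 Phi(-c sqrt(n)/sigma)/n."""
    first = c * sigma * math.sqrt(2 / math.pi) * math.exp(-c * c * n / (2 * sigma ** 2)) / math.sqrt(n)
    phi_neg = 0.5 * math.erfc(c * math.sqrt(n) / (sigma * math.sqrt(2)))   # Phi(-c sqrt(n)/sigma)
    return first + 2 * sigma ** 2 * phi_neg / n


for n in (1, 4, 16, 64):
    numeric = tail_second_moment_numeric(0.5, 1.0, n)
    closed = tail_second_moment_closed(0.5, 1.0, n)
    print(f"n = {n:2d}  numeric {numeric:.6e}  closed {closed:.6e}  n * bound {n * closed:.3e}")
```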

3 Discussion

Our results highlight the additional certainty obtained by data-adaptively choosing the population for which average causal effects are measured on the basis of efficiency. It is well known that efficiency gains may be obtained through data-adaptive inference. But the extent to which the researcher can benefit from such practice has been understated. Under treatment effect heterogeneity – a precondition for locality to be a concern – all root-n consistent and asymptotically normal estimators of the average treatment effect are superefficient for a local average treatment effect.

There is of course a cost to this superefficiency: the target parameter is likely not of intrinsic interest. This issue is not unique to our setting, and other methods that change the inferential target based on efficiency concerns may be subject to this critique. As Crump et al. ([1], p. 188) note, "external validity may be lost by changing the focus to average treatment effects for a subset of the original sample." This is exacerbated in our setting by the researcher's lack of knowledge about the characteristics of the subpopulation under study. Our result represents an extreme case of privileging efficiency over targeting population average causal effects. However, our results provide insight into a potential pathology of data-adaptivity based purely on efficiency concerns: the gains in statistical certainty may be essentially unbounded without further restrictions. We hope that future work in the domain of efficiency theory for data-adaptive parameters will consider classes of restrictions that would exclude the case considered here.

Acknowledgement

The author thanks Don Green, Cyrus Samii, Jas Sekhon, Mark van der Laan, and two anonymous reviewers for helpful comments. The author expresses particular gratitude to Jas Sekhon for suggesting a parsimonious proof strategy for Proposition 3 and to an anonymous reviewer for inspiring Corollary 1. All remaining errors are the author’s responsibility.

References

1. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika 2009. doi:10.1093/biomet/asn055.

2. Humphreys M. Bounds on least squares estimates of causal effects in the presence of heterogeneous assignment probabilities. Manuscript, Columbia University, 2009.

3. Angrist JD, Pischke JS. Mostly harmless econometrics: An empiricist's companion. Princeton, NJ: Princeton University Press, 2009. doi:10.1515/9781400829828.

4. Aronow PM, Samii C. Does regression produce representative estimates of causal effects? Am J Pol Sci 2016;60(1):250–267. doi:10.1111/ajps.12185.

5. Austin PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat 2011;10(2):150–161. doi:10.1002/pst.433.

6. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 1985;39(1):33–38. doi:10.1017/CBO9780511810725.019.

7. Abadie A, Imbens G. Simple and bias-corrected matching estimators for average treatment effects. NBER technical working paper no. 283, 2002. doi:10.3386/t0283.

8. Aronow PM, Green DP, Lee DK. Sharp bounds on the variance in randomized experiments. Ann Stat 2014;42(3):850–871. doi:10.1214/13-AOS1200.

9. Balzer LB, Petersen ML, van der Laan MJ. Targeted estimation and inference for the sample average treatment effect. Berkeley, CA: Bepress, 2015.

10. van der Laan MJ, Hubbard AE, Pajouh SK. Statistical inference for data adaptive target parameters. Princeton, NJ: Bepress, 2013.

Published Online: 2016-11-11
Published in Print: 2016-9-1

©2016 by De Gruyter

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded on 6.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/jci-2016-0007/html