Data-Adaptive Causal Effects and Superefficiency

Peter M. Aronow

doi:10.1515/jci-2016-0007

Open Access Article

Published/Copyright: November 11, 2016

From the journal Journal of Causal Inference, Volume 4, Issue 2

Abstract

Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-n consistent estimator of the population average causal effect is superefficient for a data-adaptive local average causal effect.

Keywords: causal inference; superefficiency; data-adaptive target parameter; local average treatment effect

1 Introduction

When causal effects are heterogeneous, inferences depend on the population for which causal effects are estimated. Although population average causal effects have traditionally been the inferential targets, recent results have focused on estimating average causal effects that are local to some subpopulation for reasons of efficiency. These approaches include trimming observations based on the distribution of the propensity score [1], using regression adjustment to estimate reweighted causal effects [2, 3, 4], or implementing calipers for propensity-score matching [5, 6]. In some cases, the target parameter is dependent on the empirical distribution of the data, including cases where the researcher is explicitly conducting inference on, e.g., the average treatment effect among the treated conditional on the observed covariate distribution [7], or other causal sample functionals [8, 9], without revision to the estimator being used.

These approaches privilege efficiency in estimation over targeting population average causal effects, and often allow for the target to be defined on the basis of the observed data. We provide an example of how these approaches, taken to their extreme, can provide extraordinary gains in statistical certainty. We consider the case of a data-adaptive target parameter [10] that is allowed to vary with the data depending on which subpopulation’s local average causal effect is best estimated. When treatment effects are heterogeneous, adaptively changing the target parameter on the basis of efficiency yields an unusual result: if the population average causal effect can be consistently estimated with a root-n consistent and asymptotically normal estimator $\hat{\theta}$, then the same estimator $\hat{\theta}$ is always superefficient (i.e., faster than root-n consistent) for a data-adaptive local average causal effect. Furthermore, with an additional regularity condition on mean square convergence, we show that the mean square error of $\hat{\theta}$ for a data-adaptive local average causal effect is $o(n^{-1})$.

2 Results

Consider a full data probability distribution $G$ with an associated causal effect distribution $\tau$ with finite expectation $E_G[\tau]$, where $E_G[\cdot]$ denotes the expectation over the distribution $G$. Further denote the support of the distribution of $\tau$ as $\mathrm{Supp}_G[\tau]$. We impose a regularity condition on $\tau$ establishing non-degeneracy of $\tau$.

Assumption 1:

(Effect heterogeneity). $\min\{\sup(\mathrm{Supp}_G[\tau]) - E_G[\tau],\; E_G[\tau] - \inf(\mathrm{Supp}_G[\tau])\} = c > 0$.

Assumption 1 is equivalent to assuming that causal effects are not constant across observations in the distribution $G$; i.e., causal effects are heterogeneous.

We do not observe the full data probability distribution $G$, but we observe an empirical distribution $F_n$. Suppose that, using $F_n$, we have a root-n consistent and asymptotically normal estimator of the average causal effect $E_G[\tau]$, $\hat{\theta}$.

Definition 1:

An estimator $\hat{\theta}$ is root-n consistent and asymptotically normal for $\theta_0$ if $\sqrt{n}(\hat{\theta} - \theta_0) = N(0, \sigma^2) + o_p(1)$, for some $0 < \sigma^2 < \infty$.

We now define the target parameter, $\theta_{F_n}$.

Definition 2:

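Let the target parameter

$$\theta_{F_n} = \begin{cases} \hat{\theta} & : |\hat{\theta} - E_G[\tau]| \le c \\ E_G[\tau] + c & : \hat{\theta} - E_G[\tau] > c \\ E_G[\tau] - c & : \hat{\theta} - E_G[\tau] < -c, \end{cases}$$

where, as in Assumption 1, $c = \min\{\sup \mathrm{Supp}_G[\tau] - E_G[\tau],\; E_G[\tau] - \inf \mathrm{Supp}_G[\tau]\}$.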
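Definition 2 amounts to clipping the estimate to the interval $[E_G[\tau] - c, E_G[\tau] + c]$. The following is a minimal Python sketch of that rule, assuming $E_G[\tau]$ and $c$ are known purely for illustration (in practice they are not); the names `adaptive_target`, `ate`, and `c` are hypothetical and do not come from the paper.

```python
def adaptive_target(theta_hat: float, ate: float, c: float) -> float:
    """Clip theta_hat to [ate - c, ate + c], following Definition 2.

    `ate` stands in for E_G[tau] and `c` for the half-width from
    Assumption 1; both are assumed known here only for illustration.
    """
    if c <= 0:
        raise ValueError("Assumption 1 requires c > 0")
    lower, upper = ate - c, ate + c
    return min(max(theta_hat, lower), upper)


# Hypothetical example: effects supported on [0, 4] with mean 1,
# so c = min(4 - 1, 1 - 0) = 1 and the interval is [0, 2].
print(adaptive_target(1.3, ate=1.0, c=1.0))   # inside: target equals the estimate
print(adaptive_target(2.5, ate=1.0, c=1.0))   # above: target is ate + c
print(adaptive_target(-0.2, ate=1.0, c=1.0))  # below: target is ate - c
```

Whenever the estimate lands inside the interval, the target moves to meet it, which is the mechanism behind the results that follow.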

The target parameter adapts naturally to the closest value in an interval surrounding $E_G[\tau]$, where the width of the interval is defined by the support of $\tau$. We formalize how each $\theta_{F_n}$ is a local average treatment effect.

Proposition 1:

There exists a nonnegative weighting associated with each empirical distribution $F_n$, $w_{F_n}$, such that across all $F_n$, $\theta_{F_n} = \dfrac{E_G[w_{F_n}\tau]}{E_G[w_{F_n}]}$.

A proof of Proposition 1 follows directly from the fact that a weighted mean can attain any value in the interval defined by the infimum and supremum of its distribution’s support. Proposition 1 asserts that across all realizations, the target parameter $\theta_{F_n}$ corresponds to an average causal effect for at least one subpopulation. (There in fact may be infinitely many subpopulations to which $\theta_{F_n}$ corresponds.) The composition of the subpopulation(s) associated with each $\theta_{F_n}$ is not directly knowable by the researcher and may vary across realizations of the data.
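To make the weighting in Proposition 1 concrete, here is a small Python sketch for a hypothetical two-point effect distribution, where the effect equals `a` with probability `p` and `b` otherwise. The function names and the particular choice of weights are illustrative assumptions; many other nonnegative weightings would give the same weighted mean.

```python
def two_point_weights(a: float, b: float, p: float, target: float):
    """Nonnegative weights (w_a, w_b) such that E[w * tau] / E[w] == target,
    for tau taking value a with probability p and b with probability 1 - p."""
    if not (a < b and 0.0 < p < 1.0 and a <= target <= b):
        raise ValueError("need a < b, 0 < p < 1, and a <= target <= b")
    share_a = (b - target) / (b - a)  # weighted share placed on the value a
    return share_a / p, (1.0 - share_a) / (1.0 - p)


def weighted_mean(a: float, b: float, p: float, w_a: float, w_b: float) -> float:
    """E[w * tau] / E[w] for the two-point distribution."""
    numerator = p * w_a * a + (1.0 - p) * w_b * b
    denominator = p * w_a + (1.0 - p) * w_b
    return numerator / denominator


# Hypothetical example: tau = 0 with probability 0.75 and tau = 4 otherwise,
# so E[tau] = 1. Any target between 0 and 4 is a weighted average effect.
w_a, w_b = two_point_weights(0.0, 4.0, 0.75, target=1.6)
print(w_a, w_b, weighted_mean(0.0, 4.0, 0.75, w_a, w_b))
```

The weights depend on the target value, which in turn depends on the realized estimate; this is why the subpopulation changes from one realization of the data to the next.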

However, mirroring results on other data-adaptive parameters under random sampling, including the sample average causal effect, the target parameter $\theta_{F_n}$ will converge to the average causal effect $E_G[\tau]$ at root-n rate. Proposition 2 proves that the data-adaptive local average causal effect is asymptotically equivalent to the average causal effect, and establishes its rate of convergence.

Proposition 2:

Suppose that $\hat{\theta}$ is a root-n consistent and asymptotically normal estimator of $E_G[\tau]$. Then $\sqrt{n}(\theta_{F_n} - E_G[\tau]) = O_p(1)$.

A proof of Proposition 2 follows by noting that $\sqrt{n}(\hat{\theta} - E_G[\tau]) = O_p(1)$ and that across every realization, $|\theta_{F_n} - E_G[\tau]| \le |\hat{\theta} - E_G[\tau]|$.

We now turn to our primary result, proving the superefficiency of $\hat{\theta}$ in estimating $\theta_{F_n}$.

Proposition 3:

Suppose that Assumption 1 holds and that $\hat{\theta}$ is a root-n consistent and asymptotically normal estimator of $E_G[\tau]$. Then $\sqrt{n}(\hat{\theta} - \theta_{F_n}) = o_p(1)$.

Proof:

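Decompose $\hat{\theta}$ into $\tilde{\theta} = N(E_G[\tau], \sigma^2/n)$ and $u = o_p(n^{-1/2})$, so that $\hat{\theta} = \tilde{\theta} + u$. Since $(\tilde{\theta} - \theta_{F_n})$ is $o_p(a_n)$ for any positive sequence $(a_n)$, the rate of convergence of $\hat{\theta}$ is at worst governed by the bound ensured by $u$’s $o_p(n^{-1/2})$ convergence. To prove the claim, note that for any positive $\varepsilon$, $\Pr(|\tilde{\theta} - \theta_{F_n}|/a_n \ge \varepsilon) \le \Pr(\tilde{\theta} - \theta_{F_n} \ne 0) = 2\Phi(-c\sqrt{n}/\sigma)$, where $\Phi(\cdot)$ denotes the standard Normal CDF. Since $\lim_{n\to\infty} 2\Phi(-c\sqrt{n}/\sigma) = 0$, $(\tilde{\theta} - \theta_{F_n})$ is $o_p(a_n)$. Thus $\hat{\theta} - \theta_{F_n} = o_p(a_n) + o_p(n^{-1/2}) = o_p(n^{-1/2})$, yielding the result. □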
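The key quantity in the proof is the tail probability $2\Phi(-c\sqrt{n}/\sigma)$, which shrinks faster than any polynomial rate in $n$. The sketch below evaluates it with Python's standard library; the function name and the values of `c` and `sigma` are arbitrary illustrative assumptions.

```python
import math


def prob_nonzero_error(n: int, c: float, sigma: float) -> float:
    """2 * Phi(-c * sqrt(n) / sigma): the probability that a
    N(E_G[tau], sigma^2 / n) draw lies more than c from E_G[tau]."""
    z = c * math.sqrt(n) / sigma
    # For the standard Normal CDF, 2 * Phi(-z) == erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Arbitrary illustrative values: c = 0.5, sigma = 2.
for n in (10, 100, 1000):
    p = prob_nonzero_error(n, c=0.5, sigma=2.0)
    # Third column rescales by sqrt(n), the rate relevant to Proposition 3.
    print(n, p, math.sqrt(n) * p)
```

Even after multiplying by $\sqrt{n}$, the probability of any estimation error vanishes as $n$ grows, which is the numerical counterpart of the $o_p(n^{-1/2})$ claim.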

In short, Proposition 3 demonstrates that the probability that $\hat{\theta}$ falls inside the support of the effect distribution converges to one quickly; conditional on this event, estimation error is zero (as the target parameter takes on the same value as the estimator with probability one). To illustrate this result, we can consider a case where an interval defined by the support of the effect distribution encompasses the sampling distribution of the estimator.

Corollary 1:

Suppose that $c \ge \max\{\sup \mathrm{Supp}[\hat{\theta}] - E_G[\tau],\; E_G[\tau] - \inf \mathrm{Supp}[\hat{\theta}]\}$. Then $\Pr(\hat{\theta} = \theta_{F_n}) = 1$.

A proof of Corollary 1 follows by noting that $\Pr(|\hat{\theta} - E_G[\tau]| > c) = 0$, and applying Definition 2. In other words, if the support of the estimator being used lies entirely within the interval $[E_G[\tau] - c, E_G[\tau] + c]$, then estimation error is always zero. This condition necessarily holds if $\mathrm{Supp}_G[\tau] = \mathbb{R}$: the value that any estimator $\hat{\theta}$ takes must then coincide with a local average causal effect. But note that Corollary 1 would not hold if $\mathrm{Supp}_G[\tau] = \mathbb{R}^+$ and $\Pr(\hat{\theta} < 0) > 0$.
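Corollary 1 can be illustrated with a small simulation. The setup below is entirely hypothetical: unit-level effects are drawn as Uniform(0, 1), so $E_G[\tau] = 0.5$ and $c = 0.5$, and the estimator is the mean of $n$ effects (treated as observable only for the sake of the sketch), optionally perturbed by Gaussian noise. Without noise, the estimator's support sits inside $[E_G[\tau] - c, E_G[\tau] + c]$; with noise it does not.

```python
import random


def share_with_zero_error(n_reps: int, n: int, noise_sd: float = 0.0,
                          seed: int = 0) -> float:
    """Share of simulated draws in which the estimate equals its clipped target.

    Hypothetical setup: effects are Uniform(0, 1), so E_G[tau] = 0.5, c = 0.5.
    noise_sd = 0 keeps the estimator's support inside [0, 1]; noise_sd > 0
    gives it unbounded support, violating the condition of Corollary 1.
    """
    rng = random.Random(seed)
    ate, c = 0.5, 0.5
    exact = 0
    for _ in range(n_reps):
        theta_hat = sum(rng.random() for _ in range(n)) / n
        if noise_sd > 0:
            theta_hat += rng.gauss(0.0, noise_sd)
        target = min(max(theta_hat, ate - c), ate + c)  # Definition 2
        exact += theta_hat == target
    return exact / n_reps


print(share_with_zero_error(2000, n=5))                # bounded support
print(share_with_zero_error(2000, n=5, noise_sd=1.0))  # unbounded support
```

In the bounded case the estimate and the target coincide in every replication; once the estimator can leave the interval, some replications carry nonzero error.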

Our results can be generalized to stronger claims straightforwardly. When a regularity condition is imposed on the rate of convergence of $\hat{\theta}$ to normality, a stronger result can be obtained about the rate of mean square convergence.

Proposition 4:

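Suppose that Assumption 1 holds and $\hat{\theta}$ obeys $\sqrt{n}(\hat{\theta} - E_G[\tau]) = N(0, \sigma^2) + \varepsilon$, where $E[\varepsilon^2] = o(n^{-1/2})$. Then $E[(\hat{\theta} - \theta_{F_n})^2] = o(n^{-1})$.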

Proof:

We will show that the mean square error of $(\tilde{\theta} - \theta_{F_n})$ converges to zero sufficiently quickly, implying that the rate of convergence of $\hat{\theta}$ is at worst governed by the mean square error bound ensured by $\varepsilon$’s convergence rate. To obtain the rate of convergence of the mean square error of $\tilde{\theta}$, we integrate over its squared deviation from the target parameter. Within $c$ of $E_G[\tau]$, the squared deviation is zero, thus we need only integrate over the squared deviation over the tails of the normal distribution. To ease calculations, we obtain an upper bound by integrating over the squared deviation from $E_G[\tau]$, rather than from $\theta_{F_n}$:

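$$E_G[(\tilde{\theta} - \theta_{F_n})^2] \le 2\int_c^\infty x^2 \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2 n}{2\sigma^2}}\, dx = \frac{c\,\sigma\sqrt{2/\pi}\; e^{-\frac{c^2 n}{2\sigma^2}}}{\sqrt{n}} + \frac{2\sigma^2\, \Phi(-c\sqrt{n}/\sigma)}{n} = o(n^{-1}).$$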

Since $E[(\tilde{\theta} - \theta_{F_n})^2] = o(n^{-1})$ and $n^{-1/2}E[\varepsilon^2] = o(n^{-1})$, the Cauchy-Schwarz inequality ensures that $E[(\hat{\theta} - \theta_{F_n})^2] = o(n^{-1}) + o(n^{-1}) = o(n^{-1})$. □
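The closed form of the tail integral used in the proof can be checked numerically. The sketch below compares it with a simple midpoint-rule approximation of $2\int_c^\infty x^2 f(x)\,dx$, where $f$ is the $N(0, \sigma^2/n)$ density, and then rescales the bound by $n$. The function names, the truncation point, and the parameter values are illustrative assumptions.

```python
import math


def mse_bound(n: int, c: float, sigma: float) -> float:
    """Closed-form upper bound on E[(theta_tilde - theta_Fn)^2]."""
    z = c * math.sqrt(n) / sigma
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # Phi(-z)
    first = c * sigma * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * z * z) / math.sqrt(n)
    second = 2.0 * sigma ** 2 * tail / n
    return first + second


def mse_bound_numeric(n: int, c: float, sigma: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of 2 * integral_c^inf x^2 f(x) dx,
    with f the N(0, sigma^2 / n) density, truncated 12 sd beyond c."""
    s = sigma / math.sqrt(n)
    upper = c + 12.0 * s  # mass beyond this point is negligible
    h = (upper - c) / steps
    total = 0.0
    for i in range(steps):
        x = c + (i + 0.5) * h
        total += x * x * math.exp(-0.5 * (x / s) ** 2)
    return 2.0 * total * h / (s * math.sqrt(2.0 * math.pi))


# Arbitrary illustrative values: c = 1, sigma = 2.
for n in (10, 100, 1000):
    print(n, mse_bound(n, 1.0, 2.0), mse_bound_numeric(n, 1.0, 2.0),
          n * mse_bound(n, 1.0, 2.0))
```

The last column, $n$ times the bound, heads to zero as $n$ grows, consistent with the $o(n^{-1})$ rate claimed for the normal component.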

3 Discussion

Our results highlight the additional certainty obtained by data-adaptively choosing the population for which average causal effects are measured on the basis of efficiency. It is well known that efficiency gains may be obtained through data-adaptive inference. But the extent to which the researcher can benefit from such practice has been understated. Under treatment effect heterogeneity – a precondition for locality to be a concern – all root-n consistent and asymptotically normal estimators of the average treatment effect are superefficient for a local average treatment effect.

There is of course a cost to this superefficiency: the target parameter is likely not of intrinsic interest. This issue is not unique to our setting, and other methods that change the inferential target based on efficiency concerns may be subject to this critique. As Crump et al. ([1], p. 188) note, “external validity may be lost by changing the focus to average treatment effects for a subset of the original sample.” This is exacerbated in our setting by the researcher’s lack of knowledge about the characteristics of the subpopulation under study. Our result represents an extreme case of privileging efficiency over targeting population average causal effects. However, our results provide insight into a potential pathology of data-adaptivity driven purely by efficiency concerns: the gains in statistical certainty may be essentially unbounded without further restrictions. We hope that future work in the domain of efficiency theory for data-adaptive parameters will consider classes of restrictions that would exclude the case considered here.

Acknowledgement

The author thanks Don Green, Cyrus Samii, Jas Sekhon, Mark van der Laan, and two anonymous reviewers for helpful comments. The author expresses particular gratitude to Jas Sekhon for suggesting a parsimonious proof strategy for Proposition 3 and to an anonymous reviewer for inspiring Corollary 1. All remaining errors are the author’s responsibility.

References

1. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika 2009. doi:10.1093/biomet/asn055

2. Humphreys M. Bounds on least squares estimates of causal effects in the presence of heterogeneous assignment probabilities. Columbia University, 2009. Manuscript.

3. Angrist JD, Pischke JS. Mostly harmless econometrics: An empiricist’s companion. Princeton, NJ: Princeton University Press, 2009. doi:10.1515/9781400829828

4. Aronow PM, Samii C. Does regression produce representative estimates of causal effects? Am J Pol Sci 2016;60(1):250–267. doi:10.1111/ajps.12185

5. Austin PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat 2011;10(2):150–161. doi:10.1002/pst.433

6. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 1985;39(1):33–38. doi:10.1017/CBO9780511810725.019

7. Abadie A, Imbens G. Simple and bias-corrected matching estimators for average treatment effects. NBER technical working paper no. 283, 2002. doi:10.3386/t0283

8. Aronow PM, Green DP, Lee DK. Sharp bounds on the variance in randomized experiments. Ann Stat 2014;42(3):850–871. doi:10.1214/13-AOS1200

9. Balzer LB, Petersen ML, van der Laan MJ. Targeted estimation and inference for the sample average treatment effect. Berkeley, CA: Bepress, 2015.

10. van der Laan MJ, Hubbard AE, Pajouh SK. Statistical inference for data adaptive target parameters. Princeton, NJ: Bepress, 2013.

Published Online: 2016-11-11

Published in Print: 2016-9-1

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Creative Commons

BY-NC-ND 3.0