Abstract
Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-
1 Introduction
When causal effects are heterogeneous, inferences depend on the population for which causal effects are estimated. Although population average causal effects have traditionally been the inferential targets, recent work has focused on estimating average causal effects that are local to some subpopulation for reasons of efficiency. These approaches include trimming observations based on the distribution of the propensity score [1], using regression adjustment to estimate reweighted causal effects [2, 3, 4], and implementing calipers for propensity-score matching [5, 6]. In some cases, the target parameter depends on the empirical distribution of the data, including cases where the researcher is explicitly conducting inference on, e.g., the average treatment effect among the treated conditional on the observed covariate distribution [7], or other causal sample functionals [8, 9], without revision to the estimator being used.
These approaches privilege efficiency in estimation over targeting population average causal effects, and often allow for the target to be defined on the basis of the observed data. We provide an example of how these approaches, taken to their extreme, can provide extraordinary gains in statistical certainty. We consider the case of a data-adaptive target parameter [10] that is allowed to vary with the data depending on which subpopulation’s local average causal effect is best estimated. When treatment effects are heterogeneous, adaptively changing the target parameter on the basis of efficiency yields an unusual result: if the population average causal effect can be consistently estimated with a root-
2 Results
Consider a full data probability distribution
(Effect heterogeneity).
Assumption 1 is equivalent to assuming that causal effects are not constant across observations in the distribution
We do not observe the full data probability distribution
An estimator
We now define the target parameter,
Let the target parameter
where, as in Assumption 1,
The target parameter adapts naturally to the closest value in an interval surrounding
There exists a nonnegative weighting associated with each empirical distribution
A proof of Proposition 1 follows directly from the fact that a weighted mean can obtain any value in the interval defined by the infimum and supremum of its distribution’s support. Proposition 1 asserts that across all realizations, the target parameter
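The weighted-mean fact used in this proof can be checked numerically. The following is a minimal sketch for a finite list of effects, not the paper's construction: the function names `two_point_weights` and `weighted_mean` and the example values are hypothetical, and the finite minimum and maximum stand in for the infimum and supremum of the support.

```python
def two_point_weights(effects, target):
    """Return nonnegative weights (summing to one) whose weighted mean of
    `effects` equals `target`. Illustrative helper, not from the paper."""
    lo, hi = min(effects), max(effects)
    if not lo <= target <= hi:
        raise ValueError("target must lie between the smallest and largest effect")
    weights = [0.0] * len(effects)
    if hi == lo:
        # Constant effects: every weighting yields the same mean.
        weights[0] = 1.0
        return weights
    # Put all mass on the two extreme effects; the share on the largest
    # effect is chosen so the weighted mean lands exactly on the target.
    share_hi = (target - lo) / (hi - lo)
    weights[effects.index(lo)] = 1.0 - share_hi
    weights[effects.index(hi)] = share_hi
    return weights


def weighted_mean(effects, weights):
    """Weighted mean of `effects` under nonnegative `weights`."""
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Hypothetical heterogeneous unit-level effects.
effects = [-1.0, 0.5, 2.0, 3.0]
weights = two_point_weights(effects, 1.2)
print(weights, weighted_mean(effects, weights))
```

Placing weight only on the two extremes is one of many valid weightings. It is used here because it makes the attainability of every intermediate value immediate.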
However, mirroring results on other data-adaptive parameters under random sampling, including the sample average causal effect, the target parameter
Suppose that
A proof of Proposition 2 follows by noting that
We now turn to our primary result, proving the superefficiency of
Suppose that Assumption 1 holds and that
Decompose
In short, Proposition 3 demonstrates that the probability that
Suppose that
A proof of Corollary 1 follows by noting that
Our results can be generalized to stronger claims straightforwardly. When a regularity condition is imposed on the rate of convergence of
Suppose that Assumption 1 holds and
We will show that the mean square error of
Since
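The superefficiency phenomenon can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the paper's exact setup: the target is taken to be the point closest to the estimate within a fixed interval around the population average effect, the estimator is drawn directly from a normal distribution with standard deviation shrinking at the root-n rate, and `tau`, `eps`, `sd`, `clamp`, and `share_exact` are hypothetical names and values.

```python
import random


def clamp(value, lower, upper):
    """Closest point to `value` within the interval [lower, upper]."""
    return max(lower, min(upper, value))


def share_exact(n, tau=1.0, eps=0.25, sd=2.0, reps=2000, seed=0):
    """Fraction of simulated samples in which the estimate equals the
    data-adaptive target exactly.

    Assumptions (all hypothetical): `tau` is the population average effect,
    `eps` is the half-width of the interval of attainable weighted average
    effects, and the estimator is approximately N(tau, sd**2 / n).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Stand-in for a root-n consistent, asymptotically normal estimator.
        estimate = rng.gauss(tau, sd / n ** 0.5)
        # The target adapts to the closest value in the interval around tau.
        target = clamp(estimate, tau - eps, tau + eps)
        hits += estimate == target
    return hits / reps


for n in (10, 100, 1000, 10000):
    print(n, share_exact(n))
```

Under these assumptions, the share of samples with zero estimation error relative to the adaptive target rises toward one as n grows, because the sampling distribution of the estimate concentrates inside the fixed interval.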
3 Discussion
Our results highlight the additional certainty obtained by data-adaptively choosing the population for which average causal effects are measured on the basis of efficiency. It is well known that efficiency gains may be obtained through data-adaptive inference. But the extent to which the researcher can benefit from such practice has been understated. Under treatment effect heterogeneity – a precondition for locality to be a concern – all root-
There is, of course, a cost to this superefficiency: the target parameter is likely not of intrinsic interest. This issue is not unique to our setting, and other methods that change the inferential target based on efficiency concerns may be subject to the same critique. As Crump et al. ([1], p. 188) note, “external validity may be lost by changing the focus to average treatment effects for a subset of the original sample.” This is exacerbated in our setting by the researcher’s lack of knowledge about the characteristics of the subpopulation under study. Our result represents an extreme case of privileging efficiency over targeting population average causal effects. However, our results provide insight into a potential pathology of data-adaptivity based purely on efficiency concerns: the gains in statistical certainty may be essentially unbounded without further restrictions. We hope that future work in the domain of efficiency theory for data-adaptive parameters will consider classes of restrictions that would exclude the case considered here.
Acknowledgement
The author thanks Don Green, Cyrus Samii, Jas Sekhon, Mark van der Laan, and two anonymous reviewers for helpful comments. The author expresses particular gratitude to Jas Sekhon for suggesting a parsimonious proof strategy for Proposition 3 and to an anonymous reviewer for inspiring Corollary 1. All remaining errors are the author’s responsibility.
References
1. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika 2009. doi:10.1093/biomet/asn055
2. Humphreys M. Bounds on least squares estimates of causal effects in the presence of heterogeneous assignment probabilities. Manuscript, Columbia University, 2009.
3. Angrist JD, Pischke JS. Mostly harmless econometrics: An empiricist’s companion. Princeton, NJ: Princeton University Press, 2009. doi:10.1515/9781400829828
4. Aronow PM, Samii C. Does regression produce representative estimates of causal effects? Am J Pol Sci 2016;60(1):250–267. doi:10.1111/ajps.12185
5. Austin PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat 2011;10(2):150–161. doi:10.1002/pst.433
6. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 1985;39(1):33–38. doi:10.1017/CBO9780511810725.019
7. Abadie A, Imbens G. Simple and bias-corrected matching estimators for average treatment effects. NBER technical working paper no. 283, 2002. doi:10.3386/t0283
8. Aronow PM, Green DP, Lee DK. Sharp bounds on the variance in randomized experiments. Ann Stat 2014;42(3):850–871. doi:10.1214/13-AOS1200
9. Balzer LB, Petersen ML, van der Laan MJ. Targeted estimation and inference for the sample average treatment effect. Berkeley, CA: Bepress, 2015.
10. van der Laan MJ, Hubbard AE, Pajouh SK. Statistical inference for data adaptive target parameters. Princeton, NJ: Bepress, 2013.
©2016 by De Gruyter
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.