Abstract
A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today’s scientists across disciplines. In this article, we demonstrate that Neyman’s methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning (ML) algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman’s approach is that it can be applied to any ITR regardless of the properties of ML algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman’s repeated sampling framework is as relevant for causal inference today as it has been since its inception.
1 Introduction
Neyman’s seminal 1923 paper introduced two foundational ideas in causal inference [1]. First, Neyman developed a formal notation for potential outcomes and defined the average treatment effect (ATE) as a causal quantity of interest. Second, he showed how randomization of treatment assignment alone can be used to establish the unbiasedness and estimation uncertainty of the standard difference-in-means estimator. Since then, combined with the additional assumption of random sampling of units, Neyman’s repeated sampling framework has served as a basis of routine experimental analyses conducted by scientists across many disciplines.
Over the past two decades, however, the causal inference literature has gone beyond the ATE. Specifically, the realization that the same treatment can have varying impacts on different individuals led to the development of statistical methods and machine learning (ML) algorithms for estimating heterogeneous treatment effects (e.g., [2–5]). Furthermore, a number of researchers have developed various methods for deriving data-driven individualized treatment rules (ITRs) (e.g., [6–13]). With an increasing availability of granular data and modern computing power, these ITRs are becoming popular in business, medicine, politics, and even public policy.
In this article, we demonstrate that Neyman’s repeated sampling framework is still relevant for today’s causal ML methods. We show how the framework can be used to experimentally evaluate the efficacy of any ITR (including those obtained with ML algorithms via cross-fitting) under a minimal set of assumptions. While some of our formal results were originally derived in our previously published work [14] or follow directly from it, we focus on the intuition behind those theoretical results to facilitate future extensions to other settings.
We also show, using Neyman’s framework, that it is not always statistically more efficient to evaluate an ITR by conducting a new randomized experiment where the treatment is the administration of the ITR itself (i.e., ex-ante evaluation) than simply using the data from an existing randomized controlled trial (i.e., ex-post evaluation). Altogether, this article shows how Neyman’s classical methodological framework can be applied to solve today’s causal inference problems.
2 Neyman’s repeated sampling framework
We begin by briefly introducing Neyman’s inferential approach to estimating the ATE. Suppose that we have a sample of $n$ units, of which $n_1$ units are assigned to the treatment condition and the remaining $n_0 = n - n_1$ units to the control condition. Let $T_i \in \{0, 1\}$ denote the treatment assignment indicator for unit $i$, and let $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes under treatment and control, respectively, so that the observed outcome is $Y_i = Y_i(T_i)$.
As pointed out by Rubin in his discussion of Neyman’s 1923 paper [15], the aforementioned setup implicitly assumes no interference between units; that is, the outcome of one unit is not influenced by the treatment assignment of another unit. We explicitly state this assumption below.
Assumption 1
(No interference between units) The potential outcomes for unit $i$ do not depend on the treatment assignments of the other units, i.e., $Y_i(T_1, T_2, \ldots, T_n) = Y_i(T_i)$ for $i = 1, \ldots, n$.
Neyman considered the classical randomized experiment where the treatment assignment is completely randomized, with exactly $n_1$ units receiving the treatment and the remaining $n_0$ units assigned to the control condition.
Assumption 2
(Complete randomization of treatment assignment) The treatment assignment probability is given by
$$\Pr(T_i = 1 \mid Y_i(1), Y_i(0)) = \frac{n_1}{n}$$
for each $i = 1, \ldots, n$, with $\sum_{i=1}^n T_i = n_1$.
Under these two assumptions alone, Neyman showed that the following sample average treatment effect (SATE) can be estimated without bias:
$$\tau_{\mathrm{SATE}} = \frac{1}{n}\sum_{i=1}^n \{Y_i(1) - Y_i(0)\}.$$
Using the difference-in-means estimator,
$$\hat\tau = \frac{1}{n_1}\sum_{i=1}^n T_i\,Y_i - \frac{1}{n_0}\sum_{i=1}^n (1 - T_i)\,Y_i,$$
we have $\mathbb{E}(\hat\tau \mid \{Y_i(1), Y_i(0)\}_{i=1}^n) = \tau_{\mathrm{SATE}}$, where the expectation is taken over the randomization of the treatment assignment.
Neyman also showed that the variance of this estimator is not identifiable, but a conservative variance can be estimated from the data without bias:
$$\widehat{\mathbb{V}}(\hat\tau) = \frac{\hat\sigma_1^2}{n_1} + \frac{\hat\sigma_0^2}{n_0},$$
where $\hat\sigma_1^2$ and $\hat\sigma_0^2$ denote the sample variances of the observed outcomes in the treatment and control groups, respectively. The variance itself is not identifiable because it depends on the variance of the unit-level treatment effects, $Y_i(1) - Y_i(0)$, and the two potential outcomes are never jointly observed for the same unit.
Neyman obtained the aforementioned results by averaging over all possible treatment assignments under complete randomization. Subsequent work has extended Neyman’s framework to a superpopulation setting by assuming that the sample of $n$ units is randomly drawn from a larger target population.
Assumption 3
(Random sampling of units) Each of the $n$ units is independently sampled from the target population $\mathcal{P}$, i.e., $(X_i, Y_i(1), Y_i(0)) \stackrel{\text{i.i.d.}}{\sim} \mathcal{P}$, where $X_i$ denotes the observed pre-treatment covariates of unit $i$.
This extended framework, which we call Neyman’s repeated sampling framework, is useful because it allows us to estimate, from the sample, the population average treatment effect (PATE),
$$\tau = \mathbb{E}[Y_i(1) - Y_i(0)].$$
Subsequent work has shown that the difference-in-means estimator is unbiased for the PATE and that its exact variance,
$$\mathbb{V}(\hat\tau) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_0^2}{n_0},$$
can be estimated without bias [16], where $\sigma_1^2 = \mathbb{V}(Y_i(1))$ and $\sigma_0^2 = \mathbb{V}(Y_i(0))$ are the population variances of the two potential outcomes. In other words, Neyman’s conservative variance estimator given above becomes unbiased once the uncertainty due to the random sampling of units is taken into account.
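To make these classical results concrete, the following short simulation is a minimal sketch (ours, not part of the original article; the data-generating process and function names are illustrative) of the difference-in-means estimate and Neyman’s conservative variance estimate for a completely randomized experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def neyman_ate(y, t):
    """Difference-in-means estimate and Neyman's conservative variance estimate."""
    y1, y0 = y[t == 1], y[t == 0]
    ate_hat = y1.mean() - y0.mean()
    # within-arm sample variances (ddof=1 gives the unbiased variance)
    var_hat = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    return ate_hat, var_hat

# toy completely randomized experiment: n units, n1 treated
n, n1 = 1000, 500
x = rng.normal(size=n)
y0_pot = x + rng.normal(size=n)        # potential outcome under control
y1_pot = y0_pot + 1.0 + 0.5 * x        # heterogeneous treatment effect
t = np.zeros(n, dtype=int)
t[rng.choice(n, size=n1, replace=False)] = 1   # complete randomization
y = np.where(t == 1, y1_pot, y0_pot)

ate_hat, var_hat = neyman_ate(y, t)
print(ate_hat, 1.96 * np.sqrt(var_hat))        # estimate and 95% CI half-width
```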
In the remainder of this article, we will show that Neyman’s repeated sampling framework enables an assumption-free experimental evaluation of data-driven ITRs.
3 Experimental evaluation of ITRs
In this section, we explain how Neyman’s repeated sampling framework can be applied to experimentally evaluate the empirical performance of ITRs, which assign each unit to either the treatment or the control condition based on the unit’s observed characteristics.
3.1 Setup
Suppose that we use an ML algorithm to create an ITR, $f: \mathcal{X} \to \{0, 1\}$, which maps each unit’s observed covariates $X_i \in \mathcal{X}$ to a binary treatment decision, with $f(X_i) = 1$ indicating that the unit should be treated.
Most commonly, researchers first estimate the conditional average treatment effect (CATE) [see, e.g., 7,17–21]:
$$\tau(x) = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = x].$$
They then derive an ITR as the treatment rule that assigns the treatment to everyone who is predicted to have a positive CATE, i.e., $f(x) = \mathbf{1}\{\hat\tau(x) > 0\}$, where $\hat\tau(x)$ denotes the estimated CATE.
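As an illustration of this construction, here is a minimal sketch (our own; the T-learner with random forests is just one of many possible choices and is not prescribed by the works cited above) that estimates the CATE and thresholds it at zero to obtain an ITR.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_itr(x, y, t):
    """Estimate the CATE with a simple T-learner and return the ITR
    f(x) = 1{estimated CATE > 0}."""
    m1 = RandomForestRegressor(random_state=0).fit(x[t == 1], y[t == 1])
    m0 = RandomForestRegressor(random_state=0).fit(x[t == 0], y[t == 0])

    def itr(x_new):
        cate_hat = m1.predict(x_new) - m0.predict(x_new)
        return (cate_hat > 0).astype(int)

    return itr
```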
Our goal is to evaluate the empirical performance of an ITR without assuming that the ITR is optimal. In the aforementioned example, we do not assume that the estimated CATE, $\hat\tau(x)$, is a consistent estimate of the true CATE, $\tau(x)$.
To measure the performance of an ITR, we consider two quantities. The first is the population average value (PAV), which is defined as
$$\lambda_f = \mathbb{E}[Y_i(f(X_i))].$$
This is the standard metric of an ITR’s overall performance. The second quantity is the population average prescriptive effect (PAPE) [14,22], which measures the benefit of the ITR and is defined as follows:
$$\tau_f = \mathbb{E}[Y_i(f(X_i))] - \{p_f\,\mathbb{E}[Y_i(1)] + (1 - p_f)\,\mathbb{E}[Y_i(0)]\}, \quad (4)$$
where $p_f = \Pr(f(X_i) = 1)$ is the proportion of units treated under the ITR.
The PAPE compares the performance of the ITR against the non-individualized rule that treats the same proportion of randomly selected individuals. This contrasts with other quantities considered in the literature, such as the targeting operator characteristic (TOC) [23], which compares the performance of the ITR against the non-individualized rule that treats everyone. Unlike the TOC, the PAPE focuses on the benefit of determining which individuals should be treated while holding the proportion of those who receive the treatment constant.
To gain additional intuition about the PAPE, consider the following alternative but equivalent expression of the same quantity:
$$\tau_f = \mathbb{E}[(f(X_i) - p_f)(Y_i(1) - Y_i(0))] = \mathrm{Cov}(f(X_i),\, Y_i(1) - Y_i(0)).$$
This alternative expression shows that the PAPE measures how well the ITR agrees with the true individual treatment effect (ITE), $Y_i(1) - Y_i(0)$. To compare across datasets, we can further normalize the PAPE as the correlation between the ITR and the true ITE, i.e., $\tau_f / \sqrt{p_f(1 - p_f)\,\mathbb{V}(Y_i(1) - Y_i(0))}$.
Although this provides a scale-invariant quantity for understanding the performance of the ITR, it is not identifiable from the data because we cannot identify the variance of the ITE, $\mathbb{V}(Y_i(1) - Y_i(0))$, without jointly observing both potential outcomes for the same unit.
The aforementioned equality further implies the following inequality by applying the Cauchy–Schwarz inequality twice:
$$|\tau_f| \le \sqrt{p_f(1 - p_f)}\left\{\sqrt{\mathbb{V}(Y_i(1))} + \sqrt{\mathbb{V}(Y_i(0))}\right\}.$$
Therefore, the PAPE is bounded provided that the second moments of the potential outcomes exist. Thus, given a fixed variance of the potential outcomes, the bound is largest when the ITR treats one half of the population, i.e., $p_f = 1/2$.
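For completeness, here is a sketch of the two Cauchy–Schwarz steps behind this bound, written in terms of the covariance representation of the PAPE given above (our reconstruction of the argument):

```latex
\begin{align*}
|\tau_f|
  &= |\mathrm{Cov}(f(X_i),\, Y_i(1) - Y_i(0))| \\
  &\leq \sqrt{\mathbb{V}(f(X_i))}\,\sqrt{\mathbb{V}(Y_i(1) - Y_i(0))}
     && \text{(Cauchy--Schwarz)} \\
  &= \sqrt{p_f(1 - p_f)}\,\sqrt{\mathbb{V}(Y_i(1) - Y_i(0))}
     && (f(X_i)\text{ is binary with mean } p_f) \\
  &\leq \sqrt{p_f(1 - p_f)}
     \left\{\sqrt{\mathbb{V}(Y_i(1))} + \sqrt{\mathbb{V}(Y_i(0))}\right\}
     && \text{(Cauchy--Schwarz applied to } \mathrm{Cov}(Y_i(1), Y_i(0))\text{)}
\end{align*}
```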
3.2 Estimation and inference
To estimate the PAV and PAPE under Neyman’s repeated sampling framework, we consider the following “difference-in-means”-type estimators:
$$\hat\lambda_f = \frac{1}{n_1}\sum_{i=1}^n f(X_i)\,T_i\,Y_i + \frac{1}{n_0}\sum_{i=1}^n \{1 - f(X_i)\}(1 - T_i)\,Y_i, \quad (6)$$
$$\hat\tau_f = \frac{n}{n - 1}\left[\hat\lambda_f - \left\{\frac{\hat{p}_f}{n_1}\sum_{i=1}^n T_i\,Y_i + \frac{1 - \hat{p}_f}{n_0}\sum_{i=1}^n (1 - T_i)\,Y_i\right\}\right], \quad (7)$$
where $\hat{p}_f = \sum_{i=1}^n f(X_i)/n$ is the sample proportion of units that the ITR would treat.
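A minimal implementation sketch of these two estimators follows (function and variable names are ours; the $n/(n-1)$ finite-sample factor follows the reconstruction of equation (7) above):

```python
import numpy as np

def pav_estimate(y, t, f):
    """PAV estimator: average observed outcome among units whose treatment
    assignment agrees with the ITR f (a 0/1 vector), combined across arms."""
    n1, n0 = t.sum(), (1 - t).sum()
    return (f * t * y).sum() / n1 + ((1 - f) * (1 - t) * y).sum() / n0

def pape_estimate(y, t, f):
    """PAPE estimator: PAV minus the estimated value of a random rule
    that treats the same proportion of units as the ITR."""
    n = len(y)
    n1, n0 = t.sum(), (1 - t).sum()
    p_hat = f.mean()  # proportion of units the ITR would treat
    random_value = (p_hat * (t * y).sum() / n1
                    + (1 - p_hat) * ((1 - t) * y).sum() / n0)
    return n / (n - 1) * (pav_estimate(y, t, f) - random_value)
```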
The following theorems, reproduced from our previously published work [14], show that, under Neyman’s repeated sampling framework, these two estimators are unbiased and the finite-sample variances can be derived.
Theorem 1
(Unbiasedness and variance of the PAV estimator [14]) Under Assumptions 1–3, the expectation and variance of the PAV estimator defined in equation (6) are given by
$$\mathbb{E}(\hat\lambda_f) = \lambda_f, \qquad \mathbb{V}(\hat\lambda_f) = \frac{\mathbb{V}(Y_{fi}(1))}{n_1} + \frac{\mathbb{V}(Y_{fi}(0))}{n_0},$$
where $Y_{fi}(1) = f(X_i)\,Y_i(1)$ and $Y_{fi}(0) = \{1 - f(X_i)\}\,Y_i(0)$ are the relevant potential outcomes under the ITR.
Theorem 2
(Unbiasedness and variance of the PAPE estimator [14]) Under Assumptions 1–3, the PAPE estimator defined in equation (7) is unbiased, $\mathbb{E}(\hat\tau_f) = \tau_f$, and its exact finite-sample variance can be derived in closed form; the expression, given in [14], involves the variances of the relevant potential outcomes as well as a correlation term induced by the mean-adjusted ITR, $f(X_i) - p_f$.
The properties of the PAV estimator shown above follow immediately from Neyman’s classic results by replacing the potential outcome $Y_i(t)$ with the relevant potential outcome $Y_{fi}(t)$ defined in Theorem 1. That is, evaluating an ITR is equivalent to estimating the ATE in a transformed experiment in which a unit’s outcome is set to zero whenever its treatment assignment disagrees with the ITR.
We can also compare the results of the PAPE estimator with Neyman’s classic results. We observe that the relevant potential outcome now involves the mean-adjusted ITR, $f(X_i) - p_f$, so that the variance of the PAPE estimator contains an additional correlation term.
The correlation can be broadly decomposed into two components: (1) a negative component caused by the negative correlation within the mean-adjusted ITR
On the other hand, for
Thus, this additional term is only likely to be positive under a scenario where
3.3 Performance comparison among multiple ITRs
While the PAPE compares an ITR with a random treatment assignment rule that treats the same proportion of units, researchers are often interested in comparing the performance of multiple ITRs. In such cases, we recommend estimating the difference in PAV between two ITRs that are subject to the same budget constraint. Imai and Li [14] provided the details of estimation and inference regarding this quantity.
The use of the PAPE for comparing two ITRs is sometimes inappropriate. To see this, note that the difference in PAPE between two ITRs $f$ and $g$ can be written as the covariance between the agreement of the two ITRs and the true ITE:
$$\tau_f - \tau_g = \mathrm{Cov}(f(X_i) - g(X_i),\, Y_i(1) - Y_i(0)).$$
This expression shows that, for ITRs with similar treatment proportions, the sign of this difference indicates the relative capability of the ITRs in identifying the optimal individuals to treat. However, if the two ITRs treat substantially different proportions of units, the difference in PAPE also reflects the difference in treated proportions, making the comparison difficult to interpret.

Figure 1: Illustration of the PAPE for two different ITRs.
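Following the recommendation above, the comparison via the difference in estimated PAV can be sketched as follows (reusing `pav_estimate` from the sketch in Section 3.2; this simple version presumes both rules already treat comparable proportions of units, whereas the budget-constrained comparison in [14] requires additional care):

```python
def pav_difference(y, t, f, g):
    """Estimated difference in PAV between two ITRs f and g (0/1 vectors)."""
    return pav_estimate(y, t, f) - pav_estimate(y, t, g)
```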
3.4 Lack of invariance
Unlike Neyman’s ATE estimator, the PAV and PAPE estimators are not invariant to a constant shift of the outcome variable. One might expect that adding a constant $c$ to the outcome, i.e., replacing $Y_i$ with $Y_i + c$, would leave the estimators essentially unchanged; instead, it introduces an additional random term into each estimator. Since this term equals zero in expectation, the two estimators remain unbiased. However, the shift affects their variances because the ITR treats only a portion of the sample, so the constant enters each estimator through random, ITR-dependent weights.
We now derive the constant shift that minimizes the variance of each estimator.
Proposition 1
(Minimum variance estimators) The variances of the constant-shift estimators can be written as quadratic functions of the shift $c$, so each is minimized at a closed-form value of $c$ that depends on the joint distribution of the ITR and the potential outcomes.
The proof is in Appendix B. The proposition implies that an appropriate choice of the constant shift minimizes the variance of each estimator.
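The lack of invariance is easy to see in simulation. The following sketch (ours; the data-generating process is illustrative) shows that the Monte Carlo standard deviation of the PAV estimator changes as a constant is added to the potential outcomes, even though the estimand only shifts by that constant.

```python
import numpy as np

rng = np.random.default_rng(1)

def pav_estimate(y, t, f):
    n1, n0 = t.sum(), (1 - t).sum()
    return (f * t * y).sum() / n1 + ((1 - f) * (1 - t) * y).sum() / n0

def pav_sd(shift, n=500, reps=2000):
    """Monte Carlo SD of the PAV estimator when outcomes are shifted by a constant."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        y0_pot = x + rng.normal(size=n) + shift
        y1_pot = y0_pot + 1.0 + x          # heterogeneous effect
        f = (x > 0).astype(int)            # a fixed ITR
        t = np.zeros(n, dtype=int)
        t[rng.choice(n, size=n // 2, replace=False)] = 1
        y = np.where(t == 1, y1_pot, y0_pot)
        est[r] = pav_estimate(y, t, f)
    return est.std()

for c in (-5.0, 0.0, 5.0):
    print(c, pav_sd(c))   # the standard deviation varies with the shift c
```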
4 Ex-ante vs ex-post experimental evaluations
So far, we have considered an ex-post evaluation, in which we first conduct a completely randomized experiment and then evaluate ITRs using the data from the experiment. Alternatively, researchers may consider an ex-ante experimental evaluation, in which we randomly assign units to an ITR, i.e., the ITR itself is the “treatment” of this experiment. Ex-ante experimental designs are commonly used in practice [see 24,25, for example].
We apply Neyman’s repeated sampling framework to compare the statistical efficiency of ex-ante and ex-post experimental evaluations. We show below that, perhaps surprisingly, in some cases the ex-post evaluation is more efficient than the ex-ante evaluation. This result suggests that, given the potential ethical concerns of ex-ante experimental evaluation, researchers may prefer ex-post evaluation. Another reason to prefer ex-post evaluation is that this design allows one to evaluate any number of ITRs, whereas an ex-ante evaluation is tied to a particular ITR. In this section, our analysis focuses on the PAPE; since the PAV does not compare two different treatment regimes, it does not make sense to design a randomized trial around it.
4.1 Setup
For the ex-ante evaluation of the PAPE, we assume a simple random sample of units from the target population, each of which is randomly assigned either to the ITR or to the random treatment rule that treats the same proportion of units.
Assumption 4
(Complete randomization in the ex-ante evaluation of PAPE) The probability of being assigned to the ITR rather than the random treatment rule is fixed and identical for each $i = 1, \ldots, n$, where $Z_i \in \{0, 1\}$ indicates assignment to the ITR arm. In addition, within the random-treatment-rule arm, the treatment assignment is completely randomized for each unit with $Z_i = 0$.
Using these experimental data, we wish to estimate the PAPE defined in equation (4). For simplicity, we set the number of treated units under the random treatment rule equal to that under the ITR condition.
We consider the following estimator of the PAPE for the ex-ante experimental evaluation that accounts for a potential difference in the proportion of treated units between the ITR and the random treatment rule by appropriately weighting the latter:
The ex-ante evaluation differs from the ex-post evaluation in two ways. First, the ex-ante estimator requires two separate random assignments: the assignment of each unit to the ITR or the random treatment rule, and the randomized treatment assignment within the random-treatment-rule arm. Second, the ITR must be fixed before the experiment is conducted.
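To fix ideas, the following is a minimal sketch of an ex-ante evaluation in the simplest case where the random treatment rule treats exactly the same proportion of units as the ITR, so that the PAPE estimate reduces to a difference in mean outcomes between the two arms (the weighting in equation (8) that handles unequal proportions is omitted; all names and the data-generating process are ours).

```python
import numpy as np

rng = np.random.default_rng(2)

# toy ex-ante experiment: assign half the units to the ITR arm (z = 1)
# and half to the random-treatment-rule arm (z = 0)
n = 1000
x = rng.normal(size=n)
y0_pot = x + rng.normal(size=n)
y1_pot = y0_pot + x                       # treatment helps only when x > 0
f = (x > 0).astype(int)                   # the ITR under evaluation
z = np.zeros(n, dtype=int)
z[rng.choice(n, size=n // 2, replace=False)] = 1

t = np.where(z == 1, f, 0)                # ITR arm follows f
idx0 = np.flatnonzero(z == 0)             # random arm treats the same proportion
n_treat0 = round(f.mean() * len(idx0))
t[rng.choice(idx0, size=n_treat0, replace=False)] = 1

y = np.where(t == 1, y1_pot, y0_pot)
pape_hat = y[z == 1].mean() - y[z == 0].mean()
print(pape_hat)                           # approximates the PAPE of f
```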
4.2 Comparison of the two experimental designs
Before comparing the two modes of evaluation, we derive the bias and variance of the ex-ante evaluation estimator under Neyman’s repeated sampling framework. In this case, the uncertainty comes from three types of randomness: (1) the random assignment of units to the individualized or random treatment rule, (2) the randomized treatment assignment under the random treatment rule, and (3) the simple random sampling of units from the target population. The next theorem shows that this estimator is unbiased and that its variance is identifiable. The proof is given in Appendix A.
Theorem 3
(Unbiasedness and variance of the ex-ante PAPE estimator) Under Assumptions 1, 3, and 4, the ex-ante PAPE estimator defined in equation (8) is unbiased for the PAPE, and its variance can be derived in closed form with every term identifiable from the experimental data.
Given these results, we examine the relative statistical efficiency of the ex-post and ex-ante experimental evaluations. To facilitate the comparison, we impose a set of simplifying assumptions.
Under this simplified setting, the difference in the variance of the PAPE estimator between the ex-ante and ex-post evaluations can be derived in closed form; the details of the derivation are given in Appendix C. Suppose now that the ITR correctly assigns individuals on average. Then, this variance difference is positive.
The result implies that, under a set of simplifying assumptions, the ex-post evaluation is more efficient than the ex-ante evaluation. We note, however, that this conclusion may not hold if the ex-ante and ex-post setups have a sample allocation different from the setting considered here.
5 Incorporating the uncertainty of ML training
In the previous sections, we assumed that the ITR to be evaluated is given. For example, an ITR may be derived using an external dataset. In many cases, however, researchers may wish to use the same experimental dataset both to derive an ITR and to evaluate it. One possibility is to randomly split the dataset into training and evaluation sets, and then use the former to learn an ITR and the latter for its evaluation. Unfortunately, this sample-splitting approach does not use the data most efficiently.
An alternative and more efficient approach is cross-fitting. The idea is to randomly split the data into $K$ folds, learn an ITR using all but one fold, and evaluate it on the held-out fold, repeating the process so that each fold is used once for evaluation.
While the dominant “double machine learning” (DML) approach uses the same cross-fitting procedure [26], we show here that Neyman’s repeated sampling framework can also incorporate cross-fitting. Unlike DML, Neyman’s framework enables us to derive the finite-sample properties of ITR evaluation based solely on the random splitting of the data, the randomization of treatment assignment, and the random sampling of units.
5.1 Setup
Consider a generic ML algorithm, which we define as a deterministic function mapping the space of training data of finite size to the space of scoring rules.
Typically, the scoring rule of interest is the estimated CATE, such that the largest value indicates the highest treatment prioritization. Alternatively, the scoring rule may be based on the estimated baseline risk, i.e., the conditional mean outcome under the control condition, $\mathbb{E}[Y_i(0) \mid X_i = x]$. Given a scoring rule, an ITR is obtained by treating the units with the greatest scores, and we write the resulting rule as $\hat{f}$, where the notation makes it explicit that the ITR depends on the specific training data used to learn it.
Next, consider the following standard cross-fitting procedure. First, we randomly split the experimental data of size $n$ into $K$ folds of equal size $m = n/K$ (assuming, for simplicity, that $n$ is divisible by $K$). Then, for each fold $k = 1, \ldots, K$, we apply the ML algorithm to the data excluding fold $k$ to obtain an estimated ITR, $\hat{f}_{-k}$, and evaluate it on fold $k$. Averaging across folds yields the cross-fitting PAV estimator,
$$\hat\lambda_{F} = \frac{1}{K}\sum_{k=1}^{K} \hat\lambda^{(k)}_{\hat{f}_{-k}}, \quad (12)$$
where $\hat\lambda^{(k)}_{\hat{f}_{-k}}$ is the PAV estimator of equation (6) computed on fold $k$ using the ITR learned from the other $K - 1$ folds; the cross-fitting PAPE estimator is defined analogously.
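A minimal sketch of this cross-fitting evaluation follows (ours; `learn_itr` stands in for any ML algorithm that returns a treatment rule, such as `fit_itr` from the sketch in Section 3.1, and the fold-level estimator is the PAV estimator of equation (6)):

```python
import numpy as np

def cross_fit_pav(x, y, t, learn_itr, K=5, seed=0):
    """Cross-fitting PAV estimate: for each fold k, learn an ITR on the other
    K - 1 folds, evaluate it on fold k, and average the K fold estimates.
    `learn_itr(x, y, t)` must return a function mapping covariates to {0, 1}."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % K          # random (near-)equal-sized folds
    fold_estimates = []
    for k in range(K):
        train, test = folds != k, folds == k
        f_hat = learn_itr(x[train], y[train], t[train])
        f_k = f_hat(x[test])
        yk, tk = y[test], t[test]
        n1, n0 = tk.sum(), (1 - tk).sum()
        fold_estimates.append((f_k * tk * yk).sum() / n1
                              + ((1 - f_k) * (1 - tk) * yk).sum() / n0)
    return np.mean(fold_estimates)
```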
5.2 Evaluation metrics under cross-fitting
To extend Neyman’s repeated sampling framework to cross-fitting with $K$ folds, we must account for the additional randomness of the training step. For the PAV under cross-fitting, we consider the ITR averaged across training datasets of size $n(K - 1)/K$,
$$F(x) = \mathbb{E}[\hat{f}(x)],$$
which represents the proportion of times the estimated ITR would assign the treatment to a unit with a specific value of the covariates. The notation makes explicit that the average ITR depends on the size of the training data.
Under Neyman’s repeated sampling framework, one can view each estimated ITR as another random draw from the population of ITRs generated by applying the ML algorithm to training datasets of the same size.
For the PAPE, we consider the cross-fitting version of the proportion treated by the ITR, $p_F = \mathbb{E}[F(X_i)]$. Then, the PAPE under cross-fitting can be defined as the difference between the PAV of the average ITR and the value of the random treatment rule that treats the same proportion $p_F$ of units. As shown earlier, this PAPE is equal to the covariance between the average proportion treated and the individual treatment effect, i.e., $\tau_F = \mathrm{Cov}(F(X_i),\, Y_i(1) - Y_i(0))$.
5.3 Finite sample properties
We now apply Neyman’s repeated sampling framework to the cross-fitting PAV estimator given in equation (12). It is easy to show that the estimator is unbiased for the PAV of the average ITR. For the variance, the law of total variance yields a decomposition (equation (14)) whose first term captures the variability of the fold-specific estimates given the estimated ITRs, and whose second term captures the variability of the estimated ITRs across training datasets.
We can further analyze the first term of equation (14) by following the analytical strategy used in Theorem 1. The only difference is that the estimated ITR is correlated across observations due to the training process, since all units in the same evaluation fold share the same estimated ITR.
Theorem 4
(Unbiasedness and exact variance of the cross-fitting PAV estimator [14]) Under Assumptions 1–3, the cross-fitting PAV estimator defined in equation (12) is unbiased for the PAV of the average ITR, and its exact variance can be derived in closed form [14].
In particular, we note that, compared with the fixed-ITR setting, there are two additional variance terms arising from the randomness of the training step. One of these terms is inversely proportional to the number of folds $K$. Therefore, maximally, the efficiency gain resulting from the cross-fitting procedure reduces the variance to $1/K$ of its value without cross-fitting.
6 A numerical study
In this section, we empirically validate our theoretical results through a numerical study. In particular, we focus on demonstrating the results related to the lack of invariance (Proposition 1) and the efficiency comparison between the ex-ante and ex-post estimators (Theorem 3). The strong finite-sample performance of the proposed estimators has been extensively demonstrated in our previously published study [14].
In all our simulations, we use the 28th data-generating process (DGP) from the 2016 Atlantic Causal Inference Conference (ACIC) competition, the details of which are given in [28]. For the population distribution of the pre-treatment covariates, we use the empirical distribution of the covariates in this sample.
First, we investigate the effect of shifting the potential outcomes by a constant on the variance of the estimators. Figure 2(a) plots the empirical standard deviation of the PAV estimator (vertical axis) against the constant added to the potential outcomes (horizontal axis). As predicted by our theoretical analysis, we find that appropriately balancing the potential outcomes leads to a lower standard error, reflecting the unbalanced nature of the relevant potential outcomes under the ITR.

Figure 2: Numerical experiments. (a) Empirical standard error of the PAV estimator as a function of the constant shift in the potential outcomes.
Second, we compare the statistical efficiency of the ex-ante and ex-post PAPE estimators under the simplifying assumptions described in Section 4.2.
7 Conclusion
In this article, we provided a short overview of how Neyman’s repeated sampling framework can be used to experimentally evaluate the performance of arbitrary ITRs. We considered two settings: one in which an ITR is given and another in which an ITR is estimated from the same data. We also demonstrated the new challenges that result from the application of Neyman’s framework, including the lack of invariance of the evaluation estimators and the need to incorporate the uncertainty due to the training of ML algorithms. We further demonstrated how Neyman’s repeated sampling framework can highlight the difference between the ex-ante and ex-post evaluations of ITRs by showing that the ex-post evaluation is statistically more efficient. Our ongoing work also applies this framework to the estimation of heterogeneous treatment effects discovered by ML algorithms [29]. Altogether, we have shown that, a century after his original proposal, Neyman’s analytical framework remains relevant and is widely applicable to the evaluation of today’s causal ML methods.
Acknowledgement
The authors would like to thank Peng Ding and the two anonymous reviewers for their helpful and invaluable feedback during the review process.
Funding information: None declared.

Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest: Prof. Kosuke Imai is a member of the Editorial Advisory Board of the Journal of Causal Inference but was not involved in the review process of this article.

Data availability statement: The numerical experiments included in the current study can be reproduced with the R scripts available at https://github.com/MichaelLLi/NeymanMLCode.
Appendix A Proof of Theorem 3
We first consider the following intermediate estimator:
This estimator differs from the ex-ante estimator of the PAPE
Lemma 1
(Expectation and variance of the intermediate estimator) Under Assumptions 1, 3, and 4, the expectation and variance of the estimator given in equation (A1) for estimating the PAPE defined in equation (4) are given by:
Proof
We first derive the bias expression. First, we take the expectation with respect to the randomized treatment assignment under the random treatment rule. Next, we take the expectation with respect to the random assignment of units to the ITR or the random treatment rule. Finally, we take the expectation over the sampling of units from the target population.
For the variance expression, we proceed as follows:
For the first term, we further use the law of total variance by conditioning on the sample, and center
for
where
where
B Proof of Proposition 1
By definition, we have
Define
□
C Difference of the PAPE variances
To compute the difference of the two PAPE variances, we first define the following:
Then, a simple algebraic manipulation yields
where
Under the assumption that
Finally, note the following:
and
Hence, we have
□
D Comparison under the simplifying assumptions
Define
Now, consider a constant shift of the outcome variable, i.e., replacing $Y_i(t)$ with $Y_i(t) + c$ for $t = 0, 1$ and a constant $c$.
Thus, we observe that the variance difference decreases by
Since the ex-ante estimator is completely unaffected by this change, the constant shift increases the variance of the ex-post evaluation estimator by the same amount. Under the simplifying assumptions, we have,
Therefore, we can bound the difference in variance from below as follows:
□
E Outcome model for the numerical study
References
[1] Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Ann Agricultural Sci. 1923:1–51.
[2] Imai K, Ratkovic M. Estimating treatment effect heterogeneity in randomized program evaluation. Ann Appl Stat. 2013;7:443–70. doi:10.1214/12-AOAS593.
[3] Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci. 2016;113(27):7353–60. doi:10.1073/pnas.1510489113.
[4] Wager S, Athey S. Estimation and inference of heterogeneous treatment effects using random forests. J Amer Stat Assoc. 2018;113(523):1228–42. doi:10.1080/01621459.2017.1319839.
[5] Hahn PR, Murray JS, Carvalho CM. Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects. Bayesian Anal. 2020;15(3):965–1056. doi:10.1214/19-BA1195.
[6] Dudík M, Langford J, Li L. Doubly robust policy evaluation and learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML'11). USA: Omnipress; 2011. p. 1097–104.
[7] Zhang B, Tsiatis AA, Davidian M, Laber E. Estimating optimal treatment regimes from a classification perspective. Stat. 2012;1(1):103–14. doi:10.1002/sta.411.
[8] Chakraborty B, Laber E, Zhao Y-Q. Inference about the expected performance of a data-driven dynamic treatment regime. Clin Trials. 2014;11(4):408–17. doi:10.1177/1740774514537727.
[9] Jiang N, Li L. Doubly robust off-policy value evaluation for reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of the 33rd International Conference on Machine Learning. Vol. 48 of Proceedings of Machine Learning Research. New York, NY, USA: PMLR; 2016. p. 652–61.
[10] Kallus N. Balanced policy evaluation and learning. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors. Advances in Neural Information Processing Systems 31. Curran Associates, Inc.; 2018. p. 8895–906.
[11] Qi Z, Liu D, Fu H, Liu Y. Multi-armed angle-based direct learning for estimating optimal individualized treatment rules with various outcomes. J Amer Stat Assoc. 2020;115(530):678–91. doi:10.1080/01621459.2018.1529597.
[12] Mo W, Liu Y. Efficient learning of optimal individualized treatment rules for heteroscedastic or misspecified treatment-free effect models. J R Stat Soc Ser B Stat Methodol. 2022;84(2):440–72. doi:10.1111/rssb.12474.
[13] Ben-Michael E, Greiner J, Imai K, Jiang Z. Safe policy learning through extrapolation: application to pre-trial risk assessment. Technical report. 2021. arXiv:2109.11679.
[14] Imai K, Li ML. Experimental evaluation of individualized treatment rules. J Amer Stat Assoc. 2023;118(541):242–56. doi:10.1080/01621459.2021.1923511.
[15] Rubin DB. Comments on "On the application of probability theory to agricultural experiments. Essay on principles. Section 9" by J. Splawa-Neyman, translated from the Polish and edited by D. M. Dabrowska and T. P. Speed. Stat Sci. 1990;5:472–80. doi:10.1214/ss/1177012031.
[16] Ding P, Li X, Miratrix LW. Bridging finite and super population causal inference. J Causal Inference. 2017;5(2):20160027. doi:10.1515/jci-2016-0027.
[17] Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Ann Stat. 2011;39(2):1180–210. doi:10.1214/10-AOS864.
[18] Luedtke AR, van der Laan MJ. Optimal individualized treatments in resource-limited settings. Int J Biostat. 2016;12(1):283–303. doi:10.1515/ijb-2015-0007.
[19] Luedtke AR, van der Laan MJ. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Ann Stat. 2016;44(2):713–42. doi:10.1214/15-AOS1384.
[20] Zhou X, Mayer-Hamblett N, Khan U, Kosorok MR. Residual weighted learning for estimating individualized treatment rules. J Amer Stat Assoc. 2017;112(517):169–87. doi:10.1080/01621459.2015.1093947.
[21] Kitagawa T, Tetenov A. Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica. 2018;86:591–616. doi:10.3982/ECTA13288.
[22] Radcliffe NJ. Using control groups to target on predicted lift: building and assessing uplift models. Direct Market Analytic J. 2007;1(3):14–21.
[23] Yadlowsky S, Fleming S, Shah N, Brunskill E, Wager S. Evaluating treatment prioritization rules via rank-weighted average treatment effects. 2021. arXiv:2111.07966.
[24] Kumar A, Aikens RC, Hom J, Shieh L, Chiang J, Morales D, et al. OrderRex clinical user testing: a randomized trial of recommender system decision support on simulated cases. J Amer Med Inform Assoc. 2020;27(12):1850–9. doi:10.1093/jamia/ocaa190.
[25] Forman EM, Goldstein SP, Crochiere RJ, Butryn ML, Juarascio AS, Zhang F, et al. Randomized controlled trial of OnTrack, a just-in-time adaptive intervention designed to enhance weight loss. Transl Behav Med. 2019;9(6):989–1001. doi:10.1093/tbm/ibz137.
[26] Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, et al. Double/debiased machine learning for treatment and structural parameters. Econom J. 2018;21(1):C1–C68. doi:10.1111/ectj.12097.
[27] Nadeau C, Bengio Y. Inference for the generalization error. Mach Learn. 2003;52(3):239–81. doi:10.1023/A:1024068626366.
[28] Dorie V, Hill J, Shalit U, Scott M, Cervone D. Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Stat Sci. 2019;34(1):43–68. doi:10.1214/18-STS667.
[29] Imai K, Li ML. Statistical inference for heterogeneous treatment effects discovered by generic machine learning in randomized experiments. J Bus Econ Stat. Forthcoming.
[30] Neyman J. On the application of probability theory to agricultural experiments: essay on principles, section 9 (translated in 1990). Stat Sci. 1990;5:465–80. doi:10.1214/ss/1177012032.
© 2024 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.