Abstract
There is a growing literature on design-based (DB) methods to estimate average treatment effects (ATEs) in full sample analyses of randomized controlled trials (RCTs). This article extends these methods to estimate ATEs for discrete subgroups defined by pre-treatment variables, with an application to an RCT testing subgroup effects in a school voucher experiment in New York City. We consider ratio estimators for subgroup effects using regression methods, allowing for model covariates to improve precision, and prove a new finite population central limit theorem. We discuss extensions to blocked and clustered RCT designs, and to other common estimators with random treatment-control sample sizes or summed weights: post-stratification estimators, weighted estimators that adjust for data nonresponse, and estimators for Bernoulli trials. We also develop simple variance estimators that share features with robust estimators. Simulations show that the DB subgroup estimators yield confidence interval coverage near nominal levels, even for small subgroups.
1 Introduction
There is a growing literature on design-based (DB) methods to estimate overall average treatment effects (ATEs) for randomized controlled trials (RCTs). These nonparametric methods use the building blocks of experimental designs to generate consistent, asymptotically normal ATE estimators with minimal assumptions. The underpinnings of these methods were introduced by Neyman [1] and later developed in seminal works by Rubin [2,3] and Holland [4] using a potential outcomes framework.
To date, the DB literature has focused on ATE estimation for full sample analyses. In this article, we build on these methods to develop ATE estimators for discrete subgroups defined by pre-treatment (baseline) characteristics of study participants. Subgroup analyses for RCTs are common across fields as they can be used to assess treatment effect heterogeneity and inform decisions about how to best target and improve treatments [5,6]. Guidelines for the planning, analysis, and reporting of RCT subgroup analyses have been proposed in the literature to ensure statistical rigor, such as approaches to reduce the chances of finding spurious positive effects due to multiple testing [5,7,8].
As a motivating example, consider the evaluation of the New York City (NYC) School Choice Scholarships Program, an RCT where low-income public school students in grades K–4 could participate in a series of lotteries to receive a private school voucher for up to 3 years [9,10]. A subgroup analysis was pre-specified for the study to examine differences in voucher effects for African-American and Latino students. The hypothesis was that African Americans might benefit more from the vouchers as they tended to live in poorer communities and attend lower-performing public schools.
Several key aspects of this subgroup analysis motivate the theory underlying this article. First, the study sample was not randomly sampled from a broader population. Rather, the sample included only a very small percentage of NYC families who applied for a scholarship. Thus, the study results cannot be generalized to a broader voucher program that would involve all children in NYC or elsewhere. This setting suggests a finite population framework for estimating ATEs where the sample and their potential outcomes are considered fixed, and study results are assumed to pertain to the study sample only. This is a common RCT setting across disciplines that often include volunteer samples of individuals and sites.
Second, the estimation strategy should allow for the inclusion of model baseline covariates to improve precision as power is often a concern for subgroup analyses due to small sample sizes. Third, the voucher study conducted randomization within strata, suggesting the need for a theory for blocked RCTs. Fourth, the study randomized families rather than students, suggesting a further need to consider a theory for clustered RCTs that are becoming increasingly prevalent across fields [11,12]. Finally, the study constructed weights to adjust for missing outcome data, a common strategy for RCT analyses that should be covered in the theory.
This article addresses these issues by developing DB ATE ratio estimators for subgroup-related analyses using regression models that allow for baseline covariates. We focus on ratio estimators due to the randomness of subgroup sizes in the treatment and control groups. We prove a new finite population central limit theorem (CLT) by building on the methods reported by Pashley [13] and Schochet et al. [14]. We also discuss extensions to blocked and clustered RCTs, and to other common estimators with random sample sizes or summed weights: post-stratification estimators, weighted estimators that adjust for data nonresponse, and estimators for Bernoulli trials (BTs). We provide consistent variance estimators that are compared to commonly used robust standard errors (SEs). Our simulations show that the DB subgroup ATE estimators yield confidence interval coverage near nominal levels, even for small subgroups. Finally, we demonstrate the methods using data from our motivating NYC voucher experiment.
The rest of this article proceeds as follows. Section 2 discusses the related literature. Section 3 provides the theoretical framework, ATE estimators and CLT results for the non-clustered RCT, and extensions. Section 4 discusses blocked and clustered RCTs. Section 5 presents simulation results, and Section 6 presents empirical results using the NYC voucher study. Section 7 concludes.
2 Related work
Our work builds on the growing literature on DB methods to estimate ATEs for full sample analyses [14–23]. These methods also pertain to subgroup analyses conditional on subgroup sizes observed in the treatment and control groups [21], but not to unconditional analyses that average over subgroup allocations.
Our work draws most directly on two studies. First, we draw on methods in Schochet et al. [14] who provide finite population CLTs for ratio estimators for blocked, clustered RCTs with general weights (using previous results in Scott and Wu [24], Li and Ding [23], and Pashley [13]). Our innovation is to adapt these methods by treating subgroup indicators as “weights” in the analysis. Second, we draw on results from the study by Miratrix et al. [25] who considered DB post-stratification estimators for overall effects, which share properties with baseline subgroup estimators. Miratrix et al. [25], however, do not consider asymptotic distributions, blocked or clustered RCT designs, the inclusion of other model covariates, or weights considered here.
Finally, there is a large statistical literature on DB methods for analyzing survey data with complex sample designs, including for estimating subpopulation means or totals [26–28]. However, these works do not consider RCT settings for estimating treatment-control differences in subpopulation means.
In what follows, we focus on the non-clustered RCT design without blocking and extensions to related estimators. We then discuss blocked and clustered designs.
3 DB subgroup analysis for non-clustered RCTs
We assume an RCT of
Let
For the subgroup analysis, we assume each sample member is allocated to a discrete category within a subgroup class with
We assume two conditions. The first is the stable unit treatment value assumption (SUTVA) [29]:
(C1): SUTVA: Let
SUTVA allows us to express
Under SUTVA, the ATE parameter for subgroup
which is the mean treatment effect for members of subgroup
Our second condition is complete randomization [21], where extensions to BTs are discussed in Section 3.3:
(C2): Complete randomization: For fixed
This condition implies that potential outcomes are independent of treatment status,
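As a small illustration (not from the article), complete randomization can be drawn directly; this Python sketch, with hypothetical sizes, fixes the treated count in every draw, in the spirit of condition (C2):

```python
import random

def complete_randomization(n, n_treat, rng):
    """Assign exactly n_treat of n units to treatment, uniformly at random."""
    treated = set(rng.sample(range(n), n_treat))
    return [1 if i in treated else 0 for i in range(n)]

rng = random.Random(0)
t = complete_randomization(10, 4, rng)
print(sum(t))  # always 4: the treated count is fixed by design
```

Because the treated count is fixed, the randomness lies entirely in which units are treated, which is the source of inference in the DB framework.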
3.1 ATE estimators
Under the potential outcomes framework and SUTVA, the data generating process for the observed outcome measure,
This relation states that we can observe
Rearranging (2) generates the following nominal full sample regression model:
where
We center the treatment indicator in (3) to facilitate the theory without changing the estimator.
In contrast to usual formulations of the regression model, the residual,
The model in (3) also applies to each subgroup due to randomization. Thus, if we combine each subgroup model using the
where
Consider the ordinary least squares (OLS) difference-in-means estimator for
where
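For intuition, here is a minimal sketch (pure Python, hypothetical data) of a subgroup difference-in-means estimator, treating the subgroup indicator as a filter as in the text:

```python
def subgroup_diff_in_means(y, t, g):
    """Treatment-control difference in mean outcomes within subgroup g.

    y: outcomes; t: 0/1 treatment indicators; g: 0/1 subgroup indicators.
    """
    yt = [yi for yi, ti, gi in zip(y, t, g) if ti == 1 and gi == 1]
    yc = [yi for yi, ti, gi in zip(y, t, g) if ti == 0 and gi == 1]
    return sum(yt) / len(yt) - sum(yc) / len(yc)

# Hypothetical data: subgroup members are the first four units.
y = [3.0, 1.0, 4.0, 2.0, 5.0, 3.0]
t = [1, 0, 1, 0, 1, 0]
g = [1, 1, 1, 1, 0, 0]
print(subgroup_diff_in_means(y, t, g))  # 2.0: (3+4)/2 - (1+2)/2
```

Note that the two denominators are themselves random over repeated randomizations, which is why the text treats this as a ratio estimator.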
The finite population CLT in Theorem 4 in the study by Li and Ding [23] applies to
Our CLT is provided in Section 3.2 for a more general covariate-adjusted estimator from a working model that includes in (4) a
where
We focus on the pooled covariate model in (6) because it is commonly used in practice. In Section 3.3, we discuss extensions to models that interact
Using OLS to estimate the working model in (6) yields the following covariate-adjusted estimator for
where
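As an illustrative stand-in for the covariate-adjusted estimator (the exact specification in (7) is not reproduced here), the following sketch computes the OLS coefficient on the treatment indicator in a model with one covariate, using the closed-form two-regressor solution; the data and names are hypothetical:

```python
def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def adjusted_ate(y, t, x):
    """OLS coefficient on the 0/1 treatment indicator in y ~ 1 + t + x."""
    det = cov(t, t) * cov(x, x) - cov(t, x) ** 2
    return (cov(t, y) * cov(x, x) - cov(x, y) * cov(t, x)) / det

# Hypothetical data generated exactly as y = 1 + 2*t + 3*x,
# so the adjusted estimator recovers the treatment coefficient of 2.
t = [1, 0, 1, 0]
x = [0.5, 1.0, 2.0, 0.0]
y = [1 + 2 * ti + 3 * xi for ti, xi in zip(t, x)]
print(adjusted_ate(y, t, x))  # 2.0 (up to floating point)
```

Centering the treatment indicator, as in (3), leaves this coefficient unchanged; only the intercept shifts.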
Our DB theory is conditional on randomizations that yield
3.2 Main CLT result
To consider the asymptotic properties of
Our CLT builds on Schochet et al. [14] who provided CLTs for RCT ratio estimators with general weights for clustered, blocked designs. We adapt these methods to our setting by treating the subgroup indicators,
Before presenting our CLT, we need to define several terms. First, for
where
We now present our CLT theorem, proved in Supplementary Materials S1.
Theorem 1
Assume (C1), (C2), and the following conditions for
(C3) Letting
(C4)
(C5) The subgroup shares,
(C6) As
(C7) Letting
(C8)
Then, as
where
Remark 1. The
where
Remark 2. The first two terms in (9) pertain to separate variances for the two research groups because we allow for heterogeneous treatment effects. The third term pertains to the treatment-control covariance,
Remark 3. (C3) and (C7) are Lindeberg-type conditions from Li and Ding [23] that control the tails of the potential outcome and covariate distributions. (C6) yields a weak law of large numbers for the observed subgroup shares so that
Remark 4. Theorem 1 is proved in two stages by expressing the ATE estimator in (7) as,
where
Remark 5. Under (C1)–(C6), Theorem 1 also applies to
where
The following corollary to Theorem 1, proved in Supplement S1, provides the joint asymptotic distribution of the subgroup estimators,
Corollary 1
Under the conditions of Theorem 1, as
This corollary is important for real-world applications because it supports the use of standard F-tests (or chi-square tests) to test the null hypothesis of equal subgroup effects.
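For the two-subgroup case, the implied test of equal effects reduces to a z statistic (equivalently, a one-degree-of-freedom chi-square). A hedged standard-library sketch, treating the two subgroup estimates as independent and normal; plugging in the point estimates and SEs reported later in Table 2 lands near the p = 0.028 the application reports:

```python
import math

def equal_effects_pvalue(tau1, se1, tau2, se2):
    """Two-sided normal-approximation p-value for H0: tau1 == tau2,
    treating the two subgroup estimates as independent."""
    z = (tau1 - tau2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Estimates/SEs from Table 2 (African American vs. Latino, with covariates):
print(equal_effects_pvalue(4.70, 1.27, 0.50, 1.44))  # approximately 0.03
```

The article's F-test generalizes this to more than two subgroups and finite-sample reference distributions; this sketch shows only the large-sample logic.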
3.3 Extensions to related estimators
This section outlines extensions of our CLT result to post-stratification estimators, models that interact
Post-stratification estimators. Miratrix et al. [25] considered variance estimation for a DB post-stratification ATE estimator that obtains overall effects for the model without covariates by averaging
Interacted models. Theorem 1 can be extended to a model that replaces
BTs. Our results also extend to BTs where each sample member is independently randomized to the treatment group with probability
Second, we can adapt the CLT in Theorem 1 to BTs by using expected rather than actual sample sizes in the theorem conditions (i.e., by replacing
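A quick simulation (hypothetical n and p) shows why expected sizes are the natural quantity under Bernoulli assignment: each unit is treated independently, so the realized treatment group size is random but concentrates around its expectation:

```python
import random

rng = random.Random(1)
n, p = 1000, 0.4
t = [1 if rng.random() < p else 0 for _ in range(n)]
print(sum(t), n * p)  # realized size fluctuates around the expected size 400
```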
Data nonresponse weights. Theorem 1 also applies, with additional assumptions, to a “subgroup” analysis that adjusts for missing outcome data using respondents only with nonresponse weights,
We invoke two missing data assumptions for each subgroup: (i) data are missing at random for each research group conditional on covariates [34]:
Accordingly, we can apply Theorem 1, assuming known weights, where the respondent sample and weighted least squares are used to obtain the ATE estimator,
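Without covariates, the weighted least squares estimator among respondents reduces to a Hajek-style weighted difference in means. This sketch (hypothetical data; weights assumed known, as in the text) illustrates the form, with both the numerators and the summed weights random over randomizations:

```python
def weighted_diff_in_means(y, t, w):
    """Hajek-style weighted treatment-control difference among respondents,
    with nonresponse weights w treated as known."""
    num_t = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 1)
    den_t = sum(wi for ti, wi in zip(t, w) if ti == 1)
    num_c = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 0)
    den_c = sum(wi for ti, wi in zip(t, w) if ti == 0)
    return num_t / den_t - num_c / den_c

# Hypothetical respondents: upweighting the second treated unit
# pulls the treatment-group mean toward its outcome.
print(weighted_diff_in_means([2, 4, 1, 3], [1, 1, 0, 0], [1, 3, 1, 1]))  # 1.5
```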
3.4 Variance estimation
To obtain consistent variance estimators for (9) (and model variants), we can either use expected subgroup sizes,

Figure 1: (a) Probabilities for the differences, …; (b) SE ratios, ….
Further, Figure 1b shows that for small
To further examine the pattern of the SE ratios in Figure 1b, suppose first that
Using expected sizes, a consistent (upper bound) plug-in variance estimator for (9) based on estimated subgroup regression residuals is as follows:
where
and
Here we set
As shown in Supplement S3, (12) is asymptotically equivalent to the robust Huber–White (HW) variance estimator [39,40], as has been shown for full sample estimators [16,20,41]. In finite samples, however, the DB variances will typically be larger for the model without covariates due to larger df corrections. We compare the two estimators in our simulations, along with other SE variants.
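The exact form of (12) involves notation not reproduced here; as a grounded stand-in, the following sketch compares the classical Neyman plug-in variance for a difference in means with the HC0 robust variance. They differ only in the n versus n − 1 denominators, mirroring the df-correction point above:

```python
def neyman_and_hc0_var(y, t):
    """Neyman variance (n-1 denominators) and HC0 robust variance for the
    difference-in-means coefficient; they differ only through df factors."""
    yt = [yi for yi, ti in zip(y, t) if ti == 1]
    yc = [yi for yi, ti in zip(y, t) if ti == 0]
    n1, n0 = len(yt), len(yc)
    m1, m0 = sum(yt) / n1, sum(yc) / n0
    s1 = sum((v - m1) ** 2 for v in yt)  # sum of squared T residuals
    s0 = sum((v - m0) ** 2 for v in yc)  # sum of squared C residuals
    neyman = s1 / (n1 - 1) / n1 + s0 / (n0 - 1) / n0
    hc0 = s1 / n1 ** 2 + s0 / n0 ** 2
    return neyman, hc0

# Hypothetical data: the Neyman variance exceeds HC0 in finite samples.
print(neyman_and_hc0_var([1, 3, 2, 4, 0, 2], [1, 1, 1, 0, 0, 0]))
```

This is the textbook Neyman form that (12) generalizes to subgroups with covariates and weights, not the article's exact estimator.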
4 Blocked and clustered designs
The above CLT results extend directly to blocked RCTs where randomization is performed separately within strata (e.g., sites, demographic groups, or time cohorts), and to clustered RCTs where groups (e.g., schools, hospitals, or communities) are randomized rather than individuals.
4.1 Blocked RCTs
In blocked designs, the sample is first divided into subpopulations, and a mini-experiment is conducted in each one. Note that we do not consider blocks formed by subgroups slated for ATE estimation as the theory for the full sample analysis applies in this case (as
For the blocked design, we use similar notation as above with the addition of the subscript
With this notation, we can now define the ATE estimand for blocks containing members of subgroup
where
Consider OLS estimation of the following extension of (6) to blocked RCTs:
where
where
Because a mini-experiment is conducted in each block, we can apply Theorem 1 to
Theorem 2
Assume (C1)–(C4) and (C6)–(C8) for each included block, and the following conditions for
(C4a) The block shares,
(C5a) The subgroup shares,
Then, as
where
The proof (not shown) parallels the one for Theorem 1, applied to each block, by redefining the residual as,
A variance estimator for
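Schematically, pooling block-specific subgroup estimates with weights proportional to subgroup block sizes, and combining variances with squared weight shares, can be sketched as follows (hypothetical inputs; this is the generic size-weighting arithmetic, not the article's exact formula):

```python
def pooled_blocked_ate(block_ates, block_vars, block_sizes):
    """Pool block-specific subgroup ATE estimates, weighting blocks by
    their subgroup sizes; variances combine with squared weight shares."""
    total = sum(block_sizes)
    shares = [s / total for s in block_sizes]
    ate = sum(w * b for w, b in zip(shares, block_ates))
    var = sum(w ** 2 * v for w, v in zip(shares, block_vars))
    return ate, var

# Hypothetical two-block example: the larger block dominates the pooled effect.
print(pooled_blocked_ate([2.0, 1.0], [0.25, 0.25], [10, 30]))  # (1.25, 0.15625)
```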
Next we provide a corollary to Theorem 2 on the pooled subgroup estimator across blocks,
Corollary 2
Under the conditions of Theorem 2, as
where
This result follows because the
Finally, a future research topic is to develop a CLT for a restricted model that controls for block main effects but excludes block-by-treatment interactions. An example of such a model is to replace the first set of interactions in (14) with
4.2 Clustered RCTs
In clustered RCTs, groups rather than individuals are the unit of randomization. Consider a clustered, non-blocked RCT with
We index clusters by
Consider an individual-level subgroup (
where
Applying OLS to (6) with clustered data yields the following subgroup ATE estimator:
where
We see that (17) is a ratio estimator because
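To see the ratio structure concretely, here is a sketch (hypothetical cluster data) of a cluster-level ratio estimator: summed subgroup outcomes over summed subgroup sizes in each arm, then differenced. This is the generic form such estimators take, not necessarily the exact specification in (17):

```python
def clustered_subgroup_ate(cluster_sums, cluster_sizes, cluster_treat):
    """Difference in ratio means: summed subgroup outcomes over summed
    subgroup sizes, within the treatment and control clusters."""
    yt = sum(s for s, t in zip(cluster_sums, cluster_treat) if t == 1)
    nt = sum(m for m, t in zip(cluster_sizes, cluster_treat) if t == 1)
    yc = sum(s for s, t in zip(cluster_sums, cluster_treat) if t == 0)
    nc = sum(m for m, t in zip(cluster_sizes, cluster_treat) if t == 0)
    return yt / nt - yc / nc

# Hypothetical clusters: both numerators and denominators depend on which
# clusters land in each arm, hence the ratio-estimator structure.
print(clustered_subgroup_ate([6.0, 2.0, 3.0, 1.0], [3, 2, 3, 1], [1, 1, 0, 0]))
```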
Finally, Supplement S3.3 outlines cluster RCT results for a subgroup analysis defined by a cluster-level characteristic (
5 Simulation analysis
We conducted simulations to examine the finite sample statistical properties of our DB subgroup ATE estimators. The focus is on the non-clustered RCT because prior full sample simulation results for the clustered RCT also pertain to individual-level subgroup analyses [14,45], as discussed above. For the simulations, we applied the variance estimator in (12) using expected and actual subgroup sizes, for models with and without covariates. We set
5.1 Simulation setup
The following model was used to generate potential outcomes for
where
We generated five draws of potential outcomes using (18) to help guard against unusual draws and report average results. For each draw, we conducted 10,000 replications, randomly assigning units to either the treatment or control group using
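A stripped-down version of this replication loop (one group, a constant treatment effect, and Neyman SEs; far simpler than the article's design) can be sketched as follows, with all sizes and parameters hypothetical:

```python
import random
import statistics

def simulate_coverage(reps=2000, seed=0):
    """Finite population simulation: potential outcomes are drawn once and
    held fixed; only the randomization is replicated across reps."""
    rng = random.Random(seed)
    n, n_treat, tau = 120, 60, 0.5
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # control potential outcomes
    y1 = [v + tau for v in y0]                    # constant treatment effect
    covered = 0
    for _ in range(reps):
        treated = set(rng.sample(range(n), n_treat))
        yt = [y1[i] for i in treated]
        yc = [y0[i] for i in range(n) if i not in treated]
        est = statistics.fmean(yt) - statistics.fmean(yc)
        se = (statistics.variance(yt) / len(yt)
              + statistics.variance(yc) / len(yc)) ** 0.5
        covered += est - 1.96 * se <= tau <= est + 1.96 * se
    return covered / reps

print(simulate_coverage())  # close to the nominal 0.95
```

With a constant effect, the Neyman variance is exact for the finite population, so coverage should sit near the nominal level.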
In Supplement S4, we discuss variants of (12) used in our simulations. These include applying the df correction for hypothesis testing in Bell and McCaffrey [46]; subtracting a lower bound on the
5.2 Simulation results
Table 1 and Supplement Tables S1–S4 present the simulation results. Of the 300,000 draws of
Simulation results for the subgroup ATE estimators
| Model specification | Bias of ATE estimator^a | Confidence interval coverage | True SE^a,b | Mean estimated SE |
|---|---|---|---|---|
| Model without covariates | | | | |
| Sample size: … | | | | |
| Design-based (DB), actual subgroup sizes, … | −0.002 | 0.954 | 0.646 | 0.640 |
| DB, expected sizes, … | −0.002 | 0.952 | 0.646 | 0.633 |
| DB, actual sizes, adjust for … | −0.002 | 0.948 | 0.646 | 0.621 |
| HW | −0.002 | 0.953 | 0.646 | 0.639 |
| Sample size: … | | | | |
| DB, actual sizes, … | 0.000 | 0.958 | 0.376 | 0.385 |
| DB, expected sizes, … | 0.000 | 0.957 | 0.376 | 0.383 |
| DB, actual sizes, adjust for … | 0.000 | 0.956 | 0.376 | 0.381 |
| HW | 0.000 | 0.958 | 0.376 | 0.385 |
| Sample size: … | | | | |
| DB, actual sizes, … | 0.000 | 0.953 | 0.626 | 0.628 |
| DB, expected sizes, … | 0.000 | 0.951 | 0.626 | 0.618 |
| DB, actual sizes, adjust for … | 0.000 | 0.948 | 0.626 | 0.613 |
| HW | 0.000 | 0.948 | 0.626 | 0.613 |
| Model with two covariates | | | | |
| Sample size: … | | | | |
| DB, actual sizes, … | 0.002 | 0.950 | 0.501 | 0.482 |
| DB, expected sizes, … | 0.002 | 0.948 | 0.501 | 0.476 |
| DB, actual sizes, adjust for … | 0.002 | 0.944 | 0.501 | 0.467 |
| HW | 0.002 | 0.953 | 0.501 | 0.494 |
| Sample size: … | | | | |
| DB, actual sizes, … | 0.000 | 0.961 | 0.298 | 0.305 |
| DB, expected sizes, … | 0.000 | 0.960 | 0.298 | 0.303 |
| DB, actual sizes, adjust for … | 0.000 | 0.959 | 0.298 | 0.302 |
| HW | 0.000 | 0.962 | 0.298 | 0.308 |
| Sample size: … | | | | |
| DB, actual sizes, … | 0.000 | 0.956 | 0.486 | 0.482 |
| DB, expected sizes, … | 0.000 | 0.954 | 0.486 | 0.475 |
| DB, actual sizes, adjust for … | 0.000 | 0.951 | 0.486 | 0.470 |
| HW | 0.000 | 0.952 | 0.486 | 0.470 |

Note: See text for simulation details. The calculations assume two subgroups, with a focus on results for Subgroup 1, and a treatment assignment rate of …
ATE = Average treatment effect; DB = Design-based; HW = Huber–White.
^a Biases and true SEs are the same for all specifications within each sample size category because they use the same data and OLS model for ATE estimation.
^b True SEs are measured as the standard deviation of the estimated treatment effects across simulations.
Estimated SEs are close to the "true" values, as measured by the standard deviation of the ATE estimates across replications. Consistent with the theory on SE ratios in Section 3.4, the SEs are slightly larger using actual subgroup sizes than expected ones, so confidence intervals are slightly narrower (with somewhat lower coverage) under the expected sizes. Also consistent with the theory, the SEs are slightly smaller for the HW estimator for the model without covariates, and for specifications that adjust for
6 Empirical application using the motivating NYC voucher experiment
To demonstrate our DB subgroup ATE estimators, we used baseline and outcome data from the NYC School Choice Scholarships Foundation Program (SCSF) [9]. SCSF was funded by philanthropists to provide scholarships to public school students in grades K–4 from low-income families to attend any participating NYC private school. In spring 1997, more than 20,000 students applied to receive a voucher. SCSF then used random lotteries to offer 3-year vouchers of up to $1,400 annually to 1,000 eligible families in the treatment group. Of the remaining families not offered the voucher, 960 were randomly selected to the control group.
SCSF assisted the treatment group in finding private-school placements. More than 78% of treatment families used a voucher, for 2.6 years on average, and 98% of users attended parochial schools. Here we focus on estimating ATEs (i.e., intention-to-treat effects of the voucher offer) for two race/ethnicity subgroups as defined in the original study [9]: African Americans and Latinos, each of whom comprises about 47% of the sample. The study authors hypothesized that African Americans might benefit more from the vouchers as they tended to live in more disadvantaged communities with lower-performing public schools.
Following the original study [9], the primary outcomes for our analysis are composite national percentile rankings in math and reading from the study-administered Iowa Test of Basic Skills (ITBS). We focus on first follow-up year test scores, where the response rate was 78% for treatments and 71% for controls. Our goal is not to replicate study results but to illustrate our subgroup ATE estimators.
The voucher study was a blocked RCT. Applicants from schools with average test scores below the city median were assigned a higher probability of winning a scholarship, and blocks were also formed by lottery date and family size (with 30 blocks in total). The design is also partly clustered because families were randomized, where all eligible children within a family could receive a scholarship; 30% of families had at least two children in the evaluation.
We used (14) for ATE estimation and (12) for variance estimation for each block, where blocks were weighted by their subgroup sizes to obtain the overall subgroup effects. To adjust for clustering, we averaged data to the family level. Following [9], we used weights to adjust for missing follow-up test scores. We ran models without covariates and those that included baseline ITBS scores to increase precision, though they were not collected for the entire kindergarten cohort. Following the original study, other demographic covariates were not included in the models due to the large number of blocks.
Table 2 presents the subgroup findings, which mirror those from the original study. We find that the offer of a voucher had no effect on test scores overall or for Latinos across specifications. The effects on African Americans are also not statistically significant at the 5% level for the model without baseline test scores. However, the effects on African Americans become positive and statistically significant for the model with baseline scores, which excludes the kindergartners but nonetheless yields SEs that are about 12% smaller. These effects are 4.7 percentile ranking points, which translates into a 0.26 standard deviation increase, with a significant F-test for the subgroup interaction effect (p-value = 0.028). The effects for African Americans remain significant using the sample with baseline test scores without controlling for them in the model.
Estimated ATEs on composite test scores for the NYC voucher experiment
Model specification | Overall sample | African American | Latino |
---|---|---|---|
Model excludes baseline test scores | |||
DB, actual subgroup sizes | 0.25 (1.06) | 2.54 (1.45) | −0.86 (1.58) |
DB, expected subgroup sizes | 0.25 (1.06) | 2.54 (1.45) | −0.86 (1.58) |
HW | 0.25 (1.03) | 2.54 (1.42) | −0.86 (1.49) |
DB, actual sizes, using sample with baseline test scores | 0.88 (1.30) | 4.47* (1.73) | −1.11 (1.76) |
Model includes baseline test scores | |||
DB, actual subgroup sizes | 1.70 (1.01) | 4.70* (1.27) | 0.50 (1.44) |
DB, expected subgroup sizes | 1.70 (1.01) | 4.70* (1.27) | 0.50 (1.44) |
HW | 1.70 (0.98) | 4.70* (1.24) | 0.50 (1.39) |
Student sample size (without/with baseline test scores) | 2,012/1,434 | 902/643 | 964/682 |
Note: SEs are in parentheses. See text for ATE and SE formulas. All estimates are weighted to adjust for follow-up test score nonresponse.
ATE = Average treatment effect; DB = Design-based; HW = Huber–White.
*Statistically significant at the 5% level, two-tailed test.
We find across specifications that the DB SEs are nearly identical using actual and expected sample sizes. Further, consistent with theory, the DB SEs are slightly larger than the HW SEs, but both yield the same study conclusions: the vouchers did not improve test scores overall, but there is evidence they had a positive effect on African American students in grades 1–4. A detailed reanalysis of the original study data, however, cautions that the results for African Americans are sensitive to alternative race/ethnicity definitions and should be interpreted carefully [10].
7 Conclusion
This article considered DB RCT methods for ATE estimation for discrete subgroups defined by pre-treatment sample characteristics. Our subgroup estimators derive from the Neyman–Rubin–Holland model that underlies experiments and are based on simple least squares regression methods. We considered ratio estimators due to the randomness of the observed subgroup sample sizes in the treatment and control groups, which were not conditioned on in the asymptotic analysis. The DB approach is appealing in that it applies to continuous, binary, and discrete outcomes, and is nonparametric in that it makes no assumptions about the distribution of potential outcomes or the model functional form.
We developed a new finite population, unconditional CLT for our subgroup ATE estimators under the non-clustered RCT, allowing for baseline covariates to improve precision. The main difference between our CLT and prior full sample ones is that the asymptotic variance for the subgroup estimator is based on expected subgroup sizes rather than actual ones. Another difference is that the subgroup variance includes a finite sample adjustment (
A contribution of this work is that it provides a unified DB framework for subgroup analyses across a range of RCT designs. We discussed extensions of the asymptotic theory to blocked and clustered designs. We also discussed extensions to other commonly used estimators with random treatment-control sample sizes or summed weights: post-stratification estimators that average subgroup estimators to obtain overall effects, weighted estimators to adjust for data nonresponse, and estimators from BTs.
Our simulations for the non-clustered RCT show that the subgroup ATE estimators yield low bias and confidence interval coverage near nominal levels, although with slight over-coverage. This is somewhat surprising as the simulation literature on DB and robust variance estimators for clustered RCTs – that also applies to the subgroup context – shows the opposite issue of under-coverage [14,45].
Our simulations find very similar results using either actual or expected subgroup sample sizes for variance estimation. As demonstrated in several ways, this occurs because the difference between the observed subgroup proportions,
The free RCT-YES software (www.rct-yes.com), funded by the U.S. Department of Education, estimates ATEs for both full sample and baseline subgroup analyses using the DB methods discussed in this article, in either R or Stata. The software applies actual sample sizes in the variance formulas for subgroup analyses and allows for general weights. It also accommodates multi-armed trials with multiple treatment conditions.
Acknowledgements
The author would like to thank the two reviewers for very helpful suggestions and comments.
- Funding information: The author states that no funding was involved.

- Author contribution: The author confirms sole responsibility for the conception of the study, the presented results, and manuscript preparation.

- Conflict of interest: The author states no conflict of interest.

- Data availability statement: The NYC voucher data for the empirical analysis were obtained under a restricted data use license agreement with Mathematica. Per license requirements, these data cannot be shared with journal readers. However, to the best of my knowledge, these data can be obtained, and I would be happy to provide the SAS and R programs used for the analysis.
References
[1] Neyman J. On the application of probability theory to agricultural experiments: essay on principles. Sect 9, Translated Stat Sci. 1990;5:465–72.10.1214/ss/1177012031Search in Google Scholar
[2] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701.10.1037/h0037350Search in Google Scholar
[3] Rubin DB. Assignment to treatment group on the basis of a covariate. J Educ Stat. 1977;2:1–26.10.3102/10769986002001001Search in Google Scholar
[4] Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81:945–60.10.1080/01621459.1986.10478354Search in Google Scholar
[5] Rothwell PM. Subgroup analyses in randomized controlled trials: importance, indications, and interpretation. Lancet. 2005;365:176–86.10.1016/S0140-6736(05)17709-5Search in Google Scholar PubMed
[6] Schochet PZ, Puma M, Deke J. Understanding variation in treatment effects in education impact evaluations: an overview of quantitative methods (NCEE 2014-4017). Washington, DC: National Center for Education Evaluation and Regional Assistance; 2014.Search in Google Scholar
[7] Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in medicine-reporting of subgroup analyses in clinical trials. N Engl J Med. 2007;357(21):2189–94.10.1056/NEJMsr077003Search in Google Scholar PubMed
[8] Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. J Pharmacol Pharmacother. 2019;1:100–7.10.4103/0976-500X.72352Search in Google Scholar PubMed PubMed Central
[9] Mayer D, Peterson P, Myers D, Tuttle C, Howell W. School choice in New York City: an evaluation of the school choice scholarships program. Mathematica Policy Research, Washington, DC; 2002.Search in Google Scholar
[10] Krueger AB, Zhu P. Another look at the New York City school voucher experiment. Am Behav Scientist 47(5):658–98.10.1177/0002764203260152Search in Google Scholar
[11] Bland JM. Cluster randomised trials in the medical literature: two bibliometric surveys. BMC Med Res Methodol. 2004;4:21.10.1186/1471-2288-4-21Search in Google Scholar PubMed PubMed Central
[12] Schochet PZ. Statistical power for random assignment evaluations of education programs. J Educ Behav Stat. 2008;33:62–87.10.3102/1076998607302714Search in Google Scholar
[13] Pashley NE. Note on the delta method for finite population inference with applications to causal inference. Working Paper: Harvard University Statistics Department, Cambridge MA; 2019.Search in Google Scholar
[14] Schochet PZ, Pashley NE, Miratrix LW, Kautz T. Design-based ratio estimators and central limit theorems for clustered, blocked RCTs. J Am Stat Assoc. 2022;117(540):2135–46.10.1080/01621459.2021.1906685Search in Google Scholar
[15] Yang L, Tsiatis A. Efficiency study of estimators for a treatment effect in a pretest-posttest trial. Am Statistician. 2001;55:314–21.10.1198/000313001753272466Search in Google Scholar
[16] Freedman D. On regression adjustments to experimental data. Adv Appl Math. 2008;40:180–93.10.1016/j.aam.2006.12.003Search in Google Scholar
[17] Schochet PZ. Is regression adjustment supported by the Neyman model for causal inference? J Stat Plan Inference. 2010;140:246–59.10.1016/j.jspi.2009.07.008Search in Google Scholar
[18] Schochet PZ. Statistical theory for the RCT-YES software: design-based causal inference for RCTs: second edition (NCEE 2016–4011). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education; 2016.Search in Google Scholar
[19] Aronow PM, Middleton JA. A class of unbiased estimators of the average treatment effect in randomized experiments. J Causal Inference. 2013;1:135–54.10.1515/jci-2012-0009Search in Google Scholar
[20] Lin W. Agnostic notes on regression adjustments to experimental data: reexamining Freedman’s critique. Ann Appl Stat. 2013;7:295–318.10.1214/12-AOAS583Search in Google Scholar
[21] Imbens G, Rubin D. Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge, UK: Cambridge University Press; 2015.10.1017/CBO9781139025751Search in Google Scholar
[22] Middleton JA, Aronow PM. Unbiased estimation of the average treatment effect in cluster-randomized experiments. Statistics Politics Policy. 2015;6:39–75.10.1515/spp-2013-0002Search in Google Scholar
[23] Li X, Ding P. General forms of finite population central limit theorems with applications to causal inference. J Am Stat Assoc. 2017;112:1759–69.10.1080/01621459.2017.1295865Search in Google Scholar
[24] Scott A, Wu CF. On the asymptotic distribution of ratio and regression estimators. J Am Stat Assoc. 1981;1981(112):1759–69.10.1080/01621459.1981.10477612Search in Google Scholar
[25] Miratrix LW, Sekhon JS, Yu B. Adjusting treatment effect estimates by post-stratification in randomized experiments. J R Stat Soc Ser B. 2013;75(2):369–96. doi: 10.1111/j.1467-9868.2012.01048.x.
[26] Cochran W. Sampling techniques. New York: John Wiley and Sons; 1977.
[27] Lohr SL. Sampling: design and analysis. 2nd edn. Pacific Grove, CA: Duxbury Press; 2009.
[28] Thompson S. Sampling. Hoboken, NJ: John Wiley & Sons; 2012.
[29] Rubin DB. Which ifs have causal answers? Discussion of Holland’s “statistics and causal inference”. J Am Stat Assoc. 1986;81:961–2. doi: 10.1080/01621459.1986.10478355.
[30] Fraser DAS. Ancillaries and conditional inference. Stat Sci. 2004;19(2):333–69. doi: 10.1214/088342304000000323.
[31] Aronow PM, Green DP, Lee DKK. Sharp bounds on the variance in randomized experiments. Ann Stat. 2014;42:850–71. doi: 10.1214/13-AOS1200.
[32] Wright T. On some properties of variable size simple random sampling and a limit theorem. Commun Stat Theory Methods. 1988;17(9):2997–3016. doi: 10.1080/03610928808829785.
[33] Rosenbaum P, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55. doi: 10.1093/biomet/70.1.41.
[34] Rubin DB. Multiple imputation for nonresponse in surveys. New York: J. Wiley and Sons; 1987. doi: 10.1002/9780470316696.
[35] Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc. 1952;47:663–85. doi: 10.1080/01621459.1952.10483446.
[36] Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Stat Med. 2004;23(19):2937–60. doi: 10.1002/sim.1903.
[37] Serfling RJ. Probability inequalities for the sum in sampling without replacement. Ann Stat. 1974;2:39–48. doi: 10.1214/aos/1176342611.
[38] Greene E, Wellner JA. Exponential bounds for the hypergeometric distribution. Bernoulli. 2017;23(3):1911–50. doi: 10.3150/15-BEJ800.
[39] Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 1967;1:221–33.
[40] White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–38. doi: 10.2307/1912934.
[41] Su F, Ding P. Model-assisted analyses of cluster-randomized experiments. J R Stat Soc Ser B. 2021;83(5):994–1015. doi: 10.1111/rssb.12468.
[42] Pashley NE, Miratrix LW. Insights on variance estimation for blocked and matched pairs designs. J Educ Behav Stat. 2021;46(3):271–96. doi: 10.3102/1076998620946272.
[43] Liu H, Yang Y. Regression-adjusted average treatment effect estimates in stratified randomized experiments. Biometrika. 2020;107(4):935–48. doi: 10.1093/biomet/asaa038.
[44] Liang K, Zeger S. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22. doi: 10.1093/biomet/73.1.13.
[45] Cameron AC, Miller DL. A practitioner’s guide to cluster-robust inference. J Hum Resour. 2015;50:317–72. doi: 10.3368/jhr.50.2.317.
[46] Bell R, McCaffrey D. Bias reduction in standard errors for linear regression with multi-stage samples. Surv Methodol. 2002;28:169–81.
© 2024 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.