Exploiting neighborhood interference with low-order interactions under unit randomized design

Mayleen Cortez-Rodriguez; Matthew Eichhorn; Christina Lee Yu

doi:10.1515/jci-2022-0051

Article Open Access

Exploiting neighborhood interference with low-order interactions under unit randomized design

Mayleen Cortez-Rodriguez , Matthew Eichhorn and Christina Lee Yu

Published/Copyright: August 3, 2023

Published by

Become an author with De Gruyter Brill

Submit Manuscript Author Information Explore this Subject

From the journal Journal of Causal Inference Volume 11 Issue 1

Abstract

Network interference, where the outcome of an individual is affected by the treatment assignment of those in their social network, is pervasive in real-world settings. However, it poses a challenge to estimating causal effects. We consider the task of estimating the total treatment effect (TTE), or the difference between the average outcomes of the population when everyone is treated versus when no one is, under network interference. Under a Bernoulli randomized design, we provide an unbiased estimator for the TTE when network interference effects are constrained to low-order interactions among neighbors of an individual. We make no assumptions on the graph other than bounded degree, allowing for well-connected networks that may not be easily clustered. We derive a bound on the variance of our estimator and show in simulated experiments that it performs well compared with standard estimators for the TTE. We also derive a minimax lower bound on the mean squared error of our estimator, which suggests that the difficulty of estimation can be characterized by the degree of interactions in the potential outcomes model. We also prove that our estimator is asymptotically normal under boundedness conditions on the network degree and potential outcomes model. Central to our contribution is a new framework for balancing model flexibility and statistical complexity as captured by this low-order interactions structure.

Keywords: causal inference; total treatment effect; neighborhood interference; design of experiments

MSC 2010: 62K99; 91D30; 60F05

1 Introduction

Accurately estimating causal effects is relevant in numerous applications, from pharmaceutical companies researching the efficacy of a new medication, to policy makers understanding the impact of social welfare programs, to social media companies evaluating the impact of different recommendation algorithms on user engagement across their platforms. Often, the entity interested in understanding a causal effect will design an experiment where they randomly assign subsets of the population to treatment (e.g., new medication) and to control (e.g., a placebo) and draw conclusions based on the observed outcomes of the participants (e.g., health outcomes). Other times, the entity may need to determine causal effects from observational data accrued in a previous study where they did not have full control over the treatment assignment mechanism.

Our work focuses on estimating the total treatment effect (TTE), or the difference between the average outcomes of the population when everyone is treated versus when no one is treated, given data collected from a randomized experiment. This estimand is sometimes referred to as the global average treatment effect, as in ref. [1]. The TTE is a quantity of interest in scenarios where the decision maker must choose between adopting the proposed treatment or sticking with the status quo. For example, a social media company may develop a new recommendation algorithm for suggesting content to their users, and they want to decide whether to roll out this new algorithm across their platform. As another example, suppose that a pharmaceutical company develops a vaccine for some infectious disease. Then, public health experts and officials must decide whether the vaccine is safe and efficacious enough to warrant its recommendation to the general population. As the side effects of the new treatment are unknown, the goal is to determine the efficacy of the treatment relative to the status quo baseline by running a budgeted randomized trial, where the number of treated individuals in the trial is limited for safety reasons.

The techniques and guarantees for estimating causal effects in classical causal inference heavily rely upon the stable unit treatment value assumption (SUTVA), which posits that the outcome of each individual is independent of the treatment assignment of all other individuals [2]. Unfortunately, SUTVA is violated in all the aforementioned applications because people influence and are impacted by their peers. In the presence of this network interference, an individual’s outcome is affected by the treatment assignment of others in their social network, and SUTVA no longer holds. Distinguishing between the direct effect of treatment on an individual and the network effect of others’ treatment on the individual can be challenging. This has resulted in a growing literature on causal inference in the presence of network (interference) effects, sometimes referred to as spillover or peer influence effects. In this work, we consider the task of estimating the TTE from unit randomized trials under neighborhood interference, when an individual is affected by the treatment of its direct neighbors but is otherwise unaffected by the treatment of the individuals outside their neighborhood. Furthermore, we focus on unit randomized designs (RDs), wherein individuals are independently assigned to either treatment or control in a randomized experiment. This is in contrast to cluster RDs, which have been proposed as an approach to address network interference for randomized experimental design but which may not be feasible in practice due to an incompatibility with existing experimental platforms.

Estimating a causal effect from randomized experiments involves two decisions. First, we must decide what kind of experiment to run, i.e., how we choose the individuals to participate in the study, and how we determine which individuals are assigned to receive the new treatment. In this article, we focus on randomization inference, in which we assume the entire population participates in the study, and the randomness arises from the assignment of treatment or control. Second, after the experiment is conducted, we must decide how to analyze the data and construct an estimate from the measured observations. The literature addressing network interference can largely be categorized as either constructing clever RDs to exploit clustering structure in the network, or designing new estimators that exploit structure in the potential outcomes model. Without making assumptions on either the network or the potential outcomes model, it is impossible to estimate the TTE under arbitrary interference [3–5].

Table 1 summarizes how different assumptions on the potential outcomes model or network structure lead to different types of solutions, with a focus on unbiased estimators and neighborhood interference models where the network is known. The columns correspond to assumptions on the network structure (in order of increasing generality from left to right), while the rows correspond to assumptions on the structure in the potential outcomes model (in order of increasing generality from top to bottom). The body of the graph lists proposed solutions for estimating the TTE under the corresponding assumptions on the network/model structure. For example, under a fully general network structure and a linear potential outcomes model, refs [6–11] propose using an ordinary least squares (OLS) estimator with a Bernoulli RD. Since we focus on solutions proposing unbiased estimators, we also list bounds on the variance of the estimators in the table, as a point of comparison. As Table 1 indicates, the literature has either focused on analyzing the Horvitz–Thompson estimator under new RDs that exploit network structure by increasing the correlation between neighbors’ treatment assignments or alternatively using regression-style estimators with Bernoulli RD, exploiting strong functional assumptions on the potential outcomes model. Without assuming any structure beyond neighborhood interference, the baseline solution of using the Horvitz–Thompson estimator under the Bernoulli RD is an unbiased estimator whose variance scales exponentially in the degree of the network.

Table 1

Literature Summary. Each row corresponds to an assumption on the structure of the potential outcomes model, and each column corresponds to an assumption on the structure of network. We list a proposed solution under the corresponding model/network assumptions in the following order: an unbiased ̲ estimator for the TTE, the RD, a bound on the variance (if available), and citations to related work. In the variance bounds, Y max is a bound on the effects on any individual, d is the maximum neighborhood size, C is the number of subcommunities or clusters, p is the treatment probability that is assumed to be small, κ is the restricted growth parameter, and β is the polynomial degree of the potential outcomes model. Our result proposes an estimator for the TTE under a β -order interactions (equivalently, β -degree polynomial) structure, a fully general graph, and Bernoulli design. All the solutions in the table rely on full knowledge of the network under neighborhood interference, and we focus on unbiased estimators.

Model structure	Network structure
	C disconnected subcommunities	κ -restricted growth	Fully general
Linear	Directions for future work		OLS, Bernoulli RD; [6–11]
Generalized linear			Regression/machine learning, Bernoulli RD; [11]
β -order interactions			SNIPE, Bernoulli RD; O Y max 2 d 2 β + 2 n p β
Arbitrary neighborhood interference	Horvitz–Thompson, Cluster RD; O Y max 2 C p ; [12–15]	Horvitz–Thompson, Randomized Cluster RD; O Y max 2 κ 4 d 2 n p ; [1,7,16,17]	Horvitz–Thompson, Bernoulli RD; O Y max 2 d 2 n p d ; [3]

In our work, we propose a hierarchy of model classes that extrapolates between simple linear models and complex general models, such that a practitioner can choose the strength of the model assumptions they are willing to impose. Naturally, assuming a more limited model simplifies the causal inference task. We characterize the complexity of a potential outcomes model with the order of interactions β , which also corresponds to the polynomial degree of the potential outcomes function when viewed as a polynomial of the treatment vector. A β -order interactions model is one in which each neighborhood set of size at most β can have a unique additive network effect on the potential outcome. Our model allows for heterogeneity in the influence of different sets of treated neighbors, strictly generalizing beyond the typical parametric model classes used in the literature, which oftentimes assumes anonymous interference. We make no assumptions on the graph beyond bounded degree, so the graph may be well connected and not easily clustered. We summarize our contributions and results below:

Assuming a β -order interactions model, under a nonuniform Bernoulli RD with treatment probabilities { p i } i = 1 n for p i ∈ [ p , 1 − p ] , we present the structured neighborhood interference polynomial estimator (SNIPE), a simple unbiased estimator for the TTE.
We derive a bound on the variance of our estimator, which scales polynomially in the degree d of the network and exponentially in the degree β of the potential outcomes model. We also show that our estimator is asymptotically normal.
For a d -regular graph and uniform treatment probabilities p i = p with p β < 0.16 and β ≪ d , we prove that the minimax optimal mean squared error for estimating the TTE is lower bounded by Ω 1 n p β , implying that the exponential dependence on β is necessary.
We provide experimental results to illustrate that using regression models that do not allow for heterogeneity among the network effects can lead to considerable bias when the anonymous interference assumption is violated. The experiments validate that our estimator is unbiased for β -order interaction models, and obtains a lower mean squared error than existing alternatives.

To interpret the upper bound on the variance of our proposed estimator for the TTE, we note that our variance scales as O ( poly ( d ) ∕ n p β ) , compared to the variance of the Horvitz–Thompson estimator under Bernoulli RD which scales as O ( 1 ∕ n p d ) , where β is always bounded above by d . For smaller values of β , the β -order interactions model imposes stronger structural assumptions on the potential outcomes model than required by the Horvitz–Thompson estimator. In turn, our estimator has significantly lower variance, scaling only polynomially in the network degree d yet exponential in β , as opposed to exponential in d . In addition, the minimax lower bound shows that the exponential dependence on β is also minimax optimal amongst β -order interaction models, implying that the order of interactions, or polynomial degree, of the potential outcomes model is a meaningful property that expresses the complexity of estimating the TTE.

2 Related literature

While there has been a flurry of recent activity in addressing the challenges that arise from network interference, every proposed solution concept fundamentally hinges upon making key assumptions on the form of network interference. Without any assumptions, the vector of observed outcomes under a particular treatment vector z ∈ { 0 , 1 } n may have no relationship to the potential outcomes under any other treatment vector. We should naturally expect that if we are willing to assume stronger assumptions, then we may be able to obtain stronger results conditioned on those assumptions being satisfied. As such, the literature ranges between results that impose fewer assumptions on the model and graph, resulting in unbiased estimators with high variance, or results that impose strong assumptions on the model and graph, resulting in simple, unbiased estimators with low variance.

Early works propose framing assumptions via exposure functions, or constant treatment response [3,4]. This assumes that there is some known exposure mapping, f ( z , θ i ) ∈ Δ , which maps the treatment vector z , along with unit-specific traits θ i , to a discrete space Δ of exposures, or effective treatments. The potential outcomes function for unit i is then assumed to only be a function of its exposure, or effective treatment, such that if f ( z , θ i ) = f ( z ′ , θ i ) , then Y i ( z ) = Y i ( z ′ ) . If ∣ Δ ∣ is as large as 2 n , then this assumption places no effective restriction on the potential outcomes function; thus, this assumption is only meaningful when ∣ Δ ∣ is relatively small. One commonly used exposure mapping expresses the neighborhood interference assumption [17–20], in which each unit i is associated to a neighborhood N i ⊆ [ n ] , and unit i ’s potential outcome is only a function of the treatments of units within N i . We could use an exposure mapping to formulate this assumption, where ∣ Δ ∣ = 2 d for d denoting the maximum neighborhood size, and f ( z , N i ) = f ( z ′ , N i ) if the treatments assigned to individuals in N i are the same between z and z ′ . Another commonly used exposure mapping expresses the anonymous interference assumption [14,21–23], in which the potential outcomes are only a function of the treatments through the number of treated neighbors, i.e., f ( z , N i ) = f ( z ′ , N i ) , if the number of treated individuals in N i via z and z ′ is the same. While the exposure mapping framework provides a highly generalizable and abstract framework for assumptions, it is fundamentally discrete in nature and the complexity of estimation is characterized by the number of possible exposures ∣ Δ ∣ , which could still be large. As a result, ref. [4] suggests a collection of additional assumptions that can be imposed on top of anonymous neighborhood interference, including distributional or functional form assumptions, or additivity assumptions as suggested in ref. [18].

The majority of works in the literature (along with our work) assume neighborhood interference with respect to a known graph. Notable exceptions include ref. [24], which considers a highly structured spatial interference setting with network effects decaying with distance, and [25], which provides methods for testing hypotheses about interference effects including higher order spillovers. Without imposing any additional assumptions on the potential outcomes besides neighborhood interference, a natural solution is to use the Horvitz–Thompson estimator to estimate the average outcomes under full neighborhood treatment and full neighborhood control [3]. While the estimator is unbiased, the variance of the estimator scales inversely with the probability that a unit’s full neighborhood is in either treatment or control. Under a Bernoulli( p ) RD, where each individual is treated independently with probability p , the variance scales as O ( 1 ∕ n p d ) , as indicated in the bottom right cell of Table 1. The exponential dependence on d renders the estimator impractical for realistic networks.

One approach in response to the aforementioned challenge is to consider cleverly constructing RDs that increase overlap, i.e., the probability that a unit’s entire neighborhood is assigned to treatment or control. The earliest literature in this line of work additionally assumes partial interference, also referred to as the multiple networks setting, in which the population can be partitioned into disjoint groups, and network interference only occurs within groups and not across groups [12,14,15,20,21,26,27]. This assumption makes sense in contexts where interference is only expected between naturally clustered groups of individuals, such as households, cities, or countries. Given knowledge of the groups, we can then randomly assign groups to different treatment saturation levels, e.g., jointly assigning groups to either treatment or control, increasing the correlation of treatments within neighborhoods. Then, a difference in means estimator or a Horvitz–Thompson estimator can be used to estimate the TTE. The asymptotic consistency of these estimators relies on the number of groups going to infinity, with a variance scaling inversely with the fraction of treated clusters, i.e., O ( 1 ∕ C p ) as indicated in the bottom left cell of Table 1. In practice, even networks that are clearly clustered into separate groups may not have a sufficiently large number of groups to result in accurate estimates.

For general connected graphs, one can still implement a cluster-based RD on constructed clusters, where the clusters are constructed to minimize the number of edges between clusters [1,7,16,17]. References [1,17] provide guarantees for graphs exhibiting a restricted growth property, which limits the rate at which local neighborhoods of the graph can grow in size, and ref. [1] proves that using randomized cluster RD along with the Horvitz–Thompson estimator achieves a variance of O ( 1 ∕ n p ) for restricted growth networks, which is a significant gain from the exponential dependence on d under Bernoulli design. A limitation of these solutions is that the cluster RD can be difficult to implement due to incompatibility with existing experimentation platforms or to ethical concerns. When the existing experimentation platform is already set up for a unit RD, the experimenter may have the desire to avoid overhauling the platform to work with cluster RD due to time or resource constraints [11]. In addition, refs [28–31] detail some ethical issues that arise in cluster RDs including, but not limited to, problems with informed consent (e.g., it may be infeasible to gain informed consent from every unit in a cluster) and concerns about distributional justice (e.g., with regards to how clusters are selected and assigned to treatment). Furthermore, both refs [28,30] comment that many of the existing, formal research ethics guidelines were designed with unit RD in mind and thus, offer little guidance for considerations that arise in cluster RDs. The work we present here focuses on scenarios with unit RDs, when cluster RDs may not be feasible or may be undesirable due to any of the aforementioned concerns.

The alternate approach that has gained traction empirically is to impose additional functional assumptions on the potential outcomes in addition to anonymous neighborhood interference. The most common assumption is that the potential outcomes are linear with respect to a particular statistic of the treatment vector, where the linear function is shared across units [6–11]. Under this assumption, estimating the entire potential outcomes function reduces to linear regression, which one could solve using OLS, as indicated in the upper right most cell of Table 1. After recovering the linear model, one could estimate any desired causal estimand. The results rely on correctly choosing the statistic that governs the linearity, or more generally reduces the task to heuristic feature engineering for generalized linear models [11]. One could plausibly extend the function class beyond linear and instead apply machine learning techniques to estimate the function that generalizes beyond linear regression. While we do not state a variance bound in Table 1, one would expect that when p is sufficiently large, the variance would scale with O ( 1 ∕ n ) , with some dependence on the complexity of the model class; when p is very small, the variance may scale with O ( 1 ∕ p n ) , as the regression still requires sufficient variance of covariates represented in the data, i.e., a sufficient spread of number of neighbors treated. Reference [22] considers nonparametric models yet focuses on estimating a locally linearized estimand. A drawback of these approaches is that they assume anonymous interference, which imposes a symmetry in the potential outcomes such that the causal network effect of treating any of an individual’s neighbors is equal regardless of which neighbor is chosen. In addition, they assume that the function that we are learning is shared across units, or at least units of the same known covariate type, which can be limiting.

While we have primarily focused on summarizing the literature for unbiased estimators, there has also been some work considering how structure in the potential outcomes model can be exploited to reduce bias of standard estimators. References [32–34] use application domain-specific structure in two-sided marketplaces and network routing to compare the bias of the difference in means estimators under different experimental designs. Reference [35] uses structure in ridesharing platforms to propose a new estimator with reduced bias. In addition, there has been some limited empirical work studying bias under model misspecification [16].

Complementary to the literature on randomized experiments, there has been a growing literature considering causal inference over observational studies in the presence of network interference. However, the limitations similarly mirror the aforementioned concerns. A majority of the literature assumes partial interference [15,36–39], i.e., the multiple networks setting, which then enables causal inference of a variety of different estimands. In particular, it is commonly assumed that the different groups in the network are sampled from some underlying distribution, and the statistical guarantees are given with respect to the number of groups going to infinity. Alternately, other works assume that the potential outcomes only depend on a simple and known statistic of the neighborhood treatment, most commonly the number or fraction of treated [11,40,41]. Either the neighborhood statistic only takes finite values, or assumptions are imposed on the functional form of the potential outcomes, which imply anonymous interference and reduce inference to a regression task or maximum likelihood calculation. Reference [42] considers a general exposure mapping model alongside an inverse propensity weighted estimator, but the estimator has high variance when the exposure mapping is complex.

In contrast to the majority of the mentioned literature, we neither rely on cluster RDs nor anonymous interference assumptions. We instead propose a potential outcomes model with low-order interactions structure, where the degree of interactions β characterizes the difficulty of inference, also studied in ref. [43]. For β = 1 , this model is equivalent to heterogeneous additive network effects in ref. [44], which can be derived from the joint assumptions of additivity of main effects and interference in ref. [18]. When β is larger than the network degree, then this assumption is equivalent to an arbitrary neighborhood interference, providing a nested hierarchy of models that encompass the simple linear model class, the fully general model class, as well as model classes of varying complexity in between. References [43,44], which consider similar models as we do, focus on the setting when the underlying network is fully unknown, and yet there is richer available information either in the form of historical data [44] or multiple measurements over the course of a multi-stage experiment [43], neither of which we assume in this work. Our estimator also has close connections to the pseudoinverse estimator in ref. [45], the Riesz estimator in ref. [46], and the estimator in ref. [22], which is discussed in Section 4.

3 Model

3.1 Causal network

Let [ n ] ≔ { 1 , … , n } denote the underlying population of n individuals. We model the network effects in the population as a directed graph over the individuals with edge set E ⊆ [ n ] × [ n ] . An edge ( j , i ) ∈ E signifies that the treatment assignment of individual j affects the outcome of individual i . As an individual’s own treatment is likely to affect their outcome, we expect self-loops in this graph. In much of the article, we are concerned with neighborhood interference effects, and we use N i ≔ { j ∈ [ n ] : ( j , i ) ∈ E } to denote the in-neighborhood of an individual i . Note that this definition allows i ∈ N i . Many of our variance bounds reference the network degree. We let d in denote the maximum in-degree of any individual, d out denote the maximum out-degree, and d ≔ max { d in , d out } .

3.2 Potential outcomes model

To each individual i , we associate a treatment assignment z i ∈ { 0 , 1 } , where we interpret z i = 1 as an assignment to the treatment group and z i = 0 as an assignment to the control group. We collect all treatment assignments into the vector z . We use Y i to denote the outcome of individual i . As our setting assumes network interference, the classical SUTVA assumption is violated. That is, Y i is not a function only of z i . Rather, Y i : { 0 , 1 } n → R may be a function of z , the treatment assignments of the entire population. Since each treatment variable z i is binary, we can indicate an exact treatment assignment as a product of z i (for treated individuals) and ( 1 − z i ) (for untreated individuals) factors. As such, we can express a general potential outcome function Y i as a polynomial in z ,

Y i ( z ) = ∑ T ⊆ [ n ] a i , T ∏ j ∈ T z j ∏ k ∈ [ n ] \ T ( 1 − z k ) ,

where a i , T is individual i ’s outcome when their set of treated neighbors is exactly T . Via a change of basis, we can equivalently express Y i ( z ) as a polynomial in the “treated subsets”:

(3.1) Y i ( z ) = ∑ S ′ ⊆ [ n ] c i , S ′ ∏ j ′ ∈ S ′ z j ′ ,

where c i , S ′ represents the additive effect on individual i ’s outcome that they receive when the entirety of subset S ′ (perhaps among other individuals) is treated. Note that c i , ∅ represents the baseline effect, the component of i ’s outcome that is independent of the treatment assignments.

So far, the potential outcomes model described in (3.1) is completely general. However, it is parameterized by 2 n coefficients { c i , S ′ } , which makes it untenable in most settings. To combat this, we impose some structural assumptions on these coefficients. First, we observe that the populations of interest can be quite large (e.g., the population of an entire country), and their influence networks may have high diameter. Throughout most of the article, we assume that individuals’ outcomes are influenced only by their immediate in-neighborhood.

Assumption 1

(Neighborhood interference) Y i ( z ) only depends on the treatment of individuals in N i . Equivalently, Y i ( z ) = Y i ( z ′ ) for any z and z ′ such that z j = z ′ for all j ∈ N i . In our notation, c i , S ′ = 0 for any S ′ ⊈ N i .

Next, we note that the degree of each Y i ( z ) can (under the neighborhood interference assumption) be as large as d in . In such a model, one’s outcome may be differently influenced by a treated coalition of any size in their neighborhood. Contrast this with a simpler linear potential outcomes model, wherein an individual’s outcome receives only an independent additive effect from each of their treated neighbors. This illustrates that the degree of the polynomial may serve as a proxy for its complexity. In this work, we consider the scenario where the polynomial degree may be significantly smaller than d in .

Assumption 2

(Low polynomial degree) Each potential outcome function Y i ( z ) has degree at most β . In our notation, c i , S ′ = 0 whenever ∣ S ′ ∣ > β . Along with Assumption 1, it follows that the potential outcomes function Y i ( z ) from (3.1) can be expressed in the following form,

(3.2) Y i ( z ) = ∑ S ′ ∈ S i β c i , S ′ ∏ j ∈ S ′ z j ,

where we define S i β ≔ { S ⊆ N i : ∣ S ∣ ≤ β } .

We remark that while we use the formal mathematical term of “low polynomial degree,” since this describes a function over a vector of binary variables, a low polynomial degree constraint is equivalent to a constraint on the order of interactions amongst the treatments of neighbors. In the simplest setting when β = 1 , this is equivalent to a model in which the networks effects are additive across treated neighbors, strictly generalizing beyond all linear models that have been widely used in applied settings.

We use the notation in (3.2) to express the potential outcomes model for the remainder of the article. If β ≥ d in , note that β could be completely removed from the definition of Y i in equation (3.2), reducing to the arbitrary neighborhood interference setting. However, we turn our attention to settings where β might be much smaller than the degree of the graph ( β ≪ d in ), and we can assume that only low-order interactions within neighborhoods have an effect on an individual. As noted earlier, taking β = 1 corresponds to the heterogeneous linear outcomes model in ref. [44]. We include further examples to help in understanding this low polynomial degree assumption in Section 3.2.1.

Many of our variance bounds utilize an upper bound on the treatment effects for each individual. We define Y max such that

(3.3) Y max ≔ max i ∈ [ n ] ∑ S ′ ⊆ S i β ∣ c i , S ′ ∣ .

It follows that ∣ Y i ( z ) ∣ ≤ Y max for any treatment vector z .

Remark 1

We emphasize that the model in (3.2) captures fully general neighborhood interference when β = d . Even when β < d , the number of parameters in the model grows with n . This results in the total number of observations always being smaller than the number of parameters in the model, so using simple regression-style estimators to identify the model is impossible.

Remark 2

We can consider the order of interactions or polynomial degree β as a way to measure the complexity of the model class. Our subsequent results suggest that β is meaningful as it also captures a notion of statistical complexity or difficulty of estimation. This is evidenced by the variance upper bound and minimax lower bound results in Section 5, where we see that the smaller β is relative to the degree of the graph, the smaller the variance incurred and the larger β is with respect to the graph degree, the higher the variance incurred. In some sense, the “lower” the degree of the model, i.e., the more structure imposed, the “easier” it is to estimate the TTE. On the flip side, the “higher” the degree of the model, i.e., the closer it is to being fully general, the “harder” it is to estimate.

Remark 3

Assumption 2 implies that the model exhibits a particular type of sparsity in the coefficients with respect to the monomial basis, in which the coefficients corresponding to sets larger than size β are zero. In our setting when the treatments are budgeted, i.e., p is small, these coefficients precisely correspond to effects that are observable in a Bernoulli RD, i.e., the probability for observing or measuring the coefficient corresponding to set S is p ∣ S ∣ . As such, there is an direct connection between this hierarchy of models as parameterized by β and the ability to measure and estimate the TTE, which is formalized by our subsequent analysis.

3.2.1 Examples of low-degree interaction models

We provide a few examples to illustrate when the polynomial degree of the potential outcomes model may naturally be smaller than the total neighborhood size (i.e., β < d ).

Example 1

Consider a potential outcomes model that exhibits the joint assumptions of additivity of main effects and interference effects as defined in ref. [18]. This imposes that the potential outcomes satisfy

Y i ( z ) = Y i ( 0 ) + ( Y i ( z i e i ) − Y i ( 0 ) ) + ∑ k ∈ [ n ] ( Y i ( z k e k ) − Y i ( 0 ) ) ,

where 0 ∈ R n is a vector of all zeros and e j ∈ { 0 , 1 } n is a standard basis vector. As discussed in ref. [44], this assumption implies a low-degree interaction model with β = 1 , which they refer to as heterogeneous additive network effects.

Example 2

Consider a hypothesized setting where each individual’s neighborhood can be divided into smaller sub-communities. For example, one’s close contacts may include their immediate family, their close friends, their work colleagues, etc. Within each of these sub-communities, there may be nontrivial (higher order) interactions between the treatments of its members. However, it is reasonable to assume that the cumulative effects of each sub-community have an additive effect on the individual’s outcome. That is, there are no nontrivial interactions resulting from the treatment of individuals across different sub-communities. In this case, a natural choice for β is the size of the largest sub-community, which could be significantly smaller than the size of the largest neighborhood.

Example 3

Suppose a social networking platform is testing a new “hangout room” feature where groups of up to five people can interact in a novel environment on the platform. One can posit that a natural choice for β in this setting is 5, as the change in any individual’s usage on the platform can be attributed to how they utilize this new feature, which in turn is a function of it being introduced to various subsets of up to five users in that individual’s neighborhood.

Example 4

Consider a setting where an individual’s outcome is a low-degree polynomial in some auxiliary quantity, which is itself linear in the treatment assignments of their neighborhood. For example, we might have

Y i ( z ) = c i , ∅ + ∑ j ∈ N i c i j z j + ∑ j ∈ N i c i j z j 2 + ⋯ + ∑ j ∈ N i c i j z j β .

A similar setting is explored in our simulated experiments in Section 6.

Example 5

Consider an example in which network effects only arise from pairwise edge interactions and triangular interactions, i.e., individual i ’s outcome consists of a sum of its baseline outcome, its own direct effect c i , i , pairwise edge effects c i , j for j ∈ N i , and triangle effects c i , { j , j ′ } for j , j ′ ∈ N i such that there is also an edge between j and j ′ , indicating that the three individuals are mutual connections. For such a model, β would be 2 due to the triangular interactions.

Remark 4

We emphasize that any potential outcomes model that takes a binary treatment vector z as its input can be written as a polynomial in z , taking the form in equation (3.2) for general β . However, the low-degree assumption, i.e., that β ≪ d , will not generally admit high-degree models. For example, both threshold models and saturation models generally require the degree of Y i ( z ) to be ∣ N i ∣ . In a threshold potential outcomes model, an individual experiences network effects once a particular threshold of their neighbors are treated [7]. An example of this type of model is given by

Y i ( z ) = I ∑ j ∈ N i z j ≥ θ ⋅ ∑ S ⊆ N i c i , S ∏ j ∈ S z j ,

where 0 ≤ θ ≤ ∣ N i ∣ . Saturation models allow for network or peer effects to increase up to a particular saturation level, such as

Y i ( z ) = min θ , ∑ S ⊆ N i c i , S ∏ j ∈ S z j ,

where θ is some maximum saturation threshold. In this model, an individual receives additive effects from each subset of treated neighbors until a certain threshold effect is reached, after which the networks effects have “saturated” and the treatment of additional neighbors contributes no additional effect.

3.3 Causal estimand and RD

Throughout most of the article, we concern ourselves with estimating the TTE. This quantifies the difference between the average of individual’s outcomes when the entire population is treated versus the average of individual’s outcomes when the entire population is untreated:

(3.4) TTE ≔ 1 n ∑ i = 1 n ( Y i ( 1 ) − Y i ( 0 ) ) ,

where 1 represents the all ones vector and 0 represents the zero vector. Plugging in our parameterization from equation (3.2), we may re-express the TTE as follows:

(3.5) TTE = 1 n ∑ i = 1 n ∑ S ′ ∈ S i β S ′ ≠ ∅ c i , S ′ .

Since exposing individuals to treatment can have a deleterious and irreversible effect on their outcomes, we wish to estimate the TTE after treating a small random subset of the population. We focus on a nonuniform Bernoulli design, wherein each individual i is independently assigned treatment with probability p i ∈ [ p , 1 − p ] for p > 0 . That is, each z i ∼ Bernoulli ( p i ) . Such a RD is both straightforward to implement and to understand. Furthermore, many existing experimentation platforms are already designed for Bernoulli randomization, making it easy to collect new data or to re-analyze existing data and adjust for network interference, rather than requiring a complete overhaul of the existing experimentation platform to allow for more complex randomization schemes.

4 Estimator

In this section, we introduce the estimator that will serve as our central focus through the rest of the article: the SNIPE. While we restrict our attention to nonuniform Bernoulli design, the techniques to derive this estimator can be generalized to a wide variety of causal estimands and experimental designs. In fact, our estimator is connected to the Horvitz–Thompson estimator and turns out to be a special case of both the pseudoinverse estimator first introduced by Swaminathan et al. [45] and more recently the Riesz estimator of Harshaw et al. [46]. We discuss these connections in Section 4.3.

To provide intuition, we first derive the estimator in the β = 1 case with the linear heterogenous potential outcomes model [44] via a connection to OLS. Then, we show how it can be extended to the more general polynomial setting. Our main result (Section 5) establishes both the unbiasedness of this family of estimators and a bound on its variance under Bernoulli design.

Recalling that S i β ≔ { S ⊆ N i : ∣ S ∣ ≤ β } , the vector c i collects the parameters { c i , S } S ∈ S i β in some canonical ordering. We will assume throughout that ∅ ∈ S i β is always first in this ordering and otherwise index the entries of these vectors by their corresponding set S . As an example, when β = 1 and N i = { j 1 , … , j d i } , we may have c i = [ c i , ∅ c i , { j 1 } … c i , { j d i } ] ⊺ , where d i is the in-degree of unit i . In a similar manner, the treated subsets vector z ˜ i collects the indicators { ∏ j ∈ S z j } S ∈ S i β in the same ordering. In our β = 1 example, z ˜ i = [ 1 z j 1 … z j d i ] ⊺ . By using this notation, we may express

(4.1) Y i ( z ) = ⟨ c i , z ˜ i ⟩ , TTE = 1 n ∑ i = 1 n ⟨ ( 1 ∣ S i ∣ − e 1 ) , c i ⟩ ,

where the inner product argument in the second equation is the ∣ S i ∣ -length vector with first entry 0 and remaining entries 1.

4.1 Building intuition in the linear setting ( β = 1 )

To motivate our estimator, we consider the linear heterogeneous potential outcomes model ( β = 1 ) under nonuniform Bernoulli RD. By using the TTE expression from (4.1), we can recast the problem of estimating the TTE into the problem of estimating the parameter vector c i : by linearity of expectation, an unbiased estimator of c i will give rise to an unbiased estimator of TTE .

As a thought experiment toward estimating c i , imagine that we can perform M independent replications of our randomized experiment. In each replication m ∈ [ M ] , we observe the treated subsets vector ( z ˜ i ) m , obtained from our Bernoulli RD, and the outcome ( Y i ) m . We visualize,

( Y i ) 1 ( Y i ) 2 ⋮ ( Y i ) M ︸ Y i ∈ R M = 1 z j 1 1 … z j d i 1 1 z j 1 2 … z j d i 2 ⋮ ⋮ ⋱ ⋮ 1 z j 1 M … z j d i M ︸ X i ∈ R M × ( d i + 1 ) c i , ∅ c i j 1 ⋮ c i j d i ⏟ c i ∈ R ( d i + 1 ) .

To minimize the sum of squared deviations from the true coefficients, we use OLS, computing

c ˆ i = ( X i ⊺ X i ) − 1 X i ⊺ Y i .

We consider the limit of this estimate as M → ∞ . The consistency of the least squares estimator ensures that c ˆ i ⟶ P c i . By the law of large numbers, we have

X i ⊺ X i = ∑ m = 1 M 1 z j 1 m z j 2 m … z j d i m z j 1 m ( z j 1 m ) 2 z j 1 m z j 2 m … z j 1 m z j d i m z j 2 m z j 1 m z j 2 m ( z j 2 m ) 2 ⋱ z j 2 m z j d i m ⋮ ⋱ ⋱ ⋱ ⋮ z j d i m z j 1 m z j d i m z j 2 m z j d i m … ( z j d i m ) 2 ⟶ a . s . M ⋅ E [ z ˜ i z ˜ i ⊺ ] .

Similarly, the vector X i ⊺ Y i ⟶ a . s . M ⋅ E [ z ˜ i ⋅ Y i ( z ) ] . Assuming the invertibility of the matrix E [ z ˜ i z ˜ i ⊺ ] (we show this explicitly for Bernoulli design below), we obtain in the limit,

c i = ( M ⋅ E [ z ˜ i z ˜ i ⊺ ] ) − 1 M ⋅ E [ z ˜ i ⋅ Y i ( z ) ] = E [ z ˜ i z ˜ i ⊺ ] − 1 E [ z ˜ i ⋅ Y i ( z ) ] .

Now, we return from our thought experiment to the actual experimental setting wherein a single instantiation ( z ˜ i , Y i ) is realized. We consider the estimator

(4.2) c i ^ ≔ E [ z ˜ i z ˜ i ⊺ ] − 1 z ˜ i Y i .

Note that the matrix E [ z ˜ i z ˜ i ⊺ ] − 1 depends only on our experimental design and thus can be utilized by our estimator. The unbiasedness of this estimator follows from our aforementioned computation; by applying linearity,

E [ c i ^ ] = E [ z ˜ i z ˜ i ⊺ ] − 1 E [ z ˜ i Y i ] = c i .

By applying linearity once more, we may obtain the unbiased estimator for the TTE,

(4.3) TTE ^ = 1 n ∑ i = 1 n ⟨ ( 1 ∣ S i β ∣ − e 1 ) , c i ^ ⟩ = 1 n ∑ i = 1 n Y i ⟨ E [ z ˜ i z ˜ i ⊺ ] − 1 ( 1 ∣ S i β ∣ − e 1 ) , z ˜ i ⟩ .

For the specific setting of nonuniform Bernoulli design with β = 1 , one can use the fact that E [ z j k ] = E [ z j k 2 ] = p j k and E [ z j k z j k ′ ] = p j k p j k ′ for k ≠ k ′ to compute,

E [ z ˜ i z ˜ i ⊺ ] = 1 p j 1 p j 2 … p j d i p j 1 p j 1 p j 1 p j 2 … p j 1 p j d i p j 2 p j 1 p j 2 p j 2 ⋱ p j 2 p j d i ⋮ ⋱ ⋱ ⋱ ⋮ p j d i p j 1 p j d i p j 2 p j d i … p j d i ,

which we can invert to obtain

E [ z ˜ i z ˜ i ⊺ ] − 1 = 1 + ∑ k p j k 1 − p j k − 1 1 − p j 1 ⋯ ⋯ − 1 1 − p j d i − 1 1 − p j 1 1 p j 1 ( 1 − p j 1 ) 0 ⋯ 0 ⋮ 0 1 p j 2 ( 1 − p j 2 ) 0 ⋮ ⋮ ⋮ 0 ⋱ 0 − 1 1 − p j d i 0 ⋯ 0 1 p j d i ( 1 − p j d i ) .

We calculate

E [ z ˜ i z ˜ i ⊺ ] − 1 ( 1 ∣ S i β ∣ − e 1 ) = − ∑ j ∈ N i 1 1 − p j 1 p j 1 ( 1 − p j 1 ) … 1 p j d i ( 1 − p j d i ) ⊺ .

By plugging into equation (4.3), we obtain the explicit form for our estimator:

TTE ^ SNIPE ( 1 ) ≔ 1 n ∑ i = 1 n Y i ∑ j ∈ N i z j − p j p j ( 1 − p j ) ,

where SNIPE ( 1 ) refers to the fact that this is the SNIPE estimator with β = 1 .

4.2 SNIPE in the general polynomial setting

For larger β , we can use the same least squares approach to obtain an unbiased estimate of the coefficient vector c i , and then plug this into equation (4.1) to obtain TTE ^ . The outcomes Y i ( z ) remain linear functions in z ˜ i – albeit for a significantly longer z ˜ i containing ∣ S i β ∣ = ∑ k = 0 min ( β , ∣ N i ∣ ) ∣ N i ∣ k entries – so the same convergence results apply. Suppose we index the entries of E [ z ˜ i z ˜ i ⊺ ] by the sets S and T corresponding to the matrix row and column. By the independence and marginal treatment probabilities of nonuniform Bernoulli design, we see that

( E [ z ˜ i z ˜ i ⊺ ] ) S , T = ∏ j ∈ S ∪ T p j .

Working with this matrix is significantly more tedious, so we relegate the details to Appendix A. There, we show that E [ z ˜ i z ˜ i ⊺ ] is invertible by giving an explicit formula for the entries of its inverse. Then, we plug these entries into equation (4.3) to derive the following explicit formula for the estimator:

(4.4) TTE ^ SNIPE ( β ) ≔ 1 n ∑ i = 1 n Y i ∑ S ∈ S i β g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) ,

where we define the coefficient function g : 2 [ n ] → R such that

g ( S ) = ∏ s ∈ S ( 1 − p s ) − ∏ s ∈ S ( − p s )

for each S ⊆ [ n ] and g ( ∅ ) = 0 . When it is clear from context that TTE ^ refers to the SNIPE estimator, we suppress the subscript for cleaner notation.

Remark 5

We pause here to again emphasize that our technique gives us an unbiased estimate of the coefficient vector c i for each individual i . From here, we leverage the linearity of expectation to obtain an unbiased estimator for the TTE , which is a linear function of these c i coefficients. This same strategy can be applied to develop estimators for any causal effect that is linear in the c i , S coefficients. We highlight some other potential estimands and the explicit form of their estimators in Appendix E. The techniques that we discuss in Section 5 can be used to establish further properties of these estimators.

This estimator can be evaluated in O ( n d in β ) time and only utilizes structural information about the graph (not any influence coefficients c i , S ). Structurally, the estimator takes the form of a weighted average of the outcomes Y i of each individual i , where the weights themselves are functions of the treatment assignments of all members j of the in-neighborhood N i . To make use of the low-order interference assumption, the estimator separately scales the effect of treatment of each sufficiently small subset of N i using the scaling function g ( S ) . The definition of this g ( S ) ensures the unbiasedness of the estimator.

In the special case of a uniform treatment probability p i = p across all nodes, we can simplify this estimator to show that it is only a function of the number of treated individuals in i ’s neighborhood and not the identities. Let T = { j : z j = 1 } denote the set of treated units. We can rewrite the estimator as follows:

(4.5) TTE ^ = 1 n ∑ i = 1 n Y i ∑ S ∈ S i β ∏ j ∈ S z j − p p − ∏ j ∈ S z j − p p − 1 = 1 n ∑ i = 1 n Y i ∑ k = 0 min ( β , ∣ N i ∩ T ∣ ) ∑ A ⊆ N i ∩ T ∣ A ∣ = k ∑ ℬ ⊆ N i \ T ∣ ℬ ∣ ≤ β − k 1 − p p k ( − 1 ) ∣ ℬ ∣ − ( − 1 ) k p 1 − p ∣ ℬ ∣ = 1 n ∑ i = 1 n Y i ∑ k = 0 min ( β , ∣ N i ∩ T ∣ ) ∣ N i ∩ T ∣ k ∑ ℓ = 0 min ( β − k , ∣ N i \ T ∣ ) ∣ N i \ T ∣ ℓ ( − 1 ) k + ℓ p − 1 p k − − p 1 − p ℓ .

Thus, while our estimation guarantees allow for heterogeneity in the potential outcomes model, when treatment probabilities are uniform, computing the estimator does not depend on the identity of the treated individuals, but only the number of treated neighbors. As a result, the estimator can be evaluated in O ( n β 2 ) time, which is a significant improvement compared to the O ( n d in β ) computational complexity when the treatment probabilities are nonuniform.

4.3 Connection to other estimator classes

Our estimator takes the form of a linear weighted estimator

TTE ^ = 1 n ∑ i = 1 n Y i ⋅ w i ( z ) ,

under specifically constructed weight functions w i : { 0 , 1 } n → R . From this perspective, we can draw connections between our estimator and others appearing in the literature.

4.3.1 Horvitz–Thompson estimator

First, we show that in the special case where ∣ N i ∣ ≤ β , our estimator is identical to the classical Horvitz–Thompson estimator. In this case, the restriction ∣ S ∣ ≤ β is satisfied for every S ⊆ N i , so that S i β includes all subsets of N i . By using this, we may simplify

w i ( z ) = ∑ S ⊆ N i g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) = ∑ S ⊆ N i ∏ j ∈ S z j − p j p j − ∑ S ⊆ N i ∏ j ∈ S − ( z j − p j ) ( 1 − p j ) (by definition of g ( S ) ) = ∏ j ∈ N i 1 + z j − p j p j − ∏ j ∈ N i 1 − z j − p j 1 − p j (distributivity) = ∏ j ∈ N i z j p j − ∏ j ∈ N i 1 − z j 1 − p j = I ( z treats all of N i ) Pr ( z treats all of N i ) − I ( z treats none of N i ) Pr ( z treats none of N i ) .

Thus,

TTE ^ SNIPE = 1 n ∑ i = 1 n Y i I ( z treats all of N i ) Pr ( z treats all of N i ) − I ( z treats none of N i ) Pr ( z treats none of N i ) = TTE ^ H T ,

so we exactly recover the Horvitz–Thompson estimator. As a result, when β is sufficiently large relative to the degree of the nodes in the graph, our estimator is very similar to the Horvitz–Thompson estimator, only differing for the nodes that have graph degrees larger than β . In this sense, under Bernoulli randomization, our estimator can be thought of as a generalization of Horvitz–Thompson to additionally account for low polynomial degree structure, which is most relevant for simplifying the potential outcomes associated with high-degree vertices.

4.3.2 Pseudoinverse estimator

The key technical steps of deriving our estimator can be described as constructing unbiased estimators for each unit i ’s contribution to the TTE using a connection to OLS for linear models, and subsequently averaging the unbiased estimates due to the fact that the causal estimand is a linear function of these individual contributions. This overall technique has also appeared in the previous literature in semiparametric estimation, with one clear example described by Swaminathan et al. [45] in a seemingly different context of off-policy evaluation for online recommendation systems. In their model, a context x ∈ X arrives, at which point the principal selects a tuple (or slate) of actions s = ( s 1 , … , s ℓ ) and observes a random reward r based on the interaction between the context and the slate. The authors make a linearity assumption that posits that V ( x , s ) = E r [ r ∣ x , s ] = 1 s ⊺ ϕ x , where 1 s ⊺ indicates the choice of a particular action in each entry of the tuple, and θ x is a context-specific reward weight vector associated to context x . In our setting, the context x is an individual i , the reward weights θ x are their effect coefficients c i , and the slate indicator vector 1 s ⊺ is our treated subsets vector z ˜ i .

A primary goal of ref. [45] is to estimate ϕ x , which can be used to inform a good slate selection policy. The mean squared error of an estimate w for r is E μ [ ( 1 s ⊺ w − r ) 2 ] , where μ encodes the random selection of the context, slate, and reward. In this framing, the least squares estimator

θ ¯ x = ( E μ [ 1 s 1 s ⊺ ∣ x ] ) † E μ [ r 1 s ⊺ ∣ x ]

is the minimizer of the mean squared error (MSE) with the minimum norm. As the reward distribution is unknown, it is replaced with an empirical estimate of r given past data. We can perform the substitutions as described in the previous paragraph, noting that Bernoulli design ensures that E [ z ˜ i z ˜ i ⊺ ] is invertible (Appendix A), to recover our estimator c i ^ from (4.2).

4.3.3 Riesz estimator

In their recent work, Harshaw et al. [46] consider a very general causal inference framework wherein a treatment z is drawn from an underlying experimental design distribution over an intervention set Z . We observe an outcome Y i ( z ) for each unit i ∈ [ n ] , where we assume that Y i belongs to a model space ℳ i , which is a subspace of Y containing all measurable and square-integrable (with respect to the distribution over Z ) functions Z → R . Each individual is additionally endowed with a linear effect functional θ i : Y → R , and the goal is to estimate the average individual treatment effect τ = 1 n ∑ i = 1 n θ i ( Y i ) .

In our setting, Z = { 0 , 1 } n with the nonuniform Bernoulli design distribution (i.e., Pr ( z i = 1 ) = p i , with { z i } mutually independent). Y consists of all linear functions over the basis { ∏ j ∈ S z j : S ⊆ [ n ] , ∣ S ∣ ≤ β } , and each ℳ i restricts to the linear functions over { ∏ j ∈ S z j : S ∈ S i β } . Finally, each unit i has effect functional θ i ( Y i ) = Y i ( 1 ) − Y i ( 0 ) , so that τ = TTE .

Under two assumptions – correct model specification (needed in our work as well) and positivity (always satisfied in our setting since Bernoulli design ensures each treatment in Z occurs with positive probability) – the Riesz representation theorem guarantees the existence of an unbiased estimator for τ , referred to as a Riesz estimator, of the form

τ ^ = 1 n ∑ i = 1 n Y i ( z ) ψ i ( z ) ,

where ψ i ∈ ℳ i is a Riesz representor for θ i with the property that

θ i ( Y ) = E z [ ψ i ( z ) Y ( z ) ] ( ⋆ )

for each Y ∈ ℳ i . As each model space ℳ i has dimension ∣ S i β ∣ , we can identify ψ i by solving the linear system that verifies that ( ⋆ ) holds for each function in some basis for ℳ i . A canonical choice is the standard basis, giving rise to equations

θ i ( ∏ j ′ ∈ S ′ z j ′ ) = I ( S ′ ≠ ∅ ) = E z [ ψ i ( z ) ∏ j ′ ∈ S ′ z j ′ ] ,

for each S ′ ∈ S i β . The choice ψ i ( z ) = ∑ S ∈ S i β g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) is a solution to this system (Appendix B) and gives rise to our estimator.

5 Properties of estimator under Bernoulli design

The following theorem summarizes the key properties of our estimator.

Theorem 1

Under a potential outcomes model satisfying the neighborhood interference assumption with polynomial degree at most β , the estimator defined in (4.4) is unbiased with variance upper bounded by

d in d out Y max 2 n ⋅ e d in β ⋅ max 4 β 2 , 1 p ( 1 − p ) β ,

where each p i ∈ [ p , 1 − p ] and p > 0 .

Notably, a sequence of networks with n → ∞ and d = o ( log n ) has variance asymptotically approaching 0. We defer the proof of this theorem to Appendix B. Rather than appealing to the convergence properties of the pseudoinverse estimator to establish unbiasedness, we present an alternate combinatorial proof. To bound the variance, we carefully consider different possible overlapping subsets of individuals to separately bound many covariance terms that make up the overall variance expression.

To understand the variance bounds for our estimator, we can compare against the variance of Horvitz–Thompson under a Bernoulli design. In the simple setting of a d -regular graph and uniform Bernoulli ( p ) randomization, ref. [17] showed that the Horvitz–Thompson estimator has a variance that is lower bounded by Ω ( 1 ∕ n p d ) . In contrast, the variance of our estimator only scales polynomially in the degree d , but exponentially in the polynomial degree β , which is achieved by simply changing the estimator, without requiring any additional clustering structure of the graph and without utilizing complex RDs. This is a significant gain when the polynomial degree β is significantly lower than the graph degree d . The simplest setting of β = 1 already expresses all potential outcomes models that satisfy additivity of main effects and additivity of interference, as defined in ref. [18]; this subsumes all linear models that are commonly used in the practical literature, yet which require additional homogeneity assumptions.

We also remark that when β = d , both the Horvitz–Thompson estimator and our estimator are unbiased, for the same class of functions. When β < d , the Horvitz–Thompson estimator is unbiased for a strictly larger class of functions than our estimator, precisely those characterized by β = d . In this way, if the practitioner can use domain knowledge to argue that the true potential outcomes model belongs to the class of functions parametrized by β from equation (3.2) for β < d , then our estimator provides an advantage over the Hortvitz-Thompson estimator. This is because both estimators are unbiased but the variance of our estimator does not have exponential dependence on the graph degree. However, depending on the flexibility that the practitioner desires or needs to express in the potential outcomes, there is no clear winner. For example, if one desires a fully nonparametric potential outcomes to capture the most general neighborhood interference settings, they might set β = d . Then, both our estimator and the Horvitz–Thompson estimator are unbiased, and both have variance scaling exponentially in d . On the other hand, suppose anonymous interference was satisfied for a particular application and the potential outcomes could be modeled:

Y i ( z ) = c 0 + c 1 z i + c 2 ∑ j ∈ N i z j .

Then, using an OLS estimator would give an unbiased estimate with lower variance than our estimator. Thus, our model and estimator can be viewed as simply “adding to the toolbox” that practitioners may use depending on how expressive they need their potential outcomes models to be.

In the special setting of uniform Bernoulli design and β = 1 , our estimator as stated in (4.5) is the same as an estimator presented in ref. [22]. They consider a fully nonparametric setting under anonymous interference, in which the goal is to estimate the derivative of the total outcomes under changes of the population-wide treatment probability. As the derivative can be estimated by locally linearizing the outcomes function, they derive the special case of the estimator in (4.5) under β = 1 by taking the derivative of the expected population outcomes under a Bernoulli randomization, constructed by inverse propensity weights. This suggests that under a fully nonparametric setting, our estimator may be used for estimating an appropriately defined local estimand. There may also be opportunities to perform variance reduction on our estimator given knowledge of the graph structure, as proposed in ref. [22]; however, their solution requires anonymous interference, and it is not clear how to extend their solution concept beyond β = 1 , anonymous interference, and uniform treatment probabilities.

5.1 Minimax lower bound on mean squared error

To understand the optimality of our estimator, we construct a lower bound on the minimax optimal mean squared error rate. In particular, we show that for a setting with a d -regular graph and sufficiently-small uniform treatment probabilities p , the best achievable mean squared error is lower bounded by Ω 1 n p β .

Theorem 2

(Minimax lower bound) For any n , d , β , p with p β < 0.16 , and any estimator TTE ^ , there exists a causal network on n nodes with maximum degree d and effect coefficients { c i , S } i ∈ [ n ] , S ∈ S i for which the minimax squared error under uniform treatment probabilities p i = p is bounded below by

E [ ( TTE ^ − TTE ) 2 ] = Ω 1 n p β .

For a β -order interactions model, estimating the TTE requires being able to measure the network effect of the size β subsets of a unit’s neighborhood. The probability that a set of size β is jointly assigned to treatment is p β . As a result, the scaling of 1 p β is somewhat intuitive.

The proof of Theorem 2 (given in Appendix C) uses a generalized variation of LeCam’s method for fuzzy hypothesis testing [47], [48, Sec. 2.7.4]. We reduce the problem of TTE estimation to the creation of a hypothesis test to distinguish between two priors. We consider a setting with a d -regular graph, uniform treatment probabilities p i = p , and two Gaussian priors Γ 0 , Γ 1 over the effect coefficients. The priors have the same variances but with shifted means such that ∣ E Γ 0 [ TTE ] − E Γ 1 [ TTE ] ∣ = 2 δ for some carefully tuned parameter δ . The failure of hypothesis tests that rely on TTE ^ is attributable to one of two factors: (1) a significant shift in TTE brought about by the variability of the coefficients, or (2) inaccuracy of the estimator. We bound the probability of the former with a Gaussian tail bound. We bound the latter in terms of the Kullback-Leibler (KL)-divergence between the distributions over ( Y , z ) induced by Γ 0 and Γ 1 . The rest of the argument involves a calculation of this KL-divergence and a selection of δ to ensure a nonvanishing error probability.

While our lower bound clearly indicates that the exponential dependence on β as exhibited by p − β is necessary, a notable difference between our lower and upper bounds is the dependence on the graph degree d . Recall that the upper bound on the variance of our SNIPE estimator in Section 1 scales roughly as d β + 2 ; however, our lower bound result has no dependence on d . We attribute this gap to a weakness in the analyses and believe that it can be tightened with more careful calculation. In the upper bound, the d β arises as a bound on the sum of binomial coefficients of the form d k for k ≤ β . While this bound is precise asymptotically for a regime, where β is a small constant and d is large, it is loose in the most general case where β = d , where we know 2 d is sufficient. On the other hand, our lower bound argument considers a setting with a d -regular graph, but it only uses the degree of the graph to normalize the parameters on the prior distributions, such that we see no dependence on d in the final lower bound.

5.2 Central limit theorem

In this section, we show TTE ^ SNIPE is asymptotically normal using Stein’s method, a common approach to proving central limit theorems [3,49–52]. In particular, we use Theorem 3.6 from Ross [53], which we restate below for convenience.

[53, Theorem 3.6]. Let X 1 , … , X n be random variables such that E [ X i 4 ] < ∞ , E [ X i ] = 0 , ν 2 = Var ∑ i X i , and define W = ∑ i X i ∕ ν . Let collection ( X 1 , … , X n ) have dependency neighborhoods D i for i ∈ [ n ] , and also define D ≔ max 1 ≤ i ≤ n ∣ D i ∣ . Then, for Z a standard normal random variable,

(5.1) d W ( W , Z ) ≤ D 2 ν 3 ∑ i = 1 n E ∣ X i ∣ 3 + 28 D 3 ∕ 2 π ν 2 ∑ i = 1 n E [ X i 4 ] ,

where d W ( W , Z ) is the Wasserstein distance between W and Z .

Our estimator can be written in the form of TTE ^ = 1 n ∑ i ∈ [ n ] Y i w i ( z ) , where Y i and w i ( z ) depend only on the treatment assignments of individuals j ∈ N i . We apply Theorem 3.6 from ref. [53], to the random variables X i ≔ 1 n ( Y i w i ( z ) − E [ Y i w i ( z ) ] ) , such that by construction W = ∑ i X i ∕ ν = ( TTE ^ − TTE ) ∕ ν and TTE ^ = ν W + TTE . An application of Theorem 3.6 from ref. [53] implies that TTE ^ is asymptotically normal as long as the bound in (5.1) converges to 0 as n approaches infinity. As the size of the dependency neighborhoods D , the moments of X i , and the scaling of the variance ν all depend on the network parameters, for simplicity, we additionally assume in the conditions of Theorem 3 that d , Y max , and β are bounded above by a constant, and we also assume the treatment probability p is also lower bounded as a function of n .

Assumption 3

For some constant c > 0 , we have n ⋅ Var [ TTE ^ ] → c as n → ∞ .

This is a typical assumption made in the literature [3,51,54] that rules out degenerate cases, such as all the potential outcomes being 0, that would result in the estimator having an unnaturally low variance scaling smaller than 1 ∕ n .

Theorem 3

(Central limit theorem) Under Assumptions 1–3, and assuming that d , Y max , and β are all O ( 1 ) with respect to n , p = ω ( n − 1 ∕ 4 β ) , and p i ≤ 1 2 for all i, the normalized error ( TTE ^ − TTE ) ∕ ν converges in distribution to a standard normal random variable as n → ∞ , for ν 2 = Var [ TTE ^ ] .

Theorem 3 states that our estimator is asymptotically normal, assuming the boundedness of the magnitude of the potential outcomes, the polynomial degree, and the network degree. For the complete proof, refer to Appendix D. The proof follows from a straightforward application of Theorem 3.6 from ref. [53] with appropriate bounds for the moments E ∣ X i ∣ 3 and E [ X i 4 ] , the dependency neighborhood size D , and the variance ν . The dependency neighborhood for variable X i is the set { j ∈ [ n ] s.t. N i ∩ N j ≠ ∅ } , i.e., the set of individuals j that share in-neighbors with individual i . This follows from the observation that both X i and X j are a function of the treatment variables z k for shared in-neighbors k ∈ N i ∩ N j . The size of the largest dependency neighborhood is thus bounded by D ≤ d in d out . The moments E ∣ X i ∣ 3 , E [ X i 4 ] will be bounded as a function of Y max as defined in (3.3) along with the network degree d , treatment probability p , and the polynomial degree β . The boundedness assumptions are used to argue that these moments do not grow too quickly in n , so that the Wasserstein distance in (5.1) converges to 0. The details of these calculations are deferred to Appendix D. We remark that the strict boundedness assumption can be relaxed so that d , Y max , and β are polylogarithmic with respect to n , so long as we tighten the lower bound on the growth of p by a corresponding logarithmic factor.

5.3 Variance estimator

The central limit theorem result in Theorem 3 implies that we can construct asymptotically valid confidence intervals by [ TTE ^ − ν Φ − 1 ( 1 − α ) , TTE ^ + ν Φ − 1 ( 1 − α ) ] if we knew ν , where Φ indicates the cdf of a standard normal. Since the practitioner typically does not know ν , we construct a conservative estimator for ν by applying the approach from Aronow and Samii [3,55]. We begin by rewriting the SNIPE estimator as a sum over possible exposures x :

TTE ^ = 1 n ∑ i ∈ [ n ] Y i ( z ) w i ( z ) = 1 n ∑ i ∈ [ n ] ∑ x ∈ { 0 , 1 } ∣ N i ∣ I ( z N i = x ) Y i ( x ) w i ( x ) .

Note that both Y i ( x ) and w i ( x ) are deterministic, and the only randomness is expressed in the indicator function I ( z N i = x ) . Overloading notation, we let Y i and w i be expressed only as a function of z N i , where we assume the order β is clearly specified. Then, the variance of our estimator is given by

Var [ TTE ^ ] = 1 n 2 ∑ i , j ∈ [ n ] ∑ x ∈ { 0 , 1 } ∣ N i ∣ ∑ x ′ ∈ { 0 , 1 } ∣ N j ∣ Y i ( x ) w i ( x ) Y j ( x ′ ) w j ( x ′ ) Cov ( I ( z N i = x ) , I ( z N j = x ′ ) ) .

If P ( { z N i = x } ∩ { z N j = x ′ } ) > 0 , an unbiased estimate for the corresponding term in the variance expression is given by

I ( z N i = x ) I ( z N j = x ′ ) P ( { z N i = x } ∩ { z N j = x ′ } ) Y i ( x ) w i ( x ) Y j ( x ′ ) w j ( x ′ ) Cov ( I ( z N i = x ) , I ( z N j = x ′ ) ) .

Otherwise, if P ( { z N i = x } ∩ { z N j = x ′ } ) = 0 , by the same technique as given in ref. [55], using the observation that Cov ( I ( z N i = x ) , I ( z N j = x ′ ) ) = P ( z N i = x ) P ( z N j = x ′ ) , the corresponding term in the variance expression can be approximated by the inflated estimate

1 2 ( I ( z N i = x ) P ( z N i = x ) Y i 2 ( x ) w i 2 ( x ) + I ( z N j = x ′ ) P ( z N j = x ′ ) Y j 2 ( x ′ ) w j 2 ( x ′ ) ) .

The expectation of this estimate will be higher than the true term in the variance via Cauchy–Schwarz inequality. Therefore, a conservative variance estimator Var ^ ( TTE ^ SNIPE ) is given by

1 n 2 ∑ i ∑ j ∈ ℳ i ∑ x ∈ { 0 , 1 } ∣ N i ∪ N j ∣ I ( z N i ∪ N j = x ) P ( z N i ∪ N j = x ) Y i ( x ) w i ( x ) Y j ( x ) w j ( x ) Cov ( I ( z N i = x N i ) , I ( z N j = x N j ) ) + 1 n 2 ∑ i ∑ x ∈ { 0 , 1 } ∣ N i ∣ I ( z N i = x ) P ( z N i = x ) Y i 2 ( x ) w i 2 ( x ) ∑ j ∈ ℳ i ( 2 ∣ N j ∣ − 2 ∣ N j \ N i ∣ ) ,

where

Cov ( I ( z N i = x N i ) , I ( z N j = x N j ) ) = ∏ k ∈ N i ∪ N j p k x k ( 1 − p k ) 1 − x k ( 1 − ∏ k ′ ∈ N i ∩ N j p k ′ x k ′ ( 1 − p k ′ ) 1 − x k ′ ) .

In the Section 6, we compare the empirical variance of the SNIPE estimator TTE ^ with this conservative variance estimate Var ^ ( TTE ^ ) in simulations. We also compare against the worst-case variance bound from Theorem 1. Most notable is that both the conservative variance estimate and the worst-case variance bound are orders of magnitude larger than the empirical variance, suggesting that there is significant work needed to develop tighter variance estimates.

6 Experimental results

Using computational experiments on simulated data, we compare the performance of our estimator with existing estimators. By using an Erdös–Rényi model, we generate random directed graphs of n nodes for a population of n individuals. Figure 1 shows results from networks made using the Erdös–Rényi model with n nodes and probability p edge = 10 ∕ n of an edge existing between any two nodes. Hence, the expected in-degree and out-degree of each node is 10. For degree β , we construct the same potential outcomes model as in ref. [43]:

(6.1) Y i ( z ) = c i , ∅ + ∑ j ∈ N i c ˜ i j z j + ∑ ℓ = 2 β ∑ j ∈ N i c ˜ i j z j ∑ j ∈ N i c ˜ i j ℓ ,

where c i , ∅ ∼ U [ 0 , 1 ] , c ˜ i i ∼ U [ 0 , 1 ] , and for i ≠ j , c ˜ i j = v j ∣ N i ∣ ∕ ∑ k : ( k , j ) ∈ E ∣ N k ∣ for v j ∼ U [ 0 , r ] , where r denotes a hyperparameter that governs the magnitude of the network effects relative to the direct effects. We represent the magnitude of individual j ’s influence by the parameter v j . This influence is shared among individual j ’s out-neighbors proportional to their in-degrees.

$Figure 1 Plots visualizing the performance of various TTE \hspace{0.1em}\text{TTE}\hspace{0.1em} estimators under Bernoulli design on Erdös–Rényi networks for both linear and quadratic potential outcomes models. The height of each line on a plot depicts the experimental relative bias of the estimator and the shaded width depicts the experimental standard deviation. The SNIPE estimator is parametrized by β \beta , the degree of the potential outcomes model. (a) Varying population size, (b) varying direct:indirect effects, and (c) varying treatment budget.$

Figure 1

Plots visualizing the performance of various TTE estimators under Bernoulli design on Erdös–Rényi networks for both linear and quadratic potential outcomes models. The height of each line on a plot depicts the experimental relative bias of the estimator and the shaded width depicts the experimental standard deviation. The SNIPE estimator is parametrized by β , the degree of the potential outcomes model. (a) Varying population size, (b) varying direct:indirect effects, and (c) varying treatment budget.

6.1 Other estimators

We compare the performance of SNIPE with the performance of least-squares regression and difference-in-means estimators, also as in ref. [43]. Although the Horvitz–Thompson estimator is unbiased in this setting, under unit Bernoulli RD its variance is very high in practice. Thus, we omit this estimator from our experimental results and comparison. Another related estimator is the Hájek estimator, which is only approximately unbiased but with lower variance than the Horvitz–Thompson estimator. However, under our RD, its variance is still very high in practice. In addition, as we implemented a Bernoulli RD on a graph with expected network degree of 10, both the Hájek and Horvitz–Thompson estimators consistently took a value of 0, giving us no meaningful results to consider. For these reasons, we chose to omit these two estimators from our experimental results and comparison. The simplest difference-in-means estimator is the difference between the average outcome of individuals assigned to treatment and the average outcome of individuals assigned to control, given by

(6.2) TTE ^ DM = ∑ i ∈ [ n ] z i Y i ∑ i ∈ [ n ] z i − ∑ i ∈ [ n ] ( 1 − z i ) Y i ∑ i ∈ [ n ] ( 1 − z i ) .

This estimator does not take into account any information about each individual’s neighborhood and is biased under network interference. We also consider a modified version of this estimator that uses information about the number of treated neighbors of each individual. Let U i denote the number of individuals in N i \ { i } assigned to treatment, and let U ˜ i denote the number of neighbors individuals in N i \ { i } assigned to control. Then, the estimator is given by

(6.3) TTE ^ DM ( λ ) = ∑ i ∈ [ n ] z i I ( U i ≥ λ ) Y i ∑ i ∈ [ n ] z i I ( U i ≥ λ ) − ∑ i ∈ [ n ] ( 1 − z i ) I ( U ˜ i ≥ λ ) Y i ∑ i ∈ [ n ] ( 1 − z i ) I ( U ˜ i ≥ λ ) ,

for some user-defined tolerance λ ∈ [ 0 , 1 ] . We set λ = 0.75 for our experiments. Note that TTE ^ DM ( λ ) counts an individual i ’s outcome only when at least λ of their neighborhood is assigned to the same treatment as them.

We also compare with least-squares regression models of degree β , which assume that the potential outcomes model is given by

(6.4) Y i ( z ) = g ( z i , z ¯ i ) = ρ + ∑ k = 1 β γ k X i k + z i ρ ˜ + ∑ k = 1 β − 1 γ ˜ k X i k ,

for some covariate X i . We consider two variations. In the first, we set X i equal to the number of treated neighbors. In the second, we let X i equal the proportion of treated neighbors. In both cases, we do not include i in its neighborhood. The two sets of coefficients ( ρ , γ 1 , … γ β ) and ( ρ ˜ , γ ˜ 1 , … γ ˜ β ) allow for the model to be different when i is treated vs not treated, and since we only allow up to degree β interactions, the second summation stops at β − 1 . Overall, there are 2 β + 1 coefficients in the model. By using least-squares regression, we determine the set of coefficients minimizing the least-squares predictive error on the data set { z i , X i , Y i ( z ) } i ∈ [ n ] . These coefficients define an estimate for the function g ˆ in Equation 6.4. When X i = U i , the number of treated neighbors, the estimate is given by

(6.5) TTE ^ LS-Num = 1 n ∑ i = 1 n ( g ˆ ( 1 , ∣ N i ∣ − 1 ) − g ˆ ( 0 , 0 ) ) .

When we set X i = U i ∕ ( ∣ N i ∣ − 1 ) , the proportion of treated neighbors, we have

(6.6) TTE ^ LS-Prop = 1 n ∑ i = 1 n ( g ˆ ( 1 , 1 ) − g ˆ ( 0 , 0 ) ) .

6.2 Results and discussion

For each population size n , we sample G networks from the Erdös–Rényi model described previously. For every configuration of parameters in the experiment, we sample N treatment assignment vectors z 1 , … , z N from a uniform Bernoulli distribution with treatment probability p to compute the TTE using each estimator. Each plot we include also shows the relative bias of the TTE estimates, averaged over the results from these G N samples and normalized by the magnitude, for each estimator. The width of the shading around each line in the plots shows the standard deviation across the G N estimates. For our experiments,^[1] we chose G = 10 and N = 500 .

Figures 1 and 2 visualize the effects of various network or estimator parameters on the performance of each of the four TTE estimators described in Section 6.1 and TTE ^ SNIPE ( β ) , all under Bernoulli RD. In particular, we consider the effects of the population size ( n ), the treatment budget ( p ), the ratio between the network and direct effects ( r ), and the degree of the potential outcomes model ( β ). We list specific values for the parameters above each plot. Figure 1 shows the bias and empirical standard deviation of each estimator, where the values are all normalized by the magnitude of the true TTE. Figure 2 plots the empirical MSE of each estimator, also normalized by the magnitude of the true TTE. The normalization can alternately be viewed as standardizing all models so that the ground truth TTE is 1.

$Figure 2 Plots visualizing the MSE of various TTE \hspace{0.1em}\text{TTE}\hspace{0.1em} estimators under Bernoulli design on Erdős-Rényi networks for both linear and quadratic potential outcomes models. The height of each line on a plot depicts the mean squared error when the model is normalized so that the true TTE \hspace{0.1em}\text{TTE}\hspace{0.1em} is effectively equal to 1. Alternatively, we can think of this as the variance of the normalized estimates. Our estimator under a β \beta -order potential outcomes model is denoted SNIPE ( β ) \left(\beta ) in the figure. (a) Varying population size, (b) varying direct:indirect effects, and (c) varying treatment budget.$

Figure 2

Plots visualizing the MSE of various TTE estimators under Bernoulli design on Erdős-Rényi networks for both linear and quadratic potential outcomes models. The height of each line on a plot depicts the mean squared error when the model is normalized so that the true TTE is effectively equal to 1. Alternatively, we can think of this as the variance of the normalized estimates. Our estimator under a β -order potential outcomes model is denoted SNIPE ( β ) in the figure. (a) Varying population size, (b) varying direct:indirect effects, and (c) varying treatment budget.

The top row of plots in Figure 1 features results for a linear ( β = 1 ) potential outcomes model while the bottom row shows results for a quadratic ( β = 2 ) potential outcomes model. As expected, the SNIPE estimator, shown in blue, has no relative bias and its variance decreases as n increases. With the exception of the modified difference-in-means estimator TTE ^ DM(0.75) in green, the variances of the other estimators are lower than ours. However, the biases of the other estimators are larger than the standard deviation of our unbiased estimator overall. Moreover, as r increases, the networks effects are more significant than the direct effects and we see the biases of the other estimators grow larger. Note that the variance of our estimator remains relatively constant as r varies. When r is close to 0, there are essentially no network effects, SUTVA holds, and as expected, all the estimators are unbiased. Figure 2 shows that for many parameter combinations, the MSE of our estimator is lower than the other estimators; this is particularly the case for sufficiently large population sizes (large n ) and sufficiently significant relative network effects (large r ). In the top row of plots, corresponding to β = 1 , the difference in means estimators perform poorly relative to the other estimators so that they are beyond the upper limit of the displayed plot. While the performance of the least squares estimators and our estimator is comparable for β = 1 , the MSE of our estimator is solely due to variance, which will decrease with large n , yet the MSE of the least squares estimators is largely due to its bias, which will not decrease with large n , highlighting that they are not consistent estimators for our heterogeneous model.

6.3 Variance estimator experiments

We compare the empirical variance of SNIPE with the variance bound from Theorem 1 and the variance estimator constructed from Aronow–Samii’s method described in Section 5.3. As mentioned earlier, for each population size n , we sample G networks from the Erdös–Rényi model described previously. For every configuration of parameters in the experiment, we sample N treatment assignment vectors z 1 , … , z N from a uniform Bernoulli distribution with treatment probability p to compute the TTE using each estimator. For the variance experiments, we chose G = 10 and N = 100 . Table 2 displays the effects of various network or estimator parameters on the experimental variance of SNIPE ( β ) as well as the variance estimate and the theoretical variance bound, all under Bernoulli RD, and all averaged over G N samples of treatment vectors. As in previous experiments, we consider the effects of the population size ( n ), the treatment budget ( p ), the ratio between the network and direct effects ( r ), and the degree of the potential outcomes model ( β ). We list fixed values for the parameters in parentheses. The main observation we wish to draw attention to is the differences in the orders of magnitude among the empirical variance, the variance estimate, and the variance bound. It is clear from these results that obtaining a tighter variance estimator would be a valuable direction for the future work.

Table 2

Table presenting the empirical variance of the SNIPE ( β ) estimator for the TTE under Bernoulli ( p ) design on Erdös–Rényi networks with β = 1 , 2

	Experimental variance	Variance estimate	Variance bound
Results for SNIPE(1)
n ( p = 0.2 , r = 2 )
1,000	17.63	16327.43	488300.11
2,500	8.19	3724.33	245279.16
5,000	3.34	2270.81	140688.10
7,500	2.33	1408.50	106981.33
10,000	1.95	1437.32	105354.62
p (n = 5,000, r = 2)
0.1	2.93	5439.83	272939.86
0.2	3.71	2375.13	174599.20
0.3	4.33	1259.70	106516.48
0.4	6.10	1660.36	122405.92
0.5	8.06	2197.06	93585.34
r (n = 5,000, p = 0.2)
0.5	1.06	1480.23	19410.59
1.0	1.69	1793.46	44527.96
1.5	2.50	3802.44	112805.72
2.0	3.91	2080.84	142140.47
Results for SNIPE(2)
n ( p = 0.2 , r = 2 )
1,000	255.59	11046.03	470718.02
2,500	92.21	3873.10	231156.58
5,000	46.36	2962.08	147815.75
7,500	29.80	1495.31	95530.25
10,000	21.28	1463.03	81113.89
p (n = 5,000, r = 2 )
0.1	114.95	8635.44	147108.70
0.2	43.94	2142.06	133774.85
0.3	24.36	1373.10	129746.07
0.4	13.24	1115.97	128101.61
0.5	9.19	2435.29	133271.42
r (n = 5,000, p = 0.2)
0.5	12.99	1627.89	126414.68
1.0	20.32	1731.70	129161.00
1.5	30.94	2054.55	134921.48
2.0	44.50	2494.29	144292.22

The variance bound is computed using the bound in Theorem 1 and the variance estimate is computed using the Aronow–Samii estimator described in section 5.3. The parameters n , p , and r refer to the population size, the treatment probability, and the ratio of direct to indirect effects, respectively.

7 Conclusions and future work

We propose an estimator for the TTE under neighborhood interference and Bernoulli design when the graph is known. Our approach considers a potential outcomes model that is polynomial in the treatment vector z with degree parameterized by β , which we assume to be much smaller than the maximum neighborhood size. This assumption is equivalent to constraining the order of interactions amongst treated neighbors to sets of size at most β . We derive theoretical bounds on the variance of our estimator under Bernoulli RD and show that we improve upon the variance of the Horvitz–Thompson estimator when β is significantly lower than the maximum degree of the graph. We provide minimax lower bounds on the mean squared error of our estimator when the graph is d -regular and the treatment probabilities are the same for each individual. Furthermore, under additional boundedness conditions, we prove a central limit theorem for our estimator, allowing for conservative, asymptotically valid confidence intervals using our proposed variance estimator. Through computational experiments, we illustrate that our estimator has lower MSE than the MSE of standard difference in means and least-squares estimators for the TTE. Our work uniquely complements the literature in that we consider how to incorporate and exploit structure in the potential outcomes model in a way that allows for a richer model class than the typical parametric model classes and does not reduce the effective treatment to a low-dimensional statistic.

Our work presents many interesting and likely fruitful directions for future work. We summarize a few of these in the following section.

7.1 Optimized experimental designs

In this work, we studied the relationship between the complexity of the potential outcomes model (parameterized by β ) and the difficulty of estimation. In this analysis, we made no structural assumptions on the network and restricted focus to independent Bernoulli experimental design. It is easy to conceive that a more careful selection of the experimental design, motivated by structural information of the causal network, could lead to improved performance of the estimator. For example, in graphs that are well clustered, correlating the treatments within each cluster could allow some individuals to have more of their neighborhood treated, giving a better estimate of the magnitude of their treatment effect. The design philosophy of our estimator, as motivated in Section 4.1 through the lens of experiment replication, is not particular to Bernoulli design. As such, an enticing direction of future study would be to explore this estimator for other experimental designs and better understand the interplay between performance gains due to the network structure and the model structure.

7.2 Implications to observational studies

While our stated theoretical results only hold for nonuniform Bernoulli designs, there are natural implications to the analysis of observational studies under appropriate unconfoundedness assumptions. In particular, if treatments across individuals are independent from each other conditioned on observed covariates, and if the conditional treatment probabilities could be estimated, then one could plausibly consider a plugin approach to modify our estimator for such observational data. Formalizing how to extend our results to observational settings would be a fruitful and interesting direction for the future work.

7.3 Dealing with model misspecification

Another interesting direction for future work centers around how to use our proposed class of estimators when the model parameter β is unknown. In settings such as online social networks, it is reasonable to posit a low-degree interactions assumption on the network interference, which corresponds to adopting a potential outcomes model parameterized by a ground-truth value β GT . To estimate the TTE, a researcher will select a value β Exp to use in defining their estimator. Without knowledge of the ground truth model, it is possible that β GT ≠ β Exp . As this phenomenon of model misspecification is pervasive through causal inference and more general machine learning domains, it would be useful to quantify the relationship between the degree of β -misspecification and any additional accrual of bias or variance. Another related question is whether a statistical test can be developed to aid in the correct choice of β Exp or to validate the low-degree polynomial structure of the model.

Acknowledgements

We gratefully acknowledge the financial support from the National Science Foundation grants CCF-1948256 and CNS-1955997 and the National Science Foundation Graduate Research Fellowship grant DGE-1650441. We also acknowledge support from the Air Force Research Laboratory grant AFOSR FA9550-23-1-0301. We also thank Professor Nathan Kallus for his insightful feedback.

Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: The authors state no conflict of interest.
Data availability statement: The Python scripts for the experiments and the data used in our results are available at: https://github.com/mayscortez/low-order-unitRD.

Appendix A The explicit TTE estimator for general β

Here, we derive an explicit formula for the TTE estimator under non-uniform Bernoulli design for general β . Recall (Equation (4.3)) that the estimator has the following form:

TTE ^ = 1 n ∑ i = 1 n Y i ( z ) ⟨ E [ z ˜ i z ˜ i ⊺ ] − 1 ( 1 ∣ S i β ∣ − e 1 ) , z ˜ i ⟩ ,

where each S i β collects all subsets of N i with cardinality at most β and each z ˜ i is the vector of length ∣ S i ∣ with entries ( z ˜ i ) S = ∏ j ∈ S z j , indicating whether the entire subset S has been assigned to treatment. Here, we index vectors and matrix entries by their corresponding sets (rather than numerical indices) for notational convenience. To begin, we focus our attention on the first argument vectors of these inner products. Entries of the matrix E [ z ˜ i z ˜ i ⊺ ] have the form

( E [ z ˜ i z ˜ i ⊺ ] ) S , T = E [ ∏ j ∈ S z j ∏ j ′ ∈ T z j ′ ] = E [ ∏ j ∈ S ∪ T z j ] = ∏ j ∈ S ∪ T p j .

Here, the second equality uses the fact that each z j ∈ { 0 , 1 } , and the third equality uses the independence of the treatment assignments. The following lemma establishes the invertibility of this matrix by giving an explicit expression for its inverse.

Lemma 1

The matrix E [ z ˜ i z ˜ i ⊺ ] is invertible, with entries of its inverse A i given by the formula

( A i ) S , T = ∏ j ∈ S − 1 p j ∏ k ∈ T − 1 p k ∑ U ∈ S i β ( S ∪ T ) ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ .

Proof

We will argue that E [ z ˜ i z ˜ i ⊺ ] A i = I ∣ S i β ∣ entrywise. Note that both E [ z ˜ i z ˜ i ⊺ ] and A i are symmetric matrices, so their product will be as well. First, we consider the diagonal entries of this matrix. Given any S ∈ S i β ,

( E [ z ˜ i z ˜ i ⊺ ] A i ) S , S = ∑ T ∈ S i β ∏ j ∈ S ∪ T p j ( A i ) T , S = ∑ T ∈ S i β ( − 1 ) ∣ S ∪ T ∣ ∏ k ∈ ( S ∩ T ) − 1 p k ∑ U ∈ S i β ( S ∪ T ) ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ = ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∑ T ⊆ U ( − 1 ) ∣ S ∪ T ∣ ∏ k ∈ ( S ∩ T ) − 1 p k ( reverse sum ) = ( − 1 ) ∣ S ∣ ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∑ V ⊆ S ∏ k ∈ V − 1 p k ∑ W ⊆ ( U \ S ) ( − 1 ) ∣ W ∣ ︸ I ( U ⊆ S ) ( V = T ∩ S , W = T \ S ) = ( − 1 ) ∣ S ∣ ∏ ℓ ∈ S p ℓ 1 − p ℓ ∑ V ⊆ S ∏ k ∈ V − 1 p k ( only non-zero term is U = S ) = ( − 1 ) ∣ S ∣ ∏ ℓ ∈ S p ℓ 1 − p ℓ ∏ k ∈ S 1 − 1 p k ( distributivity ) = 1 .

Next, we consider the off-diagonal entries. By the symmetry of E [ z ˜ i z ˜ i ⊺ ] A i , it suffices to consider entries ( S ′ , S ) for which S ′ \ S ≠ ∅ . We have,

( M i A i ) S ′ , S = ∑ T ∈ S i β ∏ j ′ ∈ S ′ ∪ T p j ′ ( A i ) T , S = ∑ T ∈ S i β ∏ j ′ ∈ S ′ ∪ T p j ′ ∏ j ∈ S − 1 p j ∏ k ∈ T − 1 p k ∑ U ∈ S i β ( S ∪ T ) ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ = ∏ j ∈ S − 1 p j ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∑ T ⊆ U ∏ j ′ ∈ S ′ ∪ T p j ′ ∏ k ∈ T − 1 p k ( reverse sum ) = ∏ j ∈ S − 1 p j ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∏ j ′ ∈ S ′ p j ′ ∑ V ⊆ S ′ ∏ k ∈ V − 1 p k ∑ W ⊆ U \ S ′ ( − 1 ) ∣ W ∣ ︸ I ( U ⊆ S ′ ) ( V = T ∩ S ′ , W = T \ S ′ ) = 0 .

Here, the last line follows because any non-zero term in the outer sum must correspond to some U such that S ⊆ U ⊆ S ′ ⇒ S ⊆ S ′ . Our earlier assumption that S ′ \ S ≠ ∅ ensures there is no such U .□

By using this lemma, we consider the entries in the first argument vector of each inner product. We have,

( A i ( 1 ∣ S i β ∣ − e 1 ) ) S = ∑ T ∈ S i β ( A i ) S , T − ( A i ) S , ∅ = ∏ j ∈ S − 1 p j ∑ T ∈ S i β ∏ k ∈ T − 1 p k ∑ U ∈ S i β ( S ∪ T ) ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ − ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ = ∏ j ∈ S − 1 p j ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∑ T ⊆ U ∏ k ∈ T − 1 p k − 1 ( reverse sum and factor ) = ∏ j ∈ S − 1 p j ∑ U ∈ S i β S ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∏ k ∈ U p k − 1 p k − 1 ( distributivity ) = ∏ j ∈ S − 1 p j ∑ U ∈ S i β S ⊆ U g ( U ) ∏ ℓ ∈ U − 1 1 − p ℓ ,

where we define g ( S ) ≔ ∏ s ∈ S ( 1 − p s ) − ∏ s ∈ S ( − p s ) . By substituting back into the inner product, we calculate

⟨ A i ( 1 ∣ S i β ∣ − e 1 ) , z ˜ i ⟩ = ∑ S ∈ S i β ∏ j ∈ S − z j p j ∑ U ∈ S i β S ⊆ U g ( U ) ∏ ℓ ∈ U − 1 1 − p ℓ = ∑ U ∈ S i β g ( U ) ∏ ℓ ∈ U − 1 1 − p ℓ ∑ S ⊆ U ∏ j ∈ S − z j p j = ∑ U ∈ S i β g ( U ) ∏ ℓ ∈ U − 1 1 − p ℓ ∏ j ∈ U 1 − z j p j = ∑ U ∈ S i β g ( U ) ∏ j ∈ U z j − p j p j ( 1 − p j ) .

Finally, we replace U with S to conform to earlier notation and obtain the explicit form for our estimator

T T E ^ = 1 n ∑ i = 1 n Y i ∑ S ∈ S i β g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) .

B Proof of Theorem 1

B.1 Unbiasedness

The key insight that we use in our unbiasedness calculations comes from the following lemma.

Lemma 2

If { z j } j ∈ [ n ] are mutually independent with z j ∼ Bernoulli ( p j ) , then for any S , S ′ ⊆ [ n ] ,

E ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ = I ( S ⊆ S ′ ) ⋅ ∏ j ′ ∈ S ′ \ S p j ′ .

Proof

By the mutual independence of the { z j } , we can rewrite this expectations as a product, separating the variables into three groups.

E ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ = ∏ j ∈ S \ S ′ E z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ \ S E [ z j ′ ] ∏ j ″ ∈ S ∩ S ′ E z j ″ ( z j ″ − p j ″ ) p j ″ ( 1 − p j ″ ) .

Note that the expectations in the first product each simplify to 0, so this expectation is non-zero only when S ⊆ S ′ . The expectations in the second product simplify to p j ′ , and those in the third product each simplify to 1. These observations imply the lemma.□

The critical feature of this lemma is that this indicator function simplifies sums over arbitrary sets to sums over subsets S ⊆ S ′ . This additional structure permits simplifications using the distributive property

∑ S ⊆ S ′ ∏ j ∈ S a j = ∏ j ∈ S ′ ( 1 + a j ) .

We leverage this fact in the following calculation. Given any S ′ ∈ S i β , we may simplify

E ∑ S ∈ S i β g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ = ∑ S ∈ S i β g ( S ) ⋅ E ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ ( linearity ) = ∑ S ⊆ S ′ g ( S ) ∏ j ′ ∈ S ′ \ S p j ′ ( Lemma 2 ) = ∏ j ′ ∈ S ′ p j ′ ∑ S ⊆ S ′ g ( S ) ∏ j ∈ S 1 p j = ∏ j ′ ∈ S ′ p j ′ ∑ S ⊆ S ′ ∏ j ∈ S 1 − p j p j − ∑ S ⊆ S ′ ∏ j ∈ S ( − 1 ) ( definition of g ( S ) ) = ∏ j ′ ∈ S ′ p j ′ ∏ j ∈ S ′ 1 + 1 − p j p j − I ( S ′ = ∅ ) ( distributivity ) = 1 − I ( S ′ = ∅ ) = I ( S ′ ≠ ∅ ) .

By applying the linearity of expectation and the previous result, we calculate

E [ TTE ^ ] = 1 n ∑ i = 1 n E Y i ∑ S ∈ S i β g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) = 1 n ∑ i = 1 n ∑ S ′ ∈ S i β c i , S ′ E ∑ S ∈ S i β g ( S ) ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ = 1 n ∑ i = 1 n ∑ S ′ ∈ S i β S ′ ≠ ∅ c i , S ′ = TTE .

B.2 Variance bound

To bound the variance of this estimator, we make use of the following lemma to bound the magnitude of each g ( S ) coefficient.

Lemma 3

For any S ⊆ [ n ] , ∣ g ( S ) ∣ ≤ 1 .

Proof

First, note that ∣ g ( ∅ ) ∣ = 0 ≤ 1 . Now, for any non-empty set S , let i ∈ S . Then,

∣ g ( S ) ∣ = ∣ ∏ s ∈ S ( 1 − p s ) − ∏ s ∈ S ( − p s ) ∣ = ∣ ( 1 − p i ) ∏ s ∈ S \ { i } ( 1 − p s ) + p i ∏ s ∈ S \ { i } ( − p s ) ∣ ≤ ( 1 − p i ) ∏ s ∈ S \ { i } ( 1 − p s ) + p i ∏ s ∈ S \ { i } p s ( triangle inequality ) ≤ 1 − p i + p i = 1 .

This next lemma is used to bound the covariance terms that appear in our final calculation.

Lemma 4

Suppose that { z j } j ∈ [ n ] are mutually independent, with z j ∼ B e r n o u l l i ( p j ) . Then, for any S , S ′ , T , T ′ ⊆ [ n ] ,

0 ≤ C o v ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ , ∏ k ∈ T z k − p k p k ( 1 − p k ) ∏ k ′ ∈ T ′ z k ′ ≤ I ( S △ T ⊆ S ′ ∪ T ′ ) ⋅ 1 p ( 1 − p ) ∣ S ∩ T ∣ ,

where S △ T = ( S ∪ T ) \ ( S ∩ T ) indicates the symmetric difference of S and T .

Proof

We reason separately about the two terms in the covariance expansion. By Lemma 2,

(A1) E ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ E ∏ k ∈ T z k − p k p k ( 1 − p k ) ∏ k ′ ∈ T ′ z k ′ = I S ⊆ S ′ , T ⊆ T ′ ∏ j ′ ∈ S ′ \ S p j ′ ∏ k ′ ∈ T ′ \ T p k ′ .

Next, we reason about the expectation of the product term. Since the z j are Bernoulli random variables, we can combine the products over S ′ and T ′ , giving

(A2) E ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ k ∈ T z k − p k p k ( 1 − p k ) ∏ j ′ ∈ S ′ ∪ T ′ z j ′ .

We partition the elements of S ∪ S ′ ∪ T ∪ T ′ based on which of the products they are present in:

j ∈ S ∩ T ∩ ( S ′ ∪ T ′ ) : j contributes a factor of E z j 3 − 2 z j 2 p j + z j p j 2 p j 2 ( 1 − p j ) 2 = 1 p j .
j ∈ S ∩ T \ ( S ′ ∪ T ′ ) : j contributes a factor of E z j 2 − 2 z j p j + p j 2 p j 2 ( 1 − p j ) 2 = 1 p j ( 1 − p j ) .
j ∈ S ∩ ( S ′ ∪ T ′ ) \ T : j contributes a factor of E z j 2 − z j p j p j − p j 2 = 1 .
j ∈ T ∩ ( S ′ ∪ T ′ ) \ S : j contributes a factor of E z j 2 − z j p j p j − p j 2 = 1 .
j ∈ S \ T \ ( S ′ ∪ T ′ ) : j contributes a factor of E z j − p j p j − p j 2 = 0 .
j ∈ T \ S \ ( S ′ ∪ T ′ ) : j contributes a factor of E z j − p j p j − p j 2 = 0 .
j ∈ ( S ′ ∪ T ′ ) \ S \ T : j contributes a factor of E [ z j ] = p j .

Cases 5 and 6 ensure that (A2) is non-zero only when S ⊆ ( T ∪ S ′ ∪ T ′ ) and T ⊆ ( S ∪ S ′ ∪ T ′ ) , or equivalently when S △ T ⊆ S ′ ∪ T ′ . This condition is necessary for (A1) to be non-zero. In addition, note that each j from case 7 contributing a factor of p j to (A2) also contributes at least one factor of p j to (A1). The remaining j from other cases contribute a factor of at least 1. Notably, both (A2) and (A1) are non-negative, with (A2) dominating (A1), so that the covariance is always bounded below by zero, and upper bounded by (A2). In (A2), we can upper bound the contribution of each j ∈ S ∩ T by ( p ( 1 − p ) ) − 1 ; note that our definition of p ensures that p j ( 1 − p j ) ≥ p ( 1 − p ) for each j . We upper bound the contribution of each other j by 1, which establishes the stated bound on the covariance.□

We are ready to bound the variance. If N i ∩ N i ′ = ∅ , then Y i ( z ) w i ( z ) and Y i ′ ( z ) w i ′ ( z ) are functions of disjoint sets of independent variables. Thus, Cov [ Y i ( z ) w i ( z ) , Y i ′ ( z ) w i ′ ( z ) ] = 0 . We let ℳ i denote the set of individuals i ′ such that N i ∩ N i ′ ≠ ∅ , i.e., all individuals i ′ that share an in-neighbor with individual i . Note that ∣ ℳ i ∣ ≤ d in d out . By applying the bilinearity of covariance and the triangle inequality, we have

Var [ TTE ^ ] ≤ 1 n 2 ∑ i = 1 n ∑ i ′ ∈ ℳ i ∑ S ′ ∈ S i β ∣ c i , S ′ ∣ ∑ T ′ ∈ S i ′ β ∣ c i ′ , T ′ ∣ ∑ S ∈ S i β ∣ g ( S ) ∣ ∑ T ∈ S i ′ β ∣ g ( T ) ∣ × Cov ∏ j ∈ S z j − p j p j ( 1 − p j ) ∏ j ′ ∈ S ′ z j ′ , ∏ k ∈ T z k − p k p k ( 1 − p k ) ∏ k ′ ∈ T ′ z k ′ .

By plugging in our bounds from Lemmas 3 and 4, we can simplify this bound:

Var [ TTE ^ ] ≤ 1 n 2 ∑ i = 1 n ∑ i ′ ∈ ℳ i ∑ S ′ ∈ S i β ∣ c i , S ′ ∣ ∑ T ′ ∈ S i ′ β ∣ c i ′ , T ′ ∣ ∑ S ∈ S i β ∑ T ∈ S i ′ β I ( S △ T ⊆ S ′ ∪ T ′ ) ⋅ 1 p ( 1 − p ) ∣ S ∩ T ∣ .

Via the change of variables U = S ∩ T , S ″ = S \ U , T ″ = T \ U , we may rewrite this

1 n 2 ∑ i = 1 n ∑ i ′ ∈ ℳ i ∑ S ′ ∈ S i β ∣ c i , S ′ ∣ ∑ T ′ ∈ S i ′ β ∣ c i ′ , T ′ ∣ ∑ U ∈ S i β 1 p ( 1 − p ) ∣ U ∣ # ( S ″ , T ″ ) ∈ ( S ′ ∪ T ′ ) 2 : S ″ ∩ T ″ = ∅ ∣ S ″ ∣ , ∣ T ″ ∣ ≤ β − ∣ U ∣ ≤ 1 n 2 ∑ i = 1 n ∑ i ′ ∈ ℳ i ∑ S ′ ∈ S i β ∣ c i , S ′ ∣ ∑ T ′ ∈ S i ′ β ∣ c i ′ , T ′ ∣ ∑ U ∈ S i β 1 p ( 1 − p ) ∣ U ∣ ( 2 β ) 2 β − 2 ∣ U ∣ ≤ 1 n 2 ( 2 β ) 2 β ∑ i = 1 n ∑ i ′ ∈ ℳ i ∑ S ′ ∈ S i β ∣ c i , S ′ ∣ ∑ T ′ ∈ S i ′ β ∣ c i ′ , T ′ ∣ ∑ U ∈ S i β 1 4 β 2 p ( 1 − p ) ∣ U ∣ ≤ d in d out Y max 2 n ( 2 β ) 2 β ∑ k = 0 β d in k 1 4 β 2 p ( 1 − p ) k ≤ d in d out Y max 2 n e d in β ⋅ max 4 β 2 , 1 p ( 1 − p ) β .

Here, the final inequality makes use of the bound ∑ k = 0 β d k ≤ e d β β . Note that when β = 1 , this bound simplifies to

e d in 2 d out Y max 2 n p ( 1 − p ) .

C Proof of Theorem 2

We use a variation of LeCam’s method, which allows us to recast the hardness of TTE estimation through the lens of hypothesis testing. We consider the following setting.

The causal network is a d -regular directed graph (so d in = d out = d for all nodes) on n nodes.
The coefficients { c i , S } are drawn from one of two possible Gaussian distributions, Γ 0 and Γ 1 . The coefficients are mutually independent under both distributions with marginal probabilities
Γ 0 : c i , S ∼ 0 ∣ S ∣ < β , N δ d β − 1 , d β − 1 ∣ S ∣ = β , Γ 1 : c i , S ∼ 0 ∣ S ∣ < β , N − δ d β − 1 , d β − 1 ∣ S ∣ = β ,
where δ is a parameter that we will fix later.
All units have a uniform treatment probability p .

By using the mutual independence assumption, we see that under Γ 0 , each Y i ( 1 ) ∼ N ( δ , 1 ) and TTE ∼ N δ , 1 n . Under Γ 1 , each Y i ( 1 ) ∼ N ( − δ , 1 ) and TTE ∼ N − δ , 1 n .

We wish to compute a lower bound on the mean squared error of any estimator for TTE in this setting. To begin this calculation, we have

(A1) inf TTE ^ sup c E z [ ( TTE ^ − TTE ) 2 ∣ c ] ≥ inf TTE ^ sup c δ 2 100 Pr ∣ TTE ^ − TTE ∣ ≥ δ 10 ∣ c ≥ δ 2 100 inf TTE ^ max Γ ∈ { Γ 0 , Γ 1 } E c ∼ Γ Pr ∣ TTE ^ − TTE ∣ ≥ δ 10 ∣ c .

Here, the first inequality lower bounds the conditional expectation by δ 2 100 whenever ∣ TTE ^ − TTE ∣ ≥ δ 10 and by 0 whenever ∣ TTE ^ − TTE ∣ < δ 10 . The second inequality replaces the supremum over all possible c with a maximum over two possible distributions over c .

Now, consider designing a hypothesis test Ψ to distinguish these models, i.e., a test for I ( E c [ TTE ] > 0 ) . Each estimator TTE ^ gives rise to a decision rule Ψ ^ = I ( TTE ^ > 0 ) . If Ψ ^ is incorrect, then one of the following two scenarios must have occurred:

∣ TTE − E c [ TTE ] ∣ ≥ 9 δ 10 by a Gaussian tail bound this has probability < exp − 81 n δ 2 200 .
∣ TTE ^ − TTE ∣ ≥ δ 10 .

When neither of these conditions is true, then ∣ TTE ^ − E c [ TTE ] ∣ < δ , so TTE ^ and E c [ TTE ] have the same sign. By applying a union bound, we have

Pr ( Ψ ^ ≠ I ( E c [ TTE ] > 0 ) ) < exp − 81 n δ 2 200 + Pr ∣ TTE ^ − TTE ∣ ≥ δ 10 .

By rearranging this inequality and plugging into (A1), we may continue the simplification

≥ δ 2 100 inf TTE ^ max Γ ∈ { Γ 0 , Γ 0 } E c ∼ Γ [ Pr ( Ψ ^ ≠ I ( E [ TTE ] > 0 ) ∣ c ) ] − δ 2 100 exp − 81 n δ 2 200 ≥ δ 2 100 inf Ψ max Γ ∈ { Γ 0 , Γ 0 } E c ∼ Γ [ Pr ( Ψ ≠ I ( E [ TTE ] > 0 ) ∣ c ) ] − δ 2 100 exp − 81 n δ 2 200 ≥ δ 2 200 inf Ψ ( E c ∼ Γ 0 [ Pr ( Ψ ≠ 0 ) ] + E c ∼ Γ 1 [ Pr ( Ψ ≠ 1 ) ] ) − δ 2 100 exp − 81 n δ 2 200 = δ 2 200 ( 1 − ‖ P 0 − P 1 ‖ TV ) − δ 2 100 exp − 81 n δ 2 200 ≥ δ 2 200 ( 1 − 1 − exp ( − D KL ( P 0 ‖ P 1 ) ) ) − δ 2 100 exp − 81 n δ 2 200 .

Here, the second line follows because we have expanded the support of the infimum to all hypothesis tests, not just those that make use of an estimator TTE ^ . The third line lower bounds the maximum over the distributions by an average. The P i in the fourth line represent the joint distribution over ( z , c ) when c is drawn from Γ i . The last line is an application of the Bretagnolle-Huber inequality.

Next, we derive an upper bound for this KL-divergence. By applying the definition, we have

D KL ( P 0 ‖ P 1 ) = E P 0 log P 0 ( Y , z ) P 1 ( Y , z ) = E P 0 log P 0 ( Y ∣ z ) P 1 ( Y ∣ z ) .

Here, the second equality uses the fact that the treatment assignments are independent from the random model coefficients. Now, conditioned on the treatment assignment z , the outcomes Y are distributed according to a Gaussian with independent coordinates. If we let T ( z ) = { i ∈ [ n ] : z i = 1 } denote the set of treated individuals under z , then the marginal distribution of Y i ( z ) conditioned on z is

Y i ( z ) ∼ N ± ∣ N i ∩ T ( z ) ∣ β d β − 1 δ , ∣ N i ∩ T ∣ β d β − 1 ,

where the positive expectation comes from P 0 and the negative expectation comes from P 1 . By plugging in the density of the Gaussian into our KL-divergence formula, we find that

D KL ( P 0 ‖ P 1 ) = E P 0 − 1 2 d β ∑ i = 1 n ∣ N i ∩ T ( z ) ∣ β − 1 Y i ( z ) − ∣ N i ∩ T ( z ) ∣ β d β − 1 δ 2 − Y i ( z ) + ∣ N i ∩ T ( z ) ∣ β d β − 1 δ 2 = 2 δ ∑ i = 1 n E P 0 [ Y i ( z ) ] = 2 δ 2 ∑ i = 1 n d β − 1 E z ∣ N i ∩ T ( z ) ∣ β = 2 δ 2 ∑ i = 1 n d β − 1 ∑ S ⊆ N i S = β E z [ I ( S ⊆ T ( z ) ) ] = 2 n δ 2 p β

By plugging into our earlier results, we find that

inf TTE ^ sup c E z [ ( TTE ^ − TTE ) 2 ∣ c ] ≥ δ 2 200 1 − 1 − exp ( − 2 n δ 2 p β ) − 2 exp − 81 n δ 2 200 .

By taking δ 2 = 8 3 n p β , we obtain the upper bound

1 75 n p β 1 − 1 − exp ( − 16 3 ) − 2 exp − 27 25 p β .

This is Ω 1 n p β as long as p β < 0.16 .

D Proof of Theorem 3

Proof of Theorem 3

We apply Theorem 3.6 from ref. [53] to the following defined random variables,

X i ≔ 1 n ( Y i w i ( z ) − E [ Y i w i ( z ) ] ) , ν 2 ≔ Var ∑ i ∈ [ n ] X i , and W ≔ 1 ν ∑ i ∈ [ n ] X i ,

where w i ( z ) = ∑ S ⊆ N i ∣ S ∣ ≤ β g ( S ) ∏ j ∈ S z j p j − 1 − z j 1 − p j . By construction, it follows that

W = 1 n ν ∑ i ∈ [ n ] ( Y i w i ( z ) − E [ Y i w i ( z ) ] ) = 1 ν ( TTE ^ − TTE ) ,

such that TTE ^ = W ν + TTE . The proof follows from verifying the conditions used in Theorem 3.6 of [53], computing appropriate bounds for the moments of X i , and using the fact that ν 2 = Ω ( 1 ∕ n ) by Assumption 3 and the variance bound in Theorem 1.

First, we note that E [ X i ] = 0 by construction. To upper bound E [ X i 4 ] , note that that for all i , we have

(A1) ∣ w i ( z ) ∣ = ∑ S ⊆ N i ∣ S ∣ ≤ β g ( S ) ∏ j ∈ S z j p j − 1 − z j 1 − p j ≤ ∑ S ⊆ N i ∣ S ∣ ≤ β 1 p ∣ S ∣ ≤ d p β .

The second inequality in (A1) follows from Lemma 3, which upper bounds ∣ g ( S ) ∣ ≤ 1 . Furthermore, we use the assumption that for all i , p i ∈ [ p , 1 − p ] , and hence, ∣ ( z j ∕ p j − ( 1 − z j ) ∕ ( 1 − p j ) ) ∣ ≤ 1 ∕ p . The final inequality in (A1) follows from the fact that ∣ S ∣ ≤ β , and the number of subsets of N i for any i is bounded above by d β .

By recognizing that E [ Y i w i ( z ) ] = ∑ S ′ ⊆ N i 1 ≤ ∣ S ′ ∣ ≤ β c i , S ′ ≤ Y max , we obtain a bound ∣ X i ∣ ≤ 1 n Y max d p β + Y max using the triangle inequality and the bound on ∣ w i ( z ) ∣ from (A1). From this, we can bound

(A2) E [ X i 4 ] ≤ 1 n 4 E Y max d p β + Y max 4 = Y max 4 n 4 d p 4 β + 4 d p 3 β + 6 d p 2 β + 4 d p β + 1 .

Since d ∕ p > 1 and β ≥ 1 , we can then bound

(A3) E [ X i 4 ] = O 1 n 4 d p 4 β Y max 4 .

In the same way, we can bound

E [ ∣ X i ∣ 3 ] ≤ 1 n 3 E Y max d p β + Y max 3 = Y max 3 n 3 d p 3 β + 3 d p 2 β + 3 d p β + 1 .

Since d ∕ p > 1 and β ≥ 1 , we obtain the bound

(A4) E [ ∣ X i ∣ 3 ] = O 1 n 3 d p 3 β Y max 3 .

Let ℳ i denote the set of individuals i ′ such that N i ∩ N i ′ ≠ ∅ , i.e., all individuals i ′ that share an in-neighbors with individual i . The set ℳ i characterizes the dependency neighborhood of i , as X i and X j are dependent if and only if there is a shared neighbor k such that X i and X j both depend on z k . It follows that the maximum size of any dependency neighborhood D = max i ∈ [ n ] ∣ ℳ i ∣ ≤ d in d out ≤ d 2 . Then, by plugging in the bounds for D , E [ ∣ X i ∣ 3 ] and E [ X i 4 ] into Theorem 3.6 of ref. [53] results in the following bound

(A5) d W ( W , Z ) ≤ O d 4 ν 3 Y max 3 d 3 β n 2 p 3 β + d 3 ν 2 Y max 4 d 4 β n 3 p 4 β ,

where recall that Z is a standard normal random variable. Since Assumption 3 implies that ν 2 ≥ O ( 1 ∕ n ) , it follows that

d W ( W , Z ) = O Y max 3 d 3 β + 4 n 1 ∕ 2 p 3 β + Y max 2 d 2 β + 3 n 1 ∕ 2 p 2 β .

By boundedness of Y max , d , β , and as p = ω ( n − 1 ∕ 4 β ) , the Wasserstein distance between W and Z ∼ N ( 0 , 1 ) goes to 0 as n → ∞ . As TTE ^ = W ν + TTE , it follows that the distribution of TTE ^ converges to a normal with mean TTE and variance ν 2 .□

E Other estimands

E.1 Average treatment effect

The average treatment effect (ATE) measures the average effect that one’s own treatment has on their outcome (assuming no one else is treated).

ATE ≔ 1 n ∑ i = 1 n ( Y i ( e i ) − Y i ( 0 ) ) = 1 n ∑ i = 1 n c i , { i } .

The estimator for ATE takes the form

ATE ^ = 1 n ∑ i = 1 n Y i ( z ) ∑ S ∈ S i β ( A i ) S , { i } ∏ j ∈ S z j ,

where A i is the inverse of E [ z ˜ i z ˜ i ⊺ ] as defined in Section 1. By plugging in the explicit form of these matrix entries, we have:

ATE ^ = 1 n ∑ i = 1 n Y i ( z ) ∑ S ∈ S i β − 1 p i ∏ j ∈ S − z j p j ∑ U ∈ S i β ( S ∪ { i } ) ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ = 1 n ∑ i = 1 n Y i ( z ) ⋅ − 1 p i ∑ U ∈ S i β i ∈ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∑ S ⊆ U ∏ j ∈ S − z j p j ( reverse inner sums ) = 1 n ∑ i = 1 n Y i ( z ) ⋅ − 1 p i ∑ U ∈ S i β i ∈ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∏ j ∈ U p j − z j p j = 1 n ∑ i = 1 n Y i ( z ) ⋅ − 1 p i ∑ U ∈ S i β i ∈ U ∏ j ∈ U p j − z j 1 − p j .

E.2 Conditional average treatment effect

Given any subdemographic of the population D ⊆ [ n ] , the conditional average treatment effects (CATE) of D is the average effect to an individual of the demographic that is treated in isolation

CATE ( D ) ≔ 1 ∣ D ∣ ∑ i ∈ D ( Y i ( e i ) − Y i ( 0 ) ) = 1 ∣ D ∣ ∑ i ∈ D c i , { i } .

By the same calculation as mentioned earlier, we estimate the conditional average treatment effect:

CATE ( D ) ^ = 1 ∣ D ∣ ∑ i ∈ D c i , { i } ^ = 1 ∣ D ∣ ∑ i ∈ D Y i ( z ) ⋅ − 1 p i ∑ U ∈ S i β i ∈ U ∏ j ∈ U p j − z j 1 − p j .

E.3 Size-dependent treatment effects

Although not standard in the literature, as its significance is largely brought about by the low-degree polynomial structure of our potential outcomes model, another family of causal estimands can be used to understand the magnitude of the treatment effects that individuals experience as a result of different sized subsets of their neighborhood being treated. We define the α -treatment effect,

TE ( α ) ≔ 1 n ∑ i = 1 n ∑ S ′ ⊆ N i ∣ S ′ ∣ = α c i , S ′ ,

for some parameter α ≤ β . That is, the α -treatment effect measures only those effects that subsets S ′ of size α have on the outcome of each individual and averages the cumulative effect over the individuals. For example, even though the polynomial degree of a model could be large, the causal effects associated to higher order interactions could be small such that the potential outcomes could be well approximated with a linear model ( β = 1 ). In such an event, one would expect that TE(1) would be close to TTE, and TE( α ) would be significantly smaller in magnitude for α > 1 . As TE ( α ) is again a linear combination of the model parameters, we can obtain an unbiased estimate using the same framework as mentioned earlier. In this case, the estimator takes the following form:

TE ( α ) ^ = 1 n ∑ i = 1 n Y i ( z ) ∑ S ∈ S i β ∑ T ⊆ N i ∣ T ∣ = α ( A i ) S , T ∏ j ∈ S z j = 1 n ∑ i = 1 n Y i ( z ) ∑ S ∈ S i β ∑ T ⊆ N i ∣ T ∣ = α ∏ j ∈ S − z j p j ∏ k ∈ T − 1 p k ∑ U ∈ S i β ( S ∪ T ) ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ = 1 n ∑ i = 1 n Y i ( z ) ∑ T ⊆ N i ∣ T ∣ = α ∏ k ∈ T − 1 p k ∑ U ∈ S i β T ⊆ U ∏ ℓ ∈ U p ℓ 1 − p ℓ ∑ S ⊆ U ∏ j ∈ S − z j p j ( reorder sums ) = 1 n ∑ i = 1 n Y i ( z ) ∑ T ⊆ N i ∣ T ∣ = α ∏ k ∈ T − 1 p k ∑ U ∈ S i β T ⊆ U ∏ j ∈ U p j − z j 1 − p j ( distributivity ) .

References

[1] Ugander J, Yin H. Randomized graph cluster randomization. 2020. https://arxiv.org/abs/2009.02297. Search in Google Scholar

[2] Rubin DB. Randomization analysis of experimental data: the fisher randomization test comment. J Amer Stat Assoc. 1980;75(371):591–3. http://www.jstor.org/stable/2287653. 10.2307/2287653Search in Google Scholar

[3] Aronow PM, Samii C. Estimating average causal effects under general interference, with application to a social network experiment. Ann Appl Stat. 2017;11(4):1912–47. 10.1214/16-AOAS1005Search in Google Scholar

[4] Manski CF. Identification of treatment response with social interactions. Econom J. 2013;16(1):S1–23. 10.1111/j.1368-423X.2012.00368.xSearch in Google Scholar

[5] Basse GW, Airoldi EM. Limitations of design-based causal inference and A/B testing under arbitrary and network interference. Sociol Methodol. 2018;48(1):136–51. 10.1177/0081175018782569Search in Google Scholar

[6] Toulis P, Kao E. Estimation of causal peer influence effects. In: International Conference on Machine Learning; 2013. p. 1489–97. Search in Google Scholar

[7] Gui H, Xu Y, Bhasin A, Han J. Network a/b testing: From sampling to estimation. In: Proceedings of the 24th International Conference on International Conferences Steering Committee; 2015. p. 399–409. 10.1145/2736277.2741081Search in Google Scholar

[8] Basse GW, Airoldi EM. Model-assisted design of experiments in the presence of network-correlated outcomes. Biometrika. 2018;105(4):849–58. 10.1093/biomet/asy036Search in Google Scholar

[9] Cai J, De Janvry A, Sadoulet E. Social networks and the decision to insure. Amer Econ J Appl Econ. 2015;7(2):81–108. 10.1257/app.20130442Search in Google Scholar

[10] Parker BM, Gilmour SG, Schormans J. Optimal design of experiments on connected units with application to social networks. J R Stat Soc Ser C (Appl Stat.) 2017;66(3):455–80. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssc.12170. 10.1111/rssc.12170Search in Google Scholar

[11] Chin A. Regression adjustments for estimating the global treatment effect in experiments with interference. J Causal Inference. 2019;7(2):20180026. 10.1515/jci-2018-0026Search in Google Scholar

[12] Sobel ME. What do randomized studies of housing mobility demonstrate? J Amer Stat Assoc. 2006;101(476):1398–407. 10.1198/016214506000000636. Search in Google Scholar

[13] Rosenbaum PR. Interference between units in randomized experiments. J Amer Stat Assoc. 2007;102(477):191–200. 10.1198/016214506000001112. Search in Google Scholar

[14] Hudgens MG, Halloran ME. Toward causal inference with interference. J Amer Stat Assoc. 2008;103:832–42. https://EconPapers.repec.org/RePEc:bes:jnlasa:v:103:y:2008:m:june:p:832-842. 10.1198/016214508000000292Search in Google Scholar PubMed PubMed Central

[15] Tchetgen EJT, VanderWeele TJ. On causal inference in the presence of interference. Stat Meth Med Res. 2012;21(1):55–75. PMID: 21068053. 10.1177/0962280210386779. Search in Google Scholar PubMed PubMed Central

[16] Eckles D, Karrer B, Ugander J. Design and analysis of experiments in networks: reducing bias from interference. J Causal Inference. 2017;5(1):20150021. 10.1515/jci-2015-0021Search in Google Scholar

[17] Ugander J, Karrer B, Backstrom L, Kleinberg J. Graph cluster randomization: Network exposure to multiple universes. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2013. p. 329–37. 10.1145/2487575.2487695Search in Google Scholar

[18] Sussman DL, Airoldi EM. Elements of estimation theory for causal effects in the presence of network interference. 2017. http://arXiv.org/abs/arXiv:170203578. Search in Google Scholar

[19] Bargagli-Stoffi FJ, Tortù C, Forastiere L. Heterogeneous treatment and spillover effects under clustered network interference. 2020. http://arXiv.org/abs/arXiv:200800707. 10.2139/ssrn.3666101Search in Google Scholar

[20] Bhattacharya R, Malinsky D, Shpitser I. Causal inference under interference and network uncertainty. In: Adams RP, Gogate V, editors. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference. vol. 115 of Proceedings of Machine Learning Research. PMLR; 2020. p. 1028–38. https://proceedings.mlr.press/v115/bhattacharya20a.html. Search in Google Scholar

[21] Liu L, Hudgens MG. Large sample randomization inference of causal effects in the presence of interference. J Amer Stat Assoc. 2014;109(505):288–301. PMID: 24659836. 10.1080/01621459.2013.844698. Search in Google Scholar PubMed PubMed Central

[22] Li S, Wager S. Random graph asymptotics for treatment effect estimation under network interference. 2020. https://arxiv.org/abs/2007.13302. Search in Google Scholar

[23] Viviano D. Experimental design under network interference. 2020. https://arxiv.org/abs/2003.08421. Search in Google Scholar

[24] Leung MP. Rate-optimal cluster-randomized designs for spatial interference. 2021. https://arxiv.org/abs/2111.04219. Search in Google Scholar

[25] Athey S, Eckles D, Imbens GW. Exact p-values for network interference. J Amer Stat Assoc. 2018;113(521):230–40. 10.1080/01621459.2016.1241178Search in Google Scholar

[26] VanderWeele TJ, TchetgenTchetgen EJ, Halloran ME. Interference and sensitivity analysis. Statist Sci. 2014 Nov;29(4):687–706. 10.1214/14-STS479. Search in Google Scholar PubMed PubMed Central

[27] Auerbach E, Tabord-Meehan M. The local approach to causal inference under network interference. 2021. http://arXiv.org/abs/arXiv:210503810. Search in Google Scholar

[28] Taljaard M, Weijer C, Grimshaw JM, BelleBrown J, Binik A, Boruch R, et al. Ethical and policy issues in cluster randomized trials: rationale and design of a mixed methods research study. Trials. 2009;10(1):1–10. 10.1186/1745-6215-10-61Search in Google Scholar PubMed PubMed Central

[29] Edwards SJ, Braunholtz DA, Lilford RJ, Stevens AJ. Ethical issues in the design and conduct of cluster randomised controlled trials. Bmj. 1999;318(7195):1407–9. 10.1136/bmj.318.7195.1407Search in Google Scholar PubMed PubMed Central

[30] Hutton JL. Are distinctive ethical principles required for cluster randomized controlled trials? Stat Med. 2001;20(3):473–88. 10.1002/1097-0258(20010215)20:3<473::AID-SIM805>3.0.CO;2-DSearch in Google Scholar

[31] Donner A, Klar N. Pitfalls of and controversies in cluster randomization trials. Amer J Public Health. 2004;94(3):416–22. PMID: 14998805. 10.2105/AJPH.94.3.416. Search in Google Scholar

[32] Johari R, Li H, Liskovich I, Weintraub GY. Experimental design in two-sided platforms: an analysis of bias. Manag Sci. 2022;68:7069–89. 10.1287/mnsc.2021.4247Search in Google Scholar

[33] Li H, Zhao G, Johari R, Weintraub GY. Interference, bias, and variance in two-sided marketplace experimentation: guidance for platforms. In: Proceedings of the ACM Web Conference 2022; 2022. p. 182–92. 10.1145/3485447.3512063Search in Google Scholar

[34] Spang B, Hannan V, Kunamalla S, Huang TY, McKeown N, Johari R. Unbiased experiments in congested networks. In: Proceedings of the 21st ACM Internet Measurement Conference; 2021. p. 80–95. 10.1145/3487552.3487851Search in Google Scholar

[35] Bright I, Delarue A, Lobel I. Reducing marketplace interference bias via shadow prices. 2022. http://arXiv.org/abs/arXiv:220502274. 10.1145/3580507.3597738Search in Google Scholar

[36] Perez-Heydrich C, Hudgens MG, Halloran ME, Clemens JD, Ali M, Emch ME. Assessing effects of cholera vaccination in the presence of interference. Biometrics. 2014;70(3):731–41. 10.1111/biom.12184Search in Google Scholar

[37] Liu L, Hudgens MG, Becker-Dreps S. On inverse probability-weighted estimators in the presence of interference. Biometrika. 2016;103(4):829–42. 10.1093/biomet/asw047Search in Google Scholar

[38] DiTraglia FJ, Garcia-Jimeno C, O’Keeffe-O’Donovan R, Sanchez-Becerra A. Identifying causal effects in experiments with spillovers and non-compliance. 2020. https://arxiv.org/abs/2011.07051. Search in Google Scholar

[39] Vazquez-Bare G. Identification and estimation of spillover effects in randomized experiments. J Econom. 2022;105237. 10.1016/j.jeconom.2021.10.014Search in Google Scholar

[40] Verbitsky-Savitz N, Raudenbush SW. Causal inference under interference in spatial settings: a case study evaluating community policing program in Chicago. Epidemiol Meth. 2012;1(1):107–30. 10.1515/2161-962X.1020Search in Google Scholar

[41] Ogburn EL, Sofrygin O, Diaz I, Van der Laan MJ. Causal inference for social network data. 2017. http://arXiv.org/abs/arXiv:170508527. Search in Google Scholar

[42] Forastiere L, Airoldi EM, Mealli F. Identification and estimation of treatment and interference effects in observational studies on networks. J Amer Stat Assoc. 2021;116(534):901–18. 10.1080/01621459.2020.1768100Search in Google Scholar

[43] Cortez M, Eichhorn M, Yu CL. Staggered rollout designs enable causal inference under interference without network knowledge. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, editors. Advances in Neural Information Processing Systems. Curran Associates, Inc.; Vol. 35. 2022.Search in Google Scholar

[44] Yu CL, Airoldi E, Borgs C, Chayes J. Estimating the total treatment effect in randomized experiments with unknown network structure. Proceedings of the National Academy of Sciences. 2022;119(44):e2208975119. 10.1073/pnas.2208975119Search in Google Scholar PubMed PubMed Central

[45] Swaminathan A, Krishnamurthy A, Agarwal A, Dudik M, Langford J, Jose D, et al. Off-policy evaluation for slate recommendation. In: Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, editors. Advances in Neural Information Processing Systems. Curran Associates, Inc.; Vol. 30. 2017.Search in Google Scholar

[46] Harshaw C, Sävje F, Wang Y. A design-based Riesz representation framework for randomized experiments. 2022. http://arXiv.org/abs/arXiv:221008698. Search in Google Scholar

[47] LeCam L. Convergence of estimates under dimensionality restrictions. Ann Stat. 1973;1:38–53. 10.1214/aos/1193342380Search in Google Scholar

[48] Tsybakov AB. Introduction to nonparametric estimation. New York: Springer; 2009. 10.1007/b13794Search in Google Scholar

[49] Chin A. Central limit theorems via Stein’s method for randomized experiments under interference. 2018. http://arXiv.org/abs/arXiv:180403105. Search in Google Scholar

[50] Leung MP. Treatment and spillover effects under network interference. Rev Econ Stat. 2020 May;102(2):368–80. 10.1162/rest_a_00818Search in Google Scholar

[51] Harshaw C, Sävje F, Eisenstat D, Mirrokni V, Pouget-Abadie J. Design and analysis of bipartite experiments under a linear exposure-response model. Electr J Stat. 2023;17(1):464–518. 10.1214/23-EJS2111Search in Google Scholar

[52] Ogburn EL, Sofrygin O, Diaz I, Van der Laan MJ. Causal inference for social network data. J Amer Stat Assoc. 2022:1–15. 10.1080/01621459.2022.2131557Search in Google Scholar

[53] Ross N. Fundamentals of Steinas method. Probabil Surveys. 2011;8:210–93. 10.1214/11-PS182Search in Google Scholar

[54] Leung MP. Rate-optimal cluster-randomized designs for spatial interference. Ann Stat. 2022;50(5):3064–87. 10.1214/22-AOS2224Search in Google Scholar

[55] Aronow PM, Samii C. Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities. Survey Methodol. 2013;39(1):231–41. Search in Google Scholar

Received: 2022-08-10

Revised: 2023-06-13

Accepted: 2023-06-22

Published Online: 2023-08-03

This work is licensed under the Creative Commons Attribution 4.0 International License.

Articles in the same Issue

https://doi.org/10.1515/jci-2022-0051

Keywords for this article

causal inference; total treatment effect; neighborhood interference; design of experiments

Creative Commons

BY 4.0