Individualized treatment rules under stochastic treatment cost constraints

Hongxiang Qiu; Marco Carone; Alex Luedtke

doi:10.1515/jci-2022-0005

Article, Open Access


Published/Copyright: December 31, 2022


Published in Journal of Causal Inference, Volume 10, Issue 1

Abstract

Estimation and evaluation of individualized treatment rules have been studied extensively, but real-world treatment resource constraints have received limited attention in existing methods. We investigate a setting in which treatment is intervened upon based on covariates to optimize the mean counterfactual outcome under treatment cost constraints when the treatment cost is random. In a particularly interesting special case, an instrumental variable corresponding to encouragement to treatment is intervened upon with constraints on the proportion receiving treatment. For such settings, we first develop a method to estimate optimal individualized treatment rules. We further construct an asymptotically efficient plug-in estimator of the corresponding average treatment effect relative to a given reference rule.

Keywords: nonparametric inference; average treatment effect; dynamic treatment regime

MSC 2010: 62D20; 62G20

1 Introduction

The effect of a treatment often varies across subgroups of the population [1,2]. When such differences are clinically meaningful, it may be beneficial to assign treatments strategically depending on subgroup membership. Such treatment assignment mechanisms are called individualized treatment rules (ITRs). A treatment rule is commonly evaluated on the basis of the mean counterfactual outcome it generates (often referred to as the treatment rule’s value), and an ITR with an optimal value is called an optimal ITR. There is an extensive literature on estimation of optimal ITRs and their corresponding values using data from randomized trials or observational studies [3,4,5,6,7].

Most existing approaches for estimating ITRs do not incorporate real-world resource constraints. Without such constraints, an optimal ITR would assign treatment to members of a subgroup whenever there is any benefit for them, however small. Under treatment resource limits, by contrast, it may be more advantageous to reserve treatment for the subgroups that benefit most. This issue has received attention in recent work. Luedtke and van der Laan developed methods for estimation and evaluation of optimal ITRs under a constraint on the proportion receiving treatment [8]. Qiu et al. instead considered related problems in settings in which instrumental variables (IVs) are available [9]. In one of these settings, the same resource constraint as in Luedtke and van der Laan [8] is imposed, but a binary IV is used to identify optimal ITRs even when there may be unmeasured confounders. In another setting considered by Qiu et al. [9], the authors studied interventions on a causal IV, or encouragement status, and developed methods to estimate individualized encouragement (rather than treatment) rules under a constraint on the proportion receiving both encouragement and treatment [10]. They also developed nonparametrically efficient estimators of the average causal effect of optimal rules relative to a prespecified reference rule. Sun et al. [11] considered a setting in which the cost of treatment is random and dependent on baseline covariates. They developed methods to estimate optimal ITRs under a constraint on the expected additional treatment cost compared to control, though they did not study inference on the impact of implementing the optimal ITR in the population [11]. Sun [12] considered a related problem involving the development of optimal ITRs under resource constraints and established the asymptotic properties of the estimated optimal ITR. This method appears viable when the user restricts the class of ITRs a priori.

In this article, we study estimation and inference for an optimal rule under two different cost constraints. The first is the constraint studied by Sun et al. [11]. In contrast to earlier work on this setting, we do not restrict the class of ITRs considered, and we provide a means to obtain inference about the average treatment effect of the optimal ITR. The second constraint caps the total cost under the rule rather than the incremental cost relative to control. To our knowledge, the latter problem has not previously been considered in the literature. Both estimation problems mirror the intervention-on-encouragement setting of Qiu et al. [9] but involve different constraints and a more general cost function.

As in Qiu et al. [9], the estimators that we develop are asymptotically efficient within a nonparametric model and enable the construction of asymptotically valid confidence intervals (CIs) for the impact of implementing the optimal rule. We develop our estimators using tools similar to those used to tackle the related problem in Qiu et al. [9], such as semiparametric efficiency theory [13,14] and targeted minimum loss-based estimation (TMLE) [15,16]. Consequently, our proposed estimators are similar to those presented there. We therefore streamline the presentation by highlighting the key similarities and focusing on the differences between these related problems and estimation schemes.

The rest of this article is organized as follows. In Section 2, we describe the problem setup, introduce notation, and present the causal estimands along with basic causal conditions. In Section 3, we present additional causal conditions and the corresponding nonparametric identification results. In Section 4, we present our proposed estimators and their theoretical properties. In Section 5, we present a simulation illustrating the performance of our proposed estimators. We make concluding remarks in Section 6. Proofs, technical conditions, and additional simulation results can be found in the Supplementary material.

2 Setup and objectives

To facilitate comparisons with Qiu et al. [9], we adopt similar notation. Suppose that we observe independent and identically distributed data units $O_1, O_2, \ldots, O_n \sim P_0$, where $P_0$ is an unknown sampling distribution. A prototypical data unit $O$ consists of the quadruplet $(W, T, C, Y)$, where $W \in \mathcal{W} \subseteq \mathbb{R}^p$ is the vector of baseline covariates, $T \in \{0, 1\}$ is the treatment status, $C \in [0, \infty)$ is the random treatment cost, and $Y \in \mathbb{R}$ is the outcome of interest. By convention, we assume that larger values of $Y$ are preferable. We use $V = V(W) \in \mathcal{V}$ to denote a fixed transformation of $W$ upon which we allow treatment decisions to depend. For example, $V$ may be a subset of the covariates in $W$ or a summary of $W$ (e.g., body mass index as a summary of height and weight). In practice, $V$ may be chosen based on prior knowledge about potential modifiers of the treatment effect as well as the cost of measuring various covariates. We distinguish between $V(W)$ and $W$ because of their different roles. On the one hand, we will assume that the full covariate $W$ contains all confounders and thus is used to identify causal effects, while $V(W)$ might not be sufficient for this purpose. On the other hand, some covariates in $W$ may be expensive or difficult to measure in future applications, and thus implementing an optimal ITR based on a subset $V(W)$ of the covariates $W$ may be desirable. In the rest of this article, we use the shorthand notation $V$, $V_i$, and $v$ to refer to $V(W)$, $V(W_i)$, and $V(w)$, respectively. We define an individualized (stochastic) treatment rule (ITR) to be a function $\rho: \mathcal{V} \to [0, 1]$ that prescribes treatment with probability $\rho(v)$, according to an exogenous source of randomness, to an individual with covariate value $v$. Any stochastic ITR that only takes values in $\{0, 1\}$ is referred to as a deterministic ITR.

In this work, we adopt the potential outcomes framework [17,18]. For each individual, we use $C(t)$ and $Y(t)$ to denote the potential treatment cost and potential outcome, respectively, corresponding to scenarios in which the individual has treatment status $t$. We use $E$ to denote an expectation over the counterfactual observations and the exogenous random mechanism defining a rule, and $E_0$ to denote an expectation over observables alone under sampling from $P_0$. We make the usual stable unit treatment value assumption.

Condition A1

(Stable unit treatment value assumption). The counterfactual data unit of one individual is unaffected by the treatment assigned to other individuals, and there is only a single version of the treatment, so that $T = t$ implies that $C = C(t)$ and $Y = Y(t)$.

Remark 1

The ITRs we consider are not truly individualized, because they are based on the value of covariate $V$ rather than on each individual’s unique potential treatment effects $Y(1) - Y(0)$ and $C(1) - C(0)$. Nevertheless, depending on the resolution of $V$, these ITRs can be considerably more individualized than assigning everyone to either treatment or control. In this article, we adopt the conventional nomenclature and refer to the treatment rules we study as ITRs [see, e.g., 7,19,20,21,22,23,24,25,26,27,28,29].

We define $C(\rho)$ and $Y(\rho)$ to be the counterfactual treatment cost and outcome, respectively, for an ITR $\rho$ under an exogenous random mechanism. If $\rho(v) \in (0, 1)$ for an individual with covariate $v$, an exogenous random mechanism assigns treatment to this individual with probability $\rho(v)$, and thus $C(\rho)$ and $Y(\rho)$ are random even for this given individual. If $\rho$ were implemented in the population, then the population mean outcome would be $E[Y(\rho)]$, where $E$ denotes expectation under the true data-generating mechanism involving the potential outcomes $(C(t), Y(t))$ and the exogenous randomness in $\rho$. We consider a generic treatment resource constraint requiring that a convex combination of the population average treatment cost and the population average additional treatment cost compared to control be no greater than a specified constant $\kappa \in (0, \infty]$. Consequently, an optimal ITR $\rho_0$ under this constraint is a solution in $\rho: \mathcal{V} \to [0, 1]$ to

$$\text{maximize } E[Y(\rho)] \quad \text{subject to} \quad \alpha E[C(\rho)] + (1 - \alpha) E[C(\rho) - C(0)] \le \kappa, \tag{1}$$

where $\alpha \in [0, 1]$ is also a constant specified by the investigator. Natural choices are $\alpha = 0$, corresponding to a constraint on the population average additional treatment cost compared to control, and $\alpha = 1$, corresponding to a constraint on the population average treatment cost. The first choice may be preferred when the control treatment corresponds to the current standard of care and a limited budget is available to fund the novel treatment for some patients. The second choice may be more relevant when both treatment and control incur costs.

Remark 2

Our setup is similar to that of Qiu et al. [9] if we view $T$ and $C$ defined here as the IV/encouragement $Z$ and treatment status $A$ defined in that prior work, respectively. However, the constraint in our setup differs from the constraint $E[\rho(V) C(\rho)] \le \kappa$ considered previously. In IV settings, the constraint in (1) with $\alpha = 1$ is useful when assigning treatment always incurs a cost, regardless of whether encouragement is applied, such as when distributing a limited supply of an expensive drug within a health system based on the results of a randomized clinical trial. The constraint with $\alpha = 0$ is instead useful when no encouragement is present under the standard of care but intervention on the encouragement is of interest when additional treatment resources are available. The constraint considered by Qiu et al. [9,10] was instead useful in cases in which treatment only incurs a cost when paired with encouragement, such as when housing vouchers are used to encourage individuals to live in a certain area. In the general setting in which $T$ is viewed as treatment status and $C$ as a random treatment cost, the constraint in (1) with $\alpha = 0$ is identical to that considered by Sun et al. [11]. We refer readers to these works for a more in-depth discussion of the relation between the current problem setup and IV settings.

To evaluate an optimal ITR $\rho_0$, we follow Qiu et al. [9] in considering three types of reference ITRs and develop methods for statistical inference on the difference in the mean counterfactual outcome between $\rho_0$ and a reference ITR $\rho_0^{\mathcal{R}}: \mathcal{V} \to [0, 1]$. The first type of reference ITR, denoted by $\rho^{\mathrm{FR}}$ (FR = fixed rule), is any fixed ITR that the investigator may specify before the study. When $\alpha = 0$, it is usually most reasonable to consider the rule that always assigns control, namely $v \mapsto 0$, because the constraint in (1) may arise from limited funding for implementing treatment when the standard-of-care rule is to always assign control. The second type, denoted by $\rho_0^{\mathrm{RD}}$ (RD = random), prescribes treatment completely at random, regardless of baseline covariates. The probability of prescribing treatment is chosen to saturate the treatment resource (i.e., to use all available resources) if such a probability exists, and to treat all individuals otherwise. Symbolically, this ITR is given by $\rho_0^{\mathrm{RD}}: v \mapsto \min\{1, (\kappa - \alpha E[C(0)]) / E[C(1) - C(0)]\}$ under the condition that $E[C(0)] \le \kappa$ and $E[C(1) - C(0)] > 0$. Although $\rho_0^{\mathrm{RD}}$ has the same interpretation as the corresponding encouragement rule in Qiu et al. [9], its mathematical expression differs because of the different resource constraints. This rule may be of interest if it is known a priori that treatment is harmless. The third type, denoted by $\rho_0^{\mathrm{TP}}$ (TP = true propensity), prescribes treatment according to the true propensity of treatment implied by the study sampling mechanism $P_0$, so that $\rho_0^{\mathrm{TP}}$ equals $w \mapsto P_0(T = 1 \mid W = w)$. This ITR may be of interest in two settings. In the first, $\rho_0^{\mathrm{TP}}$ satisfies the treatment resource constraint, and the investigator may wish to determine the extent to which implementing an optimal ITR would improve upon the standard of care. In the second, the treatment resource constraint is newly introduced and the standard-of-care ITR may overuse treatment resources. The investigator may then be interested in whether implementing an optimal constrained ITR would, despite the new resource constraint, yield a noninferior mean outcome.

3 Identification of causal estimands

In this section, we present nonparametric identification results. Although these results are similar to those for individualized encouragement rules in Qiu et al. [9], there are two key differences. First, some of the conditions in Qiu et al. [9] must be modified to account for the novel resource constraint considered here. Second, two additional conditions are needed to overcome challenges that arise from this new constraint.

We first introduce notation that will be useful when presenting our identification results and proposed estimators. For any observed-data distribution $P$, we define pointwise the conditional mean functions $\mu_P^C(t, w) \coloneqq E_P(C \mid T = t, W = w)$ and $\mu_P^Y(t, w) \coloneqq E_P(Y \mid T = t, W = w)$, where $E_P$ denotes an expectation over observables alone under sampling from $P$, as well as their contrasts across treatment status, $\Delta_P^C(w) \coloneqq \mu_P^C(1, w) - \mu_P^C(0, w)$ and $\Delta_P^Y(w) \coloneqq \mu_P^Y(1, w) - \mu_P^Y(0, w)$. We also define the averages of these contrasts conditional on $V$, $\delta_P^C(v) \coloneqq E_P[\Delta_P^C(W) \mid V = v]$ and $\delta_P^Y(v) \coloneqq E_P[\Delta_P^Y(W) \mid V = v]$, and the propensity to receive treatment, $\mu_P^T(w) \coloneqq P(T = 1 \mid W = w)$. In addition, we define $\nu_P(t, v) \coloneqq E_P[E_P(C \mid T = t, W) \mid V = v]$ and $\phi_P \coloneqq E_P[\mu_P^C(0, W)]$. These quantities play an important role in tackling the problem at hand. Throughout the article, for ease of notation, if $f_P$ is a quantity or operation indexed by distribution $P$, we may denote $f_{P_0}$ by $f_0$. For example, we may use $\Delta_0^Y$ to denote $\Delta_{P_0}^Y$.

We now introduce the additional causal conditions we require, namely, positivity and unconfoundedness. In one form or another, these conditions commonly appear in the causal inference literature [16], including the IV literature [30,31,32,33].

Condition A2

(Strong positivity). There exists a constant $\varepsilon_T > 0$ such that $\varepsilon_T < \mu_0^T(w) < 1 - \varepsilon_T$ holds for $P_0$-almost every $w$.

Condition A3

(Unconfoundedness of treatment). For each $t \in \{0, 1\}$, $T$ and $(C(t), Y(t))$ are conditionally independent given $W = w$ for $P_0$-almost every $w$.

Equipped with these conditions, we can state a theorem on the nonparametric identification of the mean counterfactual outcomes and the average treatment effect (ATE). These results can be viewed as a corollary of the well-known G-formula [34].

Theorem 1

(Identification of ATE and expected treatment resource expenditure). Provided Conditions A1–A3 are satisfied, it holds that $E[Y(t) \mid W = w] = \mu_0^Y(t, w)$, $E[Y(1) - Y(0) \mid W = w] = \Delta_0^Y(w)$, and $E[Y(1) - Y(0) \mid V = v] = \delta_0^Y(v)$ for $P_0$-almost every $w$ and $v$, and so $E[Y(\rho) - Y(\rho_0^{\mathcal{R}})] = E_0[\{\rho(V) - \rho_0^{\mathcal{R}}(W)\}\Delta_0^Y(W)]$. In addition, it holds that $E[C(t) \mid W = w] = \mu_0^C(t, w)$ for $P_0$-almost every $w$, and so $E[C(\rho)] = E_0[\rho(V)\mu_0^C(1, W) + \{1 - \rho(V)\}\mu_0^C(0, W)]$.

In view of Theorem 1, the objective function in (1) can be identified as follows:

$$E[Y(\rho)] = E_0[\rho(V)\mu_0^Y(1, W) + \{1 - \rho(V)\}\mu_0^Y(0, W)] = E_0[\rho(V)\Delta_0^Y(W)] + E_0[\mu_0^Y(0, W)],$$

and, similarly, the expected cost is identified as $E[C(\rho)] = E_0[\rho(V)\Delta_0^C(W)] + E_0[\mu_0^C(0, W)]$. It follows that the optimization problem (1) is equivalent to

$$\text{maximize } E_0[\rho(V)\delta_0^Y(V)] \quad \text{subject to} \quad E_0[\rho(V)\delta_0^C(V)] + \alpha\phi_0 \le \kappa. \tag{2}$$
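The step from (1) to (2) is a short calculation. Writing $\phi_0 = E_0[\mu_0^C(0, W)] = E[C(0)]$ (by Theorem 1) and applying the tower property over $V$, the left-hand side of the constraint in (1) becomes:

```latex
\begin{aligned}
\alpha E[C(\rho)] + (1-\alpha)E[C(\rho) - C(0)]
  &= E[C(\rho)] - (1-\alpha)E[C(0)] \\
  &= E_0[\rho(V)\Delta_0^C(W)] + \phi_0 - (1-\alpha)\phi_0 \\
  &= E_0\big[\rho(V)\,E_0\{\Delta_0^C(W) \mid V\}\big] + \alpha\phi_0
   = E_0[\rho(V)\delta_0^C(V)] + \alpha\phi_0 .
\end{aligned}
```

By the same argument, $E[Y(\rho)] = E_0[\rho(V)\delta_0^Y(V)] + E_0[\mu_0^Y(0, W)]$. The last term does not depend on $\rho$, so it can be dropped from the objective.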

This differs from equation (3) of Qiu et al. [9], which defines optimal individualized encouragement rules. We now present two additional conditions under which (2) is a fractional knapsack problem [35], which allows us to use existing results from the optimization literature. These conditions are similar to those in Sun et al. [11].

Condition A4

(Strictly costlier treatment). There exists a constant $\varepsilon_C > 0$ such that $\Delta_0^C(w) > \varepsilon_C$ holds for $P_0$-almost every $w$.

Condition A5

(Financial feasibility of assigning treatment). The inequality $\alpha\phi_0 < \kappa$ holds.

Condition A4 is reasonable if the treatment is more expensive than control. When applied to an IV setting as outlined in Remark 2, this condition corresponds to the assumption that the IV is indeed an encouragement to take treatment. It is slightly stronger than its counterpart in Sun et al. [11], which only requires that $\Delta_0^C \ge 0$. The stronger version is needed to ensure the asymptotic linearity of the estimator proposed in Section 4. Under Condition A4, Condition A5 is evidently reasonable. Because $E_0[\rho(V)\delta_0^C(V)] \ge 0$, no ITR satisfies the treatment resource constraint if $\alpha\phi_0 > \kappa$. If $\alpha\phi_0 = \kappa$, only the trivial ITR $v \mapsto 0$ satisfies the constraint, so there is no need to estimate an optimal ITR.

Under these two additional conditions, (2) is a fractional knapsack problem [35] in which each subgroup defined by a distinct value of $V$ corresponds to a different “item.” A solution in the special case in which $V(W) = W$ and $\alpha = 0$ was given in Theorem 1 of Sun et al. [11]. We now state a more general result with the following differences: (i) the treatment decision may be based on a summary $V$ rather than the entire covariate vector $W$, and (ii) $\alpha$ may take any value in $[0, 1]$ rather than only zero. For completeness and clarity, we also explicitly state the randomization probability at the boundary. Despite these differences, the result we obtain is similar to Theorem 1 of Sun et al. [11]. Define pointwise $\xi_0(v) \coloneqq \delta_0^Y(v) / \delta_0^C(v)$, and write $\eta_0 \coloneqq \inf\{\eta : E_0[I(\xi_0(V) > \eta)\,\delta_0^C(V)] \le \kappa - \alpha\phi_0\}$ and $\tau_0 \coloneqq \max\{\eta_0, 0\}$.

Theorem 2

(Optimal ITR). Under Conditions A1–A5, a solution to (2) is explicitly given by

$$\rho_0(v) \coloneqq \begin{cases} \dfrac{\kappa - \alpha\phi_0 - E_0[I(\xi_0(V) > \tau_0)\,\delta_0^C(V)]}{E_0[I(\xi_0(V) = \tau_0)\,\delta_0^C(V)]}, & \text{if } \tau_0 > 0,\ \xi_0(v) = \tau_0, \text{ and } E_0[I(\xi_0(V) = \tau_0)\,\delta_0^C(V)] > 0, \\ I(\xi_0(v) > \tau_0), & \text{otherwise.} \end{cases}$$

Here, the first case is the boundary case with the randomization probability that saturates the treatment resource.
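When $V$ takes finitely many values and $\delta_0^Y$, $\delta_0^C$, and $\phi_0$ are known, Theorem 2 can be evaluated directly. The NumPy sketch below does exactly that under these assumptions. The function name `optimal_itr_discrete` and its arguments are hypothetical, and Condition A4 ($\delta_0^C > 0$) is assumed rather than checked. The infimum defining $\eta_0$ is found among the distinct values of $\xi_0$, because $\eta \mapsto E_0[I(\xi_0(V) > \eta)\delta_0^C(V)]$ is then a right-continuous step function.

```python
import numpy as np


def optimal_itr_discrete(p, delta_y, delta_c, kappa, alpha, phi):
    """Evaluate Theorem 2 when V takes finitely many values v_1, ..., v_J.

    p[j] = P_0(V = v_j), delta_y[j] = delta_0^Y(v_j), delta_c[j] = delta_0^C(v_j) > 0.
    Returns (rho_0(v_1), ..., rho_0(v_J)) and the threshold tau_0.
    """
    p, dy, dc = (np.asarray(a, dtype=float) for a in (p, delta_y, delta_c))
    budget = kappa - alpha * phi            # positive under Condition A5
    xi = dy / dc                            # xi_0(v): extra benefit per unit extra cost

    def spent_above(eta):                   # E_0[I(xi_0(V) > eta) delta_0^C(V)]
        return float(np.sum(p * dc * (xi > eta)))

    if spent_above(-np.inf) <= budget:      # constraint never binds, so eta_0 = -inf
        eta = -np.inf
    else:                                   # infimum is attained at a jump point
        eta = min(u for u in np.unique(xi) if spent_above(u) <= budget)
    tau = max(float(eta), 0.0)              # tau_0 = max{eta_0, 0}

    rho = (xi > tau).astype(float)
    at_tau = xi == tau
    mass = float(np.sum(p * dc * at_tau))   # E_0[I(xi_0(V) = tau_0) delta_0^C(V)]
    if tau > 0 and mass > 0:                # boundary stratum gets a fractional share
        rho[at_tau] = (budget - spent_above(tau)) / mass
    return rho, tau
```

For instance, take four equally likely strata with $\delta_0^Y = (4, 3, 1, -1)$, $\delta_0^C \equiv 1$, $\alpha = 0$, and $\kappa = 0.6$. Then $\tau_0 = 1$, the two strata with the largest $\xi_0$ are always treated, and the stratum at the threshold is treated with probability 0.4, which uses exactly the budget of 0.6. With a budget that never binds, the rule reduces to $I(\xi_0(v) > 0)$.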

The reference ITRs introduced in Section 2 are also identified under the aforementioned conditions. In particular, it can be shown that $\rho_0^{\mathrm{RD}}(v) = \min\{1, (\kappa - \alpha\phi_0) / E_0[\Delta_0^C(W)]\}$ and $\rho_0^{\mathrm{TP}} = \mu_0^T$.
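The identified random reference rule is a one-line computation. The sketch below, with the hypothetical name `rd_rule_probability`, evaluates $\min\{1, (\kappa - \alpha\phi)/E[\Delta^C(W)]\}$ and checks Condition A5 and the positivity of the denominator. Passing $\phi_0$ and $E_0[\Delta_0^C(W)]$ gives $\rho_0^{\mathrm{RD}}$. Passing an estimate of $\phi_0$ and an average of estimated cost contrasts gives a sample analog.

```python
def rd_rule_probability(kappa, alpha, phi, mean_delta_c):
    """Constant treatment probability of the random reference rule rho^RD.

    Implements min{1, (kappa - alpha * phi) / E[Delta^C(W)]}.
    """
    if mean_delta_c <= 0:
        raise ValueError("E[Delta^C(W)] must be positive (cf. Condition A4)")
    if alpha * phi >= kappa:
        raise ValueError("requires alpha * phi < kappa (Condition A5)")
    return min(1.0, (kappa - alpha * phi) / mean_delta_c)
```

When the budget exceeds what treating everyone would cost, the probability is capped at one, matching the "treat all individuals otherwise" clause in Section 2.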

4 Estimating and evaluating optimal ITRs

In this section, we present an estimator of an optimal ITR $\rho_0$ and an inferential procedure for its ATE relative to a reference ITR $\rho_0^{\mathcal{R}}$, where $\mathcal{R}$ is any of FR, RD, or TP. The proposed procedure is an adaptation of the method first proposed by Qiu et al. [9,10].

We begin by introducing some notation that is useful for defining the estimands. For each ITR $\rho$ and distribution $P \in \mathcal{M}$, we define the parameter $\Psi_\rho(P) \coloneqq E_P[\rho(V)\Delta_P^Y(W)]$ or $\Psi_\rho(P) \coloneqq E_P[\rho(W)\Delta_P^Y(W)]$, depending on whether the domain of $\rho$ is $\mathcal{V}$ or $\mathcal{W}$. Here, we take the model $\mathcal{M}$ to be locally nonparametric at $P_0$ [13]. For $P \in \mathcal{M}$, the ATE of an optimal ITR $\rho_P$ relative to a reference ITR $\rho_P^{\mathcal{R}}$ equals $\Psi_{\mathcal{R}}(P) \coloneqq \Psi_{\rho_P}(P) - \Psi_{\rho_P^{\mathcal{R}}}(P)$. We are interested in making inference about $\psi_0 \coloneqq \Psi_{\mathcal{R}}(P_0)$, where we have suppressed dependence on $\mathcal{R}$ from our shorthand notation.

4.1 Pathwise differentiability of the ATE

We first present a result on the pathwise differentiability of the ATE. Pathwise differentiability of the parameter of interest is the foundation for constructing asymptotically efficient estimators of this parameter, from which an inferential procedure may be developed. Additional technical conditions are required; these are provided in Section S1 of the Supplementary Material. For a distribution $P \in \mathcal{M}$, a function $\mu^C: \{0, 1\} \times \mathcal{W} \to \mathbb{R}$, an ITR $\rho$, and a decision threshold $\tau \in \mathbb{R}$, we define pointwise the following functions:

$$D(P, \rho, \tau, \mu^C)(o) \coloneqq \rho(v)\left\{\frac{y - \mu_P^Y(t, w)}{t + \mu_P^T(w) - 1} + \Delta_P^Y(w)\right\} - \Psi_\rho(P) - \tau\left[\rho(v)\left\{\frac{c - \mu^C(t, w)}{t + \mu_P^T(w) - 1} + \Delta^C(w)\right\} + \alpha\left\{\frac{(1 - t)\{c - \mu^C(0, w)\}}{1 - \mu_P^T(w)} + \mu^C(0, w)\right\} - \kappa\right]; \tag{3}$$
$$G(P)(o) \coloneqq D(P, \rho_P, \tau_P, \mu_P^C)(o);$$
$$D_1(P, \mu^C)(o) \coloneqq \frac{(1 - t)\{c - \mu^C(0, w)\}}{1 - \mu_P^T(w)} + \mu^C(0, w) - E_P[\mu^C(0, W)];$$
$$D_2(P, \mu^C)(o) \coloneqq \frac{c - \mu^C(t, w)}{t + \mu_P^T(w) - 1} + \Delta^C(w) - E_P[\Delta^C(W)];$$
$$G_{\mathrm{RD}}(P)(o) \coloneqq D(P, \rho_P^{\mathrm{RD}}, 0, \mu_P^C)(o) - \frac{\alpha\,\Psi_{\rho_P^{\mathrm{RD}}}(P)\,D_1(P, \mu_P^C)(o)}{\kappa - \alpha\phi_P} - \frac{\Psi_{\rho_P^{\mathrm{RD}}}(P)\,D_2(P, \mu_P^C)(o)}{E_P[\Delta_P^C(W)]};$$
$$G_{\mathrm{TP}}(P)(o) \coloneqq \frac{\mu_P^T(w)}{t + \mu_P^T(w) - 1}\{y - \mu_P^Y(t, w)\} + t\,\Delta_P^Y(w) - \Psi_{\rho_P^{\mathrm{TP}}}(P);$$
$$G_{\mathrm{FR}}(P)(o) \coloneqq D(P, \rho^{\mathrm{FR}}, 0, \mu_P^C)(o),$$
where $o = (w, t, c, y)$, $v = V(w)$, and $\Delta^C(w) \coloneqq \mu^C(1, w) - \mu^C(0, w)$ for the generic function $\mu^C$.
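In practice, the gradient in (3) is evaluated by plugging in estimated nuisance functions at each observation. The NumPy sketch below computes $D(P, \rho, \tau, \mu^C)(o_i)$ from such pre-evaluated values. The name `gradient_D` and its argument layout are hypothetical, and fitting the nuisance functions is assumed to have happened elsewhere. The factor $t + \mu_P^T(w) - 1$ equals $\mu_P^T(w)$ for treated units and $-\{1 - \mu_P^T(w)\}$ for controls, so a single expression covers both inverse-probability weights.

```python
import numpy as np


def gradient_D(t, y, c, rho_v, mu_y1, mu_y0, mu_c1, mu_c0, mu_t,
               psi_rho, tau, kappa, alpha):
    """Evaluate D(P, rho, tau, mu^C)(o) of display (3) at each observation o_i.

    Nuisance arguments are values at W_i (arrays or scalars); rho_v holds rho(V_i).
    """
    t, y, c, rho_v = (np.asarray(a, dtype=float) for a in (t, y, c, rho_v))
    denom = t + mu_t - 1.0                    # mu^T(w) if t = 1, -(1 - mu^T(w)) if t = 0
    mu_y_t = np.where(t == 1, mu_y1, mu_y0)   # mu_P^Y(t, w)
    mu_c_t = np.where(t == 1, mu_c1, mu_c0)   # mu^C(t, w)
    outcome_part = rho_v * ((y - mu_y_t) / denom + (mu_y1 - mu_y0)) - psi_rho
    resource_part = (rho_v * ((c - mu_c_t) / denom + (mu_c1 - mu_c0))
                     + alpha * ((1 - t) * (c - mu_c0) / (1 - mu_t) + mu_c0)
                     - kappa)
    return outcome_part - tau * resource_part
```

Setting `tau=0` gives $G_{\mathrm{FR}}$ for a fixed rule. Passing the optimal rule, its threshold, and the corresponding $\Psi$ value gives $G(P)$. The bracket multiplied by `tau` is the resource term discussed in Remark 3.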

One key condition we rely on is the following nonexceptional law assumption.

Condition B1

(Nonexceptional law). $P_0(\xi_0(V) = \tau_0) = 0$.

Under this condition, the true optimal ITR $\rho_0$ is $P_0$-almost surely equal to an indicator function, that is, it is deterministic. If all covariates are discrete, then we can plug the empirical estimates into the identification formulae in Theorems 1 and 2 and use the delta method to show that the resulting ATE estimators are asymptotically normal, even when Condition B1 does not hold. We do not pursue this simple case further in this article. To accommodate continuous covariates, we therefore rely on the nonexceptional law assumption, Condition B1. Additional technical conditions are listed in Supplement S1.

We can now provide a formal result describing the pathwise differentiability of the ATE parameter.

Theorem 3

(Pathwise differentiability of the ATE). Let $\mathcal{R} \in \{\mathrm{FR}, \mathrm{RD}, \mathrm{TP}\}$. Provided Conditions A1–A5 and B1–B5 are satisfied, the parameters $P \mapsto \Psi_{\rho_P}(P)$ and $P \mapsto \Psi_{\rho_P^{\mathcal{R}}}(P)$ are pathwise differentiable at $P_0$ relative to $\mathcal{M}$ with canonical gradients $G(P_0)$ and $G_{\mathcal{R}}(P_0)$, respectively.

The pathwise differentiability of $P \mapsto \Psi_{\rho_P^{\mathcal{R}}}(P)$ for $\mathcal{R} \in \{\mathrm{FR}, \mathrm{TP}\}$ was established in Theorem 3 of Qiu et al. [9]. The other results can be proven using similar techniques; we present their proofs in Supplement S4.2. In view of Theorem 3, the ATE parameter $\Psi_{\mathcal{R}}$ is pathwise differentiable at $P_0$ with nonparametric canonical gradient

$$D_{\mathcal{R}}(P_0) \coloneqq G(P_0) - G_{\mathcal{R}}(P_0) \tag{4}$$

for $\mathcal{R} \in \{\mathrm{FR}, \mathrm{RD}, \mathrm{TP}\}$.

Remark 3

Similar additional terms related to the resource being used appear in the canonical gradients of the mean counterfactual outcome or ATE of optimal ITRs under resource constraints, for example, in Luedtke and van der Laan [8] and Qiu et al. [9]. In our problem, this additional term is

$$-\tau_0\left[\rho_0(v)\left\{\frac{c - \mu_0^C(t, w)}{t + \mu_0^T(w) - 1} + \Delta_0^C(w)\right\} + \alpha\left\{\frac{(1 - t)\{c - \mu_0^C(0, w)\}}{1 - \mu_0^T(w)} + \mu_0^C(0, w)\right\} - \kappa\right].$$

Such terms appear to come from solving a fractional knapsack problem with truncation at zero and take the form of a product of (i) the threshold in the solution, and (ii) a term that equals the influence function of the resource being used under the solution when the resource is saturated. We conjecture that such structures generally exist for fractional knapsack problems.

4.2 Proposed estimator and asymptotic linearity

We next present our proposed nonparametric procedure for estimating an optimal ITR $\rho_0$ and the corresponding ATE $\psi_0$. We generally use the subscript $n$ to denote an estimator based on a sample of size $n$, and we add a hat to a nuisance function estimator that has been updated in a targeting (TMLE) step.

Step 1. Use the empirical distribution $\hat{P}_{W,n}$ of $W$ as an estimate of the true marginal distribution of $W$. Compute estimates $\mu_n^Y$, $\mu_n^C$, $\mu_n^T$, $\delta_n^Y$, and $\delta_n^C$ of $\mu_0^Y$, $\mu_0^C$, $\mu_0^T$, $\delta_0^Y$, and $\delta_0^C$ using flexible regression methods. Recall that $\mu_0^Y(t, w) = E_0[Y \mid T = t, W = w]$, $\mu_0^C(t, w) = E_0[C \mid T = t, W = w]$, $\delta_0^Y(v) = E_0[\mu_0^Y(1, W) - \mu_0^Y(0, W) \mid V = v]$, and $\delta_0^C(v) = E_0[\mu_0^C(1, W) - \mu_0^C(0, W) \mid V = v]$. Define pointwise $\Delta_n^C(w) \coloneqq \mu_n^C(1, w) - \mu_n^C(0, w)$.
Step 2. Estimate an optimal ITR:
  (a) Estimate $\phi_0 = E_0[\mu_0^C(0, W)]$ with a one-step correction estimator:
    $$\phi_n \coloneqq \frac{1}{n}\sum_{i=1}^n \left\{\mu_n^C(0, W_i) + \frac{(1 - T_i)\{C_i - \mu_n^C(0, W_i)\}}{1 - \mu_n^T(W_i)}\right\}.$$
  (b) Let $\xi_n \coloneqq \delta_n^Y / \delta_n^C$, $\Gamma_n: \tau \mapsto \frac{1}{n}\sum_{i: \xi_n(V_i) > \tau} \Delta_n^C(W_i)$, and $\gamma_n: \tau \mapsto \frac{1}{n}\sum_{i: \xi_n(V_i) = \tau} \Delta_n^C(W_i)$. For any $k \in [0, \infty]$, define $\eta_n(k) \coloneqq \inf\{\tau : \Gamma_n(\tau) \le k - \alpha\phi_n\}$, $\tau_n(k) \coloneqq \max\{\eta_n(k), 0\}$, and
    $$d_{n,k}: v \mapsto \begin{cases} \dfrac{k - \alpha\phi_n - \Gamma_n(\eta_n(k))}{\gamma_n(\eta_n(k))}, & \text{if } \xi_n(v) = \eta_n(k) \text{ and } \gamma_n(\eta_n(k)) > 0, \\ I\{\xi_n(v) > \eta_n(k)\}, & \text{otherwise.} \end{cases}$$
    The rule $d_{n,k}$ is the sample analog of an ITR that prescribes treatment to those with the highest values of $\xi_0(V)$, regardless of whether the treatment is harmful, until treatment resources run out.
  (c) Compute $k_n$, which is used to define an estimate of $\rho_0$ for which the plug-in estimator is asymptotically linear under conditions, as follows:
    (i) if $\tau_n(\kappa) > 0$ and there is a solution in $k \in [0, \infty)$ to
      $$\frac{1}{n}\sum_{i=1}^n d_{n,k}(V_i)\left\{\Delta_n^C(W_i) + \frac{C_i - \mu_n^C(T_i, W_i)}{T_i + \mu_n^T(W_i) - 1}\right\} + \alpha\phi_n = \kappa, \tag{5}$$
      then take $k_n$ to be this solution;
    (ii) otherwise, set $k_n = \kappa$.
  (d) Estimate $\rho_0$ using the sample analog of $\rho_0$ with treatment resource constraint $k_n$, namely,
    $$\rho_n: v \mapsto \begin{cases} \dfrac{k_n - \alpha\phi_n - \Gamma_n(\tau_n(k_n))}{\gamma_n(\tau_n(k_n))}, & \text{if } \xi_n(v) = \tau_n(k_n) \text{ and } \gamma_n(\tau_n(k_n)) > 0, \\ I\{\xi_n(v) > \tau_n(k_n)\}, & \text{otherwise.} \end{cases}$$
Step 3. Obtain an estimate $\rho_n^{\mathcal{R}}$ of the reference ITR $\rho_0^{\mathcal{R}}$ as follows:
  (a) For $\mathcal{R} = \mathrm{FR}$, take $\rho_n^{\mathcal{R}}$ to be $\rho^{\mathrm{FR}}$.
  (b) For $\mathcal{R} = \mathrm{RD}$:
    (i) obtain a targeted estimate $\hat{\mu}_n^C$ of $\mu_0^C$: run an ordinary least-squares linear regression with outcome $C$, covariate $1 / \{T + \mu_n^T(W) - 1\}$, offset $\mu_n^C(T, W)$, and no intercept, and take $\hat{\mu}_n^C$ to be the fitted mean model;
    (ii) take $\rho_n^{\mathcal{R}}$ to be the constant function $w \mapsto \min\left\{1, (\kappa - \alpha\phi_n) \big/ \frac{1}{n}\sum_{i=1}^n \hat{\Delta}_n^C(W_i)\right\}$, where we define pointwise $\hat{\Delta}_n^C(w) \coloneqq \hat{\mu}_n^C(1, w) - \hat{\mu}_n^C(0, w)$.
  (c) For $\mathcal{R} = \mathrm{TP}$, take $\rho_n^{\mathcal{R}}$ to be $\mu_n^T$.
Step 4. Estimate the ATE of $\rho_0$ relative to the reference ITR $\rho_0^{\mathcal{R}}$ with a TMLE $\psi_n$:
  (a) obtain a targeted estimate $\hat{\mu}_n^Y$ of $\mu_0^Y$: run an ordinary least-squares linear regression with outcome $Y$, covariate $\{\rho_n(V) - \rho_n^{\mathcal{R}}(W)\} / \{T + \mu_n^T(W) - 1\}$, offset $\mu_n^Y(T, W)$, and no intercept, and take $\hat{\mu}_n^Y$ to be the fitted mean function;
  (b) with $\hat{P}_n$ being any distribution with components $\hat{\mu}_n^Y$ and $\hat{P}_{W,n}$, take
    $$\psi_n \coloneqq \Psi_{\rho_n}(\hat{P}_n) - \Psi_{\rho_n^{\mathcal{R}}}(\hat{P}_n) = \frac{1}{n}\sum_{i=1}^n \left\{\rho_n(V_i) - \rho_{n,i}^{\mathcal{R}}\right\}\left\{\hat{\mu}_n^Y(1, W_i) - \hat{\mu}_n^Y(0, W_i)\right\},$$
    where $\rho_{n,i}^{\mathcal{R}}$ is defined as $\rho_n^{\mathcal{R}}(W_i)$ or $\rho_n^{\mathcal{R}}(V_i)$, depending on the covariate used by the reference ITR.
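The optimal-ITR part of this procedure (Step 2) can be sketched in NumPy as follows. The sketch assumes that the Step 1 nuisance estimates have already been evaluated at the sample points, as arrays `mu_c1`, `mu_c0`, `mu_t`, and `xi` holding $\mu_n^C(1, W_i)$, $\mu_n^C(0, W_i)$, $\mu_n^T(W_i)$, and $\xi_n(V_i)$. It also assumes that every $\Delta_n^C(W_i)$ is positive, in line with Condition A4. All function names are hypothetical. When (5) has several roots, the sketch returns the smallest one. That tie-breaking choice is an assumption of the sketch, not part of the procedure described above.

```python
import numpy as np


def one_step_phi(t, c, mu_c0, mu_t):
    """Step 2(a): one-step estimator phi_n of phi_0 = E_0[mu_0^C(0, W)]."""
    return float(np.mean(mu_c0 + (1 - t) * (c - mu_c0) / (1 - mu_t)))


def threshold_rule(k, xi, delta_c, alpha, phi):
    """Return eta_n(k) and d_{n,k}(V_i); assumes every Delta_n^C(W_i) > 0."""
    n, budget = len(xi), k - alpha * phi

    def gamma_above(tau):                       # Gamma_n(tau)
        return delta_c[xi > tau].sum() / n

    if budget < 0:                              # the set in inf{...} is empty
        return np.inf, np.zeros(n)
    if gamma_above(-np.inf) <= budget:          # budget covers everyone
        return -np.inf, np.ones(n)
    # Gamma_n is a right-continuous step function, so the infimum is a jump point.
    eta = min(u for u in np.unique(xi) if gamma_above(u) <= budget)
    d = (xi > eta).astype(float)
    at_eta = xi == eta
    mass = delta_c[at_eta].sum() / n            # gamma_n(eta_n(k))
    if mass > 0:                                # boundary stratum: fractional share
        d[at_eta] = (budget - gamma_above(eta)) / mass
    return eta, d


def estimated_itr(k, xi, delta_c, alpha, phi):
    """rho_n(V_i): the same rule, thresholded at tau_n(k) = max{eta_n(k), 0}."""
    eta, d = threshold_rule(k, xi, delta_c, alpha, phi)
    if eta >= 0:
        return d
    # tau_n(k) = 0: treat exactly those with xi_n > 0 (exact ties at 0 are ignored).
    return (xi > 0).astype(float)


def calibrate_k(t, c, mu_c1, mu_c0, mu_t, xi, kappa, alpha, phi):
    """Step 2(c): smallest root k_n of (5), or kappa if tau_n(kappa) = 0 or no root."""
    delta_c = mu_c1 - mu_c0
    eta_kappa, _ = threshold_rule(kappa, xi, delta_c, alpha, phi)
    if eta_kappa <= 0:                          # tau_n(kappa) = 0
        return kappa
    mu_c_t = np.where(t == 1, mu_c1, mu_c0)
    pseudo = delta_c + (c - mu_c_t) / (t + mu_t - 1)   # braces in (5)

    def f(k):                                   # left side of (5) minus kappa
        d = threshold_rule(k, xi, delta_c, alpha, phi)[1]
        return d @ pseudo / len(xi) + alpha * phi - kappa

    # d_{n,k} is continuous and piecewise linear in k; its kinks sit where a whole
    # stratum of xi_n becomes fully treated, so f is linear between these knots.
    levels = np.unique(xi)[::-1]
    cum = np.cumsum([delta_c[xi == u].sum() / len(xi) for u in levels])
    knots = alpha * phi + np.concatenate(([0.0], cum))
    vals = [f(k) for k in knots]
    for k0, k1, f0, f1 in zip(knots[:-1], knots[1:], vals[:-1], vals[1:]):
        if f0 == 0:
            return k0
        if f0 * f1 < 0:
            return k0 + (k1 - k0) * f0 / (f0 - f1)   # exact on a linear piece
    return knots[-1] if vals[-1] == 0 else kappa
```

Because $d_{n,k}$ is piecewise linear in $k$, the root of (5) is located by checking signs at the kinks and interpolating linearly, without a generic root finder. When the cost residuals $C_i - \mu_n^C(T_i, W_i)$ are all zero, (5) reduces to the raw budget, and the sketch returns $k_n = \kappa$.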

The aforementioned procedure is similar to that proposed by Qiu et al. [9]. One key difference is the use of the refined estimate $k_n$ of $\kappa$ obtained via the estimating equation (5), which is key to ensuring the asymptotic linearity of $\psi_n$. Another difference is that the denominator of $\xi_n$ is now $\delta_n^C$, consistent with our different definition of the unit value in the fractional knapsack problem (2). As with TMLE for other problems, when $C$ or $Y$ has known bounds (e.g., the closed interval $[0, 1]$), we may use logistic regression rather than ordinary least squares to obtain a targeted estimate that respects these bounds [36].
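The least-squares targeting step in Step 4 has a closed form. With clever covariate $H_i = \{\rho_n(V_i) - \rho_{n,i}^{\mathcal{R}}\}/\{T_i + \mu_n^T(W_i) - 1\}$, offset $\mu_n^Y(T_i, W_i)$, and no intercept, the fitted coefficient is $\varepsilon_n = \sum_i H_i\{Y_i - \mu_n^Y(T_i, W_i)\} / \sum_i H_i^2$. The NumPy sketch below, with the hypothetical name `tmle_ate`, applies this update to both treatment arms and returns $\psi_n$. It covers only the ordinary least-squares fluctuation, not the logistic variant for bounded outcomes.

```python
import numpy as np


def tmle_ate(t, y, mu_y1, mu_y0, mu_t, rho, rho_ref):
    """Step 4: OLS fluctuation of mu_n^Y with an offset, then the plug-in psi_n."""
    t = np.asarray(t, dtype=float)
    diff = np.asarray(rho, dtype=float) - np.asarray(rho_ref, dtype=float)
    h_obs = diff / (t + mu_t - 1.0)                  # clever covariate at observed T
    offset = np.where(t == 1, mu_y1, mu_y0)          # mu_n^Y(T, W)
    resid = np.asarray(y, dtype=float) - offset
    denom = h_obs @ h_obs
    eps = h_obs @ resid / denom if denom > 0 else 0.0   # no-intercept least squares
    mu_hat1 = mu_y1 + eps * diff / mu_t              # H(1, w) = diff / mu^T(w)
    mu_hat0 = mu_y0 - eps * diff / (1.0 - mu_t)      # H(0, w) = -diff / (1 - mu^T(w))
    return float(np.mean(diff * (mu_hat1 - mu_hat0)))
```

After the update, the normal equation $\sum_i H_i\{Y_i - \hat{\mu}_n^Y(T_i, W_i)\} = 0$ holds, which is what makes the plug-in estimator solve the outcome part of the efficient influence function. Passing the cost $C$ as `y`, $\mu_n^C$ as the nuisance, `rho` equal to ones, and `rho_ref` equal to zeros turns the same routine into the fluctuation used for the random reference rule. In that case, it returns $\frac{1}{n}\sum_i \hat{\Delta}_n^C(W_i)$.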

The aforementioned procedure has both similarities to and substantial differences from the estimation procedure proposed by Sun et al. [11]. The main difference is that our procedure targets efficient estimation of, and inference about, the ATE $\psi_0$ of the optimal ITR under a nonparametric model, whereas Sun et al. [11] focused on estimating the optimal ITR $\rho_0$ and did not evaluate it. This leads to a key difference between the two procedures when estimating the optimal ITR: we need to solve the estimating equation (5), which is crucial to ensuring that $\psi_n$ is asymptotically linear, while Sun et al. [11] do not. The requirement of solving (5) is related to the nature of the fractional knapsack problem discussed in Remark 3, and we conjecture that such a calibration of the resource used is necessary for general problems of the same nature. Our procedure is also related to the method of Sun [12], which relies on the availability of asymptotically normal estimators of both the average benefit and the average resource used (Assumption 2.4). This is a nontrivial requirement when the propensity score $\mu_0^T$ is unknown, as in observational studies. Our procedure essentially produces such estimators: an asymptotically normal estimator of the ATE is constructed in Step 4, whereas an asymptotically normal estimator of the expected resource is produced in Step 2 and used to calibrate the resource expenditure of the estimated optimal ITR $\rho_n$ in Step 2(c).

Remark 4

In Step 1 of the aforementioned procedure, we estimate the functions $\delta_0^Y$ and $\delta_0^C$ using a naïve approach based on outcome regression. More advanced techniques are also viable, such as the doubly robust methods in van der Laan and Rubin [15], van der Laan and Luedtke [37], Luedtke and van der Laan [38], and Kennedy [39], or R-learning as in Nie and Wager [40]. These methods were developed for conditional ATE estimation and might lead to better estimators of $\delta_0^Y$ and $\delta_0^C$. It is also possible to develop multiply robust methods to estimate $\xi_0$ using influence function techniques. Such methods are beyond the scope of this article, whose main focus is inference about the ATE. Our theoretical analysis applies only to naïve estimators based on outcome regression, but we expect that only minor modifications would be required to study these more advanced estimators once their asymptotic behavior is characterized.
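For intuition about the naïve outcome-regression approach in Step 1, the sketch below assumes a single discrete covariate, so that the outcome regression can be taken to be a cell mean, and forms the contrast $\mu(1, w) - \mu(0, w)$. Running it with $Y$ or with $C$ as the response gives plug-in contrasts analogous to $\delta^Y$ and $\delta^C$. The function name and this contrast form are assumptions made for illustration; in the simulation, the regressions are fitted by the Super Learner rather than by cell means.

```python
from collections import defaultdict


def plug_in_contrast(w, t, y):
    """Naive outcome-regression contrast for a discrete covariate.

    The fitted regression mu(t, w) is the cell mean of y among units with
    treatment t and covariate w; the contrast is mu(1, w) - mu(0, w).
    Covariate values lacking either treatment arm are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wi, ti, yi in zip(w, t, y):
        sums[(ti, wi)] += yi
        counts[(ti, wi)] += 1
    contrast = {}
    for wi in set(w):
        if counts[(1, wi)] and counts[(0, wi)]:
            mu1 = sums[(1, wi)] / counts[(1, wi)]
            mu0 = sums[(0, wi)] / counts[(0, wi)]
            contrast[wi] = mu1 - mu0
    return contrast


w = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 1, 0, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 1, 1, 0, 0]
delta_hat = plug_in_contrast(w, t, y)  # {0: 0.5, 1: 1.0}
```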

Remark 5

In Step 2(a), other efficient estimators of $\phi_0$, for example, a TMLE, may also be used. Methods such as TMLE can be preferable because they ensure that the estimator respects known bounds on the estimand. However, estimating $\phi_0$ is only one component of estimating the optimal ITR $\rho_0$, so in our case such an improvement in estimating $\phi_0$ does not necessarily lead to an improvement in the estimation of $\rho_0$.

We now present results on the asymptotic linearity and efficiency of our proposed estimator. The technical conditions required by the theorem below are stated and discussed in Supplement S1.

Theorem 4

(Asymptotic linearity of ATE estimator) Let $\mathcal{R} \in \{\mathrm{FR}, \mathrm{RD}, \mathrm{TP}\}$. Under Conditions B1–B12, with the canonical gradient $D^{\mathcal{R}}(P_0)$ defined in (3) and (4), it holds that

$$\psi_n - \psi_0 = \frac{1}{n} \sum_{i=1}^{n} D^{\mathcal{R}}(P_0)(O_i) + o_p(n^{-1/2}).$$

Therefore, $\sqrt{n}\,(\psi_n - \psi_0) \xrightarrow{d} N(0, \sigma_0^2)$, where $\sigma_0^2 := E_0[D^{\mathcal{R}}(P_0)(O)^2]$. Since $\psi_n$ is asymptotically linear with influence function equal to the canonical gradient, $\psi_n$ is also asymptotically efficient.

To conduct inference about $\psi_0$, we can directly plug the estimators of the nuisance functions into $D^{\mathcal{R}}(P_0)$ to obtain a consistent estimator of $D^{\mathcal{R}}(P_0)$, and then take the sample variance of its values at the observations $O_1, \dots, O_n$ to obtain a consistent estimator of the asymptotic variance $\sigma_0^2$. The proof of Theorem 4 can be found in Supplements S4.3 and S4.4.
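Given the estimated gradient values, the variance estimator and the resulting Wald interval take only a few lines. In the sketch below, `gradient_values` is a hypothetical list holding the estimated $D^{\mathcal{R}}(P_0)$ evaluated at $O_1, \dots, O_n$; computing these values requires the nuisance estimators and is not shown. The lower limit of the two-sided 95% interval doubles as the 97.5% one-sided lower bound.

```python
import math
import statistics
from statistics import NormalDist


def wald_inference(psi_n, gradient_values, level=0.95):
    """Wald interval for psi_0 from estimated canonical-gradient values.

    Returns (lower, upper, se). The lower limit is also a one-sided
    confidence lower bound with coverage (1 + level) / 2.
    """
    n = len(gradient_values)
    sigma2_hat = statistics.variance(gradient_values)  # sample variance
    se = math.sqrt(sigma2_hat / n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level 0.95
    return psi_n - z * se, psi_n + z * se, se


d_hat = [0.8, -0.5, 0.1, -0.6, 0.3, -0.1]  # hypothetical gradient values
lower, upper, se = wald_inference(0.12, d_hat)
```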

Remark 6

Cross-fitting [41,42] may be used when estimating an optimal ITR to improve finite-sample performance. Asymptotic linearity is preserved by an argument similar to the one used to prove Theorem 4. We describe this algorithm in Section S3 of the Supplementary Material.
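The out-of-fold mechanics underlying cross-fitting can be sketched generically, assuming a hypothetical `fit(train)` callback that returns a prediction function. Each observation receives a nuisance prediction from a model that never saw it. The full cross-fitted algorithm for our estimator, described in Section S3, is not reproduced here.

```python
import random


def cross_fit(data, fit, n_folds=5, seed=0):
    """Return out-of-fold predictions and the fold assignment.

    `fit(train)` must return a function mapping one observation to a
    prediction; observation i is predicted by a model fitted without its fold.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # random fold assignment
    folds = [idx[k::n_folds] for k in range(n_folds)]
    preds = [None] * len(data)
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        model = fit(train)  # trained on the other folds only
        for i in held_out:
            preds[i] = model(data[i])
    return preds, folds


def fit_mean(train):
    """Toy nuisance learner: predicts the training mean of the outcome."""
    m = sum(y for _, y in train) / len(train)
    return lambda obs: m


data = [(x, float(x)) for x in range(10)]  # (covariate, outcome) pairs
preds, folds = cross_fit(data, fit_mean, n_folds=5)
```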

Remark 7

Unlike Qiu et al. [9], where the bound $\kappa$ lies in $(0, 1]$ because treatment status is binary, the methods we propose here do not require knowledge of an upper bound on treatment costs. When such a bound is known (e.g., one), our methods still apply, provided that the special cases corresponding to $\kappa = \infty$ and $\kappa < \infty$ in Section 4 are replaced by $\kappa$ being equal to and less than the known bound, respectively.

5 Simulation

5.1 Simulation setting

In this simulation study, we investigate the performance of our proposed estimator of the ATE of an optimal ITR relative to specified reference ITRs. We focus here on the setting $\alpha = 1$, which is more difficult than the case $\alpha = 0$ because it requires estimating $\phi_0$.

We generate data from a model in which the treatment $T$ is an IV and both the treatment cost $C$ and the outcome $Y$ are binary. This data-generating mechanism satisfies all causal conditions and has an unobserved confounder of treatment cost and outcome. We first generate a trivariate covariate $W = (W_1, W_2, W_3)$, where $W_1 \sim \mathrm{Unif}(-1, 1)$, $W_2 \sim \mathrm{Bernoulli}(0.8)$, and $W_3 \sim N(0, 1)$ are mutually independent. We also simulate an unobserved cost-outcome confounder $U \sim \mathrm{Bernoulli}(0.5)$ independently of $W$, and then simulate $T$, $C$, and $Y$ as follows:

$$
\begin{aligned}
T \mid W, U &\sim \mathrm{Bernoulli}\big(\mathrm{expit}(2.5 W_1 + 0.5 W_2 W_3)\big),\\
C \mid T, W, U &\sim \mathrm{Bernoulli}\big(\mathrm{expit}(2T - 1 - W_1 + 0.2 W_2 + 0.7 W_3 + 2 W_1 W_2 + 0.5 U)\big),\\
Y \mid T, C, W, U &\sim \mathrm{Bernoulli}\big(\mathrm{expit}(-0.3 C + C W_2 - W_1 + 0.2 W_2 - 0.9 W_3 + 0.3 C U)\big).
\end{aligned}
$$

We introduce $U$ in the data-generating mechanism to emphasize that we do not require assumptions on the joint distribution of treatment cost and outcome conditional on covariates. We consider all three reference ITRs $\mathcal{R} \in \{\mathrm{FR}, \mathrm{RD}, \mathrm{TP}\}$, where we set $\rho^{\mathrm{FR}}: v \mapsto 0$. We set $\kappa = 0.68$, which is an active constraint with $\tau_0 > 0$ and $\rho_0^{\mathrm{RD}} < 1$.
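The data-generating mechanism above can be reproduced with a short simulation. The sketch below uses only Python's standard library; the function and field names are our own choices, and the unobserved confounder $U$ is generated but deliberately not returned, mirroring the fact that it is unavailable to the analyst. Note that $Y$ does not depend on $T$ directly, consistent with $T$ acting as an IV.

```python
import math
import random


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=None):
    """Draw n observations (W1, W2, W3, T, C, Y); U is used but not returned."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w1 = rng.uniform(-1.0, 1.0)
        w2 = 1 if rng.random() < 0.8 else 0
        w3 = rng.gauss(0.0, 1.0)
        u = 1 if rng.random() < 0.5 else 0  # unobserved cost-outcome confounder
        t = 1 if rng.random() < expit(2.5 * w1 + 0.5 * w2 * w3) else 0
        c_lin = 2 * t - 1 - w1 + 0.2 * w2 + 0.7 * w3 + 2 * w1 * w2 + 0.5 * u
        c = 1 if rng.random() < expit(c_lin) else 0
        y_lin = -0.3 * c + c * w2 - w1 + 0.2 * w2 - 0.9 * w3 + 0.3 * c * u
        y = 1 if rng.random() < expit(y_lin) else 0
        rows.append({"W1": w1, "W2": w2, "W3": w3, "T": t, "C": c, "Y": y})
    return rows


rows = simulate(1000, seed=2022)
```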

The ITRs we consider are based on all covariates, that is, we take $V(W) = W$. We estimate the nuisance functions using the Super Learner [43] with a library including logistic regression, a generalized additive model with logit link [44], a gradient boosting machine [45,46,47], a support vector machine [48,49], and a neural network [50,51]. Because none of the nuisance functions follows a logistic regression model, the resulting ensemble learner is not expected to achieve the parametric convergence rate. Since both $C$ and $Y$ are binary, we use logistic regression rather than ordinary least squares to obtain their corresponding targeted estimates in Section 4.2. We consider sample sizes $n \in \{500, 1{,}000, 4{,}000, 16{,}000\}$ and run 1,000 Monte Carlo repetitions for each sample size. We implement the algorithm that incorporates cross-fitting discussed in Remark 6 and described in Section S3 of the Supplementary Material.
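The Super Learner selects among candidate learners by cross-validated risk. The sketch below shows only the simplest variant, the discrete Super Learner (the cross-validation selector), which picks the single candidate with the smallest cross-validated squared-error risk and refits it on all data; the full Super Learner [43] can instead form a weighted combination of the library. The two toy candidates (`fit_mean`, `fit_by_group`) are hypothetical stand-ins for the learners listed above.

```python
def cv_risk(data, fit, n_folds=5):
    """Cross-validated mean squared error of learner `fit` on (x, y) pairs."""
    n = len(data)
    total = 0.0
    for k in range(n_folds):
        held_out = range(k, n, n_folds)  # deterministic fold k
        held = set(held_out)
        model = fit([data[i] for i in range(n) if i not in held])
        total += sum((data[i][1] - model(data[i][0])) ** 2 for i in held_out)
    return total / n


def discrete_super_learner(data, learners, n_folds=5):
    """Select the learner with the smallest CV risk and refit it on all data."""
    risks = {name: cv_risk(data, fit, n_folds) for name, fit in learners.items()}
    best = min(risks, key=risks.get)
    return best, learners[best](data), risks


def fit_mean(train):
    """Candidate 1: ignore x and predict the overall training mean."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m


def fit_by_group(train):
    """Candidate 2: cell means by a discrete x, falling back to the overall mean."""
    groups = {}
    for x, y in train:
        groups.setdefault(x, []).append(y)
    overall = sum(y for _, y in train) / len(train)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    return lambda x: means.get(x, overall)


data = [(i % 2, float(i % 2)) for i in range(20)]
best, model, risks = discrete_super_learner(
    data, {"mean": fit_mean, "by_group": fit_by_group})
```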

To evaluate the performance of our proposed estimator, we investigate its bias and root-mean-squared error (RMSE). We also investigate the coverage probability and width of nominal 95% Wald CIs constructed using influence function-based standard error estimates. We further investigate the probability that our confidence lower limit falls below the true ATE, that is, the coverage probability of the 97.5% Wald confidence lower bound.
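For concreteness, these performance measures can be computed from Monte Carlo output as in the following sketch, which assumes one point estimate and one estimated standard error per repetition plus the true ATE; the function name is hypothetical. The lower limit of each 95% Wald interval serves as the 97.5% lower bound, and, as an assumption of this sketch, the sample standard deviation of the estimates is used as the denominator of the standard-error ratio reported in Table 1.

```python
import math
from statistics import NormalDist, fmean, stdev


def summarize(estimates, std_errors, truth, level=0.95):
    """Monte Carlo performance summary for a point estimator and its SEs."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = [e - z * s for e, s in zip(estimates, std_errors)]
    upper = [e + z * s for e, s in zip(estimates, std_errors)]
    return {
        "bias": fmean(estimates) - truth,
        "rmse": math.sqrt(fmean([(e - truth) ** 2 for e in estimates])),
        "ci_coverage": fmean([lo <= truth <= up for lo, up in zip(lower, upper)]),
        "lower_bound_coverage": fmean([lo <= truth for lo in lower]),
        "mean_ci_width": fmean([up - lo for lo, up in zip(lower, upper)]),
        "se_to_sd_ratio": fmean(std_errors) / stdev(estimates),
    }


summary = summarize([0.9, 1.1, 1.0, 1.2], [0.1] * 4, truth=1.0)
```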

5.2 Simulation results

Table 1 presents the performance of our proposed estimator in this simulation. For sample sizes 500, 1,000, and 4,000, the CI coverage of our proposed method is lower than the nominal 95%. At the largest sample size (16,000), the coverage increases to 90–93%. The coverage of the confidence lower bounds, however, is much closer to the nominal 97.5% for all sample sizes considered, and is approximately nominal when the sample size is large. For all reference ITRs, the bias of our proposed estimator appears to converge to zero faster than $n^{-1/2}$, and its RMSE appears to converge at roughly the $n^{-1/2}$ rate. All biases are negative, which is expected in view of Remark 6. All standard error estimates underestimate the variability of the estimator, with the extent of underestimation decreasing as the sample size increases.

Table 1

Performance of estimators of ATEs in the simulation with nuisance functions estimated via machine learning

Performance measure	Sample size	FR	RD	TP
95% Wald CI coverage	500	74%	71%	70%
	1,000	78%	74%	73%
	4,000	90%	84%	88%
	16,000	93%	90%	93%
97.5% confidence lower bound coverage	500	94%	96%	96%
	1,000	97%	98%	96%
	4,000	98%	98%	98%
	16,000	97%	98%	97%
Bias	500	−0.018	−0.018	−0.020
	1,000	−0.014	−0.013	−0.013
	4,000	−0.003	−0.004	−0.003
	16,000	−0.000	−0.001	−0.000
RMSE	500	0.056	0.039	0.046
	1,000	0.039	0.025	0.031
	4,000	0.017	0.009	0.012
	16,000	0.009	0.004	0.005
Ratio of mean standard error to standard deviation	500	0.620	0.620	0.571
	1,000	0.683	0.673	0.637
	4,000	0.868	0.765	0.809
	16,000	0.913	0.870	0.906

Figure 1 presents the width of the Wald CIs scaled by the square root of the sample size $n$. Our theory indicates that the CI width should shrink at a root-$n$ rate, and our simulation results are consistent with this. There are some outlying cases of extremely wide or narrow CIs. These are expected at small sample sizes because the estimator of $\sigma_0^2$ described after Theorem 4 resembles a sample mean and may not be close to $\sigma_0^2$ with high probability when the sample size is small. In practice, this issue might be slightly mitigated by fine-tuning the machine learning algorithms involved.
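The scale used in Figure 1 follows directly from the form of the Wald interval. With $\hat\sigma_n^2$ denoting the estimator of $\sigma_0^2$ described after Theorem 4 and $z_{0.975} \approx 1.96$,

```latex
\sqrt{n} \times \text{CI width}
  = \sqrt{n} \cdot 2 z_{0.975} \frac{\hat\sigma_n}{\sqrt{n}}
  = 2 z_{0.975} \, \hat\sigma_n \approx 3.92 \, \hat\sigma_n .
```

The scaled width is therefore a rescaled standard deviation estimate: it should concentrate around $3.92\,\sigma_0$ as $\hat\sigma_n$ converges, and its spread across repetitions directly reflects the variability of $\hat\sigma_n$ at a given sample size.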

Figure 1

Boxplot of $\sqrt{n} \times$ CI width for the ATE relative to each reference ITR.

As indicated in Theorem 4, theoretical guarantees for the validity of the Wald CIs rely on the nuisance function estimators converging to the truth sufficiently quickly. The undercoverage of our Wald CIs in small samples appears to be due, in part, to poor estimation of these nuisance functions at small sample sizes. To illustrate how our procedure may perform with improved small-sample nuisance function estimators, we conducted two additional simulations: one is identical to the simulation reported above except that the nuisance function estimators $\mu_n^Y$, $\mu_n^C$, and $\mu_n^T$ are set equal to the truth; the other considers a simpler, lower-dimensional scenario under a parametric model. The results, presented in Section S5 of the Supplementary Material, suggest that our proposed estimator may achieve substantially better performance with improved machine learning estimators of the nuisance functions. This motivates seeking ways to optimize the finite-sample performance of the nuisance function estimators employed in future applications of the proposed method, possibly based on prior subject-matter expertise. The underestimation of standard errors in this simulation also motivates future work exploring whether standard error estimators with better finite-sample performance exist, for example, estimators based on the bootstrap.

6 Conclusion

There is an extensive literature on estimating optimal ITRs and evaluating their performance, but only a few of these works incorporate treatment resource constraints. In this article, we build upon Sun et al. [11] and study the problem of estimating optimal ITRs under treatment cost constraints when the treatment cost is random. Using techniques similar to those in Qiu et al. [9], we have proposed novel methods to estimate an optimal ITR and to make inference about the corresponding ATE relative to a prespecified reference ITR under a locally nonparametric model. Our methods may also be applied to the IV setting of Qiu et al. [9] when the IV is intervened upon.

Acknowledgments

This work was partially supported by the National Institutes of Health under award numbers DP2-LM013340 and R01HL137808. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Funding information: This work was partially supported by the National Institutes of Health under award numbers DP2-LM013340 and R01HL137808.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: Prof. Marco Carone is a member of the Editorial Board in the Journal of Causal Inference but was not involved in the review process of this article.
Data availability statement: Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

References

[1] Rothwell PM. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet. 2005;365(9454):176–86. 10.1016/S0140-6736(05)17709-5

[2] Varadhan R, Segal JB, Boyd CM, Wu AW, Weiss CO. A framework for the analysis of heterogeneity of treatment effect in patient-centered outcomes research. J Clin Epidemiol. 2013;66(8):818–25. 10.1016/j.jclinepi.2013.02.009

[3] Chakraborty B, Moodie EEM. Statistical methods for dynamic treatment regimes. Statistics for biology and health. New York, NY: Springer; 2013. 10.1007/978-1-4614-7428-9

[4] Luedtke AR, van der Laan MJ. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Ann Stat. 2016;44(2):713–42. 10.1214/15-AOS1384

[5] Murphy SA. Optimal dynamic treatment regimes. J R Stat Soc B (Stat Methodol). 2003;65(2):331–55. 10.1111/1467-9868.00389

[6] Robins JM. Optimal structural nested models for optimal sequential decisions. New York, NY: Springer; 2004. p. 189–326. 10.1007/978-1-4419-9076-1_11

[7] Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. J Am Stat Assoc. 2012;107(499):1106–18. 10.1080/01621459.2012.695674

[8] Luedtke AR, van der Laan MJ. Optimal individualized treatments in resource-limited settings. Int J Biostat. 2016;12(1):283–303. 10.1515/ijb-2015-0007

[9] Qiu H, Carone M, Sadikova E, Petukhova M, Kessler RC, Luedtke A. Optimal individualized decision rules using instrumental variable methods. J Am Stat Assoc. 2021;116(533):174–91. 10.1080/01621459.2020.1745814

[10] Qiu H, Carone M, Sadikova E, Petukhova M, Kessler RC, Luedtke A. Correction to: optimal individualized decision rules using instrumental variable methods. J Am Stat Assoc. 2021;(just-accepted):1–2. 10.1080/01621459.2020.1865166

[11] Sun H, Du S, Wager S. Treatment allocation under uncertain costs. 2021. arXiv:2103.11066v1.

[12] Sun L. Empirical welfare maximization with constraints. 2021. arXiv:2103.15298v1.

[13] Pfanzagl J. Estimation in semiparametric models. In: Estimation in semiparametric models. New York, NY, USA: Springer; 1990. p. 17–22. 10.1007/978-1-4612-3396-1_5

[14] van der Vaart AW. Asymptotic statistics. Cambridge, England: Cambridge University Press; 1998. 10.1017/CBO9780511802256

[15] van der Laan M, Rubin D. Targeted maximum likelihood learning. Int J Biostat. 2006;2(1):Article 11. 10.2202/1557-4679.1043

[16] van der Laan MJ, Rose S. Targeted learning in data science. New York, NY, USA: Springer; 2018. 10.1007/978-3-319-65304-4

[17] Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. (Excerpts reprinted and translated to English, 1990). Stat Sci. 1923;5:463–72.

[18] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701. 10.1037/h0037350

[19] Butler EL, Laber EB, Davis SM, Kosorok MR. Incorporating patient preferences into estimation of optimal individualized treatment rules. Biometrics. 2018;74(1):18–26. 10.1111/biom.12743

[20] Chen J, Fu H, He X, Kosorok MR, Liu Y. Estimating individualized treatment rules for ordinal treatments. Biometrics. 2018;74(3):924–33. 10.1111/biom.12865

[21] Imai K, Li ML. Experimental evaluation of individualized treatment rules. J Am Stat Assoc. 2021;1–15. 10.1080/01621459.2021.1923511

[22] Laber E, Zhao Y. Tree-based methods for individualized treatment regimes. Biometrika. 2015;102(3):501–14. 10.1093/biomet/asv028

[23] Lei H, Nahum-Shani I, Lynch K, Oslin D, Murphy SA. A “SMART” design for building individualized treatment sequences. Annu Rev Clin Psychol. 2012;8:21–48. 10.1146/annurev-clinpsy-032511-143152

[24] Petersen ML, Deeks SG, van der Laan MJ. Individualized treatment rules: generating candidate clinical trials. Stat Med. 2007;26(25):4578–601. 10.1002/sim.2888

[25] Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Ann Stat. 2011;39(2):1180. 10.1214/10-AOS864

[26] Song R, Kosorok M, Zeng D, Zhao Y, Laber E, Yuan M. On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat. 2015;4(1):59–68. 10.1002/sta4.78

[27] van der Laan MJ, Petersen ML. Causal effect models for realistic individualized treatment and intention to treat rules. Int J Biostat. 2007;3(1):Article 3. 10.2202/1557-4679.1022

[28] Zhao YQ, Zeng D, Laber EB, Song R, Yuan M, Kosorok MR. Doubly robust learning for estimating individualized treatment with censored data. Biometrika. 2015;102(1):151–68. 10.1093/biomet/asu050

[29] Zhou X, Mayer-Hamblett N, Khan U, Kosorok MR. Residual weighted learning for estimating individualized treatment rules. J Am Stat Assoc. 2017;112(517):169–87. 10.1080/01621459.2015.1093947

[30] Abadie A. Semiparametric instrumental variable estimation of treatment response models. J Econom. 2003;113(2):231–63. 10.1016/S0304-4076(02)00201-4

[31] Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica. 1994;62(2):467–75. 10.2307/2951620

[32] Tchetgen Tchetgen EJ, Vansteelandt S. Alternative identification and inference for the effect of treatment on the treated with an instrumental variable. Harvard University Biostatistics Working Paper Series. 2013.

[33] Wang L, Tchetgen Tchetgen E. Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. J R Stat Soc B (Stat Methodol). 2018;80(3):531–50. 10.1111/rssb.12262

[34] Robins J. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393–512. 10.1016/0270-0255(86)90088-6

[35] Dantzig GB. Discrete-variable extremum problems. Oper Res. 1957;5(2):266–88. 10.1287/opre.5.2.266

[36] Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat. 2010;6(1):Article 26. 10.2202/1557-4679.1260

[37] van der Laan MJ, Luedtke AR. Targeted learning of the mean outcome under an optimal dynamic treatment rule. J Causal Inference. 2014;3(1):61–95. 10.1515/jci-2013-0022

[38] Luedtke AR, van der Laan MJ. Super-learning of an optimal dynamic treatment rule. Int J Biostat. 2016;12(1):305–32. 10.1515/ijb-2015-0052

[39] Kennedy EH. Towards optimal doubly robust estimation of heterogeneous causal effects. 2020. arXiv:2004.14497v3.

[40] Nie X, Wager S. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika. 2021;108(2):299–319. 10.1093/biomet/asaa076

[41] Newey WK, Robins JR. Cross-fitting and fast remainder rates for semiparametric estimation. 2018. arXiv:1801.09138v1. 10.1920/wp.cem.2017.4117

[42] Zheng W, van der Laan MJ. Cross-validated targeted minimum-loss-based estimation. New York, NY: Springer; 2011. p. 459–74. 10.1007/978-1-4419-9782-1_27

[43] van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6(1):Article 25. 10.2202/1544-6115.1309

[44] Hastie T, Tibshirani R. Generalized additive models. London: Chapman and Hall; 1990.

[45] Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189–232. 10.1214/aos/1013203451

[46] Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002;38(4):367–78. 10.1016/S0167-9473(01)00065-2

[47] Mason L, Baxter J, Bartlett PL, Frean M. Boosting algorithms as gradient descent; 2000. p. 512–8.

[48] Bennett KP, Campbell C. Support vector machines: hype or hallelujah? SIGKDD Explor Newsl. 2000;2(2):1–13. 10.1145/380995.380999

[49] Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97. 10.1007/BF00994018

[50] Bishop CM. Neural networks for pattern recognition. Oxford, England: Oxford University Press; 1995. 10.1093/oso/9780198538493.001.0001

[51] Ripley BD. Pattern recognition and neural networks. Cambridge, England: Cambridge University Press; 2014.

Received: 2022-01-17

Revised: 2022-11-22

Accepted: 2022-11-25

Published Online: 2022-12-31

This work is licensed under the Creative Commons Attribution 4.0 International License.
