Abstract
The attributable fraction is a common measure in epidemiological research, which quantifies the public health impact of a particular exposure on a particular outcome. Often, the exposure effect may be mediated through a third variable, which lies on the causal pathway between the exposure and the outcome. To assess the role of such mediators we propose a decomposition of the attributable fraction into a direct component and a mediated component. We show how these components can be estimated in cross-sectional, cohort and case-control studies, using either maximum likelihood or doubly robust estimation methods. We illustrate the proposed methods by an application to a study of physical activity, overweight and CVD. In an Appendix we provide R-code, which implements the proposed methods.
1 Introduction
The attributable fraction (AF) is a common measure in epidemiological research, which quantifies the public health impact of a particular exposure on a particular outcome. It is defined as the proportion of outcome events that would be eliminated if the exposure were hypothetically eliminated from the population (Levin 1953). The AF has been the focus of intensive research, and various estimation techniques have been developed for all common study designs (e.g. Miettinen 1974; Sturmans, Mulder, and Valkenburg 1977; Deubner et al. 1980; Bruzzi et al. 1985; Greenland and Drescher 1993; Sjölander and Vansteelandt 2011).
However, less attention has been given to scenarios where there is more than one exposure of interest. For such scenarios, Eide and Gefeller (1995) proposed to estimate ‘sequential’ AFs, by considering a sequence of hypothetical scenarios where the exposures are eliminated one at a time. Obviously, the values of these sequential AFs may depend on the order in which we eliminate the exposures. Eide and Gefeller (1995) proposed to consider all possible orderings and, for each exposure, take the average over its order-specific sequential AFs, thus obtaining an ‘average’ AF for each exposure.
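The sequential and average AFs described above can be sketched in a few lines of Python. This is only an illustration: the function names and the population risks in `risk` are hypothetical, chosen to make the arithmetic easy to follow, and are not taken from Eide and Gefeller (1995) or from any data set.

```python
from itertools import permutations


def sequential_afs(risk, order):
    """Sequential AFs for one elimination order.

    `risk` maps a frozenset of eliminated exposures to the population
    outcome risk under that hypothetical elimination (illustrative input).
    """
    baseline = risk[frozenset()]
    removed = frozenset()
    result = {}
    for exposure in order:
        before = risk[removed]
        removed = removed | {exposure}
        after = risk[removed]
        # Extra share of baseline cases removed by eliminating this exposure
        result[exposure] = (before - after) / baseline
    return result


def average_afs(risk, exposures):
    """Average each exposure's sequential AF over all elimination orders."""
    orders = list(permutations(exposures))
    totals = {e: 0.0 for e in exposures}
    for order in orders:
        for exposure, af in sequential_afs(risk, order).items():
            totals[exposure] += af
    return {e: totals[e] / len(orders) for e in exposures}


# Hypothetical population risks, for illustration only
risk = {
    frozenset(): 0.20,
    frozenset({"inactivity"}): 0.15,
    frozenset({"overweight"}): 0.16,
    frozenset({"inactivity", "overweight"}): 0.12,
}
seq_io = sequential_afs(risk, ("inactivity", "overweight"))
seq_oi = sequential_afs(risk, ("overweight", "inactivity"))
avg = average_afs(risk, ("inactivity", "overweight"))
```

With these made-up risks, the sequential AF for inactivity is 0.25 when it is eliminated first and 0.20 when it is eliminated second, which shows the order dependence; the average AFs (0.225 and 0.175) sum to the joint AF of 0.40.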
The average AF has the virtue of being simple, since it is one scalar measure independent of exposure ordering. However, it may not be a particularly relevant measure when the ordering is etiologically/scientifically important. This would typically be the case when one of the exposures acts as a mediator for another exposure. For instance, it has been shown (Mendis, Puska, and Norrving 2011) that both physical inactivity and overweight are strong risk factors for cardiovascular disease (CVD). Being physically inactive also increases the risk of becoming obese, so part of the effect of physical inactivity on CVD may be mediated through overweight. However, the average AF does not provide information about the existence or magnitude of such a mediated effect.
Like the AF, mediation analysis has been the focus of intensive research. Several definitions of direct and indirect (i.e. mediated) effects have been developed (Robins and Greenland 1992; Pearl 2001; Rubin 2004), as well as estimation techniques for scenarios where the effects are identifiable (see Vansteelandt (2009) and the references therein), and non-parametric bounds for scenarios where the effects are not identifiable (e.g. Cai et al. 2008; Sjölander 2009). In this paper we develop methods for mediation analysis with attributable fractions. We focus on scenarios with two exposures, where the effect of the first exposure (e.g. physical inactivity) on the outcome (e.g. CVD) is potentially mediated through the second exposure (e.g. overweight). We propose a decomposition of the AF into a mediated component and a direct component, which measure the proportion of outcome events that can be attributed to a mediated effect and a direct effect, respectively, of the first exposure. These components sum to the total AF for the first exposure.
The paper is organized as follows. In Section 2 we present basic notation and definitions, and our proposed decomposition of the AF. In Section 3 we show how the decomposed AF can be estimated with either maximum likelihood (ML) or doubly robust (DR) estimation methods. Whereas ML methods rely on one regression model, and are generally inconsistent if this model is incorrect, DR methods use two regression models and give consistent estimates if either model is correct, not necessarily both. We consider estimation in cross-sectional, cohort and case-control studies. In Section 4 we present the results from a simulation study, and in Section 5 we illustrate the proposed methods by an application to a study of physical activity, overweight and CVD. In an Appendix we provide R-code, which implements the proposed methods.
2 Notation and definitions
Let
where
Let
and the ‘natural indirect AF’, defined as
Here,
By setting
We note that the natural direct and indirect AFs in eqs. (2) and (3) are close in spirit to the natural direct and indirect effects proposed by Robins and Greenland (1992) and Pearl (2001). An important difference though, is that the natural direct and indirect effects utilize nested counterfactuals where
3 Model-based estimation
3.1 Counterfactual independence assumptions
The natural direct and indirect AFs are causal (counterfactual) parameters, and consistent estimation of these from observational data requires that we link the counterfactual variables to the observed variables, and that we make appropriate control for confounders. We thus make the standard ‘consistency assumption’ (Pearl 2009) that an intervention to set variable(s)
We further assume that data have been collected on a set of variables
and that
Assumption (5) requires that
Under assumptions (4)-(6) we can rewrite the natural direct and indirect AFs, so that they are free from counterfactual probabilities. Towards this end we define
where the first equality follows from the law of iterated expectations, the second equality follows from assumption (6) and the third equality follows from assumption (4). Similarly, define
where the first equality follows from the law of iterated expectations, the second equality follows from assumption (5) and the third equality follows from assumption (4).
Define
and
The expressions in (9) and (10) are both free from counterfactual variables, which enables us to estimate these from observational data, as described in the following two sections.
We end this section by emphasizing the importance of appropriate confounding control, when applying the estimation techniques developed in the following sections. Arguably, there is always unmeasured confounding in real observational studies. However, with a careful study design, the degree of unmeasured confounding may be reasonably low, in which case the crucial assumptions (5) and (6) may hold as an approximation.
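As an illustration of how the steps of this argument remove counterfactuals, consider the simpler case of the total AF. The notation below is an assumption made for this sketch and may differ from the symbols used in the numbered equations: $A$ is the binary exposure, $Y$ the binary outcome, $Y_0$ the counterfactual outcome under elimination of the exposure, and $Z$ a set of measured confounders sufficient to control for exposure-outcome confounding.

```latex
\begin{align*}
\Pr(Y_0 = 1) &= E\{\Pr(Y_0 = 1 \mid Z)\} \\
             &= E\{\Pr(Y_0 = 1 \mid A = 0, Z)\} \\
             &= E\{\Pr(Y = 1 \mid A = 0, Z)\},
\end{align*}
so that
\[
\mathrm{AF} = 1 - \frac{\Pr(Y_0 = 1)}{\Pr(Y = 1)}
            = 1 - \frac{E\{\Pr(Y = 1 \mid A = 0, Z)\}}{\Pr(Y = 1)}.
\]
```

The first equality uses the law of iterated expectations, the second the absence of unmeasured confounding given $Z$, and the third the consistency assumption. The final expression involves only observed variables.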
3.2 Cross-sectional/cohort studies
In cross-sectional studies and cohort studies, data consist of an iid sample of size
Trivially, we may use the sample prevalence
A disadvantage of this ML method is that
This feature is somewhat awkward, since it implies that it may be difficult to postulate models for
One solution to this problem would be to replace the model for
To reduce sensitivity to model misspecification, while avoiding having to model the mediator, we instead propose to use DR estimators for both
Specifically, to estimate
where
where
We note that
In Appendix A we use the theory for M-estimators (Stefanski and Boos 2002) to show that
Table 1 summarizes what models are required to be correct, to guarantee consistency of the ML and DR estimators of NDAF, NIAF and AF. In this table we have explicitly spelled out NDAF, NIAF and AF as functions of the parameters being estimated by the models.
Model dependence of the ML and DR estimators for cross-sectional/cohort studies.
The ML estimator of … | …requires correct model(s) for… |
The DR estimator of … | …requires correct model(s) for… |
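The double robustness property summarized in Table 1 can be illustrated numerically on the simpler problem of the total AF. The Python sketch below is not the authors' R implementation and does not estimate NDAF or NIAF; it uses the standard augmented inverse-probability-weighted form for the counterfactual risk under no exposure. The data-generating model, the single binary confounder `z`, and the names `stratified_mean` and `dr_af` are assumptions made for this illustration. A "correct" working model stratifies on `z`, whereas an "incorrect" one ignores it.

```python
import numpy as np

rng = np.random.default_rng(2018)
n = 200_000

# Hypothetical data-generating model with one binary confounder z
z = rng.binomial(1, 0.5, n)
a = rng.binomial(1, 0.2 + 0.4 * z)               # exposure depends on z
y = rng.binomial(1, 0.05 + 0.05 * a + 0.10 * z)  # outcome depends on a and z

true_af = 1.0 - 0.10 / 0.12  # Pr(Y_0 = 1) = 0.10 and Pr(Y = 1) = 0.12


def stratified_mean(values, keep, strata, use_strata):
    """Fitted mean of `values` among rows in `keep`, per stratum or pooled."""
    fitted = np.empty(len(values), dtype=float)
    if not use_strata:
        fitted[:] = values[keep].mean()
        return fitted
    for level in np.unique(strata):
        rows = strata == level
        fitted[rows] = values[rows & keep].mean()
    return fitted


def dr_af(y, a, z, outcome_uses_z, exposure_uses_z):
    """Doubly robust (AIPW-type) estimate of the total AF."""
    everyone = np.ones(len(y), dtype=bool)
    m0 = stratified_mean(y, a == 0, z, outcome_uses_z)     # Pr(Y=1 | A=0, Z)
    pi = stratified_mean(a, everyone, z, exposure_uses_z)  # Pr(A=1 | Z)
    unexposed = (a == 0).astype(float)
    p0 = np.mean(unexposed * (y - m0) / (1.0 - pi) + m0)   # Pr(Y_0 = 1)
    return 1.0 - p0 / y.mean()


both = dr_af(y, a, z, True, True)
outcome_only = dr_af(y, a, z, True, False)
exposure_only = dr_af(y, a, z, False, True)
neither = dr_af(y, a, z, False, False)
```

In this setup the estimate should be close to the true value of about 0.17 whenever at least one working model stratifies on `z`, and clearly too large (about 0.31 in large samples) when neither does.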
3.3 Case-control studies
In case-control studies, data consist of
Let
and
A simple way to estimate
However, to protect against model misspecification we may also use DR estimators. Specifically, to estimate
In Appendix A we use the theory for M-estimators (Stefanski and Boos 2002) to show that
Table 2 summarizes what models are required to be correct, to guarantee consistency of the ML and DR estimators of NDAF, NIAF and AF. In this table we have explicitly spelled out NDAF, NIAF and AF as functions of the parameters being estimated by the models.
Model dependence of the ML and DR estimators for case-control studies
The ML estimator of … | …requires correct model(s) for… |
The DR estimator of … | …requires correct model(s) for… |
We end this section with a technical remark. The validity of the OR-based estimators developed above depends on the approximate relations
4 Simulations
4.1 Cross-sectional/cohort studies
To investigate the performance of the proposed estimators for cross-sectional/cohort studies (Section 3.2) we generated 10,000 samples, of
For this model we have that
When analyzing the simulated data, four scenarios were considered. In scenario I, no constraints were imposed on the working model parameters. In scenario II, the interaction terms
For each of the four scenarios we used the working models to calculate the ML estimators and DR estimators of
Table 3 displays the results. As expected, the ML estimator of
We caution the reader that the observed robustness against functional form misspecification is not guaranteed by theory, and may depend heavily on the data generating model. In a setting similar to ours, VanderWeele and Vansteelandt (2010) showed that the bias due to functional form misspecification is minor when the outcome is rare and the mediator is continuous with constant variance, which is precisely the model we have simulated from above. Thus, one may suspect that bias due to functional form misspecification would be more pronounced if the outcome were more common and/or the mediator had a skewed distribution. We investigated this in an additional simulation (see Appendix B), but observed no more bias than in the simulation above. This gives some further support to the conclusion that, while misspecification of the linear predictor can give rise to serious bias, the impact of functional form misspecification may be less severe.
Much of the literature on mediation has focused on the scenario where there are exposure-mediator interactions (see, for instance, VanderWeele (2015) and the references therein). We have added a simulation in Appendix B, in which exposure-mediator interaction is present. The conclusions from this simulation are again similar to those from the simulation above.
Simulation results for cross-sectional/cohort studies. The table reports mean estimates (est), mean standard errors (se) and standard deviations (sd) of the estimates, over 10,000 simulated samples. The models for
Scenario | Parameter | ML est | ML s.e. | ML s.d. | DR est | DR s.e. | DR s.d.
---|---|---|---|---|---|---|---
I | NDAF | 0.20 | 0.12 | 0.13 | 0.20 | 0.13 | 0.14
I | NIAF | 0.06 | 0.04 | 0.04 | 0.05 | 0.06 | 0.07
I | AF | 0.25 | 0.11 | 0.11 | 0.25 | 0.11 | 0.11
II | NDAF | 0.14 | 0.14 | 0.14 | 0.19 | 0.14 | 0.15
II | NIAF | 0.08 | 0.05 | 0.05 | 0.05 | 0.07 | 0.08
II | AF | 0.22 | 0.12 | 0.12 | 0.25 | 0.11 | 0.12
III | NDAF | 0.20 | 0.12 | 0.13 | 0.20 | 0.14 | 0.15
III | NIAF | 0.06 | 0.04 | 0.04 | 0.05 | 0.08 | 0.09
III | AF | 0.25 | 0.11 | 0.11 | 0.25 | 0.11 | 0.11
IV | NDAF | 0.14 | 0.14 | 0.14 | 0.16 | 0.15 | 0.16
IV | NIAF | 0.08 | 0.05 | 0.05 | 0.07 | 0.09 | 0.10
IV | AF | 0.22 | 0.12 | 0.12 | 0.23 | 0.12 | 0.12
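The three summary columns reported for each estimator (mean estimate, mean standard error, and standard deviation of the estimates) can be computed as in the following Python sketch. It is a deliberately simplified stand-in: it uses a crude, unadjusted AF estimator with a delta-method standard error in a setting without confounding, not the ML or DR estimators of Section 3.2, and the data-generating model, sample sizes and function names are assumptions made for illustration.

```python
import numpy as np


def crude_af(y, a):
    """Crude AF estimate with a delta-method standard error (no adjustment)."""
    p = y.mean()                 # Pr(Y = 1)
    unexposed = a == 0
    q0 = unexposed.mean()        # Pr(A = 0)
    p0 = y[unexposed].mean()     # Pr(Y = 1 | A = 0)
    # Influence function of AF = 1 - p0 / p
    influence = -(unexposed * (y - p0) / q0) / p + (p0 / p**2) * (y - p)
    se = influence.std(ddof=1) / np.sqrt(len(y))
    return 1.0 - p0 / p, se


def simulate(n_samples, n, rng):
    """Repeat the analysis over simulated samples and summarize the results."""
    est = np.empty(n_samples)
    se = np.empty(n_samples)
    for r in range(n_samples):
        a = rng.binomial(1, 0.4, n)
        y = rng.binomial(1, 0.10 + 0.05 * a)  # hypothetical outcome model
        est[r], se[r] = crude_af(y, a)
    return {"est": est.mean(), "se": se.mean(), "sd": est.std(ddof=1)}


summary = simulate(n_samples=500, n=5_000, rng=np.random.default_rng(1))
```

When the analytic standard error is valid, the mean standard error (`se`) should be close to the empirical standard deviation of the estimates (`sd`), which is the comparison the tables invite.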
4.2 Case-control studies
To investigate the performance of the proposed estimators for case-control studies (Section 3.3) we generated a population of 10,000,000 subjects from the model in eq. (22). From this population we drew 10,000 samples of
In these working models,
Table 4 displays the results. We observe the same pattern as in the previous section. The ML estimators are unbiased in scenarios I and III, where the linear predictors for
Simulation results for case-control studies. The table reports mean estimates (est), mean standard errors (se) and standard deviations (sd) of the estimates, over 10,000 simulated samples. True values are
Scenario | Parameter | ML est | ML s.e. | ML s.d. | DR est | DR s.e. | DR s.d.
---|---|---|---|---|---|---|---
I | NDAF | 0.21 | 0.04 | 0.04 | 0.19 | 0.04 | 0.04
I | NIAF | 0.05 | 0.01 | 0.01 | 0.05 | 0.02 | 0.02
I | AF | 0.26 | 0.03 | 0.03 | 0.25 | 0.03 | 0.03
II | NDAF | 0.12 | 0.05 | 0.05 | 0.21 | 0.03 | 0.03
II | NIAF | 0.09 | 0.02 | 0.02 | 0.04 | 0.01 | 0.01
II | AF | 0.20 | 0.03 | 0.03 | 0.26 | 0.03 | 0.03
III | NDAF | 0.21 | 0.04 | 0.04 | 0.19 | 0.04 | 0.04
III | NIAF | 0.05 | 0.01 | 0.01 | 0.05 | 0.02 | 0.02
III | AF | 0.26 | 0.03 | 0.03 | 0.24 | 0.03 | 0.03
IV | NDAF | 0.12 | 0.05 | 0.05 | 0.09 | 0.05 | 0.05
IV | NIAF | 0.09 | 0.02 | 0.02 | 0.10 | 0.03 | 0.03
IV | AF | 0.20 | 0.03 | 0.03 | 0.19 | 0.03 | 0.03
5 Applied example
In this section we illustrate the proposed methods by an application to a study of physical activity, overweight and CVD. Sjölander (2011) and Sjölander and Vansteelandt (2011) aimed to estimate the fraction of CVD cases, during a 10-year period, that can be attributed to overweight. They used data from the National March Cohort, which was established in 1997 during a fund-raising event organized by the Swedish Cancer Society, and included 300,000 individuals. Questionnaire data on known or suspected risk factors for CVD were obtained on a subset of 43,880 individuals. The cohort was followed until 2006, and each CVD event recorded. The binary exposure was defined as the indicator of BMI being below 18.5 kg/m
We used the same data as Sjölander (2011) and Sjölander and Vansteelandt (2011), but considered physical activity as the exposure and BMI as the mediator. Specifically, we wished to estimate the fraction of CVD cases that can be attributed to a direct effect of physical inactivity on CVD, and the fraction of CVD cases that can be attributed to an indirect effect, mediated through overweight. As attributable fractions require the exposure to be binary, we defined a binary exposure
In all analyses we controlled for the potential confounders
For each of the thresholds
In Appendix C we provide outputs for all fitted regression models. Table 5 displays the estimates of
Analysis results for the National March Cohort.
Threshold | Parameter | ML est | ML 95% CI | DR est | DR 95% CI
---|---|---|---|---|---
k=1 | NDAF | 4.07 | (0.54,7.61) | 4.20 | (0.67,7.74)
k=1 | NIAF | 0.48 | (0.25,0.71) | 0.35 | (-0.06,0.77)
k=1 | AF | 4.56 | (0.99,8.12) | 4.55 | (0.99,8.12)
k=2 | NDAF | 10.71 | (0.80,20.62) | 11.14 | (1.22,21.07)
k=2 | NIAF | 2.63 | (1.68,3.58) | 2.19 | (0.92,3.45)
k=2 | AF | 13.33 | (3.45,23.22) | 13.33 | (3.46,23.21)
k=3 | NDAF | 22.05 | (5.73,38.36) | 22.42 | (6.08,38.76)
k=3 | NIAF | 5.52 | (3.66,7.38) | 4.97 | (2.52,7.43)
k=3 | AF | 27.56 | (11.42,43.71) | 27.39 | (11.27,43.52)
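As a quick consistency check on these results, the direct and mediated components should sum to the total AF. The arithmetic below reads the nine rows of the table as three blocks, one per threshold, each listing NDAF, NIAF and AF in that order; this reading is an assumption, since the row labels are not shown, but the sums support it up to rounding.

```latex
\begin{align*}
\text{ML:}\quad & 4.07 + 0.48 = 4.55 \approx 4.56, \quad
                  10.71 + 2.63 = 13.34 \approx 13.33, \quad
                  22.05 + 5.52 = 27.57 \approx 27.56, \\
\text{DR:}\quad & 4.20 + 0.35 = 4.55, \quad
                  11.14 + 2.19 = 13.33, \quad
                  22.42 + 4.97 = 27.39.
\end{align*}
```

Under the same reading, the mediated share of the total AF for the ML estimates is roughly $0.48/4.56 \approx 0.11$, $2.63/13.33 \approx 0.20$ and $5.52/27.56 \approx 0.20$ for the three thresholds.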
6 Discussion
In this paper we have proposed a decomposition of the AF into a direct component and a mediated component. For a given exposure, outcome and mediator, these components measure the proportion of outcome events that can be attributed to a direct effect and a mediated effect, respectively, of the exposure. We have shown how the direct and mediated AFs can be estimated with ML and DR methods, in cross-sectional, cohort and case-control studies. In simulations we observed that the DR estimators were almost as efficient as the ML estimators. Furthermore, the ML estimators were often biased when the underlying model was incorrectly specified. These results speak in favor of the DR estimators.
We have considered unmatched case-control studies, but the proposed methods may also be extended to matched case-control studies. If ordinary logistic regression is used, and the matching variables are controlled for by explicitly adding them to the model, then the proposed estimators in Section 3.3 apply without modification. To account for the correlation of subjects within matched strata, a minor modification is required for the calculation of standard errors; we explain this in Appendix A. If conditional logistic regression is used, then the ML estimation method from Section 3.3 can be used without modification. However, DR estimation becomes more difficult, since the DR estimation methods developed by Tchetgen Tchetgen, Robins, and Rotnitzky (2010) do not apply to odds ratios estimated with conditional logistic regression models.
A common obstacle for practitioners who wish to use novel methodology is the lack of software implementation. To facilitate the use of our proposed methods we have written two R-functions, which implement the estimators developed in Sections 3.2 and 3.3, respectively. These functions use analytic expressions for all standard errors, as derived in Appendix A, and thus avoid time-consuming bootstrap resampling techniques. We provide the code for these functions in Appendix D.
Funding statement: This work was supported by The Swedish Research Council (Grant No. 2016-01267).
Appendix
A Asymptotic distribution of estimators
We derive the asymptotic distribution of the proposed DR estimators; the distribution of the ML estimators can be derived in a similar way.
We first consider the estimators for cross-sectional studies and cohort studies. Assume parametric models
where
and
It follows from more general results in Robins (2000) that
Define
Using the delta method we have that
A consistent estimate of the variance of
We next consider case-control studies. Assume parametric models
where
and
It follows from more general results in Tchetgen Tchetgen, Robins, and Rotnitzky (2010) that
Define
Using the delta method we have that
A consistent estimate of the variance of
Sometimes, data consist of clusters in which subjects are correlated. This occurs, for instance, in family studies and in matched studies. A simple way to account for this correlation is to let
B Additional simulation results
B.1 Common outcome and skewed mediator
In this simulation we generated 10,000 samples, of
For this model we have that
We analyzed the samples in the same way as in Section 4.1. Table 6 displays the results. We observe the same pattern as in Section 4.1; misspecification of the linear predictor gives substantial bias, whereas misspecification of the functional form does not seem to have any major impact.
Simulation results for cross-sectional/cohort studies. The table reports mean estimates (est), mean standard errors (se) and standard deviations (sd) of the estimates, over 10,000 simulated samples. The models for
Scenario | Parameter | ML est | ML s.e. | ML s.d. | DR est | DR s.e. | DR s.d.
---|---|---|---|---|---|---|---
I | NDAF | 0.12 | 0.03 | 0.03 | 0.12 | 0.03 | 0.03
I | NIAF | −0.02 | 0.01 | 0.01 | −0.03 | 0.02 | 0.02
I | AF | 0.09 | 0.03 | 0.03 | 0.08 | 0.04 | 0.04
II | NDAF | 0.06 | 0.03 | 0.03 | 0.12 | 0.04 | 0.03
II | NIAF | −0.10 | 0.01 | 0.01 | −0.03 | 0.03 | 0.02
II | AF | −0.04 | 0.04 | 0.04 | 0.08 | 0.05 | 0.04
III | NDAF | 0.12 | 0.03 | 0.03 | 0.12 | 0.03 | 0.03
III | NIAF | −0.02 | 0.01 | 0.01 | −0.02 | 0.01 | 0.01
III | AF | 0.09 | 0.03 | 0.03 | 0.09 | 0.03 | 0.03
IV | NDAF | 0.06 | 0.03 | 0.03 | 0.06 | 0.04 | 0.04
IV | NIAF | −0.10 | 0.01 | 0.01 | −0.09 | 0.02 | 0.02
IV | AF | −0.04 | 0.04 | 0.04 | −0.03 | 0.04 | 0.04
B.2 Exposure-mediator interactions
In this simulation we generated 10,000 samples, of
For this model we have that
We analyzed the samples in the same way as in Section 4.1, with the exception that an interaction term was added for
Simulation results for cross-sectional/cohort studies. The table reports mean estimates (est), mean standard errors (se) and standard deviations (sd) of the estimates, over 10,000 simulated samples. The models for
Scenario | Parameter | ML est | ML s.e. | ML s.d. | DR est | DR s.e. | DR s.d.
---|---|---|---|---|---|---|---
I | NDAF | 0.67 | 0.07 | 0.07 | 0.67 | 0.08 | 0.08
I | NIAF | 0.02 | 0.01 | 0.02 | 0.02 | 0.03 | 0.04
I | AF | 0.69 | 0.07 | 0.06 | 0.69 | 0.07 | 0.06
II | NDAF | 0.64 | 0.08 | 0.08 | 0.67 | 0.08 | 0.08
II | NIAF | 0.04 | 0.02 | 0.02 | 0.02 | 0.03 | 0.04
II | AF | 0.68 | 0.07 | 0.07 | 0.69 | 0.07 | 0.06
III | NDAF | 0.67 | 0.07 | 0.07 | 0.67 | 0.08 | 0.08
III | NIAF | 0.02 | 0.01 | 0.02 | 0.02 | 0.03 | 0.05
III | AF | 0.69 | 0.07 | 0.06 | 0.69 | 0.07 | 0.06
IV | NDAF | 0.64 | 0.08 | 0.08 | 0.65 | 0.08 | 0.09
IV | NIAF | 0.04 | 0.02 | 0.02 | 0.03 | 0.04 | 0.05
IV | AF | 0.68 | 0.07 | 0.07 | 0.68 | 0.07 | 0.07
C Regression outputs for the applied example
C.1 Threshold k=1
The logistic regression model for
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −9.959659 | 0.226483 | −43.975 | < 2e-16 *** |
pa.binary | 0.622893 | 0.230872 | 2.698 | 0.00698 ** |
bmi | 0.046875 | 0.006499 | 7.213 | 5.47e-13 *** |
age | 0.093932 | 0.002138 | 43.935 | < 2e-16 *** |
sexmale | 0.855787 | 0.045208 | 18.930 | < 2e-16 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −8.726691 | 0.141025 | −61.880 | < 2e-16 *** |
pa.binary | 0.719856 | 0.230968 | 3.117 | 0.00183 ** |
age | 0.093035 | 0.002103 | 44.234 | < 2e-16 *** |
sexmale | 0.867404 | 0.045194 | 19.193 | < 2e-16 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −7.295189 | 0.358801 | −20.332 | < 2e−16 *** |
bmi | 0.111044 | 0.012622 | 8.798 | < 2e−16 *** |
age | −0.017876 | 0.003662 | −4.882 | 1.05e−06 *** |
sexmale | 0.780565 | 0.122179 | 6.389 | 1.67e−10 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −4.76036 | 0.18398 | −25.874 | < 2e−16 *** |
age | −0.01263 | 0.00349 | −3.618 | 0.000296 *** |
sexmale | 0.81437 | 0.12205 | 6.672 | 2.52e−11 *** |
C.2 Threshold k=2
The logistic regression model for
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −9.969788 | 0.226375 | −44.041 | < 2e−16 *** |
pa.binary | 0.198366 | 0.088017 | 2.254 | 0.0242 * |
bmi | 0.046479 | 0.006510 | 7.140 | 9.33e−13 *** |
age | 0.094103 | 0.002142 | 43.934 | < 2e−16 *** |
sexmale | 0.857242 | 0.045197 | 18.967 | < 2e−16 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −8.753871 | 0.141819 | −61.726 | < 2e−16 *** |
pa.binary | 0.252032 | 0.087708 | 2.874 | 0.00406 ** |
age | 0.093275 | 0.002108 | 44.243 | < 2e−16 *** |
sexmale | 0.868590 | 0.045184 | 19.224 | < 2e−16 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −3.990596 | 0.125507 | −31.796 | < 2e−16 *** |
bmi | 0.079951 | 0.004814 | 16.609 | < 2e−16 *** |
age | −0.012737 | 0.001179 | −10.800 | < 2e−16 *** |
sexmale | 0.170377 | 0.039355 | 4.329 | 1.5e−05 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −2.203133 | 0.058720 | −37.520 | < 2e−16 *** |
age | −0.008853 | 0.001129 | −7.838 | 4.56e−15 *** |
sexmale | 0.203585 | 0.039221 | 5.191 | 2.10e−07 *** |
C.3 Threshold k=3
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −9.966975 | 0.226333 | −44.037 | < 2e−16 *** |
pa.binary | 0.165470 | 0.059751 | 2.769 | 0.00562 ** |
bmi | 0.045779 | 0.006527 | 7.014 | 2.32e−12 *** |
age | 0.094108 | 0.002139 | 43.990 | < 2e−16 *** |
sexmale | 0.858640 | 0.045199 | 18.997 | < 2e−16 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −8.774193 | 0.142165 | −61.72 | < 2e−16 *** |
pa.binary | 0.210291 | 0.059397 | 3.54 | 0.000399 *** |
age | 0.093307 | 0.002106 | 44.30 | < 2e−16 *** |
sexmale | 0.869998 | 0.045184 | 19.25 | < 2e−16 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −2.9506692 | 0.0902247 | −32.704 | < 2e−16 *** |
bmi | 0.0721846 | 0.0035431 | 20.373 | < 2e−16 *** |
age | −0.0093996 | 0.0008165 | −11.512 | < 2e−16 *** |
sexmale | 0.0994121 | 0.0273967 | 3.629 | 0.000285 *** |
The logistic regression model for
Estimate | Std. Error | z value | Pr(>|z|) | |
(Intercept) | −1.3447446 | 0.0412541 | −32.597 | < 2e−16 *** |
age | −0.0059798 | 0.0007839 | −7.629 | 2.37e−14 *** |
sexmale | 0.1319002 | 0.0272459 | 4.841 | 1.29e−06 *** |
D R-code
The code below requires a not yet released version of the package drgee, which contains functions that were written specifically for this paper. A binary version of this package for Windows is available as online supplementary material for this paper.
The code contains two functions; mediationAF and mediationAF.casectrl. The function mediationAF implements the methods proposed in Section 3.2 and the function mediationAF.casectrl implements the methods proposed in Section 3.3.
The function mediationAF has arguments formulaYAM, formulaYA, formulaAM, formulaA, data, A and Y. The arguments formulaYAM, formulaYA, formulaAM and formulaA specify logistic regression models for
The function mediationAF.casectrl has arguments formulaYAM, formulaAYM, formulaYA, formulaAY, iaformulaM, iaformula, data, A and Y. The arguments formulaYAM, formulaAYM, formulaYA and formulaAY specify logistic regression models for
References
Bruzzi, P., Green, S., Byar, D., Brinton, L., Schairer, C. (1985). Estimating the population attributable risk for multiple risk factors using case-control data. American Journal of Epidemiology, 122:904–914. doi:10.1093/oxfordjournals.aje.a114174
Cai, M., Kuroki, Z., Pearl, J., Tian, J. (2008). Bounds on direct effects in the presence of confounded intermediate variables. Biometrics, 64:695–701. doi:10.1111/j.1541-0420.2007.00949.x
Deubner, D., Wilkinson, W., Helms, M., Herman, T., Curtis, G. (1980). Logistic model estimation of death attributable to risk factors for cardiovascular disease in Evans County, Georgia. American Journal of Epidemiology, 112:135–143. doi:10.1093/oxfordjournals.aje.a112963
Eide, G., Gefeller, O. (1995). Sequential and average attributable fractions as aids in the selection of preventive strategies. Journal of Clinical Epidemiology, 48:645–655. doi:10.1016/0895-4356(94)00161-I
Greenland, S., Drescher, K. (1993). Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics, 49:865–872. doi:10.2307/2532206
Greenland, S., Robins, J., Pearl, J. (1999). Confounding and collapsibility in causal inference. Statistical Science, 14:29–46. doi:10.1214/ss/1009211805
Levin, M. (1953). The occurrence of lung cancer in man. Acta-Unio Internationalis Contra Cancrum, 9:531–541.
Mendis, S., Puska, P., Norrving, B., et al. (2011). Global atlas on cardiovascular disease prevention and control. Geneva: World Health Organization.
Miettinen, O. (1974). Proportion of disease caused or prevented by a given exposure, trait or intervention. American Journal of Epidemiology, 99:325–332. doi:10.1093/oxfordjournals.aje.a121617
Pearl, J. (2001). Direct and indirect effects. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence, 411–420. Morgan Kaufmann Publishers Inc.
Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd Edition. New York: Cambridge University Press. doi:10.1017/CBO9780511803161
Prentice, R., Pyke, R. (1979). Logistic disease incidence models and case-control studies. Biometrika, 66:403–411. doi:10.1093/biomet/66.3.403
Robins, J. (2000). Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association, volume 1999, 6–10.
Robins, J., Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 48:143–155. doi:10.1097/00001648-199203000-00013
Rubin, D. (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics, 31:161–170. doi:10.1111/j.1467-9469.2004.02-123.x
Shpitser, I., Tchetgen Tchetgen, E. (2016). Causal inference with a graphical hierarchy of interventions. The Annals of Statistics, 44:2433–2466. doi:10.1214/15-AOS1411
Shpitser, I., VanderWeele, T. (2011). A complete graphical criterion for the adjustment formula in mediation analysis. The International Journal of Biostatistics, 7:1–24. doi:10.2202/1557-4679.1297
Sjölander, A. (2009). Bounds on natural direct effects in the presence of confounded intermediate variables. Statistics in Medicine, 28:558–571. doi:10.1002/sim.3493
Sjölander, A. (2011). Estimation of attributable fractions using inverse probability weighting. Statistical Methods in Medical Research, 20:415–428. doi:10.1177/0962280209349880
Sjölander, A., Vansteelandt, S. (2011). Doubly robust estimation of attributable fractions. Biostatistics, 12:112–121. doi:10.1093/biostatistics/kxq049
Stefanski, L., Boos, D. (2002). The calculus of M-estimation. The American Statistician, 56:29–38. doi:10.1198/000313002753631330
Sturmans, F., Mulder, P., Valkenburg, H. (1977). Estimation of the possible effect of interventive measures in the area of ischemic heart diseases by the attributable risk percentage. American Journal of Epidemiology, 105:281–289. doi:10.1093/oxfordjournals.aje.a112384
Tchetgen Tchetgen, E., Robins, J., Rotnitzky, A. (2010). On doubly robust estimation in a semiparametric odds ratio model. Biometrika, 97:171–180. doi:10.1093/biomet/asp062
VanderWeele, T. (2015). Explanation in causal inference: methods for mediation and interaction. Oxford University Press.
VanderWeele, T., Vansteelandt, S. (2010). Odds ratios for mediation analysis for a dichotomous outcome. American Journal of Epidemiology, 172:1339–1348. doi:10.1093/aje/kwq332
Vansteelandt, S. (2009). Estimating direct effects in cohort and case–control studies. Epidemiology, 20:851–860. doi:10.1097/EDE.0b013e3181b6f4c9
© 2018 Walter de Gruyter GmbH, Berlin/Boston