Abstract
We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when a standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm to a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there are missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to break down a complicated problem into a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm, and make the EM algorithm more accessible to a wider and more general audience.
1 Introduction
The Expectation-Maximization (EM) algorithm, introduced by Dempster, Laird, and Rubin [1], is a well-known and flexible method for obtaining a maximum likelihood estimate (MLE) when data are incomplete. It is a two-step iterative algorithm: the first step estimates the expectation of the complete data log-likelihood given the observed data and the current parameter estimates; the second step maximizes this expected log-likelihood, giving new values of the parameter estimates. The algorithm iterates through the Expectation step (E-step) and the Maximization step (M-step) until the sequence of parameter estimates converges to a stationary point [1, 2].
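As a concrete illustration of the E- and M-step iteration, and not of the models considered later in this paper, the following minimal R sketch fits a two-component Gaussian mixture, with the unobserved component labels playing the role of the missing data; the simulated data, starting values, and tolerance are illustrative assumptions.

```r
# Minimal EM sketch for a two-component Gaussian mixture (illustrative only).
set.seed(1)
y <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 3, sd = 1))

em_mixture <- function(y, tol = 1e-8, max_iter = 500) {
  # crude starting values (illustrative)
  pi_k  <- 0.5
  mu    <- as.numeric(quantile(y, c(0.25, 0.75)))
  sigma <- rep(sd(y), 2)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each observation came from component 2
    d1 <- (1 - pi_k) * dnorm(y, mu[1], sigma[1])
    d2 <- pi_k * dnorm(y, mu[2], sigma[2])
    w  <- d2 / (d1 + d2)
    # M-step: weighted updates of the parameter estimates
    pi_k  <- mean(w)
    mu    <- c(weighted.mean(y, 1 - w), weighted.mean(y, w))
    sigma <- c(sqrt(sum((1 - w) * (y - mu[1])^2) / sum(1 - w)),
               sqrt(sum(w * (y - mu[2])^2) / sum(w)))
    # observed-data log-likelihood, used for the stopping criterion
    loglik <- sum(log((1 - pi_k) * dnorm(y, mu[1], sigma[1]) +
                      pi_k * dnorm(y, mu[2], sigma[2])))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(pi = pi_k, mu = mu, sigma = sigma, loglik = loglik, iterations = iter)
}

em_mixture(y)
```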
For any situation in which maximum likelihood estimation is desirable and feasible for fully observed data, estimation in the analogous incomplete data situation can be accomplished with the EM algorithm. It is this ability to associate an incomplete data problem with a complete data problem that makes it an attractive approach for missing data analysis. Unfortunately, the EM algorithm has fundamental drawbacks. For example, convergence may be slow and the covariance matrix is not an immediate by-product of the algorithm [2]. Fortunately, there are proposals for speeding up EM convergence, such as the Parameter-Expanded EM algorithm [3], and for obtaining the covariance matrix [4, 5].
The EM algorithm was originally formulated as an approach to obtain MLEs when there are missing data, but it has also been applied to analyses with mismeasured variables. Dawid and Skene [6] used the EM algorithm to correct misclassification when the true value was never observed but multiple measurements for each subject were obtained. In this situation, the multiple measurements permit the estimation of the misclassification rates, allowing for the estimation of the underlying true population proportion. Drews et al. [7] considered the situation where the exposure is misclassified not by one classification scheme but by two (patient interviews and medical records), and used the EM algorithm to obtain corrected odds ratio estimates. Ganguli et al. [8] proposed a variation of the Monte Carlo EM (MCEM) algorithm specific to structural nonparametric regression problems with measurement error in covariates for additive models; it is referred to as the Ganguli, Staudenmayer, Wand EM.
Despite the transfer of the EM approach from missing data problems to measurement error problems, and the wealth of literature on the EM algorithm for missing data, there has been limited application of the EM algorithm to measurement error problems. Gustafson [9] comments that the lack of a closed form solution of the E-step for many problems, the need for an approximate solution, and the additional work required for covariance estimation have contributed to the scarcity of EM approaches as a proposed solution when mismeasured variables are present. This is echoed by Buonaccorsi [10], who comments on the unwieldy nature of the EM algorithm for the calibration adjustment of measurement error in the response of a linear model.
In many situations, there are alternate approaches that can be used. For missing data, weighting [11] or multiple imputation methods [12–14] often are viable options. For measurement error, alternative methods include simulation-extrapolation [15–17], instrumental variables [18], calibration [19], and imputation [20, 21]. Our interest in the likelihood approach is that it is a method of correction for both missing data and measurement error problems. Thus, there is considerable appeal to this approach, which can handle both aspects of “imperfect data” in the same model, within a unified methodological framework.
We consider the situation where the EM algorithm is a viable and potentially attractive approach for obtaining corrected parameter estimates for imperfect data. Of particular interest (Section 2.2) is the situation where there are missing responses, mismeasured continuous explanatory variables, no closed form solution for the E-step, and the algorithm has an unacceptably low probability of convergence. We propose a new variation of the EM algorithm, inspired by the expectation-conditional maximization (ECM) of Meng and Rubin [22] and the use of tangent spaces by Tsiatis [23], that maintains the ability to correct for both missing data and measurement error while overcoming non-convergence. This makes it a practical option for applied statistical research. With the emergence of software packages that now implement and/or automate the EM algorithm such as EMCluster [24], our proposal may permit the direct use of these new packages for increasingly complex problems as well as making the algorithm accessible to a wider audience.
In Section 2, we present the EM algorithm, imperfect data, and the motivating problem. In Section 3, we present the proposed variation of the EM algorithm and the general theory for maximization using a sequence of smaller, simpler, self-contained EM algorithms. We use a simulation study to consider the finite sample performance of the proposed variation of the EM algorithm in Section 4. Section 5 discusses possible future applications of the proposed method.
2 The EM algorithm, imperfect data and the motivating problem
2.1 The EM algorithm
When there are imperfect data (i.e. missing and mismeasured data), we can obtain MLEs by maximizing the log-likelihood of the observed portion of the data. The log-likelihood associated with the observed data,
where
The EM algorithm provides an indirect approach for obtaining the MLE for
The algorithm iterates through E- and M-steps; the E-step for the
where
Under the regularity conditions of Dempster et al. [1] and Wu [27], the likelihood,
requiring only that Equation (1) be continuous in
The E- and M-steps are repeated until the difference between sequential steps in the likelihood changes by an arbitrarily small amount,
Pragmatically, the stopping criterion
is used, probably motivated by Wu [27] who showed
as
The generalized EM algorithm (GEM) relaxes the M-step condition. It requires that Equation (2) holds, permitting the selection of
Three relevant variations of the EM algorithm are the Monte Carlo EM (MCEM) algorithm [28], the expectation-conditional maximization (ECM) algorithm of Meng and Rubin [22], and the multi-cycle ECM algorithm [22]. The MCEM algorithm replaces a complicated and analytically intractable expectation in the E-step by approximating the integral with a finite sum,
where m denotes the Monte Carlo sample size. In the
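To make the Monte Carlo substitution concrete, here is a self-contained toy sketch in R, assuming a normal mean with responses missing completely at random rather than the models of Section 2.2; the sample size, missingness fraction, and number of draws are illustrative.

```r
# Toy MCEM illustration (not the model of this paper): y_i ~ N(mu, 1) with some
# responses missing completely at random.  The E-step expectation is replaced
# by an average over m Monte Carlo completions of the missing values.
set.seed(2)
y <- rnorm(200, mean = 1.5, sd = 1)
y[sample(200, 60)] <- NA                  # 30% missing completely at random
y_obs <- y[!is.na(y)]
n_mis <- sum(is.na(y))

mcem_iteration <- function(mu_t, m = 2000) {
  # Monte Carlo E-step: m draws of the missing values given the current mu_t
  draws <- matrix(rnorm(m * n_mis, mean = mu_t, sd = 1), nrow = m)
  # Approximate Q-function: a finite sum in place of the intractable expectation
  Q_hat <- function(mu) {
    mean(apply(draws, 1, function(y_mis)
      sum(dnorm(c(y_obs, y_mis), mean = mu, sd = 1, log = TRUE))))
  }
  # M-step: maximize the approximated Q-function over mu
  optimize(Q_hat, interval = range(y_obs) + c(-3, 3), maximum = TRUE)$maximum
}

mu_hat <- 0
for (i in 1:20) mu_hat <- mcem_iteration(mu_hat)
mu_hat   # converges near mean(y_obs), the MLE under MCAR for this toy model
```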
2.2 Imperfect data and the motivating problem
For subjects
In what follows, we assume that
The measurement error is unbiased,
We let
Assuming unique parameterization of the conditional distributions [9, 26] while restricting our exposition to classical unbiased nondifferential measurement error, MCAR and MAR missing data mechanisms, the joint distribution may be expressed as
where
The joint distribution (Equation 5) has been decomposed into the product of four conditional distributions and the marginal distribution of the perfect covariates
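To illustrate the kind of factorization meant here, with generic symbols and conditioning sets that are assumptions for exposition rather than the exact form of Equation (5), let Y denote the response, C the missingness indicator, W the error-prone surrogate, X the true mismeasured covariate, and Z the perfectly measured covariates:

```latex
% Illustrative factorization under unique parameterization; the conditioning
% sets and the blocks \theta_W, \theta_X are generic stand-ins.
f(y, c, w, x, z; \theta)
  = f(y \mid x, z; \theta_Y)\,
    f(c \mid \cdot\,; \theta_C)\,
    f(w \mid x; \theta_W)\,
    f(x \mid z; \theta_X)\,
    f(z),
\qquad \theta = (\theta_Y, \theta_C, \theta_W, \theta_X).
```

Because each factor carries its own parameter block, the complete-data log-likelihood is a sum of terms that each involve a single block, which is what allows the stage-wise maximizations described in Section 3 to proceed one block at a time.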
The outcome model
We have a primary interest in the outcome model,
Motivated by this example, we used a simulation study to examine the finite-sample bias reduction for both
3 The orthogonally partitioned EM algorithm
Let
such that
Unique parameterization has been used, without a formal definition, as a means to specify complex joint distributions through a product of conditional ones [32–36]. Proposition 1 formalizes unique parameterization, ensuring that the parameter space is composed of orthogonal subspaces.
A likelihood may be uniquely parameterized if and only if
for
See Appendix.
The proposed extension of the EM algorithm begins with an initial estimation of
where
Equation (5) is used to illustrate Equation (7) and motivate the simulation study in Section 4. For the
where
For the first stage,
where
The first stage Q-function at the
for
The argument for the second stage Q-function at the
parallels that of the first stage.
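The staged structure can be sketched as follows; this is illustrative R pseudocode, not the implementation used for the simulations, and run_stage_em() is a hypothetical placeholder for a stage-specific, self-contained EM or MCEM.

```r
# Schematic K-stage OPEM: run_stage_em() is a hypothetical placeholder for a
# self-contained EM/MCEM that maximizes the k-th stage Q-function over its own
# parameter block while the other blocks are held fixed.
opem <- function(theta_init, y_obs, K) {
  theta <- theta_init                 # list of K parameter blocks
  for (k in seq_len(K)) {
    # stage k: a smaller, self-contained EM run to convergence over block k,
    # conditional on the current estimates of the previously fitted blocks
    theta[[k]] <- run_stage_em(k, theta, y_obs)
  }
  theta                               # estimates over the full parameter space
}
```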
Each stage can be thought of as a constrained maximization, where the constraint is to an orthogonal subspace of
If the E-step can be written in the form of Equation (7) and
Beginning with the space-filling property, the gradient of G,
with the complement
Meng and Rubin [22] showed that having the convex hull of all feasible directions determined by the constraint spaces,
If each of the k,
If we have an OPEM (Proposition 2) and there exists a
See Appendix.
Dempster, Laird, and Rubin [1] and Wu [27] provide two methods to show convergence to stationary points. The simplest is to view the algorithm as a single process. By the Inverse Mapping Theorem and part (i) of the proof of Theorem 2 in Meng and Rubin [22], the mapping is closed. This directly implies that the limit points are stationary points (Wu [27], Theorem 1).
An alternate approach views the algorithm as the sum of K generalized EM algorithms. Each converges to a limiting point which is a stationary point [27]. In each stage, Monte Carlo integration is used with the relevant EM variant to obtain the maximizer of
If all K maximizations of Equation (7) are unique, then the limit points of the K OPEM stages
By extending Wu’s Theorem 6 [27] and replacing the uniqueness condition with the continuity of
4 Numerical example
4.1 Simulation specification
We consider a finite-sample simulation study using the general model given in Equation (5). As noted in Section 2.2, we are interested in the unbiased estimation of
The response is modeled as a logistic regression,
where
where
Simulation parameterization for the response and missing data mechanism models.
| Model | ||||||
| 1 | 0.03 | 0.38 | –1.88 | 0.40 | 2.63 | 2.31 |
| 2 | 0.03 | –0.18 | –2.82 | 0.40 | 2.45 | 0.86 |
| 3 | 0.03 | 1.04 | 0.89 | 0.40 | 1.86 | –0.68 |
Imperfect data are introduced in the following manner. Let X be measured with error using a classical unbiased nondifferential measurement error model,
In applied settings,
We let
For each parameterization, we implemented
Due to the low probability of convergence using the MCEM, ECM, and multi-cycle ECM algorithms with Monte Carlo integration (Section 2.2), we compare the OPEM with the naive analysis that assumes complete data. This is a reasonable contrast since the OPEM is not a direct extension of the ECM or multi-cycle ECM, but a new option for implementing an EM algorithm when convergence is problematic. The proposed OPEM algorithm has the advantage of converging when alternate implementations of the EM algorithm have a low probability of convergence. All simulations were implemented in R [50].
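For orientation, a hedged sketch of how one imperfect data set of this general form could be generated, and of the naive fit it is compared against, is given below; the distributions of X and Z, the coefficient values, the error variance, and the MAR specification are illustrative assumptions and do not reproduce the parameterizations in the table above.

```r
# Illustrative generation of one imperfect data set (assumed values throughout).
set.seed(3)
n <- 1000
z <- rnorm(n)                          # perfectly measured covariate
x <- rnorm(n)                          # true covariate, never observed directly
u <- rnorm(n, sd = sqrt(0.5))          # classical unbiased measurement error
w <- x + u                             # observed error-prone surrogate for x

# Response model: logistic regression on the true covariates (assumed coefficients)
y <- rbinom(n, 1, plogis(0.03 + 0.40 * x + 0.40 * z))

# Missing data mechanism for the response: MAR, depending only on observed w and z
miss  <- rbinom(n, 1, plogis(-1.5 + 0.5 * w + 0.5 * z))
y_obs <- ifelse(miss == 1, NA, y)

# Naive analysis: substitute w for x and drop records with a missing response
naive_fit <- glm(y_obs ~ w + z, family = binomial)
summary(naive_fit)$coefficients
```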
4.2 Simulation results
For the estimated coefficients
Although our two-stage OPEM algorithm first estimated
4.2.1 Response mechanism model results, θ_Y
The estimates of the outcome model were obtained using an MCEM algorithm, conditional on the parameter estimates from the first stage,
Bias assessment for simulated example:
| Model | |||||||
| 1 | 0.1 | 0.01 (0.01) | −0.06 (0.02) | −0.01 (0.02) | −0.01 (0.01) | 0.01 (0.2) | 0.02 (0.02) |
| 0.3 | 0.02 (0.01) | −0.39 (0.02) | −0.09 (0.02) | −0.01 (0.02) | 0.06 (0.03) | 0.08 (0.03) | |
| 0.5 | 0.06 (0.02) | −0.90 (0.02) | −0.28 (0.02) | 0.01 (0.02) | 0.07 (0.04) | 0.05 (0.03) | |
| 0.7 | 0.10 (0.01) | −1.30 (0.01) | −0.41 (0.02) | −0.01 (0.02) | 0.20 (0.06) | 0.12 (0.05) | |
| 2 | 0.1 | 0.02 (0.01) | −0.03 (0.02) | 0.01 (0.02) | 0.02 (0.01) | 0.02 (0.02) | 0.01 (0.02) |
| 0.3 | 0.04 (0.01) | −0.36 (0.02) | −0.03 (0.01) | 0.02 (0.01) | 0.02 (0.03) | −0.02 (0.02) | |
| 0.5 | 0.07 (0.01) | −0.80 (0.02) | −0.07 (0.01) | −0.01 (0.02) | 0.07 (0.04) | 0.01 (0.02) | |
| 0.7 | 0.09 (0.01) | −1.19 (0.01) | −0.07 (0.01) | 0.01 (0.02) | 0.10 (0.05) | 0.02 (0.03) | |
| 3 | 0.1 | −0.01 (0.01) | 0.01 (0.01) | 0.01 (0.01) | −0.01 (0.01) | 0.03 (0.02) | 0.01 (0.01) |
| 0.3 | 0.02 (0.01) | −0.20 (0.01) | 0.04 (0.01) | 0.01 (0.01) | 0.07 (0.02) | −0.03 (0.01) | |
| 0.5 | 0.05 (0.01) | −0.55 (0.01) | 0.13 (0.01) | 0.01 (0.01) | 0.01 (0.02) | −0.02 (0.01) | |
| 0.7 | 0.06 (0.01) | −0.82 (0.01) | 0.21 (0.01) | −0.01 (0.01) | 0.06 (0.03) | −0.02 (0.02) | |
Bias reduction for
4.2.2 Missing data mechanism results, θ_C
We observe the expected attenuation and augmentation in
We observe modest bias reduction for model 1 across all levels of
We observe a nominal inflation of the associated standard errors with the OPEM. In terms of mean squared error and bias-variance trade-off, the OPEM performs well when the proportion of imprecision in the measurement of X is large,
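For reference, the mean squared error statements rest on the usual decomposition of MSE into squared bias plus variance; a minimal R helper that computes these quantities from a vector of replicate estimates (`est` and `truth` are hypothetical inputs, not objects defined in this paper) is:

```r
# MSE bookkeeping for replicate estimates `est` of a true value `truth`
# (hypothetical inputs): MSE = squared bias + variance of the estimates.
bias_variance <- function(est, truth) {
  bias <- mean(est) - truth
  vr   <- mean((est - mean(est))^2)   # population form of the variance
  c(bias = bias, variance = vr, mse = bias^2 + vr)
}
```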
The first stage of the OPEM permits us to compare the effect of estimation using a two-stage OPEM for bias reduction versus a one-stage approach (standard implementation of the EM algorithm), provided that the one-stage EM algorithm converges. We observe minimal reduction of bias in
Bias assessment for simulated example:
| Model | |||||||
| 1 | 0.1 | −0.11 (0.04) | 0.05 (0.02) | −0.08 (0.03) | −0.11 (0.04) | 0.05 (0.02) | −0.08 (0.03) |
| 0.3 | −0.10 (0.04) | −0.01 (0.02) | −0.06 (0.03) | −0.11 (0.04) | 0.03 (0.02) | −0.07 (0.03) | |
| 0.5 | −0.11 (0.04) | 0.05 (0.02) | −0.08 (0.03) | −0.11 (0.04) | 0.05 (0.02) | −0.08 (0.03) | |
| 0.7 | −0.06 (0.04) | −0.10 (0.02) | 0.01 (0.03) | −0.10 (0.04) | 0.05 (0.03) | −0.05 (0.03) | |
| 2 | 0.1 | −0.03 (0.03) | −0.02 (0.02) | −0.05 (0.03) | −0.03 (0.03) | −0.02 (0.02) | −0.05 (0.03) |
| 0.3 | −0.08 (0.04) | 0.01 (0.02) | −0.06 (0.04) | −0.08 (0.04) | −0.02 (0.02) | −0.06 (0.04) | |
| 0.5 | −0.13 (0.04) | 0.04 (0.02) | −0.13 (0.04) | −0.13 (0.04) | 0.01 (0.02) | −0.13 (0.04) | |
| 0.7 | −0.02 (0.04) | 0.06 (0.01) | −0.04 (0.03) | −0.04 (0.04) | −0.01 (0.02) | −0.04 (0.04) | |
| 3 | 0.1 | −0.05 (0.04) | 0.01 (0.03) | 0.01 (0.02) | −0.05 (0.04) | 0.02 (0.03) | 0.01 (0.02) |
| 0.3 | −0.01 (0.04) | −0.06 (0.02) | 0.02 (0.03) | −0.06 (0.04) | 0.04 (0.03) | 0.01 (0.03) | |
| 0.5 | 0.03 (0.03) | −0.23 (0.03) | 0.06 (0.02) | −0.09 (0.04) | 0.01 (0.04) | 0.04 (0.02) | |
| 0.7 | 0.08 (0.04) | −0.37 (0.02) | 0.06 (0.03) | −0.11 (0.04) | 0.03 (0.03) | 0.03 (0.03) | |
5 Discussion
The K-stage orthogonally partitioned EM (OPEM) algorithm provides an implementation that exploits the common assumption of unique parameterization, corrects for biases due to imperfect data, converges for the specified model when a standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm to a sequence of smaller, simpler, self-contained EM algorithms. We showed, using the theoretical framework of the EM algorithm, that the OPEM can be applied to any situation where the joint probability distribution can be written as a product of conditional probability distributions. In addition, it yields an optimal solution over the parameter space, and known extensions of the EM algorithm for accelerating convergence may be used for each stage of the OPEM [40]. The computation of standard errors may be performed stage-wise with an appropriate adaptation of Louis' method for MCEM approaches [4, 28, 38] or through the adaptation of Oakes' formula for standard errors from an MCEM by Casella ([41], Appendix A.3). The finite sample simulation study revealed that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of parameters of the outcome model. This suggests that positioning the primary model at or near the end of the OPEM sequence is a strategy for better bias reduction in the estimators of the parameters of interest.
A potential application of this method is the correction of bias due to mismeasured variables that are used to construct a Horvitz-Thompson [42] type weighting mechanism or propensity score for model correction [43, 44]. Additionally, this methodology could be adapted for related approaches, such as the inverse probability of treatment weighted marginal structural model estimator [45–48]. Given that a likelihood approach could provide a unified method to correct for imperfect data, the EM algorithm is a natural tool to explore.
Beyond the potential application of the K-stage OPEM to a wide variety of Horvitz-Thompson type estimators, Section 3 strongly suggests broad applicability to complex, multi-model settings. The OPEM can be directly used in any situation with missing or mismeasured data where the joint distribution is expressed as a product of conditional, uniquely parameterized distributions. The ability to break down a complicated problem into a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm for complex situations of imperfect data, and make the EM algorithm more accessible to a wider and more general audience.
A Appendix
A.1 Proof of Proposition 1
then there exist p linear operators
A.2 Proof of Proposition 3
The algorithm begins with an initial value
results in the kth Q-function being a Generalized EM algorithm. There are K MCEM algorithms for which maximizers are obtained. By induction, the entire optimization procedure is itself a Generalized EM algorithm.□
References
1. Dempster A, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 1977;39:1–38.
2. McLachlan GJ, Krishnan T. The EM algorithm and extensions. Hoboken, NJ: Wiley, 2008.
3. Liu C, Rubin DB, Wu YN. Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 1998;85:755–70.
4. Louis TA. Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 1982;44:226–33.
5. Oakes D. Direct calculation of the information matrix via the EM algorithm. J R Stat Soc Ser B 1999;61:479–82.
6. Dawid AP, Skene AM. Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl Stat 1979;28:20–8.
7. Drews CD, Flanders WD, Kosinski AS. Use of two data sources to estimate odds ratios in case-control studies. Epidemiology 1993;4:327–35.
8. Ganguli B, Staudenmayer J, Wand MP. Additive models with predictors subject to measurement error. Aust N Z J Stat 2005;47:193–202.
9. Gustafson P. Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. New York: Chapman & Hall/CRC, 2004.
10. Buonaccorsi JP. Measurement error: models, methods, and applications. Boca Raton, FL: Chapman & Hall, 2010.
11. Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res 2013;22:278–95.
12. Carpenter JR, Kenward MG, Vansteelandt S. A comparison of multiple imputation and doubly robust estimation for analyses with missing data. J R Stat Soc Ser A 2006;169:571–84.
13. Kenward MG, Carpenter J. Multiple imputation: current perspectives. Stat Methods Med Res 2007;16:199–218.
14. White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med 2010;29:2920–31.
15. Guolo A. The SIMEX approach to measurement error correction in meta-analysis with baseline risk as covariate. Stat Med 2014;33:2062–76.
16. He W, Xiong J, Yi GY. SIMEX R package for accelerated failure time models with covariate measurement error. J Stat Software 2012;46(c01).
17. Shang Y. Measurement error adjustment using the SIMEX method: an application to student growth percentiles. J Educ Meas 2012;49:446–65.
18. Buzas JS, Stefanski LA. Instrumental variable estimation in generalized linear measurement error models. J Am Stat Assoc 1996;91:999–1006.
19. Strand M, Sillau S, Grunwald GK, Rabinovitch N. Regression calibration for models with two predictor variables measured with error and their interaction, using instrumental variables and longitudinal data. Stat Med 2014;33:470–87.
20. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. Int J Epidemiol 2006;35:1074–81.
21. Edwards JK, Cole SR, Troester MA, Richardson DB. Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data. Am J Epidemiol 2013;177:904–12.
22. Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 1993;80:267–78.
23. Tsiatis AA. Semiparametric theory and missing data. New York: Springer, 2006.
24. Chen W, Maitra R, Melnykov V. EMCluster: EM algorithm for model-based clustering of finite mixture Gaussian distribution. R package version 0.2-4, 2013.
25. Carpenter J, Kenward M. Multiple imputation and its application. Chichester, West Sussex: John Wiley & Sons, 2013.
26. Little RJA, Rubin DB. Statistical analysis with missing data, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2002.
27. Wu CFJ. On the convergence properties of the EM algorithm. Ann Stat 1983;11:95–103.
28. Wei GCG, Tanner MA. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J Am Stat Assoc 1990;85:699–704.
29. Carroll R, Ruppert D, Stefanski L, Crainiceanu C. Measurement error in nonlinear models: a modern perspective, 2nd ed. Boca Raton, FL: Chapman & Hall/CRC, 2006.
30. Rubin D. Inference and missing data. Biometrika 1976;63:581–92.
31. Regier MD, Moodie EEM, Platt RW. The effect of error-in-confounders on the estimation of the causal parameter when using marginal structural models and inverse probability-of-treatment weights: a simulation study. Int J Biostat 2014;10:1–15.
32. Chen J, Zhang D, Davidian M. A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics 2002;3:347–60.
33. Ibrahim JG, Chen MH, Lipsitz SR. Monte Carlo EM for missing covariates in parametric regression models. Biometrics 1999;55:591–6.
34. Lipsitz SR, Ibrahim JG. A conditional model for incomplete covariates in parametric regression models. Biometrika 1996;83:916–22.
35. Liu W, Wu L. Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics 2007;63:342–50.
36. Stubbendick A, Ibrahim JG. Maximum likelihood methods for nonignorable missing responses and covariates in random effects models. Biometrics 2003;59:1140–50.
37. Gilks W, Wild P. Adaptive rejection sampling for Gibbs sampling. Appl Stat 1992;41:337–48.
38. Robert CP, Casella G. Monte Carlo statistical methods, 2nd ed. New York: Springer, 2004.
39. Ibrahim JG, Chen MH, Lipsitz SR. Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika 2001;88:551–64.
40. Daigle B, Roh M, Petzold L, Niemi J. Accelerated maximum likelihood parameter estimation for stochastic biochemical systems. BMC Bioinformatics 2012;13:1–18.
41. Casella G. Empirical Bayes Gibbs sampling. Biostatistics 2001;2:485–500.
42. Horvitz D, Thompson D. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 1952;47:663–85.
43. Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 2004;23:2937–60.
44. Wooldridge JM. Inverse probability weighted estimation for general missing data problems. J Econometrics 2007;141:1281–301.
45. Cole SR, Hudgens MG, Tien PC, Anastos K, Kingsley L, Chmiel JS, et al. Marginal structural models for case-cohort study designs to estimate the association of antiretroviral therapy initiation with incident AIDS or death. Am J Epidemiol 2012;175:381–90.
46. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615–25.
47. Moodie EEM, Stephens D. Marginal structural models: unbiased estimation for longitudinal studies. Int J Public Health 2011;56:117–19.
48. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.
49. Hoffman K, Kunze R. Linear algebra, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 1971.
50. R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014. http://www.R-project.org, ISBN 3-900051-07-0.
©2016 by Regier et al., published by De Gruyter