
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems

  • Yanmei Xie and Biao Zhang
Published/Copyright: April 20, 2017

Abstract:

Missing covariate data occur frequently in regression analysis in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem under an assumed conditional mean function when some covariates are completely observed but others are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719–30) on regression analyses with nonignorable missing covariates, which introduced two working models: a working probability model of missingness and a working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations in which there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with those of other existing competitors. We present a simulation study comparing the finite-sample performance of the various methods with respect to bias, efficiency, and robustness to model misspecification. 
The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).

1 Introduction

Missing data is an important practical problem commonly found in the medical sciences, social sciences, and many other related disciplines. For example, in sample surveys, some sampled individuals do not complete the questionnaire because of non-contact, refusal to respond, or some other reason. In panel surveys and clinical trials, attrition arises when members of the panel or subjects of the longitudinal study group drop out prior to the end of the study and do not return. In the past decade, there has been a great deal of interest in statistical methods for studying missing data problems, including missing response, missing covariate, or both. There are three major missing-data mechanisms, as discussed in Little and Rubin [1]. The first and simplest case is called missing completely at random (MCAR), i.e., missingness does not depend on any observed or missing quantities. The second case is missing at random (MAR), i.e., missingness depends only on the observed quantities. The third case is called missing not at random (MNAR) or nonignorable missing, i.e., missingness depends on the unobserved quantities; this is the most difficult case to handle. Most past work has focused on ignorable missing data problems involving MCAR and MAR missing data mechanisms. For a review of the analysis of ignorable missing data problems, we refer the readers to Little and Rubin [1]. In this paper, we focus on the study of a nonignorable covariate-missing data problem motivated by the alcohol and blood pressure data from the US National Health and Nutrition Examination Survey (NHANES) [2] as described below.

The NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States. This survey produces vital and health statistics for the Nation and provides an important basis for public health interventions. For the Demographics Data, the Examination Data, and the Questionnaire Data from the 2003–2004 NHANES, age and gender were fully observed, but the other variables such as education, systolic blood pressure (SBP), diastolic blood pressure (DBP), body mass index (BMI), income, and alcohol consumption had missing values. As argued by Little and Zhang [3], it is plausible and reasonable to assume that the missingness in education, BMI, and the two blood pressure measures was missing at random (MAR) due to missed visits, but missingness of household income was thought more likely to be missing not at random (MNAR), since the propensity to respond to the income question in sample surveys is likely to depend on, and possibly be determined by, the underlying value of the income variable, with those with low or high income generally considered less likely to respond than others. In a recent and related work, Bartlett et al. [4] have considered data on alcohol consumption and blood pressure from the 2003–2004 NHANES and have argued that missingness in alcohol consumption is likely to depend largely on alcohol consumption itself and is thus missing not at random (MNAR). To study the effect of socio-economic status (income and education) on blood pressure measures, Little and Zhang [3] have performed the regression analysis of blood pressure on income and education, adjusting for age, gender, and body mass index after conditioning on the subsample of cases with household income fully observed. In addition, Bartlett et al. [4] have considered the regression analysis on the dependence of systolic blood pressure on the average number of alcoholic drinks consumed per day, with adjustment for age and body mass index. 
Both regression analyses can be regarded as nonignorable covariate-missing data problems.

Under the ignorable covariate-missing data mechanism, there are a variety of statistical methods available in the literature on analysis of missing data, including (a) complete case analysis (CCA), (b) inverse probability weighting (IPW) in the spirit of Horvitz and Thompson [5], which weights each of the complete data by the inverse of its inclusion probability, and (c) augmented inverse probability weighted complete-case (AIPWCC) estimation proposed by Robins et al. [6], which incorporates the incomplete data to increase efficiency while reducing the bias. By contrast, under the nonignorable covariate-missing data mechanism, since the probability of nonignorable missingness in a covariate variable is dependent on the value of that variable, it is generally difficult to specify a model for the nonignorable covariate-missing data mechanism. To overcome this difficulty in handling nonignorable covariate-missing data problems, Bartlett et al. [4] have proposed an augmented complete case analysis by modeling the probability of missingness on the fully observed variables under the assumption that missingness in a covariate depends on the value of that covariate, but is conditionally independent of the outcome variable. Under this missing not at random (MNAR) mechanism, the commonly used CCA approach is valid and gives rise to consistent estimates, but is inefficient in that it does not make use of all observed information, disregarding all incomplete cases. Like the AIPWCC approach, the augmented CCA approach utilizes all the observed data by drawing on the information available from both complete and incomplete cases and thus improves upon the efficiency of CCA through specification of an additional model for the probability of missingness given the fully observed variables. The estimating function used in Bartlett et al. [4] can be viewed as the difference of two separate estimating functions, but other linear combinations of these two estimating functions are also possible. 
Thus, the question arises as to how to combine the two sets of estimating functions adopted in the augmented complete case analysis of Bartlett et al. [4].

Our objective in this paper is to explore the use of empirical likelihood methods in the nonignorable covariate-missing data problem described in Little and Zhang [3] and Bartlett et al. [4] to effectively incorporate incomplete cases into the analysis of covariate-missing data and thus to improve estimation efficiency when the data are not missing at random. We propose to construct a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. These estimating equations are always unbiased for any given working regression function as long as the working probability model of missingness is correctly specified. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is possibly not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to effectively and optimally combine these unbiased estimating equations when the number of estimating equations is greater than the number of parameters of interest. Furthermore, we propose to study maximum empirical likelihood estimation of the underlying regression parameters based on the aforementioned system of unbiased estimating equations. Moreover, we apply the proposed empirical likelihood estimators to the analysis of the alcohol and blood pressure data from the 2003–2004 NHANES.

As a nonparametric method, empirical likelihood was introduced by Owen [7, 8] for constructing confidence intervals or regions for the mean and other parameters. One advantage of using empirical likelihood is that the shape of the confidence region is determined automatically by the data. Qin and Lawless [9] have demonstrated that the empirical likelihood method can be used to solve estimating equations when the number of estimating equations exceeds the number of parameters. For theoretical developments of the empirical likelihood method, we refer readers to Hall and La Scala [10] and Owen [11]. For the analysis of missing data using the empirical or nonparametric likelihood method, see, for example, Wang and Rao [12, 13], Chen et al. [14], Liang et al. [15], Stute et al. [16], Qin and Zhang [17], and Wang and Dai [18], among others.

This paper is organized as follows. In Section 2, we propose three unbiased estimating functions and study maximum empirical likelihood estimation of the underlying regression parameters. In Section 3, we study efficiency comparisons between the proposed and existing estimators. In Section 4, the proposed methodology is illustrated using a data set from the US National Health and Nutrition Examination Survey (NHANES). Simulation results are presented in Section 5 and concluding remarks are given in Section 6. Proofs of the main theoretical results are relegated to the Appendix in Section 7.

2 Methodology

In clinical trials and sample surveys, completely observed covariate information is often not available for every subject. We consider a regression analysis of an outcome variable Y on a vector Z of covariates which are always observed and a vector X of covariates which can be missing for some subjects. Our interest lies in the estimation of the unknown p×1 vector of regression parameters, β, characterized by the conditional mean model

(1)   Y = μ(X,Z,β) + ε

for a specified, possibly nonlinear, link function μ(X,Z,β), where the random error ε satisfies E(ε|X,Z)=0 so that E(Y|X,Z)=μ(X,Z,β), and the joint distribution of the regressors (X,Z) is left completely unspecified. Conditional mean models include familiar linear and logistic regression models. The regression analysis with missing covariate data has received much attention recently in the statistical literature; see, for example, Little and Rubin [1], Little and Schluchter [19], Ibrahim [20], Little [21], Ibrahim and Lipsitz [22], Lipsitz and Ibrahim [23], and Ibrahim et al. [24]. Let D denote a binary non-missing indicator such that D=1 if the covariate X is observed and D=0 if X is missing. When the covariate vector X is missing at random, the probability of X being observed is commonly modeled by a parametric model for the propensity score P(D=1|Y=y,Z=z)=P(D=1|Y=y,X=x,Z=z); the parameters in this parametric propensity score model can be consistently estimated by the fully observed data on (Y,Z,D). The validity of the commonly used IPW and AIPWCC methods is contingent upon the correct specification of a propensity score model for the missingness mechanism P(D=1|Y=y,Z=z) under the MAR assumption.

Let (Y_1,X_1,Z_1,D_1), …, (Y_n,X_n,Z_n,D_n) denote a random sample of (Y,X,Z,D). We are interested in estimating the regression parameter β based on the observed data {(Y_i, D_iX_i, Z_i, D_i), i = 1, …, n} available for analysis. Let U(Y,X,Z,β) denote a p×1 specific full-data unbiased estimating function for β; a common choice for U(Y,X,Z,β) is A(X,Z,β){Y − μ(X,Z,β)} under the conditional mean model (1), where A(X,Z,β) is a vector of p known functions of (X,Z) and β. In the absence of missing data, since E{U(Y,X,Z,β)} = 0, a full-data estimator β̂_f of β solves the full-data estimating equation ∑_{i=1}^n U(Y_i,X_i,Z_i,β) = 0. With missing covariate data, a complete-case estimator β̂_c of β solves the complete-case estimating equation

(2)   ∑_{i=1}^n D_i U(Y_i, X_i, Z_i, β) = 0;

it is well known that β̂_c can be biased when X is not missing completely at random (MCAR). Under the nonignorable missing-data mechanism in which the missingness of X depends on the value of X itself, we cannot directly estimate the underlying missingness mechanism P(D=1|Y=y,X=x,Z=z) on the basis of the observed data {(Y_i, D_iX_i, Z_i, D_i), i = 1, …, n}, because X_i is missing whenever D_i = 0. As a result, the propensity score approach based on modeling P(D=1|Y=y,X=x,Z=z) is not applicable to the estimation of β under model (1) with nonignorable missingness of X; in particular, the MAR-based IPW and AIPWCC methods cannot be applied to estimate β in this context.
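For concreteness, the complete-case equation (2) can be solved directly in the linear case. Below is a minimal numerical sketch (ours, not from the paper) for a hypothetical linear mean model μ(X,Z,β) = β_1X + β_2Z with U(Y,X,Z,β) = (X,Z)^τ{Y − μ(X,Z,β)}; solving eq. (2) then reduces to least squares on the complete cases, and the estimate remains consistent even though the missingness of X depends on X itself, because D ⊥ Y | X, Z holds in the simulated mechanism.

```python
import numpy as np

def cca_estimate(y, x, z, d):
    """Complete-case estimate of beta for the linear mean model
    mu(x, z, beta) = beta_1 * x + beta_2 * z with U = (x, z)^T (y - mu).
    Solving eq. (2) is then least squares on the complete cases."""
    cc = d == 1
    A = np.column_stack([x[cc], z[cc]])      # A(X, Z) = (X, Z)^T
    beta, *_ = np.linalg.lstsq(A, y[cc], rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = rng.normal(size=n)
y = 1.5 * x - 0.5 * z + rng.normal(size=n)
# missingness of x depends on x itself (MNAR), but D is independent
# of Y given (X, Z), so the complete-case analysis stays consistent
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
beta_cc = cca_estimate(y, x, z, d)
```

With a large sample, `beta_cc` lands close to the true value (1.5, −0.5) despite roughly half the covariate values being missing not at random.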

Under the assumption that the missingness of X is independent of the outcome variable Y conditional on the covariates X and Z, namely, that D and Y are conditionally independent given X and Z, written D ⊥ Y | X, Z, the complete-case estimator β̂_c is a consistent estimator of the regression parameter β. Although the complete-case analysis using the estimating eq. (2) provides valid inferences for β under the conditional independence assumption D ⊥ Y | X, Z, the resulting complete-case estimator β̂_c is inefficient in that it fails to draw on the observed information contained in the incomplete cases. To improve upon the efficiency of the complete-case analysis, Bartlett et al. [4] have proposed an additional model for the probability of missingness given the fully observed outcome variable Y and the fully observed covariate Z, making the same conditional independence assumption for missingness as in the complete-case analysis. Specifically, under the conditional independence assumption D ⊥ Y | X, Z, the probability of X being observed is described by the probability model π_0(y,z) = P(D=1|Y=y,Z=z). Let π(y,z;γ) represent a working probability model for π_0(y,z), where π(y,z;γ) is a strictly positive function of (y,z,γ) and γ is a q×1 vector parameter. Let γ_0 denote the true value of γ; if the working probability model is correctly specified, then π(y,z;γ_0) = π_0(y,z) = P(D=1|Y=y,Z=z). Since the model π(y,z;γ) for P(D=1|Y=y,Z=z) involves only the fully observed variables Y and Z, we can estimate γ by the method of maximum likelihood based on the fully observed data (Y_1,Z_1,D_1), …, (Y_n,Z_n,D_n). Indeed, we can estimate γ by the maximum likelihood estimator γ̂_n, which maximizes the binomial likelihood:

L_B(γ) = ∏_{i=1}^n {π(Y_i,Z_i;γ)}^{D_i} {1 − π(Y_i,Z_i;γ)}^{1−D_i}

and is a solution to the following system of score equations:

(3)   U_n(γ) ≡ (1/n) ∂log L_B(γ)/∂γ = (1/n) ∑_{i=1}^n {D_i − π(Y_i,Z_i;γ)} π_γ(Y_i,Z_i;γ) / [π(Y_i,Z_i;γ){1 − π(Y_i,Z_i;γ)}] = 0,

where π_γ(y,z;γ) = ∂π(y,z;γ)/∂γ. Although the additional model π(y,z;γ) for the probability of missingness is not a model for the underlying missingness mechanism P(D=1|Y=y,X=x,Z=z) under the conditional independence assumption, it allows us to develop an alternative approach to estimation of β. Under the conditional independence assumption D ⊥ Y | X, Z and based on the working probability model π(y,z;γ) for the missingness of X, Bartlett et al. [4] have proposed an augmented CCA estimation method for estimating β. To describe their method, we posit a working conditional score model for m_0(Y,Z;β,ξ_0) = E{U(Y,X,Z,β) | Y, Z, D = 1} in terms of a parametric model m(Y,Z;β,ξ) with the same dimension as β, where m(Y,Z;β,ξ) is an arbitrary, user-specified, and possibly data-dependent working regression function of (Y,Z;β,ξ) and ξ is an additional finite-dimensional parameter with its true value denoted as ξ_0. For each fixed (γ,ξ), let β̂_acc(γ,ξ) be defined as a solution to the system of augmented complete-case estimating equations:

(4)   ∑_{i=1}^n [D_i U(Y_i,X_i,Z_i,β) − {D_i − π(Y_i,Z_i;γ)} m(Y_i,Z_i;β,ξ)] = 0.

Then the augmented complete case (ACC) estimators of β are given, respectively, by β̂_acc0 = β̂_acc(γ_0, ξ̂_n) when γ_0 is known and β̂_acc = β̂_acc(γ̂_n, ξ̂_n) when γ_0 is unknown, where ξ̂_n is an estimator of ξ based on the complete data {(Y_i,X_i,Z_i): D_i = 1, i = 1, …, n}; for example, ξ̂_n may be chosen to be the possibly nonlinear least squares estimator of ξ from the regression of U(Y_i,X_i,Z_i;β̂_c) on (Y_i,Z_i) among the complete cases {i: D_i = 1}. We assume throughout that ξ̂_n is √n-consistent for some constant ξ, that is, ξ̂_n − ξ = O_p(n^{−1/2}) under suitable regularity conditions. When the working probability model π(y,z;γ) is correctly specified for P(D=1|Y=y,Z=z), Bartlett et al. [4] have shown under suitable regularity conditions that the augmented complete case (ACC) estimator β̂_acc is consistent and asymptotically normal for any working regression function m(Y,Z;β,ξ). They have also shown that for a given choice of U(Y,X,Z,β), the optimal choice for the working regression function m(Y,Z;β,ξ) is given by

(5)   m_0(Y,Z;β,ξ_0) = E{U(Y,X,Z,β) | Y, Z, D = 1}.

When the working regression function m(y,z;β,ξ) is correctly specified to be the optimal regression function in eq. (5), we have ξ = ξ_0. In this case, the augmented complete case estimator β̂_acc improves upon the efficiency of the complete case estimator β̂_c by minimizing the variance of β̂_acc among all choices of the working regression function m(Y,Z;β,ξ) for any given choice of U(Y,X,Z,β). In addition, Bartlett et al. [4] have proposed a modified estimator of β, which is guaranteed to be at least as efficient as the complete case estimator β̂_c for any choice of the working regression function m(Y,Z;β,ξ).
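The two-step ACC construction — estimate γ by maximizing the binomial likelihood L_B(γ), then solve the augmented estimating equations (4) — can be sketched numerically. The sketch below (ours, with a hypothetical data-generating mechanism) uses a linear mean model μ(X,Z,β) = β_1X + β_2Z, takes the working regression function m linear in (1, Y, Z) with ξ̂_n obtained by regressing U(Y_i,X_i,Z_i;β̂_c) on (1, Y_i, Z_i) among the complete cases and plugged in at β̂_c, and, for simplicity, generates missingness depending only on Z so that the logistic working model for π is correctly specified.

```python
import numpy as np

def expit(t):
    return 1 / (1 + np.exp(-t))

def fit_logistic(W, d, iters=25):
    # Newton-Raphson solve of the score equations (3) for a logistic
    # working model pi(y, z; gamma) = expit(gamma^T (1, y, z))
    gamma = np.zeros(W.shape[1])
    for _ in range(iters):
        p = expit(W @ gamma)
        hess = (W * (p * (1 - p))[:, None]).T @ W
        gamma = gamma + np.linalg.solve(hess, W.T @ (d - p))
    return gamma

def acc_estimate(y, x, z, d):
    A = np.column_stack([x, z])                 # A(X, Z) in U = A (Y - mu)
    cc = d == 1
    # step 1: complete-case estimate beta_c (least squares on D = 1)
    beta_c, *_ = np.linalg.lstsq(A[cc], y[cc], rcond=None)
    # step 2: gamma-hat from logistic regression of D on (1, Y, Z)
    W = np.column_stack([np.ones_like(y), y, z])
    pi_hat = expit(W @ fit_logistic(W, d))
    # step 3: working m(Y, Z): regress U_i(beta_c) on (1, Y, Z) over D = 1
    U_cc = A[cc] * (y[cc] - A[cc] @ beta_c)[:, None]
    Xi, *_ = np.linalg.lstsq(W[cc], U_cc, rcond=None)
    m_all = W @ Xi
    # step 4: solve the augmented estimating equations (4); linear in beta
    lhs = A[cc].T @ A[cc]
    rhs = A[cc].T @ y[cc] - ((d - pi_hat)[:, None] * m_all).sum(axis=0)
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(1)
n = 4000
z = rng.normal(size=n)
x = rng.normal(size=n)
y = 1.5 * x - 0.5 * z + rng.normal(size=n)
d = (rng.uniform(size=n) < expit(0.5 + z)).astype(float)  # depends on Z only
beta_acc = acc_estimate(y, x, z, d)
```

Because the augmented term {D_i − π(Y_i,Z_i;γ̂_n)}m(Y_i,Z_i;β,ξ̂_n) has mean zero under a correctly specified π model, `beta_acc` remains consistent for (1.5, −0.5) for this (and any other) choice of working m.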

To explore the empirical likelihood approach to the estimation of the regression parameter β under model (1) with the covariate X subject to nonignorable missingness, we are motivated by eqs (3) and (4) to define the following estimating functions on the basis of the working probability model π(y,z;γ) and the working regression model m(y,z;β,ξ):

g_1(Y,Z,D,X;β) = D U(Y,X,Z,β),
g_2(Y,Z,D;β,γ,ξ) = {D − π(Y,Z;γ)} m(Y,Z;β,ξ),
g_3(Y,Z,D;γ) = {D − π(Y,Z;γ)} π_γ(Y,Z;γ) / [π(Y,Z;γ){1 − π(Y,Z;γ)}],
g(Y,Z,D,X;β,γ,ξ) = (g_1^τ(Y,Z,D,X;β), g_2^τ(Y,Z,D;β,γ,ξ))^τ,
G(Y,Z,D,X;β,γ,ξ) = (g_1^τ(Y,Z,D,X;β), g_2^τ(Y,Z,D;β,γ,ξ), g_3^τ(Y,Z,D;γ))^τ.

The first estimating function g_1(Y,Z,D,X;β) is identical to the complete case estimating function in eq. (2) and has mean zero, E_F{g_1(Y,Z,D,X;β_0)} = 0, when the conditional independence assumption D ⊥ Y | X, Z holds, where (Y,Z,D,X) ∼ F. The second estimating function g_2(Y,Z,D;β,γ,ξ) involves both subjects with X observed and those with X missing and has mean zero, E_F{g_2(Y,Z,D;β,γ_0,ξ)} = 0, for any working regression model m(y,z;β,ξ) provided that the probability model π(y,z;γ) for P(D=1|Y=y,Z=z) is correctly specified, since then E{D − π(Y,Z;γ_0) | Y, Z} = 0. The third estimating function g_3(Y,Z,D;γ) is the score function of γ and is optional when the true probability model π(y,z;γ_0) for P(D=1|Y=y,Z=z) is completely known. Furthermore, when the working propensity score π(y,z;γ) is correctly specified for P(D=1|Y=y,Z=z), we have E_F{g_3(Y,Z,D;γ_0)} = 0.
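A quick simulation check (ours, under a hypothetical mechanism in which missingness depends only on Z, so a logistic π model holds exactly) confirms that g_1, g_2, and g_3 all have mean approximately zero at the true parameter values; for a logistic π with linear predictor γ^τ(1, y, z), the score simplifies because π_γ = π(1−π)(1, y, z)^τ, so g_3 = {D − π}(1, y, z)^τ.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
z = rng.normal(size=n)
x = rng.normal(size=n)
y = 1.5 * x - 0.5 * z + rng.normal(size=n)
pi0 = 1 / (1 + np.exp(-(0.5 + z)))     # true pi(y, z; gamma0), free of y
d = (rng.uniform(size=n) < pi0).astype(float)

A = np.column_stack([x, z])            # A(X, Z) in U = A (Y - mu)
beta0 = np.array([1.5, -0.5])
g1 = (d[:, None] * A) * (y - A @ beta0)[:, None]     # D U(Y, X, Z; beta0)
m = np.column_stack([y, z])            # an arbitrary working m(Y, Z)
g2 = (d - pi0)[:, None] * m                          # {D - pi} m
W = np.column_stack([np.ones(n), y, z])
g3 = (d - pi0)[:, None] * W            # score: pi_gamma = pi(1 - pi) W

means = [g1.mean(axis=0), g2.mean(axis=0), g3.mean(axis=0)]
```

All three empirical means are within simulation error of zero, illustrating that g_2 is unbiased for *any* working m once π is correctly specified.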

2.1 Maximum empirical likelihood estimation of β when γ0 is known

In this subsection, we assume that the true probability model π(y,z;γ_0) that we have adopted for modeling P(D=1|Y=y,Z=z) is completely known. When γ_0 is known, we impose constraints on g_1(Y,Z,D,X;β) and g_2(Y,Z,D;β,γ_0,ξ̂_n) and maximize the nonparametric likelihood L(F) = ∏_{i=1}^n p_i subject to the constraints

∑_{i=1}^n p_i = 1,   p_i ≥ 0,   ∑_{i=1}^n p_i g(Y_i,Z_i,D_i,X_i;β,γ_0,ξ̂_n) = 0,

where p_i = dF(Y_i,Z_i,D_i,X_i) for i = 1, …, n and ξ̂_n is some √n-consistent estimator of ξ based on the complete cases {i: D_i = 1}. For fixed β, an application of the method of Lagrange multipliers shows that the maximum value of L(F) is attained at

p̃_{1i}(β) = (1/n) · 1 / {1 + λ̂_{1n}^τ g(Y_i,Z_i,D_i,X_i;β,γ_0,ξ̂_n)},

for i = 1, …, n, where the Lagrange multiplier λ̂_{1n} = λ̂_{1n}(β) is determined by

(6)   U_{1n}(λ) ≡ U_{1n}(λ,β,ξ̂_n) = (1/n) ∑_{i=1}^n g(Y_i,Z_i,D_i,X_i;β,γ_0,ξ̂_n) / {1 + λ^τ g(Y_i,Z_i,D_i,X_i;β,γ_0,ξ̂_n)} = 0.

After profiling out the p̃_{1i}'s, the profile log-likelihood function of β is given by

(7)   ℓ_1(β) = ∑_{i=1}^n log p̃_{1i}(β) = −∑_{i=1}^n log[1 + λ̂_{1n}^τ(β) g(Y_i,Z_i,D_i,X_i;β,γ_0,ξ̂_n)] − n log n.

Let β̂_el1 denote the maximum empirical likelihood estimator of β, which maximizes ℓ_1(β). The asymptotic distribution of β̂_el1 is established in the following theorem.

Theorem 1: Under the regularity conditions (A1)–(A6) stated in the Appendix, if the true probability model π(y,z;γ_0) = π_0(y,z) = P(D=1|Y=y,Z=z) is known, we can write

β̂_el1 − β_0 = (1/n) ∑_{i=1}^n C_1^{−1}{g_1(Y_i,Z_i,D_i,X_i;β_0) − C_4C_5^{−1} g_2(Y_i,Z_i,D_i;β_0,γ_0,ξ)} + o_p(n^{−1/2}),

where C_1, C_4, C_5 are defined in eqs (14) and (19), respectively. As a result, √n(β̂_el1 − β_0) →_d N(0, Σ_el1), where

Σ_el1 = C_1^{−1} Var{g_1(Y,Z,D,X;β_0) − C_4C_5^{−1} g_2(Y,Z,D;β_0,γ_0,ξ)} (C_1^{−1})^τ.
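Numerically, the inner step of this procedure solves eq. (6) for the Lagrange multiplier at each fixed β, after which the outer step maximizes ℓ_1(β). The inner solve can be sketched as a damped Newton iteration; the sketch below is generic (ours), operating on an n × k matrix whose rows would be the stacked estimating-function values g(Y_i,Z_i,D_i,X_i;β,γ_0,ξ̂_n) — here replaced by stand-in mean-zero draws — and recovers weights p̃_{1i} satisfying the constraints.

```python
import numpy as np

def el_lambda(g, iters=100, tol=1e-12):
    """Newton solve of (1/n) sum_i g_i / (1 + lam^T g_i) = 0 (eq. (6)),
    with step-halving to keep every weight 1 + lam^T g_i positive."""
    n, k = g.shape
    lam = np.zeros(k)
    for _ in range(iters):
        w = 1 + g @ lam
        U = (g / w[:, None]).mean(axis=0)
        if np.max(np.abs(U)) < tol:
            break
        gw = g / w[:, None]
        J = -(gw.T @ gw) / n               # Jacobian dU/dlam^T
        step = np.linalg.solve(J, U)
        t = 1.0
        while np.any(1 + g @ (lam - t * step) <= 1e-10):
            t /= 2                         # damp to stay in the feasible set
        lam = lam - t * step
    return lam

rng = np.random.default_rng(3)
g = rng.normal(size=(1000, 3))             # stand-in estimating-function rows
lam = el_lambda(g)
p = 1 / (1000 * (1 + g @ lam))             # implied EL weights p_tilde_i
```

At the solution the weights automatically sum to one (a standard identity: ∑_i 1/(1+λ^τg_i) = n − λ^τ ∑_i g_i/(1+λ^τg_i) = n) and the weighted mean of the estimating functions vanishes, which is exactly the constraint set above.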

2.2 Maximum empirical likelihood estimation of β when γ0 is unknown

In practice, the true probability model π(y,z;γ_0) is often unknown because it is difficult to specify a completely known probability model for π_0(y,z) = P(D=1|Y=y,Z=z). Thus, we assume that π(y,z;γ) is the posited working probability model for π_0(y,z). We consider two approaches to the estimation of β: the first imposes no constraints on g_3(Y,Z,D;γ), while the second does.

In the first approach, we impose constraints on the two estimating functions g_1(Y,Z,D,X;β) and g_2(Y,Z,D;β,γ,ξ), and employ the maximum likelihood estimator γ̂_n of γ. Specifically, we maximize the nonparametric likelihood L(F) = ∏_{i=1}^n p_i subject to the constraints ∑_{i=1}^n p_i = 1, p_i ≥ 0, and ∑_{i=1}^n p_i g(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n) = 0, where p_i = dF(Y_i,Z_i,D_i,X_i) for i = 1, …, n. Similar to eqs (6) and (7), we propose to estimate β by the maximum empirical likelihood estimator β̂_el2, which maximizes the profile log empirical likelihood function of β:

ℓ_2(β) = −∑_{i=1}^n log{1 + λ̂_{2n}^τ(β) g(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n)} − n log n,

where the Lagrange multiplier λ̂_{2n} = λ̂_{2n}(β) solves

U_{2n}(λ) ≡ U_{2n}(λ,β,γ̂_n,ξ̂_n) = (1/n) ∑_{i=1}^n g(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n) / {1 + λ^τ g(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n)} = 0.

The motivation of the second approach is to improve the efficiency of β̂_el2 by adding the score estimating function g_3(Y,Z,D;γ) to the constraints associated with ℓ_2(β). Thus, in the second approach, we impose constraints on g_1(Y,Z,D,X;β), g_2(Y,Z,D;β,γ̂_n,ξ̂_n), and g_3(Y,Z,D;γ̂_n) and maximize L(F) = ∏_{i=1}^n p_i subject to the constraints ∑_{i=1}^n p_i = 1, p_i ≥ 0, and ∑_{i=1}^n p_i G(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n) = 0. This leads to estimation of β by maximizing the profile log empirical likelihood function ℓ_3(β) = ℓ_3(β, λ̂_{3n}(β), γ̂_n) of β, where

(8)   ℓ_3(β,λ,γ) = −∑_{i=1}^n log{1 + λ^τ G(Y_i,Z_i,D_i,X_i;β,γ,ξ̂_n)} − n log n,

and, for fixed β, the Lagrange multiplier λ̂_{3n} = λ̂_{3n}(β) is determined by

(9)   U_{3n}(λ) ≡ U_{3n}(λ,β,γ̂_n,ξ̂_n) = (1/n) ∑_{i=1}^n G(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n) / {1 + λ^τ G(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n)} = 0.

Let β̂_el3 denote the maximum empirical likelihood estimator of β that maximizes ℓ_3(β). The following theorem summarizes the large-sample results for β̂_el2 and β̂_el3.

Theorem 2: Under the regularity conditions (A1)–(A6) stated in the Appendix, if the working probability model π(y,z;γ) is correctly specified, we can write

β̂_el2 − β_0 = (1/n) ∑_{i=1}^n C_1^{−1}[g_1(Y_i,Z_i,D_i,X_i;β_0) − C_4C_5^{−1}{g_2(Y_i,Z_i,D_i;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y_i,Z_i,D_i;γ_0)}] + o_p(n^{−1/2}),

β̂_el3 − β_0 = (1/n) ∑_{i=1}^n C_1^{−1}[g_1(Y_i,Z_i,D_i,X_i;β_0) − C_7{g_2(Y_i,Z_i,D_i;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y_i,Z_i,D_i;γ_0)}] + o_p(n^{−1/2}),

where S_γ, C_2, C_7 are, respectively, defined in eqs (11), (14), and (19). As a result, √n(β̂_el2 − β_0) →_d N(0, Σ_el2) and √n(β̂_el3 − β_0) →_d N(0, Σ_el3), where

Σ_el2 = C_1^{−1} Var[g_1(Y,Z,D,X;β_0) − C_4C_5^{−1}{g_2(Y,Z,D;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y,Z,D;γ_0)}] (C_1^{−1})^τ,
Σ_el3 = C_1^{−1} Var[g_1(Y,Z,D,X;β_0) − C_7{g_2(Y,Z,D;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y,Z,D;γ_0)}] (C_1^{−1})^τ.

The asymptotic expansions of the maximum empirical likelihood estimators β̂_el1, β̂_el2, and β̂_el3 in Theorems 1 and 2 provide a revealing insight into the potential efficiency gain from employing the empirical likelihood method to incorporate the working probability model of missingness and the working regression model into the analysis of nonignorable covariate-missing data problems; we will discuss the efficiency comparison of β̂_el1, β̂_el2, and β̂_el3 with other competitive estimators in the next section. To make inference about β, we need to consistently estimate the asymptotic covariance matrices Σ_el1, Σ_el2, and Σ_el3 given in Theorems 1 and 2. Here, we present an empirical-likelihood-based estimator of Σ_el3; the empirical-likelihood-based estimators of Σ_el1 and Σ_el2 can be derived similarly. Throughout this paper, let E_3n(·) denote the empirical likelihood mean operator associated with β̂_el3. For example, E_3n(Y) = ∑_{i=1}^n p̃_{3i}(β̂_el3) Y_i, where

p̃_{3i}(β) = (1/n) · 1 / {1 + λ̂_{3n}^τ G(Y_i,Z_i,D_i,X_i;β,γ̂_n,ξ̂_n)},   i = 1, …, n.

Then the asymptotic covariance matrix Σel3 of βˆel3 can be consistently estimated by

Σ̃_el3 = C̃_1^{−1} E_3n([g_1(Y,Z,D,X;β̂_el3) − C̃_7{g_2(Y,Z,D;β̂_el3,γ̂_n,ξ̂_n) − C̃_2S̃_γ^{−1} g_3(Y,Z,D;γ̂_n)}]^{⊗2}) (C̃_1^{−1})^τ,

where

C̃_1 = −E_3n{D ∂U(Y,X,Z;β̂_el3)/∂β^τ},
C̃_2 = E_3n{m(Y,Z;β̂_el3,ξ̂_n) π_γ^τ(Y,Z;γ̂_n)},
C̃_4 = E_3n[D U(Y,X,Z;β̂_el3){1 − π(Y,Z;γ̂_n)} m^τ(Y,Z;β̂_el3,ξ̂_n)],
C̃_5 = E_3n[π(Y,Z;γ̂_n){1 − π(Y,Z;γ̂_n)} m(Y,Z;β̂_el3,ξ̂_n) m^τ(Y,Z;β̂_el3,ξ̂_n)],
C̃_6 = E_3n[D U(Y,X,Z;β̂_el3) π_γ^τ(Y,Z;γ̂_n)/π(Y,Z;γ̂_n)],
S̃_γ = E_3n[π_γ(Y,Z;γ̂_n) π_γ^τ(Y,Z;γ̂_n) / (π(Y,Z;γ̂_n){1 − π(Y,Z;γ̂_n)})],
C̃_7 = (C̃_4 − C̃_6S̃_γ^{−1}C̃_2^τ)(C̃_5 − C̃_2S̃_γ^{−1}C̃_2^τ)^{−1}.

Here we define A^{⊗2} = AA^τ for a matrix A. The asymptotic normality of β̂_el3 in Theorem 2 implies that an approximate level 1 − α Wald confidence region for the regression parameter β is the ellipsoid determined by all β such that

n(β̂_el3 − β)^τ Σ̃_el3^{−1} (β̂_el3 − β) ≤ χ²_p(1 − α),

where χ²_p(α) is the lower (100α)th percentile of the χ²_p distribution, satisfying P{χ²_p ≤ χ²_p(α)} = α. The hypothesis H_0: β = β_0 is rejected in favor of H_1: β ≠ β_0, at a level of significance approximately α, if the observed Wald test statistic n(β̂_el3 − β_0)^τ Σ̃_el3^{−1} (β̂_el3 − β_0) > χ²_p(1 − α). In a similar manner, we can construct confidence intervals and test statistics for components of β.
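Once β̂_el3 and Σ̃_el3 are available, the Wald statistic and its χ²_p critical value are immediate to compute. The following sketch (ours) uses placeholder values for the estimate and covariance, since the real quantities come from a fitted model.

```python
import numpy as np
from scipy.stats import chi2

def wald_statistic(beta_hat, Sigma_hat, beta0, n):
    """n (beta_hat - beta0)^T Sigma_hat^{-1} (beta_hat - beta0)."""
    diff = np.asarray(beta_hat) - np.asarray(beta0)
    return float(n * diff @ np.linalg.solve(Sigma_hat, diff))

# placeholder values standing in for (beta_el3, Sigma_el3) from a fit
beta_hat = np.array([1.5, -0.5])
Sigma_hat = np.array([[2.0, 0.3], [0.3, 1.0]])
n, alpha = 2111, 0.05
crit = chi2.ppf(1 - alpha, df=len(beta_hat))     # chi^2_p(1 - alpha)

stat_null = wald_statistic(beta_hat, Sigma_hat, beta_hat, n)  # at H0 = fit
stat_far = wald_statistic(beta_hat, Sigma_hat, [0.0, 0.0], n)
reject_far = stat_far > crit
```

A hypothesized β_0 lies inside the level 1 − α confidence ellipsoid exactly when its Wald statistic does not exceed `crit`.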

3 Efficiency comparison

To compare the proposed maximum empirical likelihood estimators β̂_el1, β̂_el2, and β̂_el3 with other existing estimators, we employ the notion of influence functions. For an exposition on influence functions, see, for example, Tsiatis [25], Chapter 3. For two matrices M_1 and M_2, the notations M_1 ≤ M_2 and M_2 ≥ M_1 mean that M_2 − M_1 is nonnegative definite. Recall that β̂_c is the complete-case estimator defined in eq. (2), and β̂_acc0 and β̂_acc are the augmented complete-case estimators defined in eq. (4). In addition to β̂_acc, Bartlett et al. [4] have also proposed a modified estimator β̂_acc2 which, for a given choice of working regression model m(y,z;β,ξ), solves

∑_{i=1}^n [D_i U(Y_i,X_i,Z_i,β) − Ĉ_7{D_i − π(Y_i,Z_i;γ̂_n)} m(Y_i,Z_i;β,ξ̂_n)] = 0,

where Ĉ_7 = (Ĉ_4 − Ĉ_6Ŝ_γ^{−1}Ĉ_2^τ)(Ĉ_5 − Ĉ_2Ŝ_γ^{−1}Ĉ_2^τ)^{−1} is a consistent estimator of C_7 defined in eq. (19) and

Ĉ_2 = E_n{m(Y,Z;β̂_c,ξ̂_n) π_γ^τ(Y,Z;γ̂_n)},
Ĉ_4 = E_n[D U(Y,X,Z;β̂_c){1 − π(Y,Z;γ̂_n)} m^τ(Y,Z;β̂_c,ξ̂_n)],
Ĉ_5 = E_n[π(Y,Z;γ̂_n){1 − π(Y,Z;γ̂_n)} m(Y,Z;β̂_c,ξ̂_n) m^τ(Y,Z;β̂_c,ξ̂_n)],
Ĉ_6 = E_n[D U(Y,X,Z;β̂_c) π_γ^τ(Y,Z;γ̂_n)/π(Y,Z;γ̂_n)],
Ŝ_γ = E_n[π_γ(Y,Z;γ̂_n) π_γ^τ(Y,Z;γ̂_n) / (π(Y,Z;γ̂_n){1 − π(Y,Z;γ̂_n)})].

Throughout, E_n(·) represents the empirical mean operator. Bartlett et al. [4] have shown that β̂_acc2 is at least as efficient as β̂_c.

We first compare βˆel1 with the complete-case estimator βˆc and the augmented complete-case estimator βˆacc0. According to Bartlett et al. [4], the influence function of βˆc is given by

ψ_c(Y,Z,D,X;β_0) = C_1^{−1} g_1(Y,Z,D,X;β_0).

Furthermore, it can be shown that the influence function of βˆacc0 is equal to

ψ_acc0(Y,Z,D,X;β_0,γ_0) = C_1^{−1}{g_1(Y,Z,D,X;β_0) − g_2(Y,Z,D;β_0,γ_0,ξ)}.

Moreover, the asymptotic expansion of β̂_el1 in Theorem 1 implies that the influence function of β̂_el1 is given by

ψ_el1(Y,Z,D,X;β_0,γ_0,ξ) = C_1^{−1}{g_1(Y,Z,D,X;β_0) − C_4C_5^{−1} g_2(Y,Z,D;β_0,γ_0,ξ)}.

To compare the asymptotic performances of βˆel1, βˆc, and βˆacc0, let

Λ_2 = {A g_2(Y,Z,D;β_0,γ_0,ξ) for all 1×p real vectors A}

denote the linear space spanned by g_2(Y,Z,D;β_0,γ_0,ξ) and let Π{g_1(Y,Z,D,X;β_0) | Λ_2} stand for the projection of g_1(Y,Z,D,X;β_0) onto the space Λ_2. It can be shown after some algebra that

Π{g_1(Y,Z,D,X;β_0) | Λ_2} = C_4C_5^{−1} g_2(Y,Z,D;β_0,γ_0,ξ).

Thus, the influence function of βˆel1 can be alternatively written as

ψ_el1(Y,Z,D,X;β_0,γ_0,ξ) = ψ_c(Y,Z,D,X;β_0) − Π{ψ_c(Y,Z,D,X;β_0) | Λ_2}.

This implies that ψel1(Y,Z,D,X;β0,γ0,ξ) is the residual from the projection of ψc(Y,Z,D,X;β0) onto the space Λ2. It now follows from the theory of influence functions that

Σ_el1 = Var{ψ_el1(Y,Z,D,X;β_0,γ_0,ξ)} ≤ Var{ψ_c(Y,Z,D,X;β_0)} = Σ_c.

In addition, since the second term in ψacc0(Y,Z,D,X;β0,γ0) belongs to the space Λ2, we have

Σ_el1 = Var{ψ_el1(Y,Z,D,X;β_0,γ_0,ξ)} ≤ Var{ψ_acc0(Y,Z,D,X;β_0,γ_0)} = Σ_acc0.

These two facts demonstrate that βˆel1 is asymptotically at least as efficient as both βˆc and βˆacc0 for any working regression function m(y,z;β,ξ), whether or not it correctly identifies the optimal regression function E{U(Y,X,Z,β)|Y,Z,D=1}. Nevertheless, βˆacc0 is not guaranteed to be asymptotically at least as efficient as βˆc.
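The inequality Σ_el1 ≤ Σ_c reflects a general fact: subtracting the projection onto the span of g_2 can only reduce componentwise variance. This can be checked numerically with simulated draws standing in for ψ_c and g_2 (a generic sketch of ours, not tied to the model):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
g2 = rng.normal(size=(n, 2))
# psi_c correlated with g2, so the projection is nontrivial
psi_c = 0.8 * g2 @ np.array([[1.0, 0.2], [0.2, 1.0]]) + rng.normal(size=(n, 2))

# sample projection coefficient B = Cov(psi_c, g2) Cov(g2)^{-1}
g2c = g2 - g2.mean(axis=0)
pc = psi_c - psi_c.mean(axis=0)
B = np.linalg.solve((g2c.T @ g2c) / n, (g2c.T @ pc) / n).T
resid = psi_c - g2 @ B.T                 # residual from the projection

var_psi = psi_c.var(axis=0)
var_resid = resid.var(axis=0)
```

Componentwise, `var_resid` never exceeds `var_psi`; this is the sample analogue of Σ_el1 ≤ Σ_c, with equality only when ψ_c is uncorrelated with g_2.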

Next, we compare the three proposed estimators β̂_el1, β̂_el2, and β̂_el3. The asymptotic expansions of β̂_el2 and β̂_el3 in Theorem 2 imply that their influence functions are given by

ψ_el2(Y,Z,D,X;β_0,γ_0,ξ) = C_1^{−1}[g_1(Y,Z,D,X;β_0) − C_4C_5^{−1}{g_2(Y,Z,D;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y,Z,D;γ_0)}],
ψ_el3(Y,Z,D,X;β_0,γ_0,ξ) = C_1^{−1}[g_1(Y,Z,D,X;β_0) − C_7{g_2(Y,Z,D;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y,Z,D;γ_0)}].

To compare the asymptotic performances among βˆel1, βˆel2, and βˆel3, let

Λ_3 = {A g_3(Y,Z,D;γ_0) for all 1×q real vectors A}

denote the linear space spanned by g3(Y,Z,D;γ0). It can be shown after some algebra that

(10)   Π{g_2(Y,Z,D;β_0,γ_0,ξ) | Λ_3} = C_2S_γ^{−1} g_3(Y,Z,D;γ_0).

The residual from the projection of g2(Y,Z,D;β0,γ0,ξ) onto the space Λ3 is orthogonal to components of the estimating function g3(Y,Z,D;γ0) and is given by

h_23(Y,Z,D;β_0,γ_0,ξ) ≡ g_2(Y,Z,D;β_0,γ_0,ξ) − C_2S_γ^{−1} g_3(Y,Z,D;γ_0) = Π{g_2(Y,Z,D;β_0,γ_0,ξ) | Λ_3^⊥},

where Λ_3^⊥ is the orthogonal complement of Λ_3. Moreover, let

Λ_23 = {A h_23(Y,Z,D;β_0,γ_0,ξ) for all 1×p real vectors A}

represent the linear space spanned by h23(Y,Z,D;β0,γ0,ξ). Then it is seen that the projection of g1(Y,Z,D,X;β0) onto the space Λ23 is given by

Π{g_1(Y,Z,D,X;β_0) | Λ_23} = C_7 h_23(Y,Z,D;β_0,γ_0,ξ).

This, together with eq. (10), implies that ψel3(Y,Z,D,X;β0,γ0,ξ) is the residual from the projection of ψc(Y,Z,D,X;β0) onto the space Λ23, namely,

ψ_el3(Y,Z,D,X;β_0,γ_0,ξ) = ψ_c(Y,Z,D,X;β_0) − Π{ψ_c(Y,Z,D,X;β_0) | Λ_23}.

In other words, ψel3(Y,Z,D,X;β0,γ0,ξ) is the residual from the projection of ψc(Y,Z,D,X;β0) onto the linear space spanned by the residual from the projection of g2(Y,Z,D;β0,γ0,ξ) onto the linear space spanned by g3(Y,Z,D;γ0). Again, the theory of influence functions implies that

Σ_el3 = Var{ψ_el3(Y,Z,D,X;β_0,γ_0,ξ)} ≤ Var{ψ_c(Y,Z,D,X;β_0)} = Σ_c.

Furthermore, since the second term in ψel2(Y,Z,D,X;β0,γ0,ξ) belongs to the space Λ23, we have

Σ_el3 = Var{ψ_el3(Y,Z,D,X;β_0,γ_0,ξ)} ≤ Var{ψ_el2(Y,Z,D,X;β_0,γ_0,ξ)} = Σ_el2.

Consequently, β̂_el3 is asymptotically at least as efficient as both β̂_c and β̂_el2 for any working regression function m(y,z;β,ξ), whether or not it correctly identifies the optimal regression function E{U(Y,X,Z,β)|Y,Z,D=1}. Since the empirical likelihood method leading to β̂_el2 imposes fewer constraints and hence involves fewer Lagrange multipliers, β̂_el2 is computationally less intensive than β̂_el3. On the other hand, use of β̂_el2 does not ensure that its asymptotic efficiency is at least as good as that of β̂_c, through specification of a working probability model for the probability of missingness. Moreover, there is no guarantee that either β̂_el2 or β̂_el3 improves upon the asymptotic efficiency of β̂_el1, indicating that we are not guaranteed to improve on the asymptotic efficiency of estimation for β by estimating the probability of missingness even when it is known. This phenomenon contrasts with the well-known counter-intuitive fact under MAR scenarios, where we can improve on the efficiency of estimation for a parameter by estimating the propensity score even when it is known.

Finally, we compare the proposed estimators βˆel2 and βˆel3 with the augmented complete-case estimators βˆacc and βˆacc2. According to Bartlett et al. [4], the influence function of βˆacc is

ψacc(Y,Z,D,X;β0,γ0,ξ) = C1^{-1}{g1(Y,Z,D,X;β0) − h23(Y,Z,D;β0,γ0,ξ)}.

Moreover, it is seen that the influence function of βˆacc2 is identical to that of βˆel3, i.e.,

ψacc2(Y,Z,D,X;β0,γ0,ξ)=ψel3(Y,Z,D,X;β0,γ0,ξ).

Thus, βˆel3 and βˆacc2 are asymptotically equivalent. Now since the second term in the influence function ψacc(Y,Z,D,X;β0,γ0,ξ) belongs to the space Λ23, we have

Σel3 = Var{ψel3(Y,Z,D,X;β0,γ0,ξ)} = Var{ψacc2(Y,Z,D,X;β0,γ0,ξ)} ≤ Var{ψacc(Y,Z,D,X;β0,γ0,ξ)} = Σacc.

As a result, βˆel3 and βˆacc2 are asymptotically at least as efficient as both βˆel2 and βˆacc for any given working regression model m(y,z;β,ξ) posited for E{U(Y,X,Z,β)|Y,Z,D=1}. However, there does not appear to be a direct asymptotic efficiency comparison between βˆel2 and βˆacc, though the simulation study in Section 5 indicates that βˆel2 has a better small-sample performance than βˆacc in terms of root mean square error.

We close this section by pointing out that when the working regression function m(Y,Z;β,ξ) correctly specifies the true regression function E{U(Y,X,Z,β)|Y,Z,D=1}, so that m(Y,Z;β0,ξ0)=E{U(Y,X,Z,β0)|Y,Z,D=1}, it can be shown after some algebra that C2=C6, C4=C5, and C7=Ip, and hence

ψel1(Y,Z,D,X;β0,γ0,ξ) = ψacc0(Y,Z,D,X;β0,γ0),
ψel3(Y,Z,D,X;β0,γ0,ξ) = ψel2(Y,Z,D,X;β0,γ0,ξ) = ψacc2(Y,Z,D,X;β0,γ0,ξ) = ψacc(Y,Z,D,X;β0,γ0,ξ).

This implies that βˆel1 and βˆacc0 are asymptotically equivalent when γ0 is known and that βˆel2, βˆel3, βˆacc, and βˆacc2 are asymptotically equivalent when γ0 is unknown. As discussed earlier, the asymptotic efficiency of βˆel1 and βˆacc0 with γ0 known differs from that of βˆel2, βˆel3, βˆacc, and βˆacc2 with γ0 unknown, but all these estimators, βˆel1, βˆacc0, βˆel2, βˆel3, βˆacc, and βˆacc2 are asymptotically at least as efficient as βˆc.
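Before turning to the data analysis, it may help to see the empirical likelihood machinery in miniature. The sketch below is our own toy construction (not the paper's model; NumPy assumed): it estimates a scalar mean μ from two unbiased estimating functions g(x;μ) = (x − μ, x² − μ² − 1)τ, an overdetermined system of the same flavor as the one combined by βˆel2 and βˆel3. An inner Newton iteration solves the Lagrange-multiplier equation Σi Gi/(1 + λτGi) = 0, and the profile log-empirical-likelihood −Σi log(1 + λτGi) is then maximized over μ by a grid search.

```python
import numpy as np

def el_lambda(G, iters=50):
    """Newton solve of sum_i G_i/(1 + lam'G_i) = 0 for the Lagrange
    multiplier lam, with damping to keep all EL weights positive."""
    n, r = G.shape
    lam = np.zeros(r)
    for _ in range(iters):
        w = 1.0 + G @ lam                    # n-vector of 1 + lam'G_i
        f = (G / w[:, None]).sum(axis=0)     # equation to be driven to zero
        H = (G.T / w**2) @ G                 # minus the Jacobian of f
        step = np.linalg.solve(H, f)
        t = 1.0                              # halve the step until feasible
        while np.any(1.0 + G @ (lam + t * step) <= 1e-10):
            t *= 0.5
        lam = lam + t * step
    return lam

def log_el(mu, x):
    """Profile log-empirical-likelihood for the overdetermined system."""
    G = np.column_stack([x - mu, x**2 - mu**2 - 1.0])
    lam = el_lambda(G)
    return -np.log(1.0 + G @ lam).sum(), G, lam

rng = np.random.default_rng(0)
x = rng.normal(1.5, 1.0, size=400)
grid = np.arange(1.2, 1.8, 0.005)
mu_hat = max(grid, key=lambda m: log_el(m, x)[0])
_, G, lam = log_el(mu_hat, x)
p = 1.0 / (400 * (1.0 + G @ lam))            # implied EL weights
```

At the maximizer the implied weights pi = 1/{n(1 + λτGi)} sum to one and satisfy Σi piGi = 0, which is how the extra estimating equation is absorbed into the analysis.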

4 Application to the NHANES data

To demonstrate our proposed methods, we revisit the Demographics Data, the Examination Data, and the Questionnaire Data from the 2003–2004 NHANES. We focus on the dependence of systolic blood pressure (SBP) on the average number of alcoholic drinks consumed per day, with adjustment for age and body mass index (BMI). As in Bartlett et al. [4], we consider the dataset drawn from 2,418 male participants, in which 278 men have missing values in SBP and 181 men have missing values in BMI; age was fully observed for all participants. According to Little and Zhang [3] and Bartlett et al. [4], it is reasonable to assume that the missingness in SBP and BMI is completely at random (MCAR) due to missed visits, and hence an analysis omitting these participants should not produce biased estimates and is thus valid. Consequently, we apply the proposed methods to the remaining 2,111 participants, of whom 720 (34%) have missing values in alcohol consumption. In addition to the MCAR assumption, Bartlett et al. [4] have argued that missingness in alcohol consumption is missing not at random (MNAR) because it likely depends largely on alcohol consumption itself, and that missingness in alcohol consumption is independent of SBP conditional on alcohol consumption, age, and BMI.

In our application, we fit a linear regression model by regressing SBP on the covariates: log (no.drinks+1), BMI, and age (with both linear and quadratic effects). As suggested by Bartlett et al. [4], a log transformation for the alcohol variable is performed to reduce the influence on parameter estimates caused by the few observations with extremely large values. The aforementioned assumption on the conditional independence of the alcohol consumption missingness and SBP given alcohol consumption, age, and BMI entails that the CCA estimates βˆc are unbiased. For the calculation of the estimates βˆacc, βˆacc2, βˆel2, and βˆel3, we postulate a logistic regression model for the working probability model with covariates age, BMI, SBP, and SBP2 and a negative binomial regression model for the working regression model with covariates age, age2, BMI, BMI2, SBP, and SBP2.

The results of the analysis are given in Table 1. They show that the proposed empirical likelihood estimates βˆel2 and βˆel3 are quite similar to the estimates βˆacc and βˆacc2 of Bartlett et al. [4] and that all four of these estimates have smaller standard errors than the CCA estimate βˆc. Moreover, the methods that yield βˆel3 and βˆacc2 produce almost identical results in terms of estimated coefficients and standard errors and, in particular, yield the lowest standard errors among all of the methods. Overall, our analysis indicates that increased daily alcohol consumption is associated with increased SBP, as expected.

Table 1

Parameter estimates and standard errors (in parentheses) in a conditional mean model of SBP (mmHg) (centered at 125 mmHg) on log (average number of drinks consumed per day +1), BMI, age (decades above 50) and age2 ((decades above 50)2).

Estimator | Constant | log (no.drinks+1) | BMI (kg/m2) | Age | Age2
βˆc | 1.9286 (0.7980) | 1.2665 (0.5831) | 0.4143 (0.0798) | 3.9431 (0.2606) | 0.2646 (0.1431)
βˆacc | 2.0663 (0.6530) | 1.2494 (0.3879) | 0.3893 (0.0656) | 3.8732 (0.2296) | 0.3147 (0.1078)
βˆacc2 | 2.0131 (0.6137) | 1.2180 (0.2764) | 0.3920 (0.0650) | 3.8699 (0.2273) | 0.3023 (0.1073)
βˆel2 | 2.1825 (0.6270) | 1.4116 (0.3117) | 0.3860 (0.0650) | 3.9268 (0.2282) | 0.3056 (0.1078)
βˆel3 | 2.0063 (0.6137) | 1.2372 (0.2763) | 0.3956 (0.0650) | 3.8710 (0.2273) | 0.2990 (0.1073)

5 Simulations

In this section, we conduct a simulation study to compare the proposed empirical likelihood based estimators with CCA, multiple imputation, IPW, and ACC estimators. We generate the data using the procedure of Bartlett et al. [4], in which the non-missing indicator variable D is simulated with P(D=1)=0.5 and the variables (X,Z,Y) are generated from a trivariate normal distribution conditional on D:

\[
\begin{pmatrix}X\\Z\\Y\end{pmatrix}\Bigg|\,D\;\sim\;
N\left(\begin{pmatrix}\alpha_{X}D\\\alpha_{Z}D\\\alpha_{Y}D\end{pmatrix},\;
\begin{pmatrix}\sigma_{X}^{2}&\sigma_{XZ}&\sigma_{XY}\\\sigma_{XZ}&\sigma_{Z}^{2}&\sigma_{ZY}\\\sigma_{XY}&\sigma_{ZY}&\sigma_{Y}^{2}\end{pmatrix}\right)
\]

with

\[
\alpha_{Y}=\frac{\alpha_{X}(\sigma_{XY}\sigma_{Z}^{2}-\sigma_{XZ}\sigma_{ZY})+\alpha_{Z}(\sigma_{ZY}\sigma_{X}^{2}-\sigma_{XY}\sigma_{XZ})}{\sigma_{X}^{2}\sigma_{Z}^{2}-\sigma_{XZ}^{2}}
\]

such that D ⊥ Y | (X,Z) is satisfied.
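This data-generating step can be sketched as follows (a hedged illustration under the SETUP A parameter values listed later in this section; NumPy assumed). The check at the end regresses Y on (1, X, Z, D): with αY chosen by the constraint above, the fitted coefficient on D is numerically negligible, reflecting the conditional independence of D and Y given (X,Z).

```python
import numpy as np

# SETUP A parameters from the simulation design
aX, aZ = 1.0, 0.0
sX2 = sZ2 = sY2 = 0.9
sXZ = sZY = sXY = 0.25

# alpha_Y chosen so that D is conditionally independent of Y given (X, Z)
aY = (aX * (sXY * sZ2 - sXZ * sZY) + aZ * (sZY * sX2 - sXY * sXZ)) \
     / (sX2 * sZ2 - sXZ**2)

rng = np.random.default_rng(42)
n = 200_000
d = rng.binomial(1, 0.5, size=n)
cov = np.array([[sX2, sXZ, sXY],
                [sXZ, sZ2, sZY],
                [sXY, sZY, sY2]])
noise = rng.multivariate_normal(np.zeros(3), cov, size=n)
x = aX * d + noise[:, 0]
z = aZ * d + noise[:, 1]
y = aY * d + noise[:, 2]

# Regress Y on (1, X, Z, D): the coefficient on D should be ~0,
# because E(Y | X, Z, D) does not depend on D by construction
A = np.column_stack([np.ones(n), x, z, d])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```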

The working probability model used in our simulation study is the logistic regression model:

\[
\pi(y,z;\gamma_{0})=P(D=1\mid Y=y,Z=z)=\frac{1}{1+\exp\{-(\gamma_{C}+\gamma_{Y}y+\gamma_{Z}z)\}}.
\]

To calculate a working regression model for the optimal regression function m0(Y,Z;β,ξ0), we posit a parametric working model f(X|D=1,Y,Z;ξ) for the true conditional density function f(X|D=1,Y,Z) of X given (D=1,Y,Z). Then the conditional expectation m(Y,Z;β,ξ) of X given (D=1,Y,Z) under f(X|D=1,Y,Z;ξ) is the corresponding working regression model for m0(Y,Z;β,ξ0) and is calculated by Monte-Carlo integration. The conditional mean model of interest is specified as

\[
Y\mid X,Z\;\sim\;N(\beta_{0}+\beta_{X}X+\beta_{Z}Z,\;\sigma^{2}).
\]
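The Monte-Carlo integration step can be sketched as below. The posited normal working density, the least-squares form of U, and the parameter names (xi_c, xi_y, xi_z, xi_s) are our illustrative assumptions, not the paper's exact specification; for a normal working density the same expectation is also available in closed form, which provides a check on the Monte-Carlo answer.

```python
import numpy as np

def mc_working_m(y, z, beta, xi, draws=200_000, seed=0):
    """Monte-Carlo approximation of m(y,z;beta,xi) =
    E{U(Y,X,Z;beta) | Y=y, Z=z, D=1} under a posited normal working
    density X | (D=1, Y=y, Z=z) ~ N(xi_c + xi_y*y + xi_z*z, xi_s^2)."""
    b0, bX, bZ = beta
    xi_c, xi_y, xi_z, xi_s = xi
    rng = np.random.default_rng(seed)
    x = rng.normal(xi_c + xi_y * y + xi_z * z, xi_s, size=draws)
    resid = y - b0 - bX * x - bZ * z
    # U is taken here as the least-squares score (1, X, Z)' * residual
    return np.array([resid.mean(), (x * resid).mean(), (z * resid).mean()])

beta = (0.5, 0.8, 0.3)                 # illustrative (b0, bX, bZ)
xi = (0.2, 0.5, 0.1, 0.9)              # illustrative (xi_c, xi_y, xi_z, xi_s)
m_hat = mc_working_m(1.0, 0.5, beta, xi)

# Closed-form counterpart for the normal working density, as a check
mu = 0.2 + 0.5 * 1.0 + 0.1 * 0.5       # E(X | D=1, Y=1.0, Z=0.5)
r = 1.0 - 0.5 - 0.3 * 0.5              # y - b0 - bZ*z
m_exact = np.array([r - 0.8 * mu,
                    mu * r - 0.8 * (mu**2 + 0.9**2),
                    0.5 * (r - 0.8 * mu)])
```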

We consider two scenarios in our simulation study:

  1. the working regression function m(Y,Z;β,ξ) is correctly specified;

  2. the working regression function m(Y,Z;β,ξ) is misspecified.

For each scenario, we generate 1,000 Monte Carlo data sets, each composed of either n=500 or n=1500 subjects according to two different setups:

  1. SETUP A: αX=1, αZ=0, σX2=σZ2=σY2=0.9,σXZ=σZY=σXY=0.25;

  2. SETUP B: αX=1, αZ=0, σX2=0.9,σZ2=1, σY2=0.8, σXZ=0.1, σZY=0.25, σXY=0.2.

Under scenario (a), the working regression models for SETUP A and SETUP B are both correctly specified as m(Y,Z;β,ξ0)=E(X|D=1,Y,Z)=ξC+ξZZ+ξYY. Under scenario (b), we postulate m(Y,Z;β,ξ)=1+0.5Z2+0.5Y2 for E(X|D=1,Y,Z) under SETUP A and m(Y,Z;β,ξ)=1.2+0.5Z2+0.4Y2 for E(X|D=1,Y,Z) under SETUP B. For each scenario and setup, we examine the bias, standard deviation (SD), and root mean squared error (RMSE) of the following estimators:

  1. CCA estimator βˆc.

  2. Multiple imputation estimator βˆMI by imputing X 10 times from a normal linear regression model for X|Y,Z.

  3. IPW estimator βˆIPW based on weights from a logistic regression model with Y and Z as the covariates.

  4. βˆacc0 and βˆel1 with γ0 known. In this case, the logistic working probability function π(y,z;γ0) satisfies

\[
\gamma_{Z}=\frac{\alpha_{Z}\sigma_{Y}^{2}-\alpha_{Y}\sigma_{ZY}}{\sigma_{Y}^{2}\sigma_{Z}^{2}-\sigma_{ZY}^{2}},\qquad
\gamma_{Y}=\frac{\alpha_{Y}\sigma_{Z}^{2}-\alpha_{Z}\sigma_{ZY}}{\sigma_{Y}^{2}\sigma_{Z}^{2}-\sigma_{ZY}^{2}},\qquad
\gamma_{C}=\frac{-\tfrac{1}{2}\alpha_{Z}^{2}\sigma_{Y}^{2}+\alpha_{Y}\alpha_{Z}\sigma_{ZY}-\tfrac{1}{2}\alpha_{Y}^{2}\sigma_{Z}^{2}}{\sigma_{Y}^{2}\sigma_{Z}^{2}-\sigma_{ZY}^{2}}.
\]
  5. βˆacc, βˆacc2, βˆel2, and βˆel3 with γ0 unknown, using an estimated logistic working probability function.
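The closed-form (γC, γY, γZ) in item 4 can be checked numerically: simulate a large sample under the SETUP B values, fit the logistic working probability model by a hand-rolled Newton (IRLS) iteration, and compare. This sketch is our own check (NumPy assumed) and uses the expit parameterization π(y,z;γ) = 1/[1 + exp{−(γC + γYy + γZz)}].

```python
import numpy as np

# SETUP B parameters from the simulation design
aX, aZ = 1.0, 0.0
sX2, sZ2, sY2 = 0.9, 1.0, 0.8
sXZ, sZY, sXY = 0.1, 0.25, 0.2

# alpha_Y implied by the conditional-independence constraint
aY = (aX * (sXY * sZ2 - sXZ * sZY) + aZ * (sZY * sX2 - sXY * sXZ)) \
     / (sX2 * sZ2 - sXZ**2)

# Closed-form logistic coefficients from item 4
det = sY2 * sZ2 - sZY**2
gY = (aY * sZ2 - aZ * sZY) / det
gZ = (aZ * sY2 - aY * sZY) / det
gC = (-0.5 * aZ**2 * sY2 + aY * aZ * sZY - 0.5 * aY**2 * sZ2) / det

# Simulate (Y, Z) | D and fit logit P(D=1|y,z) by Newton (IRLS)
rng = np.random.default_rng(7)
n = 200_000
d = rng.binomial(1, 0.5, size=n)
e = rng.multivariate_normal([0.0, 0.0], [[sY2, sZY], [sZY, sZ2]], size=n)
y, z = aY * d + e[:, 0], aZ * d + e[:, 1]

Xmat = np.column_stack([np.ones(n), y, z])
b = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-Xmat @ b))
    grad = Xmat.T @ (d - p)                                # score
    hess = Xmat.T @ (Xmat * (p * (1.0 - p))[:, None])      # information
    b = b + np.linalg.solve(hess, grad)
# b should now be close to (gC, gY, gZ)
```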

The simulation results are summarized in Tables 2–5.
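The summaries reported in Tables 2–5 are the Monte Carlo bias, sample standard deviation, and root mean squared error of each estimator; a minimal helper (our sketch, NumPy assumed) is:

```python
import numpy as np

def mc_summary(estimates, truth):
    """Bias, sample SD, and RMSE of Monte Carlo estimates of a scalar
    parameter, as tabulated for each coefficient in Tables 2-5."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    sd = est.std(ddof=1)                     # sample standard deviation
    rmse = np.sqrt(np.mean((est - truth) ** 2))
    return bias, sd, rmse

bias, sd, rmse = mc_summary([1.0, 1.2, 0.8, 1.0], truth=1.0)
```

Note that RMSE² equals bias² plus the (ddof = 0) variance of the estimates, so it differs slightly from bias² + SD² when the SD uses the ddof = 1 correction.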

Table 2

Bias, SD, and RMSE based on 1,000 simulations with sample size n=500. The working regression model is correctly specified.

Estimator | β0: Bias (SD) | β0: RMSE | βX: Bias (SD) | βX: RMSE | βZ: Bias (SD) | βZ: RMSE
(1) SETUP A: Missing % = 51.0%
βˆc | 0.0012 (0.0834) | 0.0834 | 0.0010 (0.0627) | 0.0628 | 0.0018 (0.0636) | 0.0637
βˆMI | 0.2783 (0.0715) | 0.2874 | 0.1760 (0.0609) | 0.1862 | 0.0494 (0.0445) | 0.0665
βˆIPW | 0.1121 (0.0771) | 0.1361 | 0.0007 (0.0543) | 0.0543 | 0.0314 (0.0429) | 0.0532
βˆacc0 | 0.0032 (0.0756) | 0.0756 | 0.0001 (0.0647) | 0.0647 | 0.0001 (0.0471) | 0.0471
βˆel1 | 0.0043 (0.0745) | 0.0746 | 0.0012 (0.0638) | 0.0638 | 0.0003 (0.0473) | 0.0473
βˆacc | 0.0005 (0.0850) | 0.0850 | 0.0004 (0.0649) | 0.0649 | 0.0015 (0.0481) | 0.0481
βˆacc2 | 0.0037 (0.0843) | 0.0844 | 0.0017 (0.0631) | 0.0631 | 0.0013 (0.0483) | 0.0484
βˆel2 | 0.0005 (0.0840) | 0.0840 | 0.0005 (0.0633) | 0.0633 | 0.0011 (0.0482) | 0.0482
βˆel3 | 0.0016 (0.0835) | 0.0835 | 0.0036 (0.0619) | 0.0620 | 0.0023 (0.0489) | 0.0490
(2) SETUP B: Missing % = 49.4%
βˆc | 0.0010 (0.0781) | 0.0781 | 0.0009 (0.0575) | 0.0575 | 0.0011 (0.0524) | 0.0525
βˆMI | 0.1129 (0.0646) | 0.1301 | 0.0128 (0.0546) | 0.0561 | 0.0179 (0.0388) | 0.0427
βˆIPW | 0.1019 (0.0681) | 0.1226 | 0.0040 (0.0486) | 0.0488 | 0.0091 (0.0389) | 0.0400
βˆacc0 | 0.0042 (0.0669) | 0.0671 | 0.0012 (0.0576) | 0.0576 | 0.0004 (0.0385) | 0.0385
βˆel1 | 0.0047 (0.0666) | 0.0667 | 0.0019 (0.0576) | 0.0576 | 0.0004 (0.0384) | 0.0384
βˆacc | 0.0014 (0.0785) | 0.0785 | 0.0013 (0.0578) | 0.0578 | 0.0013 (0.0387) | 0.0387
βˆacc2 | 0.0011 (0.0781) | 0.0781 | 0.0005 (0.0573) | 0.0573 | 0.0015 (0.0387) | 0.0387
βˆel2 | 0.0020 (0.0779) | 0.0779 | 0.0019 (0.0575) | 0.0576 | 0.0013 (0.0386) | 0.0386
βˆel3 | 0.0026 (0.0779) | 0.0779 | 0.0029 (0.0576) | 0.0577 | 0.0013 (0.0386) | 0.0386

Tables 2 and 3, with sample sizes n=500 and n=1500, respectively, show the performances of different estimators under SETUP A and SETUP B when the working regression model is correctly specified. The results in Tables 2 and 3 can be summarized as follows.

  1. As expected, the biases of the CCA estimator βˆc, the ACC estimators βˆacc0, βˆacc, βˆacc2, and the empirical likelihood estimators βˆel1, βˆel2, and βˆel3 are all negligible under both setups and for both sample sizes.

  2. The biases of the MI estimator βˆMI and IPW estimator βˆIPW are significantly larger than those of the other estimators in most cases for both sample sizes. This is not surprising because both the MI and IPW estimators require the MAR assumption and are biased when data are missing not at random (MNAR).

  3. When γ0 is known, βˆc, βˆacc0, and βˆel1 have similar biases, but βˆacc0 and βˆel1 have much smaller standard deviations and hence smaller RMSEs for β0 and βZ. For βX, these three estimators have similar RMSEs. In addition, βˆel1 has slightly smaller RMSEs than βˆacc0 for β0, βX, and βZ. These observations, made for both sample sizes, are in agreement with the theoretical results.

  4. When γ0 is unknown, βˆacc, βˆacc2, βˆel2, and βˆel3 have similar RMSEs. Compared to βˆc, these estimators have similar biases but largely reduced standard deviations for βZ and thus gain efficiency for βZ. As for the estimation of β0 and βX, these estimators have similar RMSEs to βˆc. These results, obtained from both sample sizes, confirm that if the working regression model is correctly specified, βˆacc, βˆacc2, βˆel2, and βˆel3 are asymptotically equivalent and at least as efficient as βˆc when γ0 is unknown.

  5. When n=500, βˆacc has the largest RMSEs for β0 and βX under both setups. Furthermore, βˆel3 has the lowest RMSEs for β0 and βX under SETUP A and the lowest RMSEs for β0 and βZ under SETUP B. By contrast, when n=1500, the proposed empirical likelihood estimators βˆel2 and βˆel3 have relatively smaller RMSEs than the ACC estimators βˆacc and βˆacc2. In addition, βˆel3 performs best in terms of RMSEs under both SETUP A and SETUP B.

  6. When the sample size is increased from n=500 to n=1500, there is a large reduction in standard deviations and RMSEs for each estimator.

Table 3

Bias, SD, and RMSE based on 1,000 simulations with sample size n=1500. The working regression model is correctly specified.

Estimator | β0: Bias (SD) | β0: RMSE | βX: Bias (SD) | βX: RMSE | βZ: Bias (SD) | βZ: RMSE
(1) SETUP A: Missing % = 48.6%
βˆc | 0.0016 (0.0509) | 0.0510 | 0.0017 (0.0366) | 0.0367 | 0.0005 (0.0359) | 0.0359
βˆMI | 0.2728 (0.0432) | 0.2762 | 0.1727 (0.0354) | 0.1763 | 0.0468 (0.0266) | 0.0538
βˆIPW | 0.1086 (0.0435) | 0.1170 | 0.0005 (0.0305) | 0.0305 | 0.0321 (0.0261) | 0.0414
βˆacc0 | 0.0019 (0.0449) | 0.0450 | 0.0012 (0.0373) | 0.0373 | 0.0012 (0.0282) | 0.0282
βˆel1 | 0.0026 (0.0445) | 0.0446 | 0.0020 (0.0367) | 0.0368 | 0.0009 (0.0280) | 0.0281
βˆacc | 0.0015 (0.0513) | 0.0513 | 0.0015 (0.0374) | 0.0374 | 0.0005 (0.0286) | 0.0286
βˆacc2 | 0.0004 (0.0512) | 0.0512 | 0.0012 (0.0367) | 0.0367 | 0.0003 (0.0284) | 0.0284
βˆel2 | 0.0018 (0.0510) | 0.0511 | 0.0019 (0.0368) | 0.0368 | 0.0004 (0.0283) | 0.0283
βˆel3 | 0.0015 (0.0510) | 0.0510 | 0.0016 (0.0367) | 0.0367 | 0.0005 (0.0283) | 0.0283
(2) SETUP B: Missing % = 50.3%
βˆc | 0.0010 (0.0468) | 0.0468 | 0.0009 (0.0339) | 0.0339 | 0.0001 (0.0315) | 0.0315
βˆMI | 0.1086 (0.0387) | 0.1153 | 0.0108 (0.0322) | 0.0339 | 0.0172 (0.0233) | 0.0289
βˆIPW | 0.0996 (0.0404) | 0.1075 | 0.0031 (0.0277) | 0.0279 | 0.0090 (0.0231) | 0.0248
βˆacc0 | 0.0001 (0.0398) | 0.0398 | 0.0005 (0.0340) | 0.0340 | 0.0005 (0.0230) | 0.0230
βˆel1 | 0.0000 (0.0397) | 0.0397 | 0.0005 (0.0339) | 0.0339 | 0.0004 (0.0230) | 0.0230
βˆacc | 0.0007 (0.0469) | 0.0469 | 0.0006 (0.0340) | 0.0340 | 0.0005 (0.0232) | 0.0232
βˆacc2 | 0.0019 (0.0469) | 0.0469 | 0.0011 (0.0339) | 0.0339 | 0.0005 (0.0232) | 0.0233
βˆel2 | 0.0008 (0.0467) | 0.0467 | 0.0007 (0.0339) | 0.0339 | 0.0004 (0.0232) | 0.0232
βˆel3 | 0.0009 (0.0467) | 0.0467 | 0.0008 (0.0339) | 0.0339 | 0.0004 (0.0232) | 0.0232

Tables 4 and 5, with sample sizes n=500 and n=1500, respectively, pertain to the situation where the working regression model is misspecified under both setups. The results in Tables 4 and 5 can be summarized as follows.

Table 4

Bias, SD, and RMSE based on 1,000 simulations with sample size n=500. The working regression model is misspecified.

Estimator | β0: Bias (SD) | β0: RMSE | βX: Bias (SD) | βX: RMSE | βZ: Bias (SD) | βZ: RMSE
(1) SETUP A: Missing % = 49.6%
βˆc | 0.0017 (0.0854) | 0.0854 | 0.0008 (0.0629) | 0.0629 | 0.0016 (0.0611) | 0.0612
βˆMI | 0.2668 (0.0711) | 0.2761 | 0.1678 (0.0606) | 0.1784 | 0.0475 (0.0440) | 0.0647
βˆIPW | 0.1062 (0.0713) | 0.1279 | 0.0007 (0.0511) | 0.0511 | 0.0300 (0.0437) | 0.0530
βˆacc0 | 0.0063 (0.0961) | 0.0963 | 0.0097 (0.0892) | 0.0897 | 0.0050 (0.0516) | 0.0518
βˆel1 | 0.0001 (0.0750) | 0.0750 | 0.0020 (0.0644) | 0.0644 | 0.0019 (0.0469) | 0.0470
βˆacc | 0.0026 (0.0909) | 0.0910 | 0.0018 (0.0705) | 0.0705 | 0.0008 (0.0479) | 0.0479
βˆacc2 | 0.0007 (0.0857) | 0.0857 | 0.0002 (0.0629) | 0.0629 | 0.0011 (0.0467) | 0.0467
βˆel2 | 0.0032 (0.0862) | 0.0863 | 0.0028 (0.0640) | 0.0640 | 0.0012 (0.0469) | 0.0469
βˆel3 | 0.0066 (0.0850) | 0.0852 | 0.0022 (0.0629) | 0.0629 | 0.0021 (0.0469) | 0.0469
(2) SETUP B: Missing % = 49.6%
βˆc | 0.0012 (0.0779) | 0.0780 | 0.0026 (0.0568) | 0.0569 | 0.0004 (0.0541) | 0.0541
βˆMI | 0.1089 (0.0651) | 0.1269 | 0.0132 (0.0548) | 0.0563 | 0.0168 (0.0402) | 0.0436
βˆIPW | 0.0996 (0.0696) | 0.1215 | 0.0065 (0.0480) | 0.0484 | 0.0095 (0.0400) | 0.0411
βˆacc0 | 0.0053 (0.0852) | 0.0854 | 0.0093 (0.0799) | 0.0804 | 0.0012 (0.0421) | 0.0421
βˆel1 | 0.0019 (0.0666) | 0.0667 | 0.0047 (0.0572) | 0.0574 | 0.0005 (0.0412) | 0.0412
βˆacc | 0.0030 (0.0801) | 0.0802 | 0.0048 (0.0605) | 0.0607 | 0.0005 (0.0406) | 0.0406
βˆacc2 | 0.0014 (0.0788) | 0.0788 | 0.0019 (0.0569) | 0.0569 | 0.0006 (0.0407) | 0.0407
βˆel2 | 0.0029 (0.0781) | 0.0782 | 0.0048 (0.0570) | 0.0572 | 0.0004 (0.0408) | 0.0408
βˆel3 | 0.0052 (0.0775) | 0.0777 | 0.0014 (0.0568) | 0.0568 | 0.0016 (0.0407) | 0.0408
  1. Similar to the first scenario, βˆc has negligible biases, whereas βˆMI and βˆIPW have very large biases, and hence are badly biased, under both setups and for both sample sizes.

  2. Under SETUP A with γ0 known and for both sample sizes, βˆacc0 has larger RMSEs than βˆc for both β0 and βX, although they are still unbiased. By contrast, βˆel1 has much lower RMSEs for β0 and βZ and a similar RMSE for βX compared to βˆc.

  3. Under SETUP B with γ0 known and for both sample sizes, βˆacc0 has relatively higher biases and larger RMSEs than βˆc for both β0 and βX, although these estimators are unbiased. Again, βˆel1 has much smaller RMSEs for β0 and βZ and a similar RMSE for βX compared to βˆc. Moreover, βˆel1 always performs better than βˆacc0 in terms of RMSE. These results confirm our asymptotic theory that when γ0 is known, the empirical likelihood estimator βˆel1 is at least as efficient as the CCA estimator βˆc whether or not the working regression function is correctly specified. In contrast, the ACC estimator βˆacc0 is not guaranteed to be at least as efficient as βˆc.

  4. Under SETUP A with γ0 unknown and for both sample sizes, βˆacc has larger RMSEs than βˆc for β0 and βX; both estimators are unbiased. It is seen that βˆacc2, βˆel2, and βˆel3 have similar RMSEs; they all gain efficiency for βZ compared to βˆc, while having similar efficiency to βˆc for β0 and βX. In particular, when n=500, βˆel3 has the lowest RMSEs for both β0 and βX and has a similar RMSE to βˆacc2 for βZ. By contrast, when n=1500, βˆel3 has slightly smaller RMSEs than both βˆel2 and βˆacc2, and thus has the lowest RMSEs under SETUP A.

  5. Under SETUP B with γ0 unknown, βˆacc has larger RMSEs than βˆc for βX for both sample sizes, although they are unbiased. When n=500, βˆel3 has the lowest RMSEs for β0 and βX and a similar RMSE for βZ compared to βˆacc2 and βˆel2. For n=1500, we observe that βˆacc2, βˆel2, and βˆel3 have almost the same RMSEs; they all gain efficiency for βZ compared to βˆc, while having similar efficiency to βˆc for β0 and βX. These results support our asymptotic theory that when γ0 is unknown, βˆel3 and βˆacc2 are asymptotically equivalent and at least as efficient as βˆc whether or not the working regression function is correctly specified. On the other hand, βˆacc is not guaranteed to be at least as efficient as βˆc.

  6. The biases of βˆacc0 in absolute value are at least 20% larger for n=500 than those for n=1500 under both setups, except for the case with βZ under SETUP B. By contrast, the biases of βˆacc in absolute value are at least 43% larger for n=500 than those for n=1500 for β0 and βX under both setups; the biases of βˆacc are, nevertheless, smaller for n=500 than for n=1500 for βZ under both setups. Again, we observe that when the sample size is increased from n=500 to n=1500, there is a large reduction in standard deviations and RMSEs for each estimator.

Table 5

Bias, SD, and RMSE based on 1,000 simulations with sample size n=1500. The working regression model is misspecified.

Estimator | β0: Bias (SD) | β0: RMSE | βX: Bias (SD) | βX: RMSE | βZ: Bias (SD) | βZ: RMSE
(1) SETUP A: Missing % = 50.0%
βˆc | 0.0003 (0.0477) | 0.0477 | 0.0013 (0.0356) | 0.0356 | 0.0012 (0.0354) | 0.0354
βˆMI | 0.2748 (0.0416) | 0.2780 | 0.1732 (0.0347) | 0.1767 | 0.0495 (0.0254) | 0.0557
βˆIPW | 0.1085 (0.0422) | 0.1164 | 0.0019 (0.0298) | 0.0298 | 0.0300 (0.0247) | 0.0389
βˆacc0 | 0.0052 (0.0499) | 0.0502 | 0.0011 (0.0444) | 0.0444 | 0.0026 (0.0284) | 0.0285
βˆel1 | 0.0042 (0.0421) | 0.0423 | 0.0006 (0.0359) | 0.0359 | 0.0007 (0.0272) | 0.0272
βˆacc | 0.0001 (0.0494) | 0.0494 | 0.0007 (0.0386) | 0.0386 | 0.0013 (0.0275) | 0.0275
βˆacc2 | 0.0011 (0.0476) | 0.0477 | 0.0014 (0.0355) | 0.0355 | 0.0015 (0.0270) | 0.0270
βˆel2 | 0.0005 (0.0477) | 0.0477 | 0.0003 (0.0356) | 0.0356 | 0.0014 (0.0270) | 0.0270
βˆel3 | 0.0001 (0.0475) | 0.0475 | 0.0009 (0.0355) | 0.0355 | 0.0013 (0.0269) | 0.0270
(2) SETUP B: Missing % = 50.7%
βˆc | 0.0016 (0.0453) | 0.0453 | 0.0017 (0.0330) | 0.0330 | 0.0014 (0.0306) | 0.0306
βˆMI | 0.1117 (0.0382) | 0.1181 | 0.0137 (0.0319) | 0.0347 | 0.0172 (0.0220) | 0.0279
βˆIPW | 0.1018 (0.0388) | 0.1089 | 0.0054 (0.0278) | 0.0283 | 0.0094 (0.0220) | 0.0239
βˆacc0 | 0.0031 (0.0484) | 0.0485 | 0.0037 (0.0446) | 0.0448 | 0.0012 (0.0230) | 0.0230
βˆel1 | 0.0023 (0.0385) | 0.0386 | 0.0025 (0.0330) | 0.0331 | 0.0010 (0.0224) | 0.0224
βˆacc | 0.0021 (0.0459) | 0.0459 | 0.0022 (0.0345) | 0.0346 | 0.0013 (0.0220) | 0.0220
βˆacc2 | 0.0008 (0.0453) | 0.0453 | 0.0015 (0.0330) | 0.0330 | 0.0014 (0.0220) | 0.0220
βˆel2 | 0.0024 (0.0452) | 0.0453 | 0.0025 (0.0329) | 0.0330 | 0.0014 (0.0221) | 0.0222
βˆel3 | 0.0017 (0.0453) | 0.0453 | 0.0018 (0.0330) | 0.0330 | 0.0013 (0.0220) | 0.0220

In summary, our simulation results with sample sizes n=500 and n=1500 indicate that βˆel2 always performs better than βˆacc in terms of RMSE whether or not the working regression model is correctly specified, although there is no direct asymptotic efficiency comparison between them in our asymptotic theory. Also, in our simulation scenarios, βˆel3 performs at least as well as βˆacc2, even though the reduction in RMSE is sometimes very slight. Finally, the nine estimators we have considered in Tables 2–5 are not directly comparable between the two cases of γ0 known versus γ0 unknown.

6 Concluding remarks

In this paper, we have studied empirical likelihood methods for estimating the regression coefficients in a nonignorable covariate-missing data problem under the conditional independence assumption advocated by Bartlett et al. [4]. We have proposed three unbiased estimating functions and three empirical likelihood-based estimators of the regression coefficients through specification of a working probability model of missingness and a working regression model. The proposed empirical likelihood methods can be viewed as pseudo-empirical likelihood methods for estimation of β because we have employed the maximum likelihood estimator γˆn of γ rather than the maximum empirical likelihood estimator of γ. We have also made efficiency comparisons of the three proposed estimators with other existing estimators. The simulation results indicate that the proposed empirical likelihood estimators have competitive finite sample properties in terms of bias and root mean square error. The proposed empirical likelihood approach is illustrated using an analysis of the alcohol and blood pressure data from the US National Health and Nutrition Examination Survey (NHANES).

An alternative empirical likelihood method to the pseudo empirical likelihood method discussed in this paper is to simultaneously estimate β and γ by maximizing the profile empirical likelihood function jointly with respect to (β,γ); this gives rise to studying the maximum empirical likelihood estimator of (β,γ). Another issue is to study the empirical likelihood ratio test for testing the linear null hypothesis H0: Cβ=Cβ0 with C being an r×p matrix; this leads to studying constrained maximum empirical likelihood estimation of β under linear constraints on the regression parameter in nonignorable covariate-missing data problems. These considerations pave an avenue for further exploration.

Appendix: Proofs

Here we provide a proof for Theorem 2. The proof of Theorem 1 is similar to that of Theorem 2 and is therefore omitted here.

Regularity Conditions

(A1) For all (y,z), π(y,z;γ) admits all third partial derivatives ∂³π(y,z;γ)/∂γi∂γj∂γk for all γ in a neighborhood of the true value γ0, and |∂³π(y,z;γ)/∂γi∂γj∂γk| is bounded by an integrable function for all γ in this neighborhood.

(A2) For all (y,z), π(y,z;γ)(0,1) for all γ in a neighborhood of γ0.

(A3) The matrix Sγ is positive definite and E{||πγ(Y,Z;γ0)||²} < ∞, where Sγ is defined in eq. (11) below.

(A4) For all (y,z,x), U(y,z,x;β) admits all second partial derivatives ∂²U(y,z,x;β)/∂βi∂βj for all β in a neighborhood of β0, and |∂²U(y,z,x;β)/∂βi∂βj| is bounded by an integrable function in this neighborhood. Also, ∂U(y,z,x;β)/∂β has finite second-order moments and E{Uτ(Y,Z,X,β0)U(Y,Z,X,β0)} < ∞.

(A5) For all (y,z), m(y,z;β,ξ) admits all second partial derivatives ∂²m(y,z;β,ξ)/∂βi∂βj and ∂²m(y,z;β,ξ)/∂ξk∂ξl for all (β,ξ) in a neighborhood of (β0,ξ). Moreover, |∂²m(y,z;β,ξ)/∂βi∂βj| and |∂²m(y,z;β,ξ)/∂ξk∂ξl| are bounded by an integrable function in this neighborhood.

(A6) The matrices C1, C5, and C5 − C2Sγ^{-1}C2τ are invertible, where C1, C2, and C5 are, respectively, defined in eqs (14) and (19) below.

Proof of Theorem 2: For parts (a) and (b), under regularity conditions (A1)–(A3), it can be shown that the maximum likelihood estimator γˆn solves the score equation Un(γ)=0 in eq. (3) and has influence function ψγˆn(Y,Z,D;γ0) = Sγ^{-1}g3(Y,Z,D;γ0), where

(11)
\[
S_{\gamma}=E\left[\frac{\pi_{\gamma}(Y,Z;\gamma_{0})\,\pi_{\gamma}^{\tau}(Y,Z;\gamma_{0})}{\pi(Y,Z;\gamma_{0})\{1-\pi(Y,Z;\gamma_{0})\}}\right].
\]
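For a logistic working probability model, πγ(y,z;γ) = π(1−π)(1,y,z)τ, so the integrand in eq. (11) simplifies to π(1−π)(1,y,z)τ(1,y,z). The short numerical check below (our illustration with an arbitrary γ; NumPy assumed) confirms the simplification.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma = np.array([-0.3, 0.5, 0.2])                  # illustrative (gC, gY, gZ)
V = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # rows (1, y, z)

pi = 1.0 / (1.0 + np.exp(-V @ gamma))
pi_grad = (pi * (1.0 - pi))[:, None] * V            # d pi / d gamma, one row per point

# Integrand of S_gamma computed two ways: they agree identically
lhs = np.einsum('ni,nj->nij', pi_grad, pi_grad) / (pi * (1.0 - pi))[:, None, None]
rhs = (pi * (1.0 - pi))[:, None, None] * np.einsum('ni,nj->nij', V, V)
```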

Since βˆel3 maximizes ℓ3(β) in eq. (8) for each fixed λ, it is seen from eqs (8), (9), and (3) that (λˆ3n, βˆel3, γˆn) satisfies the equations U3n(λˆ3n,βˆel3,γˆn,ξˆn)=0, Q3n(λˆ3n,βˆel3,γˆn,ξˆn)=0, and Un(γˆn)=0, where

\[
Q_{3n}(\lambda,\beta,\gamma,\xi)=\frac{1}{n}\frac{\partial\ell_{3}(\beta,\lambda,\gamma)}{\partial\beta}
=\frac{1}{n}\sum_{i=1}^{n}\frac{\lambda^{\tau}\,\partial G(Y_{i},Z_{i},D_{i},X_{i};\beta,\gamma,\xi)/\partial\beta}{1+\lambda^{\tau}G(Y_{i},Z_{i},D_{i},X_{i};\beta,\gamma,\xi)}.
\]

Under the regularity conditions in (A1)–(A6), it can be shown that (βˆel3, λˆ3n, γˆn) is a √n-consistent estimator of (β0, 0_{(2p+q)×1}, γ0). An application of a first-order Taylor expansion gives

(12)
\[
\begin{aligned}
0&=U_{3n}(\hat\lambda_{3n},\hat\beta_{el3},\hat\gamma_{n},\hat\xi_{n})
=U_{3n}(0,\beta_{0},\gamma_{0},\hat\xi_{n})
+\frac{\partial U_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\lambda^{\tau}}\hat\lambda_{3n}
+\frac{\partial U_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\beta^{\tau}}(\hat\beta_{el3}-\beta_{0})
+\frac{\partial U_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\gamma^{\tau}}(\hat\gamma_{n}-\gamma_{0}),\\
0&=Q_{3n}(\hat\lambda_{3n},\hat\beta_{el3},\hat\gamma_{n},\hat\xi_{n})
=Q_{3n}(0,\beta_{0},\gamma_{0},\hat\xi_{n})
+\frac{\partial Q_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\lambda^{\tau}}\hat\lambda_{3n}
+\frac{\partial Q_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\beta^{\tau}}(\hat\beta_{el3}-\beta_{0})
+\frac{\partial Q_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\gamma^{\tau}}(\hat\gamma_{n}-\gamma_{0}),\\
0&=U_{n}(\hat\gamma_{n})=U_{n}(\gamma_{0})+\frac{\partial U_{n}(\gamma_{n})}{\partial\gamma^{\tau}}(\hat\gamma_{n}-\gamma_{0}),
\end{aligned}
\]

where (λ3n,βn,γn) is some intermediate value between (λˆ3n,βˆel3,γˆn) and (0,β0,γ0). As n, it follows from the law of large numbers that

(13)
\[
\begin{aligned}
&\frac{\partial U_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\lambda^{\tau}}
\xrightarrow{p}-E\{G(Y,Z,D,X;\beta_{0},\gamma_{0},\xi)G^{\tau}(Y,Z,D,X;\beta_{0},\gamma_{0},\xi)\}=H_{11},\\
&\frac{\partial U_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\beta^{\tau}}
\xrightarrow{p}E\left\{\frac{\partial G(Y,Z,D,X;\beta_{0},\gamma_{0},\xi)}{\partial\beta^{\tau}}\right\}=H_{12}=\begin{pmatrix}-C_{1}\\0\end{pmatrix},\\
&\frac{\partial U_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\gamma^{\tau}}
\xrightarrow{p}E\left\{\frac{\partial G(Y,Z,D,X;\beta_{0},\gamma_{0},\xi)}{\partial\gamma^{\tau}}\right\}=\begin{pmatrix}0\\C_{3}\end{pmatrix},\\
&\frac{\partial Q_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\lambda^{\tau}}
\xrightarrow{p}E\left\{\frac{\partial G^{\tau}(Y,Z,D,X;\beta_{0},\gamma_{0},\xi)}{\partial\beta}\right\}=H_{21}=H_{12}^{\tau}=(-C_{1}^{\tau},0^{\tau}),\\
&\frac{\partial Q_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\beta^{\tau}}\xrightarrow{p}0,\qquad
\frac{\partial Q_{3n}(\lambda_{3n},\beta_{n},\gamma_{n},\xi)}{\partial\gamma^{\tau}}\xrightarrow{p}0,\qquad
\frac{\partial U_{n}(\gamma_{n})}{\partial\gamma^{\tau}}\xrightarrow{p}-S_{\gamma},
\end{aligned}
\]

where

(14)
\[
C_{1}=-E\left\{D\,\frac{\partial U(Y,Z,X;\beta_{0})}{\partial\beta^{\tau}}\right\},\qquad
C_{2}=E\{m(Y,Z;\beta_{0},\xi)\,\pi_{\gamma}^{\tau}(Y,Z;\gamma_{0})\},\qquad
C_{3}=-\begin{pmatrix}C_{2}\\S_{\gamma}\end{pmatrix}.
\]

Furthermore, it can be shown after some algebra that

(15)
\[
U_{3n}(0,\beta_{0},\gamma_{0},\hat\xi_{n})=U_{3n}(0,\beta_{0},\gamma_{0},\xi)+o_{p}(n^{-1/2}),\qquad
Q_{3n}(0,\beta_{0},\gamma_{0},\hat\xi_{n})=Q_{3n}(0,\beta_{0},\gamma_{0},\xi)+o_{p}(n^{-1/2}).
\]

Combining eqs (12), (13), and (15) yields

\[
\begin{pmatrix}\hat\lambda_{3n}\\\hat\beta_{el3}-\beta_{0}\end{pmatrix}
=-\begin{pmatrix}H_{11}&H_{12}\\H_{21}&0\end{pmatrix}^{-1}
\begin{pmatrix}U_{4n}(\beta_{0},\gamma_{0},\xi)\\0\end{pmatrix}+o_{p}(n^{-1/2})
\]
(17)
\[
=\begin{pmatrix}-(H_{11}^{-1}+H_{11}^{-1}H_{12}H_{22\cdot1}^{-1}H_{21}H_{11}^{-1})\,U_{4n}(\beta_{0},\gamma_{0},\xi)\\
H_{22\cdot1}^{-1}H_{21}H_{11}^{-1}\,U_{4n}(\beta_{0},\gamma_{0},\xi)\end{pmatrix}+o_{p}(n^{-1/2}),
\]

where H_{22\cdot1} = -H_{21}H_{11}^{-1}H_{12} and

(18)
\[
U_{4n}(\beta_{0},\gamma_{0},\xi)=\frac{1}{n}\sum_{i=1}^{n}
\begin{pmatrix}g_{1}(Y_{i},Z_{i},D_{i},X_{i};\beta_{0})\\
g_{23}(Y_{i},Z_{i},D_{i};\beta_{0},\gamma_{0},\xi)+C_{3}S_{\gamma}^{-1}g_{3}(Y_{i},Z_{i},D_{i};\gamma_{0})\end{pmatrix},\qquad
g_{23}(Y,Z,D;\beta,\gamma,\xi)=\begin{pmatrix}g_{2}(Y,Z,D;\beta,\gamma,\xi)\\g_{3}(Y,Z,D;\gamma)\end{pmatrix}.
\]

As a result, it follows from eqs (17) and (18) that we can write

\[
\hat\beta_{el3}-\beta_{0}=H_{22\cdot1}^{-1}H_{21}H_{11}^{-1}U_{4n}(\beta_{0},\gamma_{0},\xi)+o_{p}(n^{-1/2})
=\frac{1}{n}\sum_{i=1}^{n}C_{1}^{-1}\left[g_{1}(Y_{i},Z_{i},D_{i},X_{i};\beta_{0})-C_{7}\{g_{2}(Y_{i},Z_{i},D_{i};\beta_{0},\gamma_{0},\xi)-C_{2}S_{\gamma}^{-1}g_{3}(Y_{i},Z_{i},D_{i};\gamma_{0})\}\right]+o_{p}(n^{-1/2}),
\]

where

(19)
\[
\begin{aligned}
C_{4}&=E[D\,U(Y,Z,X;\beta_{0})\{1-\pi(Y,Z;\gamma_{0})\}m^{\tau}(Y,Z;\beta_{0},\xi)],\\
C_{5}&=E[\pi(Y,Z;\gamma_{0})\{1-\pi(Y,Z;\gamma_{0})\}m(Y,Z;\beta_{0},\xi)m^{\tau}(Y,Z;\beta_{0},\xi)],\\
C_{6}&=E\left[\frac{D\,U(Y,Z,X;\beta_{0})\,\pi_{\gamma}^{\tau}(Y,Z;\gamma_{0})}{\pi(Y,Z;\gamma_{0})}\right],\\
C_{7}&=(C_{4}-C_{6}S_{\gamma}^{-1}C_{2}^{\tau})(C_{5}-C_{2}S_{\gamma}^{-1}C_{2}^{\tau})^{-1}.
\end{aligned}
\]

This, together with Slutsky’s theorem, implies that \(\sqrt{n}(\hat\beta_{el3}-\beta_{0})\xrightarrow{d}N_{p}(0,\Sigma_{el3})\). The proof is complete.

Acknowledgements:

We are grateful to the Editor, Professor Moulinath Banerjee, and two reviewers for a number of helpful comments and suggestions, and to Dr. Jonathan Bartlett for providing the NHANES dataset.

1. Little RJ, Rubin DB. Statistical analysis with missing data, 2nd ed. Hoboken, NJ: Wiley, 2002. doi:10.1002/9781119013563

2. Centers for Disease Control and Prevention. National health and nutrition examination survey data. Hyattsville: Centers for Disease Control and Prevention, 2004.

3. Little RJ, Zhang N. Subsample ignorable likelihood for regression analysis with missing data. J R Stat Soc Ser C 2011;60:591–605. doi:10.1111/j.1467-9876.2011.00763.x

4. Bartlett JW, Carpenter JR, Tilling K, Vansteelandt S. Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719–30. doi:10.1093/biostatistics/kxu023

5. Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 1952;47:663–85. doi:10.1080/01621459.1952.10483446

6. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 1994;89:846–66. doi:10.1080/01621459.1994.10476818

7. Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 1988;75:237–49. doi:10.1093/biomet/75.2.237

8. Owen AB. Empirical likelihood confidence regions. Ann Stat 1990;18:90–120. doi:10.1214/aos/1176347494

9. Qin J, Lawless JF. Empirical likelihood and general estimating equations. Ann Stat 1994;22:300–25. doi:10.1214/aos/1176325370

10. Hall P, La Scala B. Methodology and algorithms of empirical likelihood. Int Stat Rev 1990;58:109–27. doi:10.2307/1403462

11. Owen AB. Empirical likelihood. New York: Chapman & Hall/CRC, 2001. doi:10.1201/9781420036152

12. Wang Q, Rao JN. Empirical likelihood-based inference under imputation for missing response data. Ann Stat 2002;30:896–924.

13. Wang Q, Rao JN. Empirical likelihood-based inference in linear errors-in-covariables models with validation data. Biometrika 2002;89:345–58. doi:10.1093/biomet/89.2.345

14. Chen SX, Leung DH, Qin J. Information recovery in a study with surrogate endpoints. J Am Stat Assoc 2003;98:1052–62. doi:10.1198/016214503000000972

15. Liang H, Wang SJ, Carroll RJ. Partially linear models with missing response variables and error-prone covariates. Biometrika 2007;94:185–98. doi:10.1093/biomet/asm010

16. Stute W, Xue LG, Zhu LX. Empirical likelihood inference in nonlinear errors-in-covariables models with validation data. J Am Stat Assoc 2007;102:332–46. doi:10.1198/016214506000000816

17. Qin J, Zhang B. Empirical-likelihood-based inference in missing response problem and its application in observational studies. J R Stat Soc Ser B 2007;69:101–22. doi:10.1111/j.1467-9868.2007.00579.x

18. Wang Q, Dai P. Semiparametric model-based inference in the presence of missing responses. Biometrika 2008;89:721–34. doi:10.1093/biomet/asn032

19. Little RJ, Schluchter MD. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika 1985;72:497–512. doi:10.1093/biomet/72.3.497

20. Ibrahim JG. Incomplete data in generalized linear models. J Am Stat Assoc 1990;85:765–9. doi:10.1080/01621459.1990.10474938

21. Little RJ. Regression with missing X’s: a review. J Am Stat Assoc 1992;87:1227–37. doi:10.2307/2290664

22. Ibrahim JG, Lipsitz SR. Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable. Biometrics 1996;52:1071–8. doi:10.2307/2533068

23. Lipsitz SR, Ibrahim JG. A conditional model for incomplete covariates in parametric regression models. Biometrika 1996;83:916–22. doi:10.1093/biomet/83.4.916

24. Ibrahim JG, Chen MH, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: A comparative review. J Am Stat Assoc 2005;100:332–46. doi:10.1198/016214504000001844

25. Tsiatis AA. Semiparametric theory and missing data. New York: Springer, 2006.

Published Online: 2017-4-20

© 2017 Walter de Gruyter GmbH, Berlin/Boston
