
On the Correction of the Asymptotic Distribution of the Likelihood Ratio Statistic If Nuisance Parameters Are Estimated Based on an External Source

  • Marianne Jonker and Aad van der Vaart
Published/Copyright: May 16, 2014

Abstract

In practice, nuisance parameters in statistical models are often replaced by estimates based on an external source, for instance when estimates have been published before or a second dataset is available. These estimates are then treated as known when the parameter of interest is estimated, a hypothesis is tested or confidence intervals are constructed. Under this assumption, the level of the test is, in general, higher than intended and the coverage of the confidence interval too low. In this article, we derive the asymptotic distribution of the likelihood ratio statistic when the nuisance parameters are estimated from a dataset that is independent of the data used for estimating the parameter of interest. This distribution can be used for correctly testing hypotheses and constructing confidence intervals. Four theoretical and practical examples are given as illustration.

1 Introduction

Statistical models often contain multiple unknown parameters, some of interest, and some not, referred to as nuisance parameters. The presence of the latter parameters may make estimation, hypothesis testing and the construction of confidence intervals for the parameters of interest more complex, because more parameters must be estimated, possibly of infinite dimension.

Estimates of the nuisance parameters may be available in the literature or can sometimes be obtained from an independent dataset. Inserting these estimates into the model and next performing inference on the parameters of interest may simplify the statistical inference enormously. However, treating the nuisance parameters as known may affect the level of a test or the coverage of a confidence interval, making these higher and lower than nominal, respectively. In this paper, we investigate these effects for likelihood ratio procedures.

For standard statistical models, the likelihood ratio statistic for a finite-dimensional parameter is well known to be asymptotically distributed under the null hypothesis as a chi-squared random variable with degrees of freedom equal to the number of free parameters [1–3]. This result extends to likelihood ratio statistics for the parameter of interest in semiparametric models, as shown in various settings in Thomas and Grunkemeier [4] (right censoring), Murphy and van der Vaart [5] (semiparametric models), Banerjee and Wellner [6] (current status data), Banerjee [7] (monotone missingness) and Kosorok [8], with empirical likelihood ratio statistics for functionals [9–11] as an important special case. The likelihood ratio statistic in the latter semiparametric models can be viewed as the likelihood ratio statistic for the parameter of interest in the model with the nuisance parameter replaced by its maximum likelihood estimator, based on the same data; equivalently, as the likelihood ratio statistic based on the profile likelihood (Murphy and van der Vaart [12]). With this estimate of the nuisance parameter, the asymptotic chi-squared distribution is retained, much as when testing only a part of the parameter in a finite-dimensional model. In contrast, it is shown below that the distribution changes if the nuisance parameter is estimated using an independent sample of observations.

Naturally, this change will be small or even negligible if the additional sample is much bigger than the sample used to construct the likelihood ratio statistic. An example is estimating the disease-allele frequency in an inbred population, described in Example 4 below (see also Jonker et al. [13] and Teeuw et al. [14]). In this example, only 20 individuals were available for estimating the nuisance parameter, and 80 individuals for estimating the parameter of interest. It turns out that of two different likelihood ratio approaches, in one case the distribution is well approximated by a chi-squared distribution, as if the nuisance parameter were known, whereas in the other case a correction is needed. One motivation of the present paper is to gain insight into the fact that not only the relative sample sizes determine the amount of correction needed, but also the structure of the model. This is explained below in terms of a relevant Fisher information quantity.

The original motivation for the present paper was a study on the genetic determinants of the onset of migraine, described in Jonker et al. [15]. In this example, a baseline survival distribution was estimated using an additional, much larger dataset. As these data were subject to current status censoring, this estimate was relatively imprecise, of order comparable to the estimation error of the direct data, notwithstanding the size of the dataset. However, in a simulation study, it was found that substitution of the estimator in the likelihood ratio test caused only a minor error in the level of the test. This finding is explained by the detailed derivation of the likelihood ratio statistic in this paper, which reveals that the estimate for the baseline survival function enters (asymptotically) into the likelihood ratio through a smoothing operation, which repairs its inefficiency as an estimate for the survival function itself. See Example 3 below for details.

To our knowledge, the asymptotic distribution of the likelihood ratio statistic in these and other examples was not studied before. On the other hand, several authors have studied plug-in estimators within the context of empirical likelihood for parameters defined by estimating equations [9], among others Qin and Jing [16, 17], Li and Wang [18] and Wang and Jing [19]. In particular, Hjort et al. [20] study a similar problem in the setting of empirical likelihood. Besides the limit distribution, these authors also discuss a bootstrap approximation and allow an increasing number of estimating equations.

In the next section, we describe the theorem and give the proof. Thereafter we consider four examples; two theoretical examples, for a parametric and a semiparametric model, and two practical problems, for a large and a small dataset for estimating the nuisance parameter.

2 The likelihood ratio test

Suppose we observe a random sample $X_1,\ldots,X_n$ from a distribution $P_{\theta,\eta}$ with corresponding density $p_{\theta,\eta}$ indexed by two parameters: a vector $\theta \in \Theta \subset \mathbb{R}^d$ and a second parameter $\eta$ belonging to an arbitrary parameter set $H$. We let $\tilde{\eta}$ be an estimator for $\eta$ based on a second sample $Y_1,\ldots,Y_m$ of observations that is independent of $X_1,\ldots,X_n$ and possesses a distribution depending on $\eta$ only.

In the following, we will test the two-sided hypothesis $H_0:\theta\in\Theta_0$, where $\Theta_0$ is of the form $\{\theta : \Sigma\theta\in\{(c_1,\ldots,c_{d-k},x_{d-k+1},\ldots,x_d)^T : x_{d-k+1},\ldots,x_d\in\mathbb{R}\}\}$ for $c_1,\ldots,c_{d-k}$ given constants and $\Sigma$ a given $d\times d$ invertible matrix. If $\Sigma$ equals the identity matrix and $c_1=\cdots=c_{d-k}=0$, it is tested whether the first $d-k$ coordinates of $\theta$ equal zero. We assume that $\sqrt{n}(\Theta-\theta_0)$ and $\sqrt{n}(\Theta_0-\theta_0)$ tend to $d$- and $k$-dimensional spaces, respectively. The convergence of the sets $\sqrt{n}(\Theta-\theta_0)=\{\sqrt{n}(\theta-\theta_0):\theta\in\Theta\}$ and $\sqrt{n}(\Theta_0-\theta_0)=\{\sqrt{n}(\theta-\theta_0):\theta\in\Theta_0\}$ is defined as follows (see van der Vaart [2], Chapter 16). A sequence of sets $H_n$ converges to a set $H$ if $H$ is the set of all limits of converging sequences $h_n$ with $h_n\in H_n$ for all $n$ and if, whenever $h=\lim_i h_{n_i}$ for a subsequence $n_1<n_2<\cdots$ with $h_{n_i}\in H_{n_i}$ for every $i$, then $h$ is contained in $H$.

To test the null hypothesis, we use the likelihood ratio statistic in which the parameter $\eta$ is replaced by its estimator $\tilde{\eta}$:

$$\Lambda_{m,n} := 2\log\frac{\sup_{\theta\in\Theta}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)}{\sup_{\theta\in\Theta_0}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)} = 2\log\frac{\prod_{i=1}^n p_{\hat{\theta},\tilde{\eta}}(X_i)}{\prod_{i=1}^n p_{\hat{\theta}_0,\tilde{\eta}}(X_i)}.$$

In the theorem below, we determine the asymptotic distribution of the test statistic under $\theta=\theta_0$ with $\theta_0\in\Theta_0\subset\Theta$.

Theorem

Under the assumptions stated below and under $\theta=\theta_0$, the test statistic $\Lambda_{m,n}$ converges in distribution to the law of a linear combination of $d-k$ independent chi-squared distributed variables, each with one degree of freedom. The coefficients in the linear combination equal $1+\lambda d_i$, $i=1,\ldots,d-k$, with $\lambda=\lim_{n,m\to\infty}n/m$ and $d_i$ positive values equal to the eigenvalues of a covariance matrix that can be computed explicitly.

For $k=0$ (the dimension of the null space is zero), the asymptotic distribution has a nice form and is given directly after the assumptions. For $k>0$ the values $d_i$ can be found in the proof of the theorem.

Assumptions:

  1. Assume that the log likelihood function $\ell_{\theta,\eta}(x)=\log p_{\theta,\eta}(x)$ in $(\theta,\eta)$ and its first and second derivatives with respect to $\theta$, $\dot{\ell}_{\theta,\eta}(x)=\frac{\partial}{\partial\theta}\ell_{\theta,\eta}(x)$ and $\ddot{\ell}_{\theta,\eta}(x)=\frac{\partial^2}{\partial\theta^2}\ell_{\theta,\eta}(x)$, exist for all $x$.

  2. Assume that for $\hat{\theta}=\arg\max_{\theta\in\Theta}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)$ it holds that $\hat{\theta}-\theta_0=O_P(n^{-1/2})$, for $\theta_0$ the true value.

  3. Assume that the sequence $-\mathbb{P}_n\ddot{\ell}_{\theta_0,\tilde{\eta}}=-n^{-1}\sum_{i=1}^n\ddot{\ell}_{\theta_0,\tilde{\eta}}(X_i)$ converges in probability to $I_{\theta_0,\eta_0}=\mathrm{Var}_{\theta_0,\eta_0}\,\dot{\ell}_{\theta_0,\eta_0}(X_i)$, the Fisher information matrix for $\theta$ in the model where $\eta=\eta_0$ is known to be the true value.

  4. Assume that $\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}=\sqrt{n}(\mathbb{P}_n-P_{\theta_0,\eta_0})\dot{\ell}_{\theta_0,\tilde{\eta}}=\mathbb{G}_n\dot{\ell}_{\theta_0,\eta_0}+o_P(1)$ converges in distribution to a normal distribution with mean zero and covariance matrix equal to the Fisher information $I_{\theta_0,\eta_0}$.

  5. Assume that $\sqrt{m}\,P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$ is asymptotically normal with mean zero and covariance matrix $J_{\theta_0,\eta_0}$.

  6. Assume that $\sqrt{n}\,(P_{\theta_0,\eta_0}-P_{\theta_0,\tilde{\eta}})(\dot{\ell}_{\theta_0,\tilde{\eta}}-\dot{\ell}_{\theta_0,\eta_0})$ converges in probability to zero.

  7. Assume that $\ell^{(3)}_{\theta,\eta}(x)=\frac{\partial}{\partial\theta}\ddot{\ell}_{\theta,\eta}(x)$ exists for all $x$ and for all $(\theta,\eta)$ in a small neighborhood of $(\theta_0,\eta_0)$, and that

    $$\sup_{\sqrt{n}\|\tilde{\theta}-\theta_0\|\leq M_\theta,\ m^r\|\tilde{\eta}-\eta_0\|\leq M_\eta}\left|\frac{1}{n}\sum_{i=1}^n\ell^{(3)}_{\tilde{\theta},\tilde{\eta}}(X_i)\right|$$

    is bounded in probability as $m,n$ tend to infinity, for constants $M_\theta>0$ and $M_\eta>0$ and $r$ such that $m^r\|\tilde{\eta}-\eta_0\|=O_P(1)$.

If $n/m\to 0$, then $\lambda=0$ and $\Lambda_{m,n}$ converges in distribution to a chi-squared distribution with $d-k$ degrees of freedom. So, if $m\gg n$, the distribution of the test statistic is hardly affected by treating the estimated nuisance parameters as the true values, and the level of the test is close to the nominal level under the assumption of known nuisance parameters. However, also if $m<n$ but the values of $d_i$ are close to zero, the distribution of the likelihood ratio statistic may be close to the chi-squared distribution with $d-k$ degrees of freedom (see Example 4).

In the theorem, the exact asymptotic distribution is not given. However, if the dimension of the null space $\Theta_0$ is zero ($\Theta_0$ consists of a single point, $k=0$), the asymptotic distribution of the likelihood ratio statistic has a nice form: $\Lambda_{m,n}$ is asymptotically distributed as $\sum_{i=1}^d(1+\lambda d_i)Z_i^2$, with $\lambda=\lim_{m,n\to\infty}n/m$, $d_1,\ldots,d_d$ the eigenvalues of the covariance matrix $I_{\theta_0,\eta_0}^{-1/2}J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1/2}$ and $Z_1^2,\ldots,Z_d^2$ independent random variables with a chi-squared distribution with one degree of freedom (the matrices $I_{\theta_0,\eta_0}$ and $J_{\theta_0,\eta_0}$ were defined in assumptions 3 and 5). If $d=1$, then $I_{\theta_0,\eta_0}^{-1/2}J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1/2}=J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1}$ and the asymptotic distribution simplifies to that of $(1+\lambda J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1})Z^2$: a chi-squared distribution with one degree of freedom multiplied by a correction factor bigger than one.
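Once $\lambda$ and the eigenvalues $d_i$ have been estimated, critical values of the corrected limit law can be obtained by Monte Carlo simulation of $\sum_i(1+\lambda d_i)Z_i^2$. A minimal sketch (the values of `lam` and `d_eigs` below are hypothetical, not taken from any example in this paper):

```python
import numpy as np

def corrected_critical_value(d_eigs, lam, alpha=0.05, reps=200_000, seed=1):
    """Monte Carlo (1-alpha)-quantile of sum_i (1 + lam*d_i) * Z_i^2,
    the limit law of the likelihood ratio statistic in the theorem."""
    rng = np.random.default_rng(seed)
    z2 = rng.standard_normal((reps, len(d_eigs))) ** 2   # chi-squared(1) draws
    lin_comb = z2 @ (1.0 + lam * np.asarray(d_eigs))     # linear combination
    return np.quantile(lin_comb, 1.0 - alpha)

# Hypothetical inputs: lam = n/m = 0.1 and eigenvalues d_i of I^{-1/2} J I^{-1/2}.
crit = corrected_critical_value([2.0, 0.5], lam=0.1, alpha=0.05)
```

With `lam=0` this reproduces the usual chi-squared critical value (about 5.99 for two degrees of freedom), and any positive correction pushes the critical value upward.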

For a statistical model, probably the most difficult assumption to check is the asymptotic normality of $\sqrt{m}\,P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$ in case $\tilde{\eta}$ is a non-parametric estimator of $\eta$ which converges at a rate lower than $m^{1/2}$ (assumption 5). For example, if $\tilde{\eta}$ is the non-parametric maximum likelihood estimator (NPMLE) of a distribution function based on interval-censored data, we know that $\tilde{\eta}$ converges pointwise at rate $m^{1/3}$, but, under some assumptions, some functionals of it, for instance $\int x\,(\tilde{\eta}(x)-\eta(x))\,dx$, converge at rate $m^{1/2}$ to normal distributions (Theorem 5.1 in Huang and Wellner [21]). So, even if the convergence rate of $\tilde{\eta}$ is lower than $m^{1/2}$, it may be possible to prove that $\sqrt{m}\,P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$ is asymptotically normal.

Proof

In order to test the hypothesis $H_0:\theta\in\Theta_0$ with test statistic $\Lambda_{m,n}$, we have to compute the (asymptotic) distribution of $\Lambda_{m,n}$ under the null hypothesis. Remember that $\theta_0$ is a point in $\Theta$ and $\Theta_0$ such that $\sqrt{n}(\Theta-\theta_0)$ and $\sqrt{n}(\Theta_0-\theta_0)$ tend to $d$- and $k$-dimensional spaces. Now write

$$\Lambda_{m,n} := 2\log\frac{\sup_{\theta\in\Theta}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)}{\sup_{\theta\in\Theta_0}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)} = 2\log\frac{\sup_{\theta\in\Theta}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)}{\prod_{i=1}^n p_{\theta_0,\tilde{\eta}}(X_i)} + 2\log\frac{\prod_{i=1}^n p_{\theta_0,\tilde{\eta}}(X_i)}{\sup_{\theta\in\Theta_0}\prod_{i=1}^n p_{\theta,\tilde{\eta}}(X_i)} = 2\sum_{i=1}^n\log\frac{p_{\hat{\theta},\tilde{\eta}}(X_i)}{p_{\theta_0,\tilde{\eta}}(X_i)} - 2\sum_{i=1}^n\log\frac{p_{\hat{\theta}_0,\tilde{\eta}}(X_i)}{p_{\theta_0,\tilde{\eta}}(X_i)}\qquad(1)$$

for $\hat{\theta}$ the maximum likelihood estimator of $\theta$ and $\hat{\theta}_0$ the restricted maximum likelihood estimator, restricted to the parameter space $\Theta_0$, after inserting $\tilde{\eta}$ into the likelihood. Under assumption 2 and under $\theta_0\in\Theta_0$, the maxima over $\theta\in\Theta$ and $\theta\in\Theta_0$ are in $1/\sqrt{n}$-shrinking neighborhoods of $\theta_0$. That means that $\hat{\theta}=\theta_0+\hat{h}/\sqrt{n}$ for some value $\hat{h}\in\sqrt{n}(\Theta-\theta_0)$ and $\hat{\theta}_0=\theta_0+\hat{h}_0/\sqrt{n}$ for $\hat{h}_0\in\sqrt{n}(\Theta_0-\theta_0)$. Instead of maximizing the likelihood $\prod_{i=1}^n p_{\theta,\tilde{\eta}}$ in the test statistic with respect to $\theta\in\Theta$ or $\theta\in\Theta_0$, the product $\prod_{i=1}^n p_{\theta_0+h/\sqrt{n},\tilde{\eta}}$ can be maximized with respect to $h\in H_n=\sqrt{n}(\Theta-\theta_0)$ or $h\in H_{n,0}=\sqrt{n}(\Theta_0-\theta_0)$. This yields the likelihood ratio statistic

$$\Lambda_{m,n} = 2\sup_{h\in H_n}\sum_{i=1}^n\log\frac{p_{\theta_0+h/\sqrt{n},\tilde{\eta}}(X_i)}{p_{\theta_0,\tilde{\eta}}(X_i)} - 2\sup_{h\in H_{n,0}}\sum_{i=1}^n\log\frac{p_{\theta_0+h/\sqrt{n},\tilde{\eta}}(X_i)}{p_{\theta_0,\tilde{\eta}}(X_i)}.$$

By a Taylor expansion and by assumption 7

$$\begin{aligned}\sum_{i=1}^n\log\frac{p_{\theta_0+h/\sqrt{n},\tilde{\eta}}(X_i)}{p_{\theta_0,\tilde{\eta}}(X_i)} &= \sum_{i=1}^n\log p_{\theta_0+h/\sqrt{n},\tilde{\eta}}(X_i)-\sum_{i=1}^n\log p_{\theta_0,\tilde{\eta}}(X_i)\\ &= h^T\frac{1}{\sqrt{n}}\sum_{i=1}^n\dot{\ell}_{\theta_0,\tilde{\eta}}(X_i)+\frac{1}{2n}h^T\sum_{i=1}^n\ddot{\ell}_{\theta_0,\tilde{\eta}}(X_i)h+O_P\Big(\frac{1}{\sqrt{n}}\Big)\\ &= h^T\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}+h^T\sqrt{n}\,P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}}+\frac{1}{2}h^T\mathbb{P}_n\ddot{\ell}_{\theta_0,\tilde{\eta}}h+O_P(1/\sqrt{n}),\end{aligned}\qquad(2)$$

with

$$\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n\dot{\ell}_{\theta_0,\tilde{\eta}}(X_i)-P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}}\right),\qquad \mathbb{P}_n\ddot{\ell}_{\theta_0,\tilde{\eta}}=\frac{1}{n}\sum_{i=1}^n\ddot{\ell}_{\theta_0,\tilde{\eta}}(X_i).$$

The term in eq. (2) should be maximized with respect to h.

We consider the three terms in eq. (2) separately. By assumption 4, the sequence $\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}$ converges in distribution to $N(0,I_{\theta_0,\eta_0})$, and $\mathbb{P}_n\ddot{\ell}_{\theta_0,\tilde{\eta}}$ converges in probability to $-I_{\theta_0,\eta_0}$ by assumption 3. Left is the second term in eq. (2). For any $\eta$ in its parameter space $H$, $P_{\theta_0,\eta}\dot{\ell}_{\theta_0,\eta}=0$. Write

$$\sqrt{n}\,P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}} = \sqrt{n}\,(P_{\theta_0,\eta_0}-P_{\theta_0,\tilde{\eta}})\dot{\ell}_{\theta_0,\tilde{\eta}} = \sqrt{n}\,(P_{\theta_0,\eta_0}-P_{\theta_0,\tilde{\eta}})\dot{\ell}_{\theta_0,\eta_0} + \sqrt{n}\,(P_{\theta_0,\eta_0}-P_{\theta_0,\tilde{\eta}})(\dot{\ell}_{\theta_0,\tilde{\eta}}-\dot{\ell}_{\theta_0,\eta_0}).$$

By assumption 6, the last term on the right-hand side converges in probability to zero. Thus, since also $P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\eta_0}=0$,

$$\sqrt{n}\,P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}} = -\sqrt{n}\,P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}+o_P(1) = -\sqrt{\frac{n}{m}}\,\sqrt{m}\,P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}+o_P(1),$$

which is, by assumption 5, asymptotically normal with mean zero and covariance matrix $\lambda J_{\theta_0,\eta_0}$, with $\lambda$ equal to the limit of $n/m$ as $m,n\to\infty$.

Combining the previous results, by assumption 4 and by independence of the samples $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_m$, the sum

$$\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}+\sqrt{n}\,P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}} = \mathbb{G}_n\dot{\ell}_{\theta_0,\eta_0}+\sqrt{n}\,P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}}+o_P(1)$$

converges in distribution to the normal distribution $N(0,\,I_{\theta_0,\eta_0}+\lambda J_{\theta_0,\eta_0})$. The same computations hold for the second term of eq. (1).

The following is similar to the derivation of the asymptotic distribution of the likelihood ratio statistic when there is no nuisance parameter (van der Vaart [2], Theorem 16.7). The differentiability of $\log p_{\theta_0+h/\sqrt{n},\tilde{\eta}}(X_i)$ with respect to $h$ implies that

$$Z_n(h) = \sum_{i=1}^n\log\frac{p_{\theta_0+h/\sqrt{n},\tilde{\eta}}(X_i)}{p_{\theta_0,\tilde{\eta}}(X_i)} - h^T\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}} - \sqrt{n}\,h^TP_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}} + \frac{1}{2}h^TI_{\theta_0,\eta_0}h$$

converges in probability to zero for every $h$. By assumption 7 (the remainder term of the Taylor expansion converges uniformly in probability to zero) and the fact that

$$\sup_{\|h\|\leq M}\left|h^TI_{\theta_0,\eta_0}h+h^T\mathbb{P}_n\ddot{\ell}_{\theta_0,\tilde{\eta}}h\right| \stackrel{P}{\longrightarrow} 0\qquad\forall\,M>0,$$

there is also uniform convergence of $Z_n$ on bounded sets:

$$\sup_{\|h\|\leq M}|Z_n(h)| \stackrel{P}{\longrightarrow} 0\qquad\forall\,M>0.$$

The convergence in the previous display also holds if $M=M_n$ increases to infinity sufficiently slowly. Since $\hat{\theta}$ and $\hat{\theta}_0$ are both $\sqrt{n}$-consistent, both estimators are within a ball of radius $M_n/\sqrt{n}$ around $\theta_0$ with probability tending to 1. Then $\hat{h}$ is within a ball of radius $M_n$ around zero with probability tending to 1. Consequently, the limit distribution of $\Lambda_{m,n}$ does not change if $H_n$ or $H_{n,0}$ are replaced by the intersection of these sets with the ball around zero of radius $M_n$. These intersected sets also converge to $H$ and $H_0$, respectively. By uniform convergence we conclude that, for $\mathbb{G}_{n,\tilde{\eta}}=\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}+\sqrt{n}\,P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}}$,

$$\begin{aligned}\Lambda_{m,n} &= 2\sup_{h\in H_n}\Big(h^T\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}+\sqrt{n}\,h^TP_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}}-\tfrac{1}{2}h^TI_{\theta_0,\eta_0}h\Big) - 2\sup_{h\in H_{n,0}}\Big(h^T\mathbb{G}_n\dot{\ell}_{\theta_0,\tilde{\eta}}+\sqrt{n}\,h^TP_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\tilde{\eta}}-\tfrac{1}{2}h^TI_{\theta_0,\eta_0}h\Big)+o_P(1)\\ &= \inf_{h\in H_{n,0}}\big(-2h^T\mathbb{G}_{n,\tilde{\eta}}+h^TI_{\theta_0,\eta_0}h\big) - \inf_{h\in H_n}\big(-2h^T\mathbb{G}_{n,\tilde{\eta}}+h^TI_{\theta_0,\eta_0}h\big)+o_P(1)\\ &= \inf_{h\in H_{n,0}}\big(\mathbb{G}_{n,\tilde{\eta}}^TI_{\theta_0,\eta_0}^{-1}\mathbb{G}_{n,\tilde{\eta}}-2h^T\mathbb{G}_{n,\tilde{\eta}}+h^TI_{\theta_0,\eta_0}h\big) - \inf_{h\in H_n}\big(\mathbb{G}_{n,\tilde{\eta}}^TI_{\theta_0,\eta_0}^{-1}\mathbb{G}_{n,\tilde{\eta}}-2h^T\mathbb{G}_{n,\tilde{\eta}}+h^TI_{\theta_0,\eta_0}h\big)+o_P(1)\\ &= \big\|I_{\theta_0,\eta_0}^{-1/2}\mathbb{G}_{n,\tilde{\eta}}-I_{\theta_0,\eta_0}^{1/2}H_{n,0}\big\|^2 - \big\|I_{\theta_0,\eta_0}^{-1/2}\mathbb{G}_{n,\tilde{\eta}}-I_{\theta_0,\eta_0}^{1/2}H_n\big\|^2+o_P(1)\\ &= \big\|I_{\theta_0,\eta_0}^{-1/2}G-I_{\theta_0,\eta_0}^{1/2}H_0\big\|^2 - \big\|I_{\theta_0,\eta_0}^{-1/2}G-I_{\theta_0,\eta_0}^{1/2}H\big\|^2+o_P(1),\end{aligned}\qquad(3)$$

where the last equality follows from Lemma 7.13 in van der Vaart [2], with $H$ and $H_0$ the limit sets of $H_n$ and $H_{n,0}$ as explained at the beginning of this section, and $G$ distributed as $N(0,\,I_{\theta_0,\eta_0}+\lambda J_{\theta_0,\eta_0})$, the limit distribution of $\mathbb{G}_{n,\tilde{\eta}}$ (as was seen before).

Define $Z:=I_{\theta_0,\eta_0}^{-1/2}G$; then $Z$ possesses a normal distribution with mean zero and covariance matrix

$$Q_{\theta_0,\eta_0} := \mathbf{I} + \lambda\, I_{\theta_0,\eta_0}^{-1/2}J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1/2},$$

with $\mathbf{I}$ the $d\times d$ identity matrix. Furthermore, since $\theta_0$ is an inner point of $\Theta$, the set $H$ is the full space $\mathbb{R}^d$, so the second term in eq. (3) equals zero and

$$\Lambda_{m,n} \rightsquigarrow \big\|Z-I_{\theta_0,\eta_0}^{1/2}H_0\big\|^2 =: \Lambda$$

as $m,n$ tend to infinity.

Suppose that $k=0$; then $H_0$ equals the singleton $\{0\}$ in a $d$-dimensional space, and the asymptotic distribution of $\Lambda_{m,n}$ is equal to the law of $\|Z\|^2$ with $Z$ normally distributed with mean zero and covariance matrix $Q_{\theta_0,\eta_0}$. The covariance matrix $Q_{\theta_0,\eta_0}$ is symmetric and positive definite and can therefore be factorized as $Q_{\theta_0,\eta_0}=U^TDU$ with $U$ an orthonormal matrix and $D$ a diagonal matrix with the eigenvalues of $Q_{\theta_0,\eta_0}$ on the diagonal. Then $Q_{\theta_0,\eta_0}=Q_{\theta_0,\eta_0}^{1/2}Q_{\theta_0,\eta_0}^{1/2}$ with $Q_{\theta_0,\eta_0}^{1/2}=U^TD^{1/2}U$ and

$$\|Z\|^2 = \big\|Q_{\theta_0,\eta_0}^{1/2}Q_{\theta_0,\eta_0}^{-1/2}Z\big\|^2 = \big\|U^TD^{1/2}U\big(Q_{\theta_0,\eta_0}^{-1/2}Z\big)\big\|^2.$$

Now, $Q_{\theta_0,\eta_0}^{-1/2}Z$ has a $d$-dimensional normal distribution with mean zero and the identity matrix as covariance matrix. Since $U$ is an orthonormal matrix and only rotates a vector, so that $\|Ux\|=\|x\|$, the last term in the previous display equals $\|D^{1/2}U(Q_{\theta_0,\eta_0}^{-1/2}Z)\|^2$. Furthermore, $U(Q_{\theta_0,\eta_0}^{-1/2}Z)$ and $Q_{\theta_0,\eta_0}^{-1/2}Z$ are equally distributed by the rotational symmetry of the standard normal distribution. Consequently,

$$\big\|D^{1/2}U\big(Q_{\theta_0,\eta_0}^{-1/2}Z\big)\big\|^2 \stackrel{d}{=} \big\|D^{1/2}Q_{\theta_0,\eta_0}^{-1/2}Z\big\|^2 = \sum_{i=1}^d e_iZ_i^2,$$

with $e_1,\ldots,e_d$ the diagonal elements of the matrix $D$ and $Z_1,\ldots,Z_d$ independent standard normally distributed random variables. So, $Z_i^2$ has a chi-squared distribution with one degree of freedom, and $\Lambda_{m,n}$ converges in distribution to a linear combination of independent chi-squared random variables (all with one degree of freedom): $\sum_{i=1}^d e_iZ_i^2$. The eigenvalues $e_1,\ldots,e_d$ of the matrix $Q_{\theta_0,\eta_0}$ equal $e_i=1+\lambda d_i$, $i=1,\ldots,d$, with $d_1,\ldots,d_d$ the eigenvalues of the covariance matrix $I_{\theta_0,\eta_0}^{-1/2}J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1/2}$.

Now, let $k>0$. Remember that the asymptotic distribution of $\Lambda_{m,n}$ equals the law of $\|Z-I_{\theta_0,\eta_0}^{1/2}H_0\|^2$ for $H_0$ a subspace of dimension $k$. If $P$ is the orthogonal projection onto the orthocomplement of $I_{\theta_0,\eta_0}^{1/2}H_0$, which is a $(d-k)$-dimensional linear subspace of $\mathbb{R}^d$, then

$$\big\|Z-I_{\theta_0,\eta_0}^{1/2}H_0\big\|^2 = \|PZ\|^2 = \|W\|^2,\qquad W\sim N\big(0,\,PQ_{\theta_0,\eta_0}P^T\big).$$

In the situation that $k=0$, the covariance matrix $Q_{\theta_0,\eta_0}$ of $Z$ was factorized in order to rewrite the quadratic norm $\|Z\|^2$ as a sum. The same can now be done for the covariance matrix $PQ_{\theta_0,\eta_0}P^T$ of $W$. It follows that $\|W\|^2$ equals the sum $\sum_{i=1}^d e_iZ_i^2$ with $Z_1^2,\ldots,Z_d^2$ chi-squared distributed with one degree of freedom and $e_1,\ldots,e_d$ equal to the eigenvalues of $PQ_{\theta_0,\eta_0}P^T$. We still have to compute these eigenvalues; we will show that $k$ eigenvalues equal zero and that the remaining $d-k$ eigenvalues are strictly positive and equal to $e_i=1+\lambda d_i$, $i=1,\ldots,d-k$, with $d_1,\ldots,d_{d-k}$ the strictly positive eigenvalues of $PS_{\theta_0,\eta_0}P^T$ for $S_{\theta_0,\eta_0}:=I_{\theta_0,\eta_0}^{-1/2}J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1/2}$. Then, the sum reduces to $\sum_{i=1}^{d-k}(1+\lambda d_i)Z_i^2$.

The derivation of the eigenvalues is as follows. The covariance matrix satisfies $PQ_{\theta_0,\eta_0}P^T = PP^T+\lambda PS_{\theta_0,\eta_0}P^T = P+\lambda PS_{\theta_0,\eta_0}P^T$. It vanishes on $I_{\theta_0,\eta_0}^{1/2}H_0$ and acts as the strictly positive definite operator $\mathbf{I}+\lambda PS_{\theta_0,\eta_0}P^T$ on the orthocomplement of $I_{\theta_0,\eta_0}^{1/2}H_0$. It follows that $PQ_{\theta_0,\eta_0}P^T$ has 0 as an eigenvalue of multiplicity $k$ (the dimension of $I_{\theta_0,\eta_0}^{1/2}H_0$) and has $d-k$ strictly positive eigenvalues of the form $e_i=1+\lambda d_i$, where $d_i$ is a positive eigenvalue of $PS_{\theta_0,\eta_0}P^T$. □
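The eigenvalue argument at the end of the proof is easy to verify numerically: $PQ_{\theta_0,\eta_0}P^T=P+\lambda PS_{\theta_0,\eta_0}P^T$ should have $k$ zero eigenvalues, the remaining $d-k$ being of the form $1+\lambda d_i$. A sketch with randomly generated (purely illustrative) matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, lam = 4, 2, 0.3

# Random symmetric positive definite "information" matrices (illustrative only).
A = rng.standard_normal((d, d)); I_mat = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); J_mat = B @ B.T

# S = I^{-1/2} J I^{-1/2}, via the symmetric square root of I.
w, U = np.linalg.eigh(I_mat)
I_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
S = I_inv_sqrt @ J_mat @ I_inv_sqrt

# P: orthogonal projection onto the orthocomplement of a k-dim subspace H0.
Q0, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal basis of H0
P = np.eye(d) - Q0 @ Q0.T

PQP = P + lam * P @ S @ P.T                         # P Q P^T with Q = I + lam*S
e = np.sort(np.linalg.eigvalsh(PQP))

# k eigenvalues vanish; the remaining d-k equal 1 + lam * d_i, with d_i the
# positive eigenvalues of P S P^T.
d_pos = np.sort(np.linalg.eigvalsh(P @ S @ P.T))[k:]
```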

If $d=1$, then $I_{\theta_0,\eta_0}^{-1/2}J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1/2}=J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1}$ and the asymptotic distribution of the likelihood ratio statistic simplifies to that of $(1+\lambda J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1})Z^2$, where $Z^2$ follows the chi-squared distribution with one degree of freedom. The matrix $J_{\theta_0,\eta_0}$ is the asymptotic covariance matrix of $\sqrt{m}\,P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$, with $P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$ the "plug-in" estimator of the parameter $\eta\mapsto P_{\theta_0,\eta}\dot{\ell}_{\theta_0,\eta_0}$ based on the sample $Y_1,\ldots,Y_m$. The Fisher information matrix $I_{\theta_0,\eta_0}$ is the asymptotic covariance matrix of the alternative estimator $\mathbb{P}_n\dot{\ell}_{\theta_0,\eta_0}$ based on $X_1,\ldots,X_n$. So, the term $J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1}$ measures the relative efficiency at $\eta=\eta_0$ for estimating $P_{\theta_0,\eta}\dot{\ell}_{\theta_0,\eta_0}$ using one observation $Y$ compared to one observation $X$. Then $J_{\theta_0,\eta_0}/m$ and $I_{\theta_0,\eta_0}/n$ are the variances of $P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$ and $\mathbb{P}_n\dot{\ell}_{\theta_0,\eta_0}$ based on the samples $Y_1,\ldots,Y_m$ and $X_1,\ldots,X_n$, respectively, and the fraction $(J_{\theta_0,\eta_0}/m)(I_{\theta_0,\eta_0}/n)^{-1}=(n/m)\,J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1}$ measures the relative efficiency of the two samples for estimating $P_{\theta_0,\eta}\dot{\ell}_{\theta_0,\eta_0}$. This fraction is approximately equal to the term $\lambda J_{\theta_0,\eta_0}I_{\theta_0,\eta_0}^{-1}$ in the asymptotic distribution of the likelihood ratio statistic. So, not only the sample sizes of the two samples are important but also the estimation precision of the estimators (the asymptotic variances). This is illustrated in Example 4.

Next we consider the asymptotic distribution of the likelihood ratio test statistic for the situation that $\Theta$ is a half-space and $\Theta_0$ its boundary. Suppose we want to test the null hypothesis $H_0:\theta\in\Theta_0$ against $H_1:\theta\in\Theta\setminus\Theta_0$. Let $\theta_0\in\Theta_0$ and define, as before, $H_n=\sqrt{n}(\Theta-\theta_0)$ and $H_{n,0}=\sqrt{n}(\Theta_0-\theta_0)$. Assume that $H_n$ and $H_{n,0}$ tend to a half-space $H$ in $\mathbb{R}^d$ and its boundary $H_0$, respectively.

Under the assumptions mentioned earlier, $\Lambda_{m,n}$ converges, under $\theta=\theta_0$, in distribution to a mixture of a point mass at 0 and the distribution of a linear combination of $d-k$ independent chi-squared distributed random variables, each with one degree of freedom. The coefficients of the linear combination are as described before.

The proof is very similar, up to the following. As in the previous proof,

$$\Lambda = \big\|I_{\theta_0,\eta_0}^{-1/2}G-I_{\theta_0,\eta_0}^{1/2}H_0\big\|^2 - \big\|I_{\theta_0,\eta_0}^{-1/2}G-I_{\theta_0,\eta_0}^{1/2}H\big\|^2,$$

but now the term $\|I_{\theta_0,\eta_0}^{-1/2}G-I_{\theta_0,\eta_0}^{1/2}H\|$ is not equal to zero with probability one. The sets $\bar{H}:=I_{\theta_0,\eta_0}^{1/2}H$ and $\bar{H}_0:=I_{\theta_0,\eta_0}^{1/2}H_0$ are a half-space and its boundary, because this is true for $H$ and $H_0$ by assumption, and the boundary contains 0, because $\theta_0\in\Theta_0$. If $Z:=I_{\theta_0,\eta_0}^{-1/2}G$ falls in the half-space $\bar{H}$, the second term in $\Lambda$ vanishes, and $\Lambda$ equals the squared distance $\|Z-\bar{H}_0\|^2$ of $Z$ to the boundary of the half-space; this term can be treated as before. If $Z$ falls outside $\bar{H}$, its projection onto $\bar{H}$ lies on the boundary of $\bar{H}$, so the distance of $Z$ to $\bar{H}$ equals the distance of $Z$ to the boundary and $\Lambda$ equals zero. Thus $\Lambda$ is equal to zero with the probability that $Z$ falls outside $\bar{H}$.

Similar reasoning holds for other hypotheses, like the situation for the one-sided hypothesis.

3 Examples

In the following we will give some examples for illustration of the theorem.

3.1 Example 1: Theoretical, one-dimensional

Let $X_1,\ldots,X_n$ be a sample from the gamma distribution with shape parameter $\alpha$ and rate $\lambda$. Furthermore, let $Y_1,\ldots,Y_m$ be a sample from the exponential distribution with parameter $1/\alpha$ (and thus mean $\alpha$). The two samples are independent. Suppose the null hypothesis $H_0:\lambda=\lambda_0$ is tested versus $H_1:\lambda\neq\lambda_0$, with $\lambda_0>0$ a given value. The likelihood ratio test statistic is used with the parameter $\alpha$ replaced by its maximum likelihood estimator based on the second sample: $\tilde{\alpha}=\bar{Y}$. Then the likelihood ratio statistic is given by:

$$\Lambda_{m,n} = 2\log\left[\left(\frac{\hat{\lambda}}{\lambda_0}\right)^{n\tilde{\alpha}}\exp\Big((\lambda_0-\hat{\lambda})\sum_{i=1}^nX_i\Big)\right] = 2n\bar{Y}\log\frac{\hat{\lambda}}{\lambda_0} + 2(\lambda_0-\hat{\lambda})\sum_{i=1}^nX_i,$$

since $\hat{\lambda}=\tilde{\alpha}/\bar{X}=\bar{Y}/\bar{X}$. This estimator is consistent, since $\bar{X}$ converges in probability to $\alpha/\lambda$ as $n\to\infty$, $\bar{Y}$ converges to $\alpha$ as $m\to\infty$, and the two samples are independent. By a Taylor expansion of the likelihood ratio statistic in $\hat{\lambda}$ around $\lambda_0$, the likelihood ratio statistic $\Lambda_{m,n}$ approximately equals

$$\begin{aligned}&2n\bar{Y}\left[\left(\frac{\hat{\lambda}}{\lambda_0}-1\right)-\frac{1}{2}\left(\frac{\hat{\lambda}}{\lambda_0}-1\right)^2\right]+2(\lambda_0-\hat{\lambda})\sum_{i=1}^nX_i+o_P(1)\\ &\quad=\frac{2n\bar{Y}}{\lambda_0}(\hat{\lambda}-\lambda_0)-\frac{n\bar{Y}}{\lambda_0^2}(\hat{\lambda}-\lambda_0)^2+2n(\lambda_0-\hat{\lambda})\bar{X}+o_P(1)\\ &\quad=\frac{n\lambda_0^2}{\alpha}\left(\frac{\bar{Y}}{\lambda_0}-\bar{X}\right)^2+o_P(1)=\left(\frac{n\alpha}{m}+1\right)\frac{(\bar{Y}-\lambda_0\bar{X})^2}{\frac{\alpha^2}{m}+\frac{\alpha}{n}}+o_P(1).\end{aligned}$$

By the central limit theorem and by the independence of the two samples, $(\bar{Y}-\lambda_0\bar{X})/\sqrt{\alpha^2/m+\alpha/n}$ is asymptotically standard normally distributed. The test statistic $\Lambda_{m,n}$ is therefore asymptotically distributed as $1+\alpha\nu$ times a chi-squared distributed random variable with one degree of freedom, with $\nu$ equal to the limit $\lim_{n,m\to\infty}n/m$.

Since $I_{\alpha,\lambda}=\alpha/\lambda^2$ and $J_{\alpha,\lambda}=\alpha^2/\lambda^2$, we find $J_{\alpha,\lambda}I_{\alpha,\lambda}^{-1}=\alpha$, as it should be.

Suppose that $n=100$ and $m=1{,}000$, so that $\nu=0.1$, and that $\alpha=10$. Then the likelihood ratio statistic is asymptotically distributed as $1+\alpha\nu=1+10\cdot0.1=2$ times a chi-squared distributed random variable with one degree of freedom, instead of 1 times this chi-squared distribution. Note that this asymptotic distribution is independent of the value of $\lambda$. The value of $\alpha$ determines the shape of the exponential distribution and of the gamma distribution. If $\alpha$ is large, the mean and especially the variance of the exponential distribution are large, and an estimate of $\alpha$ based on the sample $Y_1,\ldots,Y_m$ is imprecise. Therefore the multiplication factor is quite large.
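This example can be checked with a quick Monte Carlo sketch (the choice $\lambda_0=2$ below is arbitrary, since the limit does not depend on it): the mean of $\Lambda_{m,n}$ should be close to the mean 2 of the corrected limit $2\chi^2_1$, not the mean 1 of an uncorrected $\chi^2_1$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, alpha, lam0 = 100, 1_000, 10.0, 2.0
reps = 20_000

# Sufficient statistics: sum(X) ~ Gamma(n*alpha, rate lam0) and
# Ybar = mean of m Exp(mean alpha) variables ~ Gamma(m, scale alpha/m).
sumX = rng.gamma(n * alpha, 1.0 / lam0, size=reps)
Ybar = rng.gamma(m, alpha / m, size=reps)
Xbar = sumX / n
lam_hat = Ybar / Xbar            # MLE of lambda with alpha replaced by Ybar

# Likelihood ratio statistic of the example, under the null lambda = lam0.
Lam = 2 * n * Ybar * np.log(lam_hat / lam0) + 2 * (lam0 - lam_hat) * sumX

print(Lam.mean())                # should be near 2 = 1 + alpha * n/m
```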

3.2 Example 2: Theoretical, Semi-parametric Cox-model

Suppose we wish to estimate a distribution function $F$ with $F(t)=1-S_0(t)^{\exp\beta}$ on the interval $[0,\tau]$, for $\tau>0$ and $S_0(\tau)>0$. The function $S_0$ is an unknown survival function, and $\beta$ is an unknown parameter in $\mathbb{R}$, the parameter of interest. Let $T_1,\ldots,T_n$ be a sample from $F$. We observe $\min\{T_1,\tau\},\ldots,\min\{T_n,\tau\}$ and define $\Delta_i$ as the indicator of whether $T_i$ was before or after $\tau$: $\Delta_i:=1_{\{T_i\leq\tau\}}$.

The nuisance parameter $S_0$ is estimated by means of the empirical distribution function of a sample $Y_1,\ldots,Y_m$ from a distribution with survival curve $S_0$. The aim is to estimate $\beta$ and to test the hypothesis $H_0:\beta=\beta_0$ versus $H_1:\beta\neq\beta_0$.

The interpretation of this problem is as follows. Suppose the survival function $S_0$ of a particular group in society is known (i.e. estimated from an independent large sample); for instance males or non-smokers. We want to estimate the survival curve for a different group (women or smokers). We assume that the survival curve for this group equals $S_0^{\exp\beta}$, for $S_0$ the survival curve for the first group and $\beta$ an unknown value in $\mathbb{R}$. The aim is to test whether $\beta$ differs from a given $\beta_0$ (in particular, $\beta_0=0$ is of interest). To this end, we use the likelihood ratio test in which the survival curve $S_0$ in the likelihood ratio statistic is replaced by its estimate based on the independent sample, as described before. To compute the asymptotic distribution of the test statistic under the null hypothesis, the seven assumptions have to be checked.

All assumptions are checked in the appendix and are satisfied. By the theorem, the likelihood ratio statistic is asymptotically distributed as $(1+\lambda J_{\beta_0,S_0}I_{\beta_0,S_0}^{-1})$ times a chi-squared distributed random variable with one degree of freedom, with $I_{\beta_0,S_0}=1-S_0(\tau)$ and

$$J_{\beta_0,S_0} = \int_0^\tau\Big(\exp(\beta_0)\big(\log S_0(\tau)-\log S_0(t)\big)-1\Big)^2\,S_0(t)^{2\exp(\beta_0)-2}\exp(2\beta_0)\,d\big(1-S_0\big)(t).$$

This integral can be computed explicitly, but the resulting formula is complicated and not very interesting in itself. If $\beta_0=0$, the value of $J_{\beta_0=0,S_0}$ reduces to $1-S_0(\tau)+\log^2S_0(\tau)$ and

$$J_{\beta_0=0,S_0}I_{\beta_0=0,S_0}^{-1} = \frac{1-S_0(\tau)+\log^2S_0(\tau)}{1-S_0(\tau)} = 1+\frac{\log^2S_0(\tau)}{1-S_0(\tau)}.$$

If $S_0(\tau)=0.5$, then $J_{\beta_0=0,S_0}I_{\beta_0=0,S_0}^{-1}=1.96$, and if $S_0(\tau)=0.8$, then $J_{\beta_0=0,S_0}I_{\beta_0=0,S_0}^{-1}=1.25$. Suppose $\lambda=0.1$ and $S_0(\tau)=0.5$; then the asymptotic distribution is approximately 1.2 times a chi-squared distribution with one degree of freedom. Ignoring the fact that $S_0$ is unknown and was estimated from an independent sample yields a test with level 0.074 instead of 0.05.
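The level 0.074 can be reproduced directly: if the nominal 5% critical value 3.84 of the $\chi^2_1$ distribution is used while the true limit law is $c\cdot\chi^2_1$ with $c=1+\lambda J_{\beta_0=0,S_0}I_{\beta_0=0,S_0}^{-1}$, the actual level is $P(\chi^2_1>3.84/c)$. A minimal sketch (the function names are ours), using only the standard library via $P(\chi^2_1>x)=\mathrm{erfc}(\sqrt{x/2})$:

```python
from math import erfc, log, sqrt

def chi2_1_sf(x):
    """Survival function P(chi2_1 > x)."""
    return erfc(sqrt(x / 2.0))

def actual_level(S0_tau, lam, crit=3.8415):
    """Actual level of a nominal 5% LR test when S0 is treated as known
    (Example 2 with beta0 = 0): the true limit is c * chi2_1 with
    c = 1 + lam * (1 + log^2 S0(tau) / (1 - S0(tau)))."""
    c = 1.0 + lam * (1.0 + log(S0_tau) ** 2 / (1.0 - S0_tau))
    return chi2_1_sf(crit / c)

# Close to the 0.074 reported above (using the exact c = 1.196 gives about 0.073).
print(round(actual_level(0.5, 0.1), 3))
```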

3.3 Example 3: Practical, large m

In Jonker et al. [15] the aim is to test the genetic contribution to the age at which people experience their first migraine attack (heritability) and to find locations on the chromosomes that show linkage with the genes that influence this age at onset of migraine. The data came from a longitudinal study of Dutch twins and their family members. A detailed description of the data and the model used is given in Jonker et al. [15]. The parameters of interest describe the amount of heritability and linkage; they were finite dimensional. The (non-parametric) nuisance parameters were the survival functions for age at onset of migraine for males and females separately. A one-sided hypothesis for heritability and linkage was tested with the likelihood ratio test, based on (genetic and survival) data of 258 dizygotic twins. The survival functions in the test statistic were replaced by their NPMLEs based on a set of interval-censored age at onset data of 4,791 Dutch males and 6,796 Dutch females, for whom no genetic data were available. A simulation study was performed to find the distribution of the likelihood ratio test statistic under the null hypothesis. Based on the results, it was concluded that this distribution was close to a 50–50 mixture of a point mass at zero and a chi-squared distribution with one degree of freedom; this would be the limit distribution if the survival curves were known.

As the size of the additional sample ($m\approx 6{,}796+4{,}791$) is considerably larger than the sample size involved in constructing the likelihood ratio statistic ($n=258$), this result may seem as expected. However, it is well known that the NPMLE of an unknown survival function possesses an $m^{1/3}$-rate of convergence in this example, due to the severe, current status type of censoring (cf. Groeneboom and Wellner [22]). Thus at first it may seem more appropriate to compare $m^{1/3}\approx 22.6$ to $\sqrt{n}\approx 16$ in this example. This comparison would suggest a serious bias from neglecting the fact that the survival functions were estimated, contrary to what was found in the simulation study.

However, the result of the present paper shows that the bias in the level of the likelihood ratio statistic is driven by the rate of estimation of the functional $\eta\mapsto P_{\theta_0,\eta}\dot{\ell}_{\theta_0,\eta_0}$ by the plug-in estimator $P_{\theta_0,\tilde{\eta}}\dot{\ell}_{\theta_0,\eta_0}$, rather than by the rate of estimation of the nuisance parameter $\eta$ itself (see assumption 5 of the main theorem and the discussion following its proof). The apparent insensitivity of the likelihood ratio test is explained by the fact that this particular functional is estimable at $\sqrt{m}$-rate, rather than $m^{1/3}$-rate, making the relevant ratio of sample sizes $m/n\approx 44.9$.

3.4 Example 4: Practical, small m

Suppose we are interested in a disease X caused by mutations at a single disease gene. In total there are $K$ different disease alleles. An individual with two mutated, possibly different, alleles is affected. Let $q$ be the frequency of any of the $K$ mutations in the total population and $a_1,a_2,\ldots,a_K$ the relative allele frequencies among the disease alleles, so $\sum_{j=1}^Ka_j=1$. Consequently, the disease alleles have allele frequencies $qa_1,qa_2,\ldots,qa_K$ in the total population. For $F$ the inbreeding coefficient of a child with consanguineous parents, the probability that the child has disease X is given by $P(X|F)=Fq+(1-F)q^2$. In this formula, the first term on the right-hand side, $Fq$, represents the fraction of affected children with the alleles identical by descent (IBD; the alleles come from a common ancestor). The second term, $(1-F)q^2$, is the fraction whose alleles are not IBD. For individuals with $F=0$, the alleles are independent and this probability simplifies to $P(X|F=0)=q^2$.

Based on a sample of $n$ children with disease X, most of them with consanguineous parents (so $F>0$), the aim is to estimate the frequency $q$. For all these children, the inbreeding coefficient is known and the alleles are observed. So, from this information it is known whether a child is homozygous (i.e. the two alleles are the same) or heterozygous (the alleles are different), but not whether the alleles are identical by descent. ten Kate et al. [23] derived an expression for $q$ in terms of the relative allele frequencies, the inbreeding coefficient and the conditional probability that an individual is homozygous given that he is affected and has inbreeding coefficient $F$. In case of a sample of $n$ individuals with equal inbreeding coefficient $F>0$, $q$ can be estimated by replacing the relative allele frequencies in the expression by the sample frequencies and the conditional probability of homozygosity by the fraction of individuals who are homozygous. However, for a sample with varying inbreeding coefficients this estimation method is not valid: if the inbreeding coefficient in the expression is replaced by the sample average of the inbreeding coefficients, the estimator becomes biased (see Jonker et al. [13]). As an alternative, we consider two maximum likelihood estimators. The first estimator is based on only a part of the data, namely whether an individual is homozygous or heterozygous, whereas the second estimator uses all available information: the observed alleles.

Case A: homo- and heterozygosity data

For each of a sample of $n$ individuals with disease X, we observe whether the individual is homozygous or heterozygous. Define the indicator $\Delta$ of homozygosity at the disease gene: $\Delta=1$ if the individual is homozygous and $\Delta=0$ if he is heterozygous. The conditional probabilities that the individual is homozygous, respectively heterozygous, at the disease gene equal

\[
P(\Delta=1\mid X,F)=\frac{Fq+(1-F)q^2\sum_{j=1}^K a_j^2}{Fq+(1-F)q^2}=\frac{F+(1-F)q\sum_{j=1}^K a_j^2}{F+(1-F)q}
\]
\[
P(\Delta=0\mid X,F)=\frac{(1-F)q^2\bigl(1-\sum_{j=1}^K a_j^2\bigr)}{Fq+(1-F)q^2}=\frac{(1-F)q\bigl(1-\sum_{j=1}^K a_j^2\bigr)}{F+(1-F)q}\tag{4}
\]

Based on these probabilities, it directly follows that the likelihood equals

\[
\prod_{i=1}^n\frac{\Bigl(F_i+(1-F_i)q\sum_{j=1}^K a_j^2\Bigr)^{\Delta_i}\Bigl((1-F_i)q\bigl(1-\sum_{j=1}^K a_j^2\bigr)\Bigr)^{1-\Delta_i}}{F_i+(1-F_i)q}\tag{5}
\]

where the i in the subscript refers to the ith individual.

Case B: Allele information

Suppose we also use the allele information for every individual in the sample. The variable $Z_i$ indicates which pair of alleles was observed for individual $i$; $Z=(j,k)$ means that the alleles $j$ and $k$ were observed. Then, the corresponding probabilities equal

\[
P(Z=(j,j)\mid X,F)=\frac{Fqa_j+(1-F)q^2a_j^2}{Fq+(1-F)q^2}=\frac{Fa_j+(1-F)qa_j^2}{F+(1-F)q}
\]
\[
P(Z=(j,k)\mid X,F)=\frac{(1-F)q^2\,2a_ja_k}{Fq+(1-F)q^2}=\frac{(1-F)q\,2a_ja_k}{F+(1-F)q},\qquad j\neq k,
\]

from which the likelihood directly follows as the product over the terms for all individuals.

Neither likelihood admits an explicit expression for the maximum likelihood estimator of $q$. The maximum likelihood estimates can be found by numerical maximization of the log likelihood with respect to $q$ and all allele frequencies, or with respect to $q$ and the sum $\sum_j a_j^2$. Numerical maximization over multi-dimensional parameters is often unreliable, and it therefore makes sense to estimate the relative allele frequencies beforehand, to insert these estimates into the likelihood, and next to maximize the likelihood with respect to $q$ only.
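A minimal sketch of this two-step approach for case A, assuming all individuals share one inbreeding coefficient and that $\sum_j a_j^2$ has been estimated beforehand (the data, the value 0.375 and the grid search are illustrative choices, not the authors' implementation):

```python
# Two-step estimation sketch for case A: plug an external estimate of
# sum_j a_j^2 into the likelihood (5) and maximize over q only.
# All data below are hypothetical.
import math

def loglik_A(q, deltas, Fs, a_sq):
    # log of eq. (5); a_sq plays the role of sum_j a_j^2
    ll = 0.0
    for d, F in zip(deltas, Fs):
        num = F + (1 - F) * q * a_sq if d == 1 else (1 - F) * q * (1 - a_sq)
        ll += math.log(num) - math.log(F + (1 - F) * q)
    return ll

def mle_q(deltas, Fs, a_sq, grid_size=10000):
    # one-dimensional grid search over (0, 1); reliable because only q is free
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(grid, key=lambda q: loglik_A(q, deltas, Fs, a_sq))

# 40 homozygous and 40 heterozygous individuals, all with F = 1/32,
# and a_sq = 0.375 estimated beforehand:
deltas = [1] * 40 + [0] * 40
q_hat = mle_q(deltas, [1 / 32] * 80, a_sq=0.375)
```

For these counts the maximizer can be checked by hand: the homozygosity probability in eq. (4) is decreasing in $q$, so the MLE equates it to the observed fraction 0.5, giving $\hat q\approx 0.129$.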

Estimation of the relative allele frequencies can be done in different ways. Sometimes reliable estimates are available from the literature or previous research. Often this is not the case and the observed data have to be used. A part of the individuals has inbreeding coefficient equal to zero. The observations of these individuals do contain information on the relative allele frequencies, but not on the value of $q$; this can be seen directly from the likelihood functions, since for $F_i=0$ the factor $q$ cancels from the corresponding terms. So, we can split the data: one part, with $F=0$, to estimate the allele frequencies, and one part, with $F>0$, to estimate $q$ by maximizing the likelihood after inserting the estimates of the allele frequencies.

After estimating $q$, confidence intervals have to be constructed. This can be done based on the likelihood ratio statistic: the confidence interval contains all values $q_0$ for which the null hypothesis $H_0:q=q_0$ is not rejected; i.e. for which the value of the likelihood ratio statistic is lower than the $(1-\alpha)$-quantile of the (asymptotic) distribution of the likelihood ratio statistic. If the data of the individuals are independent and identically distributed, it follows from the theorem at the beginning of this paper that the asymptotic distribution equals $(1+\lambda\nu)$ times a chi-squared distribution with one degree of freedom, for $\nu$ equal to an expression computed in Appendix B. In our example, the data of the individuals are independent, but not identically distributed, because the inbreeding coefficient may vary among the affected individuals. In the following, we first assume that all inbreeding coefficients are equal and later extend the results to different inbreeding coefficients. In Appendix B, the computations to find $\nu$ are given for the first maximum likelihood estimator. For the second estimator, the computations are analogous, but slightly more complex because the likelihood has a more complex form; only the results are given in Appendix B.
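The corrected cutoff is then simply the usual chi-squared quantile multiplied by the correction factor. A small sketch (the value of $\nu$ below is hypothetical, chosen only so that the factor matches the 1.7 reported for the first estimator in the simulation study; $\chi^2_1$ quantiles are obtained as squared standard-normal quantiles):

```python
# Sketch: correcting the likelihood-ratio cutoff when the asymptotic
# distribution is (1 + (n/m)*nu) times chi-squared with 1 degree of freedom.
from statistics import NormalDist

def chi2_1_quantile(p):
    # chi2_1 = Z^2 for standard normal Z, so the p-quantile is inv_cdf((1+p)/2)^2
    z = NormalDist().inv_cdf((1 + p) / 2)
    return z * z

def corrected_cutoff(p, n, m, nu):
    return (1 + (n / m) * nu) * chi2_1_quantile(p)

uncorrected = chi2_1_quantile(0.95)                   # about 3.84
cutoff = corrected_cutoff(0.95, n=80, m=40, nu=0.35)  # factor 1.7: about 6.53
```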

Simulation study

In order to get more insight into the amount of correction that is needed, we consider a situation that is of practical relevance. We have 80 individuals with inbreeding coefficient equal to 1/32 and 20 children with inbreeding coefficient equal to zero. The relative allele frequencies will be estimated based on the observed alleles of the 20 individuals (so 40 observations, as every individual contributes two alleles) and $q$ will be estimated based on the data of the 80 individuals. We assumed there were four disease alleles with relative allele frequencies 0.55, 0.15, 0.1 and 0.2. The frequency $q$ was first taken equal to 0.1 and in a second simulation equal to 0.01. For the first maximum likelihood estimator, the theoretical value of the multiplication factor $1+(n/m)\nu$ equals 1.7 and for the second estimator 1.045. So the correction for the second maximum likelihood estimator is small, and ignoring the fact that the relative allele frequencies were estimated beforehand hardly affects the level of the test and, thus, the coverage of the confidence interval. Ignoring the fact that the relative allele frequencies were estimated yields confidence intervals with coverage 0.87 and 0.94 for the two estimation methods, instead of 0.95.

We sampled 1000 times 100 (=80 + 20) individuals from the population as described above, computed the estimates and the value of the likelihood ratio statistic (under the null hypothesis). The parameter ν in the correction factor depends on the unknown parameters in the model. For every sample, we estimated ν by inserting the estimates of these parameters. We plotted QQ-plots of these likelihood ratio values against the quantiles of the chi-squared distribution with one degree of freedom. Next, we plotted QQ-plots for the values of the likelihood ratio test-statistic divided by its estimated correction term against the quantiles of the chi-squared one distribution. The plots are shown in Figure 1. From the plots, it can be clearly seen that the likelihood ratio statistic for the first maximum likelihood estimator has a thicker tail than the chi-square distribution with one degree of freedom. Correction is needed. For the second estimator, no clear difference between the distribution of the likelihood ratio statistic and the chi-square distribution is visible.
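The simulation can be reproduced in outline as follows (an independent reimplementation for estimator A only, not the authors' code; the explicit binomial form of the MLE and the fixed nominal cutoff 3.841 are our simplifications):

```python
# Monte Carlo sketch of the simulation study for estimator A; an independent
# reimplementation with illustrative simplifications, not the authors' code.
import math, random

random.seed(1)
F, q0 = 1 / 32, 0.1
alleles = [0.55, 0.15, 0.1, 0.2]
a_true = sum(a * a for a in alleles)                 # sum_j a_j^2 = 0.375

def p_hom(q, a):                                     # eq. (4): P(Delta = 1 | X, F)
    return (F + (1 - F) * q * a) / (F + (1 - F) * q)

def loglik(q, a, n, k):                              # binomial form, k homozygotes
    p = p_hom(q, a)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def mle_q(a, n, k):
    # p_hom is decreasing in q, so the MLE matches p_hom to the observed fraction
    pbar = k / n
    if pbar <= a:
        return 1.0                                   # boundary case
    qhat = F * (1 - pbar) / ((1 - F) * (pbar - a))
    return min(max(qhat, 1e-6), 1.0)

n_rep, n, m_alleles, reject = 1000, 80, 40, 0
for _ in range(n_rep):
    # step 1: estimate sum_j a_j^2 from the 40 alleles of the F = 0 individuals
    draws = random.choices(range(4), weights=alleles, k=m_alleles)
    a_hat = sum((draws.count(j) / m_alleles) ** 2 for j in range(4))
    # step 2: homozygosity data of 80 individuals with F = 1/32
    k = sum(random.random() < p_hom(q0, a_true) for _ in range(n))
    lr = 2 * (loglik(mle_q(a_hat, n, k), a_hat, n, k) - loglik(q0, a_hat, n, k))
    reject += lr > 3.841                             # nominal 5% chi2_1 cutoff
level = reject / n_rep
```

With the article's parameter values the theoretical factor is 1.7, so at the uncorrected cutoff one expects a rejection rate of roughly 0.13 rather than the nominal 0.05.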

Figure 1: QQ-plots of the two maximum likelihood methods (first method: upper row, second method: lower row), uncorrected (left) and corrected (right). The line is $y=x$.

Next we lowered the value of $q$ to 0.01. Then the theoretical correction terms equal 1.14 and 1.028 for the two methods and the uncorrected confidence intervals have coverage 0.93 and 0.95, instead of 0.95. Again we plotted QQ-plots as described above. This time the corrections are hardly visible in the QQ-plots; the values follow more or less the line $y=x$ (QQ-plots not shown).

In Appendix B, we computed the theoretical value of the correction term 1+(n/m)ν. Based on this formula, it can easily be seen in which situations correction is needed or not.

If we take m much smaller than n, the correction term is much bigger than one for both maximum likelihood estimators and the deviation from the chi-squared distribution is clearly visible in the QQ-plots; the tail of the distribution of the likelihood ratio statistic is much thicker than the tail of the chi-squared distribution with one degree of freedom. However, in practice, if m, the number of individuals with zero inbreeding coefficient, is small, or the number of different alleles is large (which is often the case), one would choose to use all data, including the data of the individuals with a positive inbreeding coefficient, to estimate the allele frequencies. Furthermore, in practice, not all individuals have equal inbreeding coefficients. Then, no theory concerning the asymptotic distribution of the likelihood ratio statistic is known. We therefore performed a simulation study. For different populations (different distributions of F), we sampled 100 individuals from the population, estimated q and computed the likelihood ratio statistic under the null hypothesis. We repeated this 1000 times. QQ-plots of the likelihood ratio statistic values against the quantiles of the chi-squared distribution with one degree of freedom showed that the likelihood ratio statistic follows this distribution quite well. We considered only a few combinations of parameter values. In case correction is needed, we cannot use the correction term as in the previous simulation study, because now the inbreeding coefficient varies among the individuals. Instead, we propose to compute the correction term per individual and next take the mean of all correction terms found.

More details on the estimation method and the application to real data are given in Jonker et al. [13] and Teeuw et al. [14].

4 Discussion

In this paper, we computed the asymptotic distribution of the likelihood ratio statistic for testing a finite dimensional parameter in which the nuisance parameter was replaced by an estimator based on an independent dataset. We showed that ignoring the fact that the nuisance parameter was estimated yields a test with an increased level and a confidence interval with a coverage that is too low. If the dataset used for estimating the nuisance parameter is much bigger than the dataset available for estimating the parameter of interest, the asymptotic distribution is close to the chi-squared distribution which holds if the nuisance parameter is indeed known.

If for a statistical model, it is too difficult to compute the asymptotic distribution explicitly and the asymptotic chi-squared distribution is used for testing or the construction of a confidence interval, one should keep in mind that the true level of the test is (slightly) higher and the coverage of the interval (slightly) lower than supposed, depending on the sample sizes for estimating the parameters and the statistical efficiency.

Acknowledgments

We are very grateful to two reviewers for their constructive comments.

Appendix A: computations for Example 3

In this appendix, we check all assumptions of the theorem for the situation described in Example 2, the simplified Cox model. The aim is to test the hypothesis $H_0:\beta=\beta_0$ versus $H_1:\beta\neq\beta_0$ with the likelihood ratio test. To simplify notation, we define $T^*:=\min\{T,\tau\}$. Then $S_0(T^*)=S_0(T)^{\Delta}S_0(\tau)^{1-\Delta}$.

The likelihood for the sample (T1,Δ1),,(Tn,Δn)=(t1,δ1),,(tn,δn) is given by

\begin{align*}
\prod_{i=1}^n p_{\beta,S_0}(t_i,\delta_i)
&=\prod_{i=1}^n\Bigl(\frac{d}{dt}F_{\beta,S_0}(t)\Big|_{t=t_i}\Bigr)^{\delta_i}\bigl(S_0(\tau)^{\exp\beta}\bigr)^{1-\delta_i}
=\prod_{i=1}^n\Bigl(-\frac{d}{dt}S_0(t)^{\exp\beta}\Big|_{t=t_i}\Bigr)^{\delta_i}\bigl(S_0(\tau)^{\exp\beta}\bigr)^{1-\delta_i}\\
&=\prod_{i=1}^n\Bigl(-\exp(\beta)S_0(t_i)^{\exp\beta-1}\frac{d}{dt}S_0(t)\Big|_{t=t_i}\Bigr)^{\delta_i}\bigl(S_0(\tau)^{\exp\beta}\bigr)^{1-\delta_i}.
\end{align*}

This gives a log likelihood, a first and second derivative, and a Fisher information (all evaluated at $(\beta_0,S_0)$):

\begin{align*}
\ell_{\beta_0,S_0}(t,\delta)&=\delta\Bigl\{\beta_0+\bigl(\exp(\beta_0)-1\bigr)\log S_0(t)+\log\Bigl(-\frac{d}{dt}S_0(t)\Bigr)\Bigr\}+(1-\delta)\exp(\beta_0)\log S_0(\tau)\\
\dot\ell_{\beta_0,S_0}(t,\delta)&=\delta\bigl(1+\exp(\beta_0)\log S_0(t)\bigr)+(1-\delta)\exp(\beta_0)\log S_0(\tau)\\
\ddot\ell_{\beta_0,S_0}(t,\delta)&=\delta\exp(\beta_0)\log S_0(t)+(1-\delta)\exp(\beta_0)\log S_0(\tau)=\exp(\beta_0)\log S_0(t^*)\\
I_{\beta_0,S_0}&=-P_{\beta_0,S_0}\ddot\ell_{\beta_0,S_0}(T^*,\Delta)
=-\exp(\beta_0)\log S_0(\tau)\,S_0(\tau)^{\exp\beta_0}+\int_0^\tau\exp(\beta_0)\log S_0(t)\,dS_0(t)^{\exp\beta_0}\\
&=-\int_0^\tau\exp(\beta_0)S_0(t)^{\exp\beta_0-1}\frac{d}{dt}S_0(t)\,dt
=-S_0(t)^{\exp\beta_0}\Big|_0^\tau=1-S_0(\tau)^{\exp\beta_0}.
\end{align*}

The Fisher information was obtained by partial integration. The estimator β^ is the value that solves the equation

\[
\sum_{i=1}^n\dot\ell_{\hat\beta,\tilde S_0}(T_i,\Delta_i)=\sum_{i=1}^n\Bigl[\Delta_i\bigl\{1+\exp(\hat\beta)\log\tilde S_0(T_i)\bigr\}+(1-\Delta_i)\exp(\hat\beta)\log\tilde S_0(\tau)\Bigr]=0
\]

with S˜0 the estimator of S0 that is based on the sample Y1,,Ym. This yields

\[
\hat\beta=\log\frac{\sum_{i=1}^n\Delta_i}{-\sum_{i=1}^n\bigl\{\Delta_i\log\tilde S_0(T_i)+(1-\Delta_i)\log\tilde S_0(\tau)\bigr\}}=\log\frac{\sum_{i=1}^n\Delta_i}{-\sum_{i=1}^n\log\tilde S_0(T_i^*)}.
\]
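This explicit form is easy to implement; in the sketch below the external estimate $\tilde S_0$ and the data are hypothetical, and observations are evaluated at $\min\{T,\tau\}$:

```python
# Sketch of the explicit estimator beta-hat with S0 replaced by the external
# estimate S0_tilde. The survival function and data are hypothetical.
import math

tau = 2.0

def S0_tilde(t):
    # external estimate of the baseline survival, evaluated at min(t, tau)
    return math.exp(-min(t, tau))

def beta_hat(times, deltas):
    # beta-hat = log( sum_i Delta_i / (-sum_i log S0_tilde(T_i*)) )
    num = sum(deltas)
    den = -sum(math.log(S0_tilde(t)) for t in times)
    return math.log(num / den)

# four events before tau; the fourth observation exceeds tau and is censored there
times = [0.3, 1.1, 0.7, 2.5, 1.9]
deltas = [1, 1, 1, 0, 1]
b = beta_hat(times, deltas)   # log(4/6) here, since the min(t, tau) values sum to 6
```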

Before checking the assumptions, we first consider the asymptotic behavior of the sequence $\sqrt n\,(\mathbb P_n\log\tilde S_0(T^*)-P_{\beta_0,S_0}\log\tilde S_0(T^*))$. For every large $m$, the function $\log\tilde S_0$ belongs (almost surely) to the set of bounded monotone non-increasing functions on the interval $[0,\tau]$ that are constant on $[\tau,\infty)$. This set is a Donsker class. Moreover,

\[
P_{\beta_0,S_0}\bigl(\log\tilde S_0(t)-\log S_0(t)\bigr)^2\le\bigl\|\log\tilde S_0-\log S_0\bigr\|_{[0,\tau]}^2\to 0
\]

in probability. From lemma 19.24 of van der Vaart [2], we conclude that

\[
\sqrt n\bigl(\mathbb P_n\log\tilde S_0(T^*)-P_{\beta_0,S_0}\log\tilde S_0(T^*)\bigr)
=\sqrt n\bigl(\mathbb P_n\log S_0(T^*)-P_{\beta_0,S_0}\log S_0(T^*)\bigr)+o_P(1)\rightsquigarrow G\tag{6}
\]

for G a Gaussian process.

We now consider the assumptions one by one. Assumption 1 holds; $\dot\ell_{\beta_0,S_0}$ and $\ddot\ell_{\beta_0,S_0}$ do exist. To check assumption 2, we have to show that $\sqrt n(\hat\beta-\beta_0)$ is bounded in probability. By the Delta method, it is sufficient to show that the sequence $\sqrt n(\exp(-\hat\beta)-\exp(-\beta_0))$ is bounded in probability. Since

\[
P_{\beta_0,S_0}\log S_0(T^*)=\exp(-\beta_0)\,P_{\beta_0,S_0}\ddot\ell_{\beta_0,S_0}(T^*,\Delta)=-\exp(-\beta_0)\bigl(1-S_0(\tau)^{\exp\beta_0}\bigr),
\]

we can write

\[
\exp(-\beta_0)=\frac{-P_{\beta_0,S_0}\log S_0(T^*)}{1-S_0(\tau)^{\exp\beta_0}}
\]

and thus by using the expression of β^ we find

\begin{align*}
\sqrt n\bigl(\exp(-\hat\beta)-\exp(-\beta_0)\bigr)
&=\sqrt n\Bigl(-\frac{n}{\sum_{i=1}^n\Delta_i}\,\mathbb P_n\log\tilde S_0(T^*)+\frac{P_{\beta_0,S_0}\log S_0(T^*)}{1-S_0(\tau)^{\exp\beta_0}}\Bigr)\\
&=-\sqrt n\Bigl(\frac{n}{\sum_{i=1}^n\Delta_i}-\bigl(1-S_0(\tau)^{\exp\beta_0}\bigr)^{-1}\Bigr)\mathbb P_n\log\tilde S_0(T^*)\\
&\quad-\bigl(1-S_0(\tau)^{\exp\beta_0}\bigr)^{-1}\sqrt n\bigl(\mathbb P_n\log\tilde S_0(T^*)-P_{\beta_0,S_0}\log S_0(T^*)\bigr).
\end{align*}

Note that $E_{\beta_0,S_0}n^{-1}\sum_{i=1}^n\Delta_i=F_{\beta_0,S_0}(\tau)=1-S_0(\tau)^{\exp\beta_0}$. Then, the term

\[
\sqrt n\Bigl(\frac{n}{\sum_{i=1}^n\Delta_i}-\bigl(1-S_0(\tau)^{\exp\beta_0}\bigr)^{-1}\Bigr)
\]

is asymptotically normal, and consequently bounded in probability, by the central limit theorem and the Delta method. The term $\mathbb P_n\log\tilde S_0(T^*)$ is also bounded in probability, since $S_0(\tau)$ is bounded away from zero by assumption. The second term equals (up to the constant $(1-S_0(\tau)^{\exp\beta_0})^{-1}$)

\[
\sqrt n\bigl(\mathbb P_n\log\tilde S_0(T^*)-P_{\beta_0,S_0}\log\tilde S_0(T^*)\bigr)+\sqrt n\,P_{\beta_0,S_0}\bigl(\log\tilde S_0(T^*)-\log S_0(T^*)\bigr).
\]

The first term converges in distribution to a Gaussian process, as was seen before, and is therefore bounded in probability. The second term equals $g(\sqrt n(\log\tilde S_0(T^*)-\log S_0(T^*)))$ with $g$ the function defined as $g(z)=\int z(t)\,dP_{\beta_0,S_0}(t)$ that maps $D[0,\tau]$ into $\mathbb R$. This function is continuous with respect to the supremum norm. By the continuous mapping theorem, it suffices to show that the sequence $\sqrt n(\log\tilde S_0-\log S_0)$ converges in distribution. This is true by the Delta method and the fact that $\sqrt n(\tilde S_0-S_0)$ converges weakly (Donsker theorem).

For assumption 3 we have to show that $\mathbb P_n\ddot\ell_{\beta_0,\tilde S_0}$ converges in probability to $-I_{\beta_0,S_0}$. In order to see this, write

\[
\mathbb P_n\ddot\ell_{\beta_0,\tilde S_0}(T^*,\Delta)=\bigl(\mathbb P_n-P_{\beta_0,S_0}\bigr)\ddot\ell_{\beta_0,\tilde S_0}(T^*,\Delta)+P_{\beta_0,S_0}\ddot\ell_{\beta_0,\tilde S_0}(T^*,\Delta).
\]

Since $\mathbb P_n\log\tilde S_0(T^*)-P_{\beta_0,S_0}\log\tilde S_0(T^*)$ converges in probability to zero by eq. (6), it directly follows that the first term on the right-hand side in the previous display converges in probability to zero. The second term equals $\exp(\beta_0)P_{\beta_0,S_0}\log\tilde S_0(T^*)$ and converges in probability to $\exp(\beta_0)P_{\beta_0,S_0}\log S_0(T^*)$, which equals

\[
\exp(\beta_0)\cdot\bigl(-\exp(-\beta_0)\bigr)\bigl(1-S_0(\tau)^{\exp\beta_0}\bigr)=-\bigl(1-S_0(\tau)^{\exp\beta_0}\bigr)
\]

(this was seen when checking assumption 2).

Assumption 4 follows since $\sqrt n(\mathbb P_n\log\tilde S_0(T^*)-P_{\beta_0,S_0}\log\tilde S_0(T^*))$ converges in distribution to a Gaussian process.

For assumption 5, we have to show that $\sqrt m\,P_{\beta_0,\tilde S_0}\dot\ell_{\beta_0,S_0}(T^*,\Delta)$ converges weakly to a normal distribution.

\begin{align}
\sqrt m\,P_{\beta_0,\tilde S_0}\dot\ell_{\beta_0,S_0}(T^*,\Delta)
&=\sqrt m\bigl(P_{\beta_0,\tilde S_0}-P_{\beta_0,S_0}\bigr)\dot\ell_{\beta_0,S_0}(T^*,\Delta)\nonumber\\
&=\exp(\beta_0)\log S_0(\tau)\,\sqrt m\bigl(\tilde S_0(\tau)^{\exp\beta_0}-S_0(\tau)^{\exp\beta_0}\bigr)\nonumber\\
&\quad-\sqrt m\int_0^\tau\bigl(1+\exp(\beta_0)\log S_0(t)\bigr)\,d\bigl(\tilde S_0(t)^{\exp\beta_0}-S_0(t)^{\exp\beta_0}\bigr)\tag{7}\\
&=\sqrt m\int_0^\tau\bigl(\exp(\beta_0)\log S_0(\tau)-\exp(\beta_0)\log S_0(t)-1\bigr)\,d\bigl(\tilde S_0(t)^{\exp\beta_0}-S_0(t)^{\exp\beta_0}\bigr).\nonumber
\end{align}

Note that $\sqrt m\bigl(\tilde S_0^{\exp\beta_0}-S_0^{\exp\beta_0}\bigr)\approx\sqrt m\bigl(\tilde S_0-S_0\bigr)\exp(\beta_0)S_0^{\exp\beta_0-1}$ by a Taylor expansion. The sequence $\sqrt m(\tilde S_0-S_0)$ is asymptotically equal to a Gaussian process with mean zero and covariance function $S_0(t_i\vee t_j)-S_0(t_i)S_0(t_j)$ for $t_i,t_j\in[0,\tau]$. The sequence $\sqrt m(\tilde S_0-S_0)\exp(\beta_0)S_0^{\exp\beta_0-1}$ is therefore asymptotically equal to a Gaussian process with mean zero and covariance function

\[
\bigl(S_0(t_i\vee t_j)-S_0(t_i)S_0(t_j)\bigr)\exp(2\beta_0)S_0(t_i)^{\exp\beta_0-1}S_0(t_j)^{\exp\beta_0-1}
\]

for $t_i,t_j\in[0,\tau]$. This Gaussian process is denoted by $X_t$.

By partial integration, it can be seen that the map $g$ on $D[0,\tau]$ defined as $g(z)=\int_0^\tau\bigl(\exp(\beta_0)\log S_0(\tau)-\exp(\beta_0)\log S_0(t)-1\bigr)\,dz(t)$ is continuous with respect to the supremum norm. By use of the continuous mapping theorem, the sequence at the right-hand side in eq. (7) converges weakly to

\[
\int_0^\tau\bigl(\exp(\beta_0)\log S_0(\tau)-\exp(\beta_0)\log S_0(t)-1\bigr)\,dX_t.\tag{8}
\]

Take M large, Δt=τ/M and define

\[
Y(M)=\sum_{k=0}^{M-1}\Bigl(\exp(\beta_0)\bigl(\log S_0(\tau)-\log S_0(t_k)\bigr)-1\Bigr)\Delta X_k
\]

where $\Delta X_k=X_{t_{k+1}}-X_{t_k}$ and $t_k=k\Delta t$. Then, for $M$ growing to infinity, $Y(M)$ converges to the integral in eq. (8). Since $Y(M)$ is a sum of independent Gaussian random variables, it also has a normal distribution with mean zero and variance

\begin{align}
\mathrm{Var}\,Y(M)&=\sum_{k=0}^{M-1}\mathrm{Var}\Bigl(\bigl(\exp(\beta_0)(\log S_0(\tau)-\log S_0(t_k))-1\bigr)\Delta X_k\Bigr)\nonumber\\
&=\sum_{k=0}^{M-1}\bigl(\exp(\beta_0)(\log S_0(\tau)-\log S_0(t_k))-1\bigr)^2\,\mathrm{Var}(\Delta X_k)\tag{9}
\end{align}

with

\[
\mathrm{Var}(\Delta X_k)=\mathrm{Var}\bigl(X_{t_{k+1}}-X_{t_k}\bigr)=\mathrm{Var}\,X_{t_{k+1}}+\mathrm{Var}\,X_{t_k}-2\,\mathrm{Cov}\bigl(X_{t_{k+1}},X_{t_k}\bigr)
\]

with VarXt and Cov(Xt,Xs) as given above. For t>s:

\begin{align*}
&\exp(-2\beta_0)\bigl(\mathrm{Var}\,X_t+\mathrm{Var}\,X_s-2\,\mathrm{Cov}(X_t,X_s)\bigr)\\
&\quad=\bigl(S_0(t)-S_0(t)^2\bigr)S_0(t)^{2\exp\beta_0-2}+\bigl(S_0(s)-S_0(s)^2\bigr)S_0(s)^{2\exp\beta_0-2}\\
&\qquad-2\bigl(S_0(t)-S_0(t)S_0(s)\bigr)S_0(t)^{\exp\beta_0-1}S_0(s)^{\exp\beta_0-1}\\
&\quad=\bigl(1-S_0(t)\bigr)S_0(t)^{2\exp\beta_0-1}+\bigl(1-S_0(s)\bigr)S_0(s)^{2\exp\beta_0-1}-2\bigl(1-S_0(s)\bigr)S_0(t)^{\exp\beta_0}S_0(s)^{\exp\beta_0-1}\\
&\quad=S_0(t)^{2\exp\beta_0-1}-S_0(t)^{\exp\beta_0}S_0(s)^{\exp\beta_0-1}-S_0(t)^{2\exp\beta_0}+S_0(t)^{\exp\beta_0}S_0(s)^{\exp\beta_0}\\
&\qquad+S_0(s)^{2\exp\beta_0-1}-S_0(t)^{\exp\beta_0}S_0(s)^{\exp\beta_0-1}-S_0(s)^{2\exp\beta_0}+S_0(t)^{\exp\beta_0}S_0(s)^{\exp\beta_0}\\
&\quad=S_0(t)^{\exp\beta_0}\bigl(S_0(t)^{\exp\beta_0-1}-S_0(s)^{\exp\beta_0-1}\bigr)-S_0(t)^{\exp\beta_0}\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr)\\
&\qquad-S_0(s)^{\exp\beta_0-1}\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr)+S_0(s)^{\exp\beta_0}\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr)\\
&\quad=S_0(t)^{\exp\beta_0}\bigl(S_0(t)^{\exp\beta_0-1}-S_0(s)^{\exp\beta_0-1}\bigr)-\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr)^2\\
&\qquad-S_0(s)^{\exp\beta_0-1}\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr).
\end{align*}

So,

\begin{align*}
&\mathrm{Var}\,X_t+\mathrm{Var}\,X_s-2\,\mathrm{Cov}(X_t,X_s)\\
&\quad=\exp(2\beta_0)\Bigl(S_0(t)^{\exp\beta_0}\bigl(S_0(t)^{\exp\beta_0-1}-S_0(s)^{\exp\beta_0-1}\bigr)-\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr)^2\\
&\qquad\quad-S_0(s)^{\exp\beta_0-1}\bigl(S_0(t)^{\exp\beta_0}-S_0(s)^{\exp\beta_0}\bigr)\Bigr).
\end{align*}

Filling in this expression into the sum in eq. (9), letting $M$ go to infinity and straightforward computations yield

\[
-\int_0^\tau\bigl(\exp(\beta_0)(\log S_0(\tau)-\log S_0(t))-1\bigr)^2\,S_0(t)^{2\exp\beta_0-2}\exp(2\beta_0)\,dS_0(t).
\]

So, $\sqrt m\,P_{\beta_0,\tilde S_0}\dot\ell_{\beta_0,S_0}(T^*,\Delta)$ converges weakly to a normal distribution with mean zero and variance equal to the integral in the previous display. This integral can be computed explicitly, but yields a complicated formula which is not interesting in itself. Taking $\beta_0=0$ makes the integral easier to compute; this yields $1-S_0(\tau)+\log^2 S_0(\tau)$.
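The $\beta_0=0$ case can be checked numerically (our own sanity check, with the illustrative choice $S_0(t)=e^{-t}$ and $\tau=1.5$): a Riemann–Stieltjes approximation of the integral should reproduce the closed form $1-S_0(\tau)+\log^2 S_0(\tau)$.

```python
# Numeric sanity check of the beta0 = 0 variance: the integral
#   -int_0^tau (log S0(tau) - log S0(t) - 1)^2 dS0(t)
# should equal 1 - S0(tau) + log(S0(tau))^2. Illustrative S0(t) = exp(-t).
import math

tau = 1.5
S0 = lambda t: math.exp(-t)

M = 100000
total = 0.0
for k in range(M):
    t0, t1 = tau * k / M, tau * (k + 1) / M
    integrand = (math.log(S0(tau)) - math.log(S0(t0)) - 1) ** 2
    total -= integrand * (S0(t1) - S0(t0))   # dS0 < 0, so the minus makes it positive

closed_form = 1 - S0(tau) + math.log(S0(tau)) ** 2
```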

For assumption 6, we have to show that $\sqrt n\,(P_{\beta_0,S_0}-P_{\beta_0,\tilde S_0})(\dot\ell_{\beta_0,\tilde S_0}-\dot\ell_{\beta_0,S_0})$ converges in probability to zero.

\begin{align*}
&\sqrt n\,(P_{\beta_0,S_0}-P_{\beta_0,\tilde S_0})\bigl(\dot\ell_{\beta_0,\tilde S_0}-\dot\ell_{\beta_0,S_0}\bigr)\\
&\quad=\exp(\beta_0)\sqrt n\,(P_{\beta_0,S_0}-P_{\beta_0,\tilde S_0})\bigl(\log\tilde S_0(T^*)-\log S_0(T^*)\bigr)\\
&\quad=\exp(\beta_0)\sqrt n\,\bigl(\log\tilde S_0(\tau)-\log S_0(\tau)\bigr)\bigl(S_0(\tau)^{\exp\beta_0}-\tilde S_0(\tau)^{\exp\beta_0}\bigr)\\
&\qquad-\exp(\beta_0)\sqrt n\int_0^\tau\bigl(\log\tilde S_0(t)-\log S_0(t)\bigr)\,d\bigl(S_0(t)^{\exp\beta_0}-\tilde S_0(t)^{\exp\beta_0}\bigr)\\
&\quad=o_P(1)-\exp(\beta_0)\sqrt n\int_0^\tau\bigl(\log\tilde S_0(t)-\log S_0(t)\bigr)\,d\bigl(S_0(t)^{\exp\beta_0}-\tilde S_0(t)^{\exp\beta_0}\bigr)\\
&\quad=o_P(1)-\exp(\beta_0)\sqrt n\int_0^\tau\frac{\tilde S_0(t)-S_0(t)}{S_0(t)}\,d\bigl(S_0(t)^{\exp\beta_0}-\tilde S_0(t)^{\exp\beta_0}\bigr),
\end{align*}

where the last equality follows from a Taylor expansion of $\log\tilde S_0$ around $S_0$. The function $(g,h)\mapsto\int g(t)\,dh(t)$ is a continuous map, with respect to the supremum norm, from the set of pairs of functions of bounded variation on $[0,\tau]$ into $\mathbb R$. Application of the continuous mapping theorem and the fact that $\sqrt m(\tilde S_0-S_0)$ and $\sqrt m(\tilde S_0^{\exp\beta_0}-S_0^{\exp\beta_0})$ converge in distribution to a Gaussian process yields weak convergence of $m\int(\log\tilde S_0(t)-\log S_0(t))\,d(S_0(t)^{\exp\beta_0}-\tilde S_0(t)^{\exp\beta_0})$. Since $n=O(m)$, the last term in the previous display converges in probability to zero.

Assumption 7 follows directly from the fact that $\frac{d}{d\beta}\ddot\ell_{\beta,S}(t,\delta)=\ddot\ell_{\beta,S}(t,\delta)$ is continuous and differentiable with respect to $\beta$. Then $\frac1n\sum_{i=1}^n\frac{d}{d\beta}\ddot\ell_{\beta,S}(T_i^*,\Delta_i)$ is bounded for $(\beta,S)$ in a small neighborhood of $(\beta_0,S_0)$.

Appendix B: computations for Example 4

In this appendix, we give the computations for the correction term $1+(n/m)\nu$ for maximum likelihood estimator A as described in Example 4. The computations for estimator B are analogous; only the results are given.

We have to compute $\nu=I_{q_0,a}^{-1}J_{q_0,a}$, for $I_{q_0,a}^{-1}$ equal to the inverse of the Fisher information for estimating $q_0$ and $a=\sum_{j=1}^K a_j^2$ under the assumption that $(a_1,\ldots,a_K)$ is known, and $J_{q_0,a}$ the asymptotic variance of $\sqrt m\,P_{q_0,\tilde a}\dot\ell_{q_0,a}$, with $\dot\ell_{q_0,a}$ the derivative of the log likelihood and $\tilde a=\sum_{j=1}^K\tilde a_j^2$ the estimated value of $a$ based on the data of individuals with $F=0$.

Straightforward computations show that

\[
I_{q_0,a}^{-1}=\frac{q_0\bigl(F+(1-F)q_0\bigr)^2\bigl(F+(1-F)q_0a\bigr)}{(1-a)(1-F)F^2},
\]

and

\[
\sqrt m\,P_{q_0,\tilde a}\dot\ell_{q_0,a}=-\frac{(1-F)F}{\bigl(F+(1-F)q_0\bigr)\bigl(F+(1-F)q_0a\bigr)}\,\sqrt m\Bigl(\sum_{j=1}^K\tilde a_j^2-\sum_{j=1}^K a_j^2\Bigr)
\]

for $a=\sum_{j=1}^K a_j^2$ and $\tilde a=\sum_{j=1}^K\tilde a_j^2$. We denote the asymptotic variance of $\sqrt m(\tilde a-a)=\sqrt m\bigl(\sum_{j=1}^K\tilde a_j^2-\sum_{j=1}^K a_j^2\bigr)$ by $\sigma^2$; then the asymptotic variance of $\sqrt m\,P_{q_0,\tilde a}\dot\ell_{q_0,a}$ equals

\[
J_{q_0,a}=\frac{(1-F)^2F^2\sigma^2}{\bigl(F+(1-F)q_0\bigr)^2\bigl(F+(1-F)q_0a\bigr)^2}
\]

and, thus,

\[
J_{q_0,a}I_{q_0,a}^{-1}=\frac{q_0(1-F)\sigma^2}{\bigl(F+(1-F)q_0a\bigr)(1-a)}.
\]

The value of $\sigma^2$ can be found with the help of the Delta method (see for instance van der Vaart [2]). The second dataset consists of $m$ individuals. Define $Z_1,\ldots,Z_K$ as the numbers of observed alleles $1,2,\ldots,K$. Then $\sum_i Z_i=2m$ because every individual contributes two alleles. The allele frequency $a_i$ is estimated by $\tilde a_i=Z_i/(2m)$. It is known that $\sqrt m\bigl((\tilde a_1,\ldots,\tilde a_{K-1})^T-(a_1,\ldots,a_{K-1})^T\bigr)$ is asymptotically normal with mean zero and covariance matrix $\Sigma$ with diagonal $\bigl(a_1(1-a_1)/2,\ldots,a_{K-1}(1-a_{K-1})/2\bigr)$ and $\Sigma(i,j)$ for $i\neq j$ equal to $-a_ia_j/2$ (note that $a_K=1-\sum_{j=1}^{K-1}a_j$).

Define the function $g:\{x\in[0,1]^{K-1}\mid\sum_{j=1}^{K-1}x_j<1\}\to\mathbb R$ as $g(x)=\sum_{j=1}^K x_j^2$ with $x_K=1-\sum_{j=1}^{K-1}x_j$. Then $\sqrt m\bigl(\sum_{j=1}^K\tilde a_j^2-\sum_{j=1}^K a_j^2\bigr)=\sqrt m\bigl(g(\tilde a_1,\ldots,\tilde a_{K-1})-g(a_1,\ldots,a_{K-1})\bigr)$. The asymptotic distribution of $\sqrt m\bigl(\sum_{j=1}^K\tilde a_j^2-\sum_{j=1}^K a_j^2\bigr)$ now follows from the Delta method; it equals a normal distribution with mean zero and variance equal to

\begin{align*}
\sigma^2&=\nabla g(a_1,\ldots,a_{K-1})^T\,\Sigma\,\nabla g(a_1,\ldots,a_{K-1})\\
&=\sum_{i=1}^{K-1}2(a_i-a_K)^2a_i(1-a_i)-\sum_{i=1}^{K-1}\sum_{j=i+1}^{K-1}4a_ia_j(a_i-a_K)(a_j-a_K),
\end{align*}

where $\nabla g(a_1,\ldots,a_{K-1})=\bigl(2(a_1-a_K),\ldots,2(a_{K-1}-a_K)\bigr)^T$ is the gradient of $g$ at $(a_1,\ldots,a_{K-1})$.
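The double-sum expression can be checked against the matrix form $\nabla g^T\Sigma\nabla g$ directly (our own verification sketch, using the relative allele frequencies from the simulation study):

```python
# Verification sketch: the closed-form double sum for sigma^2 equals
# grad g' Sigma grad g computed from the multinomial covariance matrix.
a = [0.55, 0.15, 0.1, 0.2]          # relative allele frequencies; a_K = a[-1]
K = len(a)
aK = a[K - 1]

# closed-form double sum
sigma2 = sum(2 * (a[i] - aK) ** 2 * a[i] * (1 - a[i]) for i in range(K - 1))
sigma2 -= sum(4 * a[i] * a[j] * (a[i] - aK) * (a[j] - aK)
              for i in range(K - 1) for j in range(i + 1, K - 1))

# direct grad' Sigma grad, with Sigma(i,i) = a_i(1-a_i)/2 and Sigma(i,j) = -a_i*a_j/2
grad = [2 * (a[i] - aK) for i in range(K - 1)]
direct = sum(grad[i] * grad[j] * (a[i] * (1 - a[i]) / 2 if i == j else -a[i] * a[j] / 2)
             for i in range(K - 1) for j in range(K - 1))
```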

The computations for ν for maximum likelihood estimator B are very similar to those for estimator A, but the notation is slightly more complex. We only give the results.

Define the indicator function Γj,k as 1 if Z=(j,k) and otherwise Γj,k equals 0. Then the likelihood for one individual can be written as

\[
\prod_{j=1}^K\Bigl(\frac{Fa_j+(1-F)qa_j^2}{F+(1-F)q}\Bigr)^{\Delta\Gamma_{j,j}}\prod_{\substack{j,k=1\\ j\neq k}}^K\Bigl(\frac{(1-F)q\,2a_ja_k}{F+(1-F)q}\Bigr)^{(1-\Delta)\Gamma_{j,k}}.
\]

Straightforward computations show that

\[
I_{q_0,a}^{-1}J_{q_0,a}=\frac{q_0(1-F)\sigma^2}{\sum_{j=1}^K\dfrac{a_j(1-a_j)}{F+(1-F)q_0a_j}},
\]

with $\sigma^2$ the asymptotic variance of $\sum_{j=1}^K\frac{\tilde a_j\,\sqrt m(a_j-\tilde a_j)}{F+(1-F)q\,a_j}$. An expression for $\sigma^2$ can be determined with the help of Slutsky's lemma and the Delta method (see for instance van der Vaart [2]).

References

1. Cox DR, Hinkley DV. Theoretical statistics. London: Chapman and Hall, 1974. doi:10.1007/978-1-4899-2887-0

2. van der Vaart AW. Asymptotic statistics. Cambridge: Cambridge University Press, 1998. doi:10.1017/CBO9780511802256

3. Wilks SS. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 1938;9:60–2. doi:10.1214/aoms/1177732360

4. Thomas DR, Grunkemeier GL. Confidence interval estimation of survival probabilities for censored data. J Am Stat Assoc 1975;70:865–71. doi:10.1080/01621459.1975.10480315

5. Murphy SA, van der Vaart AW. Semiparametric likelihood ratio inference. Ann Stat 1997;25:1471–509. doi:10.1214/aos/1031594729

6. Banerjee M, Wellner JA. Likelihood ratio tests for monotone functions. Ann Stat 2001;29:1699–731. doi:10.1214/aos/1015345959

7. Banerjee M. Likelihood based inference for monotone response models. Ann Stat 2007;35:931–56. doi:10.1214/009053606000001578

8. Kosorok MR. Introduction to empirical processes and semiparametric inference. Springer series in statistics. New York: Springer, 2008. doi:10.1007/978-0-387-74978-5

9. Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 1988;75:237–49. doi:10.1093/biomet/75.2.237

10. Owen AB. Empirical likelihood. Monographs on statistics and applied probability 92. New York: Chapman and Hall, 2001. doi:10.1201/9781420036152

11. Qin J, Lawless J. Empirical likelihood and general estimating equations. Ann Stat 1994;22:300–25. doi:10.1214/aos/1176325370

12. Murphy SA, van der Vaart AW. On profile likelihood. J Am Stat Assoc 2000;95:449–85. doi:10.1080/01621459.2000.10474219

13. Jonker MA, Teeuw M, ten Kate LP, Cornel MC. Estimating total pathogenic allele frequency of recessive disorders when parents are consanguineous 1: Methodological aspects. Unpublished manuscript, 2014.

14. Teeuw ME, Kelmemi W, Jonker MA, Sefiani A, Laarabi F-Z, Hama I, et al. Estimating total pathogenic allele frequency of recessive disorders when parents are consanguineous 2: CAH in Tunisia and FMF in Morocco. Unpublished manuscript, 2014.

15. Jonker MA, Bhulai S, Ligthart RSL, Posthuma D, Boomsma DI, van der Vaart AW. Gamma frailty model for linkage analysis with application to interval censored migraine data. Biostatistics 2009;10:187–200. doi:10.1093/biostatistics/kxn027

16. Qin G, Jing BY. Censored partial linear models and empirical likelihood. J Multivariate Anal 2001;78:37–61. doi:10.1006/jmva.2000.1944

17. Qin G, Jing BY. Empirical likelihood for censored linear regression. Scand J Stat 2001;28:661–73. doi:10.1111/1467-9469.00261

18. Li G, Wang QH. Empirical likelihood regression analysis for right censored data. Stat Sin 2003;13:51–68.

19. Wang QH, Jing BY. Empirical likelihood for a class of functionals of survival distribution with censored data. Ann Inst Stat Math 2001;53:517–27. doi:10.1023/A:1014617112870

20. Hjort NL, McKeague IW, Van Keilegom I. Extending the scope of empirical likelihood. Ann Stat 2009;37:1079–111. doi:10.1214/07-AOS555

21. Huang J, Wellner JA. Asymptotic normality of the NPMLE of linear functionals for interval censored data, case 1. Stat Neerlandica 1995;49:153–63. doi:10.1111/j.1467-9574.1995.tb01462.x

22. Groeneboom P, Wellner JA. Information bounds and nonparametric maximum likelihood estimation. DMV seminar, vol. 19. Basel: Birkhäuser Verlag, 1992. doi:10.1007/978-3-0348-8621-5

23. ten Kate LP, Teeuw M, Henneman L, Cornel MC. Autosomal recessive disease in children of consanguineous parents: inferences from the proportion of compound heterozygotes. J Community Genet 2010;1:37–40. doi:10.1007/s12687-010-0002-4

Published Online: 2014-5-16
Published in Print: 2014-11-1

©2014 by De Gruyter
