
Simultaneous prediction in the generalized linear model

Chao Bai and Haiqi Li
Published/Copyright: August 24, 2018

Abstract

This paper studies prediction based on a composite target function that allows one to predict simultaneously the actual and the mean values of the unobserved regressand in the generalized linear model. The best linear unbiased prediction (BLUP) of the target function is derived. Studies show that our BLUP has better properties than some other predictions. Simulations confirm its better finite sample performance.

MSC 2010: 62M20; 62J12

1 Introduction

Generalized linear models have a long history in the statistical literature and have been used to analyze data from various branches of science on account of both mathematical and practical convenience. Consider the following generalized linear model:

$$\begin{pmatrix} y \\ y_0 \end{pmatrix} = \begin{pmatrix} X \\ X_0 \end{pmatrix}\beta + \begin{pmatrix} \varepsilon \\ \varepsilon_0 \end{pmatrix}, \tag{1}$$

where

y is the n-dimensional vector of observed data;

y0 is the m-dimensional vector of unobserved values that is to be predicted;

X and X0 are n × p and m × p known matrices of explanatory variables. Let rk(A) denote the rank of matrix A and suppose rk(X) ≀ p;

ÎČ is the p × 1 unknown vector of regression coefficients, and

Δ and Δ0 are random errors with zero mean and covariance matrix

$$\operatorname{Cov}\begin{pmatrix} \varepsilon \\ \varepsilon_0 \end{pmatrix} = \begin{pmatrix} \Sigma & V' \\ V & \Sigma_0 \end{pmatrix},$$

where ÎŁ ≄ 0 and ÎŁ0 ≄ 0 are known positive semi-definite matrices of arbitrary ranks.

The problem of predicting unobserved variables plays an important role in decision making and has received much attention in recent years. For the prediction of y0 in model (1), [1] obtained the best linear unbiased predictor (BLUP) when ÎŁ > 0. Bayes and minimax predictions were obtained by [2] when the random errors were normally distributed. [3] and [4] derived the linear minimax prediction under a modified quadratic loss function. [5] considered the optimal Stein-rule prediction. [6] reviewed the existing theory of minimum mean squared error predictors and extended it based on the principle of equivariance. [7] investigated the admissibility of linear predictors with inequality constraints under the mean squared error loss function. Another subject of interest is the prediction of the mean of y0, since [8] showed that, under the criterion of minimum mean squared error, the best predictor of y0 is its conditional mean. In model (1), prediction of the mean value of y0 (namely Ey0 = X0ÎČ) relates naturally to plug-in estimators of the parameter ÎČ. [9] proposed the simple projection predictor (SPP) of X0ÎČ by plugging in the best linear unbiased estimator (BLUE) of ÎČ. [10, 11] considered plugging in the prediction of ÎČ under the balanced loss function. The plug-in approach spawned a large literature on combined prediction; see [12, 13, 14].

Generally, predictions are investigated either for y0 or for Ey0, one at a time. However, in fields such as medicine and economics, people sometimes want to know the actual value of y0 and its mean value Ey0 simultaneously. For example, in financial markets, some investors may want to know the actual profit while others are more interested in the mean profit. Therefore, in order to meet different requirements, the market manager should acquire predictions of the actual profit and of the mean profit simultaneously. Setting aside investors’ demands, from the point of view of a decision maker the market manager needs to determine which prediction should be preferred, or to provide a comprehensive combined prediction of both the actual and the mean profit based on empirical data. [15] gave other examples of practical situations where one is required to predict both the mean and the actual values of a variable. Under these circumstances, we consider predictions of the following target function

$$\delta = \lambda y_0 + (1-\lambda)Ey_0, \tag{2}$$

where λ ∈ [0, 1] is a non-stochastic weight scalar representing the preference between predicting the actual value and the mean value of the studied variable. Note that Ύ = y0 if λ = 1 and Ύ = Ey0 if λ = 0, so predicting Ύ achieves the prediction of y0 and of Ey0 simultaneously. If 0 < λ < 1, then the prediction of Ύ balances the prediction of the actual value and the average value of y0. Besides, an unbiased prediction of Ύ is also an unbiased prediction of y0 or Ey0. Therefore, Ύ is a more flexible and inclusive target to study.

Studies on the prediction of ÎŽ have been carried out in the literature from various perspectives. The properties of predictors obtained by plugging in Stein-rule estimators were examined by [16, 17, 18]. [19] investigated the Stein-rule prediction of ÎŽ in the linear regression model when the error covariance matrix is positive definite but unknown. [20] studied the admissible prediction of ÎŽ. [21, 22], and [23] considered predictors of ÎŽ in linear regression models with stochastic or non-stochastic linear constraints on the regression coefficients. The issue of simultaneous prediction in measurement error models was addressed in [24] and [25]. [26] considered a scalar multiple of the classical prediction vector for the prediction of ÎŽ and discussed its performance properties.

For model (1), most previous work concerned biased prediction under ÎŁ > 0 (including the special case ÎŁ = I) and did not discuss the value of the weight scalar λ in (2). In this paper, supposing ÎŁ ≄ 0, we study the best linear unbiased prediction (BLUP) of ÎŽ and compare it with the usual BLUP of y0 and the SPP of Ey0. We also propose a method to choose the value of λ in (2), which gives a way to determine, from finite sample data, which of the predictions of ÎŽ, y0 or Ey0 should be provided.

The rest of the paper is organized as follows. In Section 2, we derive the BLUP of the target function (2) in the generalized linear model and discuss its efficiency compared with the usual BLUP and SPP. Simulation studies are provided in Section 3 to illustrate the determination of the weight scalar in our BLUP and the performance of our proposed BLUP compared with the other two predictors. Concluding remarks are given in Section 4.

2 The BLUP of ÎŽ and its efficiency

Denote by ℒℋ = {Cy ∣ C is an m × n matrix} the set of all homogeneous linear predictors of y0, and by ÎŽÌ‚BLUP the best linear unbiased predictor of ÎŽ in model (1). In this section, we first derive the expression of ÎŽÌ‚BLUP in ℒℋ, and then study its performance compared with the BLUP of y0 and the SPP of Ey0. All of the predictors discussed in this paper are derived under the criterion of minimum mean squared error. Some preliminaries and basic results are given as follows:

Definition 2.1

The predictor ÎŽÌ‚ of ÎŽ is unbiased if E ÎŽÌ‚ = E ÎŽ.

Definition 2.2

ή is linearly predictable if there exists a linear predictor Cy in ℒℋ such that Cy is an unbiased predictor of d.

Lemma 2.3

In model (1), ÎŽ is linearly predictable if there exists a matrix C such that CX = X0, or ℳ(X0â€Č)⊆ ℳ(Xâ€Č).

Proof

From Definitions 2.1 and 2.2, there exists a matrix C such that E(Cy) = EÎŽ for any ÎČ, namely CX = X0 or Xâ€ČCâ€Č = X0â€Č, which is equivalent to ℳ(X0â€Č) ⊆ ℳ(Xâ€Č). □

If not specified otherwise, the variables we aim to predict in this paper are all linearly predictable.

Lemma 2.4

([27]). Suppose the n × n matrix ÎŁ ≄ 0 and let X be an n × p matrix, then

$$\begin{pmatrix}\Sigma & X\\ X' & 0\end{pmatrix}^{-} = \begin{pmatrix} T^{+}-T^{+}X(X'T^{+}X)^{-}X'T^{+} & T^{+}X(X'T^{+}X)^{-}\\ (X'T^{+}X)^{-}X'T^{+} & I-(X'T^{+}X)^{-}\end{pmatrix},$$

where T = ÎŁ + XXâ€Č. Especially, if ÎŁ > 0, then

$$\begin{pmatrix}\Sigma & X\\ X' & 0\end{pmatrix}^{-} = \begin{pmatrix}\Sigma^{-1}-\Sigma^{-1}X(X'\Sigma^{-1}X)^{-}X'\Sigma^{-1} & \Sigma^{-1}X(X'\Sigma^{-1}X)^{-}\\ (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1} & -(X'\Sigma^{-1}X)^{-}\end{pmatrix}.$$

Lemma 2.5

In model (1), the BLUP of y0 and the SPP of Ey0 are respectively

$$\tilde{y}_{0BLUP} = X_0\tilde\beta + VT^{+}(y - X\tilde\beta), \qquad \tilde{y}_{0SPP} = X_0\tilde\beta,$$

where T = ÎŁ + XXâ€Č and ÎČÌƒ = (Xâ€ČT+X)–Xâ€ČT+y is the best linear unbiased estimator (BLUE) of ÎČ in model (1).

If ÎŁ > 0 and rk(X) = p in model (1), the BLUP of y0 and the SPP of Ey0 are respectively

$$\hat{y}_{0BLUP} = X_0\hat\beta_{BLUE} + V\Sigma^{-1}(y - X\hat\beta_{BLUE}), \qquad \hat{y}_{0SPP} = X_0\hat\beta_{BLUE},$$

where ÎČ̂BLUE = (Xâ€ČΣ–1X)–1Xâ€ČΣ–1y is the BLUE of ÎČ.

Proof

BLUPs of y0 in Lemma 2.5 were derived by [1] and [28]. The SPPs of Ey0 were derived by [9]. □

The BLUPs and SPPs are presented here for further comparisons.
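As a concrete numerical illustration of the ÎŁ > 0 formulas of Lemma 2.5, the sketch below computes ÎČ̂BLUE, the BLUP of y0 and the SPP of Ey0 with NumPy. The sample sizes, design matrices, and cross-covariance V here are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 50, 2, 3

X = rng.uniform(1.0, 5.0, (n, p))        # observed design matrix
X0 = rng.uniform(1.0, 5.0, (m, p))       # design matrix of the unobserved part
beta = np.array([1.0, 0.8, 0.2])
Sigma = np.eye(n)                        # Sigma > 0 (identity for simplicity)
V = 0.2 * rng.standard_normal((m, n))    # Cov(eps0, eps) = V, illustrative
y = X @ beta + rng.standard_normal(n)

Si = np.linalg.inv(Sigma)
# BLUE of beta: (X' Sigma^-1 X)^-1 X' Sigma^-1 y
beta_blue = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

y0_spp = X0 @ beta_blue                              # SPP of Ey0
y0_blup = y0_spp + V @ Si @ (y - X @ beta_blue)      # BLUP of y0
```

With Sigma = I the BLUE reduces to ordinary least squares, and the BLUP differs from the SPP only by the correction term VΣ–1(y – XÎČ̂BLUE).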

2.1 The best linear unbiased predictor of ÎŽ

Theorem 2.6

In model (1), the BLUP of ή in ℒℋ is

$$\hat\delta_{BLUP} = X_0\tilde\beta + \lambda VT^{+}(y - X\tilde\beta),$$

where T = ÎŁ + XXâ€Č and ÎČÌƒ = (Xâ€ČT+X)–Xâ€ČT+y.

Proof

Suppose ÎŽÌ‚ = Cy ∈ ℒℋ is unbiased; then by Lemma 2.3, CX = X0. Denoting by R(ÎŽÌ‚;ÎČ) the risk of ÎŽÌ‚ and by tr(A) the trace of a square matrix A, we have

$$\begin{aligned}
R(\hat\delta;\beta) &= E[(\hat\delta-\delta)'(\hat\delta-\delta)]\\
&= E\{[Cy-\lambda y_0-(1-\lambda)X_0\beta]'[Cy-\lambda y_0-(1-\lambda)X_0\beta]\}\\
&= E(Cy)'(Cy) - 2\lambda E(Cy)'y_0 - 2(1-\lambda)E(Cy)'X_0\beta + \lambda^2 Ey_0'y_0\\
&\qquad + 2\lambda(1-\lambda)Ey_0'X_0\beta + (1-\lambda)^2(X_0\beta)'X_0\beta\\
&= \operatorname{tr}(C\Sigma C') + \lambda^2\operatorname{tr}\Sigma_0 - 2\lambda\operatorname{tr}(CV') + \beta'(CX-X_0)'(CX-X_0)\beta.
\end{aligned}$$

Minimizing R(ÎŽÌ‚;ÎČ) is equivalent to solving the following optimization problem for C:

$$\min_{C:\; CX - X_0 = 0}\ \big[\operatorname{tr}(C\Sigma C') + \lambda^2\operatorname{tr}\Sigma_0 - 2\lambda\operatorname{tr}(CV')\big].$$

Let Λ be a p × m Lagrange multiplier and construct the Lagrange function as

$$L(C,\Lambda) = \operatorname{tr}(C\Sigma C') + \lambda^2\operatorname{tr}\Sigma_0 - 2\lambda\operatorname{tr}(CV') + 2\operatorname{tr}[(CX-X_0)\Lambda].$$

Setting ∂L/∂C = 0 and ∂L/∂Λ = 0, we have

$$C\Sigma - \lambda V + \Lambda'X' = 0, \qquad X'C' = X_0',$$

namely

$$\begin{pmatrix}\Sigma & X\\ X' & 0\end{pmatrix}\begin{pmatrix}C'\\ \Lambda\end{pmatrix} = \begin{pmatrix}\lambda V'\\ X_0'\end{pmatrix}, \tag{3}$$

and

$$\begin{pmatrix}C'\\ \Lambda\end{pmatrix} = \begin{pmatrix}\Sigma & X\\ X' & 0\end{pmatrix}^{-}\begin{pmatrix}\lambda V'\\ X_0'\end{pmatrix}.$$

By Lemma 2.4, we obtain C = X0(Xâ€ČT+X)–Xâ€ČT+ + λVT+[I – X(Xâ€ČT+X)–Xâ€ČT+]. Letting ÎČÌƒ = (Xâ€ČT+X)–Xâ€ČT+y, we thus have ÎŽÌ‚BLUP = Cy = X0ÎČÌƒ + λVT+(y – XÎČÌƒ). □
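A minimal numerical sketch of the formula in Theorem 2.6, using Moore-Penrose inverses so that ÎŁ is only required to be positive semi-definite; all sizes and matrices below are illustrative assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 40, 2, 3
X = rng.uniform(1.0, 5.0, (n, p))
X0 = rng.uniform(1.0, 5.0, (m, p))
beta = np.array([1.0, 0.8, 0.2])
Sigma = np.eye(n)                        # any Sigma >= 0 is allowed here
V = 0.3 * rng.standard_normal((m, n))    # Cov(eps0, eps), illustrative
y = X @ beta + rng.standard_normal(n)

def delta_blup(lam):
    """BLUP of delta = lam*y0 + (1-lam)*Ey0, as in Theorem 2.6."""
    T = Sigma + X @ X.T
    Tp = np.linalg.pinv(T)                    # T^+
    G = np.linalg.pinv(X.T @ Tp @ X)          # one choice of (X'T^+X)^-
    beta_t = G @ X.T @ Tp @ y                 # BLUE of beta
    return X0 @ beta_t + lam * V @ Tp @ (y - X @ beta_t)

# lam = 1 recovers the BLUP of y0; lam = 0 recovers the SPP X0*beta_t
```

Since ÎŽÌ‚BLUP is linear in λ, one can check numerically that ÎŽÌ‚BLUP(λ) = λ·ΎÌ‚BLUP(1) + (1 – λ)·ΎÌ‚BLUP(0), matching the tradeoff interpretation given in Remark 2.9.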

Corollary 2.7

If ÎŁ > 0 and rk(X) = p in model (1), then the BLUP of ÎŽ is

$$\hat\delta_{BLUP} = X_0\hat\beta_{BLUE} + \lambda V\Sigma^{-1}(y - X\hat\beta_{BLUE}),$$

where ÎČ̂BLUE = (Xâ€ČΣ–1X)–1Xâ€ČΣ–1y.

Proof

If ÎŁ > 0 and rk(X) = p, then Xâ€ČΣ–1X is nonsingular. Since

$$\begin{vmatrix}\Sigma & X\\ X' & 0\end{vmatrix} = |\Sigma|\,\lvert -X'\Sigma^{-1}X\rvert = (-1)^{p}\,|\Sigma|\,|X'\Sigma^{-1}X| \neq 0,$$

the matrix $\left(\begin{smallmatrix}\Sigma & X\\ X' & 0\end{smallmatrix}\right)$ is nonsingular. By Lemma 2.4,

$$\begin{pmatrix}\Sigma & X\\ X' & 0\end{pmatrix}^{-1} = \begin{pmatrix}\Sigma^{-1}-\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} & \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}\\ (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} & -(X'\Sigma^{-1}X)^{-1}\end{pmatrix}.$$

With similar calculations as in the proof of Theorem 2.6, the solution of (3) gives that

$$C = X_0(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} + \lambda V\Sigma^{-1}[I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}],$$

and therefore ÎŽÌ‚BLUP = X0ÎČ̂BLUE + λ VΣ–1(y – XÎČ̂BLUE). □

Theorem 2.8

For the prediction of (2) in model (1), EÎŽÌ‚BLUP = Eỹ0BLUP = Eỹ0SPP = Ey0 = X0ÎČ.

Proof

By Theorem 2.6, EÎŽÌ‚BLUP = E[X0ÎČÌƒ + λVT+(y – XÎČÌƒ)] = X0ÎČ = Ey0. From Lemma 2.5, it is easy to prove that Eỹ0BLUP = Eỹ0SPP = Ey0 = X0ÎČ. □

Remark 2.9

According to Definition 2.1 and Theorem 2.8, ÎŽÌ‚BLUP, ỹ0BLUP and ỹ0SPP are all unbiased predictors of y0 or Ey0. If λ = 1, then ÎŽÌ‚BLUP = ỹ0BLUP is the BLUP of y0; if λ = 0, then ÎŽÌ‚BLUP = X0ÎČÌƒ is the SPP of Ey0. This shows that the function (2) can simultaneously predict the actual value of y0 and its mean value. Since ÎŽÌ‚BLUP = λỹ0BLUP + (1 – λ)ỹ0SPP, ÎŽÌ‚BLUP can be viewed as a tradeoff between the BLUP of y0 and the SPP of Ey0. By using ÎŽÌ‚BLUP in practical applications, forecasters can provide a more comprehensive predictor by assigning different weights in ÎŽÌ‚BLUP.

As for the choice of λ, the weight scalar should usually be given before predicting. Since λ represents the weight given to the prediction of y0 and is not a model parameter, there is no “true” value of λ, only a suitable one. One method is to select λ by the forecaster’s subjective preference; for example, if the predictions of y0 and Ey0 are treated equally, then λ = 0.5. Another method is to determine λ from the observed data (y, X) in model (1). In this paper we recommend the leave-one-out cross-validation technique. In order to determine λ, we take ÎŽÌ‚BLUP as the predictor of y0 (justified by Theorem 2.8, since the true ÎČ in Ey0 = X0ÎČ is unknown). Define ÎŽÌ‚(–j)(λ) to be the predictor of yj when the jth case of (y, X) in (1) is deleted, and denote 𝒯 = {λi | 0 ≀ λi ≀ 1, i = 1, 2, ⋯}. The predicted residual sum of squares is defined as

$$CV(\lambda) = \sum_{j=1}^{n}\big[y_j - \hat\delta_{(-j)}(\lambda)\big]^2.$$

For each λi ∈ 𝒯, compute CV(λi). The chosen λ is the one that minimizes CV(λ) over 𝒯. Simulations in Section 3 indicate that the leave-one-out cross-validation technique for selecting λ is feasible. Through the selection of λ from observed data, forecasters can determine which of ÎŽÌ‚BLUP, ỹ0BLUP and ỹ0SPP is the most suitable to provide.
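The leave-one-out procedure above can be sketched as follows. This is a plain, unoptimized implementation under illustrative assumptions: each held-out yj plays the role of the unobserved “y0”, so its cross-covariance row V is read off the corresponding row of ÎŁ; the function name and interface are our own.

```python
import numpy as np

def select_lambda(y, X, Sigma, grid):
    """Pick lambda on `grid` minimizing the leave-one-out criterion CV(lambda).

    For each held-out case j, the model is refit once and the SPP part and
    BLUP correction are reused across the whole lambda grid, since the
    predictor is linear in lambda."""
    n = len(y)
    preds = np.empty((len(grid), n))
    for j in range(n):
        keep = np.arange(n) != j
        Xj, yj = X[keep], y[keep]
        Sj = Sigma[np.ix_(keep, keep)]
        Vj = Sigma[j, keep]                       # Cov(eps_j, eps_{-j})
        T = Sj + Xj @ Xj.T
        Tp = np.linalg.pinv(T)
        bt = np.linalg.pinv(Xj.T @ Tp @ Xj) @ Xj.T @ Tp @ yj
        base = X[j] @ bt                          # SPP part of the predictor
        corr = Vj @ Tp @ (yj - Xj @ bt)           # BLUP correction term
        for i, lam in enumerate(grid):
            preds[i, j] = base + lam * corr
    cv = ((y - preds) ** 2).sum(axis=1)
    return grid[int(np.argmin(cv))], cv
```

Because ÎŽÌ‚(–j)(λ) is affine in λ, only one refit per deleted case is needed regardless of the grid size; this keeps the sketch usable even for a fine grid such as the 0.001 step used in Section 3.1.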

2.2 Efficiency of Ύ̂BLUP

According to Theorem 2.8, Ύ̂BLUP, ỹ0BLUP and ỹ0SPP are all unbiased predictors of y0 or Ey0. From the point of view of linearity and unbiasedness, we mainly discuss the performance of Ύ̂BLUP compared with ỹ0BLUP and ỹ0SPP in what follows.

Theorem 2.10

For model (1),

$$\operatorname{Cov}(\hat\delta_{BLUP}) \le \operatorname{Cov}(\tilde{y}_{0BLUP}),$$

and the equality holds if and only if (1 – λÂČ)VT+[I – T+X(Xâ€ČT+X)–Xâ€Č]T+Vâ€Č = 0.

Proof

Denote by Δ̃0 = VT+(y – XÎČÌƒ) the predictor of Δ0; then we have

$$\operatorname{Cov}(\hat\delta_{BLUP}) = \operatorname{Cov}(X_0\tilde\beta + \lambda\tilde\varepsilon_0), \qquad \operatorname{Cov}(\tilde{y}_{0BLUP}) = \operatorname{Cov}(X_0\tilde\beta + \tilde\varepsilon_0).$$

Since ÎŁ = T – XXâ€Č and Xâ€Č[I – T+X(Xâ€ČT+X)–Xâ€Č] = 0, then

$$\begin{aligned}
\operatorname{Cov}(X_0\tilde\beta,\,\tilde\varepsilon_0) &= X_0(X'T^{+}X)^{-}X'T^{+}\Sigma[I - T^{+}X(X'T^{+}X)^{-}X']T^{+}V'\\
&= X_0(X'T^{+}X)^{-}X'T^{+}(T - XX')[I - T^{+}X(X'T^{+}X)^{-}X']T^{+}V' = 0.
\end{aligned}$$

Therefore, Cov(ÎŽÌ‚BLUP) – Cov(ỹ0BLUP) = (λÂČ – 1)Cov(Δ̃0) ≀ 0, so that

$$\operatorname{Cov}(\hat\delta_{BLUP}) \le \operatorname{Cov}(\tilde{y}_{0BLUP}),$$

and the equality holds if and only if (1 – λÂČ)Cov(Δ̃0) = (1 – λÂČ)VT+[I – T+X(Xâ€ČT+X)–Xâ€Č]T+Vâ€Č = 0. □

Corollary 2.11

If ÎŁ > 0 and rk(X) = p in model (1), then

$$\operatorname{Cov}(\hat\delta_{BLUP}) \le \operatorname{Cov}(\hat{y}_{0BLUP}),$$

and the equality holds if and only if (1 – λÂČ)VΣ–1[I – Σ–1X(Xâ€ČΣ–1X)–1Xâ€Č]Σ–1Vâ€Č = 0.

Proof

Corollary 2.11 is easily proved by Lemma 2.4 and Theorem 2.10. □

Remark 2.12

Theorem 2.10 and Corollary 2.11 show that Ύ̂BLUP is better than ỹ0BLUP under the criterion of covariance.

Theorem 2.13

For model (1), if DT+Vâ€ČX0(Xâ€ČT+X)–Xâ€ČT+ + T+X(Xâ€ČT+X)–X0â€ČVT+D ≄ 0, where D = I – X(Xâ€ČT+X)–Xâ€ČT+, then

$$E(\tilde{y}_{0SPP}-X_0\beta)'(\tilde{y}_{0SPP}-X_0\beta) \le E(\hat\delta_{BLUP}-X_0\beta)'(\hat\delta_{BLUP}-X_0\beta) \le E(\tilde{y}_{0BLUP}-X_0\beta)'(\tilde{y}_{0BLUP}-X_0\beta).$$

Proof

Denote

$$\begin{aligned}
C_1 &= X_0(X'T^{+}X)^{-}X'T^{+} + \lambda VT^{+}[I - X(X'T^{+}X)^{-}X'T^{+}],\\
C_2 &= X_0(X'T^{+}X)^{-}X'T^{+} + VT^{+}[I - X(X'T^{+}X)^{-}X'T^{+}],
\end{aligned}$$

then Ύ̂BLUP = C1y and y͠0BLUP = C2y. By the unbiasedness, C1X = X0 and C2X = X0. Therefore,

$$\begin{aligned}
&E(\hat\delta_{BLUP}-X_0\beta)'(\hat\delta_{BLUP}-X_0\beta) - E(\tilde{y}_{0BLUP}-X_0\beta)'(\tilde{y}_{0BLUP}-X_0\beta)\\
&\qquad = (X\beta)'(C_1'C_1 - C_2'C_2)X\beta + \operatorname{tr}(C_1\Sigma C_1' - C_2\Sigma C_2').
\end{aligned} \tag{4}$$

Note that D is a symmetric idempotent matrix and

$$\begin{aligned}
C_1\Sigma C_1' &= C_1(T - XX')C_1' = X_0(X'T^{+}X)^{-}X_0' + \lambda^2 VT^{+}DV' - X_0X_0',\\
C_2\Sigma C_2' &= C_2(T - XX')C_2' = X_0(X'T^{+}X)^{-}X_0' + VT^{+}DV' - X_0X_0',
\end{aligned}$$

then we have

$$C_1\Sigma C_1' - C_2\Sigma C_2' = -(1-\lambda^2)VT^{+}DV' \le 0, \quad\text{and}\quad \operatorname{tr}(C_1\Sigma C_1' - C_2\Sigma C_2') \le 0. \tag{5}$$

Besides,

$$\begin{aligned}
C_1'C_1 - C_2'C_2 &= (\lambda-1)[DT^{+}V'X_0(X'T^{+}X)^{-}X'T^{+} + T^{+}X(X'T^{+}X)^{-}X_0'VT^{+}D] + (\lambda^2-1)DT^{+}V'VT^{+}D\\
&\le (\lambda-1)[DT^{+}V'X_0(X'T^{+}X)^{-}X'T^{+} + T^{+}X(X'T^{+}X)^{-}X_0'VT^{+}D] \le 0.
\end{aligned} \tag{6}$$

Substituting (5) and (6) into (4), we have

$$E(\hat\delta_{BLUP}-X_0\beta)'(\hat\delta_{BLUP}-X_0\beta) \le E(\tilde{y}_{0BLUP}-X_0\beta)'(\tilde{y}_{0BLUP}-X_0\beta).$$

Letting λ = 0 in (2), Theorem 2.6 gives $\tilde{y}_{0SPP} = X_0\tilde\beta = \arg\min_{\hat{y}_0\in\mathcal{LH}} E(\hat{y}_0 - X_0\beta)'(\hat{y}_0 - X_0\beta)$. It is obvious that

$$E(\tilde{y}_{0SPP}-X_0\beta)'(\tilde{y}_{0SPP}-X_0\beta) \le E(\hat\delta_{BLUP}-X_0\beta)'(\hat\delta_{BLUP}-X_0\beta).$$

□

By Lemma 2.4 and Theorem 2.13, we have

Corollary 2.14

In model (1), if ÎŁ > 0, rk(X) = p and DΣ–1Vâ€ČX0(Xâ€ČΣ–1X)–1Xâ€ČΣ–1 + Σ–1X (Xâ€ČΣ–1X)–1X0â€ČVΣ–1D ≄ 0, where D = I – X(Xâ€ČΣ–1X)–1Xâ€ČΣ–1, then

$$E(\hat{y}_{0SPP}-X_0\beta)'(\hat{y}_{0SPP}-X_0\beta) \le E(\hat\delta_{BLUP}-X_0\beta)'(\hat\delta_{BLUP}-X_0\beta) \le E(\hat{y}_{0BLUP}-X_0\beta)'(\hat{y}_{0BLUP}-X_0\beta).$$

Remark 2.15

Theorem 2.13 and Corollary 2.14 show that Ύ̂BLUP is better than ỹ0BLUP under the squared loss function as a predictor of Ey0.

Theorem 2.16

For model (1),

$$E(\tilde{y}_{0BLUP}-y_0)'(\tilde{y}_{0BLUP}-y_0) \le E(\hat\delta_{BLUP}-y_0)'(\hat\delta_{BLUP}-y_0) \le E(\tilde{y}_{0SPP}-y_0)'(\tilde{y}_{0SPP}-y_0).$$

Proof

Denote

$$\begin{aligned}
C_1 &= X_0(X'T^{+}X)^{-}X'T^{+} + \lambda VT^{+}[I - X(X'T^{+}X)^{-}X'T^{+}],\\
C_2 &= X_0(X'T^{+}X)^{-}X'T^{+} + VT^{+}[I - X(X'T^{+}X)^{-}X'T^{+}],\\
C_3 &= X_0(X'T^{+}X)^{-}X'T^{+},
\end{aligned}$$

then ÎŽÌ‚BLUP = C1y, yÍ 0BLUP = C2y and yÍ 0SPP = X0ÎČÍ  = C3y. By Lemma 2.3, C1X = X0, C2X = X0 and C3X = X0. Since

$$\begin{aligned}
E(C_iy-y_0)'(C_iy-y_0) &= \operatorname{tr}(C_i\Sigma C_i') - 2\operatorname{tr}(C_iV') + \operatorname{tr}\Sigma_0,\\
E(C_iy-y_0)'(C_iy-y_0) - E(C_jy-y_0)'(C_jy-y_0) &= \operatorname{tr}(C_i\Sigma C_i' - C_j\Sigma C_j') - 2\operatorname{tr}[(C_i-C_j)V'],
\end{aligned}$$
for 1 ≀ i, j ≀ 3 and 0 ≀ λ ≀ 1,

we have

$$\begin{aligned}
E(C_1y-y_0)'(C_1y-y_0) - E(C_2y-y_0)'(C_2y-y_0) &= (\lambda-1)^2\operatorname{tr}(VT^{+}DV') \ge 0,\\
E(C_1y-y_0)'(C_1y-y_0) - E(C_3y-y_0)'(C_3y-y_0) &= [(\lambda-1)^2-1]\operatorname{tr}(VT^{+}DV') \le 0,
\end{aligned}$$

which give that

$$E(\tilde{y}_{0BLUP}-y_0)'(\tilde{y}_{0BLUP}-y_0) \le E(\hat\delta_{BLUP}-y_0)'(\hat\delta_{BLUP}-y_0) \le E(\tilde{y}_{0SPP}-y_0)'(\tilde{y}_{0SPP}-y_0).$$

□

By Lemma 2.4 and Theorem 2.16, we have

Corollary 2.17

In model (1), if ÎŁ > 0 and rk(X) = p, then

$$E(\hat{y}_{0BLUP}-y_0)'(\hat{y}_{0BLUP}-y_0) \le E(\hat\delta_{BLUP}-y_0)'(\hat\delta_{BLUP}-y_0) \le E(\hat{y}_{0SPP}-y_0)'(\hat{y}_{0SPP}-y_0).$$

Remark 2.18

Theorem 2.16 and Corollary 2.17 show that Ύ̂BLUP is better than ỹ0SPP under the squared loss function as a predictor of y0.

3 Simulation studies

In this section, we conduct simulations to illustrate the selection of λ in ÎŽÌ‚BLUP and the finite sample performance of our simultaneous prediction compared with Ć·0BLUP and Ć·0SPP.

The data are generated from the following model:

$$\begin{pmatrix} y \\ y_0 \end{pmatrix} = \begin{pmatrix} X \\ X_0 \end{pmatrix}\beta + \begin{pmatrix} \varepsilon \\ \varepsilon_0 \end{pmatrix}, \qquad \begin{pmatrix} \varepsilon \\ \varepsilon_0 \end{pmatrix} \sim N(0,\Sigma), \tag{7}$$

where
$$\Sigma = \begin{pmatrix} 50 & 2 & \cdots & 2\\ 2 & 50 & \cdots & 2\\ \vdots & \vdots & \ddots & \vdots\\ 2 & 2 & \cdots & 50 \end{pmatrix}.$$

We assume y is observed with sample size n = 200 and y0 is to be predicted with sample size m = 1. In Section 3.1 we only need the sample data of y to determine λ, while in Section 3.2 we use the sample data of both y and y0 for comparisons with various λ. Elements of the corresponding matrices X and X0 are generated from the uniform distribution on [1.1, 30.7].
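The data-generating setup described above can be sketched as follows; the NumPy generator, seed, and the scalar ÎČ = 0.8 used in Section 3.1 are assumptions made for the sketch, mirroring the described configuration rather than reproducing the authors’ code.

```python
import numpy as np

rng = np.random.default_rng(2018)
n, m, p = 200, 1, 1
N = n + m

# (n+m) x (n+m) covariance of (eps, eps0): 50 on the diagonal, 2 elsewhere
Sigma_full = np.full((N, N), 2.0) + 48.0 * np.eye(N)

X_full = rng.uniform(1.1, 30.7, (N, p))       # regressors as described above
beta = np.array([0.8])                         # true coefficient (Section 3.1)
eps = rng.multivariate_normal(np.zeros(N), Sigma_full)
y_full = X_full @ beta + eps

y, y0 = y_full[:n], y_full[n:]                 # observed / to be predicted
X, X0 = X_full[:n], X_full[n:]
Sigma = Sigma_full[:n, :n]                     # Cov(eps)
V = Sigma_full[n:, :n]                         # Cov(eps0, eps)
```

The partition at index n splits the joint covariance into the blocks ÎŁ, V and ÎŁ0 of model (1), so the predictors of Section 2 can be applied directly to (y, X, ÎŁ, V).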

3.1 Selection of λ in ÎŽÌ‚BLUP

We set ÎČ to be the one-dimensional parameter with the true value 0.8. The number of simulated realizations for choosing λ is 1000. In each simulation, let λ vary from 0 to 1 with step size 0.001. We use the leave-one-out cross-validation technique (see Section 2.1) to determine λ. Let λ* be the selected value of λ, then

$$\lambda^{\star} = \mathop{\arg\min}_{0\le\lambda\le 1} CV(\lambda) = \mathop{\arg\min}_{0\le\lambda\le 1} \sum_{j=1}^{200}\big[y_j - \hat\delta_{(-j)}(\lambda)\big]^2.$$

Simulations show that the relationship between CV(λ) and λ varies. Three of the simulations are presented in Figure 1 to illustrate the relation between λ and log CV(λ). Subfigure (a) indicates that λ* = 1, so Ć·0BLUP should be provided when predicting; (b) indicates that λ* = 0, so Ć·0SPP should be preferred; (c) indicates that λ* = 0.315, so ÎŽÌ‚BLUP should be provided when predicting. The relationship between CV(λ) and λ also tells us that there are three kinds of λ* in our simulations. Table 1 shows that among the 1000 simulations, 267 give λ* = 0, 332 give λ* = 1, and 401 give 0 < λ* < 1. This performance shows that the leave-one-out cross-validation technique for the selection of λ is feasible and provides a way to answer the question of which of ÎŽÌ‚BLUP, Ć·0BLUP and Ć·0SPP should be preferred given the observations.

Fig. 1 Relationships between λ and log[CV(λ)] in three simulations (a), (b) and (c), and the corresponding selections of λ

Table 1

Frequency of occurrences of three kinds of λ* in 1000 simulations

             λ* = 0      λ* = 1      0 < λ* < 1
Frequency    267/1000    332/1000    401/1000

3.2 Finite sample performance of the predictors

Let n = 200, m = 1, p = 3 and the true ÎČ = (1, 0.8, 0.2)â€Č in (7). λ in ÎŽÌ‚BLUP varies on a grid from 0.1 to 0.9. For each λ, the number of simulations is 1000, and in each simulation we compare ÎŽÌ‚BLUP, Ć·0BLUP and Ć·0SPP. For the prediction errors ÎŽÌ‚BLUP – y0, Ć·0BLUP – y0 and Ć·0SPP – y0, the sample means (sm), standard deviations (std) and mean squares (ms) are reported in Table 2. Likewise, for ÎŽÌ‚BLUP – X0ÎČ, Ć·0BLUP – X0ÎČ and Ć·0SPP – X0ÎČ, the sm, std and ms values are presented in Table 3.

Table 2

Finite sample performance of forecast precision of Ć·0BLUP, ÎŽÌ‚BLUP (with different λ) and Ć·0SPP

                  λ=0.1    λ=0.2    λ=0.3    λ=0.4    λ=0.5    λ=0.6    λ=0.7    λ=0.8    λ=0.9
sm   Ć·0BLUP–y0   -0.2596   0.1509  -0.1056   0.3124   0.4998  -0.0703   0.1251   0.2432   0.2150
     ÎŽÌ‚BLUP–y0    0.2524   0.1790  -0.0662   0.3314   0.4911  -0.0783   0.1292   0.2358   0.2152
     Ć·0SPP–y0    0.2516   0.1861  -0.0494   0.3341   0.4825  -0.0904   0.1389   0.2060   0.2172
std  Ć·0BLUP–y0    7.2516   7.1930   6.9865   6.8545   6.9253   6.9844   6.8622   6.9193   6.9606
     ÎŽÌ‚BLUP–y0    7.2758   7.2239   7.0448   6.8462   6.9624   6.9791   6.8682   6.9277   6.9640
     Ć·0SPP–y0    7.2843   7.2433   7.0890   6.8656   7.0298   7.0051   6.9230   7.0053   7.0463
ms   Ć·0BLUP–y0   52.601   51.711   48.773   47.035   48.162   48.738   47.058   47.888   48.448
     ÎŽÌ‚BLUP–y0   52.948   52.163   49.584   46.933   48.668   48.665   47.141   48.000   48.496
     Ć·0SPP–y0   53.072   52.447   50.206   47.207   49.601   49.030   47.899   49.068   49.648

Table 3

Finite sample performance of goodness of fit of Ć·0BLUP, ÎŽÌ‚BLUP (with different λ) and Ć·0SPP

                   λ=0.1    λ=0.2    λ=0.3    λ=0.4    λ=0.5    λ=0.6    λ=0.7    λ=0.8    λ=0.9
sm   Ć·0BLUP–X0ÎČ   0.0249  -0.0190  -0.0742  -0.0077  -0.0256  -0.0257   0.0047  -0.0543  -0.0340
     ÎŽÌ‚BLUP–X0ÎČ   0.0177   0.0091  -0.0349   0.0113  -0.0343  -0.0337   0.0089  -0.0618  -0.0338
     Ć·0SPP–X0ÎČ   0.0169   0.0162  -0.0180   0.0240  -0.0429   0.0457   0.0186  -0.0915  -0.0318
std  Ć·0BLUP–X0ÎČ   1.6389   1.6769   1.6415   1.6124   1.6121   1.6640   1.6401   1.5242   1.6445
     ÎŽÌ‚BLUP–X0ÎČ   1.3389   1.3831   1.3844   1.3774   1.3914   1.5010   1.5048   1.4389   1.5966
     Ć·0SPP–X0ÎČ   1.3334   1.3629   1.3626   1.3308   1.3039   1.3983   1.3547   1.2949   1.3700
ms   Ć·0BLUP–X0ÎČ   2.6841   2.8097   2.6974   2.5973   2.5968   2.7668   2.6872   2.3239   2.7028
     ÎŽÌ‚BLUP–X0ÎČ   1.7910   1.9112   1.9159   1.8955   1.9352   2.2518   2.2621   2.0721   2.5477
     Ć·0SPP–X0ÎČ   1.7765   1.8558   1.8551   1.7697   1.7002   1.9554   1.8336   1.6835   1.8761

From Table 2 and Table 3, we make the following observations:

  1. As for prediction precision, no matter what λ is set to be, the sample means (sm) of the prediction errors of Ć·0BLUP, ÎŽÌ‚BLUP and Ć·0SPP are all small. Comparisons of the sm values cannot tell which of the three predictors is better, yet the standard deviations (std) and mean squares (ms) of ÎŽÌ‚BLUP – y0 are less than those of Ć·0SPP – y0.

  2. No matter what λ is set to be, the sample means (sm) of Ć·0BLUP – X0ÎČ, ÎŽÌ‚BLUP – X0ÎČ and Ć·0SPP – X0ÎČ are all small. Comparisons of the sm values cannot determine which predictor is better, yet the standard deviations (std) and mean squares (ms) of ÎŽÌ‚BLUP – X0ÎČ are less than those of Ć·0BLUP – X0ÎČ.

The above facts imply that for any λ ∈ (0, 1), ÎŽÌ‚BLUP, Ć·0BLUP and Ć·0SPP are all unbiased predictors of y0 and Ey0. ÎŽÌ‚BLUP is more efficient than Ć·0SPP = X0ÎČ̂BLUE when predicting the actual value, and more efficient than Ć·0BLUP when predicting the mean value. The simulations thus verify the results in Section 2.2.

4 Conclusion

In this paper, we study prediction based on a composite target function that allows one to predict simultaneously the actual and the mean values of the unobserved regressand in the generalized linear model. The BLUP of the target function is derived when the model error covariance matrix is positive semi-definite. This BLUP is also an unbiased prediction of the actual and the mean values of the unobserved regressand. We propose the leave-one-out cross-validation technique to determine the value of the weight scalar in our prediction, which helps to provide a suitable prediction. As for the efficiency of the proposed BLUP, studies show that it is better than the usual BLUP under the criterion of covariance and dominates it as a prediction of the mean value of the regressand; besides, the proposed BLUP is better than the SPP as a prediction of the actual value of the regressand. Simulation studies illustrate the selection of the weight scalar in the proposed BLUP and show that it has good finite sample performance. Further research on simultaneous prediction is in progress.

Acknowledgement

The authors are grateful to the responsible editor and the anonymous reviewers for their valuable comments and suggestions, which have greatly improved this paper. This research is supported by the Scientific Research Fund of Hunan Provincial Education Department (13C1139), the Youth Scientific Research Foundation of Central South University of Forestry and Technology of China (QJ2012013A) and the Natural Science Foundation of Hunan Province (2015JJ4090).

References

[1] Goldberger A. S., Best linear unbiased prediction in the generalized linear regression model, Journal of the American Statistical Association, 1962, 57(298), 369–375. doi:10.1080/01621459.1962.10480665

[2] Bolfarine H., Zacks S., Bayes and minimax prediction in finite populations, J. Statist. Plann. Infer., 1991, 28, 139–151. doi:10.1016/0378-3758(91)90022-7

[3] Yu S. H., The linear minimax predictor in finite populations with arbitrary rank under quadratic loss function, Chin. Ann. Math., 2004, 25, 485–496

[4] Xu L. W., Wang S. G., The minimax predictor in finite populations with arbitrary rank in normal distribution, Chin. Ann. Math., 2006, 27, 405–416

[5] Gotway C. A., Cressie N., Improved multivariate prediction under a general linear model, J. Multivariate Anal., 1993, 45, 56–72. doi:10.1006/jmva.1993.1026

[6] Teunissen P. J. G., Best prediction in linear models with mixed integer/real unknowns: theory and application, Journal of Geodesy, 2007, 81(12), 759–780. doi:10.1007/s00190-007-0140-6

[7] Xu L. W., Admissible linear predictors in the superpopulation model with respect to inequality constraints, Comm. Statist. Theory Methods, 2009, 38, 2528–2540. doi:10.1080/03610920802571211

[8] Searle S. R., Casella G., McCulloch C. E., Variance Components, 1992, New York: Wiley. doi:10.1002/9780470316856

[9] Bolfarine H., Rodrigues J., On the simple projection predictor in finite populations, Australian Journal of Statistics, 1988, 30(3), 338–341. doi:10.1111/j.1467-842X.1988.tb00627.x

[10] Hu G. K., Li Q. G., Yu S. H., Optimal and minimax prediction in multivariate normal populations under a balanced loss function, J. Multivariate Anal., 2014, 128, 154–164. doi:10.1016/j.jmva.2014.03.014

[11] Hu G. K., Peng P., Linear admissible predictor of finite population regression coefficient under a balanced loss function, J. Math., 2014, 34, 820–828

[12] Diebold F. X., Lopez J. A., Forecast evaluation and combination, Handbook of Statistics, 1996, 14, 241–268. doi:10.1016/S0169-7161(96)14010-4

[13] Hendry D. F., Clements M. P., Pooling of forecasts, Econometrics Journal, 2002, 5, 1–26. doi:10.1111/j.1368-423X.2004.00119.x

[14] Timmermann A., Forecast combinations, Handbook of Economic Forecasting, 2006, 1, 135–196. doi:10.1016/S1574-0706(05)01004-9

[15] Shalabh, Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression models, Bull. Internat. Statist. Inst., 1995, 56, 1357–1390

[16] Chaturvedi A., Singh S. P., Stein rule prediction of the composite target function in a general linear regression model, Statist. Papers, 2000, 41(3), 359–367. doi:10.1007/BF02925929

[17] Chaturvedi A., Kesarwani S., Chandra R., Simultaneous prediction based on shrinkage estimator, in: Shalabh, C. Heumann (Eds.), Recent Advances in Linear Models and Related Areas, Springer, 2008, 181–204. doi:10.1007/978-3-7908-2064-5_10

[18] Shalabh, Heumann C., Simultaneous prediction of actual and average values of study variable using Stein-rule estimators, in: K. Kumar, A. Chaturvedi (Eds.), Some Recent Developments in Statistical Theory and Application, Brown Walker Press, USA, 2012, 68–81

[19] Chaturvedi A., Wan A. T. K., Singh S. P., Improved multivariate prediction in a general linear model with an unknown error covariance matrix, J. Multivariate Anal., 2002, 83(1), 166–182. doi:10.1006/jmva.2001.2042

[20] Bai C., Li H., Admissibility of simultaneous prediction for actual and average values in finite population, J. Inequal. Appl., 2018, 2018(1), 117. doi:10.1186/s13660-018-1707-x

[21] Toutenburg H., Shalabh, Predictive performance of the methods of restricted and mixed regression estimators, Biometrical J., 1996, 38(8), 951–959. doi:10.1002/bimj.4710380807

[22] Toutenburg H., Shalabh, Improved predictions in linear regression models with stochastic linear constraints, Biom. J., 2000, 42(1), 71–86. doi:10.1002/(SICI)1521-4036(200001)42:1<71::AID-BIMJ71>3.0.CO;2-H

[23] Dube M., Manocha V., Simultaneous prediction in restricted regression models, J. Appl. Statist. Sci., 2002, 11(4), 277–288

[24] Shalabh, Paudel C. M., Kumar N., Simultaneous prediction of actual and average values of response variable in replicated measurement error models, in: Shalabh, C. Heumann (Eds.), Recent Advances in Linear Models and Related Areas, Springer, 2008, 105–133. doi:10.1007/978-3-7908-2064-5_7

[25] Garg G., Shalabh, Simultaneous predictions under exact restrictions in ultrastructural model, Journal of Statistical Research (Special Volume on Measurement Error Models), 2011, 45(2), 139–154

[26] Shalabh, A revisit to efficient forecasting in linear regression models, J. Multivariate Anal., 2013, 114, 161–170. doi:10.1016/j.jmva.2012.07.017

[27] Wang S. G., Shi J. H., Introduction to the Linear Model, 2004, Science Press, Beijing

[28] Yu S. H., Xu L. W., Admissibility of linear prediction under quadratic loss, Acta Mathematicae Applicatae Sinica, 2004, 27, 385–396

Received: 2017-11-28
Accepted: 2018-07-17
Published Online: 2018-08-24

© 2018 Bai and Li, published by De Gruyter

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

  61. On the power sum problem of Lucas polynomials and its divisible property
  62. Existence of solutions for a shear thickening fluid-particle system with non-Newtonian potential
  63. On generalized P-reducible Finsler manifolds
  64. On Banach and Kuratowski Theorem, K-Lusin sets and strong sequences
  65. On the boundedness of square function generated by the Bessel differential operator in weighted Lebesque Lp,α spaces
  66. On the different kinds of separability of the space of Borel functions
  67. Curves in the Lorentz-Minkowski plane: elasticae, catenaries and grim-reapers
  68. Functional analysis method for the M/G/1 queueing model with single working vacation
  69. Existence of asymptotically periodic solutions for semilinear evolution equations with nonlocal initial conditions
  70. The existence of solutions to certain type of nonlinear difference-differential equations
  71. Domination in 4-regular Knödel graphs
  72. Stepanov-like pseudo almost periodic functions on time scales and applications to dynamic equations with delay
  73. Algebras of right ample semigroups
  74. Random attractors for stochastic retarded reaction-diffusion equations with multiplicative white noise on unbounded domains
  75. Nontrivial periodic solutions to delay difference equations via Morse theory
  76. A note on the three-way generalization of the Jordan canonical form
  77. On some varieties of ai-semirings satisfying xp+1 ≈ x
  78. Abstract-valued Orlicz spaces of range-varying type
  79. On the recursive properties of one kind hybrid power mean involving two-term exponential sums and Gauss sums
  80. Arithmetic of generalized Dedekind sums and their modularity
  81. Multipreconditioned GMRES for simulating stochastic automata networks
  82. Regularization and error estimates for an inverse heat problem under the conformable derivative
  83. Transitivity of the Δm-relation on (m-idempotent) hyperrings
  84. Learning Bayesian networks based on bi-velocity discrete particle swarm optimization with mutation operator
  85. Simultaneous prediction in the generalized linear model
  86. Two asymptotic expansions for gamma function developed by Windschitl’s formula
  87. State maps on semihoops
  88. 𝓜𝓝-convergence and lim-inf𝓜-convergence in partially ordered sets
  89. Stability and convergence of a local discontinuous Galerkin finite element method for the general Lax equation
  90. New topology in residuated lattices
  91. Optimality and duality in set-valued optimization utilizing limit sets
  92. An improved Schwarz Lemma at the boundary
  93. Initial layer problem of the Boussinesq system for Rayleigh-Bénard convection with infinite Prandtl number limit
  94. Toeplitz matrices whose elements are coefficients of Bazilevič functions
  95. Epi-mild normality
  96. Nonlinear elastic beam problems with the parameter near resonance
  97. Orlicz difference bodies
  98. The Picard group of Brauer-Severi varieties
  99. Galoisian and qualitative approaches to linear Polyanin-Zaitsev vector fields
  100. Weak group inverse
  101. Infinite growth of solutions of second order complex differential equation
  102. Semi-Hurewicz-Type properties in ditopological texture spaces
  103. Chaos and bifurcation in the controlled chaotic system
  104. Translatability and translatable semigroups
  105. Sharp bounds for partition dimension of generalized Möbius ladders
  106. Uniqueness theorems for L-functions in the extended Selberg class
  107. An effective algorithm for globally solving quadratic programs using parametric linearization technique
  108. Bounds of Strong EMT Strength for certain Subdivision of Star and Bistar
  109. On categorical aspects of S -quantales
  110. On the algebraicity of coefficients of half-integral weight mock modular forms
  111. Dunkl analogue of SzĂĄsz-mirakjan operators of blending type
  112. Majorization, “useful” Csiszár divergence and “useful” Zipf-Mandelbrot law
  113. Global stability of a distributed delayed viral model with general incidence rate
  114. Analyzing a generalized pest-natural enemy model with nonlinear impulsive control
  115. Boundary value problems of a discrete generalized beam equation via variational methods
  116. Common fixed point theorem of six self-mappings in Menger spaces using (CLRST) property
  117. Periodic and subharmonic solutions for a 2nth-order p-Laplacian difference equation containing both advances and retardations
  118. Spectrum of free-form Sudoku graphs
  119. Regularity of fuzzy convergence spaces
  120. The well-posedness of solution to a compressible non-Newtonian fluid with self-gravitational potential
  121. On further refinements for Young inequalities
  122. Pretty good state transfer on 1-sum of star graphs
  123. On a conjecture about generalized Q-recurrence
  124. Univariate approximating schemes and their non-tensor product generalization
  125. Multi-term fractional differential equations with nonlocal boundary conditions
  126. Homoclinic and heteroclinic solutions to a hepatitis C evolution model
  127. Regularity of one-sided multilinear fractional maximal functions
  128. Galois connections between sets of paths and closure operators in simple graphs
  129. KGSA: A Gravitational Search Algorithm for Multimodal Optimization based on K-Means Niching Technique and a Novel Elitism Strategy
  130. Ξ-type Calderón-Zygmund Operators and Commutators in Variable Exponents Herz space
  131. An integral that counts the zeros of a function
  132. On rough sets induced by fuzzy relations approach in semigroups
  133. Computational uncertainty quantification for random non-autonomous second order linear differential equations via adapted gPC: a comparative case study with random Fröbenius method and Monte Carlo simulation
  134. The fourth order strongly noncanonical operators
  135. Topical Issue on Cyber-security Mathematics
  136. Review of Cryptographic Schemes applied to Remote Electronic Voting systems: remaining challenges and the upcoming post-quantum paradigm
  137. Linearity in decimation-based generators: an improved cryptanalysis on the shrinking generator
  138. On dynamic network security: A random decentering algorithm on graphs
Downloaded on 6.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/math-2018-0087/html