
Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry Data


Abstract

Motivated by the modeling and analysis of mass-spectrometry data, a semi- and nonparametric model is proposed that consists of linear parametric components for individual location and scale and a nonparametric regression function for the common shape. A multi-step approach is developed that simultaneously estimates the parametric components and the nonparametric function. Under certain regularity conditions, it is shown that the resulting estimators are consistent and asymptotically normal for the parametric part and achieve the optimal rate of convergence for the nonparametric part when the bandwidths are suitably chosen. Simulation results are presented to demonstrate the effectiveness and finite-sample performance of the method. The method is also applied to a SELDI-TOF mass spectrometry data set from a study of liver cancer patients.

1 Introduction

We are concerned with the following semi- and nonparametric regression model

(1)   y_{it} = \alpha_i + \beta_i\, m(x_{it}) + \sigma_i(x_{it})\,\varepsilon_{it},

where y_it is the observed response from the i-th individual (i = 1, …, n) at time t (t = 1, …, T), x_it is the corresponding explanatory variable, α_i and β_i are individual-specific location and scale parameters, and m(·) is a baseline intensity function. Here E(ε_it) = 0, Var(ε_it) = 1, and ε_it and x_it are independent. Of interest here is the simultaneous estimation of α_i, β_i and m(·). We shall assume throughout the paper that the ε_it (i = 1, …, n; t = 1, …, T) are independent and identically distributed (i.i.d.) with an unknown distribution function, though most results only require that the errors be independent with zero mean.

Model (1) is motivated by analyzing data generated by a mass spectrometer (MS), which is a powerful tool for the separation and large-scale detection of proteins present in a complex biological mixture. Figure 1 is an illustration of MS spectra, which can reveal proteomic patterns or features that might be related to specific characteristics of biological samples. They can also be used for prognosis and for monitoring disease progression, evaluating treatment or suggesting intervention. Two popular mass spectrometers are SELDI-TOF (surface enhanced laser desorption/ionization time-of-flight) and MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight). The abundance of the protein fragments from a biological sample (such as serum) and their time of flight through a tunnel under certain electrical pressure can be measured by this procedure. The y-axis of a spectrum is the intensity (relative abundance) of protein/peptide, and the x-axis is the mass-to-charge ratio (m/z value), which can be calculated using time, length of flight, and the voltage applied. It is known that the SELDI intensity measures have errors up to 50% and that the m/z may shift its value by up to 0.1–0.2% [1]. Generally speaking, many pre-processing steps need to be carried out before the MS data can be analyzed. Some of the most important steps are noise filtering, baseline correction, alignment, and normalization. See, e.g., Guilhaus [2], Banks and Petricoin [3], Baggerly et al. [4], Baggerly et al. [5], Diamandis [6], Feng et al. [7]. We refer readers to Roy et al. [8] for an extensive review of recent advances in mass-spectrometry data analysis. Here, we assume all the pre-processing steps have already been taken.

Figure 1: Illustration of MS spectra.

In model (1), m(·) represents the common shape for all individuals, while α_i and β_i represent the location and scale parameters of the i-th individual, respectively. Because m(·) is unspecified, model (1) may be viewed as a semiparametric model. However, it differs from the usual semiparametric models in that for model (1) both the parametric and nonparametric components are of primary interest, while in a typical semiparametric setting the nonparametric component is often viewed as a nuisance parameter. Model (1) contains many commonly encountered regression models as special cases. If all the parametric coefficients α_i and β_i are known, model (1) reduces to the classical nonparametric regression. On the other hand, if the function m(·) is known, then it reduces to the classical linear regression model with each subject having its own regression line. In the present case, with α_i, β_i and the function m(·) all unknown, the parameters are identifiable only up to a common location-scale change. Thus we assume, without loss of generality, that α_1 = 0 and β_1 = 1. It is also clear that for α_i, β_i and m(·) to be consistently estimable, we need both n and T to tend to infinity.
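To see the identifiability issue concretely, note that for any constants c and d ≠ 0, replacing the baseline by m*(x) = (m(x) − c)/d leaves the model unchanged:

\alpha_i + \beta_i\, m(x) = (\alpha_i + \beta_i c) + (\beta_i d)\,\frac{m(x) - c}{d} = \alpha_i^* + \beta_i^*\, m^*(x),

so (α_i, β_i, m) and (α_i* = α_i + β_i c, β_i* = β_i d, m*) generate identical data. Fixing α_1 = 0 and β_1 = 1 selects a unique representative from this family.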

There is an extensive literature on semiparametric and nonparametric regression. For semiparametric regression, Begun et al. [9] derived the semiparametric information bound, while Robinson [10] developed a general approach to constructing √n-consistent estimators of the parametric component. We refer to Bickel et al. [11] and Ruppert et al. [12] for detailed discussions of the subject. For nonparametric regression, kernel and local polynomial smoothing methods are commonly used [13–16]. In particular, local polynomial smoothing has many attractive properties, including automatic boundary correction. We refer to Fan and Gijbels [17] and Hardle et al. [18] for a comprehensive treatment of the subject.

The existing methods for nonparametric and semiparametric problems are not directly applicable to model (1), because of the intertwining of the finite-dimensional parameters and the nonparametric component. A natural way to handle such a situation is to de-link the two aspects of the estimation through a two-step approach. In this paper, we propose an efficient iterative procedure that alternates between estimation of the parametric component and of the nonparametric component. We show that the proposed approach leads to consistent estimators for both the finite-dimensional parameters and the nonparametric function. We also establish the asymptotic normality of the parametric estimators and the convergence rate of the nonparametric estimator, which is then used to derive the optimal bandwidth selection.

2 Main results

In this section, we develop a multi-step approach to estimating both the finite-dimensional parameters α_i and β_i and the nonparametric baseline intensity m(·). Our approach is an iterative procedure that alternates between estimation of α_i and β_i and estimation of m(·). We show that under reasonable conditions, the estimator of the parametric component is consistent and asymptotically normal when the bandwidths are selected appropriately. The estimator of the nonparametric component also attains the optimal rate of convergence.

2.1 A multi-step estimation method

Recall that if α_i and β_i were known, the problem would reduce to the standard nonparametric regression setting; on the other hand, if m(·) were known, it would reduce to a simple linear regression for each i. For the nonparametric regression, we can apply local linear regression with the weights K_h(·) = K(·/h)/h for a suitably chosen kernel function K and bandwidth h. For the simple linear regression, least squares estimation may be applied.

To ensure identifiability, we shall set α_1 = 0 and β_1 = 1. Thus, for i = 1, (1) becomes a standard nonparametric regression problem, from which an initial estimator of m(·) can be derived. Replacing m(·) in (1) by the initial estimator, we can apply the least squares method to get estimators of α_i, β_i for i ≥ 2, which, together with α_1 = 0 and β_1 = 1 and local polynomial smoothing, can then be used to get an updated estimator of m(·). The iterative estimation procedure is described as follows.

(a) Set α_1 = 0 and β_1 = 1, so that y_1t = m(x_1t) + σ_1(x_1t)ε_1t, t = 1, …, T. Apply local linear regression to (x_1t, y_1t), t = 1, …, T, to get an initial estimator of m(·),

(2)   \tilde m(x) = \frac{\sum_{t=1}^{T} \omega_{1t}(x)\, y_{1t}}{\sum_{t=1}^{T} \omega_{1t}(x)},

where \omega_{1t}(x) = K_h(x_{1t} - x)\big(S_{T,2} - (x_{1t} - x)\,S_{T,1}\big) and

(3)   S_{T,k} = \sum_{t=1}^{T} K_h(x_{1t} - x)\,(x_{1t} - x)^k, \quad \text{for } k = 1, 2.
(b) With m(·) replaced by \tilde m(·) as if it were the true function, α_i, β_i, i = 2, …, n, can be estimated by the least squares method, i.e.

(4)   \hat\beta_i = \frac{\sum_{t=1}^{T} [\tilde m(x_{it}) - \bar{\tilde m}(x_i)]\, y_{it}}{\sum_{t=1}^{T} [\tilde m(x_{it}) - \bar{\tilde m}(x_i)]^2},

(5)   \hat\alpha_i = \bar y_i - \hat\beta_i\, \bar{\tilde m}(x_i) = \bar y_i - \frac{\sum_{t=1}^{T} (\tilde m(x_{it}) - \bar{\tilde m}(x_i))\, y_{it}}{\sum_{t=1}^{T} (\tilde m(x_{it}) - \bar{\tilde m}(x_i))^2}\, \bar{\tilde m}(x_i),

where

\bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{it} \quad \text{and} \quad \bar{\tilde m}(x_i) = \frac{1}{T}\sum_{t=1}^{T} \tilde m(x_{it}).
(c) With the estimates \hat\alpha_i and \hat\beta_i, we can update the estimate of m(·), viewing \hat\alpha_i and \hat\beta_i as the true values. Specifically, we apply local linear regression with the same kernel function K(·) to get an updated estimator of m(·),

(6)   \hat m(x) = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \omega^*_{it}(x)\, y^*_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \omega^*_{it}(x)},

where y^*_{it} = (y_{it} - \hat\alpha_i)/\hat\beta_i,

(7)   \omega^*_{it}(x) = \hat\beta_i^2\, K_{h^*}(x_{it} - x)\Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,2} - (x_{it} - x)\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,1}\Big)

and

(8)   S^{(i)}_{T,k} = \sum_{t=1}^{T} K_{h^*}(x_{it} - x)\,(x_{it} - x)^k, \quad \text{for } k = 1, 2.

Note that the bandwidth for this step, h*, may be chosen differently from h in order to achieve a better convergence rate. The optimal choices of h and h* will become clear in the next subsection, where the large-sample properties are studied.

(d) Given a pre-specified tolerance level η > 0, repeat steps (b) and (c) until the parametric and nonparametric estimators at iterations l and l + 1 meet the following convergence criterion:

\sum_{i=1}^{n} \|\hat\alpha_i^{(l+1)} - \hat\alpha_i^{(l)}\|^2 + \sum_{i=1}^{n} \|\hat\beta_i^{(l+1)} - \hat\beta_i^{(l)}\|^2 + \int \big(\hat m^{(l+1)}(x) - \hat m^{(l)}(x)\big)^2\, dx \le \eta.

Our limited numerical experience indicates that the final estimator is not sensitive to the initial estimate. However, as a safeguard, we may start the algorithm from different initial estimates by choosing different individuals as the baseline intensity. In step (c), \hat\beta_i appears in the denominator and, when close to 0, may cause instability. In practice, a small constant can therefore be added to the denominator for stability, though we have not encountered this problem.

The iterative process often converges very quickly. In addition, our asymptotic analysis in the next subsection shows that no iteration is needed to reach the optimal convergence rate for both the parametric and nonparametric components when the bandwidths of each step are properly chosen. Therefore, we may stop after step (c) to save computation time for large problems; a code sketch of steps (a)–(c) is given below.
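As a concrete illustration, here is a minimal Python sketch of steps (a)–(c), assuming (as in the registered real data of Section 3) that all subjects share a common design grid x and that the data are stored in an n × T matrix Y. The Epanechnikov kernel and all function names are our own choices, not from the paper's software.

```python
import numpy as np

def local_linear(x_obs, y_obs, x_eval, h, w=None):
    """Local linear regression; w carries optional per-observation weights."""
    if w is None:
        w = np.ones_like(y_obs)
    fits = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        u = (x_obs - x0) / h
        k = w * np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0) / h  # K_h weights
        s1 = np.sum(k * (x_obs - x0))                 # S_{T,1}-type moment
        s2 = np.sum(k * (x_obs - x0)**2)              # S_{T,2}-type moment
        wt = k * (s2 - (x_obs - x0) * s1)             # local linear weights
        fits[j] = np.sum(wt * y_obs) / np.sum(wt)
    return fits

def multistep(x, Y, h, h_star):
    """One pass of steps (a)-(c); Y has one row per subject."""
    n, T = Y.shape
    # (a) initial baseline from subject 1 (alpha_1 = 0, beta_1 = 1), eq. (2)-(3)
    m_tilde = local_linear(x, Y[0], x, h)
    # (b) least squares for alpha_i, beta_i, i >= 2, eq. (4)-(5)
    alpha, beta = np.zeros(n), np.ones(n)
    c = m_tilde - m_tilde.mean()
    for i in range(1, n):
        beta[i] = np.sum(c * Y[i]) / np.sum(c**2)
        alpha[i] = Y[i].mean() - beta[i] * m_tilde.mean()
    # (c) pooled update: regress y*_it = (y_it - alpha_i)/beta_i on x with
    # subject weights beta_i^2, which reproduces the weights in eq. (6)-(7)
    y_star = ((Y - alpha[:, None]) / beta[:, None]).ravel()
    x_all = np.tile(x, n)
    w_all = np.repeat(beta**2, T)
    m_hat = local_linear(x_all, y_star, x, h_star, w=w_all)
    return alpha, beta, m_hat
```

Iterating steps (b) and (c) as in step (d) amounts to recomputing α_i, β_i against the current m̂ and refitting until the convergence criterion falls below η.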

2.2 Large sample properties

In this section, we study the large-sample properties of the estimators of m(·), α_i and β_i. By large sample, we mean that both n and T are large; however, the sizes of n and T can be different. Indeed, for MS data, T is typically much larger than n. The optimal bandwidths in the nonparametric estimation will be determined by asymptotic expansions so as to achieve the optimal rate of convergence. We will also investigate whether or not the accuracy of \hat\alpha_i and \hat\beta_i affects the rate of convergence of the estimator of m(·).

The following conditions will be needed to establish the asymptotic theory.

C1. The baseline intensity m(·) is continuous and has a bounded second-order derivative.

C2. There exist constants c > 0, α > 0 and δ > 0 such that the marginal density f(·) of x_it satisfies f(x) ≥ δ and |f(x) − f(y)| ≤ c|x − y|^α for any x and y in the support of f(·).

C3. The conditional variance σ_i²(x) = Var(y_it | x_it = x) is bounded and continuous in x, where i = 1, …, n and t = 1, …, T.

C4. The kernel K(·) is a symmetric probability density function with bounded support. Hence K(·) has the properties ∫K(u)du = 1, ∫uK(u)du = 0, and ∫u²K(u)du is nonzero and bounded. Without loss of generality, we further assume that the support of K(·) lies in the interval [−1, +1].

Condition C1 is a standard condition for nonparametric estimation. Condition C2 requires that the density of x_it be bounded away from 0, which may be a strong assumption in general but is reasonable for mass spectrometry data, since the x_it are approximately uniformly distributed on the support, as shown in Figure 1. Condition C3 allows for heteroscedasticity while restricting the variances to be bounded. Condition C4 is a standard condition on the kernel function used in local linear regression.

The moments of K and K² are denoted respectively by μ_l = ∫u^l K(u)du and ν_l = ∫u^l K²(u)du for l ≥ 0.
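For instance, for the Epanechnikov kernel K(u) = (3/4)(1 − u²) on [−1, 1], which satisfies Condition C4, direct integration gives

\mu_2 = \int_{-1}^{1} u^2\,\tfrac{3}{4}(1 - u^2)\,du = \tfrac{1}{5}, \qquad \nu_0 = \int_{-1}^{1} \tfrac{9}{16}(1 - u^2)^2\,du = \tfrac{3}{5},

the two moments that appear in the bias and variance expansions below.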

Lemma 1

Suppose that Conditions C1–C4 are satisfied. Then for \tilde m(·) defined by (2), we have, as h → 0 and Th → ∞,

(9)   \tilde m(x) = m(x) + \tfrac{1}{2}\, m''(x)\,\mu_2\, h^2 + o(h^2) + U_1(x),

where U_1(x) = \big(\sum_{s=1}^{T} \omega_{1s}(x)\,\sigma_1(x_{1s})\,\varepsilon_{1s}\big)\big/\big(\sum_{s=1}^{T} \omega_{1s}(x)\big).

Lemma 1 allows us to derive the asymptotic bias, variance and mean squared error of the estimator \tilde m(·). These are summarized in the following corollary.

Corollary 1

Let X denote all the observed covariates {x_it, i = 1, …, n, t = 1, …, T}. Under Conditions C1–C4, the bias, variance and mean squared error of \tilde m(x) conditional on X have the following expressions:

E[\tilde m(x)\,|\,X] = m(x) + \tfrac{1}{2}\, m''(x)\,\mu_2\, h^2 + o(h^2),
\mathrm{Var}(\tilde m(x)\,|\,X) = \frac{1}{Th}\,[f(x)]^{-1}\,\sigma_1^2(x)\,\nu_0 + o\Big(\frac{1}{Th}\Big),
E[\{\tilde m(x) - m(x)\}^2\,|\,X] = \tfrac{1}{4}\big(m''(x)\,\mu_2\big)^2 h^4 + \frac{1}{Th}\,[f(x)]^{-1}\,\sigma_1^2(x)\,\nu_0 + o\Big(h^4 + \frac{1}{Th}\Big).

It is clear from the above expansions that, in order to minimize the mean squared error of \tilde m(x), the bandwidth h should be chosen to be of order T^{-1/5}. However, we will show later that this is not necessarily optimal for our final estimator \hat m(·).
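A one-line minimization makes this explicit: writing the leading terms of the mean squared error as A h⁴ + B/(Th) with A = (m″(x)μ_2)²/4 and B = [f(x)]^{-1}σ_1²(x)ν_0, setting the derivative with respect to h to zero gives

4A h^3 = \frac{B}{T h^2} \;\Longrightarrow\; h_{\mathrm{opt}} = \Big(\frac{B}{4AT}\Big)^{1/5} = \left(\frac{\sigma_1^2(x)\,\nu_0}{f(x)\,\big(m''(x)\,\mu_2\big)^2\, T}\right)^{1/5} \asymp T^{-1/5}.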

For estimation of the scale parameters β_i, we can apply Lemma 1 together with a Taylor expansion to derive the asymptotic bias and variance. In particular, we have the following theorem.

Theorem 1

Suppose that Conditions C1–C4 are satisfied and that h → 0 is chosen so that Th → ∞. Then the following expansions hold for i ≥ 2:

(10)   E(\hat\beta_i - \beta_i\,|\,X) = -\beta_i\Big(h^2 P_i + \frac{1}{Th}\,Q_i\Big) + o\Big(h^2 + \frac{1}{Th}\Big),

(11)   \mathrm{Var}(\hat\beta_i\,|\,X) = \frac{\sum_{t=1}^{T} W_{it}^2\,\sigma_i^2(x_{it})}{\big(\sum_{t=1}^{T} W_{it}^2\big)^2} + \beta_i^2\,\frac{\sum_{t=1}^{T} W_{1t}^2\,\sigma_1^2(x_{1t})}{\big(\sum_{t=1}^{T} W_{it}^2\big)^2} + o\Big(\frac{1}{T}\Big),

where W_{it} = m(x_{it}) - \bar m(x_i), \bar m(x_i) = T^{-1}\sum_{t=1}^{T} m(x_{it}),

P_i = \frac{\mu_2}{2}\,\frac{\sum_{t=1}^{T} W_{it}\, m''(x_{it})}{\sum_{t=1}^{T} W_{it}^2}, \qquad Q_i = \frac{\nu_0 \sum_{t=1}^{T} f^{-1}(x_{1t})\,\sigma_1^2(x_{1t})}{\sum_{t=1}^{T} W_{it}^2}.
Remark 1

The asymptotic bias and variance of the location estimator \hat\alpha_i can be derived similarly. In fact, they can be inferred from the bias and variance of \hat\beta_i through the linear relationship between \hat\alpha_i and \hat\beta_i in (5), and thus have the same order as those of \hat\beta_i in (10) and (11).

Remark 2

The bias of \hat\beta_i is of order h² + (Th)^{-1} and the variance is of order T^{-1}. To obtain √T-consistency for \hat\beta_i, i.e. √T(\hat\beta_i − β_i) = O_p(1), the bias should be of order O(T^{-1/2}). This is achieved by choosing the order of h between O(T^{-1/2}) and O(T^{-1/4}).
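To verify the stated window, note that both bias components must be O(T^{-1/2}):

h^2 = O(T^{-1/2}) \iff h = O(T^{-1/4}), \qquad \frac{1}{Th} = O(T^{-1/2}) \iff T^{-1/2} = O(h),

so any h with T^{-1/2} ≲ h ≲ T^{-1/4}, for example h ≍ T^{-1/3} as in Theorem 3 below, makes the bias negligible relative to the O(T^{-1/2}) standard deviation.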

From the asymptotic expansions for the mean and variance of the initial functional estimator \tilde m(·) and the parameter estimators \hat\beta_i, we can obtain asymptotic expansions for the bias and variance of the subsequent estimator of the baseline intensity, \hat m(·).

Theorem 2

Suppose that Conditions C1–C4 are satisfied. Suppose also that h for \tilde m(·) and h* for \hat m(·) are chosen so that h → 0, h* → 0, Th → ∞, and nTh* → ∞. Then the following expansions hold:

E(\hat m(x) - m(x)\,|\,X) = \left(\frac{\sum_{i=2}^{n}\beta_i^2\big(h^2 P_i + (Th)^{-1}Q_i\big)}{\sum_{i=1}^{n}\beta_i^2}\right) m(x) - \frac{\sum_{i=2}^{n}\beta_i^2\big(h^2 R_i + (Th)^{-1}\bar m(x_i)\,Q_i\big)}{\sum_{i=1}^{n}\beta_i^2} + \frac{m''(x)\,\mu_2}{2}\,h^{*2} + o\Big(h^2 + \frac{1}{Th} + h^{*2}\Big),

\mathrm{Var}(\hat m(x)\,|\,X) = \frac{1}{\big(\sum_{i=1}^{n}\beta_i^2\big)^2}\sum_{t=1}^{T}\left[\sum_{i=2}^{n}\beta_i^2\Big(\frac{1}{T} + Z_{it}\Big)\right]^2 \sigma_1^2(x_{1t}) + \frac{\nu_0 \sum_{i=2}^{n}\beta_i^2\, f^{-1}(x)\,\sigma_i^2(x)}{Th^*\big(\sum_{i=1}^{n}\beta_i^2\big)^2} + o\Big(\frac{1}{T} + \frac{1}{nTh^*}\Big),

where P_i, Q_i, W_{it} are the same as in Theorem 1, R_i = \bar m(x_i)\,P_i - \tfrac{1}{2}\mu_2\,\overline{m''}(x_i) with \overline{m''}(x_i) = T^{-1}\sum_{t=1}^{T} m''(x_{it}), and Z_{it} = \big(\sum_{s=1}^{T} W_{is}^2\big)^{-1}\big(m(x) - \bar m(x_i)\big)\,W_{1t}.

In the ideal case where the location-scale parameters are known, the bias and variance of the local linear estimator of the baseline intensity m(·) are of order O(h*²) and O(1/(nTh*)), and the optimal bandwidth in this ideal case is of order (nT)^{-1/5}. The bias and variance of the nonparametric estimator are then O((nT)^{-2/5}) and O((nT)^{-4/5}), respectively, so the mean squared error is of order O((nT)^{-4/5}). Interestingly, by choosing the bandwidths h and h* separately, we can achieve this optimal rate of convergence for the baseline intensity estimator \hat m(·) through the proposed multi-step estimation procedure when the orders of n and T satisfy a certain requirement. Notice that the parametric components simultaneously attain the optimal √T convergence rate. The conclusions are summarized in the following theorem.

Theorem 3

Suppose that Conditions C1–C4 are satisfied. The optimal parametric convergence rate of the location-scale estimators can be attained by setting h to be of order T^{-1/3}; the optimal nonparametric convergence rate of the baseline intensity estimator \hat m(·) can be attained by setting h* to be of order (nT)^{-1/5} and h of order T^{-1/3}, when T → ∞, n → ∞, and n = O(T^{1/4}).

Remark 3

In Theorem 3, if T → ∞ while n is fixed, the optimal nonparametric convergence rate can still be reached by setting h* = O(T^{-1/5}) and h between O(T^{-1/5}) and O(T^{-3/5}). By Remark 2, we still have √T-consistency of the parametric estimators in the fixed-n situation.

Remark 4

From Theorem 3, if the requirement n = O(T^{1/4}) is not satisfied, then the nonparametric estimator \hat m(·) will not achieve the optimal rate of convergence for any choice of the bandwidths. However, the above choice of h and h* is still optimal even if n = O(T^{1/4}) does not hold.

Theorem 4

Suppose that Conditions C1–C4 are satisfied. In addition, assume E[m²(x_it)(σ_i²(x_it) + 1)] < ∞ and E[m²(x_it)] > 0 for all i = 1, …, n and t = 1, …, T. If we restrict the order of h to lie between T^{-1/2} and T^{-1/4}, then \hat\beta_i is asymptotically normal:

(12)   \sqrt{T}\,(\hat\beta_i - \beta_i) \;\longrightarrow\; N(0, \sigma_i^{*2}),

where

\sigma_i^{*2} = \lim_{T\to\infty}\left\{\frac{T^{-1}\sum_{t=1}^{T} W_{it}^2\,\sigma_i^2(x_{it})}{\big(T^{-1}\sum_{t=1}^{T} W_{it}^2\big)^2} + \beta_i^2\,\frac{T^{-1}\sum_{t=1}^{T} W_{1t}^2\,\sigma_1^2(x_{1t})}{\big(T^{-1}\sum_{t=1}^{T} W_{it}^2\big)^2}\right\}.

Here, if we assume σ_i²(·) to be a constant for each subject i, then its value can be consistently estimated by the plug-in estimator

(13)   \hat\sigma_i^2 = T^{-1}\sum_{t=1}^{T}\big(y_{it} - \hat\alpha_i - \hat\beta_i\,\hat m(x_{it})\big)^2,

where \hat\alpha_1 = 0 and \hat\beta_1 = 1.

From (12), the asymptotic variance of \hat\beta_i is of order O(T^{-1}), provided that the order of the bandwidth h is properly chosen. Since the asymptotic expansion for \hat\beta_i does not involve h*, the specific choice of h* will not affect the order of the asymptotic variance of \hat\beta_i.
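For completeness, here is a one-line realization of the plug-in estimator (13), reusing the array conventions of the sketch in Section 2.1 (names are ours):

```python
import numpy as np

def sigma2_hat(Y, alpha, beta, m_hat):
    """Plug-in residual variance (13), one value per subject.

    Y: (n, T) responses; alpha, beta: length-n estimates with
    alpha[0] = 0 and beta[0] = 1; m_hat: length-T baseline estimate
    evaluated at the common design points.
    """
    resid = Y - alpha[:, None] - beta[:, None] * m_hat[None, :]
    return np.mean(resid**2, axis=1)
```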

2.3 Bandwidth selection

In Section 2.2, we studied how the choices of the bandwidths h and h* affect the asymptotic properties of the estimators. In practice, however, we need a data-driven approach to choosing the bandwidths. Our suggestion is a K-fold cross-validation bandwidth selection rule.

First, we randomly divide the n individuals into K groups Z_1, Z_2, …, Z_K. Here, Z_k is the k-th test set, and the k-th training set is Z_{-k} = {1, …, n} \ Z_k. We estimate the baseline curve m(·) using the observations in the training set Z_{-k} and denote the estimator by \hat m(Z_{-k}, h, h*), where h and h* are the bandwidths of the two nonparametric regression steps for \tilde m(·) and \hat m(·), respectively. Recall that at the beginning of the multi-step estimation procedure, we fix the first observation as the baseline to resolve the identifiability issue. In the cross-validation, for each split the baseline corresponds to the first observation inside Z_{-k}, which differs across k. We circumvent the problem of comparing different baseline estimates by using them to predict the test data in Z_k: after obtaining the estimator of the baseline curve from Z_{-k}, we regress the data in Z_k on it and compute the mean squared prediction error (MSPE),

(14)   \mathrm{MSPE}(Z_k, h, h^*) = \frac{1}{|Z_k|}\sum_{i\in Z_k}\sum_{t=1}^{T}\Big[y_{it} - \big(\hat\alpha_{ki} + \hat\beta_{ki}\,\hat m_t(Z_{-k}, h, h^*)\big)\Big]^2,

where \hat\alpha_{ki} and \hat\beta_{ki} are the estimated regression coefficients. We repeat the calculation for k = 1, …, K, and the optimal pair (\hat h, \hat h^*) is the one that minimizes the average MSPE, i.e.

(15)   (\hat h, \hat h^*) = \arg\min_{(h,\, h^*)} \frac{1}{K}\sum_{k=1}^{K} \mathrm{MSPE}(Z_k, h, h^*).

The effectiveness of the cross-validation will be evaluated in Sections 3 and 4.
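A minimal sketch of this selection rule in Python, reusing the hypothetical multistep helper from Section 2.1 (the grid search and fold assignment are our own implementation choices):

```python
import numpy as np
from itertools import product

def cv_bandwidths(x, Y, h_grid, h_star_grid, K=5, seed=0):
    """Pick (h, h*) minimizing the summed (equivalently averaged) MSPE, eq. (14)-(15)."""
    n, T = Y.shape
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % K               # random K-fold split of subjects
    best, best_score = None, np.inf
    for h, h_star in product(h_grid, h_star_grid):
        total = 0.0
        for k in range(K):
            train, test = folds != k, folds == k
            _, _, m_hat = multistep(x, Y[train], h, h_star)
            c = m_hat - m_hat.mean()
            # regress each test subject on the fitted baseline, eq. (14)
            mspe = 0.0
            for yi in Y[test]:
                b = np.sum(c * yi) / np.sum(c**2)
                a = yi.mean() - b * m_hat.mean()
                mspe += np.sum((yi - (a + b * m_hat))**2)
            total += mspe / test.sum()
        if total < best_score:
            best, best_score = (h, h_star), total
    return best
```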

3 Application to mass spectrometry data

We now apply the proposed multi-step method to a SELDI-TOF mass spectrometry data set from a study of 33 (n = 33) liver cancer patients conducted at Changzheng Hospital in Shanghai. For each patient, we extract the m/z values in the region 2,000–10,000 Da (T = 21,000), which is believed to contain all the useful information. Figure 2 shows the curves of 10 randomly picked patients.

Figure 2: Curves of 10 observations and the baseline estimate.

There are some noticeable features in the data. All curves appear to be continuous. They peak simultaneously around certain locations; at each location, the curves have the same shape but different heights. All these features are captured well by our model. Since the observed m/z values may fluctuate from person to person, we need to perform registration to make the analysis easier. Here, we take the observations from the first individual and set his/her m/z values as the reference. We then use linear interpolation to compute the intensities of all the other individuals at the reference m/z locations. After that, the preprocessed data have the same m/z values for every observation.
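This registration step is a standard linear interpolation; a sketch with NumPy (array names are illustrative):

```python
import numpy as np

def register(mz_list, intensity_list):
    """Interpolate every spectrum onto the first subject's m/z grid.

    mz_list, intensity_list: lists of 1-D arrays, one pair per subject,
    each sorted by m/z. Returns the reference grid and an (n, T) matrix.
    """
    mz_ref = mz_list[0]                       # first subject as the reference
    Y = np.vstack([np.interp(mz_ref, mz, y)   # linear interpolation
                   for mz, y in zip(mz_list, intensity_list)])
    return mz_ref, Y
```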

We use the cross-validation method described in Section 2.3 to select the optimal bandwidths with K = 33, i.e., leave-one-out cross-validation. We compute the MSPE on the grid h = 2, 4, 6, …, 40 and h* = 2, 4, 6, …, 40. Table 1 contains a portion of the result, with h = 30, 32, …, 40 and h* = 2, 4, 6, …, 20.

Table 1:

MSPE of the leave-one-out prediction for the real data. The minimum MSPE is marked with an asterisk.

h*\h      30          32          34          36          38          40
2      1,104.946   1,104.941   1,104.936   1,104.934   1,104.931   1,104.930
4      1,104.483   1,104.482   1,104.481*  1,104.483   1,104.484   1,104.487
6      1,110.261   1,110.265   1,110.269   1,110.275   1,110.281   1,110.288
8      1,122.601   1,122.610   1,122.619   1,122.630   1,122.640   1,122.652
10     1,140.525   1,140.539   1,140.552   1,140.568   1,140.582   1,140.598
12     1,161.739   1,161.757   1,161.775   1,161.795   1,161.813   1,161.832
14     1,183.842   1,183.864   1,183.886   1,183.909   1,183.931   1,183.953
16     1,205.356   1,205.382   1,205.407   1,205.433   1,205.458   1,205.483
18     1,225.298   1,225.327   1,225.355   1,225.383   1,225.411   1,225.438
20     1,243.068   1,243.099   1,243.130   1,243.162   1,243.191   1,243.222

As we can see in Table 1, the minimum MSPE occurs at h = 34 and h* = 4, which agrees with our theory that h and h* should not be chosen at the same rate for the purpose of estimating the nonparametric component.

Finally, we use the selected bandwidths to estimate the location and scale parameters as well as the nonparametric curve for the whole data set. The estimated parameters are reported in Table 2, and the estimated baseline curve is shown in the lower part of Figure 2. From Table 2, we can see that the individuals have very different regression coefficients, which is also evident in Figure 2. In addition, comparing the estimated baseline intensity curve with the real curves of the 10 observations, it is clear that the majority of the peaks and shapes are captured by the nonparametric estimate with an appropriate degree of smoothing. The raw curve of a single individual (the 16th subject) is shown in Figure 3 along with the estimates derived from m˜(·) and mˆ(·). The estimate from mˆ(·) is notably better than that from m˜(·), which shows that the multi-step procedure is effective in improving the estimate of the baseline curve. We observed a similar phenomenon for all the other subjects.

Table 2:

Regression parameters for the real data.

ID   αˆ        βˆ       ID   αˆ        βˆ       ID   αˆ        βˆ
1     0        1        12  −1.0302   1.5914   23  −0.7234   1.4448
2    −0.2086   1.1836   13  −0.1788   1.1366   24   0.2021   0.8915
3    −1.2208   1.6836   14  −0.3252   1.2586   25   0.5341   0.7957
4    −0.5630   1.3689   15  −0.6169   1.3599   26   0.5373   0.7203
5    −1.4761   1.8721   16  −0.3919   1.2418   27  −0.3748   1.2181
6    −1.2931   1.7142   17   0.0820   1.0178   28   0.0935   0.9642
7     0.7928   0.5925   18   0.7402   0.6569   29   0.7852   0.5971
8     0.0582   1.0387   19  −0.0609   1.0586   30  −0.0085   1.0503
9    −0.3338   1.1839   20  −0.9218   1.5053   31  −0.3827   1.2131
10   −0.9066   1.5397   21  −0.0580   1.0378   32  −0.7803   1.5011
11   −1.5054   1.8770   22  −0.3526   1.2149   33  −0.2115   1.1108
Figure 3: Nonparametric estimates of the curve from m˜(·) and mˆ(·) for the 16th subject.

4 Simulation studies

We conduct simulations to assess the performance of the proposed method for parameter and curve estimation. The true curve m(·) is taken to be a moving-average smoother of the cross-sectional mean of a fraction of the real mass spectrometry data of Section 3 after log transformation. We set 10,000 equally spaced m/z values from 1 to 10,000 (T = 10,000) and the number of individuals to n = 30. The true values of the parameters α_i, β_i, i = 1, 2, …, n, for each individual are shown in Table 3. The error terms ε_it are sampled independently from N(0, σ²), where we set σ = 0.25.
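A sketch of this data-generating step (the smoothed real-data baseline is replaced here by an arbitrary placeholder curve, and random parameter values stand in for the Table 3 entries):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, sigma = 30, 10_000, 0.25
x = np.arange(1, T + 1, dtype=float)

# placeholder baseline; the paper uses a moving-average smoother of the
# log-scale cross-sectional mean of a fraction of the real MS data
m = np.exp(-((x - 3000) / 400) ** 2) + 0.5 * np.exp(-((x - 7000) / 800) ** 2)

alpha = rng.uniform(0.0, 1.0, size=n)        # stand-ins for Table 3 values
beta = rng.uniform(0.2, 2.0, size=n)
alpha[0], beta[0] = 0.0, 1.0                 # identifiability: subject 1 is baseline

# model (1) with sigma_i(x) constant at 0.25
Y = alpha[:, None] + beta[:, None] * m[None, :] + sigma * rng.standard_normal((n, T))
```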

We apply our multi-step procedure to the simulated data with different choices of the bandwidths. The number of runs is 100. The estimated parameters \hat\alpha_i and \hat\beta_i are shown in Table 3 along with their standard errors. We set h = 35, which leads to the smallest SSE of m˜, as shown in Table 4. It is evident that the estimation is very accurate for all the location and scale parameters. In addition, for each pair (h, h*), the computation time is less than two minutes on a laptop.

From Table 4, we can see that the globally optimal bandwidths are h = 25 and h* = 36. It is interesting to note that the optimal bandwidth for m˜(·) alone is h = 35, which differs from the optimal h for the final estimator.

To evaluate the quality of our multi-step estimation method for the nonparametric baseline function, we consider classical nonparametric estimation on another data set where the same true function m(·) is used but α_i = 0, β_i = 1 for all i = 1, …, n. We applied the same local linear estimation with different bandwidths. The mean SSE of the estimated m(·) over 100 repetitions for different h is given in Table 5. The best mean SSE achieved by the multi-step procedure in Table 4 is very close to the minimal mean SSE of 0.4442 for this oracle estimator. This comparison confirms that little information is lost in the proposed method when both the parametric and nonparametric components are estimated simultaneously.

Table 3:

Regression parameter estimates (h = 35); standard errors in parentheses.

ID  α    β    αˆ             βˆ            ID  α    β    αˆ             βˆ
1   0    1    0.000(0.000)   1.000(0.000)  16  0.6  1    0.605(0.026)   0.997(0.019)
2   0.2  0.2  0.202(0.020)   0.198(0.013)  17  0.8  0.2  0.799(0.020)   0.201(0.014)
3   0.4  0.5  0.399(0.023)   0.501(0.016)  18  1    0.5  1.004(0.024)   0.497(0.016)
4   0.6  1.5  0.598(0.040)   1.502(0.027)  19  0    1.5  −0.001(0.044)  1.501(0.029)
5   0.8  2    0.801(0.047)   2.000(0.031)  20  0.2  2    0.206(0.043)   1.997(0.029)
6   1    1    0.999(0.029)   1.001(0.020)  21  0.4  1    0.403(0.030)   0.998(0.021)
7   0    0.2  −0.001(0.022)  0.201(0.015)  22  0.6  0.2  0.598(0.024)   0.201(0.016)
8   0.2  0.5  0.200(0.021)   0.500(0.014)  23  0.8  0.5  0.801(0.024)   0.500(0.016)
9   0.4  1.5  0.404(0.033)   1.498(0.023)  24  1    1.5  0.996(0.038)   1.503(0.025)
10  0.6  2    0.603(0.044)   1.999(0.030)  25  0    2    0.001(0.044)   2.000(0.029)
11  0.8  1    0.800(0.026)   1.001(0.018)  26  0.2  1    0.203(0.030)   0.998(0.020)
12  1    0.2  1.002(0.021)   0.198(0.014)  27  0.4  0.2  0.399(0.021)   0.201(0.015)
13  0    0.5  0.003(0.023)   0.499(0.016)  28  0.6  0.5  0.604(0.021)   0.497(0.014)
14  0.2  1.5  0.206(0.036)   1.497(0.024)  29  0.8  1.5  0.803(0.032)   1.499(0.023)
15  0.4  2    0.401(0.048)   2.001(0.033)  30  1    2    1.001(0.047)   2.000(0.031)
Table 4:

SSE of the initial and updated estimates of m. The minimum SSE in each part is marked with an asterisk.

SSE of m˜:
h      20      25      30      35       40
SSE    9.1112  7.5509  6.7024  6.3418*  6.3653

SSE of mˆ:
h*\h   20      25       30      35      40
20     0.6145  0.5936   0.5925  0.6078  0.6388
22     0.5762  0.5563   0.5561  0.5723  0.6042
24     0.5453  0.5265   0.5272  0.5443  0.5771
26     0.5204  0.5026   0.5043  0.5223  0.5561
28     0.5005  0.4838   0.4866  0.5056  0.5403
30     0.4850  0.4695   0.4733  0.4934  0.5291
32     0.4735  0.4592   0.4641  0.4852  0.5220
34     0.4657  0.4527   0.4587  0.4809  0.5188
36     0.4612  0.4496*  0.4568  0.4802  0.5192
38     0.4601  0.4498   0.4583  0.4829  0.5231
40     0.4622  0.4533   0.4631  0.4889  0.5303
Table 5:

SSE of the estimate of m on the data set with the same parameters for each individual. The minimum SSE is marked with an asterisk.

h      20      30      40       50      60
SSE    0.6881  0.4936  0.4442*  0.4926  0.6337

We use cross-validation to obtain a data-driven choice of the bandwidths. Here, we set K = 5 and compute the mean MSPE over 100 runs for every bandwidth pair of the two steps; the optimal bandwidths are those with the minimum mean MSPE. The mean MSPE values are shown in Table 6, from which we can see that the smallest value is located at h = 25 and h* = 38, quite close to the optimal bandwidths h = 25 and h* = 36 in Table 4. Therefore, the cross-validation idea appears to work well in terms of selecting the best bandwidths.

Table 6:

Mean MSPE of the 5-fold CV over 100 runs. All MSPE values have been multiplied by T, and the minimum, 625.61956, has been subtracted; the minimum is marked with an asterisk.

h*\h   20        25        30        35        40
20     0.293733  0.293728  0.293724  0.293722  0.293721
22     0.222948  0.222943  0.222940  0.222939  0.222939
24     0.165816  0.165813  0.165810  0.165809  0.165809
26     0.119253  0.119250  0.119248  0.119248  0.119248
28     0.081603  0.081600  0.081599  0.081598  0.081599
30     0.052297  0.052295  0.052294  0.052294  0.052295
32     0.030256  0.030255  0.030254  0.030255  0.030256
34     0.014621  0.014620  0.014620  0.014621  0.014622
36     0.004761  0.004760  0.004760  0.004761  0.004763
38     3.4e−08   0*        5.6e−07   2.0e−06   4.2e−06
40     0.000590  0.000590  0.000592  0.000593  0.000596

5 Discussion

This paper proposes a semi- and nonparametric model suitable for analyzing mass spectra data. The model is flexible and intuitive, capturing the main features of MS data, and both the parametric and nonparametric components have natural interpretations. A multi-step iterative algorithm is proposed for estimating both the individual location and scale regression coefficients and the nonparametric function. The algorithm combines local linear fitting and the least squares method, both of which are easy to implement and computationally efficient. Both the simulation studies and the real data analysis demonstrate that the proposed multi-step procedure works well.

The local linear fitting for the nonparametric function estimation may be replaced with other nonparametric estimation techniques. Because the location and scale parameters are subject-specific, the empirical Bayes method [19] may be used. In addition, nonparametric Bayes may also be applicable, with the nonparametric function being modeled as a realization of a Gaussian process.

The proposed model and the associated iterative estimation method do not account for the random error in the measurement of X. It is desirable to incorporate the measurement error into the model [20].

Many studies involving MS data are aimed at classifying patients of different disease types, with peak information usually serving as the basis of the classifier. The proposed method provides a natural way to identify the peaks for different groups of patients: apply the multi-step estimation procedure to each group and obtain the corresponding nonparametric baseline function. From the estimated baseline function, the peak information can be easily extracted and then used for classification.

Funding statement: National Science Foundation, Directorate for Mathematical and Physical Sciences, Division of Mathematical Sciences (Grant/Award Number: DMS-1308566) and U.S. Department of Health and Human Services, National Institutes of Health (Grant/Award Number: R37GM047845).

Acknowledgement

The research was supported in part by National Science Foundation grant DMS-1308566 and National Institutes of Health grant R37GM047845. The authors thank the editor, the associate editor, and referees for their constructive comments. The authors would also like to thank Liang Zhu and Cheng Wu at Shanghai Changzheng Hospital for providing the data.

Appendix

The Appendix contains the proofs of Theorems 1–4; the proofs of Lemma 1 and Corollary 1 can be found in the supplementary materials. We begin with some notation that will be used to streamline the proofs. Because all asymptotic expansions are derived with the x_it fixed, we will, for notational simplicity, use E to denote the conditional expectation and Var the conditional variance given the x_it throughout the Appendix. For i = 1, …, n and t = 1, …, T, let

V_{it} = \frac{1}{T}\,\sigma_i(x_{it}) - \frac{W_{it}\,\bar m(x_i)}{\sum_{s=1}^{T} W_{is}^2}.

A.1 Proof of Theorem 1

Proof. First of all, define \tilde W_{it} = \tilde m(x_{it}) - \bar{\tilde m}(x_i) to simplify the presentation. By definition, we have the following expansion for \hat\beta_i when i ≥ 2:

(16)   \hat\beta_i - \beta_i = \beta_i\,\frac{\sum_{t=1}^{T} \tilde W_{it}\big(m(x_{it}) - \tilde m(x_{it})\big)}{\sum_{t=1}^{T} \tilde W_{it}^2} + \frac{\sum_{t=1}^{T} \tilde W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} \tilde W_{it}^2} \equiv \beta_i D_i + G_i.

From Lemma 1 and the proof of Corollary 1, we have

(17)   \tilde W_{it} - W_{it} = O_p(h^2).

Plugging (9) into Di, we have

(18)   D_i = -\frac{\sum_{t=1}^{T}\big[(U_1(x_{it}) - \bar U_1(x_i))\,U_1(x_{it}) + \tfrac{1}{2}\mu_2\, m''(x_{it})\, W_{it}\, h^2 + o(h^2)\big]}{\sum_{t=1}^{T} \tilde W_{it}^2} - \frac{\sum_{t=1}^{T}\big\{[W_{it} + O(h^2)]\,U_1(x_{it}) + O(h^2)\,[U_1(x_{it}) - \bar U_1(x_i)]\big\}}{\sum_{t=1}^{T} \tilde W_{it}^2}
= -\frac{\sum_{t=1}^{T} (U_1(x_{it}) - \bar U_1(x_i))\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2} - h^2 P_i - \frac{\sum_{t=1}^{T} W_{it}\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1)) + o\Big(h^2 + \frac{1}{Th}\Big),

where the last asymptotic expansion follows from (17). Similarly for Gi, we have

(19)   G_i = \frac{\sum_{t=1}^{T} W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}\,(1 + O(h^2))}{\sum_{t=1}^{T} \tilde W_{it}^2} + \frac{\sum_{t=1}^{T} (U_1(x_{it}) - \bar U_1(x_i))\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} \tilde W_{it}^2} = \frac{\sum_{t=1}^{T} W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1)).

We observe that for any i ≥ 2, U_1(x_it) is a linear combination of {ε_1t, t = 1, …, T}. Therefore, U_1(x_it) is independent of {ε_it, i = 2, …, n, t = 1, …, T}. By the tower property, E G_i = 0. Therefore, β_i D_i is the only part that contributes to the bias of \hat\beta_i. In view of these facts and Corollary 1, we have the following expansions for the bias and variance terms:

E(\hat\beta_i - \beta_i) = -\beta_i h^2 P_i - \beta_i\,\frac{E\big[\sum_{t=1}^{T} (U_1(x_{it}) - \bar U_1(x_i))^2\big]}{\sum_{t=1}^{T} W_{it}^2} + o\Big(h^2 + \frac{1}{Th}\Big)
= -\beta_i h^2 P_i - \beta_i\,\frac{\sum_{t=1}^{T} \mathrm{Var}(U_1(x_{it}))\,(1 + o(1))}{\sum_{t=1}^{T} W_{it}^2} + o\Big(h^2 + \frac{1}{Th}\Big)
= -\beta_i h^2 P_i - \beta_i\,\frac{1}{Th}\,Q_i + o\Big(h^2 + \frac{1}{Th}\Big),

and

\mathrm{Var}(\hat\beta_i) = \mathrm{Var}\left(\beta_i\,\frac{\sum_{t=1}^{T} W_{it}\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1))\right) + \mathrm{Var}\left(\frac{\sum_{t=1}^{T} W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1))\right).

Straightforward variance calculation for an independent sum gives

(20)   \mathrm{Var}\Big(\sum_{t=1}^{T} W_{it}\,U_1(x_{it})\Big) = \sum_{s=1}^{T}\left(\sum_{t=1}^{T}\frac{W_{it}\,\omega_{1s}(x_{it})}{\sum_{l=1}^{T}\omega_{1l}(x_{it})}\right)^2 \sigma_1^2(x_{1s}).

We have

\sum_{t=1}^{T}\frac{W_{it}\,\omega_{1s}(x_{it})}{\sum_{l=1}^{T}\omega_{1l}(x_{it})} = \sum_{t=1}^{T}\frac{(m(x_{it}) - \bar m(x_i))\,K_h(x_{it} - x_{1s})\big(S_{T,2} - (x_{it} - x_{1s})S_{T,1}\big)}{S_{T,0}S_{T,2} - S_{T,1}^2}.

We expand m(x) in the neighborhood of point x1s using Taylor’s expansion,

m(x_{it}) = m(x_{1s}) + (x_{it} - x_{1s})\,m'(x_{1s}) + \tfrac{1}{2}(x_{it} - x_{1s})^2\, m''(x_{1s}) + o_p\big((x_{it} - x_{1s})^2\big).

Since the kernel function K_h(x − x_1s) vanishes outside the neighborhood of x_1s with diameter h, we can obtain the following:

\sum_{t=1}^{T}\frac{m(x_{it})\,K_h(x_{it} - x_{1s})\big(S_{T,2} - (x_{it} - x_{1s})S_{T,1}\big)}{S_{T,0}S_{T,2} - S_{T,1}^2}
= m(x_{1s}) + \sum_{t=1}^{T}\frac{m'(x_{1s})(x_{it} - x_{1s})\,K_h(x_{it} - x_{1s})\big(S_{T,2} - (x_{it} - x_{1s})S_{T,1}\big)}{S_{T,0}S_{T,2} - S_{T,1}^2} + \sum_{t=1}^{T}\frac{\big[\tfrac{1}{2}(x_{it} - x_{1s})^2 m''(x_{1s}) + o_p((x_{it} - x_{1s})^2)\big]\,K_h(x_{it} - x_{1s})\big(S_{T,2} - (x_{it} - x_{1s})S_{T,1}\big)}{S_{T,0}S_{T,2} - S_{T,1}^2}
= m(x_{1s}) + O_p(h^2)\sum_{t=1}^{T}\frac{K_h(x_{it} - x_{1s})\big(S_{T,2} - (x_{it} - x_{1s})S_{T,1}\big)}{S_{T,0}S_{T,2} - S_{T,1}^2}
= m(x_{1s}) + O_p(h^2),

where the functions S_{T,k}, k = 0, 1, 2, are evaluated at the point x_it. Combined with \bar m(x_i) = \bar m(x_1) + O_p(T^{-1/2}), we obtain the expansion

\sum_{t=1}^{T}\frac{W_{it}\,\omega_{1s}(x_{it})}{\sum_{l=1}^{T}\omega_{1l}(x_{it})} = m(x_{1s}) + O_p(h^2) - \bar m(x_1) + O_p(T^{-1/2}) = W_{1s} + O_p(h^2 + T^{-1/2}).

Then, recalling (20), we have \mathrm{Var}\big(\sum_{t=1}^{T} W_{it}\,U_1(x_{it})\big) = \sum_{t=1}^{T} W_{1t}^2\,\sigma_1^2(x_{1t}) + o_p(T), which leads to the variance expansion

\mathrm{Var}(\hat\beta_i) = \beta_i^2\,\frac{\sum_{t=1}^{T} W_{1t}^2\,\sigma_1^2(x_{1t})}{\big(\sum_{t=1}^{T} W_{it}^2\big)^2} + \frac{\sum_{t=1}^{T} W_{it}^2\,\sigma_i^2(x_{it})}{\big(\sum_{t=1}^{T} W_{it}^2\big)^2} + o_p\Big(\frac{1}{T}\Big). \qquad \square

A.2 Proof of Theorem 2

Proof. Recalling (7) and (8), we have

\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}(x)\,(x_{it} - x) = \Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,1}\Big)\Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,2}\Big) - \Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,2}\Big)\Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,1}\Big) = 0.

This identity yields the following asymptotic expansion of the updated baseline intensity estimator \hat m(·) at the point x. By the definition of \hat m(·) in (6), we can write

(21)   \hat m(x) - m(x) = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}\,(\alpha_i - \hat\alpha_i)/\hat\beta_i}{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}} + \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}/\hat\beta_i}{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}} + \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}\,m(x_{it})\,(\beta_i - \hat\beta_i)/\hat\beta_i}{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}} + \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}\,\big(\tfrac{1}{2}m''(x)(x_{it} - x)^2 + o((x_{it} - x)^2)\big)}{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}}.

From the proof of Theorem 1, we have

(22)   \hat\beta_i = \beta_i - \beta_i h^2 P_i - \beta_i\,\frac{\sum_{t=1}^{T}(U_1(x_{it}) - \bar U_1(x_i))\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2} - \beta_i\,\frac{\sum_{t=1}^{T} W_{it}\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1)) + \frac{\sum_{t=1}^{T} W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1)) + o\Big(h^2 + \frac{1}{Th}\Big).

Then, from the least squares expression, we have the asymptotic expansion for \hat\alpha_i:

(23)   \hat\alpha_i = \bar y_i - \hat\beta_i\,\bar{\tilde m}(x_i) = \alpha_i + \beta_i\,\bar m(x_i) + \bar\varepsilon_i - \hat\beta_i\Big[\bar m(x_i) + \frac{\mu_2}{2}\,\overline{m''}(x_i)\,h^2 + \bar U_1(x_i) + o(h^2)\Big]
= \alpha_i + \beta_i h^2 R_i + \bar m(x_i)\,\beta_i\,\frac{\sum_{t=1}^{T}(U_1(x_{it}) - \bar U_1(x_i))\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2} + o\Big(h^2 + \frac{1}{Th}\Big) + \sum_{t=1}^{T} V_{it}\,\varepsilon_{it}\,(1 + o_p(1)) - \beta_i\sum_{t=1}^{T} V_{it}\,U_1(x_{it})\,(1 + o_p(1)).

Now, we plug the asymptotic expansions (22) and (23) into the right-hand side of (21). The first part of (21) can be expanded as follows:

(24)   \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}(\alpha_i - \hat\alpha_i)/\hat\beta_i}{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}} = \frac{\sum_{i=1}^{n}(\alpha_i - \hat\alpha_i)\,\hat\beta_i\sum_{t=1}^{T} K_{h^*}(x_{it} - x)\Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,2} - (x_{it} - x)\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,1}\Big)}{\sum_{i=1}^{n}\hat\beta_i^2\sum_{t=1}^{T} K_{h^*}(x_{it} - x)\Big(\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,2} - (x_{it} - x)\sum_{i=1}^{n}\hat\beta_i^2 S^{(i)}_{T,1}\Big)}.

The numerator of (24) has the expansion

(25)   \sum_{i=1}^{n}(\alpha_i - \hat\alpha_i)\,\hat\beta_i\Big[Tf(x)\{1 + o_p(1)\}\sum_{i=1}^{n}\hat\beta_i^2\,Th^{*2}f(x)\mu_2\{1 + o_p(1)\} - Th^{*2}f'(x)\mu_2\Big\{1 + O_p\Big(h^{*2} + \frac{1}{Th^*}\Big)\Big\}\sum_{i=1}^{n}\hat\beta_i^2\,Th^{*2}f'(x)\mu_2\Big\{1 + O_p\Big(h^{*2} + \frac{1}{Th^*}\Big)\Big\}\Big]
= T^2 h^{*2}\sum_{i=1}^{n}(\alpha_i - \hat\alpha_i)\,\hat\beta_i\Big[f(x)\{1 + o_p(1)\}\sum_{i=1}^{n}\hat\beta_i^2 f(x)\mu_2\{1 + o_p(1)\} - h^{*}f'(x)\mu_2\,O_p(1)\sum_{i=1}^{n}\hat\beta_i^2\,h^{*}f'(x)\mu_2\,O_p(1)\Big]
= T^2 h^{*2}\sum_{i=1}^{n}(\alpha_i - \hat\alpha_i)\,\hat\beta_i\,f(x)\sum_{i=1}^{n}\beta_i^2\,f(x)\mu_2\,\{1 + o_p(1)\},

where the last equality follows from \hat\beta_i = \beta_i + O(h^2) + O((Th)^{-1}) + O_p(T^{-1/2}).

Similarly, the denominator of (24) has the following expansion

(26)   \sum_{i=1}^{n}\hat\beta_i^2\Big[Tf(x)\{1 + o_p(1)\}\sum_{i=1}^{n}\hat\beta_i^2\,Th^{*2}f(x)\mu_2\{1 + o_p(1)\} - Th^{*2}f'(x)\mu_2\Big\{1 + O_p\Big(h^{*2} + \frac{1}{Th^*}\Big)\Big\}\sum_{i=1}^{n}\hat\beta_i^2\,Th^{*2}f'(x)\mu_2\Big\{1 + O_p\Big(h^{*2} + \frac{1}{Th^*}\Big)\Big\}\Big]
= T^2 h^{*2}\sum_{i=1}^{n}\hat\beta_i^2\Big[f(x)\{1 + o_p(1)\}\sum_{i=1}^{n}\hat\beta_i^2 f(x)\mu_2\{1 + o_p(1)\} - h^{*}f'(x)\mu_2\,O_p(1)\sum_{i=1}^{n}\hat\beta_i^2\,h^{*}f'(x)\mu_2\,O_p(1)\Big]
= T^2 h^{*2}\sum_{i=1}^{n}\hat\beta_i^2\,f(x)\sum_{i=1}^{n}\beta_i^2\,f(x)\mu_2\,\{1 + o_p(1)\}.

Then, combining the expansions (25) and (26), we have the following expansion for the first part of (21):

\frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}(\alpha_i - \hat\alpha_i)/\hat\beta_i}{\sum_{i=1}^{n}\sum_{t=1}^{T}\omega^*_{it}} = \frac{T^2 h^{*2}\sum_{i=1}^{n}(\alpha_i - \hat\alpha_i)\,\hat\beta_i\,f(x)\sum_{i=1}^{n}\beta_i^2 f(x)\mu_2\{1 + o_p(1)\}}{T^2 h^{*2}\sum_{i=1}^{n}\hat\beta_i^2\,f(x)\sum_{i=1}^{n}\beta_i^2 f(x)\mu_2\{1 + o_p(1)\}} = \frac{\sum_{i=2}^{n}(\alpha_i - \hat\alpha_i)\,\beta_i}{\sum_{i=1}^{n}\beta_i^2}\,(1 + o_p(1)).

For the other parts of (21), we can apply the same techniques for expansion. As a result, the following expansion of \hat m holds:

\hat m(x) - m(x) = \frac{\sum_{i=2}^{n}\beta_i(\alpha_i - \hat\alpha_i)}{\sum_{i=1}^{n}\beta_i^2}\,(1 + o_p(1)) + \frac{\sum_{i=1}^{n}\beta_i\sum_{t=1}^{T} K_{h^*}(x_{it} - x)\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{i=1}^{n}\beta_i^2\,Tf(x)}\,(1 + o_p(1)) + \frac{\sum_{i=1}^{n}\beta_i(\beta_i - \hat\beta_i)\sum_{t=1}^{T} K_{h^*}(x_{it} - x)\,m(x_{it})}{\sum_{i=1}^{n}\beta_i^2\,Tf(x)}\,(1 + o_p(1)) + \frac{m''(x)}{2}\,\mu_2\,h^{*2}\,(1 + o_p(1))
= -\frac{\sum_{i=2}^{n}\beta_i^2\Big[h^2 R_i + \bar m(x_i)\sum_{t=1}^{T}(U_1(x_{it}) - \bar U_1(x_i))^2\big/\sum_{t=1}^{T} W_{it}^2 - \sum_{t=1}^{T} V_{it}\,U_1(x_{it})\,(1 + o_p(1))\Big]}{\sum_{i=1}^{n}\beta_i^2} + \frac{\sum_{i=1}^{n}\beta_i\sum_{t=1}^{T} K_{h^*}(x_{it} - x)\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{i=1}^{n}\beta_i^2\,Tf(x)}\,(1 + o_p(1))
\quad + m(x)\left\{\frac{\sum_{i=2}^{n}\beta_i^2\,h^2 P_i}{\sum_{i=1}^{n}\beta_i^2} + \frac{\sum_{i=2}^{n}\beta_i^2\sum_{t=1}^{T}(U_1(x_{it}) - \bar U_1(x_i))^2\big/\sum_{t=1}^{T} W_{it}^2}{\sum_{i=1}^{n}\beta_i^2}\right\} + m(x)\left\{\frac{\sum_{i=2}^{n}\beta_i^2\sum_{t=1}^{T} W_{it}\,U_1(x_{it})\,(1 + o_p(1))\big/\sum_{t=1}^{T} W_{it}^2}{\sum_{i=1}^{n}\beta_i^2}\right\} + \frac{m''(x)}{2}\,\mu_2\,h^{*2} + o\Big(h^2 + \frac{1}{Th} + h^{*2}\Big).

Then it is straightforward to derive the bias of \hat m(x) as follows:

E(\hat m(x) - m(x)) = -\frac{\sum_{i=2}^{n}\beta_i^2\big(h^2 R_i + (Th)^{-1}\bar m(x_i)\,Q_i\big)}{\sum_{i=1}^{n}\beta_i^2} + \left[\frac{\sum_{i=2}^{n}\beta_i^2\big(h^2 P_i + (Th)^{-1}Q_i\big)}{\sum_{i=1}^{n}\beta_i^2}\right] m(x) + \frac{m''(x)}{2}\,\mu_2\,h^{*2} + o\Big(h^2 + \frac{1}{Th} + h^{*2}\Big).

For the variance of \hat m(x), we notice that the error terms {ε_it, i = 1, …, n, t = 1, …, T} are independent, which implies the independence of {ε_it, i = 2, …, n, t = 1, …, T} and U_1(x_it). Therefore, we have the following asymptotic expansion for the variance:

\mathrm{Var}(\hat m(x) - m(x)) = \mathrm{Var}\left(\frac{\sum_{i=2}^{n}\beta_i^2\sum_{t=1}^{T} V_{it}\,U_1(x_{it})}{\sum_{i=1}^{n}\beta_i^2} + m(x)\,\frac{\sum_{i=2}^{n}\beta_i^2\sum_{t=1}^{T} W_{it}\,U_1(x_{it})\big/\sum_{t=1}^{T} W_{it}^2}{\sum_{i=1}^{n}\beta_i^2}\right) + \mathrm{Var}\left(\frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\beta_i\,K_{h^*}(x_{it} - x)\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{i=1}^{n}\beta_i^2\sum_{t=1}^{T} K_{h^*}(x_{it} - x)}\right) + o\Big(\frac{1}{T} + \frac{1}{nTh^*}\Big)
= \mathrm{Var}\left(\frac{\sum_{i=2}^{n}\beta_i^2\sum_{t=1}^{T}\big(V_{it} + m(x)\,W_{it}\big/\sum_{t=1}^{T} W_{it}^2\big)\,U_1(x_{it})}{\sum_{i=1}^{n}\beta_i^2}\right) + \frac{\nu_0\sum_{i=2}^{n}\beta_i^2\,f^{-1}(x)\,\sigma_i^2(x)}{Th^*\big(\sum_{i=1}^{n}\beta_i^2\big)^2} + o\Big(\frac{1}{T} + \frac{1}{nTh^*}\Big),

where the expansions follow from the same techniques as in (25) and (26). Now, by the definition of U_1, we have

\mathrm{Var}(\hat m(x) - m(x)) = \frac{1}{\big(\sum_{i=1}^{n}\beta_i^2\big)^2}\sum_{s=1}^{T}\left[\sum_{i=2}^{n}\beta_i^2\Big(\frac{1}{T} + Z_{is}\Big)\right]^2\sigma_1^2(x_{1s}) + \frac{\nu_0\sum_{i=2}^{n}\beta_i^2\,f^{-1}(x)\,\sigma_i^2(x)}{Th^*\big(\sum_{i=1}^{n}\beta_i^2\big)^2} + o\Big(\frac{1}{T} + \frac{1}{nTh^*}\Big). \qquad \square

A.3 Proof of Theorem 3

Proof. From the results of Theorem 2, it is straightforward to show that the mean squared error of \hat m(x) is of order h⁴ + (T²h²)^{-1} + h*⁴ + T^{-1} + (nTh*)^{-1}. To minimize the mean squared error, we can take h = O(T^{-1/3}) and h* = O((nT)^{-1/5}). Under such choices of h and h*, the mean squared error is of order (nT)^{-4/5} + T^{-1}.

Therefore, to match the optimal nonparametric convergence rate (nT)^{-4/5} for the mean squared error, the condition n = O(T^{1/4}) is required. □

A.4 Proof of Theorem 4

Proof. We start from the asymptotic expansion (22) in the proof of Theorem 2. First, we investigate the asymptotic behavior of the third term on the right-hand side of (22).

As a first step, we have

(27)   \mathrm{Var}\Big(\sum_{t=1}^{T}\big(U_1(x_{it}) - \bar U_1(x_i)\big)^2\Big) \le 8\,E\Big[\Big(\sum_{t=1}^{T} U_1(x_{it})^2\Big)^2\Big].

Now, following the definition of U_1(·) and applying the same expansion of ω_1s(x_it) as in the proof of Theorem 1,

E\Big[\Big(\sum_{t=1}^{T} U_1(x_{it})^2\Big)^2\Big] = E\left(\Big[\sum_{t=1}^{T}\Big(\sum_{s=1}^{T}\frac{K_h(x_{it} - x_{1s})\,\sigma_1(x_{1s})\,\varepsilon_{1s}}{Tf(x_{it})}\Big)^2\Big]^2\right)(1 + o(1))
\le \frac{1}{T^4}\,E\left(\sum_{s,u=1}^{T}\Big(\sum_{t=1}^{T}\frac{K_h^2(x_{it} - x_{1s})}{f^2(x_{it})}\,I\{|x_{1s} - x_{1u}| < 2h\}\Big)^2\,\sigma_1^2(x_{1s})\,\sigma_1^2(x_{1u})\right)(1 + o(1)),

where the inequality follows from exchanging the order of summation and the properties of the kernel function K(·). Observing that f(·) is bounded from below by Condition C2, the following sequence of inequalities is obtained:

E\Big[\Big(\sum_{t=1}^{T} U_1(x_{it})^2\Big)^2\Big] \le \frac{1}{T^4\delta^4}\,E\left(\sum_{s,u=1}^{T}\frac{T^2\nu_0^2\,f^2(x_{1s})}{h^2}\,I\{|x_{1s} - x_{1u}| < 2h\}\,\sigma_1^2(x_{1s})\,\sigma_1^2(x_{1u})\right)(1 + o(1))
\le \frac{O(1)}{T^2h^2}\sum_{s,u=1}^{T} I\{|x_{1s} - x_{1u}| < 2h\},

where the last term is of order O(h^{-1}), by noticing that

\sum_{s,u=1}^{T} I\{|x_{1s} - x_{1u}| < 2h\} = \sum_{s=1}^{T} 4Th\,f(x_{1s})\,\{1 + o_p(1)\} = O_p(T^2 h).

We can also derive the order of the variance for the other two terms,

\mathrm{Var}\left(-\beta_i\,\frac{\sum_{t=1}^{T} W_{it}\,U_1(x_{it})}{\sum_{t=1}^{T} W_{it}^2} + \frac{\sum_{t=1}^{T} W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} W_{it}^2}\right) = O(T^{-1}).

Due to the relationship between h and T, the third term is negligible when calculating the asymptotic variance. The expansion for \hat\beta_i can then be rewritten as

\hat\beta_i - \beta_i + \beta_i\Big(h^2 P_i + \frac{1}{Th}\,Q_i\Big) = -\beta_i\,\frac{\sum_{s=1}^{T} W_{1s}\,\sigma_1(x_{1s})\,\varepsilon_{1s}}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1)) + \frac{\sum_{t=1}^{T} W_{it}\,\sigma_i(x_{it})\,\varepsilon_{it}}{\sum_{t=1}^{T} W_{it}^2}\,(1 + o_p(1)) + o_p\Big(h^2 + \frac{1}{Th}\Big),

where the right-hand side is a sum of independent random variables whose variances are of the same order, O(T^{-1}). As a result, the central limit theorem can be applied directly to \hat\beta_i:

\sqrt{T}\left[\hat\beta_i - \beta_i + \beta_i\Big(h^2 P_i + \frac{1}{Th}\,Q_i\Big)\right] \;\stackrel{d}{\longrightarrow}\; N(0, \sigma_i^{*2}),

where the asymptotic variance \sigma_i^{*2} is finite, with the expression

\sigma_i^{*2} = \lim_{T\to\infty}\left\{\frac{T^{-1}\sum_{t=1}^{T} W_{it}^2\,\sigma_i^2(x_{it})}{\big(T^{-1}\sum_{t=1}^{T} W_{it}^2\big)^2} + \beta_i^2\,\frac{T^{-1}\sum_{t=1}^{T} W_{1t}^2\,\sigma_1^2(x_{1t})}{\big(T^{-1}\sum_{t=1}^{T} W_{it}^2\big)^2}\right\}.

Notice that if the order of h is between T^{-1/2} and T^{-1/4}, then \hat\beta_i is asymptotically unbiased, since \sqrt{T}\,\beta_i\big(h^2 P_i + \frac{1}{Th}\,Q_i\big) \to 0.

From Theorems 1 and 2, \hat\beta_i, \hat\alpha_i and \hat m(·) are consistent estimators of β_i, α_i and m(·), respectively. Thus \hat\sigma_i^2 = T^{-1}\sum_{t=1}^{T}\big(y_{it} - \hat\alpha_i - \hat\beta_i\,\hat m(x_{it})\big)^2 is also consistent for the variance under the assumption that σ_i(·) is a constant function for each subject i. □

References

1. Yasui Y, Pepe M, Thompson M, Adam B-L, Wright G, Qu Y, et al. A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics 2003;4(3):449–63. doi:10.1093/biostatistics/4.3.449.

2. Guilhaus M. Principles and instrumentation in time-of-flight mass spectrometry. J Mass Spectrom 1995;30:1519–32. doi:10.1002/jms.1190301102.

3. Banks D, Petricoin E. Finding cancer signals in mass spectrometry data. Chance 2003;16:8–57. doi:10.1080/09332480.2003.10554868.

4. Baggerly KA, Morris JS, Wang J, Gold D, Xiao L-C, Coombes KR. A comprehensive approach to the analysis of MALDI-TOF proteomics spectra from serum samples. Proteomics 2003;3:1667–72. doi:10.1002/pmic.200300522.

5. Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 2004;20:777–85. doi:10.1093/bioinformatics/btg484.

6. Diamandis EP. Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol Cell Proteomics 2004;3:367–78. doi:10.1074/mcp.R400005-MCP200.

7. Feng Y, Ma W, Wang Z, Ying Z, Yang Y. Alignment of protein mass spectrometry data by integrated Markov chain shifting method. Stat Interface 2009;2:329–40. doi:10.4310/SII.2009.v2.n3.a6.

8. Roy P, Truntzer C, Maucort-Boulch D, Jouve T, Molinari N. Protein mass spectra data analysis for clinical biomarker discovery: a global review. Briefings Bioinf 2011;12:176–86. doi:10.1093/bib/bbq019.

9. Begun J, Hall WJ, Huang WM, Wellner JA. Information and asymptotic efficiency in parametric-nonparametric models. Ann Statist 1983;11:432–52. doi:10.1214/aos/1176346151.

10. Robinson PM. Root-N consistent semiparametric regression. Econometrica 1988;55:931–54. doi:10.2307/1912705.

11. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Berlin: Springer, 1998.

12. Ruppert D, Wand M, Carroll R. Semiparametric regression. Cambridge: Cambridge University Press, 2003. doi:10.1017/CBO9780511755453.

13. Fan J. Local linear regression smoothers and their minimax efficiency. Ann Statist 1993;21:196–216. doi:10.1214/aos/1176349022.

14. Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann Math Statist 1956;27:832–7. doi:10.1214/aoms/1177728190.

15. Stone CJ. Consistent nonparametric regression, with discussion. Ann Statist 1977;5:549–645. doi:10.1214/aos/1176343886.

16. Stone CJ. Optimal global rates of convergence for nonparametric regression. Ann Statist 1982;10:1040–53. doi:10.1214/aos/1176345969.

17. Fan J, Gijbels I. Local polynomial modelling and its applications. London: Chapman and Hall, 1996.

18. Hardle W, Muller M, Sperlich S, Werwatz A. Nonparametric and semiparametric models. Springer Series in Statistics. Berlin: Springer, 2004. doi:10.1007/978-3-642-17146-8.

19. Carlin B, Louis T. Bayesian methods for data analysis, 3rd ed. London: Chapman & Hall, 2008. doi:10.1201/b14884.

20. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective, 2nd ed. London: Chapman and Hall/CRC, 2006. doi:10.1201/9781420010138.

Published Online: 2015-11-3
Published in Print: 2015-11-1

© 2015 Walter de Gruyter GmbH, Berlin/Boston
