Adjustment models for multivariate geodetic time series with vector-autoregressive errors

Boris Kargoll; Alexander Dorndorf; Mohammad Omidalizarandi; Jens-André Paffenholz; Hamza Alkhatib

doi:10.1515/jag-2021-0013

Article Open Access

Adjustment models for multivariate geodetic time series with vector-autoregressive errors

Boris Kargoll , Alexander Dorndorf , Mohammad Omidalizarandi , Jens-André Paffenholz and Hamza Alkhatib

Published/Copyright: May 11, 2021

Published by

Become an author with De Gruyter Brill

Submit Manuscript Author Information

From the journal Journal of Applied Geodesy Volume 15 Issue 3

Abstract

In this contribution, a vector-autoregressive (VAR) process with multivariate t-distributed random deviations is incorporated into the Gauss-Helmert model (GHM), resulting in an innovative adjustment model. This model is versatile since it allows for a wide range of functional models, unknown forms of auto- and cross-correlations, and outlier patterns. Subsequently, a computationally convenient iteratively reweighted least squares method based on an expectation maximization algorithm is derived in order to estimate the parameters of the functional model, the unknown coefficients of the VAR process, the cofactor matrix, and the degree of freedom of the t-distribution. The proposed method is validated in terms of its estimation bias and convergence behavior by means of a Monte Carlo simulation based on a GHM of a circle in two dimensions. The methodology is applied in two different fields of application within engineering geodesy: In the first scenario, the offset and linear drift of a noisy accelerometer are estimated based on a Gauss-Markov model with VAR and multivariate t-distributed errors, as a special case of the proposed GHM. In the second scenario real laser tracker measurements with outliers are adjusted to estimate the parameters of a sphere employing the proposed GHM with VAR and multivariate t-distributed errors. For both scenarios the estimated parameters of the fitted VAR model and multivariate t-distribution are analyzed for evidence of auto- or cross-correlations and deviation from a normal distribution regarding the measurement noise.

Keywords: Gauss-Helmert model; Gauss-Markov model; vector-autoregressive model; auto-correlations; cross-correlations; multivariate t-distribution; iteratively reweighted least squares; expectation maximization algorithm; accelerometer time series

1 Introduction

Geodetic observation models of surveyed phenomena often include unknown quantities in terms of parameters to be estimated by adjustment computations. The most general structure of observation model is described by non-linear condition equations in which multiple observations and unknown parameters may be linked to each other [1]. This general case of adjustment calculus was called Gauss-Helmert model (GHM) by Wolf [2]. There exist numerous special cases of this model that have been studied extensively, e. g., the errors-in-variables (EIV) model adjusted by the method of total least squares [3] or by its various generalizations [4], [5], [6], [7]. Another special case of the GHM is the Gauss-Markov model (GMM), in which the observations are still linked to functional model parameters but occur in separate equations [8], [9], [10]. Such models may also contain variance-covariance information about the measurements; this information defines the so-called stochastic model. Since adjustment calculus has been elaborated on the basis of matrix calculus, variance-covariance information regarding the observables is usually arranged as a matrix, called the “variance-covariance matrix” (VCM).

Since the dimension of the VCM equals the number of observations, the storage and processing of this matrix in the course of the adjustment computations can be challenging, especially because modern sensors such as accelerometers, laser scanners, laser tracker or satellite-based sensors tend to provide huge data sets consisting of millions or even billions of observations even within relatively short periods of time. It is usually not an option to omit the variance-covariance information within the adjustment computations since the standard parameter estimators lose desirable quality characteristics when this information is neglected. For instance, the least-squares-estimator in the GMM ceases to be the best linear unbiased estimator (BLUE) when the VCM is dropped in the estimation equation [11]. Furthermore, it is often desirable in geodetic data analysis to obtain a realistic description of the uncertainties or variance-covariance information of the estimated parameters, which information is unavailable when an adequate stochastic model of the observables is missing. The “curse of dimensionality” concerning the VCM is aggravated by the increasingly practiced combination of multiple observation series stemming, e. g., from 3D sensors or from sensor networks. Time series of such measurements, e. g., GPS and laser scanner measurements, frequently have been found to be correlated in such a way that the VCM is fully populated, e. g., [12], [13], [17], [16], [14], [15].

To alleviate the computational burden or even computational infeasibility in connection with the storage and processing of a huge VCM, an alternative approach to the stochastic modeling of variance-covariance information can be applied. The goal of that approach is to fully de-correlate the observables by transforming the observation model in such a way that the VCM of the transformed observables becomes a diagonal matrix, e. g., the identity matrix [18]. A diagonal VCM can easily be handled within the adjustment procedure [19], e. g., through vectorization of the occurring matrix products. In geodetic applications the correlations of the observables may frequently be described as colored noise [20], in which case the stochastic model can be expressed as an autoregressive (AR) or autoregressive moving average (ARMA) process and the transformation of the observation equations achieved by means of a computationally efficient digital de-correlation filter [21], [22]. Such processes have also been estimated by means of total least squares based on an EIV model [23]. To model auto- and cross-correlations of multivariate time series, the aforementioned AR and ARMA processes have been extended to vector-autoregressive (VAR) and vector-autoregressive moving average (VARMA) processes [24]. The model order of these recursive processes can be specified to define how far the correlations reach into the past. Thus, VAR(MA) processes can be used to model quite detailed patterns of auto- and cross-correlations. The innovation of the current contribution is to incorporate VAR processes into the GHM as colored noise models.

The white-noise components of (V)AR or (V)ARMA processes are usually assumed to be normally distributed. However, this assumption may be unrealistic or at least questionable in a practical situation, e. g., due to numerous outliers afflicting the measurements. It may then be safer to replace the normal distribution by a larger family of distributions defined by probability density functions having thicker tails than the “ideal” normal distributions. For instance, the usage of generalized t-distributions and of scaled t-distributions has been proposed in connection with ARMA processes [25]. The scaled, multivariate t-distributions have also been applied in connection with VAR processes within rather simple GMM [26], [27]. A key feature of using multivariate t-distributions is that the associated degree of freedom (df) can be estimated alongside the functional parameters, the VAR coefficients and the scale or cofactor matrix; such a method may be called a self-tuning estimator [28] since it does not involve a tuning constant for classifying the in- and outliers, in contrast to for instance Huber’s M-estimator [29]. The df provides evidence of deviations of the measurements’ actual probability distribution from a normal distribution: For large df the multivariate t-distribution is similar to a multivariate normal distribution, whereas the t-distribution has much thicker tails than the latter for a small df, which may indicate the presence of numerous outliers.

The estimation procedure is implemented as an iteratively reweighted least-squares (IRLS) method. The weights are treated as random variables, which are numerically determined by the expectation step within an expectation maximization (EM) algorithm [30]. The various types of model parameters are estimated group-by-group within the maximization step. To deal with the condition equations within the GHM, constrained EM in the spirit of [31] is applied. Due to the down-weighting of observations with extreme errors,^[1] this method can be expected to yield a robust estimator for the GHM. Koch introduced the GHM with t-distributed errors and subsequently extended the model to include variance components [32], [33]. This type of GHM is now extended to include a VAR model with multivariate-t-distributed errors. This yields an approach complementary to that of modeling a small number of outliers as additive, unknown offsets within the afflicted observations, as done in [34], [35]. Therefore, the proposed adjustment model is directed at applications in which rather large numbers of outliers can be expected. To model the auto- and cross-correlations, we do not employ the more general VARMA processes since we found the estimation of the moving average part by means of an EM algorithm to diverge.

The paper is structured as follows. In Section 2, we specify the multivariate observation model, the correlation model, and the stochastic model, which jointly define the GHM with VAR and multivariate t-distributed errors. In Section 3, we formulate the optimization principle employed to adjust this model, and we derive the normal equations to be solved within the EM algorithm. We show that this algorithm estimates the functional parameters, the VAR coefficients and the parameters defining the multivariate t-distribution essentially via IRLS. The Monte Carlo simulation described in Section 4 serves the purpose of validating the algorithm, by showing the biases to be expected in the practical situation of data approximation by a circle based on a GHM with VAR and multivariate t-distributed errors. In Section 5 we demonstrate the application of the algorithm for the adjustment of a real, multivariate accelerometer data set, whose offset and drift are modeled as part of a GMM (viewed as a special case of the GHM) with VAR and multivariate t-distributed errors. In a second applied scenario real laser tracker measurements with outliers are adjusted to estimate the parameters of a sphere employing the proposed GHM with VAR and multivariate t-distributed errors. Finally, some limitations and consequently potential extensions of the presented methodology to be explored in the future are outlined in the conclusions and outlook.

2 Adjustment models for multivariate time series

The purpose of this section is to develop adjustment models suitable for the fitting of multivariate geodetic time series. According to the standard categorization in geodetic adjustment theory these models have two main components: the deterministic model and the stochastic model. The following sub-sections demonstrate some possible structures of these models that have been found useful in geodetic time series analysis. These structures allow for certain generalizations of the standard GHM and GHM by allowing their random deviations to be auto- and cross-correlated through (V)AR processes while potential deviations from a normal distribution are modeled stochastically by a multivariate t-distribution.

2.1 Structures of the deterministic observation model

2.1.1 Deterministic part of the Gauss-Helmert model

We suppose that N time series are observed simultaneously without gaps, so that each series consists of n measurements. The measurements can then be doubly indexed in the form ℓ k , t with k = 1 , … , N and t = 1 , … , n, where k represents the group index and t a time-related index. We further suppose that the measurements are used to estimate a specified parameter-dependent functional model, which describes spatial objects possibly varying throughout time. In the most general case, each observation is modeled as the sum of an observation-specific location parameter μ k , t and a corresponding random deviation or “error” e k , t , that is,

(1) ℓ k , t = μ k , t + e k , t ( k = 1 , … , N ; t = 1 , … , n ) .

Each type of quantity in these equations can be vectorized in different ways to obtain more compact notations. The rule will be that all column vectors are symbolized by bold-faced, lower-case Greek letters (in case of unknown parameters) or Roman letters. Firstly, the quantities of one type occurring throughout the time series k = 1 , … , N may be collected for every time instance t = 1 , … , n via

(2) ℓ t = ℓ 1 , t ⋮ ℓ N , t , μ t = μ 1 , t ⋮ μ N , t , e t = e 1 , t ⋮ e N , t ,

to define the multivariate observation time series

(3) ℓ t = μ t + e t ( t = 1 , … , n ) .

Secondly, the observations, location parameters and random deviations can be grouped series-by-series for k = 1 , … , N according to

(4) ℓ k : = ℓ k , 1 ⋮ ℓ k , n , μ k : = μ k , 1 ⋮ μ k , n , e k : = e k , 1 ⋮ e k , n ,

to define the observation series

(5) ℓ k : = μ k : + e k : ( k = 1 , … , N ) .

Note that the additional colons are used to distinguish the two types of vectors, e. g., the observations ℓ 1 at time instance t = 1 and the observations ℓ 1 : within time series k = 1. Finally, it will be convenient to combine the vectors indexed by time to the “complete” vectors

(6) ℓ = ℓ 1 ⋮ ℓ n , μ = μ 1 ⋮ μ n , e = e 1 ⋮ e n ,

so that (3) turns into the single vector equation

(7) ℓ = μ + e .

Note that the relationship μ = ℓ − e obtained from (7) would be written as ℓ ˆ = ℓ + v in the usual notation of the GHM, where ℓ ˆ are the (unknown) “adjusted observations”. The choice of terminology (“location parameter”) and notation ( μ) is motivated by our intention to describe the observations in such a way that maximum likelihood (ML) estimation via an EM algorithm can easily be established.

Note also that the vectors ℓ t , μ t and e t can be retrieved from the corresponding complete vectors ℓ, μ and e via

(8) ℓ t = J t ℓ , μ t = J t μ , e t = J t e ,

where the ( N × N t )-matrix J t consists of rows

N · ( t − 1 ) + 1 , … , N · t

of the ( N n )-dimensional identity matrix I N n . This matrix can be written as the block matrix

(9) J t = 0 N ⋯ 0 N I N 0 N ⋯ 0 N ( t = 1 , … , n )

beginning with t − 1 square zero matrices 0 N of dimension N, followed by the N-dimensional identity matrix I N , and ending with n − t further zero blocks 0 N .

In practice the goal usually is to fit a specific functional model to the given measurements, e. g., a circle. Such a model generally depends on unknown parameters β = [ β 1 ⋯ β m ] T , e. g., the center coordinates and the radius of a circle. To incorporate this additional model structure it is assumed that the functional parameters β and the location parameters μ fulfill condition equations

(10) h 1 ( β , μ ) = 0

(11) ⋮

(12) h r ( β , μ ) = 0 ,

which can be written in vectorized form as

(13) h ( β , μ ) = 0 .

This equation and (7) form the deterministic part of the GHM, which was defined similarly in [32]. The given somewhat unusual notation, especially the introduction of location parameters, in the setup of the GHM is more aligned with the method of constrained ML estimation than with the usual method of LS. The former estimation method will be more suitable than the latter in connection with the estimation of VAR models and usage of a non-Gaussian pdf.

2.1.2 Deterministic part of the Gauss-Markov model

With certain kinds of adjustment problems each location parameter μ k , t can be expressed directly as the value of a function f k , t depending on the functional parameters β, through the relationship

(14) μ k , t = f k , t ( β ) .

Then, the observation equations (1) take the form

(15) ℓ k , t = f k , t ( β ) + e k , t ( k = 1 , … , N ; t = 1 , … , n ) .

Vectorizing the (unknown) function values f k , t ( β ) similarly as in (2) and (4), these observation equations can be arranged by time as

(16) ℓ t = f t ( β ) + e t ( t = 1 , … , n ) ,

or by time series as

(17) ℓ k : = f k : ( β ) + e k : ( k = 1 , … , N ) ,

or collectively, by vectorizing f t ( β ) for t = 1 , … , n as in (6), as

(18) ℓ = f ( β ) + e .

This structure of observation equation defines the GMM. Using (14), the condition equations (13) are then simply taken as h ( β , μ ) = μ − f ( β ) = 0. In adjustment computations it is more common to state the observation equations in the equivalent form

(19) ℓ + v = f ( β ) ,

where the “corrections” v are related to the random deviations e via sign reversal. In view of the additional stochastic model imposed on the observation model later on, the representation (18) is preferred in this paper. In case the function f can be written as a matrix-vector product X β, we speak of a “linear model”, where X is called the “design matrix”; we then write

(20) ℓ = X β + e

instead of (18). Linear models have been studied thoroughly in mathematical statistics and geodesy [36], [18]. The parameters β can be estimated without any additional assumptions, e. g., by applying “ordinary LS”. However, if the observables are correlated or heteroscedastic, the LS estimator will not be BLUE if the correlations and unequal variances present are not taken into account by incorporating a corresponding stochastic model into the observation model and the estimator.

2.2 Stochastic modeling

2.2.1 Modeling of correlations using vector-autoregressive processes

The simplest case of a stochastic model occurs for uncorrelated observables with identical variances, so that the VCM Σ is the rescaled identity matrix σ 2 I, where the variance factor σ 2 may be a known number or an unknown parameter. Geodesists have frequently pointed out that observables are usually correlated (e. g., [37]) and that it is of great practical importance to take correlations into account in geodetic data analysis (e. g., [38]). In case of correlated measurements the VCM Σ typically is either fully populated or has a band-limited structure (see [21]). Its decomposition therefore yields Σ = σ 2 Q, where the so-called “cofactor matrix” Q has the same population characteristics as Σ. When a univariate time series is surveyed the correlations between the various random variables are called “auto-correlations”. Oftentimes these can be modeled by means of an AR process

(21) E t = α 1 E t − 1 + ⋯ + α p E t − p + U t

for some p ∈ N (the “model order” of the AR process) and some α 1 , … , α p (the “coefficients” of the AR process). ( U t ) t ∈ Z is a white noise series that recursively generates an auto-correlated time series ( E t ) t ∈ Z (see, e. g., [42] for details). In case of multivariate observables, additional correlations between the random variables of different time series, so-called “cross-correlations”, may occur. To take both auto- and cross-correlations into account, an error process ( E t ) t ∈ Z can be defined to be a VAR(p) process by the equation (cf. Sect. 8.4 in [24])

(22) E t = A 1 E t − 1 + ⋯ + A p E t − p + U t ,

where ( U t ) t ∈ Z is a multivariate form of white noise. The coefficients of this VAR process are now arranged within the ( N × N )-matrices

(23) A j = α j , 1 , 1 ⋯ α j , 1 , N ⋮ ⋱ ⋮ α j , N , 1 ⋯ α j , N , N ( j = 1 , … , p ) .

2.2.2 Modeling of observational weights using multivariate t-distributions

Another standard decomposition of a the VCM Σ is given in terms of the “weight matrix” P by

(24) Σ = σ 2 P − 1 .

The main diagonal of the VCM Σ consists of the variances of the individual observables, whereas the main diagonal elements of P are commonly referred to as “weights”. The meaning of the weights is twofold in practice: On the one hand, weights may be interpreted as “repetition numbers”, i. e., as the number of times that the corresponding observations have occurred. On the other hand, weights can be assigned to observations in order to diminish or increase their relative importance and influence in statistical inference. Abnormal or extreme observations (“outliers”), located in the tails of the probability density function (pdf) defining their probability distribution, are frequently encountered in geodetic data analysis. One standard approach to handling such observations consists in their “down-weighting” by giving them lesser weights than non-outliers (“inliers”). Alternatively, it is possible to carry out statistical tests in order to identify and delete outlying observations (e. g., [39], [40], [41]). However, the deletion of observations creates data gaps, which prohibit the usage of recursive time series models, such as the VAR processes considered in Sect. 2.2.1. Therefore, we restrict ourselves to the approach based on the down-weighting of outlying observations, which avoids their elimination and consequential data gaps. Since the data sets analyzed nowadays tend to be huge and oftentimes have stochastic properties unknown to the practitioner, it is necessary to have a data-adaptive and computationally efficient procedure for computing weights at hand. Mathematically, such a procedure is based on a function (the “weight function”), which corresponds to a certain pdf of the random deviations.

For the purpose of multivariate modeling including potential cross-correlations, the well-known Student’s t-distribution can be generalized to an N-variate t-distribution t ( μ , Ψ , ν ) with location parameter vector μ, scale matrix Ψ and df ν, defined by the pdf (cf. [43])

(25) p ( x ) = Γ ν + N 2 ( ν π ) N det Ψ Γ ν 2 × 1 + ( x − μ ) T Ψ − 1 ( x − μ ) ν − ν + N 2 .

Since the scale matrix Ψ is related to the VCM via Σ = ν ν − 2 · Ψ for df ν > 2, it may be viewed as a “cofactor matrix”. Then, a multivariate t-distributed form of white noise of a VAR process can be defined by

(26) U t ∼ ind . t ( 0 , Ψ , ν )

(see also [44]). The joint pdf of n such white noise vectors U 1 , … , U n is therefore given by

(27) p ( u 1 , … , u n ) = ∏ t = 1 n p ( u t ) = Γ ν + N 2 ( ν π ) N det Ψ Γ ν 2 n × ∏ t = 1 n 1 + u t T Ψ − 1 u t ν − ν + N 2 .

This pdf is more intricate and therefore more difficult to handle in maximum likelihood (ML) estimation than the pdf of a multivariate normal distribution. However, it is well known that a multivariate t-distributed random vector X ∼ t ( μ , Ψ , ν ) has the property of being a conditionally normally distributed random vector X ∼ N ( μ , Ψ / w ) where w is a given realization of a gamma-distributed random variable W ∼ G ν / 2 , ν / 2 W (cf. [43]). Due to this property, the distributional model (26) can be replaced by the assumptions

(28) U t ∼ ind . N ( 0 , Ψ / w t )

and

(29) W t ∼ ind . G ν 2 , ν 2 .

It can be seen in (28) that the white noise vector U t has the VCM Σ t = Ψ / w t , which is a rescaled version of the cofactor matrix Ψ. For a small value w t , the cofactor matrix Ψ may be viewed as being “inflated”; in the univariate case N = 1, this rescaling approach therefore has been called “variance inflation model” (e. g., [41]). The idea of this type of outlier model is to assign a relatively large variance to an error which is located in the tails of the pdf, in comparison to an error near the center of the pdf.

Under the distributional assumptions (28)–(29) the joint pdf of white noise vectors U 1 , … , U n and the random variables W 1 , … , W n assigned to n measurements can be shown to result in

(30) p ( u 1 , … , u n , w ) = ν 2 ν 2 Γ ν 2 n ∏ t = 1 n w t ν 2 − 1 exp − ν 2 w t × 1 ( 2 π ) N det ( Ψ / w t ) × exp − w t 2 u t T Ψ − 1 u t ,

and (27) defines the corresponding marginal distribution of U 1 , … , U n . Although the joint pdf (30) appears to be even more complicated than the marginal pdf defining the multivariate t-distribution, considerable simplifications can be achieved by taking a conditional expectation of the log-likelihood function defined by this pdf. This step, which exploits the fact that the pdf (30) now involves natural exponential functions exp(.) in contrast to (27), corresponds to the “E-step” of an EM algorithm shown in Sect. 3. We first proceed by defining particular instances of the aforementioned log-likelihood functions which include the deterministic model of the GHM or GMM, as well as the VAR process.

2.3 Representations of the adjustment models as likelihood functions

A likelihood function L can be defined by a pdf by fixing an element of its domain (the observation space) and by letting its parameters be “variables”. The pdf (30) directly involves as parameters the entries of the cofactor matrix Ψ and the df ν. When certain realizations u 1 , … , u n of the white-noise vector are given, they lead to realizations e 1 , … , e n of the VAR process (22). These computational relationships can inversely be written as

(31) u t = e t − A 1 e t − 1 − … − A p e t − p ,

where e 0 , … , e 1 − p are conventionally taken to be zero vectors. Moreover, the realizations e 1 , … , e n of a VAR process occur within the condition equations (3) of the GHM, which can inversely be written as

(32) e t = ℓ t − μ t .

Consequently, (31) can be recast as

(33) u t = I ( ℓ t − μ t ) − A 1 ( ℓ t − 1 − μ t − 1 ) − … − A p ( ℓ t − p − μ t − p ) = I ( ℓ t − μ t ) − A 1 L 1 ( ℓ t − μ t ) − … − A p L p ( ℓ t − μ t ) = ( I − A 1 L 1 − … − A p L p ) ( ℓ t − μ t ) = A ( L ) ( ℓ t − μ t ) .

Here, for brevity of expressions, we defined the so-called lag polynomial A ( L ) (see [24], p. 243) based on the lag operator L, whose powers are used to shift the time index t according to the rule L j Z t : = Z t − j with j ∈ { 1 , 2 , … } (see Chap. 2 in [45]).

Thus, the pdf (30) indirectly depends also on the VAR coefficients in the matrices A 1 , … , A p and on the location parameters μ 1 , … , μ n of the deterministic model in case of a GHM. In the case of the more structured GMM, f t ( β ) is substituted for μ t in (32) and (33) in view of (16), so that the pdf (30) then depends on functional model parameters β rather than on μ 1 , … , μ n .

To fix the dimensions of the parameter space, the kth rows of the VAR coefficient matrices are combined to column vectors of length p · N

(34) α k = [ ( A 1 ) k ⋯ ( A p ) k ] T ( k = 1 , … , N ) ,

which in turn are combined to the single column vector

(35) α = [ α 1 T ⋯ α N T ] T

of length p · N 2 . Similarly, the row-wise vectorization of the inverted cofactor matrix Ψ − 1 yields a column vector of length N 2 , which is denoted by ψ ¯. Then, the parameters that the pdf for the GHM and the GMM with VAR and multivariate t-distributed errors directly and indirectly depend upon can be grouped according to

(36) θ GHM = [ μ T α T ψ ¯ T ν ] T

and

(37) θ GMM = [ β T α T ψ ¯ T ν ] T .

Taking the natural logarithm of the pdf (30) with u t given by (33), the log-likelihood functions of this GHM can be obtained, see (81) in the Appendix. The log-likelihood function for the GHM describes only the observation model, so that the constraints h ( β , μ ) = 0 in (13) need to be provided for in addition when the model parameters θ GHM are estimated. Thus, the complete parameter vector consists of the groups

(38) Φ = [ β T θ GHM T ] = [ β T μ T α T ψ ¯ T ν ] T .

A similar log-likelihood involving multiple (univariate) AR processes instead of a single (multivariate) VAR process was established in [46]. To obtain the log-likelihood function log L ( θ GMM ; ℓ ) for the GMM, which was employed in [26], the location parameters μ t are simply replaced by f t ( β ), in which the aforementioned constraints are omitted since the location parameters μ are not modeled explicitly.

3 Estimation

Maximum likelihood estimation of the model parameters θ GHM based on the log-likelihood function (81) requires values for the unobservable data w. Since a stochastic model has been specified for these quantities, such values can be determined as part of an EM algorithm [30]. A similar approach has previously been carried out for numerous models that can be identified as special cases of the current GHM. In all previous instances the EM algorithm was found to be suitable in view of its stable convergence and its computationally simple form as IRLS (see, e. g., [47], [48]), where the entries of w play the role of the weights. The iterations are indexed by s in the following. In the expectation step (“E-step”) of the EM or IRLS algorithm the weights are computed as conditional expectations of the random variables W based on their stochastic model, using the current solution θ ˆ GHM ( s ) . In the subsequent maximization step (“M-step”) a new solution θ ˆ GHM ( s + 1 ) is computed by maximizing the conditional expectation of the log-likelihood function (“Q-function”) using the current weights.

The Q-function for the GHM with VAR and multivariate t-distributed errors is given by (82) in the Appendix. Evaluation of the conditional expectation of the random weight variables W yields the computational formula

(39) w ˜ t ( s ) = ν ˆ ( s ) + N ν ˆ ( s ) + u t T ( Ψ ˆ ( s ) ) − 1 u t

The tilde on the conditional expectation w ˜ t ( s ) is used to distinguish this “theoretical” type of quantity from estimates. To maximize the resulting Q-function (85) under the constraints the Lagrangian is defined as

(40) F ( ϕ , λ | θ ˆ GHM ( s ) ) = Q ( θ GHM | θ ˆ GHM ( s ) ) + λ T h ( β , μ ) ,

which involves as additional parameters an ( r × 1 )-vector λ of unknown Lagrange multipliers. This approach leads to a “constrained EM algorithm” [31]. In the frequently encountered case that h ( β , μ ) cannot be expressed “linearly” in the form of

(41) X β + B ( μ − ℓ ) + m = 0

for some numerically fixed matrix X, matrix B and vector m, linearization

(42) h ( β , μ ) ≈ X Δ β + B ( μ − ℓ ) + m ‾ .

is applied as shown for instance in [32]. Here, we define the ( m × 1 )-vector of “incremental parameters”

(43) Δ β = β − β ( s ) ,

the ( r × 1 )-vector of “misclosures”

(44) m ( s ) = h ( β ( s ) , μ ( s ) ) ,

the ( r × 1 )-vector of “pseudo-misclosures”

(45) m ‾ ( s ) = m + B ( ℓ − μ ( s ) ) ,

the ( r × m )-Jacobi matrix

(46) X ( s ) = ∂ h ( β ( s ) , μ ( s ) ) ∂ β ,

and the ( r × N n )-Jacobi matrix

(47) B ( s ) = ∂ h ( β ( s ) , μ ( s ) ) ∂ μ .

Whereas Δ β is unknown, the quantities (44)–(47) are computable with the given parameter estimates of iteration step s. For notational brevity, the dependency of the latter quantities on the iteration step will not be indicated in the following; thus, we write m, m ‾, X and B instead of m ( s ) , m ‾ ( s ) , X ( s ) and B ( s ) . Forming now the partial derivatives of the linearized Lagrangian (86) given in the Appendix with respect to the different groups of parameters and setting these equal to zero, one obtains

the normal equations for the incremental functional parameters Δ β:
(48) X T λ = 0 [ m × 1 ] ;
the normal equations for the location parameters μ:
(49) ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 [ ℓ ¯ t − J ¯ t μ ) ] + B T λ = 0 [ N n × 1 ]
denoting
(50) J ¯ t = A ( L ) J t ,

(51) ℓ ¯ t = A ( L ) ℓ t ;
the normal equations for the VAR coefficient matrices A 1 , … , A p :
(52) E W ˜ ( s ) e 1 T ⋮ e n T − E W ˜ ( s ) E T A 1 T ⋮ A p T = 0 N ⋮ 0 N ,
denoting
(53) E = e 0 ⋯ e n − 1 ⋮ ⋮ e 1 − p ⋯ e n − p
with e i = 0 3 × 1 for all i ≤ 0 and letting W ˜ ( s ) be the n-dimensional diagonal matrix with diagonal entries w ˜ 1 ( s ) , … , w ˜ n ( s ) ;
the normal equations for the inverse cofactor matrix Ψ − 1 :
(54) Ψ − 1 n ∑ t = 1 n w ˜ t ( s ) [ A ( L ) ( ℓ t − μ t ) ] [ A ( L ) ( ℓ t − μ t ) ] T = 0 N ;
the normal equation for the df ν (cf. Sect. 5.8.2 [47]):
(55) − dg ν 2 + log ν 2 + 1 + 1 n ∑ t = 1 n log w ˜ t ( s ) − w ˜ t ( s ) + dg ν ( s ) + N 2 − log ν ( k ) + N 2 = 0 ,
involving the digamma function dg;
the normal equations for the Lagrange multipliers λ:
(56) X Δ β + B ( μ − ℓ ) + m ‾ = 0 [ r × 1 ] .

Since the location parameters are obtained from (49) as

(57) μ = ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 × ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 ℓ ¯ t + B T λ ,

(supposing that the inverse exists), these parameters can be eliminated from (56) via substitution, which results in

(58) B ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 B T λ + X Δ β = B ℓ − m ‾ − B ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 × ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 ℓ ¯ t .

Together with (48), this gives the normal equation system

(59) B ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 B T X X T 0 m λ Δ β = B ℓ − m ‾ − B ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 ℓ ¯ t 0 [ m × 1 ] .

The parameter groups ( λ , Δ β ), ( A 1 , … , A p ), Ψ − 1 and ν are estimated in that order by successively solving (59), (52), (54) and (55). The parameter group μ can be determined through (57) after solving for ( λ , Δ β ). While equation systems (59), (52) and (54) are linear, the solution of (55) requires a zero search. This step-wise solution approach defines an ECM algorithm, which extends previously established ECM algorithms in simpler models.

Having completed s iterations, the first ECM-step of the new iteration s + 1 yields a new solution

(60) λ ( s + 1 ) Δ β ( s + 1 ) = B ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 B T X X T 0 m − 1 × B ℓ − m ‾ − B ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 J ¯ t − 1 ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 ℓ ¯ t 0 [ m × 1 ]

from (59), where the unknown matrices Ψ and

J ¯ t = A ( L ) J t = J t − A 1 J t − 1 − ⋯ − A p J t − p

are fixed by substituting (i. e., “conditioning on”) the previous solutions Ψ ( s ) and A 1 ( s ) , … , A p ( s ) . Substituting then in addition λ ( s + 1 ) for the variables λ in (57) gives the new solution

(61) μ ( s + 1 ) = ∑ t = 1 n ( J ¯ t ( s ) ) T w ˜ t ( s ) ( Ψ ( s ) ) − 1 J ¯ t ( s ) − 1 × ∑ t = 1 n ( J ¯ t ( s ) ) T w ˜ t ( s ) ( Ψ ( s ) ) − 1 ℓ ¯ t + B T λ ( s + 1 ) .

Moreover, the new solution

(62) β ( s + 1 ) = β ( s ) + Δ β ( s + 1 )

via substitution into the rearranged equation (43).

The solution ( β ( s + 1 ) , μ ( s + 1 ) ) might not fulfill the constraints h ( β , μ ) = 0 exactly due to linearization errors. Therefore, we apply the principle of constrained EM in the spirit of [31] and iterate the estimation of these parameters until either a maximum number of iteration steps is reached or the maximum absolute misclosure falls below a specified threshold (say, 10 − 14 ). To do this, the quantities m, m ‾, X and B in (44)–(47) are recomputed using β [ 0 ] : = β ( s + 1 ) and μ [ 0 ] : = μ ( s + 1 ) , and new solutions λ [ 1 ] , β [ 1 ] , μ [ 1 ] are computed via evaluation of (60)–(62) by substituting again the currently available estimates Ψ ( s ) and A 1 ( s ) , … , A p ( s ) . When one of the stop criteria is fulfilled, say after t additional iterations, one redefines λ ( s + 1 ) : = λ [ t ] , β ( s + 1 ) : = β [ t ] and μ ( s + 1 ) : = μ [ t ] .

Next, the new solution

(63) A 1 ( s + 1 ) ⋯ A p ( s + 1 ) = e 1 ( s + 1 ) ⋯ e n ( s + 1 ) W ˜ ( s ) × ( E ( s + 1 ) ) T E ( s + 1 ) W ˜ ( s ) ( E ( s + 1 ) ) T − 1

of the VAR coefficient matrices is computed from (52), based on the new correlated residuals

(64) e ( s + 1 ) = ℓ − μ ( s + 1 )

computed from the model (7) and the corresponding residual matrix

(65) E ( s + 1 ) = e 0 ( s + 1 ) ⋯ e n − 1 ( s + 1 ) ⋮ ⋮ e 1 − p ( s + 1 ) ⋯ e n − p ( s + 1 )

with the usual boundary values e i ( s + 1 ) = 0 3 × 1 for all i ≤ 0. The current VAR model defines a new de-correlation filter A ( s + 1 ) ( L ), by which the correlated residuals e ( s + 1 ) are transformed into the white noise residuals

(66) u ( s + 1 ) = A ( s + 1 ) ( L ) e t ( s + 1 )

in view of (33) and (64). Conditional on these values, equation system (54) gives rise to the solution for the cofactor matrix

(67) Ψ ( s + 1 ) = 1 n ∑ t = 1 n w ˜ t ( s ) u ( s + 1 ) ( u ( s + 1 ) ) T .

Finally, ν ( s + 1 ) may be found by a zero search to satisfy (55). It is generally known that the ECM algorithm can be sped up by solving instead the equation

(68) 0 = log ν ( s + 1 ) + 1 − ψ ν ( s + 1 ) 2 + ψ ν ( s + 1 ) + N 2 − log ν ( s + 1 ) + N + 1 n ∑ t = 1 n log w ˜ t ( s + 1 ) − w ˜ t ( s + 1 )

where

(69) w ˜ t ( s + 1 ) = ν ( s + 1 ) + N ν ( s + 1 ) + ( u t ( s + 1 ) ) T ( Ψ ( s + 1 ) ) − 1 u t ( s + 1 ) .

Equation (68) is obtained by defining the log-likelihood function directly on the joint pdf (30) of the white-noise series based on the multivariate t-distribution, and setting its partial derivative with respect to ν equal to zero. This modification of the ECM algorithm has been called ECM either (ECME) [49], [50]. The zero search can be performed with mathematical standard software such as the INTLAB library ([51]) or using a grid search. The estimation of the df concludes the current M-step. The E- and M-step (in the form of an ECME algorithm) are repeated until certain stop criteria are fulfilled, e. g., a combination of a maximum number of iterations and thresholds

for the minimum absolute difference between the current and the previous solution. We defined the threshold d ν by a greater value than the other threshold since the df ν fluctuates quite strongly and is the parameter most challenging to estimate. In the following, the solution resulting from the final iteration step is indicated by using “hats”, that is, β ˆ, μ ˆ, etc. This EM algorithm was tested in the Monte Carlo simulation of Sect. 4.1. In the special case of a GMM with VAR and multivariate t-distributed errors, a similar unconstrained algorithm can be derived [26]. The methodology is applied in the real data study of Sect. 4.2 and Sect. 4.3.

4 Results

4.1 Gauss-Helmert model: Adjustment of simulated data of a 2D circle

In the first part of this section, the performance of the EM algorithm derived in previous Sect. 3 in producing correct estimates within a Monte Carlo (MC) simulation is studied. For this purpose, generated measurements of N = 2 time series are approximated by a 2D circle defined by the nonlinear equation

(70) h t ( β , μ ) = x t − c x 2 + y t − c y 2 − r 2 = 0 .

On the one hand, the vector β of functional parameters consists of the circle center coordinates c x , c y and radius r, whose simulated true values are specified by

(71) β = c x c y r = 1 1 2 .

On the other hand, the vector μ = [ μ 1 T , … , μ n T ] T of location parameters consists of n measured points

μ t = μ x , t μ y , t = x t y t ,

where n was set to 250, 500, 2500 and 5000 in different simulation runs. For greater clarity of notation, we write μ x , t , μ y , t instead of μ 1 , t , μ 2 , t in the sequel. The true point coordinates were defined by

x t = c x + r sin Φ t y t = c y + r cos Φ t

with

Φ t = t · 10 n ( t ∈ { 1 , … , n } ) .

Thus, the measurements are equidistantly distributed along the circle, and one half of the circle was measured twice. The numerical realizations e t = [ e x , t , e y , t ] T of the corresponding random errors are assumed to be determined by a VAR(1) process

(72) e x , t e y , t = 0.5653 − 0.0197 − 0.0431 0.7577 e x , t − 1 e y , t − 1 + u x , t u y , t ,

where random numbers u t = [ u x , t , u y , t ] T for the white-noise components were generated from the multivariate t-distribution

U x , t U y , t ∼ ind . t ( 0 , Ψ , ν ) = t 0 0 , σ 0 2 · Ψ x Ψ x , y Ψ x , y Ψ y , ν

with σ 0 2 = 0.001 2 , Ψ x = 1.0, Ψ y = 4.0 and Ψ x , y = 0.5. The given cofactor matrix reflects the assumption that the entries of U t are pairwise correlated by correlation coefficient ρ = 0.25. Simulations were carried out for the df ν = 4 (referred to as “stochastic model A”) and ν = 7 (“stochastic model B”).

In total 500 MC samples of the white-noise components for stochastic model A and B as well as for the different sample sizes n were randomly generated using MATLAB’s mvtrnd routine. The resulting simulated observations in models A and B were then adjusted by means of the EM algorithm. The maximum number of iterations was set to itermax = 50, and the convergence thresholds to ε = 10 − 8 and ε ν = 0.001. The unknown parameters β and Ψ of the simulated observations of Model A were additionally estimated using the classical least-squares approach to the GHM (“LS”), without iterative re-weighting and without imposing a VAR model (cf. [18]). The comparison between the results of LS and EM algorithm demonstrates the effect of assuming a simplified stochastic model.

4.1.1 Analysis of the simplified stochastic model on the estimation results

The results of this comparison are visualized in Fig. 1 and Fig. 2 by means of box plots. In each box plot, the middle line shows the median. The bottom and the top border of the boxes are the 25^th percentile (Q1) and 75^th percentile (Q3), respectively. The distance Q3-Q1 is the inter-quartile range (IQR). The additional lines above and below the box are the whiskers defined by Q 1 − 1.5 · IQR and Q 3 + 1.5 · IQR, respectively. The red crosses indicate the outliers in the sense that their value is more than 1.5 times the IQR away from Q1 and Q3. The results of Fig. 1 show that the precision of the estimated parameters β ˆ decreases with increasing number of observations. Overall, the mean values of the estimated parameters are similar for LS and EM. However, Fig. 1 shows for numbers of observations less than 5000 that LS produces numerous extreme MC radius estimates outside of the inter-quartile range far away from the true radius, whereas all radius values estimated by EM lie within that range. Thus, EM is significantly more accurate than LS as far as the estimation of the radius is concerned. Figure 2 shows the estimated scale parameters Ψ ˆ for 500 MC runs by means of EM and LS estimation. The number of outliers (red crosses) resulting from the LS estimate is significantly larger than the number of outliers from the EM algorithm. It can be clearly seen that the bias in the EM-estimated scale parameters is much smaller compared to the LS-estimated scale parameters. As the number of observations increases, the bias of the estimated scale parameters decreases and the dispersion becomes smaller. The bias of all estimated parameter can be calculated for each MC run by

(73) Δ Φ ˆ ( i ) = Φ ˆ ( i ) − Φ ( i = 1 , … , 500 ) .

The statistical moments of the calculated bias Δ β and Δ Ψ of the simulations with 5000 observations are presented in Table 1. For the functional parameters β the true value lies in the 95 % confidence interval for all estimates (LS, EM model A and EM model B). In contrast, the true values of Ψ lie outside the estimated 95 %-confidence interval of the estimated cofactor matrix Ψ ˆ when LS is employed. Applying EM, this statement is valid only for the 95 %-confidence interval for Ψ ˆ x under stochastic model A and for Ψ ˆ y under stochastic model B. Here, the biases of the estimator Ψ ˆ with respect to the true simulation parameter values applying EM is less than the corresponding bias when employing LS. The greater deviation concerning the VCM estimated by LS is the result of not taking the correlations and the heavy-tailedness of the noise into account.

4.1.2 Analysis of the degree of freedom on the estimation results

The influence of the df on the estimation results using the EM algorithm is displayed in Fig. 3 and Fig. 4. Figure 3 shows the estimated β ˆ x , estimated scale parameters Ψ ˆ and the differences of the estimated ν to the true values for 500 MC runs by means of EM model A and EM model B estimation. The values of the estimated functional parameters β ˆ under stochastic model A and stochastic model B show no significant differences. It can be clearly seen that the bias in the EM model B estimated scale parameters is much smaller compared to EM model A estimated scale parameters. The number of outliers (red crosses) resulting from the EM model A estimate is significantly larger than the number of outliers from the EM model B algorithm. For both models (A and B) applies that if the number of observations is increased, the bias and the dispersion becomes smaller. The bias of the estimated df ν ˆ under model A is approximately 2.25, which is significantly larger than the bias under model B (approximately 0.44). It can clearly be seen that increasing the number of observations has no significant influence on the bias of the estimated df ν ˆ for both models. The estimated VAR coefficients for both models (EM model A and EM model B) are shown in Fig. 4. The estimated MC solutions for both models show similar spreading behavior around the true value. The number of observations only influences the precision of the estimation but does not lead to a reduction of the bias. The statistical moments of the calculated bias Δ α ˆ and Δ ν ˆ of the simulations with 5000 observations are presented in Table 2. For the VAR parameters only α 1 ; 2.2 lies in the 95 % confidence interval.

Figure 1

MC simulation results of the estimated parameters for stochastic model A. Dash line: true value of simulation.

Figure 2

MC simulation results of the estimated VCM for stochastic model A.

$Figure 3 Comparison of the MC simulation results between the stochastic model A and B. The maximum deviation of Δ ν ˆ\Delta \widehat{\nu } for the stochastic model B with n = 250n=250 is 113.$

Figure 3

Comparison of the MC simulation results between the stochastic model A and B. The maximum deviation of Δ ν ˆ for the stochastic model B with n = 250 is 113.

Figure 4

Comparison of VAR coefficients between the MC simulation results of stochastic model A and B.

Table 1

Statistics of MC simulation results for the bias of the estimated parameters β ˆ and Ψ ˆ. The results were calculated for the simulation with n = 5000 observations. The boundaries of the 95 % interval estimates are defined by the empirical 2.5 and 97.5 percentiles.

	LS Model A	EM Model A	EM Model B
Δ β ˆ c x
mean	3.7 · 10 − 6	3.0 · 10 − 6	1.2 · 10 − 6
std	1.1 · 10 − 4	9.8 · 10 − 5	9.0 · 10 − 5
minimum	− 2.4 · 10 − 4	− 2.4 · 10 − 4	− 2.2 · 10 − 4
maximum	3.5 · 10 − 4	3.1 · 10 − 4	2.5 · 10 − 4
95 % interval	[ − 2.0 · 10 − 4 , 2.1 · 10 − 4 ]	[ − 1.9 · 10 − 4 , 1.9 · 10 − 4 ]	[ − 1.7 · 10 − 4 , 1.8 · 10 − 4 ]
Δ β ˆ c y
mean	− 6.9 · 10 − 7	6.0 · 10 − 7	7.1 · 10 − 6
std	2.0 · 10 − 4	1.9 · 10 − 4	1.7 · 10 − 4
minimum	− 7.1 · 10 − 4	− 6.5 · 10 − 4	− 4.7 · 10 − 4
maximum	5.5 · 10 − 4	5.0 · 10 − 4	4.2 · 10 − 4
95 % interval	[ − 3.9 · 10 − 4 , 4.1 · 10 − 4 ]	[ − 3.8 · 10 − 4 , 3.9 · 10 − 4 ]	[ − 3.3 · 10 − 4 , 3.4 · 10 − 4 ]
Δ β ˆ r
mean	6.3 · 10 − 6	3.3 · 10 − 6	3.8 · 10 − 7
std	9.9 · 10 − 5	9.0 · 10 − 5	8.3 · 10 − 5
minimum	− 3.1 · 10 − 4	− 2.7 · 10 − 4	− 2.1 · 10 − 4
maximum	2.8 · 10 − 4	2.6 · 10 − 4	2.7 · 10 − 4
95 % interval	[ − 1.9 · 10 − 4 , 1.9 · 10 − 4 ]	[ − 1.9 · 10 − 4 , 1.7 · 10 − 4 ]	[ − 1.5 · 10 − 4 , 1.6 · 10 − 4 ]
Δ Ψ ˆ x
mean	1.2 · 10 − 6	3.5 · 10 − 7	− 6.1 · 10 − 8
std	2.8 · 10 − 7	1.6 · 10 − 7	4.8 · 10 − 8
minimum	1.2 · 10 − 7	− 2.5 · 10 − 7	− 1.9 · 10 − 7
maximum	2.9 · 10 − 6	1.6 · 10 − 6	1.1 · 10 − 7
95 % interval	[ 7.0 · 10 − 7 , 1.8 · 10 − 6 ]	[ 1.2 · 10 − 7 , 6.7 · 10 − 7 ]	[ − 1.5 · 10 − 7 , 3.2 · 10 − 8 ]
Δ Ψ ˆ x , y
mean	2.8 · 10 − 7	7.4 · 10 − 8	− 9.2 · 10 − 8
std	4.7 · 10 − 7	2.1 · 10 − 7	6.5 · 10 − 8
minimum	− 3.1 · 10 − 6	− 1.4 · 10 − 6	− 3.1 · 10 − 7
maximum	4.0 · 10 − 6	1.7 · 10 − 6	1.1 · 10 − 7
95 % interval	[ − 4.6 · 10 − 7 , 1.3 · 10 − 6 ]	[ − 3.0 · 10 − 7 , 4.6 · 10 − 7 ]	[ − 2.2 · 10 − 7 , 3.5 · 10 − 8 ]
Δ Ψ ˆ y
mean	4.8 · 10 − 6	− 2.6 · 10 − 7	− 1.4 · 10 − 6
std	9.4 · 10 − 7	3.6 · 10 − 7	1.3 · 10 − 7
minimum	2.8 · 10 − 6	− 9.7 · 10 − 7	− 1.8 · 10 − 6
maximum	1.2 · 10 − 5	2.6 · 10 − 6	− 7.1 · 10 − 7
95 % interval	[ 3.4 · 10 − 6 , 6.7 · 10 − 6 ]	[ − 7.5 · 10 − 7 , 5.1 · 10 − 7 ]	[ − 1.6 · 10 − 6 , − 1.2 · 10 − 6 ]

Table 2

Statistics of MC simulation results for the bias of the estimated parameters ν ˆ and α ˆ. The results were calculated for the simulation with n = 5000 observations. The boundaries of the 95 % interval estimates are defined by the empirical 2.5 and 97.5 percentiles.

	EM Model A	EM Model B
Δ ν ˆ
mean	2.25	0.44
std	0.21	0.21
minimum	1.70	−0.19
maximum	3.12	1.16
95 % interval	[ 1.87 , 2.70 ]	[ 0.03 , 0.89 ]
Δ α ˆ 1 ; 1 , 1
mean	6.6 · 10 − 2	6.3 · 10 − 2
std	1.9 · 10 − 2	2.1 · 10 − 2
minimum	0.8 · 10 − 2	− 2.6 · 10 − 2
maximum	11.7 · 10 − 2	12.4 · 10 − 2
95 % interval	[ 2.9 · 10 − 2 , 10.1 · 10 − 2 ]	[ 2.2 · 10 − 2 , 10.3 · 10 − 2 ]
Δ α ˆ 1 ; 1 , 2
mean	− 4.4 · 10 − 2	− 4.8 · 10 − 2
std	2.4 · 10 − 2	2.4 · 10 − 2
minimum	− 14.2 · 10 − 2	− 11.2 · 10 − 2
maximum	3.9 · 10 − 2	3.0 · 10 − 2
95 % interval	[ − 8.7 · 10 − 2 , − 0.2 · 10 − 2 ]	[ − 9.5 · 10 − 2 , − 0.3 · 10 − 2 ]
Δ α ˆ 1 ; 2 , 1
mean	3.3 · 10 − 2	3.4 · 10 − 2
std	1.0 · 10 − 2	1.0 · 10 − 2
minimum	4.8 · 10 − 3	6.5 · 10 − 3
maximum	6.9 · 10 − 2	6.8 · 10 − 2
95 % interval	[ 1.2 · 10 − 2 , 5.3 · 10 − 2 ]	[ 1.6 · 10 − 2 , 5.2 · 10 − 2 ]
Δ α ˆ 1 ; 2 , 2
mean	− 7.3 · 10 − 4	6.3 · 10 − 4
std	1.3 · 10 − 2	1.4 · 10 − 2
minimum	− 3.4 · 10 − 2	− 3.9 · 10 − 2
maximum	4.6 · 10 − 2	3.5 · 10 − 2
95 % interval	[ − 2.9 · 10 − 2 , 2.4 · 10 − 2 ]	[ − 2.7 · 10 − 2 , 2.8 · 10 − 2 ]

4.2 Gauss-Markov model: Adjustment of 3D MEMS accelerometer data

In the second part of this section, the performance of the EM algorithm elaborated in Sect. 3 (which we also refer to as “VAR-multivariate algorithm” in the following) is studied on real data sets, which were measured by an accelerometer. Cost-effective Micro-Electro-Mechanical-Systems (MEMS) accelerometers have received increasing attention in numerous applications such as vibration analysis of bridge structures, e. g., [52], [53], [54]. In such applications, it is desirable to select a suitable accelerometer by considering, for instance, measurement and sensitivity ranges, uncertainty of measurements, cost, and sampling rate. [55] proposed a two-step scenario to accomplish the suitability analysis for MEMS accelerometers. Firstly, the calibration of the MEMS accelerometers was carried out for different positions and over different temperature ranges. Therefore, the characteristics of the calibration parameters such as biases, scale factors and non-orthogonalities between axes have been inspected during different temperature changes. Secondly, a controlled excitation experiment was conducted using a shaker to compare and validate estimated harmonic oscillation parameters including frequency, amplitude and phase shift with other MEMS accelerometers as well as a high-end reference accelerometer. The second step additionally allowed to check the time synchronization between the MEMS accelerometers. Later on [54] extended the aforementioned two-step scenario to three-step scenario. In the third-step, a static test experiment was accomplished over a long period of time, which allows to estimate an offset and a drift of the measurements as well as auto-correlations and underlying distributional parameters for each axis individually.

As part of this study and to make the third step of the aforementioned three-step scenario more comprehensive, the (unknown) auto- as well as cross-correlations and the (unknown) distributional characteristics of the acceleration measurements recorded from three different axes are analyzed by employing VAR models with t-distributed errors. For this purpose, acceleration measurements were recorded from a cost-effective MEMS accelerometer of type BNO055 with Arduino UNO-Board and 9 axes motion shield (produced from Bosch GmbH), which is a so-called NAMS. The measurements were carried out in three different axes over a period of 24 hours (Fig. 5) with a sampling frequency of 0.3789 Hz. Although it is possible to perform the measurements with a higher sampling frequency (e. g., up to 200 Hz), the aforementioned sampling frequency is specified to speed up the processing.

Figure 5

Raw acceleration data recorded from the MEMS accelerometer of type NAMS (blue) and fitted linear model (red) estimated from VAR-multivariate algorithms.

An observation model consisting of (1) a functional model based on a linear drift with an unknown offset (intercept) and a linear drift coefficient, (2) an auto- and cross-correlation model based on a VAR process with unknown model order and coefficients, and (3) a stochastic model based on the multivariate t-distribution with unknown scale matrix and df. This observation model has the structure of the GMM with VAR and t-distributed errors. The maximum number of iterations is set to itermax = 200, the convergence thresholds to ε = 10 − 12 , and ε ν = 0.001. The model parameters estimated by means of the VAR-multivariate algorithm are then compared with the estimates resulting from the “AR-univariate algorithm” (see [54] and [56] for details) where each axis of the MEMS accelerometers is modeled by an individual AR process. The static acceleration measurements ℓ 1 , … , ℓ n ( n = 32743) are modeled based on a linear drift, which is described by the model equations

(74) ℓ t = f t ( β ) + e t = c 0 + c 1 x t + e t ( t = 1 , … , t n ) .

where the vector β of unknown functional parameters consists of the offset c 0 in [ m / s 2 ] and the linear drift coefficient c 1 in [ m / s 3 ]. The quantities x 1 , … , x n represent equidistant time instances and e 1 , … , e n the random deviations contaminated with colored noise. The preceding observation equations can be written in the form of

(75) ℓ t = X t β + e t ,

where X t are the rows of the design matrix

(76) X = 1 x 1 ⋮ ⋮ 1 x n ,

which has full rank.

Table 3

Statistics of the estimated unknown parameters in the functional model based on the linear drift model, the auto- and cross-correlation model based on the VAR process and the stochastic model based on the centered and scaled t-distribution with an unknown degree of freedom and unknown scale factor from the AR-univariate and VAR-multivariate algorithms.

Sensor	Method	Axis	c ˆ 0	c ˆ 1	σ l ˆ	p	α l ˆ	WNT	ν ˆ
Sensor	Method	Axis	[ m / s 2 ]	[ m / s 3 ]	[ m / s 2 ]	[–]	[–]	[–]	[–]
NAMS	multivar.	x	−0.3598	4.80e-09	0.0224	1	–	yes	2.1
		y	0.2810	−6.11e-08	0.0493		–
		z	9.6752	4.63e-09	0.0534		–
	univar.	x	−0.3597	6.81e-09	0.0066	1	−6.2927e-04	yes	16.87
	univar.	y	0.2811	−6.12e-08	0.0132	1	0.0013	yes	120
	univar.	z	9.6746	6.65e-09	0.0141	1	0.0087	yes	120

Table 3 summarizes the statistics of the estimated parameters from the VAR-multivariate and AR-univariate algorithms based on the GMM. c ˆ 0 [ m / s 2 ] is the estimated offset and c ˆ 1 the estimated linear drift coefficient [ m / s 3 ]. p is the VAR model order in case of the VAR-multivariate algorithm and the AR model order in case of the AR-univariate algorithm. WNT stands for “white-noise test” and indicates the acceptance (“yes”) or rejection (“no”) of the multivariate portmanteau test introduced by [57] and adapted by [26]. σ l ˆ is the estimated standard deviation of the white noise components. In this study, the multivariate portmanteau test is applied to select an adequate VAR model order by testing the residuals of the VAR model if they are uncorrelated or not. Therefore, the p-value ( p v) is calculated based on different VAR model orders p = 1 , … , 10. Beforehand, the significance level is defined as α = 0.05, and the maximum lag is set to 20 similar to the research of [26]. The WNT is accepted if p v > 0.05, which corresponds to the acceptance of the white noise hypothesis, otherwise it is rejected. To estimate the df of the underlying t-distribution a zero search based on the reliable interval Newton method as described in [58] is performed by using the INTLAB library [51]. The estimated unknown parameters including offset and drift in both VAR-multivariate and AR-univariate algorithms are rather similar to each other and no significant differences are observable. The df estimate initially was below 2 with the application of the VAR-multivariate algorithm, so that the covariance matrix Σ of the white noise is theoretically undefined. We also found some numerical instability with the estimation of the parameters of the scale matrix, so that we set the estimate ν ˆ to the small but numerically stable value 2.1 and repeated the algorithm with this fixed df. In contrast, the AR-univariate algorithm produced much larger df estimates (16.87, 120, and 120) for the individual axes. The value 120 is the maximum value of the zero search interval; if the zero is not found within the search interval, it is automatically set to the threshold 120 (which happened in the present case). This threshold indicates the fitted t-distribution already is very close to a normal distribution. We may therefore conclude that the combination of a VAR process with a multivariate t-distribution yields quite different results than the three individual AR processes with univariate t-distributions, which ignore the cross-correlations in the noise. The reason for why the latter combination leads to increased df estimates needs to be investigated further. The standard deviations of the white noise components for both VAR-multivariate and AR-univariate algorithms are estimated, see Table 3. As it could be seen, the uncertainties in the x-axis of the NAMS acceleration data are less than for the two other axes.

Fig. 6 (a) illustrates the estimated p-values for different VAR model orders. The estimated p-values are all above the significance level, and thus, the VAR model order 1 is selected. Fig. 7 (b) shows the calculated test values that are compared with the cumulative distribution function of the chi-square distribution ( F χ 2 ( 4 2 · ( 20 − p ) ) ) with d f = 4 2 · ( 20 − p ) [26]. As it could be seen, all test values are below the critical values, and thus, the WNT are accepted.

$Figure 6 Results of the portmanteau test (p-value) for VAR model orders p = 1 , … , 10p=1,\dots ,10.$

Figure 6

Results of the portmanteau test (p-value) for VAR model orders p = 1 , … , 10.

$Figure 7 Results of the test values for VAR model orders p = 1 , … , 10p=1,\dots ,10.$

Figure 7

Results of the test values for VAR model orders p = 1 , … , 10.

The estimated VCM (Σ) and its corresponding correlation matrix ( R) based on the VAR-multivariate algorithm are

(77) Σ ˆ = 0.000501 2.0464 e − 05 9.7799 e − 06 2.0464 e − 05 0.002429 − 4.8095 e − 05 9.7799 e − 06 − 4.8095 e − 05 0.002862

(78) R ˆ = 1 0.018551 0.008167 0.018551 1 − 0.018241 0.008167 − 0.018241 1

The latter matrix contains the correlation coefficients between the different axes. As it could be seen, the correlation coefficients are very small and are negligible. Additionally, the coefficients of the VAR model order 1 is arranged within the (3×3)-matrix

(79) A ˆ 1 = 5.7887 e − 04 − 0.0012 0.0032 0.0090 − 0.0020 − 0.0090 − 0.0120 − 2.0993 e − 04 0.0083 ,

according to the definition of the VAR coefficient matrix (23). As it could be seen, the VAR model coefficients corresponding to y and z axes are slightly greater than those for the x-axis. In addition, the estimated colored and white noise residuals obtained from NAMS acceleration data are illustrated in Fig. 8. The subtraction of the aforementioned residuals are provided (Fig. 9) to have a better realization of the amplitude of the colored noise residuals at each axis. As it could be seen, the y and z-axes of the NAMS accelerometer have stronger colored measurement noise in comparison to the x-axis.

Figure 8

Representation of the estimated colored and white noise residuals (x, y, and z axes in a sequence) obtained from NAMS acceleration data for VAR model order 1.

Figure 9

Differences of the estimated residuals (x, y, and z axes in a sequence) (b).

The proposed approach in this study assists us to select a proper MEMS accelerometer for, e. g., the purpose of vibration analysis of bridge structures. It is applied as a third step of the aforementioned scenario in the suitability analysis of the MEMS accelerometer by providing information about unknown offset and drift coefficients, VAR model order and stochastic model parameters. Therefore, a sensor with possibly less offset and drift coefficients over a long period as well as lower uncertainty of the measurements could be selected. In addition, a minimum VAR model order between different MEMS accelerometers reveals less cross-correlation between their axes, which can also be considered as an important influencing factor in the selection procedure. Moreover, it is observed that the MEMS accelerometers used in this study have minimum standard deviations in the x-axis compared to the two other axes. Therefore, the sensors should be mounted with their x-axis in the main observations direction during the monitoring of bridge structures.

4.3 Gauss-Helmert model: Adjustment of a sphere from continuous laser tracker measurements

In the third part of this section, the performance of the EM algorithm in a real-world application estimating the parameters β = c x c y c z r T of sphere is studied. The practical relevance of this sphere parameter estimation can be seen in the field of surface fitting applications as well as referencing applications of surface and point-wise measuring sensors, i. e., laser scanner and laser tracker measurements. The 3D sphere is defined by the nonlinear equation

(80) h t ( β , μ ) = x t − c x 2 + y t − c y 2 + z t − c z 2 − r 2 = 0 ,

which is an extension in 3D space by considering the z-component in addition to the nonlinear equation of the 2D circle in (70).

To illustrate and discuss the adjustment of a sphere, 3D coordinates on the surface of a sphere with known radius were obtained by continuous measurements with 10 Hz using a laser tracker of type Leica AT960-LR. The extended uncertainty of the x-, y- and z-coordinate is given by U x , y , z = ± 15 µ m + 6 µ m / m (MPE according to ISO 10360-10) while using a Leica Red Ring Reflector 1.5” (RRR) in single measurement mode [60]. Here, the data acquisition is performed in continuous measurement mode at 10 Hz using the software tool Tracker Pilot. In the acquisition process, the RRR was moved randomly over the sphere surface from only one laser tracker setup; thus, only a half-sphere is measured. To capture the full sphere surface at least two standpoints of the laser tracker are required. In this case, the measurements have to be transformed into a common coordinate system. This leads to the presence of referencing uncertainties besides sensor-related uncertainties. In order to focus on sensor uncertainties only, the sphere is only observed from one standpoint. The distance between the laser tracker and the sphere was approximately 4 m and the y-axis of the local coordinate system of the laser tracker was oriented towards the sphere. The Cartesian coordinates obtained yield a time series of dimension N = 3 for a period of approximately 90 s, resulting in n = 888 3D points. There are no data gaps, which results in a Δ t = 0.1 s for subsequent measurement. The environmental temperature during the experiment was constant 19.3 ∘ C.

4.3.1 Analysis of the measurements

In analogy to the analysis scheme for the simulated 2D circle in Sect. 4.1, the sphere parameters are obtained on the one hand by means of a LS estimation, i. e. a classical GHM, and on the other hand by means of the proposed EM algorithm in this contribution. The thresholds used within the EM algorithm are similar to the simulation. In this real-world application study, the multivariate portmanteau test is applied to select an adequate VAR model order p by testing the estimated residuals u ˆ t of the VAR model if they are uncorrelated (i. e., by checking for the presence of white noise). This VAR model order selection for p = 0 , … , 10 was similarly applied in Sect. 4.2 using the significance level α = 0.05, and the maximum lag is set to 20 similar to the research of [26]. For the VAR model order p = 0 no multivariate VAR process is estimated. Instead, the GHM with stochastically independent t-distributed errors is evaluated. The EM algorithm estimates the unknown parameters β, Ψ, ν and A j , which yields – depending on the VAR model order – in total 4 + 6 + 1 + 9 · p parameters to estimate.

$Figure 10 Results of the test values for VAR model orders p = 1 , … , 10p=1,\dots ,10. Blue line indicates the critical value for the significance level α = 0.05\alpha =0.05 and the maximum lag is set to 20. The red dots show the test values of the weighted white noise, which are larger than the critical value. The green star represents the test value matching the null hypothesis H 0 {H_{0}}.$

Figure 10

Results of the test values for VAR model orders p = 1 , … , 10. Blue line indicates the critical value for the significance level α = 0.05 and the maximum lag is set to 20. The red dots show the test values of the weighted white noise, which are larger than the critical value. The green star represents the test value matching the null hypothesis H 0 .

Before discussing the results of the parameter estimation, we will discuss the results of the portmanteau test in Fig. 10. The blue line indicate the critical value for the significance level α = 0.05. The red dots show the test values of the weighted white noise, which are larger than the critical value. The green star represents the test value that is smaller than the critical value, which means that the test is passed. It can be seen that the test is passed for order p = 3, while for orders p = 4 … 7 the test value is close to the critical value. The test for order p = 0 failed and is, due to its significantly larger numerical value, not shown in Fig. 10. Based on these portmanteau test results, we will focus in the further discussion on the results of the LS model and the VAR model orders p = 0 and p = 3.

4.3.2 Discussion of the results

The estimated parameters β for the LS model and the VAR model orders p = 0 and p = 3 shown in Table 4 are almost the same, with the maximum differences of 5 μ m being in the range of the uncertainty of the laser tracker used. The small value for the df indicates that the residuals u t are not normally distributed. Differences in the estimated parameters of the three models can be found in the scale matrix and the VCM of the residuals. Σ ˆ u u is equivalent to the classical estimate of the VCM of the estimated white noise u. For the LS model, Σ ˆ u u is estimated by means of the estimated residuals e. Σ ˆ u u for VAR model order p = 3 clearly shows, how the values of the VCM of the observations on the main diagonal are reduced by means of the VAR process. The fact that the matrices Σ ˆ u u for the LS model and the VAR model order p = 0 are almost identical can be seen in the residuals shown in Fig. 11.

Table 4

Estimated sphere parameters for the LS model and the VAR model orders p = 0 and p = 3.

	LS Model	EM Model p = 0	EM Model p = 3
β ˆ c x [mm]	99.317	99.317	99.317
β ˆ c y [mm]	4014.130	4014.129	4014.125
β ˆ c z [mm]	4.009	4.009	4.006
β ˆ r [mm]	68.943	68.943	68.939
Ψ ˆ	–	29.24 − 0.17 3.09 − 0.17 20.86 7.07 3.09 7.07 5.85 10 − 5	1.15 0.69 0.31 0.69 0.79 0.38 0.31 0.38 0.35 10 − 5
ν ˆ	−	2.60	2.10
Σ ˆ u u	60.64 − 3.24 3.42 − 3.24 57.09 19.55 3.42 19.55 14.29 10 − 5	61.15 − 3.27 3.22 − 3.27 57.06 19.63 3.22 19.63 14.24 10 − 5	4.21 3.14 1.31 3.14 3.98 1.89 1.31 1.89 1.54 10 − 5
R ˆ	1.00 − 0.06 0.12 − 0.06 1.00 0.68 0.12 0.68 1.00	1.00 − 0.06 0.11 − 0.06 1.00 0.69 0.11 0.69 1.00	1.00 0.77 0.51 0.77 1.00 0.76 0.51 0.76 1.00

Figure 11

Estimated colored and white noise residuals obtained from laser tracker measurements for VAR model order p = 3. Blue line indicates colored noise, red circles indicate white noise and green dots indicate the weighted white noise.

Figure 11 depicts the estimated colored and white noise residuals obtained from laser tracker measurements for VAR model order p = 3. The blue line indicates the colored noise, which shows several significant peaks, which can be interpreted as potential outliers. These may stem from the imperfect surface of the sphere or even can be associated with the roughness of the surface. Furthermore, these significant peaks can result from the measurement procedure since for the data acquisition the RRR is moved individually by hand along the surface. If the contact to the surface is slightly lost, this will also produce peaks as can be seen in the blue line. Due to the orientation of the local laser tracker coordinate system, individual peaks in the y-component (middle part in Fig. 11) can be associated with an uplift of the RRR towards the laser tracker. The minor colored noise level of the z-component also results from the measurement setup, i. e. the orientation of the local laser tracker coordinate system, and shows the uncertainty differences for the angular and the distance measurement unit. Since the estimated parameters β for the LS model and the VAR model orders p = 0 and p = 3 are almost identical (see Table 4), the depicted colored noise is also nearly identical. For the special case of VAR model order p = 0 the residual estimates e ˆ t and u ˆ t are equal. This implies equality of Σ ˆ u u for the LS model and the VAR model order p = 0. Furthermore, the residuals in Fig. 11 show an equal distribution of positive and negative peaks (outliers) for the individual coordinate components. Due to the symmetric geometry of the sphere the positive and negative peaks compensate each other in the estimation of the parameters β.

The red circles in Fig. 11 indicate white noise and the green dots indicate the weighted white noise, i. e. white noise values multiplied with their weights w ˆ. It can be clearly seen that the VAR process is unable to fully smooth the colored noise. First, the substitution of t-distribution instead of the normal distribution allow for reducing the influence of the remaining outliers in the parameter estimation. Furthermore, only by using the weighted white noise (colored in green) the portmanteau test can be passed, which is not applicable for the white noise (colored in red).

Table 5 shows the results of the VAR coefficients for the model orders p = 1 , 2 , 3. To judge about any possible correlations for the colored noise of the x-, y- and z-component, we can confirm correlations based on the shown VAR coefficients in Table 5. Since the off-diagonal elements are significantly different from zero, this indicates the presence of correlations. As an example, A ˆ 3 shows similar magnitudes for the x-component with −0.263 and correlations with the y-component by a value of 0.203. Furthermore, the alternating positive and negative sign of the coefficients indicate the negative correlation for the coordinate components.

Taking a look at the VAR coefficients for the model order p = 1 one can see small off-diagonal elements, which shows that the correlation of the coordinate components is hardly reduced. This can also be seen by the significant difference of the test value in Fig. 10 for p = 1 with respect to the critical value. For the VAR model of order p = 2 it is firstly possible to estimate the correlations of the colored noise. On the one hand, in Table 5 the off-diagonal elements for p = 2 are obviously larger compared to p = 1. On the other hand, the test value is approaching the critical value, which can be seen in Fig. 10.

Table 5

Results of the VAR coefficients for the VAR models of order p = 1 , 2 , 3.

	EM Model p = 1	EM Model p = 2	EM Model p = 3
A ˆ 1	0.986 0.006 0.001 − 0.023 1.007 − 0.021 − 0.009 0.007 0.982	1.412 − 0.148 0.108 − 0.265 1.622 − 0.132 − 0.103 0.090 1.355	1.244 0.051 0.006 − 0.218 1.555 − 0.149 − 0.118 0.108 1.284

A ˆ 2	–	− 0.427 0.153 − 0.151 0.262 − 0.623 0.083 0.108 − 0.078 − 0.430	− 0.002 − 0.245 0.091 0.159 − 0.439 0.125 0.148 − 0.114 − 0.220

A ˆ 3	–	–	− 0.263 0.203 − 0.159 0.058 − 0.115 − 0.043 − 0.025 0.022 − 0.164

5 Conclusions and outlook

The nonlinear Gauss-Helmert model constitutes the most general adjustment model, which is widely used in geodesy. This model consists of condition equations (which link the observations and the functional parameters occurring in the mathematical model employed to approximate the measurements) and of a stochastic model. The latter usually is defined by a VCM, which tends to be huge when the number of measurements is large. We therefore applied and investigated, in the context of multivariate time series, a more manageable type of stochastic model defined by a vector autoregressive (VAR) process. This model takes both auto- and cross-correlations into account and can be estimated alongside the functional model parameters. By employing a multivariate t-distribution with data-adaptable scale matrix and degree of freedom, moderate deviations from the usual assumption of normally distributed noise are possible. To fuse this t-distribution with the VAR process and the constraints, we applied the idea of formulating a generalized Gauss-Helmert model in terms of a likelihood function, which is maximized under the constraints. The proposed model allows for a computationally convenient iteratively reweighted least-squares method based on a constrained expectation maximization algorithm. This methodology extends the previously established approach involving univariate autoregressive models that ignored potential cross-correlations between the time series. Analysis of the biases by means of Monte Carlo simulations showed that the functional model parameters tend to be highly accurate when the number of observations is increased, whereas a significant bias of the degree of freedom of the multivariate t-distribution persists, in particular for smaller degrees of freedom. The accuracy of the estimated VAR coefficients and of the scale parameters generally improves with larger datasets. The methodology is also applicable to functional models that take the form of a Gauss-Markov model, as shown by an analysis of real accelerometer data modeled by an offset and a linear drift. For another real-world application, which utilizes continuous measurements of a laser tracker on a sphere, the applicability of the proposed Gauss-Helmert model was also demonstrated. Furthermore, the analyses of the results for this application demonstrated that the estimated VAR process coefficients and the estimated parameters of the multivariate t-distribution allows for detailed evaluations of the auto- and cross-correlation as well as the distributional characteristics of the measurement noise. Thus, actual deviations of the noise from expected white-noise and normal-distribution behavior become visible in the course of adjusting real-world datasets. Due to its flexibility the proposed innovative type of Gauss-Helmert model with VAR and t-distributed errors could be useful in various fields of geodesy such as engineering geodesy or Earth system modeling where multivariate, functionally complex and correlated time series are analyzed. Since such datasets often contain gaps a further extension of the model and EM algorithms such that the E-step yields also values for missing measurement values appears to be useful. It is also conceivable that the VAR processes of the current model are replaced by other stochastic processes that might match the correlation pattern of a given set of observables better.

Funding source: Deutsche Forschungsgemeinschaft

Award Identifier / Grant number: 386369985

Funding statement: This research was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 386369985.

Appendix A Derivations

Log-likelihood and Q-functions

(81) log L ( θ GHM ; ℓ , w ) = − N n 2 log ( 2 π ) − n 2 log det Ψ + n ν 2 log ν 2 − n log Γ ν 2 + N 2 − 1 ∑ t = 1 n log w t − 1 2 ∑ t = 1 n w t [ A ( L ) ( ℓ t − μ t ) ] T Ψ − 1 [ A ( L ) ( ℓ t − μ t ) ] + ν 2 ∑ t = 1 n ( log w t − w t ) .

(82) Q ( θ GHM | θ ˆ GHM ( s ) ) = E W | u ; θ ˆ GHM ( s ) log L θ GHM ; ℓ , W = − N n 2 log ( 2 π ) − n 2 log det Ψ + n ν 2 log ν 2 − n log Γ ν 2 − ∑ t = 1 n 1 2 ν + [ A ( L ) ( ℓ t − μ t ) ] T Ψ − 1 [ A ( L ) ( ℓ t − μ t ) ] × E W | u ; θ ˆ GHM ( s ) { W t } + ∑ t = 1 n 1 2 ( ν − 1 ) E W | u ; θ ˆ GHM ( s ) { log W t } .

Here, the conditional expectations E W | u ; θ ˆ GHM ( s ) { W t } and E W | u ; θ ˆ GHM ( s ) { log W t } are essentially the same as the ones occurring in Eq. (54) in [26] with respect to the GMM with VAR and multivariate-distributed errors. The only difference lies in the different parametrization of the deterministic model, which is independent of the stochastic model for W t . Consequently,

(83) w ˜ t ( s ) : = E W | u ; θ ˆ GHM ( s ) { W t } = ν ˆ ( s ) + N ν ˆ ( s ) + u t T ( Ψ ˆ ( s ) ) − 1 u t

and

(84) E W | u ; θ ˆ GHM ( s ) { log W t } = log w ˜ t ( s ) + ψ ν ˆ ( s ) + N 2 − log ν ˆ ( s ) + N 2 ,

with u t = A ˆ ( s ) ( L ) ( ℓ t − μ t ( s ) ), so that the Q-function (82) can be written as

(85) Q ( θ GHM | θ ˆ GHM ( s ) ) = − n 2 log det Ψ − 1 2 ∑ t = 1 n w ˜ t ( s ) [ A ( L ) ( ℓ t − μ t ) ] T Ψ − 1 [ A ( L ) ( ℓ t − μ t ) ] + n ν 2 log ν − n log Γ ν 2 + n ν 2 ψ ν ˆ ( s ) + N 2 − log ν ˆ ( s ) + N + 1 n ∑ t = 1 n log w ˜ t ( s ) − w ˜ t ( s ) .

The following normal equations can be derived from the linearized Lagrangian (obtained by combining (40) with (42))

(86) F ( ϕ , λ | θ ˆ GHM ( s ) ) ≈ − n 2 log det Ψ − 1 2 ∑ t = 1 n w ˜ t ( s ) [ A ( L ) ( ℓ t − μ t ) ] T Ψ − 1 [ A ( L ) ( ℓ t − μ t ) ] + n ν 2 log ν − n log Γ ν 2 + n ν 2 ψ ν ˆ ( s ) + N 2 − log ν ˆ ( s ) + N + 1 n ∑ t = 1 n log w ˜ t ( s ) − w ˜ t ( s ) + λ T X Δ β + B ( μ − ℓ ) + m ‾

by applying slight modifications to the steps concerning the GMM with VAR and multivariate t-distributed errors given in Appendix B in [26], using (8).

Normal equations for the location parameters

Applying (8) to eliminate the time dependence of the location parameters μ t allows one to form (and subsequently simplify) the partial derivatives with respect to the complete vector μ of location parameters as follows.

0 = ∂ F ( ϕ , λ | θ ˆ GHM ( s ) ) ∂ μ = − 1 2 ∑ t = 1 n w ˜ t ( s ) ∂ ∂ μ [ A ( L ) ( ℓ t − μ t ) ] T Ψ − 1 [ A ( L ) ( ℓ t − μ t ) ] + B T λ = − 1 2 ∑ t = 1 n w ˜ t ( s ) ∂ ∂ μ [ A ( L ) ( ℓ t − J t μ ) ] T Ψ − 1 [ A ( L ) ( ℓ t − J t μ ) ] + B T λ = − ∑ t = 1 n w ˜ t ( s ) [ A ( L ) ( ℓ t − J t μ ) ] Ψ − 1 · ∂ ∂ μ [ − A ( L ) ( J t μ ) ] + B T λ = ∑ t = 1 n [ A ( L ) J t ] T w ˜ t ( s ) Ψ − 1 [ A ( L ) ℓ t − A ( L ) J t μ ] + B T λ = ∑ t = 1 n J ¯ t T w ˜ t ( s ) Ψ − 1 ( ℓ ¯ t − J ¯ t μ ) + B T λ

Normal equations for the VAR coefficient matrices

Substituting (32) into (86) for brevity of expressions and applying also Eq. (88) in [59],

0 N = ∂ F ( ϕ , λ | θ ˆ GHM ( s ) ) ∂ A j = − 1 2 ∑ t = 1 n w ˜ t ( s ) ∂ ∂ A j [ A ( L ) e t ] T Ψ − 1 [ A ( L ) e t ] = − 1 2 ∑ t = 1 n w ˜ t ( s ) ∂ ∂ A j [ e t − A 1 e t − 1 − ⋯ − A p e t − p ] T × Ψ − 1 [ e t − A 1 e t − 1 − ⋯ − A p e t − p ] = − 1 2 ∑ t = 1 n w ˜ t ( s ) ∂ ∂ A j e t − ∑ i = 1 , i ≠ j p A i e t − i − A j e t − j T × Ψ − 1 e t − ∑ i = 1 , i ≠ j p A i e t − i − A j e t − j = − 1 2 ∑ t = 1 n w ˜ t ( s ) ( − 2 ) · Ψ − 1 e t − ∑ i = 1 , i ≠ j p A i e t − i − A j e t − j e t − j T = ∑ t = 1 n w ˜ t ( s ) Ψ − 1 [ A ( L ) e t ] e t − j T = ∑ t = 1 n w ˜ t ( s ) e t − j [ A ( L ) e t ] T ( Ψ − 1 ) T = ∑ t = 1 n w ˜ t ( s ) e t − j [ A ( L ) e t ] T ( Ψ − 1 ) T Ψ T = ∑ t = 1 n w ˜ t ( s ) e t − j [ A ( L ) e t ] T

The joint normal equation system for all of the VAR coefficient matrices A 1 , … , A p then becomes

0 N ⋮ 0 N = ∑ t = 1 n w ˜ t ( s ) e t − 1 [ A ( L ) e t ] T ⋮ ∑ t = 1 n w ˜ t ( s ) e t − p [ A ( L ) e t ] T = w ˜ 1 ( s ) e 1 − 1 ⋯ w ˜ n ( s ) e n − 1 ⋮ ⋮ w ˜ 1 ( s ) e 1 − p ⋯ w ˜ n ( s ) e n − p × [ e 1 − A 1 e 1 − 1 − ⋯ − A p e 1 − p ] T ⋮ [ e n − A 1 e n − 1 − ⋯ − A p e n − p ] T = e 0 ⋯ e n − 1 ⋮ ⋮ e 1 − p ⋯ e n − p W ˜ ( s ) × e 1 T ⋮ e n T − e 0 T ⋯ e 1 − p T ⋮ ⋮ e n − 1 T ⋯ e n − p T A 1 T ⋮ A p T = E W ˜ ( s ) e 1 T ⋮ e n T − E W ˜ ( s ) E T A 1 T ⋮ A p T .

Normal equations for the inverse scale matrix

Substituting (33) into (86) for brevity of expressions and applying the arguments given in Sect. 373 in [18], one obtains

0 = ∂ F ( ϕ , λ | θ ˆ GHM ( s ) ) ∂ Ψ − 1 = n 2 · ∂ ∂ Ψ − 1 ( − log det Ψ ) − 1 2 ∑ t = 1 n w ˜ t ( s ) ∂ ∂ Ψ − 1 u t T Ψ − 1 u t = n 2 Ψ − 1 2 ∑ t = 1 n w ˜ t ( s ) u t u t T = Ψ − 1 n ∑ t = 1 n w ˜ t ( s ) [ A ( L ) ( ℓ t − μ t ) ] [ A ( L ) ( ℓ t − μ t ) ] T .

Acknowledgments

The authors would like to acknowledge ALLSAT GmbH for providing the low-cost MEMS sensor used in the experiments. In addition, the authors would also like to acknowledge Eva Kemkes, M. Sc. (ALLSAT GmbH) for her assistance in data acquisition within the experiments.

Author contributions: Conceptualization, B. K., H. A., J.-A. P., A. D., M. O.; methodology, B. K., H. A. and A. D.; software, B. K., A. D., M. O. and H. A.; validation, H. A., A. D. and M. O.; formal analysis, A. D., M. O. and H. A.; investigation, B. K., A. D., M. O., J.-A. P., H. A.; resources, B. K.; data curation, A. D.; writing–original draft preparation, B. K., A. D., M. O., H. A. and J.-A. P.; writing–review and editing, B. K., H. A. and J.-A. P.; visualization, H. A., A. D. and M. O.; supervision, B. K., H. A. and J.-A. P.; project administration, B. K., H. A. and J.-A. P.; funding acquisition, B. K., H. A. and J.-A. P.

References

[1] Helmert FR. Die Ausgleichungsrechnung nach der Methode der kleinsten Quadrate. Leipzig, Teubner, 1872.Search in Google Scholar

[2] Wolf H. Das geodätische Gauß-Helmert-Modell und seine Eigenschaften. Z Vermessungswesen 1978, 103, 41–43.Search in Google Scholar

[3] Golub GH, Van Loan CH. An analysis of the total least squares problem. SIAM J Numer Anal 1980, 17, 883–893.10.1137/0717073Search in Google Scholar

[4] Schaffrin B, Wieser A. On weighted total least-squares adjustment for linear regression. J Geod 2008, 82, 415–421.10.1007/s00190-007-0190-9Search in Google Scholar

[5] Neitzel F. Generalization of total least-squares on example of unweighted and weighted 2D similarity transformation. J Geod 2010, 84, 751–762.10.1007/s00190-010-0408-0Search in Google Scholar

[6] Xu P, Liu J, Shi C. Total least squares adjustment in partial errors-in-variables models: algorithm and statistical analysis. J Geod 2012, 86, 661–675.10.1007/s00190-012-0552-9Search in Google Scholar

[7] Amiri-Simkooei A, Jazaeri S. Weighted total least squares formulated by standard least squares theory. J Geodetic Sci 2012, 2, 113–124.10.2478/v10156-011-0036-5Search in Google Scholar

[8] Gauß CF. Theoria Motus Corporum Coelestium. Hamburg, Perthes und Besser, 1809.Search in Google Scholar

[9] Gauß CF. Theoria Combinationis Observationum. Göttingen, Dieterich, 1823.Search in Google Scholar

[10] Markov AA. Wahrscheinlichkeitsrechnung. Leipzig, Teubner, 1912.Search in Google Scholar

[11] Aitken AC. On least squares and linear combinations of observations. Proc Royal Soc Edinburgh 1935, 55, 42–48.10.1017/S0370164600014346Search in Google Scholar

[12] Amiri-Simkooei AR. Noise in multivariate GPS position time-series. J Geod 2009, 83, 175–187.10.1007/s00190-008-0251-8Search in Google Scholar

[13] Koch KR, Kuhlmann H, Schuh WD. Approximating covariance matrices estimated in multivariate models by estimated auto- and cross-covariances. J Geod 2010, 84, 383–397.10.1007/s00190-010-0375-5Search in Google Scholar

[14] Kermarrec G, Schön S. Fully populated VCM or the hidden parameter. J Geod Sci 2017, 7, 151–161.10.1515/jogs-2017-0016Search in Google Scholar

[15] Kermarrec G, Schön S. A priori fully populated covariance matrices in least-squares adjustment—case study: GPS relative positioning. J Geod 2017, 91, 465–484.10.1007/s00190-016-0976-8Search in Google Scholar

[16] Kauker S, Schwieger V. A synthetic covariance matrix for monitoring by terrestrial laser scanning. J Appl Geod 2017, 11, 77–87.10.1515/jag-2016-0026Search in Google Scholar

[17] Kauker S, Holst C, Schwieger V, Kuhlmann H, Schön S. Spatio-temporal correlations of terrestrial laser scanning. AVN 2016, 123, 170–182.Search in Google Scholar

[18] Koch KR. Parameter Estimation and hypothesis testing in linear models. Berlin, Springer, 1999.10.1007/978-3-662-03976-2Search in Google Scholar

[19] Kermarrec G, Schön S. Taking correlations in GPS least squares adjustments into account with a diagonal covariance matrix. J Geod 2016, 90, 793–805.10.1007/s00190-016-0911-zSearch in Google Scholar

[20] Williams SDP. The effect of coloured noise on the uncertainties of rates estimated from geodetic time series. J Geod 2003, 76, 483–494.10.1007/s00190-002-0283-4Search in Google Scholar

[21] Schuh WD. The processing of band-limited measurements; filtering techniques in the least squares context and in the presence of data gaps. Space Sci Rev 2003, 108, 67–78.10.1007/978-94-017-1333-7_7Search in Google Scholar

[22] Klees R, Ditmar P, Broersen P. How to handle colored observation noise in large least-squares problems. J Geod 2003, 76, 629–640.10.1007/s00190-002-0291-4Search in Google Scholar

[23] Zeng W, Fang X, Lin Y, Huang X, Zhou Y. On the total least-squares estimation for autoregressive model. Surv Rev 2018, 50, 186–190.10.1080/00396265.2017.1281096Search in Google Scholar

[24] Brockwell PJ, Davis RA. Introduction to time series and forecasting. 3rd ed. Switzerland, Springer, 2016.10.1007/978-3-319-29854-2Search in Google Scholar

[25] McDonald JB. Partially adaptive estimation of ARMA time series models. Int J Forecasting 1989, 5, 217–230.10.1016/0169-2070(89)90089-7Search in Google Scholar

[26] Kargoll B, Kermarrec G, Korte J, Alkhatib H. Self-tuning robust adjustment within multivariate regression time series models with vector-autoregressive random errors. J Geod 2020, 94, 512020.10.1007/s00190-020-01376-6Search in Google Scholar

[27] Nduka UC, Ugah TE, Izunobi CH. Robust estimation using multivariate t innovations for vector autoregressive models via ECM algorithm. J Appl Stat 2021, 48, 693–711.10.1080/02664763.2020.1742297Search in Google Scholar

[28] Parzen E. A density-quantile function perspective on robust estimation. In: Launer L, Wilkinson GN (Eds.) Robustness in Statistics. Academic Press, 1979, 237–258.10.1016/B978-0-12-438150-6.50019-4Search in Google Scholar

[29] Huber PJ. Robust estimation of a location parameter. Ann Math Stat 1964, 35, 73–101.10.1007/978-1-4612-4380-9_35Search in Google Scholar

[30] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 1977, 39, 1–38.10.1111/j.2517-6161.1977.tb01600.xSearch in Google Scholar

[31] Takai K. Constrained EM algorithm with projection method. Comput Stat 2012, 27, 701–714.10.1007/s00180-011-0285-xSearch in Google Scholar

[32] Koch KR. Robust estimations for the nonlinear Gauss Helmert model by the expectation maximization algorithm. J Geod 2014, 88, 263–271.10.1007/s00190-013-0681-9Search in Google Scholar

[33] Koch KR. Outlier detection for the nonlinear Gauss Helmert model with variance components by the expectation maximization algorithm. J Appl Geod 2014, 8, 185–194.10.1515/jag-2014-0004Search in Google Scholar

[34] Koch KR. Beispiele zur Parameterschätzung, zur Festlegung von Konfidenzregionen und zur Hypothesenprüfung. Mitteilungen aus den Geodätischen Instituten der Rheinischen Friedrich-Wilhelms-Universität, No. 87, Bonn, 2000.Search in Google Scholar

[35] Ettlinger A, Neuner H. Assessment of inner reliability in the Gauss-Helmert model. J Appl Geod 2019, 14, 13–28.10.1515/jag-2019-0013Search in Google Scholar

[36] Rao CR. Linear statistical inference and its applications. New York, Wiley, 1973.10.1002/9780470316436Search in Google Scholar

[37] Tienstra JM. An extension of the technique of the methods of least squares to correlated observations. Bull Geodesique 1947, 6, 301–335.10.1007/BF02525959Search in Google Scholar

[38] Kuhlmann H. Importance of autocorrelation for parameter estimation in regression models. In: Proceedings of the 10th FIG International Symposium on Deformation Measurements, FIG 354–61.Search in Google Scholar

[39] Baarda W. A testing procedure for use in geodetic networks. Publications on Geodesy (New Series), Vol. 2, No. 5. Delft, The Netherlands, Netherlands Geodetic Commission, 1968.10.54419/t8w4sgSearch in Google Scholar

[40] Pope AJ. The statistics of residuals and the detection of outliers. NOAA Technical Report NOS65 NGS1. Rockville, Maryland, USA, US Department of Commerce, National Geodetic Survey, 1976.Search in Google Scholar

[41] Lehmann R, Lösler M, Neitzel F. Mean-shift versus variance-inflation approach for outlier detection – a comparative study. Mathematics 2020, 8, 991.10.3390/math8060991Search in Google Scholar

[42] Priestley MB. Spectral analysis and time series. Academic Press, 1981.Search in Google Scholar

[43] Lange KL, Little RJA, Taylor JMG. Robust statistical modeling using the t distribution. J Am Stat Assoc 1989, 84, 881–896.10.2307/2290063Search in Google Scholar

[44] Liu Y, Sang R, Liu S. Diagnostic analysis for a vector autoregressive model under Student’s t-distributions. Stat Neerl 2017, 71, 86–114.10.1111/stan.12102Search in Google Scholar

[45] Hamilton JD. Time Series Analysis. Princeton University Press, 1994.10.1515/9780691218632Search in Google Scholar

[46] Kargoll B, Omidalizarandi M, Alkhatib H. Adjustment of Gauss-Helmert Models with autoregressive and Student errors. In: Novák P, Crespi M, Sneeuw N, Sansò F (Eds.) IX Hotine-Marussi Symposium on Mathematical Geodesy. International Association of Geodesy Symposia, Vol. 151. Cham, Springer, 2020, 79–87.10.1007/1345_2019_82Search in Google Scholar

[47] McLachlan GJ, Krishnan T. The EM algorithm and extensions. 2nd ed. Hoboken, John Wiley & Sons, 2008.10.1002/9780470191613Search in Google Scholar

[48] Rubin DB. Iteratively reweighted least squares. In: Kotz S, Johnson NL, Read CB (Eds.) Encyclopedia of Statistical Sciences, Vol. 4, New York, Wiley, 1983, 272–275.Search in Google Scholar

[49] Liu CH, Rubin DB. The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 1994, 81, 633–648.10.1093/biomet/81.4.633Search in Google Scholar

[50] Liu CH, Rubin DB. ML estimation of the t distribution using EM and its extensions, ECM and ECME. Stat Sin 1995, 5, 19–39.Search in Google Scholar

[51] Rump SM. INTLAB–interval laboratory. In: Csendes T (Ed.) Developments in Reliable Computing. Dordrecht, Kluwer Academic Publishers, 1999, 77–104.10.1007/978-94-017-1247-7_7Search in Google Scholar

[52] Neitzel F, Niemeier W, Weisbrich S. Investigation of low-cost accelerometer, terrestrial laser scanner and ground-based radar interferometer for vibration monitoring of bridges. In: Proceedings of the 6th European Workshop on Structural Health Monitoring, 2012, 542–551.Search in Google Scholar

[53] Omidalizarandi M, Herrmann R, Marx S, Kargoll B, Paffenholz JA, Neumann I. A validated robust and automatic procedure for vibration analysis of bridge structures using MEMS accelerometers. J Appl Geod 2020, 14, 327–354.10.1515/jag-2020-0010Search in Google Scholar

[54] Omidalizarandi, M. Robust deformation monitoring of bridge structures using MEMS accelerometers and image-assisted total stations. München 2020. Deutsche Geodätische Kommission: C (PhD Dissertation): Heft Nr. 859. http://publikationen.badw.de/de/047037691.Search in Google Scholar

[55] Omidalizarandi M, Neumann I, Kemkes E, Kargoll B, Diener D, Rüffer J, Paffenholz JA. MEMS based bridge monitoring supported by image-assisted total station. In: ISPRS – International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W18, 2019, 833–842.10.5194/isprs-archives-XLII-4-W18-833-2019Search in Google Scholar

[56] Alkhatib H, Kargoll B, Paffenholz JA. Further results on robust multivariate time series analysis in nonlinear models with autoregressive and t-distributed errors. In: Rojas I, Pomares H, Valenzuela O (Eds.) Time Series Analysis and Forecasting. ITISE 2017. Contributions to Statistics, Cham, Springer, 2018, 25–38.10.1007/978-3-319-96944-2_3Search in Google Scholar

[57] Hosking JRM. The multivariate portmanteau statistic. J Am Stat Ass 1980, 75, 602–608.10.1080/01621459.1980.10477520Search in Google Scholar

[58] Hargreaves GI. Interval analysis in MATLAB. Numerical Analysis Report No. 416, Manchester Centre for Computational Mathematics, The University of Manchester, ISSN 1360-1725, 2002.Search in Google Scholar

[59] Petersen KB, Pedersen MS. The matrix cookbook. Version November 15, 2012, Technical University of Denmark, https://www2.imm.dtu.dk/pubdb/edoc/imm3274.pdf. Accessed 22 February 2021.Search in Google Scholar

[60] Hexagon Manufacturing Intelligence. Leica Absolute Tracker AT960 product brochure. Version 2017, https://www.hexagonmi.com/en-US/products/laser-tracker-systems/leica-absolute-tracker-at960. Accessed 20 April 2021.Search in Google Scholar

Received: 2021-02-26

Accepted: 2021-05-01

Published Online: 2021-05-11

Published in Print: 2021-07-27

This work is licensed under the Creative Commons Attribution 4.0 International License.

Articles in the same Issue

https://doi.org/10.1515/jag-2021-0013

Keywords for this article

Gauss-Helmert model; Gauss-Markov model; vector-autoregressive model; auto-correlations; cross-correlations; multivariate t-distribution; iteratively reweighted least squares; expectation maximization algorithm; accelerometer time series

Creative Commons

BY 4.0