
On a Different way of Understanding the Edge-Effect for the Inference of ARMA-type Processes (in $\mathbb{Z}^d$)

  • Chrysoula Dimitriou-Fakalou
Published/Copyright: May 18, 2021

Abstract

The edge-effect concerning the standard estimators’ bias for the parameters of multi-indexed ARMA-type series is a common hurdle; it is investigated whether an alternative ARMA parameterization might relieve the unwelcome complication. The theoretical blocks establishing when the factorized model is free of the edge-effect are provided, and simulation results are used to reinforce the same views. Estimation and other perspectives are discussed in a conclusion.


Corresponding author: Chrysoula Dimitriou-Fakalou, Independent Researcher, 6 Kastelorizou Street, Ano Voula Attikis, Athens 16673, Greece, E-mail:

Acknowledgements

The author wishes to thank the wonderful Professor Javier Hidalgo and a reviewer for a number of priceless points made. Finally, this piece is dedicated to the author’s darling father who passed away on August 17, 2019.

Appendix A: Sketch of Theoretical Proof for Theorem 3.1

When Div(l) is (asymptotically) equal to the number of terms in the sum, it is possible to show that (6) converges in distribution (as Div(l) → ∞) to a random variable $N(0, w_l)$ (it should be kept in mind that, even though omitted, $w_l$ is a function of the $(d-1)$ fixated indexes): that conclusion relies on Theorem 6.4.2 (Brockwell and Davis 1991, 213–214), which establishes the asymptotic normality for the case of $K$-dependent strictly stationary series, i.e., $K$ would be a fixed positive integer used to deal temporarily with the countably infinite term polynomial $\{\vartheta^{(l)}(B)\}^{-1}$; under $\{e(v)\} \sim \mathrm{IID}$, (4) would easily convert into a derivation of independence of two variables, which together with truncating $\{\vartheta^{(l)}(B)\}^{-1}$ would achieve the $K$-dependence. To establish the asymptotic normality of the original (6), Proposition 6.3.9 (Brockwell and Davis 1991, 207–208) could be employed.
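In symbols, the truncation step may be sketched as follows (a reconstruction under the assumption, suggested by the notation, that the inverted moving-average polynomial expands as $\{\vartheta^{(l)}(B)\}^{-1}=\sum_{j\geq0}\pi_j^{(l)}B^{j\,l}$ with geometrically decaying coefficients $\pi_j^{(l)}$): replacing the infinite expansion by its $K$-term section,

$$\{\vartheta^{(l)}(B)\}^{-1}=\sum_{j=0}^{\infty}\pi_j^{(l)}B^{j\,l}\ \approx\ \sum_{j=0}^{K}\pi_j^{(l)}B^{j\,l},\qquad \sum_{j>K}\big|\pi_j^{(l)}\big|\to0\ \ (K\to\infty),$$

turns every summand of (6) into a function of finitely many error variables along the direction $l$, i.e., a $K$-dependent strictly stationary sequence to which Theorem 6.4.2 applies, before the truncation is undone via Proposition 6.3.9.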

Regarding $w_l$, it should first be observed that, writing for brevity

$$\Pi_{-l}\equiv\prod_{l^*\in L,\ l^*\neq l}\left(1-\sum_{n=1}^{N_v(l^*)}\Phi_n^{(l^*)}B^{n\,l^*}\right)$$

for the product of truncated filters over all directions other than $l$,

$$\mathrm{Cov}\left(\left[\Pi_{-l}\,e^{(l)}(v)\right]\left[\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-n^{*}l)\right],\ \left[\Pi_{-l}\,e^{(l)}(v-i\,l)\right]\left[\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-i\,l-m^{*}l)\right]\right)$$

$$=\ \mathrm{E}\left[\Pi_{-l}\,e^{(l)}(v)\cdot\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-n^{*}l)\cdot\Pi_{-l}\,e^{(l)}(v-i\,l)\cdot\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-i\,l-m^{*}l)\right]$$

is zero for $i>0$, since $\{\Pi_{-l}\,e^{(l)}(v)\}$ is independent of the remaining three variables in the product. Then, according to Theorem 6.4.2 (Brockwell and Davis 1991, 213–214), it has to be that

$$w_l=\mathrm{Var}\left(\Pi_{-l}\,e^{(l)}(v)\right)\,\mathrm{Var}\left(\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-n^{*}l)\right);$$

for the case of the $(p_l+q_l)$-dimensional parameter vector,

$$\mathrm{Var}\left(\Pi_{-l}\,e^{(l)}(v)\right)\,\mathrm{Cov}\left(\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-n^{*}l),\ \Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v-m^{*}l)\right)$$

is the scalar corresponding to the $(n^{*},m^{*})$th parameter pair. In fact, by setting the variables

$$e_l^{(l)}(v)\equiv\Pi_{-l}\,e^{(l)}(v),\qquad \mathrm{Cov}\left(e_l^{(l)}(v),\,e_l^{(l)}(v-n\,l)\right)=0\ \ (n\neq0),\qquad \{\sigma_l^{(l)}\}^{2}\equiv\mathrm{Var}\left(e_l^{(l)}(v)\right),$$

the $l$-directional auto-regressions $\{A_l^{(l)}(v)\}$ and $\{M_l^{(l)}(v)\}$ defined by

$$\left(1-\sum_{n=1}^{p_l}\varphi_n^{(l)}B^{n\,l}\right)A_l^{(l)}(v)=e_l^{(l)}(v)=\left(1+\sum_{m=1}^{q_l}\theta_m^{(l)}B^{m\,l}\right)M_l^{(l)}(v)$$

and the variance matrix

$$\Sigma_l^{(l)}\equiv\mathrm{Var}\left(\left[A_l^{(l)}(v-l),\ \ldots,\ A_l^{(l)}(v-p_l\,l),\ M_l^{(l)}(v-l),\ \ldots,\ M_l^{(l)}(v-q_l\,l)\right]^{\tau}\right),
$$

then the convergence of the relevant random vector (as Div(l) → ∞) is established to the $N\left(\mathbf{0},\,\{\sigma_l^{(l)}\}^{2}\,\Sigma_l^{(l)}\right)$ (it is highlighted again that all the latest definitions are functions of the $(d-1)$ fixated indexes and the sampling set).

For the sake of the next step, it is mandatory to see that, by fixating two points, say $(*1)$ and $(*2)$, over the $(d-1)$ directions, the convergence of the stacked random vector (of the form $\frac{1}{\sqrt{\mathrm{Div}(l)}}\left[\begin{smallmatrix}(*1)\\(*2)\end{smallmatrix}\right]$), as Div(l) → ∞, is established to the

$$N\left(\begin{bmatrix}\mathbf{0}\\\mathbf{0}\end{bmatrix},\ \begin{bmatrix}\{\sigma_l^{(l)}(*1)\}^{2}\,\Sigma_l^{(l)}(*1) & \mathrm{Cov}_l^{(l)}(*1,*2)\\[2pt] \mathrm{Cov}_l^{(l)}(*2,*1) & \{\sigma_l^{(l)}(*2)\}^{2}\,\Sigma_l^{(l)}(*2)\end{bmatrix}\right):$$

the covariance element of the matrix $\mathrm{Cov}_l^{(l)}(*1,*2)$ takes the form

$$(7)\qquad \sum_{j\in\mathbb{Z}}\mathrm{E}\left[\Pi_{-l}\,e^{(l)}(v(*1))\cdot\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v(*1)-n^{*}l)\cdot\Pi_{-l}\,e^{(l)}(v(*2)-j\,l)\cdot\Pi_{-l}\,\{\vartheta^{(l)}(B)\}^{-1}e^{(l)}(v(*2)-m^{*}l-j\,l)\right]$$

with the difference $v(*1)-v(*2)$ being $(*1)-(*2)$ on the $(d-1)$ directions (and it could be 0 on the direction $l$).

For the second step of changing a direction $l_2$ (call the first direction $l_1$ now) — i.e., there are $(d-2)$ fixated indexes, the sum below is with respect to two indexes, and the derivatives are still with respect to the parameters in $l_1$ — the vectors of interest become, with $\Pi_{-l_1,l_2}\equiv\prod_{l^*\in L,\ l^*\neq l_1,l_2}\left(1-\sum_{n=1}^{N_v(l^*)}\Phi_n^{(l^*)}B^{n\,l^*}\right)$,

$$\frac{1}{\sqrt{\mathrm{Div}(l_1)\,\mathrm{Div}(l_2)}}\sum\left[\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)\left(1-\sum_{n=1}^{N_v(l_1)}\Phi_n^{(l_1)}B^{n\,l_1}\right)X(v)\right]\left[\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)\frac{\partial}{\partial\vartheta^{(l_1)}}\left(1-\sum_{n=1}^{N_v(l_1)}\Phi_n^{(l_1)}B^{n\,l_1}\right)X(v)\right],$$

which, roughly speaking as $\mathrm{Div}(l_1)\to\infty$, have become $\frac{1}{\sqrt{\mathrm{Div}(l_2)}}\,N\left(0,\{\sigma_{l_1}^{(l_1)}\}^{2}\,\Sigma_{l_1}^{(l_1)}\right)$ (in the sense of $\frac{1}{\sqrt{\mathrm{Div}(l_2)}}(6)$), with the sum concerning the $l_2$ direction only; these are zero-mean Gaussian with variance matrix

$$(8)\qquad \frac{1}{\mathrm{Div}(l_2)}\left[\sum\{\sigma_{l_1}^{(l_1)}\}^{2}\,\Sigma_{l_1}^{(l_1)}+\sum_{l>0}\left(\mathrm{Cov}_{l_1}^{(l_1)}(l)+\mathrm{Cov}_{l_1}^{(l_1)}(l)^{\tau}\right)\right],$$

where the scalar ‘lag’ $l$ corresponds to the $l_2$ direction only and the sum with respect to $l$ runs over all possible lags that can be achieved in the case: the two sums without an index in (8) correspond to all points of the sampling set (on the fixated $(d-2)$ direction indexes) and over the $l_2$ direction that exhibit a pair also in the sampling set, of ‘lag’ zero (i.e., with itself, for the first sum) or lag $l$ (for the second sum). The covariance elements in $\mathrm{Cov}_{l_1}^{(l_1)}(l)$, $l\neq0$, become

$$(9)\qquad \sum_{j\in\mathbb{Z}}\mathrm{E}\left[\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)e^{(l_1)}(v)\cdot\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1)}(v-n^{*}l_1)\right.$$
$$\left.\cdot\ \Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)e^{(l_1)}(v-l\,l_2-j\,l_1)\cdot\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1)}(v-l\,l_2-m^{*}l_1-j\,l_1)\right]$$

according to (7).

Next it is revealing to write, for any $v\in S$, the key equality

$$(10)\qquad \left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)e^{(l_1)}(v)-e^{(l_1,l_2)}(v)=\sum_{n=N_v(l_2)+1}^{\infty}\Phi_n^{(l_2)}\,e^{(l_1)}(v-n\,l_2).$$
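The identity (10) is immediate under the assumption, suggested by the notation, that $e^{(l_1,l_2)}(v)$ is the fully $l_2$-filtered error, i.e., $e^{(l_1,l_2)}(v)=\left(1-\sum_{n=1}^{\infty}\Phi_n^{(l_2)}B^{n\,l_2}\right)e^{(l_1)}(v)$: subtracting the full filter from its truncated version leaves exactly the tail of the series,

$$\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)e^{(l_1)}(v)-\left(1-\sum_{n=1}^{\infty}\Phi_n^{(l_2)}B^{n\,l_2}\right)e^{(l_1)}(v)=\sum_{n=N_v(l_2)+1}^{\infty}\Phi_n^{(l_2)}\,e^{(l_1)}(v-n\,l_2),$$

and the geometric bounds $C_i\,\alpha_i^{N_v(l_2)}$ below then follow from the (assumed) exponential decay of the coefficients $\Phi_n^{(l_2)}$.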

Then straight from (10), it can be derived, for some constants $C_1,C_2,C_3>0$ and $\alpha_1,\alpha_2,\alpha_3\in(0,1)$, that

$$\left|\{\sigma_{l_1}^{(l_1)}\}^{2}-\mathrm{Var}\left(\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v)\right)\right|\leq C_1\,\alpha_1^{N_v(l_2)},$$

and

$$\left|\mathrm{E}\left[\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1)}(v-n^{*}l_1)\cdot\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1)}(v-m^{*}l_1)\right]\right.$$
$$\left.-\ \mathrm{E}\left[\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-n^{*}l_1)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-m^{*}l_1)\right]\right|\leq C_2\,\alpha_2^{N_v(l_2)},$$

as well as

$$\left|(9)-\sum_{j\in\mathbb{Z}}\mathrm{E}\left[\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-n^{*}l_1)\cdot\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v-l\,l_2-j\,l_1)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-l\,l_2-m^{*}l_1-j\,l_1)\right]\right|\leq C_3\,\alpha_3^{N_v(l_2)}.$$

In addition to (3), the independence of the two variables is a fact, which implies that, for any $l>0$ and $j\in\mathbb{Z}$,

$$0=\mathrm{E}\left[\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-n^{*}l_1)\cdot\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v-l\,l_2-j\,l_1)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-l\,l_2-m^{*}l_1-j\,l_1)\right],$$

since $e^{(l_1,l_2)}(v)$ is independent of all variables in $\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-n^{*}l_1)$, it is independent of $e^{(l_1,l_2)}(v-l\,l_2-j\,l_1)$, and it is also independent of all variables in $\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-l\,l_2-m^{*}l_1-j\,l_1)$; hence the last of the three inequalities can be re-expressed as

$$|(9)|\leq C_3\,\alpha_3^{N_v(l_2)}.$$

From all the above results, it can be concluded that

$$\left|(n^{*},m^{*})\text{th element of (8)}-\frac{1}{\mathrm{Div}(l_2)}\sum\mathrm{Var}\left(\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v)\right)\mathrm{E}\left[\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-n^{*}l_1)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-m^{*}l_1)\right]\right|$$
$$\leq\frac{1}{\mathrm{Div}(l_2)}\sum C_4\,\alpha_4^{N_v(l_2)}\qquad\text{for some }C_4>0,\ \alpha_4\in(0,1);$$

provided that $\lim_{\mathrm{Div}(l_2)\to\infty}\frac{1}{\mathrm{Div}(l_2)}\sum C_4\,\alpha_4^{N_v(l_2)}=0$, then by setting

$$e_{l_1,l_2}^{(l_1,l_2)}(v)\equiv\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v),\qquad \mathrm{Cov}\left(e_{l_1,l_2}^{(l_1,l_2)}(v),\,e_{l_1,l_2}^{(l_1,l_2)}(v-n_1\,l_1-n_2\,l_2)\right)=0\ \ \big((n_1,n_2)\neq(0,0)\big),\qquad \{\sigma_{l_1,l_2}^{(l_1,l_2)}\}^{2}\equiv\mathrm{Var}\left(e_{l_1,l_2}^{(l_1,l_2)}(v)\right),$$

as well as the $l_1$-directional auto-regressions $\{A_{l_1,l_2}^{(l_1)}(v)\}$ and $\{M_{l_1,l_2}^{(l_1)}(v)\}$ defined by

$$\left(1-\sum_{n=1}^{p_{l_1}}\varphi_n^{(l_1)}B^{n\,l_1}\right)A_{l_1,l_2}^{(l_1)}(v)=e_{l_1,l_2}^{(l_1,l_2)}(v)=\left(1+\sum_{m=1}^{q_{l_1}}\theta_m^{(l_1)}B^{m\,l_1}\right)M_{l_1,l_2}^{(l_1)}(v)$$

together with the variance matrix

$$\Sigma_{l_1,l_2}^{(l_1)}\equiv\mathrm{Var}\left(\left[A_{l_1,l_2}^{(l_1)}(v-l_1),\ \ldots,\ A_{l_1,l_2}^{(l_1)}(v-p_{l_1}l_1),\ M_{l_1,l_2}^{(l_1)}(v-l_1),\ \ldots,\ M_{l_1,l_2}^{(l_1)}(v-q_{l_1}l_1)\right]^{\tau}\right),$$

then the convergence (as $\mathrm{Div}(l_2)\to\infty$, with $\mathrm{Div}(l_2)$ and the number of terms being asymptotically equal) is to the $N\left(0,\{\sigma_{l_1,l_2}^{(l_1,l_2)}\}^{2}\,\Sigma_{l_1,l_2}^{(l_1)}\right)$.

This is the ideal point to reverse the order of the two primary directions in the derivative, and write in the same spirit that

$$\frac{1}{\sqrt{\mathrm{Div}(l_2)\,\mathrm{Div}(l_1)}}\sum\left[\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_1)}\Phi_n^{(l_1)}B^{n\,l_1}\right)\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)X(v)\right]\left[\Pi_{-l_1,l_2}\left(1-\sum_{n=1}^{N_v(l_1)}\Phi_n^{(l_1)}B^{n\,l_1}\right)\frac{\partial}{\partial\vartheta^{(l_2)}}\left(1-\sum_{n=1}^{N_v(l_2)}\Phi_n^{(l_2)}B^{n\,l_2}\right)X(v)\right]$$
$$\overset{D}{=}\ \frac{1}{\sqrt{\mathrm{Div}(l_2)\,\mathrm{Div}(l_1)}}\sum\left[\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v)\right]\left[\Pi_{-l_1,l_2}\,\{\vartheta^{(l_2)}(B)\}^{-1}e^{(l_1,l_2)}(v-m^{*}l_2)\right],$$

which are altogether $N\left(0,\{\sigma_{l_1,l_2}^{(l_1,l_2)}\}^{2}\,\Sigma_{l_1,l_2}^{(l_2)}\right)$. In fact, since

$$\mathrm{E}\left[\left(\Pi_{-l_1,l_2}\,e^{(l_1,l_2)}(v)\right)^{2}\,\Pi_{-l_1,l_2}\,\{\vartheta^{(l_1)}(B)\}^{-1}e^{(l_1,l_2)}(v-n^{*}l_1)\cdot\Pi_{-l_1,l_2}\,\{\vartheta^{(l_2)}(B)\}^{-1}e^{(l_1,l_2)}(v-m^{*}l_2)\right]$$
$$=\{\sigma_{l_1,l_2}^{(l_1,l_2)}\}^{2}\ \mathrm{E}\left[\{\vartheta^{(l_1)}(B)\}^{-1}e_{l_1,l_2}^{(l_1,l_2)}(v-n^{*}l_1)\right]\mathrm{E}\left[\{\vartheta^{(l_2)}(B)\}^{-1}e_{l_1,l_2}^{(l_1,l_2)}(v-m^{*}l_2)\right]=0,$$

the two random vectors for the two sets of parameters $\vartheta^{(l_1)},\vartheta^{(l_2)}$ are asymptotically uncorrelated, i.e., independent (as $\mathrm{Div}(l_1),\mathrm{Div}(l_2)\to\infty$).

Appendix B: Simulations

This segment investigates the elements of the theory empirically.

All comparisons will take place for factorized models only: if a factorized model were compared to a standard one, how would any conclusion be drawn, when the two do not share the same properties? Nevertheless, the models studied in Section 3 have established merits, provided that up to d ‘independent’ straight lines ‘carry’ the dependence: that is a heavy restriction imposed on the standard parameterization. So, to achieve more realistic covariance properties and remain in the factorized-model department, more than d lines may be attempted to express a $\mathbb{Z}^d$ dependence. Regarding the edge-effect, a factorized model that is more flexible towards any type of dependence, with more than d lines to be estimated, is expected to mirror its persistence ((3), (4) will not hold), as opposed to the Section 3 skeleton model.

Two equations have been singled out, i.e., the first one being

$$(11)\qquad (1-0.4\,B_1)(1-0.8\,B_2)(1-\varphi\,B_1B_2)\,X(u,v)=e(u,v)+\theta\,e(u-1,v-1)$$

and the second one being

$$(12)\qquad (1-0.8\,B_2)(1-\varphi\,B_1B_2)\,X(u,v)=e(u,v)+\theta\,e(u-1,v-1),$$

both with unknown parameters $\varphi,\theta\in(-1,1)$, $\varphi+\theta\neq0$, and independent error variables $e(u,v)\sim N(0,1)$ (the ‘loose’ notation of Section 2, not 3, repeats here: $B_1^{l_1}B_2^{l_2}$ instead of $(B_1,B_2)^{(l_1,l_2)}$).

Clearly, the variables are $\mathbb{Z}^2$-indexed (for both (11) and (12)), with the addition of the AR factor $(1-0.4\,B_1)$ from (12) to (11). According to Section 3, up to two different straight-line directions of parameters can render the desirable estimation results, while a third direction will mishandle the situation. Then (12) is undoubtedly expected to perform optimally; for (11), and while only $\varphi,\theta$ (over the (1, 1) direction) are the parameters of interest, three different cases have materialized: (i) all parameters (including the 0.8, 0.4) are unknown (referred to as ‘(11) with 2 unkn’), (ii) 0.8 (but not 0.4) is unknown (referred to as ‘(11) with 1 unkn’), and (iii) both 0.8 and 0.4 are known (referred to as ‘(11)’). Hence case (i) is the most adequate to expose the edge-effect, with parameters to be searched not just towards the (1, 1) but also the (1, 0) and (0, 1) directions. Moreover, cases (ii) and (iii), still using three directions to extend the dependence in $\mathbb{Z}^2$, are worth attending to, as they can witness what happens when knowledge or consistent estimation can be taken advantage of for redundant or essential directions.

To generate observations, model (11) yields the MA(∞) representation

$$X(u,v)=e(u,v)+\sum_{j_1=1}^{\infty}0.4^{j_1}e(u-j_1,v)+\sum_{j_2=1}^{\infty}0.8^{j_2}e(u,v-j_2)+\sum_{j_{1,2}=1}^{\infty}\varphi^{j_{1,2}}e(u-j_{1,2},v-j_{1,2})$$
$$+\sum_{j_1,j_2=1}^{\infty}\left(0.4^{j_1}0.8^{j_2}+\theta\,0.4^{j_1-1}0.8^{j_2-1}\right)e(u-j_1,v-j_2)+\sum_{j_1,j_{1,2}=1}^{\infty}0.4^{j_1}\varphi^{j_{1,2}}e(u-j_1-j_{1,2},v-j_{1,2})$$
$$+\sum_{j_2,j_{1,2}=1}^{\infty}0.8^{j_2}\varphi^{j_{1,2}}e(u-j_{1,2},v-j_2-j_{1,2})+\sum_{j_1,j_2,j_{1,2}=1}^{\infty}\left(0.4^{j_1}0.8^{j_2}+\theta\,0.4^{j_1-1}0.8^{j_2-1}\right)\varphi^{j_{1,2}}e(u-j_1-j_{1,2},v-j_2-j_{1,2}),$$

while the relevant representation for model (12) is

$$X(u,v)=e(u,v)+\sum_{j_2=1}^{\infty}0.8^{j_2}e(u,v-j_2)+\sum_{j_2=0}^{\infty}\sum_{j=1}^{\infty}0.8^{j_2}\left(\varphi^{j}+\theta\,\varphi^{j-1}\right)e(u-j,v-j-j_2).$$

Observations for (11) (all three cases) have been generated on a square (according to the basis {(0, 1), (1, 0)}) of edge n = 5, 10, 15. For (12), the (n × n) observations have been generated not only on the square but also on a rhombus (in the sense of an equal number of points per side rather than an equal length of it) adjusted to the basis {(0, 1), (1, 1)}; so there will be two separate cases, referred to as ‘(12) orth’ and ‘(12) adj’, respectively. That adjustment of the sampling set is tailor-made on the basis of (12) and is expected to bring out the best results; a simulation sketch follows below.
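To make the generation step concrete, here is a minimal Python sketch (not the author's code): it draws X at arbitrary integer sites from the MA(∞) representation of (12), truncating the infinite sums at a hypothetical lag J — the truncation point, the function name and the seed handling are all assumptions made for illustration.

```python
import numpy as np

def simulate_model_12(sites, phi, theta, J=60, seed=0):
    """Draw X at the given integer sites from model (12) via its MA(infinity)
    representation, truncating the infinite sums at lag J (an assumption;
    the text keeps the full sums)."""
    rng = np.random.default_rng(seed)
    us = [u for u, v in sites]
    vs = [v for u, v in sites]
    # pre-draw innovations e(u, v) ~ N(0, 1) on a box large enough to cover
    # every truncated term e(u - j, v - j - j2) with j, j2 <= J
    u0, v0 = min(us) - J, min(vs) - 2 * J
    e = rng.standard_normal((max(us) - u0 + 1, max(vs) - v0 + 1))
    E = lambda u, v: e[u - u0, v - v0]
    X = {}
    for (u, v) in sites:
        x = E(u, v) + sum(0.8 ** j2 * E(u, v - j2) for j2 in range(1, J + 1))
        x += sum(0.8 ** j2 * (phi ** j + theta * phi ** (j - 1)) * E(u - j, v - j - j2)
                 for j2 in range(J + 1) for j in range(1, J + 1))
        X[(u, v)] = x
    return X

n = 10
square = [(u, v) for u in range(n) for v in range(n)]       # '(12) orth'
rhombus = [(i, i + k) for i in range(n) for k in range(n)]  # '(12) adj'
X_orth = simulate_model_12(square, phi=-0.8, theta=-0.7)
X_adj = simulate_model_12(rhombus, phi=-0.8, theta=-0.7)
```

The rhombus points {i(1, 1) + k(0, 1)} realize the ‘adjusted’ sampling set with an equal number of points per side.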

In practice, some real values of $\varphi,\theta$ have been taken to generate the observations on the parallelograms: the two sets of values that have been tried are $\varphi=-0.8$, $\theta=-0.7$ and $\varphi=-0.3$, $\theta=0.8$ (see Tables 1 and 2, respectively). Nevertheless, for the estimation of the two (or more, if applicable) parameters, a search has taken place over $\varphi,\theta=0,\pm0.05,\ldots,\pm0.90,\pm0.95$: for each assumed value, the innovation for the sample location $(u,v)$ is

$$\mathrm{innov}(u,v)\equiv X_{\mathrm{cor}}(u,v)-\theta\sum_{j=1}^{s(u,v)}(-\theta)^{\,j-1}X_{\mathrm{cor}}(u-j,v-j),$$

with the index $j$ going up to ‘$s(u,v)$’ as allowed from the sample, and the corrected $X$, ‘$X_{\mathrm{cor}}$’, being according to the left-hand side of the equation ((11) or (12), for either model). Then the quantity $Q\equiv\sum_{u,v}(\mathrm{innov}(u,v))^{2}$ has been calculated, so that the standard estimates are those values providing the minimum $Q$ out of all values attempted; a sketch of this grid search follows below.
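A hedged sketch of that grid search, for model (12) with the 0.8 treated as known (the function name, loop bounds and the exact bookkeeping of s(u, v) at edge locations are assumptions for illustration; the paper does not spell them out):

```python
import numpy as np

def standard_estimates(X, grid=np.arange(-0.95, 0.951, 0.05)):
    """Grid search for model (12) with 0.8 known: for each (phi, theta), form
    X_cor = (1 - 0.8 B2)(1 - phi B1 B2) X, build the truncated innovations
    innov(u, v) = sum_{j=0}^{s(u,v)} (-theta)^j X_cor(u-j, v-j), and keep the
    pair minimising Q = sum_{u,v} innov(u, v)^2."""
    n1, n2 = X.shape
    best = (None, None, np.inf)
    for phi in grid:
        # X_cor(u,v) = X(u,v) - 0.8 X(u,v-1) - phi X(u-1,v-1) + 0.8 phi X(u-1,v-2),
        # available for u >= 1, v >= 2
        Xc = np.full((n1, n2), np.nan)
        for u in range(1, n1):
            for v in range(2, n2):
                Xc[u, v] = (X[u, v] - 0.8 * X[u, v - 1]
                            - phi * X[u - 1, v - 1] + 0.8 * phi * X[u - 1, v - 2])
        for theta in grid:
            if abs(phi + theta) < 1e-9:
                continue  # respect the restriction phi + theta != 0
            Q = 0.0
            for u in range(1, n1):
                for v in range(2, n2):
                    s = min(u - 1, v - 2)  # s(u, v): how far the (1,1) lags stay in-sample
                    innov = sum((-theta) ** j * Xc[u - j, v - j] for j in range(s + 1))
                    Q += innov * innov
            if Q < best[2]:
                best = (phi, theta, Q)
    return best  # (phi_hat, theta_hat, Q_min)

# usage with the square sample from the simulation sketch above:
# Xs = np.array([[X_orth[(u, v)] for v in range(n)] for u in range(n)])
# phi_hat, theta_hat, Q_min = standard_estimates(Xs)
```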

Table 1:

Approximate (from 10000 replications – or 1000 replications for ‘(11) with 2 unkn’, n = 10, 15) bias, variance and MSE of the standard estimators based on (n × n) observations from the five different models as in (11) and (12) with real values φ = −0.8, θ = −0.7.

| Model | Bias θ | Bias φ | Variance θ | Variance φ | MSE θ | MSE φ |
|---|---|---|---|---|---|---|
| n = 5 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.45738 | −0.0031 | 0.537365 | 0.0376849 | 0.746561 | 0.0376945 |
| (11) with 1 unkn | 0.41801 | −0.01717 | 0.424342 | 0.0292897 | 0.599075 | 0.0295845 |
| (11) | 0.374585 | −0.02529 | 0.337826 | 0.0244664 | 0.47814 | 0.025106 |
| (12) orth | 0.29782 | −0.035105 | 0.241412 | 0.0185264 | 0.330109 | 0.0197588 |
| (12) adj | 0.17507 | −0.010325 | 0.114069 | 0.0202776 | 0.144719 | 0.0203843 |
| n = 10 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.1259 | −0.03505 | 0.0205992 | 0.005124 | 0.03645 | 0.0063528 |
| (11) with 1 unkn | 0.12839 | −0.034655 | 0.0188575 | 0.00494728 | 0.0353415 | 0.00614825 |
| (11) | 0.128355 | −0.035725 | 0.0184807 | 0.00472997 | 0.0349557 | 0.00600625 |
| (12) orth | 0.124205 | −0.0326 | 0.0155059 | 0.00442174 | 0.0309328 | 0.0054845 |
| (12) adj | 0.074165 | −0.016895 | 0.0106533 | 0.00436781 | 0.0161538 | 0.00465325 |
| n = 15 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.08915 | −0.02645 | 0.00539478 | 0.0021379 | 0.0133425 | 0.0028375 |
| (11) with 1 unkn | 0.09064 | −0.027955 | 0.00538539 | 0.00202927 | 0.013601 | 0.00281075 |
| (11) | 0.09103 | −0.02863 | 0.00529204 | 0.00203482 | 0.0135785 | 0.0028545 |
| (12) orth | 0.08699 | −0.027265 | 0.00482674 | 0.00194037 | 0.012394 | 0.00268375 |
| (12) adj | 0.050135 | −0.01324 | 0.00388423 | 0.0019617 | 0.00639775 | 0.002137 |
Table 2:

Approximate (from 10000 replications – or 1000 replications for ‘(11) with 2 unkn’, n = 10, 15) bias, variance and MSE of the standard estimators based on (n × n) observations from the five different models as in (11) and (12) with real values φ = −0.3, θ = 0.8.

| Model | Bias θ | Bias φ | Variance θ | Variance φ | MSE θ | MSE φ |
|---|---|---|---|---|---|---|
| n = 5 |  |  |  |  |  |  |
| (11) with 2 unkn | −0.80958 | 0.58066 | 0.661755 | 0.183735 | 1.31717 | 0.520901 |
| (11) with 1 unkn | −0.76642 | 0.587945 | 0.565795 | 0.156718 | 1.15319 | 0.502397 |
| (11) | −0.73879 | 0.58582 | 0.478299 | 0.137127 | 1.02411 | 0.480312 |
| (12) orth | −0.723145 | 0.590555 | 0.397152 | 0.10714 | 0.920091 | 0.455895 |
| (12) adj | −0.62876 | 0.535845 | 0.34862 | 0.10895 | 0.743959 | 0.39608 |
| n = 10 |  |  |  |  |  |  |
| (11) with 2 unkn | −0.52715 | 0.4923 | 0.10711 | 0.0399257 | 0.384997 | 0.282285 |
| (11) with 1 unkn | −0.529475 | 0.48831 | 0.104446 | 0.0369543 | 0.38479 | 0.275401 |
| (11) | −0.52895 | 0.486345 | 0.0973979 | 0.0350028 | 0.377186 | 0.271534 |
| (12) orth | −0.508945 | 0.47463 | 0.0851692 | 0.0315449 | 0.344194 | 0.256818 |
| (12) adj | −0.35135 | 0.35298 | 0.0776337 | 0.0393431 | 0.201081 | 0.163938 |
| n = 15 |  |  |  |  |  |  |
| (11) with 2 unkn | −0.3925 | 0.3898 | 0.0317838 | 0.0152221 | 0.18584 | 0.167165 |
| (11) with 1 unkn | −0.39287 | 0.38608 | 0.0354852 | 0.0169952 | 0.189832 | 0.166053 |
| (11) | −0.38886 | 0.38263 | 0.0336204 | 0.0168673 | 0.184833 | 0.163273 |
| (12) orth | −0.37922 | 0.376845 | 0.0308072 | 0.0155341 | 0.174615 | 0.157546 |
| (12) adj | −0.22972 | 0.250905 | 0.0199237 | 0.0144149 | 0.072695 | 0.0773683 |

The reader may now skim through Table 1, with the actual parameters being φ = −0.8 and θ = −0.7. A first observation is that, for the smaller sample sizes, the results for φ are better than for θ, since the latter is a moving-average parameter responsible for the AR(∞) representation that is ‘missed’ from a finite sample. Especially for the larger sample sizes n = 10, 15, the three cases of (11) do not seem to exhibit significant differences such that one is always better than the others. Similarly, there are many instances when ‘(11)’ provides almost identical results to ‘(12) orth’, which can be attributed to the fact that the two models almost coincide, since the 0.8 (for both) and the 0.4 (for (11) only) are taken to be already known. It might be concluded from Table 1 that ‘(12) adj’ systematically offers the best performance, especially for θ and based on all of the (approximate) bias, variance and MSE (mean squared error) of the estimators; another remark for the same model is that, as n becomes larger, it balances the performance of the θ and φ estimators with equivalent results. Regarding the estimators’ bias (and the edge-effect), it is mainly for n = 5 that the θ estimator can be declared more biased for the three-directional ‘(11) with 2 unkn’: more evidence is acquired later in this appendix.

The picture read from Table 2 is far from similar to that of Table 1. Model ‘(12) adj’ is still the most prevalent in terms of bias, variance and MSE for all cases (except the φ-estimator variance for n = 5, 10), but the bias this time seems to be high for both parameters, for all models and different sizes n: also note how all three models ‘(11)…’ are exposed with an excessive – larger than one – θ MSE when there are around 25 observations. The bias in Table 2 does reduce as n increases, but not in the sense of a changing order (as in Table 1) that would allow one model to conquer. Nevertheless, it might be observed that, eventually for n = 15, the bias for ‘(12) adj’ almost equalizes half the bias for ‘(11) with 2 unkn’, which is almost always (i.e., for n = 5, 10, 15) the case in Table 1. The difficulties deduced for Table 2 (as opposed to Table 1) might be attributed here to a θ that is larger in absolute value, or to the positive sign of θ translating into (−θ)^j, j = 1, 2, …, which oscillates deterministically around zero, or to the fact that (φ + θ) is closer to zero than in Table 1.

The first part of the investigation does not suffice for victorious statements regarding the edge-effect or the estimators’ bias. As the edge-effect dispute is over the speed of the bias towards zero (the limit is zero anyway), the further investigation magnifies what happens for the smaller sample sizes. Moreover, as the finite number of AR parameters has easy fixes in $X_{\mathrm{cor}}$, it is expected that the MA estimators’ bias should reflect more the deviation from the ideal. The focus now falls on ‘(11) with 2 unkn’ and ‘(11) with 1 unkn’ only, which share identical covariance properties: the dependence extends over the three directions (1, 0), (1, 1) and (0, 1), but the second case knows the (1, 0)-direction parameters, so that estimation is necessary over two lines in $\mathbb{Z}^2$ only.

According to Table 3, the case of 9 observations always nails the θ estimate at the highest search value (0.95) (also affecting the φ estimate): the (3 × 3) square fails in estimating θ over the (1, 1) direction due to lack of information (after taking care of $X_{\mathrm{cor}}$). A ray of light is definite for the parameter case φ = −0.8, θ = −0.7, when for n = 4 (and n = 5) there is a significant difference in the θ (only) and φ bias favouring ‘(11) with 1 unkn’: that is a promising sign in defense of the theoretical results, as this model estimates over a basis in $\mathbb{Z}^2$ with two directions, while ‘(11) with 2 unkn’ has included a third direction, the source of the edge-effect. The parameter values φ = −0.3, θ = 0.8 also achieve a higher θ absolute bias (of ‘2 unkn’ over ‘1 unkn’) for n = 4 but mainly for n = 5. Then, for more than 30 observations (n = 6), the two models perform evenly (for both parameter settings).

Table 3:

Approximate (from 10000 replications) bias, variance and MSE of the standard estimators based on (n × n) observations (on a square) from the two larger variants of model (11) (using the same seed), for two sets of real values of φ and θ.

| Model (φ = −0.8, θ = −0.7) | Bias θ | Bias φ | Variance θ | Variance φ | MSE θ | MSE φ |
|---|---|---|---|---|---|---|
| n = 3 |  |  |  |  |  |  |
| (11) with 2 unkn | 1.65 | 0.55626 | <0.0001 | 0.320996 | 2.7225 | 0.630422 |
| (11) with 1 unkn | 1.65 | 0.424655 | <0.0001 | 0.358409 | 2.7225 | 0.538741 |
| n = 4 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.787575 | 0.129515 | 0.766174 | 0.152626 | 1.38645 | 0.1694 |
| (11) with 1 unkn | 0.682555 | 0.052505 | 0.694089 | 0.0911215 | 1.15997 | 0.0938782 |
| n = 5 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.465385 | −0.003065 | 0.539252 | 0.0371669 | 0.755835 | 0.0371763 |
| (11) with 1 unkn | 0.410795 | −0.0168 | 0.419662 | 0.0295793 | 0.588414 | 0.0298615 |
| n = 6 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.26878 | −0.027465 | 0.251961 | 0.0189854 | 0.324204 | 0.0197398 |
| (11) with 1 unkn | 0.260825 | −0.031985 | 0.198479 | 0.0165927 | 0.266509 | 0.0176157 |

| Model (φ = −0.3, θ = 0.8) | Bias θ | Bias φ | Variance θ | Variance φ | MSE θ | MSE φ |
|---|---|---|---|---|---|---|
| n = 3 |  |  |  |  |  |  |
| (11) with 2 unkn | 0.15 | 0.394565 | <0.0001 | 0.33586 | 0.0225 | 0.491542 |
| (11) with 1 unkn | 0.15 | 0.399655 | <0.0001 | 0.417397 | 0.0225 | 0.577121 |
| n = 4 |  |  |  |  |  |  |
| (11) with 2 unkn | −0.8423 | 0.516 | 0.802983 | 0.336075 | 1.51245 | 0.60233 |
| (11) with 1 unkn | −0.823785 | 0.550445 | 0.735621 | 0.299234 | 1.41424 | 0.602223 |
| n = 5 |  |  |  |  |  |  |
| (11) with 2 unkn | −0.81028 | 0.57742 | 0.663936 | 0.180371 | 1.32049 | 0.513784 |
| (11) with 1 unkn | −0.761765 | 0.584175 | 0.561225 | 0.150981 | 1.14151 | 0.492242 |
| n = 6 |  |  |  |  |  |  |
| (11) with 2 unkn | −0.71586 | 0.578935 | 0.450568 | 0.11028 | 0.963024 | 0.445445 |
| (11) with 1 unkn | −0.70413 | 0.58191 | 0.383069 | 0.0995153 | 0.878868 | 0.438135 |

Hence the common anomaly of the edge-effect (for estimation over $\mathbb{Z}^2$) has been sought here via an oversized factorized model (one that uses three directions). The scenario where one decides on a number of directions higher than the d of the index set is strictly forbidden in Section 3 for the validity of the estimation results. Consequently, the inflated estimators’ bias for ‘(11) with 2 unkn’ in Table 3, where applicable, is evidence for the theory. To verify the theory with more safety in the future, it is advisable to dare more than one extra line, or to include moving-average parameters in most directions. The examination of $\mathbb{Z}^d$, d ≥ 3 processes might reveal a more grotesque presence of the edge-effect too. Anything that might scar the estimation process must keep the researcher alert.

References

Brockwell, P. J., and R. A. Davis. 1991. Time Series: Theory and Methods, 2nd ed. New York: Springer-Verlag. https://doi.org/10.1007/978-1-4419-0320-4.

Dahlhaus, R., and H. R. Künsch. 1987. “Edge-Effects and Efficient Parameter Estimation for Stationary Random Fields.” Biometrika 74 (4): 877–82. https://doi.org/10.1093/biomet/74.4.877.

Dimitriou-Fakalou, C. 2019. “On Accepting the Edge-Effect (For the Inference of ARMA-type Processes in $\mathbb{Z}^2$).” Econometrics and Statistics 10 (Part A): 53–70. https://doi.org/10.1016/j.ecosta.2018.03.001.

Fernández-Casal, R., and R. M. Crujeiras. 2010. “Spatial Dependence Estimation Using FFT of Biased Covariances.” Journal of Statistical Planning and Inference 140 (9): 2653–68. https://doi.org/10.1016/j.jspi.2010.03.032.

Gupta, A. 2018. “Autoregressive Spatial Spectral Estimates.” Journal of Econometrics 203 (1): 80–95. https://doi.org/10.1016/j.jeconom.2017.10.006.

Guyon, X. 1982. “Parameter Estimation for a Stationary Process on a d-Dimensional Lattice.” Biometrika 69 (1): 95–105. https://doi.org/10.1093/biomet/69.1.95.

Hannan, E. J. 1973. “The Asymptotic Theory of Linear Time Series Models.” Journal of Applied Probability 10 (1): 130–45. https://doi.org/10.1017/s0021900200042145.

Martin, R. J. 1979. “A Subclass of Lattice Processes Applied to a Problem in Planar Sampling.” Biometrika 66 (2): 209–217. https://doi.org/10.1093/biomet/66.2.209.

Robinson, P. M., and J. M. Vidal-Sanz. 2006. “Modified Whittle Estimation of Multilateral Models on a Lattice.” Journal of Multivariate Analysis 97 (5): 1090–1120. https://doi.org/10.1016/j.jmva.2005.05.013.

Sakhno, L. 2007. “Bias Control in the Estimation of Spectral Functionals.” Theory of Stochastic Processes 13 (1–2): 225–233.

Vidal-Sanz, J. M. 2009. “Automatic Spectral Density Estimation for Random Fields on a Lattice via Bootstrap.” Test 18 (1): 96–114. https://doi.org/10.1007/s11749-007-0059-5.

Whittle, P. 1954. “On Stationary Processes in the Plane.” Biometrika 41 (3/4): 434–449. https://doi.org/10.2307/2332724.

Yao, Q., and P. J. Brockwell. 2006. “Gaussian Maximum Likelihood Estimation for ARMA Models II: Spatial Processes.” Bernoulli 12 (3): 403–429. https://doi.org/10.3150/bj/1151525128.

Received: 2019-11-26
Revised: 2021-03-21
Accepted: 2021-04-12
Published Online: 2021-05-18

© 2022 Walter de Gruyter GmbH, Berlin/Boston
