Factor Modeling for High-Dimensional Interval-Valued Data

Yan Guo, Guchu Zou and Jianhong Wu
Published/Copyright: February 4, 2025

Abstract

The paper considers an approximate factor model for interval-valued panel data with both large numbers of cross-section units and time series observations. A ratio-type estimator is proposed for the number of interval-valued factors in the approximate factor model. A variant of the estimator is also suggested, which is robust to the case with dominant factors. Under certain conditions, the estimators can be proved to be consistent. Moreover, the estimators of interval-valued factors and the pooled loadings can be obtained by the principal component analysis method for point-valued data. Monte Carlo simulation studies show that the proposed estimators have the desired finite sample properties.

JEL Classification: C01; C13

Corresponding author: Jianhong Wu, College of Mathematics and Science, Shanghai Normal University, Shanghai 200234, China; and Lab for Educational Big Data and Policymaking, Ministry of Education, Shanghai 200234, China, E-mail: 

Yan Guo and Guchu Zou contributed equally to this work.


Acknowledgments

We are deeply grateful to Professor Jeremy Piger and two anonymous referees for valuable comments that led to substantial improvement of this paper.

  1. Research funding: This research is supported in part by the National Natural Science Foundation of China (Grant No. 72173086).

Appendix: Technical Details

Proof of Theorem 1.

Under Assumptions 1–3, it follows from Lemma A.11 of Ahn and Horenstein (2013) that, for any $k \le r$, $\tilde{\mu}^{L}_{NT,k} = O_p(1)$ and $\tilde{\mu}^{R}_{NT,k} = O_p(1)$. Thus, we have $\omega \tilde{\mu}^{L}_{NT,k} = O_p(1)$ and $(1-\omega)\tilde{\mu}^{R}_{NT,k} = O_p(1)$ for $\omega \in [0,1]$ and $k = 1, 2, \ldots, r$. Accordingly, for any weight $0 \le \omega \le 1$,

$$
\frac{\omega \tilde{\mu}^{L}_{NT,k} + (1-\omega)\tilde{\mu}^{R}_{NT,k}}{\omega \tilde{\mu}^{L}_{NT,k+1} + (1-\omega)\tilde{\mu}^{R}_{NT,k+1}} = O_p(1), \qquad k = 1, 2, \ldots, r-1.
$$

Also, it follows from Lemma A.9 of Ahn and Horenstein (2013) that, for any $r+1 \le k \le [d^{c} m] - r$, $\tilde{\mu}^{L}_{NT,k} = O_p(1/m)$ and $\tilde{\mu}^{R}_{NT,k} = O_p(1/m)$. Then, we have $\omega \tilde{\mu}^{L}_{NT,k} = O_p(1/m)$ and $(1-\omega)\tilde{\mu}^{R}_{NT,k} = O_p(1/m)$ for any $r+1 \le k \le [d^{c} m] - r$ and $\omega \in [0,1]$. Accordingly,

$$
\frac{\omega \tilde{\mu}^{L}_{NT,k} + (1-\omega)\tilde{\mu}^{R}_{NT,k}}{\omega \tilde{\mu}^{L}_{NT,k+1} + (1-\omega)\tilde{\mu}^{R}_{NT,k+1}} = O_p(1), \qquad k = r+1, r+2, \ldots, r_{\max}.
$$

However, when k = r,

$$
\frac{\omega \tilde{\mu}^{L}_{NT,r} + (1-\omega)\tilde{\mu}^{R}_{NT,r}}{\omega \tilde{\mu}^{L}_{NT,r+1} + (1-\omega)\tilde{\mu}^{R}_{NT,r+1}} = O_p(m).
$$

It then follows that

$$
\lim_{m \to \infty} \Pr(\hat{r}_{\omega} = r) = 1.
$$
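The ratio argument in this proof can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes that $\tilde{\mu}^{L}_{NT,k}$ and $\tilde{\mu}^{R}_{NT,k}$ are the $k$-th largest eigenvalues of $XX'/(NT)$ computed from the lower-bound and upper-bound panels, in the spirit of Ahn and Horenstein (2013), and that $\hat{r}_{\omega}$ maximizes the weighted eigenvalue ratio over $k \le r_{\max}$. The function names, the simulated data-generating process, and the default values are hypothetical.

```python
import numpy as np


def scaled_eigenvalues(x):
    """Eigenvalues of X X' / (N T) in descending order, for a T x N panel X."""
    t, n = x.shape
    # The nonzero eigenvalues of X X' and X' X coincide, so use the smaller matrix.
    gram = x @ x.T if t <= n else x.T @ x
    eig = np.linalg.eigvalsh(gram / (n * t))  # returned in ascending order
    return np.clip(eig[::-1], 0.0, None)


def weighted_ratio_estimator(x_low, x_up, omega=0.5, r_max=8):
    """Assumed form of the estimator: argmax over k of mu*_k / mu*_{k+1}."""
    mu_low = scaled_eigenvalues(x_low)
    mu_up = scaled_eigenvalues(x_up)
    if r_max + 1 > min(len(mu_low), len(mu_up)):
        raise ValueError("r_max + 1 must not exceed min(N, T)")
    # Weighted combination of lower-bound and upper-bound eigenvalues.
    mu_star = omega * mu_low + (1.0 - omega) * mu_up
    ratios = mu_star[:r_max] / mu_star[1:r_max + 1]
    return int(np.argmax(ratios)) + 1, ratios


def simulate_interval_panel(T=100, N=100, r=3, seed=0):
    """Hypothetical DGP: shared loadings, separate lower and upper factors.

    No ordering between the two bounds is enforced in this toy example.
    """
    rng = np.random.default_rng(seed)
    loadings = rng.standard_normal((N, r))
    f_low = rng.standard_normal((T, r))
    f_up = f_low + 0.5 * np.abs(rng.standard_normal((T, r)))
    x_low = f_low @ loadings.T + rng.standard_normal((T, N))
    x_up = f_up @ loadings.T + rng.standard_normal((T, N))
    return x_low, x_up


x_low, x_up = simulate_interval_panel()
r_hat, ratios = weighted_ratio_estimator(x_low, x_up, omega=0.5, r_max=8)
print("estimated number of factors:", r_hat)
print("weighted eigenvalue ratios:", np.round(ratios, 2))
```

In this toy design the ratio at $k = r$ is much larger than the others, which mirrors the $O_p(m)$ versus $O_p(1)$ contrast used in the proof.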

Proof of Theorem 2.

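Let $\tilde{\mu}^{*}_{NT,k}$ be $\omega \tilde{\mu}^{L}_{NT,k} + (1-\omega)\tilde{\mu}^{R}_{NT,k}$. It follows from the proof of Theorem 1 that, under Assumptions 1–2, $\tilde{\mu}^{*}_{NT,k} = O_p(1)$ for $k \le r$ and $\tilde{\mu}^{*}_{NT,k} = O_p(1/m)$ for $k \ge r+1$. Therefore, for $k \le r$,

$$
2\Phi\left(\tilde{\mu}^{*}_{NT,k}\right) - 1 = \int_{0}^{\tilde{\mu}^{*}_{NT,k}} \frac{2}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt \ \ge\ \frac{2}{\sqrt{2\pi}}\, e^{-\left(\tilde{\mu}^{*}_{NT,k}\right)^{2}/2}\, \tilde{\mu}^{*}_{NT,k} = O_p(1).
$$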

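For $r+1 \le k \le [d^{c} m] - r$, we have $2\Phi\left(\tilde{\mu}^{*}_{NT,k}\right) - 1 = O_p(1/m)$, because

$$
\begin{aligned}
2\Phi\left(\tilde{\mu}^{*}_{NT,k}\right) - 1 &= \int_{0}^{\tilde{\mu}^{*}_{NT,k}} \frac{2}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt \ \ge\ \frac{2}{\sqrt{2\pi}}\, e^{-\left(\tilde{\mu}^{*}_{NT,k}\right)^{2}/2}\, \tilde{\mu}^{*}_{NT,k} = O_p\left(\frac{1}{m}\right), \\
2\Phi\left(\tilde{\mu}^{*}_{NT,k}\right) - 1 &= \int_{0}^{\tilde{\mu}^{*}_{NT,k}} \frac{2}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt \ \le\ \frac{2}{\sqrt{2\pi}}\, \tilde{\mu}^{*}_{NT,k} = O_p\left(\frac{1}{m}\right).
\end{aligned}
$$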
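The two bounds used here follow from $e^{-x^{2}/2} \le e^{-t^{2}/2} \le 1$ for $0 \le t \le x$. A small numerical check is sketched below, assuming only the identity $2\Phi(x) - 1 = \operatorname{erf}(x/\sqrt{2})$; the helper names and the grid of test points are hypothetical. It also shows that both bounds, and hence the transformed value, shrink in proportion to $x$ as $x \to 0$.

```python
import math


def two_phi_minus_one(x):
    """2 * Phi(x) - 1 for the standard normal CDF Phi, via the error function."""
    return math.erf(x / math.sqrt(2.0))


def integral_bounds(x):
    """Lower and upper bounds on 2 * Phi(x) - 1 for x >= 0, as in the proof."""
    c = 2.0 / math.sqrt(2.0 * math.pi)
    lower = c * math.exp(-x * x / 2.0) * x  # integrand replaced by its minimum on [0, x]
    upper = c * x                           # integrand replaced by its maximum on [0, x]
    return lower, upper


for x in (2.0, 1.0, 0.1, 0.01, 0.001):
    lower, upper = integral_bounds(x)
    value = two_phi_minus_one(x)
    print(f"x={x:<6} lower={lower:.8f} value={value:.8f} upper={upper:.8f}")
```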

Thus, we have

$$
\frac{2\Phi\left(\tilde{\mu}^{*}_{NT,k}\right) - 1}{2\Phi\left(\tilde{\mu}^{*}_{NT,k+1}\right) - 1} = O_p(1), \qquad k = 1, \ldots, r-1, r+1, \ldots, r_{\max}.
$$

However, when k = r,

$$
\frac{2\Phi\left(\tilde{\mu}^{*}_{NT,r}\right) - 1}{2\Phi\left(\tilde{\mu}^{*}_{NT,r+1}\right) - 1} = O_p(m).
$$

It then follows that

$$
\lim_{m \to \infty} \Pr(\tilde{r}_{\omega} = r) = 1.
$$
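To see why the transformation $2\Phi(\cdot) - 1$ can help when one factor dominates, consider the sketch below. It assumes that $\tilde{r}_{\omega}$ maximizes the ratio of transformed weighted eigenvalues over $k \le r_{\max}$, which is inferred from the proof rather than quoted from a definition. The eigenvalues are made-up illustrative numbers, not results from the paper, and the function names are hypothetical.

```python
import math

import numpy as np


def raw_ratio_estimator(mu_star, r_max):
    """Argmax of mu*_k / mu*_{k+1} over k = 1, ..., r_max."""
    mu = np.asarray(mu_star[:r_max + 1], dtype=float)
    ratios = mu[:-1] / mu[1:]
    return int(np.argmax(ratios)) + 1, ratios


def transformed_ratio_estimator(mu_star, r_max):
    """Same argmax after mapping each eigenvalue through 2 * Phi(x) - 1."""
    g = np.array([math.erf(v / math.sqrt(2.0)) for v in mu_star[:r_max + 1]])
    ratios = g[:-1] / g[1:]
    return int(np.argmax(ratios)) + 1, ratios


# Hypothetical weighted eigenvalues: one dominant factor, two ordinary
# factors, then small idiosyncratic eigenvalues (true r = 3 by construction).
mu_star = [500.0, 1.0, 0.8, 0.02, 0.018, 0.016, 0.015]

r_raw, raw_ratios = raw_ratio_estimator(mu_star, r_max=5)
r_robust, robust_ratios = transformed_ratio_estimator(mu_star, r_max=5)
print("raw ratio estimate:", r_raw)
print("transformed ratio estimate:", r_robust)
```

Because $2\Phi(x) - 1$ is bounded by one, the dominant eigenvalue can no longer inflate the first ratio, while the map is approximately linear near zero, so the jump at $k = r$ is preserved.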

References

Ahn, S. C., and A. R. Horenstein. 2013. “Eigenvalue Ratio Test for the Number of Factors.” Econometrica 81 (3): 1203–27. https://doi.org/10.3982/ECTA8968.

Bai, J., and S. Ng. 2002. “Determining the Number of Factors in Approximate Factor Models.” Econometrica 70 (1): 191–221. https://doi.org/10.1111/1468-0262.00273.

Bai, J. 2003. “Inferential Theory for Factor Models of Large Dimensions.” Econometrica 71 (1): 135–71. https://doi.org/10.1111/1468-0262.00392.

Bai, J., and K. Li. 2016. “Maximum Likelihood Estimation and Inference for Approximate Factor Models of High Dimension.” Review of Economics and Statistics 98 (2): 298–309. https://doi.org/10.1162/rest_a_00519.

Billard, L., and E. Diday. 2000. “Regression Analysis for Interval-Valued Data.” In Data Analysis, Classification, and Related Methods, 369–74. Berlin: Springer. https://doi.org/10.1007/978-3-642-59789-3_58.

Billard, L., and E. Diday. 2002. “Symbolic Regression Analysis.” In Classification, Clustering, and Data Analysis: Recent Advances and Applications, 281–8. Berlin: Springer. https://doi.org/10.1007/978-3-642-56181-8_31.

Billard, L., and E. Diday. 2003. “From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis.” Journal of the American Statistical Association 98 (462): 470–87. https://doi.org/10.1198/016214503000242.

Dias, S., and P. Brito. 2017. “Off the Beaten Track: A New Linear Model for Interval Data.” European Journal of Operational Research 258 (3): 1118–30. https://doi.org/10.1016/j.ejor.2016.09.006.

González-Rivera, G., and W. Lin. 2013. “Constrained Regression for Interval-Valued Data.” Journal of Business and Economic Statistics 31 (4): 473–90. https://doi.org/10.1080/07350015.2013.818004.

Han, A., Y. Hong, and S. Wang. 2012. “Autoregressive Conditional Models for Interval-Valued Time Series Data.” In Proceedings of the 3rd International Conference on Singular Spectrum Analysis and its Applications.

Hukuhara, M. 1967. “Intégration des applications mesurables dont la valeur est un compact convexe.” Funkcialaj Ekvacioj 10 (3): 205–23.

Kaucher, E. 1980. “Interval Analysis in the Extended Interval Space IR.” Computing (Suppl 2): 33–49. https://doi.org/10.1007/978-3-7091-8577-3_3.

Lam, C., and Q. Yao. 2012. “Factor Modeling for High-Dimensional Time Series: Inference for the Number of Factors.” Annals of Statistics 40 (2): 694–726. https://doi.org/10.1214/12-aos970.

Lima Neto, E. A., and F. D. A. De Carvalho. 2008. “Centre and Range Method for Fitting a Linear Regression Model to Symbolic Interval Data.” Computational Statistics and Data Analysis 52 (3): 1500–15. https://doi.org/10.1016/j.csda.2007.04.014.

Lima Neto, E. A., and F. D. A. De Carvalho. 2010. “Constrained Linear Regression Models for Symbolic Interval-Valued Variables.” Computational Statistics and Data Analysis 54 (2): 333–47. https://doi.org/10.1016/j.csda.2009.08.010.

Onatski, A. 2010. “Determining the Number of Factors from Empirical Distribution of Eigenvalues.” Review of Economics and Statistics 92 (4): 1004–16. https://doi.org/10.1162/rest_a_00043.

Rodrigues, P. M., and N. Salish. 2015. “Modeling and Forecasting Interval Time Series with Threshold Models.” Advances in Data Analysis and Classification 9 (1): 41–57. https://doi.org/10.1007/s11634-014-0170-x.

Sun, Y., A. Han, Y. Hong, and S. Wang. 2018. “Threshold Autoregressive Models for Interval-Valued Time Series Data.” Journal of Econometrics 206 (2): 414–46. https://doi.org/10.1016/j.jeconom.2018.06.009.

Sun, Y., X. Zhang, A. T. Wan, and S. Wang. 2022. “Model Averaging for Interval-Valued Data.” European Journal of Operational Research 301 (2): 772–84. https://doi.org/10.1016/j.ejor.2021.11.015.

Wu, J. 2016. “Robust Determination for the Number of Common Factors in the Approximate Factor Models.” Economics Letters 144: 102–6. https://doi.org/10.1016/j.econlet.2016.04.026.

Wu, J. 2018. “Eigenvalue Difference Test for the Number of Common Factors in the Approximate Factor Models.” Economics Letters 169: 63–7. https://doi.org/10.1016/j.econlet.2018.05.009.

Xia, Q., W. Xu, and L. Zhu. 2015. “Consistently Determining the Number of Factors in Multivariate Volatility Modelling.” Statistica Sinica 25 (3): 1025–44. https://doi.org/10.5705/ss.2013.252.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/snde-2024-0019).


Received: 2024-03-13
Accepted: 2024-12-19
Published Online: 2025-02-04

© 2025 Walter de Gruyter GmbH, Berlin/Boston
