Abstract
Virtually every seasonal adjustment software package includes an ensemble of tests for assessing whether a given time series is in fact seasonal and hence a candidate for seasonal adjustment. However, such tests will produce either agreeing or conflicting results, which raises the questions of how to identify the most accurate tests and how to aggregate their results when they conflict. We suggest a novel random forest-based approach to answer these questions. We simulate seasonal and non-seasonal ARIMA processes that are representative of the macroeconomic time series analysed regularly by the Bundesbank. Treating the time series’ seasonal status as a classification problem, we use the p-values of the seasonality tests implemented in the seasonal adjustment software JDemetra+ as predictors to train conditional random forests on the simulated data. We show that this aggregation approach avoids the size distortions of the JDemetra+ tests without sacrificing too much power compared to the most powerful test. We also find that the modified QS and Friedman tests are the most accurate ones in the considered ensemble.
We provide basic information about the six JDemetra+ (JD+) tests against the null hypothesis ($H_0$) of absence of seasonality in a weakly stationary time series $\{z_t\}$ of length $T$ with $\tau$ observations per year. Non-stationary time series are made stationary by applying appropriate orders of (non-seasonal) differencing.
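The differencing step can be illustrated with a minimal sketch. It assumes the differencing order `d` is already known; the helper name `difference` is hypothetical and not part of JD+.

```python
def difference(z, d=1):
    """Apply d rounds of first (non-seasonal) differencing to the sequence z."""
    out = list(z)
    for _ in range(d):
        # Each round maps z_t to z_t - z_{t-1} and shortens the series by one.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# A quadratic trend becomes constant after two rounds of differencing.
squares = [t ** 2 for t in range(1, 6)]  # 1, 4, 9, 16, 25
print(difference(squares, d=2))  # prints [2, 2, 2]
```

Each round of differencing costs one observation, so the tests are applied to a series of length $T - d$ in this sketch.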
Modified QS Test
The modified QS test checks $\{z_t\}$ for significant positive autocorrelation at seasonal lags. Let
where
Friedman Test
The Friedman (FD) test checks for significant differences between the period-specific mean ranks of the values of $\{z_t\}$. Let $r_{ij}$ be the within-year rank of the observation in the $i$th period of the $j$th year, so that $1 \le r_{ij} \le \tau$, and
Kruskal–Wallis Test
The Kruskal–Wallis (KW) test results from replacing the within-year ranks of the FD test with within-span ranks, so that $1 \le r_{ij} \le T$ and the KW-statistic is calculated as a one-way ANOVA without repeated measures. Hence, the same asymptotic null distribution applies (Kruskal and Wallis 1952).
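The difference between within-year and within-span ranks can be made concrete with a small sketch. It uses the textbook chi-square forms of the two statistics, which are an assumption here: the JD+ implementation may differ, for example through tie corrections or a different reference distribution. The function names `friedman_statistic` and `kruskal_wallis_statistic` are hypothetical, and the sketch assumes no tied values, complete years, and a series that starts in the first period of a year.

```python
import math
import random


def friedman_statistic(z, tau):
    """Textbook Friedman statistic from within-year ranks (assumes no ties)."""
    n_years = len(z) // tau  # only complete years are used
    rank_sums = [0.0] * tau
    for j in range(n_years):
        year = z[j * tau:(j + 1) * tau]
        order = sorted(range(tau), key=lambda i: year[i])
        for rank, i in enumerate(order, start=1):  # ranks 1..tau within year j
            rank_sums[i] += rank
    mean_ranks = [s / n_years for s in rank_sums]
    centre = (tau + 1) / 2
    scale = 12 * n_years / (tau * (tau + 1))
    return scale * sum((r - centre) ** 2 for r in mean_ranks)


def kruskal_wallis_statistic(z, tau):
    """Textbook Kruskal-Wallis statistic from within-span ranks (assumes no ties)."""
    T = len(z)
    order = sorted(range(T), key=lambda t: z[t])
    ranks = [0] * T
    for rank, t in enumerate(order, start=1):  # ranks 1..T over the whole span
        ranks[t] = rank
    centre = (T + 1) / 2
    h = 0.0
    for i in range(tau):
        period_ranks = ranks[i::tau]  # all observations belonging to period i
        mean_rank = sum(period_ranks) / len(period_ranks)
        h += len(period_ranks) * (mean_rank - centre) ** 2
    return 12 / (T * (T + 1)) * h


# Simulated monthly data: a sine-shaped seasonal pattern plus Gaussian noise,
# and pure noise for comparison (illustrative parameters, not from the paper).
random.seed(0)
tau, n_years = 12, 10
seasonal = [3 * math.sin(2 * math.pi * (t % tau) / tau) + random.gauss(0, 1)
            for t in range(tau * n_years)]
noise = [random.gauss(0, 1) for _ in range(tau * n_years)]

for name, series in [("seasonal", seasonal), ("noise", noise)]:
    print(name,
          round(friedman_statistic(series, tau), 2),
          round(kruskal_wallis_statistic(series, tau), 2))
```

In this textbook form, both statistics are compared with a chi-square distribution with $\tau - 1$ degrees of freedom under $H_0$, so large values point towards seasonality.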
Periodogram Test
The periodogram (PD) test checks if a weighted sum of the spectral density $f(\omega) = (2\pi)^{-1} \sum_h \gamma(h) e^{-ih\omega}$ evaluated at the seasonal frequencies
Seasonal Peaks Test
The seasonal peaks (SP) test checks if $f(\omega)$ displays visually significant peaks within
Seasonal Dummies Test
Dropping the stationarity assumption on $\{z_t\}$, the seasonal dummies (SD) test checks if the joint effects of the $\tau - 1$ seasonal dummies, denoted by
References
Almomani, A., B. Gupta, S. Atawneh, A. Meulenberg, and E. Almomani. 2013. “A Survey of Phishing Email Filtering Techniques.” IEEE Communications Surveys & Tutorials 15 (4): 2070–90. https://doi.org/10.1109/surv.2013.030713.00020.
Bayer, C., and C. Hanck. 2013. “Combining Non-cointegration Tests.” Journal of Time Series Analysis 34 (1): 83–95. https://doi.org/10.1111/j.1467-9892.2012.00814.x.
Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–40. https://doi.org/10.1007/bf00058655.
Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/a:1010933404324.
Briët, O. J., P. H. Amerasinghe, and P. Vounatsou. 2013. “Generalized Seasonal Autoregressive Integrated Moving Average Models for Count Data with Application to Malaria Time Series with Low Case Numbers.” PLoS One 8 (6): 1–9. https://doi.org/10.1371/journal.pone.0065761.
Bühlmann, P., and B. Yu. 2002. “Analyzing Bagging.” Annals of Statistics 30 (4): 927–61. https://doi.org/10.1214/aos/1031689014.
Busetti, F., and A. C. Harvey. 2003. “Seasonality Tests.” Journal of Business & Economic Statistics 21 (3): 420–36. https://doi.org/10.1198/073500103288619061.
Cario, M. C., and B. L. Nelson. 1997. Modeling and Generating Random Vectors with Arbitrary Marginal Distributions and Correlation Matrix. Technical report. Evanston: Department of Industrial Engineering and Management Sciences, Northwestern University.
Díaz-Uriarte, R., and S. A. de Andrés. 2006. “Gene Selection and Classification of Microarray Data Using Random Forest.” BMC Bioinformatics 7. Article 3. https://doi.org/10.1186/1471-2105-7-3.
Findley, D. F., B. C. Monsell, W. R. Bell, M. C. Otto, and B.-C. Chen. 1998. “New Capabilities and Methods of the X-12-ARIMA Seasonal-Adjustment Program.” Journal of Business & Economic Statistics 16 (2): 127–52. https://doi.org/10.1080/07350015.1998.10524743.
Franses, P. H. 1992. “Testing for Seasonality.” Economics Letters 38 (3): 259–62. https://doi.org/10.1016/0165-1765(92)90067-9.
Friedman, M. 1937. “The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance.” Journal of the American Statistical Association 32 (200): 675–701. https://doi.org/10.1080/01621459.1937.10503522.
Geurts, P., D. Ernst, and L. Wehenkel. 2006. “Extremely Randomized Trees.” Machine Learning 63 (1): 3–42. https://doi.org/10.1007/s10994-006-6226-1.
Ghysels, E., and D. R. Osborn. 2001. The Econometric Analysis of Seasonal Time Series. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139164009.
Gómez, V., and A. Maravall. 2001. “Automatic Modeling Methods for Univariate Series.” In A Course in Time Series Analysis, edited by D. Peña, G. C. Tiao, and R. S. Tsay, 171–201. New York: Wiley. https://doi.org/10.1002/9781118032978.ch7.
Götz, T. B., and K. Hauzenberger. 2021. “Large Mixed-Frequency VARs with a Parsimonious Time-Varying Parameter Structure.” The Econometrics Journal 24 (3): 442–61. https://doi.org/10.1093/ectj/utab001.
Harvey, D. I., S. J. Leybourne, and A. M. R. Taylor. 2009. “Unit Root Testing in Practice: Dealing with Uncertainty over the Trend and Initial Condition.” Econometric Theory 25 (3): 587–636. https://doi.org/10.1017/s026646660809018x.
Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning – Data Mining, Inference, and Prediction, 2nd ed. Heidelberg: Springer. https://doi.org/10.1007/978-0-387-84858-7.
Hothorn, T., K. Hornik, and A. Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational & Graphical Statistics 15 (3): 651–74. https://doi.org/10.1198/106186006x133933.
Hsieh, C.-H., R.-H. Lu, N.-H. Lee, W.-T. Chiu, M.-H. Hsu, and Y.-C. J. Li. 2011. “Novel Solutions for an Old Disease: Diagnosis of Acute Appendicitis with Random Forest, Support Vector Machines, and Artificial Neural Networks.” Surgery 149 (1): 87–93. https://doi.org/10.1016/j.surg.2010.03.023.
Kruskal, W. H., and W. A. Wallis. 1952. “Use of Ranks in One-Criterion Variance Analysis.” Journal of the American Statistical Association 47 (260): 583–621. https://doi.org/10.1080/01621459.1952.10483441.
Maravall, A. 2011. Seasonality Tests and Automatic Model Identification in TRAMO-SEATS. Madrid: Bank of Spain. Mimeo.
Patel, J., S. Shah, P. Thakkar, and K. Kotecha. 2015. “Predicting Stock and Stock Price Index Movement Using Trend Deterministic Data Preparation and Machine Learning Techniques.” Expert Systems with Applications 42 (1): 259–68. https://doi.org/10.1016/j.eswa.2014.07.040.
Pinkwart, N. 2018. “Short-term Forecasting Economic Activity in Germany: A Supply and Demand Side System of Bridge Equations.” In Discussion Paper No 36/2018. Frankfurt: Deutsche Bundesbank. https://doi.org/10.2139/ssrn.3255394.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Stone, C. J., M. H. Hansen, C. Kooperberg, and Y. K. Truong. 1997. “Polynomial Splines and Their Tensor Products in Extended Linear Modeling.” Annals of Statistics 25 (4): 1371–470. https://doi.org/10.1214/aos/1031594728.
Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn. 2007. “Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution.” BMC Bioinformatics 8. Article 25. https://doi.org/10.1186/1471-2105-8-25.
Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis. 2008. “Conditional Variable Importance for Random Forests.” BMC Bioinformatics 9. Article 307. https://doi.org/10.1186/1471-2105-9-307.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/jem-2020-0020).
© 2022 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Empirical Framework for Two-Player Repeated Games with Random States
- The Robustness of Conditional Logit for Binary Response Panel Data Models with Serial Correlation
- Density Forecast of Financial Returns Using Decomposition and Maximum Entropy
- On the Implementation of Approximate Randomization Tests in Linear Models with a Small Number of Clusters
- Quantile Difference in Differences with Time-Varying Qualification in Panel Data
- A Random Forest-based Approach to Combining and Ranking Seasonality Tests
- Teaching Corner
- On the Use of the Helmert Transformation, and its Applications in Panel Data Econometrics
- Practitioner's Corner
- Linear Rescaling to Accurately Interpret Logarithms