
To Bag is to Prune

Philippe Goulet Coulombe
Published/Copyright: October 25, 2024

Abstract

It is notoriously difficult to build a bad Random Forest (RF). Concurrently, RF blatantly overfits in-sample without apparent consequences out-of-sample. Arguments like the bias-variance trade-off or double descent cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a latent “true” tree. More generally, I document that randomized ensembles of greedily optimized learners implicitly perform optimal early stopping out-of-sample. So, letting RF overfit the training data is a dominant tuning strategy against nature’s undisclosed choice of noise level. Additionally, novel ensembles of Boosting and MARS are also eligible. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles perform similarly to their tuned counterparts – or better.

JEL Classification: C45; C53; C52; C63

Corresponding author: Philippe Goulet Coulombe, Université du Québec à Montréal, Montréal, Canada, E-mail:

Appendix A

A.1 Additional Graphs and Tables

Figure 6:

This plots the hold-out sample R² between the prediction and the true conditional mean. The level of noise is calibrated so that the signal-to-noise ratio is 1. Column facets are DGPs and row facets are base learners. The x-axis is an index of the depth of the greedy model. For CART, it is a decreasing minimal node size ∈ 1.4^{16,…,2}; for Boosting, an increasing number of steps ∈ 1.5^{4,…,18}; and for MARS, an increasing number of included terms ∈ 1.4^{2,…,16}.

Figure 7:

This is Figure 4’s first row with mtry = 0.5.

Table 1:

20 data sets.

Abbreviation Observations Features Data source
Abalone 4,177 7 archive.ics.uci.edu
Boston housing 506 13 lib.stat.cmu.edu
Auto 392 7 archive.ics.uci.edu
Bike sharing 17,379 13 archive.ics.uci.edu
White wine 4,898 10 archive.ics.uci.edu
Red wine 1,599 10 archive.ics.uci.edu
Concrete 1,030 8 archive.ics.uci.edu
Fish toxicity 908 6 archive.ics.uci.edu
Forest fire 517 12 archive.ics.uci.edu
NBA salary 483 25 kaggle.com
CA housing 20,428 9 kaggle.com
Crime Florida 90 97 census.gov
Friedman 1 R² = 0.7 1,000 10 cran.r-project.org
Friedman 1 R² = 0.4 1,000 10 cran.r-project.org
GDP h = 1 212 599 Google Drive
GDP h = 2 212 563 Google Drive
UNRATE h = 1 212 619 Google Drive
UNRATE h = 2 212 627 Google Drive
INF h = 1 212 619 Google Drive
INF h = 2 212 611 Google Drive
Notes: The number of features includes categorical variables expanded into multiple dummies and is thus sometimes higher than what is reported on the data source’s website. Data source URLs are abbreviated but lead directly to the exact data set or package used. The number of features varies across the macro data sets because a mild screening rule was applied ex ante, which helps decrease computing time.

Table 2:

R²_test for all data sets and models.

Columns: Benchmarks (FA-AR, AR, LASSO, RF, Tree, NN, DNN); GBM (Tuned, Plain, B & P, Booging); MARS (Tuned, Plain, B & P, Quake). The FA-AR and AR columns are reported only for the six macroeconomic targets.
Abalone 0.52 0.56 0.45 0.54 0.53 0.50 0.48 0.53* 0.54** 0.57 0.35* 0.31* 0.58***
Boston housing 0.67 0.88 0.79 0.86 0.85 0.89 0.88 0.90 0.85* 0.83 0.87 0.92 0.91
Auto 0.66 0.71 0.61 0.13 0.64 0.64 0.59** 0.65 0.64* 0.71 −0.54* 0.53 0.63
Bike sharing 0.38 0.91 0.73 0.88 0.94 0.95 0.93*** 0.91*** 0.91*** 0.71 0.89*** 0.87*** 0.90***
White wine 0.28 0.52 0.28 0.37 0.26 0.37 0.32* 0.44*** 0.38 0.33 0.33*** 0.39** 0.38***
Red wine 0.34 0.47 0.35 0.33 0.37 0.37 0.23** 0.37 0.38 0.38 0.29* 0.33 0.35
Concrete 0.59 0.90 0.71 0.89 0.88 0.92 0.92 0.90* 0.90*** 0.83 0.87 0.30*** 0.89
Fish toxicity 0.56 0.65 0.57 0.60 0.63 0.63 0.54*** 0.61 0.62 0.56 −0.25*** 0.54* 0.61
Forest fire 0.00 −0.11 0.00 −0.02 0.01 −0.03 −0.68*** −0.32*** −0.08 0.01 −1.55* −0.68 −0.36
NBA salary 0.52 0.60 0.34 0.22 0.21 0.50 0.29*** 0.49 0.50 0.36 0.11* 0.59* 0.53
CA housing 0.64 0.82 0.59 0.75 0.74 0.82 0.82 0.83*** 0.82** 0.72 0.77*** 0.81*** 0.79***
Crime Florida 0.66 0.79 0.60 0.82 0.75 0.75 0.77 0.81* 0.79 0.70 0.44* 0.81 0.80
F1 R² = 0.7 0.53 0.62 0.50 0.43 0.51 0.65 0.54*** 0.60*** 0.67** 0.68 0.55 0.62 0.69***
F1 R² = 0.4 0.32 0.40 0.36 0.19 0.28 0.40 0.16*** 0.34* 0.41 0.41 0.14* 0.35 0.40*
GDP h = 1 0.27 0.27 0.24 0.35 0.18 0.06 0.26 0.36 0.17 0.37 0.38 0.00 −9.08*** −0.45** −0.12**
GDP h = 2 −0.03 0.17 −0.01 0.16 0.00 −0.06 −0.52 0.15 −0.56** 0.20 0.18 −0.40 −4.37** −0.41* −0.37***
UNRATE h = 1 0.71 0.53 0.43 0.59 0.22 −0.69 0.62 0.59 0.66 0.58 0.65 −0.65 −0.72*** 0.53 0.68
UNRATE h = 2 0.52 0.29 0.26 0.37 0.16 0.14 0.41 0.43 0.35 0.42 0.48 0.16 −0.80** −0.28 0.26
INF h = 1 0.25 0.33 0.43 0.42 0.25 0.41 0.49 0.35 0.24 0.37 0.39 0.37 −0.57** 0.45 0.34
INF h = 2 0.05 0.22 0.09 0.28 0.45 0.19 0.51 0.15 −0.26*** 0.16 0.27* 0.39 −2.50** 0.24 0.42
Notes: This table reports R²_test for 20 data sets and different models, either standard or introduced in the text. For macroeconomic targets (the last six data sets), the set of benchmark models additionally includes an autoregressive model of order 2 (AR) and a factor-augmented regression with 2 lags (FA-AR). For GBM and MARS, t-tests (and Diebold and Mariano (2002) tests for time series data) evaluate whether the difference in predictive performance between the tuned version and each of the remaining three models of the block is statistically significant. ‘*’, ‘**’ and ‘***’ respectively refer to p-values below 5 %, 1 % and 0.1 %. F1 means the “Friedman 1” DGP of Friedman (1991).

Table 3:

R²_train for all data sets and models.

Columns: Benchmarks (FA-AR, AR, LASSO, RF, Tree, NN, DNN); GBM (Tuned, Plain, B & P, Booging); MARS (Tuned, Plain, B & P, Quake). The FA-AR and AR columns are reported only for the six macroeconomic targets.
Abalone 0.50 0.92 0.50 0.60 0.59 0.53 0.85 0.86 0.91 0.57 0.65 0.78 0.61
Boston housing 0.72 0.98 0.87 0.90 0.89 1.00 1.00 0.99 0.99 0.90 0.97 0.97 0.98
Auto 0.68 0.96 0.77 0.13 0.81 0.86 1.00 0.98 0.98 0.77 0.98 0.93 0.96
Bike sharing 0.38 0.98 0.73 0.89 0.95 0.96 0.95 0.94 0.95 0.71 0.89 0.88 0.90
White wine 0.26 0.92 0.27 0.47 0.75 0.44 0.82 0.85 0.88 0.37 0.46 0.52 0.51
Red wine 0.29 0.91 0.41 0.40 0.42 0.41 0.96 0.94 0.95 0.44 0.56 0.69 0.67
Concrete 0.61 0.98 0.75 0.91 0.93 0.98 0.99 0.98 0.99 0.88 0.98 0.74 0.95
Fish toxicity 0.54 0.93 0.60 0.64 0.61 0.92 0.97 0.95 0.97 0.63 0.96 0.82 0.88
Forest fire 0.00 0.81 0.00 0.00 0.07 0.40 0.97 0.88 0.91 0.04 0.62 0.73 0.76
NBA salary 0.47 0.93 0.72 0.65 0.71 0.99 1.00 0.97 0.97 0.64 0.92 0.84 0.93
CA housing 0.63 0.97 0.61 0.78 0.85 0.86 0.89 0.91 0.90 0.72 0.80 0.83 0.81
Crime Florida 0.65 0.96 0.84 0.88 0.94 1.00 1.00 0.98 0.98 0.75 1.00 0.97 0.98
F1 R² = 0.7 0.45 0.93 0.45 0.62 0.71 0.95 1.00 0.97 0.97 0.65 0.81 0.84 0.86
F1 R² = 0.4 0.23 0.89 0.30 0.34 0.35 0.48 1.00 0.94 0.94 0.38 0.64 0.75 0.76
GDP h = 1 0.41 0.11 0.23 0.91 0.51 0.26 0.44 0.81 1.00 0.96 0.96 0.47 1.00 0.94 0.94
GDP h = 2 0.26 0.06 0.07 0.89 0.00 0.26 0.55 0.76 1.00 0.95 0.95 0.29 1.00 0.94 0.95
UNRATE h = 1 0.57 0.40 0.48 0.93 0.81 −0.07 0.82 0.83 1.00 0.97 0.97 0.76 0.99 0.97 0.96
UNRATE h = 2 0.41 0.13 0.35 0.92 0.38 0.42 0.25 0.99 1.00 0.96 0.96 0.75 1.00 0.96 0.96
INF h = 1 0.76 0.73 0.90 0.97 0.81 0.64 0.94 1.00 1.00 0.99 0.99 0.73 1.00 0.99 0.99
INF h = 2 0.69 0.63 0.72 0.96 0.72 0.67 0.92 1.00 1.00 0.99 0.98 0.81 1.00 0.99 0.98
Notes: This table reports R²_train for 20 data sets and different models, either standard or introduced in the text. For macroeconomic targets (the last six data sets), the set of benchmark models additionally includes an autoregressive model of order 2 (AR) and a factor-augmented regression with 2 lags (FA-AR). F1 means the “Friedman 1” DGP of Friedman (1991).

A.2 Implementation Details for Booging and MARSquake

Booging and MARSquake are the B & P + DA versions of Boosted Trees and MARS, respectively. The data-augmentation option will likely be redundant in high-dimensional situations where the available regressors already have a factor structure (like macroeconomic data).

A.2.1 About B

For both algorithms, B is made operational by subsampling. As usual, reasonable candidates for the sampling rate are 2/3 and 3/4. All ensembles use B = 100 subsamples.
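As an illustration, here is a minimal Python sketch of the B step, under stated assumptions: the paper’s own code is not shown, scikit-learn’s GradientBoostingRegressor is only a stand-in for the boosted-tree base learner (its hyperparameters are placeholders), and the 2/3 sampling rate and B = 100 follow the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def bag_by_subsampling(X, y, X_test, B=100, rate=2/3, seed=0):
    """Fit one greedily grown base learner per subsample and average the predictions."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    preds = np.zeros((B, X_test.shape[0]))
    for b in range(B):
        idx = rng.choice(n, size=int(rate * n), replace=False)  # subsampling, not bootstrap
        base = GradientBoostingRegressor()  # stand-in base learner; settings are placeholders
        base.fit(X[idx], y[idx])
        preds[b] = base.predict(X_test)
    return preds.mean(axis=0)               # the ensemble prediction
```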

A.2.2 About P

The primary source of perturbation in Booging is straightforward. Using subsamples to construct trees at each step is already integrated within Stochastic Gradient Boosting. By construction, it perturbs the Boosting fitting path and achieves a goal similar to that of the original mtry in RF. Note that, for fairness, this standard feature is also activated for any reported results on “plain” Boosting.
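In scikit-learn terms (an assumption, since the paper’s implementation is not shown), Stochastic Gradient Boosting amounts to setting subsample below one, so that each boosting step fits its tree on a random fraction of the rows; the 0.5 rate below is illustrative, not the paper’s exact setting.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Each boosting iteration sees only half of the (sub)sample, which perturbs the
# fitting path. Combined with the subsampling wrapper above, this gives a
# Booging-like ensemble.
sgb = GradientBoostingRegressor(subsample=0.5)
```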

The implementation of P in MARSquake is more akin to that of RF. At each step of the forward pass, MARS evaluates all variables as potential candidates to enter a hinge function, and selects the one that (greedily) maximizes fit at this step. In the spirit of RF’s mtry, P is applied by stochastically restricting the set of available features at each step. I set the fraction of randomly considered X’s to 1/2.

To further enhance perturbation in both algorithms, we can randomly drop a fraction of features from base learners’ respective information sets. Since DA creates replicas of the data and keeps some of its correlation structure, features are unlikely to be entirely dropped from a boosting run, provided the dropping rate is not too high. I suggest 20 %. This is analogous to randomly selecting features à la mtry, but for a whole base learner rather than at each split (as in RF).
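A minimal sketch of this extra perturbation, assuming the 20 % dropping rate suggested above and a numeric feature matrix; the helper name is hypothetical.

```python
import numpy as np

def drop_features(X, drop_rate=0.2, rng=None):
    """Randomly hide a fraction of columns from one base learner's information set."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(X.shape[1]) > drop_rate
    if not keep.any():                        # always keep at least one feature
        keep[rng.integers(X.shape[1])] = True
    return X[:, keep], np.where(keep)[0]      # reduced matrix and kept column indices
```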

A.2.3 About DA

Perturbation works better if there is a lot to perturb. In many data sets, X is rich in observations but contains few regressors. To ensure P meets its full randomization potential, a cheap data augmentation procedure can be carried out. DA simply adds fake regressors that are correlated with the original X and maintain in part its cross-correlation structure. Say X contains K regressors. I take the N × K matrix X and create two duplicates X̃ = X + E, where E is a matrix of Gaussian noise whose standard deviation is set to 1/3 of that of the corresponding variable. For X_k’s that are either categorical or ordinal, I create the corresponding X̃_k by taking X_k and shuffling 20 % of its observations.
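The following is a minimal sketch of DA under the settings just described (two replicas, noise SD equal to 1/3 of each variable’s SD, 20 % shuffling for categorical or ordinal columns); the function name and the handling of edge cases are illustrative rather than the paper’s exact code.

```python
import numpy as np

def augment(X, is_categorical, noise_frac=1/3, shuffle_frac=0.2, n_copies=2, rng=None):
    """Append noisy replicas of X that keep part of its cross-correlation structure."""
    rng = np.random.default_rng() if rng is None else rng
    n, k = X.shape
    blocks = [X.astype(float)]
    for _ in range(n_copies):
        X_tilde = X.astype(float)
        for j in range(k):
            if is_categorical[j]:
                # shuffle a fraction of the observations of categorical/ordinal columns
                rows = rng.choice(n, size=int(shuffle_frac * n), replace=False)
                X_tilde[rows, j] = rng.permutation(X_tilde[rows, j])
            else:
                # X_tilde = X + E, with Gaussian noise of SD = noise_frac * SD(X_j)
                X_tilde[:, j] = X[:, j] + rng.normal(0, noise_frac * X[:, j].std(), size=n)
        blocks.append(X_tilde)
    return np.hstack(blocks)                  # original X plus its augmented copies
```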

A.2.4 Last Word on MARS

Standard MARS has a forward and a backward pass; the latter’s role is to prevent overfitting by (traditional) pruning. Obviously, there is no backward pass in MARSquake. Certain implementations of MARS (like earth, Milborrow (2018)) contain safeguards that can keep the forward pass from blatantly overfitting in certain situations (usually when regressors are not numerous). To partially circumvent this rare occurrence, one can run MARS again on the residuals of a first MARS run that failed to attain a high enough R²_train.

Appendix B: Simulation Details

Tree: The tree DGP is constructed as follows. Normal noise is generated and a CART tree is fitted to it using 10 independent normal regressors. The minimal node size to consider a split is 100, which is one fourth of the training sample. This typically generates trees of around 8 nodes. The “fake” conditional mean estimated from this procedure is itself used to generate data, on top of which two different levels of normal noise are added, as described in Figures 4 and 6. Finally, each model fitted on this DGP is given all 10 original variables, whether or not they were used by the conditional mean function.
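A minimal sketch of this simulation design, assuming a training sample of 400 (so that a minimal node size of 100 is one fourth of it) and, for illustration only, a signal-to-noise ratio of 1 as in Figure 6; the exact noise levels follow Figures 4 and 6 rather than this placeholder.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_train, k = 400, 10
X = rng.normal(size=(n_train, k))                 # 10 independent normal regressors
noise = rng.normal(size=n_train)                  # pure noise target

# Fit CART to noise; splits require at least 100 observations (1/4 of the sample).
cart = DecisionTreeRegressor(min_samples_split=100).fit(X, noise)
f = cart.predict(X)                               # the "fake" conditional mean

snr = 1.0                                         # illustrative signal-to-noise ratio
sigma = f.std() / np.sqrt(snr)
y = f + rng.normal(0, sigma, size=n_train)        # data generated from the tree DGP
```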

Friedman 1: Inputs are 10 independent variables uniformly distributed on the interval [0,1]; only 5 of them enter the DGP, so that

$$ y_i = 10\sin(\pi x_{1,i} x_{2,i}) + 20\,(x_{3,i} - 0.5)^2 + 10\,x_{4,i} + 5\,x_{5,i} + \epsilon_i $$

with $\epsilon_i$ being normal noise.
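A minimal sketch generating Friedman 1 data; the way the noise SD is backed out of a target population R² is an assumption made for illustration (it matches the “F1 R² = 0.7/0.4” labels used in the tables, not necessarily the paper’s exact calibration).

```python
import numpy as np

def friedman1(n, r2_target=0.7, rng=None):
    """Friedman 1 DGP: 10 U[0,1] regressors, only the first five enter the mean."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.uniform(size=(n, 10))
    f = (10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - 0.5) ** 2
         + 10 * X[:, 3] + 5 * X[:, 4])
    sigma = f.std() * np.sqrt(1 / r2_target - 1)   # Var(f) / (Var(f) + sigma^2) = R^2
    y = f + rng.normal(0, sigma, size=n)
    return X, y, f
```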

For Friedman 2 and Friedman 3, regressors are $x_{1,i} \in [0, 100]$, $x_{2,i} \in [40\pi, 560\pi]$, $x_{3,i} \in [0, 1]$, $x_{4,i} \in [1, 11]$, and the targets are

$$ \text{F2}: \quad y_i = \left(x_{1,i}^2 + \left(x_{2,i}\,x_{3,i} - \frac{1}{x_{2,i}\,x_{4,i}}\right)^{2}\right)^{0.5} + \epsilon_i $$

$$ \text{F3}: \quad y_i = \arctan\!\left(\frac{x_{2,i}\,x_{3,i} - 1/(x_{2,i}\,x_{4,i})}{x_{1,i}}\right) + \epsilon_i $$

with $\epsilon_i$ being normal noise.
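A companion sketch for Friedman 2 and 3 on the stated regressor ranges; the noise SDs below are placeholders, since the paper calibrates the noise levels as described above.

```python
import numpy as np

def friedman2_3(n, sd_f2=125.0, sd_f3=0.1, rng=None):
    """Friedman 2 and 3 DGPs; noise SDs are illustrative placeholders."""
    rng = np.random.default_rng() if rng is None else rng
    x1 = rng.uniform(0, 100, n)
    x2 = rng.uniform(40 * np.pi, 560 * np.pi, n)
    x3 = rng.uniform(0, 1, n)
    x4 = rng.uniform(1, 11, n)
    core = x2 * x3 - 1 / (x2 * x4)
    y2 = np.sqrt(x1 ** 2 + core ** 2) + rng.normal(0, sd_f2, n)   # F2
    y3 = np.arctan(core / x1) + rng.normal(0, sd_f3, n)           # F3
    return np.column_stack([x1, x2, x3, x4]), y2, y3
```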

Linear: The linear DGP is the sum of the first five variables of F1, plus normal noise. The model is fed all 10 variables, of which 5 are actually useful.

B.1 Additional NN Details

For both neural networks, the batch size is 32 and the optimizer is Adam (with Keras default values). Continuous X’s are normalized so that all values are within the 0–1 range.

More precisely, NN in Table 2 is a standard feed-forward fully-connected network with an architecture in the vein of Gu, Kelly, and Xiu (2020). There are two hidden layers, the first with 32 neurons and the second with 16 neurons. The number of epochs is fixed at 100. The activation function is ReLU and that of the output layer is linear. The learning rate ∈ {0.001, 0.01} and the LASSO λ parameter ∈ {0.001, 0.0001} are chosen by 5-fold cross-validation. A batch normalization layer follows each ReLU layer. Early stopping is applied by halting training whenever 20 epochs pass without any improvement in the cross-validation MSE.

More precisely, DNN in Table 2 is a standard feed-forward fully-connected network with an architecture closely following that of Olson, Wyner, and Berk (2018) for small data sets. There are 10 hidden layers, each featuring 100 neurons. The number of epochs is fixed at 200. The activation function is ELU and that of the output layer is linear. The learning rate ∈ {0.001, 0.01, 0.1} and the LASSO λ parameter ∈ {0.001, 0.00001} are chosen by 5-fold cross-validation. No early stopping is applied.
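Since the training code itself is not shown, the following Keras sketch summarizes the two architectures under stated assumptions: a TensorFlow/Keras implementation is assumed, and the input dimension, the chosen learning rates and λ values, and the commented fit call are placeholders (the paper cross-validates them).

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_net(n_features, hidden, activation, l1, learning_rate, batch_norm):
    """Feed-forward fully-connected regression net with L1-penalized weights."""
    inputs = tf.keras.Input(shape=(n_features,))
    x = inputs
    for width in hidden:
        x = layers.Dense(width, activation=activation,
                         kernel_regularizer=regularizers.l1(l1))(x)
        if batch_norm:
            x = layers.BatchNormalization()(x)   # follows each hidden activation
    outputs = layers.Dense(1, activation="linear")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    return model

# NN of Table 2: two hidden layers (32, 16), ReLU, batch norm, 100 epochs,
# early stopping with patience 20; learning rate and l1 are cross-validated.
nn = build_net(n_features=30, hidden=[32, 16], activation="relu",
               l1=0.001, learning_rate=0.001, batch_norm=True)
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)
# nn.fit(X_train, y_train, validation_data=(X_val, y_val),
#        epochs=100, batch_size=32, callbacks=[early_stop])

# DNN of Table 2: ten hidden layers of 100 neurons, ELU, 200 epochs, no early
# stopping (batch norm is not mentioned for the DNN, hence batch_norm=False here).
dnn = build_net(n_features=30, hidden=[100] * 10, activation="elu",
                l1=0.001, learning_rate=0.01, batch_norm=False)
```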

References

Athey, S., J. Tibshirani, and S. Wager. 2019. “Generalized Random Forests.” Annals of Statistics 47 (2): 1148–78. https://doi.org/10.1214/18-aos1709.

Bartlett, P. L., P. M. Long, G. Lugosi, and A. Tsigler. 2020. “Benign Overfitting in Linear Regression.” Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1907378117.

Belkin, M., D. Hsu, S. Ma, and S. Mandal. 2019a. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54. https://doi.org/10.1073/pnas.1903070116.

Belkin, M., A. Rakhlin, and A. B. Tsybakov. 2019b. “Does Data Interpolation Contradict Statistical Optimality?” In The 22nd International Conference on Artificial Intelligence and Statistics, 1611–9. PMLR.

Bergmeir, C., R. J. Hyndman, and B. Koo. 2018. “A Note on the Validity of Cross-Validation for Evaluating Autoregressive Time Series Prediction.” Computational Statistics & Data Analysis 120: 70–83. https://doi.org/10.1016/j.csda.2017.11.003.

Bertsimas, D., and J. Dunn. 2017. “Optimal Classification Trees.” Machine Learning 106 (7): 1039–82. https://doi.org/10.1007/s10994-017-5633-9.

Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–40. https://doi.org/10.1007/bf00058655.

Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/a:1010933404324.

Breiman, L., J. Friedman, C. J. Stone, and R. A. Olshen. 1984. Classification and Regression Trees. CRC Press.

Bühlmann, P., and B. Yu. 2002. “Analyzing Bagging.” Annals of Statistics 30 (4): 927–61. https://doi.org/10.1214/aos/1031689014.

Chen, J. C., A. Dunn, K. K. Hood, A. Driessen, and A. Batch. 2019. “Off to the Races: A Comparison of Machine Learning and Alternative Data for Predicting Economic Indicators.” In Big Data for 21st Century Economic Statistics. University of Chicago Press.

Clements, M. P., and J. Smith. 1997. “The Performance of Alternative Forecasting Methods for Setar Models.” International Journal of Forecasting 13 (4): 463–75. https://doi.org/10.1016/s0169-2070(97)00017-4.

Diebold, F. X., and R. S. Mariano. 2002. “Comparing Predictive Accuracy.” Journal of Business & Economic Statistics 20 (1): 134–44. https://doi.org/10.1198/073500102753410444.

Duroux, R., and E. Scornet. 2016. “Impact of Subsampling and Pruning on Random Forests.” arXiv preprint arXiv:1603.04261.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. “Least Angle Regression.” Annals of Statistics 32 (2): 407–99. https://doi.org/10.1214/009053604000000067.

Eliasz, P., J. H. Stock, and M. W. Watson. 2004. “Optimal Tests for Reduced Rank Time Variation in Regression Coefficients and for Level Variation in the Multivariate Local Level Model.” Manuscript, Harvard University.

Elliott, G., A. Gargano, and A. Timmermann. 2013. “Complete Subset Regressions.” Journal of Econometrics 177 (2): 357–73. https://doi.org/10.1016/j.jeconom.2013.04.017.

Friedman, J. H. 1991. “Multivariate Adaptive Regression Splines.” Annals of Statistics: 1–67. https://doi.org/10.1214/aos/1176347963.

Friedman, J. H. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis 38 (4): 367–78. https://doi.org/10.1016/s0167-9473(01)00065-2.

Friedman, J., T. Hastie, and R. Tibshirani. 2001. The Elements of Statistical Learning. Springer Series in Statistics. New York: Springer.

Goulet Coulombe, P. 2020. “The Macroeconomy as a Random Forest.” arXiv preprint arXiv:2006.12724. https://doi.org/10.2139/ssrn.3633110.

Goulet Coulombe, P. 2021. “Slow-Growing Trees.” Technical report.

Goulet Coulombe, P., M. Leroux, D. Stevanovic, and S. Surprenant. 2020. “Prévision de l’activité économique au Québec et au Canada à l’aide des méthodes “machine learning”.” Technical report. CIRANO.

Goulet Coulombe, P., M. Leroux, D. Stevanovic, and S. Surprenant. 2021a. “Macroeconomic Data Transformations Matter.” International Journal of Forecasting 37 (4): 1338–54. https://doi.org/10.1016/j.ijforecast.2021.05.005.

Goulet Coulombe, P., M. Marcellino, and D. Stevanovic. 2021b. “Can Machine Learning Catch the Covid-19 Recession?” CEPR Discussion Paper No. DP15867. https://doi.org/10.2139/ssrn.3796421.

Goulet Coulombe, P., M. Leroux, D. Stevanovic, and S. Surprenant. 2022. “How Is Machine Learning Useful for Macroeconomic Forecasting?” Journal of Applied Econometrics 37 (5): 920–64. https://doi.org/10.1002/jae.2910.

Gu, S., B. Kelly, and D. Xiu. 2020. “Empirical Asset Pricing via Machine Learning.” Review of Financial Studies 33 (5): 2223–73. https://doi.org/10.1093/rfs/hhaa009.

Hastie, T., A. Montanari, S. Rosset, and R. J. Tibshirani. 2019. “Surprises in High-Dimensional Ridgeless Least Squares Interpolation.” arXiv preprint arXiv:1903.08560.

Hellwig, K.-P. 2018. Overfitting in Judgment-Based Economic Forecasts: The Case of IMF Growth Projections. International Monetary Fund. https://doi.org/10.2139/ssrn.3314593.

Hillebrand, E., and M. C. Medeiros. 2010. “The Benefits of Bagging for Forecast Models of Realized Volatility.” Econometric Reviews 29 (5–6): 571–93. https://doi.org/10.1080/07474938.2010.481554.

Hillebrand, E., M. Lukas, and W. Wei. 2020. “Bagging Weak Predictors.” Technical report. Monash University, Department of Econometrics and Business Statistics.

Inoue, A., and L. Kilian. 2008. “How Useful is Bagging in Forecasting Economic Time Series? A Case Study of US Consumer Price Inflation.” Journal of the American Statistical Association 103 (482): 511–22. https://doi.org/10.1198/016214507000000473.

Kobak, D., J. Lomond, and B. Sanchez. 2020. “The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization.” Journal of Machine Learning Research 21 (169): 1–16.

Kotchoni, R., M. Leroux, and D. Stevanovic. 2019. “Macroeconomic Forecast Accuracy in a Data-Rich Environment.” Journal of Applied Econometrics 34 (7): 1050–72. https://doi.org/10.1002/jae.2725.

Krstajic, D., L. J. Buturovic, D. E. Leahy, and S. Thomas. 2014. “Cross-Validation Pitfalls when Selecting and Assessing Regression and Classification Models.” Journal of Cheminformatics 6 (1): 1–15. https://doi.org/10.1186/1758-2946-6-10.

Lee, T.-H., A. Ullah, and R. Wang. 2020. “Bootstrap Aggregating and Random Forest.” In Macroeconomic Forecasting in the Era of Big Data, 389–429. Springer. https://doi.org/10.1007/978-3-030-31150-6_13.

LeJeune, D., H. Javadi, and R. Baraniuk. 2020. “The Implicit Regularization of Ordinary Least Squares Ensembles.” In International Conference on Artificial Intelligence and Statistics, 3525–35.

McCracken, M., and S. Ng. 2020. “FRED-QD: A Quarterly Database for Macroeconomic Research.” Technical report. National Bureau of Economic Research. https://doi.org/10.3386/w26872.

Medeiros, M. C., G. F. Vasconcelos, Á. Veiga, and E. Zilberman. 2021. “Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods.” Journal of Business & Economic Statistics 39 (1): 98–119. https://doi.org/10.1080/07350015.2019.1637745.

Mentch, L., and S. Zhou. 2019. “Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success.” arXiv preprint arXiv:1911.00190.

Milborrow, S. 2018. “earth: Multivariate Adaptive Regression Splines.” R package.

Mullainathan, S., and J. Spiess. 2017. “Machine Learning: An Applied Econometric Approach.” The Journal of Economic Perspectives 31 (2): 87–106. https://doi.org/10.1257/jep.31.2.87.

Olson, M., A. J. Wyner, and R. Berk. 2018. “Modern Neural Networks Generalize on Small Data Sets.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3623–32.

Rapach, D., and G. Zhou. 2013. “Forecasting Stock Returns.” In Handbook of Economic Forecasting, Vol. 2, 328–83. Elsevier. https://doi.org/10.1016/B978-0-444-53683-9.00006-2.

Rasmussen, C. E. 1997. Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression. PhD thesis. Toronto: University of Toronto.

Rosset, S., J. Zhu, and T. Hastie. 2004. “Boosting as a Regularized Path to a Maximum Margin Classifier.” Journal of Machine Learning Research 5: 941–73.

Scornet, E. 2017. “Tuning Parameters in Random Forests.” ESAIM: Proceedings and Surveys 60: 144–62. https://doi.org/10.1051/proc/201760144.

Scornet, E., G. Biau, and J.-P. Vert. 2015. “Consistency of Random Forests.” Annals of Statistics 43 (4): 1716–41. https://doi.org/10.1214/15-aos1321.

Stock, J. H., and M. W. Watson. 1999. “Forecasting Inflation.” Journal of Monetary Economics 44 (2): 293–335. https://doi.org/10.1016/s0304-3932(99)00027-6.

Stock, J. H., and M. W. Watson. 2002. “Macroeconomic Forecasting Using Diffusion Indexes.” Journal of Business & Economic Statistics 20 (2): 147–62. https://doi.org/10.1198/073500102317351921.

Timmermann, A. 2006. “Forecast Combinations.” Handbook of Economic Forecasting 1: 135–96. https://doi.org/10.1016/S1574-0706(05)01004-9.

Wyner, A. J., M. Olson, J. Bleich, and D. Mease. 2017. “Explaining the Success of Adaboost and Random Forests as Interpolating Classifiers.” Journal of Machine Learning Research 18 (1): 1558–90.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/snde-2023-0030).


Received: 2023-04-14
Accepted: 2024-09-24
Published Online: 2024-10-25

© 2024 Walter de Gruyter GmbH, Berlin/Boston
