Incorporation of structural properties of the response surface into oblique model trees
Marvin Schöne, Martin Kohlhase and Oliver Nelles
Abstract
Oblique model trees are nonparametric regression models that partition the input space using axis-oblique splits to define local regions for regression. The Hierarchical Local Model Tree algorithm (HILOMOT) efficiently constructs these trees and produces a continuous output from overlapping local models, making it well suited for industrial applications. We enhance HILOMOT with a novel semi-greedy splitting method to reduce the short-sightedness of its greedy splitting, eliminate an undesirable trade-off during tree construction that can cause underfitting, and improve split interpretability. Our approach first estimates the strongest direction of curvature in a partition that the existing model cannot approximate. Then, a non-greedy method determines the optimal split orthogonal to this direction. We compare our method in experiments on synthetic data against HILOMOT and other established regression techniques. Qualitative results confirm that our method produces more interpretable splits, while quantitative results show it generally constructs more accurate and compact trees. Regardless of the splitting method, HILOMOT consistently outperforms all reference methods in accuracy.
Zusammenfassung
Oblique model trees are nonparametric regression models that partition the input space into local regions for regression using axis-oblique splits. HILOMOT is a powerful algorithm for efficiently constructing these trees. Overlapping local models yield a continuous model output, which is advantageous for industrial applications. In this work, we enhance HILOMOT with a novel semi-greedy splitting method that reduces the drawbacks of the original greedy splitting, eliminates an undesirable trade-off during tree construction that can lead to underfitting, and improves the interpretability of the splits. Our method first estimates the direction of strongest curvature that cannot be represented by the current model. A non-greedy method then determines the optimal splitting point orthogonal to the identified direction. We analyze our method experimentally on synthetic data in comparison with HILOMOT and other regression techniques. Qualitative results confirm the improved interpretability of the splits. Quantitative results further show that our method usually produces more accurate and more compact trees than the original. Regardless of the splitting method, HILOMOT clearly outperforms all reference methods in prediction accuracy.
1 Introduction
In industrial digitalization, Machine Learning (ML) offers significant potential to enhance manufacturing, e.g. by optimizing or controlling processes through data-driven models [1]. The function f of the process to be approximated often depends on multiple inputs with unknown nonlinear effects on the output. When such a function must be learned from a limited amount of measured data, flexible nonparametric regression models such as model trees are a natural choice.
An oblique model tree captures data patterns using multiple local models (e.g. linear affine models). Their regions are generated by a Divide-and-Conquer strategy that recursively partitions the input space with axis-oblique splits. This is typically done by a greedy algorithm that selects locally optimal splits without considering future ones, ensuring high computational efficiency [5]. Beyond their accuracy and efficiency, oblique model trees are interpretable, with local models providing insights into the approximated function [4]. Compared to the more common axis-orthogonal model trees, their axis-oblique splits allow greater structural flexibility, reducing both bias error and the number of required local models [5], [6].
However, oblique model trees also have certain drawbacks. Since axis-oblique splits are more complex than axis-orthogonal splits, these trees must address a tradeoff between increasing accuracy and decreasing interpretability and computational efficiency [5], [7]. The axis-oblique orientation (split direction) and location (splitting point) are crucial in this regard. As we show in Section 2, there are few methods that sufficiently address this tradeoff, which is the main reason why they are less common and the creation of axis-oblique splits is still noted as an open research question [5], [7], [8]. Moreover, many oblique model trees are discontinuous, which makes the application of the models (e.g. for process optimization or control) much more difficult [4]. Among the few existing methods, the Hierarchical Local Model Tree algorithm (HILOMOT) [9] provides the most notable advantages, as we outline in Section 2.
Due to overlapping local models, the trees generated by HILOMOT are continuous. In addition, the axis-oblique splits are determined by a very effective nonlinear optimization, resulting in accurate models. However, since a certain degree of overlap of the local models is required for successful optimization, HILOMOT struggles to adequately approximate less continuous functions. Moreover, the overlap has a regularization effect that is amplified in deeper layers of the tree, requiring HILOMOT to make a trade-off between larger overlaps for successful optimization and smaller overlaps to avoid underfitting [4]. Another drawback arises from the greedy construction of the tree. As with nearly all greedy algorithms, the splits are not always optimal for the entire global model [5]. When complex relationships are approximated, some previously executed splits must be corrected by additional splits, resulting in unnecessary local models and a decrease in accuracy. Furthermore, the axis-oblique split direction cannot be interpreted, as it only reduces a loss function.
In our work, we improve HILOMOT through the following main contributions:
- We develop a methodology that identifies the direction of strongest curvature in a partition that cannot be approximated by the tree generated so far, providing a more interpretable split direction.
- We identify the necessary splitting points along that direction to best approximate the remaining nonlinearity in a partition by local affine models.
- We employ the split direction and the splitting points in HILOMOT for a semi-greedy splitting that is unaffected by the degree of overlap, does not require major correction splits, and provides an additional pre-pruning criterion.
2 Oblique model trees
Figure 1a displays the structure of an oblique model tree, approximating an unknown function by a weighted superposition of local models whose regions are defined by axis-oblique splits. Each split is realized by a splitting function, which can either be a smooth sigmoid, as for the tree in Figure 1, or a crisp indicator function, generating respectively a continuous or discontinuous model. The steepness κ defines the smoothness of a sigmoid split, and the splitting coefficients define its axis-oblique orientation (split direction) and location (splitting point).
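For illustration, the following minimal sketch shows how a sigmoid splitting function of this kind can be evaluated and used to blend two local affine models into a continuous output; the concrete logistic parametrization over the oblique projection and the simple two-model blend are assumptions for illustration, not the exact formulation used by HILOMOT.

```python
import numpy as np

def sigmoid_split(U, v, v0, kappa):
    """Smooth splitting function Psi(u) in (0, 1) for an axis-oblique split.

    U     : (N, M) input samples
    v     : (M,) splitting coefficients (split direction) -- assumed parametrization
    v0    : offset locating the split along the direction v
    kappa : steepness; a large kappa approaches a crisp indicator function
    """
    return 1.0 / (1.0 + np.exp(-kappa * (U @ v + v0)))

def blend_two_local_models(U, theta_left, theta_right, v, v0, kappa):
    """Continuous model output from two overlapping local affine models."""
    X = np.hstack([np.ones((U.shape[0], 1)), U])   # augmented regressors [1, u]
    psi = sigmoid_split(U, v, v0, kappa)           # validity of the 'right' child
    return (1.0 - psi) * (X @ theta_left) + psi * (X @ theta_right)
```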
Figure 1: Oblique model tree approximating an example function.
Only a few oblique model tree algorithms exist in the literature. Table 1 summarizes them chronologically, classifying them by construction strategy (greedy or non-greedy), global model output (continuous or discontinuous), and the methods used to generate the split direction and the splitting point.
Table 1: Overview of existing algorithms to generate oblique model trees.
| Ref. | Non-greedy | Continuous | Split direction | Splitting point | Major drawbacks (except greediness and discontinuity) |
|---|---|---|---|---|---|
| [10] | ✗ | ✓ | Smoothed Hinging Hyperplane (HH) [11] fitted by Conjugate Gradient Algorithm [12] | Determined jointly with the split direction | Splits and local models require same inputs; trade-off between overlap and underfitting |
| [13] | ✗ | ✗ | First Principal Hessian Direction (PHD) [14] | Hinge identification in residuals along first PHD | PHD requires multivariate normal distribution and only captures concave/convex curvatures |
| [15] | ✗ | ✗ | Linear Discriminant Analysis (LDA) on 2 Gaussian clusters | Quadratic Discriminant Analysis on LDA’s projection | Nonlinearity of approximated function not taken into account |
| [16] | ✗ | ✗ | Linear regression | Split which best reduces variance (brute force) | Nonlinearity of approximated function not taken into account |
| [9] | ✗ | ✓ | Separable Nonlinear Least Squares optimization [17] | Determined jointly with the split direction | Trade-off between overlap and underfitting; split directions are not interpretable |
| [18] | ✗ | ✗ | Smoothed HH fitted by restricted Fuzzy c-Regression [19] | Determined jointly with the split direction | Same drawbacks as for [10] |
| [20] | ✓ | ✗ | Sparsity-regularized generalized logistic regression problem optimized by Orthogonal Matching Pursuit [21] | Determined jointly with the split direction | Both the local models and the split directions are not interpretable |
| [22] | ✓ | ✓ | Optimization of randomly chosen hyperplane (called long dipole) and splitting point by Evolutionary Algorithm | Determined jointly with the split direction | Optimization has many hyperparameters; split directions are not interpretable |
| [6] | ✗ | ✗ | First direction of refined Gradient Outer Product [23] | Residual-based averaging function of SUPPORT [24] | Refined Gradient Outer Product has high computational complexity on large-scale data |
| [25] | ✓ | ✓ | Formulation of tree as nonlinear continuous optimization problem, solved by Sequential Least Squares Programming [26] | Determined jointly with the split direction | Optimization has many hyperparameters; parametric due to pre-selected symmetrical tree structure |
The table shows that most algorithms (6 out of 10) are greedy and that non-greedy algorithms have only become relevant in recent years. Furthermore, 6 out of 10 algorithms determine the split direction and the splitting point jointly rather than in two separate steps.
3 Proposed method
The basic concept of our novel splitting method is illustrated in Figure 2 and partially motivated by PHDRT [13] and COMT [6]. Since the local models are affine, any curvature remaining in the residuals of a partition indicates structure that the tree generated so far cannot approximate, and our method exploits this structure to construct the split.
Figure 2: Basic idea of how we generate a splitting function in HILOMOT from a given set of samples and residuals in sequential steps.
First, we determine the average direction along which the local gradients of the residuals vary the most, which we interpret as the direction of strongest curvature (Section 3.1). Second, the residuals are projected onto this direction and approximated by a one-dimensional Additive Hinging Hyperplane Model (AHHM), whose hinges serve as candidate splitting points (Section 3.2). To generate the splitting function Ψk, we use prior knowledge from the split direction and the AHHM of the parent node to generate additional candidate pairs of split direction and splitting point, from which the final split is chosen (Section 3.3).
3.1 Curvature-oriented split direction estimation
Algorithm 1 shows how we determine the direction of strongest curvature in an iterative procedure. This procedure is primarily based on locally Weighted Least Squares (WLS) [23] to estimate the gradients, and Principal Component Analysis (PCA) to identify the directions with greatest variance.
Algorithm 1: Determination of the curvature-oriented split direction in the partition of tk.
The inputs are the data set of the partition of tk and the corresponding residuals from the so far trained tree. Our algorithm iterates through four steps, which we explain in the following, until either the set of significant directions is reduced to a single direction or the procedure converges.
3.1.1 Incorporate knowledge from identified directions
At each iteration, the PCA provides the d* most significant principal components (eigendirections), which are carried over to the next iteration to refine the gradient estimation.
3.1.2 Estimate local gradients of residuals
To estimate the local gradients, we perform local linear regression by WLS around each sample of the partition. For each of the N local regressions, a vector of weights is computed that emphasizes samples close to the considered sample and down-weights distant ones, with the locality controlled by the bandwidth b. After the weights are determined, our algorithm approximates the residuals in the local region of each sample by linear regression using WLS with the augmented matrix containing a constant regressor and the inputs; the resulting slope coefficients form the estimated local gradient.
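As a concrete illustration of this step, the following sketch estimates the local residual gradients by WLS with a Gaussian weighting kernel; the kernel shape and its dependence on the bandwidth b are assumptions for illustration, since only the use of WLS, the weights, and the bandwidth are specified above.

```python
import numpy as np

def local_gradients(U, r, b):
    """Estimate the residual gradient at every sample by locally weighted least squares.

    U : (N, M) input samples of the partition
    r : (N,)  residuals of the so far trained tree
    b : bandwidth controlling the locality of the weights (assumed Gaussian kernel)
    Returns an (N, M) array with one gradient estimate per sample.
    """
    N, M = U.shape
    X = np.hstack([np.ones((N, 1)), U])            # augmented matrix [1, u]
    grads = np.zeros((N, M))
    for i in range(N):
        d2 = np.sum((U - U[i]) ** 2, axis=1)       # squared distances to sample i
        w = np.exp(-d2 / (2.0 * b ** 2))           # local weights (assumed kernel)
        W = np.diag(w)
        beta = np.linalg.lstsq(X.T @ W @ X, X.T @ W @ r, rcond=None)[0]
        grads[i] = beta[1:]                        # slope part = local gradient
    return grads
```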
3.1.3 Identify significant directions
From the local gradients, the most significant directions with the highest variation are identified using a PCA. To this end, the gradients are merged into an N × M matrix, from which the covariance matrix is constructed. From this matrix, the eigendirections and their eigenvalues are computed, and the d* most significant eigendirections are selected, namely those covering a proportion of ξ^⌈j/τ⌉ of the total variance, with ξ ∈ (0, 1), the iteration counter j, and τ controlling how quickly this proportion shrinks over the iterations.
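A minimal sketch of this PCA-based selection is given below; mean-centering via the sample covariance and the exact handling of the variance proportion ξ^⌈j/τ⌉ are assumptions for illustration.

```python
import numpy as np

def significant_directions(grads, xi, tau, j):
    """Select the most significant eigendirections of the local gradients via PCA.

    grads : (N, M) local gradient estimates
    xi    : proportion parameter in (0, 1)
    tau   : controls how fast the retained variance proportion shrinks over iterations
    j     : current iteration index (1-based)
    """
    C = np.cov(grads, rowvar=False)                # M x M covariance of the gradients
    eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    target = xi ** np.ceil(j / tau)                # variance proportion to cover
    cum = np.cumsum(eigvals) / np.sum(eigvals)
    d_star = int(np.searchsorted(cum, target) + 1) # smallest d* covering the target
    return eigvecs[:, :d_star], eigvals[:d_star]
```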
3.1.4 Selection of splitting coefficients
As soon as the refinement is reduced to a single direction (d* = 1), we select the eigendirection with the greatest eigenvalue λ* as the most significant split direction. Since λ* has an upper bound, convergence of the iterative procedure is ensured.
3.2 Hinge-based splitting point identification
To identify a splitting point along the identified split direction, we project the samples of the partition onto this direction, yielding the one-dimensional projection p, and approximate the corresponding residuals over p by a non-greedy method.
Non-greedy algorithms are much more computationally expensive than greedy algorithms. However, since our method operates in a one-dimensional search space, the computational impact remains limited. The Hinging Hyperplane (HH) algorithm of Breiman [29] and its modification [27] offer a good trade-off between model quality and efficiency. It generates an AHHM of L superimposed HHs, each consisting of two joined affine functions whose surface looks like the shape of an open book [27].
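For reference, a one-dimensional AHHM over the projection p can be written in the standard additive hinging hyperplane form below; this states only the model class and is not necessarily the authors' exact parametrization.

$$
\hat{r}(p) \;=\; \sum_{l=1}^{L} h_l(p),
\qquad
h_l(p) \;=\; \max\bigl(\theta_{l,1} + \theta_{l,2}\,p,\; \theta_{l,3} + \theta_{l,4}\,p\bigr)
\;\;\text{or}\;\;
\min\bigl(\theta_{l,1} + \theta_{l,2}\,p,\; \theta_{l,3} + \theta_{l,4}\,p\bigr),
$$

where the hinge of each HH lies at the value of p at which its two affine branches intersect; these hinge locations serve as candidate splitting points.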
Algorithm 2 shows how we estimate an optimal AHHM of the projected residuals and derive candidate splitting points from its hinges.
Algorithm 2: Determination of candidate splitting points along the identified split direction in the partition of tk.
To determine the appropriate number of HHs for the underlying nonlinear structure, our algorithm 1) incrementally increases the number of HHs as long as a complexity-penalized error criterion improves, computed with the residual sum of squares RSS and the model complexity o = 2L. The additional HH is initialized with its hinge placed at the largest gap between existing hinges. Once added to the AHHM, 2) each HH is refitted to the residuals of the others with the hinge-finding algorithm [27] in a backfitting procedure [30].
To verify that p contains a sufficient curvature for splitting (pre-pruning criterion), we estimate an affine model of the projected residuals and compare it with the AHHM; if the AHHM does not yield a clear improvement, no candidate splitting points are returned.
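The sketch below illustrates the spirit of Algorithm 2 on the projected data: hinges are added incrementally (written here in an equivalent knot form for one dimension), a hinge is kept only while a complexity-penalized criterion improves, and the accepted knots are returned as candidate splitting points. The GCV-style criterion and the brute-force knot search are simplifications standing in for the hinge-finding and backfitting steps described above.

```python
import numpy as np

def fit_1d_ahhm_knots(p, r, max_hinges=3):
    """Incrementally fit a 1-D piecewise-affine model to projected residuals.

    p : (N,) samples projected onto the identified split direction
    r : (N,) corresponding residuals of the so far trained tree
    Returns accepted knot locations, which serve as candidate splitting points.
    """
    N = len(p)

    def design(knots):
        # affine part plus one ReLU basis per hinge (1-D AHHM written in knot form)
        cols = [np.ones(N), p] + [np.maximum(0.0, p - s) for s in knots]
        return np.column_stack(cols)

    def criterion(knots):
        X = design(knots)
        theta, *_ = np.linalg.lstsq(X, r, rcond=None)
        rss = float(np.sum((r - X @ theta) ** 2))
        o = 2 * max(len(knots), 1)                  # model complexity o = 2L
        return rss / (N * (1.0 - o / N) ** 2)       # GCV-style penalty (assumption)

    knots, best = [], criterion([])
    candidates = np.linspace(p.min(), p.max(), 50)[1:-1]
    while len(knots) < max_hinges:
        # the original initializes the new hinge in the largest gap between existing
        # hinges and refines it; a plain candidate search stands in for that step here
        scores = [criterion(knots + [c]) for c in candidates]
        s, score = candidates[int(np.argmin(scores))], float(min(scores))
        if score >= best:                           # no significant improvement: stop
            break
        knots.append(s)
        best = score
    return np.sort(np.array(knots))
```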
3.3 Integration into HILOMOT
Figure 3 illustrates our splitting method, enhancing HILOMOT to a Semi-Greedy Model Tree algorithm (SG-MOT). To identify the most significant curvature in the partition of the node selected for splitting, Algorithms 1 and 2 are applied to its samples and residuals.
Figure 3: Process of generating a splitting function in SG-MOT from up to 3 candidate pairs of split direction and splitting point.
If no candidate pair identifies significant curvature, no splitting is performed and the node becomes a leaf.
4 Experimental analysis
To demonstrate the advantages of SG-MOT in split interpretability, model accuracy, and model complexity, we conduct an extensive experimental analysis on synthetic data. We first illustrate in Section 4.1 the interpretability of a tree generated by SG-MOT. We then compare SG-MOT with the original HILOMOT and restrict our benchmarking to other related nonparametric methods for a fair comparison. Specifically, we generate axis-orthogonal model trees using the well-established algorithms GUIDE [3] and M5′ [31]. A boosting ensemble (BE) of trees, known for its strong performance, serves as an additional accuracy baseline [32]. Further details on our experimental setup and results are provided in Sections 4.2 and 4.3, respectively. Finally, we discuss the results in Section 4.4.
4.1 Illustrative example
Figure 4 shows the first four splits performed by SG-MOT to approximate f1 from Table 2.
Figure 4: First four splits of SG-MOT to approximate f1.
Table 2: Functions and data distributions to generate synthetic data for the experimental analysis (function definitions and input distributions omitted).

| Function | Ref. | M | Distribution of u |
|---|---|---|---|
| f1 | [33] | 5 | |
| f2 | [6] | 5 | |
| f3 | [34] | 7 | |
| f4 | [33] | 10 | |
| f5 | [13] | 10 | |
| f6 | [35] | 6 | |
| f7 | [36] | 8 | |
| f8 | [37] | 20 | |
| f9 | [38] | 8 | |
| f10 | [39] | 2 | |
| f11 | [40] | 2 | |
To effectively approximate f1 by local affine models, the input space must be partitioned axis-orthogonally in u3 and axis-obliquely in the subspace spanned by u1 and u2. Moreover, the coefficients of u4 and u5 in the split directions should be zero, since both inputs have a purely linear effect on f1.
When generating the tree, SG-MOT sequentially selects the node with the largest error for splitting until a termination criterion is met. It begins with the root t1, where our method generates an almost axis-orthogonal split in u3. Projecting the residuals onto this direction reveals the nonlinear quadratic effect of u3. In this one-dimensional space, a single split point is identified, dividing the samples almost equally into t2 and t3. The next split occurs in t3, where SG-MOT detects the nonlinear sinusoidal structure along u1 and u2, mirrored at the origin of p due to the negative signs of the corresponding splitting coefficients.
Based on these splits, we can recognize that u1, u2 and u3 have nonlinear effects on f1 and that u1 and u2 interact with each other. With HILOMOT, for example, the splitting direction in t1 would be p = 0.27u1 − 0.07u2 + 0.74u3 + 0.01u4 + 0.18u5. With such a split, the effects of u2 and u5 and interactions with u1 are misinterpreted. A projection onto this direction would not reveal any meaningful nonlinear structure.
4.2 Experimental setup
The data for our experimental analysis are generated by 11 different functions, which are listed in Table 2. Except for f6, f7 and f11, all functions are common test functions for evaluating the performance of regression models. The functions f6 and f7 have been used respectively to evaluate active and passive learning strategies with regression models, and f11 for global optimization methods. The function f2 differs from f1 only in terms of the definition range of u1 and u2, and f4 and f6 have inputs that do not affect the response.
For each function, we conduct three experiments with different training data sizes of Ntr = 200, 400, 800, resulting in 33 experiments. To ensure meaningful results, we average each experiment over 200 independently sampled training sets. In each run, all models are trained and evaluated on the same data, using a test set of Ntest = 2,000 labeled samples, randomly drawn from the same distribution as the training data (see Table 2). White Gaussian noise with zero mean is added to the outputs of the training data.
To evaluate the models, we calculate their average root mean square error (RMSE) on the test set for each experiment over all independent runs. To clarify differences between the tested model trees and the baseline, we normalize this error with respect to the error of the BE baseline.
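For completeness, the test error of a single run and its aggregation over the 200 runs can be written as below; normalizing by the error of the BE baseline is an assumption for illustration, since the exact normalization is not spelled out above.

$$
\mathrm{RMSE} = \sqrt{\tfrac{1}{N_{\mathrm{test}}}\textstyle\sum_{i=1}^{N_{\mathrm{test}}} \bigl(y_i - \hat{y}_i\bigr)^2},
\qquad
\overline{\mathrm{RMSE}} = \tfrac{1}{200}\textstyle\sum_{q=1}^{200} \mathrm{RMSE}_q,
\qquad
e_{\mathrm{norm}} = \frac{\overline{\mathrm{RMSE}}_{\mathrm{model}}}{\overline{\mathrm{RMSE}}_{\mathrm{BE}}}.
$$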
We perform our benchmarking in MATLAB, where we use the LMN-Tool toolbox [41] for SG-MOT and HILOMOT, the M5PrimeLab toolbox [42] for M5′, and the Statistics and Machine Learning Toolbox [43] for BE. GUIDE models are generated with an external toolbox [44]. Except for GUIDE, where a suitable model structure is already generated by the integrated post-pruning and a cross-validation (CV), we carry out the following hyperparameter tuning.
- SG-MOT: Independent line search for steepness κ ∈ [1, 8] and bandwidth b ∈ [0.4, 2] using CV.
- HILOMOT: Line search for κ ∈ [1, 8] using CV.
- M5′: Grid search for aggressive pruning {true, false} and κ ∈ [0.5, 30] using CV.
- BE: Integrated Bayesian Optimization for number of trees, size of trees and learning rate for shrinkage.
In order to achieve comparable model structures, we use local affine models without any stepwise selection in GUIDE and M5′. Apart from this, the default settings are used for the respective toolboxes.
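As an illustration of the line searches listed above, a cross-validated search for the steepness κ could be organized as in the following sketch; train_model and rmse are hypothetical stand-ins for the actual LMN-Tool training and evaluation calls, and the fold count is an assumption.

```python
import numpy as np
from sklearn.model_selection import KFold

def line_search_kappa(U, y, train_model, rmse,
                      kappas=np.linspace(1.0, 8.0, 8), n_splits=5):
    """Pick the steepness kappa with the lowest cross-validated error.

    train_model(U_train, y_train, kappa) -> model   (hypothetical stand-in)
    rmse(model, U_val, y_val)            -> float   (hypothetical stand-in)
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    mean_errors = []
    for kappa in kappas:
        fold_errors = [rmse(train_model(U[tr], y[tr], kappa), U[va], y[va])
                       for tr, va in cv.split(U)]
        mean_errors.append(np.mean(fold_errors))
    return kappas[int(np.argmin(mean_errors))]
```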
4.3 Experimental results
Table 3 presents the results of our experimental analysis, in which the overall best results are obtained by SG-MOT. In 19 out of 33 experiments, SG-MOT achieves the lowest test error of all compared methods.
Table 3: Results for 33 experiments. The test error is reported for each method and experiment (table content not reproduced).
On average, SG-MOT produces 5.9 % lower errors and 10.1 % smaller trees than HILOMOT. In 24 out of 33 experiments, its error is lower than that of HILOMOT.
4.4 Discussion
As shown in Section 4.1, SG-MOT produces more interpretable splits, which can also become nearly axis-orthogonal if necessary. The split direction captures only inputs or interacting input groups with a nonlinear effect on the residuals, enabling identification and visualization of these effects in a one-dimensional plot. These advantages are not provided by HILOMOT.
Our findings in Section 4.3 further show that SG-MOT produces smaller trees than HILOMOT while maintaining comparable or even superior accuracy, enhancing the overall interpretability of an oblique tree [5]. SG-MOT usually outperforms its competitors in terms of accuracy across all sizes of Ntr. This is particularly relevant for Ntr = 200, as it demonstrates the efficacy of SG-MOT for applications on small data. Furthermore, it is remarkable that both SG-MOT and HILOMOT outperform BE, with average error reductions of 24.8 % and 21.7 %, respectively.
A detailed analysis confirms that the semi-greedy properties of SG-MOT overcome the short-sightedness of greedy algorithms, producing splits that require no correction. For example, for f2 our method identifies 3 optimal candidate splitting points along the direction in which the sinusoidal nonlinear structure of 10 sin(πu1u2) oscillates (Figure 5a). Additionally, our method’s independence from the overlap degree in Ψ yields two key advantages: the overlap can be adapted without restrictions to both the (dis)continuity of f and the model complexity required for a suitable fit. The first advantage becomes evident with f5, where SG-MOT effectively handles non-differentiable transitions (Figure 5b) by an almost crisp model output. The second advantage is confirmed by f1, f4, f10 and f11, for which SG-MOT becomes better than HILOMOT as Ntr increases. With more training data, SG-MOT applies lower smoothing, reducing overlap regularization at deeper levels. This enables a more complex model, whereas HILOMOT tends to underfit.
Figure 5: Exemplary plots to explain the benchmarking results.
The results in Section 4.3 also reveal limitations of SG-MOT. Some nonlinear effects from individual inputs remain partially undetected, e.g. u19 in f8 and u2 in f4 for Ntr = 200. A specific challenge arises with f6, where increasing oscillations along u1 (see Figure 5d) become inseparable in p = 0.99u1 − 0.04u2 − 0.04u3 + 0.11u4 + 0.09u5 + 0.06u6 once the other coefficients in the split direction deviate from zero.
5 Conclusion and future work
In this work, we improved HILOMOT with a novel semi-greedy splitting method, resulting in SG-MOT. Our approach first estimates the direction of the strongest curvature in a partition that cannot be approximated by the existing tree. To this end, local residual gradients are analyzed in an iterative procedure. Then, the residuals are projected onto this direction and fitted with a one-dimensional AHHM, whose hinges serve as optimal candidate splitting points. SG-MOT selects the candidate that best balances the partitioned data.
Experiments on synthetic data show that SG-MOT produces meaningful, interpretable splits that reveal and visualize nonlinear properties of the training data. Extensive benchmarking confirms that SG-MOT and HILOMOT significantly outperform established regression methods in accuracy. Compared to HILOMOT, SG-MOT yields more accurate, smaller trees by generating splits that require no significant corrective adjustments and by better matching the tree complexity to the data. This is achieved by decoupling the overlap from the splitting method and by introducing an additional pre-pruning criterion.
For future work, we will address the limitations of our method. Since SG-MOT has a computational time 2–3 times longer than HILOMOT, we aim to improve the computational efficiency using an already developed but not yet validated subset selection method that limits the residuals to significant ones. We also seek to reduce the overhead of tuning an additional hyperparameter by introducing a self-tuning approach. While our method currently supports only axis-oblique trees with local affine models, an extension to more complex local models is in development. Finally, we aim to enhance accuracy and interpretability by enforcing sparsity in the axis-oblique splits, e.g., via an L1 regularization.
About the authors
Marvin Schöne is a Ph.D. student at the University of Siegen and researcher at the Institute for Data Science Solutions. He received his master’s degree in electrical engineering from the University of Applied Sciences and Arts Bielefeld in 2019. His research topics are design of experiments, local model networks and active learning.
Martin Kohlhase is Professor of Control and Automation at the University of Applied Sciences and Arts Bielefeld and co-chair of the Institute for Data Science Solutions. He received his doctor’s degree in 2011 at the Technical University of Darmstadt. His research topics are nonlinear system identification, design of experiments, data-driven models and active learning.
Oliver Nelles is Professor at the University of Siegen in the Department of Mechanical Engineering and chair of Automatic Control – Mechatronics. He received his doctor’s degree in 1999 at the Technical University of Darmstadt. His research topics are nonlinear system identification, design of experiments, metamodeling and local model networks.
Research ethics: Not applicable.

Informed consent: Not applicable.

Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Use of Large Language Models, AI and Machine Learning Tools: DeepL to improve the language.

Conflict of interest: The authors state no conflict of interest.

Research funding: This work was funded by the Ministry of Economic Affairs, Industry, Climate Action and Energy of the State of North Rhine-Westphalia within the project AI4ScaDa, grant number 005-2111-0015.

Data availability: Not applicable.
References
[1] A. Dogan and D. Birant, “Machine learning and data mining in manufacturing,” Expert Syst. Appl., vol. 166, p. 114060, 2021. https://doi.org/10.1016/j.eswa.2020.114060.
[2] D. L. Banks, R. T. Olszewski, and R. A. Maxion, “Comparing methods for multivariate nonparametric regression,” Commun. Stat. Simulat. Comput., vol. 32, no. 2, pp. 541–571, 2003. https://doi.org/10.1081/sac-120017506.
[3] W.-Y. Loh, “Regression trees with unbiased variable selection and interaction detection,” Stat. Sin., vol. 12, no. 2, pp. 361–386, 2002.
[4] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks, Fuzzy Models, and Gaussian Processes, Cham, Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-47439-3.
[5] V. G. Costa and C. E. Pedreira, “Recent advances in decision trees: an updated survey,” Artif. Intell. Rev., vol. 56, no. 5, pp. 4765–4800, 2023. https://doi.org/10.1007/s10462-022-10275-5.
[6] M. Schöne and M. Kohlhase, “Curvature-oriented splitting for multivariate model trees,” in 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, IEEE, 2021, pp. 1–9. https://doi.org/10.1109/SSCI50451.2021.9659858.
[7] W.-Y. Loh, “Fifty years of classification and regression trees,” Int. Stat. Rev., vol. 82, no. 3, pp. 329–348, 2014. https://doi.org/10.1111/insr.12016.
[8] M. Kretowski, Evolutionary Decision Trees in Large-Scale Data Mining, vol. 59 of Studies in Big Data, Cham, Springer International Publishing, 2019. https://doi.org/10.1007/978-3-030-21851-5.
[9] O. Nelles, “Axes-oblique partitioning strategies for local model networks,” in 2006 IEEE Conference on Computer Aided Control System Design, 2006 IEEE International Conference on Control Applications, 2006 IEEE International Symposium on Intelligent Control, Munich, Germany, IEEE, 2006, pp. 2378–2383. https://doi.org/10.1109/CACSD-CCA-ISIC.2006.4777012.
[10] S. Ernst, “Hinging hyperplane trees for approximation and identification,” in Proceedings of the 37th IEEE Conference on Decision and Control, vol. 2, Tampa, FL, USA, IEEE, 1998, pp. 1266–1271. https://doi.org/10.1109/CDC.1998.758452.
[11] P. Pucar and M. Millnert, “Smooth hinging hyperplanes – an alternative to neural nets,” in European Control Conference (ECC), vol. 3, Rome, Italy, 1995, pp. 1173–1178.
[12] L. E. Scales, Introduction to Non-Linear Optimization, London, Macmillan Education UK, 1985. https://doi.org/10.1007/978-1-349-17741-7.
[13] K.-C. Li, H.-H. Lue, and C.-H. Chen, “Interactive tree-structured regression via principal hessian directions,” J. Am. Stat. Assoc., vol. 95, no. 450, pp. 547–560, 2000. https://doi.org/10.2307/2669398.
[14] R. D. Cook, “Principal hessian directions revisited,” J. Am. Stat. Assoc., vol. 93, no. 441, pp. 84–94, 1998. https://doi.org/10.2307/2669605.
[15] A. Dobra and J. Gehrke, “SECRET: a scalable linear regression tree algorithm,” in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Canada, ACM, 2002, pp. 481–487. https://doi.org/10.1145/775047.775117.
[16] J. Gama, “Functional trees,” Mach. Learn., vol. 55, no. 3, pp. 219–250, 2004. https://doi.org/10.1023/b:mach.0000027782.67192.13.
[17] G. H. Golub and V. Pereyra, “The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate,” SIAM J. Numer. Anal., vol. 10, no. 2, pp. 413–432, 1973. https://doi.org/10.1137/0710036.
[18] T. Kenesei and J. Abonyi, “Hinging hyperplane based regression tree identified by fuzzy clustering and its application,” Appl. Soft Comput., vol. 13, no. 2, pp. 782–792, 2013. https://doi.org/10.1016/j.asoc.2012.09.027.
[19] R. Hathaway and J. Bezdek, “Switching regression models and fuzzy clustering,” IEEE Trans. Fuzzy Syst., vol. 1, no. 3, pp. 195–204, 1993. https://doi.org/10.1109/91.236552.
[20] J. Wang, R. Fujimaki, and Y. Motohashi, “Trading interpretability for accuracy: oblique treed sparse additive models,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney NSW Australia, ACM, 2015, pp. 1245–1254. https://doi.org/10.1145/2783258.2783407.
[21] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theor., vol. 53, no. 12, pp. 4655–4666, 2007. https://doi.org/10.1109/tit.2007.909108.
[22] M. Czajkowski and M. Kretowski, “The role of decision tree representation in regression problems – an evolutionary perspective,” Appl. Soft Comput., vol. 48, no. 1, pp. 458–475, 2016. https://doi.org/10.1016/j.asoc.2016.07.007.
[23] B. Li, Sufficient Dimension Reduction – Methods and Applications with R, New York, CRC Press, 2018. https://doi.org/10.1201/9781315119427.
[24] P. Chaudhuri, M.-C. Huang, W.-Y. Loh, and R. Yao, “Piecewise-polynomial regression trees,” Stat. Sin., vol. 4, no. 1, pp. 143–167, 1994.
[25] R. Blanquero, E. Carrizosa, C. Molero-Río, and D. R. Morales, “On sparse optimal regression trees,” Eur. J. Oper. Res., vol. 299, no. 3, pp. 1045–1054, 2022. https://doi.org/10.1016/j.ejor.2021.12.022.
[26] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering, 2nd ed., New York, Springer, 2006.
[27] P. Pucar and J. Sjoberg, “On the hinge-finding algorithm for hinging hyperplanes,” IEEE Trans. Inf. Theor., vol. 44, no. 3, pp. 1310–1319, 1998. https://doi.org/10.1109/18.669422.
[28] S. Esmeir and S. Markovitch, “Anytime learning of decision trees,” J. Mach. Learn. Res., vol. 8, no. 33, pp. 891–933, 2007.
[29] L. Breiman, “Hinging hyperplanes for regression, classification, and function approximation,” IEEE Trans. Inf. Theor., vol. 39, no. 3, pp. 999–1013, 1993. https://doi.org/10.1109/18.256506.
[30] C. F. Ansley and R. Kohn, “Convergence of the backfitting algorithm for additive models,” J. Aust. Math. Soc. A Pure Math. Stat., vol. 57, no. 3, pp. 316–329, 1994. https://doi.org/10.1017/s1446788700037721.
[31] J. R. Quinlan, “Learning with continuous classes,” in Proceedings of AI ’92, Hobart, Tasmania, World Scientific, 1992, pp. 343–348.
[32] M. Fernández-Delgado, M. Sirsat, E. Cernadas, S. Alawadi, S. Barro, and M. Febrero-Bande, “An extensive experimental survey of regression methods,” Neural Netw., vol. 111, no. 1, pp. 11–34, 2019. https://doi.org/10.1016/j.neunet.2018.12.010.
[33] J. H. Friedman, “Multivariate adaptive regression splines,” Ann. Stat., vol. 19, no. 1, pp. 1–66, 1991. https://doi.org/10.1214/aos/1176347963.
[34] H. Zhang, C.-Y. Yu, H. Zhu, and J. Shi, “Identification of linear directions in multivariate adaptive spline models,” J. Am. Stat. Assoc., vol. 98, no. 462, pp. 369–376, 2003. https://doi.org/10.1198/016214503000152.
[35] R. B. Gramacy and H. K. H. Lee, “Adaptive design and analysis of supercomputer experiments,” Technometrics, vol. 51, no. 2, pp. 130–145, 2009. https://doi.org/10.1198/tech.2009.0015.
[36] H. Dette and A. Pepelyshev, “Generalized Latin hypercube design for computer experiments,” Technometrics, vol. 52, no. 4, pp. 421–429, 2010. https://doi.org/10.1198/tech.2010.09157.
[37] E. N. Ben-Ari and D. M. Steinberg, “Modeling data from computer experiments: an empirical comparison of kriging with MARS and projection pursuit regression,” Qual. Eng., vol. 19, no. 4, pp. 327–338, 2007. https://doi.org/10.1080/08982110701580930.
[38] J. An and A. Owen, “Quasi-regression,” J. Complex, vol. 17, no. 4, pp. 588–607, 2001. https://doi.org/10.1006/jcom.2001.0588.
[39] Y. B. Lim, J. Sacks, W. J. Studden, and W. J. Welch, “Design and analysis of computer experiments when the output is highly correlated over the input space,” Can. J. Stat., vol. 30, no. 1, pp. 109–126, 2002. https://doi.org/10.2307/3315868.
[40] L. C. W. Dixon and G. P. Szego, “The global optimization problem: an introduction,” Towards Glob. Optim., vol. 2, no. 1, pp. 1–15, 1978.
[41] B. Hartmann, T. Ebert, T. Fischer, J. Belz, G. Kampmann, and O. Nelles, “LMNTOOL – Toolbox zum automatischen Trainieren lokaler Modellnetze,” in Proceedings 22. Workshop Computational Intelligence, Dortmund, KIT Scientific Publishing, 2012.
[42] G. Jekabsons, M5PrimeLab, 2020 [Online]. Available: http://www.cs.rtu.lv/jekabsons/Files/M5PrimeLab.pdf [Accessed: Feb. 17, 2025].
[43] The MathWorks, Inc., “Fitrensemble: fit ensemble of learners for regression,” [Online]. Available: https://de.mathworks.com/help/stats/fitrensemble.html [Accessed: Feb. 17, 2025].
[44] W.-Y. Loh, User Manual for GUIDE Ver. 42.6, 2024 [Online]. Available: https://pages.stat.wisc.edu/loh/treeprogs/guide/guideman.pdf [Accessed: Feb. 17, 2025].
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.