Optimization of student learning status by instructional intervention decision-making techniques incorporating reinforcement learning

Jifeng Gong

doi:10.1515/nleng-2025-0155

Article Open Access

Published/Copyright: September 8, 2025

From the journal Nonlinear Engineering Volume 14 Issue 1

Abstract

Existing instructional intervention decision-making techniques struggle to deliver accurate interventions when optimizing students’ learning contexts. To address this problem, this study combines a reinforcement learning model with a quantile trace regression model to construct a theoretical model for instructional intervention decision-making and validates its effectiveness. The experimental results showed that the proposed model had high prediction accuracy across different student groups, and that its application in teaching practice clearly improved students’ learning effectiveness. Compared with the baseline methods, the proposed model performed better in accuracy, precision, recall, and F1 value, with an accuracy as high as 96.4%. Across different educational data sets, the F1 scores of the proposed model were all above 0.89. The results show that the model can achieve accurate teaching intervention, thus optimizing students’ learning conditions. The research lays the foundation for creating a more intelligent and adaptable educational system, and promotes the intelligent development of educational technology.

Keywords: reinforcement learning; teaching intervention decisions; learning status; quantile trace regression modeling; estimating error rates

1 Introduction

The development of computer technology and the improvement of educational informatization management have led to the widespread use of various systems and platforms in education, resulting in a massive growth of educational resource data. This growth has posed significant challenges to educational evaluation and instructional decision analysis [1,2]. In education, big data analytics can provide scientific support for educational decision-making and personalized services for learners, and it helps to understand the process of education and teaching. The integration of information technology and modern education can achieve accurate teaching and personalized learning, which is becoming a development trend in education [3,4]. The classroom system has a scale effect, which makes it easier to popularize education and greatly promotes the progress and development of science and technology, but it usually adopts the same teaching interventions for all students, which makes it difficult to pay enough attention to individual differences [5,6]. Personalized instruction, on the other hand, is tailored to individual needs but lacks economies of scale. It is important to strike a balance between the two approaches. The use of big data to make accurate teaching intervention decisions (TIDs) is a promising solution. However, the complexity and dynamics of educational data make it challenging to fully depict the rules of education. As a result, current TID technology is limited in its versatility and can only be used in certain scenarios, and its level of intelligence is insufficient [7,8,9]. Based on this, the research integrates a reinforcement learning (RL) model and a quantile trace regression (QTR) model to construct a theoretical model of TID. The purpose of this model is to enhance the generality of the current TID technology and to utilize computer technology to improve its intelligence level.

The research is divided into five sections. The first section summarizes and discusses the current research on teaching intervention decision-making technology. The second section builds the theoretical model and application framework of precision teaching. The third section verifies the theoretical model of precision TID. The fourth section discusses the research results. The fifth section summarizes the whole article.

2 Related work

Due to the complexity and dynamics of education itself, its research problems usually involve many factors and their interrelationships. How to intervene accurately in teaching decisions for students has become the focus of current educational big data research [10,11]. Usher et al. optimized teachers’ distance learning decisions by using learner data and data-driven methods to address the problems associated with distance TID during the COVID-19 pandemic, thereby effectively improving the quality of distance education while also improving student learning outcomes [12]. Carter et al. addressed the issues related to teaching decision-making interventions for students with special educational needs (SEN) by providing a comprehensive discussion of teaching strategies for students with SEN in mainstream schools in Australia and conducting a survey of stakeholders, thereby providing data and theoretical support for proposing a rationalized TID technique [13]. Yulianti et al. addressed the issues related to the mediating role of parents in student TID by providing a comprehensive discussion of the TID approach in multiple schools in Indonesia using multilevel regression analysis, thus providing data support for the optimization of the TID technique [14]. To address the problems with TID in terms of students’ professional development, Gesel et al. proposed a TID technique that uses big data technology to integrate teachers’ knowledge, skills, and self-efficacy. This effectively raised teaching standards and boosted student academic performance [15].

In addition, Pesce et al. addressed the problems of TID in physical education by conducting a 2-year experiment in a classroom using random selection, thereby optimizing TID techniques in physical education while promoting students’ self-control [16]. Gion et al. addressed the problems associated with TID in a multilevel classroom by using an empirical experiment to synthesize the learning of different races in the same classroom. This effectively optimized the quality of the classroom while enhancing the effectiveness of multilevel classroom interventions [17]. Jungjohann and Gebhardt constructed a questionnaire portfolio model of TID-related issues in inclusive education by optimizing classroom assessment dimensions. This could improve the quality of teachers’ teaching based on optimized teaching assessment and inform the improvement of learning outcomes for students with TID [18]. Kim and Kim addressed the problems related to degenerate Bernoulli numbers and degenerate Euler numbers by proposing to derive fully degenerate Bernoulli polynomials and degenerate Euler polynomials using moment representations of the parameters of Laplacian random variables. It allowed further study of degenerate hyperbolic functions and optimized product expansions of related functions [19]. Lysytska et al. proposed a combined online and offline model for teaching intervention technologies in response to the challenges of teaching foreign languages in the context of turbulent world events. The creation of an adaptive online learning platform, the development of a multimedia resource library, and the selection of innovative pedagogical tools have been identified as effective strategies to meet the psychological and pedagogical needs of students [20].

Current TID technology generalizes poorly because of the nature of educational data: it is difficult to transfer across technical environments, and it has not yet achieved TID that is both scalable and personalized. At the same time, current purely personalized TID technologies often exhibit representational and diversified characteristics, making it difficult to adapt to dynamic and continuous teaching processes. In education, RL has been widely used for personalized learning path recommendation and intelligent tutoring systems. In particular, Q-learning is chosen for its simplicity and effectiveness in partially observable Markov decision processes [21]. Although deep RL performs well when dealing with high-dimensional data, the dimensionality and complexity of the data in educational interventions are usually low, and Q-learning is sufficient to deal with them. In addition, the QTR model is chosen for its ability to capture the nonlinear relationship between behavioral patterns and grade point average (GPA), which has been shown to be useful for educational decision-making in previous studies [22]. The study proposes a theoretical model of precision teaching that makes teaching interventions computable. This model lays the foundation for data-driven intelligent teaching interventions and constructs an application framework to strengthen the generality of teaching interventions. It provides a reference for the practical promotion of precision teaching interventions (PTIs). Additionally, the theoretical model’s learning effectiveness prediction method achieves differentiated and dynamic predictions of learning achievement. The integration of RL experiments with precise TID experiments enables intelligent and dynamic teaching intervention. Taken as a whole, these elements constitute the innovation of this research.

3 Optimizing students’ learning status with a big data-based theoretical model of PTIs

The scientific basis of traditional teaching interventions in actual teaching still needs to be strengthened. Therefore, this section constructs a theoretical model of precision instructional intervention, in which the learning effectiveness prediction method is the guarantee and the TID method is the key.

3.1 Precision instructional interventions modeling and application architecture building

The central concern of this study in using big data for instructional interventions is to facilitate student learning by providing students with personalized and targeted instructional interventions based on their actual status and characteristics. Thus, the PTI problem can be transformed into an optimization problem, i.e., finding the optimal PTI that optimizes student learning states (LSs). Based on this, the study uses big data to construct a theoretical model of PTI, as shown in Figure 1.

Figure 1

Schematic diagram of a theoretical model for PTI based on big data.

In Figure 1, the PTI model constructed in the study contains three components: LS and feature characterization, learning effectiveness prediction, and TID. LS and feature characterization is the prerequisite for accurate instructional interventions, learning effectiveness prediction is its guarantee, and TID is its key. Learning effectiveness prediction relies on LS and feature characterization, and provides important feedback on the effectiveness of instructional interventions. TID is realized based on LS and features with the goal of optimizing learning effectiveness. LS and feature characterization describes the actual situation of students in educational activities and their relevant characteristics, which differ considerably from student to student [23,24,25]. In Figure 1, the study uses big data information technology to categorize students’ learning characteristics into static and dynamic characteristics. Static characteristics include basic characteristics and learning styles, while dynamic characteristics include behavioral characteristics, cognitive levels, and affective states. In essence, therefore, students’ LSs and characteristics are portrayed along five dimensions, which can be expressed as a quintuple, as shown in Eq. (1) [26].

(1) $T = (T_J, T_X, T_W, T_R, T_G),$

where $T$ denotes LSs and characteristics, $T_J$ denotes basic characteristics, $T_X$ denotes learning style, $T_W$ denotes behavioral characteristics, $T_R$ denotes cognitive level, and $T_G$ denotes affective state. $T_J$ mainly consists of the student’s own information related to his/her academics, represented by a numerical code. $T_X$ can be measured by self-selected methods, learning style scales, and big data analysis techniques; the study measures students’ learning styles using big data techniques. $T_W$ is currently measured either by coding student behavior according to theoretical classifications or by using big data analytics to obtain learner behavioral characteristics from learner activity data. The study combines the two: guided by the theory of multidimensional characteristics of student behavior, it uses big data analytics to mine student behavioral data in order to better describe the characteristics of student behavior. $T_R$ is measured by means of educational assessment, which infers the cognitive structure of students from their actual responses in the assessment. $T_G$ is measured by dividing the analyzed data of student subjects into three parts, namely, textual analysis, perceptions of outward behaviors, and representations of physiological signals, and then combining the three.
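As a minimal sketch of how the quintuple in Eq. (1) might be held in code, the container below groups the five dimensions into static and dynamic parts. The class name `LearningState`, the field names, and the field types are illustrative assumptions; the study does not specify a concrete data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LearningState:
    """Quintuple T = (T_J, T_X, T_W, T_R, T_G); all field types are assumed."""
    basic: Dict[str, int]      # T_J: numerically coded basic characteristics
    learning_style: str        # T_X: learning style label
    behavior: List[float]      # T_W: behavioral characteristics
    cognitive_level: float     # T_R: cognitive level
    affective_state: float     # T_G: affective state

    def static_part(self) -> Tuple[Dict[str, int], str]:
        # Static characteristics: basic characteristics and learning style.
        return (self.basic, self.learning_style)

    def dynamic_part(self) -> Tuple[List[float], float, float]:
        # Dynamic characteristics: behavior, cognitive level, affective state.
        return (self.behavior, self.cognitive_level, self.affective_state)

    def as_quintuple(self) -> tuple:
        # Concatenate static and dynamic parts in the order of Eq. (1).
        return self.static_part() + self.dynamic_part()
```

Keeping the static and dynamic parts separable mirrors the split described for Figure 1, since only the dynamic part changes between teaching stages.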

Considering the problem of practical application of the theoretical model in reality, the study constructed the application architecture of PTI in teaching practice. Thus, the application architecture of PTI using big data is shown in Figure 2.

Figure 2

Application architecture of PTI based on big data.

In Figure 2, the architecture consists of a data layer, a method layer, a result layer, and an application layer from the bottom up, with each layer building on the one below it. The data layer provides an important source of information for the implementation of precise instructional interventions. The method layer is based on students’ LS and characteristics, and provides the key technology for realizing intelligent and dynamic PTI, which is the way and medium to realize precision teaching. The result layer is the precise teaching intervention strategy obtained by solving the core technology of the method layer, which is the rule to be followed when implementing precise teaching intervention. Intelligent and dynamic teaching interventions can be realized with the support of the data layer, method layer, and result layer.

3.2 Learning effectiveness prediction method based on QTR

Based on the theoretical model of PTI constructed in the previous subsection, the study proceeds with the design of the learning effectiveness prediction method. Considering the relationship between students’ behavior and performance before and after learning, as well as the relational similarity between adjacent pixel points of images, the study proposes a learning effectiveness prediction method using QTR. Statistical modeling is a crucial step in statistical learning and a useful tool for extracting information from data [27,28,29]. Auxiliary data, which are typically in vector form, are frequently employed as variables to enhance learning performance. The development of big data technology has significantly enhanced data collection, and functional and matrix covariates have gradually appeared and are now widely used in statistical learning problems [30,31,32]. The simplest method for analyzing matrix-type covariates is to construct the trace regression (TR) model, as shown in Eq. (2) [33].

(2) $\Re = \operatorname{tr}(\Phi^{T} \Im) + \varphi^{*},$

where $\Re$ denotes the response variable and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. $\Phi$ is the matrix of unknown regression coefficients and $\Im$ denotes the matrix of explanatory variables with fixed dimensions $w_1$ and $w_2$. $\varphi^{*}$ denotes the zero-mean model error. It is worth noting that the ambient dimension of $\Phi$ ($w_1 \times w_2$) may be very large. In that case, it is necessary to assume that $\Phi$ has a low-rank structure, meaning that its degrees of freedom are far fewer than $w_1 \times w_2$. This is because the complexity of the model in Eq. (2) makes it impossible to regulate the actual sparsity of the individual elements. Most current studies of the TR model focus on estimating the conditional mean, and very few consider the QTR, so the study analyzes the QTR model, as shown in Eq. (3).

(3) $\Re = \operatorname{tr}(A_t^{T} \Im) + \varphi,$

where $A_t$ denotes the quantile regression coefficient matrix at quantile level $t$, with $t$ in the range of 0–1, and $\varphi$ denotes random error. Eq. (3) computes the matrix inner product between the explanatory variable matrix $\Im$ and the parameter matrix $A_t$, and introduces a quantile-specific error term $\varphi$ at each quantile level $t$. It therefore enables fine-grained prediction across different learning performance levels (e.g., low-, medium-, and high-performing groups). Compared to traditional regression methods, the QTR model can capture how the relationship between behavioral traits and learning outcomes varies across positions in the outcome distribution. With the support of this QTR model, the study constructed a prediction method for students’ learning effectiveness. As the indicator of learning effectiveness, the study chooses the most direct one, academic performance, and treats the prediction as a regression problem. The resulting architecture of the learning effectiveness prediction method is shown in Figure 3.

Figure 3

Architecture diagram of learning effectiveness prediction method based on QTR model.

In Figure 3, the dynamic data of students’ state and characteristics are strongly correlated between earlier and later stages of the teaching process, so regressing directly on these data would seriously degrade the prediction of learning effectiveness. Therefore, the study converts the dynamic data of student status and features into images, which preserves the correlation within the dynamic data while avoiding the influence of that correlation on the prediction of learning effectiveness. On this basis, the method integrates the advantages of QTR and TR: QTR portrays the different relationships between students’ LS and characteristics at different levels of learning effectiveness, and TR portrays the correlations between the rows and columns of the matrix variables used as regression inputs.
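The two ingredients of Eq. (3) can be illustrated with a short sketch: the trace term is simply the elementwise inner product of the coefficient matrix and the covariate matrix, and a quantile level $t$ enters through the check (pinball) loss. The function names and the optional nuclear-norm penalty weight `lam` are assumptions made for illustration; the nuclear norm is one common way to encourage the low-rank structure mentioned for Eq. (2), and the study does not state which penalty or solver it used.

```python
import numpy as np


def trace_predict(A, X):
    """Return tr(A^T X), which equals the elementwise inner product sum(A * X)."""
    return float(np.trace(A.T @ X))


def pinball_loss(y, y_hat, t):
    """Mean check loss at quantile level t in (0, 1)."""
    r = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    # Under-prediction (r > 0) costs t * r; over-prediction costs (1 - t) * |r|.
    return float(np.mean(np.maximum(t * r, (t - 1.0) * r)))


def qtr_objective(A, Xs, ys, t, lam=0.0):
    """Pinball loss of trace predictions plus an assumed nuclear-norm penalty."""
    preds = [trace_predict(A, X) for X in Xs]
    return pinball_loss(ys, preds, t) + lam * float(np.linalg.norm(A, ord="nuc"))
```

At $t = 0.95$ an under-prediction is penalized 19 times more heavily than an over-prediction of the same size, which is why the fitted surface tracks the upper end of the grade distribution.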

3.3 RL-based approach to TIDs

Based on the proposed method for predicting learning effectiveness, the study further proposes the TID method. At present, the development of targeted PTI strategies based on the actual conditions and characteristics of different students is the key issue for PTI. Furthermore, intelligence and dynamism are important features of precision intervention decision-making, and the research introduces RL to construct a TID method that addresses these two features. RL, an important branch of machine learning and a computational approach to understanding and automating goal-directed learning and decision-making, has been widely used in sequential decision-making. In RL, a decision-making agent interacts with the environment and learns the optimal strategy through continuous trial and error [34,35,36]. From a methodological point of view, the application of RL in TID has the advantages of fitting the needs of precise teaching intervention, providing dynamic teaching intervention, facilitating the exploration of potential key factors affecting teaching intervention, and enriching the research methods of precise teaching intervention. Thus, based on the theoretical model in Figure 1, the problem of precise instructional intervention amounts to solving for the optimal decision function that maximizes the learning effect, as indicated by Eq. (4).

(4) $a^{*} = \arg\max_{a} E\left[ L \mid B_1 = \alpha_1^{*}(G_1), B_2 = \alpha_2^{*}(G_2), \dots, B_K = \alpha_K^{*}(G_K) \right],$

where $a^{*}$ denotes the optimal decision function and $L$ denotes the learning effect. $B_k$ denotes the instructional intervention at stage $k$, $G_k$ denotes the instructional history vector available at stage $k$, $\alpha_k$ denotes the decision function at stage $k$, $K$ denotes the number of instructional stages, and $E$ denotes the mathematical expectation. Eq. (4) follows the Q-learning approach to RL, which relates the expected payoff of the current state-action pair recursively to the payoffs of subsequent state-action pairs. The related Q-function is optimized by backward recursion, starting from the last stage and moving back one stage at a time until the best choice for the entire process is found. The study constructs the TID technique using Q-learning, a widely used RL algorithm. Figure 4 illustrates the overall method flow.

Figure 4

Process of teaching intervention decision-making method based on Q-learning.

As illustrated in Figure 4, the method consists of first building an accurate statistical model of TID using Q-learning, then defining the Q-function, estimating the Q-function, and solving the optimal decision function. The Q-table, a central component of RL, stores the expected utility, or Q-value, of each state-action pair. In the context of educational interventions, states represent a student’s current learning situation, while actions represent possible instructional interventions. A state is an abstract representation of a student’s learning situation, including multiple dimensions such as the student’s current grade, study habits, engagement, and homework completion. For example, a state can be a vector containing a student’s most recent test score, attendance, activity on an online learning platform, and so on. Actions represent pedagogical interventions that teachers can take, such as providing additional tutoring, adjusting the difficulty of the course content, adding practice sessions, etc. Actions are selected based on teacher expertise and observation of student learning. The Q-value is initialized to zero or a small random number, indicating that there is no a priori knowledge of the expected utility of each state-action pair in the absence of experience. Its update follows the Q-learning algorithm, which learns the optimal utility of each state-action pair through trial and error. Reward functions, on the other hand, are defined based on student behavior and learning outcomes, with the goal of encouraging positive learning behaviors and improving learning effectiveness. Therefore, Eq. (5) displays the corresponding value expression for the decision function α in the creation of an accurate TID statistical model.
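The Q-table bookkeeping described above (states, actions, zero initialization, trial-and-error updates) can be sketched with the standard tabular Q-learning rule. The action names, the state encoding as a tuple, and the learning rate and discount factor are hypothetical choices for illustration; the study does not report these settings.

```python
from collections import defaultdict

# Hypothetical intervention set, taken from the examples named in the text.
ACTIONS = ("extra_tutoring", "adjust_difficulty", "add_practice")


def make_q_table():
    """Q-table whose entries start at zero (no prior knowledge of any pair)."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q, state, action, reward, next_state, lr=0.5, discount=0.9):
    """Standard Q-learning update; lr and discount are assumed values."""
    best_next = max(q[next_state].values())
    td_target = reward + discount * best_next
    q[state][action] += lr * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Intervention with the highest current Q-value in the given state."""
    return max(q[state], key=q[state].get)
```

A state here is any hashable summary of the student's situation, for example `("low_score", "poor_attendance")`; richer state vectors would need discretization before they can index a table.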

(5) $J_{\alpha} = E_{\alpha}\left[ \sum_{k=1}^{K} X_k \right] = \int \left( \sum_{k=1}^{K} X_k \right) \mathrm{d}D_{\alpha}.$

where $J_{\alpha}$ denotes the value corresponding to the decision function $\alpha$, $D_{\alpha}$ denotes the distribution of the random variables generated when instructional interventions are assigned by the decision function $\alpha$, and $X_k$ denotes the value-added learning effect at stage $k$. Because the value function is recursive, it satisfies the recursive relationship given in Eq. (6).

(6) $J_k(g_k) = \max_{b_k} E\left[ X_k + J_{k+1}(G_{k+1}) \mid G_k = g_k, B_k = b_k \right],$

where $g_k$ denotes an observed value of the student history vector $G_k$, and $b_k$ denotes an element of the instructional intervention $B$. The value function is the foundation of the RL method. Because it satisfies Eq. (6), it can be solved iteratively by backward recursion from the last stage. The Q-function is defined accordingly, beginning with the last instructional stage $K$, as shown in Eq. (7).

(7) $\begin{aligned} F_K(g_K, b_K) &= E\left[ X_K \mid G_K = g_K, B_K = b_K \right], \\ \alpha_K^{*}(g_K) &= \arg\max_{b_K} F_K(g_K, b_K), \\ J_K(g_K) &= \max_{b_K} F_K(g_K, b_K), \end{aligned}$

where $F_K(g_K, b_K)$ represents the expected learning outcome obtained by applying instructional intervention $B_K$ given the full history $G_K$ prior to stage $K$. The corresponding definition at an earlier stage $k$ is shown in Eq. (8).

(8) $\begin{aligned} F_k(g_k, b_k) &= E\left[ X_k + J_{k+1}(G_{k+1}) \mid G_k = g_k, B_k = b_k \right] \\ &= E\left[ X_k + J_{k+1}(g_{k+1}, b_k, T_{k+1}) \mid G_k = g_k, B_k = b_k \right], \\ \alpha_k^{*}(g_k) &= \arg\max_{b_k} F_k(g_k, b_k), \\ J_k(g_k) &= \max_{b_k} F_k(g_k, b_k), \end{aligned}$

where $T$ denotes the student characteristics. $F_k(g_k, b_k)$ in Eqs. (7) and (8) is called the Q-function. If $F_{K+1}$ is defined to be zero, the Q-function for any stage $k$ can be written as in Eq. (9).

(9) $F_k(g_k, b_k) = E\left[ X_k + \max_{b_{k+1}} F_{k+1}(g_{k+1}, b_{k+1}) \mid G_k = g_k, B_k = b_k \right].$

The multi-stage optimal decision function is then defined through the Q-function, as shown in Eq. (10).

(10) $\begin{aligned} a^{*} &= \left( a_1^{*}(G_1), a_2^{*}(G_2), \dots, a_K^{*}(G_K) \right), \\ a_k^{*}(g_k) &= \arg\max_{b_k} F_k(g_k, b_k), \end{aligned}$

where $a^{*}$ denotes the vector of multi-stage optimal decision functions. To estimate the Q-function, the study uses a linear model, as shown in Eq. (11).

(11) $F_k(g_k, b_k; \zeta_k, \gamma_k) = \zeta_k^{T} G_k + B_k \left( \gamma_k^{T} G_k \right),$

where $\zeta_k$ and $\gamma_k$ denote the regression parameters. Their estimates are obtained stage by stage by backward recursion from the last stage. Specifically, the least absolute shrinkage and selection operator (LASSO) can be used to obtain the two parameter estimates when the observed data of multiple students are given. The observed data are expressed as shown in Eq. (12).

(12) $\left\{ T_1^{j}, B_1^{j}, L_1^{j}, \dots, T_k^{j}, B_k^{j}, L_k^{j}, \dots, T_K^{j}, B_K^{j}, L_K^{j} \right\},$

where $j$ indexes the students. Substituting the estimated Q-function into the optimal decision function yields the optimal decision function for each stage, as given in Eq. (13).

(13) $a_k^{*}(g_k) = \arg\max_{b_k} F_k(g_k, b_k; \zeta_k, \gamma_k).$

Since $B_k \in \{0, 1\}$, Eq. (11) implies that $F_k(g_k, b_k; \zeta_k, \gamma_k)$ is maximized by choosing $b_k = 1$ exactly when the interaction term $\gamma_k^{T} G_k$ is positive, and $b_k = 0$ otherwise. The resulting optimal decision function is shown in Eq. (14).

(14) $a_k^{*}(g_k) = U\left( \gamma_k^{T} G_k > 0 \right),$

where $U$ denotes the indicator function, which takes the value 1 when its argument holds and 0 otherwise. Based on this, the optimal decision function for all $K$ stages is obtained, as shown in Eq. (15).

(15) $a^{*} = \left( a_1^{*}(g_1), \dots, a_k^{*}(g_k), \dots, a_K^{*}(g_K) \right).$

Eq. (15) allows for the precise targeting of instructional interventions at any point in the learning process, based on behavioral performance and student attributes.
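The backward recursion of Eqs. (7) to (14) can be sketched for two stages with a binary intervention. This is a simplified illustration under stated assumptions, not the study's implementation: ordinary least squares (`np.linalg.lstsq`) stands in for the LASSO estimator named in the text, and the data-generating process in `simulate`, including every coefficient and noise level, is hypothetical.

```python
import numpy as np


def fit_linear_q(G, b, y):
    """Fit F(g, b) = zeta^T g + b * (gamma^T g), as in Eq. (11), by least squares."""
    design = np.hstack([G, b[:, None] * G])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    p = G.shape[1]
    return coef[:p], coef[p:]  # (zeta, gamma)


def decision_rule(gamma, G):
    """Indicator rule of Eq. (14): intervene (1) when gamma^T g > 0."""
    return (G @ gamma > 0).astype(int)


def stage_value(zeta, gamma, G):
    """Maximum of the fitted Q-function over b in {0, 1}."""
    return G @ zeta + np.maximum(G @ gamma, 0.0)


def two_stage_q_learning(G1, b1, x1, G2, b2, x2):
    """Fit stage 2 first, then stage 1 on the pseudo-outcome x1 + max_b F_2."""
    zeta2, gamma2 = fit_linear_q(G2, b2, x2)
    pseudo = x1 + stage_value(zeta2, gamma2, G2)
    _, gamma1 = fit_linear_q(G1, b1, pseudo)
    return gamma1, gamma2


def simulate(n, rng):
    """Toy two-stage data with randomized interventions (all values assumed)."""
    t1 = rng.normal(size=n)
    b1 = rng.integers(0, 2, size=n)
    x1 = t1 + b1 * (0.5 - t1) + rng.normal(scale=0.1, size=n)
    t2 = 0.5 * t1 + rng.normal(scale=0.5, size=n)
    b2 = rng.integers(0, 2, size=n)
    x2 = t2 + b2 * t2 + rng.normal(scale=0.1, size=n)
    G1 = np.column_stack([np.ones(n), t1])
    G2 = np.column_stack([np.ones(n), t1, b1, t2])  # history includes stage 1
    return G1, b1, x1, G2, b2, x2
```

In this toy setup the best stage-2 rule is to intervene when the stage-2 feature is positive, and the best stage-1 rule is to intervene when the stage-1 feature is below 0.5, so the fitted rules from `decision_rule` can be compared against those known answers.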

4 Performance analysis of theoretical models of precision instructional interventions

To verify the validity of the PTI model, the study first uses simulation to validate the learning effectiveness prediction and teaching intervention methods of the model content, followed by its actual validation in high school mathematics and university linear algebra teaching practice.

4.1 Validation of learning effectiveness prediction methods and instructional interventions

To validate the effectiveness of the learning effectiveness prediction method, the study conducts experimental verification and analysis using a simulated dataset. Based on the behavioral patterns of students with different academic levels (high, medium, and low achievement), three types of student learning trajectories are manually designed. Specifically, four behavioral characteristics are defined: daily study time, task completion rate, post-class practice accuracy, and resource use frequency. Each feature is sampled according to different distribution patterns (e.g., normal distribution, skewed distribution) corresponding to the learning behaviors of high, average, and low performing students, respectively. During data processing, the behavioral features of each group of students are encoded into matrix-structured time series data, which are further transformed into input feature maps through image processing to serve as input samples for the QTR model.
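A minimal sketch of how such a simulated dataset could be generated is given below. The feature names follow the four characteristics listed above, but the Beta distributions, their shape parameters, and the scaling of every feature to the interval [0, 1] are assumptions chosen only to produce skewed, group-specific behavior; the study does not report its sampling parameters.

```python
import numpy as np

FEATURES = (
    "daily_study_time",
    "task_completion_rate",
    "practice_accuracy",
    "resource_use_frequency",
)

# Hypothetical Beta(a, b) shape parameters per achievement group.
GROUP_BETA_PARAMS = {
    "low": (2.0, 5.0),     # skewed toward low values
    "medium": (5.0, 3.0),
    "high": (8.0, 2.0),    # skewed toward high values
}


def simulate_group(group, n_students, n_days, rng):
    """Return an array of shape (n_students, n_days, 4) with values in [0, 1]."""
    a, b = GROUP_BETA_PARAMS[group]
    return rng.beta(a, b, size=(n_students, n_days, len(FEATURES)))
```

Each student's slice `data[i]` is a days-by-features matrix, which is the matrix-structured time series form that can then be rendered as an input feature map.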

The hardware configuration includes an Intel Core i7-12700 processor, 16 GB of RAM, and an NVIDIA GeForce RTX 3060 graphics card running Windows 11. The programming environment is based on Python 3.9, with major libraries including NumPy (v1.23.5), Pandas (v1.5.3), and Matplotlib (v3.7.1) for data processing and visualization, Statsmodels (v0.13.5) for regression modeling, and Scikit-learn (v1.2.2) for model evaluation and auxiliary processing. All experiments are conducted in a Jupyter Notebook environment.

The study first compares the three patterns of behavior corresponding to different levels of academic achievement, using GPA as the metric of evaluation, specifically referring to the 0.05 quantile for low achievement, the 0.5 quantile for moderate achievement, and the 0.95 quantile for high achievement. The results are shown in Figure 5.

Figure 5

Comparison of student behavior patterns with different learning outcomes. (a) Behavioral patterns of students with poor academic performance. (b) Behavioral patterns of students with average academic performance. (c) Behavioral patterns of students with excellent academic performance.

In Figure 5, different colors indicate the frequency of positive behaviors such as eating on time, studying in the library, and the number of times students enter the library. In Figure 5(a), the frequency of positive behaviors of students with poor academic performance stays between 0.0 and 0.5. In Figure 5(b), the frequency for most students with moderate academic performance stays between 0.6 and 0.8. In Figure 5(c), the frequency for most students with good academic performance is higher than 0.8. Taken together, students with good academic performance show a higher frequency of positive behaviors, a result that is important for personalized teaching. Therefore, Figure 5 is used as the basis for an analysis with the QTR model proposed by the study. The results are shown in Figure 6.

Figure 6

Comparison of parameter estimation between TR and QTR. (a) QTR model (0.95) estimation results. (b) QTR model (0.50) estimation results. (c) QTR model (0.05) estimation results. (d) Estimated results of the TR model.

Compared with the TR parameter estimates in Figure 6(d), the QTR estimates at the different quantile levels in Figure 6(a) to Figure 6(c) are superior. The QTR model is better able to capture the implicit relationship between behavioral patterns and different GPAs, and its estimates differ according to students’ GPA levels. This side-by-side comparison supports the validity of the study’s proposed method of predicting students’ learning effectiveness using the QTR model. To further validate the results, the study selected 11 students and predicted their grades using TR and QTR, with root mean squared error (RMSE) as the assessment index. Table 1 presents the findings.

Table 1

Comparison of results in predicting academic performance of 11 students using different methods

Real grade	QTR (0.95) prediction	QTR (0.5) prediction	QTR (0.05) prediction	TR prediction
85.33	94.70	58.72	67.47	68.45
88.94	97.42	55.08	65.92	68.45
81.46	82.18	59.01	85.09	74.90
81.02	83.57	47.86	68.22	61.99
82.40	97.98	57.22	69.94	71.02
80.60	90.82	60.08	72.07	72.53
79.05	89.51	42.64	47.85	55.00
79.55	89.73	52.13	64.19	65.32
76.58	80.58	53.46	67.47	66.27
69.73	82.11	42.02	60.97	58.84
66.93	84.10	49.86	54.56	56.09
RMSE	25.00	20.56	20.68	35.81
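The RMSE row in Table 1 relies on the root mean squared error named as the assessment index. As a reminder of the standard definition, a minimal sketch follows; the function name and the three sample grades are hypothetical and are not taken from the study’s data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical grades, for illustration only.
example_rmse = rmse([70.0, 80.0, 90.0], [73.0, 76.0, 90.0])
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.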

In Table 1, TR can only predict the approximate grades of the 11 students and cannot make predictions based on the students’ own learning conditions. QTR, on the other hand, not only predicts the students’ own LS and characteristics but also has superior predictive validity. Taken together, the RMSE values of the QTR predictions are 25.00, 20.56, and 20.68, all lower than TR’s 35.81, which supports the validity of predicting learning effectiveness with QTR. On this basis, the study verifies the superiority of the proposed Q-learning-based TID method, with the number of teaching stages set to 2. The two-stage separate method (A) and the single-stage method (B) are compared with the research method (C) in solving for the optimal decision function, with the number of students set to 200, 400, and 800. The comparison metrics are the estimated error rate (ER) and the value ratio (VR) between the set optimal decision function and the derived estimated optimal decision function, computed per stage (for example, VR1 and VR2 for the two stages). Figure 7 displays the simulation comparison results of the three approaches.

Figure 7

Simulation comparison results of different methods. (a) Comparison results when the number of students is 200. (b) Comparison results when the number of students is 400. (c) Comparison results when the number of students is 800.

In Figure 7(a), when the number of students is 200, the ER1 value of Method C is 0.023 ± 0.015, which is lower than Method A’s 0.031 ± 0.020, while the simulation of Method B fails and yields no result. Moreover, the VR1 value of Method C is 1.000 ± 0.023, which is higher than 0.844 ± 0.013 and 0.720 ± 0.018 for Methods A and B, respectively. In Figure 7(b), when the number of students is 400, the ER1 value of Method C is 0.018 ± 0.009, which is lower than 0.026 ± 0.013 for Method A, and the simulation of Method B again yields no result. The VR1 value of Method C, 1.000 ± 0.016, is higher than those of the comparison methods, and the same pattern holds for 800 students in Figure 7(c). In stage 2, the VR2 values of Method C are all higher than those of the comparison methods, while ER2 remains the same. Taken together, the research method is preferable: its estimated optimal decision function has a lower ER, and its value is closer to that of the set optimal decision function.
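The exact formulas for ER and VR are not restated in this section. One common reading, taken here as an assumption, is that ER is the share of students for whom the estimated decision differs from the set optimal decision, and VR is the mean outcome under the estimated rule divided by the mean outcome under the set optimal rule. The sketch below illustrates that reading with a hypothetical outcome table; all names and numbers are illustrative.

```python
def error_rate(optimal_actions, estimated_actions):
    """Share of students whose estimated action differs from the optimal one (assumed ER)."""
    mismatches = sum(1 for o, e in zip(optimal_actions, estimated_actions) if o != e)
    return mismatches / len(optimal_actions)


def mean_value(outcomes, actions):
    """Average outcome when student i receives actions[i]; outcomes[i][a] is hypothetical."""
    return sum(outcomes[i][a] for i, a in enumerate(actions)) / len(actions)


def value_ratio(outcomes, optimal_actions, estimated_actions):
    """Value of the estimated rule relative to the optimal rule (assumed VR)."""
    return mean_value(outcomes, estimated_actions) / mean_value(outcomes, optimal_actions)


# Hypothetical expected outcomes for 4 students under actions 0 and 1.
outcomes = [[60.0, 80.0], [75.0, 70.0], [50.0, 90.0], [65.0, 85.0]]
optimal = [1, 0, 1, 1]      # best action per student in this toy table
estimated = [1, 0, 0, 1]    # one student is assigned the wrong action

er = error_rate(optimal, estimated)
vr = value_ratio(outcomes, optimal, estimated)
```

Under this reading, an ER near 0 and a VR near 1, as reported for Method C, indicate that the estimated rule almost reproduces the set optimal rule.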

4.2 Empirical analysis of the precision instructional interventions model

Following the validation of the two modules in the theoretical model, the study verifies the validity of the PTI model. An intelligent teaching and tutoring network platform is chosen as an auxiliary learning tool, and the study involves a class of 60 first-grade students from a high school in the capital city of an eastern Chinese province. The teacher teaches the lessons, and the students complete the exercises on the platform after class. The study focuses on the teaching practice of high school mathematics, and the practice lasts for the entire first year of high school. The platform’s learning record data comprise 13 practice records, for a total of 676 records, while the classroom data comprise the grades and the outcomes of three tests (the entrance exam, the first semester’s final exam, and the second semester’s final exam). The dynamic changes in students’ status and characteristics during the 13 exercises are shown in Figure 8.

In Figure 8, longer boxes indicate more dispersed data and shorter boxes indicate more concentrated data. In Figure 8(a), the total number of questions and the total correct rate change little over the 13 exercises, with the former remaining around 0.4 and the latter around 0.7, whereas the average duration fluctuates, remaining roughly between 0.2 and 0.6. In Figure 8(b), there are large variations in the correct rates of the three types of questions: locator questions, easy-to-learn but error-prone questions, and study questions. Taken together, different students differ in their actual mastery of knowledge points and therefore in their actual need for practice on study questions, so the study takes the total number of study questions as the intervention. Figure 9 displays the students’ math test results on the three exams taken before and after the intervention.

Figure 8

Dynamic changes in student status and characteristics during 13 exercises. (a) The results of total number of questions, total accuracy, and average duration. (b) The results of positioning accuracy, easy to learn and error prone question accuracy, and learning question accuracy.

Figure 9

Mathematics scores of students in three exams before and after intervention. (a) Comparison between entrance exam and first semester final exam. (b) Comparison between entrance exam and second semester final exam. (c) Comparison of grade ranking in three math exams.

In Figure 9, numbers 1 to 3 correspond to the entrance exam, the first-semester final exam, and the second-semester final exam, respectively. In Figure 9(a), the students’ overall exam scores are nearly normally distributed, and there is only a slight difference between the entrance exam and the first-semester final exam. Furthermore, there are more scores between 100 and 120, and the overall scores remain between 45 and 99. Figure 9(b) shows that by the second-semester final exam, students’ overall performance improved dramatically, with scores ranging between 60 and 129, and the pass rate increased from 77 to 81%. In Figure 9(c), the grade rank of the second-semester final exam score improved greatly, and the grade rank across the three exams increased sequentially. After one academic year of precise exercise intervention using the research technique, the overall learning effect of the students in the class is greatly improved, demonstrating the usefulness of the research method.

To further analyze the specific effects of the PTI model on students’ LSs and characteristics, the study set up four different exercise strategies for study question practice: (g) overload in both semesters; (h) no overload in the first semester but overload in the second semester; (i) overload in the first semester but no overload in the second semester; and (j) no overload in either semester. Some students are randomly selected from the class to compare their learning images over the two semesters. The comparison results of the learning images under strategies g and h are shown in Figure 10.

Figure 10

Comparison results of learning images using strategies g and h. (a) Under strategy g, students learn images in the first semester. (b) Under strategy h, students learn images in the first semester. (c) Student's second semester learning images under strategy g. (d) Student's second semester learning images under strategy h.

In Figure 10, one student is randomly selected under each of strategies g and h. The horizontal labels D–I indicate the total questions, total correct rate, average time, correct rate of locus questions, correct rate of questions that are easy to learn, and correct rate of study questions, respectively, and the vertical numbers indicate the individual exercises. In Figure 10(a), the student scored 98 in math in the first semester and ranked only 1,004th in the grade. In Figure 10(b), the student had relatively poor correct rates on locus questions and study questions in the first semester and a grade rank of 403. In Figure 10(c), the student’s final math score in the second semester after the implementation of strategy g rose to 102, with a grade rank of 842, indicating that overloaded practice of study questions significantly improved this student’s math performance. In Figure 10(d), after overloaded practice, the student’s correct rates on locus questions and study questions improved significantly, and the grade ranking rose to 78. The comparison results of the learning images under strategies i and j are shown in Figure 11.

Figure 11

Comparison results of learning images for strategy i and j. (a) Image of the first student in strategy i for two semesters. (b) Image of the second student in strategy i for two semesters. (c) Two semester images of students under strategy j.

In Figure 11, two students are randomly selected under strategy i and one student under strategy j. In Figure 11(a), the first student maintains a high correct rate on study questions, locus questions, and other question types, and has overloaded practice in the first semester. However, without overloaded practice in the second semester, the math scores are significantly lower, and the correct rate on study questions drops from above 0.8 to 0–0.2. In Figure 11(b), the second student has relatively fair correct rates for each question type in the first semester. All correct rates increase in the second semester, at which point the average practice duration increases significantly, suggesting that overloaded practice needs to be considered together with the average practice duration. This result is consistent with the optimal intervention decision function given in the theoretical model of the study. In Figure 11(c), the student has a very high correct rate in both semesters of homework practice and a shorter average study time. Therefore, additional practice on the study questions used to check for gaps is not necessary. This student scores 129 on the math final exam in both semesters, yet improves in grade ranking by 50 places.

In conclusion, appropriate practice for these five randomly selected students effectively increases accuracy and improves the quality of practice. Accurate practice interventions for different students can therefore improve their academic performance, supporting the validity of the PTI model proposed in the study. To further verify the validity and generalization of the theoretical model, the study applies it to a linear algebra course at a teacher training university. A total of 280 students take part in the course, and the main data are obtained from students’ campus card records and the smart classroom’s linear algebra classes. The campus card data span 75 days, with a total of 59,445 records collected, and the smart classroom experimental data contain early warning interventions and classroom grades. The study begins by analyzing the 12-week breakfast behaviors of the three categories of students and the 12-week library study rate of all students. Figure 12 displays the results.

Figure 12

The 12-week breakfast behavior of three types of students and the 12-week library learning rate of all students. (a) Frequency of breakfast behaviors among three types of students. (b) Entropy of breakfast behavior among three types of students for 1–12 weeks. (c) The library learning rate of all students for 12 weeks.

In Figure 12(a) and (b), top performers overall eat breakfast more frequently and more regularly, and the differences between the three categories are pronounced. In Figure 12(c), there is little variation in library study over the 12 weeks, with only a small number of students going to the library regularly and sustaining the habit over a longer period. Therefore, all three can be used as features in the subsequent analysis. In the experiment, the students are divided equally into two groups. The experimental group receives the early warning intervention, while the control group does not. The performance of the students in the two groups is shown in Figure 13.
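The entropy in Figure 12(b) is used as a measure of how regular a behavior is. The study does not spell out the computation in this section; a minimal sketch, assuming Shannon entropy over binned breakfast times taken from campus card records, is given below. The time-slot labels and both sample sequences are hypothetical.

```python
import math
from collections import Counter


def behavior_entropy(time_slots):
    """Shannon entropy (in bits) of the distribution of observed time slots."""
    if not time_slots:
        raise ValueError("at least one observation is required")
    counts = Counter(time_slots)
    n = len(time_slots)
    # Sum of p * log2(1 / p) over the observed slots.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical breakfast time slots over several days.
regular_student = ["07:30", "07:30", "07:30", "07:30", "07:30"]
irregular_student = ["07:00", "07:30", "08:00", "08:30"]

entropy_regular = behavior_entropy(regular_student)
entropy_irregular = behavior_entropy(irregular_student)
```

Lower entropy corresponds to more regular behavior: a student who always eats in the same slot has entropy 0, while one who spreads evenly over four slots has entropy 2 bits.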

Figure 13

Comparison of results between two groups of students. (a) Student grades in the experimental group. (b) Student grades in the control group.

In Figure 13(a), the average score of the experimental group after the early warning intervention is 80, with a more pronounced increase in the scores of the poorer students and a smaller change for the good students, who did not need the early warning intervention. Without the early warning intervention, the control group’s average score in Figure 13(b) is 71. Overall, the experimental group outperforms the control group by 9 points, demonstrating the efficacy of the early warning intervention. To further validate the superiority of the proposed theoretical model in precise intervention for instructional decision-making, the study compares the research model with an advanced synthetic minority oversampling technique combined with a random forest model (K), a decision tree model (L), and an extreme gradient boosting model based on students’ behavioral data (M). The comparison also introduces another class of students in addition to the analyzed class, and the two classes are denoted Class 1 and Class 2. The results of the precise intervention are shown in Table 2.

Table 2

Comparison of results of precision interventions in teaching with different methods

–	Accuracy	Precision	Recall	F1-Score
Class 1
K	0.674	0.673	0.673	0.677
L	0.785	0.784	0.776	0.712
M	0.841	0.812	0.796	0.741
Research model	0.915	0.901	0.894	0.854
Class 2
K	0.663	0.622	0.602	0.612
L	0.701	0.700	0.694	0.677
M	0.812	0.810	0.800	0.745
Research model	0.964	0.915	0.905	0.900

In Table 2, for Class 1, the accuracy, precision, recall, and F1 values of the research model for precise teaching intervention decisions are 91.5, 90.1, 89.4, and 85.4%, respectively, all higher than those of the comparison models. For Class 2, the four values of the research model are 96.4, 91.5, 90.5, and 90.0%, which are also higher than those of the comparison models. Taken together, the research model has a high degree of accuracy in TID and can effectively optimize students’ LS.
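The four metrics in Table 2 are conventionally derived from confusion-matrix counts. The sketch below shows those standard definitions for a binary decision (for example, "intervene" versus "do not intervene"); the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 students, for illustration only.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=35)
```

Since F1 is a harmonic mean, it always lies between precision and recall and is pulled toward the smaller of the two.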

4.3 Complexity analysis of the proposed method

The core of the proposed QTR model lies in solving the quantile regression problem. For each quantile, the time complexity of QTR is mainly determined by the dimensions of the matrix and the number of samples. Therefore, assuming the matrix dimensions are m × n and the number of samples is k, the time complexity of QTR is O(k × m × n). The time complexity of the RL model mainly depends on the state space S and the action space A, with a time complexity of O(S × A). Thus, the ultimate complexity of the method proposed in the study is the superposition of the two. In the worst-case scenario, the model’s time complexity is O(k × m × n × S × A). It can be concluded that in the case of high-dimensional data and large state-action spaces, the computational cost of the model can be very high.

The QTR model requires storage for the matrix data for each sample as well as the regression coefficient matrix. Consequently, its space complexity is primarily determined by the dimensions of the matrix and the number of samples. In the worst-case scenario, the space complexity of the QTR model is O(k × m × n). The space complexity of the RL model is O(S × A). The method proposed in the study needs to store both the matrix data of the QTR model and the Q-table of the RL model. Therefore, in the worst-case scenario, the model’s space complexity is O(k × m × n + S × A).
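To make the O(S × A) storage term concrete, the sketch below shows a generic tabular Q-learning update with an explicit Q-table. It is the textbook update rule, not the study’s implementation; the state and action counts, learning rate, and discount factor are illustrative assumptions.

```python
def make_q_table(n_states, n_actions):
    """Allocate a Q-table with one cell per (state, action) pair."""
    return [[0.0] * n_actions for _ in range(n_states)]


def q_update(q, state, action, reward, next_state,
             alpha=0.5, gamma=0.9, terminal=False):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # The max over next-state actions scans one row of the table.
    future = 0.0 if terminal else max(q[next_state])
    target = reward + gamma * future
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical toy problem: 3 states, 2 actions.
q_table = make_q_table(n_states=3, n_actions=2)
first_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
n_cells = sum(len(row) for row in q_table)
```

The table holds one value per state-action pair, so memory grows with the product of the two space sizes, and each update scans the actions of the next state once.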

From the above analysis, it can be concluded that the model has a large overhead in terms of both time and space complexity. In future work, dimensionality reduction techniques will be considered to reduce the matrix dimensions of the QTR model, and approximate dynamic programming will be used to handle the large state-action spaces of the RL model. At the same time, incremental learning methods will be employed to gradually update the model parameters, thus reducing memory usage and computational time.

4.4 Selection of evaluation indicators and sensitivity analysis

Finally, the reasons for selecting the metrics are discussed and a sensitivity analysis is performed. Accuracy, precision, recall, and F1 score are selected as the main evaluation metrics. Accuracy measures the overall classification ability of the model, that is, the share of all predictions that are correct. Precision is the share of predicted positives that are truly positive, and recall is the share of actual positives that the model identifies. The F1 score is the harmonic mean of precision and recall, providing a comprehensive measurement. In the sensitivity analysis, the weight of each index is adjusted to observe the change in model performance: the weight of one indicator is raised from 0.25 to 0.5, and the weights of the other indicators are reduced accordingly. The specific results are shown in Table 3.

Table 3

Model sensitivity analysis

Metric weight adjustment	Accuracy	Precision	Recall	F1-score	Model performance change
Original weights	0.915	0.901	0.894	0.854	None
Accuracy = 0.5	0.908	0.895	0.888	0.847	Slight decrease
Precision = 0.5	0.902	0.905	0.890	0.851	Slight variation
Recall = 0.5	0.898	0.889	0.902	0.845	Slight variation
F1-score = 0.5	0.903	0.897	0.891	0.894	Slight variation

In Table 3, when the weight of each index is adjusted from 0.25 to 0.5, the overall performance of the model (as measured by the F1 score) decreases slightly, but there is little change. This shows that the model is robust in increasing the weight of each index. The slight decline in accuracy and precision may be attributable to the model’s prediction tendency within specific categories. Conversely, the increase in recall indicates that the model can maintain a high recall rate while enhancing the recognition of a limited number of categories. This indicates that the proposed model has good stability and reliability, and can effectively capture the nuances of educational intervention outcomes.
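How the weights enter the model is not specified in this section. One plausible reading, stated here purely as an assumption, is that the four indicators start with equal weights of 0.25, that raising one weight to 0.5 spreads the remainder evenly over the other three, and that the weights form a composite score. The sketch below illustrates that reweighting using the original-weights row of Table 3; the function names and the composite score itself are hypothetical.

```python
METRICS = ("accuracy", "precision", "recall", "f1")


def reweight(target, weight):
    """Give `target` the stated weight and split the rest evenly (assumed scheme)."""
    if target not in METRICS:
        raise ValueError(f"unknown metric: {target}")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    other = (1.0 - weight) / (len(METRICS) - 1)
    return {m: (weight if m == target else other) for m in METRICS}


def composite_score(scores, weights):
    """Weighted sum of the four metric values (hypothetical composite)."""
    return sum(scores[m] * weights[m] for m in METRICS)


# Original-weights row of Table 3.
baseline = {"accuracy": 0.915, "precision": 0.901, "recall": 0.894, "f1": 0.854}

equal_weights = reweight("accuracy", 0.25)   # every weight equals 0.25
recall_heavy = reweight("recall", 0.5)       # the other three get 1/6 each

score_equal = composite_score(baseline, equal_weights)
score_recall = composite_score(baseline, recall_heavy)
```

Under this scheme the weights always sum to 1, so composite scores computed with different emphases remain on the same scale and can be compared directly.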

On this basis, the study selects three different educational datasets to test the performance of the model. The K-12 Mathematics Education Dataset consists of mathematics course performance data from a school district in California, including 1,200 students, with assessment standards including midterm exams, final exams, and regular homework scores. The K-12 Science Education Dataset is a set of data concerning the achievements of 1,500 students enrolled in a school district in Texas. The data comprise information regarding science courses, with an emphasis on laboratory reports, theoretical exams, and course participation. The Higher Education Computer Science Dataset contains course performance data from the computer science program at a Massachusetts university, including 800 students, with assessment standards that include programming assignments, project design, and final exams. The validation results of the three datasets are shown in Figure 14.

Figure 14

Verification results of the model in different datasets.

In Figure 14, the accuracy of the model proposed in the study is above 0.87 in all three datasets, and the F1 scores are all above 0.85. Among them, the results of the K-12 Science Education dataset are marginally lower than those of the Mathematics dataset, possibly due to the heterogeneity of science courses and the subjective nature of experimental operations, which increases the complexity of model predictions. In contrast, the model performs best on the Higher Education Computer Science dataset, which may be related to the self-learning ability of college students and the systematic design of the curriculum. Overall, the model proposed in the study shows good performance on different educational datasets, demonstrating the generalizability and robustness of the model.

5 Discussion

The study proposed the integration of RL and QTR to achieve precise instructional interventions. A substantial body of research demonstrated the efficacy of RL-based instructional intervention methods in enhancing student learning outcomes [5]. Wang et al. further validated the effectiveness of RL approaches in optimizing adaptive instructional strategies within online learning environments [26]. Building on this foundation, the present study incorporated quantile regression techniques and developed an instructional intervention decision framework based on the integration of RL and QTR [17]. Compared to traditional TR methods, the proposed approach exhibited superior stability and robustness in predicting learning outcomes across groups of students and in guiding personalized instructional interventions.

The results confirmed the effectiveness and superiority of the proposed method. However, it is important to acknowledge the limitations and challenges associated with this integration. Educational data typically have characteristics that change over time as student performance and behavioral patterns evolve. If the model does not adequately account for these changes, it may introduce bias. Therefore, future work will explore adaptive models that can dynamically adapt to changes in data distribution to ensure that intervention decisions remain relevant and effective in the fluctuating educational environment. In the educational context, the criteria for success and optimal outcomes may also change as educational goals and policies evolve. Consequently, the model may need to be updated to reflect these changes, which requires an adaptive reward system.

In the previous text, the study conducted a comprehensive review of related work in the areas of RL and educational intervention strategies. A deeper exploration of prior research revealed that while RL was applied in educational settings, there was limited exploration of its combination with QTR to deal with non-stationary data and evolving reward structures. The contribution of the proposed method to educational intervention strategies lies in its ability to provide personalized and dynamic instructional interventions, consistent with the growing body of research advocating personalized learning and adaptive educational practices. By addressing the limitations of current TID technologies and incorporating advanced computational methods, the study contributes to the development of more effective and intelligent educational decision-making tools. On this basis, future research can focus on extending the theoretical model to a wider range of educational contexts and scenarios. This includes testing the model’s effectiveness across different levels of education, types of courses, and diverse student populations, as well as exploring the model’s potential to handle more complex educational data and achieve finer-grained, personalized interventions.

6 Conclusion

The study focused on optimizing personalized instructional interventions in large-scale educational environments by proposing a TID method based on the integration of RL and QTR models, validated by simulation experiments and real classroom data. The main contributions of this research are reflected in three aspects. First, a PTI theoretical framework was constructed based on the integration of RL and QTR, which enhanced the personalization and dynamic adaptability of TID. Second, a learning effectiveness prediction method was designed based on feature imaging and quantile modeling, enabling fine-grained modeling of the relationships between different student groups’ learning behaviors and academic outcomes. Third, the method was comprehensively validated across multiple scenarios and datasets, demonstrating its superiority and generalizability in improving the accuracy of learning outcome predictions and the effectiveness of instructional interventions.

Based on the results, the following practical implications are suggested: In large-scale educational applications, dynamic student learning features should be used with the RL-QTR model to achieve precise instructional interventions. For student groups with significant heterogeneity in learning behavior, it is recommended to use quantile modeling approaches to develop differentiated learning paths. In designing intelligent learning platforms, emphasis should be placed on strengthening data collection and feature processing modules to support continuous optimization of intelligent decision models.

Funding information: Author states no funding involved.
Author contributions: Jifeng Gong: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, and writing – review and editing. Author has accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: Author states no conflict of interest.
Data availability statement: All data generated or analyzed during this study are included in this published article.

References

[1] Hebbi C, Mamatha H. Comprehensive dataset building and recognition of isolated handwritten Kannada characters using machine learning models. Artif Intell Appl. 2023;1(3):179–90. 10.47852/bonviewAIA3202624.Search in Google Scholar

[2] Fan W, Li Z, Zhang J. Construction and practice of digital transformation of education under the background of big data. Adv Ind Eng Manag. 2023;12(1):28–33.Search in Google Scholar

[3] Yu J, Couldry N. Education as a domain of natural data extraction: Analysing corporate discourse about educational tracking. Inf Commun Soc. 2022;25(1):127–44. 10.1080/1369118X.2020.1764604.Search in Google Scholar

[4] Qin M. Application of efficient recognition algorithm based on deep neural network in English teaching scene. Connect Sci. 2022;34(1):1913–28. 10.1080/09540091.2022.123456.Search in Google Scholar

[5] Shin J, Chen F, Lu C, Bulut O. Analyzing students’ performance in computerized formative assessments to optimize teachers’ test administration decisions using deep learning frameworks. J Comput High Educ. 2022;9(1):71–91. 10.1007/s40692-021-00196-7.Search in Google Scholar

[6] Abualadas HM, Xu L. Achievement of learning outcomes in non-traditional versus traditional anatomy teaching in medical schools: a mixed method systematic review. Clin Anat. 2023;36(1):50–76. 10.1002/ca.23942.Search in Google Scholar PubMed PubMed Central

[7] Kittur J, Bekki J, Brunhaver S. Development of a student engagement score for online undergraduate engineering courses using learning management system interaction data. Comput Appl Eng Educ. 2022;30(3):661–77. 10.1002/cae.22479.Search in Google Scholar

[8] Trakunphutthirak R, Lee VCS. Application of educational data mining approach for student academic performance prediction using progressive temporal data. J Educ Comput Res. 2022;60(3):742–76. 10.1177/07356331211048777.Search in Google Scholar

[9] Teng MF, Qin C, Wang C. Validation of metacognitive academic writing strategies and the predictive effects on academic writing performance in a foreign language context. Metacogn Learn. 2022;17(1):167–90. 10.1007/s11409-021-09278-4.Search in Google Scholar PubMed PubMed Central

[10] AdrianChin YK, JosephNg PS, Eaw HC, Loh YF, Shibghatullah AS. JomDataMining: Academic performance and learning behaviour dubious relationship. Int J Bus Inf Syst. 2022;41(4):548–68. 10.1504/IJBIS.2022.127555.Search in Google Scholar

[11] Lee M, Kim H, Wright E. The influx of International Baccalaureate programmes into local education systems in Hong Kong, Singapore, and South Korea. Educ Rev. 2022;74(1):131–50. 10.1080/00131911.2021.1891023.Search in Google Scholar

[12] Usher M, Hershkovitz A, Forkosh-Baruch A. From data to actions: instructors’ decision making based on learners’ data in online emergency remote teaching. Br J Educ Technol. 2021;52(4):1338–56. 10.1111/bjet.13108.Search in Google Scholar

[13] Carter M, Webster A, Stephenson J, Waddy N, Stevens R, Clements M, et al. Decision-making regarding adjustments for students with special educational needs in mainstream classrooms. Res Pap Educ. 2022;37(5):729–55. 10.1080/02671522.2020.1864768.Search in Google Scholar

[14] Yulianti K, Denessen E, Droop M, Veerman GJ. School efforts to promote parental involvement: The contributions of school leaders and teachers. Educ Stud. 2022;48(1):98–113. 10.1080/03055698.2020.1740978.Search in Google Scholar

[15] Gesel SA, LeJeune LM, Chow JC, Sinclair AC, Lemons CJ. A meta-analysis of the impact of professional development on teachers’ knowledge, skill, and self-efficacy in data-based decision-making. J Learn Disabil. 2021;54(4):269–83. 10.1177/0022219420970196.Search in Google Scholar PubMed

[16] Pesce C, Lakes KD, Stodden DF, Marchetti R. Fostering self‐control development with a designed intervention in physical education: A two‐year class‐randomized trial. Child Dev. 2021;92(3):937–58. 10.1111/cdev.13445.Search in Google Scholar PubMed

[17] Gion C, McIntosh K, Falcon S. Effects of a multifaceted classroom intervention on racial disproportionality. Sch Psychol Rev. 2022;51(1):67–83. 10.1080/2372966X.2020.1788906.Search in Google Scholar

[18] Jungjohann J, Gebhardt M. Dimensions of classroom-based assessments in inclusive education: A teachers’ questionnaire for instructional decision-making, educational assessments, identification of special educational needs, and progress monitoring. J Spec Educ. 2023;38(1):131–44. 10.52291/ijse.2023.38.12.Search in Google Scholar

[19] Kim DS, Kim T. Moment representations of fully degenerate Bernoulli and degenerate Euler polynomials. Russ J Math Phys. 2024;31(4):682–90. 10.1134/S1061920824040071.Search in Google Scholar

[20] Lysytska O, Mykytiuk S, Chastnyk O, Mykytiuk S. Foreign language teaching modes and adaptive methods in emergency education: Evaluation of first-hand experience. Multisci J. 2025;7(2):2025069. 10.31893/multiscience.2025069.Search in Google Scholar

[21] Haas C. Social origin and students’ trajectory patterns at German universities: a sequence-analytical approach. SozW Soz Welt. 2023;74(3):431–65. 10.5771/0038-6073-2023-3.Search in Google Scholar

[22] Womack B, Shi J. Socio-economic status, educational debt, and career choices of social work students in the Southeast United States. Soc Work Educ. 2023;42(1):127–44. 10.1080/02615479.2022.2053098.Search in Google Scholar

[23] Nunes L, Marcuzzi R, Chen X, Behley J, Stachniss C. SegContrast: 3D point cloud feature representation learning through self-supervised segment discrimination. IEEE Robot Autom Lett. 2022;7(2):2116–23. 10.1109/LRA.2022.3142440.Search in Google Scholar

[24] Liu H, Liu T, Zhang Z, Sangaiah AK, Yang B, Li Y. ARHPE: asymmetric relation-aware representation learning for head pose estimation in industrial human–computer interaction. IEEE Trans Ind Inf. 2022;18(10):7107–17. 10.1109/TII.2022.3143605.Search in Google Scholar

[25] Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang MH, et al. Learning enriched features for fast image restoration and enhancement. IEEE Trans Pattern Anal Mach Intell. 2023;45(2):1934–48. 10.1109/TPAMI.2022.3167175.Search in Google Scholar PubMed

[26] Carayannis EG, Campbell DF, Grigoroudis E. Helix trilogy: The triple, quadruple, and quintuple innovation helices from a theory, policy, and practice set of perspectives. J Knowl Econ. 2022;13(3):2272–301. 10.1007/s13132-021-00813-x.Search in Google Scholar

[27] Ren Z, Wan J, Deng P. Machine-learning-driven digital twin for lifecycle management of complex equipment. IEEE Trans Emerg Top Comput. 2022;10(1):9–22. 10.1109/TETC.2022.3143346.Search in Google Scholar

[28] Kovačević J, Mujkić A, Kapo A. Examining school leadership in a transitional context: A mixed-methods study of leadership practices and school cultures as mechanisms of educational change. Educ Manag Adm Leadersh. 2023;51(1):219–44. 10.1177/1741143220971286.Search in Google Scholar

[29] Cunningham JE, Chow JC, Meeker KA, Taylor A, Hemmeter ML, Kaiser AR. A conceptual model for a blended intervention approach to support early language and social-emotional development in toddler classrooms. Infant Young Child. 2023;36(1):53–73. 10.1097/IYC.0000000000000232.

[30] Zhang S, Xia Y, Xia Y, Wang J. Matrix-form neural networks for complex-variable basis pursuit problem with application to sparse signal reconstruction. IEEE Trans Cybern. 2022;52(7):7049–59. 10.1109/TCYB.2020.3042519.

[31] Yuan J, Weng Y. Support matrix regression for learning power flow in distribution grid with unobservability. IEEE Trans Power Syst. 2022;37(2):1151–61. 10.1109/TPWRS.2021.3107551.

[32] Wang Z, Ma D, Gong G, Xue E. New construction of complementary sequence sets and complete complementary codes. IEEE Trans Inf Theory. 2021;67(7):4902–28. 10.1109/TIT.2021.3079124.

[33] Peng L, Tan XY, Xiao PW, Rizk Z, Liu XH. Oracle inequality for sparse trace regression models with exponential β-mixing errors. Adv Math Sci Eng. 2023;39(10):2031–53. 10.1007/s10114-023-2153-3.

[34] Zhu Z, Lin K, Jain AK, Zhou J. Transfer learning in deep reinforcement learning: A survey. IEEE Trans Pattern Anal Mach Intell. 2023;45(11):13344–62. 10.1109/TPAMI.2023.3292075.

[35] Zhang Z, Zhang D, Qiu RC. Deep reinforcement learning for power system applications: An overview. CSEE J Power Energy Syst. 2020;6(1):213–25. 10.17775/CSEEJPES.2019.00920.

[36] Haydari A, Yılmaz Y. Deep reinforcement learning for intelligent transportation systems: A survey. IEEE Trans Intell Transp Syst. 2022;23(1):11–32. 10.1109/TITS.2020.3008612.

Received: 2025-02-08

Revised: 2025-05-07

Accepted: 2025-05-12

Published Online: 2025-09-08

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/nleng-2025-0155
