Article (Open Access)

Endpoint prediction of BOF steelmaking based on state-of-the-art machine learning and deep learning algorithms

  • Tian-yi Xie, Fei Zhang, Jun-guo Zhang, Yong-guang Xiang and Yi-xin Wang
Published/Copyright: June 5, 2024

Abstract

To enhance the efficiency and sustainability of the basic oxygen furnace (BOF) steelmaking process, technical preparations were made in this work for eliminating the Temperature, Sample, Oxygen (TSO) test. Utilizing data from 13,528 heats and state-of-the-art (SOTA) machine learning (ML) and deep learning (DL) algorithms, data-driven models with different types of inputs were developed, marking the first use of time series data (off-gas profiles and blowing-practice-related curves) for endpoint prediction in BOF steelmaking; the tabular features were also expanded to 45. The prediction targets are the molten steel’s concentrations of phosphorus (Endpoint [P], %) and carbon (Endpoint [C], %) and its temperature (Endpoint-Temp, °C). The optimal models for each target were implemented at a BOF steelmaking facility of Hesteel Group. Initially, SOTA ML models (XGBoost, LightGBM, CatBoost, TabNet) were employed to predict Endpoint [P]/[C]/Temp from tabular data. The best mean absolute errors (MAE) achieved were 2.276 × 10−3% (CatBoost), 6.916 × 10−3% (CatBoost), and 7.955°C (LightGBM), respectively, surpassing the performance of conventional models. The prediction MAEs of the conventional models with the same inputs for Endpoint [P]/[C]/Temp were 3.158 × 10−3%, 7.534 × 10−3%, and 9.150°C (back propagation neural network) and 2.710 × 10−3%, 7.316 × 10−3%, and 8.310°C (support vector regression). Subsequently, predictions were made using SOTA time series analysis models (1D ResCNN, TCN, OmniScaleCNN, eXplainable Convolutional neural network (XCM), Time-Series Transformer, LSTM-FCN, D-linear) with the original time series data, and SOTA image analysis models (Pre-activation ResNet, DenseNet, DLA, Dual path networks (DPN), GoogleNet, Vision Transformer) with resized time series data. Finally, the concat-model and paral-model architectures were designed to make predictions from both tabular and time series data. The concat-model with TCN and ResCNN as backbones exhibited the highest accuracy: its MAEs for predicting Endpoint [P]/[C]/Temp reached 2.153 × 10−3%, 6.413 × 10−3%, and 5.780°C, respectively, with field-test MAEs of 2.394 × 10−3%, 6.231 × 10−3%, and 7.679°C. Detailed results of the importance analysis for tabular and time series data are provided.

1 Introduction

Basic oxygen furnace (BOF) steelmaking stands as the predominant method globally, accounting for about 60% of steel production by 2000. In this high-temperature chemical process, which is driven by blowing oxygen into a blend of scrap, hot metal, and additives, it is paramount to precisely control the temperature and the concentrations of critical chemical elements (such as carbon and phosphorus) of the molten steel at the endpoint. The endpoint measurement in BOF steelmaking serves as a direct indicator of molten steel quality, which is vital for subsequent processes. Currently, the Temperature, Sample, Oxygen (TSO) test is widely used for endpoint measurement. In this test, the temperature and oxygen content of the molten steel are measured directly, and steel samples are taken by a sub-lance at the end of blowing for further laboratory chemical analysis. While this method is quite efficient for endpoint measurement, it exhibits several drawbacks:

  1. Delayed chemistry results: Crucial chemistry values require 3–5 min of laboratory testing, arriving too late to assess steel quality and guide the steelmaking process.

  2. Additional cost: The sub-lance’s sampler is disposable, and the sub-lance and its cooling system require continuous maintenance investment, resulting in high operating costs, pollution, and carbon emissions.

  3. Inadequate testing success rate: The success rate of TSO measurements is less than 90%.

The identified drawbacks present substantial challenges to timely endpoint control, full automation of the BOF steelmaking process, and the optimization of costs and sustainability. Accurately predicting endpoint values could obviate the corresponding sampling and measuring steps, thereby enhancing steelmaking efficiency, endpoint control, and sustainability. This study implemented the following measures for predicting the targets (Endpoint [P] (%), Endpoint [C] (%), and Endpoint-Temp (°C)): (1) Large-scale data collected from 13,528 heats were utilized, with the tabular features expanded to 45; additionally, time series data that reflect the entire steelmaking process were introduced as inputs of a data-driven endpoint prediction model for the first time. (2) State-of-the-art (SOTA) machine learning (ML) and deep learning (DL) models were used for prediction. The main contents of this study are as follows:

  1. Historical data from 5,562 heats were selected for the prediction of Endpoint [P] and Endpoint [C], while data from 7,060 heats were utilized for predicting Endpoint-Temp, all derived from the raw data of 13,528 heats. Comprehensive feature engineering and importance analysis were conducted for both time series and tabular data.

  2. Predictions were conducted using SOTA ML models (XGBoost, LightGBM, CatBoost, TabNet) along with 45 tabular features comprising information on hot metal, scrap, additives, preset values, and blowing practice. Additionally, a back propagation neural network (BP-NN) and support vector regression (SVR) were implemented, with their performance serving as a control.

  3. Off-gas and blowing-practice-related time series data were introduced as inputs for the data-driven endpoint prediction model for the first time. SOTA DL models for time series analysis (1D ResCNN, TCN, OmniScaleCNN, eXplainable Convolutional neural network (XCM), Time-Series Transformer (TST), long short-term memory-fully convolutional network (LSTM-FCN), D-linear) and image analysis (Pre-activation ResNet, DenseNet, DLA, Dual path networks (DPN), GoogleNet, Vision Transformer) were implemented to make predictions using the original and resized time series data, respectively.

  4. Two mixed-input models, named concat-model and paral-model, were proposed to enhance prediction accuracy. These models utilize both tabular data and normalized time series as inputs. Their performance was evaluated using various DL backbones (1D ResCNN, TCN, OmniScaleCNN, XCM, TST, LSTM-FCN, D-linear), and comparisons were made to determine the most effective configuration.

  5. An online prediction human-machine interface (HMI) was developed, incorporating models with the best performance for each target. This HMI was deployed in practical field production, and field test results from 300 heats were recorded. The interface design and detailed field test results are elaborated upon.

2 Related literature review

For endpoint prediction in BOF steelmaking, the following approaches have been adopted in previous studies:

  1. Utilize theoretical models based on material balance, heat balance, thermodynamics, and kinetics. This approach idealizes the steelmaking process, resulting in poor predictive performance. For example, Wang et al. [1] proposed a model based on oxygen balance, whose hit rate was only 62% with an average error within 3%.

  2. Utilize tabular data and an original or modified BP-NN for prediction. The models’ inputs consist of static information such as molten iron, scrap, and additives. For example, with tabular inputs, He and Zhang [2] proposed a BP-NN modified with principal component analysis (PCA-BP), and Wang et al. [3] proposed a BP-NN modified with a genetic algorithm (GA-BP). Compared to theoretical models, these models exhibit a significant improvement, achieving hit rates of 84–90%. There are more similar studies [4,5,6,7,8,9,10]. However, these models only consider initial conditions and static process data, leading to lower robustness.

  3. Utilize tabular data and an original or modified support vector machine (SVM) for prediction. The advantages and disadvantages of this approach are similar to those of BP-NNs. Gao et al. [11] proposed a k-nearest-neighbor-based weighted twin SVR, and there are more similar studies [8,12,13]. The hit rates were 86–93%.

  4. Utilize ML models with flame-related data for prediction. Shao et al. [14] proposed a model based on SVM and flame radiation, and Zhou et al. [15] proposed a model based on SVM and flame spectrum. This method is used in various studies [15,16,17]; its accuracy is similar to that of models using tabular data, but it requires additional equipment, and training directly on flame images would require a much larger dataset.

Through this literature review and survey [17,18,19,20,21,22], it was found that the accuracy of the ML models used in previous studies has been surpassed by more advanced models. Furthermore, the data utilized in those studies are often static or represent a specific moment, thus failing to cover the entire steelmaking process. In this study, time series data that cover the entire steelmaking process were introduced, and SOTA ML and DL algorithms were employed for prediction.

3 Data description and training process

The dataset was divided into three subsets: 70% for the training set, 20% for the validation set, and the remaining 10% for the test set (a minimal split sketch is given below). The data utilized for prediction comprise three categories: time series data, tabular data, and mixed data. Since the data from the BOF steelmaking plant use three significant figures, all data in this work are reported to three significant figures. The three categories are described in the following subsections.
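A minimal sketch of this split, assuming scikit-learn (the array contents here are random placeholders, not the plant data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(13528, 45))   # 45 tabular features per heat (placeholder values)
y = rng.normal(size=13528)         # one endpoint target (placeholder values)

# Hold out 30%, then split it 2:1 into validation (20%) and test (10%).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=1/3, random_state=42)
```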

3.1 Time series data

A time series consists of a sequence of random variables arranged chronologically. In a two-dimensional context, it typically represents curves of the target process, sampled at a specified rate with uniform time intervals. The time series data for TSO prediction consist of seven curves: off-gas total flow (Gas-Flow), lance height (Lc-Height), cumulative oxygen consumption (O2-Blow), and the gas percentages of carbon monoxide (GP-CO), carbon dioxide (GP-CO2), oxygen (GP-O2), and hydrogen (GP-H2). Off-gas refers to the reaction gas produced during BOF steelmaking, and the off-gas profile is the collective term for the percentage curves of the four gases, which sensitively reflect the steelmaking reactions. Following the sampling sequence of the BOF system, the time interval between time steps is 1 s. As the steelmaking time varies for each heat, all time series were zero-padded to 1,024 time steps (see the padding sketch below). An example of an original time series is illustrated in Figure 1.
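A minimal sketch of the padding step, assuming NumPy (function and variable names are illustrative):

```python
import numpy as np

def pad_series(series: np.ndarray, target_len: int = 1024) -> np.ndarray:
    """Zero-pad a (channels, time) array to target_len time steps."""
    channels, length = series.shape
    padded = np.zeros((channels, target_len), dtype=series.dtype)
    padded[:, :min(length, target_len)] = series[:, :target_len]
    return padded

heat = np.random.rand(7, 900)   # e.g. a 900 s heat sampled at 1 Hz, 7 curves
print(pad_series(heat).shape)   # (7, 1024)
```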

Figure 1: An example of an un-padded time series. (a) Off-gas profile; (b) curves of oxygen lance height, cumulative oxygen consumption, and off-gas flow rate.

3.2 Tabular data

The columns of the tabular data define the features (tabular features), while the rows contain each heat’s values for those features. Table 1 displays the features of the tabular data and their distributions. It consists of the endpoint targets and features describing hot metal, scrap, additives, blowing practice, and the Temperature, Sample, Carbon (TSC) test. The pre-set oxygen-blowing values came from the secondary calculation.

Table 1

Descriptive statistics of the tabular data

Features Unit Maximum Minimum Mean Std
Endpoint target Temp °C 1685.001 1610.001 1644.705 14.920
Endpoint target [C] % 0.056 0.031 0.046 0.006
Hot metal [C] % 4.999 3.494 4.409 0.133
Hot metal [Si] % 0.651 0.070 0.309 0.068
Hot metal [Mn] % 0.301 0.060 0.123 0.022
Hot metal [P] % 0.156 0.058 0.108 0.014
Hot metal [S] % 0.099 0.001 0.011 0.014
Hot metal [V] % 0.331 0.001 0.038 0.013
Hot metal [Ti] % 0.128 0.006 0.065 0.017
Hot metal weight kg 267900.001 198900.001 248804.546 9366.625
Hot metal Temp °C 1450.000 1251.000 1364.275 35.460
Heavy scrap t 44.797 0.000 3.773 5.892
Small scrap t 25.704 0.000 3.260 4.555
Hasco (initial processing) t 49.961 0.000 1.849 2.725
Hasco (secondary processing) t 15.589 0.000 1.754 2.562
Recycle slag steel t 6.391 0.000 0.017 0.229
Iron block t 32.790 0.000 2.899 6.773
Self-product scrap t 36.057 0.000 1.721 3.989
Recycle scrap t 20.081 0.000 2.973 2.536
Other scrap t 26.647 0.000 1.253 3.316
Sinter (before TSC) t 10.018 0.000 0.102 0.830
Coke (before TSC) t 1.877 0.000 0.004 0.063
Newman ore (before TSC) t 17.695 0.000 4.027 3.648
Iron oxide balls (before TSC) t 17.315 0.000 0.244 1.312
MgO-C (before TSC) t 0.524 0.000 0.005 0.032
Burn-Dolomite (before TSC) t 6.874 0.066 3.400 0.774
Lime (before TSC) t 13.851 1.998 7.037 1.457
Burn-MgO-C (before TSC) t 1.249 0.000 0.006 0.050
Limestone (before TSC) t 5.649 0.000 0.339 0.582
Dolomite (before TSC) t 8.323 0.000 0.748 0.827
FeSi75 (before TSC) t 1.468 0.000 0.006 0.060
Iron oxide (before TSC) t 10.209 0.000 0.655 1.374
Sinter (after TSC) t 3.841 0.000 0.026 0.240
Newman ore (after TSC) t 10.995 0.000 0.633 0.821
Iron oxide balls (after TSC) t 4.310 0.000 0.072 0.398
Burn-Dolomite (after TSC) t 5.091 0.000 0.015 0.155
Lime (after TSC) t 11.935 0.000 0.018 0.324
Limestone (after TSC) t 3.544 0.000 0.008 0.071
Dolomite (after TSC) t 1.735 0.000 0.037 0.136
Iron oxide (after TSC) t 3.613 0.000 0.070 0.276
Pre-set oxygen blown m3 12605.501 8853.501 10671.224 872.056
oxygen blown m3 12452.852 8856.386 10656.900 649.776
oxygen blown before TSC m3 11172.347 5767.721 9108.811 549.165
TSC-Temp °C 1649.901 1513.401 1588.004 22.501
TSC-thermal arrest [C] % 1.073 0.101 0.469 0.138

3.3 Mixed data

Mixed data contained both tabular and time series data.

3.4 Targets and error metrics

The targets are the molten steel’s contents of phosphorus (Endpoint [P] (%)) and carbon (Endpoint [C] (%)) and its temperature (Endpoint-Temp (°C)) at the endpoint. The target values in the dataset were measured by the TSO test. The distributions and relationships of the targets are shown in Figure 2.

Figure 2: The statistical distributions and relationships of the targets.

Combinatorial metrics were used to evaluate the models. The basic metrics adopted were $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$ and $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|Y_i - \hat{Y}_i|$, where MSE is the mean square error; MAE is the mean absolute error; $Y_i$ is the true value of the target with index $i$ in dataset $X$; $\hat{Y}_i$ is the predicted value of $Y_i$; and $n$ is the number of samples contained in $X$. Traditional methods of assessing hit rates against uniform standards are not applicable because the endpoints of different steel grades differ significantly; in this study, the hit range is therefore defined relative to the target value. In on-site production, hit rates within ±10, ±15, and ±20% of the target value are commonly applied for chemical tests; for temperature, the ranges are ±10, ±15, and ±20°C. So, the following metrics were also used to assist the evaluation (a sketch of their computation follows this list):

  1. The proportion of predicted values falling within ±10, ±15, and ±20% of the true values of Endpoint [P] and Endpoint [C].

  2. The proportion of predicted Endpoint-Temp values falling within ±10, ±15, and ±20°C of the true values.
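The sketch below shows one way to compute these combinatorial metrics, assuming NumPy (the function is illustrative, not the authors' code):

```python
import numpy as np

def evaluate(y_true, y_pred, absolute_band=False):
    """Return MAE, MSE, and hit rates within +/-10/15/20: percent of the
    true value for Endpoint [P]/[C], or absolute degrees C for Endpoint-Temp."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = np.abs(y_pred - y_true)
    mae, mse = err.mean(), (err ** 2).mean()
    hits = {}
    for band in (10, 15, 20):
        tol = band if absolute_band else np.abs(y_true) * band / 100.0
        hits[band] = float((err <= tol).mean())
    return mae, mse, hits

# e.g. evaluate(temp_true, temp_pred, absolute_band=True) for Endpoint-Temp
```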

3.5 Hyper-parameters tuning

The models were automatically tuned with Bayesian optimization for 2,000 trials; manual tuning was then used to further adjust the hyper-parameters. The hyper-parameter boundaries and the optimum hyper-parameters for the ML models and the best models are detailed in the Appendix. A hedged sketch of the search loop is given below.
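The paper does not name the optimization library; the sketch below assumes Optuna (whose default TPE sampler is Bayesian) and LightGBM, with an illustrative search space and the data split from Section 3:

```python
import optuna
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error

def objective(trial):
    # Illustrative search space; the actual boundaries are listed in the Appendix.
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "max_depth": trial.suggest_int("max_depth", 2, 20),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 2000),
    }
    model = LGBMRegressor(**params).fit(X_train, y_train)
    return mean_absolute_error(y_val, model.predict(X_val))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=2000)   # 2,000 trials as stated above
print(study.best_params)
```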

4 Methodology

4.1 ML with tabular data

The following ML models are employed to predict endpoint values from tabular data; this is essentially a data mining process.

XGBoost [23]. XGBoost is an ML algorithm rooted in the gradient boosting framework, employing CART decision trees as its base learner. It enhances performance by integrating a second-order Taylor expansion and regularization into the loss function, optimizing it with the Newton-Raphson method rather than gradient descent. Furthermore, it implements techniques such as shrinkage, column subsampling, missing-value handling, and feature-block arrangement to improve predictive accuracy. Its structure is shown in Figure 3. Its regularization helps simplify models and prevent overfitting; however, it has many algorithm parameters and is not suitable for ultra-high-dimensional feature data.

Figure 3: Structure of XGBoost, where f_k is the loss function of the k-th tree.

LightGBM [24]. LightGBM is an ML algorithm rooted in the gradient boosting framework. It introduces the histogram algorithm, which discretizes continuous feature values into k integer bins and constructs a histogram of width k; the histogram accumulates the necessary statistics in one traversal of the dataset, after which an optimal split point is determined from the discrete histogram values. LightGBM employs a leaf-wise splitting strategy, prioritizing leaves with the highest split gain and thereby achieving higher precision than the level-wise approach. It has the advantages of low memory occupation and higher accuracy, but its deeper trees are easier to overfit. Its structure is shown in Figure 4.

Figure 4: Structure of LightGBM.

CatBoost [25]. CatBoost is an ML algorithm built upon the gradient boosting framework, utilizing oblivious trees as its base learner. It incorporates techniques such as null-value processing, ordered target-statistics encoding, and feature combinations to handle categorical features effectively. To address the prediction-shift issue, CatBoost employs an ordered boosting algorithm. Notably, it processes categorical features during training rather than as a pre-processing step. During training, CatBoost computes random permutations both for the target statistics (σ_cat) and for ordered boosting (σ_boost). It offers high accuracy and low data pre-processing requirements but needs more memory. Its structure is shown in Figure 5, and a hedged fitting sketch follows the figure.

Figure 5: Structure of CatBoost, where N is the number of samples.
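As a hedged illustration of fitting one of these models for a single endpoint target (CatBoost shown; the parameter values are placeholders, not the tuned ones from the Appendix):

```python
import numpy as np
from catboost import CatBoostRegressor

model = CatBoostRegressor(depth=8, learning_rate=0.03, iterations=2000,
                          loss_function="RMSE", verbose=False)
# Select the best iteration against the validation set.
model.fit(X_train, y_train, eval_set=(X_val, y_val), use_best_model=True)

test_mae = np.mean(np.abs(model.predict(X_test) - y_test))
```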

TabNet [26]. TabNet is a neural-network-based ML algorithm that employs a sequential attention mechanism to mimic decision-tree algorithms. Sequential attention consists of two crucial steps: the Attentive Transformer identifies important features, and the Feature Transformer converts feature values into a feature map. Both transformers use a feature block composed of a fully connected layer, a batch normalization layer, and a gated linear unit activation in sequence as their base learner. Sparse regularization is used in the loss function. TabNet is well suited to fine-tuning and transfer learning but is prone to overfitting. Its structure is shown in Figure 6.

Figure 6: Structure of TabNet: (a) TabNet encoder structure (Agg. is the aggregation process); (b) TabNet decoder structure; (c) feature transformer structure; and (d) attentive transformer structure.

Importance analysis. Beyond data mining, the feature importance of the tabular data is analyzed based on the average scaled importance values assigned to each feature by all of the models above, combined with the slope sign derived from linear regression (a sketch follows). The results of this analysis are presented in Section 5.
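A sketch of this averaging scheme, assuming the four fitted models expose feature_importances_ (as the scikit-learn-style wrappers and pytorch-tabnet do); the model variables are hypothetical:

```python
import numpy as np

def scaled_importance(model):
    """Min-max scale one model's feature importances to [0, 1]."""
    imp = np.asarray(model.feature_importances_, dtype=float)
    return (imp - imp.min()) / (imp.max() - imp.min() + 1e-12)

models = [xgb_model, lgbm_model, cat_model, tabnet_model]   # assumed fitted
mean_imp = np.mean([scaled_importance(m) for m in models], axis=0)

# Attach the sign of each feature's univariate linear-regression slope.
slopes = np.array([np.polyfit(X_train[:, j], y_train, 1)[0]
                   for j in range(X_train.shape[1])])
signed_imp = mean_imp * np.sign(slopes)
```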

4.2 Time series analysis models with time series data

After padding and channel-wise standardization, the time series are input into the following models to predict TSO values. The input time series has 7 channels and 1,024 time steps. The models below are SOTA models for time series classification and prediction tasks (collectively referred to as time series analysis); they efficiently extract features from time series data and make accurate predictions. Their outputs are flattened and projected to single values (a sketch of this head is given below). A brief overview of each model follows.
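A minimal PyTorch sketch of this shared head (flatten, dropout, and a linear projection to one value; the backbone output shape is illustrative):

```python
import torch
import torch.nn as nn

class RegressionHead(nn.Module):
    """Flatten backbone features, apply dropout, project to a single value."""
    def __init__(self, in_features: int, p_drop: float = 0.5):
        super().__init__()
        self.head = nn.Sequential(nn.Flatten(), nn.Dropout(p_drop),
                                  nn.Linear(in_features, 1))

    def forward(self, x):
        return self.head(x).squeeze(-1)

feats = torch.randn(8, 128, 64)               # dummy backbone output (B, C, T')
print(RegressionHead(128 * 64)(feats).shape)  # torch.Size([8])
```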

1D ResCNN (ResCNN) [27]. ResCNN is a convolutional neural network (CNN) with residual connections. To enhance robustness, the residual block is fixed to the first three convolutional layers, and different activation functions are adopted in different layers. To guard against overfitting, global average pooling is applied instead of a fully connected layer.

TCN [28]. TCN is a model based on one-dimensional (1D) CNNs. Its main characteristics are causal convolution, dilated convolution, and residual connections. Causal convolution ensures that the output at each time step depends only on previous time steps, while dilated convolution enlarges the limited receptive field of causal convolution.

OmniScaleCNN (OS-CNN) [29]. OS-CNN is a model based on 1D CNNs. The Omni-Scale block was proposed to increase the receptive field of 1D CNNs and improve model robustness; it automatically adjusts the kernel size and receptive field to different time series to obtain the highest accuracy.

XCM [30]. XCM is a compact model based on 1D CNNs. It accurately identifies and utilizes important time steps via Gradient-weighted Class Activation Mapping, a post hoc, model-specific explainability method.

TST [31]. TST is based on the self-attention mechanism. Its base learner is a self-attention layer consisting of a multi-head attention network, a feedforward network, and residual connections. The TST framework introduces an unsupervised pre-training regimen that provides significant performance benefits.

LSTM-FCN [32]. LSTM-FCN combines an LSTM module and an FCN module; the LSTM module enhances sensitivity to temporal order. The FCN branch contains 1D CNN layers with batch normalization, together with shuffle and dropout layers. The outputs of the FCN and LSTM branches are concatenated to form the final output.

D-linear [33]. The main feature-extraction mechanism of the D-linear model is a linear-layer network. It decomposes historical time series data into trend and remainder components with linear layers, and positional encoding is introduced during the decomposition to preserve ordering information. In this study, D-Linear-I was utilized to address the underfitting that results from weight sharing between different features, giving each feature its own independent linear layer.

(1) $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V$.

The structures of the time series analysis models are shown in Figure 7.

Figure 7: The structures of all the time series analysis models. The outputs of all models are flattened and processed by a fully connected layer with dropout to produce a single output, and training is conducted with the root mean squared error (RMSE) loss function. Conv1d denotes a 1D convolutional network and Conv2d a two-dimensional convolutional network; k is the number of split tokens of the inputs and latent. The structures are (a) 1D ResCNN; (b) TCN, where B1 is the number of shown blocks; (c) OS-CNN; (d) XCM; (e) TST, where z and u are processed tokens and B2 is the number of transformer encoders; (f) LSTM-FCN; and (g) D-linear.

Importance analysis of time series. After TSO prediction, to further study the relationship between TSO values and the time series data, a single-layer gated recurrent unit (GRU) network based on the attention mechanism was developed. The hidden state of the last time step was taken as the Query, and the hidden state of each time step was taken as Key and Value for the dot product according to equation (1) [30]. By extracting the dot-product results after the Softmax operation, the weight (importance) of each time step is obtained. With another single-layer attention-based GRU applied to the permuted time series, the weights of the channels can also be obtained. A hedged sketch of this network is given below.
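A hedged PyTorch sketch of such a network, following the description above (single-layer GRU; last hidden state as Query; all hidden states as Keys and Values per equation (1)); the layer sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnGRU(nn.Module):
    """Single-layer GRU with dot-product attention over its hidden states."""
    def __init__(self, n_channels: int = 7, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, steps, channels)
        h_all, h_last = self.gru(x)              # h_all: (B, T, H)
        q = h_last[-1].unsqueeze(1)              # Query = last hidden state (B, 1, H)
        scores = torch.bmm(q, h_all.transpose(1, 2)) / h_all.size(-1) ** 0.5
        w = F.softmax(scores, dim=-1)            # time-step weights (B, 1, T)
        ctx = torch.bmm(w, h_all).squeeze(1)     # attention-weighted context
        return self.out(ctx).squeeze(-1), w.squeeze(1)

pred, step_weights = AttnGRU()(torch.randn(4, 1024, 7))
# A second AttnGRU fed with the permuted series (treating channels as steps)
# analogously yields channel weights.
```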

4.3 Image analysis models with time series data

Each time series tensor, consisting of 1,024 time steps and 7 channels, was resized to a 32 × 32 image tensor with 7 channels to facilitate higher-dimensional feature extraction, converting endpoint prediction into an image regression task. SOTA image classification and regression models (collectively referred to as image analysis models) were modified for image regression; their outputs were flattened and projected to single values.
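Since 1,024 = 32 × 32, the transformation can be as simple as a reshape; whether the authors reshape or interpolate is not stated, so the sketch below (PyTorch assumed) shows the plain-reshape reading:

```python
import torch

series = torch.randn(16, 7, 1024)        # (batch, channels, time-steps)
images = series.reshape(16, 7, 32, 32)   # 7-channel 32 x 32 "images"
print(images.shape)                      # torch.Size([16, 7, 32, 32])
```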

Pre-activation ResNet [34]. Pre-activation ResNet is founded upon CNNs, emphasizing direct transmission of feature information between modules. In its architecture, the conventional residual block is substituted with two sequentially arranged modules comprising batch normalization, ReLU functions, and convolutional layers. This substitution alleviates the training complexity of deep networks while preserving the model’s parameter capacity.

DenseNet [35]. DenseNet is a CNN architecture designed to address the issue of gradient vanishing. In the DenseNet structure, each layer’s input is derived from the outputs of all preceding layers, establishing direct connections from early layers to later ones. Batch normalization and ReLU activation functions are employed between each convolutional layer. This design promotes information flow throughout the network, facilitating efficient gradient propagation and enhancing training performance.

Deep layer aggregation (DLA) [36]. DLA serves as a foundational structure for consolidating deep networks. It encompasses two distinct types of aggregations: Iterative deep aggregation (IDA) and Hierarchical deep aggregation (HDA). In IDA, the aggregation node’s input comprises the output of the current stage and the previous node. Conversely, in HDA, the aggregation node’s input consists of the outputs of blocks and the preceding node, while the aggregation node’s output is also fed into the subsequent set of blocks. In this study, DLA, based on two-dimensional convolutional networks, utilizes IDA to amalgamate stages and HDA to consolidate blocks within each stage.

Dual path networks (DPN) [37]. DPN is a model that integrates key features from ResNet and DenseNet architectures. The practical implementation of the DPN network adopts ResNeXt, leveraging group convolution instead of ResNet, as the primary component. This modification effectively enhances the learning capacity of each block while mitigating the rapid expansion of DenseNet channels. Subsequently, slice and concatenate layers are applied to introduce additional DenseNet pathways. Notably, split and elementwise addition operations are performed on the outputs of both model components.

GoogleNet (Inception-ResNet v2) [38]. GoogleNet, founded on CNNs, introduces the inception module, aimed at reducing model parameters and enhancing network complexity. Inception-ResNet v2, an advancement over GoogleNet, incorporates residual connections to enable a deeper model structure. Key components include inception modules such as the Inception-ResNet-A block, utilizing 1 × 1 and 3 × 3 convolutions with residual connections, the Inception-ResNet-B block employing 1 × 1, 1 × n, and n × 1 convolutions with residual connections, and the Inception-ResNet-C block integrating 1 × 1, 1 × 3, and 3 × 1 convolutions with residual connections. In comparison to Inception V4, it offers greater ease of tuning.

Vision transformer (ViT) [39]. The Vision Transformer is rooted in the transformer architecture. It uses a multi-head dot-product attention mechanism, where each transformer layer comprises layer normalization, a feedforward network, and multi-head attention in sequence, with residual connections. ViT begins by splitting an image (C × H × W) into patches, which are embedded and projected to dimension D. After patch embedding and position embedding, the result is forwarded to the transformer encoder to produce the final output.

The schematic diagram of the image regression process and the structures of SOTA backbones used are depicted in Figure 8.

Figure 8: Diagram of the image regression process and the structures of the SOTA backbones. (a) Diagram of the image regression process. The structures are (b) Pre-activation ResNet; (c) DenseNet; (d) DLA; (e) DPN; (f) GoogleNet; and (g) ViT.

4.4 Concat-model and paral-model with mixed data

To further explore how to optimize the utilization of the available data and enhance prediction accuracy, two DL architectures were designed to make predictions from both tabular and time series data. Their architectures are shown in Figure 9.

Figure 9: Architectures of (a) the concat-model and (b) the paral-model; FC is a fully connected layer. To keep the same complexity, the tabular data are embedded in the same pattern in both.

The first architecture is named the concat-model. It embeds and reshapes the tabular data and concatenates the result with the time series data along the channel dimension to form a new sequence, which is input into the time series analysis models for prediction.

The second architecture is dubbed the paral-model. It employs time series analysis models to handle the time series data while simultaneously processing the tabular data with a network composed of fully connected layers with dropout. The outputs of the two networks are added, flattened, and projected to produce the final output.
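A hedged sketch of the concat-model’s fusion step (PyTorch assumed; the embedding width and the number of pseudo-channels are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Embed tabular features, reshape to pseudo-channels, and concatenate."""
    def __init__(self, n_tab: int = 45, steps: int = 1024, extra_channels: int = 4):
        super().__init__()
        self.extra = extra_channels
        self.embed = nn.Linear(n_tab, extra_channels * steps)

    def forward(self, tab, series):              # tab: (B, 45); series: (B, 7, 1024)
        e = self.embed(tab).view(-1, self.extra, series.size(-1))
        return torch.cat([series, e], dim=1)     # (B, 7 + extra, 1024) -> backbone

fused = ConcatFusion()(torch.randn(2, 45), torch.randn(2, 7, 1024))
print(fused.shape)  # torch.Size([2, 11, 1024])
```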

5 Results and analysis

5.1 Results of tabular feature importance

Figure 10 shows the tabular features and their importance in the different tasks. Features related to hot metal and oxygen blowing are more important in every task, while features related to additives are of lower importance.

Figure 10: Importance analysis results for the different tasks.

Time series data from 300 heats were randomly chosen for channel and time-step importance analysis; Figure 11 shows the results. The time series data exhibit comparable distributions of time-step and channel importance for predicting Endpoint [P] and Endpoint [C]: later-stage time steps and the O2-Blow, Gas-Flow, and GP-CO channels are most influential. For Endpoint-Temp, however, mid-stage time steps are more important, while the GP-CO, GP-CO2, and GP-O2 channels exhibit lower importance. For every target, the blowing-practice-related curves are more important.

Figure 11: Importance analysis results for the time series data’s time steps and channels in the different tasks.

5.2 Results of ML with tabular data

The prediction results on the test set are shown in Table 2. The conventional BP-NN and SVR are also considered, and their performance is listed for comparison. The best values are given in bold.

Table 2

Performance of ML models with tabular data in different tasks

Tasks Models MAE MSE ±10% (±10°C) ±15% (±15°C) ±20% (±20°C)
Endpoint [P]/% XGBoost 2.313 × 10−3 9.408 × 10−6 0.402 0.573 0.706
LightGBM 2.326 × 10−3 9.400 × 10−6 0.397 0.567 0.707
CatBoost 2.276 × 10−3 8.985 × 10−6 0.391 0.560 0.715
TabNet 2.365 × 10−3 9.561 × 10−6 0.391 0.555 0.672
BP-NN 3.158 × 10−3 1.554 × 10−5 0.282 0.418 0.542
SVR 2.710 × 10−3 1.197 × 10−5 0.323 0.461 0.607
Endpoint [C]/% XGBoost 7.018 × 10−3 8.768 × 10−5 0.381 0.569 0.701
LightGBM 6.976 × 10−3 8.706 × 10−5 0.401 0.571 0.705
CatBoost 6.916 × 10−3 8.754 × 10−5 0.409 0.556 0.721
TabNet 7.066 × 10−3 8.723 × 10−5 0.369 0.560 0.707
BP-NN 7.534 × 10−3 1.007 × 10−4 0.379 0.535 0.678
SVR 7.316 × 10−3 9.443 × 10−5 0.360 0.540 0.705
Endpoint-Temp/°C XGBoost 7.964 103.637 0.686 0.879 0.949
LightGBM 7.955 101.776 0.690 0.855 0.947
CatBoost 8.009 110.909 0.692 0.871 0.957
TabNet 8.258 113.576 0.663 0.861 0.943
BP-NN 9.150 129.957 0.610 0.811 0.920
SVR 8.310 108.674 0.658 0.851 0.947

Bold values represent best values for each task.

Table 2 indicates that the targets are highly predictable from tabular data. Within the same task, the accuracy differences among models were minimal, and most predicted values fell within ±20% (or ±20°C) of the actual values for Endpoint [P]/[C]/Temp. Based on MAE, CatBoost is the optimal model for Endpoint [P] and Endpoint [C] prediction, while LightGBM excels at Endpoint-Temp prediction. The conventional BP-NN performed worst in each task. It is worth noting, however, that CatBoost and TabNet train much more slowly at the low learning rates used.

5.3 Results of time series analysis models with time series data

The performance of the time series analysis models with different SOTA backbones on the test set is presented in Table 3; the models are named after their backbones. The best values are given in bold. The training strategies are described in the Appendix.

Table 3

Performance of time series analysis models with time series data in different tasks

Tasks Models MAE MSE ±10% (±10°C) ±15% (±15°C) ±20% (±20°C)
Endpoint [P]/% ResCNN 2.516 × 10−3 1.057 × 10−5 0.343 0.501 0.657
TCN 2.594 × 10−3 1.162 × 10−5 0.357 0.517 0.643
OS-CNN 2.499 × 10−3 1.042 × 10−5 0.368 0.521 0.673
XCM 2.769 × 10−3 1.273 × 10−5 0.339 0.485 0.632
TST 2.628 × 10−3 1.140 × 10−5 0.339 0.517 0.636
LSTM-FCN 2.930 × 10−3 1.378 × 10−5 0.320 0.452 0.587
D-linear 2.517 × 10−3 1.094 × 10−5 0.386 0.521 0.659
Endpoint [C]/% ResCNN 6.791 × 10−3 8.674 × 10−5 0.435 0.585 0.750
TCN 7.297 × 10−3 9.413 × 10−5 0.404 0.551 0.694
OS-CNN 7.269 × 10−3 1.054 × 10−4 0.406 0.585 0.732
XCM 7.601 × 10−3 9.905 × 10−5 0.365 0.562 0.687
TST 7.485 × 10−3 9.543 × 10−5 0.385 0.539 0.671
LSTM-FCN 7.321 × 10−3 9.556 × 10−5 0.390 0.544 0.691
D-linear 7.344 × 10−3 9.576 × 10−5 0.367 0.535 0.694
Endpoint-Temp/°C ResCNN 10.366 168.310 0.553 0.759 0.884
TCN 11.437 209.552 0.528 0.699 0.825
OS-CNN 10.169 164.668 0.567 0.759 0.891
XCM 11.925 233.993 0.496 0.692 0.835
TST 10.420 176.275 0.554 0.753 0.862
LSTM-FCN 11.600 216.666 0.510 0.710 0.851
D-linear 11.493 209.301 0.517 0.705 0.841

Bold values represent best values for each task.

According to Table 3, approximately 70% of the predicted values for Endpoint [C] and Endpoint [P] fell within ±20% of the actual values, and for Endpoint-Temp nearly 90% of the predictions were within ±20°C of the true values. These results demonstrate a strong correlation between the time series data and the TSO values. OS-CNN exhibited the highest accuracy for predicting Endpoint [P] and Endpoint-Temp, while ResCNN performed best for Endpoint [C]. Compared to the ML models, using only time series data yields lower accuracy for predicting Endpoint [P] and Endpoint-Temp but higher accuracy for predicting Endpoint [C].

5.4 Results of image analysis models with time series data

The performance of the image analysis models with different SOTA backbones is depicted in Table 4; model names follow the backbones, and the best values are given in bold. Table 4 indicates that the image analysis models did not enhance forecasting accuracy: their best predictions are similar to those of the time series models. In practice, after hyperparameter tuning, the image analysis models require more parameters and are more prone to overfitting. Therefore, if only time series data are available for forecasting, time series DL models are recommended.

Table 4

Performance of image analysis models with time series data for different tasks

Tasks Models MAE MSE ±10% (±10°C) ±15% (±15°C) ±20% (±20°C)
Endpoint [P]/% ResNet 3.132 × 10−3 1.553 × 10−5 0.269 0.399 0.544
DenseNet 2.540 × 10−3 1.117 × 10−5 0.386 0.521 0.632
DLA 2.587 × 10−3 1.116 × 10−5 0.356 0.501 0.636
DPN 2.548 × 10−3 1.072 × 10−5 0.341 0.515 0.648
GoogleNet 2.554 × 10−3 1.121 × 10−5 0.361 0.531 0.659
VIT 2.587 × 10−3 1.109 × 10−5 0.356 0.503 0.646
Endpoint [C]/% ResNet 7.469 × 10−3 1.020 × 10−4 0.388 0.564 0.691
DenseNet 6.878 × 10−3 8.728 × 10−5 0.404 0.612 0.739
DLA 7.520 × 10−3 9.540 × 10−5 0.361 0.526 0.657
DPN 7.245 × 10−3 9.409 × 10−5 0.390 0.571 0.732
GoogleNet 7.295 × 10−3 9.355 × 10−5 0.401 0.562 0.689
VIT 7.558 × 10−3 1.006 × 10−4 0.381 0.526 0.683
Endpoint-Temp/°C ResNet 11.377 200.414 0.509 0.736 0.847
DenseNet 12.070 224.581 0.490 0.688 0.817
DLA 11.195 193.361 0.501 0.727 0.854
DPN 10.169 170.382 0.577 0.770 0.878
GoogleNet 10.493 175.325 0.558 0.746 0.871
VIT 10.827 186.404 0.534 0.736 0.859

Bold values represent best values for each task.

5.5 Results of concat-model and paral-model with mixed data

The prediction results of the concat-model and paral-model with different backbones are shown in Tables 5 and 6. They demonstrate that using mixed data with DL models enhances the prediction accuracy of Endpoint [P]/[C]/Temp compared with using only time series data, especially for Endpoint-Temp. The concat-model outperforms the paral-model, achieving higher accuracy and reducing MAE by 4–10%. These findings suggest that a DL model with composite inputs can perform better than one using only a single input type, and that the manner of feature merging also affects prediction performance: the earlier feature fusion is performed, the more efficient it may be. The results also show that a single network can efficiently extract information from merged features. The best values are given in bold.

Table 5

Performance of concat-model with different backbones in different tasks

Tasks Models MAE MSE ±10% (±10°C) ±15% (±15°C) ±20% (±20°C)
Endpoint [P]/% ResCNN 2.361 × 10−3 9.768 × 10−6 0.388 0.555 0.698
TCN 2.163 × 10−3 8.147 × 10−6 0.420 0.582 0.743
OS-CNN 2.264 × 10−3 8.862 × 10−6 0.391 0.555 0.720
XCM 2.471 × 10−3 1.057 × 10−5 0.381 0.558 0.672
TST 2.181 × 10−3 8.234 × 10−6 0.418 0.582 0.741
LSTM-FCN 2.186 × 10−3 8.274 × 10−6 0.404 0.591 0.732
D-linear 2.184 × 10−3 8.360 × 10−6 0.404 0.571 0.745
Endpoint [C]/% ResCNN 6.475 × 10−3 7.724 × 10−5 0.435 0.598 0.750
TCN 7.246 × 10−3 9.253 × 10−5 0.401 0.571 0.698
OS-CNN 7.004 × 10−3 8.331 × 10−5 0.363 0.560 0.700
XCM 7.253 × 10−3 9.630 × 10−5 0.415 0.553 0.687
TST 6.865 × 10−3 8.130 × 10−5 0.406 0.578 0.716
LSTM-FCN 6.876 × 10−3 8.467 × 10−5 0.410 0.608 0.721
D-linear 7.321 × 10−3 9.920 × 10−5 0.386 0.560 0.708
Endpoint-Temp/°C ResCNN 5.891 55.682 0.828 0.962 0.990
TCN 5.887 56.112 0.821 0.960 0.989
OS-CNN 5.923 57.528 0.824 0.953 0.987
XCM 5.919 56.172 0.834 0.957 0.986
TST 7.354 92.837 0.734 0.901 0.962
LSTM-FCN 7.214 88.917 0.752 0.852 0.946
D-linear 6.788 71.852 0.747 0.921 0.980

Bold values represent best values for each task.

Table 6

Performance of paral-model with different backbones in different tasks

Tasks Models MAE MSE ±10% (±10°C) ±15% (±15°C) ±20% (±20°C)
Endpoint [P]/% ResCNN 2.373 × 10−3 9.656 × 10−6 0.373 0.544 0.679
TCN 2.464 × 10−3 9.815 × 10−6 0.372 0.533 0.666
OS-CNN 2.348 × 10−3 9.304 × 10−6 0.373 0.544 0.679
XCM 2.314 × 10−3 8.997 × 10−6 0.393 0.548 0.691
TST 2.785 × 10−3 1.261 × 10−5 0.325 0.463 0.585
LSTM-FCN 2.738 × 10−3 1.221 × 10−5 0.318 0.479 0.598
D-linear 2.351 × 10−3 9.328 × 10−6 0.381 0.546 0.669
Endpoint [C]/% ResCNN 7.199 × 10−3 9.178 × 10−5 0.370 0.567 0.710
TCN 6.984 × 10−3 8.473 × 10−5 0.379 0.578 0.721
OS-CNN 7.226 × 10−3 9.233 × 10−5 0.360 0.547 0.717
XCM 7.211 × 10−3 9.295 × 10−5 0.399 0.558 0.708
TST 7.192 × 10−3 9.203 × 10−5 0.369 0.565 0.705
LSTM-FCN 6.979 × 10−3 8.429 × 10−5 0.399 0.578 0.723
D-linear 7.233 × 10−3 9.209 × 10−5 0.364 0.561 0.721
Endpoint-Temp/°C ResCNN 6.763 72.736 0.780 0.929 0.977
TCN 8.005 101.168 0.672 0.879 0.928
OS-CNN 6.906 75.777 0.767 0.928 0.976
XCM 8.243 115.226 0.703 0.844 0.939
TST 7.422 87.492 0.727 0.911 0.967
LSTM-FCN 6.896 74.746 0.764 0.922 0.980
D-linear 7.216 82.679 0.734 0.917 0.967

Bold values represent best values for each task.

The visualized best prediction results on the test set for each task are shown in Figure 12. The models are the concat-model with a TCN backbone (for Endpoint [P]) and with a ResCNN backbone (for Endpoint [C] and Endpoint-Temp).

Figure 12: Running chart of the prediction results of the best models.

According to all the above results, the concat-models with TCN and ResCNN backbones are the best models. Five-fold cross-validation was used to strengthen the reliability of the model evaluations and to further optimize the hyperparameters: the training set was mixed with the validation set and randomly shuffled, and for each set of hyperparameters, the 0–20, 20–40, 40–60, 60–80, and 80–100% portions of this combined dataset were each used once for validation while the remaining data were used for training. The average of the five runs was used to select the best hyper-parameters (a sketch of this scheme follows Table 7). The performance of the models with the best hyperparameters on the test set is shown in Table 7.

Table 7

Results of cross validation

Tasks Models MAE MSE ±10% (±10°C) ±15% (±15°C) ±20% (±20°C)
Endpoint [P]/% TCN (concat-model) 2.153 × 10−3 7.497 × 10−6 0.430 0.582 0.758
Endpoint [C]/% ResCNN (concat-model) 6.413 × 10−3 7.526 × 10−5 0.446 0.598 0.765
Endpoint-Temp/°C ResCNN (concat-Model) 5.780 55.644 0.838 0.972 0.986

After five-fold cross-validation, the performance of the models improved further. The hyperparameter adjustment is also exposed to all of the training and validation data, giving stronger confidence in the models.
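A sketch of this five-fold scheme, assuming scikit-learn’s KFold (make_model is a hypothetical factory that builds a model for one hyper-parameter configuration):

```python
import numpy as np
from sklearn.model_selection import KFold

X_cv = np.concatenate([X_train, X_val])   # merge training and validation sets
y_cv = np.concatenate([y_train, y_val])

fold_maes = []
for tr_idx, va_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_cv):
    model = make_model(hyper_params)       # hypothetical factory for one config
    model.fit(X_cv[tr_idx], y_cv[tr_idx])
    fold_maes.append(np.mean(np.abs(model.predict(X_cv[va_idx]) - y_cv[va_idx])))

print(np.mean(fold_maes))   # averaged MAE used to select the hyper-parameters
```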

5.6 Results of comparison with other public works

The MAE and MSE of BOF endpoint prediction obtained in previous studies are juxtaposed with those of this work. Despite differences in data quality, objectives, application scenarios, BOF vessel structure, and steelmaking processes across these studies, they remain meaningful for reference and comparison. The results are shown in Table 8.

Table 8

Results of comparison with other public works

Tasks Models MAE MSE
Endpoint [P]/% Concat-model (TCN) 2.163 × 10−3 8.147 × 10−6
BP model [5] 3.058 × 10−5
PCA–GA–BP model [10] 2.2500 × 10−5
GBR model [40] 1.0180 × 10−5
RFR model [41] 1.0560 × 10−5
M-c BP [41] 9.0000 × 10−6
LWOA-TSVR [42] 6.690 × 10−3 7.8500 × 10−5
Endpoint [C]/% Concat-model (ResCNN) 6.475 × 10−3 7.724 × 10−5
BP model [5] 1.664 × 10−4
FWA-TSVR [13] 8.2369 × 10−4
CS-TSVR [13] 8.2829 × 10−4
MI_IWSVM [43] 2.7828 × 10−4
DRSupAE-NN [44] 3.9840 × 10−4
DRSupAE-JITRN [44] 1.9107 × 10−4
Endpoint-Temp/°C Concat-model (ResCNN) 5.891 55.682
BP model [5] 461.820
FWA-TSVR [13] 342.0280
CS-TSVR [13] 382.7501
MI_IWSVM [43] 76.6307
DRSupAE-NN [44] 67.5437
DSupAE-JITRN [44] 77.3379

6 Discussion on the potential for overfitting

The models developed in this work employ many tabular features. To investigate whether this many tabular features led to overfitting, specific experiments were conducted. The hyperparameters of the best models were kept, and the number of tabular features was changed to 25 (removing all additive features, Model-25), 36 (removing the physical and chemical features of hot metal, Model-36), and 37 (merging the additive features before and after the TSC test, Model-37). Using the best ML and DL models (concat-model) in each task with the same training and tuning process, the performance disparity between the training and test datasets is documented in Table 9. The MAE/MSE (test − train) denotes the difference between the test-set and training-set MAE/MSE.

Table 9

The performance of models with different model complexity

Tasks Models MAE (training set) MAE (test set) MSE (training set) MSE (test set)
Endpoint [P]/% CatBoost 2.070 × 10−3 2.276 × 10−3 7.174 × 10−6 8.985 × 10−6
CatBoost-25 2.306 × 10−3 2.614 × 10−3 8.856 × 10−6 1.122 × 10−5
CatBoost-36 2.742 × 10−3 3.058 × 10−3 1.086 × 10−5 1.691 × 10−5
CatBoost-37 2.130 × 10−3 2.373 × 10−3 7.603 × 10−6 9.450 × 10−6
TCN 1.988 × 10−3 2.153 × 10−3 6.628 × 10−6 7.497 × 10−6
TCN-25 2.109 × 10−3 2.304 × 10−3 7.426 × 10−6 8.994 × 10−6
TCN-36 2.129 × 10−3 2.361 × 10−3 7.568 × 10−6 9.271 × 10−6
TCN-37 2.017 × 10−3 2.225 × 10−3 6.821 × 10−6 8.430 × 10−6
Endpoint [C]/% CatBoost 6.503 × 10−3 6.916 × 10−3 7.734 × 10−5 8.754 × 10−5
CatBoost-25 6.715 × 10−3 7.290 × 10−3 8.255 × 10−5 9.363 × 10−5
CatBoost-36 7.379 × 10−3 7.656 × 10−3 1.008 × 10−4 1.037 × 10−4
CatBoost-37 6.720 × 10−3 7.045 × 10−3 8.278 × 10−5 9.178 × 10−5
ResCNN 6.265 × 10−3 6.413 × 10−3 7.149 × 10−5 7.526 × 10−5
ResCNN-25 6.450 × 10−3 6.685 × 10−3 7.596 × 10−5 8.192 × 10−5
ResCNN-36 6.511 × 10−3 6.730 × 10−3 7.761 × 10−5 8.303 × 10−5
ResCNN-37 6.410 × 10−3 6.539 × 10−3 7.519 × 10−5 7.826 × 10−5
Endpoint-Temp/°C LightGBM 7.490 7.955 89.112 101.776
LightGBM-25 7.972 8.350 100.417 109.661
LightGBM-36 8.328 8.706 109.173 122.483
LightGBM-37 7.826 8.188 96.959 109.757
ResCNN 5.508 5.780 50.477 55.644
ResCNN-25 5.889 6.098 57.892 59.962
ResCNN-36 6.020 6.252 58.591 65.183
ResCNN-37 5.928 6.108 58.689 62.182

According to Table 9, taking MAE and MSE as criteria, when removing tabular features for predicting Endpoint [P], the test − train differences of all models lie within 2 × 10−4–2.5 × 10−4% (MAE) and 1.5 × 10−6–6 × 10−6% (MSE); for Endpoint [C], the values are 2 × 10−4–3 × 10−4% (MAE) and 3 × 10−6–1 × 10−5% (MSE); and for Endpoint-Temp, the values are 0.2–0.5°C. The results show that adding or removing original features does not lead to overfitting, but removing key features causes underfitting. Feature removal has less impact on the multi-input DL models.

To further explore whether the complex models used in this work induced overfitting, additional experiments were conducted with the original input features. For predicting Endpoint [P], the best ML model is CatBoost (max_depth = 14) and the best DL model is TCN (concat-model, num_layers = 6); they were changed to CatBoost-2 (reduced complexity, max_depth = 2), CatBoost-20 (increased complexity, max_depth = 20), TCN-3 (reduced complexity, num_layers = 3), and TCN-9 (increased complexity, num_layers = 9). For predicting Endpoint [C], the best models (CatBoost (max_depth = 8) and ResCNN (concat-model, emb_dim = 560)) were changed to CatBoost-2, CatBoost-20, ResCNN-256 (emb_dim = 256), and ResCNN-1,024 (emb_dim = 1,024). For predicting Endpoint-Temp, the best models (LightGBM (max_depth = 8) and ResCNN (concat-model, emb_dim = 658)) were changed to LightGBM-2, LightGBM-20, ResCNN-256, and ResCNN-1,024. The results are shown in Table 10.

Table 10

Results of changing best model’s complexities

Tasks Models MAE (training set) MAE (test set) MSE (training set) MSE (test set)
Endpoint [P]/% CatBoost 2.070 × 10−3 2.276 × 10−3 7.174 × 10−6 8.985 × 10−6
CatBoost-2 2.343 × 10−3 2.560 × 10−3 9.133 × 10−6 1.071 × 10−5
CatBoost-20 1.189 × 10−3 2.566 × 10−3 6.015 × 10−6 1.075 × 10−5
TCN 1.988 × 10−3 2.153 × 10−3 6.628 × 10−6 7.497 × 10−6
TCN-3 2.176 × 10−3 2.401 × 10−3 7.906 × 10−6 9.667 × 10−6
TCN-9 1.936 × 10−3 2.730 × 10−3 6.309 × 10−6 1.215 × 10−5
Endpoint [C]/% CatBoost 6.503 × 10−3 6.916 × 10−3 7.734 × 10−5 8.754 × 10−5
CatBoost-2 6.995 × 10−3 7.354 × 10−3 9.046 × 10−5 9.596 × 10−5
CatBoost-20 6.421 × 10−3 7.289 × 10−3 7.545 × 10−5 9.363 × 10−5
ResCNN 6.265 × 10−3 6.413 × 10−3 7.149 × 10−5 7.526 × 10−5
ResCNN-256 6.685 × 10−3 6.887 × 10−3 9.363 × 10−5 8.713 × 10−5
ResCNN-1,024 5.878 × 10−3 7.327 × 10−3 6.296 × 10−5 9.487 × 10−5
Endpoint-Temp/°C LightGBM 7.490 7.955 89.112 101.776
LightGBM-2 8.069 8.429 102.772 115.616
LightGBM-20 7.049 8.498 78.940 117.307
ResCNN 5.508 5.780 50.477 55.644
ResCNN-256 6.005 6.159 58.327 60.893
ResCNN-1,024 5.266 6.204 46.006 61.724

According to Table 10, the performance of the best models on the training and test sets is similar. However, relative to the original models, greatly reducing model complexity leads to underfitting, while significantly increasing it may lead to serious overfitting; the overfitting caused by increased complexity is even more severe for the DL models.

6.1 Significance of predictions

  1. Optimizing the steelmaking process. (1) The models allow continuous output of predictions. In the later stages of smelting, once additive charging has been completed, the predictions depend only on the time series, and the oxygen blowing volume is updated every second. With such dynamic inputs, the models can generate a decarburization/temperature-rise curve to dynamically control the endpoint (the “Dynamic” mode of the prediction software). (2) The importance analysis based on the developed models has demonstrated the significance of the various inputs, enabling targeted process adjustments. For instance, in controlling the endpoint temperature, attention should be paid to the amount of Newman ore added according to the steel grade and process requirements.

  2. Reduced resource consumption. By eliminating the TSO test, each heat directly saves $20–30 in disposable sampler consumption. Together with savings in cooling and drive energy, even if only 50% of heats cancel the TSO test, each steel plant can save nearly half a million dollars annually.

  3. Improved production efficiency. Based on the predicted values, the TSO test can be eliminated, directly saving 1.5–2 min of smelting time. With a BOF steelmaking time of 10–15 min, efficiency increases by 12–20%.

  4. Additionally, this study has demonstrated that relatively accurate endpoint predictions can be achieved using time series data alone. Some plants have poor data infrastructure, with incomplete static records or data that cannot be used because of manual input. In such cases, endpoint prediction can still be achieved if oxygen-blowing-related curves and gas composition data from coal gas recovery are available.

7 Results of field application

The best models have been applied in practical field production. The interface of the Python-based online prediction software is shown in Figure 13. The software receives data from a dynamic database and makes predictions with the best model for each task. It has two prediction modes: in dynamic mode, it predicts each target automatically every second; in static mode, it makes a prediction only when the “Predict” button is pressed. When the “Tabular” button is selected, the ML models predict from tabular data; when the “Time Series” button is selected, the concat-model is adopted.

Figure 13: Interface of the prediction software.

In actual field production, the model prediction results of 300 heats were recorded. The MSE, MAE, and a visual running chart of the predicted results are shown in Figure 14. The MSE for predicting Endpoint [P]/[C]/Temp are 9.419 × 10−6%, 6.857 × 10−5%, and 104.021°C, and the MAE are 2.394 × 10−3%, 6.231 × 10−3%, and 7.679°C. Despite field uncertainty, the prediction accuracy for Endpoint [P] and Endpoint-Temp is only slightly lower than on the test set, and the prediction of Endpoint [C] is better than on the test set, demonstrating the models’ good robustness and generalization ability.

Figure 14: Field application results of the best models.

8 Conclusion

This study concentrated on predicting the endpoint chemistry and temperature (Endpoint [P]/[C]/Temp) of BOF steelmaking using SOTA ML (XGBoost, LightGBM, CatBoost, TabNet) and DL models, coupled with varied inputs gathered from 13,528 heats, offering the potential to eliminate the TSO test. The best models were implemented in practical field production, and the prediction results of 300 heats were recorded. Integrating the testing and field application results, the following conclusions were drawn.

  1. The best prediction MAEs of the SOTA ML models with 45 tabular features (hot metal, scrap, additives, and so on) for Endpoint [P]/[C]/Temp were 2.276 × 10−3% (CatBoost), 6.916 × 10−3% (CatBoost), and 7.955°C (LightGBM), respectively, exceeding the performance of the conventional BP-NN and SVR models.

  2. The time series data (off-gas profile and blowing-practice-related curves) have strong relationships with the targets, which can be predicted from the original time series data alone with SOTA DL time series analysis models (1D ResCNN, TCN, OS-CNN, XCM, TST, LSTM-FCN, D-linear). For predicting Endpoint [C], the best accuracy (MAE = 6.791 × 10−3%, ResCNN) was better than that of the SOTA ML models with tabular features, but for Endpoint [P] (MAE = 2.499 × 10−3%, OS-CNN) and Endpoint-Temp (MAE = 10.169°C, OS-CNN) the accuracies were lower. Using resized time series data with DL image analysis models (Pre-activation ResNet, DenseNet, DLA, DPN, GoogleNet, ViT) did not improve accuracy.

  3. Making predictions with both time series data and tabular data through the designed concat-model and paral-model architectures leads to higher prediction accuracy. The concat-model with a TCN backbone (MAE = 2.153 × 10−3% for Endpoint [P]) and with a ResCNN backbone (MAE = 6.413 × 10−3% for Endpoint [C] and 5.780°C for Endpoint-Temp) performed best, achieving the highest accuracy for each target in this study; a schematic sketch of the concat-model idea is given after this list.

  4. The best models (concat-model with TCN and ResCNN backbones) have been applied in field production. The field tests showed prediction MAEs for Endpoint [P]/[C]/Temp of 2.394 × 10−3%, 6.231 × 10−3%, and 7.679°C, respectively. The accuracy for Endpoint [C] exceeded that of the test set, and the accuracies for Endpoint [P]/Temp were close to those of the test set. These results evidence the developed models' strong robustness and generalization capability.

  5. The off-gas related curves held less significance than the curves representing blowing practices for all targets. The onset and final stages of the curves strongly influenced the predictions of Endpoint [P] and Endpoint [C], whereas the middle stage of the curves was more important for predicting Endpoint-Temp.

  6. Limitations of the study and directions for future research: owing to constraints in project planning and computational resources, this study proposed several generalized robust models. In the future, it is advisable to establish dedicated prediction models for the main products based on different steel grades and their process characteristics, aiming to maximize prediction accuracy.
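To make the concat-model conclusion concrete, the following is a schematic PyTorch sketch of the idea only: a time series backbone (TCN or ResCNN in this study; a plain Conv1d stack serves as a stand-in here) produces a feature vector that is concatenated with an embedded tabular vector before a regression head. All channel counts and layer sizes are illustrative assumptions, not the tuned values of Table A1.

```python
import torch
import torch.nn as nn

class ConcatModel(nn.Module):
    # Schematic only: "backbone" stands in for the paper's TCN/ResCNN;
    # n_channels, emb_dim, and k are illustrative assumptions.
    def __init__(self, n_channels=8, n_tabular=45, emb_dim=512, k=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(n_channels, k, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool the time axis to one step
            nn.Flatten(),             # -> (batch, k)
        )
        self.tab = nn.Sequential(nn.Linear(n_tabular, emb_dim), nn.ReLU())
        self.head = nn.Linear(k + emb_dim, 1)  # one endpoint target

    def forward(self, series, tabular):
        # series: (batch, n_channels, time); tabular: (batch, n_tabular)
        z = torch.cat([self.backbone(series), self.tab(tabular)], dim=1)
        return self.head(z)
```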



Acknowledgements

The authors are thankful to Tangsteel Co., Ltd of Hesteel Group and Digital Co., Ltd of Hesteel Group for providing detailed data, hardware, and software support for model development and field production test.

  1. Funding information: This research has been supported by the Natural Science Foundation of Hebei Province, China (E2022318002).

  2. Author contributions: Tian-yi Xie proposed the idea for this study and was fully responsible for data preprocessing, model design and development, deployment, and the writing of this paper. Fei Zhang provided technical support on the steelmaking process. Jun-guo Zhang was responsible for platform management and communication. Yong-guang Xiang and Yi-xin Wang provided the data, software, and hardware for this study.

  3. Conflict of interest: The authors state no conflict of interest.

Appendixes

The DL models are trained with the same pattern. The loss function is RMSE, the optimizer is Adam, and the initial learning rate is set to 0.01. The maximum number of epochs is set to 1,000, and ReduceLROnPlateau (factor = 0.5, patience = 100, threshold = 0.00001) is adopted as the learning rate scheduler. The model weights achieving the best validation-set accuracy during training are saved automatically; a minimal sketch of this training pattern is given below. The hyper-parameter boundaries and best hyper-parameters of the best models are detailed in Table A1.
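A minimal PyTorch sketch of this training pattern, assuming a placeholder model that takes (series, tabular) inputs, a data loader yielding (series, tabular, target) batches, and a validate() routine returning the validation error:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

def rmse(pred, target):
    # RMSE loss, as stated above.
    return torch.sqrt(torch.nn.functional.mse_loss(pred, target))

def train(model, train_loader, validate, epochs=1000):
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    sched = ReduceLROnPlateau(opt, factor=0.5, patience=100, threshold=1e-5)
    best = float("inf")
    for _ in range(epochs):
        model.train()
        for series, tabular, y in train_loader:
            opt.zero_grad()
            loss = rmse(model(series, tabular), y)
            loss.backward()
            opt.step()
        val = validate(model)  # validation error on the valid set
        sched.step(val)        # scheduler monitors the validation metric
        if val < best:         # automatically keep the best-validation weights
            best = val
            torch.save(model.state_dict(), "best.pt")
```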

Table A1

Hyper-parameter boundaries and best hyper-parameters of the best models

Models | Hyper-parameters and boundaries | Best for Endpoint [P] | Best for Endpoint [C] | Best for Endpoint-Temp
Concat-model (TCN) | K: [256, 1,024] | K = 462 | K = 828 | K = 326
 | emb_dim: [256, 1,024] | emb_dim = 585 | emb_dim = 384 | emb_dim = 640
 | kernel_size: [3, 5, 7, 9] | kernel_size = 3 | kernel_size = 3 | kernel_size = 3
 | num_layers: [1, 10] | num_layers = 6 | num_layers = 8 | num_layers = 5
 | dropout: [0, 0.5] | dropout = 0.5 | dropout = 0 | dropout = 0
 | dilation_base: [1, 3, 5] | dilation_base = 3 | dilation_base = 3 | dilation_base = 3
Concat-model (ResCNN) | K: [256, 1,024] | K = 525 | K = 916 | K = 505
 | emb_dim: [256, 1,024] | emb_dim = 462 | emb_dim = 512 | emb_dim = 622
 | alpha: [0, 1] | alpha = 0.326 | alpha = 0.182 | alpha = 0.190
 | dropout: [0, 0.5] | dropout = 0.5 | dropout = 0.5 | dropout = 0.5

The hyper-parameter boundaries and best hyper-parameters of the ML models are detailed in Table A2.

Table A2

Hyper-parameter boundaries and best hyper-parameters of ML models

Models | Hyper-parameters and boundaries | Best for Endpoint [P] | Best for Endpoint [C] | Best for Endpoint-Temp
LightGBM | boosting: ["gbdt", "dart"] | boosting = "gbdt" | boosting = "dart" | boosting = "gbdt"
 | learning_rate: [1 × 10−3, 1] | learning_rate = 0.0625 | learning_rate = 0.671 | learning_rate = 0.134
 | max_depth: [2, 20] | max_depth = 11 | max_depth = 12 | max_depth = 8
 | num_leaves: [5, 31] | num_leaves = 26 | num_leaves = 16 | num_leaves = 7
 | bagging_freq: [0, 30] | bagging_freq = 28 | bagging_freq = 4 | bagging_freq = 28
 | max_bin: [10, 255] | max_bin = 106 | max_bin = 213 | max_bin = 81
CatBoost | iterations: [100, 5 × 103] | iterations = 4,347 | iterations = 4,027 | iterations = 4,027
 | bootstrap_type: ["Bayesian", "Bernoulli", "MVS"] | bootstrap_type = "Bernoulli" | bootstrap_type = "MVS" | bootstrap_type = "Bernoulli"
 | learning_rate: [1 × 10−4, 0.1] | learning_rate = 0.0109 | learning_rate = 0.0187 | learning_rate = 0.0246
 | max_depth: [2, 15] | max_depth = 14 | max_depth = 8 | max_depth = 8
 | max_bin: [50, 400] | max_bin = 175 | max_bin = 160 | max_bin = 152
XGBoost | learning_rate: [1 × 10−3, 1] | learning_rate = 0.0192 | learning_rate = 0.0207 | learning_rate = 0.0167
 | n_estimators: [100, 1 × 105] | n_estimators = 4,458 | n_estimators = 4,964 | n_estimators = 3,277
 | max_depth: [2, 15] | max_depth = 11 | max_depth = 9 | max_depth = 4
 | lambda: [1 × 10−3, 10] | lambda = 0.377 | lambda = 0.0162 | lambda = 0.109
 | alpha: [1 × 10−3, 10] | alpha = 0.0134 | alpha = 0.00405 | alpha = 0.00133
 | colsample_bytree: [0.1, 1] | colsample_bytree = 0.909 | colsample_bytree = 0.863 | colsample_bytree = 0.947
TabNet | mask_type: ["entmax", "sparsemax"] | mask_type = "sparsemax" | mask_type = "sparsemax" | mask_type = "sparsemax"
 | n_a: [4, 64] | n_a = 8 | n_a = 11 | n_a = 24
 | n_d: [4, 64] | n_d = 12 | n_d = 11 | n_d = 45
 | n_steps: [1, 10] | n_steps = 1 | n_steps = 2 | n_steps = 7
 | gamma: [1, 2] | gamma = 1.052 | gamma = 1.597 | gamma = 1.795
 | n_shared: [1, 3] | n_shared = 1 | n_shared = 1 | n_shared = 1
 | learning_rate: [1 × 10−3, 1] | learning_rate = 0.125 | learning_rate = 0.167 | learning_rate = 0.101
BP-NN | layer_num: [2, 10] | layer_num = 6 | layer_num = 8 | layer_num = 5
 | emb_dim: [64, 1,024] | emb_dim = 512 | emb_dim = 384 | emb_dim = 512
 | learning_rate: [1 × 10−4, 1] | learning_rate = 0.0165 | learning_rate = 0.0224 | learning_rate = 0.0151
SVR | c: [1, 2] | c = 1.692 | c = 1.825 | c = 1.026
 | kernel: ["rbf", "sigmoid"] | kernel = "rbf" | kernel = "rbf" | kernel = "rbf"
 | gamma: [0, 10] | gamma = 1.782 | gamma = 0.856 | gamma = 1.652
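The tables above list the search boundaries alongside the best values found, but the search procedure itself is not described in this appendix. The following is a hedged sketch of one plausible approach, using Optuna (an assumption, not the paper's stated method) with the LightGBM boundaries from Table A2 and synthetic stand-in data:

```python
import numpy as np
import optuna
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

# Synthetic stand-ins for the 45 tabular features and one endpoint target.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 45)), rng.random(500)
X_valid, y_valid = rng.random((100, 45)), rng.random(100)

def objective(trial):
    # Search space taken from the LightGBM boundaries in Table A2.
    params = {
        "boosting_type": trial.suggest_categorical("boosting", ["gbdt", "dart"]),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 1, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 20),
        "num_leaves": trial.suggest_int("num_leaves", 5, 31),
        "bagging_freq": trial.suggest_int("bagging_freq", 0, 30),
        "max_bin": trial.suggest_int("max_bin", 10, 255),
    }
    model = lgb.LGBMRegressor(**params)
    model.fit(X_train, y_train)
    return mean_absolute_error(y_valid, model.predict(X_valid))

study = optuna.create_study(direction="minimize")  # minimize validation MAE
study.optimize(objective, n_trials=100)
print(study.best_params)
```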

Received: 2024-04-10
Revised: 2024-04-25
Accepted: 2024-05-04
Published Online: 2024-06-05

© 2024 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
