Model construction of corrosion resistance of alloying elements for low alloy steel in marine atmospheric corrosive environment based on machine learning

Fulong Wang; Wei Liu; Yipu Sun; Bo Zhang; Hai Li; Longjun Chen; Bowen Hou; Haoyu Zhang

doi:10.1515/corrrev-2023-0162

Published/Copyright: September 12, 2024

From the journal Corrosion Reviews Volume 43 Issue 1

Abstract

This study constructed a machine learning model that accounts for the interaction of alloying elements in the corrosion resistance of low alloy steels in the marine atmospheric environment. Spearman’s analysis was applied, and the relationship between each alloying element and corrosion rate was evaluated based on random forest (RF) importance and Shapley additive explanation (SHAP) analysis. The prediction performance of six models (RF, multilayer perceptron (MLP), ridge regression (RR), K-nearest neighbor regression (KNN), logistic regression (LR), and support vector machine (SVM)) was compared using the selected dominant elements as input variables. A high-precision corrosion rate prediction model based on RF was then constructed. Finally, the generalizability of the model was demonstrated using 10 rows of steel corrosion data from several new marine atmospheric environments.

Keywords: marine atmospheric corrosion; machine learning; low alloy steel; alloying elements; random forest

1 Introduction

Low alloy steels are generally considered to be steels with a carbon content of less than 0.2 wt% and major alloying elements (Cu, Cr, Ni, P, Si, Mn, etc.) totaling no more than 5 wt% (Morcillo et al. 2013). They are widely used in marine atmospheric environments due to their good mechanical properties and low cost. However, chloride-rich marine atmospheres typically cause faster corrosion of low alloy steels (Daniel et al. 2023), which shortens their service life in this environment (Wu et al. 2023). Their corrosion resistance is related not only to the humidity, temperature and salinity of the marine atmospheric environment, but also to the characteristics of the rust layer (Zhang et al. 2021a). Alloying can improve the corrosion resistance of steels in the marine atmospheric environment (Fan et al. 2020).

Researchers have varied the contents of Cu, Cr, Ni, Nb, Mo, and other alloying elements in low alloy steels to explore their optimal corrosion resistance (Wei et al. 2021; Zhang et al. 2021b). For example, Liu et al. (2014) compared the corrosion results of 3Ni steel with those of conventional weathering steels and showed that Ni could form NiFe₂O₄ nano-networks, which improve the steel’s weathering resistance. Adding Cu to Ni–Mo steel could refine grain size, increase the proportion of high angle grain boundaries, and accelerate the deposition of corrosion product films on the steel surface (Zhang et al. 2022). Cr reacts with the steel matrix to produce a passivation film, which enhances the corrosion performance of steel in chlorine environments and reduces the risk of corrosion (Guo et al. 2012). Results from an exposure experiment at Trat, Thailand, in a marine atmospheric environment, showed that the addition of 0.15 wt% Mo provided better corrosion resistance than the addition of 0.3 wt% Cu, because the formation of CaMoO₄ and Fe₂(MoO₄)₃ hindered further corrosion of the steel (Dong et al. 2022). To date, most studies, including our own, have relied on conventional experimental methods, which are limited by a large experimental workload and long test cycles. In particular, exposure experiments in the highly corrosive marine atmosphere are often time-consuming and costly, making it difficult to meet the demand for developing new low alloy steel types. In recent years, machine learning has shown great potential in data mining and model building, and it can be applied to alloy design (Roy et al. 2022).

Unlike traditional modeling methods, machine learning has advanced rapidly in predicting the corrosion resistance of steel due to its powerful regression ability and rich feature processing techniques. For example, Pei et al. (2020) used a machine learning method based on random forest (RF) to predict the corrosion current of steel at an exposure location (Qingdao) and screened out the important environmental factors that affect atmospheric corrosion. An artificial neural network (ANN) algorithm was applied to create a mapping for Ni–Cr–Mo–V steels that was used to predict corrosion currents and corrosion potentials (Hu et al. 2019). The total amount of various alloying elements was also used as an input feature to construct a machine learning corrosion rate prediction model for marine atmospheric environments (Yan et al. 2020). Yang et al. (2023a) tested four low alloy steels with different Sn contents (0 wt%, 0.10 wt%, 0.21 wt%, 0.32 wt%) and used a machine learning method to determine the optimal Sn content in the steel. In summary, machine learning is a feasible way to predict the corrosion rate of steel and thereby optimize its corrosion resistance (Lu et al. 2022). However, machine learning models for the corrosion resistance elements of low alloy steels in marine atmospheric environments have rarely been studied.

Therefore, in this study, we selected seven common metallic elements (Ni, Cu, Mo, Nb, Mn, Cr, and Ti) and four non-metallic elements (Si, C, P, and S) of low alloy steels. The ratio of test set to training set was chosen by comparing the root mean square error (RMSE) of different models. The importance of these 11 elements was ranked and their mechanism of action was analyzed using Spearman’s analysis and the Shapley additive explanation (SHAP) method. The most important elements were selected as inputs to compare the prediction accuracy of the six chosen models (RF, multilayer perceptron (MLP), ridge regression (RR), K-nearest neighbor regression (KNN), logistic regression (LR), and support vector machine (SVM)), and the best model was constructed. Finally, we compared the prediction results with newly collected steel corrosion rates to validate the accuracy of the models. This study demonstrates the application of machine learning methods in determining how the alloying element content in low alloy steels affects their corrosion rate.

2 Data construction and methods

2.1 Corrosion data

One-year corrosion rates for 108 low alloy steels were manually collected from the marine atmospheric dataset of the Chinese National Materials Corrosion and Protection Data Center (National Materials Corrosion and Protection Data Center 2023). The steel samples were cut into square slices of a few tens of millimeters and placed in the field at an angle of 45° to the horizontal. However, some of the raw corrosion data could not be used directly to build a model. For example, some samples in the collected dataset tables did not report the complete content of the alloying elements (or the contents were below the detection threshold), while others had a corrosion rate higher than a certain value (the specimen was fully corroded). Therefore, the raw dataset was preprocessed by correcting erroneous data (data cleaning) and filling in missing values to improve the accuracy of the machine learning model (Shin et al. 2019).

Samples with an obviously higher corrosion rate than the others were detected and rejected using the box plot shown in Figure 1. Elements without a labeled content were assigned the minimum value of the corresponding element. The dataset after processing is summarized in Table 1. Each row of data includes seven metallic elements, four non-metallic elements, corrosion time, temperature and corrosion rate. Feature scaling was used to adjust for alloying elements with a wide range of fluctuations prior to characterization and feature importance ranking.
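The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 1.5 × IQR whisker rule and min-max scaling are assumptions (the text does not state which box plot limits or scaling method were used), and the toy Ni and corrosion rate values are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) whisker limits of a conventional box plot.

    k = 1.5 is the common default; the source does not state the value used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def fill_missing_with_min(column):
    """Replace unreported entries (None) with the smallest reported value."""
    floor = min(v for v in column if v is not None)
    return [floor if v is None else v for v in column]

def min_max_scale(column):
    """Rescale a column to [0, 1] (assumed scaling); a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]

# Hypothetical toy records: Ni content (wt%) and corrosion rate (mm/y).
ni = [0.012, None, 1.20, 2.87, 0.50, 0.75]          # None = content not reported
rate = [0.024, 0.022, 0.018, 0.015, 0.020, 0.090]   # last value is an outlier

low, high = iqr_bounds(rate)
keep = [i for i, r in enumerate(rate) if low <= r <= high]  # rows inside the whiskers

ni_clean = fill_missing_with_min(ni)
ni_kept = [ni_clean[i] for i in keep]
rate_kept = [rate[i] for i in keep]
ni_scaled = min_max_scale(ni_kept)
```

Here the last sample falls outside the whiskers and is dropped, the unreported Ni content is replaced by the column minimum, and the remaining Ni values are rescaled so that elements with very different ranges become comparable.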

Figure 1:

Box plot of the distribution of corrosion rate value points.

Table 1:

List of features used in the corrosion rate prediction models.

Features		Unit	Descriptions	Data range
Material	Elements	wt%	C content	0.025–0.060
			S content	0.002–0.023
			P content	0.002–0.080
			Mn content	0.680–1.010
			Si content	0.130–0.940
			Cr content	0.001–0.920
			Mo content	0.001–1.230
			Ni content	0.012–2.870
			Ti content	0.003–0.091
			Cu content	0.001–1.210
			Nb content	0.004–0.024
Environmental	T	°C	Temperature range	18.5–22.2
Time	t	year	Immersion time	1
Corrosion rate	v	mm·y⁻¹	Annual corrosion depth, millimeter per year	0.015–0.024

2.2 Machine learning algorithms

2.2.1 Spearman’s correlation coefficient

Spearman’s correlation coefficient was used to evaluate the relationship between the alloying element and the corrosion rate (Song et al. 2023). As a nonparametric statistical method, it is suitable for assessing the correlation between two nonlinear variables. The following equation is used to obtain the Spearman’s coefficient (Song et al. 2023):

(1) $\rho = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)}$

where ρ denotes the Spearman’s correlation coefficient, d_i represents the rank difference of the ith sample in the two variables, and n is the total number of samples. This equation is used to evaluate the correlation between each element and the corrosion rate, with values between 0 and 1 indicating a positive correlation, values between −1 and 0 indicating a negative correlation, and a value of 0 indicating no monotonic relationship (Yang et al. 2023a).
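As a worked illustration of Eq. (1), the short sketch below ranks two variables and applies the rank-difference formula directly. The five Ni contents and corrosion rates are hypothetical values chosen for the example, and the helper assumes there are no tied values, which is the case in which Eq. (1) is exact.

```python
def ranks(values):
    """Assign ranks 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman's coefficient from rank differences, following Eq. (1)."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data: Ni content (wt%) and corrosion rate (mm/y) for five steels.
ni_wt = [0.1, 0.5, 1.0, 1.8, 2.8]
rate = [0.024, 0.021, 0.022, 0.017, 0.015]

rho = spearman_rho(ni_wt, rate)  # sum of d_i^2 is 38, so rho = 1 - 228/120 = -0.9
```

The strongly negative coefficient in this toy case corresponds to the reading used in the text: a negative value means that higher element content goes with a lower corrosion rate.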

2.2.2 Shapley additive explanations

SHAP is used to explain the individual predictions of machine learning models, which helps in understanding the mapping between the input alloying element features and corrosion rate (Oviedo et al. 2022). SHAP is based on the Shapley value from cooperative game theory and provides an interpretable analysis that gives deep insight into model predictions. The Shapley value of the feature element i is given by the following equations (Ekanayake et al. 2022):

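(2) $\mathrm{Shap}(i) = \sum_{k \subseteq M \setminus \{i\}} \dfrac{|k|!\,\left(N-|k|-1\right)!}{N!} \left[ f_x\left(k \cup \{i\}\right) - f_x(k) \right]$

(3) $f_x(k) = E\left[ f(x) \mid x_k \right]$

where k denotes a subset of the features that does not contain i, |k| is the number of features in k, M is the set of all features, and N is the total number of features. The expected value of the model output conditioned on the features in subset k is denoted by E[f(x) | x_k].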

Stojic et al. (2019) used the SHAP method to investigate the characteristic correlations of the assay parameters and to identify factors controlling the wet deposition of xylene, ethylbenzene, and toluene. The color of a single point in the SHAP plot indicates the value of the feature: as the feature value increases, the point changes from blue to red. A positive SHAP value indicates that the feature increases the predicted target, while a negative SHAP value indicates that the feature reduces it (Li 2022). SHAP can therefore provide insights into feature importance and interactions between features.
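To make the Shapley value concrete, the sketch below computes exact values by enumerating every subset k of the remaining features and applying the standard weight |k|!(N − |k| − 1)!/N!. It is an illustration only: `toy_value` is a hypothetical stand-in for f_x(k) with made-up contributions for three features, whereas the SHAP library used in practice estimates these expectations from a trained model.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets of the other features."""
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |k|! (N - |k| - 1)! / N! for subsets of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                known = set(subset)
                total += weight * (value_fn(known | {i}) - value_fn(known))
        phi.append(total)
    return phi

def toy_value(known):
    """Hypothetical f_x(k): expected corrosion rate given the known features.

    Feature 0 = Ni, 1 = Cu, 2 = Mo. All numbers are invented for illustration.
    """
    value = 0.024                      # baseline prediction with nothing known
    if 0 in known:
        value -= 0.006                 # Ni lowers the prediction
    if 1 in known:
        value -= 0.002                 # Cu lowers the prediction
    if 0 in known and 1 in known:
        value -= 0.001                 # extra Ni-Cu interaction term
    return value

phi = shapley_values(toy_value, 3)
```

In this toy case the interaction term is split equally between Ni and Cu, Mo receives a value of zero because it never changes the output, and the three values sum to the difference between the full prediction and the baseline, which is the additivity property that gives SHAP its name.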

2.2.3 Algorithms for corrosion rate prediction model

Six machine learning algorithms (RF, MLP, RR, KNN, LR, and SVM) were selected to construct a predictive model for the effect of elements on steel corrosion rate. As an ensemble learning algorithm comprising multiple decision trees, RF makes predictions by majority voting or averaging. MLP learns weight parameters from training data so that it can approximate nonlinear functions for complex pattern recognition and prediction in a variety of tasks. RR handles multicollinearity and prevents overfitting by introducing an L2 regularization term. KNN predicts the target of a sample from its nearest neighbors in feature space and is used here for regression. LR uses a logistic (sigmoid) function to map a linear combination of features to a probability range between 0 and 1. SVM with a linear kernel constructs regression models by finding the maximum margin; it is suitable for both regression and classification tasks and has strong generalization properties. Banded linear kernel is one of the variants. The mechanisms of all the above models can be found in the literature (Cai et al. 2020; Ramprasad et al. 2017; Xu et al. 2023). The open source Python module Scikit-learn was used to implement machine learning (Pedregosa et al. 2011).
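A minimal sketch of how such a set of models might be instantiated in Scikit-learn is given below. The hyperparameters, the standardization step, and the synthetic data are all assumptions for illustration; the source does not report its settings. LR is left out of the sketch because Scikit-learn's `LogisticRegression` is a classifier that does not accept a continuous target, and the text does not say how it was adapted to corrosion rate regression.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def build_models(seed=0):
    """Return regressors for five of the six algorithms (assumed settings)."""
    return {
        "RF": RandomForestRegressor(n_estimators=100, random_state=seed),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
        ),
        "RR": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),        # L2 penalty
        "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVR(kernel="linear")),   # linear kernel
    }

# Synthetic stand-in data: 60 samples, 8 element contents scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 8))
y = 1.0 - 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 0.02, size=60)

fitted = {name: model.fit(X, y) for name, model in build_models().items()}
predictions = {name: model.predict(X) for name, model in fitted.items()}
```

Wrapping the scale-sensitive models (MLP, RR, KNN, SVM) in a pipeline keeps the scaling parameters tied to the training data, while RF is left unscaled because tree splits do not depend on feature scale.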

2.3 Model construction process

Spearman’s correlation analysis was used to evaluate the correlation between each feature and corrosion rate. Regression analysis was performed using the RF model and the features were ranked in order of importance. The SHAP method was also used to visually demonstrate the effect of features on the corrosion rate of low alloy steels (Liu and Li 2023). These steps demonstrated the feasibility of machine learning in corrosion data mining. The original dataset was divided into a training set (for optimizing the corrosion rate model) and a test set (for characterizing the model prediction accuracy) at several different division ratios (40 %, 50 %, 60 %, 70 %, and 80 % of the data for training, with the remainder used for testing). K-fold cross-validation was used to improve training effectiveness. The prediction performance of the above machine learning models (RF, MLP, RR, KNN, LR and SVM) was compared. Finally, the generalization ability of the model was demonstrated by comparing its predictions with several sets of newly collected corrosion rate data.
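The splitting and cross-validation steps of this process can be sketched as follows. The data are synthetic, and the random seeds, number of trees, and choice of 5 folds are assumptions made for the example rather than settings reported in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data: 100 samples with 8 scaled element contents.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 8))
y = 1.0 - 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 0.02, size=100)

# Compare test-set RMSE across the training fractions named in the text.
rmse_by_fraction = {}
for train_fraction in (0.4, 0.5, 0.6, 0.7, 0.8):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_fraction, random_state=0
    )
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    rmse_by_fraction[train_fraction] = float(np.sqrt(mse))

# K-fold cross-validation on the full dataset (5 folds assumed here).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X, y, cv=cv, scoring="r2",
)
```

Repeating the loop for each candidate algorithm would give a comparison of RMSE against training fraction for every model, and the mean of `cv_r2` gives an accuracy estimate that is less sensitive to a single random split.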

2.4 Performance evaluation of models

The fitting errors of the training and testing samples of the different models were simultaneously evaluated using the coefficient of determination (R²) and the root mean square error (RMSE) to assess the accuracy of the model’s prediction results. They are represented by the following formulas (Zhi et al. 2019):

(4) $R^{2} = 1 - \dfrac{\sum_{n=1}^{N} \left(f_n - y_n\right)^{2}}{\sum_{n=1}^{N} \left(\bar{y} - y_n\right)^{2}}$

(5) $\mathrm{RMSE} = \sqrt{\dfrac{\sum_{n=1}^{N} \left(f_n - y_n\right)^{2}}{N}}$

where N denotes the sample size, f_n denotes the predicted corrosion rate, y_n denotes the measured value, and ȳ denotes the average of the measured values. The closer R² is to 1 and the smaller the RMSE, the better the predictive performance of the model.
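Equations (4) and (5) translate directly into a few lines of code. The sketch below uses arbitrary illustrative numbers rather than corrosion data, so that the arithmetic is easy to follow by hand.

```python
from math import sqrt

def r2_score(measured, predicted):
    """Coefficient of determination, following Eq. (4)."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((f - y) ** 2 for f, y in zip(predicted, measured))
    ss_tot = sum((mean_y - y) ** 2 for y in measured)
    return 1 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean square error, following Eq. (5)."""
    n = len(measured)
    return sqrt(sum((f - y) ** 2 for f, y in zip(predicted, measured)) / n)

# Arbitrary illustrative values (not corrosion data).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

r2 = r2_score(measured, predicted)   # residual sum 0.5, total sum 5.0, so R2 = 0.9
error = rmse(measured, predicted)    # sqrt(0.5 / 4), about 0.354
```

Note that RMSE carries the units of the target, so its magnitude has to be read against the range of the measured values, whereas R² is dimensionless.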

3 Results and discussion

3.1 Comparison of machine learning algorithms

In general, the larger the dataset is, the more information it is likely to contain, and the better a model can extract the relationships between input features and target attributes. Figure 2 shows how the prediction accuracy varies with the proportion of the training set. As the proportion of the training set increases, the accuracy of all six prediction models generally increases, and the MLP, LR and RF models have the highest prediction accuracy at a training set proportion of 80 %. The accuracy of KNN, RR, and SVM decreases slightly once the training set proportion exceeds 75 %. This may be due to reduced generalization ability caused by overfitting of these algorithms (Deng et al. 2015). LR shows large fluctuations in RMSE when the training set proportion is below 55 %, which could also be caused by overfitting or randomness. Because the dataset of this study is relatively small, a ratio of 80 % training set to 20 % test set was chosen.

Figure 2:

Comparison of the performance of random forest (RF), multilayer perceptron (MLP), ridge regression (RR), k-nearest neighbor model (KNN), logistic regression (LR), and support vector machine (SVM) for corrosion rate prediction with different training and test set partitioning ratios.

3.2 Key alloying elements description analysis and feature selection

Figure 3 shows the Spearman’s correlation between the 11 alloying elements and the marine atmospheric corrosion rate. Spearman’s coefficients are shown in each box, with a darker box color indicating that the element inhibits corrosion more strongly. As shown in Figure 3, the elements in descending order of correlation with corrosion rate are Ni, Cu, Mo, C, Nb, Ti, Mn, Cr, Si, P, and S. Ni, Cu, Mo, Nb, Ti, Mn, Cr, and Si show a negative correlation with corrosion rate, indicating that these elements inhibit the corrosion of low alloy steel. C and P are positively correlated with corrosion rate, indicating that they reduce corrosion resistance. This result is in accordance with existing studies (Guo et al. 2012; Liu et al. 2014; Sun et al. 2014; Yang et al. 2023b, 2023c; Zhang et al. 2022, 2023a, 2023b; Zhou et al. 2013). Element S segregates at the oxide/matrix interface, weakening the oxide layer adhesion and thus reducing the oxidation resistance of the alloy (Zhan et al. 2023). The inhibition of corrosion rate by S shown in the heat map may be due to the small amount of S in the collected low alloy steels, which fluctuates over a small range. The same reasoning applies to P. Adding P to steel could act as an anode depolarizer, allowing for more uniform dissolution of the steel surface during the corrosion process and faster transformation of Fe²⁺ into Fe³⁺, which increases the rate of generation of α-FeOOH (Raghavan 2004; Stewart et al. 2000). The effect of an alloying element may be influenced by other elements and by the environment in which the steel is used. The small database may also lead to a slight deviation in the results, so further experimental validation is needed.

Figure 3:

Spearman’s correlation plot for alloying elements. Spearman’s correlation coefficients are shown in each box, with darker colors indicating stronger corrosion rate inhibition by the element.

SHAP can be used to analyze the mechanism of action of individual alloying elements. Figure 4 shows the overall distribution of SHAP values for the main features of all the samples. These features are arranged on the vertical axis from highest to lowest importance. Red and blue colors indicate higher and lower values of the features (Roy et al. 2023). More positive SHAP values indicate higher corrosion rate. From the figure, it can be seen that Ni has the greatest effect on the corrosion rate. The lower the content of Ni, the more positive the value of SHAP, which indicates that Ni reduces the corrosion rate. Similarly, Ni, Mo and Cr all improve the corrosion resistance of low alloy steels, which is consistent with the findings obtained in Figure 3. The remaining features such as S, P and Ti present less effect on corrosion.

Figure 4:

Importance plot of SHAP variables for corrosion data. Each point represents a sample, and the points are colored from low (blue) to high (red) according to the corresponding feature value. Positive SHAP values indicate an increase in corrosion rate and negative SHAP values indicate a decrease in corrosion rate. The features are listed from top to bottom on the vertical axis in order of importance.

The RF model was used to further determine feature importance. Figure 5 shows the importance ranking of the 11 alloying elements based on the RF model. The order of importance is Ni, Cu, Mo, Nb, Cr, C, Mn, Si, S, P, and Ti. This is generally consistent with the ordering given in Figures 3 and 4. Due to the limited sample size, the contents of S, P, and Ti varied little and did not have a significant effect on the corrosion rate. Therefore, the top eight features (Ni, Cu, Mo, Nb, Cr, C, Mn, and Si) were chosen as inputs in the subsequent model selection to ensure the accuracy and generalization ability of the model, and S, P, and Ti were excluded from the model inputs.

Figure 5:

Order of importance of elemental content on corrosion rate.
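The ranking and selection step described above can be sketched with the impurity-based importances that Scikit-learn's random forest exposes. The data here are synthetic: the target is constructed so that the first two columns dominate, and the coefficients are invented for the example and carry no metallurgical meaning. Whether the authors used impurity-based or another importance measure is not stated, so that choice is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Column labels only; the synthetic values below are not real compositions.
elements = ["Ni", "Cu", "Mo", "Nb", "Cr", "C", "Mn", "Si", "S", "P", "Ti"]

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(200, len(elements)))
# Invented target: column 0 dominates, column 1 is second, the rest are noise.
y = 1.0 - 0.6 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 0.01, size=200)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Rank features by impurity-based importance, highest first.
ranking = sorted(
    zip(elements, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Keep the eight highest-ranked features as model inputs.
top_eight = [name for name, _ in ranking[:8]]
X_selected = X[:, [elements.index(name) for name in top_eight]]
```

Impurity-based importances sum to 1 across features, so they are best read as relative shares; features whose values barely vary in the dataset tend to receive small importances regardless of their physical effect, which matches the caution the text expresses about S, P, and Ti.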

3.3 Modeling of corrosion rate prediction

Ni, Cu, Mo, Nb, Cr, C, Mn, and Si were used as input features to select the best model. RMSE and R² were used to evaluate the reliability of the six machine learning models (RF, MLP, RR, KNN, LR and SVM). A comparison of the accuracy of the models is shown in Figure 6 and the specific values are listed in Table 2. Figure 6a shows the R² of the six models on the training and test sets. All models have a higher R² on the training set than on the test set. RF, SVM and LR have the highest R² values, indicating better model fits. KNN shows the largest gap in R² between the training set and the test set, which may be due to overfitting. Figure 6b compares the RMSE of the six models. The RMSE values of RF, SVM, MLP and LR are smaller than those of KNN and RR. Figure 6a and b indicate that RF, SVM and LR have lower model error and higher fitting accuracy.

Figure 6:

(a) R² for the training and test sets of six chosen models. (b) RMSE for training and test sets of six chosen models. The dataset is randomly divided into 80 % training set and 20 % test set.

Table 2:

Prediction accuracy of corrosion rate prediction models.

Models	Training set		Test set
Models	RMSE	R²	RMSE	R²
Random forest	0.05	0.93	0.11	0.85
Support vector machine	0.10	0.82	0.10	0.77
Multilayer perceptron	0.10	0.79	0.11	0.71
Ridge regression	0.10	0.76	0.14	0.66
k-Nearest neighbor	0.11	0.75	0.16	0.58
Logistic regression	0.10	0.80	0.12	0.76

The coefficient of determination (R²) and root mean square error (RMSE) were calculated for the samples in the training and test sets, respectively.

Figure 7 compares the predicted and actual values of the RF, SVM, and MLP models on the training and test sets. The vertical axis represents the predicted corrosion rate, the horizontal axis represents the measured corrosion rate, and the diagonal line is the perfect prediction line. The closer the data points lie to the 45° diagonal in the graph, the better the model (Zhi et al. 2019). Figure 7a1, b1, c1 shows the results of the models on the training set, while Figure 7a2, b2, c2 shows the results on the test set. The RF model has a smaller RMSE and a larger R² on the training set than on the test set, which indicates that the model is more accurate on the training set. Similarly, both SVM and MLP have slightly higher prediction accuracy on the training set than on the test set. On the training set, the three models rank in order of accuracy as RF, SVM, and MLP, with R² of 0.93, 0.82, and 0.79 and RMSE of 0.05, 0.10, and 0.10. The accuracy ranking on the test set is the same as on the training set, with R² of 0.85, 0.77, and 0.71, and RMSE of 0.11, 0.10, and 0.11. The RF model has the lowest fitting error, with RMSE and R² of 0.05 and 0.93 on the training set and 0.11 and 0.85 on the test set.

Figure 7:

(a1) RF, (b1) SVM and (c1) MLP model fits to the training samples, and (a2) RF, (b2) SVM and (c2) MLP model fits to the test samples.

The fitting abilities of RR, KNN, and LR are compared in Figure 8a1, b1, c1, and Figure 8a2, b2, c2. The RR, KNN and LR models all have smaller RMSE and larger R² on the training set than on the test set. Figure 8b1 and 8b2 show that the RMSE and R² of the KNN model are 0.11 and 0.75 on the training set and 0.16 and 0.58 on the test set. KNN has a low accuracy on the test set, and there is a large difference in its accuracy between the test set and the training set. This may be because, although KNN can handle nonlinear data, it requires a large sample size to train the model (Zhang 2016). Therefore, KNN is not accurate enough to give reliable results with few training samples. On the training set, the three models rank in order of accuracy as LR, RR, and KNN, with R² of 0.80, 0.76, and 0.75 and RMSE of 0.10, 0.10, and 0.11. The ranking on the test set is the same as on the training set, with R² of 0.76, 0.66, and 0.58, and RMSE of 0.12, 0.14, and 0.16. Among these three models, LR has the lowest fitting error, with RMSE and R² of 0.10 and 0.80 for the training set and 0.12 and 0.76 for the test set. R² and RMSE can be used to evaluate the prediction accuracy (Diao et al. 2021). However, the prediction accuracy of the models shown in Figures 7 and 8 is lower on the test set than on the training set, which may be because the amount of data is insufficient to adequately reflect the relationship between the elements and corrosion rate.

Figure 8:

(a1) RR, (b1) KNN and (c1) LR model fit results for the training samples, and (a2) RR, (b2) KNN and (c2) LR model fit results for the test samples.

From the results in Figures 7 and 8, it is clear that the RF predictions are generally accurate, with only a few cases deviating from the 45° diagonal. This indicates that the RF model is reliable in dealing with corrosion datasets characterized by small samples, steep flow patterns, and nonlinearities. K-fold cross-validation was also used to improve the training of the model (Figures S6 and S7). The results indicated that the RF model still has the highest prediction accuracy and the smallest error under 5-fold and 10-fold cross-validation. In summary, RF achieves better performance than the five other commonly used machine learning algorithms in predicting the corrosion rate of steel.

3.4 Generalization capabilities of machine learning models

To further validate the model accuracy, 10 rows of low alloy steel corrosion data (from the National Materials Corrosion and Protection Data Center (National Materials Corrosion and Protection Data Center 2023)) independent of the previous dataset were collected as a validation set. The newly collected low alloy steels have different elemental contents from the steels used to train the model. Therefore, these 10 rows of data can be used to verify the generalization ability of the model. The corrosion data are listed in Table S1. The elemental contents of the new steel grades were input into the six models to predict the corrosion rate of each sample, and the predicted corrosion rates were then compared with the measured ones. The comparisons for RF and for the remaining five models (SVM, MLP, RR, KNN, LR) are shown in Figure 9 and Figures S1–S5. The results demonstrate that RF has higher predictive power than the other five models. In Figure 9, blue represents the measured corrosion rate and red represents the corrosion rate predicted by the model. Sample 8 shows a slightly larger difference between the actual and predicted corrosion rates, while the remaining samples show only small differences. This indicates that the RF model has high prediction accuracy and excellent generalization capability.

Figure 9:

Comparison between measured corrosion rates and corrosion rates predicted by the RF machine learning model for the 10 new rows of corrosion data.

In the feature selection process, some metallic elements (Ni, Cu, Mo, and Cr) and non-metallic elements (C, P, Si) which have a large influence on corrosion were used as input features. In previous studies, corrosion rate prediction models have also used elements as input features, but the models are limited to certain steels because of the low alloy content and the lack of some elements that have an impact on corrosion (Wei et al. 2021). Zhi et al. (2020) applied a deep forest algorithm to construct a prediction model based on the collected corrosion data of 17 steels. However, the dataset used to train that model lacked Mo, an important element affecting the corrosion resistance of low alloy steels, as an input feature. This greatly limits the generalization ability of the predictive model. The RF corrosion rate prediction model constructed in this study uses 11 elements as feature inputs for the first time and performs well in terms of generalization ability in marine atmospheric environments. We also screened out the alloying elements with the largest influence on corrosion rate in the marine atmospheric environment. Finally, as shown in Table 3, five low alloy steel compositions with the best corrosion resistance were selected by the RF model. This provides a useful reference for guiding the development of corrosion-resistant materials and the design of related experiments.

Table 3:

Five corrosion-resistant low alloy steel compositions obtained by RF modeling.

	%C	%Si	%Mn	%P	%S	%Cr	%Cu	%Ni	%Mo	%Ti	%Nb	Corrosion rate (mm·y⁻¹)
1	0.040	0.315	0.956	0.074	0.023	0.199	0.107	0.753	0.810	0.069	0.008	0.0160
2	0.046	0.758	0.890	0.050	0.022	0.490	1.160	0.656	0.789	0.031	0.014	0.0162
3	0.031	0.807	0.846	0.067	0.023	0.690	0.580	0.821	0.777	0.028	0.017	0.0158
4	0.047	0.462	0.960	0.031	0.023	0.485	0.362	1.067	0.809	0.027	0.005	0.0159
5	0.049	0.213	0.884	0.047	0.023	0.407	0.339	1.262	0.758	0.043	0.017	0.0160

4 Conclusions

In this study, an RF-based machine learning model for predicting the corrosion rate of low alloy steel in a marine atmospheric environment was constructed. Based on the data collected, the relationship between corrosion rate and input alloying elements was analyzed. Ni and Cu have a significant effect on increasing the corrosion resistance of low alloy steels, and several other metallic elements (Mo, Nb, Mn and Cr) also inhibit steel corrosion. In addition, the non-metallic elements C and Si reduce the corrosion resistance of low alloy steels. Finally, a random forest corrosion rate prediction model with an R² value of 0.93 between predicted and measured corrosion rates was developed. The model has high prediction accuracy and generalization ability for multiple samples containing various elements. The RF model also identified five low alloy steel compositions with the best corrosion resistance. Based on these results, the machine learning method clarifies the importance of the alloying elements of low alloy steel for resistance to marine atmospheric corrosion, which provides guidance for subsequent experiments and the development of new steel types.

Corresponding author: Wei Liu, Corrosion and Protection Center, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, China, E-mail: weiliu@ustb.edu.cn

Funding source: National Key R&D Program of China

Award Identifier / Grant number: 2016YFE0203600

Funding source: National Natural Science Foundation of China

Award Identifier / Grant number: 51571027

Research ethics: Not applicable.
Author contributions: Fulong Wang: Data curation, writing - original draft, formal analysis. Wei Liu: funding acquisition, writing - review & editing. Yipu Sun: validation, methodology. Bo Zhang: formal analysis. Hai Li: investigation. Longjun Chen: project administration. Bowen Hou: software. Haoyu Zhang: software. The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Research funding: The authors are grateful to the National Key R&D Program of China (grant no. 2016YFE0203600) and the National Natural Science Foundation of China (no. 51571027).
Data availability: Not applicable.

References

Cai, J., Chu, X., Xu, K., Li, H. and Wei, J. (2020). Machine learning driven new material discovery. Nanoscale Advances 2: 3115–3130, https://doi.org/10.1039/d0na00388c,Search in Google Scholar PubMed PubMed Central

Daniel, E.F., Wang, C., Li, C., Dong, J., Udoh, I.I., Zhang, D., Zhong, W., and Zhong, S. (2023). Evolution of corrosion degradation in galvanised steel bolts exposed to a tropical marine environment. J. Mater. Res. Technol. 27: 5177–5190, https://doi.org/10.1016/j.jmrt.2023.10.295.Search in Google Scholar

Deng, B.-C., Yun, Y.-H., Liang, Y.-Z., Cao, D.-S., Xu, Q.-S., Yi, L.-Z., and Huang, X. (2015). A new strategy to prevent over-fitting in partial least squares models based on model population analysis. Anal. Chim. Acta 880: 32–41, https://doi.org/10.1016/j.aca.2015.04.045.Search in Google Scholar PubMed

Diao, Y., Yan, L., and Gao, K. (2021). Improvement of the machine learning-based corrosion rate prediction model through the optimization of input features. Mater. Des. 198: 109326, https://doi.org/10.1016/j.matdes.2020.109326.Search in Google Scholar

Dong, B., Liu, W., Chen, L., Zhang, T., Fan, Y., Zhao, Y., Li, S., Yang, W., and Banthukul, W. (2022). Optimize Ni, Cu, Mo element of low Cr-steel rebars in tropical marine atmosphere environment through two years of corrosion monitoring. Cement Concrete Composites 125: 104317, https://doi.org/10.1016/j.cemconcomp.2021.104317.Search in Google Scholar

Ekanayake, I.U., Meddage, D.P.P., and Rathhayake, U. (2022). A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Construct. Mat. 16: e01059, https://doi.org/10.1016/j.cscm.2022.e01059.Search in Google Scholar

Fan, Y., Liu, W., Li, S., Chowwanonthapunya, T., Wongpat, B., Zhao, Y., Dong, B., Zhang, T., and Li, X. (2020). Evolution of rust layers on carbon steel and weathering steel in high humidity and heat marine atmospheric corrosion. J. Mater. Sci. Technol. 39: 190–199, https://doi.org/10.1016/j.jmst.2019.07.054.Search in Google Scholar

Guo, S., Xu, L., Zhang, L., Chang, W., and Lu, M. (2012). Corrosion of alloy steels containing 2% chromium in CO2 environments. Corros. Sci. 63: 246–258, https://doi.org/10.1016/j.corsci.2012.06.006.Search in Google Scholar

Hu, Q., Liu, Y., Zhang, T., Geng, S., and Wang, F. (2019). Modeling the corrosion behavior of Ni-Cr-Mo-V high strength steel in the simulated deep sea environments using design of experiment and artificial neural network. J. Mater. Sci. Technol. 35: 168–175, https://doi.org/10.1016/j.jmst.2018.06.017.Search in Google Scholar

Li, Z. (2022). Extracting spatial effects from machine learning model using local interpretation method: an example of SHAP and XGBoost. Comput. Environ. Urban Syst. 96: 101845, https://doi.org/10.1016/j.compenvurbsys.2022.101845.Search in Google Scholar

Liu, M. and Li, W. (2023). Prediction and analysis of corrosion rate of 3C steel using interpretable machine learning methods. Mater. Today Commun. 35: 106408, https://doi.org/10.1016/j.mtcomm.2023.106408.Search in Google Scholar

Liu, R., Chen, X., and Shi, Q. (2014). Effect of Ni on corrosion resistance of weathering steels in wet/dry environments. Adv. Mat. Res. 989: 420–424, https://doi.org/10.4028/www.scientific.net/amr.989-994.420.Search in Google Scholar

Lu, Z., Si, S., He, K., Ren, Y., Li, S., Zhang, S., Fu, Y., Jia, Q., Jiang, H.B., Song, H., et al.. (2022). Prediction of Mg alloy corrosion based on machine learning models. Adv. Mat. Sci. Eng. 2022: 9597155, https://doi.org/10.1155/2022/9597155.Search in Google Scholar

Morcillo, M., Chico, B., Díaz, I., Cano, H., and Delafuente, D. (2013). Atmospheric corrosion data of weathering steels. A review. Corros. Sci. 77: 6–24, https://doi.org/10.1016/j.corsci.2013.08.021.Search in Google Scholar

National Materials Corrosion and Protection. (2023). Data center, Available at: http://www.corrdata.org.cn/ (Accessed 20 September 2023).Search in Google Scholar

Oviedo, F., Ferres, J.L., Buonassisi, T., and Butler, K.T. (2022). Interpretable and explainable machine learning for materials science and chemistry. Acc. Mater. Res. 3: 597–607, https://doi.org/10.1021/accountsmr.1c00244.Search in Google Scholar

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., and Dubourg, V.J.T.J.O.M.L.R. (2011). Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12: 2825–2830.Search in Google Scholar

Pei, Z., Zhang, D., Zhi, Y., Yang, T., Jin, L., Fu, D., Cheng, X., Terryn, H.A., Mol, J.M.C., and Li, X. (2020). Towards understanding and prediction of atmospheric corrosion of an Fe/Cu corrosion sensor via machine learning. Corros. Sci. 170: 108697, https://doi.org/10.1016/j.corsci.2020.108697.Search in Google Scholar

Raghavan, V. (2004). C-Fe-P (carbon-iron-phosphorus). J. Phase Equilib. Diffus. 25: 541–542, https://doi.org/10.1361/15477020421142.Search in Google Scholar

Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A., and Kim, C. (2017). Machine learning in materials informatics: recent applications and prospects. npj Comput. Mater. 3: 54, https://doi.org/10.1038/s41524-017-0056-5.Search in Google Scholar

Roy, I., Feng, B., Roychowdhury, S., Ravi, S., Umretiya, R., Reynolds, C., Ghosh, S., Rebak, R. and Hoffman, A. (2023). Understanding oxidation of Fe-Cr-Al alloys through explainable artificial intelligence. MRS Communications 13, https://doi.org/10.1557/s43579-022-00315-0,Search in Google Scholar

Roy, A., Roy, I., Santddonato, L.J., and Baiasubramanian, G. (2022). Data-guided feature identification for predicting specific heat of multicomponent alloys. JOM 74: 1406–1413, https://doi.org/10.1007/s11837-022-05183-6.Search in Google Scholar

Shin, D., Yamamoto, Y., Brady, M.P., Lee, S., and Haynes, J.A. (2019). Modern data analytics approach to predict creep of high-temperature alloys. Acta Mater. 168: 321–330, https://doi.org/10.1016/j.actamat.2019.02.017.Search in Google Scholar

Song, Y., Wang, Q.Y., Zhang, X., Dong, L., Bai, S., Zeng, D., Zhang, Z., Zhang, H., and Xi, Y. (2023). Interpretable machine learning for maximum corrosion depth and influence factor analysis. npj Mater. Degrad. 7: 9, https://doi.org/10.1038/s41529-023-00324-x.Search in Google Scholar

Stewart, J.W., Charles, J.A., and Wallacha, E. (2000). Iron–phosphorus–carbon system: Part 1 – mechanical properties of low carbon iron–phosphorus alloys. Mater. Sci. Technol. 16: 275–282, https://doi.org/10.1179/026708300101507839.Search in Google Scholar

Stojic, A., Stanic, N., Vukovic, G., Stanisic, S., Perisic, M., Šostaric, A., and Lazic, L. (2019). Explainable extreme gradient boosting tree-based prediction of toluene, ethylbenzene and xylene wet deposition. Sci. Total Environ. 653: 140–147, https://doi.org/10.1016/j.scitotenv.2018.10.368.Search in Google Scholar PubMed

Sun, F., Li, X., and Cheng, X. (2014). Effects of carbon content and microstructure on corrosion property of new developed steels in acidic salt solutions. Acta Metallurgica Sinica 27: 115–123, https://doi.org/10.1007/s40195-013-0007-1.Search in Google Scholar

Wei, X., Fu, D., Chen, M., Wu, W., Wu, D., and Liu, C. (2021). Data mining to effect of key alloying elements on corrosion resistance of low alloy steels in Sanya seawater environment alloying elements. J. Mater. Sci. Technol. 64: 222–232, https://doi.org/10.1016/j.jmst.2020.01.040.Search in Google Scholar

Wu, W., Qin, L., Cheng, X., Xu, F., and Li, X. (2023). Microstructural evolution and its effect on corrosion behavior and mechanism of an austenite-based low-density steel during aging. Corros. Sci. 212: 110936, https://doi.org/10.1016/j.corsci.2022.110936.Search in Google Scholar

Xu, L., Wang, Y., Mo, L., Tang, Y., Wang, F., and Li, C. (2023). The research progress and prospect of data mining methods on corrosion prediction of oil and gas pipelines. Engineering Failure Analysis 144: 106951, https://doi.org/10.1016/j.engfailanal.2022.106951.Search in Google Scholar

Yan, L., Diao, Y., Lang, Z., and Gao, K. (2020). Corrosion rate prediction and influencing factors evaluation of low-alloy steels in marine atmosphere using machine learning approach. Sci. Technol. Adv. Mat. 21: 359–370, https://doi.org/10.1080/14686996.2020.1746196.Search in Google Scholar PubMed PubMed Central

Yang, L., Yang, X., Wang, B., Wang, Z., Cheng, X., and Li, X. (2023a). Corrosion resistance optimization of Sn-additional low-alloy high strength steel by data-driven identification and field exposure verification. J. Mater. Res. Technol. 25: 3624–3641, https://doi.org/10.1016/j.jmrt.2023.06.159.Search in Google Scholar

Yang, X., Jia, J., Li, X., Li, Q., Sun, Z., Du, C., and Li, X. (2023b). Enhanced hydrogen induced stress corrosion cracking resistance of Ni-advanced weathering steel by Ni and Mn modification. Constr. Build. Mater. 408: 133820, https://doi.org/10.1016/j.conbuildmat.2023.133820.Search in Google Scholar

Yang, Z., Yu, M., Han, C., Zhao, Z., Jia, X., Zhao, M., Li, S., and Liu, J. (2023c). Evolution and corrosion resistance of passive film with polarization potential on Ti-5Al-5Mo-5V-1Fe-1Cr alloy in simulated marine environments. Corros. Sci. 221: 111334, https://doi.org/10.1016/j.corsci.2023.111334.Search in Google Scholar

Zhan, X., Wang, D., Zhang, Z., and Zhang, J. (2023). Effect of trace sulfur on the hot corrosion resistance of Ni-base single crystal superalloy. Corros. Sci. 224: 111528, https://doi.org/10.1016/j.corsci.2023.111528.Search in Google Scholar

Zhang, H., Liu, X., Xu, Y., Zhao, L., Peng, T., Qin, C., Yu, R., Wang, Z., and Yan, C. (2023a). Comparison investigation on corrosion of SIMP and T91 steels exposed to liquid LBE at 450 °C: the role of Si on reducing oxidation rate. Corros. Sci. 225: 111553, https://doi.org/10.1016/j.corsci.2023.111553.Search in Google Scholar

Zhang, Z., Li, X., Yi, H., Xie, H., Zhao, Z., and Bai, P. (2023b). Clarify the role of Nb alloying on passive film and corrosion behavior of CoCrFeMnNi high entropy alloy fabricated by laser powder bed fusion. Corros. Sci. 224: 111510, https://doi.org/10.1016/j.corsci.2023.111510.Search in Google Scholar

Zhang, T., Liu, W., Chen, L., Dong, B., Yang, W., Fan, Y., and Zhao, Y. (2021a). On how the corrosion behavior and the functions of Cu, Ni and Mo of the weathering steel in environments with different NaCl concentrations. Corros. Sci. 192: 109851, https://doi.org/10.1016/j.corsci.2021.109851.Search in Google Scholar

Zhang, T., Xu, X., Li, Y., and Lv, X. (2021b). The function of Cr on the rust formed on weathering steel performed in a simulated tropical marine atmosphere environment. Constr. Build. Mater. 277: 122298, https://doi.org/10.1016/j.conbuildmat.2021.122298.Search in Google Scholar

Zhang, T., Liu, W., Dong, B., Mao, R., Sun, Y., and Chen, L. (2022). Corrosion of Cu-doped Ni–Mo low-alloy steel in a severe marine environment. J. Phys. Chem. Solids 163: 110584, https://doi.org/10.1016/j.jpcs.2022.110584.Search in Google Scholar

Zhang, Z. (2016). Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med. 4: 218, https://doi.org/10.21037/atm.2016.03.37.Search in Google Scholar PubMed PubMed Central

Zhi, Y., Fu, D., Zhang, D., Yang, T. and Li, X. (2019). Prediction and knowledge mining of outdoor atmospheric corrosion rates of low alloy steels based on the Random Forests approach. Metals 9: 383, https://doi.org/10.3390/met9030383,Search in Google Scholar

Zhi, Y., Yang, T., and Fu, D. (2020). An improved deep forest model for forecast the outdoor atmospheric corrosion rate of low-alloy steels. J. Mater. Sci. Technol. 49: 202–210, https://doi.org/10.1016/j.jmst.2020.01.044.Search in Google Scholar

Zhou, Y., Chen, J., Xu, Y., and Liu, Z. (2013). Effects of Cr, Ni and Cu on the corrosion behavior of low carbon microalloying steel in a Cl − containing environment. J. Mater. Sci. Technol. 29: 168–174, https://doi.org/10.1016/j.jmst.2012.12.013.Search in Google Scholar