Application of machine learning in material corrosion research

Shuaijie Ma; Yanxia Du; Shasha Wang; Yanjing Su

doi:10.1515/corrrev-2022-0089



Published/Copyright: April 17, 2023


From the journal Corrosion Reviews, Volume 41, Issue 4

Abstract

The application of machine learning (ML) to corrosion research has become an important trend in corrosion science in recent years. In this paper, the feature extraction methods for corrosion data and the ML algorithms commonly used in the corrosion field (including artificial neural networks, support vector machines, ensemble learning and other widely used algorithms) are introduced. Then, the characteristics of the different algorithms and their application scenarios in corrosion prediction are summarized. Finally, the development trends of ML in the material corrosion field are discussed.

Keywords: corrosion modelling; data mining; machine learning

1 Introduction

Material corrosion seriously affects the service life of metallic structures, so the prediction of material corrosion has long been a focus of attention in the corrosion field. However, corrosion is a complicated physicochemical process (Luo et al. 2020) that is affected by surrounding environmental factors such as pH, water content, conductivity, salt content and electromagnetic interference, as well as by material composition, processing and other factors (Angst et al. 2009; Hajibagheri et al. 2018; Jiang et al. 2015; Krivy et al. 2017; Liu et al. 2017). These factors are fuzzy and random and interact with each other, leading to a complex, time-varying, nonlinear mapping between the corrosion rate and the factors.

For many years, corrosion research has depended mainly on experiment and simulation, and some significant results have been achieved. However, materials in service are often affected by the synergy of multiple complex factors, and the traditional corrosion research methods have low efficiency and high cost. Therefore, a novel method is urgently needed to promote the development of material corrosion research.

In recent years, with the progress of computer technology, machine learning (ML) has developed rapidly. ML has unique advantages in dealing with time-varying, nonlinear and other complex problems, which brings new ideas to corrosion behavior research, corrosion modeling and corrosion rate prediction. Li et al. (2015) published “Share corrosion data” in the journal Nature, expounded the importance of corrosion data sharing, and put forward the concept of “material corrosion informatics”. In that work, data-driven material corrosion research is systematically described in terms of databases, data processing, data analysis, modeling and simulation, and the practical application of data information. The development of corrosion databases and the accumulation of corrosion data in recent years have also laid a good foundation for the application of machine learning in the corrosion field.

At present, researchers have used ML algorithms for modeling, prediction and knowledge mining on various types of corrosion data. The ML algorithms widely used in the corrosion field mainly include the artificial neural network (ANN), the support vector machine (SVM), ensemble learning represented by random forest (RF), and other algorithms based on time series or probability, such as the Markov chain (MC), grey theory (GT) and the Bayesian network (BN). In this paper, the feature extraction methods for corrosion data and the application of common ML algorithms in the corrosion field are introduced first. Then, the characteristics and application scenarios of the different algorithms in corrosion prediction are summarized. Finally, the development trends of ML in the material corrosion field are discussed.

2 Feature extraction

Feature extraction, as a pre-processing step in machine learning, can reduce dimensionality, remove irrelevant and redundant data, and increase the efficiency and effectiveness of machine learning. It is an essential process in large-scale machine learning. Zhang et al. (2015) collected an imbalanced dataset (203 healthy samples and 21 fault samples with corrosion pitting on the raceways and balls in rolling bearings) obtained from vibration signals in the time domain. Thirteen useful features of the original vibration signal (mean, standard deviation, variance, skewness, kurtosis, peak value, peak-peak value, square amplitude, average amplitude, waveform index, peak index, pulse index and margin index), together with a label, were extracted in the time domain by statistical methods, and then principal component analysis (PCA), independent component analysis (ICA) and linear discriminant analysis (LDA) were used for dimensionality reduction. The results show that the proposed method, which combines PCA and support vector data description (SVDD), can improve the prediction accuracy of fault diagnosis for roller bearing raceway and ball pitting. Hou et al. (2017) recorded the electrochemical noise signals generated by three types of corrosion: uniform, pitting and passivation. Then, using recurrence quantification analysis (RQA), 12 features were extracted from the electrochemical noise signals. Linear discriminant analysis (LDA) and random forests (RF) were used to identify the different corrosion types from those features. The results show that both models gave satisfactory performance. Furthermore, an estimate of variable importance from the RF model suggested that the RQA variables laminarity (LAM) and determinism (DET) played the most significant role in identifying corrosion types.
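As an illustration of the time-domain statistics listed above, the following is a minimal sketch in Python. The source does not give formulas, so the definitions used here are assumptions based on common vibration-analysis conventions: population (biased) moments, "square amplitude" read as the square-root amplitude, waveform index as RMS divided by mean absolute value, peak index as peak divided by RMS, pulse index as peak divided by mean absolute value, and margin index as peak divided by square-root amplitude. The function name `time_domain_features` is hypothetical.

```python
import math

def time_domain_features(signal):
    """Return 13 time-domain statistics of a non-zero 1-D signal.

    All formulas are assumed conventions, not taken from the cited paper.
    """
    n = len(signal)
    mean = sum(signal) / n
    # Population (biased) central moments
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    std = math.sqrt(m2)
    abs_vals = [abs(x) for x in signal]
    peak = max(abs_vals)                               # largest absolute value
    rms = math.sqrt(sum(x * x for x in signal) / n)    # root mean square
    avg_amp = sum(abs_vals) / n                        # mean absolute value
    sqrt_amp = (sum(math.sqrt(a) for a in abs_vals) / n) ** 2
    return {
        "mean": mean,
        "std": std,
        "variance": m2,
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "peak": peak,
        "peak_to_peak": max(signal) - min(signal),
        "square_root_amplitude": sqrt_amp,
        "average_amplitude": avg_amp,
        "waveform_index": rms / avg_amp,
        "peak_index": peak / rms,
        "pulse_index": peak / avg_amp,
        "margin_index": peak / sqrt_amp,
    }

# Toy square-wave signal (illustrative data only)
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

A feature vector of this kind would then be passed to a dimensionality-reduction step such as PCA before classification.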

Feature extraction is also an important step in corrosion image recognition, where the features are related to color, texture, shape and motion. Aijazi et al. (2016) proposed a method to form a 3D point cloud from several images taken at different positions and viewing angles. The R, G, B values obtained from each image are then converted into HSV zones, which separates the illumination color component from the image intensity. These parameters help to detect corroded areas of different shapes and sizes within a selected zone. The two methods used for detecting corrosion are based on histogram-based distribution and adaptive thresholds, respectively. Alkanhal (2014) proposed the discrete wavelet packet transform and fractals for extracting image feature parameters and analyzing pitting corrosion. Using these image processing techniques, he analyzed how characteristics such as energy loss, Shannon entropy, fractal dimension and fractal intercept increase with exposure time. Jahanshahi and Masri (2013) evaluated image-processing-based approaches to corrosion detection for civil infrastructure such as bridges, pillars and buildings. They evaluated the effects of several image parameters, such as block size, color channel and color space, on the performance of texture analysis based on wavelet texture algorithms. They observed that a combination of color analysis and texture approaches should be used to obtain better results.
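The RGB-to-HSV step can be sketched with Python's standard `colorsys` module. This is only an illustration of separating hue from intensity: it uses fixed thresholds rather than the histogram-based or adaptive thresholds of the cited work, and the threshold values (`hue_max`, `sat_min`, `val_min`) and the function name `rust_mask` are hypothetical choices, not values from the source.

```python
import colorsys

def rust_mask(pixels, hue_max=0.11, sat_min=0.4, val_min=0.2):
    """Flag pixels whose HSV values fall in an assumed 'rust-like' range.

    pixels: iterable of (R, G, B) tuples with 8-bit channels.
    Thresholds are illustrative assumptions (hue and saturation in [0, 1]).
    """
    mask = []
    for r, g, b in pixels:
        # colorsys expects channels scaled to [0, 1]
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        mask.append(h <= hue_max and s >= sat_min and v >= val_min)
    return mask

# Illustrative pixels: a reddish-brown tone, neutral gray, bluish steel
pixels = [(183, 65, 14), (128, 128, 128), (70, 90, 120)]
mask = rust_mask(pixels)
```

Working in HSV lets the hue test stay largely independent of brightness, which is the motivation given above for separating the color component from the intensity.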

3 Introduction to ML algorithms

3.1 Artificial neural network

The information-processing units of an ANN are artificial neurons similar to the neurons in the human brain (Haykin 2004). Figure 1 illustrates the architecture of an ANN model. An ANN learns by experience and can realize any complex nonlinear mapping. It has the advantages of large-scale information processing, distributed storage, self-organization and self-learning, good fault tolerance and generalization ability. Therefore, it has been widely used in the corrosion field. At present, the commonly used neural networks mainly include the multi-layer feedforward neural network (MLFNN), the convolutional neural network (CNN) and the recurrent neural network (RNN), of which the MLFNN is the most widely used in the corrosion field (the ANN mentioned in this article is an MLFNN unless otherwise specified).

Figure 1:

Typical artificial neural network architecture.
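The forward pass of a small MLFNN with one hidden layer can be sketched as follows. The tanh activation, the single scalar output and the weight values are assumptions made for illustration; the function name `mlfnn_forward` is hypothetical and training (for example by back-propagation) is not shown.

```python
import math

def mlfnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tanh hidden layer -> linear scalar output.

    w_hidden is a list of weight rows, one per hidden neuron.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Two inputs (e.g. scaled pH and temperature), two hidden neurons;
# all weights below are arbitrary illustrative numbers.
rate = mlfnn_forward(
    x=[0.3, 0.8],
    w_hidden=[[0.5, -0.2], [0.1, 0.4]],
    b_hidden=[0.0, 0.1],
    w_out=[1.2, -0.7],
    b_out=0.05,
)
```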

Kamrunnahar and Urquidi-Macdonald (2010) collected composition, environment (including temperature and pH) and corrosion rate data for carbon steel and corrosion-resistant alloy steel from publicly available sources and personal communications, then built an ANN model to learn the underlying laws that map the alloy’s composition and environment to the corrosion rate. Finally, based on the mapping relationship, the parameters (i.e. pH, temperature, time of exposure, electrolyte composition, metal composition, etc.) were categorized and prioritized. Based on 100 groups of experimental data, Zhang and Yang (2013) used an ANN to establish an oil pipeline corrosion rate prediction model, in which pressure, temperature, flow rate, sulfur content and acid value were the input parameters and corrosion rate was the output parameter. The single-factor sensitivity analysis shows that acidity and sulfur content are the main factors affecting corrosion in the pipeline, while pressure, temperature and flow rate are secondary factors. Shi et al. (2011) used an ANN to establish a phenomenological model correlating the influential factors (total chloride concentration, chloride binding, solution pH, and DO concentration) with the pitting risk of reinforcement in concrete, based on 62 groups of experimental data. Then, three-dimensional response surfaces were constructed to illustrate the predicted correlations and to shed light on the complex interactions between the influencing factors.

An ANN with the MLFNN structure has strong learning ability and is well suited to nonlinear problems. However, it has the disadvantages of a complex model and low computational efficiency. Urda et al. (2013) presented a novel application of the constructive neural network C-MANTEC and pointed out that this algorithm yields a relatively small and compact network architecture, which is easier to implement in industrial environments. The pitting state of stainless steel was then predicted with C-MANTEC, using environmental parameters as the input. Finally, the prediction results were compared with those of other artificial intelligence models (linear discriminant analysis, k-nearest neighbor, multilayer perceptron, support vector machines and naive Bayes), which clearly showed the superiority of the C-MANTEC algorithm in pitting prediction.

With the support of big data and GPU-accelerated training, artificial intelligence technology with deep learning at its core is developing rapidly and has been widely used in many fields. A major advantage of CNNs is that they eliminate the dependence on prior knowledge and human effort in designing features. Deep learning has also been applied in the corrosion field. For example, Nash et al. (2019) used CNNs trained on image datasets of different sizes to automatically identify the corrosion status of bare steel. A large dataset of 250 images with segmentations labelled by undergraduates and a second dataset of just 10 images with segmentations labelled by subject matter experts were each used to train the CNN model. The results show that a large, noisy dataset outperforms a small, expertly segmented dataset. Forkan et al. (2022) proposed an ensemble AI framework underpinned by CNN models for corrosion detection in civil engineering settings, and demonstrated that the approach achieved an overall success rate in excess of 92%. Yao et al. (2019) proposed a new method for detecting and recognizing corrosion damage of hull structural plates based on artificial intelligence using a CNN. The CNN model is trained on a large number of classified corrosion damage images to obtain a classifier model. The classifier model is then used with an overlap-scanning sliding window algorithm to recognize and locate the corrosion damage. Finally, a CNN-based damage detection pattern is proposed for hull structural plate corrosion damage as well as other types of superficial structural damage. Soares et al. (2021) proposed the construction of a synthetic dataset with underwater image effects, simulating turbidity through a degradation function and alteration of the gamma variable of the images. Then, a CNN was built to identify levels of corrosion in underwater images. The results were analyzed using the confusion matrix, which showed 92% accuracy for the synthetic underwater images of the test dataset.

CNNs have been applied to detect corrosion in images. Unfortunately, the corrosion is often detected in bounding boxes, without precisely segmenting the irregular boundaries of the corroded regions, and thus it is difficult to assess them quantitatively. Rahman et al. (2021) presented a semantic segmentation deep learning approach together with an efficient image labelling tool for rapidly preparing large training datasets and for effectively detecting, segmenting and evaluating corrosion in images. The image labelling tool was developed by implementing a texture-based unsupervised image segmentation method, integrated with red-green-blue (RGB) feature-based classifier optimization. A CNN model with semantic segmentation is then trained for corrosion detection and segmentation. Finally, a corrosion evaluation method is proposed for classifying each pixel of a corrosion segment into user-prescribed categories such as heavy corrosion, medium corrosion and light corrosion. The integrated approach was tested on images collected by professional inspection engineers. The results indicated that the proposed approach is practically applicable to corrosion assessment for a wide range of industrial facilities and civil infrastructure.

Most existing research focuses on manually extracted features or on employing complex CNNs for powerful deep feature learning on images of metal sheets, but few studies pay attention to learning discriminative deep features for metallic corrosion detection. Zhang et al. (2021) proposed a channel-attention-based metallic corrosion detection method (CAMCD), by which corroded regions with multiple distinct levels can be detected automatically, patch by patch. To learn the patch-wise features and discriminate among the various corrosion levels, a CAMCD network is built by embedding SE blocks into a deep residual network; the important features of the various corroded regions are thus highlighted by weighting them with the learned channel attention weights. Experimental results on a collected metallic corrosion dataset validate the superiority of the CAMCD method over other existing approaches to corroded region detection.

ANNs have been applied in the corrosion field by many researchers. However, they still have some disadvantages: they are poorly explainable, sensitive to the initial network weights, and require long processing times; the number of hidden layers and nodes is difficult to determine; their generalization ability on small-sample datasets is relatively low; and they are prone to becoming trapped in local minima.

3.2 Support vector machine

The schematic diagram of the support vector machine (SVM) is shown in Figure 2. SVM is developed on the principle of structural risk minimization, and the algorithm can be transformed into a convex optimization problem, which ensures the global optimality and good generalization ability of the prediction model. SVM is based on statistical learning theory and has a strict theoretical and mathematical basis. It is therefore unlike the artificial neural network, in which the design of the structure depends on the empirical and prior knowledge of the designer. SVM is suitable for small-sample data, and its optimal solution is based on limited sample information, so it has great advantages in processing small-sample experimental data.

Figure 2:

Schematic diagram of a support vector machine.

Zhan et al. (2015) proposed an SVM prediction model for sulfuric acid corrosion of concrete based on 30 groups of experimental data. Five main factors influencing the reaction rate, i.e. water-cement ratio, pH value of the soaking solution, cement quantity, silica fume quantity and mixture fluidity, were analyzed with this model. The root mean square error (RMSE) and mean absolute percentage error (MAPE) of the model predictions are 0.188 and 0.096, respectively. Jia (2012) surveyed a 70 km gas pipeline, including measurements of soil corrosivity, soil resistivity, stray electrical currents, IR drop, depth of soil cover, coating integrity, interference potential and the effectiveness of cathodic protection. Then, a non-linear SVM model was built to predict the most likely locations of external corrosion along other pipelines. The model was applied to another (50 km) pipeline, and the results showed that the predicted serious and medium degradation levels are consistent with the dimensions of the observed defects. Based on 309 groups of internal environmental parameters of suspension bridge cables, Karanci and Betti (2018) used support vector regression (SVR) with a nested cross-validation approach for feature selection and determined that temperature, relative humidity, pH and Cl⁻ concentration were the most relevant variables for predicting corrosion rate. In this process, four kernel functions (linear, RBF, sigmoid and polynomial) were compared, and the RBF was finally selected as the optimal kernel function. Then, cyclic corrosion tests were performed by subjecting bridge wire samples to various levels of the selected environmental variables to expand the database available in the literature. Finally, an SVR model that predicts the annual corrosion rate of bridge wires as a function of the selected environmental variables was developed.

SVM can realize nonlinear modeling of small-sample data (usually fewer than 30 samples), but its prediction performance is sensitive to the choice of kernel function and to noise. In addition, the support vectors are obtained by solving a quadratic programming problem, which involves computing a matrix of order m, where m is the number of samples. Therefore, SVM requires long training times on large-sample data (more than 30 samples).
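To make the cost argument concrete, the sketch below builds the m x m kernel (Gram) matrix for the RBF kernel, K_ij = exp(-gamma * ||x_i - x_j||^2), which is the usual form of the order-m matrix in the SVM quadratic program. The value of `gamma` and the sample data are illustrative assumptions, and the function name `rbf_kernel_matrix` is hypothetical; solving the quadratic program itself is not shown.

```python
import math

def rbf_kernel_matrix(samples, gamma=0.5):
    """Return the symmetric m x m RBF kernel matrix for m samples."""
    m = len(samples)
    K = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * sq_dist)
    return K

# Three samples with two (already scaled) features each; values are made up.
K = rbf_kernel_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
```

Both the memory for this matrix and the number of kernel evaluations grow with the square of the number of samples, which is one reason training slows down as the dataset grows.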

3.3 Ensemble learning

Ensemble learning completes tasks by constructing and combining multiple learners, and it usually has better predictive performance than a single learner. Nevertheless, it has disadvantages such as a complicated model training process and lower learning efficiency. According to how the learners are generated, ensemble learning can be divided into two categories: Boosting and Bagging. At present, ensemble learning is also widely used in the corrosion field.

For example, to study the internal corrosion of long-distance hydrocarbon pipelines, De Masi et al. (2015) generated an ensemble of ANNs and applied the ensemble averaging technique to produce a final output. The input parameters of the model are geometrical pipeline features, fluid dynamic multiphase variables and deterministic models, and the output parameters are volume of loss, area of loss, number of defects and corrosion rate. This method was shown to improve on the prediction performance of a single model. Chou et al. (2017) used ensemble and hybrid models to predict the pitting corrosion risk of steel-reinforced concrete and the marine corrosion rate of carbon steel. The ensemble models were constructed from four well-known ML algorithms: ANN, SVR/SVM, classification and regression tree (CART), and linear regression (LR). The hybrid model integrated a nature-inspired metaheuristic optimization algorithm (the smart firefly algorithm) with least squares SVR. Efficacy was compared using two real-world datasets from the published literature, containing 62 and 46 samples respectively. According to the comparison results, the hybrid metaheuristic regression model was better than the single and ensemble models in predicting the pitting corrosion risk and the marine corrosion rate. The hybrid metaheuristic regression model is a promising and practical methodology for real-time tracking of corrosion in steel rebar. Zounemat-Kermani et al. (2020) evaluated the potential of ensemble learning (bagging, boosting, and modified bagging) for predicting microbially induced concrete corrosion in sewer systems based on 433 groups of data. Time, gas temperature, gas-phase H₂S concentration, relative humidity, pH and exposure phase were considered as the models’ inputs. The results indicated that the prediction ability of the random forests model is superior to that of the other ensemble learners, followed by the ensemble Bag-CHAID method. On average, the ensemble tree-based models performed better than the ensemble network-based models; it was also found that ensemble learning enhanced the general performance of individual data mining (DM) models by more than 10%.

RF is an ensemble algorithm that combines decision tree learners using the Bagging resampling technique and random feature subsets. Figure 3 illustrates the architecture of an RF model. The model has more layers and is more suitable for processing data with a steep manifold structure. The RF algorithm also has the advantages of simple implementation, resistance to over-fitting, parallel operation, convergence consistency and insensitivity to noise (Breiman 2001). Therefore, RF has been widely applied in material corrosion fields such as rebar corrosion, atmospheric corrosion of carbon steel and pitting corrosion of stainless steel (Chun et al. 2020; Hou et al. 2017; Zhi et al. 2019). In addition, RF can be used to calculate feature importance and extract the key variables, providing analytical support for researchers studying the mechanisms reflected in the data (Diaz-Uriarte and De Andres 2006; Genuer et al. 2010).

Figure 3:

Schematic diagram of random forest modeling.
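The Bagging step underlying RF can be sketched as below: each base learner is fitted on a bootstrap resample (sampling with replacement) and the predictions are averaged. To stay short, this sketch uses a one-variable least-squares line as the base learner instead of a decision tree and omits the random feature subsets, so it illustrates bagging only and is not a random forest. All function names and the toy data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in base learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (all x equal): fall back to a constant model
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def bagging_fit(xs, ys, n_learners=25, seed=0):
    """Fit one base learner per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all base learners."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)

# Toy data: exposure time (x) against corrosion depth (y), made-up numbers
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
models = bagging_fit(xs, ys)
prediction = bagging_predict(models, 4.0)
```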

Chun et al. (2020) used the RF method to evaluate the extent of internal damage due to rebar corrosion. The air permeability coefficient, electrical resistivity, ultrasonic velocity and compressive strength were used as the inputs, and the theoretical corrosion amount from experiment was used as the output. A total of 391 groups of data were used to construct the corrosion prediction model. The results show that this method can detect internal damage that is invisible from the external surface, and it has been applied to bridge inspection. Zhi et al. (2019) collected corrosion data for 17 kinds of low alloy steels from six atmospheric corrosion test stations over 17 years, forming a corrosion dataset containing 409 groups of data. An RF model was then established based on the dataset. The results show that the effect of environmental factors on the corrosion rate of low alloy steel is greater than that of chemical composition, but the effect of environmental factors gradually weakens as exposure time increases. Finally, the RF model was compared with ANNs, SVR and LR, and the results show that the RF model has the best prediction performance. Based on the marine atmospheric corrosion data collected from NIMS database. Morizet et al. (2016) combined the RF algorithm with wavelet analysis to analyze AE signals generated in corrosion experiments and proposed a classification algorithm to separate local corrosion signals. The results show that this method is very effective and has excellent reliability, performance and speed.

Some researchers have also used the feature importance from the RF model to study the key parameters affecting the corrosion process. Yan et al. (2020) used the RF algorithm to analyze the importance of nine factors (alloying element content, maximum air temperature, minimum air temperature, minimum relative humidity, precipitation, solar radiation, chloride deposition rate, SO₂ deposition rate and exposure period) for the corrosion rate. They pointed out that relative humidity is the most important environmental factor in marine atmospheric corrosion. A corrosion rate prediction model based on the RF algorithm was also established, and the results show that the model has high prediction accuracy for multiple steel samples in different environments. Hou et al. (2017) used recurrence quantification analysis to extract 12 features from electrochemical noise signals generated by uniform corrosion, pitting corrosion and passivation. Then, RF and linear discriminant analysis (LDA) were used to identify the different corrosion types from those features. The RF model showed a better prediction accuracy (93%) than the LDA model (88%). Furthermore, the feature importance of the variables was estimated using the RF model.

To further improve the performance of RF, Zhi et al. (2020) proposed a new deep-structure model called densely connected cascade forest-weighted K nearest neighbors (DCCF-WKNNs) for corrosion data modelling and corrosion knowledge mining. A dataset of 409 outdoor atmospheric corrosion samples of low-alloy steels was collected to verify the performance of the proposed method. The results show that the DCCF-WKNNs method obtains the best prediction results compared with commonly used machine-learning algorithms such as ANNs, SVR, RF and cascade forests.

In summary, the RF model has more layers and is more suitable for processing data with a steep manifold structure, and it offers simple implementation, resistance to over-fitting, parallel operation, convergence consistency and insensitivity to noise (Breiman 2001). It is suitable for corrosion prediction with small-sample data and many influencing factors, such as marine atmospheric corrosion, corrosion damage of steel bars in concrete, and corrosion prediction combined with electrochemical signals. However, RF also has some disadvantages: it extrapolates poorly, requires long processing times, and its classification performance on low-dimensional data is not satisfactory.

3.4 Other algorithms

The Bayesian network (BN), Markov chain (MC), grey theory (GT) and other methods have also been used in the corrosion field. The schematic diagram of a BN is shown in Figure 4. A BN is a mathematical model based on probabilistic inference, so it has good interpretability. It obtains probabilistic information about some variables from other variables, combining physical models, expert opinions and data in a single framework. It can still build an accurate model when a certain data variable is missing. BN was proposed to solve problems of uncertainty and incompleteness, so it has great advantages in handling faults caused by the uncertainty and interdependence of complex systems. For example, to solve the problem of partially and completely missing data in atmospheric corrosion, Zhi et al. (2015) used a BN to establish the relationship between environmental factors and corrosive factors. Simulation results showed that the reasoning accuracy exceeded 80% when dealing with partially missing data. The method provided a new direction for processing missing data, especially in corrosion data analysis. Smith et al. (2019) used a BN to predict the internal corrosion of pipelines without in-line inspection (ILI) data, based on knowledge of their operational conditions alone, while the data are captured through historical ILIs of piggable pipelines. With a case study using real pipeline data, they demonstrated that the BN can make more intelligent and less conservative predictions of internal corrosion behavior. This in turn can lead to improved pipeline integrity management decisions and more cost-efficient maintenance regimes. The disadvantages of the BN model are as follows: modeling requires large-sample data; it can become trapped in a local minimum; and it requires longer processing time and more specialized knowledge from the user.

Figure 4:

Schematic diagram of a Bayesian network.

The schematic diagram of the MC is shown in Figure 5. The MC is a stochastic method that models transitions between states. The state transition matrix is constructed by counting the transition probabilities between states. When the initial corrosion state of the material is known, the future corrosion state can be predicted, which enables corrosion prediction and life assessment. At present, the MC method has achieved good prediction performance for pitting depth, pitting quantity and local corrosion. For example, Contreras et al. (2007) established an MC based on professional knowledge and laboratory experiments to predict the transitions among the three states of passivation, metastable state and local corrosion on the surface of corrosion-resistant alloys. The results show that the simulated distribution conforms to the experimental distribution. Based on experimental data on external corrosion of buried pipelines collected by online monitoring, Valor et al. (2013) developed an MC corrosion model to predict the future pit depth distribution, and compared it with the single-value corrosion rate model suggested by the NACE standard and with the linear growth model. The results show that the MC model produces the best performance. Possan and Andrade (2014) proposed the application of the MC to analyze and predict the life and reliability of concrete under the influence of chloride ions. The Monte Carlo method was used to classify the collected data, and the classified data were then input into the MC model for modeling. The results show that the average error is about 14%. MC modeling needs many sequence points, while many data sequences from natural environments do not meet this requirement. Moreover, the corrosion states must be defined during MC modeling, which requires more professional corrosion knowledge.

Figure 5:

Schematic diagram of the Markov model.
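The two MC steps described above, counting transitions to build the state transition matrix and then propagating a known initial state forward, can be sketched as follows. The three state labels and the observed sequence are made-up illustrative data, the function names are hypothetical, and treating a never-left state as absorbing is an assumption of this sketch.

```python
def transition_matrix(sequence, n_states):
    """Estimate transition probabilities by counting observed transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No observed departures: treat the state as absorbing (assumption)
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix

def propagate(dist, matrix, steps):
    """Advance a state probability distribution by the given number of steps."""
    for _ in range(steps):
        dist = [
            sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(dist))
        ]
    return dist

# States: 0 = passive, 1 = metastable, 2 = localized corrosion (illustrative)
observed = [0, 0, 1, 0, 1, 2, 2, 2]
P = transition_matrix(observed, n_states=3)
after_two_steps = propagate([1.0, 0.0, 0.0], P, steps=2)
```

Starting from a fully passive surface, `after_two_steps` gives the estimated probability of each state two time steps later.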

GT provides a new way to solve system problems when information is poor. It regards all random processes as grey processes that change within a certain range and are related to time, and it uses data generation to turn the disordered original data into a generated sequence with strong regularity. The GM (1,1) model is widely used in the corrosion prediction field. For example, given the scarcity of available data, Ma and Wang (2007) used GT to predict the change over time of the corrosion rate and pitting depth of a stainless steel alloy in seawater. Based on 16 sets of laboratory data, Meng et al. (2017) established a wet attachment model of coatings in the marine environment using the GM (1,1) algorithm, and proposed a coating lifetime prediction formula based on this model, so as to predict the theoretical life of coatings in different environments. Zhao et al. (2016) used GM (1,1) to predict the corrosion rate of oil storage tanks, and established time-dependent models of corrosion thickness, corrosion rate, maximum thickness and maximum corrosion rate. The paper points out that, in practical application, the GM (1,1) algorithm is significantly better than other algorithms for the corrosion of oil storage equipment. The GT model is good at processing small-sample data with a single time variable. However, it has disadvantages: prior knowledge is needed in the modeling process, and the prediction performance on noisy data is poor. Therefore, GT needs to be further optimized and promoted.
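A minimal sketch of the standard GM (1,1) procedure is given below, assuming the usual textbook formulation: the original series x0 is accumulated into x1 (the "data generation" step), the grey equation x0(k) + a·z1(k) = b is fitted by least squares with z1(k) the mean of consecutive x1 values, and forecasts are recovered by differencing the fitted exponential. The function names and the input series are hypothetical and not taken from the cited studies.

```python
import math

def gm11_fit(x0):
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    n = len(x0)
    # Accumulated generating operation (1-AGO)
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Ordinary least squares for y = slope * z + b, where slope = -a
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    return -slope, b

def gm11_predict(x0, a, b, k):
    """Fitted or forecast value of the original series at 0-based index k.

    Assumes a != 0 (a non-constant trend).
    """
    if k == 0:
        return x0[0]
    c = x0[0] - b / a
    # Difference of consecutive values of the fitted accumulated series
    return c * (math.exp(-a * k) - math.exp(-a * (k - 1)))

# Made-up series growing by 10 % per step (e.g. yearly corrosion depth)
series = [1.1 ** i for i in range(6)]
a, b = gm11_fit(series)
next_value = gm11_predict(series, a, b, k=6)
```

Because the model has only two parameters and one time variable, it can be fitted from a handful of points, which matches the small-sample use described above.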

3.5 Algorithm optimization

Parameter optimization is an important step in machine learning. Some researchers have used grid search to determine the optimal hyperparameters of an algorithm. For example, to address the problem that traditional data-driven methods cannot achieve high accuracy with few monitoring data in nuclear power component degradation prediction, Yang et al. (2017) provided a hyperparameter optimization strategy for SVR using grid search and cross validation. The results show that the prediction performance of the proposed SVR algorithm is better than that of the back-propagation (BP) neural network.
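Grid search with cross validation can be sketched as follows: every candidate hyperparameter value is scored by its average k-fold validation error, and the value with the lowest error is kept. To keep the example self-contained, a one-variable ridge regression through the origin stands in for SVR, with the penalty `lam` as the only hyperparameter; the model, the grid and the data are assumptions for illustration, not the setup of the cited study.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form 1-D ridge regression through the origin (stand-in model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(xs, ys, lam, k=5):
    """Mean squared validation error over k interleaved folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_idx = [i for i in range(n) if i not in test_idx]
        w = ridge_slope([xs[i] for i in train_idx],
                        [ys[i] for i in train_idx], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in test_idx)
    return total / n

def grid_search(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validation error."""
    scores = {lam: cv_error(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Noise-free toy data, so the unpenalized model should win
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]
best_lam, scores = grid_search(xs, ys, grid=[0.0, 1.0, 10.0])
```

The cost grows with the number of grid points times the number of folds, which is why exhaustive grids become expensive when several hyperparameters are tuned together.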

In addition, researchers have used metaheuristic algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO) and the simulated annealing algorithm (CSA) to determine the optimal hyperparameters of SVM. For example, Wen et al. (2009) combined PSO with SVR to optimize the hyperparameters. Corrosion rate prediction models of 3C steel were established based on five seawater environmental factors: temperature, dissolved oxygen, salinity, pH value and redox potential. The results show that the generalization ability of SVR surpasses that of BPNN when identical training and test samples are used (41 samples for training and five for testing). Based on 40 groups of internal corrosion data of oil and gas pipelines collected from published literature, Ma et al. (2019) used the PSO, GA, CV, LS and FOA algorithms to optimize the hyperparameters of an SVM model and then established a prediction model of the internal corrosion rate of oil and gas pipelines. The results show that the PSO-SVM model has the best prediction performance. The authors also point out, however, that training the PSO-SVM model takes a long time.
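How PSO searches a hyperparameter space can be sketched as follows. This is a generic, minimal PSO and not the implementation of the cited studies; the objective is a hypothetical smooth surrogate for the cross-validation error of an SVM as a function of two log-scaled hyperparameters, and the swarm settings (inertia, c1, c2) are common textbook defaults chosen as assumptions.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iter=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best and pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def surrogate_cv_error(params):
    """Hypothetical stand-in for the CV error at (log10 C, log10 gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2 + 0.05

best, err = pso_minimize(surrogate_cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, err)
```

In a real PSO-SVM workflow each call to the objective would train and validate a model, so the total cost is roughly `n_particles * n_iter` model fits, which is consistent with the long training time noted above.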

In recent years, researchers have also proposed some new optimization algorithms. Snoek et al. (2012) proposed applying Bayesian optimization to the parameter optimization of machine learning. Given the objective function, its posterior distribution is updated by continually adding sample points. Bayesian optimization adopts a Gaussian process, takes the previously evaluated parameter information into account, and constantly updates the prior. It needs few iterations for parameter tuning and is therefore fast, providing an elegant framework for finding a global minimum in as few steps as possible. In the Internet domain, Google Brain has proposed a new parameter optimizer, VeLO, which is based on the idea of meta-learning, that is, learning from experience on related tasks to help learn target tasks. As a learned optimizer, VeLO is reported to be well suited to a variety of different tasks.
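The loop behind Bayesian optimization (fit a Gaussian process to the points evaluated so far, then evaluate the point with the highest expected improvement) can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the method of Snoek et al. (2012) in full: the RBF kernel with a fixed length scale, the candidate grid, the jitter value and the quadratic test objective are all hypothetical choices.

```python
import math

def rbf(a, b, length=0.5):
    """Squared-exponential kernel with a fixed (assumed) length scale."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(mat):
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            # Guard the diagonal against tiny negative round-off.
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low

def solve_lower(low, rhs):
    """Forward substitution for L x = rhs."""
    out = []
    for i, row in enumerate(low):
        out.append((rhs[i] - sum(row[k] * out[k] for k in range(i))) / row[i])
    return out

def solve_lower_transposed(low, rhs):
    """Back substitution for L^T x = rhs."""
    n = len(low)
    out = [0.0] * n
    for i in reversed(range(n)):
        s = rhs[i] - sum(low[k][i] * out[k] for k in range(i + 1, n))
        out[i] = s / low[i][i]
    return out

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a constant-mean GP at x_new."""
    n = len(xs)
    mean_y = sum(ys) / n
    kmat = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    low = cholesky(kmat)
    alpha = solve_lower_transposed(low, solve_lower(low, [y - mean_y for y in ys]))
    kvec = [rbf(x, x_new) for x in xs]
    mu = mean_y + sum(k * a for k, a in zip(kvec, alpha))
    v = solve_lower(low, kvec)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a minimization problem."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, lo, hi, n_init=3, n_iter=10, n_grid=101):
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    xs = [lo + (hi - lo) * (i + 0.5) / n_init for i in range(n_init)]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = min(ys)
        # Skip points that were already evaluated.
        candidates = [x for x in grid if all(abs(x - s) > 1e-9 for s in xs)]
        x_next = max(candidates,
                     key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

# Hypothetical validation error as a function of one log-scaled hyperparameter.
print(bayes_opt(lambda x: (x - 0.7) ** 2 + 0.1, -2.0, 2.0))
```

Each iteration spends extra computation on the surrogate model in order to save objective evaluations, which pays off when a single evaluation means training and validating a model.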

4 Comparison and analysis of ML algorithms

The application of common ML algorithms in corrosion research has been introduced above. Table 1 summarizes the advantages, disadvantages and applicable scenarios of each algorithm.

Table 1: Comparison of different models.

Algorithm	Advantages	Disadvantages	Application scenarios
ANN	It has self-organizing and self-learning ability, good fault tolerance and good generalization ability. It can realize any complex nonlinear mapping.	It has poor explainability, is sensitive to the initial network weights, and requires long processing time. The number of hidden layers and nodes is difficult to determine. Its generalization ability on small-sample datasets is relatively low. It is prone to local minima.	It is suitable for atmospheric corrosion image recognition and pipeline internal corrosion prediction, for which large-sample datasets are easy to obtain. It is also used to predict the corrosion rate and pitting risk of steel from laboratory data, but it is prone to local minima when the sample is small.
SVM	It has a strict theoretical and mathematical basis. It can ensure global optimality. It has good prediction performance on small-sample data.	It is sensitive to the choice of kernel function and to noisy data. It requires long training time on large-sample data. It is a shallow model and performs poorly on data with steep manifold structures.	It is suitable for small-sample, low-noise experimental data from laboratories, such as sulfuric acid corrosion of steel bars, internal and external corrosion of buried pipelines, and seawater corrosion of carbon steel.
RF	It has the characteristics of multiple layers, simple implementation, resistance to overfitting, parallel operation, convergence consistency, noise insensitivity and so on. It can also calculate feature importance.	It has poor explainability and requires long processing time. Its classification performance on low-dimensional data is not satisfactory.	It is more suitable for processing data with steep manifold structure. The feature importance can be used to rank corrosion variables by priority. It is suitable for corrosion prediction on small-sample data influenced by many complex factors, such as marine atmospheric corrosion, corrosion damage of steel bars in concrete, and corrosion prediction combined with electrochemical signals.
BN	It is a modeling approach based on causal relationships among data, providing good explanatory power and capable of handling incomplete datasets.	It requires large-sample data, long processing time and more professional knowledge for modeling. It is prone to local minima.	It is suitable for industrial corrosion data with large samples and many missing variables, such as atmospheric corrosion or pipeline internal corrosion datasets in which some variables are partially missing.
MC	When the initial corrosion state of the material is known, the future state of corrosion can be predicted.	Its modeling process is complex. It requires more time-sequence points and more professional knowledge.	It is suitable for dealing with time-dependent corrosion problems such as pitting process prediction on laboratory data, local corrosion prediction and life evaluation of buried pipes and reinforced concrete.
GT	It is a way to solve system problems in the case of poor information. It has good explainability and good prediction performance on small-sample data.	It requires prior knowledge and is sensitive to noisy data.	It is suitable for low-noise, poor-information, small-sample data with a single time variable, such as the change of corrosion rate and pitting depth with time, and coating lifetime prediction in marine environments.

ANN, SVM, RF and BN are modeling methods with multivariable input. For material corrosion prediction, these models are suitable when corrosion is governed by the interaction of various environmental parameters, material composition and so on. Generally, for a large-sample dataset, ANN is more suitable for corrosion modeling, for example image recognition of atmospheric corrosion and internal corrosion detection data of buried pipelines, for which large-sample datasets are easier to obtain. If the data of some variables in the dataset are partially or completely missing, BN can be used to model according to the causal relationships among the data. For small-sample data with fewer variables, such as corrosion experimental data, SVM is more suitable for corrosion modeling. RF is more suitable for small-sample data with many variables and steep manifold structure. In addition, RF provides feature importance, which can be used to rank the variables by their influence on corrosion.

MC and GT are modeling methods with time as the single variable, which makes them suitable for predicting how corrosion changes with time. MC requires more data sequences and a longer modeling time. GT modeling requires less data and incorporates prior corrosion knowledge, so it is more suitable than MC for small-sample corrosion sequence datasets.

5 Conclusions and prospects

In recent years, the application of machine learning in the field of material corrosion has made some achievements. However, further improvement is needed in data accumulation and sharing, in interpretability through combining algorithms with corrosion expertise, and in the engineering application of machine learning results:

Data is the foundation of machine learning. The accumulation and sharing of data will greatly promote the application of machine learning in the corrosion field. Therefore, the construction and development of corrosion databases will become a key area of material development in the future.
Corrosion is a complicated physicochemical process. When we build a machine learning model on given corrosion data, we often have no idea how it works. Improving the interpretability of machine learning can help us reveal the corrosion laws and mechanisms behind the complex corrosion factors.
At present, some achievements of machine learning in the corrosion field still lack practical application. How to use the existing achievements of machine learning to guide the safety protection of industrial production and infrastructure will also be an important research direction in the future.
We should improve existing ML algorithms or develop new ones according to the characteristics of corrosion data and professional corrosion knowledge, to adapt to the development of corrosion science.

Corresponding author: Yanxia Du, Corrosion and Protection Center, Institute of Advanced Materials and Technology, University of Science and Technology Beijing, No. 30 Xueyuan Rd, Haidian District, Beijing 100083, China, E-mail: duyanxia@ustb.edu.cn

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflicts of interest: The authors declare no conflicts of interest regarding this article.

References

Aijazi, A.K., Malaterre, L., Tazir, M.L., Trassoudaine, L., and Checchin, P. (2016). Detecting and analysing corrosion spots on the hull of large marine vessels using colored 3D lidar point clouds. ISPRS Ann. Photogrammetry Remote Sens. Spat. Inf. Sci. 3: 3. https://doi.org/10.5194/isprsannals-III-3-153-2016.

Alkanhal, T.A. (2014). Image processing techniques applied for pitting corrosion analysis. Entropy. Int. 5: 2.

Angst, U., Elsener, B., Larsen, C.K., and Vennesland, Ø. (2009). Critical chloride content in reinforced concrete-a review. Cement. Concrete. Res. 39: 1122–1138. https://doi.org/10.1016/j.cemconres.2009.08.006.

Breiman, L. (2001). Random forests. Mach. Learn. 45: 5–32. https://doi.org/10.1023/a:1010933404324.

Chou, J.S., Ngo, N.T., and Chong, W.K. (2017). The use of artificial intelligence combiners for modeling steel pitting risk and corrosion rate. Eng. Appl. Artif. Intell. 65: 471–483. https://doi.org/10.1016/j.engappai.2016.09.008.

Chun, P., Ujike, I., and Mishima, K. (2020). Random forest-based evaluation technique for internal damage in reinforced concrete featuring multiple nondestructive testing results. Constr. Build. Mater. 253: 119238. https://doi.org/10.1016/j.conbuildmat.2020.119238.

Contreras, G., Fassina, P., Fumagalli, G., Goidanich, S., Lazzari, L., and Mazzola, E. (2007). A study on metastability phenomena of passive films for corrosion resistant alloys. Electrochim. Acta 52: 7577–7584. https://doi.org/10.1016/j.electacta.2006.12.037.

Diaz-Uriarte, R. and De Andres, S. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinf. 7: 3. https://doi.org/10.1186/1471-2105-7-3.

Forkan, A.R.M., Kang, Y.B., Jayaraman, P.P., Liao, K., Kaul, R., Morgan, G., and Sinha, S. (2022). CorrDetector: a framework for structural corrosion detection from drone images using ensemble deep learning. Expert Syst. Appl. 193: 116461. https://doi.org/10.1016/j.eswa.2021.116461.

Genuer, R., Poggi, J., and Tuleau-Malot, C. (2010). Variable selection using random forests. Pattern Recogn. Lett. 31: 2225–2236. https://doi.org/10.1016/j.patrec.2010.03.014.

Hajibagheri, H.R., Heidari, A., and Amini, R. (2018). An experimental investigation of the nature of longitudinal cracks in oil and gas transmission pipelines. J. Alloy. Compd. 741: 1121–1129. https://doi.org/10.1016/j.jallcom.2017.12.311.

Haykin, S. (2004). Neural networks. A comprehensive foundation. Neural Network. 2: 41.

Hou, Y., Aldrich, C., Lepkova, K., Machuca, L.L., and Kinsella, B. (2017). Analysis of electrochemical noise data by use of recurrence quantification analysis and machine learning methods. Electrochim. Acta 256: 337–347. https://doi.org/10.1016/j.electacta.2017.09.169.

Jahanshahi, M.R. and Masri, S.F. (2013). Effect of color space, color channels, and sub-image block size on the performance of wavelet-based texture analysis algorithms: an application to corrosion detection on steel structures. J. Comput. Civil Eng. 2013: 685–692. https://doi.org/10.1061/9780784413029.086.

Jia, S. (2012). The application of a non-linear model for evaluating the external corrosion protection of a gas pipeline. In: ICPTT 2012: better pipeline infrastructure for a better life. ASCE, Wuhan, China, pp. 141–145. https://doi.org/10.1061/9780784412619.016.

Jiang, G., Sun, J., Sharma, K.R., and Yuan, Z. (2015). Corrosion and odor management in sewer systems. Curr. Opin. Biotechnol. 33: 192–197. https://doi.org/10.1016/j.copbio.2015.03.007.

Kamrunnahar, M. and Urquidi-Macdonald, M. (2010). Prediction of corrosion behavior using neural network as a data mining tool. Corros. Sci. 52: 669–677. https://doi.org/10.1016/j.corsci.2009.10.024.

Karanci, E. and Betti, R. (2018). Modeling corrosion in suspension bridge main cables. I: annual corrosion rate. J. Bridge Eng. 23: 6. https://doi.org/10.1061/(asce)be.1943-5592.0001233.

Krivy, V., Kubzova, M., Kreislova, K., and Urban, V. (2017). Characterization of corrosion products on weathering steel bridges influenced by chloride deposition. Metals 7: 336. https://doi.org/10.3390/met7090336.

Li, X., Zhang, D., Liu, Z., Li, Z., Du, C., and Dong, C. (2015). Materials science: share corrosion data. Nature 527: 441–442. https://doi.org/10.1038/527441a.

Liu, Y., Song, Y., Keller, J., Bond, P., and Jiang, G. (2017). Prediction of concrete corrosion in sewers with hybrid Gaussian processes regression model. RSC Adv. 7: 30894–30903. https://doi.org/10.1039/c7ra03959j.

Luo, Q., Guo, Y., Liu, B., Feng, Y., Zhang, J., Li, Q., and Chou, K. (2020). Thermodynamics and kinetics of phase transformation in rare earth–magnesium alloys: a critical review. J. Mater. Sci. Technol. 44: 171–190. https://doi.org/10.1016/j.jmst.2020.01.022.

Ma, F. and Wang, W. (2007). Prediction of pitting corrosion behavior for stainless SUS 630 based on grey system theory. Mater. Lett. 61: 998–1001. https://doi.org/10.1016/j.matlet.2006.06.053.

Ma, G., Li, J., and Bai, R. (2019). Prediction of corrosion rate in oil and gas pipelines based on PSO-SVM model. Surf. Coat. Technol. 48: 43–48.

De Masi, G., Gentile, M., Vichi, R., Bruschi, R., and Gabetta, G. (2015). Machine learning approach to corrosion assessment in subsea pipelines. In: OCEANS 2015-Genova. IEEE, Genova, Italy, pp. 1–6. https://doi.org/10.1109/OCEANS-Genova.2015.7271592.

Meng, F., Liu, Y., Liu, L., Li, Y., and Wang, F. (2017). Studies on mathematical models of wet adhesion and lifetime prediction of organic coating/steel by grey system theory. Materials 10: 715–729. https://doi.org/10.3390/ma10070715.

Morizet, N., Godin, N., Tang, J., Maillet, E., Fregonese, M., and Normand, B. (2016). Classification of acoustic emission signals using wavelets and random forests: application to localized corrosion. Mech. Syst. Signal Process. 70–71: 1026–1037. https://doi.org/10.1016/j.ymssp.2015.09.025.

Nash, W., Drummond, T., and Birbilis, N. (2019). Deep learning AI for corrosion detection. March 2019: Corrosion 2019. OnePetro, Nashville, USA.

Possan, E. and Andrade, J. (2014). Markov chains and reliability analysis for reinforced concrete structure service life. Mater. Res. 17: 593–602. https://doi.org/10.1590/s1516-14392014005000074.

Rahman, A., Wu, Z.Y., and Kalfarisi, R. (2021). Semantic deep learning integrated with RGB feature-based rule optimization for facility surface corrosion detection and evaluation. J. Comput. Civil. Eng. 35: 04021018. https://doi.org/10.1061/(asce)cp.1943-5487.0000982.

Shi, X., Nguyen, T.A., Kumar, P., and Liu, Y. (2011). A phenomenological model for the chloride threshold of pitting corrosion of steel in simulated concrete pore solutions. Anti-Corros. Methods Mater. 58: 179–189. https://doi.org/10.1108/00035591111148894.

Smith, M., Barton, L., Pesinis, K., and Laing, I. (2019). Intelligent corrosion prediction using Bayesian networks. March 2019: Corrosion 2019. OnePetro, Nashville, USA.

Snoek, J., Larochelle, H., and Adams, R.P. (2012). Practical Bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 25: 1–9.

Soares, L., Botelho, S., Nagel, R., and Drews, P.L. (2021). A visual inspection proposal to identify corrosion levels in marine vessels using a deep neural network. In: 2021 Brazilian Symposium on Robotics (SBR), pp. 222–227. https://doi.org/10.1109/LARS/SBR/WRE54079.2021.9605400.

Urda, D., Luque, R.M., Jiménez, M.J., Turias, I., Franco, L., and Jerez, J.M. (2013). A constructive neural network to predict pitting corrosion status of stainless steel. In: International Work-Conference on Artificial Neural Networks 2013. Berlin, Heidelberg: Springer, pp. 88–95.

Valor, A., Caleyo, F., Hallen, J.M., and Velázquez, J.C. (2013). Reliability assessment of buried pipelines based on different corrosion rate models. Corros. Sci. 66: 78–87. https://doi.org/10.1016/j.corsci.2012.09.005.

Wen, Y.F., Cai, C.Z., Liu, X.H., Pei, J.F., Zhu, X.J., and Xiao, T.T. (2009). Corrosion rate prediction of 3C steel under different seawater environment by using support vector regression. Corros. Sci. 51: 349–355. https://doi.org/10.1016/j.corsci.2008.10.038.

Yan, L., Diao, Y., Lang, Z., and Gao, K. (2020). Corrosion rate prediction and influencing factors evaluation of low-alloy steels in marine atmosphere using machine learning approach. Sci. Technol. Adv. Mater. 21: 359–370. https://doi.org/10.1080/14686996.2020.1746196.

Yao, Y., Yang, Y., Wang, Y., and Zhao, X. (2019). Artificial intelligence-based hull structural plate corrosion damage detection and recognition using convolutional neural network. Appl. Ocean Res. 90: 101823. https://doi.org/10.1016/j.apor.2019.05.008.

Yang, C., Liu, J., Zeng, Y., and Xie, G. (2017). Prediction of components degradation using support vector regression with optimized parameters. Energy Proc. 127: 284–290. https://doi.org/10.1016/j.egypro.2017.08.109.

Zhan, Y., Song, Z., and Wang, H. (2015). Prediction of the silica fume concrete corrosion in sulfuric acid by SVM-based method. In: 5th International Conference on Civil Engineering and Transportation. Atlantis Press, Atlantis, pp. 766–770.

Zhang, S., Deng, X., Lu, Y., Hong, S., Kong, Z., Peng, Y., and Luo, Y. (2021). A channel attention based deep neural network for automatic metallic corrosion detection. J. Build. Eng. 42: 103046. https://doi.org/10.1016/j.jobe.2021.103046.

Zhang, Y. and Yang, J. (2013). Pipeline corrosion rate prediction based on BP neural network. Total Corros. Control 9: 67–71.

Zhang, Y., Zhou, X., Shi, H., Zheng, Z., and Li, S. (2015). Corrosion pitting damage detection of rolling bearings using data mining techniques. Int. J. Model. Identif. 24: 235–243. https://doi.org/10.1504/ijmic.2015.072614.

Zhao, X., Zhou, Y., Zhao, J., Zhan, G., and Yang, P. (2016). Safety prediction of soleplate corrosion state in petroleum storage tank based on grey theory model. Chem. Eng. Trans. 51: 271–276.

Zhi, Y., Fu, D., Li, Z., and Qing, X. (2015). Reasoning of atmospheric corrosion level under missing data based on CMAC and Bayesian network. In: 34th Chinese Control Conference. IEEE, Hangzhou, China, pp. 3447–3451.

Zhi, Y., Fu, D., Zhang, D., Yang, T., and Li, X. (2019). Prediction and knowledge mining of outdoor atmospheric corrosion rates of low alloy steels based on the random forests approach. Metals 9: 383. https://doi.org/10.3390/met9030383.

Zhi, Y., Yang, T., and Fu, D. (2020). An improved deep forest model for forecast the outdoor atmospheric corrosion rate of low-alloy steels. J. Mater. Sci. Technol. 49: 202–210. https://doi.org/10.1016/j.jmst.2020.01.044.

Zounemat-Kermani, M., Stephan, D., and Barjenbruch, M. (2020). Ensemble data mining modeling in corrosion of concrete sewer: a comparative study of network-based (MLPNN and RBFNN) and tree-based (RF, CHAID, and CART) models. Adv. Eng. Inform. 43: 101030. https://doi.org/10.1016/j.aei.2019.101030.

Received: 2022-09-20

Accepted: 2023-02-23

Published Online: 2023-04-17

Published in Print: 2023-08-28

https://doi.org/10.1515/corrrev-2022-0089
