Handling imbalanced data in supervised machine learning for lithological mapping using remote sensing and airborne geophysical data

Hary Nugroho; Ketut Wikantika; Satria Bijaksana; Asep Saepuloh

doi:10.1515/geo-2022-0487

Article Open Access

Published/Copyright: August 4, 2023

From the journal Open Geosciences Volume 15 Issue 1

Abstract

With balanced training sample (TS) data, learning algorithms offer good results in lithology classification. However, first-time lithological mapping in remote places is expected to be difficult, resulting in limited and unbalanced samples. To address this issue, we can use a variety of techniques, including ensemble learning (such as random forest [RF]), over/undersampling, class weight tuning, and hybrid approaches. This work investigates and analyses several strategies for dealing with imbalanced data in lithological classification based on RF algorithms with limited drill log samples using remote sensing and airborne geophysical data. The research was carried out at Komopa, Paniai District, Papua Province, Indonesia. The class weight tuning, oversampling, and balanced class weight procedures were used, with TSs ranging from 25 to 500. The oversampling approach generally outperformed the class weight tuning and balanced class weight procedures, with the following metric values: 0.70–0.80 (testing accuracy), 0.43–0.56 (F1 score), and 0.32–0.59 (Kappa score). The visual comparison also revealed that the oversampling strategy gave the most reliable classifications: if the imbalance ratio is proportionate to the coverage area in each lithology class, the classifier capability is optimal.

Keywords: lithological map; machine learning; random forest algorithm; imbalanced data; class weight tuning; oversampling; imbalance ratio

1 Introduction

1.1 Background

Lithology classification is critical for geological investigation and mineral resource prospecting. However, the standard lithological mapping method is knowledge-driven, requires experts, and is time-consuming. As a result, data-driven lithological mapping technologies, such as machine learning algorithms (MLAs) capable of rapidly processing enormous amounts of data, have been developed [1,2,3]. Machine learning applied to remote sensing data combined with airborne geophysical data has been extensively researched as a way to map lithology over large areas efficiently and in a timely manner [4,5,6,7].

Advances in machine learning have improved lithological mapping. Random forest (RF) is one of the most recent techniques utilized in this field, and it has shown promising results in accurately predicting lithology from well logs [8] and seismic data [9]. Another technique that has gained prominence in machine learning is semantic analysis [10]. This method extracts meaning from natural language documents using statistical and machine learning techniques. It has been used in image recognition, natural language processing, and speech recognition, among other areas. Semantic analysis can be used in lithological mapping to extract geological information from textual descriptions of lithology [11] and to reflect the contextual meaning of the terms [12], thereby assisting in predicting lithology types. The combination of RF with semantic analysis [13,14] can result in more accurate and efficient lithological mapping methods, which has significant implications for a variety of industries, such as oil and gas exploration, geothermal energy, and groundwater resource management.

The RF learning algorithm has been identified as a highly accurate and robust MLA for lithology classification using remote sensing data, with the ability to classify lithology types in each pixel of the study area and generate robust prediction uncertainties, making it a valuable tool for determining the spatial distribution of lithology [15]. Many studies have used MLAs, such as the RF algorithm, for lithology classification [6,7,16,17,18]. The RF algorithm is a supervised machine learning approach that is used to categorize lithology in each pixel in the study area. This algorithm is an ensemble learner that has been recognized as a good classifier, moderately robust to outliers and noise [8,19], simple to use, and a highly accurate approach for inferring the spatial distribution and generating robust prediction uncertainties [20]. For instance, Cracknell and Reading’s research [21] revealed that RF is straightforward to train, computationally efficient, highly stable in the face of changes in classification model parameter values, and as accurate as, or significantly more accurate than, the other MLAs tested. Meanwhile, Harris and Grunsky [22] used similar techniques. They explained how the RF algorithm can map geology and how it can reduce classification uncertainty on prediction maps.

MLA implementation requires training samples (TSs) to carry out the “learning” process. TSs that do not have the same number of samples for all lithology classes (imbalanced) are more likely to bias the classification toward the classes that hold the majority of the samples. Furthermore, imbalance complicates the identification of regularities, particularly the homogeneity patterns in the minority class [23]. As a result, classifiers tend to have excellent accuracy for the majority class while obtaining poor results for the minority class [24]. In such instances, the minority class is frequently the most critical; hence, measures to boost its recognition rates are required [25]. Generally, a classifier is trained to maximize accuracy. In an imbalanced scenario, maximizing the overall accuracy may not be the most beneficial strategy [26] because, even if the prediction model yields a high overall accuracy, the minority class may remain largely unrecognized [27,28]. Collecting TSs for lithological mapping in areas that are difficult to access is challenging. Because the samples are few and unevenly distributed, to the point of absolute rarity, the resulting data are imbalanced. This means that the minority class lacks sufficient samples for the classifier to learn the decision boundaries [29].

The imbalance ratio (IR) describes how uneven a dataset is [30]. The IR compares the number of samples in the majority class (negative samples) with the number in the minority class (positive samples), for example 100:1 or 100,000:1. Although the IR is used to rank diverse datasets, it does not always provide an accurate estimate of how difficult a dataset is to classify [24]. Instead, it indicates how many instances of the most likely class occur for each instance of the least likely class [31]. Minority or rare classes are difficult to identify because they occur infrequently and irregularly, and misclassifying rare classes results in high costs [27]. In this respect, it is more crucial to predict or identify the rare classes accurately than the common classes [29]. Furthermore, classical classifiers are prone to bias toward the majority class and therefore perform poorly for the minority class.

There are two strategies for overcoming data imbalance problems: (i) improvements at the data level and (ii) improvements at the algorithm level [24,30,32]. At the data level, the data are preprocessed and the class distribution is rebalanced through resampling, by applying over- or undersampling techniques [32]. Solutions for dealing with unbalanced classes at the data level primarily focus on changing the class distribution to get a more balanced sample [33]. At the algorithm level, the learning procedure itself is adjusted, for example through ensemble learning approaches (e.g., RF) and cost-sensitive learning (CSL) techniques [24,32]. In addition, combinations of these two levels are called hybrid methods [25], for instance, RF-over/undersampling and RF-CSL.

Improvements at the data level include numerous techniques, for example, oversampling for small classes, undersampling for large classes, informative oversampling for small classes, informative undersampling for large classes, oversampling of small classes using synthetic data, and combinations of these techniques [32]. The oversampling method works on the minority class by replicating its data until the classes become balanced. The key advantage is that it minimizes information loss; the disadvantage is that it merely duplicates data and can lead to overfitting [3].

CSL (also known as class weight tuning) is a method for modifying algorithms at the algorithm level [24,25,34]. It comprises a set of algorithms that are sensitive to the different costs associated with a classification problem. CSL trains classifiers to focus on the classes with greater costs, which are prioritized. One way to express these costs is per class: if the classifier misidentifies a specific class, it pays a substantial penalty (cost). The algorithm is modified by assigning weights to the majority and minority classes, and the difference in weight affects the classification during training. The adjustment penalizes misclassification of the minority class by increasing its class weight and decreasing the weight of the majority class. According to Karlhede [35], the weights should enter both the calculation of the Gini index, which determines the best feature for splitting the data at a decision node, and the assignment of a class label to a leaf. However, the specified weights have a limit. When a minority class is given a very high weight, the algorithm is likely to become biased toward that class, lowering the model’s overall performance [36].
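To make the effect of class weights on the Gini index concrete, the following is a minimal sketch, not the implementation used in the study. It assumes the common convention in which each class count at a node is multiplied by its class weight before the class proportions are computed. The function name `weighted_gini` and the node counts are hypothetical.

```python
import numpy as np


def weighted_gini(class_counts, class_weights):
    """Gini impurity of a node after scaling each class count by its weight.

    Assumption: weights act as multipliers on the per-class sample counts.
    """
    counts = np.asarray(class_counts, dtype=float)
    weights = np.asarray(class_weights, dtype=float)
    weighted = counts * weights
    total = weighted.sum()
    if total == 0:
        return 0.0
    proportions = weighted / total
    return float(1.0 - np.sum(proportions ** 2))


# Hypothetical node holding 9 majority samples and 1 minority sample.
unweighted = weighted_gini([9, 1], [1.0, 1.0])  # minority barely affects impurity
reweighted = weighted_gini([9, 1], [1.0, 9.0])  # minority now counts as much as majority
```

With equal weights the node looks almost pure, so a tree has little incentive to split off the single minority sample. Once the minority weight is raised, the same node looks maximally mixed, which is the mechanism by which class weighting shifts the splits toward the minority class.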

In lithological mapping, data imbalances are common. However, little research has looked into the effect of imbalances and procedures on classification accuracy, especially in small sample sizes [17,33]. The degree of imbalance may have an impact on the performance of predictive models trained on uneven data. For some samples, the imbalance can have a considerable impact on the classification quality [37]. This work investigates and analyzes multiple approaches to handling unbalanced data in lithological classification based on RF algorithms, using remote sensing and airborne geophysical data with limited borehole log samples in Komopa, Papua Province, Indonesia. These RF algorithms used oversampling and class weight tuning techniques to generate two hybrids. The oversampling method was chosen because it has successfully addressed the problem in lithological mapping [3]. The class weight tuning approach was used since it was relatively considerable, and many studies indicate successful outcomes when a large number of TSs are used [38,39]. Unfortunately, no research has used this method for lithological mapping using a limited sample collection at this time.

1.2 RF classifier

Breiman and Cutler developed RF at the University of California, Berkeley [19,22]. This algorithm has evolved into a trustworthy and useful classifier for scientists in a variety of fields, including geology [1,40]. Numerous lithological mapping investigations have demonstrated that this technique outperforms other algorithms [1,22,41]. RF is an excellent starting point for classifying multi-dimensional data. The technique can also generate a ranked list of the best predictors without requiring normally distributed TSs [42].

RF is a supervised MLA that builds several decision trees and combines their decisions (an ensemble classification method). To generate many decision trees, this approach uses bootstrap aggregation [19]. Like other supervised classification algorithms, it requires training data (i.e., locations of different rock types) [22]. For each tree, the algorithm draws TSs at random with replacement, so that about two-thirds of the TSs form an “in-bag” data set. The in-bag data are used to grow a decision tree using the Gini index [43], which identifies the best split for a particular class, while the remaining data (“out-of-bag” data) are used to validate the model. The main advantage of this algorithm is that it predicts classes by aggregating the outputs of many decision trees, which improves predictions and reduces errors caused by outliers [41].
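The in-bag and out-of-bag split described above can be illustrated with a short sketch. It assumes a standard bootstrap (as many draws as there are samples, with replacement), for which the expected share of distinct in-bag samples is about 63%, close to the "two-thirds" quoted in the text. The variable names and the random seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500  # size of the largest training set in this study

# Bootstrap sample: draw n_samples indices with replacement.
in_bag = rng.integers(0, n_samples, size=n_samples)
in_bag_unique = np.unique(in_bag)

# Samples never drawn form the out-of-bag set for this tree.
out_of_bag = np.setdiff1d(np.arange(n_samples), in_bag_unique)

in_bag_fraction = in_bag_unique.size / n_samples
# Expected share of distinct samples: 1 - (1 - 1/n)^n, which tends to 1 - 1/e.
expected_fraction = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
```

Each tree sees a different bootstrap sample, so every TS is out-of-bag for roughly a third of the trees, and those trees can be used to validate the prediction for that sample.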

1.3 The objectives

Because of the restricted number of TSs, the problem of data imbalance calls for research into the number of samples required to achieve accurate classifications. Moreover, few studies have described the behavior of minority data and its association with the IR [28]. The specific goals of this research are therefore to determine: 1) which method produces the best classifications; 2) the effect of the number of TSs on the accuracy of the classifications; 3) the behavior of minority data in the classification process; and 4) the effect of the IR on the classifier’s ability. The quality of the classifications was determined not only by performance metrics computed from confusion matrices but also by a visual comparison of the lithological map obtained from the RF classification with the existing map. In lithological mapping, performance metrics alone are deemed insufficient to describe classification success. The visual analysis revealed how similar the distribution of lithological types and boundaries was to the existing map.

This study included the following experiments. First, both hybrid approaches were applied to imbalanced data sets of varying size, ranging from 25 to 500 samples selected by simple random sampling. These experiments identify the number of samples, the IR, and the distribution of the minority TSs that produce the best outcomes. Next, the classification results of these two hybrids were compared with those of the balanced class weight technique and the balanced TS. The balanced class weight approach assigns equal weight to each lithology class. The balanced TS, on the other hand, is a model with the same number of TSs for each class, selected by stratified random sampling with 25 and 50 samples.

2 Materials and methods

2.1 Study areas and data used

2.1.1 Study area

The research area is the Komopa area in the Paniai District, Papua Province, Indonesia (the western part of New Guinea Island) (Figure 1). We chose this area because it already had a lithological map, made by Mine Serve International (MSI) using conventional geological mapping techniques, that provides an accurate reference map. In 2000, MSI created a lithological map of this area at a scale of 1:25,000 [44]. Furthermore, MSI has carried out a series of geophysical surveys and logged a large number of boreholes, which makes this area well suited to lithological mapping trials with several geoscience datasets and TSs. According to the technical report written by Skead [45], the Komopa area is bounded to the south by the Aga and Bogodide Rivers, both the southwest and southeast of the Wodege River, respectively. The Komopa area has moderate relief, with elevations ranging from 1,720 to 2,200 m. The terrain consists of 600–1,200-m-wide valleys filled with alluvial material and divided by low hills. The survey area lies between longitudes 136°28′3.88″ E and 136°33′54.13″ E and latitudes 3°44′38.86″ S and 3°48′59.32″ S. The majority of the land is densely forested, with trees reaching heights of 30 m and thick humus covering the ground (Figure 2b). Quaternary alluvium, pseudo gossan, inferred porphyry, sedimentary rocks, and undifferentiated porphyry are the lithologies present in the area (as shown in Figure 2a). Pseudo gossan is a more detailed lithology class than the others. Nevertheless, MSI used this class as a guide for their mineral exploration.

Figure 1

The area of interest is in the middle of Papua Province (Indonesia), presented by a red rectangle [48].

Figure 2

(a) A local lithological map of Komopa, scale 1:25,000, released by MSI in 2000, indicated the variation of rock types in different colors [44]. (b) Study Area, Komopa, Paniai District, Papua Province, presented by the color composite of Sentinel 2A RGB:11-08-02.

According to Glover Consulting’s geological survey reports for MSI [46], New Guinea Island is a well-known late Miocene–Pliocene copper–gold porphyry province with associated gold resources. Island arc volcanism with co-magmatic plutons is linked to mineralization. The island was formed by a collision between the Australian plate and the southward-migrating Pacific plate. A prominent central fold belt has formed and is split into three east–west trending structural zones: 1) the southern coastal strip comprising Palaeozoic and Tertiary shelf sediments; 2) the central mobile belt comprising Palaeozoic, Mesozoic, and Tertiary sediments, which are folded and faulted by high-angle reverse thrusts and strike-slip faults; and 3) the northern belt of Tertiary sediments and volcanics. The central mobile belt has been intruded by diorite to quartz-monzonite stocks, which are linked to skarn and porphyry copper–gold mineralization. Both intrusion and mineralization are considered structurally controlled [47].

2.1.2 Field sampling data

The study area has no outcrops. To collect TSs, soil drilling was conducted to depths of 0–3 m, 3–6 m, or until bedrock was observed, using winkie or Longyear drilling [45]. There are 1,002 borehole log samples available for this research, of which 500 were used as TSs and the remaining 502 as testing samples.

2.1.3 Remote sensing and airborne geophysical data

The remote sensing data used were composed of Sentinel-2A, Advanced Land Observing Satellite (ALOS)-Phased Array type L-band Synthetic Aperture Radar (PALSAR), and Digital Elevation Model (DEM) data. The Sentinel-2A image, acquired on January 09, 2019, is a wide-swath (290 km), high-resolution, and multispectral dataset. The Multispectral Instrument measures the Earth’s reflected radiance in 13 spectral bands: (i) four visible and near infrared bands (B2, B3, B4, and B8) at 10 m of spatial resolution; (ii) four vegetation red edge bands (B5, B6, B7, and B8a) and two short-wave infrared (SWIR) bands (B11 and B12) at 20 m of spatial resolution; and (iii) three bands (B1, B9, and B10 for aerosol, water vapor, and cirrus SWIR, respectively), at 60 m of spatial resolution [49]. This study only used the Sentinel-2A bands with a spatial resolution of 10 and 20 m.

The Level-2A granule was obtained from the United States Geological Survey EarthExplorer open access hub (https://earthexplorer.usgs.gov/; image code in Supplementary Material S1). A granule is the minimum indivisible partition of a product (containing all possible spectral bands). Granules, also known as tiles, are 100 × 100 km² ortho-images in the UTM/WGS84 projection [50].

The study area was projected to Universal Transverse Mercator (UTM) zone 53S. The Level-2A image data provide Bottom of Atmosphere reflectance derived from the associated Level-1C image data. These images can be used directly in downstream applications without further processing [50]. Before use, the influence of vegetation on the image data was removed using the Vegetation Suppression tool available in the Environment for Visualizing Images (ENVI) software [51].

The radar data used were ALOS PALSAR Fine Beam Double Polarization Data (FBD), L-band, 3.17 m × 14.9 m (azimuth × range) resolution dual-polarization (HH + HV). The data were collected on July 02, 2007, with ALOS’s Single Look Complex (SLC) data type. The granule data were obtained from https://www.asf.alaska.edu/ (Image code in Supplementary Material S2).

The DEM data utilized were DEMNAS (National DEM) from the Geospatial Information Agency, Republic of Indonesia. The National DEM was developed from several data sources, such as IFSAR (5 m resolution), TERRASAR-X (5 m resolution), and ALOS PALSAR (11.25 m resolution), by including the stereo-plotting mass point data from aerial photos. The spatial resolution of DEMNAS is 0.27-arcsecond (8.1 m), using the EGM2008 vertical datum [52].

The airborne geophysical data comprised magnetic, electromagnetic, and radiometric data. Residual field magnetic data effectively reflect the distribution of magnetic material within the survey area. In nature, the most dominant magnetic mineral is magnetite (Fe₃O₄). Lithologies containing even small amounts of magnetite will produce distinctive magnetic properties [47]. Electromagnetic data give precise information on the structure and lithological variations. Airborne electromagnetics data can deliver beneficial additional information, whereas magnetic data might only offer minimal information [47]. A quantity of 30 cm of rock, 60 cm of soil, or 1 m of water effectively obscures the underlying radiation sources no matter how intense owing to the minimal depth of penetration and the inherently complex nature of the gamma-ray spectra. Thus, the radiometric data reflect lithological variations on the surface [47]. Magnetic and radiometric data surveys were carried out simultaneously with a flight spacing of 400 m. The electromagnetic data survey was performed separately with a flight spacing of 100 m.

All the datasets used are shown in Table 1.

Table 1

A summary of the data used for lithology classification

Dataset	Acquisition date	Data	Resolution/Flight spacing	Source
Sentinel 2A	January 09, 2019	Band 2	Res. 10 m	USGS
		Band 3	Res. 10 m
		Band 4	Res. 10 m
		Band 5	Res. 20 m
		Band 6	Res. 20 m
		Band 7	Res. 20 m
		Band 8	Res. 10 m
		Band 8a	Res. 20 m
		Band 11	Res. 20 m
		Band 12	Res. 20 m
DEM	No Data	DEM	Res. 8.1 m	Geospatial Information Agency, Indonesia
ALOS PALSAR	July 02, 2007	HH	Res. 3.17 × 14.9 m²	www.asf.alaska.edu
		HV
		HH-HV
Magnetic	August 6-25, 1993	RTP	400 m flight spacing	MSI
Electromagnetic	August 6-25, 1993	2 kHz	100 m flight spacing	MSI
		20 kHz
		36 kHz
Radiometric	August 6-25, 1993	Thorium (Th)	400 m flight spacing	MSI
		Potassium (K)
		Uranium (U)
		K/Th ratio

2.2 Methodology

Figure 3 illustrates the research methodology for this study. It primarily comprises five parts: (1) preprocessing of remote sensing and geophysical data; (2) training and testing data preparation; (3) lithological classification utilizing the RF algorithm; (4) accuracy evaluation; and (5) visual comparative assessment.

Figure 3

Flowchart of research methodology to categorize lithology based on imbalanced or balanced data.

2.2.1 Data preprocessing

Preprocessing was performed for the Sentinel-2A, DEM, and ALOS PALSAR data, which contain information on geological and lithological features for mineral exploration [53] beneath the surface layer. The preprocessing stage started with the Sentinel-2A data and involved two steps. First, we used the Sentinel Application Platform software from the European Space Agency to carry out atmospheric correction and orthorectification [54]. Second, we eliminated vegetation spectral signatures using the Vegetation Suppression tool from ENVI [51].

The ALOS PALSAR data types applied were FBD, L-band, 3.17 m × 14.9 m (azimuth × range) resolution dual-polarization (HH + HV) with SLC data type. The Japan Aerospace Exploration Agency-Earth Observation Research Center obtained the data on July 02, 2007, and preprocessed it (orthorectification, slope correction, and mosaicking). SAR calibration aims to provide imagery where the pixel values may correlate directly with the scene’s radar backscatter [55]. Hence, it is crucial to apply the radiometric correction to SAR images for the pixel values to represent the radar backscatter of the reflecting surface. The SLC data were converted to ground range detected data, and a spatial resolution of 20 m × 20 m was acquired. The data were in the form of digital number intensities, which were then processed into HH-HV polarization backscattering data [56].

The geophysical data preprocessing began with the magnetic data. Filtering against a specified range of frequencies from the dataset was followed by compilation and leveling work, non-linear filtering, gridding, spectral analysis, low-pass filtering, decorrugation, international geomagnetic reference field removal, contouring, and reduction to pole [47]. The magnetic data used for this classification were reduced to pole (RTP). The radiometric data comprised Thorium (Th), Potassium (K), Uranium (U), and the Potassium-to-Thorium ratio (K/Th). The process started with validating the data using preliminary grids and stacked profiles. Then, the minimum curvature gridding process, decorrugation using linear contour intervals, and color contour plots were constructed using an equal area distribution [47]. The electromagnetic data included the 2, 20, and 36 kHz frequencies and were processed in two stages, namely quality control (frequency and time domain) and 1D inversion (layered model and inversion result) [57].

The next stage involved resampling all the data to a common pixel size of 20 m (24 features) using inverse distance weighted interpolation. Finally, we normalized each dataset to a common range (between 0 and 1) to avert adverse effects on classification [3].
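The normalization step can be sketched as follows. The text only states that each dataset was scaled to the range 0 to 1, so min-max scaling per feature is an assumption here, and the function name `min_max_normalize` and the sample elevation values are hypothetical.

```python
import numpy as np


def min_max_normalize(band):
    """Rescale one feature raster to [0, 1], ignoring NaN (no-data) cells.

    Assumption: plain min-max scaling, applied to each feature separately.
    """
    band = np.asarray(band, dtype=float)
    lo, hi = np.nanmin(band), np.nanmax(band)
    if hi == lo:
        # A constant raster carries no contrast; map it to zeros.
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)


# Hypothetical 2 x 2 elevation patch within the 1,720-2,200 m range of the study area.
demo = np.array([[1720.0, 1960.0], [2080.0, 2200.0]])
scaled = min_max_normalize(demo)
```

Scaling every feature separately keeps variables with large numeric ranges, such as elevation in metres, from dominating variables with small ranges, such as reflectance.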

2.2.2 Training and testing data preparation

Table 2 shows the number and distribution of training and testing samples used in the lithology classifications. All the imbalanced samples were acquired by simple random sampling, whereas the balanced samples were obtained by stratified random sampling. Figure 4 depicts the spatial distribution of the training and testing samples.

Table 2

Training and testing sample size and distribution

Model	Number of TSs					Sum
Model	Inferred porphyry	Pseudo gossan	Quaternary alluvium	Sedimentary rocks	Undifferentiated porphyry	Sum
Stratified Random
Stratified Random 25	5	5	5	5	5	25
Stratified Random 50	10	10	10	10	10	50
Simple Random
Simple Random 25	2	2	2	5	14	25
Simple Random 50	2	2	3	12	31	50
Simple Random 100	4	2	8	22	64	100
Simple Random 200	5	2	12	41	140	200
Simple Random 300	6	3	16	67	208	300
Simple Random 400	7	4	19	89	281	400
Simple Random 500	10	6	23	107	354	500
	Number of Testing Samples					Sum
	8	9	17	124	344	502
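The degree of imbalance in each training set of Table 2 can be summarized with a short calculation. The sketch below assumes that, for a multi-class problem, the IR is taken as the size of the largest class divided by the size of the smallest class; the study may define it differently. The counts are copied from Table 2 in the column order inferred porphyry, pseudo gossan, quaternary alluvium, sedimentary rocks, undifferentiated porphyry.

```python
# Training-sample counts per class, copied from Table 2.
train_counts = {
    "Stratified Random 25": [5, 5, 5, 5, 5],
    "Stratified Random 50": [10, 10, 10, 10, 10],
    "Simple Random 25": [2, 2, 2, 5, 14],
    "Simple Random 50": [2, 2, 3, 12, 31],
    "Simple Random 100": [4, 2, 8, 22, 64],
    "Simple Random 200": [5, 2, 12, 41, 140],
    "Simple Random 300": [6, 3, 16, 67, 208],
    "Simple Random 400": [7, 4, 19, 89, 281],
    "Simple Random 500": [10, 6, 23, 107, 354],
}


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (assumed IR definition)."""
    return max(counts) / min(counts)


ratios = {name: imbalance_ratio(counts) for name, counts in train_counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: IR = {ratio:.1f}")
```

Under this definition the stratified sets have an IR of 1, while the simple random sets range from 7 (25 samples) to about 70 (200 and 400 samples), so a larger training set does not by itself reduce the imbalance.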

Figure 4

The distribution of training and testing sample datasets with variations in the number and type of lithology. Figure (a)–(g) are TSs with a simple random distribution with a sample size variation of 25 to 500. Figure (h) and (i) depict stratified random distributions with 25 and 50 TSs. Figure (j) represents the distribution of testing samples, with a total of 502, where the class distribution is imbalanced.

The class weight tuning technique was applied using the GridSearchCV utility from Scikit-learn [58]. Class weight tuning was only applied to the minority classes, namely inferred porphyry, pseudo gossan, and quaternary alluvium. Weight tuning is a search for the weights that give the highest training accuracy and F1 score. The search was performed in two stages. In the first stage, a series of classifications was run with widely spaced weight values for the minority classes, for instance, 0.1, 1, 2, 5, 10, 15, and 20. At this stage, each of these classes was given the same weight, and the resulting training accuracy and F1 score were examined. The aim was to find the interval of weight values with the highest training accuracy and F1 scores. Once the best interval was found, the second stage searched for the best combination of minority class weights. In this stage, the classification was performed as in the first stage using GridSearchCV, but with weights drawn from the interval obtained in the first stage and stepped in relatively small increments. For instance, when the interval is between 0.1 and 2.0, the weight is increased by 0.1 at each step.
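The first (coarse) stage of this search might look like the sketch below. It is an illustration under stated assumptions rather than the authors' script: the data are a synthetic three-class stand-in generated with `make_classification`, the macro-averaged F1 score under 3-fold cross-validation is an assumed selection criterion, and the forest size is reduced to keep the example fast. Only `GridSearchCV`, `RandomForestClassifier`, and the `class_weight` parameter come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: class 0 is the majority, classes 1 and 2 are minorities.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_redundant=0,
    n_classes=3, weights=[0.8, 0.15, 0.05], random_state=0,
)

# Stage 1: widely spaced weights, the same value for every minority class,
# while the majority class keeps a weight of 1.
coarse_weights = [0.1, 1, 2, 5, 10, 15, 20]
param_grid = {"class_weight": [{0: 1, 1: w, 2: w} for w in coarse_weights]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="f1_macro",  # assumed criterion; the text also examines accuracy
    cv=3,
)
search.fit(X, y)

best_weights = search.best_params_["class_weight"]
print("Best coarse weights:", best_weights)
print("Cross-validated macro F1:", round(search.best_score_, 3))
```

The second stage would reuse the same call with a finer grid inside the best interval, for example weights from 0.1 to 2.0 in steps of 0.1, and with a separate weight for each minority class.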

In the oversampling approach, the data are prepared by replicating TSs (an augmentation technique). Each minority class is replicated until its number of TSs equals that of the majority class. Finally, in the balanced class weight method, all classes are allotted the same weight, equal to 1 (known as a classic RF).
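A minimal sketch of this replication step is given below, assuming plain random oversampling with replacement; the exact replication scheme used in the study is not specified beyond matching the majority class. The function name `random_oversample` is hypothetical, and the demonstration uses the class counts of the "Simple Random 25" set from Table 2 with placeholder features.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Replicate samples of each class at random until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Keep every original sample and add random duplicates up to the target.
        extra = rng.choice(idx, size=target - count, replace=True)
        selected.append(np.concatenate([idx, extra]))
    selected = np.concatenate(selected)
    return X[selected], y[selected]


# Class counts of the "Simple Random 25" training set (Table 2): 2, 2, 2, 5, 14.
y_demo = np.repeat(np.arange(5), [2, 2, 2, 5, 14])
X_demo = np.arange(y_demo.size).reshape(-1, 1)  # placeholder features

X_bal, y_bal = random_oversample(X_demo, y_demo)
print(np.bincount(y_bal))
```

For this set, every class ends up with 14 samples (70 in total), but the three smallest classes still contain only two distinct samples each, which is why duplication alone can encourage overfitting.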

2.2.3 Lithological classification using RF algorithm

At this point, the lithology classification was carried out using the RF algorithm implemented in the Scikit-learn library for Python. RF is an ensemble learning algorithm that comprises many decision trees [19]. RF uses decision trees as base learners [59], and each tree is grown according to several hyperparameters. Hyperparameters set the number of decision trees (n_estimators) and control the decision tree structure [60]. In addition, the class_weight hyperparameter holds the class weights, which are set before the decision trees are built. In this research, the hyperparameters were left at the default values of Scikit-learn [61]. The exception was the class weight tuning method, where the class_weight hyperparameter for the minority classes followed the outcomes of the class weight tuning, while the majority classes received a weight of 1. In RF, the individual decision trees are built from many randomly generated subsets, and the class with the most votes is taken as the final classification result [19]. Furthermore, each decision tree is built from training data drawn through the bootstrap aggregation (bagging) process [62]. About two-thirds of the TSs in the dataset are used for training, and the remaining one-third is used for internal model validation.
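The configuration described above can be sketched with Scikit-learn as follows. This is an illustration, not the study's code: the data are synthetic, the minority weight of 5 is a hypothetical placeholder for a tuned value, and `oob_score=True` is switched on here only to expose the out-of-bag validation mentioned in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the pixel features and lithology labels.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_redundant=0,
    n_classes=3, weights=[0.8, 0.15, 0.05], random_state=0,
)

rf = RandomForestClassifier(
    n_estimators=100,                 # number of decision trees
    class_weight={0: 1, 1: 5, 2: 5},  # majority weight 1; minority weights hypothetical
    oob_score=True,                   # score each sample with trees that did not see it
    random_state=0,
)
rf.fit(X, y)

oob_accuracy = rf.oob_score_  # internal validation on out-of-bag samples
labels = rf.predict(X[:5])    # majority vote across the trees
print("Out-of-bag accuracy:", round(oob_accuracy, 3))
```

Leaving `class_weight` at its default of `None` gives every class a weight of 1, which corresponds to the classic RF described as the balanced class weight method.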

2.2.4 Accuracy evaluation

The performance of the classifications was evaluated quantitatively by computing metrics from confusion matrices. Accuracy was assessed from different viewpoints, including the overall testing accuracy, precision, recall, F1 score, and Kappa score [63]. The 502 testing samples from the borehole log data were used to compute the metrics. The TSs in the RF classification were used in cross-validation to determine the overall training accuracy.

The overall testing accuracy is the ratio of correctly classified pixels to the total number of pixels in the confusion matrix. Under imbalanced data conditions, the majority class dominates the classification, which allows even a poor model to achieve high accuracy, depending on the degree of imbalance. Hence, metrics other than accuracy, such as precision, recall, and F1 score, are required to measure classification performance. Precision quantifies the proportion of positive class predictions that actually belong to the positive class. Recall quantifies the proportion of all positive examples in the dataset that are predicted as positive. When applied alone, neither precision nor recall tells the whole story [63]. The F1 score is the harmonic mean of precision and recall and thus combines both in a single value. A good algorithm should simultaneously maximize precision and recall [64]. We used equations (1)–(4) for the calculation of accuracy, precision, recall, and F1 score, respectively [65]:

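(1) $\mathrm{Accuracy} = \dfrac{\sum_{i=1}^{5} N_{ii}}{\sum_{i=1}^{5} \sum_{j=1}^{5} N_{ij}},$

(2) $\mathrm{Precision}_i = \dfrac{N_{ii}}{\sum_{k=1}^{5} N_{ki}},$

(3) $\mathrm{Recall}_i = \dfrac{N_{ii}}{\sum_{k=1}^{5} N_{ik}},$

(4) $\mathrm{F1\ score}_i = \dfrac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}.$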

The confusion matrix is obtained according to a combination of actual and predicted values, as demonstrated in Table 3.

Table 3

Five-class confusion matrix

		Predicted grade
		I	II	III	IV	V
Actual grade	I	N₁₁	N₁₂	N₁₃	N₁₄	N₁₅
	II	N₂₁	N₂₂	N₂₃	N₂₄	N₂₅
	III	N₃₁	N₃₂	N₃₃	N₃₄	N₃₅
	IV	N₄₁	N₄₂	N₄₃	N₄₄	N₄₅
	V	N₅₁	N₅₂	N₅₃	N₅₄	N₅₅

Description: Class I: inferred porphyry, Class II: pseudogossan, Class III: quaternary alluvium, Class IV: sedimentary rocks, Class V: undifferentiated porphyry.
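As a minimal sketch of equations (1)–(4), the function below computes accuracy and per-class precision, recall, and F1 score from a confusion matrix whose rows are actual classes and whose columns are predicted classes, as in Table 3. The 3 × 3 matrix is a made-up toy example, not data from the study. Returning 0 when a denominator is zero is an assumed convention, chosen because it matches the zeros reported for undetected classes.

```python
def per_class_metrics(cm):
    """Return (accuracy, [(precision, recall, f1), ...]) for a square matrix.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    metrics = []
    for i in range(n):
        predicted_as_i = sum(cm[k][i] for k in range(n))  # column sum, eq. (2)
        actually_i = sum(cm[i])                           # row sum, eq. (3)
        precision = cm[i][i] / predicted_as_i if predicted_as_i else 0.0
        recall = cm[i][i] / actually_i if actually_i else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0  # eq. (4)
        metrics.append((precision, recall, f1))
    return accuracy, metrics


# Hypothetical toy matrix (rows: actual, columns: predicted).
toy_cm = [
    [8, 2, 0],
    [1, 5, 4],
    [0, 0, 10],
]
accuracy, metrics = per_class_metrics(toy_cm)
macro_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)
print(f"accuracy={accuracy:.3f}, macro F1={macro_f1:.3f}")
```

Averaging the per-class values without weighting (the macro average computed at the end) gives every class the same influence regardless of its size, which is why it can sit well below the overall accuracy when minority classes are poorly detected.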

The Kappa score is another performance metric. It tests inter-rater reliability, that is, the extent to which the classified data correctly represent the measured variable (equation (5)). Simpler measures of inter-rater reliability are generally expressed as a percentage of agreement (i.e., the sum of the agreement scores divided by the total score) [66].

(5) $\kappa = \dfrac{N \sum_{i=1}^{n} m_{i,i} - \sum_{i=1}^{n} (G_i C_i)}{N^2 - \sum_{i=1}^{n} (G_i C_i)}$,

where $i$ represents the class number, $n$ is the number of classes, $N$ is the total number of classified values compared to truth values, and $m_{i,i}$ signifies the number of successfully categorized values for truth class $i$ (the diagonal entries of the confusion matrix). The total truth values and forecasted values corresponding to class $i$ are designated as $C_i$ and $G_i$, respectively [67].
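Equation (5) can be evaluated directly from a confusion matrix: the first sum runs over the diagonal counts, and the products G_i C_i pair the row and column totals of each class. The sketch below does this for a made-up 3 × 3 matrix (not data from the study); the function and variable names are illustrative.

```python
def kappa_score(cm):
    """Kappa from a square confusion matrix, following equation (5)."""
    n = len(cm)
    n_total = sum(sum(row) for row in cm)                 # N
    agreed = sum(cm[i][i] for i in range(n))              # sum of diagonal
    row_totals = [sum(cm[i]) for i in range(n)]           # actual per class
    col_totals = [sum(cm[k][i] for k in range(n)) for i in range(n)]  # predicted
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n_total * agreed - chance) / (n_total ** 2 - chance)


# Hypothetical toy matrix (rows: actual, columns: predicted).
toy_cm = [
    [8, 2, 0],
    [1, 5, 4],
    [0, 0, 10],
]
kappa = kappa_score(toy_cm)
print(f"kappa = {kappa:.2f}")
```

Because each class contributes the product of its row and column totals, swapping the roles of truth and prediction leaves the result unchanged.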

2.2.5 Visual comparative assessment

A comparative visual assessment was performed to compare the outcomes of the RF classification with the existing lithology map. The quantitative and visual evaluations must be performed together because performance metrics alone cannot describe the quality of the classification results with regard to the location of the identified lithology classes and their boundaries. Low performance metrics do not necessarily signify poor classifications, and vice versa; this becomes apparent when the classification results are visualized and compared with the existing map. The distribution of lithological types and boundaries obtained from the RF classifications was compared with the existing map from MSI. The comparison determined the best approach for handling imbalanced data and for providing a lithological map closest to the existing map.

Both evaluations were also performed specifically for the minority lithology classes to identify the minority data’s behavior in the classification and its association with the IR, the number of TSs, and their composition in each class. We superimposed the MSI map on the RF classification results to estimate the deviation between the classification results and the MSI map. Here, an existing MSI map is displayed within the lithology class boundaries. Errors in identifying lithology classes and boundary deviations can be observed, leading to an increase or decrease in lithology classes’ precision and recall values.

3 Results

3.1 Class weight tuning results

As described in Section 2.2.2, the class weight tuning results were obtained from a series of classifications. The first stage showed that the best weight values lay between 0.1 and 2.0. In the second stage, GridSearchCV was applied to weights between 0.1 and 2.0 in increments of 0.1, using 5-fold cross-validation and 40,000 combinations. Table 4 displays the final results of class weight tuning.
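The size of this search can be illustrated by enumerating the weight grid. The sketch below assumes that only the three minority classes are tuned, each over the 20 values 0.1, 0.2, ..., 2.0, while both majority classes keep a weight of 1. Under that assumption there are 20³ = 8,000 candidate weight sets, and 5-fold cross-validation gives 8,000 × 5 = 40,000 model fits, which is one reading consistent with the 40,000 figure reported above. The names used are illustrative and are not taken from the authors' code.

```python
from itertools import product

# Candidate weights 0.1, 0.2, ..., 2.0 (20 values).
WEIGHTS = [round(0.1 * k, 1) for k in range(1, 21)]
MINORITY = ["inferred porphyry", "pseudo gossan", "quaternary alluvium"]
MAJORITY = ["sedimentary rocks", "undifferentiated porphyry"]
N_FOLDS = 5


def class_weight_grid():
    """Yield one class-weight dictionary per combination of minority weights."""
    for combo in product(WEIGHTS, repeat=len(MINORITY)):
        weights = dict(zip(MINORITY, combo))
        # Assumption: each majority class keeps a fixed weight of 1.
        weights.update({name: 1.0 for name in MAJORITY})
        yield weights


n_candidates = sum(1 for _ in class_weight_grid())  # 20 ** 3 = 8000
n_fits = n_candidates * N_FOLDS                     # 8000 * 5 = 40000
print(n_candidates, n_fits)
```

Dictionaries of this shape, keyed by class label, are the form in which per-class weights are usually supplied to a `class_weight` hyperparameter, so a list of them could serve as the candidate values in a grid search.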

Table 4

Class weight tuning results

TSs	Inferred porphyry weight	Pseudo gossan weight	Quaternary alluvium weight	Training accuracy	F1 score
Simple Random 25	0.1	0.4	1.7	0.76	0.52
Simple Random 50	1.0	0.8	0.9	0.89	0.57
Simple Random 100	0.3	0.5	0.1	0.96	0.68
Simple Random 200	1.3	0.1	0.2	0.95	0.65
Simple Random 300	1.5	0.4	0.2	0.97	0.72
Simple Random 400	1.5	0.5	0.1	0.97	0.77
Simple Random 500	0.2	0.4	0.7	0.97	0.77

Table 4 shows the results of class weight tuning for the minority classes, while each majority class was given a weight of 1. Not all minority classes received a weight greater than 1, which means that the weights of these classes did not exceed those of the majority classes. Brownlee [63] reports that a class with greater importance is allotted a greater weight, while a class with lower importance is given a smaller weight. Meanwhile, Fernández et al. [24] mentioned that higher weights are assigned to instances from the class with a higher misclassification cost. A minority class with a weight of less than 1 therefore receives less attention than the majority classes. The effects of these class weights can be observed in the performance metrics for the minor classes.

3.2 General accuracy evaluation

Table 5 summarizes the performance metrics for 23 models with varying numbers of TSs. The scores shown are averages over the five lithology classes.

Table 5

Performance metrics summary of the lithological classification results

Models	Training accuracy	Testing accuracy	Precision	Recall	F1 score	Kappa score
Random_25_bal	1.00	0.71	0.44	0.41	0.39	0.23
Random_25_CS	0.84	0.69	0.27	0.24	0.23	0.16
Random_25_OV	1.00	0.70	0.44	0.56	0.43	0.32
Stratified_25	1.00	0.61	0.35	0.54	0.38	0.27
Random_50_bal	1.00	0.71	0.51	0.33	0.37	0.23
Random_50_CS	0.94	0.71	0.45	0.26	0.27	0.19
Random_50_OV	1.00	0.74	0.48	0.53	0.49	0.38
Stratified_50	1.00	0.62	0.38	0.65	0.41	0.30
Random_100_bal	1.00	0.75	0.67	0.47	0.52	0.43
Random_100_CS	0.85	0.75	0.28	0.30	0.29	0.44
Random_100_OV	1.00	0.75	0.54	0.59	0.56	0.47
Random_200_bal	1.00	0.77	0.61	0.37	0.41	0.45
Random_200_CS	0.91	0.76	0.48	0.31	0.32	0.40
Random_200_OV	1.00	0.77	0.45	0.41	0.42	0.46
Random_300_bal	1.00	0.78	0.72	0.44	0.49	0.49
Random_300_CS	0.91	0.78	0.39	0.35	0.36	0.48
Random_300_OV	1.00	0.80	0.69	0.50	0.53	0.55
Random_400_bal	1.00	0.82	0.81	0.45	0.51	0.57
Random_400_CS	0.92	0.81	0.41	0.36	0.38	0.54
Random_400_OV	1.00	0.82	0.54	0.47	0.48	0.59
Random_500_bal	1.00	0.81	0.66	0.47	0.52	0.55
Random_500_CS	0.94	0.81	0.42	0.38	0.39	0.56
Random_500_OV	1.00	0.81	0.70	0.49	0.51	0.57

Description: bal: balance class weight method, CS: class weight tuning method, OV: oversampling method; random_500_CS: simple random distribution, 500 TSs, class weight tuning method; stratified_50: stratified random distribution, 50 TSs; bold number: the highest value in each group.

Table 5 shows that the testing accuracy of the simple random models ranged from 0.69 to 0.82 (0.61 and 0.62 for the two stratified models). The accuracy score generally increased with the number of TSs. The F1 score, in contrast, was lower than the testing accuracy. This is because of the limited number of outcrops and the imbalance of the sample data, which often cause the RF algorithm to predict the lithology incorrectly [17]. Accuracy is a suitable measure when the data distribution is balanced; when the classes are imbalanced, it is better to apply the F1 score because it reflects the actual classification performance [68].

The Kappa score values follow the same trend as the testing accuracy; a larger number of TSs yields a higher Kappa score. The imbalanced models that used the oversampling method produced the highest Kappa scores (ranging from 0.32 to 0.59). These Kappa scores indicate a minimal to weak level of agreement, with data reliability of 4–35%, which implies that more than 65% of the evaluated data contain errors [66]. The model that produced the highest Kappa score used 400 TSs (random_400_OV). For samples with a balanced distribution using 25 and 50 TSs, the Kappa scores were relatively low (0.27 and 0.30), placing the level of agreement in the minimal category. These scores were lower than those of the imbalanced models with the same number of TSs that applied the oversampling method. These findings imply that the oversampling method enhanced the classifier’s performance.

Generally, the best F1 scores were yielded by models that applied the oversampling method, ranging from 0.43 to 0.56. These F1 scores were higher than those of the balance class weight models, the baseline in this research, which had an F1 score range of 0.37–0.52. The oversampling method thus raised the lower and upper ends of the F1 score range by 0.06 and 0.04, respectively. The best model was random_100_OV, which used 100 TSs. This number of TSs was optimal because adding TSs up to 500 did not raise the F1 score; instead, it tended to lower it by approximately 0.03–0.14. This decrease in the F1 score is linked to the IR value.

3.3 Accuracy evaluation of minority class and IR

Table 6 shows that, in general, the class weight tuning method did not perform well in identifying minority classes. The precision and recall scores are 0 in most models. At most one of the three minor classes was identified in each model analyzed, and none was detected in the model with 100 TSs (random_100_CS). These results indicate that this approach is unsuitable for classifying lithology classes with unbalanced samples. Nevertheless, it identified the major classes, particularly undifferentiated porphyry, which has the highest number of TSs (IR = 1:1) (Table 7), resulting in high precision, recall, and F1 scores. Its F1 score ranges from 0.82 to 0.90. The other major class (sedimentary rocks) has fewer TSs (IR = 3:1), and its performance metric scores were lower than those of undifferentiated porphyry, with F1 scores ranging from 0.26 to 0.65. For both major classes, the more TSs were used, the higher the performance metric scores.

Table 6

Summary of classification performance metrics for each lithology class

Number of TSs	Lithological class	Method
		Balance class weight			Class weight tuning			Oversampling
		Precision	Recall	F1 score	Precision	Recall	F1 score	Precision	Recall	F1 score
25	1	0.50	0.25	0.33	0	0	0	0.38	0.62	0.48
	2	0.46	0.67	0.55	0	0	0	0.20	0.89	0.33
	3	0	0	0	0.14	0.06	0.08	0.33	0.18	0.23
	4	0.49	0.17	0.25	0.46	0.18	0.26	0.50	0.19	0.28
	5	0.75	0.95	0.83	0.73	0.94	0.82	0.80	0.91	0.85
50	1	0.50	0.12	0.2	0	0	0	0.33	0.38	0.35
	2	0.60	0.33	0.43	1	0.11	0.20	0.50	0.89	0.64
	3	0.20	0.06	0.09	0	0	0	0.18	0.18	0.18
	4	0.53	0.21	0.30	0.54	0.22	0.31	0.61	0.31	0.41
	5	0.74	0.95	0.83	0.73	0.95	0.83	0.80	0.92	0.85
100	1	0.50	0.25	0.33	0	0	0	0.22	0.25	0.24
	2	1	0.33	0.5	0	0	0	0.67	0.89	0.76
	3	0.43	0.35	0.39	0	0	0	0.39	0.41	0.40
	4	0.58	0.51	0.54	0.55	0.62	0.58	0.59	0.51	0.55
	5	0.82	0.88	0.85	0.83	0.88	0.85	0.84	0.87	0.85
200	1	1	0.12	0.22	1	0.12	0.22	0.33	0.12	0.18
	2	0	0	0	0	0	0	0	0	0
	3	0.56	0.29	0.38	0	0	0	0.43	0.53	0.47
	4	0.67	0.48	0.56	0.60	0.47	0.53	0.66	0.49	0.56
	5	0.80	0.94	0.87	0.80	0.94	0.86	0.82	0.92	0.87
300	1	0.67	0.25	0.36	0.50	0.25	0.33	0.50	0.25	0.33
	2	1	0.11	0.2	0	0	0	1	0.22	0.36
	3	0.41	0.41	0.41	0	0	0	0.36	0.53	0.43
	4	0.70	0.50	0.58	0.65	0.56	0.60	0.74	0.58	0.65
	5	0.82	0.94	0.87	0.82	0.93	0.87	0.85	0.92	0.88
400	1	1	0.12	0.22	0.5	0.25	0.33	0.67	0.25	0.36
	2	1	0.22	0.36	0	0	0	0	0	0
	3	0.46	0.35	0.4	0	0	0	0.38	0.53	0.44
	4	0.77	0.56	0.65	0.72	0.59	0.65	0.77	0.60	0.67
	5	0.84	0.97	0.90	0.84	0.97	0.90	0.86	0.95	0.90
500	1	0.67	0.25	0.36	0	0	0	0.5	0.25	0.33
	2	0.67	0.22	0.33	0	0	0	1	0.11	0.2
	3	0.37	0.41	0.39	0.5	0.35	0.41	0.37	0.59	0.45
	4	0.78	0.52	0.63	0.74	0.59	0.65	0.77	0.56	0.65
	5	0.84	0.96	0.90	0.84	0.96	0.90	0.86	0.95	0.90

Description: Class 1: inferred porphyry, Class 2: pseudo gossan, Class 3: quaternary alluvium, Class 4: sedimentary rocks, Class 5: undifferentiated porphyry; italic number: the lowest value in each group, bold number: the highest value in each group.

Table 7

IR and distribution of TSs

Lithological class	Number of TSs
	25		50		100		200		300		400		500
	∑	IR	∑	IR	∑	IR	∑	IR	∑	IR	∑	IR	∑	IR
1	2	7:1	2	16:1	4	16:1	5	28:1	6	35:1	7	40:1	10	35:1
2	2	7:1	2	16:1	2	32:1	2	70:1	3	69:1	4	70:1	6	59:1
3	2	7:1	3	10:1	8	8:1	12	12:1	16	13:1	19	15:1	23	15:1
4	5	3:1	12	3:1	22	3:1	41	3:1	67	3:1	89	3:1	107	3:1
5	14	1:1	31	1:1	64	1:1	140	1:1	208	1:1	281	1:1	354	1:1

Description: ∑ = Number of TSs in each class, IR = imbalance ratio.
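The IR values in Table 7 are consistent with dividing the number of TSs in the largest class by the number in each class and rounding to the nearest integer. That definition is inferred from the table rather than stated here, so the sketch below should be read as an assumption. It reproduces the 100-TS and 500-TS columns.

```python
import math

# Number of TSs per lithological class, copied from Table 7.
COUNTS_100 = {1: 4, 2: 2, 3: 8, 4: 22, 5: 64}
COUNTS_500 = {1: 10, 2: 6, 3: 23, 4: 107, 5: 354}


def imbalance_ratios(counts):
    """Majority-class count divided by each class count, rounded half up."""
    majority = max(counts.values())
    return {cls: math.floor(majority / n + 0.5) for cls, n in counts.items()}


for label, counts in (("100 TSs", COUNTS_100), ("500 TSs", COUNTS_500)):
    ratios = imbalance_ratios(counts)
    print(label, {cls: f"{ir}:1" for cls, ir in ratios.items()})
```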

Generally, the balance class weight method yielded good results. Most models could identify the three minor classes (inferred porphyry, pseudo gossan, and quaternary alluvium), except for the models using 25 (random_25_bal) and 200 (random_200_bal) TSs, which did not identify quaternary alluvium and pseudo gossan, respectively. Almost all classes have higher precision than recall; some even have a precision score of 1, indicating no false positives, together with a low recall score (many false negatives).
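For orientation, the sketch below computes the weights that a "balanced" heuristic would assign to the 100-TS model, assuming the balance class weight method corresponds to the common rule n_samples / (n_classes × class count), which is the rule Scikit-learn documents for `class_weight="balanced"`. Whether the authors used exactly this rule is an assumption; the class counts are taken from Table 7.

```python
# Number of TSs per lithological class in the 100-TS model (Table 7).
COUNTS_100 = {1: 4, 2: 2, 3: 8, 4: 22, 5: 64}


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


balanced = balanced_class_weights(COUNTS_100)
for cls, weight in balanced.items():
    print(f"class {cls}: weight {weight:.3f}")
```

Under this rule the rarest class receives the largest weight and the majority class a weight below 1, the opposite of several tuned weights in Table 4 that fall below 1 for minority classes.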

In general, the oversampling method provided the best classification performance in minority classes. Although the models with 200 (random_200_OV) and 400 (random_400_OV) TSs did not identify the pseudo gossan class, this technique generally enhanced the F1 score in minority classes, and the overall F1 score increased compared to the balance class weight method as the baseline. The quaternary alluvium class, which was not identified by the balance class weight method using 25 TSs (random_25_bal), was identified by the oversampling method (random_25_OV). The models with 200 (random_200_OV) and 400 (random_400_OV) TSs, where pseudo gossan (IR = 70:1) was not identified, showed that the oversampling method could not always identify minor classes with very high IR values. Other minority classes with an IR ≤ 40:1 could be identified. Quaternary alluvium was the minority class whose F1 score improved consistently across all models. This performance was possibly associated with the number of TSs in this class, which was larger than in the other two minor classes and gave a lower IR value, implying that the classifier’s performance was linked to each class’s IR value. Zhu et al. [69] reported that datasets with the same IR can show distinct classification performances when their dimensionalities vary, which makes IR suboptimal for reflecting the extent of imbalance for classification. The classification performance improved with more discriminatory features.
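The idea behind oversampling can be sketched as follows. This section does not specify which oversampling variant was used, so the example uses plain random duplication of minority samples until every class matches the largest one. That choice is an assumption made for illustration and may differ from the authors' procedure (synthetic approaches such as SMOTE are a common alternative). The class counts follow the 25-TS column of Table 7, and the sample values are placeholder indices.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(samples), list(labels)
    for y, members in by_class.items():
        extra = target - len(members)
        # Sample with replacement from the existing members of this class.
        out_x.extend(rng.choice(members) for _ in range(extra))
        out_y.extend([y] * extra)
    return out_x, out_y


# Class counts of the 25-TS model (Table 7); samples are placeholder indices.
counts_25 = {1: 2, 2: 2, 3: 2, 4: 5, 5: 14}
labels = [cls for cls, n in counts_25.items() for _ in range(n)]
samples = list(range(len(labels)))

new_samples, new_labels = random_oversample(samples, labels)
print(Counter(labels), "->", Counter(new_labels))
```

Duplication adds no new information about a class, which fits the observation above that classes with only two or three TSs can remain undetected even after oversampling.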

3.4 Visual comparative assessment

Figures 5 and 6 show the visual comparative assessment results. The RF classification result was overlaid with lithological boundaries from the existing map as a visual assessment reference. Three criteria were used: (i) the ability to detect all lithology classes; (ii) the ability to classify with regard to the position and coverage area of each class; and (iii) the ability to detect lithological boundaries. To support this visual comparison, we also counted the number of incorrectly identified pixels, as displayed in Tables 8 and 9.

Figure 5

Visualization of classifications from balanced models using (a) 25, (b) 50 TSs, and (c) the existing lithological map.

Figure 6

Visualization of classifications from imbalanced models using a different number of TSs. (a) Random_25_bal, (b) Random_25_CS, (c) Random_25_OV, (d) Random_50_bal, (e) Random_50_CS, (f) Random_50_OV, (g) Random_100_bal, (h) Random_100_CS, (i) Random_100_OV, (j) Random_200_bal, (k) Random_200_CS, (l) Random_200_OV, (m) Random_300_bal, (n) Random_300_CS, (o) Random_300_OV, (p) Random_400_bal, (q) Random_400_CS, (r) Random_400_OV, (s) Random_500_bal, (t) Random_500_CS, and (u) Random_500_OV.

Table 8

Summary of misclassified pixels for each lithology class of the balanced models

TS	Lithological class	Pixel	Error (%)	OE (%)
25	1	6,484	59.83	55.77
	2	31	5.07
	3	13,507	28.62
	4	83,582	72.27
	5	15,950	39.79
50	1	6,347	58.56	56.04
	2	8	1.31
	3	13,219	28.01
	4	79,872	69.06
	5	20,697	51.64

Description: OE: overall error.

Table 9

Summary of misclassified pixels for each lithology class

TS	Lithological class	BAL (pixel)	Error (%)	OE (%)	CW (pixel)	Error (%)	OE (%)	OV (pixel)	Error (%)	OE (%)
25	1	8,679	80.08	70.74	10,838	100.00	70.55	5,958	54.97	70.09
	2	454	74.30		611	100.00		139	22.75
	3	41,951	88.89		38,607	81.81		39,697	84.12
	4	97,519	84.32		98,318	85.01		97,005	83.88
	5	3,047	7.60		2,873	7.17		7,464	18.62
50	1	8,999	83.03	62.98	10,587	97.68	67.92	3,775	34.83	59.78
	2	463	75.78		571	93.45		67	10.97
	3	40,181	85.14		45,461	96.33		39,165	82.99
	4	82,238	71.11		86,228	74.56		79,406	68.66
	5	3,131	7.81		2,750	6.86		5,747	14.34
100	1	8,298	76.56	43.69	10,837	99.99	44.79	6,158	56.82	44.64
	2	522	85.43		611	100.00		255	41.73
	3	31,170	66.05		47,191	100.00		33,723	71.46
	4	46,236	39.98		29,448	25.46		46,139	39.90
	5	7,439	18.56		7,925	19.77		9,431	23.53
200	1	10,327	95.29	45.90	10,813	99.77	50.28	9,681	89.32	43.06
	2	604	98.85		611	100.00		603	98.69
	3	33,937	71.91		47,192	100.00		26,193	55.50
	4	47,359	40.95		43,833	37.90		48,326	41.79
	5	6,160	15.37		5,340	13.32		7,516	18.75
300	1	9,344	86.22	38.61	9,898	91.33	43.28	9,465	87.33	34.83
	2	580	94.93		611	100.00		500	81.83
	3	30,578	64.79		47,191	100.00		19,244	40.78
	4	36,398	31.47		29,698	25.68		38,595	33.37
	5	5,862	14.63		5,377	13.41		6,872	17.14
400	1	10,038	92.62	35.74	9,250	85.35	38.17	9,483	87.50	31.76
	2	572	93.62		610	99.84		588	96.24
	3	32,876	69.66		47,140	99.89		22,360	47.38
	4	28,219	24.40		20,378	17.62		29,387	25.41
	5	4,909	12.25		4,454	11.11		6,277	15.66
500	1	9,370	86.46	31.93	10,838	100.00	35.11	9,682	89.33	29.46
	2	554	90.67		593	97.05		593	97.05
	3	28,198	59.75		37,193	78.81		18,116	38.39
	4	24,845	21.48		21,363	18.47		29,104	25.17
	5	5,483	13.68		5,269	13.15		5,653	14.10

Description: BAL: balance class weight, CW: class weight tuning, OV: oversampling, OE: overall error.

Figure 5a and b demonstrate that the balanced models can determine all lithology types. In both models, the pseudo gossan and inferred porphyry classes are identified over a reasonably large area. However, the area of these two classes does not match the area of the corresponding classes on the existing lithological map. This mismatch occurs in all classes. Table 8 shows the percentage error in the classification results. Moreover, the visualization shows the effect of false positives for the pseudo gossan and inferred porphyry classes, which results in low precision scores and high recall scores in these models (Table 5). Ideally, these two performance metrics should be balanced. However, because the testing points around the southern region are few and unevenly distributed, the two values differ (Figure 4j).

The visualizations in Figures 5 and 6 reflect the performance metric scores in Tables 5, 6, and 9. The class weight tuning method gave poor visualization results, in line with the performance metric scores of the minority classes (Tables 6 and 9). This method clearly shows the effect of the lithology class weight on the classifier’s ability to identify minority classes. A lithology class becomes difficult to detect at low weight values, e.g., inferred porphyry in the 25, 100, or 500 TS models. If the lithology class gets a higher weight, it is easier to detect (e.g., quaternary alluvium in the 25, 50, or 500 TS models). Nonetheless, a high weight does not always make a lithology class easy to detect, as can be observed for inferred porphyry in the 200, 300, or 400 TS models. In these three models, no significant difference is seen between the inferred porphyry class, which has a high weight, and the pseudo gossan and quaternary alluvium classes, which have low weights. This suggests that the weight of a class must be proportional to the weights of the other classes so that the classification results are not biased toward the class with a high weight. Therefore, it can be assumed that the weights in the minor or major classes must satisfy specific proportions based on the population [70] or be inversely proportional to the frequency of the class [71]. Additional research is required to establish the relationship between the weights and the weight ratios between lithology classes.

The balance class weight method provided better visualization than the class weight tuning method. The classification results of the 25 (random_25_bal) and 50 (random_50_bal) TS models indicate that the classifier can determine all lithology classes. However, the coverage of each lithology class does not correspond to the coverage of the related class on the existing map. All lithology classes were sufficiently identified in the 100, 300, 400, and 500 TS models. Nonetheless, this method cannot fully determine the minor classes when compared with the existing lithological map. For instance, in the model with 100 TSs, the pseudo gossan class had only 2 TSs but a fairly low IR value (32:1). In the models with 300, 400, and 500 TSs, this class was identified, although only over a small area. The classifier could determine the pseudo gossan class in these models because its number of TSs was slightly higher (3, 4, and 6, respectively), which made the impurity value slightly larger than in the model with 200 TSs, even though the IR values were similar (69:1, 70:1, and 59:1, respectively). In this regard, Zhu et al. [69] explained that classification performance can vary when the dimensionalities of datasets with the same IR are different. Hence, IR is not the best way to express the imbalance for classification because it does not consider dimensionality.

The oversampling method offered better visual classifications than the other two methods when evaluated against the three criteria described earlier. The visualizations show that the models with 25, 50, 100, 300, and 500 TSs identified all lithology classes. In the oversampling model with 100 TSs, the position and area of the lithology classes were largely adequate. The undifferentiated porphyry and sedimentary rock classes appear to be less misclassified in many locations. In this model, the quaternary alluvium was lacking in the central and southeast regions. Nonetheless, the misclassification of this class decreases as the number of TSs increases. The best classifications regarding position accuracy and lithology class area were generally obtained by the model with 500 TSs, although the pseudo gossan class was not properly identified. The best identification of the pseudo gossan class was in the model with 100 TSs.

The visual analysis demonstrated that lithological boundaries were accurately identified from the model with 100 TSs onward. The overall classification error for the oversampling method decreases consistently as the number of TSs increases, as shown in Table 9. The increase in the classifier’s ability to determine the lithology class is the effect of using geophysical data. Integrating geophysical data and remote sensing is crucial to form reliable lithological maps when limited TSs are used [41]. The sedimentary rocks were generally well identified because this class has high resistivity and low magnetite, as opposed to the other four classes. This result is consistent with Zhu et al.’s study [69], which states that providing more discriminatory features can improve classification performance. The model with 500 TSs gives the best result in identifying lithological boundaries, although the pseudo gossan class was not properly identified.

The oversampling results showed a relationship between the IR and the classifier’s ability to determine minority classes. When the IR value of the minority class was sufficiently low (the numbers of TSs in the two classes did not differ greatly), the oversampling method led the classifier to treat the minority class as though it were in a balanced state [63]. Conversely, when the IR was high (the numbers of TSs in the majority and minority classes differed greatly), the minority class was incorrectly identified.

4 Discussion

The application of the oversampling method with limited and imbalanced TSs in this study yielded good results. This is understandable because oversampling leads the classifier to perceive the imbalanced data as balanced [63]. Nevertheless, the classification results demonstrated that the oversampling method could not identify all classes (the pseudo gossan class was not identified in the 200 and 400 TS models, Table 6). This raises the question of how this can occur. Three things require investigation: (i) the proportion of TSs to the coverage area of each lithology class, (ii) the IR value, and (iii) class overlapping or class complexity, which is part of the intrinsic characteristics [63,72].

Concerning the proportion of TSs, Qian [70] reported that stratified sampling (balanced distribution) sometimes results in a disproportionate number of TSs. This proportion refers to the total population of each lithology class. Meanwhile, Noorhalim et al. [30] stated that optimal classification performance requires a balanced distribution and an appropriate number of samples, so that the TSs offer more information that benefits the learning process. Hence, when classifying areas with multiple classes, we must strive to fulfil both conditions. Examples include the stratified distribution classification using 25 and 50 TSs (Figure 5) and the 25 and 50 TS oversampling models (Figure 6c and f).

The need for a proportional number of TSs is reflected in the IR value. Figure 6 and Table 7 demonstrate the influence of IR on the quality of the classifications. For appropriate identification, the number of TSs in the minor class should not be too limited, nor should the IR value be too high. The minor class should have enough TSs for the decision trees to develop so that the class is detected proportionally [70,71]. This condition occurred in the oversampling method with 100 TSs (Figure 6i), where the classifications of pseudo gossan and inferred porphyry show the coverage most similar to the existing lithological map. In other models, the coverage of these two classes became smaller, and even disappeared in some locations, when the number of TSs and the IR value increased (Figures 6r and u). Hence, for the classifier to detect the lithology class accurately, it is recommended to provide a substantial number of TSs [70], proportional to the coverage area, with an appropriate (not too high) IR value. Additional research is required to verify the relationship between the number of TSs, IR, and coverage area.

Intrinsic characteristics are crucial for applying existing techniques and developing new ones to deal with imbalanced data [23,63]. One of the intrinsic characteristics that greatly influences the identification of lithology classes is class overlapping or class complexity. This can be seen in the area with the inferred porphyry class, between the quaternary alluvium and undifferentiated porphyry classes, located in the west-central region. Because of this class overlap, the identification of minor classes in this region is always difficult, which results in low classification accuracy [23]. Thus, further study is needed to separate this area by carrying out a partial classification, so that the TSs for the inferred porphyry class are not isolated in a narrow area in the middle of the TSs of other classes.

5 Conclusion

The problem of imbalanced data is often encountered when using MLAs in lithology classification. The results demonstrated that the oversampling method, combined with the RF algorithm, generally outperformed the other two methods, with the following performance metric values: 1.00 (training accuracy), 0.70–0.82 (testing accuracy), 0.43–0.56 (F1 score), and 0.32–0.59 (Kappa score). The comparative visual assessment also showed that the oversampling method produced the classification result closest to the existing map regarding the class types, coverage areas, and boundaries.

The IR plays a significant role in the classification. The addition of TSs improved the accuracy of the classifications, whereas an increased IR in the minor class affected the classifier’s ability to identify it. The IR cannot be too large and must be proportional to the population. Hence, there is an association between the class identification ability and the number of TSs proportional to the coverage area of each lithology class.

Considering the limitations of our study, we intend to explore several aspects in future work. As a starting point, it would be worthwhile to investigate the influence of learning strategies on several classifiers. As stated previously, we selected the RF algorithm because it has been shown to be a viable solution for unbalanced classification problems in several areas. Nevertheless, alternatives may be explored to examine how various classifiers and learning strategies handle imbalanced data samples, especially with limited TSs. Since the CSL method was unable to provide good results in this study, we aim to investigate further the determination of class weights as a penalty for misclassification costs, which underpins the effectiveness of the CSL method in identifying classes correctly. The oversampling method yielded the best results in this research, yet its performance metric values remain low. Further studies must therefore be carried out on this model if its capacity to recognize lithology classes is to improve.

Acknowledgments

The authors wish to acknowledge PT. Eksplorasi Nusa Jaya and Mine Serve International for their support of this research. This research was partially funded by Institut Teknologi Nasional Bandung, Indonesia.

Funding information: This research was partially funded by Institut Teknologi Nasional Bandung, Indonesia (Grant No.: 077/G.22.01/Rektorat/Itenas/VII/2022).
Author Contributions: Conceptualization, H.N.; methodology, H.N.; software, S.B. and A.S.; validation, H.N., S.B. and A.S.; formal analysis, H.N. and K.W.; investigation, H.N., S.B. and A.S.; resources, K.W., S.B. and A.S.; data curation, H.N., K.W. and S.B.; writing – original draft preparation, H.N.; writing – review and editing, H.N., S.B., and A.S.; visualization, H.N.; supervision, K.W.; project administration, K.W. and H.N.; funding acquisition, H.N. All authors have read and agreed to the published version of the manuscript.
Conflict of interest: The authors declare no conflict of interest.
Data availability statement: DEMNAS data are available at the Geospatial Information Agency, Republic of Indonesia, and geophysical data are owned by PT. Eksplorasi Nusa Jaya. Requests for both types of data can be addressed to each organization.

References

[1] Merembayev T, Kurmangaliyev D, Bekbauov B, Amanbek Y. A comparison of machine learning algorithms in predicting lithofacies: Case studies from Norway and Kazakhstan. Energies. 2021;14:1–16.Search in Google Scholar

[2] Xi Y, Taha AMM, Hu A, Liu X. Accuracy comparison of various remote sensing data in lithological classification based on random forest algorithm. Geocarto Int. 2022;37(26):14451–79. 10.1080/10106049.2022.2088859.Search in Google Scholar

[3] Zhou K, Zhang J, Ren Y, Huang Z, Zhao L. A gradient boosting decision tree algorithm combining synthetic minority oversampling technique for lithology identification. Geophysics. 2020;85(4):WA147–58.Search in Google Scholar

[4] De Araújo Neto JF, Santos GL, De Albuquerque E, Souza IMB, De Brito Barreto S, De Lira Santos LCM, et al. Integration of remote sensing, airborne geophysics and structural analysis to geological mapping: A case study of the Vieirópolis region, Borborema Province, NE Brazil. Geol USP - Ser Cient. 2018;18(3):89–103.Search in Google Scholar

[5] Harvey AS, Fotopoulos G. Geological mapping using machine learning algorithms. Int Arch Photogramm Remote Sens Spat Inf Sci - ISPRS Arch. 2016;41(July):423–30. https://ui.adsabs.harvard.edu/abs/2016ISPAr41B8.423H.Search in Google Scholar

[6] Kuhn S, Cracknell MJ, Reading AM. Lithological mapping in the Central African Copper Belt using Random Forests and clustering: Strategies for optimised results. Ore Geol Rev. 2019;112:103015. 10.1016/j.oregeorev.2019.103015.Search in Google Scholar

[7] Kuhn S, Cracknell MJ, Reading AM, Sykora S. Case history identification of intrusive lithologies in volcanic terrains in British Columbia by machine learning using random forests: The value of using a soft classifier. Geophysics. 2020;85(6):235–44.Search in Google Scholar

[8] Halotel J, Demyanov V, Gardiner A. Value of geologically derived features in machine learning facies classification. Math Geosci. 2020;52(1):5–29. 10.1007/s11004-019-09838-0.Search in Google Scholar

[9] Li G, Zheng Y, Li Y, Wu W, Hong Y, Zhou X. Recognition of stratum lithology of seismic facies based on deep belief network. 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE2016); 2016. p. 354–7.Search in Google Scholar

[10] Fuentes I, Padarian J, Iwanaga T, Vervoort RW. 3D lithological mapping of borehole descriptions using word embeddings. Comput Geosci. 2020;141:32. 10.1016/j.cageo.2020.104516.Search in Google Scholar

[11] Onan A. Hybrid supervised clustering based ensemble scheme for text classification Abstract. Kybernetes. 2017;46(2):330–48.Search in Google Scholar

[12] Onan A, Korukoğlu S, Bulut H. LDA-based topic modelling in text sentiment classification: An empirical analysis. Int J Comput Linguist Appl. 2016;7(1):101–19.Search in Google Scholar

[13] Onan A, Korukoǧlu S, Bulut H. Ensemble of keyword extraction methods and classifiers in text classification. Expert Syst Appl. 2016;57:232–47.Search in Google Scholar

[14] Onan A, Korukoğlu S, Bulut H. A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification. Inf Process Manag. 2017;53(4):814–33.Search in Google Scholar

[15] Ao Y, Zhu L, Guo S, Yang Z. Probabilistic logging lithology characterization with random forest probability estimation. Comput Geosci. 2020;144:104556. 10.1016/j.cageo.2020.104556.Search in Google Scholar

[16] Kuhn S, Cracknell MJ, Reading AM. The utility of machine learning in identification of key geophysical and geochemical datasets: A case study in lithological mapping in the Central African Copper Belt. ASEG Ext Abstr. 2018;1:1–4.Search in Google Scholar

[17] Kuhn S, Cracknell MJ, Reading AM. Lithological mapping using Random Forests applied to geophysical and remote sensing data: A demonstration study from the Eastern Goldfields of Australia. Geophysics. 2018;84(4):1–37.Search in Google Scholar

[18] Wenhua W, Zhuwen W, Ruiyi H, Fanghui X, Xinghua Q, Yitong C. Lithology classification of volcanic rocks based on conventional logging data of machine learning: A case study of the eastern depression of Liaohe oil field. Open Geosci. 2021;13:1245–58.Search in Google Scholar

[19] Breiman L. Random forests. Mach Learn J Pap. 2001;45:1–33.Search in Google Scholar

[20] Cracknell MJ, Reading AM. The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines. Geophysics. 2013;78(3):113–26.Search in Google Scholar

[21] Cracknell MJ, Reading AM. Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput Geosci. 2014;63:22–33. 10.1016/j.cageo.2013.10.008.Search in Google Scholar

[22] Harris JR, Grunsky EC. Predictive lithological mapping of Canada’s North using Random Forest classification applied to geophysical and geochemical data. Comput Geosci. 2015;80(July):9–25. 10.1016/j.cageo.2015.03.013.Search in Google Scholar

[23] Ali A, Shamsuddin SM, Ralescu AL. Classification with class imbalance problem: A review. Int J Adv Soft Comput Appl. 2015;7(3):176–204.Search in Google Scholar

[24] Fernández A, García S, Galar M, Prati RC. Learning from imbalanced data sets. Springer Nature Switzerland; 2018. p. 377.Search in Google Scholar

[25] Krawczyk B. Learning from imbalanced data: Open challenges and future directions. Prog Artif Intell. 2016;5(4):221–32.Search in Google Scholar

[26] Thabtah F, Hammoud S, Kamalov F, Gonsalvesv AH. Data imbalance in classification: experimental evaluation. Inf Sci (NY). 2019;513:429–41. 10.1016/j.ins.2019.11.004.Search in Google Scholar

[27] Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst Appl. 2017;73:220–39. 10.1016/j.eswa.2016.12.035.Search in Google Scholar

[28] Loyola-González O, Martínez-Trinidad JF, Carrasco-Ochoa JA, García-Borroto M. Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced databases. Neurocomputing. 2016;175:935–47. 10.1016/j.neucom.2015.04.120.Search in Google Scholar

[29] Weiss GM. Foundations of imbalanced learning. In: He H, Ma Y, editors. Imbalanced learning: Foundations, algorithms, and applications. Berlin, Germany: John Wiley & Sons; 2013. p. 216.Search in Google Scholar

[30] Noorhalim N, Ali A, Shamsuddin SM. Handling imbalanced ratio for class imbalance problem using SMOTE. Proceedings of the Third International Conference on Computing, Mathematics and Statistics (iCMS2017). Springer Nature Singapore; 2019. p. 19–30.Search in Google Scholar

[31] Ortigosa-Hernández J, Inza I, Lozano JA. Measuring the class-imbalance extent of multi-class problems. Pattern Recognit Lett. 2017;98:32–8.Search in Google Scholar

[32] Sun Y, Wong AKC, Kamel MS. Classification of imbalanced data: A review. Int J Pattern Recogn Artif Intell. 2009;23(4):687–719. 10.1142/S0218001409007326.Search in Google Scholar

[33] Menardi G, Torelli N. Training and assessing classification rules with imbalanced data. Data Min Knowl Discov. 2014;28:92–122.Search in Google Scholar

[34] López V, Fernández A, Moreno-Torres JG, Herrera F. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Syst Appl. 2012;39(7):6585–608.Search in Google Scholar

[35] Karlhede A. Tackling imbalanced data in random forest to predict free-to-fee transitions of a subscription. Stockholm, Sweden: KTH Royal Institute of Technology; 2020.Search in Google Scholar

[36] Sinha S, Ohashi H. Class-wise difficulty-balanced loss for solving class-imbalance. Computer Vision – ACCV 2020; 2020. p. 1–17.Search in Google Scholar

[37] Makienko D, Seleznev I, Safonov I. The effect of the imbalanced training dataset on the quality of classification of lithotypes via whole core photos. In: Fursov V, Goshin Y, Kudryashov D, editors. The VI International Conference Information Technology and Nanotechnology. Samara, Russia: CEUR-WS; 2020. p. 132–6.Search in Google Scholar

[38] Weiss G, McCarthy K, Zabar B. Cost-sensitive learning vs. sampling: Which is best for handling unbalanced classes with unequal error costs? Proceedings of the 2007 International Conference on Data Mining, DMIN 2007, June 25-28, 2007. Las Vegas, Nevada, USA; 2007. p. 1–7. http://storm.cis.fordham.edu/∼gweiss/papers/dmin07-weiss.pdf.Search in Google Scholar

[39] Kaewwichian P. Multiclass classification with imbalanced datasets for car ownership demand model – Cost-sensitive learning. Promet–Traffic Transp. 2021;33(3):361–71.Search in Google Scholar

[40] He J, Harris JR, Sawada M, Behnia P. A comparison of classification algorithms using Landsat-7 and Landsat-8 data for mapping lithology in Canada’s Arctic. Int J Remote Sens. 2015;36(8):2252–76.Search in Google Scholar

[41] Costa I, Tavares F, Oliveira J. Predictive lithological mapping through machine learning methods: a case study in the Cinzento Lineament, Carajás Province, Brazil. J Geol Surv Braz. 2019;2(1):26–36.Search in Google Scholar

[42] Harris JR, Juan HX, Rainbird RH, Behnia P. Remote predictive mapping 6: A comparison of different remotely sensed data for classifying bedrock types in Canada’s Arctic: Application of the robust classification method and Ra. Geosci Can. 2014;41(December):557–84.Search in Google Scholar

[43] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. 1st edn. New York: Taylor & Francis; 1984. p. 368.Search in Google Scholar

[44] Mine Serve International. Geological Map Scale of 1:25.000. 2nd edn. Komopa, Papua, Indonesia; 2000.Search in Google Scholar

[45] Skead MB. 1994-1996 Fieldwork in Komopa-Dawagu area, general synthesis. Jakarta, Indonesia: Nabire Bakti Mining; 1996. Search in Google Scholar

[46] Glover JK. The Structural and Lithological Setting, Controls of Mineralization and Potential in the Area of The Komopa-Dawagu Prospects, NBM BLOCK II. Jakarta, Indonesia: Mine Serve International; 1999.Search in Google Scholar

[47] Moore CB. Interpretation of The 1993 Irian jaya airborne geophysical surveys. Jakarta, Indonesia: Nabire Bakti Mining; 1994.Search in Google Scholar

[48] Google Map [Internet]; 2022 [cited 2022 Feb 22]. https://www.google.co.id/maps/@-3.7555498,136.5555741,46322m/data=!3m1!1e3?hl=en.Search in Google Scholar

[49] Satimagingcorp. Sentinel-2A (10m) Satellite Sensor [Internet]; 2022 [cited 2022 Aug 31]. p. 3. https://www.satimagingcorp.com/satellite-sensors/other-satellite-sensors/sentinel-2a/.Search in Google Scholar

[50] European Space Agency. Sentinel 2A [Internet]; 2019 [cited 2019 Dec 10]. https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi/product- types/level-2a.Search in Google Scholar

[51] L3Harris Geospatial Solution. Vegetation Suppression [Internet]. L3Harris Geospatial; 2020 [cited 2020 Aug 1]. p. 2–4. https://www.l3harrisgeospatial.com/docs/vegetationsuppression.htmlSearch in Google Scholar

[52] Geospatial Information Agency-Republic of Indonesia. DEMNAS Seamless Digital Elevation Model (DEM) dan Batimetri Nasional [Internet]; 2018 [cited 2019 Mar 20]. http://tides.big.go.id/DEMNAS/#Info.Search in Google Scholar

[53] Bannari A, El-Battay A, Saquaque A, Miri A. PALSAR-FBS L-HH mode and landsat-TM data fusion for geological mapping. Adv Remote Sens. 2016;5(4):246–68.Search in Google Scholar

[54] European Space Agency. SNAP [Internet]; 2022 [cited 2022 Sep 16]. https://earth.esa.int/eogateway/tools/snapSearch in Google Scholar

[55] European Space Agency. Level-1 radiometric calibration [Internet]; 2020 [cited 2020 Apr 10]. https://sentinel.esa.int/web/sentinel/radiometric-calibration-of-level-1-productsSearch in Google Scholar

[56] Ottinger M, Kuenzer C. Spaceborne L-band synthetic aperture radar data for geoscientific analyses in coastal land applications: A review. Remote Sens. 2020;12(14):1–36. 10.3390/rs12142228.Search in Google Scholar

[57] GeoSci. Electromagnetic Data Processing [Internet]; 2018 [cited 2022 Feb 6]. https://em.geosci.xyz/content/case_histories/bookpurnong/processing.html.Search in Google Scholar

[58] Scikitlearn. GridSearchCV [Internet]; 2020 [cited 2021 Jun 10]. p. 1–7. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.Search in Google Scholar

[59] Tyralis H, Papacharalampous G, Langousis A. A brief review of random forests for water scientists and practitioners and their recent history inwater resources. Water (Switzerland). 2019;11(5):910.Search in Google Scholar

[60] Probst P, Wright M, Boulesteix A-L. Hyperparameters and tuning strategies for random forest. WIREs Data Min Knowl Discov. 2019;9:1–19. 10.1002/widm.1301.Search in Google Scholar

[61] Scikitlearn. Sklearn.ensembleRandomForestClassifier [Internet]; 2020 [cited 2020 Jan 20]. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.Search in Google Scholar

[62] Breiman L. Bagging predictors. Mach Learn. 1996;140:123–40.Search in Google Scholar

[63] Brownlee J. Imbalanced classification with Python: Choose better metrics, balance skewed classes, and apply cost-sensitive learning [Internet]. v1.2. Machine Learning Mastery; 2020. https://machinelearningmastery.com/imbalanced-clas. https://machinelearningmastery.com/imbalanced-classification-with-python/.Search in Google Scholar

[64] Mohamed IM, Mohamed S, Mazher I, Chester P. Formation lithology classification: insights into machine learning methods. In SPE Annual Technical Conference and Exhibition. Calgary, Alberta, Canada: Society of Petroleum Engineers; 2019. 10.2118/196096-MS.Search in Google Scholar

[65] Zhang C, Wen H, Liao M, Lin Y, Wu Y, Zhang H. Study on machine learning models for building resilience evaluation in mountainous area: A Case Study of Banan District, Chongqing, China. Sensors. 2022;22(3):1163.Search in Google Scholar

[66] McHugh ML. Lessons in biostatistics Interrater reliability: The kappa statistic. Biochem Medica. 2012;22(3):276–82.Search in Google Scholar

[67] Shebl A, Kusky T, Csámer Á. Advanced land imager superiority in lithological classification utilizing machine learning algorithms. Arab J Geosci. 2022;15(923):1–13. 10.1007/s12517-022-09948-w.Search in Google Scholar

[68] Tischio RM, Weiss GM. Identifying classification algorithms most suitable for imbalanced data. Bronx, New York, USA: Dept. of Computer & Info. Science Fordham University; 2019.Search in Google Scholar

[69] Zhu R, Guo Y, Xue JH. Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recognit Lett. 2020;133:217–23.Search in Google Scholar

[70] Qian J. Sampling. In: Peterson P, Baker E, McGaw B, editors. International Encyclopedia of Education. 3rd edn. Amsterdam: Elsevier; 2010. p. 390–5. https://doi.org/10.1016/B978-0-08-044894-7.01719-X.Search in Google Scholar

[71] Fernando KRM, Tsokos CP. Dynamically weighted balanced loss: Class imbalanced learning & confidence calibration of deep neural networks. IEEE Trans Neural Netw Learn Syst. 2022;33(7):2940–51. 10.1109/TNNLS.2020.3047335.Search in Google Scholar

[72] Ali A, Shamsuddin SM, Ralescu A. Classification with class imbalance problem: A review. Int J Adv Softw Comput Appl. 2013;5(3):31.Search in Google Scholar

Received: 2022-12-05

Revised: 2023-05-04

Accepted: 2023-05-05

Published Online: 2023-08-04

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/geo-2022-0487
