Mining Breast Cancer Classification Rules from Mammograms

Jinn-Yi Yeh; Si-Wa Chan; Tai-Hsi Wu

doi:10.1515/jisys-2014-0122

Published/Copyright: January 7, 2015

From the journal Journal of Intelligent Systems Volume 25 Issue 1

Abstract

Breast cancer is a leading cause of cancer death in women. Early diagnosis and treatment are crucial to reduce the mortality rate and increase patients’ lifespan. Mammography is effective in early detection. This study proposes a computer-aided diagnosis system based on the mini-Mammographic Image Analysis Society database for analyzing mammograms. After selecting the regions of interest, we computed three types of features: shape, spatial, and spectral domain features. After feature selection by information gain, we applied the structural equation model to obtain relations between the features and the breast tissue type, lesion class, and tumor severity. Finally, we used the decision tree and the classification and regression tree to construct computer-aided diagnosis rules; we generated 10 rules for predicting the classification of abnormal lesions and 11 rules for classifying the tumor severity. These rules can help clinicians detect and identify breast cancer efficiently from mammograms and improve medical care quality.

Keywords: Computer-aided diagnosis system; mammograms; breast cancer classification

1 Introduction

Breast cancer is a leading cause of cancer death in women; however, effective methods to prevent its occurrence are lacking. Moreover, it ranked first among all causes of death in 2011, representing 28% of all cases in Taiwan [23]. However, there is evidence that early detection accompanied by early treatment can increase the survival chances of patients. Thus, early detection is critical for improving breast cancer prognosis. Recent research has emerged on the mechanisms that may contribute to breast cancer [29], suggesting new avenues for cancer control strategy. Nonetheless, mammography screening continues to be a key strategy and has been used as such for decades [35]. The basic information for this detection consists chiefly of microcalcifications or masses that become visible in mammographic X-ray images.

According to the representations of breast cancer in mammograms, lesions can be classified as space-occupying lesions and microcalcifications (CALC). Space-occupying lesions are further divided into three types: masses, architectural distortion (ARCH), and asymmetry (ASYM). Among them, masses and ARCH are the typical signal characteristics of breast cancer. On the basis of the shape and boundary characteristics, masses can further be divided into spiculated masses (SPIC), circumscribed masses (CIRC), and other masses (MISC) [9, 26].

Breast calcifications are deposits of calcium inside the breast tissue. They appear widespread in the breast, and most women have a few on their mammograms at some time, more commonly after menopause [15]. Two major types of calcifications exist depending on the size: macrocalcifications and microcalcifications. Whereas macrocalcifications are nearly always non-cancerous and require neither an additional follow-up nor a biopsy, microcalcifications should be diagnosed after further examinations. Microcalcifications are tiny specks of calcium deposits with an average diameter of 0.3 mm in individual calcifications, and can be scattered throughout the mammary gland or appear in clusters. The size, shape, and distribution of microcalcifications vary. Figure 1 shows typical examples of microcalcification and space-occupying lesions.

Figure 1:

Typical Examples of (A) Microcalcification and (B–F) Space-Occupying Lesions.

2 Background

Space-occupying lesions are defined as groups of cells appearing with varying density (bright regions surrounded by a darker homogeneous background); however, their boundaries are often blurred and difficult to identify. Therefore, the detection of space-occupying lesions on mammograms is difficult. Cheng et al. [4] discussed methods for the automated detection and classification of masses and compared their advantages and disadvantages. They reported that the average sensitivity of radiologists in breast cancer screening is only approximately 75%. However, the performance would be improved if computer-aided detection (CAD) were applied to locate abnormalities. Kom et al. [14] presented an algorithm for the detection of suspicious masses from mammographic images. The algorithm was tested on a database of 61 mammograms, on which masses had previously been marked by experienced radiologists. The results showed that a sensitivity of 95.91% was achieved for mass detection.

Suliga et al. [27] proposed a Markov random field (MRF)-based technique for the automatic detection and classification of masses, which are typically among the first symptoms analyzed in the early diagnosis of breast cancer. The presented MRF model was shown to be an efficient tool for mammogram processing. Kozegar et al. [16] implemented two steps for detecting masses. The first step was to extract suspicious regions from the mammograms through an adaptive thresholding technique. Second, an ensemble classifier was applied to reduce false-positive rates. Experimental results showed that the proposed mass detection algorithm outperformed other competing methods. Oliver et al. [20] provided additional details after reviewing existing methods for automatically detecting and segmenting masses in mammograms.

Because microcalcifications appear as small bright spots within the inhomogeneous background of a mammogram, detecting them is a difficult task. Papadopoulos et al. [22] examined the effect of an image enhancement processing stage and the parameter tuning of a CAD system for the detection of microcalcifications in mammograms. They tested five image enhancement algorithms. The optimal performance for the two mammographic data sets was achieved by local range modification (A_z = 0.932 on MIAS and A_z = 0.915 on Nijmegen) and by wavelet-based linear stretching (A_z = 0.926 on MIAS and A_z = 0.904 on Nijmegen). Oliver et al. [21] used local features extracted from a bank of filters for the automatic detection of microcalcifications and clusters in mammographic images. The experimental evaluation was based on receiver operating characteristic (ROC) analysis for microcalcification detection and on free-response ROC analysis for cluster detection, yielding a sensitivity of > 80% at a single false-positive cluster per image.

Zhang and Gao [33] presented a novel procedure for detecting microcalcification clusters in mammograms that involves image enhancing, feature selection, and supervised learning. The primary contribution of their method is the combination of subspace learning algorithms and the twin support vector machine (SVM). The proposed approach has been evaluated through numerous experiments. Chen et al. [3] used a topological structure over a range of scales in graphical form to classify microcalcification clusters in mammograms. The experimental results showed that the classification accuracy was as high as 96%, and the ROC area was as high as 0.96. Zhang et al. [34] applied mathematical morphology and the SVM to detect microcalcification clusters. They obtained a high level of detection precision and substantially reduced the number of false-positive object regions.

Several studies on microcalcification and space-occupying-lesion detection have been performed. For example, Arodz et al. [1] used the AdaBoost and SVM algorithms to detect suspicious anomalies. The AdaBoost algorithm has an accuracy of 76% for all lesion types and 90% for masses under ideal conditions. Hu et al. [10] developed an adaptive global thresholding segmentation algorithm for detecting suspicious lesions in mammograms. Their experimental results indicated that the algorithm had a sensitivity of 91.3% and 0.71 false-positives per image. Kendall et al. [13] detected anomalies in screening mammograms by using two-dimensional (2D) discrete wavelet transforms, statistical features, and naïve Bayesian classifiers. After a series of tests, the proposed approach resulted in 100% sensitivity and up to 79% specificity for abnormalities. For detecting and classifying suspicious lesions, Veena and Jayakrishna [28] developed a systematic scheme that consists of a back-propagation neural network (BPNN), wavelet transform, and window-based adaptive thresholding. Experimental results showed that the approach yields good detection and classification results. Table 1 lists the summary of studies related to the detection of masses, microcalcifications, and suspicious lesions in mammograms.

Table 1:

Summary of Related Studies for Breast Cancer Analysis.

Author	Purpose	Main Techniques	Data Sets	Performance
Arodz et al. [1]	Detection of suspicious anomalies	AdaBoost and SVM	DDSM^a	Accuracy of 76% for all lesion types and 90% for masses under ideal conditions
Kom et al. [14]	Detection of suspicious masses	Linear transformation filter and a local adaptive thresholding technique	YGOPH: Yaounde Gynaeco-Obstetric and Pediatric Hospital	Sensitivity of 95.91% and ROC area of 0.946
Papadopoulos et al. [22]	Detection of microcalcifications	Five image enhancement algorithms	MIAS^b and University Hospital Nijmegen	LRM (A_ZMIAS = 0.932/A_ZNIJ = 0.915) and wavelet-based linear stretching (A_ZMIAS = 0.926/A_ZNIJ = 0.904)
Hu et al. [10]	Detection of suspicious lesions	Adaptive global thresholding segmentation algorithm	MIAS	Sensitivity of 91.3% and 0.71 false positives per image
Oliver et al. [21]	Detection of microcalcifications	Local features extracted from a bank of filters and a boosted classifier	DDSM and MIAS	Sensitivity > 80% for a single false-positive cluster per image
Zhang and Gao [33]	Detection of microcalcification clusters	Combine subspace learning algorithms and the twin SVM	DDSM	Proposed framework is effective and efficient
Kozegar et al. [16]	Detection of masses	An adaptive thresholding technique and ensemble classifier	MIAS and INBreast	Sensitivity of 91% with false-positive rate of 4.8 per image
Kendall et al. [13]	Detection of anomalies	2D discrete wavelet transforms, statistical features, and naïve Bayesian classifiers	DDSM and MIAS	Sensitivity of 100% and up to 79% specificity for abnormalities
Chen et al. [3]	Classification of microcalcification clusters	Topological structure	DDSM and MIAS	Accuracy is as high as 96% and ROC area is as high as 0.96
Zhang et al. [34]	Detection of microcalcification clusters	Mathematical morphology and SVM	MIAS	High level of precision and reduced false-positive rate
Veena and Jayakrishna [28]	Detection and classification of suspicious lesions	BPNN, wavelet transform, and window-based adaptive thresholding	MIAS	Sensitivity of 92.13%

^aDDSM: Digital Database for Screening Mammography, available at http://marathon.csee.usf.edu/Mammography/Database.html.

^bMIAS: Mammography Image Analysis Society, available at http://peipa.essex.ac.uk/info/mias.html.

These studies are limited by the specificity of the detected microcalcifications, the characteristics of the extracted space-occupying lesions (e.g., size, shape, and number), or the detection procedure used. Considering these drawbacks, this study proposes a CAD system for analyzing mammograms. The database used to develop our system is the mini-MIAS database provided by the Mammographic Image Analysis Society. After selecting the region of interest (ROI), we computed three types of features: shape, spatial, and spectral domain features. In addition, we used the structural equation model (SEM) to estimate relations between the features and the breast tissue type, lesion class, and tumor severity. Finally, the decision tree (DT) and the classification and regression tree (CART) were applied to derive computer-aided breast cancer diagnosis rules. Figure 2 shows a diagram of the proposed method.

Figure 2:

Diagram of the Proposed Method.

The main contributions of the current study are as follows:

A novel method for analyzing irregularities in mammograms that consists of SEM, DT, and CART is proposed.
Several computer-aided diagnosis rules are generated to aid radiologists in classifying abnormal lesions and in classifying tumor severity.
Two practical data sets, namely mini-MIAS and Digital Database for Screening Mammography (DDSM), are used to develop and evaluate the proposed approach.

The remainder of this article is organized as follows: Section 3 details the materials and methods used in the article; Section 4 presents an analysis of the experimental results as well as a discussion; and the final section offers a conclusion.

3 Materials and Methods

3.1 Data Set

The data set used in this study is the mini-MIAS database provided by the MIAS. In the data set, X-ray films were digitized using a Joyce-Loebl scanning microdensitometer at a resolution of 50 μm × 50 μm with an 8-bit word, and the original images have a size of 1024 × 1024 pixels. The mammographic images were obtained from 161 subjects. Both right and left breast images of each subject are provided, totaling 322 files. Odd (even) file numbers correspond to the right (left) breast mammogram. Of these files, 207 do not show tumor lesions and are presented as normal mammograms. The remaining 115 are abnormal mammograms with space-occupying or calcification lesions; because some of them contain two or three abnormal areas, they yield 123 (115 plus 8) abnormal regions [26].

There are three types of breast tissues: fatty (F), fatty-glandular (FG), and dense-glandular (DG). Among the 207 normal mammograms, 66 mammograms show fatty tissue, 65 mammograms show glandular tissues, and the remaining 76 show dense tissues. Among the 115 abnormal samples, 40 samples have fatty tissues, 39 samples have glandular tissues, and the remaining 36 have dense tissues. Figure 3 shows the distributions of the types of breast tissues.

Figure 3:

Distributions of Types of Breast Tissues.

The lesion class describes the type of abnormality. Apart from the normal (NORM) mammograms, the abnormal lesions are divided into six categories: CIRC, SPIC, ARCH, ASYM, CALC, and MISC. Among the 123 abnormal cases, 25 are well-defined/circumscribed masses, 19 are spiculated masses, 19 are architectural distortions, 15 are asymmetries, 30 are calcifications, and the remaining 15 are other ill-defined masses. The severity of each abnormality was determined by biopsy as either benign (BENIGN) or malignant (MALIG). There are 69 benign cases and 54 malignant cases. Figure 4 shows the classification of mammographic images, with the number of mammograms given in parentheses.

Figure 4:

Classification of Mammographic Images.

3.2 Shape Domain Features

Shape domain features use the intensity histogram of an image to provide various statistical and shape properties. They are based on the distribution of individual pixel values rather than on the interaction or co-occurrence with neighboring ones. We calculated 15 gray-level features shown in Table 2, where k and p(k) denote the intensity and its probability determined from the mammogram histogram, respectively. The term bac stands for background, which is the average intensity of the margin of the ROI. Yu and Guan [32] used these features for automatically detecting microcalcification clusters. Cheng et al. [4] also applied these features to detect and classify masses in mammograms.
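
The histogram-based entries of Table 2 can be illustrated with a minimal Python sketch. It assumes an 8-bit ROI stored as a NumPy array and a background value bac supplied by the caller; the function name `histogram_features` and the toy ROI are hypothetical and are not part of the original system.

```python
import numpy as np

def histogram_features(roi, bac):
    """First-order features from the 8-bit intensity histogram of an ROI."""
    roi = np.asarray(roi, dtype=np.uint8)
    counts = np.bincount(roi.ravel(), minlength=256)
    p = counts / counts.sum()              # p(k): probability of intensity k
    k = np.arange(256)
    mu = float(np.sum(k * p))              # mean intensity
    sigma = float(np.sqrt(np.sum((k - mu) ** 2 * p)))
    return {
        "mean": mu,
        "std": sigma,
        "FBR": mu / bac,                   # foreground/background ratio
        "FBD": mu - bac,                   # foreground/background difference
        "DIFF": (mu - bac) / (mu + bac),   # difference ratio
    }

# Toy 2 x 2 ROI with an assumed background intensity of 90.
roi = np.array([[100, 100], [120, 120]])
feats = histogram_features(roi, bac=90.0)
print(feats)
```

For this toy ROI the mean is 110 and the standard deviation is 10, so FBD is 20 and DIFF is 0.1.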

Table 2:

Shape Domain Features [32].

Features	Expression
Mean (μ)	$\mu = \sum_{k=0}^{255} k \, p(k)$
Standard deviation (σ)	$\sigma^{2} = \sum_{k=0}^{255} (k - \mu)^{2} \, p(k)$
Foreground background ratio (FBR)	$\mathrm{FBR} = \mu / \mathrm{bac}$
Foreground background difference (FBD)	$\mathrm{FBD} = \mu - \mathrm{bac}$
Difference ratio (DIFF)	$\mathrm{DIFF} = (\mu - \mathrm{bac}) / (\mu + \mathrm{bac})$
Area (AREA)	AREA of the ROI
Compactness (COMP)	$\mathrm{COMP} = \mathrm{perimeter}^{2} / \mathrm{AREA}$
Elongation (ELONG)	$\mathrm{ELONG} = \text{max axis} / \text{min axis}$
Moment invariant features (PHI1 – PHI7)	See equations (5)–(11)

Moment invariant features can provide the properties of invariance to scale, position, and rotation [5]. For a 2D continuous function f (x, y), the moment of order (p+ q) is defined as

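$$m_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^{p} y^{q} f(x, y) \, dx \, dy, \quad p, q = 0, 1, 2, \ldots \tag{1}$$

The central moments are defined as

$$\mu_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y) \, dx \, dy, \quad p, q = 0, 1, 2, \ldots, \tag{2}$$

where $\bar{x} = m_{10} / m_{00}$ and $\bar{y} = m_{01} / m_{00}$.

If f (x, y) is a digital image, then equation (2) becomes

$$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y), \tag{3}$$

and the normalized central moments, denoted η_pq, are defined as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q + 2}{2}, \quad p + q = 2, 3, 4, \ldots \tag{4}$$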

A set of seven invariant moments can be derived from the second and third moments proposed by Hu [9]

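$$\mathrm{PHI1} = \ln(\eta_{20} + \eta_{02}), \tag{5}$$

$$\mathrm{PHI2} = \ln\big((\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2}\big), \tag{6}$$

$$\mathrm{PHI3} = \ln\big((\eta_{30} - 3\eta_{12})^{2} + (3\eta_{21} - \eta_{03})^{2}\big), \tag{7}$$

$$\mathrm{PHI4} = \ln\big((\eta_{30} + \eta_{12})^{2} + (\eta_{21} + \eta_{03})^{2}\big), \tag{8}$$

$$\mathrm{PHI5} = \ln\big((\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^{2} - 3(\eta_{21} + \eta_{03})^{2}] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}]\big), \tag{9}$$

$$\mathrm{PHI6} = \ln\big((\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})\big), \tag{10}$$

$$\mathrm{PHI7} = \ln\big((3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^{2} - 3(\eta_{21} + \eta_{03})^{2}] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}]\big). \tag{11}$$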
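
As a rough illustration of equations (3) to (6), the following Python sketch computes the normalized central moments of a small binary image and the first two invariants, then checks that they do not change under a shift or a 90° rotation. The function names and the toy image are assumptions made for this example only.

```python
import numpy as np

def normalized_central_moment(img, p, q):
    """eta_pq of equation (4) for a 2D array (x = column, y = row)."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    xbar = (x * img).sum() / m00
    ybar = (y * img).sum() / m00
    mu_pq = ((x - xbar) ** p * (y - ybar) ** q * img).sum()
    gamma = (p + q + 2) / 2
    return mu_pq / m00 ** gamma            # mu_00 equals m00

def phi1_phi2(img):
    """PHI1 and PHI2 of equations (5) and (6)."""
    n20 = normalized_central_moment(img, 2, 0)
    n02 = normalized_central_moment(img, 0, 2)
    n11 = normalized_central_moment(img, 1, 1)
    phi1 = np.log(n20 + n02)
    phi2 = np.log((n20 - n02) ** 2 + 4 * n11 ** 2)
    return phi1, phi2

# Toy image: a 3 x 5 rectangle, then shifted and rotated copies.
img = np.zeros((20, 20))
img[4:7, 5:10] = 1.0
shifted = np.roll(img, shift=(6, 3), axis=(0, 1))
rotated = np.rot90(img)

print(phi1_phi2(img))
print(phi1_phi2(shifted))
print(phi1_phi2(rotated))
```

The three printed pairs agree up to floating-point error, which is the invariance property that motivates using these features.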

3.3 Spatial Domain Features

The gray-level co-occurrence matrix is a well-established, robust method for extracting spatial domain features from images [7]. The matrix element P_{Δx, Δy}(i, j) is the relative frequency with which two pixels separated by the displacement (Δx, Δy) occur in a given neighborhood, one with intensity i and the other with intensity j. In other words, the matrix element P_{d, θ}(i, j) contains the second-order statistical probability values for changes between gray levels i and j at a particular displacement distance d and at a particular angle θ.

With an M × N input image containing L gray levels from 0 to L − 1, let I(m, n) be the intensity at sample m and line n of the image. First, the square matrix W with size L × L is established. Each element of W is calculated as

$$W_{d,\theta}(i, j) = \sum_{l=1}^{M} \sum_{k=1}^{N} \delta_{d,\theta}(l, k), \tag{12}$$

where δ_{d, θ}(l, k) is defined as follows:

$$\delta_{d,\theta}(l, k) = \begin{cases} 1 & \text{if } I(l, k) = i \text{ and } I(l + d\cos\theta,\; k + d\sin\theta) = j, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

This matrix satisfies the symmetry property because the relationship between i and j has the same meaning as that between j and i. Second, each element in the matrix is normalized to a probability term describing how frequently a gray tone appears in a specified spatial relationship to another gray tone in the image:

$$P_{d,\theta}(i, j) = \frac{W_{d,\theta}(i, j)}{\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} W_{d,\theta}(i, j)}, \tag{14}$$

where P_{d, θ}(i, j) denotes the normalized probability term. In this study, d was set to 1, 2, 4, and 6, and θ was set to 0°, 45°, 90°, and 135°. Table 3 lists 12 spatial domain features, for which μ_x, μ_y, σ_x, and σ_y denote the means and standard deviations of P_x and P_y. P_x(i) is the ith entry in the marginal probability matrix, obtained by summing the rows of P_{d, θ}(i, j). Similarly, P_y(j) is the jth entry, determined by summing the columns of P_{d, θ}(i, j). The total number of spatial domain features is 192 (12 × 4 × 4). The spatial domain features (also called texture features) are usually used to detect and classify masses. More references can be found in Cheng et al. [4].
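
The construction in equations (12) to (14) can be sketched in Python as follows. This is an illustrative version under stated assumptions: the four angles are mapped to integer unit offsets scaled by d (the `OFFSETS` table is a convention chosen here, not taken from the article), pairs that fall outside the image are skipped, and symmetry is obtained by adding the transpose.

```python
import numpy as np

# Assumed mapping from angle to a unit step (along l, along k).
OFFSETS = {0: (1, 0), 45: (1, 1), 90: (0, 1), 135: (-1, 1)}

def glcm(image, d, theta_deg, levels):
    """Symmetric, normalized co-occurrence matrix P_{d,theta}."""
    image = np.asarray(image, dtype=int)
    dl, dk = (d * s for s in OFFSETS[theta_deg])
    rows, cols = image.shape
    W = np.zeros((levels, levels), dtype=float)
    for r in range(rows):                  # r plays the role of l
        for c in range(cols):              # c plays the role of k
            r2, c2 = r + dl, c + dk
            if 0 <= r2 < rows and 0 <= c2 < cols:
                W[image[r, c], image[r2, c2]] += 1
    W = W + W.T                            # count (i, j) and (j, i) alike
    return W / W.sum()                     # equation (14)

# Toy image with two gray levels.
toy = np.array([[0, 0, 1],
                [0, 1, 1]])
P = glcm(toy, d=1, theta_deg=0, levels=2)
print(P)
```

For the toy image the three valid pixel pairs are (0, 0), (0, 1), and (1, 1), so P has 1/3 on the diagonal and 1/6 off the diagonal.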

Table 3:

Spatial Domain Features [31].

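Feature	Expression
Contrast (CONT)	$\mathrm{CONT}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i - j)^{2} P_{d,\theta}(i, j)$
Correlation (CORR)	$\mathrm{CORR}_{d,\theta} = \dfrac{\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} i \, j \, P_{d,\theta}(i, j) - \mu_{x} \mu_{y}}{\sigma_{x} \sigma_{y}}$
Energy	$\mathrm{Energy}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} P_{d,\theta}(i, j)^{2}$
Entropy	$\mathrm{Entropy}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} P_{d,\theta}(i, j) \log P_{d,\theta}(i, j)$
Homogeneity (HOMO)	$\mathrm{HOMO}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \dfrac{P_{d,\theta}(i, j)}{1 + (i - j)^{2}}$
Dissimilarity (DISS)	$\mathrm{DISS}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} |i - j| \, P_{d,\theta}(i, j)$
Intensity	$\mathrm{Intensity}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} i \, j \, P_{d,\theta}(i, j)$
Sum of squares variance – X-axis (SSVX)	$\mathrm{SSVX}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i - \mu_{x})^{2} P_{d,\theta}(i, j)$
Sum of squares variance – Y-axis (SSVY)	$\mathrm{SSVY}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (j - \mu_{y})^{2} P_{d,\theta}(i, j)$
Cluster shade (CS)	$\mathrm{CS}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i + j - \mu_{x} - \mu_{y})^{3} P_{d,\theta}(i, j)$
Cluster prominence (CP)	$\mathrm{CP}_{d,\theta} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i + j - \mu_{x} - \mu_{y})^{4} P_{d,\theta}(i, j)$
Maximum probability (MP)	$\mathrm{MP}_{d,\theta} = \max_{i, j} P_{d,\theta}(i, j)$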
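
A few of the features in Table 3 can be evaluated directly from a normalized co-occurrence matrix, as in the Python sketch below. The function name `glcm_features` and the 2 × 2 example matrix are hypothetical; correlation is computed in the usual form with μ_x μ_y in the numerator and σ_x σ_y in the denominator, which is an assumption about the intended expression.

```python
import numpy as np

def glcm_features(P):
    """Selected Table 3 features from a normalized co-occurrence matrix."""
    P = np.asarray(P, dtype=float)
    L = P.shape[0]
    i, j = np.mgrid[:L, :L]
    mu_x, mu_y = np.sum(i * P), np.sum(j * P)
    sd_x = np.sqrt(np.sum((i - mu_x) ** 2 * P))
    sd_y = np.sqrt(np.sum((j - mu_y) ** 2 * P))
    return {
        "CONT": np.sum((i - j) ** 2 * P),
        "CORR": (np.sum(i * j * P) - mu_x * mu_y) / (sd_x * sd_y),
        "Energy": np.sum(P ** 2),
        "HOMO": np.sum(P / (1 + (i - j) ** 2)),
        "DISS": np.sum(np.abs(i - j) * P),
        "CP": np.sum((i + j - mu_x - mu_y) ** 4 * P),
        "MP": P.max(),
    }

# Hypothetical two-level matrix used only for illustration.
P_example = np.array([[1/3, 1/6],
                      [1/6, 1/3]])
f = glcm_features(P_example)
print(f)
```

With this matrix, contrast and dissimilarity both equal 1/3, energy is 5/18, and homogeneity is 5/6.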

3.4 Spectral Domain Features

The spectral domain features are computed from texture units, which encode the local intensity pattern around each pixel. The technique is based on a two-level version of the texture spectrum method [19]. First, image pixels are labeled using a step function that records the differences between the central pixel and its neighbors. The resulting labels in the neighborhood are then multiplied by the weights assigned to the corresponding pixels. Finally, the products are summed to obtain the number of the neighborhood.

The information for a pixel can be extracted from a neighborhood of 3 × 3 pixels, which represents the smallest complete unit (with eight directions surrounding the pixel). Texture units thus characterize the local texture for a given pixel and its neighborhood, and the statistics of all of the texture units over the entire image reveal the global texture aspects [8].

With a neighborhood of 3 × 3 pixels, which are denoted by a set of nine elements V = {V₀, V₁, …, V₈}, where V₀ represents the intensity value of the central pixel and V_i denotes the intensity value of the neighboring pixel i, the texture unit (TU) is given by TU = {E₁, E₂, …, E₈}, where E_i is determined as follows:

$$E_{i} = \begin{cases} 0 & \text{if } V_{i} < V_{0}, \\ 1 & \text{if } V_{i} = V_{0}, \\ 2 & \text{if } V_{i} > V_{0}, \end{cases} \quad i = 1, 2, \ldots, 8. \tag{15}$$

From equation (15), each element can be assigned one of three possible values, so the total number of possible texture units for the eight elements is 3⁸ = 6561. The texture unit number is defined as

$$N_{TU} = \sum_{i=1}^{8} E_{i} \times 3^{i-1}, \tag{16}$$

where N_TU varies from 0 to 6560. The set of 6561 texture units corresponds to the relative gray-level relationships between a pixel and its neighbors in all possible directions, i.e., the local texture aspect of a given pixel in accordance with its neighbors.
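
Equations (15) and (16) can be sketched in Python as below. The clockwise neighbor order starting at the top-left pixel and the `start` argument (which rotates the position of the first element) are assumptions made for this illustration.

```python
import numpy as np

# Neighbor positions in clockwise order, starting at the top-left pixel.
CLOCKWISE = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def texture_unit_number(window, start=0):
    """N_TU of a 3 x 3 window; `start` selects the first neighbor (0 to 7)."""
    window = np.asarray(window)
    v0 = window[1, 1]
    order = CLOCKWISE[start:] + CLOCKWISE[:start]
    n_tu = 0
    for idx, (r, c) in enumerate(order):   # idx corresponds to i - 1
        v = window[r, c]
        e = 0 if v < v0 else (1 if v == v0 else 2)
        n_tu += e * 3 ** idx
    return n_tu

flat = np.full((3, 3), 7)                  # all neighbors equal the center
bright = np.full((3, 3), 9)
bright[1, 1] = 1                           # all neighbors exceed the center
print(texture_unit_number(flat), texture_unit_number(bright))
```

A flat window gives 3280, the midpoint of the range, and a window whose neighbors all exceed the center gives the maximum value 6560.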

Moreover, the eight elements can be ordered differently. If they are ordered clockwise as shown in Figure 5, the first element can take eight possible positions from the top left (A) to the middle left (H), and then the 6561 N_TU can be labeled by the above formula under eight different ordering ways (from A to H). Table 4 lists eight spectral domain features, where S(i) is the occurrence of the ith N_TU; P(a, b, c) is the probability of E_a = E_b = E_c; and K(i) is the probability of E_a = E_e, E_b = E_f, E_c = E_g, and E_d = E_h.

Table 4:

Spectral Domain Features.

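Features	Expression
Black-white symmetry (BWS)	$\mathrm{BWS} = 1 - \dfrac{\sum_{i=0}^{3279} |S_{1}(i) - S_{1}(3281 + i)|}{\sum_{i=0}^{6560} S_{1}(i)}$
Geometric symmetry (GS)	$\mathrm{GS} = 1 - \dfrac{1}{4} \sum_{j=1}^{4} \dfrac{\sum_{i=0}^{6560} |S_{j}(i) - S_{j+4}(i)|}{2 \times \sum_{i=0}^{6560} S_{j}(i)}$
Degree of direction (DD)	$\mathrm{DD} = 1 - \dfrac{1}{6} \sum_{m=1}^{3} \sum_{n=m+1}^{4} \dfrac{\sum_{i=0}^{6560} |S_{m}(i) - S_{n}(i)|}{2 \times \sum_{i=0}^{6560} S_{m}(i)}$
Micro horizontal structure (MHS)	$\mathrm{MHS} = \sum_{i=0}^{6560} [S_{1}(i) \times \mathrm{HM}(i)]$, where $\mathrm{HM}(i) = P(a, b, c) \times P(e, f, g)$
Micro vertical structure (MVS)	$\mathrm{MVS} = \sum_{i=0}^{6560} [S_{1}(i) \times \mathrm{VM}(i)]$, where $\mathrm{VM}(i) = P(a, g, h) \times P(c, d, e)$
Micro left diagonal structure (MLDS)	$\mathrm{MLDS} = \sum_{i=0}^{6560} [S_{1}(i) \times \mathrm{LDM}(i)]$, where $\mathrm{LDM}(i) = P(a, b, h) \times P(d, e, f)$
Micro right diagonal structure (MRDS)	$\mathrm{MRDS} = \sum_{i=0}^{6560} [S_{1}(i) \times \mathrm{RDM}(i)]$, where $\mathrm{RDM}(i) = P(b, c, d) \times P(f, g, h)$
Central symmetry (CS)	$\mathrm{CS} = \sum_{i=0}^{6560} S_{1}(i) \times K(i)^{2}$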

The spectral domain features are also called local binary patterns. Some studies applied them to analyze breast cancers, such as Lladó et al. [18] and Joseph and Balakrishnan [11].

Figure 5:

Eight Possible Positions Associated with the Central Pixel.

3.5 DT and Information Gain

A DT is a hierarchical model consisting of a root node, internal nodes, and leaf nodes. The root node has no incoming branches and has zero or more outgoing branches. Each of the internal nodes has exactly one incoming branch and two or more outgoing branches. Moreover, each of the leaf nodes has exactly one incoming branch and no outgoing branches. A leaf node is also called a terminal node. The DT algorithm selects a root node with the highest purity by using all training samples. Each attribute is selected separately to partition these samples. A branch is created for each value of an attribute, and the corresponding subset of samples is moved to the newly created child node. Numerical attributes must be transformed into categorical attributes, and the purity of the node is measured according to the expected amount of information gain. The attribute with the highest information gain is selected because it yields the nodes with the highest purity. The training samples are successively split until all subsets consist of samples belonging to a single class [6, 30].

The information gain is defined as follows:

$$\mathrm{Gain}(X) = \mathrm{INFO}(T) - \mathrm{INFO}_{x}(T), \tag{17}$$

where T is a set of training samples and X is a possible test with n outcomes that partition the set T into subsets T₁, T₂,…, T_n. The terms INFO(T) and INFO_x(T) denote the information of T before and after the training samples are partitioned, respectively. The parameter INFO(T) is based on the information theory concept called entropy and is defined as follows:

$$\mathrm{INFO}(S) = \mathrm{entropy}\big(P(C_{i}, S),\, 1 - P(C_{i}, S)\big) = -P(C_{i}, S) \log_{2} P(C_{i}, S) - \big(1 - P(C_{i}, S)\big) \log_{2}\big(1 - P(C_{i}, S)\big), \tag{18}$$

where S is any set of samples and P(C_i, S) represents the probability that the samples in S belong to class C_i. The parameter INFO_x(T) is calculated as the weighted sum of entropies over the subsets:

$$\mathrm{INFO}_{x}(T) = \sum_{i=1}^{n} \frac{|T_{i}|}{|T|} \times \mathrm{INFO}(T_{i}). \tag{19}$$

The gain criterion selects a test X to maximize Gain(X). Therefore, a typical decision learning system recursively selects attributes to test and splits the data set into subsets according to the outcome of the information gain function.
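
The gain criterion can be illustrated with a short Python sketch. It uses the general Shannon entropy over class labels, which reduces to the two-class form of equation (18), and takes INFO_x(T) as the size-weighted sum of subset entropies. The function names and the four toy samples are hypothetical.

```python
import math
from collections import Counter

def info(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, outcomes):
    """Information gain of a test whose outcome is given per sample."""
    n = len(labels)
    subsets = {}
    for y, o in zip(labels, outcomes):
        subsets.setdefault(o, []).append(y)
    info_x = sum(len(s) / n * info(s) for s in subsets.values())
    return info(labels) - info_x

labels = ["BENIGN", "BENIGN", "MALIG", "MALIG"]
perfect = ["low", "low", "high", "high"]   # test separates the classes
useless = ["a", "b", "a", "b"]             # test ignores the classes
print(gain(labels, perfect), gain(labels, useless))
```

The separating test recovers the full 1 bit of class entropy, whereas the uninformative test has zero gain, so the first attribute would be chosen for the split.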

3.6 Structural Equation Model

The SEM is a statistical model for testing and estimating causal relations by using a combination of statistical data and qualitative causal assumptions (Figure 6).

Figure 6:

Structural Equation Model.

The general SEM can be represented using the following three matrix equations:

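$$\eta_{m \times 1} = B_{m \times m} \, \eta_{m \times 1} + \Gamma_{m \times n} \, \xi_{n \times 1} + \zeta_{m \times 1}, \tag{20}$$

$$Y_{p \times 1} = \Lambda_{Y (p \times m)} \, \eta_{m \times 1} + \varepsilon_{p \times 1}, \tag{21}$$

$$X_{q \times 1} = \Lambda_{X (q \times n)} \, \xi_{n \times 1} + \delta_{q \times 1}. \tag{22}$$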

An SEM includes two types of latent variables: exogenous and endogenous. Parameters ξ indicate exogenous constructs, which are independent variables in all equations, whereas η denotes endogenous constructs that are dependent variables in at least one equation. Parameter γ represents regression relations between exogenous constructs and endogenous constructs. Parameter β represents regression relations between two endogenous constructs. Typically, in SEM, exogenous constructs are allowed to co-vary freely. Parameter ϕ represents these co-variances. Manifest variables associated with exogenous constructs are labeled X, whereas those associated with endogenous constructs are labeled Y. An SEM includes two separate λ matrices that connect manifest variables with latent variables, one on the X side and the other on the Y side. Parameters δ and ε denote measurement errors, whereas ζ represents the structural error [25].
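
To make the structural part concrete, the following Python sketch solves the structural equation for the endogenous constructs, assuming the structural error ζ enters additively as in the standard LISREL formulation, so that η = (I − B)⁻¹(Γξ + ζ). The matrices below are made-up numbers for illustration and are unrelated to the fitted model in this study.

```python
import numpy as np

def solve_endogenous(B, Gamma, xi, zeta):
    """Solve eta = B eta + Gamma xi + zeta for eta."""
    m = B.shape[0]
    return np.linalg.solve(np.eye(m) - B, Gamma @ xi + zeta)

# Hypothetical model: one exogenous construct, two endogenous constructs.
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])       # eta_2 depends on eta_1
Gamma = np.array([[1.0],
                  [0.0]])        # eta_1 depends on xi
xi = np.array([2.0])
zeta = np.zeros(2)

eta = solve_endogenous(B, Gamma, xi, zeta)
print(eta)
```

Here the exogenous construct drives the first endogenous construct directly, and the second only through the first, which gives η = (2, 1).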

3.7 Classification and Regression Tree

CART is a prediction and classification tool for a new mammogram test. The data set can be categorical or continuous, and values can be missing. A categorical data set produces classification trees, and a continuous data set produces regression trees. The approach proposed by Breiman et al. [2] involves three stages, namely growing, pruning, and optimizing [17]. The growing stage involves creating the tree in a recursive manner by partitioning the training samples into successively purer subsets according to a splitting criterion. For regression trees, the criterion can be least squares, the trimmed mean, or least absolute deviations. For classification trees, the criterion can be Gini, twoing, ordered twoing, or ϕ coefficients.

The pruning stage involves discarding one or more subtrees according to a minimal-cost complexity measure. The pruning procedure begins with identifying the largest tree and replacing subtrees with leaves to simplify the tree. The procedure continues until only one node of the tree remains. In the optimizing stage, the tree with the lowest predicted error rate and the highest classification quality is obtained through cross-validation techniques. These techniques divide samples into training and testing samples. The process entails removing parts of the tree that do not contribute to the classification accuracy of unseen testing samples, and produces a less complex and more comprehensible tree. Furthermore, the one-standard-error rule is applied to obtain a stable tree, namely the smallest tree whose accuracy is within one standard error of the best. The ranking of features can be measured by computing the decrease in the predicted error rate when another feature is used to replace the primary split. Therefore, if a feature has an increased chance of having a primary split, then the ranking of the feature is high.
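
The growing stage with the Gini criterion can be sketched for a single numerical feature as follows. This is a simplified illustration, not the SPM implementation: it scans midpoints between sorted values and keeps the threshold with the lowest weighted Gini impurity. The AREA values and labels are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for idx in range(1, n):
        lo, hi = pairs[idx - 1][0], pairs[idx][0]
        if lo == hi:
            continue                       # no threshold between equal values
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical AREA values with their severity labels.
area = [8000, 9000, 12000, 13000]
severity = ["BENIGN", "BENIGN", "MALIG", "MALIG"]
print(best_threshold(area, severity))
```

For these four samples the midpoint 10500 separates the classes completely, so its weighted impurity is 0 and it is selected as the primary split.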

4 Experimental Design and Analysis

We implemented the proposed method by using Matlab 7.0, Weka 3.7, SPSS 12.0, Lisrel 8.5, and SPM 7.0 (provided by Salford Systems, San Diego, CA, USA) on a personal computer (Intel Core i7, 2.93-GHz CPU, and 3.46 GB RAM). After selecting features by information gain, we obtained the remaining features for shape (FBD, DIFF, AREA, COMPACT, ELONG, PHI2, PHI3, and PHI6) and for spatial (CP_2_90, Intensity_4_90, CP_4_90, Energy_4_90, MP_4_90, Intensity_6_90, CP_6_90, Energy_6_90, and MP_6_90). These features plus the spectral features are then fed into the SEM to establish a structural model.

Figure 7 shows the path model, which was fitted using the maximum likelihood method in LISREL to estimate the path parameters; an asterisk indicates significance at the 5% level (type I error). The χ² of the SEM was 10,530.57 (p = 0.0) with 546 degrees of freedom. The root mean square error of approximation was 0.24, which indicates that the model was acceptable. For the structural model of the SEM, nine paths reached the significance level of 0.05: shape→type (0.18), shape→class (0.05), shape→severity (0.11), spatial→class (0.06), spatial→severity (−0.12), spectral→class (−0.05), spectral→severity (−0.12), type→class (−0.01), and type→severity (−0.03). The value in parentheses is the correlation between latent variables. Therefore, we applied shape, spatial, spectral, and type to predict both class and severity.

Figure 7:

Path Model of the Features.

Figure 8 shows the DT for predicting the classification of abnormal lesions. AREA, CS, MP_6_90, FBD, BWS, CP_2_90, and PHI6 play a critical role in rule induction. For instance, if a mammogram ROI is characterized by AREA ≤ 10,050, CS > 0.075, MP_6_90 ≤ 0.5655, FBD ≤ 16.40, BWS ≤ 0.1828, and CP_2_90 ≤ 0.9033, it falls into the ARCH class with an accuracy of 83.3%. Table 5 summarizes the rules and the classification results for lesion class obtained from the constructed tree. Table 6 lists the detailed accuracy by class of the DT results for predicting the classification of abnormal lesions. The average accuracy rate is 73.64%, and the average ROC area is 0.865.

Table 5: Rules for Predicting the Classification of Abnormal Lesions.

Node	Rule
1	If AREA ≤ 10,050 and CS ≤ 0.075, then class = CALC (57.69%).
2	If AREA ≤ 10,050, CS > 0.075, and MP_6_90 ≤ 0.5655, then class = MISC (42.85%).
3	If AREA ≤ 10,050, CS > 0.075, MP_6_90 ≤ 0.5655, FBD ≤ 16.40, BWS ≤ 0.1828, and CP_2_90 ≤ 0.9033, then class = ARCH (83.33%).
4	If AREA ≤ 10,050, CS > 0.075, MP_6_90 ≤ 0.5655, FBD ≤ 16.40, BWS ≤ 0.1828, and CP_2_90 > 0.9033, then class = CIRC (75.00%).
5	If AREA ≤ 10,050, CS > 0.075, MP_6_90 ≤ 0.5655, FBD ≤ 16.40, and BWS > 0.1828, then class = CIRC (80.00%).
6	If AREA ≤ 10,050, CS > 0.075, MP_6_90 ≤ 0.5655, and FBD > 16.40, then class = ASYM (66.67%).
7	If AREA > 10,050 and AREA ≤ 11,277, then class = NORM (97.18%).
8	If AREA > 11,277 and PHI6 ≤ –56.28, then class = ASYM (60.00%).
9	If AREA > 11,277, PHI6 > –56.28, and FBD ≤ 11.49, then class = ARCH (80.00%).
10	If AREA > 11,277, PHI6 > –56.28, and FBD > 11.49, then class = SPIC (100.00%).
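To illustrate how the rules in Table 5 could be applied to a new ROI, the following minimal sketch transcribes them as printed into threshold tests. The function name `matching_rules`, the helper `_mid_branch`, and the dictionary layout of the feature vector are hypothetical and not part of the original system. As printed, nodes 2 and 3 to 6 share the condition MP_6_90 ≤ 0.5655, so more than one rule can fire for the same ROI; the sketch returns every rule that fires rather than guessing at the branch order of the tree.

```python
def _mid_branch(f):
    """Conditions shared by nodes 2 to 6 as printed in Table 5."""
    return f["AREA"] <= 10050 and f["CS"] > 0.075 and f["MP_6_90"] <= 0.5655


# (node, predicate, predicted class, leaf accuracy in %), transcribed as printed.
RULES = [
    (1, lambda f: f["AREA"] <= 10050 and f["CS"] <= 0.075, "CALC", 57.69),
    (2, lambda f: _mid_branch(f), "MISC", 42.85),
    (3, lambda f: _mid_branch(f) and f["FBD"] <= 16.40 and f["BWS"] <= 0.1828
        and f["CP_2_90"] <= 0.9033, "ARCH", 83.33),
    (4, lambda f: _mid_branch(f) and f["FBD"] <= 16.40 and f["BWS"] <= 0.1828
        and f["CP_2_90"] > 0.9033, "CIRC", 75.00),
    (5, lambda f: _mid_branch(f) and f["FBD"] <= 16.40 and f["BWS"] > 0.1828,
        "CIRC", 80.00),
    (6, lambda f: _mid_branch(f) and f["FBD"] > 16.40, "ASYM", 66.67),
    (7, lambda f: 10050 < f["AREA"] <= 11277, "NORM", 97.18),
    (8, lambda f: f["AREA"] > 11277 and f["PHI6"] <= -56.28, "ASYM", 60.00),
    (9, lambda f: f["AREA"] > 11277 and f["PHI6"] > -56.28 and f["FBD"] <= 11.49,
        "ARCH", 80.00),
    (10, lambda f: f["AREA"] > 11277 and f["PHI6"] > -56.28 and f["FBD"] > 11.49,
        "SPIC", 100.00),
]


def matching_rules(features):
    """Return (node, class, accuracy) for every Table 5 rule that fires."""
    return [(node, label, acc) for node, pred, label, acc in RULES if pred(features)]


# Made-up feature values for illustration only, not taken from the database.
example_roi = {"AREA": 12000, "CS": 0.10, "MP_6_90": 0.50, "FBD": 12.0,
               "BWS": 0.10, "CP_2_90": 0.80, "PHI6": -10.0}
print(matching_rules(example_roi))  # prints [(10, 'SPIC', 100.0)]
```

Each lookup performs a fixed number of threshold comparisons, regardless of how many images were used to build the tree.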

Table 6: Detailed Accuracy by Class of DT Results.

Class	TP Rate	FP Rate	Precision	Recall	F-Measure	ROC Area
CIRC	0.480	0.089	0.308	0.480	0.375	0.681
NORM	1.000	0.057	0.967	1.000	0.983	0.967
MISC	0.133	0.022	0.222	0.133	0.167	0.698
ASYM	0.067	0.032	0.091	0.067	0.077	0.699
ARCH	0.105	0.016	0.286	0.105	0.154	0.674
SPIC	0.000	0.032	0.000	0.000	0.000	0.632
CALC	0.633	0.070	0.475	0.633	0.543	0.744
Average	0.736	0.054	0.704	0.736	0.715	0.865
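The F-measure column of Table 6 is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded precision and recall values in the table, so small differences in the third decimal place are expected; the names `f_measure` and `TABLE6` are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# class: (precision, recall, reported F-measure), copied from Table 6.
TABLE6 = {
    "CIRC": (0.308, 0.480, 0.375),
    "NORM": (0.967, 1.000, 0.983),
    "MISC": (0.222, 0.133, 0.167),
    "ASYM": (0.091, 0.067, 0.077),
    "ARCH": (0.286, 0.105, 0.154),
    "SPIC": (0.000, 0.000, 0.000),
    "CALC": (0.475, 0.633, 0.543),
}

for name, (p, r, reported) in TABLE6.items():
    print(f"{name}: computed {f_measure(p, r):.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported column to within rounding.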

Figure 8: Tree of DT for Predicting the Classification of Abnormal Lesions.

Figure 9 shows the CART tree used for classifying the tumor severity. AREA, DD, FBD, COMPACT, BWS, DIFF, GS, and PHI3 play a critical role in rule induction. For example, if a mammogram ROI is characterized by AREA ≤ 10,663.50, DD ≤ 0.42, and FBD ≤ 13.98, it falls into the BENIGN class with an accuracy of 84.6%. Table 7 lists the classification rules for tumor severity obtained from the constructed tree. The average accuracy rate is 87.14%. Figure 10 shows the ROC curve for classifying tumor severity. The overall area under the curve is 0.9336.

Figure 9: Tree of CART for Classifying the Tumor Severity.

Figure 10: ROC for Classifying the Tumor Severity.

To test the performance of the proposed approach, a set of samples was collected from the DDSM database, which contains 2620 cases, including normal images and images with benign and malignant lesions. One hundred benign images and 100 malignant images were randomly selected. Each image has an assessment code between 1 and 5, assigned according to the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) standard. The distributions of the data are shown in Figure 11. For the lesion severity classification, the average accuracy rate of CART was 82.5%, and the overall area under the ROC curve was 0.844. For the ACR BI-RADS assessment classification, the average accuracy rate of CART was 65%, and the overall area under the ROC curve was 0.69. Although these results were lower than those obtained with the mini-MIAS database, the accuracy rates of the ACR BI-RADS assessment classification were acceptable compared with radiologists' assessments.

Figure 11: Distributions of DDSM Samples.

Table 8 compares the accuracy and ROC area obtained using the proposed method with those obtained by applying other methods to the mini-MIAS and DDSM databases. The other methods were naïve Bayes [13], SVM [34], multilayer perceptron [28], and AdaBoost [1]. The highlighted sections of the table indicate that the proposed approach, which also yields classification rules, outperformed the other methods when applied to the mini-MIAS database. For the DDSM database, the proposed approach was not the best-performing classification method. However, CART and DT provide explicit classification rules, whereas the results obtained using the other methods are difficult to interpret.

Table 7: Rules for Classifying the Tumor Severity.

Node	Rule
1	If AREA ≤ 10,663.50, DD ≤ 0.42, and FBD ≤ 13.98, then severity = BENIGN (84.6%).
2	If AREA ≤ 10,663.50, DD ≤ 0.42, FBD > 13.98, and FBD ≤ 21.10, then severity = MALIG (85.7%).
3	If AREA ≤ 10,663.50, DD ≤ 0.42, and FBD > 21.10, then severity = BENIGN (100.0%).
4	If AREA ≤ 10,663.50, DD > 0.42, and COMPACT ≤ 17.45, then severity = MALIG (84.6%).
5	If AREA ≤ 10,663.50, DD > 0.42, COMPACT > 17.45, BWS ≤ 0.14, and DIFF ≤ 0.05, then severity = MALIG (72.2%).
6	If AREA ≤ 10,663.50, DD > 0.42, COMPACT > 17.45, BWS ≤ 0.14, and DIFF > 0.05, then severity = BENIGN (80.0%).
7	If AREA ≤ 10,663.50, DD > 0.42, COMPACT > 17.45, and BWS > 0.14, then severity = BENIGN (80.0%).
8	If AREA ≤ 1283.00, then severity = NORMAL (98.1%).
9	If AREA > 1283.00, GS ≤ 0.64, and PHI3 ≤ –43.21, then severity = MALIG (100.0%).
10	If AREA > 1283.00, GS ≤ 0.64, and PHI3 > –43.21, then severity = BENIGN (88.9%).
11	If AREA > 1283.00 and GS > 0.64, then severity = MALIG (88.9%).

Table 8: Comparison of Accuracy and ROC Area.

Radiologists visually search mammograms for abnormalities; however, the task is repetitive and time consuming. The rules generated by this approach can assist them in detecting mammographic lesions that may indicate the presence of breast cancer. First, the radiologist selects an ROI, which is then tested against the generated rules. If a suspicious abnormality is detected, further examination is performed to determine the course of action that may be required. However, CAD acts only as a second reader, and the final decision is made by the radiologist [12, 24]. Moreover, the computational cost of classifying a new case is constant because the features tested by the classification rules are fixed in advance.

5 Conclusion

Breast cancer is a leading cause of early mortality in women. Mammography is the most effective method for the early detection of breast cancer. Early diagnosis and treatment are crucial to reducing the mortality rate and increasing patients’ lifespan. This article proposed a computer-aided diagnosis system consisting of five main steps. First, the original images are obtained from the mini-MIAS database. Second, after the ROI is selected, three distinct sets of features are computed: the shape, spatial, and spectral domain features. Third, the number of features is reduced by information gain. Fourth, these data are input into an SEM to calculate the relationship between the features and the breast tissue type, lesion class, and tumor severity. Finally, DT and CART are applied to construct a mammogram-based computer-aided breast cancer diagnosis system. The proposed method generates 10 rules for predicting the classification of abnormal lesions, with an average accuracy rate of 73.64% and an average ROC area of 0.865, and 11 rules for classifying tumor severity, with an average accuracy rate of 87.14% and an average ROC area of 0.934. These rules can help clinicians detect and interpret breast cancer efficiently from mammograms and thus improve the quality of medical care.

Corresponding author: Jinn-Yi Yeh, Department of Management Information Systems, National Chiayi University, 580 Sinmin Road, Chiayi City 600, Taiwan, e-mail: jyeh@mail.ncyu.edu.tw

Acknowledgments

We are grateful to the National Science Council for the research grant (NSC 102-2221-E-415-023).

Bibliography

[1] T. Arodz, M. Kurdziel, E. O. Sevre and D. A. Yuen, Pattern recognition techniques for automatic detection of suspicious-looking anomalies in mammograms, Comput. Methods Prog. Biomed. 79 (2005), 135–149. doi:10.1016/j.cmpb.2005.03.009

[2] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Wadsworth, Pacific Grove, CA, 1984.

[3] Z. Chen, H. Strange, E. Denton and R. Zwiggelaar, Analysis of mammographic microcalcification clusters using topological features, Lect. Notes Comput. Sci. 8539 (2014), 620–627.

[4] H. D. Cheng, X. J. Shi, R. Min, L. M. Hu, X. P. Cai and H. N. Du, Approaches for automated detection and classification of masses in mammograms, Pattern Recognit. 39 (2006), 646–668. doi:10.1016/j.patcog.2005.07.006

[5] R. C. Gonzalez and R. E. Woods, Digital image processing, 2nd Ed., Prentice Hall, Upper Saddle River, NJ, 2002.

[6] J. Han and M. Kamber, Data mining: concepts and techniques, 3rd Ed., Morgan Kaufmann, New York, 2011.

[7] R. Haralick, M. K. Shanmugam and I. Dinstein, Texture features for image classification, IEEE Trans. Syst. Man Cybern. SMC-3 (1973), 610–621. doi:10.1109/TSMC.1973.4309314

[8] D.-C. He and L. Wang, Texture features based on texture spectrum, Pattern Recognit. 24 (1991), 391–399. doi:10.1016/0031-3203(91)90052-7

[9] M. K. Hu, Visual pattern recognition by moment invariants, IRE Trans. Inf. Theory 8 (1962), 179–187. doi:10.1109/TIT.1962.1057692

[10] K. Hu, X. Gao and F. Li, Detection of suspicious lesions by adaptive thresholding based on multiresolution analysis in mammograms, IEEE Trans. Instrum. Meas. 60 (2011), 462–472. doi:10.1109/TIM.2010.2051060

[11] S. Joseph and K. Balakrishnan, Local binary patterns, Haar wavelet features and Haralick texture features for mammogram image classification using artificial neural networks, Adv. Comput. Inf. Technol. Commun. Comput. Inf. Sci. 198 (2011), 107–114. doi:10.1007/978-3-642-22555-0_12

[12] N. Karssemeijer, J. D. Otten, H. Rijken and R. Holland, Computer aided detection of masses in mammograms as decision support, Br. J. Radiol. 79 (2006), S123–S126. doi:10.1259/bjr/37622515

[13] E. J. Kendall, M. G. Barnett and K. Chytyk-Praznik, Automatic detection of anomalies in screening mammograms, BMC Med. Imag. 13 (2013), 43. doi:10.1186/1471-2342-13-43

[14] G. Kom, A. Tiedeu and M. Kom, Automated detection of masses in mammograms by local adaptive thresholding, Comput. Biol. Med. 37 (2007), 37–48. doi:10.1016/j.compbiomed.2005.12.004

[15] D. Kopans, Breast imaging, Lippincott-Raven, Philadelphia, 1998.

[16] E. Kozegar, M. Soryani, B. Minaei and I. Domingues, Assessment of a novel mass detection algorithm in mammograms, J. Cancer Res. Ther. 9 (2013), 592–600. doi:10.4103/0973-1482.126453

[17] T.-S. Lee, C.-C. Chiu, Y.-C. Chou and C.-J. Lu, Mining the customer credit using classification and regression tree and multivariate adaptive regression splines, Comput. Stat. Data Anal. 50 (2006), 1113–1130. doi:10.1016/j.csda.2004.11.006

[18] X. Lladó, A. Oliver, J. Freixenet, R. Martí and J. Martí, A textural approach for mass false positive reduction in mammography, Comput. Med. Imag. Graph. 33 (2009), 415–422. doi:10.1016/j.compmedimag.2009.03.007

[19] T. Ojala, M. Pietikainen and D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recognit. 29 (1996), 51–59. doi:10.1016/0031-3203(95)00067-4

[20] A. Oliver, J. Freixenet, J. Martí, E. Pérez, J. Pont, E. R. Denton and R. Zwiggelaar, A review of automatic mass detection and segmentation in mammographic images, Med. Image Anal. 14 (2010), 87–110. doi:10.1016/j.media.2009.12.005

[21] A. Oliver, A. Torrent, X. Lladó, M. Tortajada, L. Tortajada, M. Sentís, J. Freixenet and R. Zwiggelaar, Automatic microcalcification and cluster detection for digital and digitised mammograms, Knowl. Based Syst. 28 (2012), 68–75. doi:10.1016/j.knosys.2011.11.021

[22] A. Papadopoulos, D. I. Fotiadis and L. Costaridou, Improvement of microcalcification cluster detection in mammography utilizing image enhancement techniques, Comput. Biol. Med. 38 (2008), 1045–1055. doi:10.1016/j.compbiomed.2008.07.006

[23] Report of Department of Health, Executive Yuan, Taiwan, Statistical Outcome 2011, Available at: http://www.doh.gov.tw/CHT2006/index_populace.aspx. Accessed 15 June, 2014.

[24] M. P. Sampat, M. K. Markey and A. C. Bovik, Computer-aided detection and diagnosis in mammography, Handb. Image Video Process 2 (2005), 1195–1217. doi:10.1016/B978-012119792-6/50130-3

[25] A. Skrondal and S. Rabe-Hesketh, Structural equation modeling: categorical variables, Entry for the Encyclopedia of Statistics in Behavioral Science, Wiley, Hoboken, NJ, 2005. doi:10.1002/0470013192.bsa596

[26] J. Suckling, J. Parker, D. Dance, S. Astley, I. Astley, I. Hutt and C. Boggis, The mammographic images analysis society digital mammogram database, Int. Congr. Ser. Exerpt. Med. 1069 (1994), 375–378.

[27] M. Suliga, R. Deklerck and E. Nyssen, Markov random field-based clustering applied to the segmentation of masses in digital mammograms, Comput. Med. Imag. Graph. 32 (2008), 502–512. doi:10.1016/j.compmedimag.2008.05.004

[28] U. K. Veena and V. Jayakrishna, CAD based system for automatic detection and classification of suspicious lesions in mammograms, Int. J. Emerg. Trends Technol. Comput. Sci. 3 (2014), 338–345.

[29] L. Vona-Davis and D. P. Rose, Adiposity and diabetes in breast and prostate cancer, in: Kolonin, M.G. (eds.), Adipose Tissue and Cancer, pp. 33–51, Springer, New York, 2013. doi:10.1007/978-1-4614-7660-3_3

[30] J.-Y. Yeh, T.-H. Wu and C.-W. Tsao, Using data mining techniques to predict hospitalization of hemodialysis patients, Decis. Support Syst. 50 (2011), 439–448. doi:10.1016/j.dss.2010.11.001

[31] J.-Y. Yeh, T.-H. Wu and W.-J. Tsai, Bleeding and ulcer detection using wireless capsule endoscopy images, J. Softw. Eng. Appl. 7 (2014), 422–432. doi:10.4236/jsea.2014.75039

[32] S. Yu and L. Guan, A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films, IEEE Trans. Med. Imaging 19 (2000), 115–126. doi:10.1109/42.836371

[33] X. Zhang and X. Gao, Twin support vector machines and subspace learning methods for microcalcification clusters detection, Eng. Appl. Artif. Intell. 25 (2012), 1062–1072. doi:10.1016/j.engappai.2012.04.003

[34] E. Zhang, F. Wang, Y. Li and X. Bai, Automatic detection of microcalcifications using mathematical morphology and a support vector machine, Bio-Med. Mater. Eng. 24 (2014), 53–59. doi:10.3233/BME-130783

[35] H. C. Zuckerman, The role of mammography in the diagnosis of breast cancer, in: I. M. Ariel and J. B. Clearly (Eds.), Breast Cancer: Diagnosis and Treatment, pp. 152–172, McGraw-Hill, New York, 1987.

Received: 2014-8-14

Published Online: 2015-1-7

Published in Print: 2016-1-1

https://doi.org/10.1515/jisys-2014-0122
