
Construction and Practice of the Optimal Smooth Semi-Supervised Support Vector Machine

Xiaodan Zhang, Ang Li and Pan Ran
Published/Copyright: October 25, 2015

Abstract

The standard semi-supervised support vector machine (S3VM) leads to an unconstrained optimization problem that is non-convex and non-smooth, so various smoothing methods have been applied to the S3VM. In this paper, a new smooth semi-supervised support vector machine (SS3VM) model, based on a biquadratic spline function, is proposed, and a hybrid genetic algorithm (GA)/SS3VM approach is presented to optimize the parameters of the model. Numerical experiments are performed to test the efficiency of the model. The experimental results show that our optimal SS3VM model generally outperforms the other optimal SS3VM models considered in this paper.

1 Introduction

Support vector machines (SVMs) were introduced by Vapnik[1] in the early 1990s. As an effective data mining method, SVMs are widely applied in fields ranging from text categorization[2], image retrieval[3], face recognition[4], goal-driven planning[5], and information security[6] to credit risk evaluation[7], bankruptcy prediction[8], time series forecasting[9], etc. The standard SVM is based on supervised learning; as a result, a large number of labeled samples are required to ensure preferable classification accuracy. However, manual labelling is often a slow, expensive, and error-prone process. By contrast, semi-supervised support vector machines (S3VMs)[10–12] can exploit labeled and unlabeled samples simultaneously and thus reduce the labelling cost significantly. Hence, S3VMs have attracted wide attention from researchers[13–15]. Since the unconstrained optimization problem of the S3VM is non-convex and non-smooth, most fast algorithms cannot be used to solve it. In this regard, many researchers have worked on smooth S3VMs (SS3VMs) in recent years[15–18]. SS3VM models smoothed by the Gaussian function[12], polynomial functions[19], and the cubic spline function[20] have been proposed successively. However, the influence of the parameters of the SS3VM on the classification results has not been deeply studied. In fact, the effect of the parameters on the performance of the SS3VM cannot be ignored. Therefore, selecting appropriate parameters for the SS3VM is crucial.

In this paper, we first explore the smoothing method and propose a new SS3VM model based on a biquadratic spline function. Second, we focus on the optimization of the parameters of the SS3VM models: a hybrid approach combining the genetic algorithm (GA)[21] with the SS3VM is introduced. Finally, we evaluate the new model through numerical experiments. Four datasets are used to test the efficiency of the new model. Moreover, the effect of the number of labeled samples on the classification accuracy of the new model is analyzed.

2 The SS3VM Model

We consider the two-class classification problem. The dataset consists of $m$ labeled samples $\{(x_i, y_i)\}_{i=1}^{m}$ and $l$ unlabeled samples $\{\bar{x}_i\}_{i=1}^{l}$, where $x_i, \bar{x}_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$. The labeled samples $x_i\ (i=1,2,\cdots,m)$ and the unlabeled samples $\bar{x}_i\ (i=1,2,\cdots,l)$ are collected row-wise in an $m \times n$ matrix $A$ and an $l \times n$ matrix $B$, respectively. The class labels of the labeled samples are encoded in an $m \times m$ diagonal matrix $D$ with $+1$ or $-1$ along its diagonal. Precisely, a standard unconstrained optimization model of the S3VM is given as follows[22]:

$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|_2^2 + c\,e_1^T\Lambda\big(D(A\omega+e_1b)\big) + c^*e_2^T\Lambda\big(|B\omega+e_2b|\big) \qquad (1)$$

Here, $\omega$ is the normal vector to the bounding plane, $b$ is a bias value, $e_i\ (i=1,2)$ is a column vector of ones with $e_1\in\mathbb{R}^m$, $e_2\in\mathbb{R}^l$, and $c$ and $c^*$ are the penalty parameters. $\Lambda(t)=\max(0,1-t)$ is called the hinge loss function and $\Lambda(|t|)=\max(0,1-|t|)$ is called the symmetric hinge loss function[6]. If $u=(u_1,u_2,\cdots,u_n)\in\mathbb{R}^n$, then $\Lambda(u)=(\Lambda(u_1),\Lambda(u_2),\cdots,\Lambda(u_n))^T$. Nevertheless, $\Lambda(t)$ and $\Lambda(|t|)$ are not differentiable. As a result, the objective function in Model (1) is non-convex and non-smooth, which precludes the application of many fast optimization methods such as the BFGS algorithm[23, 25]. We therefore modify Model (1) as follows:

$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|_2^2 + \frac{c}{2}\left\|\Lambda\big(D(A\omega+e_1b)\big)\right\|_2^2 + \frac{c^*}{2}\left\|f(B\omega+e_2b,\,r)\right\|_2^2 \qquad (2)$$

Here, $f(x,r)$ is a smooth function at the origin, used to approximate the symmetric hinge loss function, and $r$ is the associated smoothing parameter. Replacing the 1-norm by the squared 2-norm removes the non-smoothness of the hinge loss $\Lambda(x)$ at $+1$ and of the symmetric hinge loss at $+1$ and $-1$, while the remaining kink of the symmetric hinge loss at the origin is handled by the smooth function $f$. In the next section, a biquadratic spline function is constructed to replace $f(x,r)$ in Model (2).
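For concreteness, the two loss functions are easy to write down in code. The following sketch (Python with NumPy; the function names are ours, not from the paper) evaluates the hinge loss and the symmetric hinge loss componentwise, which is how they enter Models (1) and (2).

```python
import numpy as np

def hinge(t):
    """Hinge loss: Lambda(t) = max(0, 1 - t), applied componentwise."""
    return np.maximum(0.0, 1.0 - t)

def symmetric_hinge(t):
    """Symmetric hinge loss: Lambda(|t|) = max(0, 1 - |t|), applied componentwise."""
    return np.maximum(0.0, 1.0 - np.abs(t))

# Both losses have kinks: the hinge loss at t = 1, the symmetric hinge loss at
# t = -1, 0, +1, which is why Model (1) is non-smooth and a smooth surrogate is needed.
t = np.linspace(-2.0, 2.0, 9)
print(hinge(t))
print(symmetric_hinge(t))
```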

3 The SS3VM Model Based on the Biquadratic Spline Function

3.1 Construction of Biquadratic Spline Function

Definition 1

For $k>1$, $y_1>0$, and $m,n\in\mathbb{Z}^+$, let $x_0=-\frac{1}{k}$, $x_1=0$, $x_2=\frac{1}{k}$ be a set of nodes. The function $s(x,k)$ is defined as follows:

$$s(x,k)=\begin{cases} s_0(x,k), & -\frac{1}{k}\le x<0\\ s_1(x,k), & 0\le x\le\frac{1}{k}\\ \Lambda(|x|), & |x|>\frac{1}{k} \end{cases} \qquad (3)$$

Here $s_0(x,k)$ and $s_1(x,k)$ are polynomials of degree $n$. If $s(x,k)$ satisfies the following conditions:

  1. $s^{(d)}(x_0,k)=0,\ d=2,3,\cdots,m$; $s'(x_0,k)=1$, $s(x_0,k)=1-\frac{1}{k}$;

  2. $s^{(d)}(x_2,k)=0,\ d=2,3,\cdots,m$; $s'(x_2,k)=-1$, $s(x_2,k)=1-\frac{1}{k}$;

  3. $s_0^{(d)}(x_1-0,k)=s_1^{(d)}(x_1+0,k),\ d=1,2,\cdots,m$; $s_0(x_1-0,k)=s_1(x_1+0,k)=y_1$;

then $s(x,k)$ is called the degree-$n$ spline function with the $m$-order smoothness condition at the origin approximating $\Lambda(|x|)$.

Theorem 1

Let $k>1$ and let $x_0=-\frac{1}{k}$, $x_1=0$, $x_2=\frac{1}{k}$ be a set of nodes. Then there exists a unique biquadratic (quartic) spline $s(x,k)$ with second-order smoothness at the origin approximating $\Lambda(|x|)$, and it has the following expression:

$$s(x,k)=\begin{cases} \frac{1}{8}k^3x^4-\frac{3}{4}kx^2-\frac{3}{8k}+1, & -\frac{1}{k}\le x\le\frac{1}{k}\\ \Lambda(|x|), & |x|>\frac{1}{k} \end{cases} \qquad (4)$$

Proof

Let $s(x,k)$ be a quartic spline function with second-order smoothness at the origin that satisfies the conditions in Definition 1. We derive the expression of $s(x,k)$ on $[-\frac{1}{k},\frac{1}{k}]$. Set $s^{(3)}(x_i,k)=M_i\ (i=0,1,2)$ and $h_0=x_1-x_0=\frac{1}{k}$. For $x\in[-\frac{1}{k},0]$ we have $s(x,k)=s_0(x,k)$. Since $s_0(x,k)$ is a polynomial of degree four on $[-\frac{1}{k},0]$, $s_0^{(3)}(x,k)$ is a linear function, which can be written as

$$s_0^{(3)}(x,k)=M_0\frac{x_1-x}{h_0}+M_1\frac{x-x_0}{h_0}=-M_0kx+M_1k(x-x_0) \qquad (5)$$

Integrating this equation three times successively, we obtain

$$s_0(x,k)+\frac{a_1}{2}x^2+a_2x+a_3=-\frac{M_0k}{4!}x^4+\frac{M_1k}{4!}(x-x_0)^4$$

where $a_1$, $a_2$, and $a_3$ are constants of integration. According to condition 1 in Definition 1, we can determine $a_1=-\frac{M_0}{2k}$, $a_2=-\frac{M_0}{3k^2}-1$, and $a_3=-\frac{M_0}{8k^3}-1$.

Similarly, on $[0,\frac{1}{k}]$, we have

$$s_1(x,k)+\frac{b_1}{2}x^2+b_2x+b_3=\frac{M_2k}{4!}x^4-\frac{M_1k}{4!}(x_2-x)^4 \qquad (6)$$

where $b_1$, $b_2$, and $b_3$ are constants of integration. Based on condition 2 in Definition 1, it follows that $b_1=\frac{M_2}{2k}$, $b_2=-\frac{M_2}{3k^2}+1$, and $b_3=\frac{M_2}{8k^3}-1$.

In this way, we have shown that $s(x,k)$ is a piecewise polynomial of degree four with parameters $M_0$, $M_1$, and $M_2$ on $[-\frac{1}{k},\frac{1}{k}]$. Furthermore, applying condition 3 in Definition 1, we obtain the matrix equation

$$\begin{pmatrix} 3 & 2 & 3\\ \frac{1}{3k^2} & 0 & -\frac{1}{3k^2}\\ 1 & 2 & 1 \end{pmatrix}\begin{pmatrix} M_0\\ M_1\\ M_2 \end{pmatrix}=\begin{pmatrix} 0\\ -2\\ 0 \end{pmatrix} \qquad (7)$$

Because the coefficient matrix of Equation (7) is nonsingular, we obtain the unique solution $M_0=-3k^2$, $M_1=0$, and $M_2=3k^2$. Finally, the biquadratic spline function $s(x,k)$ with second-order smoothness at the origin approximating $\Lambda(|x|)$ is obtained as Equation (4).
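As a quick sanity check of Equation (7) and its solution, the 3 by 3 system can be solved symbolically; the short sketch below (our own verification, not part of the original paper) reproduces $M_0=-3k^2$, $M_1=0$, $M_2=3k^2$.

```python
import sympy as sp

k = sp.symbols('k', positive=True)
M = sp.Matrix([[3, 2, 3],
               [1/(3*k**2), 0, -1/(3*k**2)],
               [1, 2, 1]])
rhs = sp.Matrix([0, -2, 0])

# Solve the linear system of Equation (7) for (M0, M1, M2).
sol = sp.simplify(M.LUsolve(rhs))
print(sol.T)  # expected: Matrix([[-3*k**2, 0, 3*k**2]])
```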

Theorem 2

Let $\Lambda(|x|)$ be the symmetric hinge loss function and let $s(x,k)$ be the biquadratic spline function given in Equation (4). Then, for $x\in\mathbb{R}$ and $k>1$, we have:

  1. $0\le s(x,k)\le\Lambda(|x|)$;

  2. $0\le\Lambda^2(|x|)-s^2(x,k)\le\frac{3}{8k}\left(2-\frac{3}{8k}\right)$.

Proof

If $x\in(-\infty,-\frac{1}{k})\cup(\frac{1}{k},+\infty)$, both inequalities 1 and 2 hold, since $s(x,k)=\Lambda(|x|)$ there. It remains to show that the inequalities hold for $x\in[-\frac{1}{k},\frac{1}{k}]$.

1) If $x\in[-\frac{1}{k},0]$, then $s(x,k)=s_0(x,k)=\frac{1}{8}k^3x^4-\frac{3}{4}kx^2-\frac{3}{8k}+1$. Thus $s_0''(x,k)=\frac{3}{2}k^3x^2-\frac{3}{2}k\le 0$, so $\min_x s_0'(x,k)=s_0'(0,k)=0$ and hence $s_0(x,k)$ is an increasing function on this interval. It follows that

$$1-\frac{3}{8k}=s_0(0,k)\ge s_0(x,k)\ge s_0\!\left(-\frac{1}{k},k\right)=1-\frac{1}{k}\ge 0 \qquad (8)$$

In addition, let $\lambda(x,k)=\Lambda(|x|)-s_0(x,k)=x-\frac{1}{8}k^3x^4+\frac{3}{4}kx^2+\frac{3}{8k}$. Then

$$\lambda''(x,k)=-\frac{3}{2}k^3x^2+\frac{3}{2}k\ge 0\ \Rightarrow\ \lambda'(x,k)\ge\lambda'\!\left(-\frac{1}{k},k\right)=0 \qquad (9)$$

Therefore, $\lambda(x,k)$ is also an increasing function on $[-\frac{1}{k},0]$, and $\min_x\lambda(x,k)=\lambda(-\frac{1}{k},k)=0$. So $0\le s_0(x,k)\le\Lambda(|x|)$ when $x\in[-\frac{1}{k},0]$.

If $x\in[0,\frac{1}{k}]$, then $s(x,k)=s_1(x,k)$, and inequality 1 follows in the same way as in the first part.

2) If $x\in[-\frac{1}{k},0]$, then since $\lambda(x,k)$ is increasing, its maximum $\frac{3}{8k}$ is attained at $x=0$. Hence,

$$0\le\Lambda(|x|)-s_0(x,k)\le\frac{3}{8k} \qquad (10)$$

In addition, $\Lambda(|x|)+s_0(x,k)=\lambda(x,k)+2s_0(x,k)$, so inequalities (8) and (10) imply that $\Lambda(|x|)+s_0(x,k)\le 2-\frac{3}{8k}$.

Thus, $0\le\Lambda^2(|x|)-s_0^2(x,k)\le\frac{3}{8k}\left(2-\frac{3}{8k}\right)$.

Similarly, if $x\in[0,\frac{1}{k}]$, we have $0\le\Lambda^2(|x|)-s^2(x,k)\le\frac{3}{8k}\left(2-\frac{3}{8k}\right)$ as well.
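The closed form in Equation (4) is straightforward to implement, and the bounds of Theorem 2 can be checked numerically. The sketch below (our own illustration in Python/NumPy, not code from the paper) evaluates $s(x,k)$ piecewise and verifies both inequalities on a fine grid for one value of $k$.

```python
import numpy as np

def sym_hinge(x):
    """Symmetric hinge loss Lambda(|x|) = max(0, 1 - |x|)."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def biquadratic_spline(x, k):
    """Biquadratic (quartic) spline s(x, k) of Equation (4), for k > 1."""
    x = np.asarray(x, dtype=float)
    inner = 0.125 * k**3 * x**4 - 0.75 * k * x**2 - 3.0 / (8.0 * k) + 1.0
    return np.where(np.abs(x) <= 1.0 / k, inner, sym_hinge(x))

k = 5.0
x = np.linspace(-2.0, 2.0, 20001)
gap = sym_hinge(x) - biquadratic_spline(x, k)              # inequality 1 and Eq. (10)
gap_sq = sym_hinge(x)**2 - biquadratic_spline(x, k)**2     # inequality 2
bound = 3.0 / (8.0 * k)
print(gap.min() >= -1e-12, gap.max() <= bound + 1e-12)
print(gap_sq.min() >= -1e-12, gap_sq.max() <= bound * (2.0 - bound) + 1e-12)
```

At $x=0$ the two gaps attain exactly $\frac{3}{8k}$ and $\frac{3}{8k}(2-\frac{3}{8k})$, so both bounds of Theorem 2 are tight.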

To illustrate the approximation accuracy of different smooth functions, a comparison of their smoothing performance is shown in Figure 1.

Figure 1: Different smooth functions approximating Λ(|x|)

As shown in Figure 1, the biquadratic spline function is closer to the symmetric hinge loss function than both the Gaussian function and the polynomial function. Although its approximation accuracy is slightly lower than that of the cubic spline function, the biquadratic spline function has a simpler expression.

3.2 The Smooth S3VM Model Based on the Biquadratic Spline Function

Replacing $f(x,r)$ in Model (2) by the function $s(x,k)$ of Equation (4), we obtain the following biquadratic spline smooth semi-supervised support vector machine model:

$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|_2^2 + \frac{c}{2}\left\|\Lambda\big(D(A\omega+e_1b)\big)\right\|_2^2 + \frac{c^*}{2}\left\|s(B\omega+e_2b,\,k)\right\|_2^2 \qquad (11)$$

where k is the smooth parameter.
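To make Model (11) concrete, the sketch below assembles its objective as a plain NumPy function of $(\omega,b)$ and hands it to a quasi-Newton routine. This is our illustrative reconstruction under the paper's notation: the synthetic data, variable names, and the use of SciPy's BFGS with a finite-difference gradient are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def biquadratic_spline(t, k):
    """s(t, k) from Equation (4), applied componentwise."""
    inner = 0.125 * k**3 * t**4 - 0.75 * k * t**2 - 3.0 / (8.0 * k) + 1.0
    return np.where(np.abs(t) <= 1.0 / k, inner, np.maximum(0.0, 1.0 - np.abs(t)))

def ss3vm_objective(theta, A, y, B, c, c_star, k):
    """Objective of Model (11); theta = (omega, b), y in {-1, +1} plays the role of diag(D)."""
    omega, b = theta[:-1], theta[-1]
    hinge = np.maximum(0.0, 1.0 - y * (A @ omega + b))   # Lambda(D(A w + e1 b))
    smooth = biquadratic_spline(B @ omega + b, k)        # s(B w + e2 b, k)
    return (0.5 * omega @ omega
            + 0.5 * c * np.sum(hinge**2)
            + 0.5 * c_star * np.sum(smooth**2))

# Tiny synthetic example: labeled data (A, y) and unlabeled data B are made up for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 2)); y = np.sign(A[:, 0] + 0.1)
B = rng.normal(size=(40, 2))
res = minimize(ss3vm_objective, np.zeros(3), args=(A, y, B, 10.0, 10.0, 50.0), method="BFGS")
print(res.x)  # learned (omega_1, omega_2, b)
```

In the experiments of Section 6 the same kind of objective is minimized with the BFGS algorithm for each candidate parameter setting proposed by the GA.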

4 The Optimal SS3VM Model Based on GA

One of the main difficulties with the SS3VM is selecting parameter values that yield good performance, since it is not known beforehand which values are best for the model. In order to select suitable parameters for the SS3VM, the GA is applied[21, 26–28]. The GA is an artificial intelligence procedure based on the theory of natural selection and evolution. Unlike conventional optimization methods, it offers parallel search, the ability to solve complex problems, and a large search space.

The SS3VM training algorithm and the GA are combined to optimize the parameters of the SS3VM. For ease of notation, we call this hybrid method the GA/SS3VM method. Figure 2 shows its overall procedure.

Figure 2: Overall procedure of the GA/SS3VM method

The detailed steps of the GA/SS3VM method are given below; a code sketch of the loop follows the list.

  1. Define the string (or chromosome).

    According to Model (2), the parameters c, c*, and r need to be optimized, so an individual is defined as x = (c, c*, r). The chromosome of x is encoded as an l-bit string consisting of l1 bits for c, l2 bits for c*, and l3 bits for r, where l = l1 + l2 + l3.

  2. Determine the fitness function.

    The fitness of an individual in the population is based on the performance of the SS3VM, so the prediction accuracy of the SS3VM is taken as the fitness function F(x).

  3. Initialization.

    1. Define the size of population N, probability of crossover Pc, and probability of mutation Pm.

    2. Randomly generate an initial population of N l-bit strings as described in Step 1: $P(L)=\{x_j(L)=(c_j,c_j^*,r_j)_L,\ j=1,2,\cdots,N\}$, where L is the index of the current generation. For the initial generation, L = 0.

  4. Decode the jth string to obtain the corresponding individual $x_j(L)=(c_j,c_j^*,r_j)_L$.

  5. Apply $x_j(L)$ to the SS3VM model to compute the fitness $F(x_j(L))$.

  6. Evolution.

    1. Find the worst fitness $F_{\min}(L)$ and the best fitness $F_{\max}(L)$, together with the corresponding individuals $x_{\min}(L)$ and $x_{\max}(L)$, in the Lth generation population.

    2. Replace $x_{\min}(L)$ with $x_{\max}(L)$.

  7. Calculate the total fitness of the Lth generation population: $T(L)=\sum_{j=1}^{N}F(x_j(L))$.

  8. Reproduction.

    1. Compute the cumulative probabilities $q_j=\sum_{i=1}^{j}p_i\ (j=1,2,\cdots,N)$, where $p_i=\frac{F(x_i(L))}{T(L)}$.

    2. Generate N random numbers $r_{1i},\ i=1,2,\cdots,N$, in [0, 1]. For each $r_{1i}$: if $r_{1i}\le q_1$, select the first string; otherwise, select the jth string such that $q_{j-1}<r_{1i}\le q_j$.

  9. Generate offspring population P(L + 1) by performing crossover and mutation.

    1. Crossover: For each pair of parent individuals, generate a random number $r_2$ in [0, 1]. If $r_2<P_c$, choose a random crossover point and exchange the genetic code of the two parents at that point to obtain two new child individuals.

    2. Mutation: Generate a random number $r_3$ in [0, 1] and select a bit at random. If $r_3<P_m$, flip that bit.

  10. If the termination condition is satisfied, output the best individual $x_{\max}(L+1)$; otherwise, repeat Steps 4~10. Termination condition: the maximum number of generations $L_{\max}$ is reached, or the best individual does not improve over successive generations.
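The loop above can be sketched compactly as follows. This is our own minimal illustration, not the authors' code: the binary encoding ranges, the surrogate fitness, and all identifiers are assumptions, and in the actual GA/SS3VM method the fitness is the prediction accuracy obtained by training Model (11) with the decoded parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
LEN_C, LEN_K = 5, 12                    # chromosome lengths for c (= c*) and k, as in Table 2
N, L_MAX, PC, PM = 40, 70, 0.7, 0.01    # population size, generations, crossover and mutation rates

def decode(bits):
    """Map a binary chromosome to (c, k); the value ranges are illustrative assumptions."""
    c = 1 + int("".join(map(str, bits[:LEN_C])), 2)
    k = 2 + int("".join(map(str, bits[LEN_C:])), 2)
    return c, k

def fitness(bits):
    """Placeholder surrogate; in GA/SS3VM this would be the SS3VM prediction accuracy."""
    c, k = decode(bits)
    return 1.0 / (1.0 + abs(c - 14) + abs(k - 1294) / 100.0)

pop = rng.integers(0, 2, size=(N, LEN_C + LEN_K))
for gen in range(L_MAX):
    fit = np.array([fitness(ind) for ind in pop])
    pop[np.argmin(fit)] = pop[np.argmax(fit)]                # evolution step: worst <- best
    fit = np.array([fitness(ind) for ind in pop])
    parents = pop[rng.choice(N, size=N, p=fit / fit.sum())]  # roulette-wheel reproduction
    children = parents.copy()
    for i in range(0, N - 1, 2):                             # one-point crossover
        if rng.random() < PC:
            cut = rng.integers(1, LEN_C + LEN_K)
            children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:].copy(), parents[i, cut:].copy()
    flips = rng.random(children.shape) < PM                  # bit-flip mutation
    pop = np.where(flips, 1 - children, children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best (c, k):", decode(best))
```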

5 Data Preparation

5.1 Datasets

In this section, four datasets are used to test the hybrid GA/SS3VM method and our new SS3VM model. All datasets are obtained from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) and are referred to as Heart, QSAR, Wine, and Wilt. The Heart dataset contains features of heart disease patients, and the heart disease condition is divided into two categories: presence and absence. The QSAR dataset is used to develop QSAR (Quantitative Structure-Activity Relationships) models and to study the relationship between chemical structure and biodegradation of molecules; the purpose is to discriminate readily biodegradable molecules from those that are not. The Wine dataset concerns white wine quality and includes objective data and sensory data. The objective data consist of 11 physicochemical attributes such as fixed acidity and volatile acidity. The sensory data are the median wine quality scores, from 0 (very bad) to 10 (very excellent), graded by experts. In our experiments, if the quality score is greater than 5 the wine quality is considered excellent; otherwise it is considered poor. The Wilt dataset involves detecting diseased trees in Quickbird imagery; it contains few samples for the "diseased trees" class and many for the "other land cover" class. All samples in the above four datasets are labeled. Detailed information about the datasets is shown in Table 1.

Table 1

Information of the four datasets

Data sets | Classes | Samples | Attributes
Heart | 2 | 270 | 13 (age, sex, cp, etc.)
QSAR | 2 | 1056 | 42 (nHM, nCp, etc.)
Wine | 2 | 4898 | 12 (fixed acidity, citric acid, etc.)
Wilt | 2 | 4339 | 5 (GLCM_Pan, Mean_G, etc.)

5.2 Data Pre-Processing

5.2.1 Data Reduction

The above datasets, especially QSAR, contain a large number of attributes. If all attributes were used as inputs to the SS3VM, this would result in redundancy and low efficiency. Therefore, principal component analysis (PCA) is applied to avoid these problems. PCA[29–32] can be used to reduce the complexity of the input variables and leads to a better interpretation of the variables. Let $X=(X_1,X_2,\cdots,X_p)$ denote the original variables and let $R$ be their variance-covariance matrix. Then the principal components can be expressed as follows:

$$Y_i=\alpha_i^TX=\alpha_{i1}X_1+\alpha_{i2}X_2+\cdots+\alpha_{ip}X_p$$

where $\alpha_i=(\alpha_{i1},\cdots,\alpha_{ip})^T$ is the eigenvector of $R$ associated with its $i$th largest eigenvalue. For the above four datasets, the numbers of original attributes are 13, 42, 12, and 5, respectively; PCA reduces them to 10, 15, 8, and 4 attributes, respectively. As a result, useless redundant information and computational complexity are reduced.
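As an illustration of this preprocessing step, the reduction can be carried out with a standard PCA implementation. The sketch below uses scikit-learn and random stand-in data; neither the library nor the placeholder matrix is prescribed by the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# X: samples-by-attributes matrix of one dataset (e.g., 1056 x 42 for QSAR).
X = np.random.default_rng(0).normal(size=(1056, 42))   # stand-in for the real data

# Keep the leading principal components (15 for QSAR in the paper's setting).
pca = PCA(n_components=15)
X_reduced = pca.fit_transform(X)                        # rows hold the scores Y_1, ..., Y_15
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```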

5.2.2 Data Normalization

In order to avoid numerical difficulties caused by the different numeric ranges of the variables, each selected input variable is normalized by $x_{\mathrm{scaled}}=2(x-x_{\min})/(x_{\max}-x_{\min})-1$, where $x$ is the original variable and $x_{\max}$ and $x_{\min}$ are its maximum and minimum, respectively.
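This formula maps each attribute linearly onto [-1, 1]; a direct NumPy version (our illustration) is:

```python
import numpy as np

def scale_to_minus_one_one(X):
    """Columnwise x_scaled = 2*(x - x_min)/(x_max - x_min) - 1, mapping each attribute to [-1, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - x_min) / (x_max - x_min) - 1.0

X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0]])
print(scale_to_minus_one_one(X))   # each column now spans exactly [-1, 1]
```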

6 Numerical Experiments

6.1 Experimental Design

In order to test the GA/SS3VM method and the new Model (11), three other smooth SS3VM models are also used in our experiments. For ease of notation, these SS3VM models are named as follows:

  1. The SS3VM smoothed by Gaussian function is named as GSS3VM;

  2. The SS3VM smoothed by a polynomial function is named as PSS3VM;

  3. The SS3VM smoothed by the cubic spline function is named as 3SS3VM;

  4. The SS3VM Model (11) is named as 4SS3VM.

The experiments are divided into three parts as follows:

  1. Compare performance of four SS3VM models optimized by GA/SS3VM method with that of the original ones.

  2. Compare performance of the optimal 4SS3VM model with that of other three optimal SS3VM models.

  3. Evaluate the classification accuracy sensitivity of the optimal 4SS3VM model to the different proportion of labeled samples.

Here, the SS3VM models are solved by the BFGS algorithm, and the performance of the models is evaluated by classification accuracy and CPU time.

Since the samples of the above four datasets are all labeled, unlabeled samples for the semi-supervised experiments are simulated by dropping the labels of some labeled samples. Each dataset is randomly separated into two portions: a training set, used to train the models and select the optimal parameters, and a testing set, used to evaluate the performance of the models. The ratios of the two portions are about 0.7 and 0.3. In the first two parts of the experiment, for both the training set and the testing set of each dataset, the proportion of labeled samples is set to 1/5, i.e., 20% of the samples are kept as labeled and the rest are treated as unlabeled (a code sketch of this label-masking step is given after Table 2). For each training set, the GA/SS3VM method is used to optimize the parameters of the SS3VM model. For the four models GSS3VM, PSS3VM, 3SS3VM, and 4SS3VM, the parameters to be optimized are (c, c*), (c, c*, n), (c, c*, k), and (c, c*, k), respectively, where k is the smooth parameter and n is the degree of the polynomial function. In addition, it is assumed that, during the optimization process, the labels found for the unlabeled samples are correct, so c = c* is set. The GA parameters used to optimize these models are shown in Table 2.

Table 2

GA parameters for optimizing models

Parameters | Value
Size of population (N) | 40
Maximum number of generations (Lmax) | 70
Length of chromosome of c (c* = c) | 5
Length of chromosome of k | 12
Length of chromosome of n | 3
Crossover rate (Pc) | 0.7
Mutation rate (Pm) | 0.01
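The simulation of unlabeled samples described above amounts to splitting each dataset and then hiding the labels of a random 80% of each portion. A minimal sketch (our own, with assumed variable names and random stand-in data) is:

```python
import numpy as np

def split_and_mask(X, y, train_ratio=0.7, labeled_ratio=0.2, seed=0):
    """Split into train/test, then keep labels for only `labeled_ratio` of each portion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_ratio * len(X))
    parts = {}
    for name, part in (("train", idx[:n_train]), ("test", idx[n_train:])):
        keep = rng.random(len(part)) < labeled_ratio          # ~20% keep their labels
        parts[name] = (X[part[keep]], y[part[keep]],          # labeled block: A and its labels
                       X[part[~keep]])                        # unlabeled block: B (labels dropped)
    return parts

X = np.random.default_rng(1).normal(size=(270, 10))           # e.g., Heart after PCA
y = np.sign(np.random.default_rng(2).normal(size=270))
parts = split_and_mask(X, y)
print([a.shape for a in parts["train"]], [a.shape for a in parts["test"]])
```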

In part three of the experiment, the sensitivity of the optimal 4SS3VM model to the proportion of labeled samples is tested. For each dataset, we vary the proportion of labeled samples and observe how the classification accuracy of the optimal 4SS3VM model fluctuates. Here, we select 20%, 30%, 40%, and 50% as the proportions of labeled samples.

6.2 Experimental Results and Comparisons

In this section, the results corresponding to the three parts of the experiments are presented and analysed. Figure 3 shows the evolution of the optimal parameter selection on the Heart dataset. As shown in Figure 3, for all four SS3VM models the stopping criterion of the GA is satisfied at about the 10th or 15th generation (L = 10 or L = 15), i.e., the optimal parameters of the four SS3VM models are obtained at about the 10th or 15th generation. The parameter selection procedures for the QSAR, Wine, and Wilt datasets are similar to that of the Heart dataset. It should be emphasized that the selection procedures differ between training sets; moreover, the process is generally not identical even for the same training set because of the stochastic nature of the GA. Table 3 summarizes the optimal parameters of the different SS3VM models on the above four datasets, together with the training accuracy, testing accuracy, and training time. As can be observed, compared with the other models, the 4SS3VM model achieves preferable training and testing accuracy on the four datasets with less CPU time.

Figure 3: Evolution of the parameter selection. (a), (b), (c), and (d) show the parameter selection procedures of GSS3VM, PSS3VM, 3SS3VM, and 4SS3VM, respectively (NB: Heart dataset, labeled proportion 20%)

Table 3

GA/SS3VM performance on the four datasets

Data sets | Models | Optimal parameters | Training accuracy/% | Testing accuracy/% | CPU time/s
Heart | GSS3VM | c = c* = 28 | 77.33 | 74.44 | 0.1814
Heart | PSS3VM | c = c* = 7, n = 4 | 77.33 | 73.33 | 2.0603
Heart | 3SS3VM | c = c* = 1, k = 82 | 77.77 | 73.33 | 0.1193
Heart | 4SS3VM | c = c* = 7, k = 2113 | 78.88 | 74.44 | 0.1148
QSAR | GSS3VM | c = c* = 31 | 71.55 | 72.15 | 0.4419
QSAR | PSS3VM | c = c* = 31, n = 8 | 72.54 | 74.14 | 2.9183
QSAR | 3SS3VM | c = c* = 30, k = 1564 | 71.69 | 67.61 | 0.5777
QSAR | 4SS3VM | c = c* = 28, k = 2093 | 71.97 | 74.44 | 0.3584
Wine | GSS3VM | c = c* = 25 | 65.97 | 67.60 | 2.4921
Wine | PSS3VM | c = c* = 5, n = 3 | 66.83 | 67.60 | 5.3202
Wine | 3SS3VM | c = c* = 15, k = 1109 | 65.97 | 67.60 | 5.3202
Wine | 4SS3VM | c = c* = 14, k = 1294 | 65.97 | 67.60 | 2.1299
Wilt | GSS3VM | c = c* = 18 | 98.54 | 97.78 | 2.6598
Wilt | PSS3VM | c = c* = 18, n = 9 | 98.54 | 97.78 | 18.9791
Wilt | 3SS3VM | c = c* = 1, k = 100 | 98.54 | 97.78 | 3.6831
Wilt | 4SS3VM | c = c* = 28, k = 640 | 98.54 | 97.78 | 3.5875

To compare the classification accuracy of the optimal SS3VM models with that of non-optimized SS3VM models, 20 groups of parameters are selected randomly for each model, and the Heart dataset is used to train and test these models. Let J be the index of the parameter group. Figure 4 shows the comparison results. It can easily be seen that the optimal SS3VM models outperform the models with non-optimized parameters on the training set. Although the testing accuracy of the optimal models is not always the best, it is good enough. This means that the GA/SS3VM method contributes to improving the classification accuracy, which strongly supports our claims. The optimal SS3VM models show similar behaviour on the other datasets.

Figure 4: Accuracy comparison between the optimal models and models with arbitrary parameters. (a), (b), (c), and (d) show the comparison results of GSS3VM, PSS3VM, 3SS3VM, and 4SS3VM, respectively (NB: Heart dataset, labeled proportion 20%)

Table 4 displays the classification accuracy and CPU time of the optimal 4SS3VM under different labeled proportions. It can be seen that the training and testing accuracy fluctuate only slightly, by less than about 5%. This means that the classification accuracy is not remarkably improved when the number of labeled samples increases, and it implies that the optimal 4SS3VM model has high computational efficiency: a small number of labeled samples is enough to guarantee the classification accuracy. Thus, the cost of manual labeling can be cut down greatly.

Table 4

Effect of the percentage of labeled samples on the 4SS3VM

Data sets | Labeled proportion/% | Training accuracy/% | Testing accuracy/% | CPU time/s
Heart | 20 | 78.88 | 74.44 | 0.1148
Heart | 30 | 78.88 | 74.44 | 0.1862
Heart | 40 | 81.66 | 72.22 | 0.2157
Heart | 50 | 82.77 | 74.44 | 0.3143
QSAR | 20 | 71.69 | 72.44 | 0.3584
QSAR | 30 | 72.26 | 73.29 | 0.5167
QSAR | 40 | 74.39 | 71.59 | 0.7695
QSAR | 50 | 70.98 | 73.01 | 0.7692
Wine | 20 | 65.97 | 67.60 | 2.4299
Wine | 30 | 66.27 | 66.99 | 3.7138
Wine | 40 | 66.00 | 67.00 | 5.3567
Wine | 50 | 66.03 | 67.66 | 14.7719
Wilt | 20 | 98.54 | 97.78 | 3.5875
Wilt | 30 | 98.54 | 97.78 | 7.2430
Wilt | 40 | 98.54 | 97.78 | 7.5340
Wilt | 50 | 98.30 | 98.27 | 15.4175

7 Conclusions

In this paper, a biquadratic spline function for smoothing the S3VM is proposed. The analysis of approximation accuracy shows that the biquadratic spline function has preferable performance. Further, a new approach, the GA/SS3VM method, which integrates the SS3VM with the GA, is presented; the GA is used to optimize the parameters of the SS3VM models, and the hybrid GA/SS3VM method yields the optimal SS3VM model. The optimal SS3VM model is experimentally evaluated on four real datasets. The results show that the SS3VM models optimized by the GA/SS3VM approach achieve higher classification accuracy than the non-optimized SS3VM models. In particular, the optimal SS3VM model based on the biquadratic spline function has desirable classification accuracy and the best computational efficiency. Meanwhile, the classification accuracy of the new model is insensitive to the labeled proportion, which means that good classification accuracy can be achieved with a small number of labeled samples.

For future work, we intend to apply kernel functions to the SS3VM and to optimize the kernel function and the parameters simultaneously.


Supported by the Fundamental Research Funds for the Central Universities of China (FRF-BR-12-021)


References

[1] Vapnik V. The Nature of Statistical Learning Theory. Springer, 1995. DOI: 10.1007/978-1-4757-2440-0.

[2] Joachims T. Text categorization with support vector machines: Learning with many relevant features. Springer, 1998. DOI: 10.1007/BFb0026683.

[3] Melgani F, Bruzzone L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 2004, 42(8): 1778–1790. DOI: 10.1109/IGARSS.2002.1025088.

[4] Jonsson K, Kittler J, Li Y P, et al. Support vector machines for face authentication. Image and Vision Computing, 2002, 20(5): 369–375. DOI: 10.1016/S0262-8856(02)00009-4.

[5] Satzger B, Kramer O. Goal distance estimation for automated planning using neural networks and support vector machines. Natural Computing, 2013, 12(1): 87–100. DOI: 10.1007/s11047-012-9332-y.

[6] Mukkamala S, Janoski G, Sung A. Intrusion detection using neural networks and support vector machines. Proceedings of the 2002 International Joint Conference on Neural Networks, 2002: 1702–1707. DOI: 10.1109/IJCNN.2002.1007774.

[7] Danenas P, Garsva G. Credit risk evaluation using SVM-based classifier. Business Information Systems International Workshops, 2010: 7–12. DOI: 10.1007/978-3-642-15402-7_3.

[8] Shin K S, Lee T S, Kim H J. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications, 2005, 28(1): 127–135. DOI: 10.1016/j.eswa.2004.08.009.

[9] Guo Z Q, Wang H Q, Liu Q. Financial time series forecasting using LPP and SVM optimized by PSO. Soft Computing, 2013, 17(5): 805–818. DOI: 10.1007/s00500-012-0953-y.

[10] Fung G, Mangasarian O L. Semi-supervised support vector machines for unlabeled data classification. Optimization Methods and Software, 2001, 15(1): 29–44. DOI: 10.1080/10556780108805809.

[11] Chapelle O, Scholkopf B, Zien A. Semi-Supervised Learning. MIT Press, Cambridge, 2006. DOI: 10.7551/mitpress/9780262033589.001.0001.

[12] Chapelle O, Zien A. Semi-supervised classification by low density separation. 2004.

[13] Chapelle O, Sindhwani V, Keerthi S. Branch and bound for semi-supervised support vector machines. Conference on Neural Information Processing Systems, 2007: 217–240.

[14] Astorino A, Fuduli A. Nonsmooth optimization techniques for semisupervised classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(12): 2135–2142. DOI: 10.1109/TPAMI.2007.1102.

[15] Reddy I S, Shevade S, Murty M N. A fast quasi-Newton method for semi-supervised SVM. Pattern Recognition, 2011, 44(10): 2305–2313. DOI: 10.1016/j.patcog.2010.09.002.

[16] Yang L M, Wang L S. A class of smooth semi-supervised SVM by difference of convex functions programming and algorithm. Knowledge-Based Systems, 2013, 41: 1–7. DOI: 10.1016/j.knosys.2012.12.004.

[17] Lee Y J, Mangasarian O L. SSVM: A smooth support vector machine for classification. Computational Optimization and Applications, 2001, 20(1): 5–22. DOI: 10.1023/A:1011215321374.

[18] Vural V, Fung G, Dy J, et al. Fast semi-supervised SVM classifiers using a priori metric information. Optimization Methods and Software, 2008, 23(4): 521–532. DOI: 10.1080/10556780802102750.

[19] Liu Y Q, Liu S Y, Gu M T. Polynomial smooth semi-supervised support vector machine. Systems Engineering — Theory & Practice, 2009, 29(7): 113–118.

[20] Zhang X D, Ma J G. A general cubic spline smooth semi-support vector machine. Chinese Journal of Engineering, 2015, 37(3): 385–389.

[21] Min S H, Lee J, Han I. Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications, 2006, 31(3): 652–660. DOI: 10.1016/j.eswa.2005.09.070.

[22] Chapelle O, Sindhwani V, Keerthi S S. Optimization techniques for semi-supervised support vector machines. The Journal of Machine Learning Research, 2008, 9: 203–233.

[23] Yuan Y, Huang T. A polynomial smooth support vector machine for classification. Advanced Data Mining and Applications, Springer, 2005. DOI: 10.1007/11527503_19.

[24] Dennis J E, Moré J J. Quasi-Newton methods, motivation and theory. SIAM Review, 1977, 19(1): 46–89. DOI: 10.1137/1019005.

[25] Yuan Y X. A modified BFGS algorithm for unconstrained optimization. IMA Journal of Numerical Analysis, 1991, 11(3): 325–332. DOI: 10.1093/imanum/11.3.325.

[26] Huerta E B, Duval B, Hao J K. A hybrid GA/SVM approach for gene selection and classification of microarray data. Applications of Evolutionary Computing, Springer, 2006. DOI: 10.1007/11732242_4.

[27] Zhao X, Huang D, Cheung Y, et al. A novel hybrid GA/SVM system for protein sequences classification. Intelligent Data Engineering and Automated Learning — IDEAL, 2004: 11–16. DOI: 10.1007/978-3-540-28651-6_2.

[28] Adankon M M, Cheriet M. Genetic algorithm-based training for semi-supervised SVM. Neural Computing and Applications, 2010, 19(8): 1197–1206. DOI: 10.1007/s00521-010-0358-8.

[29] Abdi H, Williams L J. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(4): 433–459. DOI: 10.1002/wics.101.

[30] Manly B F J. Multivariate Statistical Methods: A Primer. 2nd Edition. Chapman & Hall/CRC Press, London, 1986.

[31] Tabachnick B G, Fidell L S. Using Multivariate Statistics. 3rd Edition. 2001.

[32] Noori R, Kerachian R, Darban A, et al. Assessment of importance of water quality monitoring stations using principal component and factor analyses: A case study of the Karoon River. Journal of Water & Wastewater, 2007, 63(3): 60–69.

Received: 2015-5-26
Accepted: 2015-7-22
Published Online: 2015-10-25

© 2015 Walter de Gruyter GmbH, Berlin/Boston
