
i-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques

T.R. Jayanthi Kumari and H.S. Jayanna
Published/Copyright: May 3, 2018

Abstract

In many biometric applications, speaker verification under limited data plays a significant role in practice-oriented systems. The performance of such systems needs to be improved by applying techniques suited to the limited data condition, in which both the training and test data amount to only a few seconds of speech. This article demonstrates the importance of feature- and score-level fusion for speaker verification under the limited data condition. The baseline system uses vocal tract features, namely mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC), and excitation source features, namely the linear prediction residual (LPR) and linear prediction residual phase (LPRP), together with i-vector modeling on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with the excitation source features, which gives an average reduction in equal error rate (EER) of approximately 4% compared with the individual feature performance. Two types of score-level fusion are then demonstrated. In the first, the scores of the vocal tract and excitation source features are fused while keeping the modeling technique the same, which provides an average reduction of approximately 2% EER compared with the feature-level fusion performance. In the second, the scores of different modeling techniques are combined, which results in an EER reduction of approximately 4.5% compared with score-level fusion of different features.

1 Introduction

Over the last two decades, biometric technologies have played an important role in recognizing a person for various applications [11, 18]. Person authentication using biometrics includes fingerprint, face, iris, ear, retina, DNA and speech. These biometric traits are used to develop practical systems for various applications. Among them, speech is one of the traits used to recognize a person, and this task is called speaker recognition. The subcategories of speaker recognition are speaker identification and speaker verification [32]. In identification, the test speech data are matched against all registered speaker models to decide which speaker produced the test speech [32]. In verification, the claimed speaker's data are verified against the claimed model. Based on the speech data, a speaker verification system can be text-dependent or text-independent. The same text is used to train and test the speakers in a text-dependent system, whereas different texts are used in a text-independent system [13].

At present, speaker verification performs well when sufficient data are available. Sufficient data means speech of at least a few minutes (>1 min), and existing techniques for speech analysis, feature extraction, modeling and testing perform well under this condition. Limited data, on the other hand, means speech of only a few seconds (≤15 s), and many biometric applications have to verify a speaker with such small amounts of data. Speech data can be analyzed with different techniques; the analysis techniques used in state-of-the-art speaker verification systems are segmental, subsegmental and suprasegmental analysis [10]. In segmental analysis, a frame size (FS) and frame rate (FR) in the range of 10–30 ms are used to extract the vocal tract information, which is called single frame size and single frame rate (SFSR) analysis [19]. Subsegmental analysis uses an FS and FR in the range of 3–5 ms because the excitation source information varies more rapidly than the vocal tract information [27]. The behavioral aspects of the speaker are captured by analyzing the speech with an FS and FR in the range of 100–300 ms in suprasegmental analysis [12].
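
As a minimal illustration of SFSR framing, the following sketch (the function name and sampling rate are our own choices, not taken from the article) splits a signal into overlapping frames for a segmental 20 ms/10 ms analysis and a subsegmental 5 ms/2.5 ms analysis.

```python
import numpy as np

def frame_signal(signal, fs, frame_size_ms, frame_shift_ms):
    """Split a 1-D signal into overlapping frames given frame size and shift in milliseconds."""
    frame_len = int(fs * frame_size_ms / 1000)
    frame_shift = int(fs * frame_shift_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    return np.stack([signal[i * frame_shift:i * frame_shift + frame_len]
                     for i in range(n_frames)])

fs = 8000                                        # NIST 2003 telephone speech is sampled at 8 kHz
speech = np.random.randn(3 * fs)                 # stand-in for a 3 s utterance

segmental = frame_signal(speech, fs, 20, 10)     # vocal tract analysis (FS/FR in the 10-30 ms range)
subsegmental = frame_signal(speech, fs, 5, 2.5)  # excitation source analysis (FS/FR in the 3-5 ms range)
print(segmental.shape, subsegmental.shape)       # (299, 160) (1199, 40)
```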

The purpose of feature extraction is to obtain feature vectors of reduced dimension in which the speaker-specific information is emphasized and redundant factors are suppressed [4, 5]. Mel-frequency cepstral coefficients (MFCC) [28] and linear prediction cepstral coefficients (LPCC) [2] are used to extract vocal tract information. The speech signal contains both static and dynamic characteristics; the MFCC and LPCC feature sets capture only the static characteristics. The dynamic characteristics, represented by delta (Δ) and delta-delta (ΔΔ) coefficients, carry additional speaker information that is useful for speaker verification [2]. The excitation source features are extracted using the linear prediction residual (LPR) and the linear prediction residual phase (LPRP) [24].
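
The delta and delta-delta coefficients mentioned above are commonly obtained with a regression formula over a few neighboring frames; the sketch below is a generic version of that computation (the ±2 frame window is an assumption, not a parameter reported in the article), appended to a 13-dimensional static feature matrix to form 39-dimensional vectors.

```python
import numpy as np

def deltas(feat, N=2):
    """Regression-based delta coefficients computed over a window of +/- N frames."""
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)) / denom
        for t in range(feat.shape[0])
    ])

static = np.random.randn(300, 13)          # e.g. 13-dimensional MFCC or LPCC frames
d = deltas(static)                         # Delta
dd = deltas(d)                             # Delta-Delta
features_39 = np.hstack([static, d, dd])   # 39-dimensional feature vectors
print(features_39.shape)                   # (300, 39)
```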

Reynolds [28] compared vocal tract features for speaker recognition and reported that MFCC and LPCC give better performance than the other features, possibly because of lower intra-speaker variability and the availability of rich spectral analysis tools. Speaker recognition performance can be further increased by combining vocal tract and excitation source features [24]. In that study, the training and testing data were limited to 3 s, and the features were MFCC and its derivatives together with either LPR or LPRP, using frame sizes and frame rates in the range of 10–30 ms for MFCC and its derivatives and 3–5 ms for LPR or LPRP. The study showed that the combination of MFCC and its derivatives with either LPR or LPRP gives better speaker recognition performance than the individual features.

Das et al. [9] reported an effort to develop a speech-based person authentication system involving three different speaker verification modules for low-security applications: voice-password, text-dependent and text-independent speaker verification. The combination of these modules is called a multi-level system. The authors reported that the multi-level system performs better than the individual modules, and that the functionality of each module can be adjusted according to the type of application for which the system is designed.

Das et al. [8] highlighted the requirement of a phonetic match in a text-independent speaker verification framework from the perspective of practical deployable systems. The authors considered sufficient training data and short test data with i-vector modeling: 3 min of read speech was used for training the speaker models, a chosen phrase and a text-constrained phrase were used for testing, and the database used was NIST 2003. The EER obtained was 23%.

Pandey et al. [26] conducted experiments for sufficient training data and limited test data. Limited test data are created by truncating the test data of NIST SRE 2003 database. Four different cases of limited test data duration of 10, 5, 3 and 2 s are considered, and the EER obtained is 5.87%, 10.52%, 16.94% and 22.31%, respectively.

A speaker verification system can use different types of pattern-matching techniques, such as template matching, probabilistic models and artificial neural networks. Nearest-neighbor and vector quantization (VQ) belong to template matching; probabilistic models include the Gaussian mixture model (GMM), GMM–universal background model (UBM), joint factor analysis and the i-vector; and time delay neural networks and decision trees belong to artificial neural networks. Among these, the i-vector is used for modeling in the present work. State-of-the-art speaker verification systems prefer i-vector-based speaker modeling over conventional modeling approaches because of its compact representation and compatible channel/session compensation techniques [10]. The i-vectors are low-dimensional representations of the GMM mean supervector derived using factor analysis. Recent work in this area addresses the problem of mismatch in sensors, environment and language, and changes across sessions [15]. To address these problems, techniques such as score/handset/test normalization [3], within-class covariance normalization (WCCN) [16], linear discriminant analysis (LDA) and joint factor analysis (JFA) [36] are available.

Li et al. [23] proposed a new way of applying PLDA mixture models for robust speaker verification. The key idea is to use a classifier to guide the training of PLDA mixture models so that each mixture component precisely models one cluster in the i-vector space. In the testing stage, the verification scores are computed by combining the PLDA scores with dynamic weights depending on the posterior probabilities given by the classifier. The proposed method was compared against state-of-the-art models on the NIST SRE 2012 data set. It achieves much better performance than PLDA and conventional mixture of PLDA under SNR-level variability and channel-type variability.

Al-Ali et al. [1] introduced DWT-based MFCC features and their combination with traditional MFCC features for forensic speaker verification, using an i-vector and PLDA classifier. Experimental results indicate that fusing feature-warped DWT-MFCC with feature-warped MFCC achieves better performance under most noisy, reverberant, and combined noisy-reverberant environments. The robustness of the fusion approach makes it suitable for forensic applications.

Kanagasundaram et al. [21] provided an experimental analysis of a text-independent speaker verification system by varying the amount of training and test data. The analysis by Das et al. [7] for very short test data (<10 s) shows that performance drops significantly even when sufficient speech data are used during training. This downward trend in performance for limited test data motivates us to consider source features that capture complementary speaker information under limited data. Das et al. [7] demonstrated speaker verification for the limited data condition using i-vector modeling with a 150-dimensional LDA and a full-dimensional WCCN matrix, and reported that different source features combined with MFCC give better performance than MFCC alone. Dey et al. [11] demonstrated the performance of speaker verification using i-vector modeling and reported that it gives better performance than GMM–UBM, and that the performance of the raw i-vector system is further improved by LDA and WCCN followed by T-norm.

The works of Gudnason and Brookes [14] and Murty and Yegnanarayana [24] show that better performance can be achieved by fusing vocal tract features with excitation source features. Studies by Prasanna et al. [27] and Chan et al. [6] show that voice source features can be used with smaller amounts of training and test data than vocal tract features, because their dependence on phonetic content is very low, whereas a sufficient amount of phonetic content has to be captured in vocal tract features for speaker modeling. Under the limited data condition, the amount of data available is very small, and the extracted features are insufficient to model the speaker well. Each modeling technique uses its own representation of the input pattern, and the testing method used to verify the speaker differs between modeling techniques. Combining the scores of different modeling techniques using score-level fusion may therefore give better verification performance with limited data, which motivates us to use score-level fusion of different modeling techniques for limited data speaker verification.

In the testing phase, an unknown test speech sample is represented by channel/session-compensated feature vectors and compared against the claimed model to obtain a similarity score. The similarity measure depends on the modeling method employed: the Euclidean distance [35], log likelihood score (LLS) [29, 30] and log likelihood score ratio (LLSR) [31] are used as similarity scores for the VQ, GMM and GMM–UBM modeling techniques, respectively. In an i-vector-based speaker verification system, the test speech is represented as a channel/session-compensated i-vector, and the cosine kernel between the claimed and test i-vectors is used as the similarity measure [10].

The remaining structure of the article is arranged as follows: The database and experimental setup for the present work is explained in Section 2. The development of i-vector-based speaker verification system using vocal tract and source features is explained in Section 3. Section 4 provides the experimental results and discussion of the speaker verification system. Section 5 includes the summary and conclusion for the present work.

2 Database and Experimental Setup for the Work

In the present work, speaker verification is performed using the NIST 2003 database [25]. This standard data set contains 2915 speakers. Of the 356 training speakers, 207 are female and 149 are male. The NIST SRE 2003 evaluation plan contains 2559 test speakers for verification. Apart from the training and test speech samples, the database also contains a development data set used to train the UBM [31] and the T-matrix. We used Switchboard Corpus II cellular data consisting of 1872 utterances as development data. A portion of the development data, containing 251 female and 251 male speakers with roughly 10 h of speech, is used to train two gender-dependent UBMs, each with 8, 16, 32, 64 and 128 Gaussian mixtures; these are combined to form gender-independent UBMs with 16, 32, 64, 128 and 256 Gaussian mixtures. Experiments are conducted for limited data with training and test data durations of 3–3, 6–6, 9–9, 12–12 and 15–15 s; to study the effect of more test data, the test duration is also increased while keeping the training data at 15 s (15–20 and 15–25 s). The maximum number of Gaussian mixtures for the UBM is limited to 256.

3 Development of i-Vector-Based Speaker Verification System: Vocal Tract and Source Features

Speaker-specific information can be extracted at a reduced data rate using feature extraction techniques [33]. The resulting feature vectors contain the vocal tract, excitation source and behavioral components of speaker-specific information [19]. A good feature set is one that contains all components of speaker-specific information, and to create it, the different feature extraction techniques need to be understood. These features are modeled using the i-vector modeling technique.

3.1 Vocal Tract Features

MFCC and LPCC feature extraction techniques are used to extract the vocal tract features. The techniques used to extract MFCC and LPCC are different, and the performance of these features also varies.

In the case of MFCC, spectral distortion is minimized using a Hamming window. The magnitude frequency response is obtained by applying the Fourier transform to the windowed frame, and the resulting spectrum is passed through 22 triangular band-pass filters on the mel scale. The discrete cosine transform (DCT) is applied to the logarithm of the mel filter outputs to obtain the cepstral coefficients. The resulting MFCC features are extracted from both the training and test speech data.
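
The per-frame MFCC computation described above can be sketched roughly as follows; the filter-bank construction, FFT size and the use of log filter energies before the DCT are standard choices assumed here rather than details taken from the article.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular band-pass filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        fb[i - 1, bins[i - 1]:bins[i]] = np.linspace(0, 1, bins[i] - bins[i - 1], endpoint=False)
        fb[i - 1, bins[i]:bins[i + 1]] = np.linspace(1, 0, bins[i + 1] - bins[i], endpoint=False)
    return fb

def mfcc_frame(frame, n_fft=256, fs=8000, n_filters=22, n_ceps=13):
    windowed = frame * np.hamming(len(frame))                    # Hamming window to reduce spectral distortion
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))              # magnitude frequency response
    energies = mel_filterbank(n_filters, n_fft, fs) @ spectrum   # 22 triangular mel filters
    return dct(np.log(energies + 1e-10), norm="ortho")[:n_ceps]  # DCT of log energies -> cepstral coefficients

print(mfcc_frame(np.random.randn(160)).shape)                    # (13,) for a 20 ms frame at 8 kHz
```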

LPCC reflects differences in the biological structure of the human vocal tract. LPCC is computed by a recursion from the LP coefficients to the LP cepstrum according to the all-pole model; the coefficients of the all-pole filter form the LPC, which is equivalent to a smoothed envelope of the log spectrum of the speech. The windowed portion of the speech is used to calculate the LPC by either the autocorrelation or the covariance method. The discrete Fourier transform (DFT) and inverse DFT, which are time consuming and complex, can be avoided when calculating LPCC by using the Durbin recursive method [17].
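
A small sketch of the LPC-to-cepstrum route described above is given below: LP coefficients are obtained with the Levinson-Durbin recursion on the autocorrelation sequence and then converted to LPCC without any DFT. The analysis order and frame length are illustrative assumptions, not values stated in the article.

```python
import numpy as np

def lpc(frame, order):
    """LP coefficients of A(z) = 1 + a1 z^-1 + ... via the autocorrelation method (Levinson-Durbin)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1); a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err                # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * np.r_[a[1:i][::-1], 1.0]   # update predictor coefficients
        err *= (1.0 - k * k)                                     # prediction error energy
    return a

def lpcc(a, n_ceps):
    """LPC cepstrum from LP coefficients by the standard recursion (no DFT/IDFT needed)."""
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= len(a) - 1 else 0.0
        acc -= sum((1 - k / n) * a[k] * c[n - k - 1] for k in range(1, min(n, len(a))))
        c[n - 1] = acc
    return c

frame = np.random.randn(160)                   # a 20 ms frame at 8 kHz
print(lpcc(lpc(frame, order=13), n_ceps=13))   # 13 LPCC values
```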

3.2 Excitation Source Features

The spectral features are extracted from the vocal tract using frames in the range of 10–30 ms. Some speaker-specific information, such as the linear prediction (LP) residual and the LP residual phase, is ignored by the spectral features but can be utilized for speaker verification [27]. The LP residual is obtained as follows: LP analysis is first used to predict the vocal tract information from the speech data, and this information is then suppressed by passing the speech through the inverse filter [27, 34]. The LPRP is obtained by dividing the LP residual by its Hilbert envelope [34]. The LPR contains information about the excitation source, mainly the glottal closure instants (GCIs), while the LPRP also carries speaker-specific information [20]. The LPR and LPRP features thus contain speaker-specific excitation source information with dissimilar characteristics, and an advantage can be gained by combining the two.
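
A rough sketch of the residual extraction described above follows: LP coefficients are estimated from the frame, the residual (LPR) is obtained by inverse filtering, and the residual phase (LPRP) is derived from the Hilbert envelope of the residual. The LP order and the normal-equation solver are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import solve, toeplitz
from scipy.signal import hilbert, lfilter

def lp_coeffs(frame, order):
    """Inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p from the autocorrelation normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve(toeplitz(r[:order]), -r[1:order + 1])
    return np.concatenate(([1.0], a))

def lp_residual_and_phase(frame, order=10):
    """LP residual by inverse filtering (LPR) and residual phase via the Hilbert envelope (LPRP)."""
    a = lp_coeffs(frame, order)
    residual = lfilter(a, [1.0], frame)          # e[n] = s[n] + sum_k a_k s[n-k]
    envelope = np.abs(hilbert(residual))         # Hilbert envelope of the residual
    phase = residual / (envelope + 1e-10)        # residual divided by its Hilbert envelope
    return residual, phase

frame = np.random.randn(96)                      # a 12 ms frame at 8 kHz (the LPR/LPRP frame size of Section 4)
lpr, lprp = lp_residual_and_phase(frame)
print(lpr.shape, lprp.shape)                     # (96,) (96,)
```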

3.3 i-Vector

The total variability i-vector representation of speech utterances forms the basis of state-of-the-art speaker verification systems [10]. The GMM mean supervector of each utterance is projected onto a low-rank matrix to obtain a reduced-dimension representation called the i-vector. The GMM mean supervector of an utterance is formed by concatenating the mean vectors of the adapted GMM. Figure 1 shows the block diagram of the i-vector-based speaker verification system.

Figure 1: Block Diagram Representation of the Speaker Verification System Using i-Vector Modeling [21].

The total variability matrix is a low-rank matrix that simultaneously represents the dominant speaker and channel variability. If T is the total variability matrix, the i-vector w and the adapted GMM mean supervector M_s are related by the following equation,

(1) M_s = m + Tw,

where m represents the speaker- and channel-independent supervector (the UBM mean supervector). The UBM is represented by a weighted sum of C component Gaussian densities, U = {μ_c, λ_c, η_c}, c = 1, 2, …, C, where μ_c, λ_c and η_c are the mean vector, covariance matrix and weight of mixture component c, respectively. Assume that L speech feature vectors {x_1, x_2, …, x_L}, each of dimension F, are extracted from the speech signal. The zeroth-order statistic N_c (the soft count of mixture component c) and the first-order statistic F_c centralized with respect to the UBM are given by

(2) N_c = Σ_{t=1}^{L} P(c | x_t, U)
(3) F_c = Σ_{t=1}^{L} P(c | x_t, U)(x_t − μ_c),

where c = 1, 2, …, C indexes the components of the UBM, P(c | x_t, U) is the posterior probability of mixture component c generating the feature vector x_t, and μ_c is the mean of UBM component c.

The total variability matrix T is learned from the development data, generally using a variant of probabilistic principal component analysis modified to operate on the Baum–Welch statistics of the speech data computed with the UBM. The i-vector w is then estimated by the following equation

(4) w = (I + T^t Σ^{−1} N(u) T)^{−1} T^t Σ^{−1} F(u)

where N(u) is the diagonal matrix of dimension CF×CF whose diagonal blocks are N_c I, and Σ is the diagonal matrix of dimension CF×CF whose diagonal blocks are Σ_c. F(u) is the supervector of dimension CF×1 generated by concatenating all first-order Baum–Welch statistics F_c for a given utterance u. The speech utterances are represented as i-vectors during both the training and testing phases. The cosine kernel score between the two i-vectors ŷ_clm and ŷ_tst is computed as given below, where ŷ_clm is the i-vector of the claimed speaker obtained in the training phase and ŷ_tst is the i-vector of the test utterance:

(5) Score = ⟨ŷ_clm, ŷ_tst⟩ / (‖ŷ_clm‖ ‖ŷ_tst‖)
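
The sufficient statistics of eqs. (2)-(3), the i-vector estimate of eq. (4) and the cosine score of eq. (5) can be sketched as below. The UBM, total variability matrix T and residual covariance Σ are random placeholders here, not trained models, so the printed score is only illustrative.

```python
import numpy as np

def baum_welch_stats(X, weights, means, covs):
    """Zeroth- and first-order statistics of frames X against a diagonal-covariance UBM (eqs. 2-3)."""
    log_p = (-0.5 * (((X[:, None, :] - means) ** 2) / covs).sum(-1)
             - 0.5 * np.log(covs).sum(-1) + np.log(weights))      # unnormalized log P(c | x_t, U)
    post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                       # posterior P(c | x_t, U)
    N = post.sum(axis=0)                                          # N_c, eq. (2)
    F = post.T @ X - N[:, None] * means                           # centralized F_c, eq. (3)
    return N, F.reshape(-1)

def extract_ivector(N, F_super, T, Sigma):
    """Posterior-mean i-vector w = (I + T' Sigma^-1 N(u) T)^-1 T' Sigma^-1 F(u), eq. (4)."""
    F_dim = T.shape[0] // len(N)
    N_diag = np.repeat(N, F_dim)                                  # diagonal of N(u)
    TtSinv = T.T / Sigma                                          # T' Sigma^-1 (Sigma stored as a CF vector)
    L = np.eye(T.shape[1]) + (TtSinv * N_diag) @ T
    return np.linalg.solve(L, TtSinv @ F_super)

def cosine_score(w_clm, w_tst):
    """Cosine kernel between claimed and test i-vectors, eq. (5)."""
    return float(w_clm @ w_tst / (np.linalg.norm(w_clm) * np.linalg.norm(w_tst)))

# Toy example with random placeholders for the UBM and total variability matrix T
rng = np.random.default_rng(0)
C, F_dim, R = 16, 13, 50                          # mixtures, feature dimension, i-vector dimension
weights, means, covs = np.full(C, 1 / C), rng.standard_normal((C, F_dim)), np.ones((C, F_dim))
Sigma = np.ones(C * F_dim)
T = 0.1 * rng.standard_normal((C * F_dim, R))

w_clm = extract_ivector(*baum_welch_stats(rng.standard_normal((300, F_dim)), weights, means, covs), T, Sigma)
w_tst = extract_ivector(*baum_welch_stats(rng.standard_normal((300, F_dim)), weights, means, covs), T, Sigma)
print(cosine_score(w_clm, w_tst))
```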

3.3.1 Session/Channel Compensation in i-Vector-Based Speaker Modeling

The i-vector extracted from a speech utterance contains both channel and speaker variabilities. Implementing channel/session compensation methods helps to improve the performance of an i-vector-based speaker verification system. The compensation methods used to reduce session/channel variability in this work are given below:

  1. Linear discriminant analysis (LDA): the i-vectors are projected onto a set of new orthogonal axes where the intra-class variance caused by the channel is minimized and the inter-class variance is maximized [7, 10]. The projection matrix is composed of the eigenvectors corresponding to the largest eigenvalues of the eigen-analysis equation

    (6) (W_c^{−1} B_c) v = λv,

    where W_c is the within-class covariance matrix, B_c is the between-class covariance matrix, v is an eigenvector and λ is the diagonal matrix of eigenvalues [17].

  2. Within-class covariance normalization (WCCN): the effects of channel/session mismatch in i-vector and supervector representations can be compensated using WCCN, a type of linear transformation [10, 16]. The transformation minimizes an upper bound on the classification error metric and hence reduces the classification error. The transformation matrix B is obtained by Cholesky decomposition of the inverse of the within-class covariance matrix W as given below (a sketch of both compensation steps follows this list),

    (7) W^{−1} = BB^t
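
A compact sketch of both compensation steps is given below: the LDA projection is obtained from the eigen-analysis of eq. (6) and the WCCN matrix from the Cholesky factorization of eq. (7). The development i-vectors and speaker labels are synthetic placeholders; the dimensions follow the 50–50 setup used later in the article.

```python
import numpy as np
from scipy.linalg import cholesky, eigh

def lda_projection(ivecs, labels, n_dims):
    """Eigenvectors of Wc^-1 Bc with the largest eigenvalues (eq. 6)."""
    mu = ivecs.mean(axis=0)
    Wc = np.zeros((ivecs.shape[1], ivecs.shape[1]))
    Bc = np.zeros_like(Wc)
    for spk in np.unique(labels):
        xs = ivecs[labels == spk]
        Wc += (xs - xs.mean(0)).T @ (xs - xs.mean(0))               # within-class scatter
        Bc += len(xs) * np.outer(xs.mean(0) - mu, xs.mean(0) - mu)  # between-class scatter
    vals, vecs = eigh(Bc, Wc)                                       # Bc v = lambda Wc v  <=>  Wc^-1 Bc v = lambda v
    return vecs[:, np.argsort(vals)[::-1][:n_dims]]

def wccn_matrix(ivecs, labels):
    """Cholesky factor B of the inverse within-class covariance, W^-1 = B B^t (eq. 7)."""
    classes = np.unique(labels)
    W = sum(np.cov(ivecs[labels == s].T, bias=True) for s in classes) / len(classes)
    return cholesky(np.linalg.inv(W), lower=True)

rng = np.random.default_rng(1)
dev = rng.standard_normal((200, 50))              # placeholder 50-d development i-vectors
spk = np.repeat(np.arange(20), 10)                # 20 speakers x 10 utterances each
A = lda_projection(dev, spk, n_dims=50)           # 50-50 setup: LDA keeps all 50 dimensions here
B = wccn_matrix(dev @ A, spk)
compensated = (dev @ A) @ B                       # session/channel-compensated i-vectors
print(compensated.shape)                          # (200, 50)
```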

4 Experimental Results and Discussion

In this work, experiments are conducted using different features for the speaker verification system. The verification performance is measured by the equal error rate (EER), the operating point where the false rejection rate (FRR) equals the false acceptance rate (FAR) [27]. The FRR is the ratio of the number of rejected true speakers to the total number of true speakers, and the FAR is the ratio of the number of accepted impostors to the total number of impostors. For the present work, we extracted vocal tract features (MFCC and LPCC) and excitation source features (LPR and LPRP). The amount of data available in the limited data condition is very small, which gives poor verification performance. To improve the verification performance under this condition, different levels of information have to be extracted from the speech data and combined; here, the vocal tract and excitation source information are combined to improve the performance of the speaker verification system. All the features are of dimension 13 or 39.
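
For reference, the EER can be computed from genuine and impostor trial scores by sweeping a threshold until the FRR and FAR meet, as in the sketch below (the score distributions used here are synthetic).

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: operating point where the false rejection rate equals the false acceptance rate."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    frr = np.array([(genuine < t).mean() for t in thresholds])    # rejected true speakers
    far = np.array([(impostor >= t).mean() for t in thresholds])  # accepted impostors
    idx = np.argmin(np.abs(frr - far))
    return 0.5 * (frr[idx] + far[idx])

rng = np.random.default_rng(2)
genuine = rng.normal(1.0, 1.0, 500)        # synthetic target-trial scores
impostor = rng.normal(-1.0, 1.0, 5000)     # synthetic impostor-trial scores
print(f"EER = {100 * equal_error_rate(genuine, impostor):.2f}%")
```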

First, we conducted experiments for 13-dimensional features. The MFCC and LPCC features are extracted with an FS of 20 ms and an FR of 10 ms, and the LPR and LPRP features with an FS of 12 ms and an FR of 6 ms. The i-vector is used as the classifier, and Gaussian mixtures of 16, 32, 64, 128 and 256 are considered to model the speakers. The zeroth- and first-order statistics (GMM mean supervectors) of all the training, test and development data are computed, and channel and session compensation is performed using LDA followed by WCCN.

Table 1 shows the performance of the vocal tract and excitation source feature extraction techniques for different dimensions of the i-vector and LDA in terms of EER and decision cost function (DCF). The experiments were conducted on the 3–3 s training and testing data for the different feature extraction techniques with different combinations of i-vector and LDA dimensions. The best performance is obtained for the 100–50 and 50–50 combinations, which are therefore adopted in the present work; the reason for this choice is that the extracted features are 13-dimensional and the data are limited. These combinations are also used to model the speakers for the 6–6, 9–9, 12–12, 15–15, 15–20 and 15–25 s amounts of training, test and development data. The DCF follows the same trend as the EER: when the EER decreases, the DCF also decreases.

Table 1:

EER of Speaker Verification for Different Dimensions of i-Vector and LDA Using the NIST 2003 Database for the 3–3 second Train/Test Data and Gaussian Mixture of 16, Features are MFCC, LPCC, LPR and LPRP (13 Dimensions).

Dimensions (i-vector/LDA)  MFCC (EER%, DCF)  LPCC (EER%, DCF)  LPR (EER%, DCF)  LPRP (EER%, DCF)
400–150 49.86 0.8991 48.10 0.8994 49.10 0.8891 49.06 0.8873
200–150 49.41 0.8981 43.45 0.8152 49.63 0.8983 50.00 0.9000
100–100 43.63 0.8252 42.90 0.8060 45.24 0.8523 44.32 0.8342
100–50 43.36 0.8099 41.64 0.7860 40.51 0.7628 41.86 0.7927
50–50 43.58 0.8247 41.37 0.7812 40.42 0.7643 41.50 0.7821

The performance of the i-vector-based speaker verification system developed using the vocal tract features (MFCC and LPCC) and excitation source features (LPR and LPRP) is evaluated for different durations of training and test data with the 100–50 dimension combination in Table 2. Since the study concerns limited data, the system performance is evaluated up to 15 s of training data. For the 3–3 s data, minimum EERs of 42.68% and 41.55% are obtained for MFCC and LPCC with Gaussian mixtures of 32 and 128, respectively, and minimum EERs of 39.92% and 40.28% are obtained for LPR and LPRP with Gaussian mixtures of 64 and 32, respectively. The EER of LPCC is thus 1.13% lower than that of MFCC, and the EER of LPR is 0.36% lower than that of LPRP. Table 2 also clearly shows that the performance of all feature extraction techniques improves as the amount of training and test data increases. A similar trend is observed in Table 3, where the minimum EERs for the 3–3 s data are 40.83% and 40.42% for the MFCC and LPCC features, in both cases with 128 Gaussian mixtures; LPCC is 0.41% lower in EER than MFCC. The LPR and LPRP features have minimum EERs of 40.46% and 40.24% with 128 Gaussian mixtures, LPRP being 0.22% lower than LPR. As observed from these two tables, the 50–50 i-vector/LDA combination performs better than the 100–50 combination under the limited data condition, and the LPCC features give a lower minimum EER than the MFCC features. Therefore, the 50–50 combination of i-vector and LDA is used in the remaining experiments of this work.

Table 2:

EER of Speaker Verification Using the NIST 2003 Database for Different Feature Extraction Techniques (13 Dimensions) and Modeling Done by i-Vectors.

Train/test data (s)  Feature extraction technique  EER% and DCF for Gaussian mixtures of 16, 32, 64, 128 and 256 (i-vector dimension = 100; LDA dimension = 50)
3–3 MFCC 43.36 0.8099 42.68 0.8051 42.77 0.8011 42.68 0.8046 43.81 0.8221
LPCC 41.64 0.7812 42.77 0.8028 42.09 0.7979 41.55 0.7851 41.59 0.7820
LPR 41.86 0.7927 40.24 0.7593 39.92 0.7554 40.92 0.7734 40.46 0.7639
LPRP 40.51 0.7628 40.28 0.7520 40.42 0.7663 41.37 0.7848 42.63 0.8063
6–6 MFCC 38.70 0.7306 38.16 0.7199 37.08 0.7017 39.43 0.7409 39.97 0.7496
LPCC 39.70 0.7487 38.75 0.7333 39.47 0.7429 37.17 0.6998 35.77 0.6757
LPR 38.79 0.7262 38.88 0.7252 38.07 0.7127 38.34 0.7250 38.79 0.7308
LPRP 38.88 0.7306 38.12 0.7131 39.74 0.7503 39.83 0.7384 39.61 0.7469
9–9 MFCC 32.56 0.6180 31.12 0.5872 32.20 0.6103 29.94 0.5657 29.85 0.5641
LPCC 31.12 0.5903 31.57 0.5953 27.68 0.5191 26.64 0.5028 26.01 0.4910
LPR 35.95 0.6700 34.10 0.6315 33.46 0.6189 33.60 0.6289 33.92 0.6367
LPRP 32.92 0.6133 32.06 0.6013 33.15 0.6262 34.95 0.6575 34.23 0.6449
12–12 MFCC 31.75 0.5993 30.71 0.5823 28.68 0.5415 30.98 0.5800 29.31 0.5541
LPCC 28.36 0.5339 26.19 0.4971 25.11 0.4749 25.20 0.4768 22.31 0.4206
LPR 34.10 0.6380 32.97 0.6230 32.24 0.5864 31.75 0.5888 31.02 0.5798
LPRP 32.38 0.6042 31.57 0.5893 32.83 0.6143 33.24 0.6309 34.10 0.6406
15–15 MFCC 29.43 0.5608 29.11 0.5338 28.61 0.5413 27.35 0.5321 26.21 0.4985
LPCC 21.64 0.4023 21.32 0.4164 20.87 0.3994 20.36 0.3852 19.24 0.3627
LPR 31.79 0.5953 31.57 0.5893 30.98 0.5872 30.68 0.5823 29.64 0.5612
LPRP 32.94 0.6814 32.24 0.6309 31.75 0.5972 31.39 0.5888 30.30 0.5926
15–20 MFCC 28.45 0.5438 28.32 0.5418 27.64 0.5333 27.91 0.5365 25.32 0.4775
LPCC 21.35 0.4012 20.84 0.3965 20.62 0.3964 20.41 0.3832 18.43 0.3512
LPR 29.34 0.5543 29.18 0.5523 28.84 0.5447 28.52 0.5432 28.32 0.5418
LPRP 29.64 0.5578 29.34 0.5542 28.74 0.5441 28.85 0.5453 28.84 0.5543
15–25 MFCC 24.54 0.5554 24.32 0.5532 23.33 0.5476 23.21 0.5365 22.82 0.5275
LPCC 18.32 0.3045 18.21 0.3022 17.64 0.2987 17.31 0.2943 16.35 0.2812
LPR 28.11 0.5443 28.54 0.5423 27.72 0.5337 27.31 0.5323 26.61 0.5234
LPRP 28.72 0.5478 28.61 0.5442 27.81 0.5351 27.61 0.5343 26.74 0.5243
  1. The bold values represent minimum EER of particular feature extraction techniques.

Table 3:

EER of Speaker Verification Using the NIST 2003 Database for Different Feature Extraction Techniques (13 Dimensions) and Modeling Done by i-Vectors.

Train/test data (s)  Feature extraction technique  EER% and DCF for Gaussian mixtures of 16, 32, 64, 128 and 256 (i-vector dimension = 50; LDA dimension = 50)
3–3 MFCC 43.58 0.8247 42.72 0.8089 41.96 0.7943 41.77 0.7905 40.83 0.7629
LPCC 41.37 0.7812 42.68 0.8093 42.05 0.7944 40.65 0.7703 40.12 0.7635
LPR 41.50 0.7821 40.92 0.7743 40.65 0.7652 40.46 0.7640 41.01 0.7748
LPRP 40.42 0.7643 40.37 0.7647 40.46 0.7670 40.24 0.7586 41.50 0.7806
6–6 MFCC 39.47 0.7412 38.93 0.7261 38.70 0.7292 37.85 0.7133 36.62 0.7088
LPCC 39.74 0.7402 38.88 0.7322 37.26 0.7022 36.85 0.6975 34.68 0.6541
LPR 39.38 0.7349 39.97 0.7484 38.25 0.7110 38.43 0.7199 38.52 0.7223
LPRP 38.70 0.7300 38.66 0.7308 39.25 0.7421 38.61 0.7309 38.66 0.7309
9–9 MFCC 33.33 0.6280 31.61 0.6002 32.24 0.6035 32.56 0.6124 28.13 0.5295
LPCC 30.89 0.5868 29.81 0.5668 29.22 0.5338 29.53 0.5608 25.88 0.5338
LPR 35.59 0.6618 35.27 0.6515 33.73 0.6264 32.83 0.6164 32.56 0.6125
LPRP 33.73 0.6300 31.88 0.5979 33.15 0.6279 33.78 0.6362 33.96 0.6372
12–12 MFCC 31.39 0.5929 31.02 0.5850 27.04 0.5300 32.06 0.6074 31.25 0.5914
LPCC 28.86 0.5416 26.24 0.4985 25.38 0.4778 24.11 0.4555 22.53 0.4255
LPR 33.55 0.6326 34.41 0.6399 32.06 0.5990 32.11 0.5851 33.19 0.5798
LPRP 32.20 0.6032 31.43 0.5831 32.06 0.6054 32.33 0.6046 32.29 0.6067
15–15 MFCC 27.64 0.5374 27.23 0.5086 26.01 0.4922 25.68 0.4986 23.21 0.3717
LPCC 20.68 0.3964 20.41 0.3852 19.01 0.3575 18.54 0.3214 18.18 0.3061
LPR 30.39 0.5738 29.04 0.5662 28.31 0.5364 29.99 0.5626 28.16 0.5213
LPRP 31.30 0.5926 31.25 0.5888 30.62 0.5746 29.10 0.5682 28.31 0.5364
15–20 MFCC 22.77 0.3774 22.54 0.3786 21.59 0.3612 20.64 0.3586 20.78 0.3537
LPCC 18.45 0.3474 18.32 0.3432 17.34 0.3325 17.11 0.3314 16.45 0.3241
LPR 27.33 0.5768 27.14 0.5732 26.54 0.5654 26.61 0.5638 25.34 0.5513
LPRP 27.64 0.5716 27.42 0.5748 26.66 0.5616 26.78 0.5682 25.63 0.5364
15–25 MFCC 21.78 0.3632 20.64 0.3576 19.32 0.3434 18.63 0.3445 17.32 0.3476
LPCC 16.35 0.3643 15.64 0.3552 15.32 0.3575 14.78 0.3234 14.65 0.3145
LPR 25.33 0.5358 24.45 0.4472 23.31 0.4364 23.15 0.4326 22.32 0.4213
LPRP 25.45 0.5396 24.78 0.4468 23.51 0.4346 23.32 0.4382 22.45 0.4264
  1. The bold values represent minimum EER of particular feature extraction techniques.

In our earlier work [22], we evaluated the speaker verification system using GMM–UBM modeling on the NIST 2003 data set, with 13-dimensional MFCC and LPCC features extracted using an FS of 20 ms and an FR of 10 ms. It was observed that GMM–UBM modeling works well under limited data [22]. The performance obtained with the vocal tract features under limited data conditions using GMM–UBM modeling is shown in Table 4. For the 3–3 s data, the minimum EERs of MFCC and LPCC with GMM–UBM modeling are 40.10% and 39.06%, respectively. Comparing the GMM–UBM and i-vector results in Tables 3 and 4, GMM–UBM gives EERs 0.73% and 0.36% lower than i-vector modeling for MFCC and LPCC on the 3–3 s data. This trend does not continue for the 6–6, 9–9, 12–12, 15–15, 15–20 and 15–25 s data sizes, where i-vector modeling performs better than GMM–UBM. This is because i-vector-based modeling is a more advanced technique than GMM–UBM modeling: the i-vector extracted from the speech data contains both channel and session variabilities, and these variabilities can be compensated by various techniques to improve the performance of the i-vector-based speaker verification system, whereas such compensation techniques are not available in GMM–UBM.

Table 4:

EER of the Speaker Verification System Using MFCC and LPCC Features (13 Dimensions) and GMM–UBM Modeling for the NIST 2003 Data Set.

Train/test data (s) Feature extraction techniques Gaussian mixtures
16 32 64 128 256
3–3 MFCC 41.32 40.15 40.10 40.19 40.37
LPCC 40.01 39.79 39.11 39.25 39.06
6–6 MFCC 38.16 36.94 36.67 37.12 37.57
LPCC 37.48 36.54 36.49 36.04 35.63
9–9 MFCC 32.47 31.02 30.21 29.62 30.26
LPCC 28.99 28.54 28.68 29.17 28.54
12–12 MFCC 30.57 28.09 27.46 27.19 27.59
LPCC 27.28 26.24 26.42 26.28 25.38
15–15 MFCC 28.64 27.32 26.49 24.33 25.32
LPCC 26.54 25.34 23.23 22.64 21.37
15–20 MFCC 25.32 25.14 24.68 23.84 23.64
LPCC 21.79 21.64 20.71 20.41 19.64
15–25 MFCC 22.44 22.32 21.78 21.46 20.72
LPCC 20.45 20.28 19.64 19.32 18.32
  1. The bold values represent minimum EER of particular feature extraction techniques.

Further, it is observed from Table 3 that the EER of LPCC is lower than that of all other features: the minimum EERs obtained with the LPCC features are 40.12%, 34.68%, 25.88%, 22.53%, 18.18%, 16.45% and 14.65% for the 3–3, 6–6, 9–9, 12–12, 15–15, 15–20 and 15–25 s data, respectively, compared with the MFCC, LPR and LPRP feature extraction techniques.

The same set of experiments is conducted for 39-dimensional features. Tables 5 and 6 present the performance of the speaker verification system using i-vector and GMM–UBM modeling, respectively. With both modeling techniques, LPCC performs better than MFCC, and i-vector modeling again gives better performance than GMM–UBM for all data sizes. Because of the first- and second-order derivatives, the 39-dimensional features give better performance than the 13-dimensional features with both modeling techniques.

Table 5:

EER of Speaker Verification Using the NIST 2003 database for Different Feature Extraction Techniques (39 Dimensions) and Modeling Done by i-Vectors.

Train/test data (s)  Feature extraction technique  EER% and DCF for Gaussian mixtures of 16, 32, 64, 128 and 256 (i-vector dimension = 50; LDA dimension = 50)
3–3 MFCC 40.10 0.7591 40.42 0.7630 40.24 0.7594 39.92 0.7550 39.83 0.7548
LPCC 39.20 0.7426 39.11 0.7397 39.47 0.7412 38.70 0.7292 38.86 0.7199
6–6 MFCC 38.88 0.7322 37.26 0.7022 36.78 0.6799 34.77 0.6575 33.42 0.6305
LPCC 37.08 0.7018 37.18 0.7150 36.13 0.6799 34.77 0.6575 33.42 0.6305
9–9 MFCC 31.25 0.5891 31.43 0.5671 29.94 0.5650 29.04 0.5485 26.64 0.5210
LPCC 28.00 0.5224 27.68 0.5124 26.87 0.5045 25.11 0.4749 25.38 0.4749
12–12 MFCC 30.66 0.5811 30.35 0.5677 30.26 0.5683 28.31 0.5364 24.42 0.5274
LPCC 27.42 0.5086 26.78 0.4991 25.20 0.4768 24.57 0.4651 20.41 0.385
15–15 MFCC 26.34 0.5711 25.34 0.5307 25.17 0.5373 25.32 0.4964 20.12 0.3254
LPCC 19.43 0.3273 18.31 0.3291 18.03 0.3205 17.34 0.3134 16.68 0.2450
15–20 MFCC 19.84 0.3254 19.32 0.3243 18.64 0.3223 18.52 0.3214 18.35 0.3204
LPCC 17.64 0.3173 17.32 0.3151 17.18 0.3135 16.45 0.3034 16.32 0.3050
15–25 MFCC 18.32 0.3246 18.45 0.3257 17.41 0.3173 17.31 0.3164 16.64 0.3045
LPCC 16.78 0.3073 16.64 0.3061 15.32 0.2965 15.11 0.2934 14.86 0.2850
  1. The bold values represent minimum EER of particular feature extraction techniques.

Table 6:

EER of the Speaker Verification System Using MFCC and LPCC Features (39 Dimensions) and GMM–UBM Modeling for the NIST 2003 Data Set.

Train/test data (s) Feature extraction techniques Gaussian mixtures
16 32 64 128 256
3–3 MFCC 39.70 39.02 38.84 39.15 39.02
LPCC 39.61 39.15 38.54 38.66 38.70
6–6 MFCC 38.21 37.08 36.22 36.76 36.94
LPCC 36.44 35.45 34.73 34.73 34.28
9–9 MFCC 28.95 27.59 27.23 27.05 27.14
LPCC 28.13 27.32 27.05 26.73 27.95
12–12 MFCC 26.01 24.66 24.79 24.48 24.84
LPCC 25.70 25.15 23.71 23.84 24.62
15–15 MFCC 24.32 23.64 23.68 22.78 21.16
LPCC 24.13 22.77 22.32 21.94 20.64
15–20 MFCC 22.32 22.14 21.64 21.48 19.86
LPCC 19.64 19.32 18.74 18.34 17.64
15–25 MFCC 19.84 19.32 18.24 18.11 17.32
LPCC 17.64 17.32 16.62 16.12 15.32
  1. The bold values represent minimum EER of particular feature extraction techniques.

The literature survey shows that speaker verification under the limited data condition is widely used in security, controlled access, authentication of remote transactions, and criminal and forensic investigations. In almost all these applications, the speech data may be limited (a criminal may speak for only a few seconds), and when the speech data are scarce, the speaker-specific information obtained is also small. The speaker verification performance can be improved by using feature-level and score-level fusion; to study their effect on limited data, the following experiments are conducted.

4.1 Feature-Level Fusion

Feature-level fusion is accomplished by a simple concatenation of the feature sets obtained from different feature extraction techniques. In our experiments, we fused the vocal tract (system) features with the excitation source features. For instance, let X = {x1, x2, x3, …, xm} denote the vocal tract features (MFCC or LPCC) and Y = {y1, y2, y3, …, yn} the excitation source features (LPR or LPRP). We concatenated the two feature sets to form Z = {x1, x2, x3, …, xm, y1, y2, y3, …, yn}, and this new feature set Z is used for both training and testing. Figure 2 shows the block diagram of feature-level fusion; the features are fused at the frame level by concatenating the vocal tract features with the excitation source features, as sketched below.
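
A minimal sketch of this frame-level concatenation is shown here; it assumes the two feature streams have already been aligned to a common number of frames, which is our simplification rather than a detail stated in the article.

```python
import numpy as np

def feature_level_fusion(vocal_tract, source):
    """Frame-level concatenation Z = [X | Y] of vocal tract and excitation source features."""
    n = min(len(vocal_tract), len(source))       # align the streams to a common frame count
    return np.hstack([vocal_tract[:n], source[:n]])

mfcc = np.random.randn(300, 13)                  # X: MFCC frames (20 ms / 10 ms analysis)
lpr = np.random.randn(300, 13)                   # Y: LPR frames aligned to the same frame count
fused = feature_level_fusion(mfcc, lpr)          # Z, used for both training and testing
print(fused.shape)                               # (300, 26)
```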

Figure 2: Block Diagram of Feature-Level Fusion.

The individual EERs of MFCC, LPCC, LPR and LPRP are 40.83%, 40.42%, 40.46% and 40.24%, respectively, as shown in Table 3. The results obtained for limited data by combining the vocal tract and excitation source features are shown in Table 7. The minimum EERs of the feature-level fusions MFCC+LPR, MFCC+LPRP, LPCC+LPR and LPCC+LPRP are 35.99%, 40.37%, 35.32% and 39.79%, respectively. The fusions MFCC+LPR and MFCC+LPRP thus give EERs 4.84% and 0.46% lower than the individual MFCC performance, and the fusions LPCC+LPR and LPCC+LPRP give EERs 5.1% and 0.63% lower than LPCC. Feature-level fusion therefore improves the EER: combining the dissimilar vocal tract and excitation source features improves the performance relative to all the individual features. An important point noticed in feature-level fusion is that MFCC+LPR and LPCC+LPR give better performance than MFCC+LPRP and LPCC+LPRP; this may be because the LPR contains information from the excitation source, mainly the glottal closure instants (GCIs) [20]. Further, the fusion LPCC+LPR performs better than MFCC+LPR for limited data. A similar trend is observed for the other data sizes, except for the fusions MFCC+LPRP and LPCC+LPRP on the 9–9, 12–12, 15–15, 15–20 and 15–25 s data.

Table 7:

Results of the i-Vectors System Using Feature-Level Fusion of the Vocal Tract and Excitation Source Features (13 Dimensions) for the Limited Data Condition on the NIST 2003 Database.

Train/test data (s)  Fused vocal tract and excitation source features  EER% and DCF for Gaussian mixtures of 16, 32, 64, 128 and 256 (i-vector dimension = 50; LDA dimension = 50)
3–3 MFCC+LPR 35.99 0.6792 37.48 0.7017 36.90 0.6950 37.03 0.6991 38.34 0.7251
MFCC+LPRP 41.37 0.7659 40.65 0.7642 41.10 0.7757 40.65 0.7642 40.37 0.7659
LPCC+LPR 35.32 0.6669 36.22 0.6822 37.12 0.6990 36.31 0.6891 37.39 0.7035
LPCC+LPRP 41.18 0.7696 41.32 0.7782 39.79 0.7515 40.24 0.7592 40.87 0.7696
6–6 MFCC+LPR 34.46 0.6512 35.09 0.6628 34.82 0.6567 35.68 0.6696 36.35 0.6745
MFCC+LPRP 39.74 0.7420 38.52 0.7312 38.79 0.7328 38.75 0.7218 38.79 0.7358
LPCC+LPR 34.68 0.6543 35.00 0.6588 34.64 0.6457 34.55 0.6525 34.91 0.6488
LPCC+LPRP 39.52 0.7404 39.47 0.7471 37.80 0.7038 37.48 0.7066 38.03 0.7140
9–9 MFCC+LPR 28.45 0.5368 29.53 0.5486 28.04 0.5300 28.54 0.5613 30.17 0.5613
MFCC+LPRP 33.69 0.6318 32.47 0.6126 33.46 0.6325 32.33 0.6069 33.24 0.6255
LPCC+LPR 28.99 0.5487 28.99 0.5460 27.77 0.5255 28.22 0.5255 25.15 0.4675
LPCC+LPRP 35.18 0.6588 34.77 0.6386 32.52 0.6139 31.88 0.5942 31.97 0.5972
12–12 MFCC+LPR 28.22 0.5306 28.13 0.5219 28.00 0.5223 27.55 0.5176 28.22 0.5311
MFCC+LPRP 32.61 0.6030 32.02 0.6087 32.52 0.6087 31.39 0.5915 32.83 0.6175
LPCC+LPR 28.41 0.5313 27.32 0.5043 27.32 0.5080 25.73 0.4958 22.33 0.4288
LPCC+LPRP 33.46 0.6198 32.65 0.6079 30.21 0.5618 30.57 0.5724 30.39 0.5637
15–15 MFCC+LPR 24.64 0.5366 24.18 0.5309 24.34 0.5123 23.22 0.4176 23.11 0.4121
MFCC+LPRP 24.84 0.4930 24.54 0.4987 24.94 0.4687 23.74 0.4615 23.64 0.4675
LPCC+LPR 20.42 0.3113 19.32 0.3843 19.81 0.3280 18.24 0.3055 17.32 0.2134
LPCC+LPRP 20.64 0.3698 20.24 0.3679 19.54 0.3021 17.63 0.2564 18.32 0.2837
15–20 MFCC+LPR 20.84 0.3696 20.34 0.3609 21.54 0.3723 21.64 0.3776 21.32 0.3721
MFCC+LPRP 21.64 0.3730 20.54 0.3687 21.72 0.3687 21.32 0.3615 21.54 0.3675
LPCC+LPR 16.84 0.2413 15.24 0.2343 15.34 0.2380 16.32 0.2455 16.14 0.2134
LPCC+LPRP 17.35 0.2598 15.44 0.2379 15.64 0.2321 16.55 0.2564 16.28 0.2637
15–25 MFCC+LPR 19.44 0.3696 19.32 0.3609 19.45 0.3723 19.64 0.3776 20.32 0.3721
MFCC+LPRP 19.72 0.3730 19.64 0.3687 19.82 0.3687 20.32 0.3615 20.54 0.3675
LPCC+LPR 14.54 0.2413 15.32 0.2343 15.54 0.2380 15.62 0.2455 15.44 0.2134
LPCC+LPRP 15.64 0.2368 14.78 0.2169 15.68 0.2381 15.92 0.2394 16.11 0.2547
  1. The bold values represent minimum EER of particular feature extraction techniques.

4.2 Score-Level Fusion

In score-level fusion, the scores of the individual systems are fused at the verification level using the following equation:

(8) Stotal=αS1+(1α)S2,

where S1 and S2 represent the scores obtained from the individual systems, Stotal represents the fused score, and α is the weighting factor chosen to give the optimal fusion of the two scores; it is a scalar between 0 and 1.
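
A direct transcription of eq. (8) is shown below; the example scores are made up, and α = 0.5 corresponds to the value used for fusing the two modeling techniques in Section 4.2 (0.05 is reported for fusing two feature systems). In practice the two score streams may also need to be normalized to a common range before fusion, which is not shown here.

```python
import numpy as np

def fuse_scores(s1, s2, alpha):
    """Weighted score-level fusion: S_total = alpha * S1 + (1 - alpha) * S2 (eq. 8)."""
    return alpha * np.asarray(s1) + (1.0 - alpha) * np.asarray(s2)

ivector_scores = np.array([0.62, 0.11, 0.48])    # e.g. cosine scores from the i-vector system
gmm_ubm_scores = np.array([1.35, -0.42, 0.88])   # e.g. log-likelihood ratios from the GMM-UBM system
print(fuse_scores(ivector_scores, gmm_ubm_scores, alpha=0.5))
```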

In this work, we conducted two types of score-level fusion to improve the performance of the speaker verification system. The first fuses the scores of the vocal tract and excitation source features while keeping the modeling technique the same; the second fuses the scores of different modeling techniques while keeping the feature extraction technique the same.

The score-level fusion of different feature extraction techniques is shown in Figure 3. The weight factor α is optimal at a value of 0.05 when fusing the scores of two different feature extraction techniques.

Figure 3: Block Diagram of Score-Level Fusion for Different Feature Extraction Techniques.

The previous section showed that feature-level fusion gives better performance than the individual features. The performance of score-level fusion of the vocal tract and excitation source features for limited data on the NIST 2003 data set is shown in Table 8. The minimum EERs of the score-level fusions MFCC+LPR, MFCC+LPRP, LPCC+LPR and LPCC+LPRP are 34.91%, 38.12%, 34.82% and 39.15%, respectively. The score-level fusions MFCC+LPR and MFCC+LPRP give EERs 1.08% and 2.25% lower than the corresponding feature-level fusions, and the score-level fusions LPCC+LPR and LPCC+LPRP are 0.5% and 0.64% lower than the corresponding feature-level fusions. The trend in EER reduction remains the same for the other data sizes.

Table 8:

Results of the i-Vectors System Using Score-Level Fusion of the Vocal Tract and Excitation Source Features (13 Dimensions) for the Limited Data Condition on the NIST 2003 Database.

Train/test data (s)  Fused feature scores  Score-level fusion EER% for Gaussian mixtures of 16, 32, 64, 128 and 256 (i-vector dimension = 50; LDA dimension = 50)
3–3 MFCC+LPR 34.91 35.09 35.63 36.81 36.92
MFCC+LPRP 39.43 39.47 38.75 38.12 39.47
LPCC+LPR 34.82 35.04 35.68 35.99 35.81
LPCC+LPRP 39.34 40.06 39.47 39.15 39.83
6–6 MFCC+LPR 32.79 33.15 32.74 33.55 33.64
MFCC+LPRP 36.58 36.81 36.94 35.72 36.49
LPCC+LPR 33.46 33.92 33.64 35.09 34.37
LPCC+LPRP 38.07 38.21 36.49 36.58 37.17
9–9 MFCC+LPR 26.64 25.92 25.20 25.02 26.28
MFCC+LPRP 28.31 28.09 28.54 28.27 29.17
LPCC+LPR 26.87 26.60 27.10 26.64 25.15
LPCC+LPRP 31.57 31.97 30.84 29.04 30.44
12–12 MFCC+LPR 25.88 24.52 24.48 26.15 25.70
MFCC+LPRP 28.68 26.91 27.00 29.31 28.50
LPCC+LPR 26.15 25.92 25.79 26.01 25.42
LPCC+LPRP 29.58 29.99 28.41 27.82 30.53
15–15 MFCC+LPR 23.98 23.64 23.32 22.16 22.71
MFCC+LPRP 24.54 24.32 23.94 22.54 22.67
LPCC+LPR 19.34 19.31 18.13 17.72 16.34
LPCC+LPRP 19.49 19.64 18.43 17.49 16.54
15–20 MFCC+LPR 20.64 19.32 20.32 20.16 20.32
MFCC+LPRP 20.82 19.82 20.54 20.31 20.24
LPCC+LPR 14.54 14.64 15.32 15.84 15.15
LPCC+LPRP 15.34 14.78 15.64 15.92 15.45
15–25 MFCC+LPR 16.48 16.54 16.13 16.98 16.74
MFCC+LPRP 16.34 16.64 17.32 17.13 17.11
LPCC+LPR 12.34 12.64 13.32 13.64 13.51
LPCC+LPRP 13.51 13.64 12.62 12.82 12.87
  1. The bold values represent minimum EER of particular feature extraction techniques.

Over the last two decades, speaker verification systems have faced two main problems: session variability and channel mismatch. The main causes are the emotional state of the speaker, environmental conditions, recording devices and different transmission channels, among others. Owing to this variability, system performance decreases drastically, and GMM–UBM modeling systems suffer from these problems [31].

In GMM–UBM, the features are extracted frame by frame and the number of features is not fixed; Gaussian mixtures are used to fit all the features, and the speaker model is derived from the UBM by MAP adaptation. The difference between the log likelihoods of the speaker GMM and the UBM is used as the verification score. These problems can be mitigated by using the i-vector: i-vector-based modeling is a more advanced technique than GMM–UBM modeling, and the i-vector extracted from the speech utterance contains both channel and speaker variabilities, which can be compensated by channel/session compensation methods to improve the performance of the speaker verification system [10]. For these reasons, the experiments we conducted using i-vectors show an improvement in performance over GMM–UBM, as shown in Tables 3–6. A sketch contrasting GMM–UBM scoring with the i-vector cosine score is given below.
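
The following sketch illustrates the GMM–UBM scoring principle: the verification score is the average per-frame log-likelihood difference between a (stand-in) adapted speaker model and the UBM, in contrast to the cosine scoring of i-vectors sketched in Section 3.3. The GMM parameters are random placeholders, and MAP adaptation itself is not implemented here.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, covs):
    """Average per-frame log-likelihood of frames X under a diagonal-covariance GMM."""
    log_p = (-0.5 * (((X[:, None, :] - means) ** 2) / covs).sum(-1)
             - 0.5 * np.log(2 * np.pi * covs).sum(-1) + np.log(weights))
    m = log_p.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))))   # log-sum-exp over mixtures

def llr_score(X, speaker_gmm, ubm):
    """GMM-UBM verification score: log-likelihood of the speaker model minus that of the UBM."""
    return gmm_avg_loglik(X, *speaker_gmm) - gmm_avg_loglik(X, *ubm)

rng = np.random.default_rng(3)
C, F = 16, 13
ubm = (np.full(C, 1 / C), rng.standard_normal((C, F)), np.ones((C, F)))
speaker = (ubm[0], ubm[1] + 0.3 * rng.standard_normal((C, F)), ubm[2])   # stand-in for a MAP-adapted model
frames = rng.standard_normal((200, F))                                   # test utterance frames
print(llr_score(frames, speaker, ubm))
```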

Figure 4 shows the proposed combined modeling technique using score-level fusion. The improvement in performance obtained by combining the scores of the i-vector and GMM–UBM modeling techniques at the score level is significant, as shown in Table 9. The reason for the improvement may be that the i-vector system uses the cosine kernel whereas GMM–UBM uses the log likelihood ratio test, so the working principles of the two modeling techniques are different.

Figure 4: Block Diagram of Proposed Combined Modeling.

Table 9:

Performance of the Speaker Verification System for Score-Level Fusion Using i-Vector and GMM–UBM Modeling Techniques for Limited Data Condition in the NIST 2003 Database.

Train/test data (s)  Score-level fusion for MFCC features (13 dimensions): EER% for Gaussian mixtures of 16, 32, 64, 128, 256  |  Score-level fusion for LPCC features (13 dimensions): EER% for Gaussian mixtures of 16, 32, 64, 128, 256
3–3 38.73 37.28 37.74 36.65 36.06 37.35 37.75 36.90 36.63 35.99
6–6 36.72 36.08 36.63 35.90 32.55 34.77 33.69 33.60 32.70 32.15
9–9 29.72 28.45 28.13 30.08 25.00 24.11 23.44 23.89 23.75 23.30
12–12 27.37 26.24 23.80 28.27 28.00 20.09 21.45 21.95 21.45 20.09
15–15 21.64 22.32 19.18 18.92 17.64 16.11 15.24 15.34 14.52 14.63
15–20 14.43 14.34 14.64 14.36 14.52 11.34 11.41 11.14 11.54 11.64
15–25 11.17 11.34 11.32 11.64 11.54 8.38 8.34 8.54 8.51 8.49
  1. The bold values represent minimum EER of particular feature extraction techniques.

In the proposed system, the optimal value of α is 0.5. Table 9 shows the performance of score-level fusion of the different modeling techniques for 13-dimensional features. LPCC gives very good performance for all data sizes compared with MFCC; its minimum EER is approximately 4% lower than that of MFCC in the score-level fusion of different modeling techniques.

Table 10 compares the different fusion techniques for the 13-dimensional features. Two points are observed from this table. First, for the 3–3 s data, feature-level fusion and score-level fusion of different features give a lower minimum EER than score-level fusion of different modeling techniques; in this case the combination of MFCC or LPCC with LPR gives better performance than MFCC or LPCC with LPRP. Second, when the data size increases (6–6, 9–9, 12–12, 15–15, 15–20 and 15–25 s), score-level fusion of different modeling techniques gives better performance than the other fusion techniques. From these observations it is clear that, for limited data, score-level fusion of different modeling techniques can be used to obtain better verification results. This result motivated us to conduct score-level fusion for 39-dimensional features; Table 11 shows the score-level fusion of different modeling techniques for 39 dimensions. Here LPCC again performs better than MFCC for all data sizes. Comparing the score-level fusion results in Tables 9 and 11, there is a drastic improvement with 39 dimensions compared with 13 dimensions.

Table 10:

Minimum EER of Different Fusion Techniques.

Train/test data (s)  Feature-level fusion: MFCC+LPR, MFCC+LPRP, LPCC+LPR, LPCC+LPRP  |  Score-level fusion for different features: MFCC+LPR, MFCC+LPRP, LPCC+LPR, LPCC+LPRP  |  Score-level fusion for different modeling techniques: MFCC, LPCC
3–3 35.99 40.37 35.32 39.79 34.91 38.12 34.82 39.15 36.06 35.99
6–6 34.46 38.52 34.55 37.48 32.74 35.72 33.46 36.49 32.55 32.15
9–9 28.04 32.33 27.15 31.88 25.02 28.27 25.15 29.04 25.00 23.30
12–12 27.55 32.02 26.33 30.21 24.48 26.91 25.42 27.82 23.80 20.09
15–15 23.11 23.64 17.32 17.52 22.16 22.64 16.34 16.54 17.64 14.52
15–20 20.34 20.54 15.24 15.64 19.32 19.82 14.54 14.78 14.34 11.14
15–25 19.32 19.64 14.54 14.78 16.13 16.14 12.34 12.62 11.17 8.34
Table 11:

Performance of the Speaker Verification System for Score-Level Fusion Using i-Vector and GMM–UBM Modeling Techniques for Limited Data Condition on the NIST 2003 Database.

Train/test data (s)  Score-level fusion for MFCC features (39 dimensions): EER% for Gaussian mixtures of 16, 32, 64, 128, 256  |  Score-level fusion for LPCC features (39 dimensions): EER% for Gaussian mixtures of 16, 32, 64, 128, 256
3–3 36.20 36.11 35.70 34.16 38.93 34.73 33.73 33.69 31.97 31.48
6–6 33.19 33.42 33.46 30.62 30.84 29.26 27.95 28.90 27.10 27.59
9–9 27.82 27.23 25.42 24.61 24.57 21.09 20.05 22.94 22.71 21.34
12–12 25.38 24.11 23.75 22.94 22.17 19.28 19.20 18.15 17.34 19.41
15–15 17.49 16.34 16.51 15.49 15.63 15.32 15.64 14.34 13.21 12.36
15–20 11.69 10.84 9.48 8.64 8.12 9.14 8.34 6.34 6.23 6.04
15–25 5.65 5.84 5.82 5.12 5.05 4.32 4.64 4.12 3.58 3.98
  1. The bold values represent minimum EER of particular feature extraction techniques.

5 Conclusion

In this article, we demonstrated the performance of individual modeling techniques and of different fusion techniques under the limited data condition. First, we studied the behavior of the individual features with the i-vector modeling technique; it was observed that i-vector modeling gives a better EER than the GMM–UBM modeling technique. To increase the performance of the speaker verification system, we conducted experiments using feature- and score-level fusion for 13-dimensional features. In feature-level fusion, the vocal tract features are fused with the excitation source features, and the performance of feature-level fusion is better than that of the individual feature extraction techniques. Two cases of score-level fusion were then demonstrated for 13 dimensions. In the first case, the scores of the vocal tract and excitation source features are fused while keeping the modeling technique the same; an average reduction in EER of approximately 2% was observed compared with the feature-level fusion performance. In the second case, the scores of different modeling techniques are fused while keeping the features the same; an average reduction in EER of approximately 4.5% was observed compared with score-level fusion of different features. It was further observed that score-level fusion of different modeling techniques gives better performance than the other fusion techniques, and that LPCC combined with the source features gives better performance than MFCC+LPR and MFCC+LPRP under the limited data condition. We therefore suggest using LPCC with source features together with score-level fusion of different modeling techniques to improve the performance of speaker verification under the limited data condition. Motivated by these results, we also conducted score-level fusion for 39-dimensional features; the results show a drastic improvement in EER with 39 dimensions compared with 13 dimensions. Therefore, 39-dimensional features can also be used to improve the performance of the speaker verification system under limited data.

Bibliography

[1] A. K. H. Al-Ali, D. Dean, B. Senadji, V. Chandran and G. R. Naik, Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions, IEEE Access 5 (2017), 15400–15413. doi: 10.1109/ACCESS.2017.2728801.

[2] B. S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, J. Acoust. Soc. Am. 55 (1974), 1304–1312. doi: 10.1121/1.1914702.

[3] R. Auckenthaler, M. Carey and H. Lloyd-Thomas, Score normalization for text-independent speaker verification systems, Digit. Signal Process. 10 (2000), 42–54. doi: 10.1006/dspr.1999.0360.

[4] F. Bimbot, J. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-García, D. Petrovska-Delacrétaz and D. A. Reynolds, A tutorial on text-independent speaker verification, EURASIP J. Appl. Signal Process. 4 (2004), 430–451. doi: 10.1155/S1110865704310024.

[5] J. P. Campbell, Speaker recognition: a tutorial, Proc. IEEE 85 (1997), 1437–1462. doi: 10.1109/5.628714.

[6] W. Chan, N. Zheng and T. Le, Discrimination power of vocal source and vocal tract related features for speaker segmentation, IEEE Trans. Audio Speech Lang. Process. 15 (2007), 1884–1892. doi: 10.1109/TASL.2007.900103.

[7] R. K. Das, D. Pati and S. R. M. Prasanna, Different aspects of source information for limited data speaker verification, in: Proceedings of the 21st National Conference on Communications (NCC), Mumbai, pp. 1–6, 2015. doi: 10.1109/NCC.2015.7084846.

[8] R. K. Das, S. Jelil and S. R. M. Prasanna, Significance of constraining text in limited data text-independent speaker verification, in: Signal Processing and Communications (SPCOM), 2016 International Conference on, Bangalore, pp. 1–5, 2016. doi: 10.1109/SPCOM.2016.7746659.

[9] R. K. Das, S. Jelil and S. R. M. Prasanna, Development of multi-level speech based person authentication system, J. Signal Process. Syst. 88 (2016), 259–271. doi: 10.1007/s11265-016-1148-z.

[10] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, Front-end factor analysis for speaker verification, IEEE Trans. Audio Speech Lang. Process. 19 (2011), 788–798. doi: 10.1109/TASL.2010.2064307.

[11] S. Dey, S. Barman, R. K. Bhukya, R. K. Das, B. C. Haris, S. R. M. Prasanna and R. Sinha, Speech biometric based attendance system, in: National Conference on Communications (NCC), pp. 1–6, 2014. doi: 10.1109/NCC.2014.6811345.

[12] F. Farahani, P. G. Georgiou and S. S. Narayanan, Speaker identification using supra-segmental pitch pattern dynamics, in: Proc. Int. Conf. Acoust. Speech Signal Process. (May 2004), Montreal, pp. 89–92. doi: 10.1109/ICASSP.2004.1325929.

[13] D. Garcia-Romero and A. McCree, Supervised domain adaptation for i-vector based speaker recognition, in: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 4047–4051, 2014. doi: 10.1109/ICASSP.2014.6854362.

[14] J. Gudnason and M. Brookes, Voice source cepstrum coefficients for speaker identification, in: Proc. ICASSP (2008), Las Vegas, pp. 4821–4824. doi: 10.1109/ICASSP.2008.4518736.

[15] B. C. Haris, G. Pradhan, A. Misra, S. Shukla, R. Sinha and S. R. M. Prasanna, Multi-variability speech database for robust speaker recognition, in: Communications (NCC), 2011 National Conference on, Bangalore, pp. 1–5, 2011. doi: 10.1109/NCC.2011.5734775.

[16] A. O. Hatch, S. S. Kajarekar and A. Stolcke, Within-class covariance normalization for SVM-based speaker recognition, in: Interspeech, Pittsburgh, 2006. doi: 10.21437/Interspeech.2006-183.

[17] W.-C. Hsu, W.-H. Lai and W.-P. Hong, Usefulness of residual-based features in speaker verification and their combination way with linear prediction coefficients, in: Multimedia Workshops, 2007. ISMW'07. Ninth IEEE International Symposium on, pp. 246–251, 2007. doi: 10.1109/ISM.Workshops.2007.49.

[18] A. K. Jain, A. Ross and S. Prabhakar, An introduction to biometric recognition, IEEE Trans. Circuits Syst. Video Technol. (Special Issue on Image- and Video-Based Biometrics) 14 (2004), 4–20. doi: 10.1109/TCSVT.2003.818349.

[19] H. S. Jayanna and S. R. M. Prasanna, Analysis, feature extraction, modeling and testing techniques for speaker recognition, IETE Tech. Rev. 26 (2009), 181–190. doi: 10.4103/0256-4602.50702.

[20] H. S. Jayanna and S. R. M. Prasanna, Limited data speaker identification, Sadhana 35 (2010), 525–546. doi: 10.1007/s12046-010-0043-8.

[21] A. Kanagasundaram, R. Vogt, D. B. Dean, S. Sridharan and M. W. Mason, I-vector based speaker recognition on short utterances, in: Proceedings of the 12th Annual Conference of the International Speech Communication Association, pp. 2341–2344, 2011. doi: 10.21437/Interspeech.2011-58.

[22] T. R. J. Kumari and H. S. Jayanna, Comparison of LPCC and MFCC features and GMM and GMM–UBM modeling for limited data speaker verification, in: IEEE Proc. ICCIC 2014 (2014), 1–6. doi: 10.1109/ICCIC.2014.7238329.

[23] N. Li, M.-W. Mak and J.-T. Chien, DNN-driven mixture of PLDA for robust speaker verification, IEEE/ACM Trans. Audio Speech Lang. Process. 25 (2017), 1371–1383. doi: 10.1109/TASLP.2017.2692304.

[24] K. S. R. Murty and B. Yegnanarayana, Combining evidence from residual phase and MFCC features for speaker recognition, IEEE Signal Process. Lett. 13 (2006), 52–55. doi: 10.1109/LSP.2005.860538.

[25] NIST2003, http://www.itl.nist.gov/iad/mig//tests/sre/2003/2003-spkrec-evalplan-v2.2.pdf [online].

[26] A. Pandey, R. K. Das, N. Adiga, N. Gupta and S. R. M. Prasanna, Significance of glottal activity detection for speaker verification in degraded and limited data condition, in: TENCON 2015 - 2015 IEEE Region 10 Conference, Macao, pp. 1–6, 2015. doi: 10.1109/TENCON.2015.7372916.

[27] S. R. M. Prasanna, C. S. Gupta and B. Yegnanarayana, Extraction of speaker-specific excitation information from linear prediction residual of speech, Speech Commun. 48 (2006), 1243–1261. doi: 10.1016/j.specom.2006.06.002.

[28] D. A. Reynolds, Experimental evaluation of features for robust speaker identification, IEEE Trans. Acoust. Speech Signal Process. 2 (1994), 639–643. doi: 10.1109/89.326623.

[29] D. A. Reynolds, Speaker identification and verification using Gaussian mixture speaker models, Speech Commun. 17 (1995), 91–108. doi: 10.1016/0167-6393(95)00009-D.

[30] D. A. Reynolds and R. C. Rose, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process. 3 (1995), 72–83. doi: 10.1109/89.365379.

[31] D. Reynolds, T. Quatieri and R. Dunn, Speaker verification using adapted Gaussian mixture models, Digit. Signal Process. 10 (2000), 19–41. doi: 10.1006/dspr.1999.0361.

[32] A. E. Rosenberg, Automatic speaker verification: a review, Proc. IEEE 64 (1976), 475–487. doi: 10.1109/PROC.1976.10156.

[33] A. Salman, E. Muhammad and K. Khurshid, Speaker verification using boosted cepstral features with Gaussian distributions, in: Multitopic Conference, 2007. INMIC 2007. IEEE International, Lahore, pp. 1–5, 2007. doi: 10.1109/INMIC.2007.4557681.

[34] G. L. Sarada, N. Hemalatha, T. Nagarajan and H. A. Murthy, Automatic transcription of continuous speech using unsupervised and incremental training, in: Proceedings of Interspeech, Jeju Island, pp. 405–408, 2004. doi: 10.21437/Interspeech.2004-169.

[35] F. K. Soong and A. E. Rosenberg, On the use of instantaneous and transitional spectral information in speaker recognition, IEEE Trans. Acoust. Speech Signal Process. 36 (1988), 871–879. doi: 10.1109/ICASSP.1986.1168882.

[36] S. C. Yin, R. Rose and P. Kenny, A joint factor analysis approach to progressive model adaptation in text-independent speaker verification, IEEE Trans. Audio Speech Lang. Process. 15 (2007), 1999–2010. doi: 10.1109/TASL.2007.902410.

Received: 2016-11-30
Published Online: 2018-05-03

©2020 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 Public License.
