Article, Open Access

Applying Discriminant and Cluster Analyses to Separate Allergenic from Non-allergenic Proteins

L. Naneva, M. Nedyalkova, S. Madurga, F. Mas and V. Simeonov
Published/Copyright: June 3, 2019

Abstract

As a result of increased healthcare requirements and the introduction of genetically modified foods, allergies are becoming a growing health problem. This has prompted the use of new methods such as genomics and proteomics to uncover the nature of allergies. In the present study, a selection of 1400 food proteins was analysed by PLS-DA (Partial Least Squares-based Discriminant Analysis) after suitable transformation of structural parameters into uniform vectors: the descriptor strings of different length obtained for the proteins were converted into vectors of equal length by Auto- and Cross-Covariance (ACC) analysis. Hierarchical and non-hierarchical (K-means) Cluster Analysis (CA) was also performed in order to reach a certain level of separation within a small training set of plant proteins (16 allergenic and 16 non-allergenic) using a new three-dimensional descriptor based on surface protein properties in combination with amino acid hydrophobicity scales. The novelty of the approach to differentiating proteins into allergenic and non-allergenic classes is described in the article.

The general goal of the present study was to show the effectiveness of a traditional chemometric classification method (PLS-DA) and the ability of Cluster Analysis (CA) to separate allergenic from non-allergenic proteins by multivariate statistical methods.

1 Introduction

Allergies represent one of the most important health problems faced by humanity. Allergic reactions are caused by various food sources such as eggs, soybeans, fruits, vegetables, marine and dairy products [1, 2, 3, 4, 5]. The introduction of genetically modified foods has made allergies an even more concerning problem. The term “allergy” was introduced in 1906 by the Austrian pediatrician Clemens Pirquet to indicate the altered reaction in some children injected prophylactically with an anti-infiltrating vaccine. In an allergic response, the body’s reactivity to certain factors called allergens has been altered or impaired. Allergens provoke the body to produce neutralizing antibodies. Initially, the interaction between allergens and antibodies can go unnoticed.

Food allergy is a condition in which the body reacts negatively to food due to a response of the immune system to a nutritional protein. Food allergy differs from other bodily reactions to food such as food intolerance, drug intolerance, and toxin-mediated reactions. Food intolerance is the inability of the body to process a nutrient properly, usually due to the lack of an enzyme, whereas in food allergy the immune system generates antibody responses to the absorbed food [6]. An allergic reaction occurs when the susceptible organism is exposed to a specific protein. Because the body perceives this protein (allergen) as a threat, it begins to produce T-helper lymphocytes (Th2) that release interleukins. Interleukins increase the production of antibodies called immunoglobulins E (IgE) from B-cells. The body reacts by producing a large amount of these antibodies, which bind to mast cells in the blood. When the same allergen is reintroduced into the body, it binds to the antibodies located on the mast cells. As a result of the antigen-antibody response, mast cells release histamine, which causes the allergic symptoms, including redness, swelling, and itching [7].

Recognition of allergenic proteins is important because of the increasing usage of modified proteins in foods, medicines, household chemicals, and other products [8]. According to the Food and Agriculture Organization of the United Nations (FAO) and the World Health Organization (WHO), a protein is a potential allergen if it shares 6 to 8 consecutive amino acids with an already known allergen, or shows 35% similarity to a known allergen within a window of 80 amino acid residues.
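The FAO/WHO criterion can be illustrated with a short sketch. The code below is a simplified, ungapped approximation written for this illustration only: the function names, the use of exact position-by-position identity instead of an alignment-based similarity score, and the default parameters (k = 6, window = 80, cutoff = 0.35) are assumptions, not the procedure used by FAO/WHO tools or by the authors.

```python
def shares_exact_stretch(query, allergen, k=6):
    """True if the two sequences share at least one identical run of k residues."""
    kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in kmers for i in range(len(query) - k + 1))


def max_window_identity(query, allergen, window=80):
    """Best fraction of identical positions over all ungapped window pairs.

    Assumption: no gaps are allowed, so this underestimates what a real
    alignment tool would report.
    """
    best = 0.0
    for i in range(max(1, len(query) - window + 1)):
        q = query[i:i + window]
        for j in range(max(1, len(allergen) - window + 1)):
            a = allergen[j:j + window]
            matches = sum(1 for x, y in zip(q, a) if x == y)
            best = max(best, matches / window)
    return best


def flagged_as_potential_allergen(query, allergen, k=6, window=80, cutoff=0.35):
    """Apply both parts of the rule to one query/allergen pair."""
    return (shares_exact_stretch(query, allergen, k)
            or max_window_identity(query, allergen, window) > cutoff)
```

In practice the query would be compared against every sequence in an allergen database; this pairwise function only shows the logic of the two tests.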

In this study, we describe two methods for predicting allergenicity based on either the linear sequence of amino acids or the spatial distribution of amino acids on the surface. In the first method, descriptors [9] combined with an Auto-Cross-Covariance (ACC) transformation of protein sequences into uniform vectors of the same length [10] are applied to a large data set of allergenic proteins. ACC has been used for Quantitative Structure-Activity Relationship (QSAR) studies of peptides, protein classification and prediction of immunogenicity [11, 12, 13]. In the present study, the transformed protein data are used in PLS-DA for reliable classification of allergenic proteins [15,16].

In the second method, the crystallographic structure of the allergenic protein is required. Cluster analysis (hierarchical and non-hierarchical) is used here as the classification methodology for separating allergenic from non-allergenic proteins.

The aim of this study is to demonstrate the ability of different chemometric methods, using amino acid sequence information or the spatial distribution of surface amino acids, to separate allergenic from non-allergenic proteins.

2 Datasets and Methods

2.1 Protein datasets

A dataset of 700 food allergens and 700 non-allergens was collected from the databases CSL (Central Science Laboratory) (http://allergen.csl.gov.uk), FARRP (Food Allergen Research and Resource Program) (http://www.allergenonline.org) and SDAP (Structural Database of Allergenic Proteins) (http://fermi.utmb.edu/SDAP/sdap_man html). The non-allergens were selected from the same species using a BLAST (Basic Local Alignment Search Tool) search with 0% identity to allergens at E-value 0.001 [15]. The final dataset contained 1400 proteins.

Additionally, a training set of 32 plant proteins was selected to check the ability of cluster analysis (hierarchical and non-hierarchical mode) to correctly separate proteins into allergenic and non-allergenic classes based on surface protein descriptors. A data set of 16 allergenic proteins related to foods was selected. These proteins are classified as allergens by the Protein Data Bank (PDB) (https://www.rcsb.org/) and/or by the Structural Database of Allergenic Proteins (SDAP) (http://fermi.utmb.edu/). These allergenic proteins occur in one of the following foods: apple, barley, castor bean, cattle, coconut, fungi, legumin, maize, papaya, peach, peanut, olive or soybean. A complementary data set of 16 structures of non-allergenic proteins was also selected from proteins that are constituents of the following foods: apple, barley, castor bean, cattle, maize, papaya, peanut, or soybean. In the description of these proteins, they are not indicated to be either allergenic or related to cancer.

2.2 Protein sequences by E-descriptors and ACC transformation

The E-descriptors for the 20 naturally occurring amino acids, defined by Venkatarajan and Braun [9], were derived by Principal Components Analysis (PCA) of a data matrix consisting of 237 physicochemical properties. The first principal component (E1) reflects the hydrophobicity of the amino acid, the second (E2) reflects its size, the third (E3) reflects its helix-forming propensity, the fourth (E4) correlates with the relative abundance of the amino acid, and the fifth (E5) describes its strand-forming propensity.

The values of the five E-descriptors used in the present study to describe the protein sequences are given in Table 1. To make the length of the proteins uniform, an Auto- and Cross-Covariance (ACC) transformation was used [10]. Auto-covariance Ajj(l) and cross-covariance Cjk(l) were calculated according to the following equations:

A_{jj}(l) = \sum_{i}^{n-l} \frac{E_{j,i} \times E_{j,i+l}}{n-l}

C_{jk}(l) = \sum_{i}^{n-l} \frac{E_{j,i} \times E_{k,i+l}}{n-l}

where the indices j and k refer to the E-descriptors (j = 1-5, k = 1-5, j ≠ k), n is the number of amino acids in a sequence, the index i gives the amino acid position (i = 1, 2, …, n) and l is the lag (l = 1, 2, …, L). Short lags (L = 8) were chosen, as only the influence of amino acids in close proximity was investigated. The subsets of allergens and non-allergens were transformed into matrices with 200 variables (5 x 5 x 8) each.
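The transformation can be sketched in a few lines of Python. This is an illustrative implementation written for this text, not the authors' code: the function name `acc_transform`, the dictionary `E_DESCRIPTORS` (values copied from Table 1) and the variable naming scheme `ACC<j><k><lag>` are assumptions made to match the description in the article.

```python
# E1..E5 values per amino acid, copied from Table 1.
E_DESCRIPTORS = {
    "A": (0.008, 0.134, -0.475, -0.039, 0.181),
    "R": (0.171, -0.361, 0.107, -0.258, -0.364),
    "N": (0.255, 0.038, 0.117, 0.118, -0.055),
    "D": (0.303, -0.057, -0.014, 0.225, 0.156),
    "C": (-0.132, 0.174, 0.07, 0.565, -0.374),
    "Q": (0.149, -0.184, 0.03, 0.035, -0.112),
    "E": (0.221, -0.28, -0.315, 0.157, 0.303),
    "G": (0.218, 0.562, -0.024, 0.018, 0.106),
    "H": (0.023, -0.177, 0.041, 0.28, -0.021),
    "I": (-0.353, 0.071, -0.088, -0.195, -0.107),
    "L": (-0.267, 0.018, -0.265, -0.274, 0.206),
    "K": (0.243, -0.339, -0.044, -0.325, -0.027),
    "M": (-0.239, -0.141, -0.155, 0.321, 0.077),
    "F": (-0.329, -0.023, 0.072, -0.002, 0.208),
    "P": (0.173, 0.286, 0.407, -0.215, 0.384),
    "S": (0.199, 0.238, -0.015, -0.068, -0.196),
    "T": (0.068, 0.147, -0.015, -0.132, -0.274),
    "W": (-0.296, -0.186, 0.389, 0.083, 0.297),
    "Y": (-0.141, -0.057, 0.425, -0.096, -0.091),
    "V": (-0.274, 0.136, -0.187, -0.196, -0.299),
}


def acc_transform(sequence, max_lag=8):
    """Map a protein sequence of any length to 5 x 5 x max_lag ACC values."""
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than the largest lag")
    desc = [E_DESCRIPTORS[aa] for aa in sequence]
    features = {}
    for j in range(5):          # descriptor taken at position i
        for k in range(5):      # descriptor taken at position i + lag
            for lag in range(1, max_lag + 1):
                total = sum(desc[i][j] * desc[i + lag][k] for i in range(n - lag))
                # j == k gives an auto-covariance term, j != k a cross-covariance term
                features[f"ACC{j + 1}{k + 1}{lag}"] = total / (n - lag)
    return features
```

With five descriptors and eight lags this yields 5 auto-covariance and 20 cross-covariance pairs per lag, that is 25 x 8 = 200 values for every protein regardless of its length.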

Table 1

E-descriptors of amino acids.

Amino acid               E1      E2      E3      E4      E5
Alanine (A)           0.008   0.134  -0.475  -0.039   0.181
Arginine (R)          0.171  -0.361   0.107  -0.258  -0.364
Asparagine (N)        0.255   0.038   0.117   0.118  -0.055
Aspartic acid (D)     0.303  -0.057  -0.014   0.225   0.156
Cysteine (C)         -0.132   0.174   0.07    0.565  -0.374
Glutamine (Q)         0.149  -0.184   0.03    0.035  -0.112
Glutamic acid (E)     0.221  -0.28   -0.315   0.157   0.303
Glycine (G)           0.218   0.562  -0.024   0.018   0.106
Histidine (H)         0.023  -0.177   0.041   0.28   -0.021
Isoleucine (I)       -0.353   0.071  -0.088  -0.195  -0.107
Leucine (L)          -0.267   0.018  -0.265  -0.274   0.206
Lysine (K)            0.243  -0.339  -0.044  -0.325  -0.027
Methionine (M)       -0.239  -0.141  -0.155   0.321   0.077
Phenylalanine (F)    -0.329  -0.023   0.072  -0.002   0.208
Proline (P)           0.173   0.286   0.407  -0.215   0.384
Serine (S)            0.199   0.238  -0.015  -0.068  -0.196
Threonine (T)         0.068   0.147  -0.015  -0.132  -0.274
Tryptophan (W)       -0.296  -0.186   0.389   0.083   0.297
Tyrosine (Y)         -0.141  -0.057   0.425  -0.096  -0.091
Valine (V)           -0.274   0.136  -0.187  -0.196  -0.299

2.3 Surface Descriptor for proteins

In the present study, a new type of molecular descriptor based on surface properties is defined. This descriptor characterizes the environment of each type of amino acid present on the surface of the protein. The idea is to classify proteins in terms of their surface properties instead of the global composition of the protein. In order to characterize the amino acids chemically (polar, non-polar or charged), a set of hydrophobicity scales is used [11].

The descriptor ds,r of scale s for the residue r (any of the 20 amino acids) is obtained from

d_{s,r} = \frac{1}{n_{sc}(r)} \sum_{r'=1}^{n_{sc}(r)} s_{r'}

where nsc(r) is the number of surface contacts established between residues r and r’, and sr’ is the value of the hydrophobicity scale for the residue r’.

A surface contact is defined when two residues are on the surface and their alpha carbons are less than 8 angstroms apart. To decide whether a residue is on the surface, the Residue Depth module of the BioPython package is used to determine its distance from the surface. A residue with an average distance from the surface below 2.5 angstroms is considered to belong to the protein surface. If a residue r is not present on the surface, a value of 0 is assigned to the ds,r descriptor.
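A minimal sketch of this descriptor for a single hydrophobicity scale is given below. It assumes that the alpha-carbon coordinates and residue depths have already been extracted (for example with BioPython), so the input format, the function name `surface_descriptors` and the choice to pool all contacts of all surface residues of the same type r are illustrative assumptions rather than the authors' implementation.

```python
import math


def surface_descriptors(residues, scale, max_depth=2.5, max_ca_dist=8.0):
    """Average scale value of the surface neighbours of each residue type.

    residues: list of (one_letter_code, (x, y, z) of the alpha carbon, depth)
    scale:    dict mapping one-letter code to a hydrophobicity value
    """
    # Keep only residues whose depth places them on the protein surface.
    surface = [(aa, xyz) for aa, xyz, depth in residues if depth < max_depth]
    sums, counts = {}, {}
    for a, (aa, xyz) in enumerate(surface):
        for b, (other_aa, other_xyz) in enumerate(surface):
            # A contact: two distinct surface residues with close alpha carbons.
            if a != b and math.dist(xyz, other_xyz) < max_ca_dist:
                sums[aa] = sums.get(aa, 0.0) + scale[other_aa]
                counts[aa] = counts.get(aa, 0) + 1
    # Residue types that are absent from the surface (or have no contacts,
    # an assumption of this sketch) get the value 0.
    return {aa: (sums[aa] / counts[aa] if counts.get(aa) else 0.0) for aa in scale}
```

Repeating the calculation for each of the 20 scales and 20 residue types gives the full set of ds,r values for one protein.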

2.4 Selection of Hydrophobicity Scales

In order to calculate surface descriptors for any amino acid r, different scales s are needed. These scales have to be able to separate the amino acids according to the nonpolar, polar or ionic character of their side chains. For that reason, an initial selection of 20 experimentally determined hydrophobicity scales was used [11]. These scales are mainly derived from partition coefficients obtained from measurements of amino acid solubility in water and in organic solvents. Depending on the measuring technique, organic solvent, chromatographic column and experimental procedure, different hydrophobicity scales are obtained.

2.5 Partial Least Squares–based Discriminant Analysis (PLS-DA)

Discriminant Analysis (DA) is a method for data classification based on a linear combination of explanatory variables (Ligand-based design manual, Sybyl [12]). Partial Least Squares (PLS)–based DA was used in the present study. PLS forms new X variables, called Principal Components (PC), as linear combinations of old variables, and then uses them to predict class membership. The optimum number of PCs was selected by adding components until the next added component explained less than 10% of the variance. In the present study, PLS-DA was performed by Soft Independent Modeling of Class Analogy (SIMCA) P-8.0 [13].
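The idea behind PLS-DA can be shown with a deliberately reduced sketch that extracts a single PLS component and classifies at a threshold of 0.5. The study itself used SIMCA-P with several components; the pure-Python functions below (`fit_pls_da_one_component`, `predict_score`, `classify`) are hypothetical names introduced for illustration, and class membership is assumed to be coded as 1 (allergen) and 0 (non-allergen).

```python
def fit_pls_da_one_component(X, y):
    """Fit a one-component PLS regression of the 0/1 class label on X."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores of the training objects on the new latent variable.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regression of the centred class label on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q


def predict_score(model, x):
    """Predicted class value (ideally near 1 or 0) for one object."""
    x_mean, y_mean, w, q = model
    t = sum((x[j] - x_mean[j]) * w[j] for j in range(len(w)))
    return y_mean + q * t


def classify(model, x, threshold=0.5):
    """Assign class 1 when the predicted value reaches the threshold."""
    return 1 if predict_score(model, x) >= threshold else 0
```

Further components would be obtained by deflating X and y with the extracted scores and repeating the same steps, which is omitted here for brevity.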

2.6 Receiver Operating Characteristics (ROC) statistics

The predictive ability of the derived final model was assessed by Receiver Operating Characteristic (ROC) statistics [14]. Four outcomes are possible in ROC statistics: true positives (TP, true allergens predicted as allergens); true negatives (TN, true non-allergens predicted as non-allergens); false positives (FP, true non-allergens predicted as allergens); and false negatives (FN, true allergens predicted as non-allergens). Three classification functions were used in the present study: sensitivity (true positives/total positives), specificity (true negatives/total negatives) and accuracy (true positives and negatives/total). Sensitivity, specificity and accuracy were calculated at different thresholds and the area under the ROC curve (sensitivity vs. 1-specificity), AROC, was calculated. AROC is a quantitative measure of predictive ability and varies from 0.5 for random prediction to 1.0 for perfect prediction.
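These statistics are straightforward to compute from predicted scores and true class labels. The sketch below uses hypothetical function names and assumes labels coded as 1 (allergen) and 0 (non-allergen); the area under the ROC curve is computed through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative), which equals the area obtained by sweeping the threshold.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Return (TP, TN, FP, FN) for predictions made at the given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp, tn, fp, fn


def classification_metrics(scores, labels, threshold=0.5):
    """Sensitivity, specificity and accuracy at one threshold."""
    tp, tn, fp, fn = confusion_counts(scores, labels, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def area_under_roc(scores, labels):
    """AROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a correct ranking.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```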

2.7 Model validation

The models derived in the present study were validated by Cross-Validation (CV) and by an external test set. CV is a procedure for testing the predictive ability of models. The training set is divided into several groups with approximately equal numbers of members in each group. One group is defined as a test set and the rest form a new training set. The new training set is used to derive a model, and the test set is used to test its predictivity. To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.

The derived models were also validated by an external test set containing allergens and non-allergens not included in the training set. The predictive ability of the models was estimated by the parameters sensitivity, specificity, accuracy and AROC.
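The partitioning scheme described above can be sketched as follows. The helper names `k_fold_indices` and `cross_validate`, the random shuffling with a fixed seed, and the `fit`/`score` callables supplied by the caller are assumptions of this illustration; the article does not state how many groups were used.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Split sample indices into n_folds groups of nearly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cross_validate(X, y, n_folds, fit, score, seed=0):
    """Average the score obtained when each group is held out once."""
    results = []
    for test_idx in k_fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Derive the model on the remaining groups only.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Test its predictivity on the held-out group.
        results.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(results) / len(results)
```

Calling `cross_validate` several times with different `seed` values and averaging the results corresponds to the multiple rounds with different partitions mentioned in the text.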

2.8 Cluster analysis for protein separation

Cluster Analysis (CA) is a general term for a series of calculation procedures used for classification and grouping of objects or of the variables describing the objects [17,18]. The major goal of CA is to find optimal groupings of observations or their descriptive variables in such a way that the members of a cluster are similar to each other and the clusters formed are different from each other. In hierarchical clustering, the number of groups is not known in advance, whereas non-hierarchical clustering requires a priori determination of the number of groups for data interpretation.

Each object in the data set can be represented by an object vector Xi. In order to interpret the data structure, a similarity measure such as the Euclidean distance should be introduced [17]. Unwanted effects of differing variable scales on the data structure are avoided by data transformation, most commonly autoscaling (z-transformation) [17]. The graphical output of hierarchical clustering is known as a dendrogram.

The next important step after autoscaling and distance determination is the choice of the linkage algorithm. There are many options, but hierarchical clustering often relies on Ward’s method of linkage and non-hierarchical clustering on the K-means mode.
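Autoscaling and the Euclidean distance between two object vectors can be written compactly. The sketch below uses hypothetical function names and assumes the sample standard deviation (divisor n - 1); a variable with zero variance would cause a division by zero and would have to be removed beforehand.

```python
import math


def autoscale(rows):
    """Z-transform each variable (column) to zero mean and unit standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two object vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

After autoscaling, a variable measured in large units no longer dominates the distance simply because of its numerical range.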

It has to be mentioned that in non-hierarchical clustering all a priori required clusters are simultaneously obtained and this grouping does not possess hierarchy.
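For illustration, a basic K-means procedure (Lloyd's algorithm) is sketched below. The function name `k_means` and the initialization with the first k objects are assumptions made to keep the example deterministic; statistical packages normally use randomized or repeated initializations.

```python
def k_means(points, k, n_iter=100):
    """Partition points into k clusters; returns (assignment, centroids)."""
    # Assumption of this sketch: the first k objects serve as starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(n_iter):
        # Assign every object to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster: converged
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

All k clusters are produced at once, and the result carries no information on how clusters would merge or split at other values of k, which is the absence of hierarchy noted above.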

Ethical approval: The conducted research is not related to either human or animal use.

3 Results and Discussion

In order to derive a preliminary model for allergenicity prediction, a small set of 120 allergens and 120 non-allergens was compiled randomly from the set of 1400 proteins used in the study. The structure of proteins was described by the five E-descriptors and each protein was transformed into a string of 200 variables, applying ACC-transformation, as described in “Datasets and Methods”. The two-class matrix consisting of 240 proteins and 200 variables was subjected to PLS-DA with numbers of principal components varying from 1 to 4. The models were evaluated according to sensitivity, specificity and accuracy at threshold 0.5. The area under the ROC curve (AROC) also was recorded. The results are shown in Figure 1.

Figure 1 Sensitivity, specificity and accuracy at threshold 0.5, and AROC for the preliminary model for allergenicity prediction with different number of PCs.

The preliminary model for allergenicity prediction is shown in Table 2. The ACC variables are named as follows: the first digit corresponds to the E-descriptor for the i-th amino acid in the protein; the second digit corresponds to the E-descriptor for the j-th amino acid; and the third digit shows the lag. For example, ACC121 is the sum of ACC values calculated using the E1 and E2 scales with a lag of 1 (first and second, second and third, third and fourth, etc.). The variables in the model are ordered by their Variable Importance in Projection (VIP) values. Variables with VIP > 2.0 are essential to the model. Nineteen variables (32.5%) in the model have a VIP > 2.0. To differentiate between the most important, the threshold for VIP was increased to 1.500. Variables that meet this threshold include ACC121, ACC447, ACC444, ACC228, ACC222, ACC141, ACC243 and ACC246. ACC444, ACC228, ACC222, ACC141 and ACC243 have positive coefficients, while ACC121, ACC447 and ACC246 have negative ones. This means that proteins having negative ACC121, ACC447 and ACC246, and positive ACC444, ACC228, ACC222, ACC141 and ACC243 are likely to act as allergens.

Table 2

VIP values and coefficients of the preliminary model for allergenicity prediction. The constant of the model is 0.998. Variables with VIP > 2.0 and coefficients > |0.100| are given in bold.

Variable    VIP    coef.    Variable    VIP    coef.    Variable    VIP    coef.
ACC121    2.759   -0.178    ACC518    1.635   -0.078    ACC117    1.439    0.003
ACC447    2.554   -0.162    ACC344    1.639   -0.078    ACC414    1.436   -0.036
ACC444    2.448    0.32     ACC335    1.605   -0.097    ACC451    1.426   -0.047
ACC228    2.289    0.52     ACC427    1.589    0.001    ACC255    1.404   -0.055
ACC222    2.198    0.84     ACC353    1.587   -0.096    ACC245    1.379   -0.024
ACC141    2.136    0.65     ACC118    1.583   -0.071    ACC146    1.375    0.032
ACC243    2.115    0.099    ACC147    1.566   -0.031    ACC252    1.369    0.029
ACC246    2.093   -0.142    ACC442    1.552    0.030    ACC244    1.369   -0.051
ACC323    1.961    0.050    ACC523    1.543    0.092    ACC242    1.367   -0.045
ACC122    1.929   -0.083    ACC342    1.519   -0.073    ACC311    1.361   -0.020
ACC144    1.897    0.083    ACC418    1.492   -0.085    ACC526    1.357    0.043
ACC128    1.799   -0.115    ACC438    1.484   -0.081    ACC345    1.338    0.002
ACC211    1.748   -0.125    ACC114    1.482   -0.009    ACC426    1.325   -0.017
ACC544    1.716    0.116    ACC251    1.468   -0.014    ACC124    1.322   -0.021
ACC324    1.710    0.103    ACC443    1.466    0.064    ACC358    1.264    0.007
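The naming convention of the ACC variables can be decoded mechanically. The small helper below is a hypothetical utility written for this illustration; it assumes names of the form "ACC" followed by exactly three digits (first descriptor, second descriptor, lag), which holds for five descriptors and lags up to 8.

```python
def parse_acc_name(name):
    """Decode a variable name such as 'ACC121' into descriptors and lag."""
    if len(name) != 6 or not name.startswith("ACC") or not name[3:].isdigit():
        raise ValueError(f"unexpected ACC variable name: {name!r}")
    first, second, lag = (int(ch) for ch in name[3:])
    # Equal descriptor indices denote auto-covariance, different ones cross-covariance.
    kind = "auto" if first == second else "cross"
    return {"first": f"E{first}", "second": f"E{second}", "lag": lag, "kind": kind}
```

For example, ACC121 decodes to a cross-covariance of E1 and E2 at lag 1, and ACC447 to an auto-covariance of E4 at lag 7.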

Further, the preliminary model was used to predict the allergenicity of an external test set of 580 allergens and 580 non-allergens. It recognized 68% of the allergens and 77% of the non-allergens with 73% total accuracy at threshold 0.5. The AROC value was 0.785.
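The relation between the three reported figures can be checked with a few lines of arithmetic. The function name below is hypothetical, and the inputs are the rounded percentages quoted in the text, so the result is only approximate.

```python
def overall_accuracy(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity for given class sizes."""
    true_pos = sensitivity * n_pos   # correctly recognized allergens
    true_neg = specificity * n_neg   # correctly recognized non-allergens
    return (true_pos + true_neg) / (n_pos + n_neg)


# External test set: 580 allergens and 580 non-allergens.
accuracy = overall_accuracy(0.68, 0.77, 580, 580)
```

Because the two classes are of equal size, the accuracy is simply the mean of sensitivity and specificity, about 0.725, which agrees with the reported 73% given that the inputs are themselves rounded.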

Encouraged by the good predictability of the preliminary model, we derived an extended model for allergenicity prediction based on 700 food allergens and 700 non-allergens. The structure of the proteins was described by the five E-descriptors and ACC-transformed into strings of 200 variables. The two-class matrix consisting of 1,400 proteins and 200 variables was subjected to PLS-DA with the number of PCs varying from 1 to 4. The models were evaluated according to sensitivity, specificity and accuracy at threshold 0.5. AROC was also recorded. The results are shown in Figure 2.

Figure 2 Sensitivity, specificity and accuracy at threshold 0.5, and AROC for the extended model for allergenicity prediction with different number of PCs.

The results showed that the highest values of the parameters are obtained by three PCs. The model with 4 PCs and the VIP values of the variables are shown in Figure 2. Variables with VIP > 2.0 have the greatest significance for the model and coincide with those of the preliminary model, which confirms the importance of the variables identified there.

3.1 Cluster Analysis (CA) as separation tool for allergenicity using a surface protein descriptor

In order to check the option for separation of proteins into allergenicity and non-allergenicity classes using the surface properties of proteins, a data set of 32 food proteins was prepared (16 allergenic and 16 non-allergenic). A new set of descriptors, ds,r was created based on the reference data for hydrophobicity of the amino acid components.

Values of 20 experimentally determined hydrophobicity scales [11] were used after the transformation. Thus, a training set of 38 proteins (19 allergenic and 19 non-allergenic) described by the 98 most significant descriptors out of a total of 400 (20 scales x 20 amino acids) was treated by Cluster Analysis (CA). The variable reduction was performed by Principal Components Analysis (PCA).

In Figure 3 the hierarchical dendrogram for separation of the proteins into classes of allergenicity and non-allergenicity is shown.

Figure 3 Hierarchical dendrogram for separation of allergenic (a) from non-allergenic (na) proteins.

As seen in Figure 3 the separation between allergenic (a) and non-allergenic (na) classes of food proteins is well expressed. Two major clusters are formed:

  1. K1 (lower left) with a total of 17 members, including 12 (a) and 5 (na), which could be conditionally named the allergenic protein cluster. Correctly classified are 12 allergenic proteins out of a total of 17 members (71%). Five out of a total of 17 members (29%) were wrongly classified as allergenic proteins;

  2. K2 (upper left) with a total of 21 members, including 14 (na) and 7 (a), which could be conditionally named the non-allergenic protein cluster. Correctly classified are 14 non-allergenic proteins out of 21 members (67%), while 7 out of 21 members (33%) were wrongly classified as non-allergenic proteins.
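The composition of the two clusters can be summarized programmatically. The sketch below takes the member counts stated in the list above; the function names and the idea of labelling each cluster by its majority class are assumptions of this illustration.

```python
# Member counts per cluster as given in the text (a = allergenic, na = non-allergenic).
clusters = {"K1": {"a": 12, "na": 5}, "K2": {"a": 7, "na": 14}}


def cluster_summary(clusters):
    """Majority class, its count, cluster size and rounded percentage per cluster."""
    summary = {}
    for name, counts in clusters.items():
        total = sum(counts.values())
        majority = max(counts, key=counts.get)
        percent = round(100 * counts[majority] / total)
        summary[name] = (majority, counts[majority], total, percent)
    return summary


def overall_agreement(clusters):
    """Fraction of all objects that fall in a cluster matching their own class."""
    agree = sum(max(counts.values()) for counts in clusters.values())
    total = sum(sum(counts.values()) for counts in clusters.values())
    return agree / total
```

Taken over both clusters, 26 of the 38 proteins lie in the cluster named after their own class.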

The non-hierarchical clustering (K-means mode) gave the same results after checking an a priori stated hypothesis of separation of all 38 objects (proteins) into two clusters.

The results obtained by Cluster Analysis (CA) are of the same level of efficiency reached by the other classification approach, PLS–DA.

4 Conclusions

Allergenicity of food proteins is a crucial problem associated with the widespread usage of new foods, supplements and herbs, many of which may be of genetically modified origin. Allergenicity is a subtle, nonlinearly coded property. Most of the existing methods for allergenicity prediction are based on structural similarities of novel proteins to known allergens. Thus, novel, structurally diverse allergens cannot be identified by these methods.

In the present study, we propose an alignment-free method for allergenicity prediction, based on the amino acid principal properties of hydrophobicity, size and electronic structure. Proteins are transformed into uniform vectors and analyzed by PLS-DA. Initially, a preliminary model was derived based on a small set of 120 allergens and 120 non-allergens. The model was tested by Cross-Validation and by an external test set, and correctly recognized 73% of the proteins from the external test set. Then, the dataset was extended to 1,400 proteins (700 allergens and 700 non-allergens) and a new model was derived. The Cross-Validation study showed that the extended model is able to correctly identify 70% of the tested proteins.

The food allergens involved in the present study have diverse structure, composition and origin, which implies great variance in the set. By increasing the number of proteins in the training set, the number of PCs needed to explain this variance was increased. In the small initial set used to derive the preliminary model, two PCs were sufficient to obtain a model with good predictive ability. In the extended set of proteins used in the extended model, it was necessary to include a third PC. The model with 4 PCs had the highest predictive ability.

Both models point to the importance of variables ACC121, ACC447 and ACC246. These variables account for the electronic structure of amino acids located in close proximity but not next to each other. In addition, hierarchical and non-hierarchical (K-means) clustering using a surface protein descriptor reached an important level of separation within a small training set of allergenic and non-allergenic food proteins.

These results once again show that allergenicity is a hidden, complex property, depending on many factors, some of which are encoded in the primary structure of proteins and others in the spatial distribution of amino acids on the protein surface.

Acknowledgement

M. Nedyalkova is grateful to the National Scientific Program ICT in SES, financed by the Ministry of Education and Science, and to the L’Oréal Program for Women in Science. Financial support from Generalitat de Catalunya (Grant 2017SGR1033) and from the Spanish Structures of Excellence María de Maeztu program through grant MDM-2017-0767 is gratefully acknowledged.

Conflict of interest: The authors declare no conflict of interest.

References

[1] Sampson H.A., Food allergy. Part 2: diagnosis and management, J. Allergy Clin. Immunol., 1999, 103, 981-989. DOI: 10.1016/S0091-6749(99)70167-3

[2] Sampson H.A., Food allergy. Part 1: immunopathogenesis and clinical disorders, J. Allergy Clin. Immunol., 1999, 103, 717-728. DOI: 10.1016/S0091-6749(99)70411-2

[3] Sampson H.A., Food allergy: when mucosal immunity goes wrong, J. Allergy Clin. Immunol., 2005, 115, 139-141. DOI: 10.1016/j.jaci.2004.11.003

[4] Vanekkrebitz M., Hoffmannsommergruber K., Machado M.L.D., Susani M., Ebner C., Kraft D., et al., Cloning and Sequencing of Mal d 1, the Major Allergen from Apple (Malus domestica), and Its Immunological Relationship to Bet v 1, the Major Birch Pollen Allergen, Biochem. Biophys. Res. Commun., 1995, 214, 538-551. DOI: 10.1006/bbrc.1995.2320

[5] Scheurer S., Son D.Y., Boehm M., Karamloo F., Franke S., Hoffmann A., et al., Cross-reactivity and epitope analysis of Pru a 1, the major cherry allergen, Mol. Immunol., 1999, 36, 155-167. DOI: 10.1016/S0161-5890(99)00033-4

[6] Glaspole I.N., de Leon M.P., Rolland J.M., O’Hehir R.E., Characterization of the T-cell epitopes of a major peanut allergen, Ara h 2, Allergy, 2005, 60, 35-40. DOI: 10.1111/j.1398-9995.2004.00608.x

[7-8] Fitch W.L., McGregor M., Katritzky A.R., Lomaka A., Petrukhin R., Karelson M., Prediction of Ultraviolet Spectral Absorbance Using Quantitative Structure−Property Relationships, J. Chem. Inf. Comput. Sci., 2002, 42, 830-840. DOI: 10.1021/ci010116u

[9] Venkatarajan M.S., Braun W., New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical–chemical properties, J. Mol. Model., 2001, 7, 445-453. DOI: 10.1007/s00894-001-0058-5

[10] Nyström Å., Andersson P.M., Lundstedt T., Multivariate Data Analysis of Topographically Modified α-Melanotropin Analogues using Auto and Cross Auto Covariances (ACC), Quant. Struct.-Act. Relat., 2000, 19, 264-269. DOI: 10.1002/1521-3838(200006)19:3<264::AID-QSAR264>3.0.CO;2-A

[11] Palecz B., Enthalpic Homogeneous Pair Interaction Coefficients of l-α-Amino Acids as a Hydrophobicity Parameter of Amino Acid Side Chains, J. Am. Chem. Soc., 2002, 124, 6003-6008. DOI: 10.1021/ja011937i

[12] Eriksson L., Umetrics, Multi- and megavariate data analysis: basic principles and applications, MKS Umetrics, 2013.

[13] SIMCA-P 8.0. Umetrics UK Ltd., Wokingham Road, RG42 1PL, Bracknell, UK.

[14] Bradley A.P., The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition, 1997, 30, 1145-1159. DOI: 10.1016/S0031-3203(96)00142-2

[15] Ivanciuc O., Schein C.H., Braun W., SDAP: database and computational tools for allergenic proteins, Nucleic Acids Res., 2003, 31, 359-362. DOI: 10.1093/nar/gkg010

[16] Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J., Basic local alignment search tool, J. Mol. Biol., 1990, 215, 403-410. DOI: 10.1016/S0022-2836(05)80360-2

[17] Massart D.L., Kaufman L., The interpretation of analytical chemical data by the use of cluster analysis, John Wiley and Sons, 1989.

[18] Vandeginste B., Massart D., De Jong S., Massaart D., Buydens L., Handbook of chemometrics and qualimetrics: Part B, Elsevier, 1998.

[19] Simeonov V., Classification: Encyclopedia of environmetrics, J. Wiley & Sons, 2001. DOI: 10.1002/9780470057339.vac022

[20] Feldman R., Sanger J., The Text Mining Handbook Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2006. DOI: 10.1017/CBO9780511546914

[21] Leskovec J., Rajaraman A., Ullman J.D., Mining of massive datasets, Cambridge University Press, 2014. DOI: 10.1017/CBO9781139924801

Received: 2018-12-26
Accepted: 2019-01-30
Published Online: 2019-06-03

© 2019 L. Naneva et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 Public License.

  42. Complex formation in a liquid-liquid extraction-chromogenic system for vanadium(IV)
  43. Synthesis, characterization (IR, 1H, 13C & 31P NMR), fungicidal, herbicidal and molecular docking evaluation of steroid phosphorus compounds
  44. Analysis and Biological Evaluation of Arisaema Amuremse Maxim Essential Oil
  45. A preliminary assessment of potential ecological risk and soil contamination by heavy metals around a cement factory, western Saudi Arabia
  46. Anti- inflammatory effect of Prunus tomentosa Thunb total flavones in LPS-induced RAW264.7 cells
  47. Collaborative Influence of Elevated CO2 Concentration and High Temperature on Potato Biomass Accumulation and Characteristics
  48. Methods of extraction, physicochemical properties of alginates and their applications in biomedical field – a review
  49. Characteristics of liposomes derived from egg yolk
  50. Preparation of ternary ZnO/Ag/cellulose and its enhanced photocatalytic degradation property on phenol and benzene in VOCs
  51. Influence of Human Serum Albumin Glycation on the Binding Affinities for Natural Flavonoids
  52. Synthesis and antioxidant activity of 2-methylthio-pyrido[3,2-e][1,2,4] triazolo[1,5-a]pyrimidines
  53. Comparative study on the antioxidant activities of ten common flower teas from China
  54. Molecular Properties of Symmetrical Networks Using Topological Polynomials
  55. Synthesis of Co3O4 Nano Aggregates by Co-precipitation Method and its Catalytic and Fuel Additive Applications
  56. Phytochemical analysis, Antioxidant and Antiprotoscolices potential of ethanol extracts of selected plants species against Echinococcus granulosus: In-vitro study
  57. Silver nanoparticles enhanced fluorescence for sensitive determination of fluoroquinolones in water solutions
  58. Simultaneous Quantification of the New Psychoactive Substances 3-FMC, 3-FPM, 4-CEC, and 4-BMC in Human Blood using GC-MS
  59. Biodiesel Production by Lipids From Indonesian strain of Microalgae Chlorella vulgaris
  60. Miscibility studies of polystyrene/polyvinyl chloride blend in presence of organoclay
  61. Antibacterial Activities of Transition Metal complexes of Mesocyclic Amidine 1,4-diazacycloheptane (DACH)
  62. Novel 1,8-Naphthyridine Derivatives: Design, Synthesis and in vitro screening of their cytotoxic activity against MCF7 cell line
  63. Investigation of Stress Corrosion Cracking Behaviour of Mg-Al-Zn Alloys in Different pH Environments by SSRT Method
  64. Various Combinations of Flame Retardants for Poly (vinyl chloride)
  65. Phenolic compounds and biological activities of rye (Secale cereale L.) grains
  66. Oxidative degradation of gentamicin present in water by an electro-Fenton process and biodegradability improvement
  67. Optimizing Suitable Conditions for the Removal of Ammonium Nitrogen by a Microbe Isolated from Chicken Manure
  68. Anti-inflammatory, antipyretic, analgesic, and antioxidant activities of Haloxylon salicornicum aqueous fraction
  69. The anti-corrosion behaviour of Satureja montana L. extract on iron in NaCl solution
  70. Interleukin-4, hemopexin, and lipoprotein-associated phospholipase A2 are significantly increased in patients with unstable carotid plaque
  71. A comparative study of the crystal structures of 2-(4-(2-(4-(3-chlorophenyl)pipera -zinyl)ethyl) benzyl)isoindoline-1,3-dione by synchrotron radiation X-ray powder diffraction and single-crystal X-ray diffraction
  72. Conceptual DFT as a Novel Chemoinformatics Tool for Studying the Chemical Reactivity Properties of the Amatoxin Family of Fungal Peptides
  73. Occurrence of Aflatoxin M1 in Milk-based Mithae samples from Pakistan
  74. Kinetics of Iron Removal From Ti-Extraction Blast Furnace Slag by Chlorination Calcination
  75. Increasing the activity of DNAzyme based on the telomeric sequence: 2’-OMe-RNA and LNA modifications
  76. Exploring the optoelectronic properties of a chromene-appended pyrimidone derivative for photovoltaic applications
  77. Effect of He Qi San on DNA Methylation in Type 2 Diabetes Mellitus Patients with Phlegm-blood Stasis Syndrome
  78. Cyclodextrin potentiometric sensors based on selective recognition sites for procainamide: Comparative and theoretical study
  79. Greener synthesis of dimethyl carbonate from carbon dioxide and methanol using a tunable ionic liquid catalyst
  80. Nonisothermal Cold Crystallization Kinetics of Poly(lactic acid)/Bacterial Poly(hydroxyoctanoate) (PHO)/Talc
  81. Enhanced adsorption of sulfonamide antibiotics in water by modified biochar derived from bagasse
  82. Study on the Mechanism of Shugan Xiaozhi Fang on Cells with Non-alcoholic Fatty Liver Disease
  83. Comparative Effects of Salt and Alkali Stress on Antioxidant System in Cotton (Gossypium Hirsutum L.) Leaves
  84. Optimization of chromatographic systems for analysis of selected psychotropic drugs and their metabolites in serum and saliva by HPLC in order to monitor therapeutic drugs
  85. Electrocatalytic Properties of Ni-Doped BaFe12O19 for Oxygen Evolution in Alkaline Solution
  86. Study on the removal of high contents of ammonium from piggery wastewater by clinoptilolite and the corresponding mechanisms
  87. Phytochemistry and toxicological assessment of Bryonia dioica roots used in north-African alternative medicine
  88. The essential oil composition of selected Hemerocallis cultivars and their biological activity
  89. Mechanical Properties of Carbon Fiber Reinforced Nanocrystalline Nickel Composite Electroforming Deposit
  90. Anti-c-myc efficacy block EGFL7 induced prolactinoma tumorigenesis
  91. Topical Issue on Applications of Mathematics in Chemistry
  92. Zagreb Connection Number Index of Nanotubes and Regular Hexagonal Lattice
  93. The Sanskruti index of trees and unicyclic graphs
  94. Valency-based molecular descriptors of Bakelite network BNmn
  95. Computing Topological Indices for Para-Line Graphs of Anthracene
  96. Zagreb Polynomials and redefined Zagreb indices of Dendrimers and Polyomino Chains
  97. Topological Descriptor of 2-Dimensional Silicon Carbons and Their Applications
  98. Topological invariants for the line graphs of some classes of graphs
  99. Words for maximal Subgroups of Fi24
  100. Generators of Maximal Subgroups of Harada-Norton and some Linear Groups
  101. Special Issue on POKOCHA 2018
  102. Influence of Production Parameters on the Content of Polyphenolic Compounds in Extruded Porridge Enriched with Chokeberry Fruit (Aronia melanocarpa (Michx.) Elliott)
  103. Effects of Supercritical Carbon Dioxide Extraction (SC-CO2) on the content of tiliroside in the extracts from Tilia L. flowers
  104. Impact of xanthan gum addition on phenolic acids composition and selected properties of new gluten-free maize-field bean pasta
  105. Impact of storage temperature and time on Moldavian dragonhead oil – spectroscopic and chemometric analysis
  106. The effect of selected substances on the stability of standard solutions in voltammetric analysis of ascorbic acid in fruit juices
  107. Determination of the content of Pb, Cd, Cu, Zn in dairy products from various regions of Poland
  108. Special Issue on IC3PE 2018 Conference
  109. The Photocatalytic Activity of Zns-TiO2 on a Carbon Fiber Prepared by Chemical Bath Deposition
  110. N-octyl chitosan derivatives as amphiphilic carrier agents for herbicide formulations
  111. Kinetics and Mechanistic Study of Hydrolysis of Adenosine Monophosphate Disodium Salt (AMPNa2) in Acidic and Alkaline Media
  112. Antimalarial Activity of Andrographis Paniculata Ness‘s N-hexane Extract and Its Major Compounds
  113. Special Issue on ABB2018 Conference
  114. Special Issue on ICCESEN 2017
  115. Theoretical Diagnostics of Second and Third-order Hyperpolarizabilities of Several Acid Derivatives
  116. Determination of Gamma Rays Efficiency Against Rhizoctonia solani in Potatoes
  117. Studies On Compatibilization Of Recycled Polyethylene/Thermoplastic Starch Blends By Using Different Compatibilizer
  118. Liquid−Liquid Extraction of Linalool from Methyl Eugenol with 1-Ethyl-3-methylimidazolium Hydrogen Sulfate [EMIM][HSO4] Ionic Liquid
  119. Synthesis of Graphene Oxide Through Ultrasonic Assisted Electrochemical Exfoliation
  120. Special Issue on ISCMP 2018
  121. Synthesis and antiproliferative evaluation of some 1,4-naphthoquinone derivatives against human cervical cancer cells
  122. The influence of the grafted aryl groups on the solvation properties of the graphyne and graphdiyne - a MD study
  123. Electrochemical modification of platinum and glassy carbon surfaces with pyridine layers and their use as complexing agents for copper (II) ions
  124. Effect of Electrospinning Process on Total Antioxidant Activity of Electrospun Nanofibers Containing Grape Seed Extract
  125. Effect Of Thermal Treatment Of Trepel At Temperature Range 800-1200˚C
  126. Topical Issue on Agriculture
  127. The effect of Cladophora glomerata exudates on the amino acid composition of Cladophora fracta and Rhizoclonium sp.
  128. Influence of the Static Magnetic Field and Algal Extract on the Germination of Soybean Seeds
  129. The use of UV-induced fluorescence for the assessment of homogeneity of granular mixtures
  130. The use of microorganisms as bio-fertilizers in the cultivation of white lupine
  131. Lyophilized apples on flax oil and ethyl esters of flax oil - stability and antioxidant evaluation
  132. Production of phosphorus biofertilizer based on the renewable materials in large laboratory scale
  133. Human health risk assessment of potential toxic elements in paddy soil and rice (Oryza sativa) from Ugbawka fields, Enugu, Nigeria
  134. Recovery of phosphates(V) from wastewaters of different chemical composition
  135. Special Issue on the 4th Green Chemistry 2018
  136. Dead zone for hydrogenation of propylene reaction carried out on commercial catalyst pellets
  137. Improved thermally stable oligoetherols from 6-aminouracil, ethylene carbonate and boric acid
  138. The role of a chemical loop in removal of hazardous contaminants from coke oven wastewater during its treatment
  139. Combating paraben pollution in surface waters with a variety of photocatalyzed systems: Looking for the most efficient technology
  140. Special Issue on Chemistry Today for Tomorrow 2019
  141. Applying Discriminant and Cluster Analyses to Separate Allergenic from Non-allergenic Proteins
  142. Chemometric Expertise Of Clinical Monitoring Data Of Prolactinoma Patients
  143. Chemomertic Risk Assessment of Soil Pollution
  144. New composite sorbent for speciation analysis of soluble chromium in textiles
  145. Photocatalytic activity of NiFe2O4 and Zn0.5Ni0.5Fe2O4 modified by Eu(III) and Tb(III) for decomposition of Malachite Green
  146. Photophysical and antibacterial activity of light-activated quaternary eosin Y
  147. Spectral properties and biological activity of La(III) and Nd(III) Monensinates
  148. Special Issue on Monitoring, Risk Assessment and Sustainable Management for the Exposure to Environmental Toxins
  149. Soil organic carbon mineralization in relation to microbial dynamics in subtropical red soils dominated by differently sized aggregates
  150. A potential reusable fluorescent aptasensor based on magnetic nanoparticles for ochratoxin A analysis
  151. Special Issue on 13th JCC 2018
  152. Fluorescence study of 5-nitroisatin Schiff base immobilized on SBA-15 for sensing Fe3+
  153. Thermal and Morphology Properties of Cellulose Nanofiber from TEMPO-oxidized Lower part of Empty Fruit Bunches (LEFB)
  154. Encapsulation of Vitamin C in Sesame Liposomes: Computational and Experimental Studies
  155. A comparative study of the utilization of synthetic foaming agent and aluminum powder as pore-forming agents in lightweight geopolymer synthesis
  156. Synthesis of high surface area mesoporous silica SBA-15 by adjusting hydrothermal treatment time and the amount of polyvinyl alcohol
  157. Review of large-pore mesostructured cellular foam (MCF) silica and its applications
  158. Ion Exchange of Benzoate in Ni-Al-Benzoate Layered Double Hydroxide by Amoxicillin
  159. Synthesis And Characterization Of CoMo/Mordenite Catalyst For Hydrotreatment Of Lignin Compound Models
  160. Production of Biodiesel from Nyamplung (Calophyllum inophyllum L.) using Microwave with CaO Catalyst from Eggshell Waste: Optimization of Transesterification Process Parameters
  161. The Study of the Optical Properties of C60 Fullerene in Different Organic Solvents
  162. Composite Material Consisting of HKUST-1 and Indonesian Activated Natural Zeolite and its Application in CO2 Capture
  163. Topical Issue on Environmental Chemistry
  164. Ionic liquids modified cobalt/ZSM-5 as a highly efficient catalyst for enhancing the selectivity towards KA oil in the aerobic oxidation of cyclohexane
  165. Application of Thermal Resistant Gemini Surfactants in Highly Thixotropic Water-in-oil Drilling Fluid System
  166. Screening Study on Rheological Behavior and Phase Transition Point of Polymer-containing Fluids produced under the Oil Freezing Point Temperature
  167. The Chemical Softening Effect and Mechanism of Low Rank Coal Soaked in Alkaline Solution
  168. The Influence Of NO/O2 On The NOx Storage Properties Over A Pt-Ba-Ce/γ-Al2O3 Catalyst
  169. Special Issue on the International conference CosCI 2018
  170. Design of SiO2/TiO2 that Synergistically Increases The Hydrophobicity of Methyltrimethoxysilane Coated Glass
  171. Antidiabetes and Antioxidant agents from Clausena excavata root as medicinal plant of Myanmar
  172. Development of a Gold Immunochromatographic Assay Method Using Candida Biofilm Antigen as a Bioreceptor for Candidiasis in Rats
  173. Special Issue on Applied Biochemistry and Biotechnology 2019
  174. Adsorption of copper ions on Magnolia officinalis residues after solid-phase fermentation with Phanerochaete chrysosporium
  175. Erratum
  176. Erratum to: Sand Dune Characterization For Preparing Metallurgical Grade Silicon
Downloaded on 8.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/chem-2019-0045/html