Application of mutual information estimation for predicting the structural stability of pentapeptides

A. I. Mikhalskii, I. V. Petrov, V. V. Tsurko, A. A. Anashkina, and A. N. Nekrasov
Published/Copyright: October 30, 2020

Abstract

A novel non-parametric method for mutual information estimation is presented. The method is suited to informative feature selection in classification and regression problems. Its performance is demonstrated on the problem of classifying stable short peptides.

MSC 2010: 92-04; 62-07; 62G07

Funding statement: The work was supported by the RFBR (project No. 20–04–01085).


Appendix A. Nonparametric estimation of mutual information

Substituting representation (1.5), $\hat{w}(x,y)=\sum_{l=1}^{n}\alpha_l K(x,y,x_l,y_l)$, into the functional $J_e(\hat{w},\lambda)$, we get

$$J_e(\hat{w},\lambda)=\frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\Bigl(\sum_{l=1}^{n}\alpha_l K(x_i,y_j,x_l,y_l)\Bigr)^{2}-\frac{1}{n}\sum_{i=1}^{n}\sum_{l=1}^{n}\alpha_l K(x_i,y_i,x_l,y_l)+\frac{\lambda}{2}\Bigl\|\sum_{l=1}^{n}\alpha_l K(\cdot,\cdot,x_l,y_l)\Bigr\|^{2}+C.$$

The first summand is transformed to the form

$$\frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\Bigl(\sum_{l=1}^{n}\alpha_l K(x_i,y_j,x_l,y_l)\Bigr)^{2}=\frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{n}\sum_{m=1}^{n}\alpha_l K(x_i,y_j,x_l,y_l)\,\alpha_m K(x_i,y_j,x_m,y_m)=\frac{1}{2n^2}\sum_{l=1}^{n}\sum_{m=1}^{n}\alpha_l\alpha_m\sum_{i=1}^{n}\sum_{j=1}^{n}K(x_i,y_j,x_l,y_l)\,K(x_i,y_j,x_m,y_m)=\frac{1}{2}\sum_{l=1}^{n}\sum_{m=1}^{n}\alpha_l\alpha_m H_{lm},$$

where $H_{lm}=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}K(x_i,y_j,x_l,y_l)\,K(x_i,y_j,x_m,y_m)$.

The second summand is transformed to the form

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{l=1}^{n}\alpha_l K(x_i,y_i,x_l,y_l)=\frac{1}{n}\sum_{l=1}^{n}\alpha_l\sum_{i=1}^{n}K(x_i,y_i,x_l,y_l)=\sum_{l=1}^{n}\alpha_l h_l,$$

where $h_l=\frac{1}{n}\sum_{i=1}^{n}K(x_i,y_i,x_l,y_l)$.

Finally, we calculate the last summand:

$$\frac{\lambda}{2}\Bigl\|\sum_{l=1}^{n}\alpha_l K(\cdot,\cdot,x_l,y_l)\Bigr\|^{2}=\frac{\lambda}{2}\Bigl\langle\sum_{l=1}^{n}\alpha_l K(\cdot,\cdot,x_l,y_l),\,\sum_{m=1}^{n}\alpha_m K(\cdot,\cdot,x_m,y_m)\Bigr\rangle=\frac{\lambda}{2}\sum_{l=1}^{n}\sum_{m=1}^{n}\alpha_l\alpha_m\bigl\langle K(\cdot,\cdot,x_l,y_l),\,K(\cdot,\cdot,x_m,y_m)\bigr\rangle=\frac{\lambda}{2}\sum_{l=1}^{n}\sum_{m=1}^{n}\alpha_l\alpha_m K(x_l,y_l,x_m,y_m).$$

The calculation uses the property of the scalar product in the Hilbert space with reproducing kernel $K(z,t)$, namely $\langle K(z,u),K(t,u)\rangle=K(z,t)$. Denoting by $K$ the matrix with elements $K_{ij}=K(x_i,y_i,x_j,y_j)$, we finally obtain the expression

$$J_e(\alpha,\lambda)=\frac{1}{2}\alpha^{T}H\alpha-\alpha^{T}h+\frac{\lambda}{2}\alpha^{T}K\alpha+C.$$

The minimum of the latter functional is attained where the gradient vanishes, i.e., $H\alpha-h+\lambda K\alpha=0$, which gives the vector

$$\alpha=(H+\lambda K)^{-1}h.$$
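
As a numerical illustration of this closed-form solution, the sketch below assembles $H$, $h$, and $K$ from a paired sample and solves for $\alpha$. The appendix does not fix a particular four-argument kernel, so the sketch assumes a product of Gaussian kernels, $K(x,y,x',y')=k(x,x')\,k(y,y')$; the bandwidth width, the regularization weight lam, and the helper names gaussian_gram and solve_alpha are illustrative choices, not part of the paper.

```python
# Minimal sketch of alpha = (H + lam*K)^{-1} h from the derivation above.
# Assumption (not fixed by the paper): K(x, y, x', y') = k(x, x') * k(y, y')
# with Gaussian k; bandwidth and regularization values are arbitrary examples.
import numpy as np

def gaussian_gram(u, width=1.0):
    """Gram matrix k(u_i, u_j) = exp(-(u_i - u_j)^2 / (2 width^2))."""
    d = u[:, None] - u[None, :]
    return np.exp(-d ** 2 / (2.0 * width ** 2))

def solve_alpha(x, y, lam=0.1, width=1.0):
    """Return alpha = (H + lam*K)^{-1} h for a paired sample (x_i, y_i)."""
    n = len(x)
    Kx = gaussian_gram(x, width)   # Kx[i, l] = k(x_i, x_l)
    Ky = gaussian_gram(y, width)   # Ky[i, l] = k(y_i, y_l)

    # K_{lm} = K(x_l, y_l, x_m, y_m): kernel matrix on the paired sample
    K = Kx * Ky

    # h_l = (1/n) sum_i K(x_i, y_i, x_l, y_l)
    h = K.mean(axis=0)

    # H_{lm} = (1/n^2) sum_{i,j} K(x_i, y_j, x_l, y_l) K(x_i, y_j, x_m, y_m).
    # For the product kernel, K(x_i, y_j, x_l, y_l) = Kx[i, l] * Ky[j, l],
    # so the double sum factorizes: H = (Kx^T Kx) * (Ky^T Ky) / n^2.
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2

    return np.linalg.solve(H + lam * K, h)

# Toy usage on a dependent pair: y is a noisy copy of x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.1 * rng.normal(size=200)
alpha = solve_alpha(x, y)
```

Note that under this product-kernel assumption the double sum defining $H_{lm}$ factorizes into two $n\times n$ Gram-matrix products, reducing the cost of forming $H$ from $O(n^4)$ to $O(n^3)$.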
Received: 2019-10-18
Revised: 2020-07-09
Accepted: 2020-09-18
Published Online: 2020-10-30
Published in Print: 2020-10-27

© 2020 Walter de Gruyter GmbH, Berlin/Boston
