Abstract
We propose a new algorithm for spectral learning of hidden Markov models (HMMs). In contrast to the standard approach, we do not estimate the parameters of the HMM directly; instead, we construct an estimate of the joint probability distribution. The idea is based on representing the joint probability distribution as a d-th-order tensor of low rank in the tensor train (TT) format. Using the TT format, we obtain the approximation by minimizing the Frobenius distance between the empirical joint probability distribution and tensors of low TT-rank, subject to normalization constraints on the core tensors. We propose an algorithm for solving this optimization problem based on the alternating least squares (ALS) approach and develop a fast version of it for sparse tensors. The order d of the tensor is a parameter of our algorithm. We compared the performance of our algorithm with the algorithm proposed by Hsu, Kakade and Zhang in 2009 and found that ours is much more robust when the number of hidden states is overestimated.
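In symbols, the optimization problem described above reads

$$\min_{\mathcal{T} \in \mathrm{TT}(r_1,\dots,r_{d-1})} \bigl\| \widehat{\mathcal{P}} - \mathcal{T} \bigr\|_F,$$

where $\widehat{\mathcal{P}}$ is the empirical d-th-order joint probability tensor and the minimization runs over tensors $\mathcal{T}$ of fixed TT-ranks whose cores satisfy the normalization constraints. The sketch below is a minimal NumPy illustration of plain (unconstrained) TT-ALS, not the paper's algorithm: it omits the normalization constraints on the cores and the fast sparse variant, and the names tt_als and tt_full are ours, introduced only for this example.

```python
import numpy as np

def tt_full(cores):
    """Contract TT cores G_1, ..., G_d back into the full tensor."""
    T = cores[0]                                   # shape (1, n_1, r_1)
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([-1], [0]))   # contract rank index
    return T.reshape(T.shape[1:-1])                # drop boundary ranks 1, 1

def tt_als(P, ranks, n_sweeps=10, seed=0):
    """Plain ALS for min ||P - T||_F over tensors T of fixed TT-ranks.
    Illustrative sketch only: the paper's normalization constraints and
    its fast sparse version are NOT implemented here.
    ranks has length d + 1 with ranks[0] = ranks[d] = 1."""
    d = P.ndim
    rng = np.random.default_rng(seed)
    cores = [rng.standard_normal((ranks[k], P.shape[k], ranks[k + 1]))
             for k in range(d)]

    def left_interface(k):      # (n_1 * ... * n_{k-1}) x r_{k-1} matrix
        L = np.ones((1, 1))
        for j in range(k):
            L = np.einsum('pa,aqb->pqb', L, cores[j])
            L = L.reshape(-1, cores[j].shape[2])
        return L

    def right_interface(k):     # r_k x (n_{k+1} * ... * n_d) matrix
        R = np.ones((1, 1))
        for j in range(d - 1, k, -1):
            R = np.einsum('aqb,bp->aqp', cores[j], R)
            R = R.reshape(cores[j].shape[0], -1)
        return R

    for _ in range(n_sweeps):
        for k in range(d):      # optimize core k with all others fixed
            L, R = left_interface(k), right_interface(k)
            Pk = P.reshape(L.shape[0], P.shape[k], R.shape[1])
            # For each slice i: min ||L @ G[:, i, :] @ R - Pk[:, i, :]||_F,
            # solved exactly via pseudoinverses of the fixed interfaces.
            cores[k] = np.einsum('ap,piq,qb->aib',
                                 np.linalg.pinv(L), Pk, np.linalg.pinv(R))
    return cores

# Toy usage: approximate a random 4th-order "probability" tensor.
rng = np.random.default_rng(1)
P = rng.random((4, 4, 4, 4))
P /= P.sum()
cores = tt_als(P, ranks=[1, 3, 3, 3, 1], n_sweeps=20)
err = np.linalg.norm(tt_full(cores) - P) / np.linalg.norm(P)
print(f"relative Frobenius error: {err:.2e}")
```

Each ALS step is an exact linear least-squares solve: with all cores but the k-th fixed, the Frobenius objective decouples over the k-th index, so every slice of the core is recovered independently from the left and right interface matrices.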
Funding statement: The authors gratefully acknowledge the financial support from the Ministry of Education and Science of the Russian Federation under grant 14.756.31.0001.
Acknowledgements
The authors express their deep gratitude to Professor Andrzej Cichocki for his helpful comments and assistance.
References
[1] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade and M. Telgarsky, Tensor decompositions for learning latent variable models, J. Mach. Learn. Res. 15 (2014), 2773–2832. 10.21236/ADA604494
[2] B. W. Bader and T. G. Kolda, Efficient MATLAB computations with sparse and factored tensors, SIAM J. Sci. Comput. 30 (2007/08), no. 1, 205–231. 10.2172/897641
[3] L. E. Baum, T. Petrie, G. Soules and N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Statist. 41 (1970), 164–171. 10.1214/aoms/1177697196
[4] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart–Young decomposition, Psychometrika 35 (1970), 283–319. 10.1007/BF02310791
[5] V. de Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. 30 (2008), no. 3, 1084–1127. 10.1137/06066518X
[6] L. Grasedyck, Hierarchical singular value decomposition of tensors, SIAM J. Matrix Anal. Appl. 31 (2009/10), no. 4, 2029–2054. 10.1137/090764189
[7] W. Hackbusch and S. Kühn, A new scheme for the tensor representation, J. Fourier Anal. Appl. 15 (2009), no. 5, 706–722. 10.1007/s00041-009-9094-9
[8] D. Hsu, S. M. Kakade and T. Zhang, A spectral algorithm for learning hidden Markov models, J. Comput. System Sci. 78 (2012), no. 5, 1460–1480. 10.1016/j.jcss.2011.12.025
[9] X. D. Huang, Y. Ariki and M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University, Edinburgh, 1990.
[10] H. Jaeger, Observable operator models for discrete stochastic time series, Neural Comput. 12 (2000), no. 6, 1371–1398. 10.1162/089976600300015411
[11] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500. 10.1137/07070111X
[12] A. Krogh, B. Larsson, G. von Heijne and E. L. L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes, J. Mol. Biol. 305 (2001), no. 3, 567–580. 10.1006/jmbi.2000.4315
[13] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (2011), no. 5, 2295–2317. 10.1137/090752286
[14] I. Oseledets, M. Rakhuba and A. Uschmajew, Alternating least squares as moving subspace correction, preprint (2017), https://arxiv.org/abs/1709.07286. 10.1137/17M1148712
[15] I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to use SVD in many dimensions, SIAM J. Sci. Comput. 31 (2009), no. 5, 3744–3759. 10.1137/090748330
[16] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989), no. 2, 257–286. 10.1016/B978-0-08-051584-7.50027-9
[17] T. Rohwedder and A. Uschmajew, On local convergence of alternating schemes for optimization of convex problems in the tensor train format, SIAM J. Numer. Anal. 51 (2013), no. 2, 1134–1162. 10.1137/110857520
[18] S. M. Siddiqi, B. Boots and G. J. Gordon, Reduced-rank hidden Markov models, International Conference on Artificial Intelligence and Statistics, PMLR (2010), 741–748.
[19] L. Song, M. Ishteva, A. Parikh, E. Xing and H. Park, Hierarchical tensor decomposition of latent tree graphical models, Proceedings of the 30th International Conference on Machine Learning (ICML-13), PMLR (2013), 334–342.
[20] L. Song, E. P. Xing and A. P. Parikh, A spectral algorithm for latent tree graphical models, Proceedings of the 28th International Conference on Machine Learning (ICML-11), PMLR (2011), 1065–1072.
[21] K. Stratos, M. Collins and D. Hsu, Unsupervised part-of-speech tagging with anchor hidden Markov models, Trans. Assoc. Comput. Linguist. 4 (2016), 245–257. 10.1162/tacl_a_00096
© 2018 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Tensor Numerical Methods: Actual Theory and Recent Applications
- A Low-Rank Inexact Newton–Krylov Method for Stochastic Eigenvalue Problems
- A Tensor Decomposition Algorithm for Large ODEs with Conservation Laws
- Non-intrusive Tensor Reconstruction for High-Dimensional Random PDEs
- Quasi-Optimal Rank-Structured Approximation to Multidimensional Parabolic Problems by Cayley Transform and Chebyshev Interpolation
- Projection Methods for Dynamical Low-Rank Approximation of High-Dimensional Problems
- Tensor Train Spectral Method for Learning of Hidden Markov Models (HMM)
- Tucker Tensor Analysis of Matérn Functions in Spatial Statistics
- Low-Rank Space-Time Decoupled Isogeometric Analysis for Parabolic Problems with Varying Coefficients
- Approximate Solution of Linear Systems with Laplace-like Operators via Cross Approximation in the Frequency Domain
- Rayleigh Quotient Methods for Estimating Common Roots of Noisy Univariate Polynomials