Abstract
Variational Bayes has allowed the analysis of Bayes’ rule in terms of gradient flows, partial differential equations (PDEs), and diffusion processes. Mean-field variational inference (MFVI) is an approach to approximate Bayesian inference that optimizes over product distributions. In the spirit of variational Bayes, we represent the MFVI problem in three different ways: as a gradient flow in a Wasserstein space, as a system of quasilinear PDEs, and as a McKean–Vlasov diffusion process. Furthermore, we show that a time-discretized coordinate ascent variational inference algorithm in the product Wasserstein space of measures yields a gradient flow in the small-time-step limit. A similar result is obtained for the associated densities, with the limit given by a system of quasilinear PDEs. We illustrate how the tools provided here can be used to guarantee convergence of algorithms, and how they extend to a variety of approaches, old and new, for solving MFVI problems.
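To make the coordinate ascent scheme mentioned above concrete, here is a minimal numerical sketch (our own illustration, not taken from the paper) of coordinate ascent variational inference for a 2-D Gaussian target approximated by a product of 1-D Gaussians. For Gaussian targets the optimal coordinate updates are available in closed form: with target precision matrix Λ = Σ⁻¹, the factor qᵢ is Gaussian with precision Λᵢᵢ and mean −(Λᵢⱼ/Λᵢᵢ)μⱼ. All variable names are ours.

```python
import numpy as np

# Correlated 2-D Gaussian target N(0, Sigma), mean-field family q1 x q2.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)          # target precision matrix

mu = np.array([1.0, -1.0])          # initial means of the two factors
for _ in range(50):                 # coordinate ascent sweeps
    mu[0] = -Lam[0, 1] / Lam[0, 0] * mu[1]   # closed-form update for q1
    mu[1] = -Lam[1, 0] / Lam[1, 1] * mu[0]   # closed-form update for q2

var = 1.0 / np.diag(Lam)            # fixed-point variances of the factors

print(mu)    # means contract to the target mean (0, 0)
print(var)   # each 0.36: MFVI underestimates the marginal variances (1, 1)
```

Each sweep contracts the means by the factor (Λ₀₁/Λ₀₀)², so the iteration converges geometrically; the well-known variance underestimation of mean-field approximations is visible in the fixed point. Viewing the small-time-step limit of such sweeps as a gradient flow is the perspective developed in the paper.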
© 2025 Walter de Gruyter GmbH, Berlin/Boston