Hierarchical Bayesian mixture modelling for antigen-specific T-cell subtyping in combinatorially encoded flow cytometry studies

Lin Lin; Cliburn Chan; Sine R. Hadrup; Thomas M. Froesig; Quanli Wang; Mike West

doi:10.1515/sagmb-2012-0001


Published/Copyright: April 24, 2013


From the journal Statistical Applications in Genetics and Molecular Biology Volume 12 Issue 3

Abstract

Novel uses of automated flow cytometry technology for measuring levels of protein markers on thousands to millions of cells are promoting an increasing need for relevant, customized Bayesian mixture modelling approaches in many areas of biomedical research and application. In studies of immune profiling in many biological areas, traditional flow cytometry measures relative levels of abundance of marker proteins using fluorescently labeled tags that identify specific markers by a single color. One specific and important recent development in this area is the use of combinatorial marker assays in which each marker is targeted with a probe that is labeled with two or more fluorescent tags. The use of several colors enables the identification of, in principle, combinatorially increasing numbers of subtypes of cells, each identified by a subset of colors. This represents a major advance in the ability to characterize variation in immune responses involving larger numbers of functionally differentiated cell subtypes. We describe novel classes of Markov chain Monte Carlo methods for model fitting that exploit distributed GPU (graphics processing unit) implementation. We discuss issues of cellular subtype identification in this novel, general model framework, and provide a detailed example using simulated data. We then describe an application to a data set from an experimental study of antigen-specific T-cell subtyping using combinatorially encoded assays in human blood samples. Summary comments discuss broader questions in applications in immunology, and aspects of statistical computation.

Keywords: Dirichlet process mixtures; GPU computing; Hierarchical model; Immune profiling; Immune response biomarkers; Large data sets; Markov chain Monte Carlo; Massive mixture models; Multimers; Posterior simulation; Relabeling; T-cell subtyping

Corresponding author: Lin Lin, Department of Statistical Science, Duke University, Durham, NC, 27708-0251, USA

7 Appendix

7.1 Priors for parameters of phenotypic marker mixture

In the DP mixture model of Section 3.3, the traditional prior specification is as follows. Further background can be found in, for example, Ishwaran and James (2001) or the summary appendix in Ji et al. (2009).

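First, $\pi_1 = \nu_1$ and $\pi_j = (1-\nu_1)\cdots(1-\nu_{j-1})\,\nu_j$ for $j = 2{:}J$, where $\nu_j \sim \mathrm{Be}(1, \alpha_b)$ for $j \neq J$, $\nu_J = 1$, and the hyper-parameter $\alpha_b \sim \mathrm{Ga}(e_b, f_b)$ for some specified $e_b$ and $f_b$. Second, independently of the $\pi_j$, the normal means and variance matrices are independent across components with priors

$$(\mu_{b,j}, \Sigma_{b,j}) \sim N(\mu_{b,j} \mid m, \lambda\Sigma_{b,j})\; IW(\Sigma_{b,j} \mid \delta_b, \Phi_b)$$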

for some specified hyper-parameters m, λ, δ_b, Φ_b.
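As an illustration of the truncated stick-breaking prior above, the following minimal Python sketch draws one set of weights π_1:J from beta variates ν_j. The function names, the truncation level J=10 and the value α_b=2 are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def stick_breaking_weights(nu):
    """Map stick-breaking fractions nu_1..nu_J (with nu_J = 1) to weights pi_1..pi_J."""
    nu = np.asarray(nu, dtype=float)
    # remaining[j] = prod_{l<j} (1 - nu_l), with remaining[0] = 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - nu[:-1])))
    return nu * remaining

def sample_truncated_dp_weights(J, alpha_b, rng):
    """Draw nu_j ~ Be(1, alpha_b) for j < J, set nu_J = 1, and return pi_1:J."""
    nu = rng.beta(1.0, alpha_b, size=J)
    nu[-1] = 1.0  # truncation: the last fraction takes all remaining mass
    return stick_breaking_weights(nu)

rng = np.random.default_rng(0)
pi = sample_truncated_dp_weights(J=10, alpha_b=2.0, rng=rng)  # illustrative values
```

Setting ν_J=1 is what guarantees that the J weights sum to one under truncation.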

7.2 Priors for mixing weights in multimer mixtures

In the hierarchical DP mixture model of Section 3.4, the J sets of probabilities ω_j,1:K have priors defined from an underlying (truncated) hierarchical DP model as discussed in Teh et al. (2006). This extends the stick-breaking prior for mixture component weights to the set of J mixtures and links across them, with details as follows.

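First, generate a K-vector of probabilities $\eta_{1:K}$ via the stick-breaking construction

$$\eta_1 = \phi_1, \qquad \eta_k = \phi_k \prod_{l=1}^{k-1} (1-\phi_l), \quad k = 2, \ldots, K,$$

where $\phi_k \sim \mathrm{Beta}(1, \gamma_t)$, $k = 1, \ldots, K-1$, and $\phi_K = 1$, and where $\gamma_t \sim G(e_t, f_t)$ for some given hyper-parameters $e_t$, $f_t$.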

Then, for each phenotypic marker component j=1:J, generate the multimer mixture weights ω_{j,1:K} via

where and ϕ_{j,K}=1. We use hyper-priors α_t ~ G(a, c) for given hyper-parameters a, c.
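The two-level structure can be sketched in Python as follows. This is a hedged illustration that assumes the standard truncated hierarchical DP construction of Teh et al. (2006), in which the component-specific fractions are drawn as ϕ_{j,k} ~ Beta(α_t η_k, α_t(1 − Σ_{l≤k} η_l)); that specific form, the function names and the numerical values are assumptions for illustration rather than details confirmed by the text here.

```python
import numpy as np

def stick_breaking_weights(phi):
    """Convert stick-breaking fractions (last one equal to 1) into probabilities."""
    phi = np.asarray(phi, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - phi[:-1])))
    return phi * remaining

def sample_hdp_weights(J, K, gamma_t, alpha_t, rng, floor=1e-8):
    """Draw shared weights eta_1:K and J linked weight vectors omega_j,1:K."""
    # Top level: phi_k ~ Beta(1, gamma_t), phi_K = 1
    phi = rng.beta(1.0, gamma_t, size=K)
    phi[-1] = 1.0
    eta = stick_breaking_weights(phi)

    # tail[k] = 1 - sum_{l<=k} eta_l (assumed Teh et al. form of the second beta parameter)
    tail = 1.0 - np.cumsum(eta)
    a = np.maximum(alpha_t * eta[:-1], floor)   # floor keeps beta parameters positive
    b = np.maximum(alpha_t * tail[:-1], floor)

    omega = np.empty((J, K))
    for j in range(J):
        phi_j = np.ones(K)                      # phi_{j,K} = 1
        phi_j[:-1] = rng.beta(a, b)
        omega[j] = stick_breaking_weights(phi_j)
    return eta, omega

rng = np.random.default_rng(1)
eta, omega = sample_hdp_weights(J=4, K=6, gamma_t=1.0, alpha_t=5.0, rng=rng)
```

Under this assumed construction each row of omega is centred on the shared vector eta, with α_t controlling how tightly the J mixtures are linked.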

7.3 MCMC analysis

Under the hierarchical mixture model specification, the MCMC analysis introduced in Section 3.6 has technical components as detailed here.

The use of an augmented model based on underlying, sample-specific, component indicator variables that induce the mixtures is, of course, key. We have already introduced the phenotypic marker mixture indicators z_b,i, latent indicators that underlie the assignment of each data point b_i to one of the j=1:J components in the mixture of equation (2). At the second stage, we can apply the same strategy to the mixture for the t_i conditional on b_i of equation (4) by utilizing indicators z_{t, i}.

At each MCMC iterate, we resample subsets of Θ and the two sets of indicators Z={z_b,i, z_t,i, i=1:n} via the following sequence of sampling steps. In each conditional distribution, the “…” in the conditioning represents the data and all other parameters and/or indicators; in some cases, the need to be explicit about some of the conditioning quantities is reflected in the use of the superscript “–” for their values at the previous MCMC iterate.

7.3.1 Update component indicator variables

For each i=1:n in parallel due to conditional independence, update z_b,i and z_{t, i} by sampling from their conditional multinomial (number of trials=1 in each case) posteriors defined by probabilities as follows:

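$$P(z_{b,i} = j \mid \ldots) \propto \pi_j\, N(b_i \mid \mu_{b,j}, \Sigma_{b,j}), \quad j = 1{:}J;$$

$$P(z_{t,i} = k \mid \ldots) \propto \omega_{i,k}(b_i)\, N(t_i \mid \mu_{t,k}, \Sigma_{t,k}), \quad k = 1{:}K.$$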

As z_{b,i} is conditionally independent of z_{t,i}, sampling the multimer model indicators is done in parallel with the phenotypic marker indicators.
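The indicator update lends itself to vectorized computation over observations. The sketch below, in plain NumPy rather than GPU code, shows one way to draw the phenotypic marker indicators from probabilities proportional to π_j N(b_i|μ_b,j, Σ_b,j); the helper names and the toy data are assumptions for illustration. The multimer indicators could be handled analogously, with the fixed weights replaced by observation-specific weights.

```python
import numpy as np

def log_mvn_pdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape n x d)."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)          # whitened residuals, d x n
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def sample_indicators(x, weights, means, covs, rng):
    """Draw one component indicator per row of x, independently across rows."""
    log_p = np.column_stack([
        np.log(w) + log_mvn_pdf(x, m, S)
        for w, m, S in zip(weights, means, covs)
    ])
    log_p -= log_p.max(axis=1, keepdims=True)        # stabilise before exponentiating
    p = np.exp(log_p)
    p /= p.sum(axis=1, keepdims=True)
    # Inverse-CDF draw from each row's categorical distribution
    u = rng.random(x.shape[0])
    z = (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
    return np.minimum(z, p.shape[1] - 1), p

# Toy data: two well-separated components in two dimensions
rng = np.random.default_rng(0)
b = np.array([[0.0, 0.0], [10.0, 10.0]])
means = [np.zeros(2), np.full(2, 10.0)]
covs = [np.eye(2), np.eye(2)]
z_b, probs = sample_indicators(b, [0.5, 0.5], means, covs, rng)
```

Working on the log scale and subtracting the row maximum avoids underflow when component densities differ by many orders of magnitude, which is common with large numbers of cells and components.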

7.3.2 Update phenotypic marker model parameters

7.3.2.1 Update phenotypic marker mixture weights and hyperparameter

Sampling the mixture probabilities π_{1:J} and the hyperparameter of the DP model uses a Metropolis-Hastings extension of the standard component distributions (Ishwaran and James, 2001; Ji et al., 2009), as follows. The mixture probabilities π_j are obtained from underlying beta variates ν_{1:J–1} as detailed in Appendix 7.1. Hence new π_{1:J} samples are computed directly from resampled ν_{1:J–1} samples. For the latter, we have

where for each j=1:J–1. The complication here is that, since the π_j are functions of the ν_j, each ν_j is implicitly involved in both numerator and denominator terms of the product expression multiplying the base beta distributions. Hence we use a Metropolis-Hastings sampler for this step, based on a customized proposal distribution

with

This proposal distribution is an approximation of p(ν_j|…) by taking off the denominator f(b_i|Θ)^{–1} in the product expression and assuming that dominates the rest of the component values. Our experience with examples and the data analysis reported is that this generates acceptable convergence with acceptance rates for these components of the MCMC around 10–50%.

The weights π_{1:J} are then evaluated by the formula in Appendix 7.1. Next, the hyperparameter α_b is resampled from
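The accept/reject logic of such a step can be sketched generically. The Python example below implements a Metropolis-Hastings step with an independence proposal and applies it to a toy problem: an unnormalized Beta(2, 5) target with a Beta(1, 3) proposal. Both densities and all function names are illustrative assumptions; the paper's customized proposal for ν_j is not reproduced here.

```python
import numpy as np

def mh_independence_step(current, log_target, sample_proposal, log_proposal, rng):
    """One Metropolis-Hastings step with a proposal that ignores the current state."""
    proposed = sample_proposal(rng)
    # log acceptance ratio: target ratio corrected by the reverse proposal ratio
    log_ratio = (log_target(proposed) - log_target(current)) \
              - (log_proposal(proposed) - log_proposal(current))
    if np.log(rng.random()) < log_ratio:
        return proposed, True
    return current, False

# Toy stand-ins (assumptions): unnormalized log densities on (0, 1)
log_target = lambda x: np.log(x) + 4.0 * np.log1p(-x)      # Beta(2, 5) kernel
log_proposal = lambda x: 2.0 * np.log1p(-x)                # Beta(1, 3) kernel
sample_proposal = lambda rng: rng.beta(1.0, 3.0)

rng = np.random.default_rng(0)
n_iter = 20000
draws = np.empty(n_iter)
accepted = 0
x = 0.5
for it in range(n_iter):
    x, ok = mh_independence_step(x, log_target, sample_proposal, log_proposal, rng)
    draws[it] = x
    accepted += ok
acceptance_rate = accepted / n_iter
```

Normalizing constants cancel in the ratio, so only the kernels of the target and proposal are needed. Monitoring `acceptance_rate` gives the kind of diagnostic quoted in the text.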

7.3.2.2 Update phenotypic marker component means and variance matrices

For each j=1:J the mean μ_b,j has conditional posterior

with

where c_j = λ/(1 + λ a_{1,j}). Again, we need a Metropolis-Hastings sampler for this step, as the base normal distribution here is multiplied by a term that depends in complicated ways on μ_b,j. We use the customized proposal distribution where

with

A similar structure and MCMC strategy arises for each of the j=1:J variance matrices Σ_b,j; the conditional posteriors are

where

We use the customized proposal distribution

with

We update the pair (μ_b,j, Σ_b,j) together at each iterate. We achieve acceptable convergence with acceptance rates for these components of the MCMC around 20–45%.

7.3.3 Update multimer model parameters

7.3.3.1 Update multimer mixture weights and hyperparameter

With the definitions and notation of the multimer mixture model parameters of Appendix 7.2, the logic and details of the MCMC steps are as follows.

For each k=1:K, ϕ_k has conditional posterior

To choose a proposal distribution, first, for each i=1:n and independently over i, generate a set of auxiliary indicator variables q_i from conditional multinomials on k=1:K cells with number of trials=1 and

P(q_i=k) ∝ η_k N(t_i; μ_t,k, Σ_t,k), k=1:K.

Given these sampled values, generate

where for each r=1:K. We achieve acceptable convergence with acceptance rates for these components of the MCMC around 10–40%.

The sets of weights ω_j,k and the η_{1:K} probabilities are then evaluated by the formulae given in Appendix 7.2. Further, the hyper-parameter γ_t is resampled from

Next, for each j=1:J and k=1:K, the latent probabilities ϕ_j,k of Appendix 7.2 have conditional posterior

We use the customized proposal distribution

where We achieve acceptable convergence with acceptance rates for these components of the MCMC around 5–50%.

7.3.3.2 Update multimer component means and variance matrices

For each k=1:K the mean μ_t,k is sampled using an additional auxiliary random quantity that allocates the multimer to one of the K anchor regions based on current parameters and indicators. That is, for each k independently, draw an auxiliary indicator τ_k from the multinomial with one trial and probabilities on k=1:K given by

Then draw (μ_t,k|C_k=r, …) ~ N(μ_t,k|m_t,k, M_t,k) where

with

Finally, resample the variance matrices from

Research reported here was partially supported by grants from the US National Science Foundation (DMS 1106516 of M.W.), the National Institutes of Health (P50-GM081883 of M.W., and RC1 AI086032 of C.C. & M.W.), and the Danish Cancer Society (DP06031). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NIH and/or NSF.

References

Andersen, R. S., P. Kvistborg, T. M. Frøsig, N. W. Pedersen, R. Lyngaa, A. H. Bakker, C. J. Shu, P. thor Straten, T. N. Schumacher and S. R. Hadrup (2012): “Parallel detection of antigen-specific T cell responses by combinatorial encoding of MHC multimers,” Nature Protocols, 7, 891–902. doi:10.1038/nprot.2012.037

Chan, C., F. Feng, M. West and T. B. Kepler (2008): “Statistical mixture modelling for cell subtype identification in flow cytometry,” Cytometry A, 73, 693–701. doi:10.1002/cyto.a.20583

Cron, A. J. and M. West (2011): “Efficient classification-based relabeling in mixture models,” Am. Stat., 65, 16–20.

Escobar, M. D. and M. West (1995): “Bayesian density estimation and inference using mixtures,” J. Am. Stat. Assoc., 90, 577–588.

Feyerabend, S., S. Stevanovic, C. Gouttefangeas, D. Wernet, J. Hennenlotter, J. Bedke, K. Dietz, S. Pascolo, M. Kuczyk, H. G. Rammensee and A. Stenzl (2009): “Novel multi-peptide vaccination in HLA-A2+ hormone sensitive patients with biochemical relapse of prostate cancer,” Prostate, 69, 917–927. doi:10.1002/pros.20941

Finak, G., A. Bashashati, R. Brinkman and R. Gottardo (2009): “Merging mixture components for cell population identification in flow cytometry,” Adv. Bioinform., Article ID 247646.

Frelinger, J., J. Ottinger, C. Gouttefangeas and C. Chan (2010): “Modeling flow cytometry data for cancer vaccine immune monitoring,” Cancer Immunol. Immun., 59, 1435–1441.

Hadrup, S. R., A. H. Bakker, C. J. Shu, R. S. Andersen, J. van Veluw, P. Hombrink, E. Castermans, P. thor Straten, C. Blank, J. B. Haanen, M. H. Heemskerk and T. N. Schumacher (2009): “Parallel detection of antigen-specific T-cell responses by multidimensional encoding of MHC multimers,” Nat. Methods, 6, 520–528.

Hadrup, S. R. and T. N. Schumacher (2010): “MHC-based detection of antigen-specific CD8+ T cell responses,” Cancer Immunol. Immun., 59, 1425–1433.

Ishwaran, H. and L. F. James (2001): “Gibbs sampling methods for stick-breaking priors,” J. Am. Stat. Assoc., 96, 161–173.

Ji, C., D. Merl, T. B. Kepler and M. West (2009): “Spatial mixture modelling for unobserved point processes: application to immunofluorescence histology,” Bayesian Analysis, 4, 297–316. doi:10.1214/09-BA411

Lo, K., R. R. Brinkman and R. Gottardo (2008): “Automated gating of flow cytometry data via robust model-based clustering,” Cytometry A, 73, 321–332. doi:10.1002/cyto.a.20531

Manolopoulou, I., C. Chan and M. West (2010): “Selection sampling from large datasets for targeted inference in mixture modeling (with discussion),” Bayesian Analysis, 5, 429–450. PMC2943396. doi:10.1214/10-BA517

Newell, E. W., L. O. Klein, W. Yu and M. M. Davis (2009): “Simultaneous detection of many T-cell specificities using combinatorial tetramer staining,” Nat. Methods, 6, 497–499.

Pyne, S., X. Hu, K. Wang, E. Rossin, T. Lin, L. M. Maier, C. Baecher-Allan, G. J. McLachlan, P. Tamayo, D. A. Hafler, P. L. DeJager and J. P. Mesirov (2009): “Automated high-dimensional flow cytometric data analysis,” Proc. Natl. Acad. Sci., 106, 8519.

Suchard, M. A., Q. Wang, C. Chan, J. Frelinger, A. J. Cron and M. West (2010): “Understanding GPU programming for statistical computation: studies in massively parallel massive mixtures,” J. Comput. Graph. Stat., 19, 419–438.

Teh, Y. W., M. I. Jordan, M. J. Beal and D. M. Blei (2006): “Hierarchical Dirichlet processes,” J. Am. Stat. Assoc., 101, 1566–1581.

West, M., P. Müller and M. D. Escobar (1994): Hierarchical priors and mixture models, with application in regression and density estimation. In: Smith, A. F. M., Freeman P. R. (Eds.), Aspects of Uncertainty: A Tribute to D. V. Lindley. London: Wiley, pp. 363–386.

Published Online: 2013-04-24

Published in Print: 2013-06-01
