Hierarchical Bayesian mixture modelling for antigen-specific T-cell subtyping in combinatorially encoded flow cytometry studies

Lin Lin, Cliburn Chan, Sine R. Hadrup, Thomas M. Froesig, Quanli Wang and Mike West
Published/Copyright: April 24, 2013

Abstract

Novel uses of automated flow cytometry technology for measuring levels of protein markers on thousands to millions of cells are creating an increasing need for relevant, customized Bayesian mixture modelling approaches in many areas of biomedical research and application. In studies of immune profiling in many biological areas, traditional flow cytometry measures relative levels of abundance of marker proteins using fluorescently labeled tags that identify specific markers by a single color. One specific and important recent development in this area is the use of combinatorial marker assays in which each marker is targeted with a probe that is labeled with two or more fluorescent tags. The use of several colors enables the identification of, in principle, combinatorially increasing numbers of subtypes of cells, each identified by a subset of colors. This represents a major advance in the ability to characterize variation in immune responses involving larger numbers of functionally differentiated cell subtypes. We describe novel classes of Markov chain Monte Carlo methods for model fitting that exploit distributed GPU (graphics processing unit) implementation. We discuss issues of cellular subtype identification in this novel, general model framework, and provide a detailed example using simulated data. We then describe an application to a data set from an experimental study of antigen-specific T-cell subtyping using combinatorially encoded assays in human blood samples. Summary comments discuss broader questions in applications in immunology, and aspects of statistical computation.


Corresponding author: Lin Lin, Department of Statistical Science, Duke University, Durham, NC, 27708-0251, USA

7 Appendix

7.1 Priors for parameters of phenotypic marker mixture

In the DP mixture model of Section 3.3, the traditional prior specification is as follows. Further background can be found in, for example, Ishwaran and James (2001) or the summary appendix in Ji et al. (2009).

First, $\pi_1=\nu_1$ and $\pi_j=(1-\nu_1)\cdots(1-\nu_{j-1})\nu_j$ for $j=2:J$, where $\nu_j\sim Be(1,\alpha_b)$ for $j<J$, $\nu_J=1$, and the hyper-parameter $\alpha_b\sim Ga(e_b,f_b)$ for some specified $e_b$ and $f_b$. Second, independently of the $\pi_j$, the normal mean and variance matrices are independent across components with priors

$$(\mu_{b,j},\Sigma_{b,j})\sim N(\mu_{b,j}\mid m,\lambda\Sigma_{b,j})\,IW(\Sigma_{b,j}\mid\delta_b,\Phi_b)$$

for some specified hyper-parameters m, λ, δb, Φb.
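As a minimal sketch of the stick-breaking prior above, the following draws the weights π1:J from the truncated prior using only the Python standard library. The function name, the truncation level J=10 and the values of eb and fb are illustrative assumptions, and Ga(eb, fb) is read here as shape eb and rate fb, which the text does not state.

```python
import random

def stick_breaking_weights(J, alpha_b, rng):
    """Draw pi_1:J from the truncated stick-breaking prior."""
    # nu_j ~ Be(1, alpha_b) for j < J, and nu_J = 1 so the weights sum to one.
    nu = [rng.betavariate(1.0, alpha_b) for _ in range(J - 1)] + [1.0]
    pi, remaining = [], 1.0
    for v in nu:
        pi.append(remaining * v)   # pi_j = nu_j * prod_{r<j} (1 - nu_r)
        remaining *= 1.0 - v       # length of stick left after component j
    return pi

rng = random.Random(0)
e_b, f_b = 2.0, 1.0                          # illustrative hyper-parameters
# gammavariate takes (shape, scale); rate f_b is an ASSUMED parameterisation.
alpha_b = rng.gammavariate(e_b, 1.0 / f_b)
pi = stick_breaking_weights(10, alpha_b, rng)
```

Setting the last stick fraction to one is what makes the truncated weights a proper probability vector, so no renormalisation is needed.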

7.2 Priors for mixing weights in multimer mixtures

In the hierarchical DP mixture model of Section 3.4, the J sets of probabilities ωj,1:K have priors defined from an underlying (truncated) hierarchical DP model as discussed in Teh et al. (2006). This extends the stick-breaking prior for mixture component weights to the set of J mixtures and links across them, with details as follows.

First, generate a K-vector of probabilities η1:K via the stick-breaking construction

$$\eta_1=\phi_1,\qquad \eta_k=\phi_k\prod_{r=1}^{k-1}(1-\phi_r),\quad k=2,\ldots,K,$$

where ϕk~Beta(1, γt), k=1, . . ., K–1 and ϕK=1, and where γt~G(et, ft) for some given hyper-parameters et, ft.

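Then, for each phenotypic marker component j=1:J, generate the multimer mixture weights ωj,1:K via the same stick-breaking map,

$$\omega_{j,1}=\phi_{j,1},\qquad \omega_{j,k}=\phi_{j,k}\prod_{r=1}^{k-1}(1-\phi_{j,r}),\quad k=2,\ldots,K,$$

where the latent probabilities ϕj,k, k=1:K–1, have beta priors governed by αt and η1:K as in the truncated hierarchical DP of Teh et al. (2006), and ϕj,K=1. We use hyper-priors αt~G(a, c) for given hyper-parameters a, c.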
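The two-level construction can be sketched as follows. The beta distribution for the component-specific fractions is not recoverable from the text here, so the sketch assumes the standard truncated hierarchical DP form of Teh et al. (2006), ϕj,k ~ Beta(αt ηk, αt(ηk+1 + … + ηK)). The function names, the sizes J and K, and the values of γt and αt are illustrative only.

```python
import random

def stick_break(fracs):
    """Map stick fractions (last one equal to 1) to a probability vector."""
    probs, remaining = [], 1.0
    for f in fracs:
        probs.append(remaining * f)
        remaining *= 1.0 - f
    return probs

def hdp_weights(J, K, gamma_t, alpha_t, rng, eps=1e-12):
    """Draw shared weights eta_1:K and J linked weight vectors omega_j,1:K."""
    # Top level: phi_k ~ Beta(1, gamma_t) for k < K, and phi_K = 1.
    eta = stick_break([rng.betavariate(1.0, gamma_t) for _ in range(K - 1)] + [1.0])
    # tail[k] is the mass after component k, summed directly to avoid
    # negative values from rounding in 1 - cumulative sum.
    tail = [sum(eta[k + 1:]) for k in range(K)]
    omega = []
    for _ in range(J):
        # ASSUMED beta form (Teh et al., 2006); eps keeps both parameters positive.
        phi_j = [rng.betavariate(max(alpha_t * eta[k], eps),
                                 max(alpha_t * tail[k], eps))
                 for k in range(K - 1)] + [1.0]
        omega.append(stick_break(phi_j))
    return eta, omega

rng = random.Random(3)
eta, omega = hdp_weights(J=4, K=6, gamma_t=1.0, alpha_t=5.0, rng=rng)
```

Under this assumed form, each ωj,1:K is centred on the shared η1:K, with larger αt pulling the J weight vectors closer together.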

7.3 MCMC analysis

Under the hierarchical mixture model specification, the MCMC analysis introduced in Section 3.6 has technical components as detailed here.

The use of an augmented model based on underlying, sample-specific, component indicator variables that induce the mixtures is, of course, key. We have already introduced the phenotypic marker mixture indicators zb,i, latent indicators that underlie the assignment of each data point bi to one of the j=1:J components in the mixture of equation (2). At the second stage, we can apply the same strategy to the mixture for the ti conditional on bi of equation (4) by utilizing indicators zt, i.

At each MCMC iterate, we resample subsets of Θ and the two sets of indicators Z={zb,i, zt,i, i=1:n} via the following sequence of sampling steps. In each conditional distribution, the conditioning “…” represents the data and all other parameters and/or indicators; in some cases the need to be explicit about some of the conditioning quantities is reflected in the use of a superscript for their values at the previous MCMC iterate.

7.3.1 Update component indicator variables

For each i=1:n in parallel due to conditional independence, update zb,i and zt, i by sampling from their conditional multinomial (number of trials=1 in each case) posteriors defined by probabilities as follows:

$$P(z_{b,i}=j\mid\ldots)\propto\pi_j\,N(b_i\mid\mu_{b,j},\Sigma_{b,j}),\quad j=1:J;$$

$$P(z_{t,i}=k\mid\ldots)\propto\omega_{i,k}(b_i)\,N(t_i\mid\mu_{t,k},\Sigma_{t,k}),\quad k=1:K.$$

As zb,i is conditionally independent of zt,i, sampling the multimer model indicators is done in parallel with the phenotypic marker indicators.
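The indicator update amounts to one categorical draw per observation, with probabilities proportional to weight times component density. The sketch below shows this for the phenotypic marker indicators, working on the log scale for numerical stability. It uses a univariate normal density as a stand-in for the multivariate N(bi|μb,j, Σb,j), and the weights, means, variances and data values are made up for illustration.

```python
import math
import random

def sample_indicator(log_weights, log_dens, rng):
    """Draw one component index with probability proportional to
    exp(log_weights[j] + log_dens[j])."""
    scores = [w + d for w, d in zip(log_weights, log_dens)]
    top = max(scores)                      # subtract the max before exponentiating
    probs = [math.exp(s - top) for s in scores]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def log_normal_pdf(x, mean, var):
    """Univariate normal log density; a stand-in for N(b_i | mu, Sigma)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

rng = random.Random(1)
pi = [0.5, 0.3, 0.2]                       # illustrative mixture weights
means, variances = [-10.0, 0.0, 10.0], [1.0, 1.0, 1.0]
b = [-10.2, 0.1, 9.9, 10.3]                # illustrative observations

log_pi = [math.log(p) for p in pi]
# Each draw depends only on its own observation, so the loop could run in parallel.
z_b = [sample_indicator(log_pi,
                        [log_normal_pdf(x, m, v) for m, v in zip(means, variances)],
                        rng)
       for x in b]
```

The same routine applies to the multimer indicators once the weights are replaced by the bi-dependent multimer weights and the densities by N(ti|μt,k, Σt,k).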

7.3.2 Update phenotypic marker model parameters

7.3.2.1 Update phenotypic marker mixture weights and hyperparameter

Sampling mixture probabilities π1:J and the hyperparameter of the DP model uses a Metropolis-Hastings extension of the standard component distributions (Ishwaran and James, 2001; Ji et al., 2009), as follows. The mixture probabilities πj are obtained from underlying beta variates ν1:J–1 as detailed in Appendix 7.1. Hence new π1:J samples are computed directly from resampled ν1:J–1 samples. For the latter, we have

where for each j=1:J–1. The complication here is that, since the πj are functions of the νj, each νj is implicitly involved in both numerator and denominator terms of the product expression multiplying the base beta distributions. Hence we use a Metropolis-Hastings sampler for this step, based on a customized proposal distribution

with

This proposal distribution is an approximation of p(νj|…) by taking off the denominator f(bi|Θ)1 in the product expression and assuming that dominates the rest of the component values. Our experience with examples and the data analysis reported is that this generates acceptable convergence with acceptance rates for these components of the MCMC around 10–50%.
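The general pattern described here, a Metropolis-Hastings step whose proposal is the tractable beta part of the conditional with the awkward factor dropped, can be sketched as an independence sampler. The target below is a hypothetical toy (a beta kernel times a made-up extra factor), not the conditional posterior of νj in the model, and all names and numerical values are assumptions for illustration.

```python
import math
import random

def log_beta_kernel(x, a, b):
    """Unnormalised log density of Beta(a, b) at x in (0, 1)."""
    return (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)

def mh_independence_step(current, log_target, a, b, rng):
    """One Metropolis-Hastings step with a Beta(a, b) independence proposal."""
    proposal = rng.betavariate(a, b)
    if not 0.0 < proposal < 1.0:           # guard against boundary draws
        return current, False
    # Independence sampler: compare target/proposal ratios at the two points.
    log_ratio = ((log_target(proposal) - log_beta_kernel(proposal, a, b))
                 - (log_target(current) - log_beta_kernel(current, a, b)))
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal, True
    return current, False

def toy_log_target(x):
    # HYPOTHETICAL target: beta kernel times an extra factor that the
    # proposal ignores, mimicking a term that has no conjugate form.
    return log_beta_kernel(x, 3.0, 5.0) - 2.0 * (x - 0.3) ** 2

rng = random.Random(2)
nu, accepted, chain = 0.5, 0, []
for _ in range(2000):
    nu, ok = mh_independence_step(nu, toy_log_target, 3.0, 5.0, rng)
    accepted += ok
    chain.append(nu)
acceptance_rate = accepted / len(chain)
```

Because the proposal shares the beta kernel with the target, the acceptance ratio reduces to the ratio of the dropped factor at the proposed and current values, so the acceptance rate measures how much that factor matters.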

The weights π1:J are then evaluated by the formula in Appendix 7.1. Next, the hyperparameter αb is resampled from

7.3.2.2 Update phenotypic marker component means and variance matrices

For each j=1:J the mean μb,j has conditional posterior

with

where cj=λ/(1+λa1,j). Again we need a Metropolis-Hastings sampler for this step as the base normal distribution here is multiplied by a term that depends in complicated ways on μb,j. We use the customized proposal distribution where

with

A similar structure and MCMC strategy arises for each of the j=1:J variance matrices Σb,j; the conditional posteriors are

where

We use the customized proposal distribution

with

We update the pair (μb,j, Σb,j) together each iterate. We achieve acceptable convergence with acceptance rates for these components of the MCMC around 20–45%.

7.3.3 Update multimer model parameters

7.3.3.1 Update multimer mixture weights and hyperparameter

With the definitions and notation of the multimer mixture model parameters of Appendix 7.2, the logic and details of the MCMC steps are as follows.

For each k=1:K, ϕk has conditional posterior

To choose a proposal distribution, first, for each i=1:n and independently over i, generate a set of auxiliary indicator variables qi from conditional multinomials on k=1:K cells with number of trials=1 and

$$P(q_i=k)\propto\eta_k\,N(t_i;\mu_{t,k},\Sigma_{t,k}),\quad k=1:K.$$

Given these sampled values, generate

where for each r=1:K. We achieve acceptable convergence with acceptance rates for these components of the MCMC around 10–40%.

The sets of weights ωj,k and the η1:K probabilities are then evaluated by the formulæ given in Appendix 7.2. Further, the hyper-parameter γt is resampled from

Next, for each j=1:J and k=1:K, the latent probabilities ϕj,k of Appendix 7.2 have conditional posterior

We use the customized proposal distribution

where We achieve acceptable convergence with acceptance rates for these components of the MCMC around 5–50%.

7.3.3.2 Update multimer component means and variance matrices

For each k=1:K the mean μt,k is sampled using an additional auxiliary random quantity that allocates the multimer to one of the K anchor regions based on current parameters and indicators. That is, for each k independently, draw an auxiliary indicator τk from the multinomial with one trial and probabilities on k=1:K given by

Then draw (μt,k|Ck=r, …)~N (μt,k|mt,k, Mt,k) where

with

Finally, resample the variance matrices from

Research reported here was partially supported by grants from the US National Science Foundation (DMS 1106516 of M.W.), the National Institutes of Health (P50-GM081883 of M.W., and RC1 AI086032 of C.C. & M.W.), and the Danish Cancer Society (DP06031). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NIH and/or NSF.

References

Andersen, R. S., P. Kvistborg, T. M. Frøsig, N. W. Pedersen, R. Lyngaa, A. H. Bakker, C. J. Shu, P. thor Straten, T. N. Schumacher and S. R. Hadrup (2012): “Parallel detection of antigen-specific T cell responses by combinatorial encoding of MHC multimers,” Nature Protocols, 7, 891–902. doi:10.1038/nprot.2012.037

Chan, C., F. Feng, M. West and T. B. Kepler (2008): “Statistical mixture modelling for cell subtype identification in flow cytometry,” Cytometry A, 73, 693–701. doi:10.1002/cyto.a.20583

Cron, A. J. and M. West (2011): “Efficient classification-based relabeling in mixture models,” Am. Stat., 65, 16–20.

Escobar, M. D. and M. West (1995): “Bayesian density estimation and inference using mixtures,” J. Am. Stat. Assoc., 90, 577–588.

Feyerabend, S., S. Stevanovic, C. Gouttefangeas, D. Wernet, J. Hennenlotter, J. Bedke, K. Dietz, S. Pascolo, M. Kuczyk, H. G. Rammensee and A. Stenzl (2009): “Novel multi-peptide vaccination in HLA-A2+ hormone sensitive patients with biochemical relapse of prostate cancer,” Prostate, 69, 917–927. doi:10.1002/pros.20941

Finak, G., A. Bashashati, R. Brinkman and R. Gottardo (2009): “Merging mixture components for cell population identification in flow cytometry,” Adv. Bioinform., Article ID 247646.

Frelinger, J., J. Ottinger, C. Gouttefangeas and C. Chan (2010): “Modeling flow cytometry data for cancer vaccine immune monitoring,” Cancer Immunol. Immun., 59, 1435–1441.

Hadrup, S. R., A. H. Bakker, C. J. Shu, R. S. Andersen, J. van Veluw, P. Hombrink, E. Castermans, P. thor Straten, C. Blank, J. B. Haanen, M. H. Heemskerk and T. N. Schumacher (2009): “Parallel detection of antigen-specific T-cell responses by multidimensional encoding of MHC multimers,” Nat. Methods, 6, 520–528.

Hadrup, S. R. and T. N. Schumacher (2010): “MHC-based detection of antigen-specific CD8+ T cell responses,” Cancer Immunol. Immun., 59, 1425–1433.

Ishwaran, H. and L. F. James (2001): “Gibbs sampling methods for stick-breaking priors,” J. Am. Stat. Assoc., 96, 161–173.

Ji, C., D. Merl, T. B. Kepler and M. West (2009): “Spatial mixture modelling for unobserved point processes: application to immunofluorescence histology,” Bayesian Analysis, 4, 297–316. doi:10.1214/09-BA411

Lo, K., R. R. Brinkman and R. Gottardo (2008): “Automated gating of flow cytometry data via robust model-based clustering,” Cytometry A, 73, 321–332. doi:10.1002/cyto.a.20531

Manolopoulou, I., C. Chan and M. West (2010): “Selection sampling from large datasets for targeted inference in mixture modeling (with discussion),” Bayesian Analysis, 5, 429–450. PMC2943396. doi:10.1214/10-BA517

Newell, E. W., L. O. Klein, W. Yu and M. M. Davis (2009): “Simultaneous detection of many T-cell specificities using combinatorial tetramer staining,” Nat. Methods, 6, 497–499.

Pyne, S., X. Hu, K. Wang, E. Rossin, T. Lin, L. M. Maier, C. Baecher-Allan, G. J. McLachlan, P. Tamayo, D. A. Hafler, P. L. DeJager and J. P. Mesirov (2009): “Automated high-dimensional flow cytometric data analysis,” Proc. Natl. Acad. Sci., 106, 8519.

Suchard, M. A., Q. Wang, C. Chan, J. Frelinger, A. J. Cron and M. West (2010): “Understanding GPU programming for statistical computation: studies in massively parallel massive mixtures,” J. Comput. Graph. Stat., 19, 419–438.

Teh, Y. W., M. I. Jordan, M. J. Beal and D. M. Blei (2006): “Hierarchical Dirichlet processes,” J. Am. Stat. Assoc., 101, 1566–1581.

West, M., P. Müller and M. D. Escobar (1994): Hierarchical priors and mixture models, with application in regression and density estimation. In: Smith, A. F. M., Freeman, P. R. (Eds.), Aspects of Uncertainty: A Tribute to D. V. Lindley. London: Wiley, pp. 363–386.

Published Online: 2013-04-24
Published in Print: 2013-06-01

©2013 by Walter de Gruyter Berlin Boston

Downloaded on 12.3.2026 from https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2012-0001/html