A Bayesian decision procedure for testing multiple hypotheses in DNA microarray experiments

Miguel A. Gómez-Villegas; Isabel Salazar; Luis Sanz

doi:10.1515/sagmb-2012-0076

Published/Copyright: December 6, 2013

From the journal Statistical Applications in Genetics and Molecular Biology Volume 13 Issue 1

Abstract

DNA microarray experiments require multiple hypothesis testing procedures because thousands of hypotheses are tested simultaneously. We address this problem from a Bayesian decision theory perspective. We propose a decision criterion based on an estimate of the number of false null hypotheses (FNH), taking as the error measure the ratio of the posterior expected number of false positives to the estimated number of true null hypotheses. The methodology is applied to a Gaussian model for testing two-sided hypotheses. The procedure is illustrated with both simulated and real data examples, and the results are compared with those obtained by the Bayes rule when an additive loss function is considered for each joint action and the generalized 0–1 loss function for each individual action. Our procedure significantly reduces the percentage of false negatives, while the percentage of false positives remains at an acceptable level.

Keywords: Bayes rule; Bayesian decision; multiple hypothesis; posterior expected loss; false positives; false negatives

Corresponding author: Miguel A. Gómez-Villegas, Dpto. de Estadística e I.O., Facultad de Ciencias Matemáticas, Plaza de las Ciencias, 3, Universidad Complutense de Madrid, 28040–Madrid, Spain, e-mail: villegas@ucm.es

This work was supported by grants MTM 2008-03282/MTM and GR 58/08. The authors gratefully acknowledge the very constructive comments and suggestions of two anonymous referees, which helped to improve the quality of the paper.

Appendix A

Proof of Proposition 1

First we show that, for fixed values of the costs for false negatives, C₀_i, i=1, …, N, the posterior expected loss of the Bayes rule that rejects k hypotheses is a function of the cutoff p_k or, equivalently, of k, the number of rejected null hypotheses.

In fact, setting p_k=C₀_i/(C₀_i+C₁_i), we can write as follows,

where is the posterior Bayes action by which the kth null hypothesis will be rejected. Then, for fixed values of C₀_i, i=1, …, N, the posterior expected loss for the Bayes rule, is a function of the cutoff p_k or equivalently a function of k, the number of rejected null hypotheses.

Finally, we show that the posterior expected loss is a decreasing function of k.

Let k₁ and k₂ be such that k₁<k₂; then and and

Appendix B

In fact, if C₀_i=C for i=1, …, N, then the second term of (19) can be written as

where p=Pr(H₀_i=0|p) is the prior probability that each null hypothesis is true. Then, substituting (20) in (19), we immediately obtain (13).

Appendix C

Matlab code for simulated data

1. Matlab code for the function M-file that defines the function to maximize:

1 function EB = EB (c,t,al,bet,a,b,m,N)
2 p= ;phi= ;mu=zeros(1,N);EB=0;burnin= ; iters= ; % initial values and chain lengths to be supplied
3 for iter=1:burnin+iters
4 prob0=p*exp(-phi*(t.^2)/2); % weight of the null component
5 prob1=prob0+(1-p)*exp(-phi*((t-mu).^2)/2); % total mixture weight
6 uni=rand(1,N).*prob1;z=zeros(1,N);z=uni>prob0; % z=1 assigns the hypothesis to the alternative
7 N1=sum(z);N0=N-N1;I1=find(z);
8 p=betarnd(al+N0,bet+N1);
9 bast=b+sum(t(z==0).^2)+sum((t(I1)-mu(I1)).^2)+sum(c*(mu-m).^2);
10 phi=gamrnd((a+2*N)/2,2/bast);
11 mu(I1)=normrnd((c*m(I1)+t(I1))./(c+1),1./sqrt((c+1)*phi));
12 mu(z==0)=normrnd(m(z==0),1./sqrt(c*phi));
13 if iter>burnin;EB=EB+prod((p*(phi^0.5)*exp(-phi*(t.^2)/2))+((1- ...
14 p)*(phi^0.5)*exp(-phi*((t-mu).^2)/2)));end
15 end
16 EB=-EB/iters; % negated so that a minimizer such as fminbnd maximizes the average
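Lines 4 to 6 of the function draw each indicator z(i) by scaling a uniform variate by the total mixture weight, so that z(i)=1 with probability equal to the alternative component's share of that weight. The following minimal sketch isolates that step for a single statistic and compares the empirical frequency with the closed-form probability. The numeric values of p, phi, t and mu, the seed, and the names probAlt and nDraws are illustrative assumptions, not values from the paper.

```matlab
% Illustrative values (assumptions, not taken from the paper)
p   = 0.8;    % current draw of the proportion of true nulls
phi = 2.0;    % current draw of the precision
t   = 1.5;    % observed statistic for one hypothesis
mu  = 1.0;    % current draw of the alternative mean for that hypothesis

prob0 = p*exp(-phi*(t^2)/2);                    % weight of the null component
prob1 = prob0 + (1-p)*exp(-phi*((t-mu)^2)/2);   % total mixture weight

% Closed-form probability of assigning the hypothesis to the alternative
probAlt = 1 - prob0/prob1;

% Same draw as in lines 4 to 6, repeated many times
rng(1);                        % fixed seed so the run is repeatable
nDraws = 1e5;
uni = rand(1,nDraws).*prob1;
z = uni > prob0;

fprintf('closed form: %.4f, empirical: %.4f\n', probAlt, mean(z));
```

The two printed numbers should agree up to Monte Carlo error, which is the property the sampler relies on when it accumulates z into Ez.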

2. Create an M-file that obtains the maximum of the function EB using the Matlab function fminbnd.
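A possible shape for the M-file of step 2 is sketched below. To keep the sketch runnable on its own, a smooth stand-in function (negEBstub, hypothetical) takes the place of EB; in practice it would be replaced by a handle to the EB function of step 1 once its blank initial values are filled in and t, al, bet, a, b, m and N are defined. The search interval [cLow, cHigh] is also an assumption. Because EB returns the negative of the Monte Carlo average, minimizing it with fminbnd maximizes that average.

```matlab
% Stand-in for EB (hypothetical): minus a smooth function that peaks at c = 2.
% With the real function, use instead:  objective = @(c) EB(c,t,al,bet,a,b,m,N);
negEBstub = @(c) -exp(-(c-2).^2);
objective = negEBstub;

% Hypothetical search interval for c (an assumption, not from the paper)
cLow  = 0.01;
cHigh = 10;

% fminbnd minimizes over [cLow, cHigh]; the objective is already negated
[cHat, negVal] = fminbnd(objective, cLow, cHigh);

fprintf('c maximizing the objective: %.4f (maximum %.4f)\n', cHat, -negVal);
```

Since EB is evaluated by simulation, its value for a given c changes from call to call unless the random number generator is reset to the same state at the start of each evaluation; the source does not state how this was handled.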

3. Given the previously estimated value of c, the following statements estimate the remaining parameters and the measures used in the approach.

17 clear;seed=;rand('state',seed);randn('state',seed);
18 N=;n=;sig=;ptrue=;phitrue=n*(sig)^-2;mutrue=linspace( , ,N);
19 al=;bet=;a=;b=;m=zeros(1,N);c= *ones(1,N);
20 meds=mutrue;uni=rand(1,N);ztrue=uni>ptrue;meds(uni<ptrue)=0; % true nulls get mean 0
21 for i=1:n;x(i,:)=normrnd(meds,sig);end
22 t=mean(x);s=std(x);
23 p=; phi=; mu=zeros(1,N);
24 Ez=zeros(1,N);Emu1=zeros(1,N);Ep=0;Ephi=0;
25 fid = fopen('pphi.txt','wt');fid2 = fopen('tzmu.txt','wt');
26 fid3 = fopen('mpphi.txt','wt');burnin=;iters=;

27 for iter=1:burnin+iters
28 prob0=p*exp(-phi*(t.^2)/2); % weight of the null component
29 prob1=prob0+(1-p)*exp(-phi*((t-mu).^2)/2); % total mixture weight
30 uni=rand(1,N).*prob1;z=zeros(1,N);z=uni>prob0; % z=1 assigns the hypothesis to the alternative
31 if iter>burnin;Ez=Ez+z;end
32 N1=sum(z);N0=N-N1;I1=find(z);p=betarnd(al+N0,bet+N1);
33 if iter>burnin;Ep=Ep+p;mp=Ep/(iter-burnin);end % running mean of p
34 bast=b+sum(t(z==0).^2)+sum((t(I1)-mu(I1)).^2)+sum(c.*(mu-m).^2);
35 phi=gamrnd((a+2*N)/2,2/bast);
36 if iter>burnin;Ephi=Ephi+phi;mphi=Ephi/(iter-burnin);end % running mean of phi
37 if iter>burnin;fprintf(fid3,'%6.4f %6.4f\n',[mp; mphi]);end
38 mu(I1)=normrnd((c(I1).*m(I1)+t(I1))./(c(I1)+1),1./sqrt((c(I1)+1)*phi));
39 mu(z==0)=normrnd(m(z==0),1./sqrt(c(z==0)*phi));
40 if iter>burnin;Emu1(I1)=Emu1(I1)+mu(I1);end
41 if iter>burnin;fprintf(fid,'%6.4f %6.4f\n',[p; phi]);end
42 end

43 Emu1=Emu1./Ez;Ez=Ez/iters;load pphi.txt;load mpphi.txt
44 pest=[ 1-sum(ztrue)/N ptrue mean(pphi(:,1))]
45 phiest=[phitrue mean(pphi(:,2))]
46 zest=[ztrue;Ez]';muest=[meds;Emu1]';tzmu=[t; ztrue;Ez;meds;Emu1]';
47 fprintf(fid2,'%6.4f %6.4f %6.4f %6.4f %6.4f\n',[t;ztrue;Ez;meds;Emu1]);
48 fclose('all');
49 subplot(2,2,1),plot(pphi(:,1)),title('p')
50 subplot(2,2,2),plot(pphi(:,2)),title('phi')
51 subplot(2,2,3),plot(mpphi(:,1)),title('mp')
52 subplot(2,2,4),plot(mpphi(:,2)),title('mphi')
53 RB=(sum(Ez>.5)*100)/N % percentage of hypotheses with Ez>0.5
54 R1=round(N*(1-mean(pphi(:,1)))); % estimated number of false null hypotheses
55 N1=(R1*100)/N
56 EzO=sort(Ez,'descend');pN1=1-EzO(R1)
57 V=cumsum(1-EzO);F=cumsum(EzO);T=sum(Ez)-F;FPr=V./V(N);
58 FNr=T./sum(Ez);R=(1:1:N);I=V./R;II=T./(N-R);
59 for i=1:N-1;EI(i)=I(i);EII(i)=II(i);end
60 rFDR=[EI,V(N)/N];rFNR=[EII,0];
61 FPrB=FPr(round((RB*N)/100)) % round guards against a non-integer index from floating-point error
62 FNrB=FNr(round((RB*N)/100))
63 rFDRB=rFDR(round((RB*N)/100))
64 rFNRB=rFNR(round((RB*N)/100))
65 FPrFNH=FPr(R1)
66 FNrFNH= FNr(R1)
67 rFDRFNH=rFDR(R1)
68 rFNRFNH=rFNR(R1)

For real data, replace the following lines:
line 17 with: clear;x = xlsread('File name.xls'); % this file must contain a column with the mean differences
line 18 with: N= ;
lines 20, 21 and 22 with: t=x';
line 44 with: pest=mean(pphi(:,1))
line 45 with: phiest=mean(pphi(:,2))
line 46 with: tzmu=[t;Ez;Emu1]';
line 47 with: fprintf(fid2,'%6.4f %6.4f %6.4f\n',[t;Ez;Emu1]);
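The two ways of choosing how many hypotheses to reject in lines 53 to 57 of the script can be isolated in a small example. The sketch below uses made-up posterior summaries for eight hypotheses (the values of Ez and pbar, and the names rejectBayes, rejectFNH and order, are hypothetical): the first rule rejects every hypothesis with Ez above 0.5, while the FNH criterion first estimates the number of false nulls as round(N*(1-pbar)) and then rejects that many hypotheses with the largest Ez.

```matlab
% Hypothetical posterior summaries for N = 8 hypotheses (not from the paper)
Ez   = [0.95 0.10 0.45 0.70 0.05 0.40 0.30 0.02];  % posterior probability that each null is false
pbar = 0.55;                                        % posterior mean of p
N    = numel(Ez);

% Rule behind RB in line 53: reject when Ez exceeds 0.5
rejectBayes = Ez > 0.5;

% FNH criterion (lines 54 and 56): estimate the number of false nulls,
% then reject that many hypotheses with the largest Ez
R1 = round(N*(1-pbar));
[EzO, order] = sort(Ez,'descend');
rejectFNH = false(1,N);
rejectFNH(order(1:R1)) = true;

% Posterior expected false positives among the R1 rejections, relative to
% the posterior expected number of true nulls (FPr in line 57)
V = cumsum(1-EzO);
FPrFNH = V(R1)/V(N);

fprintf('Ez > 0.5 rejects %d, FNH criterion rejects %d\n', sum(rejectBayes), sum(rejectFNH));
fprintf('implied cutoff pN1 = %.2f, FPr = %.4f\n', 1-EzO(R1), FPrFNH);
```

With these inputs the FNH criterion rejects more hypotheses than the 0.5 threshold, which is the mechanism by which it trades a higher expected number of false positives for fewer false negatives.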

Published Online: 2013-12-06

Published in Print: 2014-02-01
