Abstract
DNA microarray experiments require multiple hypothesis testing procedures because thousands of hypotheses are tested simultaneously. We address this problem from a Bayesian decision theory perspective. We propose a decision criterion based on an estimate of the number of false null hypotheses (FNH), taking as an error measure the ratio of the posterior expected number of false positives to the estimated number of true null hypotheses. The methodology is applied to a Gaussian model for testing two-sided hypotheses. The procedure is illustrated with both simulated and real data examples, and the results are compared with those obtained by the Bayes rule when an additive loss function is considered for each joint action and the generalized 0–1 loss function for each individual action. Our procedure significantly reduces the percentage of false negatives, whereas the percentage of false positives remains at an acceptable level.
This work was supported by grants MTM 2008-03282/MTM and GR 58/08. The authors gratefully acknowledge the very constructive comments and suggestions of two anonymous referees, which have helped to improve the quality of the paper.
Appendix A
Proof of Proposition 1
First we show that for fixed values of the costs for false negatives, C0i, i=1, …, N, the posterior expected loss for the Bayes rule that rejects k hypotheses,
is a function of the cutoff pk or equivalently a function of k, the number of rejected null hypotheses.
In fact, setting pk=C0i/(C0i+C1i), we can write
as follows,
where
is the posterior Bayes action by which the kth null hypothesis will be rejected. Then, for fixed values of C0i, i=1, …, N, the posterior expected loss for the Bayes rule,
is a function of the cutoff pk or equivalently a function of k, the number of rejected null hypotheses.
Finally we show that
is a decreasing function of k.
Let k1 and k2 be such that k1<k2; then
and
and

Appendix B
In fact, if C0i=C for i=1, …, N, then the second term of (19) can be written as
where p=Pr(H0i=0|p) is the prior probability that each null hypothesis is true. Then, substituting (20) in (19), we immediately obtain (13).
Appendix C
Matlab code for simulated data
1. Matlab code for the function M-file that defines the function to maximize:
1 function EB = EB (c,t,al,bet,a,b,m,N)
2 p= ;phi= ;mu=zeros(1,N);EB=0;burnin= ; iters= ;
3 for iter=1:burnin+iters
4 prob0=p*exp(-phi*(t.^2)/2);
5 prob1=prob0+(1-p)*exp(-phi*((t-mu).^2)/2);
6 uni=rand(1,N).*prob1;z=zeros(1,N);z=uni>prob0;
7 N1=sum(z);N0=N-N1;I1=find(z);
8 p=betarnd(al+N0,bet+N1);
9 bast=b+sum(t(z==0).^2)+sum((t(I1)-mu(I1)).^2)+sum(c*(mu-m).^2);
10 phi=gamrnd((a+2*N)/2,2/bast);
11 mu(I1)=normrnd((c*m(I1)+t(I1))./(c+1),1./sqrt((c+1)*phi));
12 mu(z==0)=normrnd(m(z==0),1./sqrt(c*phi));
13 if iter>burnin;EB=EB+prod((p*(phi^0.5)*exp(-phi*(t.^2)/2))+((1- ...
14 p)*(phi^0.5)*exp(-phi*((t-mu).^2)/2)));end
15 end
16 EB=-EB/iters
2. Create an M-file to obtain the maximum of the function EB using the function “fminbnd”.
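Step 2 amounts to a one-dimensional bounded minimization. Because EB returns the negative of the Monte Carlo average (line 16), minimizing it with fminbnd maximizes the underlying quantity. The sketch below is illustrative only: the objective is a hypothetical stand-in with a known minimum, and the search interval [cLower, cUpper] is an assumption, since the bounds actually used are not stated here.

```matlab
% Hypothetical stand-in objective with its minimum at c = 2. With the blanks in
% the EB M-file filled in, one would instead pass
%   @(c) EB(c, t, al, bet, a, b, m, N)
negObjective = @(c) (c - 2).^2 - 1;

cLower = 0.01;   % assumed lower bound of the search interval for c
cUpper = 10;     % assumed upper bound of the search interval for c

% fminbnd searches for a minimizer of a scalar function on [cLower, cUpper].
[cHat, fval] = fminbnd(negObjective, cLower, cUpper);
fprintf('estimated c = %.4f, objective = %.4f\n', cHat, fval);
```

Since EB is evaluated by simulation, its value is noisy. Resetting the random seed inside the objective before each evaluation is one way to keep the search stable, although nothing here indicates whether the authors did so.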
3. Given the previously estimated value of c, the following statements estimate the remaining parameters and the measures used in the approach.
17 clear;seed=;rand('state',seed);randn('state',seed);
18 N=;n=;sig=;ptrue=;phitrue=n*(sig)^-2;mutrue=linspace( , ,N);
19 al=;bet=;a=;b=;m=zeros(1,N);c= *ones(1,N);
20 meds=mutrue;uni=rand(1,N);ztrue=uni>ptrue;meds(uni<ptrue)=0;
21 for i=1:n;x(i,:)=normrnd(meds,sig);end
22 t=mean(x);s=std(x);
23 p=; phi=; mu=zeros(1,N);
24 Ez=zeros(1,N);Emu1=zeros(1,N);Ep=0;Ephi=0;
25 fid = fopen('pphi.txt','wt');fid2 = fopen('tzmu.txt','wt');
26 fid3 = fopen('mpphi.txt','wt');burnin=;iters=;
27 for iter=1:burnin+iters
28 prob0=p*exp(-phi*(t.^2)/2);
29 prob1=prob0+(1-p)*exp(-phi*((t-mu).^2)/2);
30 uni=rand(1,N).*prob1;z=zeros(1,N);z=uni>prob0;
31 if iter>burnin;Ez=Ez+z;end
32 N1=sum(z);N0=N-N1;I1=find(z);p=betarnd(al+N0,bet+N1);
33 if iter>burnin;Ep=Ep+p;mp=Ep/(iter-burnin);end
34 bast=b+sum(t(z==0).^2)+sum((t(I1)-mu(I1)).^2)+sum(c.*(mu-m).^2);
35 phi=gamrnd((a+2*N)/2,2/bast);
36 if iter>burnin;Ephi=Ephi+phi;mphi=Ephi/(iter-burnin);end
37 if iter>burnin;fprintf(fid3,'%6.4f %6.4f\n',[mp; mphi]);end
38 mu(I1)=normrnd((c(I1).*m(I1)+t(I1))./(c(I1)+1),1./sqrt((c(I1)+1)*phi));
39 mu(z==0)=normrnd(m(z==0),1./sqrt(c(z==0)*phi));
40 if iter>burnin;Emu1(I1)=Emu1(I1)+mu(I1);end
41 if iter>burnin;fprintf(fid,'%6.4f %6.4f\n',[p; phi]);end
42 end
43 Emu1=Emu1./Ez;Ez=Ez/iters;load pphi.txt;load mpphi.txt
44 pest=[ 1-sum(ztrue)/N ptrue mean(pphi(:,1))]
45 phiest=[phitrue mean(pphi(:,2))]
46 zest=[ztrue;Ez]';muest=[meds;Emu1]';tzmu=[t; ztrue;Ez;meds;Emu1]';
47 fprintf(fid2,'%6.4f %6.4f %6.4f %6.4f %6.4f\n',[t;ztrue;Ez;meds;Emu1]);
48 fclose('all');
49 subplot(2,2,1),plot(pphi(:,1)),title('p')
50 subplot(2,2,2),plot(pphi(:,2)),title('phi')
51 subplot(2,2,3),plot(mpphi(:,1)),title('mp')
52 subplot(2,2,4),plot(mpphi(:,2)),title('mphi')
53 RB=(sum(Ez>.5)*100)/N
54 R1=round(N*(1-mean(pphi(:,1))));
55 N1=(R1*100)/N
56 EzO=sort(Ez,'descend');pN1=1-EzO(R1)
57 V=cumsum(1-EzO);F=cumsum(EzO);T=sum(Ez)-F;FPr=V./V(N);
58 FNr=T./sum(Ez);R=(1:1:N);I=V./R;II=T./(N-R);
59 for i=1:N-1;EI(i)=I(i);EII(i)=II(i);end
60 rFDR=[EI,V(N)/N];rFNR=[EII,0];
61 FPrB=FPr((RB*N)/100)
62 FNrB=FNr((RB*N)/100)
63 rFDRB=rFDR((RB*N)/100)
64 rFNRB=rFNR((RB*N)/100)
65 FPrFNH=FPr(R1)
66 FNrFNH= FNr(R1)
67 rFDRFNH=rFDR(R1)
68 rFNRFNH=rFNR(R1)
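The decision step in lines 53 to 68 can be illustrated on a small example. The sketch below uses eight made-up posterior probabilities Ez that each null hypothesis is false and an assumed posterior mean pHat of p; neither comes from the paper's data. It compares the number of rejections under the Bayes rule with 0–1 loss (Ez > 0.5) with the number under the FNH criterion (reject the R1 hypotheses with the largest Ez), and evaluates the expected false positive and false negative proportions at both cutoffs with the same formulas as the listing. Unlike the listing, RB is kept here as a count rather than a percentage.

```matlab
% Hypothetical inputs (not from the paper): posterior probabilities that each
% null hypothesis is false, and an assumed posterior mean of p.
Ez   = [0.95 0.10 0.60 0.40 0.85 0.05 0.30 0.70];
pHat = 0.4;
N    = numel(Ez);

RB = sum(Ez > 0.5);            % rejections under the Bayes rule with 0-1 loss
R1 = round(N * (1 - pHat));    % estimated number of false null hypotheses

EzO = sort(Ez, 'descend');     % most likely false nulls first
V   = cumsum(1 - EzO);         % expected false positives after k rejections
T   = sum(Ez) - cumsum(EzO);   % expected false negatives after k rejections
FPr = V ./ V(N);               % relative to the expected number of true nulls
FNr = T ./ sum(Ez);            % relative to the expected number of false nulls

fprintf('Bayes rule: %d rejections, FPr = %.4f, FNr = %.4f\n', RB, FPr(RB), FNr(RB));
fprintf('FNH rule:   %d rejections, FPr = %.4f, FNr = %.4f\n', R1, FPr(R1), FNr(R1));
```

With these inputs the FNH criterion rejects one more hypothesis than the Bayes rule, accepting a higher expected false positive proportion in exchange for a lower expected false negative proportion.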
For real data, replace the following lines:
line 17 with: clear;x = xlsread('File name.xls'); %This file must contain a column with mean differences
line 18 with: N= ;
lines 20, 21 and 22 with: t=x';
line 44 with: pest=mean(pphi(:,1))
line 45 with: phiest=mean(pphi(:,2))
line 46 with: tzmu=[t;Ez;Emu1]';
line 47 with: fprintf(fid2,'%6.4f %6.4f %6.4f\n',[t;Ez;Emu1]);
©2014 by Walter de Gruyter Berlin Boston