A comparison of some confidence intervals for a binomial proportion based on a shrinkage estimator

Félix Almendra-Arao; Hortensia Reyes-Cervantes; Marcos Morales-Cortés

doi:10.1515/math-2022-0588


Published/Copyright: June 6, 2023


From the journal Open Mathematics Volume 21 Issue 1

Abstract

Confidence intervals are valuable tools in statistical practice for estimating binomial proportions, with the best known being the Wald and Clopper-Pearson intervals. However, it is known that these intervals perform poorly in terms of coverage probability and expected mean length, leading to the proposal of alternative intervals in the literature, although these may also have deficiencies. In this work, we investigate the performance of several of these confidence intervals using the parametric family $\hat{p}_c = \frac{X + c}{n + 2c}$ with $c \ge 0$ to estimate the parameter $p$. Rather than using the confidence intervals approach, this analysis is done from the hypothesis tests approach. Our primary goal with this work is to identify values of $c$ that result in better-performing tests and to establish an optimal procedure.

Keywords: confidence interval; binomial proportion; coverage probability; shrinkage estimator

MSC 2010: 62F25

1 Introduction

When it comes to estimating an unknown population parameter, there are two options: point estimation and interval estimation. Point estimation involves using sample data to determine a specific value for estimating the population parameter. In contrast, interval estimation provides a range of values that is likely to contain the population parameter, along with information on how likely it is that the parameter is within this range. This likelihood is expressed as the confidence level or confidence coefficient, typically denoted by 1 − α . It is worth noting that a confidence interval is a random interval.

A binomial experiment is an experiment fulfilling the following properties:

The experiment consists of n repeated trials;
Each trial results in one of the two possible outcomes, which are called success and failure;
The probability of a success, denoted by p , remains constant from trial to trial;
The n trials are independent. That is, the outcome of any trial does not affect the outcome of the others.

Let $X$ be the number of successes in $n$ trials of a binomial experiment; then $X$ is a binomial random variable.

The probability mass function of $X$ is given by $b(x; n, p) = P(X = x) = \binom{n}{x} p^x q^{n-x}$, with $q = 1 - p$ and $x \in \{0, 1, \dots, n\}$, and is called the binomial distribution.
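As a small illustration of the probability mass function above, the following sketch evaluates it with the Python standard library. The function name `binom_pmf` and the values `n = 10`, `p = 0.3` are chosen here for illustration only and do not come from the article.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Bin(n, p), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q ** (n - x)


# The probabilities over x = 0, ..., n should sum to 1 (up to rounding).
n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(f"sum of pmf over 0..{n}: {total:.12f}")
```

The same enumeration over $x = 0, \dots, n$ is what makes exact (non-simulated) computation of coverage, size, and power feasible for the sample sizes considered later.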

The binomial distribution is widely applicable across almost all scientific fields. It is therefore essential to understand appropriate procedures for estimating the parameter p .

In this work, our focus is on computing confidence intervals for the binomial proportion p . Several methods exist for computing these confidence intervals, which have resulted in various types of confidence intervals.

One prominent method is the Clopper-Pearson method, which is an exact method based on the cumulative probabilities of the binomial distribution, see [1], for example. However, it is widely recognized that this method is overly conservative, as reported in [2–4]. For this reason, we will not be investigating this method in our work.

There are several methods for estimating the binomial proportion using approximate distributions, known as approximate methods. The most commonly used approximate methods are the Wald, Wilson score, Agresti-Coull, and arcsine methods. The Agresti-Coull method is a special case of the adjusted Wald method considered below, obtained with $c = 2$, so it is also included in our analysis.

One important interval is the Jeffreys interval, which is constructed from a Bayesian perspective. However, due to its Bayesian approach, we will not cover it in this work.

The most popular of these intervals is the Wald interval; however, several investigations, including [2,3,5–9], have shown that this interval has serious problems when $p$ is near the extremes of the interval $[0, 1]$ or when $n$ is small. Other interesting investigations concerning confidence intervals for a proportion can be found in [10–17].

Interested readers can refer to [18] to have an overview of the great variety of works aimed at the study of confidence intervals for a binomial proportion.

While some recommended confidence intervals may approximate the nominal coverage probability for certain ranges of the p parameter and sample size, the search for an optimal confidence interval remains an ongoing challenge in the field.

An interval estimate of a real-valued parameter $\theta$ is any pair of functions $L(x_1, \dots, x_n)$ and $U(x_1, \dots, x_n)$ of a sample that satisfy $L(x) \le U(x)$ for all $x \in \Omega$. The random interval $[L(X), U(X)]$ is called an interval estimator. For an interval estimator $[L(X), U(X)]$ of a parameter $\theta$, the coverage probability is the probability that the random interval contains $\theta$. This can be expressed as $P_\theta(\theta \in [L(X), U(X)])$ or $P(\theta \in [L(X), U(X)] \mid \theta)$. The infimum $\inf_\theta P_\theta(\theta \in [L(X), U(X)])$ is called the confidence coefficient of $[L(X), U(X)]$. Interval estimators together with a measure of confidence are known as confidence intervals; more generally, one may speak of confidence sets, and a confidence set with confidence coefficient $1 - \alpha$ is called a $1 - \alpha$ confidence set, see [19].
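The coverage probability defined above can be computed exactly for a binomial sample by summing the probability mass function over the outcomes whose interval contains $p$. The sketch below does this for the Wald interval of Section 2.1; the function names and the choice $n = 30$ are illustrative assumptions, not taken from the article.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Wald interval [L(x), U(x)] for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def coverage(interval, n, p, alpha=0.05):
    """P_p(p in [L(X), U(X)]) for X ~ Bin(n, p), by exact enumeration."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, alpha)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


cov_mid = coverage(wald_interval, 30, 0.5)
cov_edge = coverage(wald_interval, 30, 0.01)
print(f"n=30, p=0.50: coverage = {cov_mid:.4f}")
print(f"n=30, p=0.01: coverage = {cov_edge:.4f}")
```

Comparing a central value of $p$ with one near the boundary gives a quick feel for how far the actual coverage can drift from the nominal $1 - \alpha$.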

Moreover, it is known that there exists a strong relationship between hypothesis testing and interval estimation. Specifically, every confidence set corresponds to a test, and vice versa.

The strategy for constructing the confidence set that corresponds to a test statistic involves inverting the test statistic, as detailed in [19].

Relying on this relationship, we will not compare confidence intervals directly but instead will compare the corresponding test statistics.

To estimate the binomial proportion $p$, [20] proposed using the shrinkage estimator $\frac{X + c_1}{n + c_2}$, where $c_1$ and $c_2$ are non-negative real numbers. The authors concluded that $c_2$ must necessarily be equal to $2c_1$.

The objective of this study is to investigate the effectiveness of various confidence intervals for a binomial proportion $p$. Rather than comparing the confidence intervals directly, we approach this study from the equivalent perspective of hypothesis tests.

In addition, in the corresponding statistical tests, we will use the parametric family $\hat{p}_c = \frac{X + c}{n + 2c}$ with $c \ge 0$ to estimate $p$.

2 Statistics under consideration

Let $X \sim \mathrm{Bin}(n, p)$, that is, a binomial random variable with parameters $n$ and $p$, and let $\hat{p} = X/n$ and $\hat{q} = 1 - \hat{p}$.

Several statistics can be used to test $H_0: p = p_0$ vs $H_1: p \ne p_0$, with $p_0 \in (0, 1)$.

In the following, we present the statistics that we are interested in comparing.

2.1 Wald statistic

$$T_W = \frac{\hat{p} - p_0}{\sqrt{\hat{p}\hat{q}/n}}.$$

The corresponding confidence interval is expressed as follows:

$$I_W = \left[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}\right].$$
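The link between the Wald statistic and the Wald interval can be checked numerically: the test rejects $H_0: p = p_0$ exactly when $p_0$ falls outside the interval. The sketch below is illustrative only. The function names are hypothetical, and the handling of a zero standard error (at $x = 0$ or $x = n$) is a convention assumed here, since the article does not state one.

```python
from math import copysign, inf, sqrt
from statistics import NormalDist


def wald_statistic(x, n, p0):
    """T_W = (p_hat - p0) / sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    if se == 0.0:
        # Assumed convention: zero standard error with p_hat != p0
        # is treated as an infinitely large statistic.
        return copysign(inf, p_hat - p0)
    return (p_hat - p0) / se


def wald_interval(x, n, alpha=0.05):
    """I_W = p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


z = NormalDist().inv_cdf(0.975)
n, p0 = 50, 0.3
agree = []
for x in range(n + 1):
    lo, hi = wald_interval(x, n)
    rejected = abs(wald_statistic(x, n, p0)) > z
    agree.append(rejected == (not lo <= p0 <= hi))
print("test and interval agree for every x:", all(agree))
```

This agreement is what justifies comparing test statistics in place of the intervals themselves.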

2.2 Wilson score statistic

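$$T_S = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}.$$

The corresponding confidence interval is expressed as follows:

$$I_S = \left[\frac{1}{n + z_{\alpha/2}^2}\left(X + \frac{z_{\alpha/2}^2}{2} - z_{\alpha/2}\, n^{1/2}\left(\hat{p}\hat{q} + \frac{z_{\alpha/2}^2}{4n}\right)^{1/2}\right),\ \frac{1}{n + z_{\alpha/2}^2}\left(X + \frac{z_{\alpha/2}^2}{2} + z_{\alpha/2}\, n^{1/2}\left(\hat{p}\hat{q} + \frac{z_{\alpha/2}^2}{4n}\right)^{1/2}\right)\right].$$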
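A quick way to sanity-check the Wilson score interval is to verify that its endpoints are exactly the values of $p_0$ at which $|T_S| = z_{\alpha/2}$. The following sketch does so; the function names and the example values $x = 15$, $n = 50$ are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def score_statistic(x, n, p0):
    """T_S = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)


def wilson_interval(x, n, alpha=0.05):
    """Wilson score interval, written as in the closed form above."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    q_hat = 1 - p_hat
    center = x + z * z / 2
    half = z * sqrt(n) * sqrt(p_hat * q_hat + z * z / (4 * n))
    denom = n + z * z
    return (center - half) / denom, (center + half) / denom


z = NormalDist().inv_cdf(0.975)
lo, hi = wilson_interval(15, 50)
print(f"Wilson interval for x=15, n=50: [{lo:.4f}, {hi:.4f}]")
print(f"T_S at lower endpoint: {score_statistic(15, 50, lo):.6f}")
print(f"T_S at upper endpoint: {score_statistic(15, 50, hi):.6f}")
```

Unlike the Wald interval, the standard error here is evaluated at $p_0$, so no division by zero occurs for $p_0 \in (0, 1)$.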

2.3 Arcsine statistic

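$$T_A = 2\sqrt{n}\left(\sin^{-1}\sqrt{\hat{p}} - \sin^{-1}\sqrt{p_0}\right).$$

The confidence interval is expressed as follows:

$$I_A = \left[\sin^2\left(\sin^{-1}\sqrt{\hat{p}} - \frac{z_{\alpha/2}}{2\sqrt{n}}\right),\ \sin^2\left(\sin^{-1}\sqrt{\hat{p}} + \frac{z_{\alpha/2}}{2\sqrt{n}}\right)\right].$$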
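The arcsine statistic and interval can be sketched in the same way. The code below implements the formulas as printed, without clamping the angle to $[0, \pi/2]$; the function names and the example values are illustrative assumptions.

```python
from math import asin, sin, sqrt
from statistics import NormalDist


def arcsine_statistic(x, n, p0):
    """T_A = 2 * sqrt(n) * (asin(sqrt(p_hat)) - asin(sqrt(p0)))."""
    return 2 * sqrt(n) * (asin(sqrt(x / n)) - asin(sqrt(p0)))


def arcsine_interval(x, n, alpha=0.05):
    """I_A = sin^2(asin(sqrt(p_hat)) -/+ z_{alpha/2} / (2 * sqrt(n)))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    angle = asin(sqrt(x / n))
    delta = z / (2 * sqrt(n))
    # No clamping of angle -/+ delta is applied, as in the printed formula.
    return sin(angle - delta) ** 2, sin(angle + delta) ** 2


z = NormalDist().inv_cdf(0.975)
lo, hi = arcsine_interval(25, 50)
print(f"arcsine interval for x=25, n=50: [{lo:.4f}, {hi:.4f}]")
print(f"T_A at lower endpoint: {arcsine_statistic(25, 50, lo):.6f}")
```

For outcomes near $0$ or $n$ the shifted angle can leave $[0, \pi/2]$, where $\sin^2$ is no longer monotone, so an implementation meant for practical use would need to decide how to treat those cases.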

2.4 Confidence intervals to compare

Because the classic and simple Wald procedure, like many alternative procedures, performs poorly, several authors have endeavored to enhance it while retaining its simplicity. For instance, references such as [2,17,20] provide examples of these efforts.

Several improvements to the Wald procedure have been proposed, including correction for continuity and the use of the procedure only when certain criteria are satisfied. However, research has shown that some of the most commonly suggested improvements do not offer adequate performance and continue to exhibit unfavorable behavior (see [3,8]).

In the test statistics for the Wald, Wilson score, and arcsine procedures, we will replace $X$, $n$, $\hat{p}$, and $1 - \hat{p}$ by $X + c$, $n + 2c$, $\hat{p}_c$, and $1 - \hat{p}_c$, respectively, where $\hat{p}_c = \frac{X + c}{n + 2c}$ with $c \ge 0$.

On the basis of these modifications, we construct the adjusted procedures; the confidence intervals will be denoted by $I_{W_c}$, $I_{S_c}$, and $I_{A_c}$, and the corresponding statistical tests will be denoted as $T_{W_c}$, $T_{S_c}$, and $T_{A_c}$.

In addition, in [20], the authors suggest using the statistic

$$T_{B_c} = \frac{\hat{p}_c - p_0}{\sqrt{n\hat{p}_c(1 - \hat{p}_c)}\,/\,(n + 2c)}.$$

The corresponding confidence interval is expressed as follows:

$$I_{B_c} = \left[\hat{p}_c - z_{\alpha/2}\frac{\sqrt{n\hat{p}_c(1 - \hat{p}_c)}}{n + 2c},\ \hat{p}_c + z_{\alpha/2}\frac{\sqrt{n\hat{p}_c(1 - \hat{p}_c)}}{n + 2c}\right].$$
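To make the adjusted procedures concrete, the sketch below computes the shrinkage estimate, the adjusted Wald statistic (obtained by substituting $X + c$ and $n + 2c$), and the statistic suggested in [20]. It reads the standard error of the latter as $\sqrt{n\hat{p}_c(1 - \hat{p}_c)}/(n + 2c)$, which is an interpretation of the printed formula; all function names are hypothetical.

```python
from math import copysign, inf, sqrt


def shrinkage_estimate(x, n, c):
    """p_hat_c = (x + c) / (n + 2c), c >= 0."""
    return (x + c) / (n + 2 * c)


def _ratio(num, se):
    # Assumed convention for a zero standard error (c = 0, x in {0, n}).
    return copysign(inf, num) if se == 0.0 else num / se


def wald_adjusted_statistic(x, n, p0, c):
    """Wald statistic with X -> X + c and n -> n + 2c."""
    p_c = shrinkage_estimate(x, n, c)
    se = sqrt(p_c * (1 - p_c) / (n + 2 * c))
    return _ratio(p_c - p0, se)


def tb_statistic(x, n, p0, c):
    """Statistic of [20], with SE read as sqrt(n p_c (1 - p_c)) / (n + 2c)."""
    p_c = shrinkage_estimate(x, n, c)
    se = sqrt(n * p_c * (1 - p_c)) / (n + 2 * c)
    return _ratio(p_c - p0, se)


x, n, p0, c = 12, 40, 0.25, 2.0
t_w = wald_adjusted_statistic(x, n, p0, c)
t_b = tb_statistic(x, n, p0, c)
print(f"p_hat_c = {shrinkage_estimate(x, n, c):.4f}")
print(f"T_Wc = {t_w:.4f}, T_Bc = {t_b:.4f}")
```

Under this reading the two statistics differ only by the factor $\sqrt{(n + 2c)/n}$, so they coincide at $c = 0$ and separate slowly as $c$ grows.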

In this work, we will compare the adjusted confidence intervals $I_{W_c}$, $I_{S_c}$, $I_{B_c}$, and $I_{A_c}$ for a wide range of $c$ values. Rather than using the confidence intervals approach, we will employ the hypothesis tests approach. Specifically, we will compare the statistical tests $T_{W_c}$, $T_{S_c}$, $T_{B_c}$, and $T_{A_c}$ by evaluating the test sizes and power of their corresponding test statistics.

The rejection region of a test statistic $T$ is given by

$$R_T = \{x : |z_T| > z_{\alpha/2}\},$$

where $z_T$ denotes the value of $T$ at $x$. The non-rejection region is given by

$$A_T = \{x : |z_T| \le z_{\alpha/2}\},$$

which can be described as follows:

$$A_T = \{\lceil x_1 \rceil, \dots, \lfloor x_2 \rfloor\},$$

where $x_1$ and $x_2$ depend on the test statistic under consideration and on $n$, $p_0$, $\alpha$, and $c$; moreover, $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ are the ceiling and floor functions, respectively. Since $X$ takes $n + 1$ possible values and $A_T$ contains $\lfloor x_2 \rfloor - \lceil x_1 \rceil + 1$ of them, the number of elements of $R_T$ is $\mathrm{Card}(R_T) = n + \lceil x_1 \rceil - \lfloor x_2 \rfloor$.
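The description of the non-rejection region as a run of consecutive integers, and the resulting count for the rejection region, can be checked by brute force. The sketch below uses the unadjusted Wilson score statistic with $n = 30$ and $p_0 = 0.4$; these values and the function names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def score_statistic(x, n, p0):
    """T_S = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)


def non_rejection_region(stat, n, p0, alpha=0.05):
    """All x in {0, ..., n} with |stat(x)| <= z_{alpha/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return [x for x in range(n + 1) if abs(stat(x, n, p0)) <= z]


n, p0 = 30, 0.4
A = non_rejection_region(score_statistic, n, p0)
R = [x for x in range(n + 1) if x not in A]

# Here ceil(x1) = min(A) and floor(x2) = max(A).
card_formula = n + min(A) - max(A)
print(f"A_T runs from {min(A)} to {max(A)}")
print(f"Card(R_T) by enumeration: {len(R)}, by formula: {card_formula}")
```

Enumeration avoids solving for $x_1$ and $x_2$ in closed form, which differs from statistic to statistic.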

The power function is given by

$$\beta_T(n, p) = \sum_{x \in R_T} \binom{n}{x} p^x (1 - p)^{n - x}.$$

To analyze the global performance of the power of a statistical test $T$, we will compute the mean power on $[0, 1]$:

$$\mathrm{MP}_T(n, p_0) = \int_0^1 \beta_T(n, p)\, dp.$$

Proposition 1

$$\mathrm{MP}_T(n, p_0) = \frac{n + \lceil x_1 \rceil - \lfloor x_2 \rfloor}{n + 1}.$$

The proof of this proposition is provided in the Appendix.
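Proposition 1 can be checked numerically: each term $\binom{n}{x} p^x (1 - p)^{n - x}$ integrates to $1/(n + 1)$ over $[0, 1]$, so the mean power reduces to $\mathrm{Card}(R_T)/(n + 1)$. The sketch below compares a Simpson-rule approximation of the integral with the closed form, for the unadjusted score test with $n = 30$ and $p_0 = 0.4$. These choices, the function names, and the number of subintervals are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def rejection_region(n, p0):
    """Rejection region of the score test at level alpha = 0.05."""
    se0 = sqrt(p0 * (1 - p0) / n)
    return [x for x in range(n + 1) if abs(x / n - p0) / se0 > Z]


def power(region, n, p):
    """beta_T(n, p): probability of the rejection region under p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in region)


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


n, p0 = 30, 0.4
R = rejection_region(n, p0)
numeric = simpson(lambda p: power(R, n, p), 0.0, 1.0)
closed_form = len(R) / (n + 1)
print(f"mean power, numeric:     {numeric:.8f}")
print(f"mean power, closed form: {closed_form:.8f}")
```

Because of this identity, maximizing mean power is the same as maximizing the number of points in the rejection region, which explains why mean power alone favors liberal tests.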

The test size is given by

$$\mathrm{TS}_T(n, p_0) = \sum_{x \in R_T} \binom{n}{x} p_0^x (1 - p_0)^{n - x}.$$
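The test size is the power function evaluated at $p = p_0$, so it can be computed by the same enumeration. The sketch below does this for the unadjusted score test; the function name `compute_test_size` and the example values are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def compute_test_size(n, p0, alpha=0.05):
    """Sum of Bin(n, p0) probabilities over the score-test rejection region."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se0 = sqrt(p0 * (1 - p0) / n)
    size = 0.0
    for x in range(n + 1):
        if abs(x / n - p0) / se0 > z:  # x lies in the rejection region R_T
            size += comb(n, x) * p0**x * (1 - p0) ** (n - x)
    return size


size = compute_test_size(30, 0.4)
print(f"score test size at n=30, p0=0.4: {size:.4f} (nominal 0.05)")
```

Because $X$ is discrete, the size generally cannot equal $\alpha$ exactly; it moves above and below the nominal level as $n$ and $p_0$ vary.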

Regarding test sizes, there are two possible criteria for determining desirable test behavior:

The test sizes approach the nominal level ( α ) but always through values less than or equal to α (conservative test).
The test sizes approach the nominal level regardless of whether it is through values greater or less than α .

In this study, we will take the stance that for a test, it is desirable to have a mean power close to 1 and for the test size to approach the nominal level, whether it is achieved through values greater or less than α .

3 Obtaining values of c to optimize power and test size

For each of the tests being considered, we will determine the value of c that yields the best performance in terms of test sizes and mean power.

For the analysis, we consider values of $c$ in the interval $[0, 4]$ of the form $i/100$ with $i \in \{0, 1, 2, \dots, 398, 399, 400\}$.

Let $D_3 = \{0.001, 0.002, \dots, 0.998, 0.999\}$. We consider the following three cases for $p_0$:

Case 1. Central values of $p_0$: $p_0 \in D_1 = D_3 \cap [0.2, 0.8]$.

Case 2. Extreme values of $p_0$: $p_0 \in D_2 = D_3 \cap ([0, 0.2) \cup (0.8, 1])$.

Case 3. All values of $p_0$: $p_0 \in D_3$.

In the following, for fixed $n$, we determine the value of $c$ for which the average test size is closest to $\alpha = 0.05$, for each of the three considered cases.

The average test size $\mathrm{ATS}_T(n)$ is obtained as follows:

Case $i$. $\mathrm{ATS}_T(n) = \frac{1}{\mathrm{Card}(D_i)} \sum_{p_0 \in D_i} \mathrm{TS}_T(n, p_0)$.

In addition, for fixed $n$, we determine the value of $c$ for which the average mean power is closest to 1, for each of the three cases being considered.

The average mean power $\mathrm{AMP}_T(n)$ is given by:

Case $i$. $\mathrm{AMP}_T(n) = \frac{1}{\mathrm{Card}(D_i)} \sum_{p_0 \in D_i} \mathrm{MP}_T(n, p_0)$.

The values obtained in this way will be considered optimal.

The entire analysis is for $\alpha = 0.05$ and sample sizes $n \in N = \{30, 31, \dots, 199, 200\}$.
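The search described in this section can be sketched as a grid search. The code below builds the sets $D_1$, $D_2$, and $D_3$, computes the average test size of the Wald-adjusted test for $n = 30$ over $D_1$, and picks the $c$ whose average is closest to $0.05$. To keep the run short it uses a coarse grid of $c$ values with step 0.25 rather than the step 0.01 used in the article, so its result is not expected to reproduce the tabulated values. The function names and the rejection convention for a zero standard error are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist().inv_cdf(1 - ALPHA / 2)

D3 = [i / 1000 for i in range(1, 1000)]
D1 = [p for p in D3 if 0.2 <= p <= 0.8]
D2 = [p for p in D3 if p < 0.2 or p > 0.8]


def rejects_wald_adjusted(x, n, p0, c):
    """True if the Wald-adjusted test rejects H0: p = p0 at outcome x."""
    m = n + 2 * c
    p_c = (x + c) / m
    var = p_c * (1 - p_c) / m
    if var <= 0.0:
        # Assumed convention: zero standard error counts as a rejection.
        return True
    return abs(p_c - p0) > Z * sqrt(var)


def size_at(n, p0, c):
    """Test size TS(n, p0) of the Wald-adjusted test for a given c."""
    return sum(
        comb(n, x) * p0**x * (1 - p0) ** (n - x)
        for x in range(n + 1)
        if rejects_wald_adjusted(x, n, p0, c)
    )


def average_test_size(n, grid, c):
    """ATS(n): mean of the test sizes over the p0 values in grid."""
    return sum(size_at(n, p0, c) for p0 in grid) / len(grid)


n = 30
c_grid = [i / 4 for i in range(17)]  # coarse grid 0, 0.25, ..., 4
ats = {c: average_test_size(n, D1, c) for c in c_grid}
best_c = min(c_grid, key=lambda c: abs(ats[c] - ALPHA))
print(f"n={n}, Case 1: best c on coarse grid = {best_c}, ATS = {ats[best_c]:.4f}")
```

Swapping `D1` for `D2` or `D3`, refining `c_grid`, and looping over $n \in N$ would follow the same pattern, at a correspondingly higher computational cost.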

3.1 Wald-adjusted procedure

3.1.1 Candidate values based on test sizes

In Table 1, we observe the following:

Case 1 ($p_0 \in D_1$). Candidate optimal values of $c$ are around 1.46; thus, 1.46 will be a candidate value of $c$ for $n \in N$.
Case 2 ($p_0 \in D_2$). There are two groups of candidate values; averaging the values in each group gives two candidate values: 0.58 and 2.76.
Case 3 ($p_0 \in D_3$). As in the previous case, we obtain two candidate values of $c$: 0.71 and 2.9.

Table 1

Optimal values of c based on the average test size for the Wald-adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	1.40	0.51	0.69	73	1.44	2.76	2.90	116	1.45	2.78	2.93	159	1.40	0.62	2.95
31	1.39	0.51	2.83	74	1.43	2.74	2.91	117	1.49	2.79	2.92	160	1.46	2.80	2.94
32	1.37	2.62	0.69	75	1.47	2.73	2.89	118	1.43	2.79	2.89	161	1.55	2.81	0.73
33	1.38	0.50	2.83	76	1.44	2.75	0.71	119	1.50	0.58	0.73	162	1.46	2.80	2.91
34	1.38	2.63	2.84	77	1.50	2.74	0.71	120	1.44	2.80	0.70	163	1.51	2.83	2.90
35	1.36	2.63	2.84	78	1.51	2.75	2.88	121	1.49	2.79	0.71	164	1.49	2.80	2.92
36	1.36	2.64	0.70	79	1.46	2.73	0.71	122	1.52	2.76	2.93	165	1.50	2.81	0.73
37	1.41	2.65	0.70	80	1.43	2.75	0.71	123	1.46	2.79	2.91	166	1.50	2.80	2.91
38	1.41	2.64	2.85	81	1.44	2.76	0.70	124	1.46	2.78	0.72	167	1.55	2.76	0.71
39	1.41	2.67	0.70	82	1.44	0.58	2.89	125	1.40	2.79	0.73	168	1.48	0.61	2.92
40	1.40	0.54	2.86	83	1.44	0.57	2.90	126	1.49	0.60	2.92	169	1.41	0.62	2.91
41	1.38	0.53	2.85	84	1.41	0.57	2.91	127	1.44	0.62	0.71	170	1.59	2.79	0.72
42	1.38	2.67	2.85	85	1.47	2.76	2.90	128	1.56	2.80	0.71	171	1.46	2.81	0.73
43	1.39	0.53	0.70	86	1.43	2.74	2.91	129	1.56	2.79	0.71	172	1.49	2.80	0.73
44	1.42	0.54	2.85	87	1.38	0.58	0.71	130	1.53	2.78	2.92	173	1.52	2.80	0.72
45	1.43	2.68	0.70	88	1.44	2.77	2.89	131	1.51	2.78	2.94	174	1.56	0.60	0.70
46	1.46	2.68	2.86	89	1.50	2.77	2.89	132	1.45	0.60	0.72	175	1.55	0.64	2.93
47	1.43	2.69	2.86	90	1.44	2.74	2.89	133	1.55	0.61	2.91	176	1.49	2.78	2.95
48	1.41	2.69	2.88	91	1.44	0.58	2.90	134	1.41	0.60	0.72	177	1.48	2.80	0.72
49	1.41	0.54	0.71	92	1.48	2.76	2.90	135	1.49	2.77	2.92	178	1.49	2.80	2.86
50	1.40	2.70	0.71	93	1.47	2.77	0.71	136	1.46	0.61	2.92	179	1.51	0.62	0.71
51	1.41	0.55	0.71	94	1.46	2.77	2.89	137	1.51	2.76	2.94	180	1.67	2.78	0.71
52	1.45	0.55	0.70	95	1.44	2.78	2.91	138	1.45	2.79	2.94	181	1.50	0.61	0.73
53	1.43	2.70	0.70	96	1.48	2.79	2.91	139	1.51	0.60	0.71	182	1.41	2.77	0.71
54	1.43	0.55	0.70	97	1.50	0.58	2.91	140	1.52	2.80	2.93	183	1.67	0.60	2.93
55	1.44	2.70	0.71	98	1.48	2.76	0.72	141	1.42	2.78	2.93	184	1.52	0.63	2.91
56	1.44	2.72	0.71	99	1.45	0.58	0.72	142	1.48	0.61	2.94	185	1.33	2.79	0.73
57	1.43	0.55	0.71	100	1.46	2.75	2.90	143	1.39	2.79	2.95	186	1.48	2.82	2.91
58	1.41	2.73	0.70	101	1.46	0.59	0.71	144	1.46	2.80	2.91	187	1.40	2.82	0.72
59	1.47	2.74	0.71	102	1.49	2.77	2.89	145	1.52	0.61	2.89	188	1.43	2.79	2.92
60	1.44	2.72	2.88	103	1.44	0.59	0.72	146	1.50	2.81	2.92	189	1.48	0.60	0.68
61	1.44	2.72	0.71	104	1.48	0.59	0.70	147	1.41	2.81	2.90	190	1.47	2.83	2.87
62	1.42	0.55	0.71	105	1.53	0.59	0.71	148	1.42	0.59	0.72	191	1.43	2.81	0.71
63	1.41	2.73	0.71	106	1.48	0.59	2.89	149	1.48	2.81	0.72	192	1.40	2.81	2.89
64	1.45	2.74	2.87	107	1.44	0.58	0.71	150	1.44	2.80	2.92	193	1.40	2.82	2.88
65	1.46	2.74	0.71	108	1.44	2.75	2.91	151	1.44	2.81	2.91	194	1.40	2.83	2.89
66	1.42	0.56	2.91	109	1.44	2.79	0.74	152	1.42	2.79	0.70	195	1.41	2.84	0.73
67	1.42	2.73	0.70	110	1.47	0.59	2.92	153	1.49	2.79	0.74	196	1.49	0.62	0.72
68	1.44	0.57	0.71	111	1.47	0.59	2.91	154	1.46	2.81	0.71	197	1.50	2.82	2.91
69	1.42	2.73	0.70	112	1.49	0.59	0.73	155	1.45	0.60	2.93	198	1.49	2.84	0.72
70	1.47	0.57	2.87	113	1.50	0.58	2.94	156	1.53	2.80	0.72	199	1.44	0.63	2.91
71	1.45	0.58	2.89	114	1.42	0.59	0.71	157	1.49	0.61	2.93	200	1.42	0.62	2.91
72	1.50	2.76	2.89	115	1.47	2.78	0.70	158	1.57	0.60	2.94

3.1.2 Candidate values based on mean power

Observing Table 2, it is apparent that the optimal values of c , determined by mean power, are consistently close to 0. The average values for cases 1, 2, and 3 are 0, 0.01, and 0.01, respectively. Although these values may initially seem optimal, they lead to test sizes that are too large and are thus discarded. Therefore, based on mean power, there are no optimal candidates for the Wald-adjusted procedure.

Table 2

Optimal values of c based on the average mean power for the Wald-adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	0.00	0.00	0.00	73	0.00	0.00	0.00	116	0.00	0.00	0.00	159	0.03	0.00	0.04
31	0.00	0.00	0.00	74	0.06	0.00	0.00	117	0.01	0.01	0.01	160	0.00	0.00	0.00
32	0.00	0.00	0.00	75	0.00	0.00	0.00	118	0.00	0.00	0.00	161	0.05	0.01	0.05
33	0.00	0.00	0.00	76	0.01	0.00	0.00	119	0.00	0.00	0.00	162	0.01	0.03	0.03
34	0.00	0.00	0.00	77	0.00	0.00	0.00	120	0.00	0.00	0.00	163	0.03	0.07	0.04
35	0.00	0.00	0.00	78	0.00	0.00	0.00	121	0.00	0.03	0.03	164	0.00	0.05	0.00
36	0.00	0.00	0.00	79	0.02	0.00	0.00	122	0.00	0.00	0.00	165	0.06	0.15	0.00
37	0.03	0.00	0.00	80	0.03	0.00	0.00	123	0.00	0.00	0.00	166	0.00	0.03	0.03
38	0.00	0.00	0.00	81	0.00	0.00	0.00	124	0.10	0.00	0.00	167	0.00	0.14	0.00
39	0.02	0.00	0.00	82	0.00	0.00	0.00	125	0.00	0.00	0.00	168	0.00	0.00	0.00
40	0.00	0.00	0.00	83	0.00	0.02	0.01	126	0.00	0.02	0.00	169	0.00	0.18	0.03
41	0.00	0.00	0.00	84	0.03	0.03	0.03	127	0.00	0.00	0.00	170	0.02	0.05	0.05
42	0.00	0.00	0.00	85	0.00	0.00	0.00	128	0.08	0.01	0.01	171	0.04	0.00	0.04
43	0.00	0.00	0.00	86	0.02	0.00	0.02	129	0.03	0.00	0.03	172	0.00	0.10	0.00
44	0.00	0.00	0.00	87	0.00	0.00	0.00	130	0.02	0.01	0.01	173	0.01	0.02	0.02
45	0.00	0.00	0.00	88	0.00	0.06	0.00	131	0.00	0.00	0.00	174	0.00	0.00	0.00
46	0.00	0.00	0.00	89	0.00	0.00	0.00	132	0.00	0.00	0.00	175	0.10	0.12	0.11
47	0.01	0.00	0.00	90	0.02	0.00	0.00	133	0.08	0.00	0.00	176	0.00	0.07	0.00
48	0.00	0.00	0.00	91	0.04	0.06	0.05	134	0.00	0.10	0.00	177	0.01	0.00	0.00
49	0.00	0.00	0.00	92	0.03	0.00	0.04	135	0.08	0.01	0.08	178	0.02	0.01	0.01
50	0.01	0.00	0.00	93	0.00	0.00	0.00	136	0.01	0.00	0.00	179	0.00	0.00	0.00
51	0.01	0.00	0.00	94	0.05	0.01	0.01	137	0.01	0.00	0.00	180	0.00	0.00	0.00
52	0.00	0.00	0.00	95	0.00	0.00	0.00	138	0.07	0.00	0.00	181	0.06	0.05	0.06
53	0.00	0.00	0.00	96	0.00	0.00	0.00	139	0.00	0.00	0.00	182	0.00	0.15	0.00
54	0.00	0.00	0.00	97	0.01	0.00	0.00	140	0.00	0.05	0.03	183	0.15	0.05	0.17
55	0.02	0.00	0.00	98	0.06	0.00	0.00	141	0.07	0.00	0.00	184	0.00	0.13	0.00
56	0.01	0.00	0.00	99	0.00	0.02	0.00	142	0.02	0.00	0.02	185	0.07	0.00	0.02
57	0.03	0.00	0.00	100	0.00	0.01	0.01	143	0.00	0.00	0.00	186	0.00	0.00	0.00
58	0.00	0.01	0.01	101	0.01	0.00	0.00	144	0.07	0.07	0.07	187	0.10	0.00	0.03
59	0.04	0.00	0.00	102	0.07	0.00	0.00	145	0.04	0.08	0.04	188	0.00	0.05	0.00
60	0.00	0.00	0.00	103	0.02	0.00	0.02	146	0.00	0.03	0.03	189	0.04	0.01	0.01
61	0.00	0.00	0.00	104	0.08	0.01	0.01	147	0.04	0.02	0.04	190	0.19	0.02	0.01
62	0.00	0.00	0.00	105	0.01	0.00	0.02	148	0.00	0.03	0.04	191	0.00	0.00	0.00
63	0.00	0.00	0.00	106	0.00	0.00	0.00	149	0.04	0.00	0.06	192	0.01	0.04	0.02
64	0.01	0.00	0.01	107	0.00	0.00	0.00	150	0.00	0.10	0.10	193	0.15	0.02	0.02
65	0.00	0.00	0.00	108	0.07	0.00	0.07	151	0.00	0.00	0.00	194	0.00	0.00	0.00
66	0.00	0.00	0.00	109	0.02	0.00	0.00	152	0.00	0.00	0.00	195	0.00	0.15	0.01
67	0.00	0.00	0.00	110	0.00	0.01	0.00	153	0.08	0.10	0.08	196	0.02	0.03	0.03
68	0.06	0.00	0.00	111	0.00	0.01	0.00	154	0.00	0.00	0.00	197	0.00	0.00	0.00
69	0.00	0.00	0.00	112	0.01	0.06	0.06	155	0.01	0.01	0.01	198	0.00	0.00	0.00
70	0.00	0.00	0.00	113	0.00	0.00	0.00	156	0.00	0.04	0.00	199	0.03	0.11	0.01
71	0.02	0.00	0.00	114	0.00	0.00	0.00	157	0.17	0.00	0.00	200	0.00	0.02	0.02
72	0.00	0.00	0.00	115	0.00	0.01	0.01	158	0.08	0.00	0.08

3.1.3 Selecting an optimal value of c

So far, for the Wald-adjusted procedure, the candidate optimal values of $c$ are 0.58, 0.71, 1.46, 2.76, and 2.9; these values will be compared jointly with $c = 2$, because according to [21] this is an optimal value of $c$ for the Wald test.

Thus, from these values, we will select the one with the best behavior in terms of test size and power.

From Figure 1, we can establish the following remarks.

Figure 1

Average test sizes and average mean power for Wald-adjusted procedures and optimal values of c . (a) Average test size: Case 1, (b) average mean power: Case 1, (c) average test size: Case 2, (d) average mean power: Case 2, (e) average test size: Case 3 and (f) average mean power: Case 3.

Case 1. The values of $c$ with the average test size closest to $\alpha$ are 1.46, 2, and 0.71, in that order. Regarding average mean power, the values of $c$ with the highest powers are 0.58 and 0.71, in that order.

Case 2. The values 2.9, 1.46, and 2 demonstrate inadequate performance, as their average test sizes deviate significantly from $\alpha = 0.05$. In terms of average mean power, the highest power is achieved at $c = 0.71$.

Case 3. The values 0.71 and 2.76 yield the best performance, in that order. The other values demonstrate poor performance, as their average test sizes deviate significantly from $\alpha = 0.05$. In terms of average mean power, the values of $c$ with the highest powers are 0.58 and 0.71, in that order.

The only value of c that appears in all three cases is 0.71. Of the six values of c being analyzed, 0.71 is the only value that exhibits a strong combined performance in terms of average test sizes and average mean power.

Thus, we consider c = 0.71 as the optimal value for the Wald-adjusted procedure.

In addition, we determined that $c = 2$ can be a viable choice for the Wald-adjusted procedure if a conservative procedure is required; this conservative behavior was also established in [2].

3.2 Wilson score-adjusted procedure

3.2.1 Candidate values based on test sizes

In Table 3, we observe the following candidate values for each case:

Case 1 ($p_0 \in D_1$). Candidate optimal values of $c$ are around 2.96.
Case 2 ($p_0 \in D_2$). Candidate optimal values of $c$ are around 0.18.
Case 3 ($p_0 \in D_3$). Candidate optimal values of $c$ are around 0.26.

Table 3

Optimal values of c based on the average test size for the Wilson score adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	3.08	0.15	0.23	73	3.07	0.15	0.24	116	3.10	0.21	0.28	159	3.09	0.18	0.29
31	3.05	0.15	0.27	74	3.03	0.18	0.30	117	3.13	0.15	0.28	160	3.03	0.19	0.26
32	3.01	0.17	0.27	75	3.05	0.19	0.28	118	3.04	0.17	0.22	161	2.89	0.21	0.24
33	3.01	0.17	0.26	76	3.02	0.17	0.25	119	3.04	0.14	0.25	162	3.04	0.21	0.22
34	3.02	0.17	0.30	77	3.04	0.17	0.30	120	3.13	0.18	0.21	163	3.00	0.13	0.25
35	3.05	0.18	0.26	78	3.01	0.18	0.27	121	3.03	0.20	0.23	164	3.03	0.20	0.25
36	2.97	0.14	0.25	79	3.02	0.19	0.20	122	3.09	0.20	0.24	165	3.02	0.23	0.28
37	2.99	0.15	0.26	80	3.04	0.20	0.21	123	2.80	0.18	0.23	166	3.01	0.21	0.25
38	3.10	0.15	0.26	81	3.02	0.21	0.22	124	3.04	0.19	0.28	167	3.06	0.27	0.29
39	3.02	0.18	0.27	82	3.14	0.22	0.23	125	3.05	0.20	0.21	168	3.06	0.24	0.29
40	2.99	0.13	0.27	83	2.90	0.19	0.26	126	3.11	0.20	0.22	169	2.98	0.21	0.28
41	3.01	0.17	0.25	84	3.04	0.19	0.26	127	3.03	0.23	0.24	170	3.07	0.22	0.28
42	3.07	0.18	0.25	85	3.03	0.21	0.25	128	3.06	0.23	0.25	171	0.01	0.26	0.30
43	3.03	0.14	0.27	86	3.02	0.17	0.27	129	2.97	0.23	0.24	172	3.06	0.17	0.31
44	3.01	0.15	0.27	87	3.14	0.21	0.25	130	3.09	0.19	0.27	173	3.12	0.21	0.31
45	3.05	0.16	0.28	88	2.96	0.21	0.26	131	3.14	0.24	0.26	174	2.85	0.16	0.23
46	3.08	0.16	0.27	89	3.04	0.19	0.27	132	3.16	0.22	0.26	175	3.03	0.16	0.23
47	3.07	0.18	0.26	90	3.02	0.19	0.28	133	3.03	0.26	0.27	176	3.21	0.19	0.35
48	3.00	0.18	0.26	91	3.08	0.18	0.25	134	3.03	0.25	0.28	177	3.07	0.20	0.27
49	2.99	0.19	0.22	92	3.08	0.18	0.28	135	3.07	0.22	0.29	178	2.91	0.22	0.17
50	3.04	0.20	0.24	93	3.00	0.19	0.24	136	3.01	0.18	0.29	179	2.98	0.11	0.32
51	3.07	0.18	0.26	94	3.06	0.21	0.27	137	3.08	0.20	0.25	180	3.15	0.12	0.29
52	2.94	0.13	0.22	95	3.01	0.20	0.27	138	3.13	0.22	0.30	181	2.97	0.14	0.22
53	3.04	0.18	0.28	96	3.02	0.18	0.27	139	3.08	0.22	0.29	182	3.20	0.20	0.31
54	3.07	0.19	0.24	97	3.04	0.16	0.29	140	3.02	0.21	0.25	183	0.11	0.19	0.20
55	3.03	0.15	0.26	98	3.08	0.16	0.31	141	3.00	0.21	0.30	184	2.99	0.17	0.28
56	2.99	0.16	0.24	99	3.00	0.16	0.25	142	3.10	0.22	0.23	185	2.95	0.17	0.31
57	3.04	0.17	0.23	100	3.04	0.17	0.31	143	3.05	0.24	0.28	186	2.89	0.18	0.19
58	3.10	0.18	0.26	101	3.12	0.16	0.31	144	2.98	0.22	0.28	187	2.85	0.18	0.34
59	3.04	0.19	0.26	102	3.00	0.15	0.31	145	3.05	0.22	0.29	188	2.86	0.19	0.24
60	3.07	0.16	0.24	103	2.98	0.16	0.22	146	2.95	0.25	0.32	189	3.02	0.20	0.17
61	3.01	0.21	0.26	104	3.06	0.18	0.19	147	3.09	0.26	0.35	190	0.03	0.23	0.20
62	3.01	0.19	0.30	105	3.12	0.20	0.25	148	3.06	0.20	0.24	191	3.03	0.21	0.24
63	3.01	0.17	0.24	106	3.04	0.16	0.29	149	3.01	0.20	0.32	192	3.08	0.13	0.27
64	3.03	0.17	0.26	107	3.02	0.17	0.28	150	3.03	0.15	0.18	193	3.09	0.15	0.10
65	3.05	0.16	0.26	108	3.14	0.18	0.23	151	3.05	0.19	0.23	194	3.10	0.13	0.24
66	3.10	0.16	0.26	109	3.02	0.15	0.24	152	3.08	0.14	0.27	195	0.06	0.18	0.16
67	3.11	0.19	0.28	110	2.94	0.15	0.26	153	2.76	0.17	0.30	196	2.97	0.18	0.30
68	3.06	0.20	0.28	111	3.06	0.18	0.23	154	2.98	0.21	0.25	197	3.00	0.11	0.27
69	3.04	0.16	0.31	112	3.12	0.17	0.22	155	3.11	0.20	0.27	198	3.03	0.11	0.27
70	3.08	0.16	0.27	113	3.01	0.16	0.28	156	3.06	0.19	0.32	199	3.02	0.13	0.21
71	2.99	0.12	0.32	114	3.04	0.17	0.24	157	2.93	0.19	0.30	200	2.99	0.11	0.19
72	3.00	0.16	0.26	115	3.04	0.19	0.29	158	3.04	0.17	0.19

3.2.2 Candidate values based on mean power

Table 4 demonstrates that the candidate values of $c$ based on mean power in the three cases are close to 0, with an average value of 0.02. Therefore, we can consider 0 as the candidate value.

Table 4

Optimal values of c based on the average mean power for the Wilson score adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	0.00	0.04	0.04	73	0.02	0.00	0.02	116	0.00	0.05	0.00	159	0.02	0.00	0.00
31	0.00	0.05	0.01	74	0.00	0.05	0.00	117	0.02	0.01	0.01	160	0.02	0.01	0.03
32	0.00	0.06	0.01	75	0.00	0.01	0.01	118	0.00	0.00	0.00	161	0.00	0.04	0.00
33	0.00	0.05	0.03	76	0.00	0.00	0.00	119	0.00	0.00	0.00	162	0.01	0.04	0.04
34	0.00	0.05	0.00	77	0.00	0.05	0.00	120	0.09	0.01	0.01	163	0.00	0.00	0.00
35	0.00	0.05	0.00	78	0.00	0.04	0.00	121	0.00	0.03	0.00	164	0.05	0.00	0.05
36	0.00	0.05	0.00	79	0.00	0.01	0.00	122	0.00	0.00	0.00	165	0.01	0.02	0.01
37	0.00	0.02	0.02	80	0.02	0.01	0.02	123	0.00	0.01	0.00	166	0.00	0.02	0.02
38	0.02	0.05	0.03	81	0.02	0.00	0.00	124	0.00	0.00	0.00	167	0.00	0.00	0.00
39	0.01	0.05	0.02	82	0.00	0.00	0.00	125	0.07	0.00	0.00	168	0.02	0.04	0.04
40	0.01	0.05	0.02	83	0.00	0.05	0.00	126	0.01	0.01	0.01	169	0.00	0.00	0.00
41	0.02	0.00	0.02	84	0.00	0.05	0.05	127	0.00	0.04	0.00	170	0.03	0.00	0.00
42	0.01	0.04	0.02	85	0.03	0.01	0.00	128	0.11	0.04	0.03	171	0.00	0.04	0.01
43	0.01	0.04	0.01	86	0.00	0.00	0.00	129	0.11	0.00	0.00	172	0.04	0.01	0.04
44	0.00	0.06	0.01	87	0.02	0.00	0.02	130	0.09	0.00	0.04	173	0.13	0.00	0.01
45	0.00	0.01	0.01	88	0.03	0.04	0.04	131	0.01	0.07	0.02	174	0.10	0.03	0.03
46	0.02	0.04	0.02	89	0.00	0.04	0.04	132	0.00	0.02	0.01	175	0.00	0.00	0.00
47	0.01	0.05	0.01	90	0.00	0.02	0.02	133	0.00	0.00	0.00	176	0.00	0.09	0.00
48	0.00	0.01	0.00	91	0.00	0.02	0.02	134	0.10	0.01	0.01	177	0.09	0.02	0.02
49	0.00	0.00	0.00	92	0.02	0.01	0.01	135	0.01	0.02	0.01	178	0.14	0.04	0.04
50	0.06	0.05	0.05	93	0.00	0.02	0.00	136	0.05	0.00	0.00	179	0.02	0.02	0.02
51	0.00	0.03	0.00	94	0.00	0.04	0.04	137	0.08	0.03	0.02	180	0.05	0.00	0.00
52	0.00	0.03	0.00	95	0.00	0.05	0.00	138	0.00	0.06	0.04	181	0.00	0.02	0.00
53	0.02	0.01	0.02	96	0.09	0.00	0.00	139	0.01	0.03	0.03	182	0.00	0.00	0.00
54	0.05	0.01	0.01	97	0.00	0.02	0.01	140	0.02	0.01	0.01	183	0.11	0.01	0.10
55	0.03	0.05	0.03	98	0.00	0.01	0.01	141	0.00	0.02	0.00	184	0.01	0.05	0.01
56	0.00	0.01	0.01	99	0.01	0.01	0.01	142	0.00	0.00	0.00	185	0.00	0.00	0.00
57	0.01	0.03	0.03	100	0.02	0.02	0.02	143	0.00	0.03	0.00	186	0.00	0.00	0.00
58	0.00	0.02	0.03	101	0.01	0.02	0.02	144	0.00	0.00	0.00	187	0.00	0.01	0.00
59	0.00	0.04	0.00	102	0.02	0.00	0.03	145	0.00	0.03	0.04	188	0.00	0.01	0.00
60	0.00	0.01	0.01	103	0.07	0.03	0.03	146	0.07	0.02	0.01	189	0.02	0.00	0.00
61	0.00	0.05	0.02	104	0.01	0.01	0.01	147	0.00	0.04	0.04	190	0.02	0.05	0.01
62	0.00	0.03	0.03	105	0.00	0.00	0.00	148	0.00	0.01	0.00	191	0.00	0.00	0.02
63	0.01	0.05	0.05	106	0.00	0.00	0.00	149	0.00	0.05	0.02	192	0.01	0.03	0.05
64	0.00	0.04	0.00	107	0.01	0.00	0.00	150	0.02	0.03	0.03	193	0.00	0.02	0.02
65	0.00	0.03	0.03	108	0.00	0.04	0.04	151	0.14	0.00	0.00	194	0.02	0.00	0.02
66	0.00	0.03	0.03	109	0.00	0.00	0.00	152	0.00	0.01	0.01	195	0.06	0.04	0.05
67	0.03	0.05	0.05	110	0.09	0.00	0.00	153	0.01	0.03	0.03	196	0.00	0.05	0.02
68	0.00	0.03	0.00	111	0.00	0.03	0.00	154	0.10	0.03	0.03	197	0.10	0.01	0.08
69	0.01	0.04	0.01	112	0.00	0.03	0.00	155	0.02	0.04	0.00	198	0.00	0.03	0.03
70	0.00	0.02	0.02	113	0.00	0.04	0.00	156	0.06	0.00	0.00	199	0.05	0.01	0.03
71	0.00	0.00	0.00	114	0.01	0.01	0.01	157	0.08	0.08	0.08	200	0.03	0.12	0.01
72	0.00	0.01	0.01	115	0.04	0.07	0.04	158	0.00	0.00	0.00

3.2.3 Selecting an optimal value

Up to this point, we have determined that for the Wilson score-adjusted procedure, the candidates for optimal values of c are 0, 0.18, 0.26, and 2.96. It is worth noting that among the values obtained as candidates for optimal, the value of 0 is also recommended in [21].

Therefore, we will select the value that exhibits the best performance in terms of test size and power from these candidates.

On the basis of Figure 2, we can make the following observations.

Figure 2

Average test sizes and average mean power for Wilson score-adjusted procedures and optimal values of c . (a) Average test size: Case 1, (b) average mean power: Case 1, (c) average test size: Case 2, (d) average mean power: Case 2, (e) average test size: Case 3, and (f) average mean power: Case 3.

Case 1. For all four values of c considered, the average test sizes are very close to α and the order (from closest to furthest) is: 2.96, 0, 0.18, and 0.26.

Case 2. The average test sizes are very close to α and the order (from closest to furthest) is: 0.18, 0.26, and 0. For c = 2.96 , the average test sizes are very far from α .

Case 3. The average test sizes are very close to α and the order (from closest to furthest) is: 0.26, 0.18, and 0. For c = 2.96 , the average test sizes are very far from α .

Regarding the average mean power, the same holds in all three cases: the values of $c$, ordered from highest to lowest average mean power, are 0, 0.18, 0.26, and 2.96. Except for $c = 2.96$, which performs poorly relative to the average test size in all three cases, the average mean power is quite similar for 0, 0.18, and 0.26.

Thus, we select c = 0.18 as the optimal value for the Wilson score-adjusted procedure.

3.3 Arcsine-adjusted procedure

3.3.1 Candidate values based on test sizes

From Table 5, in each case, we observe the following:

Case 1 ( p ∈ D 1 ). The candidate optimal values of c lie around 0.53.
Case 2 ( p ∈ D 2 ). We obtained two candidate optimal values of c: 0.44 and 1.19.
Case 3 ( p ∈ D 3 ). As in the previous case, we obtained two candidate optimal values of c: 0.4 and 1.33.

Table 5

Optimal values of c based on the average test size for the arcsine-adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	0.56	1.16	0.45	73	0.52	1.18	1.32	116	0.56	1.19	1.29	159	0.51	1.18	1.37
31	0.56	1.15	1.38	74	0.57	1.19	0.47	117	0.56	1.16	1.31	160	0.56	1.17	1.38
32	0.50	0.43	1.40	75	0.50	0.44	0.49	118	0.50	1.15	1.33	161	0.62	1.18	1.36
33	0.52	1.15	1.36	76	0.52	0.46	0.49	119	0.57	1.20	1.31	162	0.55	1.20	1.38
34	0.51	1.17	1.36	77	0.51	0.47	1.36	120	0.56	1.17	1.34	163	0.49	1.24	1.27
35	0.51	1.18	1.37	78	0.52	0.43	1.36	121	0.45	1.18	1.34	164	0.61	1.21	1.31
36	0.51	1.17	1.35	79	0.59	0.47	1.34	122	0.54	1.19	1.27	165	0.57	1.22	0.34
37	0.54	1.14	1.34	80	0.60	1.17	0.48	123	0.53	1.20	1.31	166	0.61	1.22	1.31
38	0.56	1.16	1.35	81	0.49	0.48	1.36	124	0.52	1.16	1.32	167	0.62	1.19	1.34
39	0.52	1.15	1.34	82	0.50	1.19	1.37	125	0.51	1.19	1.35	168	0.47	1.24	1.27
40	0.51	1.17	1.35	83	0.50	1.19	1.32	126	0.56	1.18	1.29	169	0.54	1.20	1.35
41	0.51	1.18	1.32	84	0.53	1.19	1.32	127	0.48	1.20	1.34	170	0.59	1.18	1.30
42	0.50	1.18	1.35	85	0.55	1.20	1.32	128	0.56	1.18	1.30	171	0.49	1.24	1.28
43	0.53	1.18	1.34	86	0.51	1.21	1.32	129	0.60	1.21	1.29	172	0.48	1.24	1.29
44	0.54	1.15	1.34	87	0.53	1.21	1.33	130	0.54	1.21	1.28	173	0.55	1.22	1.32
45	0.58	1.16	1.34	88	0.49	1.21	1.31	131	0.55	1.21	1.27	174	0.58	1.24	1.34
46	0.57	1.17	1.39	89	0.52	1.22	1.35	132	0.53	1.21	1.27	175	0.61	1.16	1.35
47	0.49	1.16	1.33	90	0.50	1.22	1.30	133	0.61	1.23	1.30	176	0.55	1.15	1.32
48	0.51	1.17	1.36	91	0.51	1.17	1.31	134	0.51	1.24	1.26	177	0.49	1.13	1.36
49	0.49	1.18	1.33	92	0.53	1.19	1.28	135	0.53	1.22	1.31	178	0.54	1.25	1.34
50	0.52	1.15	1.34	93	0.47	1.19	1.33	136	0.52	1.20	1.31	179	0.60	1.16	1.34
51	0.52	1.20	1.36	94	0.54	1.20	1.34	137	0.49	1.26	1.29	180	0.51	1.16	0.31
52	0.56	1.19	1.37	95	0.56	1.22	1.32	138	0.57	1.25	1.28	181	0.47	1.17	0.31
53	0.57	1.18	0.43	96	0.55	1.17	1.32	139	0.60	1.22	1.29	182	0.46	1.19	0.31
54	0.48	0.42	1.35	97	0.49	1.15	1.30	140	0.60	1.18	1.29	183	0.69	1.12	1.37
55	0.52	1.22	0.43	98	0.66	1.15	1.32	141	0.53	1.19	1.31	184	0.55	1.18	1.32
56	0.51	1.17	1.35	99	0.56	1.19	1.30	142	0.60	1.21	1.32	185	0.52	1.14	0.33
57	0.54	1.17	0.43	100	0.57	1.21	1.33	143	0.52	1.21	1.30	186	0.45	1.15	1.37
58	0.53	1.16	0.43	101	0.53	1.19	1.34	144	0.47	1.22	1.27	187	0.61	1.16	1.38
59	0.51	1.16	0.43	102	0.54	1.20	1.35	145	0.48	1.23	1.30	188	0.48	1.20	1.31
60	0.54	1.17	0.43	103	0.54	1.18	1.36	146	0.51	1.26	1.31	189	0.65	1.21	1.34
61	0.54	0.41	0.43	104	0.57	1.16	1.35	147	0.54	1.25	1.32	190	0.54	1.17	1.33
62	0.51	0.42	0.46	105	0.45	1.19	1.38	148	0.60	1.26	1.34	191	0.51	1.20	1.32
63	0.52	0.42	1.34	106	0.55	1.16	1.34	149	0.51	1.24	1.32	192	0.51	1.19	1.34
64	0.53	1.18	0.46	107	0.56	1.22	1.39	150	0.55	1.18	1.32	193	0.49	1.07	1.33
65	0.55	0.41	0.43	108	0.53	1.19	1.34	151	0.50	1.17	1.35	194	0.53	1.23	0.31
66	0.55	0.44	1.31	109	0.49	1.18	1.38	152	0.48	1.19	1.34	195	0.49	1.21	0.30
67	0.51	0.44	1.34	110	0.58	1.16	1.33	153	0.52	1.15	1.35	196	0.54	1.22	1.30
68	0.52	0.45	1.35	111	0.55	1.17	1.29	154	0.49	1.22	0.35	197	0.62	1.16	1.33
69	0.49	0.46	0.47	112	0.50	1.16	1.37	155	0.47	1.23	1.35	198	0.45	1.10	0.31
70	0.52	0.48	1.30	113	0.48	1.16	1.33	156	0.48	1.17	1.35	199	0.56	1.13	0.29
71	0.51	0.45	0.49	114	0.57	1.16	1.32	157	0.53	1.15	1.37	200	0.53	1.15	0.31
72	0.54	0.45	0.50	115	0.55	1.19	1.29	158	0.48	1.17	1.34

3.3.2 Candidate values based on mean power

In Table 5, we presented the optimal values of c based on the average test size for the arcsine-adjusted procedure, for n ∈ N and α = 0.05 .

From Table 6, the candidate values are 0.01 and 0; however, for these two values of c , we note a poor performance in terms of the test sizes for the three cases.

Table 6

Optimal values of c based on the average mean power for the arcsine-adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	0.01	0.00	0.00	73	0.00	0.00	0.00	116	0.00	0.00	0.00	159	0.00	0.00	0.00
31	0.00	0.00	0.00	74	0.02	0.00	0.00	117	0.05	0.00	0.00	160	0.00	0.00	0.00
32	0.00	0.00	0.00	75	0.00	0.00	0.00	118	0.00	0.00	0.00	161	0.00	0.02	0.00
33	0.00	0.00	0.00	76	0.00	0.00	0.00	119	0.03	0.00	0.00	162	0.00	0.00	0.00
34	0.00	0.00	0.00	77	0.00	0.00	0.00	120	0.00	0.00	0.00	163	0.00	0.00	0.00
35	0.00	0.00	0.00	78	0.00	0.00	0.00	121	0.01	0.00	0.00	164	0.01	0.00	0.01
36	0.00	0.00	0.00	79	0.01	0.00	0.00	122	0.00	0.00	0.00	165	0.00	0.02	0.02
37	0.02	0.00	0.00	80	0.00	0.00	0.00	123	0.06	0.00	0.00	166	0.06	0.00	0.00
38	0.00	0.00	0.00	81	0.00	0.00	0.00	124	0.09	0.00	0.00	167	0.00	0.05	0.00
39	0.00	0.00	0.00	82	0.00	0.00	0.00	125	0.02	0.00	0.00	168	0.01	0.00	0.00
40	0.00	0.00	0.00	83	0.00	0.00	0.00	126	0.01	0.00	0.00	169	0.00	0.12	0.00
41	0.01	0.00	0.00	84	0.00	0.00	0.00	127	0.04	0.02	0.02	170	0.01	0.02	0.02
42	0.00	0.00	0.00	85	0.03	0.00	0.00	128	0.01	0.00	0.00	171	0.00	0.00	0.00
43	0.00	0.00	0.00	86	0.03	0.00	0.00	129	0.00	0.00	0.00	172	0.00	0.00	0.00
44	0.00	0.00	0.00	87	0.00	0.00	0.00	130	0.00	0.00	0.00	173	0.00	0.00	0.00
45	0.00	0.00	0.00	88	0.00	0.00	0.00	131	0.10	0.02	0.02	174	0.01	0.00	0.00
46	0.00	0.00	0.00	89	0.04	0.00	0.00	132	0.01	0.00	0.00	175	0.07	0.00	0.04
47	0.00	0.00	0.00	90	0.00	0.00	0.00	133	0.14	0.00	0.00	176	0.01	0.00	0.00
48	0.00	0.00	0.00	91	0.00	0.00	0.00	134	0.02	0.00	0.00	177	0.03	0.16	0.03
49	0.00	0.00	0.00	92	0.00	0.00	0.00	135	0.00	0.00	0.00	178	0.00	0.00	0.00
50	0.01	0.00	0.00	93	0.00	0.00	0.00	136	0.00	0.00	0.00	179	0.01	0.13	0.13
51	0.01	0.00	0.00	94	0.00	0.00	0.00	137	0.00	0.00	0.00	180	0.06	0.00	0.00
52	0.00	0.00	0.00	95	0.00	0.00	0.00	138	0.00	0.00	0.00	181	0.02	0.00	0.00
53	0.00	0.00	0.00	96	0.01	0.00	0.01	139	0.01	0.00	0.00	182	0.00	0.00	0.00
54	0.01	0.00	0.00	97	0.00	0.02	0.00	140	0.00	0.00	0.00	183	0.00	0.00	0.00
55	0.00	0.00	0.00	98	0.01	0.00	0.00	141	0.02	0.00	0.01	184	0.00	0.00	0.00
56	0.02	0.00	0.00	99	0.04	0.00	0.00	142	0.00	0.00	0.00	185	0.10	0.00	0.03
57	0.00	0.00	0.00	100	0.00	0.00	0.00	143	0.02	0.00	0.00	186	0.00	0.00	0.00
58	0.01	0.00	0.00	101	0.00	0.00	0.00	144	0.00	0.00	0.00	187	0.00	0.00	0.00
59	0.00	0.00	0.00	102	0.10	0.00	0.00	145	0.05	0.00	0.00	188	0.00	0.00	0.00
60	0.00	0.00	0.00	103	0.00	0.00	0.00	146	0.05	0.00	0.00	189	0.00	0.00	0.00
61	0.00	0.00	0.00	104	0.06	0.00	0.00	147	0.00	0.00	0.03	190	0.02	0.00	0.00
62	0.00	0.00	0.00	105	0.00	0.00	0.00	148	0.00	0.08	0.00	191	0.20	0.00	0.00
63	0.00	0.00	0.00	106	0.00	0.00	0.00	149	0.00	0.01	0.01	192	0.05	0.02	0.06
64	0.00	0.00	0.00	107	0.00	0.00	0.00	150	0.02	0.00	0.00	193	0.00	0.03	0.00
65	0.00	0.00	0.00	108	0.03	0.00	0.00	151	0.08	0.00	0.00	194	0.00	0.00	0.00
66	0.00	0.00	0.00	109	0.03	0.00	0.00	152	0.01	0.00	0.00	195	0.00	0.00	0.00
67	0.00	0.00	0.00	110	0.00	0.03	0.00	153	0.07	0.00	0.00	196	0.01	0.10	0.00
68	0.00	0.00	0.00	111	0.02	0.00	0.00	154	0.00	0.00	0.00	197	0.06	0.00	0.00
69	0.02	0.00	0.00	112	0.00	0.00	0.00	155	0.01	0.00	0.01	198	0.02	0.00	0.03
70	0.00	0.00	0.00	113	0.01	0.00	0.00	156	0.01	0.00	0.00	199	0.01	0.09	0.04
71	0.01	0.00	0.00	114	0.00	0.00	0.00	157	0.09	0.00	0.09	200	0.00	0.04	0.04
72	0.00	0.00	0.00	115	0.07	0.00	0.00	158	0.09	0.00	0.09

3.3.3 Selecting an optimal value

Based on Figure 3, it can be observed that the value of c = 0.44 shows the best performance.

Figure 3

Average test sizes and average mean power for arcsine-adjusted procedures and optimal values of c . (a) Average test size: Case 1, (b) average mean power: Case 1, (c) average test size: Case 2, (d) average mean power: Case 2, (e) average test size: Case 3 and (f) average mean power: Case 3.

While the authors in [21] suggest a value of c = 0.5 , we found that this value leads to average test sizes further away from 0.05 compared to the values of 1.19 and 0.44 for cases 2 and 3, respectively. Furthermore, for c = 0.53 , we obtained average test sizes that behave better than those for c = 0.5 .

3.4 Böhning-Viwatwongkasem adjusted procedure

3.4.1 Candidate values based on test sizes

From Table 7, in each case, we observe the following:

Case 1 ( p ∈ D 1 ). The only candidate optimal value in this case is c = 1.74.
Case 2 ( p ∈ D 2 ). In this case, we obtained two candidate values: 0.62 and 2.43.
Case 3 ( p ∈ D 3 ). The candidate optimal values for this case are 0.89 and 2.14.

Table 7

Optimal values of c based on the average test size for the Böhning-Viwatwongkasem adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	1.73	0.55	1.02	73	2.01	0.62	2.03	116	1.97	0.63	0.88	159	1.60	2.49	2.25
31	1.75	0.56	1.03	74	1.57	0.61	2.03	117	1.81	0.65	2.22	160	1.90	2.51	0.84
32	1.85	0.55	1.58	75	1.62	2.36	2.04	118	1.43	0.62	2.20	161	1.93	0.65	2.26
33	1.51	2.10	1.61	76	1.68	0.62	2.06	119	1.95	0.64	2.18	162	1.78	0.64	0.84
34	1.75	2.11	1.63	77	1.71	2.38	2.05	120	2.06	2.49	2.19	163	1.62	0.63	2.27
35	1.83	2.10	1.00	78	1.89	2.35	0.87	121	1.81	0.64	0.88	164	1.41	2.53	2.29
36	1.57	0.56	0.99	79	1.84	2.37	2.04	122	1.87	0.64	0.87	165	1.27	2.53	2.29
37	1.86	0.57	0.98	80	1.67	0.61	2.10	123	1.63	2.46	0.87	166	1.35	0.64	0.86
38	1.97	0.57	1.70	81	1.84	2.38	2.11	124	1.45	0.63	2.20	167	1.88	0.65	0.86
39	1.78	2.15	1.72	82	1.88	0.62	0.90	125	1.37	2.46	0.89	168	1.58	0.64	2.32
40	1.65	2.15	1.75	83	1.83	0.61	0.90	126	1.70	2.48	2.22	169	2.02	2.53	0.85
41	1.72	0.56	1.76	84	1.72	2.37	0.90	127	1.78	2.49	2.22	170	1.70	0.64	0.87
42	1.47	0.57	1.78	85	1.76	2.38	2.09	128	1.74	2.48	0.83	171	1.86	2.52	0.85
43	1.57	2.18	0.95	86	1.87	0.62	2.08	129	1.93	0.64	2.24	172	1.26	0.66	2.31
44	1.85	0.58	0.96	87	1.53	2.39	2.09	130	1.69	0.65	0.88	173	1.45	2.54	2.30
45	1.88	0.59	1.80	88	1.68	2.42	0.88	131	1.92	2.51	0.86	174	1.51	2.55	0.85
46	1.80	0.58	0.95	89	1.82	2.40	0.89	132	1.40	0.65	0.88	175	1.48	0.67	2.28
47	1.51	0.57	1.85	90	1.95	0.63	2.14	133	1.67	2.46	0.85	176	1.90	0.66	0.84
48	1.56	0.57	1.84	91	2.06	2.39	0.88	134	1.74	2.48	2.25	177	1.32	2.53	0.86
49	1.55	2.24	1.86	92	1.68	2.41	0.89	135	1.74	2.50	2.24	178	1.61	2.52	2.28
50	1.83	0.58	0.95	93	1.79	2.42	0.86	136	1.48	2.49	0.86	179	1.81	2.51	2.27
51	1.62	2.24	0.96	94	1.96	0.61	0.88	137	1.71	2.49	2.20	180	1.94	2.53	2.29
52	1.64	0.59	0.94	95	1.80	2.40	2.13	138	1.85	0.64	2.22	181	1.85	2.55	2.31
53	1.86	2.26	0.93	96	1.78	2.43	0.88	139	1.79	2.48	0.90	182	1.25	2.56	2.33
54	1.65	0.58	0.92	97	1.78	2.43	0.88	140	2.08	2.49	2.28	183	1.99	2.54	0.89
55	1.48	0.59	1.91	98	2.08	2.44	2.15	141	1.79	0.65	2.25	184	1.80	2.57	0.83
56	1.75	0.59	0.92	99	1.57	0.63	0.89	142	1.72	2.48	2.25	185	1.52	2.52	2.32
57	1.87	2.29	1.91	100	1.56	2.44	0.90	143	1.74	0.63	2.25	186	1.71	2.54	2.29
58	1.87	0.60	0.92	101	1.70	2.43	2.15	144	1.68	2.48	0.85	187	1.96	2.56	0.87
59	2.08	0.60	1.93	102	1.75	2.42	2.18	145	1.49	2.50	0.88	188	1.78	2.55	0.86
60	1.57	2.29	0.93	103	1.94	0.63	2.15	146	1.64	0.64	2.25	189	2.10	0.67	2.31
61	1.74	0.60	1.98	104	1.73	0.62	0.89	147	2.04	2.50	2.25	190	1.45	0.63	2.28
62	1.69	2.30	0.92	105	1.62	0.62	0.90	148	1.78	2.49	2.25	191	1.76	2.54	2.32
63	1.55	2.33	1.98	106	1.71	2.44	2.18	149	1.88	2.51	0.88	192	1.81	2.58	0.83
64	1.71	2.31	0.91	107	1.67	2.41	2.19	150	1.61	0.63	2.28	193	1.83	0.66	2.34
65	1.87	2.31	1.98	108	1.79	2.43	0.89	151	1.72	2.50	0.85	194	1.82	0.67	2.34
66	1.77	2.31	2.00	109	1.77	0.62	2.18	152	2.15	0.65	0.84	195	1.84	0.65	2.33
67	1.67	0.60	0.91	110	1.62	0.64	2.18	153	1.76	0.64	2.30	196	2.30	2.55	0.85
68	1.65	0.60	2.00	111	2.26	0.64	2.18	154	1.97	2.51	2.27	197	1.52	0.64	2.34
69	1.85	2.33	0.92	112	2.23	0.62	0.89	155	1.55	2.53	2.29	198	1.81	0.64	0.84
70	1.64	2.34	2.02	113	1.84	0.63	2.18	156	1.47	0.65	2.31	199	1.43	2.51	2.29
71	1.81	0.61	2.01	114	1.67	2.45	0.89	157	1.88	0.64	0.87	200	1.64	0.65	2.34
72	1.67	0.61	0.90	115	2.00	0.64	2.17	158	2.00	2.53	0.88

3.4.2 Candidate values based on mean power

From Table 8, for the three cases, the optimal values are scattered between 0 and 4. We therefore consider the values 0, 1, 2, 3, and 4. For 0, 3, and 4, the corresponding average test sizes are too large, so only 1 and 2 are retained as candidates.

Table 8

Optimal values of c based on the average mean power for the Böhning-Viwatwongkasem adjusted procedure, for n ∈ N and α = 0.05

	Case				Case				Case				Case
n	1	2	3	n	1	2	3	n	1	2	3	n	1	2	3
30	0.00	3.97	0.00	73	0.11	3.93	0.00	116	0.44	3.95	0.97	159	0.41	2.94	0.96
31	0.00	4.00	0.00	74	0.13	3.96	0.79	117	0.03	3.97	0.79	160	0.01	2.96	0.00
32	0.12	3.95	0.00	75	0.10	3.97	0.96	118	0.37	4.00	1.00	161	0.05	2.75	0.09
33	0.03	4.00	0.00	76	0.11	3.99	0.00	119	0.33	1.97	0.00	162	0.73	3.99	0.74
34	0.07	3.99	0.00	77	1.20	4.00	0.95	120	0.31	3.95	0.00	163	1.01	3.95	1.99
35	0.00	4.00	0.00	78	0.12	3.96	1.97	121	0.57	2.01	2.01	164	0.09	3.31	0.10
36	0.05	3.00	0.00	79	0.06	4.00	3.00	122	0.20	4.00	2.00	165	0.38	0.83	0.38
37	0.02	3.97	0.00	80	0.08	3.90	0.99	123	0.00	3.00	0.00	166	0.11	3.86	1.67
38	0.09	3.94	0.00	81	0.46	3.99	0.00	124	0.06	3.98	0.96	167	0.02	2.05	3.37
39	0.01	4.00	0.00	82	0.89	3.99	0.00	125	0.66	2.98	2.98	168	2.63	2.96	2.70
40	0.13	3.99	0.00	83	0.73	4.00	1.95	126	0.63	4.00	1.97	169	3.02	3.98	1.91
41	0.03	4.00	0.00	84	0.08	3.90	0.99	127	0.80	3.89	0.98	170	1.02	3.91	1.02
42	0.00	3.98	0.00	85	0.31	3.95	1.96	128	0.15	2.98	2.88	171	1.49	1.97	1.49
43	0.09	4.00	0.00	86	0.02	3.95	0.95	129	0.03	3.92	0.57	172	1.73	3.00	0.86
44	0.06	3.98	0.00	87	0.17	4.00	2.00	130	0.73	3.91	3.91	173	0.05	2.95	0.05
45	0.02	3.88	0.00	88	0.47	3.00	0.96	131	1.44	1.90	1.44	174	0.50	2.90	2.90
46	0.18	3.97	0.00	89	0.38	2.98	0.00	132	0.10	2.91	0.10	175	1.82	3.73	1.75
47	0.01	3.99	0.00	90	1.16	3.97	0.80	133	0.60	3.85	3.07	176	0.67	1.96	3.75
48	0.11	4.00	0.00	91	0.12	3.93	0.99	134	1.25	3.95	0.84	177	0.05	1.94	1.94
49	0.07	2.99	0.00	92	0.31	3.90	2.00	135	0.10	2.94	1.92	178	0.50	1.70	2.56
50	0.11	3.91	0.90	93	0.00	2.91	0.00	136	2.16	3.90	1.34	179	0.89	1.99	1.99
51	0.22	3.97	0.96	94	0.09	4.00	1.02	137	0.65	3.87	2.82	180	0.32	2.03	0.74
52	0.18	4.00	0.00	95	0.63	4.00	0.96	138	0.48	3.87	0.97	181	0.14	3.10	0.17
53	0.09	3.99	0.91	96	0.07	2.95	0.00	139	1.49	3.97	0.86	182	1.41	3.99	1.41
54	0.08	3.98	0.00	97	0.23	3.97	0.00	140	1.61	3.84	0.93	183	1.82	3.99	2.94
55	0.19	3.99	0.00	98	0.18	3.99	0.20	141	0.19	3.86	2.97	184	1.28	3.79	0.65
56	0.01	4.00	0.00	99	0.26	3.97	1.04	142	0.26	3.67	2.96	185	1.00	2.98	0.99
57	0.03	3.98	0.00	100	0.54	2.95	0.02	143	0.56	2.96	3.69	186	0.73	2.96	0.74
58	0.00	3.97	1.00	101	0.79	3.94	0.99	144	0.04	3.00	3.53	187	0.45	3.42	1.65
59	0.06	3.95	0.00	102	0.08	3.70	1.80	145	0.32	2.80	2.97	188	0.23	2.16	2.39
60	0.04	4.00	0.00	103	0.15	3.94	1.00	146	0.02	2.61	0.96	189	0.85	3.80	0.82
61	0.40	3.99	0.01	104	0.36	3.00	3.00	147	0.61	2.95	2.97	190	0.60	2.81	2.78
62	0.16	3.96	0.00	105	0.06	3.99	0.02	148	0.54	3.99	3.99	191	0.91	3.12	2.05
63	0.59	2.96	0.00	106	0.12	3.00	0.00	149	0.28	3.01	0.33	192	2.00	3.79	2.00
64	0.03	3.94	0.03	107	0.00	2.76	0.00	150	0.83	2.85	0.86	193	1.29	3.94	2.97
65	0.00	3.99	0.00	108	0.13	3.93	0.90	151	1.53	3.57	1.02	194	0.57	3.83	3.83
66	0.18	3.99	0.00	109	0.85	3.94	0.80	152	0.18	3.86	1.94	195	0.61	3.96	1.49
67	0.08	3.99	0.00	110	0.00	2.99	0.01	153	0.34	3.94	2.87	196	0.58	3.45	3.45
68	0.39	2.97	0.04	111	1.55	3.88	1.85	154	0.79	3.99	0.80	197	0.81	0.52	0.81
69	0.02	4.00	0.00	112	0.81	3.98	0.83	155	0.41	3.91	1.04	198	0.93	0.89	0.89
70	0.47	3.93	0.97	113	0.65	4.00	0.00	156	2.46	3.98	2.46	199	0.03	3.98	2.30
71	0.16	3.98	0.91	114	1.03	3.95	2.00	157	1.23	3.93	0.94	200	0.54	3.54	0.51
72	0.29	3.92	0.99	115	0.85	3.96	0.85	158	0.82	4.00	0.84

3.4.3 Selecting an optimal value

Thus, we have the following candidate optimal values of c: 0.62, 0.89, 1, 1.74, 2, 2.14, and 2.43.

Among these values of c, we highlight c = 1, which is recommended in [20] as optimal.

From Figure 4, we can make the following observations.

Figure 4

Average test sizes and average mean power for Böhning-Viwatwongkasem adjusted procedures and optimal values of c . (a) Average test size: Case 1, (b) average mean power: Case 1, (c) average test size: Case 2, (d) average mean power: Case 2, (e) average test size: Case 3 and (f) average mean power: Case 3.

Case 1. The values of c with average test sizes closest to α, ordered from closest to furthest, are 1.74, 2, and 2.14, and the differences between them are small.

Case 2. The values of c with average test sizes closest to α, ordered from closest to furthest, are 0.62, 2.14, and 2, and the differences between them are small. Note that for 0.89, 1, 1.74, and 2.43, the average test sizes behave poorly, especially for small sample sizes.

Case 3. The values of c with average test sizes closest to α, ordered from closest to furthest, are 0.89, 1, 1.74, and 2.14, while 2 and 0.62 perform poorly.

In addition, note in Figure 4 that the average mean power is similar for all statistics in each case.

Thus, we select c = 2.14 as the optimal value for the Böhning-Viwatwongkasem-adjusted procedure.

4 Comparison of the optimal procedures

All calculations in this investigation were performed using R programs written by the authors.

In previous sections, we obtained optimal values of c for each of the adjusted statistical procedures. Thus, the optimal procedures for each of the studied statistics are as follows: Wald 0.71 ( W 0.71 ), Wilson score 0.18 ( S 0.18 ), Arcsine 0.44 ( A 0.44 ), and Böhning-Viwatwongkasem 2.14 ( B 2.14 ). In this section, we will compare these competing procedures.

In Figure 5, the behavior of the average test sizes and average mean power of the adjusted procedures under comparison is shown. This behavior is summarized in Table 9, where the statistics are ordered according to the fulfillment of the optimization criteria. The statistic that appears first in the table is the one that best fulfills the criterion, and so on.

Figure 5

Average test sizes and average mean power for the optimal values of c . (a) Average test size: Case 1, (b) average mean power: Case 1, (c) average test size: Case 2, (d) average mean power: Case 2, (e) average test size: Case 3, and (f) average mean power: Case 3.

Table 9

Summary of the behavior of the tests under comparison

The statistical procedures marked in blue are those that have conservative average test size, that is, less than or equal to α = 0.05 .

From Table 9, we can establish the following recommendations for each case.

Case 1 (central values of p 0 ). Use A 0.44 ; if a conservative test is preferred, use S 0.18 .

Case 2 (extreme values of p 0 ). Use S 0.18 ; if a conservative test is preferred, use W 0.71 .

Case 3 (all values of p 0 ). Use W 0.71 when n ≥ 70; for n < 70, use S 0.18 or A 0.44 .
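
The three recommendations can be encoded as a small lookup. The sketch below is only a convenience wrapper around the cases listed above; the function name, the string labels, and the use of `None` for an unknown p0 are illustrative choices, and the central range [0.2, 0.8] is the one given in the Conclusion. The study covers sample sizes from 30 to 200, so calls outside that range extrapolate beyond the reported results.

```python
from typing import List, Optional


def recommended_procedures(
    n: int, p0: Optional[float] = None, conservative: bool = False
) -> List[str]:
    """Return the recommended adjusted procedure(s) as labels.

    p0=None means nothing is known about whether p0 is central or extreme
    (Case 3); no separate conservative option is stated for that case,
    so the `conservative` flag is ignored there.
    """
    if p0 is None:
        # Case 3: all values of p0
        return ["W_0.71"] if n >= 70 else ["S_0.18", "A_0.44"]
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie in [0, 1]")
    if 0.2 <= p0 <= 0.8:
        # Case 1: central values of p0
        return ["S_0.18"] if conservative else ["A_0.44"]
    # Case 2: extreme values of p0
    return ["W_0.71"] if conservative else ["S_0.18"]
```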

5 Evaluation of optimal confidence intervals in practice

In this section, we illustrate, in a practical way, the performance of the optimal procedures obtained in previous sections. Thus, the optimal procedures for each of the statistics studied are as follows: Wald 0.71 ( W 0.71 ) , Wilson score 0.18 ( S 0.18 ) , Arcsine 0.44 ( A 0.44 ) , and Böhning-Viwatwongkasem 2.14 ( B 2.14 ) .

For this evaluation, we consider the data from [22]. Through a retrospective study, these authors determined whether the proportion of nonsmokers among patients with lung cancer is increasing. As part of their analysis, they obtained 95% confidence intervals for the proportion of non-small-cell lung cancer patients with nonmetastatic disease (stage I–III) who never smoked; according to the authors, these confidence intervals were obtained using an exact binomial method.

To illustrate the performance in practice of the optimal procedures obtained in this work, we construct the corresponding confidence intervals for Wald 0.71 ( W 0.71 ), Wilson score 0.18 ( S 0.18 ), Arcsine 0.44 ( A 0.44 ), and Böhning-Viwatwongkasem 2.14 ( B 2.14 ), and we compare their lengths.

Table 10 presents the confidence intervals reported in [22] as well as those of the optimal procedures for each of the statistics studied: Wald 0.71 ( W 0.71 ), Wilson score 0.18 ( S 0.18 ), Arcsine 0.44 ( A 0.44 ), and Böhning-Viwatwongkasem 2.14 ( B 2.14 ). The length of each of these intervals is also reported.

Table 10

Confidence intervals (CI) from [22] and for Wald 0.71 ( W 0.71 ), Wilson score 0.18 ( S 0.18 ), Arcsine 0.44 ( A 0.44 ), and Böhning-Viwatwongkasem 2.14 ( B 2.14 )

In Table 10, for each case, the narrowest confidence interval is indicated in blue and the widest in red. The most appropriate confidence interval (the one with the shortest length) in almost all cases is that of Böhning-Viwatwongkasem 2.14 ( B 2.14 ), while the worst (the one with the greatest length) is the one used in [22].

Setting aside the interval from [22], the longest confidence interval is Wald 0.71 ( W 0.71 ), followed by Arcsine 0.44 ( A 0.44 ), in almost all cases.
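
To make the length comparison concrete, the following sketch builds a Wald-type interval around the shrinkage estimator and reports its length. The interval form p_hat_c ± z sqrt(p_hat_c (1 - p_hat_c)/n) is an assumption: the adjusted statistics are defined earlier in the paper and may use a different variance term. The function names are hypothetical, and no counts from [22] are reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def wald_adjusted_interval(x, n, c=0.71, conf_level=0.95):
    """Wald-type interval centred at the shrinkage estimator (assumed form)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat_c = (x + c) / (n + 2 * c)
    half_width = z * sqrt(p_hat_c * (1 - p_hat_c) / n)
    # Clip to [0, 1] so the interval stays inside the parameter space.
    return max(0.0, p_hat_c - half_width), min(1.0, p_hat_c + half_width)


def interval_length(interval):
    """Length of an interval given as a (lower, upper) pair."""
    lower, upper = interval
    return upper - lower
```

For instance, `interval_length(wald_adjusted_interval(x, n))` can be evaluated for each pair of counts and set against the length of the exact interval for the same data, which is the kind of comparison summarized in Table 10.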

6 Conclusion

This study sheds light on the performance of four adjusted statistical tests or, equivalently, of their corresponding confidence intervals. On the basis of our findings, we recommend the following guidelines for using confidence intervals:

For central values of p 0 , that is, for p 0 ∈ [ 0.2 , 0.8 ] , use A 0.44 ; if a conservative interval is preferred, use S 0.18 .

For extreme values of p 0 , that is, for p 0 ∈ [ 0 , 0.2 ) ∪ ( 0.8 , 1 ] , use S 0.18 ; if a conservative interval is preferred, use W 0.71 .

When the researcher has no idea about the possible value of p 0 , that is, if it is unknown whether p 0 is a central or an extreme value, use W 0.71 when n ≥ 70; for n < 70, use S 0.18 or A 0.44 .

The values of c most frequently recommended in the literature for the estimator $\hat{p}_c$ are c = 0.5 ([20] and [7]); c = 2 and $c = z_{\alpha/2}^{2}$ ([2] and [23]); and c = 1/6 ([24] and [25]).

Our results differ from those obtained by other authors. We consider this to be because our analysis was based on a much wider range of configurations. Specifically, we studied the behavior of the tests for 401 values of c , 999 values of p 0 , and 171 sample sizes, and this means that we have analyzed 68,502,429 different configurations for each of the four statistics; thus, in total, we analyzed 274,009,716 different configurations.
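
The configuration counts quoted above follow from multiplying the three grid sizes. The sketch below reproduces the arithmetic; the specific grids (c from 0 to 4 in steps of 0.01, p0 from 0.001 to 0.999 in steps of 0.001, and n from 30 to 200) are assumptions chosen to match the stated counts of 401, 999, and 171.

```python
# Assumed grids, chosen only so that their sizes match the counts in the text.
c_grid = [k / 100 for k in range(401)]        # 0.00, 0.01, ..., 4.00
p0_grid = [k / 1000 for k in range(1, 1000)]  # 0.001, 0.002, ..., 0.999
sample_sizes = list(range(30, 201))           # 30, 31, ..., 200

per_statistic = len(c_grid) * len(p0_grid) * len(sample_sizes)
total = 4 * per_statistic  # four adjusted statistics

print(per_statistic)  # 68502429
print(total)          # 274009716
```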

In addition, we found that c = 2 can be a good choice for the Wald-adjusted procedure if a conservative procedure is required; this conservative behavior was also established in [2].

Additional research is necessary for values of α other than 0.05, for other sample sizes, and for other confidence intervals or, equivalently, statistical tests.

Acknowledgments

The authors are deeply grateful to Madeline Ann Ahrens for helping to improve the English. The authors also wish to express their deepest gratitude to the referees for their exceedingly valuable suggestions.

Funding information: Partial support for this study was provided to the first author from SNI-CONACyT, COFAA-IPN and project SIP-IPN 20210815.
Author contributions: All authors contributed equally to the writing of this article. All authors read and approved the final manuscript.
Conflict of interest: The authors state no conflict of interest.

Appendix

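Proposition 1

$$MP_T(n, p_0) = \frac{n + \lceil x_1 \rceil - \lfloor x_2 \rfloor}{n + 1}.$$

Proof

$$
\begin{aligned}
MP_T(n, p_0) &= \int_0^1 \beta(n, p)\, dp = \int_0^1 \sum_{x \in R_T} \binom{n}{x} p^{x} (1 - p)^{n - x}\, dp \\
&= \sum_{x \in R_T} \binom{n}{x} \int_0^1 p^{x} (1 - p)^{n - x}\, dp = \sum_{x \in R_T} \binom{n}{x} B(x + 1, n - x + 1) \\
&= \sum_{x \in R_T} \frac{1}{(n + 1)\, B(x + 1, n - x + 1)}\, B(x + 1, n - x + 1) = \sum_{x \in R_T} \frac{1}{n + 1} \\
&= \frac{\operatorname{Card}(R_T)}{n + 1} = \frac{n + \lceil x_1 \rceil - \lfloor x_2 \rfloor}{n + 1}. \qquad \square
\end{aligned}
$$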
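
Proposition 1 can be checked with exact rational arithmetic. The sketch below assumes that the rejection region has the form R_T = {x : x < x_1 or x > x_2} with 0 ≤ x_1 ≤ x_2 ≤ n, which is consistent with Card(R_T) = n + ⌈x_1⌉ − ⌊x_2⌋ but is not spelled out in this appendix. The function names and any cut-offs passed to them are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, factorial, floor


def beta_integer(a, b):
    """Beta function B(a, b) = (a-1)! (b-1)! / (a+b-1)! for positive integers."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def rejection_region(n, x1, x2):
    """Assumed two-sided region: reject when x < x1 or x > x2."""
    return [x for x in range(n + 1) if x < x1 or x > x2]


def mean_power(n, region):
    """Integral over p in [0, 1] of the power function, computed term by term."""
    return sum(comb(n, x) * beta_integer(x + 1, n - x + 1) for x in region)


def proposition_value(n, x1, x2):
    """Closed form (n + ceil(x1) - floor(x2)) / (n + 1) from Proposition 1."""
    return Fraction(n + ceil(x1) - floor(x2), n + 1)
```

Each term of the sum equals 1/(n + 1), so `mean_power(n, rejection_region(n, x1, x2))` and `proposition_value(n, x1, x2)` agree exactly whenever the region has the assumed form.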

References

[1] C. J. Clopper and E. S. Pearson, The use of confidence or fiducial limits illustrated in the case of the binomial, Biometrika 26 (1934), no. 4, 404–413, DOI: https://doi.org/10.2307/2331986.

[2] A. Agresti and B. A. Coull, Approximate is better than exact for interval estimation of binomial proportions, Amer. Statist. 52 (1998), no. 2, 119–126, DOI: https://doi.org/10.2307/2685469.

[3] L. D. Brown, T. T. Cai, and A. DasGupta, Interval estimation for a binomial proportion, Statist. Sci. 16 (2001), no. 2, 101–133, DOI: https://doi.org/10.1214/ss/1009213286.

[4] R. G. Newcombe and N. M. Nurminen, In defence of score intervals for proportions and their differences, Comm. Statist. Theory Methods 40 (2011), no. 7, 1271–1282, DOI: https://doi.org/10.1080/03610920903576580.

[5] A. Agresti and B. Caffo, Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures, Amer. Statist. 54 (2000), no. 4, 280–288, DOI: https://doi.org/10.2307/2685779.

[6] A. Agresti and Y. Min, On small-sample confidence intervals for parameters in discrete distribution, Biometrics 57 (2004), no. 3, 963–971, DOI: https://doi.org/10.1111/j.0006-341x.2001.00963.x.

[7] L. D. Brown, T. T. Cai, and A. DasGupta, Confidence intervals for a binomial proportion and asymptotic expansions, Ann. Statist. 30 (2002), no. 1, 160–201, DOI: https://doi.org/10.1214/aos/1015362189.

[8] H. Reyes-Cervantes, F. Almendra-Arao, and M. Morales-Cortés, A comparison of confidence intervals for a proportion and criteria for their application, Adv. Appl. Stat. 58 (2019), no. 1, 35–43, DOI: http://dx.doi.org/10.17654/AS058010035.

[9] T. J. Santner, A note on teaching binomial confidence intervals, Collaborative Res. Center 386 (1997), 87, DOI: https://doi.org/10.5282/ubm/epub.1480.

[10] Y. Guan, A generalized score confidence interval for a binomial proportion, J. Statist. Plann. Inference 142 (2012), no. 4, 785–793, DOI: https://doi.org/10.1016/j.jspi.2011.09.010.

[11] M. Thulin, On split sample and randomized confidence intervals for binomial proportions, Stat. Probab. Lett. 92 (2014), 65–71, DOI: https://doi.org/10.1016/j.spl.2014.05.005.

[12] S. V. Stehman and D. Xing, Confidence intervals for proportion of area estimated from a stratified random sample, Remote Sens. Environ. 280 (2022), 113193, DOI: https://doi.org/10.1016/j.rse.2022.113193.

[13] J. Frey and Y. Zhang, Improved exact confidence intervals for a proportion using ranked-set sampling, J. Korean Statist. Soc. 48 (2019), no. 3, 493–501, DOI: https://doi.org/10.1016/j.jkss.2019.05.003.

[14] I. R. Harris, A simple approximation to the likelihood interval for a binomial proportion, Stat. Methodol. 13 (2013), 42–47, DOI: https://doi.org/10.1016/j.stamet.2013.01.005.

[15] X. Liu, Y. Li, J. Yu, and T. Zeng, Posterior-based Wald-type statistics for hypothesis testing, J. Econometrics 230 (2022), no. 1, 83–113, DOI: https://doi.org/10.1016/j.jeconom.2021.11.003.

[16] M. S. Balch, New two-sided confidence intervals for binomial inference derived using Walley’s imprecise posterior likelihood as a test statistic, Internat. J. Approx. Reason. 123 (2020), 77–98, DOI: https://doi.org/10.1016/j.ijar.2020.05.005.

[17] R. G. Newcombe, Two-sided confidence intervals for the single proportion: comparison of seven methods, Stat. Med. 17 (1998), no. 8, 857–872, DOI: https://doi.org/10.1002/(sici)1097-0258(19980430)17:8%3C857::aid-sim777%3E3.0.co;2-e.

[18] A. Khurshid, Binomial and Poisson confidence intervals and its variants: A bibliography, Pakistan J. Stat Oper. Res. 6 (2010), no. 1, 75–100, DOI: https://doi.org/10.18187/pjsor.v6i1.139.

[19] G. Casella and R. Berger, Statistical Inference, 2nd ed., Thomson Learning, Australia, 2002.

[20] D. Böhning and C. Viwatwongkasem, Revisiting proportion estimators, Stat. Methods Med. Res. 14 (2005), no. 2, 1–23, DOI: https://doi.org/10.1191/0962280205sm393oa.

[21] A. Martín Andrés and M. Álvarez Hernández, Two-tailed asymptotic inferences for a proportion, J. Appl. Stat. 41 (2014), no. 7, 1516–1529, DOI: https://doi.org/10.1080/02664763.2014.881783.

[22] L. Pelosof, C. Ahn, A. Gao, L. Horn, A. Madrigales, J. Cox, et al., Proportion of Never-Smoker non-small cell lung cancer patients at three diverse institutions, J. Natl. Cancer Inst. 109 (2017), no. 7, djw295, DOI: https://doi.org/10.1093/jnci/djw295.

[23] H. Chen, The accuracy of approximate intervals for a binomial parameter, J. Amer. Statist. Assoc. 85 (1990), no. 410, 514–518, DOI: https://doi.org/10.1080/01621459.1990.10476229.

[24] J. Sánchez-Meca and F. Marín-Martínez, Testing the significance of a common risk difference in meta-analysis, Comput. Statist. Data Anal. 33 (2000), no. 3, 299–313, DOI: https://doi.org/10.1016/S0167-9473(99)00055-9.

[25] J. W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, Massachusetts, 1977.

Received: 2022-09-09

Revised: 2023-03-29

Accepted: 2023-04-30

Published Online: 2023-06-06

This work is licensed under the Creative Commons Attribution 4.0 International License.
