Multiple comparisons and small sample size, common characteristics of many types of “Big Data” including those that are produced by genomic studies, present specific challenges that affect the reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition, linking the number of tests and the sample size, that is necessary to guarantee error rate control at practical sample sizes; this condition, however, is rarely satisfied. Using methods that are based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially widespread problems with error rate control, specifically excessive false positives. This is an important factor that contributes to the “reproducibility crisis”. We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing/small sample data. We point out that Edgeworth expansions, providing higher order approximations to the sampling distribution, offer a promising direction for data analysis that could improve reliability of studies relying on large numbers of comparisons with modest sample sizes.
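As a rough illustration of the gap between nominal and actual error rates described above, the following sketch compares three upper-tail probabilities at a Bonferroni-style threshold: the nominal normal tail, the exact tail, and a one-term Edgeworth approximation. Everything specific here is an assumption made for illustration and is not taken from the commentary: the data are Exp(1) draws (skewness 2), the statistic is the standardized mean with known variance (so the exact tail follows from the Gamma distribution), and the function names and the values `n=20`, `m=1000`, `fwer=0.05` are hypothetical.

```python
import math
from statistics import NormalDist


def exact_upper_tail(n, z):
    """P((S_n - n) / sqrt(n) > z) where S_n is a sum of n Exp(1) draws.

    S_n follows a Gamma(n, 1) (Erlang) distribution, whose survival
    function is exp(-x) * sum_{k=0}^{n-1} x^k / k!.
    """
    x = n + z * math.sqrt(n)
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k          # builds x^k / k! incrementally
        total += term
    return math.exp(-x) * total


def edgeworth_upper_tail(n, z, skew):
    """One-term Edgeworth approximation to the same upper tail.

    Uses P(Z_n <= z) ~ Phi(z) - phi(z) * skew * (z^2 - 1) / (6 sqrt(n)),
    so the upper tail adds the skewness correction to 1 - Phi(z).
    """
    std = NormalDist()
    correction = std.pdf(z) * skew * (z * z - 1) / (6 * math.sqrt(n))
    return (1 - std.cdf(z)) + correction


def compare(n=20, m=1000, fwer=0.05):
    """Return (nominal, exact, Edgeworth) tail probabilities per test."""
    alpha = fwer / m                          # Bonferroni per-test level
    z = NormalDist().inv_cdf(1 - alpha)       # normal-theory critical value
    return alpha, exact_upper_tail(n, z), edgeworth_upper_tail(n, z, skew=2.0)


if __name__ == "__main__":
    nominal, exact, edgeworth = compare()
    print(f"nominal   {nominal:.3e}")
    print(f"exact     {exact:.3e}")
    print(f"edgeworth {edgeworth:.3e}")
```

Under these assumptions the exact tail probability comes out well above the nominal per-test level, and the one-term Edgeworth value lies closer to the exact one than the normal approximation does, although it still falls short. A studentized statistic or a different data distribution would need a different expansion.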
Contents
- Commentary
  - Big Data, Small Sample (May 20, 2017)
- Research Articles
  - Parameter Estimation of a Two-Colored Urn Model Class (March 25, 2017)
  - Combinatorial Mixtures of Multiparameter Distributions: An Application to Bivariate Data (February 16, 2017)
  - March 17, 2017
  - Comparing Four Methods for Estimating Tree-Based Treatment Regimes (May 12, 2017)
  - On Stratified Adjusted Tests by Binomial Trials (February 14, 2017)
  - Improvement Screening for Ultra-High Dimensional Data with Censored Survival Outcomes and Varying Coefficients (May 18, 2017)
  - Bayesian Variable Selection Methods for Matched Case-Control Studies (January 31, 2017)
  - Testing Equality of Treatments under an Incomplete Block Crossover Design with Ordinal Responses (February 3, 2017)
  - Empirical Likelihood in Nonignorable Covariate-Missing Data Problems (April 20, 2017)
  - A Quantitative Concordance Measure for Comparing and Combining Treatment Selection Markers (March 25, 2017)
  - Median Analysis of Repeated Measures Associated with Recurrent Events in Presence of Terminal Event (April 28, 2017)
  - A Theorem at the Core of Colliding Bias (March 31, 2017)
  - Group Tests for High-dimensional Failure Time Data with the Additive Hazards Models (May 9, 2017)
  - Characterizing Highly Benefited Patients in Randomized Clinical Trials (May 20, 2017)