Abstract
Ohlson (Empirical Accounting Seminars: Elephants in the Room. Accounting, Economics, and Law: A Convivium 15 (1): 1–8) argues that researchers tacitly avoid raising statistics-related ‘elephants’ that could undermine inferences. We offer a balanced perspective, first applauding the remarkable progress made in deriving testable predictions, leveraging modern statistical techniques, and tapping alternative Big Data sources to address issues relevant to practitioners, regulators, and academia. While we concur with Ohlson’s elephants, we caution against over-criticism based on statistical design choices, as it risks creating new elephants. Our key lessons: focus on meaningful hypotheses, recognize the merits of descriptive studies, balance Type I and Type II errors in data handling and journal reviewing, employ proper context when interpreting statistical significance, and consider economic significance. Overall, though empirical accounting research faces challenges, criticism should not deter innovative research (a Type II error in journal reviewing).
Acknowledgment
We thank Yuri Biondi (editor), three anonymous referees, Pengchia Chiu and Jim Ohlson for helpful comments and suggestions.
Appendix: ASA Statement on Statistical Significance and P-Values
In this appendix, we summarize the six principles of the 2016 ASA Statement on Statistical Significance and P-Values (Wasserstein and Lazar 2016) and highlight common mistakes accounting researchers make in implementing them.
P-Values can indicate how incompatible the data are with a specified statistical model.
A small p-value (usually < 0.05) means that there is strong evidence against the null hypothesis (e.g., that there is no association between variables x and y) in the current data, so one can reject the null. However, a common mistake by accounting researchers is to conclude from a large p-value (> 0.05) that x is unrelated to y, when all that can be concluded is that the evidence is inconclusive: the data are compatible with the null hypothesis being either true or false (Cready et al. 2022).
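A minimal simulation sketch (our own illustration, with hypothetical parameters) shows why a large p-value is inconclusive. Here the null is false by construction, yet small samples routinely produce p > 0.05:

```python
# Sketch: the null is FALSE by construction (beta = 0.2), yet with n = 30
# most replications still yield p > 0.05. A large p-value is therefore
# evidence of inconclusiveness, not evidence of "no effect".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta, n, reps = 0.2, 30, 2000

insignificant = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)   # x and y ARE related by construction
    _, p = stats.pearsonr(x, y)
    insignificant += (p > 0.05)

# Well over half of the replications fail to reject, despite a true effect.
print(f"Share of replications with p > 0.05: {insignificant / reps:.2f}")
```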
P-Values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
The researcher should be reminded that a p-value does not tell one whether the null hypothesis is true or not. It only tells one how likely it is that the data would have turned out the way they did if the null hypothesis were actually true. A small p-value suggests that the data would have been unlikely to turn out as observed if x and y were unrelated, but it does not prove that x and y are related.
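A short worked example (with assumed prior and power values of our own choosing) makes the distinction concrete: the p-value conditions on the null being true, whereas the probability that the null is true given the data requires Bayes’ rule and a prior:

```python
# Sketch: P(H0 | significant result) via Bayes' rule, under ASSUMED inputs.
# Even after observing p < 0.05, the null can remain close to a coin flip.
prior_h0 = 0.90   # assumption: 90% of tested hypotheses are truly null
alpha = 0.05      # P(reject | H0 true)
power = 0.50      # assumption: P(reject | H0 false)

# P(H0 | reject) = P(reject | H0) * P(H0) / P(reject)
p_reject = alpha * prior_h0 + power * (1 - prior_h0)
posterior_h0 = alpha * prior_h0 / p_reject

# Roughly 0.47 under these inputs: a "significant" p-value alone is not
# the probability that the hypothesis is true.
print(f"P(H0 | p < 0.05) = {posterior_h0:.2f}")
```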
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
This is a common mistake researchers make when drawing conclusions from statistical findings in empirical research. Principle 3 of the ASA Statement says that a p-value should not be the sole basis for accounting policies driven by accounting research. A p-value is just one piece of information that researchers use to make decisions about the data. Other factors, such as study design, measurement quality, external evidence, and the assumptions underlying the data analysis, should also be discussed when making inferences from data. Using a “bright-line” rule like “p < 0.05” to justify scientific claims can lead to incorrect beliefs and poor decision-making.
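The fragility of such bright lines is easy to demonstrate. In this sketch (illustrative parameters are our own), a single fixed data-generating process produces p-values on both sides of 0.05 across replications, so crossing the threshold carries little information by itself:

```python
# Sketch: the SAME true effect (beta = 0.25) and the SAME sample size yield
# p-values scattered on both sides of 0.05, purely from sampling variation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
beta, n = 0.25, 60   # one data-generating process, ~50% power

pvals = []
for _ in range(10):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    pvals.append(stats.pearsonr(x, y)[1])

# Opposite "conclusions" under a bright-line rule, identical underlying truth.
print([round(p, 3) for p in pvals])
```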
Proper inference requires full reporting and transparency.
This is a key concern because journals have incentives to publish significant results and often reject non-significant ones. Researchers should not cherry-pick positive (significant) findings, a practice variously called data dredging, significance chasing or questing, selective inference, or p-hacking. Harvey’s (2017) presidential address to the American Finance Association highlighted this concern and suggested strategies for discouraging the practice. The ASA recommends that researchers disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Note that one step towards transparency, already used in experimental studies, is pre-registering a study; this practice may also benefit archival studies.
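Why full reporting matters can be seen in a minimal sketch (our own illustration): screening many truly null relations and reporting only the “hits” manufactures significance mechanically:

```python
# Sketch: 100 predictors that are pure noise by construction. At alpha = 0.05,
# about 5 will nonetheless look "significant". Reporting only those hits,
# while hiding the other ~95 tests, is the selective inference / p-hacking
# the ASA statement warns against.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_tests = 200, 100

y = rng.normal(size=n)
hits = 0
for _ in range(n_tests):
    x = rng.normal(size=n)              # unrelated to y by construction
    if stats.pearsonr(x, y)[1] < 0.05:
        hits += 1

print(f"False positives: {hits} of {n_tests} tests")
```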
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
We quote the ASA statement directly – “Statistical significance is not equivalent to scientific or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and conversely larger p-values do not imply a lack of importance or even lack of effect.”
Also, researchers need to be reminded that any effect, no matter how tiny, can produce a small p-value if the sample size is large enough, and conversely large effects may produce unimpressive p-values if the sample size is small. Similarly, identical estimated effects will have different p-values if the measurement error of the variables differs. See Johannesson, Ohlson, and Zhai (2023) for an application of this point in the accounting context.
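A brief sketch (effect sizes are hypothetical, chosen for illustration) of both directions of this point:

```python
# Sketch of Principle 5: a negligible effect with an enormous sample can be
# "highly significant", while a large effect with a small sample frequently
# fails to reach significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Tiny effect (beta = 0.02), n = 1,000,000: p-value is astronomically small.
x1 = rng.normal(size=1_000_000)
y1 = 0.02 * x1 + rng.normal(size=1_000_000)
print("tiny effect, huge n:", stats.pearsonr(x1, y1))

# Large effect (beta = 0.5), n = 15: often p > 0.05 despite the bigger effect.
x2 = rng.normal(size=15)
y2 = 0.5 * x2 + rng.normal(size=15)
print("large effect, small n:", stats.pearsonr(x2, y2))
```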
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
Principle 6 of the ASA Statement says that researchers should interpret p-values in context and use other approaches and robustness tests for proper inference. Examples of other approaches include confidence or prediction intervals, Bayesian methods, likelihood ratios or Bayes factors, and false discovery rates. These additional approaches can also provide evidence about the size or uncertainty of an effect.
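As one simple illustration (a sketch with simulated data of our own), reporting a confidence interval alongside the p-value conveys the magnitude and precision of the estimate, not merely a reject/fail-to-reject verdict:

```python
# Sketch: report an interval estimate for a regression slope alongside its
# p-value, so readers see effect size and uncertainty together.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

res = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=n - 2)          # two-sided 95% critical value
ci = (res.slope - t_crit * res.stderr,
      res.slope + t_crit * res.stderr)

print(f"slope = {res.slope:.3f}, p = {res.pvalue:.4f}, "
      f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```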
A final remark going beyond the six principles: we would like to remind researchers that the logical reasoning behind the null and alternative hypotheses must pass what we refer to as the “smell test.” If the reasoning about the mechanisms producing the proposed hypotheses is too fanciful, Occam’s razor suggests that evidence from testing the hypotheses is unlikely to yield meaningful inferences.
References
Allison, P. 2009. Fixed Effects Regression Models. Thousand Oaks: SAGE. https://doi.org/10.4135/9781412993869.
Angrist, J., and J. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton: Princeton University Press. https://doi.org/10.1515/9781400829828.
Asness, C., A. Frazzini, and L. Pedersen. 2019. “Quality Minus Junk.” Review of Accounting Studies 24: 34–112. https://doi.org/10.1007/s11142-018-9470-2.
Ball, R., and P. Brown. 1968. “An Empirical Evaluation of Accounting Income Numbers.” Journal of Accounting Research 6: 159–78. https://doi.org/10.2307/2490232.
Ball, R., and P. Brown. 2014. “Ball and Brown (1968): A Retrospective.” The Accounting Review 89 (1): 1–26. https://doi.org/10.2308/accr-50604.
Basu, D. 2020. “Bias of OLS Estimators Due to Exclusion of Relevant Variables and Inclusion of Irrelevant Variables.” Oxford Bulletin of Economics & Statistics 82: 209–34. https://doi.org/10.1111/obes.12322.
Bernard, V., and J. Thomas. 1989. “Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium?” Journal of Accounting Research 27: 1–36. https://doi.org/10.2307/2491062.
Bertomeu, J. 2025. “Statistical Versus Economic Significance in Accounting: A Reality Check.” Accounting, Economics, and Law: A Convivium 15: 105–21. https://doi.org/10.2139/ssrn.4545223.
Biondi, Y. 2025. “Limits of Empirical Studies in Accounting and Social Sciences: A Constructive Critique from Accounting, Economics and the Law.” Accounting, Economics, and Law: A Convivium 15: 9–19. https://doi.org/10.1515/ael-2021-0089.
Bloomfield, R., M. W. Nelson, and E. Soltes. 2016. “Gathering Data for Archival, Field, Survey, and Experimental Accounting Research.” Journal of Accounting Research 54: 341–95. https://doi.org/10.1111/1475-679x.12104.
Bonsall IV, S., A. Leone, B. Miller, and K. Rennekamp. 2017. “A Plain English Measure of Financial Reporting Readability.” Journal of Accounting and Economics 63: 329–57. https://doi.org/10.1016/j.jacceco.2017.03.002.
Breuer, M., and E. DeHaan. 2023. Using and Interpreting Fixed Effects Models. Working Paper. Stanford University. https://doi.org/10.2139/ssrn.4539828.
Cready, W., J. He, W. Lin, C. Shao, D. Wang, and Y. Zhang. 2022. “Is There a Confidence Interval for that? A Critical Examination of Null Outcome Reporting in Accounting Research.” Behavioral Research in Accounting 34: 43–72. https://doi.org/10.2308/bria-2020-033.
Dyckman, T., and S. Zeff. 2015. “Accounting Research: Past, Present, and Future.” Abacus 51: 511–24. https://doi.org/10.1111/abac.12058.
Fama, E. F., and J. D. MacBeth. 1973. “Risk, Return, and Equilibrium: Empirical Tests.” Journal of Political Economy 81: 607–36. https://doi.org/10.1086/260061.
Graham, J., M. Hanlon, T. Shevlin, and T. Shroff. 2014. “Incentives for Tax Planning and Avoidance: Evidence from the Field.” The Accounting Review 89 (3): 991–1023. https://doi.org/10.2308/accr-50678.
Harvey, C. 2017. “Presidential Address: The Scientific Outlook in Financial Economics.” The Journal of Finance 72: 1399–440. https://doi.org/10.1111/jofi.12530.
Hirshleifer, D. 2015. “Editorial: Cosmetic Surgery in the Academic Review Process.” Review of Financial Studies 28 (3): 637–49. https://doi.org/10.1093/rfs/hhu093.
Jennings, J., J. Lee, J. Kim, and D. Taylor. 2022. “Measurement Error, Fixed Effects, and False Positives in Accounting Research.” Working Paper. St. Louis: Washington University.
Johannesson, E., J. Ohlson, and W. Zhai. 2023. “The Explanatory Power of Explanatory Variables.” Review of Accounting Studies 28. https://doi.org/10.1007/s11142-023-09781-w.
Libby, R., R. Bloomfield, and M. Nelson. 2002. “Experimental Research in Financial Accounting.” Accounting, Organizations and Society 27: 775–810. https://doi.org/10.1016/s0361-3682(01)00011-3.
Loughran, T., and J. Ritter. 2000. “Uniformly Least Powerful Tests of Market Efficiency.” Journal of Financial Economics 55: 361–89. https://doi.org/10.1016/s0304-405x(99)00054-9.
Ohlson, J. 2025. “Empirical Accounting Seminars: Elephants in the Room.” Accounting, Economics, and Law: A Convivium 15: 1–8. https://doi.org/10.1515/ael-2021-0067.
Petersen, M. 2009. “Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches.” Review of Financial Studies 22: 435–80. https://doi.org/10.1093/rfs/hhn053.
Sloan, R. 1996. “Do Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future Earnings?” The Accounting Review 71: 289–315.
Teoh, S. 2018. “The Promise and Challenges of New Datasets for Accounting Research.” Accounting, Organizations and Society 68: 109–17. https://doi.org/10.1016/j.aos.2018.03.008.
Treiman, D. 2009. Quantitative Data Analysis: Doing Social Research to Test Ideas. San Francisco: Jossey-Bass.
Wasserstein, R., and N. Lazar. 2016. “The ASA Statement on P-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.