Abstract
Ohlson (Empirical Accounting Seminars: Elephants in the Room. Accounting, Economics, and Law: A Convivium 15 (1): 1–8) argues that researchers tacitly avoid raising statistics-related ‘elephants’ that could undermine inferences. We offer a balanced perspective, first applauding the remarkable progress made in deriving testable predictions, leveraging modern statistical techniques, and tapping alternative Big Data sources to address issues relevant to practitioners, regulators, and academia. While we concur with Ohlson’s elephants, we caution against over-criticism based on statistical design choices, as it risks creating new elephants. Our key lessons: focus on meaningful hypotheses, recognize the merits of descriptive studies, balance Type I and Type II errors in data handling and journal reviewing, employ proper context when interpreting statistical significance, and consider economic significance. Overall, though empirical accounting research faces challenges, criticism should not deter innovative research (a Type II error in journal reviewing).
Acknowledgment
We thank Yuri Biondi (editor), three anonymous referees, Pengchia Chiu and Jim Ohlson for helpful comments and suggestions.
Appendix: ASA Statement on Statistical Significance and P-Values
In this appendix, we summarize the six principles of the 2016 ASA Statement on Statistical Significance and P-Values (Wasserstein and Lazar 2016) and highlight common mistakes accounting researchers make in implementing them.
P-Values can indicate how incompatible the data are with a specified statistical model.
A small p-value (usually < 0.05) means that there is strong evidence against the null hypothesis (e.g., that there is no association between variables x and y) in the current data, so one can reject the null. However, a common mistake by accounting researchers is to conclude from a large p-value (> 0.05) that x is unrelated to y, when all that can be concluded is that the evidence is inconclusive: the data are compatible with the null hypothesis being either true or false (Cready et al. 2022).
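A minimal simulation sketch (our own illustration, with hypothetical parameters) shows why a large p-value is inconclusive. Here the null is false by construction, yet small samples routinely produce p > 0.05:

```python
# Sketch: the null is FALSE by construction (beta = 0.2), yet with n = 30
# most replications still yield p > 0.05. A large p-value is therefore
# evidence of inconclusiveness, not evidence of "no effect".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta, n, reps = 0.2, 30, 2000

insignificant = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)   # x and y ARE related by construction
    _, p = stats.pearsonr(x, y)
    insignificant += (p > 0.05)

# Well over half of the replications fail to reject, despite a true effect.
print(f"Share of replications with p > 0.05: {insignificant / reps:.2f}")
```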
P-Values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
The researcher should be reminded that a p-value does not tell one whether the null hypothesis is true or not. It only tells one how likely it is that the data would have turned out the way they did if the null hypothesis were actually true. A small p-value suggests that the data would have been unlikely to turn out as observed if x and y were unrelated, but it does not prove that x and y are related.
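A short worked example (with assumed prior and power values of our own choosing) makes the distinction concrete: the p-value conditions on the null being true, whereas the probability that the null is true given the data requires Bayes’ rule and a prior:

```python
# Sketch: P(H0 | significant result) via Bayes' rule, under ASSUMED inputs.
# Even after observing p < 0.05, the null can remain close to a coin flip.
prior_h0 = 0.90   # assumption: 90% of tested hypotheses are truly null
alpha = 0.05      # P(reject | H0 true)
power = 0.50      # assumption: P(reject | H0 false)

# P(H0 | reject) = P(reject | H0) * P(H0) / P(reject)
p_reject = alpha * prior_h0 + power * (1 - prior_h0)
posterior_h0 = alpha * prior_h0 / p_reject

# Roughly 0.47 under these inputs: a "significant" p-value alone is not
# the probability that the hypothesis is true.
print(f"P(H0 | p < 0.05) = {posterior_h0:.2f}")
```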
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
This is a common mistake researchers make when drawing conclusions from statistical findings in empirical research. Principle 3 of the ASA Statement says that a p-value should not be the sole basis for accounting policies driven by accounting research. A p-value is just one piece of information that researchers use to make decisions about the data. Other factors, such as study design, measurement quality, external evidence, and the assumptions underlying the data analysis, should also be discussed when making inferences from data. Using a “bright-line” rule like “p < 0.05” to justify scientific claims can lead to incorrect beliefs and poor decision-making.
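The fragility of such bright lines is easy to demonstrate. In this sketch (illustrative parameters are our own), a single fixed data-generating process produces p-values on both sides of 0.05 across replications, so crossing the threshold carries little information by itself:

```python
# Sketch: the SAME true effect (beta = 0.25) and the SAME sample size yield
# p-values scattered on both sides of 0.05, purely from sampling variation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
beta, n = 0.25, 60   # one data-generating process, ~50% power

pvals = []
for _ in range(10):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    pvals.append(stats.pearsonr(x, y)[1])

# Opposite "conclusions" under a bright-line rule, identical underlying truth.
print([round(p, 3) for p in pvals])
```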
Proper inference requires full reporting and transparency.
This is a key concern because journals have incentives to publish significant results and often reject non-significant ones. Researchers should not cherry-pick positive (significant) findings, a practice variously called data dredging, significance chasing or questing, selective inference, or p-hacking. Harvey’s (2017) presidential address to the American Finance Association highlighted this concern and suggested strategies for discouraging the practice. The ASA recommends that researchers disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Note that one step towards transparency, already used in experimental studies, is pre-registering a study; this practice may also benefit archival studies.
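Why full reporting matters can be seen in a minimal sketch (our own illustration): screening many truly null relations and reporting only the “hits” manufactures significance mechanically:

```python
# Sketch: 100 predictors that are pure noise by construction. At alpha = 0.05,
# about 5 will nonetheless look "significant". Reporting only those hits,
# while hiding the other ~95 tests, is the selective inference / p-hacking
# the ASA statement warns against.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_tests = 200, 100

y = rng.normal(size=n)
hits = 0
for _ in range(n_tests):
    x = rng.normal(size=n)              # unrelated to y by construction
    if stats.pearsonr(x, y)[1] < 0.05:
        hits += 1

print(f"False positives: {hits} of {n_tests} tests")
```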
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
We quote the ASA statement directly – “Statistical significance is not equivalent to scientific or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and conversely larger p-values do not imply a lack of importance or even lack of effect.”
Also, researchers need to be reminded that any effect, no matter how tiny, can produce a small p-value if the sample size is large enough, and conversely large effects may produce unimpressive p-values if the sample size is small. Similarly, identical estimated effects will have different p-values if the measurement error of the variables differs. See Johannesson, Ohlson, and Zhai (2023) for an application of this point in the accounting context.
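A brief sketch (effect sizes are hypothetical, chosen for illustration) of both directions of this point:

```python
# Sketch of Principle 5: a negligible effect with an enormous sample can be
# "highly significant", while a large effect with a small sample frequently
# fails to reach significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Tiny effect (beta = 0.02), n = 1,000,000: p-value is astronomically small.
x1 = rng.normal(size=1_000_000)
y1 = 0.02 * x1 + rng.normal(size=1_000_000)
print("tiny effect, huge n:", stats.pearsonr(x1, y1))

# Large effect (beta = 0.5), n = 15: often p > 0.05 despite the bigger effect.
x2 = rng.normal(size=15)
y2 = 0.5 * x2 + rng.normal(size=15)
print("large effect, small n:", stats.pearsonr(x2, y2))
```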
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
Principle 6 of the ASA Statement says that researchers should interpret p-values in context and use other approaches and robustness tests for proper inference. Examples of other approaches include confidence or prediction intervals, Bayesian methods, likelihood ratios or Bayes factors, and false discovery rates. These additional approaches can also provide evidence about the size or uncertainty of an effect.
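As one simple illustration (a sketch with simulated data of our own), reporting a confidence interval alongside the p-value conveys the magnitude and precision of the estimate, not merely a reject/fail-to-reject verdict:

```python
# Sketch: report an interval estimate for a regression slope alongside its
# p-value, so readers see effect size and uncertainty together.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

res = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=n - 2)          # two-sided 95% critical value
ci = (res.slope - t_crit * res.stderr,
      res.slope + t_crit * res.stderr)

print(f"slope = {res.slope:.3f}, p = {res.pvalue:.4f}, "
      f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```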
A final remark going beyond the six principles: we would like to remind researchers that the logical reasoning behind the null and alternative hypotheses must pass what we refer to as the “smell test.” If the reasoning about the mechanisms producing the proposed hypotheses is too fanciful, Occam’s razor suggests that evidence from testing the hypotheses is unlikely to yield meaningful inferences.
References
Allison, P. 2009. Fixed Effects Regression Models. Thousand Oaks: SAGE. https://doi.org/10.4135/9781412993869.
Angrist, J., and J. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton: Princeton University Press. https://doi.org/10.1515/9781400829828.
Asness, C., A. Frazzini, and L. Pedersen. 2019. “Quality Minus Junk.” Review of Accounting Studies 24: 34–112. https://doi.org/10.1007/s11142-018-9470-2.
Ball, R., and P. Brown. 1968. “An Empirical Evaluation of Accounting Income Numbers.” Journal of Accounting Research 6: 159–78. https://doi.org/10.2307/2490232.
Ball, R., and P. Brown. 2014. “Ball and Brown (1968): A Retrospective.” The Accounting Review 89 (1): 1–26. https://doi.org/10.2308/accr-50604.
Basu, D. 2020. “Bias of OLS Estimators Due to Exclusion of Relevant Variables and Inclusion of Irrelevant Variables.” Oxford Bulletin of Economics & Statistics 82: 209–34. https://doi.org/10.1111/obes.12322.
Bernard, V., and J. Thomas. 1989. “Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium?” Journal of Accounting Research 27: 1–36. https://doi.org/10.2307/2491062.
Bertomeu, J. 2025. “Statistical Versus Economic Significance in Accounting: A Reality Check.” Accounting, Economics, and Law: A Convivium 15: 105–21. https://doi.org/10.2139/ssrn.4545223.
Biondi, Y. 2025. “Limits of Empirical Studies in Accounting and Social Sciences: A Constructive Critique from Accounting, Economics and the Law.” Accounting, Economics, and Law: A Convivium 15: 9–19. https://doi.org/10.1515/ael-2021-0089.
Bloomfield, R., M. W. Nelson, and E. Soltes. 2016. “Gathering Data for Archival, Field, Survey, and Experimental Accounting Research.” Journal of Accounting Research 54: 341–95. https://doi.org/10.1111/1475-679x.12104.
Bonsall IV, S., A. Leone, B. Miller, and K. Rennekamp. 2017. “A Plain English Measure of Financial Reporting Readability.” Journal of Accounting and Economics 63: 329–57. https://doi.org/10.1016/j.jacceco.2017.03.002.
Breuer, M., and E. DeHaan. 2023. Using and Interpreting Fixed Effects Models. Working Paper. Stanford University. https://doi.org/10.2139/ssrn.4539828.
Cready, W., J. He, W. Lin, C. Shao, D. Wang, and Y. Zhang. 2022. “Is There a Confidence Interval for that? A Critical Examination of Null Outcome Reporting in Accounting Research.” Behavioral Research in Accounting 34: 43–72. https://doi.org/10.2308/bria-2020-033.
Dyckman, T., and S. Zeff. 2015. “Accounting Research: Past, Present, and Future.” Abacus 51: 511–24. https://doi.org/10.1111/abac.12058.
Fama, E. F., and J. D. MacBeth. 1973. “Risk, Return, and Equilibrium: Empirical Tests.” Journal of Political Economy 81: 607–36. https://doi.org/10.1086/260061.
Graham, J., M. Hanlon, T. Shevlin, and T. Shroff. 2014. “Incentives for Tax Planning and Avoidance: Evidence from the Field.” The Accounting Review 89 (3): 991–1023. https://doi.org/10.2308/accr-50678.
Harvey, C. 2017. “Presidential Address: The Scientific Outlook in Financial Economics.” The Journal of Finance 72: 1399–440. https://doi.org/10.1111/jofi.12530.
Hirshleifer, D. 2015. “Editorial: Cosmetic Surgery in the Academic Review Process.” Review of Financial Studies 28 (3): 637–49. https://doi.org/10.1093/rfs/hhu093.
Jennings, J., J. Lee, J. Kim, and D. Taylor. 2022. “Measurement Error, Fixed Effects, and False Positives in Accounting Research.” Working Paper. St. Louis: Washington University.
Johannesson, E., J. Ohlson, and W. Zhai. 2023. “The Explanatory Power of Explanatory Variables.” Review of Accounting Studies 28. https://doi.org/10.1007/s11142-023-09781-w.
Libby, R., R. Bloomfield, and M. Nelson. 2002. “Experimental Research in Financial Accounting.” Accounting, Organizations and Society 27: 775–810. https://doi.org/10.1016/s0361-3682(01)00011-3.
Loughran, T., and J. Ritter. 2000. “Uniformly Least Powerful Tests of Market Efficiency.” Journal of Financial Economics 55: 361–89. https://doi.org/10.1016/s0304-405x(99)00054-9.
Ohlson, J. 2025. “Empirical Accounting Seminars: Elephants in the Room.” Accounting, Economics, and Law: A Convivium 15: 1–8. https://doi.org/10.1515/ael-2021-0067.
Petersen, M. 2009. “Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches.” Review of Financial Studies 22: 435–80. https://doi.org/10.1093/rfs/hhn053.
Sloan, R. 1996. “Do Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future Earnings?” The Accounting Review 71: 289–315.
Teoh, S. 2018. “The Promise and Challenges of New Datasets for Accounting Research.” Accounting, Organizations and Society 68: 109–17. https://doi.org/10.1016/j.aos.2018.03.008.
Treiman, D. 2009. Quantitative Data Analysis: Doing Social Research to Test Ideas. San Francisco: Jossey-Bass.
Wasserstein, R., and N. Lazar. 2016. “The ASA Statement on P-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.