Abstract
Attendees of empirical accounting research seminars all too often come to view the conclusions presented in the papers as non-persuasive. This disappointing situation indicates that researchers employ data-analysis methodologies that inherently support the conclusions they are looking for. Such issues are rarely discussed because many participants have relied on the same methodologies – thus they have firsthand knowledge of the inherent deficiencies. The mantra becomes: “We are all aware of uncomfortable aspects of the methodologies used in our research, so why dwell on it?” Because these potential questions tend to be outside normal and acceptable bounds, I term them “elephants in the room”. Five such cases are delineated to illustrate the incontrovertible problems. To sum up, the elephants highlight that the purported substantive content of most published papers will be taken with a grain of salt for the foreseeable future.
Table of Contents
The “Elephants in the Room” Metaphor
Five Elephants
Referring to the Absence of a Fama-MacBeth Analysis
Asking whether a Key Right-Hand-Side (RHS) Variable Contributes to Explaining the Dependent Variable
It Takes More than Stars to Settle the Matter
Referring to the Possibility of Using a Holdout Sample
Issues Related to “Screen-Picking” and “Data-Snooping”
A Remark: What About Non-offensive, Hardnosed Questions?
What Can the Struggling Researcher Do?
Final Remark
References
Empirical Research in Accounting and Social Sciences: Elephants in the Room
Empirical Accounting Seminars: Elephants in the Room, by James A. Ohlson, https://doi.org/10.1515/ael-2021-0067.
Limits of Empirical Studies in Accounting and Social Sciences: A Constructive Critique from Accounting, Economics and the Law, by Yuri Biondi, https://doi.org/10.1515/ael-2021-0089.
Accounting Research’s “Flat Earth” Problem, by William M. Cready, https://doi.org/10.1515/ael-2021-0045.
Accounting Research as Bayesian Inference to the Best Explanation, by Sanjay Kallapur, https://doi.org/10.1515/ael-2021-0083.
The Elephant in the Room: p-hacking and Accounting Research, by Ian D. Gow, https://doi.org/10.1515/ael-2022-0111.
De-emphasizing Statistical Significance, by Todd Mitton, https://doi.org/10.1515/ael-2022-0100.
Statistical versus Economic Significance in Accounting: A Reality Check, by Jeremy Bertomeu, https://doi.org/10.1515/ael-2023-0002.
Another Way Forward: Comments on Ohlson’s Critique of Empirical Accounting Research, by Matthias Breuer, https://doi.org/10.1515/ael-2022-0093.
Setting Statistical Hurdles for Publishing in Accounting, by Siew Hong Teoh and Yinglei Zhang, https://doi.org/10.1515/ael-2022-0104.
1 The “Elephants in the Room” Metaphor
Astute individuals know that to avoid professional mishaps, it helps to be aware of taboo topics. Members of a community generally recognize that raising such topics, familiar as they are to everyone, can cast people of importance in a poor light. Such outcomes are widely perceived as undesirable – making it unlikely that these questions will ever be raised. Worse yet, the community itself can look bad when such topics are discussed. In colloquial parlance, these bottled-up issues, more or less well-known, are commonly referred to as “elephants in the room”.
This memorandum gives examples of elephants in accounting research to demonstrate that overt questions raised about a paper’s methodology can remind seminar participants that most research achieves little, if anything. Upon close scrutiny the methodological foundations simply look shaky (or worse), leading to uncomfortable reactions (“All of us know research publication is a game, and we do not need to be reminded, thank you very much.”). Drawing attention to some particular issues might well also trigger flashbacks to researchers’ more personal and painful experiences, beyond routine practices like disregarding, and not reporting on, results that refute the promoted core findings.
Many knowledgeable and mature members of the research community agree that the overall validity of empirical accounting research is far from beyond reproach. However, given their prestige, academics do not like to think of their work as unserious, which dulls any incentive to be self-critical and to confront potentially uncomfortable questions. But as time passes it becomes increasingly clear that the problems at hand ought not to be swept under the rug.
Highlighting some particularly egregious cases of obviously deficient methods may be of practical interest (I hope), as well as broadly educational. This memo presents five non-esoteric “elephants” below. The origin of each is discussed, and then some hypothetical, straightforward, seminar-type questions are given. These are intended to look remarkable only because most seminar participants will know that they are hard to answer satisfactorily.
2 Five Elephants
2.1 Referring to the Absence of a Fama-MacBeth Analysis
Making a seminar aware that the paper did not report results from the Fama-MacBeth (FM) method (Fama and MacBeth 1973) is a no-no. Researchers learned decades ago that this method tends to disappoint. In practice, clear-cut null-acceptances are far too likely. So why bring up FM when we already suspect that most of the stories evaluated tend to be farfetched?
Replacing FM, the contemporary “preferred” method of analysis pools all data and uses so-called fixed effects (FEs) for each year of data. This method helps to achieve the goal of null-rejection. However, it comes at a cost. FM, if it rejects the null, tends to be much more persuasive than the FE approach: if FM rejects the null, so will the FE regression, while the converse does not hold. Thus, the absence of FM turns out to be prima facie evidence that results will be non-persuasive.
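To make the contrast concrete, here is a minimal sketch, in Python, of both estimators run on a simulated firm-year panel. It is my own illustration, not drawn from any paper under discussion: the data-generating process, the variable names, and the parameter values are all assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
years, firms = 20, 500
year_idx = np.repeat(np.arange(years), firms)
x = rng.normal(size=years * firms)
# Assumed DGP: the slope on x drifts from year to year around a small mean.
beta_t = 0.05 + 0.2 * rng.normal(size=years)
y = beta_t[year_idx] * x + rng.normal(size=years * firms)
df = pd.DataFrame({"year": year_idx, "x": x, "y": y})

# Fama-MacBeth: estimate the slope year by year, then t-test the mean slope.
slopes = np.array(
    [smf.ols("y ~ x", data=g).fit().params["x"] for _, g in df.groupby("year")]
)
fm_t = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(years))
print(f"FM mean slope {slopes.mean():+.3f}, t-stat {fm_t:.2f}")
print(f"Years with a positive slope: {(slopes > 0).mean():.0%}")

# Pooled regression with year fixed effects: one slope from all firm-years.
fe = smf.ols("y ~ x + C(year)", data=df).fit()
print(f"Pooled FE slope {fe.params['x']:+.3f}, t-stat {fe.tvalues['x']:.2f}")
# FM's standard error reflects the year-to-year slope dispersion and tends to
# accept the null; the pooled FE t-stat ignores it and "rejects" with ease.
```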
Examples of offensive questions.
“I notice that you did not report on results using FM. Why is that? After all, it could easily have been done.”
“If you estimate annual regressions, what percentage of the 28 years of data would result in the correct sign on the main variable of interest? If you did not check, what do you think would be the outcome? Feel free to guess.”
Many researchers actually do consider FM-type annual regressions. However, when these regressions do not “work out”, the researchers decide to forget about them as best they can. It may seem that presenters nonetheless should consider “what do I say in case I am asked?” There are few worries insofar as FM is part of the elephant herd. But just in case, one can always use the excuse that FM may lead to false rejections of the null when the error terms are correlated across the years. (Seminar participants are likely to smile. They know that that is not the real reason for not using FM.)[1]
2.2 Asking whether a Key Right-Hand-Side (RHS) Variable Contributes to Explaining the Dependent Variable
Requesting a speaker to address this matter is offensive simply because most people are fully aware that the honest answer is “at best, the variable’s contribution is marginal”. An answer along these lines would not make the presenter feel or look good, since it would contradict the paper’s broader message. In turn, it may expose a question of research ethics: on what basis do the author(s) claim that the effect was material? People do not want to deal with this painful subject. So why bring it up, even indirectly?
Examples of offensive questions: “By how much would your R-square decrease after deleting variable X on the RHS in Table 2? Though the table by itself does not provide an answer, I presume you checked?”
“Per your table of descriptive statistics, your main variable of interest, X, and the dependent variable correlate to the tune of 5.3% (as measured by the Pearson coefficient, r, for instance), which I do not believe anyone would claim supports X’s relevance. Why should I believe that X can be of any greater relevance in your main regression just because you added 15 control variables, plus fixed effects?”
Nowadays, a researcher responding to a question about a variable’s material relevance may state something like “I am not claiming that X explains Y; I am satisfied with the weaker hypothesis that Y merely relates to X”. (To be sure, some listeners may now nod quietly, thinking “how ingenious!”)
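The first question above has a simple mechanical check behind it: refit the regression without X and compare R-squares. A minimal sketch in Python, where the variables (y, x, c1–c3), the sample size, and the coefficients are illustrative assumptions rather than anything from an actual paper:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["x", "c1", "c2", "c3"])
# x is given only a weak link to y, in the spirit of the 5.3% correlation.
df["y"] = 0.05 * df["x"] + 0.6 * df["c1"] + 0.4 * df["c2"] + rng.normal(size=n)

full = smf.ols("y ~ x + c1 + c2 + c3", data=df).fit()
restricted = smf.ols("y ~ c1 + c2 + c3", data=df).fit()
print(f"R2 with X:             {full.rsquared:.4f}")
print(f"R2 without X:          {restricted.rsquared:.4f}")
print(f"Incremental R2 from X: {full.rsquared - restricted.rsquared:.4f}")
print(f"t-stat on X:           {full.tvalues['x']:.2f}")
# X collects significance stars (t around 5) while adding on the order of
# one- or two-tenths of one percent to R-squared -- "at best, marginal".
```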
2.3 It Takes More than Stars to Settle the Matter
Asking a presenter to deal with the statistical issue related to “large N” and a t-statistic that seems relatively small is so surprising that the respondent may not even internalize the question initially. But the substantive issue is far from complex. On the one hand, a few statistical significance stars, by convention, should suffice to declare victory. It is what social scientists with an empirical bent have been doing for decades. On the other hand, everybody knows that something seems seriously remiss when N runs into the hundreds of thousands and the t-stat approximates, say, 4.[2] Our intuition, correctly, tells us that under such circumstances classical statistics misleads. One can no longer pretend that N is of no relevance.
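The arithmetic behind the intuition is elementary. For a Pearson correlation r, the t-statistic is t = r·sqrt(N − 2)/sqrt(1 − r²), so a fixed, economically negligible correlation collects ever more stars as N grows. A short sketch (my own illustration, with an assumed r):

```python
import math

def t_stat(r: float, n: int) -> float:
    """t-statistic for testing a Pearson correlation r with n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = 0.01  # an economically negligible correlation
for n in (1_000, 10_000, 100_000, 400_000):
    print(f"N = {n:>7,}: t = {t_stat(r, n):.2f}")
# At N around 160,000 even r = 0.01 clears t = 4; once N runs into the
# hundreds of thousands, the stars say little about economic significance.
```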
That said, why argue about long-established conventions in regular seminars which do not focus on methodological issues? We have all learned to live with “stars are stars; to raise the role of N should be secondary or even unnecessary”. If the question is indeed raised, a flustered presenter may just retort: “no prior research in this area has addressed this kind of question, so why put me on the spot?”
Examples of offensive questions: “To support that X relates to Y, you refer to a regression showing two stars. However, you have 65,000-plus observations. Does a case of two stars not fall far short of the minimum requirements to suggest economic significance? Given the sample size, don’t you think you would need at least a t-stat of 5 to back up your claim?”
“You have 55,000-plus observations, and thus nobody should be surprised that quite a few of your controlling variables are statistically significant. In fact, I am struck that your main variable of interest has such a low t-statistic, relatively speaking. Moreover, some of these more powerful variables end up with what seem to be the wrong signs. Don’t you think that using this regression as support for your conclusions is less than satisfactory?”
Recently researchers have become well aware that one does not have to disclose the reasoning behind sign-hypotheses for the independent variables (prior to running a regression). Doing so potentially causes problems without any real benefits. If needed, many researchers in finance and economics have paved the way by simply HARKing (hypothesizing after the results are known). Nobody is likely to complain – “we all do it to some degree”.
2.4 Referring to the Possibility of Using a Holdout Sample
Setting aside the case when researchers hypothesize about making money in financial markets (via clever and innovative trading rules), conventions do not require researchers to evaluate a hypothesis using a (fresh) holdout sample, which implies replicating the empirical testing on alternative samples or sub-samples through time or space. The importance of a holdout sample comes into focus if a researcher worries about stating conclusions that are unlikely to be robust. However, worries about publishing dubious conclusions are quite rare, and thus there are no apparent compelling imperatives to use holdout samples. Doing so will make the paper longer and look more pretentious. Worse still, the findings produced by a holdout sample may introduce ambiguities about the correctness of the overall conclusions. To worry or complain about current practice shows a lack of appreciation of what our business is all about. General research practice should be accepted per se – without humming and hawing.
Examples of offensive questions: “I notice that the data you analyze end in 2016. Why did you not check whether the results would be the same when using 3–4 more years of recent data? It can serve as an exceptionally “clean” and relevant holdout sample.”
“Given that you have a relatively large N of 90,000, could you not have split the sample into, say, three mutually exclusive subsets (of 30,000 observations each) and then checked whether the three cases pretty much tell the same story?”
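The split-sample check in the second question is a few lines of code. A minimal sketch, with made-up variable names and a simulated panel standing in for the real data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 90_000
df = pd.DataFrame({"x": rng.normal(size=n)})
df["y"] = 0.02 * df["x"] + rng.normal(size=n)

# Shuffle first so the subsets are not ordered by time or by any screen.
df = df.sample(frac=1.0, random_state=2).reset_index(drop=True)
for i in range(3):
    sub = df.iloc[i * 30_000:(i + 1) * 30_000]
    fit = smf.ols("y ~ x", data=sub).fit()
    print(f"Subset {i + 1}: slope {fit.params['x']:+.4f}, "
          f"t-stat {fit.tvalues['x']:+.2f}")
# A robust result should tell the same story in all three subsets; sign flips
# or vanishing t-stats are a warning that the pooled finding is fragile.
```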
2.5 Issues Related to “Screen-Picking” and “Data-Snooping”
To raise these matters goes to the heart of what is unacceptable. If someone in the seminar raises such a question, the presenter might well perceive it as a personal assault. Many seminar participants will be sympathetic. Most people have had their own frustrating experiences massaging the data – staring at regressions until the procedures yield acceptable findings. Nowadays people are fully aware that software can generate millions of regressions, from which the most pleasing one can be picked as the “main” regression. Nonetheless, researchers do not like to discuss this kind of experience. The topic is too sordid, testing ethical boundaries; accordingly, everyone learns to live with it as a more or less painful private matter.
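The mechanics are easy to demonstrate. In the hedged sketch below (entirely simulated; no real data or paper is involved), the dependent variable is pure noise, yet searching across 200 candidate RHS variables reliably turns up a “main” variable with an apparently respectable t-statistic:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, k = 5_000, 200            # observations and candidate RHS variables
y = rng.normal(size=n)       # the dependent variable is pure noise,
X = rng.normal(size=(n, k))  # so every true coefficient is exactly zero

# Run k one-variable regressions and keep the most "pleasing" result.
best_t, best_j = 0.0, 0
for j in range(k):
    fit = sm.OLS(y, sm.add_constant(X[:, j])).fit()
    t = fit.tvalues[1]
    if abs(t) > abs(best_t):
        best_t, best_j = t, j

print(f"'Main' variable picked after the search: #{best_j}, t-stat {best_t:.2f}")
# With 200 tries, a |t| close to or above 3 routinely emerges by chance alone;
# searching over interactions and sample screens multiplies the opportunities.
```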
Offending questions, examples: “Your regression relies on some very strange RHS variables that seem to be important, yet they do not show up as part of the descriptive statistics. Can you elaborate on this matter? And why do you have such an abundance of interactive effects as independent variables?”
“Your regression findings show that quite a few of the RHS variables have the wrong signs, yet you do not confront this matter. For example, leverage has a negative sign though the dependent variable relates to cost of equity. Did you try various regressions to potentially produce the correct signs but failed? Or is the matter irrelevant because you only care about the sign and t-stat of the main variable on the RHS?”
3 A Remark: What About Non-offensive, Hardnosed Questions?
Is a more nuanced tone of aggressive questioning practical at all? The answer is affirmative. One needs to keep in mind: (i) start out being positive and recognize some apparent merits of the presenter’s work, then (ii) proceed cautiously to raise the question without hinting that the focus is on potentially serious weaknesses prowling in the background. In other words, only upon further reflection would the presenter, as well as most seminar participants, recognize that the question turns on methodological deficiencies that might well indirectly suggest dubious conclusions.
Example: “Your implementation of the standard fixed effects model is impressive, and you demonstrate that a serious analysis is required to ensure comprehensive controls that go beyond the prior literature. I think most of us fully appreciate these contributions of your research. That said, you could perhaps do more to convince readers of the compelling results. My suggestion is that you implement the following two-stage procedure. First, run your pooled FE regression without the main variable on the RHS. Second, for each year, correlate the regression residual with the main variable. You then evaluate the implied hypothesis that the great majority of the correlations across the years are positive. It ought to be quite informative, do you not agree?”
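The suggested two-stage procedure is simple to code. A sketch under assumed names (a panel with a main variable x and controls c1 and c2; the data-generating process is invented purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
years, firms = 25, 400
df = pd.DataFrame({
    "year": np.repeat(np.arange(years), firms),
    "x": rng.normal(size=years * firms),
    "c1": rng.normal(size=years * firms),
    "c2": rng.normal(size=years * firms),
})
df["y"] = 0.05 * df["x"] + 0.5 * df["c1"] + rng.normal(size=len(df))

# Stage 1: pooled fixed-effects regression WITHOUT the main variable x.
stage1 = smf.ols("y ~ c1 + c2 + C(year)", data=df).fit()
df["resid"] = stage1.resid

# Stage 2: correlate the residual with x separately for each year.
corrs = np.array([g["resid"].corr(g["x"]) for _, g in df.groupby("year")])
print(f"Years with a positive residual-x correlation: {(corrs > 0).mean():.0%}")
# If x genuinely matters, the correlation should be positive in the great
# majority of years, not merely in the single pooled regression.
```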
4 What Can the Struggling Researcher Do?
In the author’s view, the state of affairs in empirical accounting research is not a happy one. Most people agree that the main culprit is the incentive system with its pitiless publication requirements. That would potentially be quite acceptable if the published papers had improved due to the incentives. However, many people argue that the incentives have made things worse. Flooding the A-journals with papers containing obvious and well-known flaws of course impairs the morale of many aspiring scholars. It has also resulted in many researchers harboring the opinion that deviating materially from accepted practice poses its own publication risks – reducing the probability of getting a paper accepted.
Can the individual researcher stay away from dubious ethics? There are few remedies, insofar as “in a corrupt system it is too much to ask for behavior beyond reproach – we all have to make a living”. I have only two minor observations to make here.
First, at least in principle, researchers can pick a topic that is sufficiently interesting regardless of its actual outcome. This eliminates the pressure to back an implausible story with “compelling evidence”. Common sense suggests that “extraordinary hypotheses require extraordinary evidence”. Since such evidence is typically quite impossible to produce, claims to the contrary are often not taken seriously. The papers may be published in an A-journal, but most people are perfectly fine just shrugging their shoulders. More generic, general-interest topics (like whether risk connects with leverage), however, will most likely yield ambiguous conclusions. And that, of course, poses dicey issues when one tries to write a publishable paper.
Second, a researcher can try to tune the exposition in the spirit of “litigation support” documents. That is, the researcher may recognize in the paper that evidence can be woven together to support the story evaluated. The paper may note that though a case can be made that the evidence supports the hypotheses, there may be alternative methods of data analysis that refute them. (Of course, in legal proceedings everybody knows that the documents can be extremely biased – no need to bring it up in such a context.) In other words, the reporting on findings can be explicitly humble and implicitly suggest: “Please, do not take my conclusions too literally. I am merely trying to make my case to get a publication”. This kind of language (expressed in more eloquent terms) would thus avoid grandiose and embarrassing claims that “all robustness tests work out” and that the “economic significance is material”. (Some authors may even consider recognizing – in a footnote, to avoid undue attention – that “one of the robustness tests fell short for reasons we have not fully understood.”)
5 Final Remark
Readers who agree with the suggestion that there are too many apparent elephants in the seminar room may well ask “What is to be done?” Perhaps the culture of non-critical questioning could be modified, without debasing participants’ attitudes. However, because individuals tend to respond only to incentives, such a change in culture is unlikely to occur without recognizing a broader problem. The issues go beyond behavior and culture: as a community we need to acknowledge that the research has been inherently deficient. Everyone knows the name of the game is publications. Yet the demand for the publications we produce is close to non-existent, which is unsurprising in light of their all too often dubious conclusions. Universities support and enforce this self-serving incentive structure (as noted by Sunder (2008), among many others).
If the narrow professional rewards for publications (promotions and money) were materially reduced, then one can argue that individuals’ reputed honesty and integrity would be much more appreciated by colleagues and deans. That said, whether or not the quality of our research would improve is an open question. At least resources could be saved while the benefits from research would not decline. I, for one, do not expect that our potential constituencies (real-world professionals, regulators, policymakers, and students) will start to complain and send a message to the effect: “You guys better get back to the good old incentive system where faculty members who cannot publish in top journals are at best treated as second-class members of the community”.
References
Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636. https://doi.org/10.1086/260061
Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435–480. https://doi.org/10.1093/rfs/hhn053
Sunder, S. (2008). Building research culture. China Journal of Accounting Research, 1(1), 2–5. https://doi.org/10.1016/S1755-3091(13)60006-4