
Robustness Reexamined: Unpacking the Limits of Model-Based Robustness Analysis as Explanatory Reasoning

  • Margherita Harris
Published/Copyright: July 1, 2025

Abstract

In science, obtaining the same result through different means is often taken to support a hypothesis. Bayesian approaches have long aimed to clarify the logic behind this practice, commonly known as robustness analysis (RA), typically by appealing to probabilistic independence to capture evidential diversity. Schupbach’s (2018. “Robustness Analysis as Explanatory Reasoning.” The British Journal for the Philosophy of Science 69 (1): 275–300.) explanatory account offers a novel alternative, particularly suited to empirically driven cases. However, this paper questions whether the account can be successfully extended to model-based RAs. I argue that its application in such contexts is more problematic than often acknowledged and that it fails to accommodate a broader class of model-based RAs than previously assumed.

1 Introduction

In science, obtaining the same result through different means is often seen as providing further support for a hypothesis. The Bayesian has had a lot to say about the logic underpinning this method of confirmation (a method which I will henceforth refer to as robustness analysis (RA)). However, until recently, Bayesian accounts of RA have relied on probabilistic independence to explicate the notion of evidence diversity. As Schupbach (2018) persuasively argues, these accounts, no matter how subtly formulated, are in many cases woefully inadequate.[1] Therefore, it seems evident that, in order to capture those cases, the Bayesian must depart from independence-based accounts of RA diversity. Schupbach’s (2018) recent explanatory account of RA has been rightly welcomed as a promising step in the right direction. Indeed, by having “as its central notions explanation and elimination” (ibid., 286), this account seems to fit very nicely with some empirically driven cases of RA in science, thereby revealing why these cases are able to lend confirmation to a hypothesis.

Schupbach further argues that his explanatory account of RA “applies to model-based RA just as well as it does to empirically driven RAs” (ibid., 297, my emphasis). And he is not alone. Winsberg (2018), in his book “Philosophy and Climate Science,” has also emphatically argued that this explanatory account of RA can finally shed light on the epistemic import of model robustness in climate science. According to O’Loughlin (2021, 36), “Winsberg (2018) convincingly argues that [Schupbach’s account] can be applied to climate models.” In reviews of Winsberg’s book, Lusk (2019) writes that “Winsberg’s argument is a convincing reconceptualization of robustness analysis in climate science” and Knüsel (2020, 116) that “Winsberg […] makes a novel, convincing suggestion for when multiple sources of evidence in favor of a hypothesis are meaningful in climate science.” Overall, my impression is that philosophers have so far very much welcomed Schupbach’s suggestion that his explanatory account of RA can be successfully applied to model-based RA, thereby shedding light on its epistemic import. However, neither Schupbach nor Winsberg has attempted to show rigorously if and when this account can really shed light on the epistemic import of model-based RA. Hence, given the importance of the topic and my disagreement about the applicability of this account to most cases of model-based RA, the aim of this paper is to critically assess if and when this is the case.

The structure of this paper is as follows. In Section 2, I will introduce Schupbach’s explanatory account of robustness analysis (RA). In Section 3, I will present an example of an empirically driven case of RA to demonstrate why Schupbach’s account can be successfully applied in such contexts. Section 4 shifts the focus to model-based RAs. First, I will argue that, unlike empirically driven RAs, in model-based RAs, the hypothesis we seek to confirm cannot straightforwardly serve as a possible explanation for why the same result is detected. Therefore, any attempt to apply Schupbach’s explanatory account to model-based RAs must acknowledge this fundamental difference and determine when it can still be applicable despite it (Section 4.1). I will then examine a specific example from climate science, used by Winsberg to argue for the applicability of Schupbach’s account to model-based RAs. I will contend that Winsberg’s argument is unsound because it fails to recognize this crucial distinction. More broadly, I will argue that Schupbach’s account is inapplicable to all cases of model-based RAs where the hypothesis to be confirmed is that a model’s result is instantiated in the target system and the models selected for RA involve incompatible assumptions about that system – of which Winsberg’s example is a particular case (Section 4.2). I will then compare my criticisms regarding the applicability of Schupbach’s account to model-based RAs with those offered by O’Loughlin (2021), arguing that while O’Loughlin’s criticisms are pertinent, they do not go far enough (Section 4.3). Section 5 will conclude.

2 The Confirmatory Value of ERA Diverse Evidence

According to Schupbach, when there is more than one means of detecting a result R, the notion of diversity that is relevant to RA is the following:

ERA Diversity:[2] Means of detecting R are ERA diverse with respect to potential explanation (target hypothesis) H and its competitors to the extent that their detections (R 1, R 2, …, R n ) can be put into a sequence for which any member is explanatorily discriminating between H and some competing explanation(s) not yet ruled out by the prior members of that sequence. (Schupbach 2018, 288)

Why is ERA diversity epistemically important from a Bayesian perspective? By relying on a probabilistic conception of explanatory power ɛ(E, H), Schupbach attempts to provide an answer to this question.

According to Schupbach, the explanatory power an explanation H has over its explanandum E is given by

(1) ε(E, H) = [Pr(H|E) − Pr(H|¬E)] / [Pr(H|E) + Pr(H|¬E)],

where ɛ(E, H) takes values in the interval [−1, 1] and the greater the value of ɛ(E, H), the more strongly H explains E.[3] It can be shown that the value of ɛ(E, H) is positively correlated with the degree of statistical relevance between E and H, that is, with the strength of the inequality Pr(E) < Pr(E|H) (see Schupbach and Sprenger 2011, 110). Hence, according to this measure of explanatory power, the more H decreases the degree to which E is surprising, the more strongly H explains E.[4]

Similarly, the explanatory power that an explanation H has over its explanandum E, in light of some proposition p, is given by

(2) ε(E, H|p) = [Pr(H|E&p) − Pr(H|¬E&p)] / [Pr(H|E&p) + Pr(H|¬E&p)].
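To make the behaviour of this measure concrete, here is a minimal sketch (my own illustration, not from Schupbach’s paper) that computes ɛ from the two conditional probabilities on which it depends; the probability values are hypothetical. The conditional measure in (2) is obtained by simply conditioning each of these terms on p.

```python
# Minimal sketch of the Schupbach-Sprenger measure of explanatory power.
# The probability inputs below are hypothetical, chosen only to illustrate
# how the measure behaves; they are not taken from the paper.

def explanatory_power(pr_h_given_e: float, pr_h_given_not_e: float) -> float:
    """epsilon(E, H) = [Pr(H|E) - Pr(H|~E)] / [Pr(H|E) + Pr(H|~E)]."""
    return (pr_h_given_e - pr_h_given_not_e) / (pr_h_given_e + pr_h_given_not_e)

# H makes E less surprising (Pr(H|E) > Pr(H|~E)): positive explanatory power.
print(explanatory_power(0.8, 0.2))   # 0.6
# H is statistically irrelevant to E: zero explanatory power.
print(explanatory_power(0.4, 0.4))   # 0.0
# H makes E more surprising: negative explanatory power.
print(explanatory_power(0.1, 0.5))   # -0.666...
```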

Equipped with this probabilistic conception of explanatory power, Schupbach (2018) then shows that the following five formal conditions for a successful increment of ERA diversity guarantee an incremental confirmation of the target hypothesis H (i.e. Pr(H|E&R n ) > Pr(H|E)):

Past detections: We are given E = R 1 & R 2 & … & R n−1 (informally, a result R has been detected using n − 1 different means);

Success: ɛ(E, H), ɛ(E, H′) > 0 (informally, the target hypothesis H explains this coincidence, but so does another rival hypothesis H′);

Competition:[5] (i) Pr(H&H′) = 0, or (ii) ɛ(E, H|H′) ≤ 0 (informally, H and H′ epistemically compete with one another, with respect to E);

Discrimination: ɛ(R n , H|E) ≈ 1 and ɛ(¬R n , H′|E) ≈ 1 (informally, there is another nth means of potentially detecting R such that, in light of E, H would strongly explain the detecting of R by this means (R n ) and H′ would strongly explain not detecting R by this means (¬R n ));

New detection: we learn R n (informally, the nth means also detects result R).

In light of these formal conditions, the epistemic significance of Schupbach’s account of ERA diversity from a Bayesian perspective becomes clear: evidence that is ERA diverse with respect to a target hypothesis H and its competitors should rationally increase belief in H.[6] Of course, this fact alone doesn’t determine how much confidence in H should increase; the increase could range from negligible to substantial. However, Schupbach also discusses the factors influencing the extent of this increase, and for further details, I refer the reader to Schupbach (2018, 293–296).

But what should one make of the above ERA diversity conditions? Do they fit nicely with actual cases of RA in science?

3 ERA Diversity and Empirically Driven RAs

To motivate the intuition behind these conditions, Schupbach considers the curious motion of pollen granules suspended in water, first observed in 1827 by the botanist Robert Brown. In the early 20th century, Einstein famously offered a possible explanation for this observation: the motion was due to random molecular collisions in the water. Later, Jean Perrin performed a variety of experiments to determine whether Einstein’s molecular explanation for this motion (nowadays known as Brownian motion) was correct. As Schupbach points out, the detection of this motion using a multitude of different experiments (using different materials, different media, different means of suspending the particles, etc.) was considered by Perrin (1916, 83–86) as evidence in support of Einstein’s explanation.[7] But why should the robustness of Brownian motion have counted as evidence for Einstein’s molecular explanation?

According to Schupbach, this is because the various means of detecting the Brownian motion were ERA diverse with respect to Einstein’s explanation and its competitors. Indeed, when Brown first observed the curious motion of the pollen granules suspended in water (R 1), there were more than a few competing explanations for this observed phenomenon: the motion might have been due to currents or evaporation of the water, or to a sexual drive inherent in pollen, etc. But there were many later detections of this motion that were able to explanatorily discriminate between Einstein’s molecular explanation H and one of the many competing explanations not yet ruled out. Take, for instance, the competing explanation H′ that the motion was due to a sexual drive inherent in pollen. And consider a new detection of this motion using an inorganic material (R 2). Does this new detection satisfy all conditions for a successful increment of ERA diversity? Let us quickly go through each of them.

In this example, the Brownian motion has already been detected using a sample of pollen granules suspended in water (i.e. E = R 1) and hence the past detection condition is satisfied. Einstein’s molecular explanation H and the sexual drive inherent in pollen explanation H′ provide different causal explanations for the observed motion R 1, so it seems reasonable to assume that they both increase the probability that one should observe this motion and hence ɛ(R 1, H) > 0 and ɛ(R 1, H′) > 0 (see footnote 4) in accordance with the success condition. The competition condition is also plausible. For although H and H′ are not mutually exclusive hypotheses, H′ seems sufficient for doing the explanatory work of H, i.e. ɛ(E, H|H′) ≤ 0. Furthermore, H cites causes of the observed motion that would also cause the movement of inorganic material, whereas H′ does not. Hence, it seems plausible to assume that whereas H makes it extremely likely that we would observe this motion using an inorganic material (i.e. Pr(R 2|H&R 1) ≈ 1), H′ makes it extremely likely that we would not observe it (i.e. Pr(¬R 2|H′&R 1) ≈ 1). And this implies that ɛ(R 2, H|R 1) ≈ 1 and ɛ(¬R 2, H′|R 1) ≈ 1, in accordance with the discrimination condition. Finally, the Brownian motion was detected using inorganic material (i.e. we learn R 2) and hence the new detection condition is also satisfied.
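To see, in purely illustrative terms, how satisfying these conditions yields the incremental confirmation Pr(H|E&R n ) > Pr(H|E), the following toy computation assigns made-up priors and likelihoods to Einstein’s explanation H, the sexual-drive explanation H′, and a catch-all alternative, and checks that learning R 2 raises the probability of H. The numbers are mine, not Perrin’s or Schupbach’s, and the three hypotheses are treated as mutually exclusive for simplicity, which is stronger than the competition condition requires.

```python
# Toy Bayesian illustration of the Brownian-motion case. All numbers are
# hypothetical; the three hypotheses are treated as mutually exclusive
# for simplicity (stronger than the competition condition requires).

priors = {"H": 0.3, "H_prime": 0.3, "other": 0.4}

# Pr(R1 | hypothesis): both causal explanations make the pollen motion likely.
lik_R1 = {"H": 0.9, "H_prime": 0.9, "other": 0.1}

# Pr(R2 | hypothesis, R1): only molecular collisions (H) would also move
# inorganic particles; the sexual-drive explanation (H_prime) would not.
lik_R2 = {"H": 0.95, "H_prime": 0.05, "other": 0.1}

def posterior(priors, *likelihoods):
    joint = dict(priors)
    for lik in likelihoods:
        joint = {h: joint[h] * lik[h] for h in joint}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

print(posterior(priors, lik_R1)["H"])          # ~0.466  after R1 alone
print(posterior(priors, lik_R1, lik_R2)["H"])  # ~0.936  after R1 and R2
```

On these hypothetical numbers, the probability of H rises from roughly 0.47 after R 1 alone to roughly 0.94 once R 2 is learned.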

All conditions of ERA diversity seem plausible in this example and a similar story could, arguably, be told for the many other means that were used by Perrin to detect Brownian motion. So I am happy to conclude that Schupbach’s account of ERA diversity can reasonably apply to this case of RA. I have not attempted to convince the reader that Schupbach’s account is an adequate account of RA diversity in general, not least because I don’t think it is. However, for the purposes of this paper, what matters is that this account does seem to fit well with some cases of empirically driven RA.[8] The key to applying Schupbach’s account in this instance was the successful identification of both a target explanation and a rival explanation that could reasonably satisfy all conditions of ERA diversity. In the next section, I will explore the challenges involved in identifying such explanations in the context of model-based robustness analysis.

4 Model-Based RA

Model-based Robustness Analysis involves examining the stability of a model’s result under various perturbations of the model’s features.[9] This practice is undertaken for multiple reasons. As Houkes et al. (2024) note, RA can be used to generate causal hypotheses by exploring the implications of different assumptions and parameter settings, thereby identifying a possible causal structure of a phenomenon. Additionally, it can deepen our understanding of causal hypotheses by studying the effects of adding or removing factors, as well as by identifying potential confounders and mediators. RA also plays a critical role in calibrating alternative modeling techniques, ensuring they replicate desired outcomes, thereby validating or refining these approaches. However, the focus of this paper is on RA’s confirmatory role: whether positive results from RA should increase our confidence in the truth of a hypothesis. In particular, I will assess whether Schupbach’s account of ERA diversity can help answer this question.

I will concentrate on a broad class of model-based RAs, specifically those cases where the hypothesis to be confirmed is that a model’s result is instantiated in the target system, and where the models used in RA involve incompatible assumptions about that system.[10] This role for RA is common across many scientific fields. For instance, in epidemiology, multiple models (e.g., SIR models, agent-based models) may be employed to predict how a disease will spread during an outbreak, how many people will be infected, and what interventions might be effective. Despite focusing on the same target system, these models often make incompatible assumptions about key factors such as transmission dynamics, disease progression, and the impact of public health interventions. Similarly, in economics, multiple models might be used to assess the impact of policy changes, such as tax reforms or minimum wage increases, and these models often involve incompatible assumptions regarding market behaviour, consumer preferences, or labour market dynamics. In climate science, multiple models might be employed to project future climate change, yet they often incorporate different assumptions about physical processes, parameterizations, and the effects of feedback mechanisms.

There is a common intuition that the robustness of a result across multiple models should increase our confidence that the result holds in the real world. However, when and why this should be the case has been a significant source of contention in the philosophical literature.[11] Schupbach contends that his account of ERA diversity can finally shed light on the epistemic significance of this practice, since it “applies to model-based RAs just as well as it does to empirically driven RAs” (ibid., 297). However, in this section, I will argue that when it comes to model-based RA, the situation is considerably more complex than he suggests. In particular, I will show that, unlike in the empirically driven RA case discussed earlier, it is not straightforward to formulate a target hypothesis and rival hypotheses that satisfy all the conditions of ERA diversity. Whether this can be done depends on several substantial assumptions which, as I will argue, cannot be met in a large class of model-based RAs.

4.1 The Difference Between Model-Based and Empirically Driven RAs

There is an important difference between empirically driven RAs and model-based ones, and it is essential to clarify this distinction before assessing the applicability of Schupbach’s account of ERA diversity to the latter. Recall that Schupbach’s account of ERA diversity concerns distinct means of detecting the same result R. Schupbach has shown that if those distinct means of detecting a result R are ERA diverse with respect to a target explanation H and its rival explanations for their detections R 1, R 2, …, R n , then H is incrementally confirmed. In the empirically driven case of RA considered in Section 3:

  1. R is Brownian motion in the actual world;

  2. R i are the distinct detections of R in the actual world;

  3. H is a possible explanation for why we detect R in the actual world, which is also the hypothesis we are interested in confirming (i.e., Einstein’s molecular explanation).

In this scenario, R and its detections R 1, R 2, …, R n all concern the actual world, and the hypothesis we want to confirm (i.e., Einstein’s molecular explanation) is a possible explanation for these detections.

In the case of model-based RA, however, things are less straightforward. Suppose, for instance, that an ensemble of distinct epidemiological models all indicate that an intervention will effectively eradicate a disease (this is our R). The hypothesis we are interested in confirming in this case is that result R is instantiated in the actual world, not just in model land. Hence, in this case:

  1. R is the eradication of a disease by an intervention in model land;

  2. R i are the detections of R in model land;

  3. H is a possible explanation for why we detect result R in model land. However, the hypothesis we ultimately want to confirm – that R is instantiated in the actual world – cannot, on its own, play this explanatory role, since it does not account for model-land detections.

Thus, in this case, R and its detections R 1, R 2, …, R n all concern model land, whereas the hypothesis we ultimately wish to confirm concerns the actual world. This marks a crucial difference from empirically driven RAs, where the target hypothesis can also serve as a possible explanation for the observed result. By contrast, in model-based RAs, this explanatory link is missing. As a result, applying Schupbach’s account of ERA diversity in these contexts is considerably more complex. This does not imply that Schupbach’s account is not applicable to model-based RAs, but it does nonetheless show that any attempt to successfully apply it will have to acknowledge this difference and demonstrate when it can be applicable in spite of it. In this section, I will attempt to do just that.

Consider a toy example of model-based RA. Suppose I have already learned that an epidemiological model indicates that an intervention will effectively eradicate a disease (i.e., I have learned R 1). Suppose further that I subsequently learn that another model, in which an idealization/assumption A 1 of the original model has been replaced by another idealization/assumption A 2, also shows that an intervention will effectively eradicate a disease (i.e., I learn R 2). Does Schupbach’s account of ERA diversity show that learning R 2 should incrementally confirm that the intervention will be effective in the actual world? For this to be the case, we must identify a target hypothesis H and a rival hypothesis H′ that satisfy the conditions of ERA diversity.

Let’s first consider a good candidate for H. Naturally, the hypothesis we ultimately want to confirm is that the models’ result R is instantiated in the actual world (i.e., that an intervention will effectively eradicate a disease in the target system). However, as mentioned earlier, this hypothesis concerns the actual world and therefore cannot, on its own, explain why we detect R in model land. Crucially, for H to be a good candidate for the target hypothesis, it must satisfy the following two conditions:

Condition 1: H must be a possible explanation for why we detect the common result R in model land.

Condition 2: If H is confirmed, so is the hypothesis that R is instantiated in the actual world.

Condition 1 is required for the target hypothesis to meet Schupbach’s conditions of ERA diversity, in particular the success and discrimination conditions. Condition 2 is necessary because the aim is to confirm a claim about the actual world, which is the outcome Schupbach’s account is intended to support.

A possible (and arguably the only possible) candidate for H that satisfies both conditions is the following conjunction:[12]

H: The models’ result R is instantiated in the target system & both models are adequate (not by mere luck) for discerning whether R is instantiated in the target system.

The first conjunct is a claim about the world – that R is instantiated in the target system. The second is a claim about the models’ ability to indicate R if and only if R actually holds in the target system. This hypothesis satisfies both conditions.

Condition 1 is satisfied because H provides a possible explanation for the common result R in model land. In our example, if H is true – that is, if the intervention is effective and both models are adequate for discerning whether the intervention is effective – then both models would necessarily indicate that the intervention is effective, thus meeting the success and discrimination conditions, and thereby satisfying Condition 1.

This explanatory structure may seem demanding, but it is not unusual. In many familiar epistemic contexts – such as measurement and testimony – explaining a given result often involves not only a claim about the world, but also a claim about the reliability of the method by which that result is obtained. For instance, when I read a scale and see that it displays the value x, what explains the observed result is not just that my weight is x, but that the scale is functioning properly and my weight is x. Likewise, when I read a news report, a possible explanation for the presence of that report is not merely that the event occurred, but that it occurred and the reporting process functioned reliably. The event alone does not explain why it appears in the news. The explanatory structure in model-based robustness analysis is analogous: to explain why the models converge on result R, we must posit not only that R is instantiated in the target system, but also that the models are adequate (not by mere luck) for detecting whether R is instantiated. Both components are needed to satisfy the explanatory demands of ERA diversity.

Condition 2 is also satisfied because, if H is confirmed, its conjuncts are confirmed as well, including the claim that R is instantiated in the target system, which is the hypothesis we ultimately seek to confirm. However, in order to fully justify this, two key terms in H require further clarification: “adequacy” and “not by mere luck.”

When considering adequacy, it is important to recognize that models rarely, if ever, fully represent the target system. Scientific models simplify reality through idealizations and abstractions that depart from the full complexity of the target system.[13] These simplifications are often necessary to make models usable and informative. However, this also means that being accurate in every respect is not typically the goal. Instead, what we hope is that a model, despite its various inaccuracies, is nonetheless adequate for the particular task at hand (Parker 2009, 2020).[14] In this context, adequacy means that the model is sufficiently reliable for drawing conclusions about some aspect of the target system. For example, when assessing whether an intervention is effective, the model does not need to represent every feature of the target system accurately. What matters is that it provides a reliable indication of the relevant outcome, in this case the effectiveness of the intervention.

However, H is a good candidate for the target hypothesis only if the models are adequate not by mere luck, but because their structure meaningfully tracks the underlying mechanisms of the target system. Without such a connection, H would fail to satisfy Condition 2. To see why, consider Parker’s definition of adequacy-for-purpose: a model is adequate-for-P in a particular instance of use if and only if using it in that instance achieves purpose P (Parker 2020, 461). While this definition ties adequacy directly to success in achieving the intended purpose, it does not rule out the possibility that such success might be accidental. A model could yield accurate results, and thus be judged adequate under this definition, even if its assumptions bear little resemblance to the relevant structure of the target system. In other words, adequacy-for-purpose in this sense does not guarantee that the model’s success is non-coincidental.

For example, suppose a pharmacokinetic model assumes that a drug is metabolized in the liver, while in reality it is processed by the kidneys. If the model nonetheless predicts the correct plasma concentration, and one has no reason to believe the site of metabolism is irrelevant, its success may reasonably be seen as a case of epistemic luck. Or consider SIR-type epidemiological models that assume homogeneous population mixing, yet sometimes yield accurate results in highly structured populations. When the model’s success cannot be traced to the relevance or irrelevance of particular assumptions, it should likewise be regarded as coincidental. For a more vivid historical example consider early chemical models based on phlogiston theory, which were sometimes predictively successful despite resting on fundamentally false assumptions about combustion. These successes, while accurate, cannot plausibly be attributed to the theory’s explanatory content, and are best understood as coincidental.

In such cases, H amounts to an arbitrary conjunction: the claim that R is instantiated in the target system is simply paired with the claim that the models are adequate, without any explanatory connection between the two. As Goodman (1983, 69) observes, “if a hypothesis is judged to be an arbitrary conjunction, the establishment of one component does not confer credibility to the other components.” Under this interpretation of adequacy, H fails to meet Condition 2. For H to be a viable candidate for a target explanation, one that satisfies both conditions, the agent must do more than merely believe that the models could succeed at the task. They must believe that the models can succeed not by mere luck, but because they reliably and meaningfully capture features of the target system that are relevant to the result. Only under this assumption can H satisfy Condition 2.

Importantly, however, whether a model’s success is regarded as coincidental is not an entirely objective matter, but depends on the agent’s epistemic perspective. A model might involve false or idealized assumptions, yet an agent may still rationally judge that it could be adequate (not by mere luck) if they have good reason to believe those assumptions are irrelevant to the result – in the sense that replacing them with other assumptions would not affect whether the result holds. Of course, such judgments are constrained by the agent’s background knowledge, and may be subject to revision in light of new information. For Schupbach’s account to apply, what matters is that the agent can plausibly assign non-zero probability to the hypothesis that the models are adequate in this non-accidental sense.

What about a possible candidate for the rival hypothesis H′? This is not so clear. Recall that a rival hypothesis must explain why result R has been detected by the previous means and why it would not be detected by the next means. In our toy example, where we’ve already learned that an epidemiological model indicates that a particular intervention will be effective (i.e., we’ve learned R 1), H′ must explain why this model indicates this and why another model, where an idealization/assumption A 1 of the original model is replaced by A 2, would not indicate that the treatment is effective. The only plausible rival explanation I can think of in this case is a logical hypothesis stating that the assumption A 1 is necessary for deriving result R (i.e., if A 1 were replaced by a different assumption, the new model would no longer entail R). However, the specifics of the rival hypothesis are not central to this paper, so I leave it to a defender of Schupbach’s account to determine the details.

If logical hypotheses count as explanatory, then we might have both a target and a rival explanation that, when plausible, meet Schupbach’s conditions for ERA diversity. In that case, R 1 and R 2 could count as ERA diverse.[15] However, in the next section, I will argue that in many model-based RAs – where the aim is to confirm that a model result holds in a target system, and the models involve incompatible assumptions – either the target or the rival explanation will be implausible (that is, either Pr(H) = 0 or Pr(H′) = 0). So Schupbach’s account does not apply to this class of model-based RAs.

4.2 ERA Diversity and Climate Model Ensembles: The Incompatibility Dilemma

There is a great deal of uncertainty about how to adequately represent the climate system. Due to this uncertainty, it is often impossible to choose which model, out of the available ones, future climate change projections should rely on. Hence, current projections of future climate change very often rely on more than a single model. As Parker (2013, 213) remarks, although it is not at all clear how one should interpret a multi-model ensemble’s results, “the intuition persists that agreement among ensemble members about the extent of future climate change warrants increased confidence in the projected changes”.

Winsberg (2018) argues that Schupbach’s account of ERA diversity can help vindicate this intuition by clarifying the epistemic value of model-based RA in climate science. To expose the limitations of Schupbach’s account, I focus on a toy example that Winsberg offers in support of this claim. I do so because the epistemic value of RA in climate science remains a central and unresolved issue in both philosophical and scientific debates. If philosophers are to contribute meaningfully to this discussion, it is crucial to distinguish arguments that withstand scrutiny from those that do not. I argue that Winsberg’s example rests on problematic assumptions that undermine its credibility. Demonstrating why it fails not only clarifies the limits of this specific proposal, but also highlights the broader inapplicability of Schupbach’s framework to a wide class of model-based robustness analyses.

Winsberg writes:

Suppose that a climate simulation can be used to calculate that equilibrium climate sensitivity (ECS) is greater than 2 °C. One explanation of this is that ECS is actually greater than 2 °C. Thus this would count as a detection of the hypothesis (that ECS is greater than 2 °C) by a model. But another possible explanation might be that the calculated result is an artifact of the large grid size of the simulation. A natural move is to try to halve the grid size and check to see if the result is maintained. If it is, halve the grid size again. If the result remains stable, then the probability of that rival explanation goes way down. Thus a reasonable ensemble of different simulation models with descending grid size could count as [ERA] diverse. […] But even once we are convinced that the grid size is not responsible for the purported detection of the hypothesis, there remains the possibility that the detection is an artifact of the way that cloud formation is parameterized in the simulation. A rival cloud parameterization can be tried. Certainly those two methods of detection would count as [ERA] diverse. Again, context and judgment, but this time presumably of a more subtle and difficult character, would be required to decide at what point, if any, enough different cloud parameterization schemes are enough to rule out all such hypotheses. (Winsberg 2018, 192–93, my emphasis)

Here, Winsberg is offering two examples to show when simulation results might be ERA diverse: varying grid sizes and parameterizations. However, the first example (grid size variation) seems less relevant to the epistemic import of model-based RA in climate science. Scientists are very well aware that higher resolution reduces the influence of parameterizations, but increasing resolution is often impractical due to enormous computational demands. The real question is whether current global climate models can reliably inform us about the climate. Hence for the time being, the resolution of current global climate models is what it is and climate scientists have to live with it. Winsberg’s second example, which involves testing robustness across different parameterization schemes, is more pertinent. It speaks directly to the question of whether agreement among structurally varied models should increase confidence in their shared results. To assess this case properly, we must briefly examine one of the principal sources of uncertainty in climate modeling: the parameterization of physical processes.

Some physical processes, critical for accurate projections, cannot be directly resolved by current climate models due to their small scale relative to the model’s grid resolution.[16] For example, cloud processes significantly affect Earth’s radiation budget but occur at scales too small for current models, typically around 50–100 km horizontally (Parker 2013). To account for these subgrid processes, models must represent them using larger-scale variables. This process, known as parameterization, involves choosing equations and parameters to describe these relationships. Thus, parameterization inherently involves two types of uncertainty: parameter uncertainty (the choice of parameter values) and structural uncertainty (the choice of equations).

With this background in mind, let us return to Winsberg’s example. He considers a simulation with a particular parameterization scheme for cloud formation (i.e., structural assumption S 1) which results in ECS greater than 2 °C (i.e., we learn R 1). He claims that if a second simulation with a different parameterization scheme (i.e., structural assumption S 2) also results in ECS greater than 2 °C (i.e., we learn R 2), these detections would count as ERA diverse. Here, Winsberg is implicitly assuming it is possible to find a target explanation H and a rival explanation H′ that meet Schupbach’s conditions for ERA diversity. The following candidate for H might initially seem reasonable:

H: ECS is greater than 2 °C & both climate simulations are adequate (not by mere luck) for the purpose of discerning whether ECS is greater than 2°C

However, I will argue that, due to the incompatibility of the assumptions S 1 and S 2, either the proposed hypothesis H fails to serve as a plausible target explanation or no suitable rival explanation H′ can be constructed. In either case, the core requirements of ERA diversity are not satisfied. Thus, despite initial appearances, these two methods of detection do not qualify as ERA diverse, contrary to Winsberg’s suggestion.

In a nutshell, my argument is as follows. In light of the incompatibility between S 1 and S 2, an agent considering a model-based robustness analysis can be in one of only three epistemic states. In none of these states can the agent identify both a plausible target and rival hypothesis that jointly satisfy Schupbach’s conditions for ERA diversity. As a result, Schupbach’s account cannot shed light on the epistemic import (if any) of RA in cases where models with incompatible assumptions converge on a result and the goal is to confirm that the result is instantiated in the target system.

The remainder of this section makes the argument more explicit. Since the third epistemic state amounts to uncertainty between the first two, I begin by examining states one and two. They are the following:

  1. Epistemic state 1: An agent believes that at most one of the two simulations can be adequate (not by mere luck) for the purpose at hand. Hence, H is not a plausible candidate for the target hypothesis, since for such an agent, Pr(H) = 0.

  2. Epistemic state 2: An agent believes that both simulations can be adequate (not by mere luck) and hence, for such an agent, Pr(H) > 0. However, for such an agent, there is no rival explanation H′ that satisfies the discrimination condition, and hence it is impossible to find an adequate candidate for a rival explanation.

Consider epistemic state 1. In this state, an agent may reasonably believe that since different parameterizations for a particular process (in this case, cloud formation) are competing ways to represent such a process (Parker 2006), at most one of these two simulations can be adequate (not by mere luck) for the purpose at hand. Hence, for an agent in this epistemic state, Pr(H) = 0, and H is not a plausible target hypothesis for them. The fact that an agent in epistemic state 1 is unable to find an adequate target hypothesis is anything but surprising. Indeed, one can easily conjure up an empirically driven case of RA in which it is impossible to find an adequate target hypothesis for analogous reasons. Here is one:

You are the investigator of a murder case and know that at most one person in Room A could have been a witness. Anyone who was not a witness would not know the identity of the murderer; at best, they could only guess or lie. You decide to question the first person in the room, and they tell you that suspect number 1 committed the murder. A possible explanation for this is that the person was the witness and that suspect number 1 is indeed the murderer. You then ask a second person in the room, and they also tell you that suspect number 1 committed the murder. However, since you know that at most one person could have been a witness, the hypothesis that “suspect number 1 is the murderer and both people were witnesses and are telling the truth” cannot be true. This hypothesis is therefore ruled out as a possible explanation.

Hence, Schupbach’s account of ERA diversity does not apply in this case because there is no plausible candidate for the target explanation H. However, notice that this case seems to present a clear instance where confirmation of the hypothesis that suspect number 1 is the murderer could occur. Suppose, for simplicity, that the investigator knows that exactly one person in the room must be a witness. If multiple people independently report that suspect number 1 committed the murder, the investigator now has strong reason to believe that suspect number 1 is indeed the murderer – even though only one person could be telling the truth for the right reason. It may seem like a coincidence that everyone points to suspect number 1, given that only one of them could have witnessed the crime. But as long as the investigator has no reason to suspect collusion or bias, this agreement is a fortunate coincidence – a “happy” one – because it increases the likelihood that the true witness, whoever they are, believes suspect number 1 is the murderer. And that, in turn, should raise the investigator’s confidence in the hypothesis. Moreover, this increase in confidence could begin even before the investigator has spoken to everyone. The more people independently agree that suspect number 1 is the murderer, the more likely it is that the witness – again, whoever they are – believes this to be true. So this is a clear case in which robustness seems to have confirmatory value. Yet Schupbach’s explanatory account cannot accommodate this situation. There is no plausible target hypothesis available that satisfies the conditions of ERA diversity, since any explanation appealing to all informants being truthful is ruled out by background knowledge. In this case, the result – the convergence on suspect number 1 – is simply a happy coincidence that boosts confidence without requiring an explanatory structure of the sort ERA demands.
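The probabilistic reasoning behind this “happy coincidence” can be made explicit with a small sketch. The numbers and modelling choices below are my own illustrative assumptions: exactly one of the n people in Room A is the witness, the witness names the true murderer, and each non-witness guesses independently and uniformly among m suspects, with no collusion. Under these assumptions, the posterior probability that suspect number 1 is the murderer rises as more questioned people agree, even though at most one of them can be reporting what they actually saw.

```python
# Hypothetical illustration of the Room A case: exactly one of n people is
# the witness, the witness names the true murderer, and each non-witness
# guesses independently and uniformly among m suspects (no collusion).

def posterior_guilty(k: int, n: int = 10, m: int = 5) -> float:
    """Pr(suspect 1 is the murderer | k questioned people all named suspect 1)."""
    prior = 1 / m  # uniform prior over the m suspects

    # Likelihood if suspect 1 IS the murderer: the witness (if among the k
    # questioned) names suspect 1 for sure; each non-witness does so with 1/m.
    lik_guilty = (k / n) * (1 / m) ** (k - 1) + (1 - k / n) * (1 / m) ** k

    # Likelihood if suspect 1 is NOT the murderer: the witness (if questioned)
    # would name someone else, so unanimity requires that the witness is not
    # among the k questioned and that all k guessers hit suspect 1 by chance.
    lik_innocent = (1 - k / n) * (1 / m) ** k

    return prior * lik_guilty / (prior * lik_guilty + (1 - prior) * lik_innocent)

for k in (1, 3, 6, 10):
    print(k, round(posterior_guilty(k), 2))   # 0.28, 0.44, 0.68, 1.0
```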

Consider epistemic state 2. In this state, an agent might reasonably believe that, although different parameterizations for a process (such as cloud formation) represent competing ways to model that process, the simulations are still sufficiently similar in all relevant aspects. As a result, the agent regards the differences between these parameterizations as irrelevant to whether the simulations can be adequate for the purpose at hand. Hence, the agent concludes that it is possible for both simulations to be adequate (not merely by luck) because they are bound to be either both adequate or both inadequate (and hence, if they are both adequate, it is not a matter of luck). In other words, the agent does not see it as a matter of luck that both simulations might be adequate; they believe that if one is adequate, the other is bound to be as well. Thus, for such an agent, H (the hypothesis that both models are adequate not by mere luck) can be considered a plausible target hypothesis.

However, this belief comes with a significant implication: for the agent to view the differences between the simulations as irrelevant to their adequacy, they must also believe that these differences do not affect the result that the simulations produce. In other words, the agent must believe that both simulations, despite their differing parameterizations, are bound to yield the same result. This belief creates a problem for the application of Schupbach’s account: if the agent is convinced that the differences in parameterization do not lead to different outcomes, then for such an agent there can be no rival hypothesis H′ that would predict a different result. Without such a rival hypothesis, it is impossible to satisfy the discrimination condition, which requires that H′ explains why the first simulation produced one result (R 1) and the second simulation, with its different parameterization, would not produce the same result (R 2).[17]

Again, the fact that an agent in epistemic state 2 is unable to find an adequate rival hypothesis is anything but surprising. Indeed, one can also conjure up an example of an empirically driven case of RA in which it is impossible to find an adequate rival hypothesis for analogous reasons. Here is one:

You are the investigator of a murder case and have a peculiar piece of information: either everyone in Room B witnessed the murder, or none of them did. If they were not witnesses, they wouldn’t know the identity of the murderer; they could only guess or lie. Additionally, you know that whether or not they were witnesses, they have all committed to giving the same answer. You decide to question the first person in the room, and they tell you that suspect number 1 committed the murder. This could mean one of two things: either the person was a witness and suspect number 1 is indeed the murderer, or they were not a witness and are either guessing or lying. Unfortunately, you realize that asking anyone else in the room won’t help you distinguish between these two possibilities, because they are all bound to give the same answer as the first person, regardless of which scenario is true.

Hence, Schupbach’s account of ERA diversity does not apply in this case because there is no plausible candidate for a rival explanation H′ that satisfies the discrimination condition. Of course, this is a situation where it is not surprising that the robustness of the result fails to raise our confidence that suspect number 1 committed the murder. After all, you know that regardless of whether everyone in Room B witnessed the event – or whether none of them did – you would still get the same robust verdict. In other words, the agreement is guaranteed independently of whether suspect number 1 is actually the murderer. So there is no epistemic gain in collecting further testimonies, since the outcome is already fixed by the setup.

In this sense, it is almost reassuring that Schupbach’s framework does not apply here: the agreement clearly lacks evidential value, and it would be a mistake to treat it as confirmatory. One might then ask: if this is the case, why should we see it as a problem for Schupbach’s account? On its own, it isn’t. But the difficulty becomes clear once we recognize that, in neither epistemic state 1 nor epistemic state 2, can an agent construct both a plausible target and a rival hypothesis that satisfy all the conditions of ERA diversity. From this, it follows that the framework cannot apply in any possible epistemic state. Why? Because the only remaining option is epistemic state 3, which, as I explain next, faces the very same limitations.

  3. Epistemic state 3: An agent withholds belief about whether the incompatible assumptions made respectively by the models are relevant to whether the models can all be adequate (not by mere luck) for the purpose at hand.

This is, arguably, a very common epistemic state. For instance, climate scientists are often in this position: before observing the model results, they may not know whether the incompatible assumptions made by the models are relevant to the result they care about – that is, whether these differences bear on the models’ adequacy for the inferential task at hand.

While it is certainly possible for an agent to be in this kind of epistemic state, notice that it effectively amounts to being uncertain about whether they are in epistemic state 1 or epistemic state 2. In other words, they are unsure whether the incompatible assumptions are relevant to the result (which would place them in epistemic state 1), or irrelevant (which would place them in epistemic state 2). But since, as I have argued above, Schupbach’s conditions cannot be met in either of those states, it follows that an agent in epistemic state 3 – suspended between them – will also be unable to find both a plausible target and a rival hypothesis that satisfy all the conditions of ERA diversity.

To see why this is the case, it will be helpful to consider the following analogous case:

You are the investigator of a murder case, and you have potential witnesses in two rooms, Room A and Room B. The characteristics of the people in these rooms are the same as described in the previous examples. You find yourself in one of the two rooms, without knowing whether it’s Room A or Room B, and decide to question the first person you encounter. They tell you that suspect number 1 committed the murder. You reason as follows: If I am in Room A and I ask a second person, and they also tell me that suspect number 1 is the murderer, then, since at most one person could be a witness, the hypothesis that “suspect number 1 is the murderer and both people were witnesses” must be false. Therefore, this cannot be a possible explanation. On the other hand, if I am in Room B, and I ask a second person, they will be bound to give the same answer as the first person, regardless of whether they were all witnesses or not. Hence, if I am in Room B, asking a second person won’t help me discriminate between these two possible explanations, because the second person is bound to give the same answer as the first, regardless of the correct scenario.

Hence, Schupbach’s account of ERA diversity does not apply in this case because despite the agent’s uncertainty as to whether they should be in epistemic state 1 or epistemic state 2, the agent knows for certain that there is either no plausible candidate for a target explanation H or no plausible candidate for a rival explanation H′.

In summary, my argument is that Schupbach’s account of ERA diversity cannot be applied to model-based RAs that involve incompatible assumptions and aim to confirm that a result holds in a specific target system. Across all possible epistemic states – whether the agent believes that only one model could be adequate not by mere luck (epistemic state 1), that both could be adequate not by mere luck despite their incompatibilities (epistemic state 2), or is uncertain between these options (epistemic state 3) – the conditions required by Schupbach’s account cannot be satisfied. In each case, the agent is unable to identify both a plausible target and a plausible rival hypothesis that jointly meet all the conditions of ERA diversity.

This limitation matters, because the kinds of robustness analyses I focus on – involving complex models with incompatible assumptions and limited understanding of their mutual relevance – are not marginal cases. On the contrary, they are among the most philosophically and scientifically significant instances where robustness is invoked to secure confidence in model outputs. If ERA diversity cannot account for the epistemic force of robustness in these cases, then its scope is much narrower than often assumed.

4.3 Beyond the Surface: Deeper Objections and Key Takeaways

I am not the first to argue that Schupbach’s account of ERA diversity does not necessarily apply to climate model ensembles. O’Loughlin (2021) has pointed out that since these models differ across several assumptions and idealizations simultaneously, it is impractical to use them for eliminating rival hypotheses. In ERA, robustness is achieved by using diverse methods to eliminate competing explanations, thereby confirming a target hypothesis. However, in the context of climate modeling, the models involved are so complex and varied across multiple dimensions – such as grid resolution, parameterizations, and representation of processes – that it is difficult to isolate specific rival hypotheses that can be ruled out.

O’Loughlin’s position can be summarized as follows: Winsberg’s analysis suggests that climate models could, in theory, be used in a diverse set for robustness analysis to eliminate competing hypotheses. However, because current models differ in so many ways simultaneously, O’Loughlin argues that Winsberg’s example does not reflect how robustness analysis is actually practiced in climate science. In reality, the typical ERA approach of systematically ruling out alternative explanations doesn’t work because there isn’t a clear, single rival hypothesis to eliminate. The differences between models are not controlled in a way that allows for such a precise eliminative process, which undermines the applicability of ERA in situations where models vary broadly and complexly, as is the case with climate model ensembles.

While I agree with O’Loughlin’s observation that Winsberg’s example does not accurately reflect robustness analysis in climate science, I have argued that the inapplicability of ERA diversity to climate model ensembles is even more profound. Contrary to O’Loughlin, I contend that Schupbach’s account of ERA diversity cannot apply even to Winsberg’s simple toy example. When models include incompatible assumptions about the target system, and the hypothesis we seek to confirm is that a model’s result is instantiated in that target system, it is impossible to identify both an adequate target and an adequate rival hypothesis that satisfy all conditions of ERA diversity.

This does not mean it is impossible to argue for the epistemic significance of model-based RA in such cases. However, I argue that any such defense cannot rely on Schupbach’s account. Indeed, the notion that the epistemic significance of robustness hinges on finding an explanation for why models agree is, in my view, the wrong approach to understanding what is happening in these cases of robustness analysis. What matters is not why the models agree on a result, but rather that they do agree at all.

Consider the example I gave in Section 4.2 to illustrate epistemic state 1. The reason our confidence increases as more people in Room A identify suspect number 1 as the murderer does not depend on explaining why they agree since we already know there can only be one witness. Instead, our confidence rises because, as the number of agreeing verdicts grows, it becomes increasingly likely that at least one of those individuals is the actual witness, and thus that the shared verdict reflects the truth. Analogously, one might argue that as more models agree on a result, our confidence increases, not because we can explain their agreement, but because we have greater reason to believe that at least one of them is adequate for the task, simply in virtue of having considered more models. This represents a fundamentally different way of motivating the epistemic value of model-based RA – one that does not rely on explanatory reasoning, as Schupbach’s account does, but instead draws on probabilistic intuitions about adequacy and ensemble size.
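One very simple way of spelling out this alternative intuition, offered here only as an illustrative sketch under admittedly strong assumptions (each model has the same probability p of being adequate for the task, and these chances are independent), is that the probability that at least one model in an n-member ensemble is adequate grows as 1 − (1 − p)^n:

```python
# Hypothetical sketch of the ensemble-size intuition: if each model
# independently has probability p of being adequate for the task, the chance
# that at least one model in an n-member ensemble is adequate is 1-(1-p)^n.
# Both the independence and the equal-p assumptions are, of course,
# exactly what is questioned in the text that follows.

def prob_at_least_one_adequate(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

for n in (1, 3, 5, 10):
    print(n, round(prob_at_least_one_adequate(0.2, n), 2))  # 0.2, 0.49, 0.67, 0.89
```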

Of course, this strategy faces its own challenges. In particular, it depends on having some justified understanding of what counts as a potentially adequate representation of the target system, and of how well the ensemble of models samples that representational space.[18] Without this background knowledge, it is difficult to assess whether adding more models should genuinely increase our confidence that one is adequate, or whether the ensemble might systematically overlook relevant portions of the uncertainty space.

A similar concern also arises in the case of another promising alternative put forward by Lehtinen (2018), which also avoids reliance on explanatory hypotheses. Lehtinen proposes that robustness can contribute to the indirect confirmation of a result through two main mechanisms: first, by functioning as evidence accumulation when a result is derived from multiple, independently supported models; and second, by helping to isolate the “core assumptions” on which the result depends, insofar as it remains stable under variation of auxiliary assumptions.

This is a compelling proposal, and one that deserves further philosophical attention. At the same time, especially in relation to the second mechanism, questions remain about how best to understand claims about the robustness of a result with respect to a model’s core. If this is interpreted as the idea that “as long as C is retained, R follows regardless of other modeling choices,” then the degree of confirmation arguably depends on our ability to characterize the broader space of plausible auxiliary assumptions and to ensure that this space has been adequately sampled. As with the ensemble-based strategy discussed above, if that assumption space is poorly mapped, or if the models vary only in limited and possibly correlated ways, it may be difficult to assess what kind of confirmation – if any – is genuinely conferred.

Let me stress again that the aim of this paper is not to evaluate such alternative accounts. My focus has been exclusively on Schupbach’s explanatory framework and its limitations in certain modeling contexts. Nothing in the argument presented here rules out the possibility that other, more promising strategies for accounting for the epistemic significance of robustness in these settings exist and deserve further philosophical exploration.

Indeed, there remains considerable scope for further philosophical work on these matters. Climate scientists, for example, have long grappled with how to assess the independence of models within an ensemble (e.g., Annan and Hargreaves 2017; Bishop and Abramowitz 2013; Boé 2018; Sanderson et al. 2015). Although different approaches to model dissimilarity abound, they often rest on a shared intuition: that greater structural diversity among models justifies greater confidence in their shared outputs. But why should we assume that dissimilarity across models reliably correlates with a broader coverage of relevant uncertainties? Two models might be broadly similar but differ critically in their treatment of uncertain processes like cloud formation – and thereby span more epistemically relevant uncertainty than two models that are structurally distinct in less significant ways. What, then, are the features that matter when assessing whether an ensemble adequately samples the space of plausible models? And which of these features can be identified and operationalized in practice? These are precisely the kinds of questions that philosophers are well placed to investigate – and addressing them is crucial if we are to clarify the epistemic role of robustness analysis in complex, model-driven scientific contexts.

5 Conclusions

Numerous attempts have been made to justify the epistemic value of robustness analysis from a Bayesian perspective. Prior to Schupbach, most accounts relied on some notion of probabilistic independence to capture evidence diversity. However, these approaches, no matter how sophisticated their formulation, fail to apply in many contexts. Schupbach’s challenge to the assumption that independence is always necessary for justifying the epistemic significance of robustness analysis is therefore well-founded. Yet, abandoning independence-based accounts does not absolve us from the responsibility of critically examining our intuitions.

In this paper, I have argued that Schupbach’s account of Explanatory Robustness Analysis is inapplicable to a significant class of model-based robustness analyses – specifically, those cases in which the hypothesis we seek to confirm is that a model’s result is instantiated in a target system, and the models involved make incompatible assumptions about that system. I have thereby shown that, despite the initial appeal of Winsberg’s argument that Schupbach’s account of ERA diversity might clarify the epistemic value of model robustness in climate science, deeper conceptual challenges persist. These challenges are easily overlooked if, instead of rigorously scrutinizing our intuitions, we allow them to shape the very assumptions we are willing to accept.


Corresponding author: Margherita Harris, SOCRATES Centre, Leibniz University Hannover, Hannover, Germany, E-mail:

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: Margherita Harris, the sole author of this paper, is responsible for the study conception and the writing of the manuscript.

  4. Conflict of interest: The author certifies that she has no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

  5. Research funding: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project 470816212/KFG43.

  6. Data availability: Not applicable.

References

Annan, James D., and Julia C. Hargreaves. 2017. “On the Meaning of Independence in Climate Science.” Earth System Dynamics 8 (1): 211–24. https://doi.org/10.5194/esd-8-211-2017.

Bishop, Craig H., and Gab Abramowitz. 2013. “Climate Model Dependence and the Replicate Earth Paradigm.” Climate Dynamics 41 (3–4): 885–900. https://doi.org/10.1007/s00382-012-1610-y.

Boé, Julien. 2018. “Interdependency in Multimodel Climate Projections: Component Replication and Result Similarity.” Geophysical Research Letters 45 (6): 2771–9. https://doi.org/10.1002/2017gl076829.

Cartwright, Nancy. 1991. “Replicability, Reproducibility, and Robustness: Comments on Harry Collins.” History of Political Economy 23 (1): 143–55. https://doi.org/10.1215/00182702-23-1-143.

Chalmers, Alan. 2011. “Drawing Philosophical Lessons from Perrin’s Experiments on Brownian Motion: A Response to Van Fraassen.” The British Journal for the Philosophy of Science 62 (4): 711–32. https://doi.org/10.1093/bjps/axq039.

Crupi, Vincenzo, and Katya Tentori. 2012. “A Second Look at the Logic of Explanatory Power (with Two Novel Representation Theorems).” Philosophy of Science 79 (3): 365–85. https://doi.org/10.1086/666063.

Frisch, Mathias. 2015. “Predictivism and Old Evidence: A Critical Look at Climate Model Tuning.” European Journal for Philosophy of Science 5 (2): 171–90. https://doi.org/10.1007/s13194-015-0110-4.

Garber, Daniel. 1983. “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory.” In Testing Scientific Theories, edited by John Earman, 99–132. Minneapolis: University of Minnesota Press. https://doi.org/10.5749/j.cttts94f.8.

Glymour, Clark. 2015. “Probability and the Explanatory Virtues.” The British Journal for the Philosophy of Science 66 (3): 591–604. https://doi.org/10.1093/bjps/axt051.

Good, Irving John. 1960. “Weight of Evidence, Corroboration, Explanatory Power, Information and the Utility of Experiments.” Journal of the Royal Statistical Society: Series B (Methodological) 22 (2): 319–31. https://doi.org/10.1111/j.2517-6161.1960.tb00378.x.

Goodman, Nelson. 1983. Fact, Fiction, and Forecast. Cambridge, MA: Harvard University Press.

Harris, Margherita. 2021. “The Epistemic Value of Independent Lies: False Analogies and Equivocations.” Synthese 199: 14577–97. https://doi.org/10.1007/s11229-021-03434-8.

Houkes, Wybo, Dunja Šešelja, and Krist Vaesen. 2024. “Robustness Analysis.” In The Routledge Handbook of Philosophy of Scientific Modeling, edited by Natalia Carrillo, Tarja Knuuttila, and Rami Koskinen, 195–207. Abingdon and New York: Routledge. https://doi.org/10.4324/9781003205647-18.

Hudson, Robert. 2020. “The Reality of Jean Perrin’s Atoms and Molecules.” The British Journal for the Philosophy of Science 71 (1): 33–58. https://doi.org/10.1093/bjps/axx054.

Justus, James. 2012. “The Elusive Basis of Inferential Robustness.” Philosophy of Science 79 (5): 795–807. https://doi.org/10.1086/667902.

Katzav, Joel. 2014. “The Epistemology of Climate Models and Some of Its Implications for Climate Science and the Philosophy of Science.” Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 46: 228–38. https://doi.org/10.1016/j.shpsb.2014.03.001.

Knüsel, Benedikt. 2020. “Philosophy and Climate Science.” Ethics, Policy & Environment 23 (1): 114–7. https://doi.org/10.1080/21550085.2020.1733299.

Kuorikoski, Jaakko, Aki Lehtinen, and Caterina Marchionni. 2010. “Economic Modelling as Robustness Analysis.” The British Journal for the Philosophy of Science 61 (3): 541–67. https://doi.org/10.1093/bjps/axp049.

Kuorikoski, Jaakko, Aki Lehtinen, and Caterina Marchionni. 2012. “Robustness Analysis Disclaimer: Please Read the Manual Before Use!” Biology & Philosophy 27 (6): 891–902. https://doi.org/10.1007/s10539-012-9329-z.

Lehtinen, Aki. 2018. “Derivational Robustness and Indirect Confirmation.” Erkenntnis 83 (3): 539–76. https://doi.org/10.1007/s10670-017-9902-6.

Lloyd, Elizabeth A. 2009. “Varieties of Support and Confirmation of Climate Models.” Proceedings of the Aristotelian Society 83 (1): 213–32. https://doi.org/10.1111/j.1467-8349.2009.00179.x.

Lloyd, Elizabeth A. 2015. “Model Robustness as a Confirmatory Virtue: The Case of Climate Science.” Studies in History and Philosophy of Science Part A 49: 58–68. https://doi.org/10.1016/j.shpsa.2014.12.002.

Lusk, Greg. 2019. “Philosophy and Climate Science, by Eric Winsberg.” The British Journal for the Philosophy of Science Review of Books. https://www.thebsps.org/reviewofbooks/lusk-on-winsburg/.

Mayo, Deborah G. 1986. “Cartwright, Causality, and Coincidence.” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1986 (1): 42–58. https://doi.org/10.1086/psaprocbienmeetp.1986.1.193106.

McGrew, Timothy. 2003. “Confirmation, Heuristics, and Explanatory Reasoning.” The British Journal for the Philosophy of Science 54 (4): 553–67. https://doi.org/10.1093/bjps/54.4.553.

Odenbaugh, Jay, and Anna Alexandrova. 2011. “Buyer Beware: Robustness Analyses in Economics and Biology.” Biology & Philosophy 26 (5): 757–71. https://doi.org/10.1007/s10539-011-9278-y.

O’Loughlin, Ryan. 2021. “Robustness Reasoning in Climate Model Comparisons.” Studies in History and Philosophy of Science Part A 85: 34–43. https://doi.org/10.1016/j.shpsa.2020.12.005.

Orzack, Steven H., and Elliott Sober. 1993. “A Critical Assessment of Levins’s The Strategy of Model Building in Population Biology (1966).” The Quarterly Review of Biology 68 (4): 533–46. https://doi.org/10.1086/418301.

Parker, Wendy S. 2006. “Understanding Pluralism in Climate Modeling.” Foundations of Science 11 (4): 349–68. https://doi.org/10.1007/s10699-005-3196-x.

Parker, Wendy S. 2009. “Confirmation and Adequacy-for-Purpose in Climate Modeling.” AGUFM 2009: GC34A–02.

Parker, Wendy S. 2011. “When Climate Models Agree: The Significance of Robust Model Predictions.” Philosophy of Science 78 (4): 579–600. https://doi.org/10.1086/661566.

Parker, Wendy S. 2013. “Ensemble Modeling, Uncertainty and Robust Predictions.” Wiley Interdisciplinary Reviews: Climate Change 4 (3): 213–23. https://doi.org/10.1002/wcc.220.

Parker, Wendy S. 2020. “Model Evaluation: An Adequacy-for-Purpose View.” Philosophy of Science 87 (3): 457–77. https://doi.org/10.1086/708691.

Perrin, Jean. 1916. Les Atomes. Paris: F. Alcan. Translated by D. Ll. Hammick as Atoms. New York: Van Nostrand.

Popper, Karl. 1959. The Logic of Scientific Discovery. London: Hutchinson.

Psillos, Stathis. 2011. “Moving Molecules above the Scientific Horizon: On Perrin’s Case for Realism.” Journal for General Philosophy of Science 42 (2): 339–63. https://doi.org/10.1007/s10838-011-9165-x.

Sanderson, Benjamin M., Reto Knutti, and Peter Caldwell. 2015. “Addressing Interdependency in a Multimodel Ensemble by Interpolation of Model Properties.” Journal of Climate 28 (13): 5150–70. https://doi.org/10.1175/jcli-d-14-00361.1.

Schupbach, Jonah N. 2018. “Robustness Analysis as Explanatory Reasoning.” The British Journal for the Philosophy of Science 69 (1): 275–300. https://doi.org/10.1093/bjps/axw008.

Schupbach, Jonah N., and Jan Sprenger. 2011. “The Logic of Explanatory Power.” Philosophy of Science 78 (1): 105–27. https://doi.org/10.1086/658111.

Stensrud, David J. 2009. Parameterization Schemes: Keys to Understanding Numerical Weather Prediction Models. Cambridge: Cambridge University Press.

Weisberg, Michael. 2006. “Robustness Analysis.” Philosophy of Science 73 (5): 730–42. https://doi.org/10.1086/518628.

Winsberg, Eric. 2018. Philosophy and Climate Science. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108164290.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/krt-2025-0002).


Received: 2025-01-31
Accepted: 2025-06-18
Published Online: 2025-07-01

© 2025 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
