
On the dimensional indeterminacy of one-wave factor analysis under causal effects

Tyler J. VanderWeele and Charles J. K. Batty
Published/Copyright: July 28, 2023

Abstract

It is shown, with two sets of indicators that separately load on two distinct factors, independent of one another conditional on the past, that if it is the case that at least one of the factors causally affects the other, then, in many settings, the process will converge to a factor model in which a single factor will suffice to capture the covariance structure among the indicators. Factor analysis with one wave of data then cannot distinguish between factor models with a single factor vs those with two factors that are causally related. Therefore, unless causal relations between factors can be ruled out a priori, alleged empirical evidence from one-wave factor analysis for a single factor still leaves open the possibilities of a single factor or of two factors that causally affect one another. The implications for interpreting the factor structure of psychological scales, such as self-report scales for anxiety and depression, or for happiness and purpose, are discussed. The results are further illustrated through simulations to gain insight into the practical implications of the results in more realistic settings prior to the convergence of the processes. Some further generalizations to an arbitrary number of underlying factors are noted. Factor analyses with one wave of data should themselves be interpreted as characterizing associations among indicators that may be present either due to conceptual relations or due to causal relations concerning the underlying construct phenomena.

MSC 2010: 62H25

1 Introduction

Exploratory factor analysis [1,2,3] is frequently used to assess the dimensionality of a set of indicators in a survey. In many cases, the motivation for such an analysis is to allegedly demonstrate that a set of items constitutes a unidimensional scale. Such factor analysis is typically carried out with only a single wave of data collected, i.e., at a single time point at which all of the items are assessed. Establishing the unidimensionality of a scale is often viewed as an important part of scale development [4], to be carried out before the scale is employed in longitudinal data collection efforts. Consideration of causal effects in factor analysis is generally ignored.

It is well-known that one cannot typically assess causal relations with a single wave of cross-sectional data in which data on all variables are collected at the same time [5,6,7]. However, the implications of this fact for the psychometric evaluation of scales have been neglected. If, for example, there are two underlying factors that explain a set of survey item responses or indicators, and if these factors are causally related, it will not be possible, with a single wave of data, to assess causal relations between them. Unfortunately, this has rather serious consequences for attempts to assess factor dimensionality with a single wave of data. Specifically, with a single wave of data, it would not be possible to distinguish associations arising from causal relations between the factors from those allegedly arising from conceptual relations among the indicators. The present study formalizes this intuition and discusses the implications for the practice of factor analysis.

2 Factor analysis with two causally related factors

Consider a standard factor analytic model [1,2,3], with two sets of survey item responses, $(Y_{1t}, \ldots, Y_{wt})$ and $(Y_{w+1,t}, \ldots, Y_{pt})$, measured at time t, that separately load on two distinct factors, $\eta_{1t}$ and $\eta_{2t}$, independent of one another conditional on past values of the latent factors, $\eta_{1,t-1}$ and $\eta_{2,t-1}$. Suppose, however, that over time at least one of the factors causally affects the other, which then gives rise to what is sometimes called a dynamic factor model or dynamic structural equation model [8,9,10], as illustrated in Figure 1.

Figure 1: Causal effects of two latent factors on each other over time.
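For concreteness, in this two-factor case the model just described can be written out explicitly as follows (this merely restates, for m = 2, the display equations given below; the scalar loadings $\lambda_j$ and coefficients $b_{ij}$ are introduced here only as notation for the entries of $\Lambda$ and $B$):

$$Y_{jt} = \lambda_j \eta_{1t} + \varepsilon_{jt} \;\; (j = 1, \ldots, w), \qquad Y_{jt} = \lambda_j \eta_{2t} + \varepsilon_{jt} \;\; (j = w+1, \ldots, p),$$

$$\eta_{1t} = b_{11}\eta_{1,t-1} + b_{12}\eta_{2,t-1} + W_{1t}, \qquad \eta_{2t} = b_{21}\eta_{1,t-1} + b_{22}\eta_{2,t-1} + W_{2t},$$

where the cross-lagged coefficients $b_{12}$ and $b_{21}$ encode the causal effect of each factor on the other one time step later.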

More generally, if we let $Y_t = (Y_{1t}, \ldots, Y_{pt})'$ denote a set of item responses or indicators measured at time t, and $\eta_t = (\eta_{1t}, \ldots, \eta_{mt})$ be a set of m latent factors at time t, then the standard factor analytic model with independent errors is given by

$$Y_t = \Lambda \eta_t + \varepsilon_t,$$

where Λ is a p × m matrix and, at each time t, $\varepsilon_t$ is a p × 1 vector of independent normally distributed random variables whose distribution is stationary over time. Exploratory factor analysis [1,2,3] attempts to draw conclusions about the dimensionality of $\eta_t$ using the observed data $Y_t$. While this is a very specific model, it is the model that is effectively employed in thousands of applied papers on scale development [2,4], almost always with data at a single wave, i.e., at a single time point t.

Now suppose that the latent factors $\eta_t$ can change over time and that the components may be causally related to each other, but that the entire vector of latent variables $\eta_t$ follows a Markov process so that

$$\eta_t = B \eta_{t-1} + W_t,$$

where B is an m × m matrix whose (i, j) entry represents the causal effect of factor j at time t−1 on factor i at time t, and where $W_t$ is an m × 1 vector of random errors, independent across time and independent of all $\varepsilon_t$. At a given point in time, both $W_t$ and $\varepsilon_t$ capture variation across individuals and, as t varies, variation across time: $W_t$ concerns the relationship between the latent variables $\eta_{t-1}$ at one point in time and the latent variables $\eta_t$ at the next, while $\varepsilon_t$ concerns the relation between the latent variables $\eta_t$ and the indicators $Y_t$. We can consider the behavior of the process constituted by the set of indicators $Y_t = (Y_{1t}, \ldots, Y_{pt})'$ over time. As time passes, we have the following result concerning the convergence of this process.
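As a concrete illustration of this data-generating process, the following minimal sketch simulates the dynamic factor model for two factors and six indicators; the particular loading, effect, and error values are illustrative assumptions, not values taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameter values: two factors, six indicators.
Lambda = np.array([[0.7, 0.0],
                   [0.7, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.7],
                   [0.0, 0.8],
                   [0.0, 0.6]])          # p x m loading matrix
B = np.array([[0.7, 0.3],
              [0.3, 0.7]])               # cross-lagged causal effects
sigma_w, delta = 0.3, 0.5                # sd of W_t decays as sigma_w * delta**t
n, T = 1000, 5                           # individuals and time steps

eta = rng.normal(size=(n, 2))            # eta_0
for t in range(1, T + 1):
    W_t = sigma_w * delta**t * rng.normal(size=(n, 2))
    eta = eta @ B.T + W_t                # eta_t = B eta_{t-1} + W_t

eps = 0.5 * rng.normal(size=(n, 6))      # measurement errors
Y = eta @ Lambda.T + eps                 # Y_t = Lambda eta_t + eps_t

print(np.round(np.corrcoef(Y, rowvar=False), 2))   # covariance structure among indicators
```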

Theorem 1

Suppose indicators $Y_t = (Y_{1t}, \ldots, Y_{pt})$ at time t are related to a set of m latent factors $\eta_t = (\eta_{1t}, \ldots, \eta_{mt})$ by

$$Y_t = \Lambda \eta_t + \varepsilon_t,$$

and that $\eta_t = B\eta_{t-1} + W_t$, where the variables $W_t$ are independently normally distributed with potentially distinct parameters at each t. Let $B = QDQ^{-1}$ be the Jordan decomposition of B, where D is an m × m matrix in Jordan normal form. Suppose it is the case that (i) as $t \to \infty$, $Y_t$ converges in distribution to some random variable $Y^*$; (ii) B is invertible; and (iii) the random variables $W_t$ decay sufficiently quickly such that, as $t \to \infty$, $\sum_{k=1}^{t} B^{-k} W_k$ converges in distribution to some normally distributed variable $W^*$. Then $Y^*$ will follow a factor model as follows:

$$Y^* = \Lambda^* \eta^* + \varepsilon,$$

with $\dim(\eta^*) = \operatorname{rank}(D^*)$, where $D^* = \lim_{t\to\infty} D^t$.

Proof of Theorem 1

Since $\eta_t = B\eta_{t-1} + W_t$, we then have that

$$Y_t = \Lambda \eta_t + \varepsilon_t = \Lambda (B\eta_{t-1} + W_t) + \varepsilon_t$$

$$= \Lambda B \eta_{t-1} + \Lambda W_t + \varepsilon_t$$

$$= \Lambda B (B\eta_{t-2} + W_{t-1}) + \Lambda W_t + \varepsilon_t$$

$$= \Lambda B^2 \eta_{t-2} + \Lambda B W_{t-1} + \Lambda W_t + \varepsilon_t,$$

and by iteration

$$Y_t = \Lambda B^t \eta_0 + \Lambda \sum_{k=0}^{t-1} B^k W_{t-k} + \varepsilon_t.$$

For the process $Y_t$ to converge in distribution as $t \to \infty$ to some random variable $Y^*$ regardless of the initial distribution of $\eta_0$, we must have that the matrix $B^t$ converges as $t \to \infty$. Let $B = QDQ^{-1}$ denote the Jordan decomposition of B for some m × m invertible matrix Q and where D is an m × m matrix in Jordan normal form [11]. It then follows that $B^t = QD^tQ^{-1}$, and $B^t$ will converge if and only if $D^t$ converges. By Theorem 1 of Oldenburger [12], for $D^t$ to converge as $t \to \infty$, it must have, as its limit, subject to permutation of indices, a matrix of the form

$$D^* = \lim_{t\to\infty} D^t = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},$$

where either the identity block or the zero blocks along the diagonal may possibly be absent. Define $B^* = \lim_{t\to\infty} B^t$; then

$$B^* = \lim_{t\to\infty} B^t = \lim_{t\to\infty} QD^tQ^{-1} = QD^*Q^{-1}.$$

By condition (iii), the random variables $W_t$ decay sufficiently quickly over time such that, as $t \to \infty$, $\sum_{k=1}^{t} B^{-k} W_k$ converges in distribution to some normally distributed variable $W^*$, and thus, provided, as above, that $B^t$ converges to some matrix $B^*$ as $t \to \infty$, we would have

$$\sum_{k=0}^{t-1} B^k W_{t-k} = B^t \sum_{k=1}^{t} B^{-k} W_k \stackrel{d}{\to} B^* W^*.$$

We then have that $Y_t = \Lambda B^t \eta_0 + \Lambda \sum_{k=0}^{t-1} B^k W_{t-k} + \varepsilon_t$ converges in distribution to a random variable $Y^*$ given by

$$Y^* = \Lambda B^* \eta_0 + \Lambda B^* W^* + \varepsilon$$

$$= \Lambda QD^*Q^{-1} \eta_0 + \Lambda QD^*Q^{-1} W^* + \varepsilon.$$

Define $\Lambda^* = \Lambda Q$ and $\eta^* = D^*Q^{-1}(\eta_0 + W^*)$. In equilibrium as $t \to \infty$, the factor model could thus be written as follows:

$$Y^* = \Lambda^* \eta^* + \varepsilon,$$

with $\varepsilon$ having the same distribution as $\varepsilon_t$, and, since Q is invertible, the dimensionality of $\eta^*$ will be $\operatorname{rank}(D^*)$. This completes the proof.□

Theorem 1 implies that, under the conditions given in the theorem, the process constituted by the set of indicators $Y_t = (Y_{1t}, \ldots, Y_{pt})'$ will, over time, converge to a factor model $Y^* = \Lambda^* \eta^* + \varepsilon$, where the dimensionality of the factors $\eta^*$ depends on the matrix D in the Jordan normal form of the matrix B denoting the causal effects of each of the factors on the others.

The conditions (i)–(iii) of Theorem 1 are technical and will be considered in somewhat greater detail further below. However, briefly, condition (i) is simply an assumption that the process Y t does in fact converge over time. Condition (ii) is a simple statement on the invertibility of the matrix B. Condition (iii) concerns the rate of the decay of the random error terms W t in the process relating the underlying latent variables at one time point to the next. This third assumption might be interpreted as pertaining to settings in which exogenous variation in the underlying latent variables is larger at earlier time periods (e.g., earlier in life) vs later. Some possible examples are discussed below.
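To give a sense of condition (iii), the brief numerical check below, with illustrative assumed values of B, σ, and δ (not taken from the article), verifies that when the standard deviation of $W_t$ decays geometrically at a rate smaller than the smallest eigenvalue modulus of B, the covariance matrix of the partial sums $\sum_{k=1}^{t} B^{-k} W_k$ stabilizes numerically.

```python
import numpy as np

# Illustrative choices for this check (assumptions, not values from the article):
B = np.array([[0.7, 0.3],
              [0.3, 0.7]])      # eigenvalues 1.0 and 0.4, so B is invertible
sigma, delta = 0.3, 0.3         # sd(W_t) = sigma * delta**t, with delta below 0.4

Binv = np.linalg.inv(B)
for t in (5, 10, 20, 40):
    cov = np.zeros((2, 2))
    for k in range(1, t + 1):
        Bk = np.linalg.matrix_power(Binv, k)
        cov += (sigma * delta**k) ** 2 * (Bk @ Bk.T)   # covariance of the B^{-k} W_k term
    print(t, np.round(cov, 4))   # the partial-sum covariance settles quickly
```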

Theorem 1 in fact has particularly striking consequences when there are only two factors $\eta_t = (\eta_{1t}, \eta_{2t})$, as stated in the following Corollary.

Corollary

Under the conditions of Theorem 1, if $\dim(\eta_t) = 2$, then as $t \to \infty$ the resulting factor model

$$Y^* = \Lambda^* \eta^* + \varepsilon,$$

is such that $\dim(\eta^*) = 2$ if and only if $B = I$.

Proof of Corollary

With two latent factors, $\eta_t = (\eta_{1t}, \eta_{2t})$, the matrices B, D, and $D^*$ will all be 2 × 2. Recall that D is the Jordan normal form of B. If D has the form

$$\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix},$$

then $D^t$ will only converge if $|\lambda| < 1$ (Oldenburger [12]). However, in that case, $D^* = \lim_{t\to\infty} D^t = 0$. If D has the form

$$\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix},$$

then $D^t$ will only converge if $-1 < \lambda_1 \le 1$ and $-1 < \lambda_2 \le 1$ (Oldenburger [12]). If one of $\lambda_1$ or $\lambda_2$ is less than 1 in absolute value and the other is equal to 1, then $\operatorname{rank}(D^*) = \operatorname{rank}(\lim_{t\to\infty} D^t) = 1$. If both are less than 1 in absolute value, then $\operatorname{rank}(D^*) = 0$. Thus, the only way we can have $\operatorname{rank}(D^*) = 2$ is if $\lambda_1 = \lambda_2 = 1$, in which case $D = I$ and $B = QDQ^{-1} = QQ^{-1} = I$. This completes the proof.□

The corollary states that, for two latent factors $\eta_t = (\eta_{1t}, \eta_{2t})$, under the conditions of Theorem 1, the only way for the process constituted by the set of indicators $Y_t = (Y_{1t}, \ldots, Y_{pt})'$ to converge over time to a factor model that itself has two factors, so that $\dim(\eta^*) = 2$, is if the matrix B denoting the causal effects of each of the factors on the others is the identity matrix, i.e., if neither $\eta_{1,t-1}$ affects $\eta_{2t}$, nor $\eta_{2,t-1}$ affects $\eta_{1t}$, so that there are no causal effects of the factors on one another.

We are thus left with the conclusion that, in equilibrium as $t \to \infty$, either there are no causal effects of the factors on one another, or, if there were, then a factor model with a single factor will be sufficient. This latter result is in some sense analogous to the Perron–Frobenius theorem for time-homogeneous Markov chains with finite state spaces (cf. [13]), for which, if the Markov chain is irreducible and aperiodic, then the transition matrix raised to the kth power will converge to a rank-one matrix. In the present dynamic factor model under consideration, $\eta_t$ is itself a Markov chain on a measurable state space, although $Y_t$ is not. Note that in the proof of Theorem 1, although the factors constituted by the components of the vector $\eta^* = D^*Q^{-1}(\eta_0 + W^*)$ may be correlated, if $\operatorname{rank}(D^*) = 1$, then we will still be left with only a single factor.
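A quick numerical illustration with an assumed, illustrative B: for a symmetric B with one eigenvalue equal to 1 and the other smaller than 1 in absolute value, the powers $B^t$ converge to a rank-one matrix, so that in equilibrium a single factor suffices.

```python
import numpy as np

# Illustrative cross-lagged effect matrix (assumed values): eigenvalues 1.0 and 0.4.
B = np.array([[0.7, 0.3],
              [0.3, 0.7]])

eigvals = np.linalg.eigvals(B)
B_limit = np.linalg.matrix_power(B, 200)     # numerical approximation of B* = lim B^t

print("eigenvalues:", np.round(eigvals, 3))                          # [1.0, 0.4]
print("B^200 is approximately:\n", np.round(B_limit, 4))             # [[0.5, 0.5], [0.5, 0.5]]
print("numerical rank:", np.linalg.matrix_rank(B_limit, tol=1e-8))   # 1
```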

Stated another way, Corollary 1 indicates that causal effects of one latent factor on another imply that, in equilibrium, if only a single wave of data is collected, a factor model with one factor will sometimes suffice, even if the true underlying structures are such that there are two causally related factors. The implications of this result for the current practice of factor analysis are unsettling. Efforts are often made during scale development to demonstrate unidimensionality of a set of item responses using factor analysis with one wave of data [4]. If factor analysis provides evidence that a single factor is sufficient to explain most of the covariance in the item responses, this is generally deemed satisfactory. However, the argument above indicates that this is also exactly the empirical result that one might expect, with one-wave factor analysis, if there were in fact two distinct latent factors that causally affected each other over time. With a one-wave factor analysis, if there is evidence for more than one factor, then this may be genuine evidence against unidimensionality. But if a one-wave factor analysis suggests only one factor, then we cannot distinguish between the possibilities of two factors with causal effects over time vs a single factor.

The only way we could establish unidimensionality in this case would be if we could rule out, a priori, that, if there were two or more factors, then they definitively did not causally affect one another. But it is difficult to imagine circumstances in which we were uncertain, on conceptual grounds, about the number of factors, but knew that, if more than one existed, then they were causally unrelated. We are left with the conclusion that, in many circumstances, alleged empirical evidence for a single factor from one-wave factor analysis in fact essentially cannot rule out the possibility of the presence of two factors with causal effects on one another.

A number of potential objections to the conclusions of the analysis above concerning current factor analytic practices might be put forward. First, Figure 1 considers only a relatively simple factor model with each indicator loading only on one of the two factors, and with independent errors. In some sense though, this is an ideal case, when one might most expect to be able to discern two separate factors from a one-wave factor analysis. And even in this ideal case, one could not in equilibrium distinguish the models with a single factor vs two factors that were causally related. However, Theorem 1 and its Corollary did not in fact rely on indicators loading on only a single factor or on independent errors. Even under these more complex structures, if at least one factor causally affects the other, then the argument above shows that, in equilibrium, under those conditions, a factor model with one factor will suffice.

Second, the argument above also imposed certain assumptions on the matrix B and the random errors $W_t$. Other, or more general, specifications could be considered. However, this case does suffice to demonstrate how causal effects of the factors on one another can, in equilibrium, lead to a reduction in the number of factors needed for a factor model to account for associations among item responses. The argument did require a decaying structure of the errors so as to obtain convergence (condition (iii) in Theorem 1). Such a model of the errors over time may not always be realistic. However, in settings in which exogenous sources of variation are more common earlier in life and the variables or states are more stable later in life, such assumptions on the decaying nature of the error terms may be a reasonable approximation. Such may be the case with, for example, education and, say, the quantile of total wealth. Each of these may causally affect the other, and while either can in principle change as time goes on, the likelihood of large increases in education, or in the quantile of total wealth, diminishes substantially by mid-life. Likewise, if lifelong trajectories of anxiety and depression are more powerfully shaped by life circumstances, experiences, and therapies in childhood, adolescence, and early adulthood, and typically become somewhat more stable by mid-life at either relatively low or relatively high levels of psychological distress, still subject to variation but less so, then once again a decaying error structure may be a reasonable approximation.

Third, one might dispute that equilibrium is ever achieved and object that the notion of convergence as $t \to \infty$ is a theoretical abstraction. However, the limit argument above does imply that if at least one of two factors causally affects the other, so that B is not the identity matrix, then one can find a finite number of time steps k such that $D^k$ is within any given arbitrarily small deviation from being either 0 or the matrix

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$

and thus within only a very slight deviation from a single factor explaining the set of indicators at time k. Moreover, in practice, when investigators use factor analysis to determine the number of factors, they will often ignore factor loadings below 0.3, and indeed the guidance given in textbooks on exploratory factor analysis is to do precisely this [14]. We are not endorsing this practice, but only pointing out that it is often employed. However, if relatively low factor loadings are indeed ignored, then in principle only a very small number of steps may be sufficient for an investigator to erroneously conclude in practice that a single factor is present. We illustrate this below in the simulation studies. Thus, once again, if there are causal effects of one factor on another and the process $\eta_t$ has proceeded through even a small number of steps, then a one-wave factor analysis can in practice, long before convergence is attained or even approximated, erroneously suggest a single factor based on current factor analysis practices.
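For further intuition on how few steps may be needed, the small sketch below, again with assumed illustrative cross-lagged matrices, counts how many time steps are required before $B^k$ is within a given entrywise tolerance of its rank-one limit.

```python
import numpy as np

def steps_to_limit(B, tol=0.05, max_steps=200):
    """Number of powers k needed before B^k is within `tol` (entrywise) of its limit."""
    B_limit = np.linalg.matrix_power(B, 10_000)   # numerical stand-in for lim B^t
    Bk = np.eye(B.shape[0])
    for k in range(1, max_steps + 1):
        Bk = Bk @ B
        if np.max(np.abs(Bk - B_limit)) < tol:
            return k
    return None

# Illustrative (assumed) cross-lagged matrices: stronger cross effects converge faster.
for gamma in (0.1, 0.2, 0.3):
    B = np.array([[1 - gamma, gamma],
                  [gamma, 1 - gamma]])   # eigenvalues 1 and 1 - 2*gamma
    print(gamma, steps_to_limit(B))
```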

It is possible that, sufficiently prior to convergence being obtained, factor analyses with one wave of data could be employed to uncover aspects of the underlying processes. For example, under the data-generating structure in Figure 1, prior to convergence, exploratory factor analysis with a sufficiently large sample size, allowing for correlated/oblique factors (e.g., using an oblique rotation such as promax, cf. [1,2,3]), could uncover the fact that the two sets of items loaded on two separate factors. However, an analyst who instead employed independent/orthogonal factors would attain an equally good fit with two independent factors and all items loading on both factors. In this regard, fit statistics may be of limited use. The factor model, even sufficiently prior to convergence, is unidentified from one wave of data, and so it is impossible to distinguish between the two solutions. In some sense, each analyst would correctly identify some aspect of the underlying processes: the first analyst would correctly identify distinct sets of factor loadings, and the second analyst would correctly identify the potential independence of the factors (though, at a given wave, they are only independent of each other conditional on the past). However, it is not possible to fully identify the correct structure with only one wave of data. Again, the causal relationships between the factors render such identification impossible with one wave of data.

More generally, regardless of the equilibrium argument and the specification of the error terms, it should be clear even from Figure 1 itself, without any further algebraic derivation, that if there are two factors that causally affect one another, and a factor model is fit with a single wave of data, the covariance amongst the indicators will arise both from the underlying factor for each indicator and from the causal effect of one factor on the other. The model with factor loadings for each indicator and with causal effects across factors is effectively unidentified with a single wave of data. There is no way to distinguish the two sources of covariance with one wave of data. If there were no causal effects of $\eta_{1,t-1}$ on $\eta_{2t}$, nor of $\eta_{2,t-1}$ on $\eta_{1t}$, then items $Y_{1t}$ and $Y_{w+1,t}$ would be statistically independent. In the presence of a causal effect of $\eta_{2,t-1}$ on $\eta_{1t}$, they will be statistically dependent. It will not be possible to distinguish between causal relations among the factors and the allegedly conceptual relations arising from underlying latent factors using just one wave of data.

3 Example from prior literature: anxiety and depression

Feldman [15] examined self-report scales for anxiety and depression and employed factor analysis to assess whether these self-report anxiety and depression scales measure distinct constructs. Based on results from factor analysis with one wave of data she concludes, “These analyses provide evidence that anxiety and depression self-report scales do not measure discriminant mood constructs and may therefore be better thought of as measures of general negative mood rather than as measures of anxiety and depression per se.” Similar conclusions were drawn by Norton et al. [16] in a meta-confirmatory factor analysis using data from 28 samples concerning the Hospital Anxiety and Depression Scale. They conclude that “Due to the presence of a strong general factor, [the Hospital Anxiety and Depression Scale] does not provide good separation between symptoms of anxiety and depression. We recommend it is best used as a measure of general distress.”

However, these conclusions ignore the role that causal relationships between the phenomena of depression and anxiety may play in these factor analytic empirical results. There is in fact evidence from numerous longitudinal studies that the experience of anxiety renders subsequent depression more likely, and likewise that the experience of depression renders subsequent anxiety more likely [17]. It is likely that each has causal effects on the other. This of course has implications for the interpretation of factor analyses such as those of Feldman [15] and Norton et al. [16]. If the experience of anxiety causes depression, and the experience of depression causes anxiety, or even if just one of these two causal relations held, then even if it were the case that the anxiety and depression items loaded on distinct anxiety and depression factors, from the results above, one might still anticipate, from a one-wave factor analysis, evidence for a factor structure with only a single factor. The results of a one-wave factor analysis, such as those of Feldman [15] and Norton et al. [16], are exactly what one might expect, even if there were two distinct causally related factors. Allowing for the possibility of causally related factors, and in this case there is good reason to expect that possibility, the results of Feldman [15] and Norton et al. [16] then cannot adequately distinguish between the possibility of two separate factors with causal effects vs a single factor. The basic emotions of sadness and fear, underlying depressive and anxiety disorders, respectively, are arguably clearly conceptually distinct. It may well be the case that the only reason the analyses of Feldman [15] and Norton et al. [16] supposedly indicate that the two cannot be separated is because there are causal relations which their analyses cannot assess because they use only one wave of data.

4 Data analysis example: happiness and purpose

Węziak-Białowolska et al. [18] considered data from a Health and Well-Being Survey including various well-being indicators for happiness, health, purpose, character, relationships, and financial security. Using factor analysis with one wave of data, they presented evidence for distinct well-being factors for health, character, relationships, and financial security. However, the factor analyses did not indicate separate factors for happiness and purpose in life, but suggested these two as a single factor. It is possible that the failure to distinguish these factors arises in part from causal effects of purpose in life on happiness, and possibly also from effects of happiness on purpose in life. There is some evidence, from longitudinal studies, for effects in both of these directions, and the evidence for effects of purpose on happiness and life satisfaction is perhaps especially pronounced [19,20].

We revisit here this issue of distinguishing between happiness and purpose using data from a Well-being Assessment in year 2019 of 1,209 employees at a large insurance company [21]. Specifically, we will consider data on three items for happiness, namely, “How satisfied are you with life as a whole these days?,” “How happy have you felt during the last 7 days?,” and “I expect more good things in my life than bad,” each with scores 0–10; and also three items for purpose: “To what extent do you feel the things you do in your life are worthwhile?,” “I understand my purpose in life,” and “I am pursuing what is most important to me in my life,” again each with scores 0–10. In fitting a factor model using varimax rotation [1,2,3] with two factors, the factor loadings on the first factor are 0.812, 0.836, 0.539, 0.625, 0.355, and 0.403, and the factor loadings on the second factor are 0.397, 0.335, 0.515, 0.536, 0.851, and 0.770. The first purpose indicator has a larger factor loading on what is supposedly the happiness factor (0.625) than on what is supposedly the purpose factor (0.536), and its loading on the happiness factor is also larger than the happiness-factor loading of the third happiness indicator (0.539). The third happiness indicator moreover has a substantial factor loading on the purpose factor (0.515), of roughly the same magnitude as its loading on the happiness factor (0.539). Furthermore, all six indicators have factor loadings above 0.3 on both factors. This one-wave factor analysis may, however, in part be confounded by the effects that each of past happiness and past purpose in life has on the present values of the other.

Fortunately, in the case of this well-being survey, data are also available on the same items one year prior, in 2018. To attempt to control for potential causal effects between purpose and happiness, stratification on past values of purpose and happiness can be carried out. Given prior evidence for the especially strong effects of purpose on happiness, attention could be restricted to the subgroup for which levels of purpose in 2018 were below the median and levels of happiness in 2018 were already above the median. Fitting again a factor model using varimax rotation with two factors on the 2019 data, restricted to this stratum to attempt to control for confounding by causal effects of past purpose on happiness, the factor loadings on the first factor then become 0.867, 0.804, 0.459, 0.449, 0.194, and 0.172, and the factor loadings on the second factor are 0, 0.211, 0.249, 0.399, 0.751, and 0.865. While the separation of the factors is not perfect, it is considerably better. The first purpose indicator no longer has a stronger factor loading on the happiness factor than the third happiness indicator; all of the happiness indicators have factor loadings on the purpose factor that are below 0.3; and all of the purpose indicators except the first now also have factor loadings on the happiness factor that are below 0.3. Although such crude stratification by past values of happiness and purpose only partially controls for the causal effects of purpose on happiness, we see that even with this relatively crude form of control, there is greater separation between the happiness and purpose factors.
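A minimal sketch of this stratified analysis is given below, assuming a pandas data frame with hypothetical column names (hap1_2018, ..., pur3_2019), stratification on simple item means, and the factor_analyzer package; the actual data and code are those described in the online supplement, not this sketch.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer  # assumed available (pip install factor_analyzer)

df = pd.read_csv("wellbeing.csv")  # hypothetical file; one row per respondent

# Hypothetical column names for the 2019 happiness and purpose items.
hap_2019 = ["hap1_2019", "hap2_2019", "hap3_2019"]
pur_2019 = ["pur1_2019", "pur2_2019", "pur3_2019"]

# Stratum: 2018 purpose below its median and 2018 happiness above its median
# (item means used here as crude 2018 summaries; an assumption for illustration).
pur_2018 = df[["pur1_2018", "pur2_2018", "pur3_2018"]].mean(axis=1)
hap_2018 = df[["hap1_2018", "hap2_2018", "hap3_2018"]].mean(axis=1)
stratum = df[(pur_2018 < pur_2018.median()) & (hap_2018 > hap_2018.median())]

# Two-factor model with varimax rotation on the 2019 items within the stratum.
fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(stratum[hap_2019 + pur_2019])
print(fa.loadings_)   # rows: items; columns: the two rotated factors
```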

5 Simulations

In this section, we illustrate the results further through simulations to gain insight into the implications of the results in more realistic settings in which only a finite, and possibly small, number of time periods or steps have taken place, so that convergence is not yet achieved. Consider again the dynamic factor model with $Y_t = (Y_{1t}, \ldots, Y_{pt})'$ denoting a set of indicators measured at time t, and $\eta_t = (\eta_{1t}, \ldots, \eta_{mt})$ denoting a set of m latent factors at time t, with

$$Y_t = \Lambda \eta_t + \varepsilon_t,$$

where Λ is a p × m matrix and $\varepsilon_t$ is a p × 1 vector of independent normally distributed random variables. For simplicity, we will consider the setting with m = 2 factors and three indicators per latent variable, so that p = 6, and we will assume that at each stage t the variables $Y_t$ have been standardized to have mean 0 and variance 1. Suppose again, as above, that

$$\eta_t = B \eta_{t-1} + W_t,$$

where B is a 2 × 2 matrix whose (i, j) entry represents the causal effect of factor j at time t−1 on factor i at time t, and where $W_t$ is a 2 × 1 vector of random errors.

Consider a setting in which, at each time t, the first three items load only on the first factor and the second three items load only on the second factor, as in Figure 1, with relatively strong factor loadings given by $\Lambda_{\cdot 1} = (0.7, 0.7, 0.8, 0, 0, 0)$ and $\Lambda_{\cdot 2} = (0, 0, 0, 0.7, 0.8, 0.6)$. This will be a rather ideal scenario, as the “cross-loadings” are all 0 and the other factor loadings are relatively large in magnitude. We will see, however, that, even in this scenario, the processes begin to converge relatively quickly to a model that can be approximated by a single factor. Consider the setting with

$$B = \begin{pmatrix} 0.8 & \gamma \\ \gamma & 0.8 \end{pmatrix},$$

so that γ denotes the magnitude of the causal effect of each factor on the other. Let each of the two components of $W_t$ be normally distributed with mean 0 and standard deviation $\sigma \delta^t$, so that δ denotes how quickly the random variation in the process governing the latent factors $\eta_t$ decays over time.

We will first illustrate the results above with a particular set of parameter values and then consider, over a broad range of the parameter values of γ and δ and sample size N, how quickly the process becomes effectively indistinguishable from a one-factor model when using only one wave of data.

Consider first the setting with γ = 0.35, σ = 0.3, and δ = 0.9, with sample size N = 1,000. Note that a standardized effect size for the effect of one factor on another of γ = 0.35 would be somewhat smaller than the estimated effects of depression and anxiety on each other [17], similar to or slightly smaller than the effect of purpose on happiness [20], and perhaps only slightly larger than the effect of happiness on purpose [19]. Code for the simulations that follow is available in the online supplement. With an initial simulated dataset (t = 0), the estimates of the factor loadings from a factor model using varimax rotation are $\Lambda_{\cdot 1} = (0.73, 0.69, 0.83, 0, 0, 0)$ and $\Lambda_{\cdot 2} = (0, 0, 0, 0.70, 0.83, 0.57)$. Under the null hypothesis that one latent factor (with six factor loading parameters $\Lambda_{\cdot 1}$) is sufficient, as compared with an unconstrained model for the correlation matrix with 15 parameters, a likelihood ratio $\chi^2$ test statistic with 9 degrees of freedom gives a p-value of 9.1 × 10−145. A one-factor model is insufficient to explain the correlation structure. After only three time steps (t = 3), the factor loading estimates, if only a single wave of data at t = 3 is used, are $\Lambda_{\cdot 1} = (0.65, 0.64, 0.64, 0.39, 0.53, 0.43)$ and $\Lambda_{\cdot 2} = (0.38, 0.42, 0.51, 0.67, 0.63, 0.52)$, and the distinct factors are much less clearly distinguishable from one another. However, it is still the case that a $\chi^2_9$ test statistic under the null hypothesis that one latent factor is sufficient gives a p-value of 0.0039. By t = 5, the factor loading estimates are $\Lambda_{\cdot 1} = (0.73, 0.71, 0.77, 0.69, 0.54, 0.62)$ and $\Lambda_{\cdot 2} = (0.41, 0.44, 0.46, 0.46, 0.84, 0.42)$, and the $\chi^2_9$ test statistic under the null hypothesis that one latent factor is sufficient now gives a p-value of 0.57. It is effectively no longer possible to distinguish the two factors.
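A minimal sketch of this illustration is as follows, assuming scikit-learn and scipy; the scaling of the measurement errors and of $\eta_0$ is an illustrative assumption, the likelihood-ratio statistic is computed only approximately from the fitted one-factor model, and the article's own simulation code is in the online supplement.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import FactorAnalysis  # maximum-likelihood factor analysis

rng = np.random.default_rng(1)

Lambda = np.array([[0.7, 0.0], [0.7, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.8], [0.0, 0.6]])
gamma, sigma, delta, N = 0.35, 0.3, 0.9, 1000
B = np.array([[0.8, gamma], [gamma, 0.8]])

def simulate(T):
    """Simulate standardized indicators Y_T for N individuals after T latent time steps."""
    eta = rng.normal(size=(N, 2))                       # eta_0 (assumed standard normal)
    for t in range(1, T + 1):
        eta = eta @ B.T + sigma * delta**t * rng.normal(size=(N, 2))
    Y = eta @ Lambda.T + rng.normal(size=(N, 6))        # unit-variance errors (assumption)
    return (Y - Y.mean(0)) / Y.std(0)

def one_factor_pvalue(Y):
    """Approximate LR test of a one-factor model vs the unconstrained correlation matrix."""
    S = np.corrcoef(Y, rowvar=False)
    fa = FactorAnalysis(n_components=1).fit(Y)
    Sigma = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
    stat = N * (np.linalg.slogdet(Sigma)[1] - np.linalg.slogdet(S)[1]
                + np.trace(S @ np.linalg.inv(Sigma)) - 6)
    return chi2.sf(stat, df=9)                          # 15 - 6 = 9 degrees of freedom

for T in (0, 3, 5):
    print(T, one_factor_pvalue(simulate(T)))
```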

To gain further insight into how quickly the two factors become indistinguishable, we consider a range of scenarios in which we vary the magnitude of the effect of the factors on each other, γ; the rate δ at which the random variation in the process governing the latent factors decays over time; and the sample size N. We will consider γ = 0.1, 0.35, and 0.5, corresponding to small, moderate, and large effect sizes of the factors on one another; δ = 0.65, 0.8, and 0.9, corresponding to fast, moderate, and relatively slow rates of decay of the random variation in the process governing the latent factors; and N = 300, 1,000, and 3,000, corresponding to modest, moderate, and relatively large sample sizes as employed in factor analysis. For each parameter setting, 500 simulated datasets are created. For each set of parameters, we report in Table 1 the mean value, standard deviation, minimum, maximum, median, and 25th and 75th quantiles of the number of iterations across the 500 datasets before the p-value of the $\chi^2_9$ test statistic under the null hypothesis that one latent factor is sufficient rises above the 0.05 threshold.

Table 1

Mean values, standard deviations, minimum, maximum, median, and 25th and 75th quantiles of the number of iterations across the 500 datasets before the two distinct factors become effectively indistinguishable based on a $\chi^2_9$ test for the null hypothesis of one factor, across scenarios defined by the cross-factor effect (γ), the rate of decay of random variation in the process underlying the latent factors (δ), and the sample size (N)

N γ δ Mean value Std dev. Min 25% Median 75% Max
300 0.1 0.9 5.56 1.05 3 5 5 6 10
300 0.1 0.8 4.66 0.78 3 4 5 5 7
300 0.1 0.65 4.19 0.72 3 4 4 5 6
300 0.35 0.9 2.94 0.78 2 2 3 3 6
300 0.35 0.8 2.50 0.63 2 2 2 3 5
300 0.35 0.65 2.26 0.49 2 2 2 2 5
300 0.5 0.9 2.30 0.70 1 2 2 3 6
300 0.5 0.8 2.03 0.55 1 2 2 2 6
300 0.5 0.65 1.89 0.54 1 2 2 2 4
1,000 0.1 0.9 7.90 1.13 5 7 8 9 12
1,000 0.1 0.8 5.85 0.75 4 5 6 6 9
1,000 0.1 0.65 5.16 0.67 4 5 5 6 7
1,000 0.35 0.9 4.62 1.16 3 4 5 5 9
1,000 0.35 0.8 3.46 0.66 2 3 3 4 6
1,000 0.35 0.65 2.86 0.56 2 3 3 3 5
1,000 0.5 0.9 3.82 1.11 2 3 4 5 7
1,000 0.5 0.8 2.67 0.66 2 2 3 3 6
1,000 0.5 0.65 2.19 0.41 2 2 2 2 4
3,000 0.1 0.9 10.35 1.28 8 9 10 11 15
3,000 0.1 0.8 7.05 0.76 6 7 7 7 10
3,000 0.1 0.65 6.01 0.70 5 6 6 6 9
3,000 0.35 0.9 7.07 1.21 4 6 7 8 11
3,000 0.35 0.8 4.33 0.74 3 4 4 5 7
3,000 0.35 0.65 3.30 0.49 3 3 3 4 5
3,000 0.5 0.9 6.31 1.98 3 5 6 7 40
3,000 0.5 0.8 3.62 0.72 2 3 4 4 7
3,000 0.5 0.65 2.59 0.62 2 2 3 3 5

As can be seen from Table 1, within a relatively small number of iterations, it effectively becomes no longer possible to distinguish the two factors. Each of the parameters that is varied clearly matters. As would be expected, larger cross-factor effects γ, faster decay δ of the random variation in the process for the underlying latent variables, and smaller sample size N all result in the factors effectively becoming indistinguishable in fewer iterations. Even with small cross-factor effects (γ = 0.1), slow decay of the random variation in the process for the underlying latent factors (δ = 0.9), and a large sample size (N = 3,000), only about 10 iterations are needed before the factors effectively become indistinguishable. When the cross-factor effects are moderate or large (γ = 0.35 or γ = 0.5) and the decay of the random variation in the process for the latent factors is moderate or fast (δ = 0.8 or δ = 0.65), then often only 2 or 3 iterations are needed before the factors effectively become indistinguishable, even when the sample size is large.

There is some variation across the 500 datasets with regard to the number of iterations before the factors become effectively indistinguishable. The parameter that most strongly affects this number appears to be the rate of decay of the random variation in the process for the underlying latent factors. When δ = 0.9, the standard deviation of the number of required iterations is considerably higher than with more rapid rates of decay, and this holds across the various other parameter settings. The mean and median number of required iterations are still often quite modest, but again the standard deviation, and also the maximum, of the number of required iterations are notably larger when δ = 0.9. When the cross-factor effects were large (γ = 0.5), with a large sample size (N = 3,000) and a slow decay rate (δ = 0.9), in one of the 500 simulated datasets the $\chi^2$ test continued to reject the null hypothesis of one factor until 40 iterations had passed.

In summary, when the cross-factor effects are moderate or large, and the decay of the random variation in the process for the latent variables is moderate or fast, very few iterations are needed before two distinct factors effectively become indistinguishable. With modest cross-factor effects and slower rates of decay, it may continue to be possible to distinguish distinct factors, especially if sample sizes are large and the process by which the underlying latent variables causally affect one another has not proceeded for a long period of time.

6 Some generalizations

We will now consider some generalizations of the results above and show that similar phenomena may arise with an arbitrary set of k factors. A set of k causally related factors can likewise, in a one-wave factor analysis, give rise to patterns of association among item responses that, in equilibrium, as $t \to \infty$, suggest that one factor is sufficient. However, in other cases, a set of k causally related factors may give rise to patterns of association among item responses that, in equilibrium, suggest that more than one factor, but fewer than k factors, are present.

Suppose once again that $Y_t = (Y_{1t}, \ldots, Y_{pt})'$ denotes a set of item responses measured at time t, and $\eta_t = (\eta_{1t}, \ldots, \eta_{mt})$ is a set of latent factors at time t, with the standard factor analytic model with independent errors given by

$$Y_t = \Lambda \eta_t + \varepsilon_t,$$

and suppose that the latent factors $\eta_t$ may be causally related to each other so that

$$\eta_t = B \eta_{t-1} + W_t.$$

Under the notation and assumptions of Theorem 1, we have that, as $t \to \infty$, $Y_t$ will converge in distribution to the random variable

$$Y^* = \Lambda^* \eta^* + \varepsilon,$$

where, by Theorem 1, the dimensionality of $\eta^*$ will be $\operatorname{rank}(D^*)$ and where

$$D^* = \lim_{t\to\infty} D^t = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},$$

with D being a matrix in Jordan normal form in the Jordan decomposition of B.

We can consider the rank of $D^*$. Let ∼ denote an equivalence relation over $\{1, \ldots, m\}$ defined by $i \sim j$ if and only if there exists $\{k_1, \ldots, k_v\} \subseteq \{1, \ldots, m\}$ such that $k_1 = i$, $k_v = j$, and, for all $r \in \{2, \ldots, v\}$, either $B_{k_r k_{r-1}} \neq 0$ or $B_{k_{r-1} k_r} \neq 0$. The distinct equivalence classes of the factors indexed by $\{1, \ldots, m\}$ are such that no factor in one equivalence class is causally related to any factor in any other equivalence class. By permuting indices, one can put the matrix B in block diagonal form

$$B = \begin{pmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_q \end{pmatrix},$$

where each $B_r$ corresponds to the causal relationships among the factors in the rth equivalence class. We then have that

$$B^t = \begin{pmatrix} B_1^t & & 0 \\ & \ddots & \\ 0 & & B_q^t \end{pmatrix}.$$

For each $B_r$, we can consider the Jordan decomposition $B_r = Q_r D_r Q_r^{-1}$ for some invertible matrix $Q_r$ and with $D_r$ in Jordan normal form; we then have $B_r^t = Q_r D_r^t Q_r^{-1}$ and

$$D^* = \lim_{t\to\infty} D^t = \begin{pmatrix} \lim_{t\to\infty} D_1^t & & 0 \\ & \ddots & \\ 0 & & \lim_{t\to\infty} D_q^t \end{pmatrix}.$$

For each equivalence class, r, that constitutes only one factor, we have $B_r = D_r = (1)$, and this will contribute 1 to the rank of $D^*$. For each equivalence class, r, constituted by two factors, we have, by the argument in the Corollary to Theorem 1, that this class will likewise contribute at most 1 to the rank of $D^*$, since $D_r^t$ can only converge and have rank 2 if $D_r$ is itself a 2 × 2 identity matrix, in which case $B_r$ would be a 2 × 2 identity matrix; but then each of the two factors would constitute its own equivalence class, contrary to the supposition that the two formed a single equivalence class. Thus, any equivalence class, r, that is constituted by only two factors will contribute at most 1 to the rank of $D^*$ and will thus, in equilibrium, as $t \to \infty$, reduce to at most a single factor. Finally, consider an equivalence class constituted by more than two factors. Suppose that $B_r$ is such that all its entries are strictly positive. It would then follow from the Perron–Frobenius theorem [11,22,23] that there is a unique eigenvalue with largest absolute value and that it has an associated eigenspace of dimension 1. By Theorem 1 of Oldenburger [12], for $D_r^t$ to converge as $t \to \infty$, all eigenvalues must be less than or equal to 1 in absolute value. With a unique eigenvalue of largest absolute value and with an eigenspace of dimension 1, it follows that the Jordan normal form matrix $D_r$ would have at most a single entry of 1 on its diagonal, and, by Theorem 1 of Oldenburger [12], the limit $\lim_{t\to\infty} D_r^t$, when it exists, would have rank at most 1.

Thus, for an equivalence class of latent factors wherein each factor in that equivalence class positively causally affects the others, this equivalence class will contribute at most 1 to the rank of $D^*$ and, in equilibrium, will thus reduce to at most a single factor. For an equivalence class of latent factors such that some of the factors causally affect others, but not all causal relations are present within the class, or for which some factors might negatively affect others, so that the entries of $B_r$ are not strictly positive, the dimensionality of $\lim_{t\to\infty} D_r^t$ may exceed 1. There has been some work on relaxing the strict-positivity requirement of the Perron–Frobenius theorem [24], and thus, in some of these cases, the equivalence class may likewise contribute only 1 to the rank of $D^*$ and thus reduce, in equilibrium, to at most a single factor, but this is not always guaranteed by the present results.
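To make the equivalence-class construction concrete, the sketch below, for an assumed illustrative B with four factors, identifies the classes as connected components of the graph with an edge between i and j whenever $B_{ij} \neq 0$ or $B_{ji} \neq 0$, and, assuming each block's powers converge, counts the eigenvalues equal to 1 in each block as that block's contribution to $\operatorname{rank}(D^*)$.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# Illustrative (assumed) 4-factor example: factors 1 and 2 affect each other,
# while factors 3 and 4 neither affect nor are affected by any other factor.
B = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

adjacency = ((B != 0) | (B != 0).T).astype(int)   # edge if B_ij != 0 or B_ji != 0
n_classes, labels = connected_components(adjacency, directed=False)

rank_limit = 0
for c in range(n_classes):
    idx = np.where(labels == c)[0]
    eig = np.linalg.eigvals(B[np.ix_(idx, idx)])
    # Assuming the block's powers converge, eigenvalues equal to 1 survive in the limit.
    rank_limit += int(np.sum(np.isclose(eig, 1.0)))
    print("class", idx, "eigenvalues", np.round(eig, 3))

print("rank of lim D^t (number of factors in equilibrium):", rank_limit)
```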

Thus, in many cases, each equivalence class, as defined above, will give rise, in equilibrium, to at most a single factor. This will always be the case in the models above when a factor constitutes its own equivalence class, when an equivalence class has only two factors, or when an equivalence class is such that each of the factors positively affects each of the others. It may or may not be the case in other settings. However, as discussed further below, the final case of an equivalence class in which each of the factors positively affects each of the others may be relevant in settings with a series of indicators that are all closely related to a single construct but represent distinct facets of the construct with potentially different causal relationships to other outcomes. This may in fact be a relatively common setting. And once again, if this were the case, it would follow that, over time, this setting would become indistinguishable, in one-wave factor analysis, from that of having only a single factor.

In settings in which it is known in advance that certain subgroups of factors cannot affect other subgroups of factors, the analysis above would still be applicable within each subgroup of factors. In cases in which the factor structure might differ according to subgroups defined by some other variable, the analysis above would still be applicable within subgroups defined by that variable. However, for one of the most common uses of factor analysis – to attempt to establish the unidimensionality of a scale – the generalizations here are not in fact needed. A one-wave factor analysis suggesting evidence for a single factor could arise from a single underlying factor, or from two causally related factors, or from a set of k causally related factors each of which positively affects the others. The central point, demonstrated already in Section 2, is that one cannot distinguish, from evidence for unidimensionality from a one-wave factor analysis, whether there is one factor or whether there may be more than one in the presence of causal effects of the factors upon one another.

7 Discussion

The implications of the present work for current psychometric practices are potentially far-reaching. Factor analysis with one wave of data seemingly cannot distinguish between factor models with a single factor vs those with two or more factors that causally affect one another over time. If causal relations between factors cannot be ruled out a priori, alleged empirical evidence from one-wave factor analysis for a single factor still leaves open the possibilities of a single factor or of two or more factors that causally affect one another. It would, moreover, as noted above, be very unusual for it to be the case that one was uncertain as to the dimensionality of a set of factors, but confident that, if there were several, they would be causally unrelated. The results above pertain to linear models, but similar difficulties would in general arise in non-linear models as well, and trying to distinguish between causal relations and the dimensionality of factors would likely be yet more challenging. In most cases, we thus effectively cannot distinguish between these possibilities when one-wave factor analyses conclude that only one factor suffices to explain the covariance structure of the indicators at that wave. This arguably constitutes a substantial portion of factor analytic studies. In the models considered above, the conclusion of two factors from a one-wave factor analysis would suffice to conclude that one factor is insufficient; but the supposed conclusion of one factor does not preclude the possibility of two factors being present that are causally related.

The problems arise because of the inability to distinguish between causation and alleged conceptual relationships with one wave of data. It is well-known in statistics, and in the biomedical and social sciences, that correlation does not imply causation. The sub-discipline of causal inference [5,6,7,25,26] provides a formal framework to reason about the assumptions needed to move from conclusions of association to conclusions of causation. Such careful thought helps us avoid the fallacy that “Correlation implies causation.” We might refer to this as the “causal fallacy.” Unfortunately, however, a converse fallacy seems to typically arise in psychometric measurement evaluation, namely, that “Correlation cannot imply causation – it must indicate a conceptual relationship.” This too, of course, is false. Correlations may sometimes arise from conceptual relationships; but sometimes correlations may arise from causal relationships. As shown in this study, it is this second converse fallacy that underlies dimensionality assessment in most psychometric work on evaluating measures. We might refer to this converse fallacy that “Correlation cannot imply causation – it must indicate a conceptual relationship” as the “measurement fallacy.” From the discussion above, it is arguably the case that this measurement fallacy in fact ought to be treated with the same level of critique and skepticism that is appropriately directed at the causal fallacy. Both are fallacies. Both fallacies need to be avoided. A fair amount of attention has been given to the causal fallacy. However, to date, the measurement fallacy has almost entirely been ignored. This needs to change.

The problems that the present study makes clear may eventually require re-evaluation of a great deal of prior psychometric assessment of scales. Many psychometric studies employ one-wave factor analysis to indicate that a single factor suffices to explain the covariance structure of the indicators at that wave and, neglecting the possibility of causal effects across factors, perhaps erroneously conclude that there is, in fact, a single underlying univariate factor. Because of the potential of causal effects across factors such conclusions require re-evaluation. It is important to note, however, that even if this is the case, it does not necessarily imply that the scales themselves are problematic. Many of them may be reasonable assessments for their corresponding constructs. What is problematic is not necessarily the scales themselves, but the evidence that has been used for claims of unidimensionality [27,28].

As shown in this study, distinct factors with causal relationships can, over time, seemingly collapse into a single factor. This may be especially problematic with items that, on the face of it, would seem to correspond to two or more distinct constructs with construct phenomena that may be causally related, and that are then claimed, from one-wave factor analysis, to be unidimensional. Such was the case with analyses above concerning anxiety and depression. However, such claims of unidimensionality may also be problematic with regard to a series of indicators that seem to correspond to a single construct, but that may constitute causally related, but distinct, facets. Indeed, the analysis concerning the generalizations given in Section 6 implies that even if each indicator represented its own “factor,” if each of these positively affected the others, then over time the process will become indistinguishable from that corresponding to a single factor. The setting of indicators corresponding to distinct facets of a single construct with causal relationships between the different aspects of the phenomena represented by each indicator may, in actuality, be a very common setting, perhaps one in fact corresponding to most psychosocial phenomena. It is entirely possible that this insight has been regularly missed because of the causal relationships between distinct facets and the over-reliance on factor analysis with one wave of data. Furthermore, as discussed elsewhere [27,28], even scales with an underlying univariate latent variable may still give rise to causal structures such that distinct indicators have differential causal effects on outcomes. A causal interpretation of the common factor model [29] may thus often not be reasonable. Claims, therefore, based on one-wave factor analysis, that there is a single underlying univariate latent variable that is all that is causally relevant are thus highly problematic. It is perhaps well accepted that good causal inference requires careful measurement. But the discussion above indicates that the converse is also true: good measure evaluation requires careful causal inference.

The way forward with regard to dimensionality assessment for a set of indicators is not entirely clear. It is clear that current practices are, in many contexts, flawed in the ways documented above. In the presence of potential causal effects amongst factors, almost certainly two waves of data collection on all item responses will be needed so as to attempt to disentangle causal from supposedly conceptual relationships. However, even with two waves of data available, further work would need to be done on the correct analytic approach. Exploratory structural equation modeling [30,31] might provide a potential way forward, as it allows for multiple waves of data, data-driven dimensionality assessment, and the specification of potential causal effects. It is possible that, with two waves of data, if one were to impose time-invariant loadings and allow the factors at each wave to affect the other factors one wave later, but impose no other assumptions beyond standard linearity/normality, this would suffice to correctly identify that, at wave 2, two factors independent of each other conditional on the past were present, with the two sets of items loading on separate factors. Intuitively, it might, in this way, be possible to use the associations between wave 1 and wave 2 item responses to try to infer the causal relations among the factors, and then effectively use the correlations amongst item responses at wave 2, once the causal effects are “netted out,” to try to infer the underlying factor structure and factor loadings. This would, however, require further development and evaluation of the conditions under which such an approach would lead to correct identification of the underlying causal and factor analytic structure. Regardless of such future developments, however, the results of this study suggest that we should be wary of claims of unidimensionality for psychosocial phenomena assessed by various indicators and then evaluated by one-wave factor analysis. Factor analyses with one wave of data should themselves be interpreted as characterizing associations among indicators that may be present due to either conceptual relations or causal relations concerning the underlying construct phenomena.
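As a rough sketch of what such a two-wave specification might look like, the model description below encodes the measurement structure at each wave together with cross-lagged effects of the wave-1 factors on the wave-2 factors; the item names are hypothetical, the choice of the semopy package is an assumption made only for illustration, and the equality constraints needed for time-invariant loadings (whose syntax depends on the package) are omitted.

```python
import pandas as pd
import semopy  # assumed available (pip install semopy); any SEM package with lavaan-style syntax would do

# Hypothetical two-wave model: items y1..y6 measured at waves 1 and 2,
# with each wave-1 factor allowed to affect both wave-2 factors.
model_desc = """
eta1_w1 =~ y1_w1 + y2_w1 + y3_w1
eta2_w1 =~ y4_w1 + y5_w1 + y6_w1
eta1_w2 =~ y1_w2 + y2_w2 + y3_w2
eta2_w2 =~ y4_w2 + y5_w2 + y6_w2
eta1_w2 ~ eta1_w1 + eta2_w1
eta2_w2 ~ eta1_w1 + eta2_w1
"""

df = pd.read_csv("two_wave_items.csv")   # hypothetical file with the 12 item columns
model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())                   # loadings, cross-lagged effects, and their standard errors
```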

Acknowledgements

The authors thank Bengt Muthén for helpful comments on the manuscript.

  1. Funding information: This research was funded by the National Institutes of Health, U.S.A.

  2. Conflict of interest: Prof. Tyler J. VanderWeele is a member of the Editorial Board of the Journal of Causal Inference but was not involved in the review process of this article.

  3. Data availability statement: Code for the data and simulations is available in the online Supplemental Materials.

References

[1] Thompson B. Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association; 2004. doi:10.1037/10694-000.

[2] Comrey AL, Lee HB. A first course in factor analysis. New York, NY: Psychology Press; 2013. doi:10.4324/9781315827506.

[3] Kline P. An easy guide to factor analysis. New York, NY: Routledge; 2014. doi:10.4324/9781315788135.

[4] DeVellis RF. Scale development: Theory and applications. Thousand Oaks, CA: Sage Publications; 2016.

[5] Morgan SL, Winship C. Counterfactuals and causal inference. New York, NY: Cambridge University Press; 2015. doi:10.1017/CBO9781107587991.

[6] VanderWeele TJ. Explanation in causal inference: Methods for mediation and interaction. New York, NY: Oxford University Press; 2015. doi:10.1093/ije/dyw277.

[7] Hernán MA, Robins JM. Causal inference: What if. Boca Raton: Chapman & Hall/CRC; 2022.

[8] Stock J, Watson M. Dynamic factor models. In: Clements MP, Hendry DF, editors. Oxford handbook on economic forecasting. Oxford: Oxford University Press; 2011. doi:10.1093/oxfordhb/9780195398649.013.0003.

[9] Shumway RH, Stoffer DS. Time series analysis and its applications: With R examples. 4th ed. Springer; 2017. doi:10.1007/978-3-319-52452-8.

[10] Asparouhov T, Hamaker EL, Muthén B. Dynamic structural equation models. Struct Equ Model: A Multidiscip J. 2018;25:359–88. doi:10.1080/10705511.2017.1406803.

[11] Meyer CD. Matrix analysis and applied linear algebra. Philadelphia, PA: SIAM; 2000. doi:10.1137/1.9780898719512.

[12] Oldenburger R. Infinite powers of matrices and characteristic roots. Duke Math J. 1940;6:357–61. doi:10.1215/S0012-7094-40-00627-5.

[13] Serfozo R. Basics of applied stochastic processes. Berlin, Germany: Springer; 2009. doi:10.1007/978-3-540-89332-5.

[14] Field A. Discovering statistics using SPSS. 4th ed. London, UK: SAGE; 2013.

[15] Feldman LA. Distinguishing depression and anxiety in self-report: Evidence from confirmatory factor analysis on nonclinical and clinical samples. J Consulting Clin Psychol. 1993;61:631–8. doi:10.1037/0022-006X.61.4.631.

[16] Norton S, Cosco T, Doyle F, Done J, Sacker A. The hospital anxiety and depression scale: A meta confirmatory factor analysis. J Psychosom Res. 2013;74:74–81. doi:10.1016/j.jpsychores.2012.10.010.

[17] Jacobson NC, Newman MG. Anxiety and depression as bidirectional risk factors for one another: A meta-analysis of longitudinal studies. Psychol Bull. 2017;143:1155–200. doi:10.1037/bul0000111.

[18] Węziak-Białowolska D, McNeely E, VanderWeele T. Flourish index and secure flourish index – Development and validation. Soc Sci Res Netw. 2017;3145336. doi:10.2139/ssrn.3145336.

[19] Kim ES, Delaney SW, Tay L, Chen Y, Diener E, VanderWeele TJ. Life satisfaction and subsequent physical, behavioral, and psychosocial health in older adults. Milbank Q. 2021;99:209–39. doi:10.1111/1468-0009.12497.

[20] Kim ES, Nakamura JS, Chen Y, Ryff CD, VanderWeele TJ. Sense of purpose in life and subsequent health and well-being in older adults: An outcome-wide analysis. Am J Health Promotion. 2022;36:137–47. doi:10.1177/08901171211038545.

[21] Weziak-Bialowolska D, Bialowolski P, Lee MT, Chen Y, VanderWeele TJ, McNeely E. Psychometric properties of flourishing scales from a comprehensive well-being assessment. Front Psychol. 2021;12:1033. doi:10.3389/fpsyg.2021.652209.

[22] Perron O. Zur Theorie der Matrices. Mathematische Annalen. 1907;64:248–63. doi:10.1007/BF01449896.

[23] Frobenius G. Ueber Matrizen aus nicht negativen Elementen. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften. 1912;23:456–77.

[24] Noutsos D. On Perron–Frobenius property of matrices having some negative entries. Linear Algebra Its Appl. 2006;412:132–53. doi:10.1016/j.laa.2005.06.037.

[25] Pearl J. Causality. New York, NY: Cambridge University Press; 2009.

[26] Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. New York, NY: Cambridge University Press; 2015. doi:10.1017/CBO9781139025751.

[27] VanderWeele TJ. Constructed measures and causal inference: Towards a new model of measurement for psychosocial constructs. Epidemiology. 2022;33:141–51. doi:10.1097/EDE.0000000000001434.

[28] VanderWeele TJ, Vansteelandt S. A statistical test to reject the structural interpretation of a latent factor model. J R Stat Society Ser B. 2022;84:2032–54. doi:10.1111/rssb.12555.

[29] Van Bork R, Wijsen LD, Rhemtulla M. Toward a causal interpretation of the common factor model. Disputatio. 2017;9(47):581–601. doi:10.1515/disp-2017-0019.

[30] Asparouhov T, Muthén B. Exploratory structural equation modeling. Struct Equ Model: A Multidiscip J. 2009;16:397–438. doi:10.1080/10705510903008204.

[31] Marsh HW, Muthén B, Asparouhov T, Lüdtke O, Robitzsch A, Morin AJ, et al. Exploratory structural equation modeling, integrating CFA and EFA: Application to students’ evaluations of university teaching. Struct Equ Model: A Multidiscip J. 2009;16:439–76. doi:10.1080/10705510903008220.

Received: 2022-11-07
Revised: 2023-03-10
Accepted: 2023-04-18
Published Online: 2023-07-28

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
