Abstract
It is shown that, with two sets of indicators that separately load on two distinct factors, independent of one another conditional on the past, if at least one of the factors causally affects the other, then, in many settings, the process will converge to a factor model in which a single factor suffices to capture the covariance structure among the indicators. Factor analysis with one wave of data then cannot distinguish between factor models with a single factor vs those with two factors that are causally related. Therefore, unless causal relations between factors can be ruled out a priori, alleged empirical evidence from one-wave factor analysis for a single factor still leaves open the possibilities of a single factor or of two factors that causally affect one another. The implications for interpreting the factor structure of psychological scales, such as self-report scales for anxiety and depression, or for happiness and purpose, are discussed. The results are further illustrated through simulations to gain insight into their practical implications in more realistic settings prior to the convergence of the processes. Some further generalizations to an arbitrary number of underlying factors are noted. Factor analyses with one wave of data should themselves be interpreted as characterizing associations among indicators that may be present either due to conceptual relations or due to causal relations concerning the underlying construct phenomena.
1 Introduction
Exploratory factor analysis [1,2,3] is frequently used to assess the dimensionality of a set of indicators in a survey. In many cases, the motivation for such an analysis is to allegedly demonstrate that a set of items constitutes a unidimensional scale. Such factor analysis is typically carried out with only a single wave of data collected, i.e., at a single time point at which all of the items are assessed. Establishing the unidimensionality of a scale is often viewed as an important part of scale development [4], to be carried out before the scale is employed in longitudinal data collection efforts. Considerations of causal effects in factor analysis are generally ignored.
It is well-known that one cannot typically assess causal relations with a single wave of cross-sectional data in which data on all variables are collected at the same time [5,6,7]. However, the implications of this fact for the psychometric evaluation of scales have been neglected. If, for example, there are two underlying factors that explain a set of survey item responses or indicators, and if these factors are causally related, it will not be possible, with a single wave of data, to assess causal relations between them. Unfortunately, this has rather serious consequences for attempts to assess factor dimensionality with a single wave of data. Specifically, with a single wave of data, it would not be possible to distinguish associations arising from causal relations between the factors from those allegedly arising from conceptual relations among the indicators. The present study formalizes this intuition and discusses the implications for the practice of factor analysis.
2 Factor analysis with two causally related factors
Consider a standard factor analytic model [1,2,3], with two sets of survey item responses,

Figure 1: Causal effects of two latent factors on each other over time.
More generally, if we let
where
Now suppose that the latent factors
where
Theorem 1
Suppose indicators
and that
with dim(
Proof of Theorem 1
Since
and by iteration
For the process
where either the identity or the zeros along the diagonal may possibly be absent. Define
By condition (iii), the random variables
We then have that
Define
with
Theorem 1 implies that, under the conditions given in the Theorem, the process constituted by the set of indicators
The conditions (i)–(iii) of Theorem 1 are technical and will be considered in somewhat greater detail further below. However, briefly, condition (i) is simply an assumption that the process
Theorem 1 in fact has particularly striking consequences when there are only two factors
Corollary
Under the conditions of Theorem 1, if dim (
is such that dim (
Proof of Corollary
With two latent factors,
then
then
The corollary states that for two latent factors
We are thus left with the conclusion that, in equilibrium as
Stated another way, Corollary 1 indicates that causal effects of one latent factor on another imply that, in equilibrium, if only a single wave of data is collected, a factor model with one factor will sometimes suffice, even if the true underlying structures are such that there are two causally related factors. The implications of this result for the current practice of factor analysis are unsettling. Efforts are often made during scale development to demonstrate unidimensionality of a set of item responses using factor analysis with one wave of data [4]. If factor analysis provides evidence that a single factor is sufficient to explain most of the covariance in the item responses, this is generally deemed satisfactory. However, the argument above indicates that this is also exactly the empirical result that one might expect, with one-wave factor analysis, if there were in fact two distinct latent factors that causally affected each other over time. With a one-wave factor analysis, if there is evidence for more than one factor, then this may be genuine evidence against unidimensionality. But if a one-wave factor analysis suggests only one factor, then we cannot distinguish between the possibilities of two factors with causal effects over time vs a single factor.
The only way we could establish unidimensionality in this case would be if we could establish, a priori, that, if there were two or more factors, then they definitively did not causally affect one another. But it is difficult to imagine circumstances in which we were uncertain, on conceptual grounds, about the number of factors, but knew that, if more than one existed, then they were causally unrelated. We are left with the conclusion that, in many circumstances, alleged empirical evidence for a single factor from one-wave factor analysis in fact essentially cannot rule out the possibility of the presence of two factors with causal effects on one another.
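The collapse toward a single factor can be illustrated numerically. The following is a minimal sketch under assumed dynamics, not the exact model of Theorem 1: two latent factors start out independent, each is updated by a symmetric cross-effect matrix `B` with cross-effect `gamma`, and the innovation standard deviation decays as `delta**t`. The matrix `B`, the loading value 0.8, and the function names are all hypothetical choices made for illustration. The sketch tracks the ratio of the second to the first eigenvalue of the common (factor-driven) part of the indicator covariance; a ratio near zero means that one factor accounts for essentially all of the shared covariance.

```python
import numpy as np

def latent_cov(gamma, delta, steps):
    """Population covariance of two latent factors after `steps` updates.

    Assumed dynamics (hypothetical, for illustration only):
        eta_t = B @ eta_{t-1} + delta**t * e_t,  e_t ~ N(0, I),  eta_0 ~ N(0, I)
    """
    B = np.array([[1 - gamma, gamma],
                  [gamma, 1 - gamma]])
    sigma = np.eye(2)  # factors start out uncorrelated
    for t in range(1, steps + 1):
        sigma = B @ sigma @ B.T + delta ** (2 * t) * np.eye(2)
    return sigma

def second_to_first_eigen_ratio(gamma, delta, steps, loading=0.8):
    """Ratio of the two largest eigenvalues of Lambda Sigma Lambda^T."""
    lam = np.zeros((6, 2))
    lam[:3, 0] = loading  # items 1-3 load only on factor 1
    lam[3:, 1] = loading  # items 4-6 load only on factor 2
    common = lam @ latent_cov(gamma, delta, steps) @ lam.T
    eig = np.linalg.eigvalsh(common)  # eigenvalues in ascending order
    return eig[-2] / eig[-1]

ratios = {k: second_to_first_eigen_ratio(0.35, 0.8, k) for k in (0, 1, 2, 5, 10, 30)}
for k, r in ratios.items():
    print(f"k={k:2d}  second/first eigenvalue ratio = {r:.6f}")
```

Under these assumptions the ratio starts at one, when the two factors are independent and equally strong, and shrinks toward zero as the number of steps grows, so that a one-wave analysis at a late step would see what is effectively a rank-one common covariance.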
A number of potential objections to the conclusions of the analysis above concerning current factor analytic practices might be put forward. First, Figure 1 considers only a relatively simple factor model with each indicator loading only on one of the two factors, and with independent errors. In some sense though, this is an ideal case, when one might most expect to be able to discern two separate factors from a one-wave factor analysis. And even in this ideal case, one could not in equilibrium distinguish the models with a single factor vs two factors that were causally related. However, Theorem 1 and its Corollary did not in fact rely on indicators loading on only a single factor or on independent errors. Even under these more complex structures, if at least one factor causally affects the other, then the argument above shows that, in equilibrium, under those conditions, a factor model with one factor will suffice.
Second, the argument above also imposed certain assumptions on the matrix
Third, one might dispute that equilibrium is ever achieved and object that the notion of convergence as
and thus, for there being only very slight deviation from a single factor explaining the set of indicators at time k. Moreover, in practice, when investigators use factor analysis to determine the number of factors they will often ignore factor loadings below 0.3 and indeed the guidance given in textbooks on exploratory factor analysis is to do precisely this [14]. We are not endorsing this practice, but only pointing out that it is often employed. However, if relatively low factor loadings are indeed ignored, then in principle only a very small number of steps may be sufficient for an investigator to erroneously conclude in practice that a single factor was present. We illustrate this below in the simulation studies. Thus, once again, if there are causal effects of one factor on another and the process
It is possible that sufficiently prior to convergence being obtained, factor analyses with one wave of data could be employed to uncover aspects of the underlying processes. For example, under the data-generating structure in Figure 1, prior to convergence, exploratory factor analysis, with a sufficiently large sample size, allowing for correlated/oblique factors (e.g., using an oblique rotation such as promax, cf. [1,2,3]) could uncover the fact that the two sets of items were loaded on two separate factors. However, an analyst who instead employed independent/orthogonal factors would attain an equally good fit with two independent factors and all items loading on both factors. In this regard, fit statistics may be of limited use. The factor model, even sufficiently prior to convergence, is unidentified from one wave of data and so it is impossible to distinguish between the two solutions. In some sense, each analyst would correctly identify some aspect of the underlying processes: the first analyst would correctly identify distinct sets of factor loadings, and the second analyst would correctly identify the potential independence of the factors (though, at a given wave, they are only independent of each other conditional on the past). However, it is not possible to fully identify the correct structure with only one wave of data. Again, the causal relationships between the factors render such identification impossible with one wave of data.
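The equivalence of the two analysts' solutions is a matter of simple matrix algebra, and can be checked directly. The sketch below uses hypothetical values (loadings of 0.8 and a factor correlation `rho` of 0.6, neither taken from the analyses in this paper). It builds a simple-structure loading matrix with correlated factors, then multiplies it by the symmetric square root of the factor correlation matrix to obtain an orthogonal-factor solution in which every item loads on both factors. The two solutions imply exactly the same covariance among the indicators.

```python
import numpy as np

rho = 0.6  # hypothetical correlation between the two oblique factors

# Analyst 1: simple structure, correlated factors.
lam_oblique = np.zeros((6, 2))
lam_oblique[:3, 0] = 0.8  # items 1-3 load only on factor 1
lam_oblique[3:, 1] = 0.8  # items 4-6 load only on factor 2
phi = np.array([[1.0, rho],
                [rho, 1.0]])

# Symmetric square root of phi via its eigendecomposition.
w, v = np.linalg.eigh(phi)
phi_sqrt = v @ np.diag(np.sqrt(w)) @ v.T

# Analyst 2: uncorrelated factors, every item loads on both.
lam_orthogonal = lam_oblique @ phi_sqrt

implied_oblique = lam_oblique @ phi @ lam_oblique.T
implied_orthogonal = lam_orthogonal @ lam_orthogonal.T

print(np.round(lam_orthogonal, 3))
print("same implied covariance:", np.allclose(implied_oblique, implied_orthogonal))
```

Because the common part of the covariance is identical, and the unique variances are unaffected, no fit statistic computed from one wave of data can prefer one of these two representations over the other.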
More generally, regardless of the equilibrium argument, and the specification of the error terms, it should be clear even from Figure 1 itself, without any further algebraic derivation, that if there are two factors that causally affect one another, and a factor model is fit with a single wave of data, the covariance amongst the indicators will arise both from the underlying factor for each indicator, and from the causal effect of one factor on the other. The model with factor loadings for each indicator and with causal effects across factors is effectively unidentified with a single wave of data. There is no way to distinguish the two sources of covariance with one wave of data. If there were no causal effects of
3 Example from prior literature: anxiety and depression
Feldman [15] examined self-report scales for anxiety and depression and employed factor analysis to assess whether these self-report anxiety and depression scales measure distinct constructs. Based on results from factor analysis with one wave of data she concludes, “These analyses provide evidence that anxiety and depression self-report scales do not measure discriminant mood constructs and may therefore be better thought of as measures of general negative mood rather than as measures of anxiety and depression per se.” Similar conclusions were drawn by Norton et al. [16] in a meta-confirmatory factor analysis using data from 28 samples concerning the Hospital Anxiety and Depression Scale. They conclude that “Due to the presence of a strong general factor, [the Hospital Anxiety and Depression Scale] does not provide good separation between symptoms of anxiety and depression. We recommend it is best used as a measure of general distress.”
However, these conclusions ignore the role that causal relationships between the phenomena of depression and anxiety may play in these factor analytic empirical results. There is in fact evidence from numerous longitudinal studies that the experience of anxiety renders subsequent depression more likely, and likewise that the experience of depression renders subsequent anxiety more likely [17]. It is likely that each has causal effects on the other. This of course has implications for the interpretation of factor analyses such as those of Feldman [15] and Norton et al. [16]. If the experience of anxiety causes depression, and the experience of depression causes anxiety, or even if just one of these two causal relations held, then even if it were the case that the anxiety and depression items loaded on distinct anxiety and depression factors, from the results above, one might still anticipate, from a one-wave factor analysis, evidence for a factor structure with only a single factor. The results of a one-wave factor analysis, such as those of Feldman [15] and Norton et al. [16], are exactly what one might expect, even if there were two distinct causally related factors. Allowing for the possibility of causally related factors, and in this case there is good reason to expect that possibility, the results of Feldman [15] and Norton et al. [16] then cannot adequately distinguish between the possibility of two separate factors with causal effects vs a single factor. The basic emotions of sadness and fear, underlying depressive and anxiety disorders, respectively, are arguably clearly conceptually distinct. It may well be the case that the only reason the analyses of Feldman [15] and Norton et al. [16] supposedly indicate that the two cannot be separated is because there are causal relations which their analyses cannot assess because they use only one wave of data.
4 Data analysis example: happiness and purpose
Węziak-Białowolska et al. [18] considered data from a Health and Well-Being Survey including various well-being indicators for happiness, health, purpose, character, relationships, and financial security. Using factor analysis with one wave of data, they presented evidence for distinct well-being factors for health, character, relationships, and financial security. However, the factor analyses did not indicate separate factors for happiness and purpose in life, but suggested these two as a single factor. It is possible that the failure to distinguish these factors arises in part from causal effects of purpose in life on happiness, and possibly also from effects of happiness on purpose in life. There is some evidence, from longitudinal studies, for effects in both of these directions, and the evidence for effects of purpose on happiness and life satisfaction is perhaps especially pronounced [19,20].
We revisit here this issue of distinguishing between happiness and purpose using data from a Well-being Assessment in year 2019 of 1,209 employees at a large insurance company [21]. Specifically, we will consider data on three items for happiness, namely, “How satisfied are you with life as a whole these days?,” “How happy have you felt during the last 7 days?,” and “I expect more good things in my life than bad,” each with scores 0–10; and also three items for purpose: “To what extent do you feel the things you do in your life are worthwhile?,” “I understand my purpose in life,” and “I am pursuing what is most important to me in my life,” again each with scores 0–10. In fitting a factor model using varimax rotation [1,2,3] with two factors, the factor loadings on the first factor are 0.812, 0.836, 0.539, 0.625, 0.355, and 0.403 and the factor loadings on the second factor are 0.397, 0.335, 0.515, 0.536, 0.851, and 0.770. The first purpose indicator has a larger factor loading on what is supposedly the happiness factor (0.625) than on what is supposedly the purpose factor (0.536), and the loading on the happiness factor for this purpose indicator is also larger than the happiness factor loading for the third happiness indicator (0.539). The third happiness indicator moreover has a substantial factor loading on the purpose factor (0.515), of roughly the same magnitude as its loading on the happiness factor (0.539). Furthermore, all six indicators have factor loadings above 0.3 on both factors. This one-wave factor analysis may, however, in part be confounded by the effects that each of past happiness and past purpose in life has on the present values of the other.
Fortunately, in the case of this well-being survey, data are also available on the same items, one year prior, in 2018. To attempt to control for potential causal effects of purpose and happiness, stratification on past values of purpose and happiness can be carried out. Given prior evidence for the especially strong effects of purpose on happiness, attention could be restricted to the subgroup for which levels of purpose in 2018 were below the median and levels of happiness in 2018 were already above the median. Fitting again a factor model using varimax rotation with two factors with the 2019 data, with restriction to this stratum to attempt to control for confounding for causal effects of past purpose on happiness, the factor loadings on the first factor then become 0.867, 0.804, 0.459, 0.449, 0.194, and 0.172 and the factor loadings on the second factor are 0, 0.211, 0.249, 0.399, 0.751, and 0.865. While the separation of the factors is not perfect, it is considerably better. The first purpose indicator no longer has a stronger factor loading on the happiness factor than the third happiness indicator; all of the happiness indicators have factor loadings on the purpose factor that are below 0.3; and all of the purpose indicators except the first now also have factor loadings on the happiness factor that are below 0.3. Although such crude stratification by past values of happiness and purpose only partially controls for the causal effects of purpose on happiness, we see that even with this relatively crude form of control, there is greater separation between the happiness and purpose factors.
5 Simulations
In this section we illustrate the results further through simulations to gain insight into the implications of the results in more realistic settings in which only a finite and possibly small number of time periods or steps have taken place so that convergence is not yet achieved. Consider again, the dynamic factor model with
where
where
Consider a setting in which at each time t the first three items load only on the first factor and the second three items load only on the second factor, as in Figure 1, with relatively strong factor loadings given by
so that
We will first illustrate the results above with a particular set of parameter values and then consider, over a broad range of the parameter values of
Consider first the setting with
To gain further insight into how quickly the two factors become indistinguishable, we consider a range of scenarios in which we vary the magnitude of the effect of the factors on each other,
Mean values, standard deviations, minimum, maximum, median, and 25th and 75th quantiles of number of iterations across the 500 datasets before the two distinct factors become effectively indistinguishable based on a
| N | γ | δ | Mean value | Std dev. | Min | 25% | Median | 75% | Max |
|---|---|---|---|---|---|---|---|---|---|
| 300 | 0.1 | 0.9 | 5.56 | 1.05 | 3 | 5 | 5 | 6 | 10 |
| 300 | 0.1 | 0.8 | 4.66 | 0.78 | 3 | 4 | 5 | 5 | 7 |
| 300 | 0.1 | 0.65 | 4.19 | 0.72 | 3 | 4 | 4 | 5 | 6 |
| 300 | 0.35 | 0.9 | 2.94 | 0.78 | 2 | 2 | 3 | 3 | 6 |
| 300 | 0.35 | 0.8 | 2.50 | 0.63 | 2 | 2 | 2 | 3 | 5 |
| 300 | 0.35 | 0.65 | 2.26 | 0.49 | 2 | 2 | 2 | 2 | 5 |
| 300 | 0.5 | 0.9 | 2.30 | 0.70 | 1 | 2 | 2 | 3 | 6 |
| 300 | 0.5 | 0.8 | 2.03 | 0.55 | 1 | 2 | 2 | 2 | 6 |
| 300 | 0.5 | 0.65 | 1.89 | 0.54 | 1 | 2 | 2 | 2 | 4 |
| 1,000 | 0.1 | 0.9 | 7.90 | 1.13 | 5 | 7 | 8 | 9 | 12 |
| 1,000 | 0.1 | 0.8 | 5.85 | 0.75 | 4 | 5 | 6 | 6 | 9 |
| 1,000 | 0.1 | 0.65 | 5.16 | 0.67 | 4 | 5 | 5 | 6 | 7 |
| 1,000 | 0.35 | 0.9 | 4.62 | 1.16 | 3 | 4 | 5 | 5 | 9 |
| 1,000 | 0.35 | 0.8 | 3.46 | 0.66 | 2 | 3 | 3 | 4 | 6 |
| 1,000 | 0.35 | 0.65 | 2.86 | 0.56 | 2 | 3 | 3 | 3 | 5 |
| 1,000 | 0.5 | 0.9 | 3.82 | 1.11 | 2 | 3 | 4 | 5 | 7 |
| 1,000 | 0.5 | 0.8 | 2.67 | 0.66 | 2 | 2 | 3 | 3 | 6 |
| 1,000 | 0.5 | 0.65 | 2.19 | 0.41 | 2 | 2 | 2 | 2 | 4 |
| 3,000 | 0.1 | 0.9 | 10.35 | 1.28 | 8 | 9 | 10 | 11 | 15 |
| 3,000 | 0.1 | 0.8 | 7.05 | 0.76 | 6 | 7 | 7 | 7 | 10 |
| 3,000 | 0.1 | 0.65 | 6.01 | 0.70 | 5 | 6 | 6 | 6 | 9 |
| 3,000 | 0.35 | 0.9 | 7.07 | 1.21 | 4 | 6 | 7 | 8 | 11 |
| 3,000 | 0.35 | 0.8 | 4.33 | 0.74 | 3 | 4 | 4 | 5 | 7 |
| 3,000 | 0.35 | 0.65 | 3.30 | 0.49 | 3 | 3 | 3 | 4 | 5 |
| 3,000 | 0.5 | 0.9 | 6.31 | 1.98 | 3 | 5 | 6 | 7 | 40 |
| 3,000 | 0.5 | 0.8 | 3.62 | 0.72 | 2 | 3 | 4 | 4 | 7 |
| 3,000 | 0.5 | 0.65 | 2.59 | 0.62 | 2 | 2 | 3 | 3 | 5 |
As can be seen from Table 1, in a relatively small number of iterations, it effectively becomes no longer possible to distinguish the two factors. Each of the parameters that is varied clearly matters. As would be expected, larger cross-factor effects
There is some variation across the 500 datasets with regard to the number of iterations before the factors become effectively indistinguishable. The parameter that most strongly affects the number of iterations before the factors become effectively indistinguishable appears to be the rate of decay of the random variation in the process for the underlying latent factors. When
In summary, when the cross-factor effects are moderate or large, and the decay of the random variation in the process for the latent variables is moderate or fast, very few iterations would be needed before two distinct factors effectively become indistinguishable. With modest cross-factor effects, and slower rates of decay, it may continue to be possible to distinguish distinct factors especially if sample sizes are large and the process by which the underlying latent variables causally affect one another has not proceeded for a longer period of time.
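A finite-sample version of this kind of experiment can be sketched in a few lines. The code below is an illustration under stated assumptions and is not the simulation code used for Table 1: the update matrix `B`, the loading of 0.8, the unit-variance scaling of the indicators at step 0, and above all the decision rule (the number of eigenvalues of the sample correlation matrix exceeding one, often called the Kaiser criterion) are hypothetical stand-ins chosen for brevity. The function `n_factors_kaiser` is likewise a name introduced here.

```python
import numpy as np

def n_factors_kaiser(n, gamma, delta, steps, seed=0, loading=0.8):
    """Simulate one wave of six indicators after `steps` latent updates and
    return the number of correlation-matrix eigenvalues greater than one.

    Assumed dynamics (hypothetical, for illustration only):
        eta_t = B @ eta_{t-1} + delta**t * e_t,  e_t ~ N(0, I),  eta_0 ~ N(0, I)
    """
    rng = np.random.default_rng(seed)
    B = np.array([[1 - gamma, gamma],
                  [gamma, 1 - gamma]])
    lam = np.zeros((6, 2))
    lam[:3, 0] = loading  # items 1-3 load only on factor 1
    lam[3:, 1] = loading  # items 4-6 load only on factor 2

    eta = rng.standard_normal((n, 2))  # independent factors at step 0
    for t in range(1, steps + 1):
        eta = eta @ B.T + delta ** t * rng.standard_normal((n, 2))

    noise_sd = np.sqrt(1 - loading ** 2)  # unit indicator variance at step 0
    x = eta @ lam.T + noise_sd * rng.standard_normal((n, 6))

    eig = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))
    return int(np.sum(eig > 1.0))

for steps in (0, 1, 2, 3, 5, 10):
    k = n_factors_kaiser(n=1000, gamma=0.35, delta=0.8, steps=steps)
    print(f"steps={steps:2d}  factors retained={k}")
```

With these assumed settings, the data at step 0 support two factors, whereas after a modest number of steps only one eigenvalue remains above one. A different retention rule or different dynamics would change the exact step at which this happens, so the sketch conveys only the qualitative pattern.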
6 Some generalizations
We will now consider some generalizations of the results above, and show that similar phenomena may arise with an arbitrary set of k factors. A set of k causally related factors can likewise, in a one-wave factor analysis, give rise to patterns of association among item responses that, in equilibrium, as
Suppose once again that
and suppose that the latent factors
Under the notation and assumptions of Theorem 1, we have that, as
where, by Theorem 1, the dimensionality of
with D being a matrix of Jordan normal form in the Jordan decomposition of B.
We can consider the rank of
where each
For each
For each equivalence class, r, that constitutes only one factor, we have
Thus, for an equivalence class of latent factors wherein each factor in that equivalence class positively causally affects the others, this equivalence class will contribute at most 1 to the rank of
Thus, in many cases, each equivalence class, as defined above, will give rise to, in equilibrium, at most a single factor. This will always be the case in the models above when a factor constitutes its own equivalence class, when an equivalence class has only two factors, or when an equivalence class is such that each of the factors positively affects each of the others. It may or may not be the case in other settings. However, as discussed further below, the final case of an equivalence class such that each of the factors positively affects each of the others may be relevant in settings with a series of indicators that are all closely related to a single construct but represent distinct facets of the construct with potentially different causal relationships to other outcomes. This may in fact be a relatively common setting. And once again, if this were the case, it would follow that, over time, this setting would become indistinguishable, in one-wave factor analysis, from that of having only a single factor.
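The case in which every factor in an equivalence class positively affects every other can be illustrated with a small numerical check. The sketch below assumes, purely for illustration, a block-diagonal matrix `B` with two classes (three factors and two factors) in which each block has strictly positive entries and rows summing to one; this row-sum normalization and the specific entries are assumptions of the example, not conditions taken from the results above. Under these assumptions, high powers of `B` settle to a matrix whose rank equals the number of classes rather than the number of factors.

```python
import numpy as np

# Hypothetical class A: three factors, each positively affecting the others.
block_a = np.array([[0.6, 0.2, 0.2],
                    [0.1, 0.7, 0.2],
                    [0.3, 0.3, 0.4]])
# Hypothetical class B: two factors, each positively affecting the other.
block_b = np.array([[0.8, 0.2],
                    [0.4, 0.6]])

# No effects across classes, so B is block diagonal.
B = np.zeros((5, 5))
B[:3, :3] = block_a
B[3:, 3:] = block_b

# A high power of B approximates the equilibrium limit of B^k.
B_limit = np.linalg.matrix_power(B, 200)
rank = np.linalg.matrix_rank(B_limit, tol=1e-8)

print(np.round(B_limit, 4))
print("factors:", B.shape[0], " rank of limit:", rank)
```

Each block collapses to a matrix with identical rows, so five latent factors contribute only two dimensions in equilibrium, one per class.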
In settings in which it is known in advance that certain subgroups of factors cannot affect other subgroups of factors, the analysis above would still be applicable within each subgroup of factors. In cases in which the factor structure might differ according to subgroups defined by some other variable, the analysis above would still be applicable within subgroups defined by that variable. However, for one of the most common uses of factor analysis, namely attempting to establish the unidimensionality of a scale, the generalizations here are not in fact needed. A one-wave factor analysis suggesting evidence for a single factor could arise from a single underlying factor, or from two causally related factors, or from a set of k causally related factors each of which positively affects the others. The central point, demonstrated already in Section 2, is that one cannot distinguish, from evidence for unidimensionality from a one-wave factor analysis, whether there is one factor or whether there may be more than one in the presence of causal effects of the factors upon one another.
7 Discussion
The implications of the present work for current psychometric practices are potentially far-reaching. Factor analysis with one wave of data seemingly cannot distinguish between factor models with a single factor vs those with two or more factors that causally affect one another over time. If causal relations between factors cannot be ruled out a priori, alleged empirical evidence from one-wave factor analysis for a single factor still leaves open the possibilities of a single factor or of two or more factors that causally affect one another. It would, moreover, as noted above, be very unusual for it to be the case that one was uncertain as to the dimensionality of a set of factors, but confident that, if there were several, they would be causally unrelated. The results above pertain to linear models, but similar difficulties would in general arise in non-linear models as well, and trying to distinguish between causal relations and the dimensionality of factors would likely be yet more challenging still. In most cases, we thus effectively cannot distinguish between these possibilities when one-wave factor analyses conclude that only one factor suffices to explain the covariance structure of the indicators at that wave. This arguably constitutes a substantial portion of factor analytic studies. In the models considered above, the conclusion of two factors from a one-wave factor analysis would suffice to conclude that one factor is insufficient; but the supposed conclusion of one factor does not preclude the possibility of two factors being present that are causally related.
The problems arise because of the inability to distinguish between causation and alleged conceptual relationships with one wave of data. It is well-known in statistics, and in the biomedical and social sciences, that correlation does not imply causation. The sub-discipline of causal inference [5,6,7,25,26] provides a formal framework to reason about the assumptions needed to move from conclusions of association to conclusions of causation. Such careful thought helps us avoid the fallacy that “Correlation implies causation.” We might refer to this as the “causal fallacy.” Unfortunately, however, a converse fallacy seems to typically arise in psychometric measurement evaluation, namely, that “Correlation cannot imply causation – it must indicate a conceptual relationship.” This too, of course, is false. Correlations may sometimes arise from conceptual relationships; but sometimes correlations may arise from causal relationships. As shown in this study, it is this second converse fallacy that underlies dimensionality assessment in most psychometric work on evaluating measures. We might refer to this converse fallacy that “Correlation cannot imply causation – it must indicate a conceptual relationship” as the “measurement fallacy.” From the discussion above, it is arguably the case that this measurement fallacy in fact ought to be treated with the same level of critique and skepticism that is appropriately directed at the causal fallacy. Both are fallacies. Both fallacies need to be avoided. A fair amount of attention has been given to the causal fallacy. However, to date, the measurement fallacy has almost entirely been ignored. This needs to change.
The problems that the present study makes clear may eventually require re-evaluation of a great deal of prior psychometric assessment of scales. Many psychometric studies employ one-wave factor analysis to indicate that a single factor suffices to explain the covariance structure of the indicators at that wave and, neglecting the possibility of causal effects across factors, perhaps erroneously conclude that there is, in fact, a single underlying univariate factor. Because of the potential of causal effects across factors such conclusions require re-evaluation. It is important to note, however, that even if this is the case, it does not necessarily imply that the scales themselves are problematic. Many of them may be reasonable assessments for their corresponding constructs. What is problematic is not necessarily the scales themselves, but the evidence that has been used for claims of unidimensionality [27,28].
As shown in this study, distinct factors with causal relationships can, over time, seemingly collapse into a single factor. This may be especially problematic with items that, on the face of it, would seem to correspond to two or more distinct constructs with construct phenomena that may be causally related, and that are then claimed, from one-wave factor analysis, to be unidimensional. Such was the case with analyses above concerning anxiety and depression. However, such claims of unidimensionality may also be problematic with regard to a series of indicators that seem to correspond to a single construct, but that may constitute causally related, but distinct, facets. Indeed, the analysis concerning the generalizations given in Section 6 implies that even if each indicator represented its own “factor,” if each of these positively affected the others, then over time the process will become indistinguishable from that corresponding to a single factor. The setting of indicators corresponding to distinct facets of a single construct with causal relationships between the different aspects of the phenomena represented by each indicator may, in actuality, be a very common setting, perhaps one in fact corresponding to most psychosocial phenomena. It is entirely possible that this insight has been regularly missed because of the causal relationships between distinct facets and the over-reliance on factor analysis with one wave of data. Furthermore, as discussed elsewhere [27,28], even scales with an underlying univariate latent variable may still give rise to causal structures such that distinct indicators have differential causal effects on outcomes. A causal interpretation of the common factor model [29] may thus often not be reasonable. Claims based on one-wave factor analysis that a single underlying univariate latent variable is all that is causally relevant are therefore highly problematic.
It is perhaps well accepted that good causal inference requires careful measurement. But the discussion above indicates that the converse is also true: good measure evaluation requires careful causal inference.
The way forward with regard to dimensionality assessment for a set of indicators is not entirely clear. It is clear that current practices are, in many contexts, flawed in the ways documented above. In the presence of potential causal effects amongst factors, almost certainly two waves of data collection on all item responses will be needed so as to attempt to disentangle causal from supposedly conceptual relationships. However, even with two waves of data available, further work would need to be done on the correct analytic approach. Exploratory structural equation modeling [30,31] might provide a potential way forward as it allows for multiple waves of data, data-driven dimensionality assessment, and the specification of potential causal effects. It is possible that, with two waves of data, imposing time-invariant loadings and allowing the factors at each wave to affect the other factors one wave later, with no other assumptions beyond standard linearity/normality, would suffice to correctly identify that, at wave 2, two factors independent of each other conditional on the past were present, with the two sets of items loading on separate factors. Intuitively, it might, in this way, be possible to use the associations between wave 1 and wave 2 item responses to try to infer the causal relations among the factors, and then effectively use the correlations amongst item responses at wave 2, once the causal effects are “netted out,” to try to infer the underlying factor structure and factor loadings. This would, however, require further development and evaluation of the conditions under which such an approach would lead to correct identification of the underlying causal and factor analytic structure. Regardless of such future developments, however, the results of this study suggest that we should be wary of claims of unidimensionality for psychosocial phenomena assessed by various indicators and then evaluated by one-wave factor analysis.
Factor analyses with one wave of data should themselves be interpreted as characterizing associations among indicators that may be present due to either conceptual relations or causal relations concerning the underlying construct phenomena.
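The point can be illustrated with a minimal numerical sketch. It is not the simulation code from the Supplemental Materials, and it rests on simplifying assumptions made only for illustration: two factors that start uncorrelated, a hypothetical positive cross-lagged effect matrix `A`, and no innovation noise between waves, so that the factor covariance evolves as Sigma_t = A Sigma_{t-1} A^T. The function name and all numerical values are hypothetical.

```python
import numpy as np


def factor_correlation_by_wave(A, sigma0, n_waves):
    """Track the correlation between two factors across waves.

    Assumes noise-free linear dynamics eta_t = A @ eta_{t-1}, so the factor
    covariance is propagated as Sigma_t = A Sigma_{t-1} A^T. This is an
    illustrative simplification, not the model analysed in the article.
    """
    sigma = np.array(sigma0, dtype=float)
    correlations = []
    for _ in range(n_waves):
        # Propagate the factor covariance one wave forward.
        sigma = A @ sigma @ A.T
        # Correlation between factor 1 and factor 2 at this wave.
        correlations.append(sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1]))
    return np.array(correlations)


# Hypothetical effects: each factor depends on its own past (0.7)
# and on the other factor's past (0.2).
A = np.array([[0.7, 0.2],
              [0.2, 0.7]])
sigma0 = np.eye(2)  # the two factors begin uncorrelated

corrs = factor_correlation_by_wave(A, sigma0, n_waves=30)
print(np.round(corrs[[0, 4, 29]], 4))  # correlation at waves 1, 5, and 30
```

Under these assumptions the factor correlation rises toward one as the waves accumulate, so the covariance among the indicators at a late wave is close to what a single factor would generate. With innovation noise at each wave the factor correlation would remain strictly below one, which is why the behaviour in more realistic settings has to be assessed by simulation.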
Acknowledgements
The authors thank Bengt Muthén for helpful comments on the manuscript.
Funding information: This research was funded by the National Institutes of Health, U.S.A.
Conflict of interest: Prof. Tyler J. VanderWeele is a member of the Editorial Board of the Journal of Causal Inference but was not involved in the review process of this article.
Data availability statement: Code for the data and simulations is available in the online Supplemental Materials.
References
[1] Thompson B. Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association; 2004. doi:10.1037/10694-000
[2] Comrey AL, Lee HB. A first course in factor analysis. New York, NY: Psychology Press; 2013. doi:10.4324/9781315827506
[3] Kline P. An easy guide to factor analysis. New York, NY: Routledge; 2014. doi:10.4324/9781315788135
[4] DeVellis RF. Scale development: Theory and applications. Thousand Oaks, CA: Sage Publications; 2016.
[5] Morgan SL, Winship C. Counterfactuals and causal inference. New York, NY: Cambridge University Press; 2015. doi:10.1017/CBO9781107587991
[6] VanderWeele TJ. Explanation in causal inference: Methods for mediation and interaction. New York, NY: Oxford University Press; 2015. doi:10.1093/ije/dyw277
[7] Hernán MA, Robins JM. Causal inference: What if. Boca Raton: Chapman & Hall/CRC; 2022.
[8] Stock J, Watson M. Dynamic factor models. In: MP Clements, DF Hendry, editors. Oxford handbook on economic forecasting. Oxford: Oxford University Press; 2011. doi:10.1093/oxfordhb/9780195398649.013.0003
[9] Shumway RH, Stoffer DS. Time series analysis and its applications: With R examples. 4th ed. Springer; 2017. doi:10.1007/978-3-319-52452-8
[10] Asparouhov T, Hamaker EL, Muthén B. Dynamic structural equation models. Struct Equ Model: A Multidiscip J. 2018;25:359–88. doi:10.1080/10705511.2017.1406803
[11] Meyer CD. Matrix analysis and applied linear algebra. Philadelphia, PA: SIAM; 2000. doi:10.1137/1.9780898719512
[12] Oldenburger R. Infinite powers of matrices and characteristic roots. Duke Math J. 1940;6:357–61. doi:10.1215/S0012-7094-40-00627-5
[13] Serfozo R. Basics of applied stochastic processes. Berlin, Germany: Springer; 2009. doi:10.1007/978-3-540-89332-5
[14] Field A. Discovering statistics using SPSS. 4th ed. London, UK: SAGE; 2013.
[15] Feldman LA. Distinguishing depression and anxiety in self-report: Evidence from confirmatory factor analysis on nonclinical and clinical samples. J Consulting Clin Psychol. 1993;61:631–8. doi:10.1037/0022-006X.61.4.631
[16] Norton S, Cosco T, Doyle F, Done J, Sacker A. The hospital anxiety and depression scale: A meta confirmatory factor analysis. J Psychosom Res. 2013;74:74–81. doi:10.1016/j.jpsychores.2012.10.010
[17] Jacobson NC, Newman MG. Anxiety and depression as bidirectional risk factors for one another: A meta-analysis of longitudinal studies. Psychol Bull. 2017;143:1155–200. doi:10.1037/bul0000111
[18] Węziak-Białowolska D, McNeely E, VanderWeele T. Flourish index and secure flourish index – Development and validation. Soc Sci Res Netw. 2017;3145336. doi:10.2139/ssrn.3145336
[19] Kim ES, Delaney SW, Tay L, Chen Y, Diener E, VanderWeele TJ. Life satisfaction and subsequent physical, behavioral, and psychosocial health in older adults. Milbank Q. 2021;99:209–39. doi:10.1111/1468-0009.12497
[20] Kim ES, Nakamura JS, Chen Y, Ryff CD, VanderWeele TJ. Sense of purpose in life and subsequent health and well-being in older adults: an outcome-wide analysis. Am J Health Promotion. 2022;36:137–47. doi:10.1177/08901171211038545
[21] Weziak-Bialowolska D, Bialowolski P, Lee MT, Chen Y, VanderWeele TJ, McNeely E. Psychometric properties of flourishing scales from a comprehensive well-being assessment. Front Psychol. 2021;12:1033. doi:10.3389/fpsyg.2021.652209
[22] Perron O. Zur Theorie der Matrices. Mathematische Annalen. 1907;64:248–63. doi:10.1007/BF01449896
[23] Frobenius G. Ueber Matrizen aus nicht negativen Elementen. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften. 1912;23:456–77.
[24] Noutsos D. On Perron–Frobenius property of matrices having some negative entries. Linear Algebra its Appl. 2006;412:132–53. doi:10.1016/j.laa.2005.06.037
[25] Pearl J. Causality. New York, NY: Cambridge University Press; 2009.
[26] Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. New York, NY: Cambridge University Press; 2015. doi:10.1017/CBO9781139025751
[27] VanderWeele TJ. Constructed measures and causal inference: towards a new model of measurement for psychosocial constructs. Epidemiology. 2022;33:141–51. doi:10.1097/EDE.0000000000001434
[28] VanderWeele TJ, Vansteelandt S. A statistical test to reject the structural interpretation of a latent factor model. J R Stat Society Ser B. 2022;84:2032–54. doi:10.1111/rssb.12555
[29] Van Bork R, Wijsen LD, Rhemtulla M. Toward a causal interpretation of the common factor model. Disputatio. 2017;9(47):581–601. doi:10.1515/disp-2017-0019
[30] Asparouhov T, Muthén B. Exploratory structural equation modeling. Struct Equ Model: A Multidiscip J. 2009;16:397–438. doi:10.1080/10705510903008204
[31] Marsh HW, Muthén B, Asparouhov T, Lüdtke O, Robitzsch A, Morin AJ, et al. Exploratory structural equation modeling, integrating CFA and EFA: Application to students’ evaluations of university teaching. Struct Equ Model: A Multidiscip J. 2009;16:439–76. doi:10.1080/10705510903008220
© 2023 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.