Age–Period–Cohort Models and the Perpendicular Solution

Robert M. O’Brien

doi:10.1515/em-2014-0006

Published/Copyright: January 28, 2015

From the journal Epidemiologic Methods Volume 4 Issue 1

Abstract

Separating the effects of ages, periods, and cohorts is a classic problem not only in epidemiology but also in demography and the social sciences in general. Frost's (1939) tuberculosis data provide a classic epidemiological example, which I use for illustration. In the classic age–period–cohort (APC) approach, a single constraint is used to eliminate the linear dependency and identify the model. Among the infinite number of possible constraints, one particular constraint has been discovered and rediscovered in multiple guises. This constraint produces a solution to the APC model that is perpendicular to the null vector. Among these guises are the minimum norm solution, the Moore–Penrose solution, the principal components solution, the intrinsic estimator, the singular value decomposition solution, the partial least squares solution, and the maximum entropy solution. The perpendicular constraint yields a solution that has a smaller variance than any other constrained solution. This would be an important advantage if there were some reason to believe that the perpendicular solution is an unbiased estimate of the parameters that generated the dependent variable. To explicate the common features of these solutions, this paper carefully examines the algebraic, geometric, and statistical characteristics of the perpendicular solution.

Keywords: perpendicular solution; age–period–cohort models; intrinsic estimator; constrained solutions

1 Introduction

The age–period–cohort (APC) identification problem is relevant to many disciplines, including epidemiology, demography, and the social sciences in general. Simply put, the problem is this: outcome variables ranging from lung cancer mortality rates to suicide rates to public support for medical research may be related to a person’s age, to the year or period in which the measurement is taken, and to the birth cohort (year of birth) of the individual. We might ask, for example, whether lung cancer mortality rates increase as a result of aging and, if so, whether that increase is linear or monotonic with age. Is the rate increasing or decreasing over time (across periods), or does it fluctuate? Are certain birth cohorts more or less prone to mortality from lung cancer? That lung cancer mortality rates, or a myriad of other outcome variables, might be related to age effects and/or period effects and/or cohort effects seems theoretically plausible. Perhaps most likely, age, period, and cohort are each independently related to lung cancer and to the levels of other substantively important dependent variables.

The APC problem is an identification problem. Because a person's birth cohort is determined by the period and the person's age (cohort = period − age), the three sets of effects are linearly dependent. If we include age, period, and cohort in a model, there are an infinite number of solutions that fit the data equally well; the model is not statistically identified. We can, however, obtain a unique solution (statistically identify the model) by placing a single constraint on the model, such as constraining the coefficient for the effect of age group 1 to equal that for age group 2 (age 1 = age 2) or forcing the period effects to have a zero linear trend across periods (Holford, 1983). Each of these constraints just identifies the model and produces a single solution under the constraint. The problem with such constrained solutions is that the coefficient estimates are likely to differ greatly from one constraint to another, and they cannot all be correct in the sense of being unbiased estimates of the parameters that generated the outcome data.

Traditionally, researchers have attempted to justify the constraint on substantive grounds. They try to demonstrate, for example, that there are reasons to expect the age effects that generated the data to be the same, or at least quite similar, for the two youngest age groups (age 1 and age 2), or that for the periods under consideration the trend in the period effects that generated the outcome data is approximately zero. These single-constraint solutions belong to a class of solutions that I label just identifying APC-constrained solutions. ^[1] The perpendicular solution, which is the focus of this paper, is a member of this class of just identifying APC-constrained solutions.

The perpendicular solution is what I label a “mechanical solution” in that it is used without regard to the particular substantive problem. It does not require the researcher to use substantive knowledge or theory to justify the plausibility of the particular constraint. These solutions are labeled perpendicular solutions because, despite their diverse derivations and rationales, they all result in solutions that are perpendicular to the null vector. This paper explains what the perpendicular solution is algebraically and geometrically so that its properties can be understood.

1.1 The perpendicular solution in the literature

The solution that is perpendicular to the null vector appears in various guises in the methodological literature as a solution to the APC identification problem. Kupper, Janis, and colleagues (1980, 1983, 1985) showed that this solution is equivalent to the principal component regression solution used in the rank deficient by one situation (retaining as regressors the components with non-zero eigenvalues), and they demonstrated that it is equivalent to the solution generated by the Moore–Penrose generalized inverse. The perpendicular solution to the APC model is equivalent to the intrinsic estimator introduced by Fu and his collaborators (Fu, 2000; Yang et al., 2004, 2008). Fu (2000) derived the intrinsic estimator using a ridge regression approach (as the “shrinkage penalty” diminishes, the ridge regression solution converges toward the perpendicular solution). Tu et al. (2013) added partial least squares to this list of perpendicular solutions and emphasized that this relationship holds when the maximum number of components is extracted for partial least squares and for principal component regression. Browning et al. (2013) demonstrated that the maximum entropy solution is essentially equivalent to the intrinsic estimator (the perpendicular solution). Though not specifically addressing APC models, Press et al. (1992) noted that the perpendicular solution is the singular value decomposition solution and that it produces a minimum norm solution. Various versions of the perpendicular solution have been used in dozens of studies during the past decade, with dependent variables ranging from blood pressure (Tao et al., 2013) to hepatocellular carcinoma mortality (Tu et al., 2011) and from marijuana use (Miech and Koester, 2012) to religious beliefs and activities (Schwadel, 2011). These methods have also been rediscovered in a different context (Fukuda, 2014).
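Several of these equivalences can be checked numerically on a small example. The following Python/NumPy sketch is illustrative only: `apc_design` is a hypothetical helper that builds the effect-coded design matrix described in Section 2, the outcome is synthetic random noise, and only four of the listed routes are computed (the Moore–Penrose inverse, NumPy's minimum-norm least squares, a singular value decomposition that keeps every component with a non-zero singular value, and ridge regression with a vanishing penalty). Partial least squares and the maximum entropy solution are not shown, and the cutoff `1e-10` for treating a singular value as zero is an assumption of the sketch.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design matrix.

    One row per cell (i, j) of an I x J age-period table (ages outer, periods
    inner); cohort index k = I - i + j, with K = I + J - 1 cohorts. The last
    age, period, and cohort are the reference categories.
    """
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]  # intercept
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0   # the cell's own category
                else:
                    block[:] = -1.0        # reference category: -1 in every column
                row.extend(block)
            rows.append(row)
    return np.array(rows)


I, J = 5, 4
X = apc_design(I, J)                         # 20 x 15, rank deficient by one
rng = np.random.default_rng(0)
y = rng.normal(size=X.shape[0])              # synthetic outcome, illustration only

# (1) Moore-Penrose generalized inverse (explicit cutoff for the zero singular value)
b_mp = np.linalg.pinv(X, rcond=1e-10) @ y

# (2) NumPy least squares: returns the minimum-norm solution when X is rank deficient
b_lstsq = np.linalg.lstsq(X, y, rcond=1e-10)[0]

# (3) SVD keeping only components with non-zero singular values
U, d, Vt = np.linalg.svd(X, full_matrices=False)
keep = d > d.max() * 1e-10
b_svd = Vt[keep].T @ ((U[:, keep].T @ y) / d[keep])

# (4) Ridge regression with a very small penalty
lam = 1e-8
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# The discarded right singular vector spans the null space of X
v = Vt[~keep][0]

for name, b in [("lstsq", b_lstsq), ("svd", b_svd), ("ridge", b_ridge)]:
    print(name, "max difference from Moore-Penrose:", np.max(np.abs(b - b_mp)))
print("b_mp . v =", b_mp @ v)
```

The printed differences should be tiny (the ridge one is limited by the small but non-zero penalty), and the dot product with the null vector should be zero up to rounding error.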

These solutions are equivalent: they all produce the solution that is perpendicular to the null vector, which I label the “perpendicular solution.” Given the common use of the perpendicular solution in empirical studies and its multiple discoveries and rediscoveries, it is important to explicate the properties of this particular solution. What are its algebraic characteristics? What are its geometric characteristics? What are its statistical characteristics?

2 The APC multiple classification model

2.1 The model

The standard APC multiple classification model can be conceptualized as modeling an age–period table. Table 1 shows the typical setup using Frost’s (1939) tuberculosis data for males aged 0–9 to 60–69 for the periods 1880–1930. The interval between periods equals the width of each age group: the 10-year age groups 0–9, 10–19, …, 60–69 are matched with the periods 1880, 1890, …, 1930, which are 10 years apart. The cells contain the age–period-specific male tuberculosis mortality rates per 100,000 male residents (Frost did not supply counts), which form the dependent variable. With this setup, the earliest birth cohort (1810–1819) is represented by the bottom left-hand cell (475 per 100,000), and the next earliest birth cohort (1820–1829) is represented by the two cells on the downward diagonal starting with those aged 50–59 in 1880 and ending with those aged 60–69 in 1890. We proceed in this manner until the most recent cohort (1920–1929), which is represented by a single observation: the upper right-hand cell of the age–period table. The task is to model these cell values as dependent on the age, period, and cohort effects, with ages on the rows, periods on the columns, and cohorts on the diagonals.

Table 1

Frost’s (1939) tuberculosis mortality rates per 100,000 for males*

Ages	Periods
	1880	1890	1900	1910	1920	1930
0–9	402	314	170	115	66	26
10–19	126	115	90	63	49	21
20–29	444	361	288	207	149	81
30–39	378	368	296	253	164	115
40–49	364	336	253	253	175	118
50–59	366	325	267	252	171	127
60–69	475	340	304	246	172	95

^[2]

A standard depiction of the APC multiple classification model is

$$y_{ij} = \mu + a_i + p_j + c_k + \varepsilon_{ij} \qquad [1]$$

where $y_{ij}$ is the observed value of the dependent variable for the $ij$th cell of the age–period table, $\mu$ is the intercept, $a_i$ is the $i$th age effect, $p_j$ is the $j$th period effect, $c_k$ is the $k$th cohort effect, and $\varepsilon_{ij}$ is the residual for the $ij$th cell. There are I ages, J periods, and K = I + J − 1 cohorts, where the cohort index is k = I − i + j. I use analysis of variance coding (effect coding), where the parameters in eq. [1] are subject to the following constraints:

$$\sum_{i=1}^{I} a_i = \sum_{j=1}^{J} p_j = \sum_{k=1}^{K} c_k = 0 \qquad [2]$$

When analyzing rates, the natural log of $y_{ij}$ is used as the dependent variable.

Equation [1] can be rewritten in matrix notation as:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \boldsymbol{\varepsilon}, \qquad [3]$$

where X is the matrix of independent variables, with the first column containing all ones for the intercept, the next I−1 columns containing the age-group effects except for the reference category for ages, the next J−1 columns containing the period effects except for the reference category for periods, and the final K−1 = I+J−2 columns containing the cohort effects except for the reference category for cohorts. For an I by J age–period table there are therefore 1 + (I−1) + (J−1) + (I+J−2) = 2(I+J)−3 columns in the X-matrix. The number of rows in the matrix is I×J, one for each cell in the age–period table. The vector b is a column vector of 2(I+J)−3 elements that correspond to the intercept, age, period, and cohort coefficients (for the non-reference categories). The y vector is an I×J column vector of the observed values of the outcome (dependent variable), and ε is an I×J column vector of residuals.

The age, period, and cohort effects in X are effect coded. Each row corresponds to a particular cell in the age–period table and each column in a row contains a 1 if that column corresponds to the age-group, period or cohort for the cell represented by that row; a row representing a cell that is in the reference category is coded with a minus 1 on the column variables for which it is the reference category. All other entries are coded as zero.
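The coding just described can be sketched in Python with NumPy as follows. The helper name `apc_design` is hypothetical; cells are ordered by age and then by period (the rows and columns of an age–period table such as Table 1), and the cohort index is k = I − i + j, so that the oldest age group in the first period falls in the earliest cohort.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design matrix.

    One row per cell (i, j) of an I x J age-period table (ages outer, periods
    inner); cohort index k = I - i + j, with K = I + J - 1 cohorts. The last
    age, period, and cohort are the reference categories.
    """
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]  # intercept
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0   # the cell's own category
                else:
                    block[:] = -1.0        # reference category: -1 in every column
                row.extend(block)
            rows.append(row)
    return np.array(rows)


X = apc_design(7, 6)                         # Frost's table: 7 age groups, 6 periods
print(X.shape)                               # (42, 23): I*J rows, 2(I+J)-3 columns
print(np.linalg.matrix_rank(X, tol=1e-8))    # one less than the number of columns
print(X[-1])                                 # cell 60-69 in 1930: reference age and period
```

For Frost's 7 × 6 table this gives 42 rows and 1 + 6 + 5 + 11 = 23 columns, but only 22 linearly independent columns: the design is rank deficient by one.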

In the APC model, X is not of full column rank; its columns are not linearly independent. This means that there is a non-zero vector v that, when premultiplied by X, produces a zero vector with I×J elements. This vector is referred to as the null vector. It is unique up to multiplication by a scalar, so we may write $\mathbf{X}(s\mathbf{v}) = \mathbf{0}$, where s is a scalar and v is the null vector. ^[2] The null vector plays a major role in this paper. We first use it to derive an important feature of the solutions to the APC multiple classification model: all of the just identifying constrained APC solutions, each of which fits the data equally well, lie on a single line in multidimensional solution space (Kupper et al., 1983; O’Brien, 2011).

For ordinary least squares, a solution is a vector of parameter estimates that minimizes the sum of the squared residuals, $\sum_{i,j} (y_{ij} - \hat{y}_{ij})^2$, where $y_{ij}$ is the observed value of the dependent variable and $\hat{y}_{ij}$ is the corresponding predicted (expected) value from the linear equation

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}_{c1}. \qquad [4]$$

In the APC context, $\mathbf{b}_{c1}$ represents a solution vector (under the constraint c1) that produces a least squares solution.

Given the rank deficiency of X, the X-matrix post-multiplied by the null vector equals the zero vector. Thus we may write:

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}_{c1} + \mathbf{X}s\mathbf{v} = \mathbf{X}(\mathbf{b}_{c1} + s\mathbf{v}) = \mathbf{X}\mathbf{b}_{c}, \qquad [5]$$

where $\mathbf{b}_c$ represents any of the infinite number of least squares solutions for eq. [4] when X is rank deficient by one. This means that any solution of the form $\mathbf{b}_{c1} + s\mathbf{v}$ produces the same best-fitting predicted values. These solutions all lie on a line in multidimensional solution space. We represent this line of solutions as $\mathbf{b}_c = \mathbf{b}_{c1} + s\mathbf{v}$, where $\mathbf{b}_{c1}$ is one of the infinite number of best-fitting solutions and $\mathbf{b}_c$ represents any of the constrained solution vectors that provide a best-fitting solution (as s changes value, it traces out the line of solutions). The just identifying constrained solutions are related to each other in that they all lie on the line of solutions and differ from one another by $s\mathbf{v}$ for some scalar s.
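The line of solutions can be illustrated with the logged rates in Table 1. This is a sketch under stated assumptions: it uses NumPy, the hypothetical helpers `apc_design` (effect coding as in Section 2) and `null_vector` (eq. [8]), and it takes the Moore–Penrose solution only as a convenient member $\mathbf{b}_{c1}$ of the line; any other best-fitting solution would serve equally well. Adding $s\mathbf{v}$ for several values of s leaves the fitted values and the residual sum of squares unchanged.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design (last categories are references)."""
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]  # intercept
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0
                else:
                    block[:] = -1.0
                row.extend(block)
            rows.append(row)
    return np.array(rows)


def null_vector(I, J):
    """Hypothetical helper: null vector of eq. [8] (intercept, ages, periods, cohorts)."""
    age = np.arange(1, I) - (I + 1) / 2
    period = -(np.arange(1, J) - (J + 1) / 2)
    cohort = np.arange(1, I + J - 1) - (I + J) / 2
    return np.concatenate(([0.0], age, period, cohort))


# Table 1: rows are ages 0-9 ... 60-69, columns are periods 1880 ... 1930
rates = np.array([
    [402, 314, 170, 115,  66,  26],
    [126, 115,  90,  63,  49,  21],
    [444, 361, 288, 207, 149,  81],
    [378, 368, 296, 253, 164, 115],
    [364, 336, 253, 253, 175, 118],
    [366, 325, 267, 252, 171, 127],
    [475, 340, 304, 246, 172,  95],
], dtype=float)
I, J = rates.shape
y = np.log(rates).ravel()                     # logged rates, ages outer / periods inner
X = apc_design(I, J)
v = null_vector(I, J)

b_c1 = np.linalg.pinv(X, rcond=1e-10) @ y     # one best-fitting solution
for s in (-1.0, 0.5, 2.0):
    b_c = b_c1 + s * v                        # another point on the line of solutions
    same_fit = np.allclose(X @ b_c, X @ b_c1)
    rss = np.sum((y - X @ b_c) ** 2)
    print(f"s = {s:+.1f}: same fitted values = {same_fit}, RSS = {rss:.6f}")
```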

This result generalizes to the solutions based on generalized linear models (for example, logistic regression or Poisson regression) where we can write the model as:

$$g(\mathbf{u}) = \mathbf{X}\mathbf{b}_{c1}, \qquad [6]$$

where g is a link function, u is the expected value of the outcome variable, and $\mathbf{b}_{c1}$ is a constrained solution to the model (under constraint c1). Here, $\mathbf{b}_{c1}$ is one of the constrained solutions that identifies the model under the constraint and thus provides the best fit for the dependent variable. In this situation, where X is rank deficient by one, we can write eq. [6] as

$$g(\mathbf{u}) = \mathbf{X}\mathbf{b}_{c1} + \mathbf{X}s\mathbf{v} = \mathbf{X}(\mathbf{b}_{c1} + s\mathbf{v}) = \mathbf{X}\mathbf{b}_{c}. \qquad [7]$$

The discussion below holds for generalized linear models as well as ordinary least squares.
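For a generalized linear model the same invariance can be sketched with a Poisson regression and a log link. The counts below are synthetic (Frost did not supply counts), the helpers `apc_design`, `null_vector`, `poisson_irls`, and `poisson_loglik` are hypothetical, and the model is fitted by iteratively reweighted least squares with a pseudoinverse at each step; this is one convenient way to reach a maximum likelihood solution, not the algorithm of any particular package. Because $\mathbf{X}(s\mathbf{v}) = \mathbf{0}$, every point on the line of solutions has the same linear predictor and therefore the same log-likelihood.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design (last categories are references)."""
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0
                else:
                    block[:] = -1.0
                row.extend(block)
            rows.append(row)
    return np.array(rows)


def null_vector(I, J):
    """Hypothetical helper: null vector of eq. [8]."""
    age = np.arange(1, I) - (I + 1) / 2
    period = -(np.arange(1, J) - (J + 1) / 2)
    cohort = np.arange(1, I + J - 1) - (I + J) / 2
    return np.concatenate(([0.0], age, period, cohort))


def poisson_irls(X, counts, n_iter=50):
    """Poisson log-link fit by IRLS; each weighted LS step uses the pseudoinverse,
    so every iterate is the minimum-norm solution of that step."""
    mu = counts + 0.5                           # starting fitted means
    eta = np.log(mu)
    for _ in range(n_iter):
        z = eta + (counts - mu) / mu            # working response
        w = np.sqrt(mu)                         # square roots of the IRLS weights
        b = np.linalg.pinv(X * w[:, None], rcond=1e-10) @ (w * z)
        eta = X @ b
        mu = np.exp(eta)
    return b


def poisson_loglik(X, b, counts):
    eta = X @ b
    return np.sum(counts * eta - np.exp(eta))   # omits the constant log(y!) term


I, J = 5, 4
X = apc_design(I, J)
v = null_vector(I, J)
rng = np.random.default_rng(1)
b_true = np.concatenate(([4.0], rng.normal(scale=0.3, size=X.shape[1] - 1)))
counts = rng.poisson(np.exp(X @ b_true)).astype(float)   # synthetic counts

b_hat = poisson_irls(X, counts)
print("b_hat . v =", b_hat @ v)
for s in (0.0, 0.5, -2.0):
    print(f"s = {s:+.1f}: log-likelihood = {poisson_loglik(X, b_hat + s * v, counts):.6f}")
```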

When we identify eq. [3] using a single constraint such as age 1=age 2 or cohort 1=cohort 2 or using a more complex single constraint such as constraining the period coefficients to have a zero slope (Holford, 1983) or the constraint that produces the solution orthogonal to the null vector (Kupper et al., 1985), these solutions all lie on the line of solutions and fit the outcome variable equally well. We say that these solutions are identified under the particular constraint. Without a constraint they are not identified.

2.2 The null vector

Kupper et al. (1983) provide a convenient method for deriving the null vector when the age, period, and cohort variables are coded using effect coding for the multiple classification model. Based on their presentation, the null vector for the effect coded X-matrix is

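$$\Big[\,0;\ \big[\,i - \tfrac{I+1}{2}\,\big];\ \big[-\big(j - \tfrac{J+1}{2}\big)\big];\ \big[\,k - \tfrac{I+J}{2}\,\big]\Big] \qquad [8]$$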

The zero is the null vector element for the intercept. The elements of the null vector for the age, period, and cohort effects are within the first, second, and third inner square brackets in eq. [8], respectively. I and J are the numbers of age groups and periods, respectively: in the first square bracket i = 1 to I−1, in the second j = 1 to J−1, and in the third k = 1 to I+J−2. This assumes that the final categories are designated as the reference categories.

As an example, for a 4×4 age–period matrix the null vector elements for age are −1.5, −0.5, and 0.5; for period they are 1.5, 0.5, and −0.5; and for cohorts the null vector elements are −3, −2, −1, 0, 1, and 2. When we place these elements into the null vector including the zero as the first element for the intercept, the resulting vector premultiplied by X (based on a 4×4 age–period matrix that is effect coded) produces a zero vector.
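Equation [8] translates directly into code. The sketch below (Python with NumPy; `null_vector` and `apc_design` are hypothetical helper names) builds the null vector, checks it against the 4×4 example, and confirms that premultiplying by the effect-coded X gives a zero vector. Indices follow eq. [8], with the final categories as references.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design (last categories are references)."""
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0
                else:
                    block[:] = -1.0
                row.extend(block)
            rows.append(row)
    return np.array(rows)


def null_vector(I, J):
    """Null vector of eq. [8]: 0 for the intercept, then ages i = 1..I-1,
    periods j = 1..J-1, and cohorts k = 1..I+J-2."""
    age = np.arange(1, I) - (I + 1) / 2
    period = -(np.arange(1, J) - (J + 1) / 2)
    cohort = np.arange(1, I + J - 1) - (I + J) / 2
    return np.concatenate(([0.0], age, period, cohort))


v44 = null_vector(4, 4)
print(v44)
print("max |X v| for 4x4:", np.abs(apc_design(4, 4) @ v44).max())

v76 = null_vector(7, 6)          # dimensions of Frost's table (Table 1)
print(v76)
```

For I = 7 and J = 6 the same function reproduces the null-vector column of Table 2: −3 to 2 for ages, 2.5 to −1.5 for periods, and −5.5 to 4.5 for cohorts.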

3 Representations of the perpendicular solution

What does it mean for the solution to be perpendicular to the null vector? Perhaps the best way to think about it is geometrically. The null vector and the line of solutions lie in an m-dimensional solution space, where m = 2(I+J)−3, and the two lines are parallel to each other in this space, since they share the same direction v. The null vector is v and the line of solutions is $\mathbf{b}_{c1} + s\mathbf{v}$. We depict these two lines in Figure 1, which ignores all but two of the dimensions of this space. Since the two lines are parallel, they lie in a plane in the m-dimensional space; the page on which Figure 1 is drawn represents this plane. The null vector passes through the origin of the m-dimensional space. When constrained regression is used to produce a solution on the line of solutions, a hyperplane that passes through the origin of the m-dimensional space is constrained to intersect this line of solutions (O’Brien, 2012). In Figure 1, that hyperplane appears as a line, since it cuts across the plane of the page. The solution on the line of solutions that is perpendicular to the null vector (at the origin) is the perpendicular solution, which I label b⊥. The perpendicular constraint uses the hyperplane through the origin that is perpendicular to the null vector, and this hyperplane intersects the line of solutions at b⊥. Other solutions based on other constraints can be depicted in Figure 1 as lines from the origin to other points on the line of solutions: $\mathbf{b}_{c1}$ and $\mathbf{b}_{c2}$. ^[3]

Figure 1

Schematic representation of the null vector, the line of solutions, the perpendicular solution, and other constrained solutions

Figure 1 shows that the lengths of these lines (solution vectors), from the origin to the line of solutions, are all longer than the length of the line perpendicular to the null vector. That is, the perpendicular solution is a minimum norm solution. We write $\|\mathbf{b}_\perp\| < \|\mathbf{b}_c\|$ for any constrained solution other than the perpendicular solution. The double bars mean that the elements of the solution vector are squared, then summed, and then have the square root taken: for example, $\sqrt{\mathbf{b}'\mathbf{b}} = \|\mathbf{b}\|$, which is the length of the vector b.

An algebraic way of showing that the perpendicular solution is perpendicular to the null vector is to compute the dot product of the perpendicular solution vector and the null vector. The dot product is zero exactly when the two (non-zero) vectors are perpendicular to each other: b⊥⋅v = 0. The geometry of being orthogonal to the null vector, the algebraic criterion (b⊥⋅v = 0), and the minimum distance interpretation are perhaps the most immediately grasped characteristics of the perpendicular solution.
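A short derivation using only standard vector algebra (a sketch, not a step quoted from the cited sources) connects the minimum norm property to the algebraic criterion. Write any best-fitting solution as the perpendicular solution plus a multiple of the null vector:

$$\mathbf{b}_c = \mathbf{b}_\perp + s\,\mathbf{v}, \qquad \mathbf{b}_\perp \cdot \mathbf{v} = 0.$$

Then

$$\|\mathbf{b}_c\|^2 = \|\mathbf{b}_\perp\|^2 + 2s\,(\mathbf{b}_\perp \cdot \mathbf{v}) + s^2\|\mathbf{v}\|^2 = \|\mathbf{b}_\perp\|^2 + s^2\|\mathbf{v}\|^2,$$

so $\|\mathbf{b}_c\| > \|\mathbf{b}_\perp\|$ whenever $s \neq 0$. Taking the dot product of the first equation with v gives $s = (\mathbf{b}_c \cdot \mathbf{v})/(\mathbf{v} \cdot \mathbf{v})$, so the perpendicular solution can be recovered from any other constrained solution by removing its component along the null vector:

$$\mathbf{b}_\perp = \mathbf{b}_c - \frac{\mathbf{b}_c \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}\,\mathbf{v}.$$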

Another way to gain algebraic/geometric insight about the perpendicular solution focuses on the linear components of the age coefficients, period coefficients, and cohort coefficients produced by the perpendicular solution. The constraint for the perpendicular solution is b⊥⋅v=0, which includes only the non-reference categories. Equation [9] shows the constraint imposed by the perpendicular solution on the linear components of age, period, and cohort (O’Brien, 2013):

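$$\text{Intercept}\cdot 0 + \sum_{i=1}^{I-1} (a_0 + i\,t_a)\, v_i^{a} + \sum_{j=1}^{J-1} (p_0 + j\,t_p)\, v_j^{p} + \sum_{k=1}^{I+J-2} (c_0 + k\,t_c)\, v_k^{c} = 0 \qquad [9]$$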

Equation [9] treats the final categories of ages, periods, and cohorts as the reference categories. Within the parentheses are the linear components of the age coefficients, of the period coefficients, and of the cohort coefficients. For example, after computing the perpendicular solution, we can obtain the intercept $a_0$ and the linear trend $t_a$ of the age coefficients by regressing the age coefficients on the numbers 1, 2, …, I−1. A similar procedure yields the intercepts and trends of the period (p) and cohort (c) coefficients. The elements of the null vector that correspond to the age coefficients are written as $v_i^a$ for i = 1 to I−1, those for the periods as $v_j^p$ for j = 1 to J−1, and those for the cohorts as $v_k^c$ for k = 1 to I+J−2. The first summation in eq. [9] is the sum of the linear components of age times their corresponding null vector elements, and the second and third summations are the corresponding sums for periods and cohorts. These products sum to zero, which means that the combined linear components of the age, period, and cohort coefficients of the perpendicular solution are perpendicular to the null vector. Only the linear components matter here because, by eq. [8], the null vector elements within each block are themselves linear in the category index, so the deviations of the coefficients around their linear components are orthogonal to the null vector and drop out of b⊥⋅v. ^[4]
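Equation [9] can be checked with the perpendicular solution reported in Table 2 for Frost's data. The sketch assumes NumPy and uses `np.polyfit` to obtain each block's linear component; only the non-reference coefficients enter, as in eq. [9], and the helper name `linear_component` is hypothetical.

```python
import numpy as np

# Perpendicular solution from Table 2, non-reference categories only
age = np.array([-0.069, -0.922, 0.173, 0.220, 0.173, 0.204])       # i = 1..I-1
period = np.array([0.554, 0.410, 0.143, -0.037, -0.336])            # j = 1..J-1
cohort = np.array([0.299, 0.081, 0.141, 0.149, 0.208, 0.173,
                   0.271, 0.196, 0.058, -0.105, -0.442])           # k = 1..I+J-2
I, J = 7, 6

# Null-vector elements from eq. [8]
v_age = np.arange(1, I) - (I + 1) / 2
v_period = -(np.arange(1, J) - (J + 1) / 2)
v_cohort = np.arange(1, I + J - 1) - (I + J) / 2


def linear_component(coefs):
    """Fitted values from regressing coefs on 1, 2, ..., len(coefs)."""
    idx = np.arange(1, len(coefs) + 1)
    slope, intercept = np.polyfit(idx, coefs, 1)
    return intercept + slope * idx


# Left-hand side of eq. [9]; the intercept term is 5.090 * 0 = 0
lhs_eq9 = (linear_component(age) @ v_age
           + linear_component(period) @ v_period
           + linear_component(cohort) @ v_cohort)
# Full dot product of the perpendicular solution with the null vector
full_dot = age @ v_age + period @ v_period + cohort @ v_cohort
print(round(lhs_eq9, 4), round(full_dot, 4))
```

Both numbers come out at about −0.0025, which is zero up to the three-decimal rounding of Table 2, and the two agree to floating-point precision.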

This version of the perpendicular constraint is quite important, since each of the various constraints constrains the linear components of the solution. The deviations of the age, period, and cohort coefficients around their linear components are the same for all of the just identifying constrained solutions; that is, they are estimable functions (O’Brien, 2014a). The linear components are fixed by the constraint.
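Using the two solutions in Table 2, including the reference-category values, the following NumPy sketch splits each block of coefficients into a linear trend and deviations around it. The trends differ between the two constrained solutions, while the deviations agree up to the rounding of the tabled values; the helper name `split_linear` is hypothetical.

```python
import numpy as np

# Table 2 coefficients for all categories, including the reference categories
perp = {
    "age": [-0.069, -0.922, 0.173, 0.220, 0.173, 0.204, 0.221],
    "period": [0.554, 0.410, 0.143, -0.037, -0.336, -0.735],
    "cohort": [0.299, 0.081, 0.141, 0.149, 0.208, 0.173,
               0.271, 0.196, 0.058, -0.105, -0.442, -1.028],
}
zero_trend = {
    "age": [0.690, -0.416, 0.426, 0.220, -0.080, -0.302, -0.538],
    "period": [-0.079, 0.031, 0.016, 0.090, 0.044, -0.102],
    "cohort": [1.691, 1.220, 1.027, 0.782, 0.587, 0.299,
               0.145, -0.183, -0.575, -0.990, -1.581, -2.420],
}


def split_linear(coefs):
    """Return (slope, deviations) from regressing coefs on 1, 2, ..., n."""
    coefs = np.asarray(coefs)
    idx = np.arange(1, len(coefs) + 1)
    slope, intercept = np.polyfit(idx, coefs, 1)
    return slope, coefs - (intercept + slope * idx)


results = {}
for key in ("age", "period", "cohort"):
    slope_p, dev_p = split_linear(perp[key])
    slope_z, dev_z = split_linear(zero_trend[key])
    results[key] = (slope_z - slope_p, np.max(np.abs(dev_z - dev_p)))
    print(f"{key:6s} slope difference {results[key][0]:+.3f}, "
          f"max deviation difference {results[key][1]:.4f}")
```

The slope differences are about −0.253 for ages, +0.253 for periods, and −0.253 for cohorts, following the signs of the null-vector trends in eq. [8]; the deviations differ by less than 0.003.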

4 Statistical properties of the perpendicular solution

4.1 Minimum norm

As noted above, the perpendicular solution is a minimum norm solution, and this can be considered a statistical property of the solution. The solution vector b that minimizes the distance ‖y − Xb‖ is the least squares solution. In the rank deficient by one case, which we encounter in the APC model, this criterion of best fit does not yield a unique solution, because every solution on the line of solutions minimizes this distance. Only one of these solutions, however, also minimizes ‖b‖: that is, b⊥. The same holds for generalized linear models, except that the fit criterion is no longer the sum of squared deviations; b⊥ is still the shortest of the solution vectors on the line of solutions that produce the best-fitting model for the specific generalized linear model.

4.2 Smallest variance

Kupper et al. (1983) note that in the special case where the constraint equals the null vector c=v the estimate “deals with the exact linear dependency… by involving only the non-zero eigenvalues associated with the eigenvectors… in its formulation.” They note that since the other constrained estimators “do not deal directly with this multicollinearity problem, one can argue that [this solution] is to be preferred on variance grounds” (p. 2797). The perpendicular solution is perpendicular to the vector that is most highly related to the column vectors in X: that is, the vector that shows that the columns are linearly dependent. Kupper et al. (1983) realized that this would reduce the variance of the estimated coefficients just as principal components regression is designed to do. Although they note the potential of this solution to reduce the variance of the estimates, they emphasize that it could easily lead to more bias than some of the other constrained solutions: “so that the optimal method for choosing c should probably take into account both bias and variance (i.e., mean square error) considerations. Of course, when the squared multiple correlation coefficient R2 is fairly close to 1, a result which seems to occur not infrequently in practice, then bias becomes the main area of concern” (p. 2797). Yang et al. (2004) in appendix B of their paper prove that the perpendicular solution “has a variance smaller than that of any CGLIM [constrained generalized linear model] estimator – i.e., var(b)−var(B) is positive-definite for a nontrivial identifying constraint,” where B represents the perpendicular solution and b represents any of the other just identified constrained solutions.

Given that the constrained solutions all fit the data equally well (while often providing very disparate estimates of the age, period, and cohort effects), this minimum variance property is not a compelling reason for using the perpendicular constraint.

5 An empirical example using Frost’s data

We use Frost’s (1939) data in an empirical example. I have modified the data somewhat to make them appropriate for the classical constrained regression approach to APC analysis: I combined the two age categories 0–4 and 5–9 into a single 10-year age category 0–9 (averaging the rates per 100,000 in these two age groups) and eliminated the age category 70 plus. This results in the data in Table 1. The results in Table 2 are based on two different constraints: the perpendicular constraint and the zero linear trend in periods constraint. I have also included the null vector in the final column; note that it has elements only for the non-reference categories (those categories that have columns in the X-matrix). The dot product of the perpendicular solution and the null vector is zero, up to rounding of the tabled coefficients: $5.090 \times 0 + (-0.069)(-3) + \cdots + (-0.442)(4.5) \approx 0$. That the trend of the period coefficients is zero in the zero linear trend in periods solution can easily be verified by regressing the period coefficients on time (e.g., 1, 2, 3, 4, 5, and 6). The zero linear trend in periods solution is also perpendicular to its constraint: the constraint used to produce this solution is $-5\,p_1 - 4\,p_2 - 3\,p_3 - 2\,p_4 - 1\,p_5 = 0$, and the dot product of the solution and its constraint is $(-5)(-0.079) + (-4)(0.031) + \cdots + (-1)(0.044) \approx 0$. All of the solutions that use just identifying constraints for the APC model are perpendicular to their constraints. Not surprisingly, the length of the perpendicular solution vector (5.297) is less than the length of the zero linear trend in periods solution vector (6.100).
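The checks described in this paragraph can be reproduced from the rounded values in Table 2. The NumPy sketch lists only the non-reference coefficients (those with columns in X), in the order intercept, ages, periods, cohorts; the constraint vector `c_zero` encodes $-5\,p_1 - 4\,p_2 - 3\,p_3 - 2\,p_4 - 1\,p_5 = 0$.

```python
import numpy as np

# Table 2, non-reference categories only, in the order
# intercept, ages 0-9 ... 50-59, periods 1880 ... 1920, cohorts 1810-1819 ... 1910-1919
b_perp = np.array([5.090,
                   -0.069, -0.922, 0.173, 0.220, 0.173, 0.204,
                   0.554, 0.410, 0.143, -0.037, -0.336,
                   0.299, 0.081, 0.141, 0.149, 0.208, 0.173,
                   0.271, 0.196, 0.058, -0.105, -0.442])
b_zero = np.array([5.090,
                   0.690, -0.416, 0.426, 0.220, -0.080, -0.302,
                   -0.079, 0.031, 0.016, 0.090, 0.044,
                   1.691, 1.220, 1.027, 0.782, 0.587, 0.299,
                   0.145, -0.183, -0.575, -0.990, -1.581])
v = np.array([0,
              -3, -2, -1, 0, 1, 2,
              2.5, 1.5, 0.5, -0.5, -1.5,
              -5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5])

# Zero linear trend in periods: -5 p1 - 4 p2 - 3 p3 - 2 p4 - 1 p5 = 0
c_zero = np.zeros(23)
c_zero[7:12] = [-5, -4, -3, -2, -1]

print("b_perp . v      =", round(b_perp @ v, 4))
print("b_zero . c_zero =", round(b_zero @ c_zero, 4))
print("||b_perp||      =", round(np.linalg.norm(b_perp), 3))
print("||b_zero||      =", round(np.linalg.norm(b_zero), 3))
```

The two dot products are within a few thousandths of zero (about −0.0025 and −0.001), the residue of three-decimal rounding, and the lengths are about 5.297 and 6.100.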

Table 2

Constrained solutions for the Frost data using the perpendicular constraint and the zero linear trend in periods constraint

	Perpendicular solution	Zero linear trend in period	Null vector
Intercept	5.090	5.090	0
0–9	−0.069	0.690	−3
10–19	−0.922	−0.416	−2
20–29	0.173	0.426	−1
30–39	0.220	0.220	0
40–49	0.173	−0.080	1
50–59	0.204	−0.302	2
60–69	0.221	−0.538
1880	0.554	−0.079	2.5
1890	0.410	0.031	1.5
1900	0.143	0.016	0.5
1910	−0.037	0.090	−0.5
1920	−0.336	0.044	−1.5
1930	−0.735	−0.102
1810–1819	0.299	1.691	−5.5
1820–1829	0.081	1.220	−4.5
1830–1839	0.141	1.027	−3.5
1840–1849	0.149	0.782	−2.5
1850–1859	0.208	0.587	−1.5
1860–1869	0.173	0.299	−0.5
1870–1879	0.271	0.145	0.5
1880–1889	0.196	−0.183	1.5
1890–1899	0.058	−0.575	2.5
1900–1909	−0.105	−0.990	3.5
1910–1919	−0.442	−1.581	4.5
1920–1929	−1.028	−2.420

Note that both of these solutions fit the data equally well; this is true for any just identified constrained solution. With effect coding, the intercept (5.090) is the same for all solutions on the line of solutions, as is the coefficient of any element that has a zero in the corresponding null vector element: here the null vector element for age 30–39 is zero, and the two coefficients are the same (0.220). This must occur because the different just identified solutions differ by sv in the equation $\mathbf{b}_c = \mathbf{b}_{c1} + s\mathbf{v}$. The other coefficients differ between the two solutions.
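Because the two solutions lie on the same line, their difference should be a multiple of the null vector. The NumPy sketch below estimates that multiple from the Table 2 values by least squares, reports how far the difference departs from sv, and confirms that the entries with a zero null-vector element do not change. The last line compares $\|\mathbf{b}_{zero}\|^2$ with $\|\mathbf{b}_\perp\|^2 + s^2\|\mathbf{v}\|^2$, which should agree apart from rounding.

```python
import numpy as np

# Table 2, non-reference categories only, in the order
# intercept, ages 0-9 ... 50-59, periods 1880 ... 1920, cohorts 1810-1819 ... 1910-1919
b_perp = np.array([5.090,
                   -0.069, -0.922, 0.173, 0.220, 0.173, 0.204,
                   0.554, 0.410, 0.143, -0.037, -0.336,
                   0.299, 0.081, 0.141, 0.149, 0.208, 0.173,
                   0.271, 0.196, 0.058, -0.105, -0.442])
b_zero = np.array([5.090,
                   0.690, -0.416, 0.426, 0.220, -0.080, -0.302,
                   -0.079, 0.031, 0.016, 0.090, 0.044,
                   1.691, 1.220, 1.027, 0.782, 0.587, 0.299,
                   0.145, -0.183, -0.575, -0.990, -1.581])
v = np.array([0,
              -3, -2, -1, 0, 1, 2,
              2.5, 1.5, 0.5, -0.5, -1.5,
              -5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5])

d = b_zero - b_perp                  # difference between the two solutions
s = (d @ v) / (v @ v)                # least-squares estimate of s in d = s v
resid = d - s * v                    # should be rounding noise only
print("s =", round(s, 4))
print("largest |d - s v| =", round(np.abs(resid).max(), 4))
print("differences where v = 0:", d[v == 0])
print(round(np.linalg.norm(b_zero) ** 2, 3),
      round(np.linalg.norm(b_perp) ** 2 + s ** 2 * (v @ v), 3))
```

With these values s is about −0.253 and the largest departure from sv is below 0.001, consistent with rounding.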

Figure 2

Plot of the regression coefficients of the logged tuberculosis rates under the perpendicular constraint and the zero linear trend in periods constraint

Figure 2 shows the pattern of the discrepancies between these solutions. Given improvements in care, it seems unlikely that there was no trend in period effects associated with the decrease in the age–period-specific tuberculosis mortality rates over time, but that is what the zero linear trend constraint assumes, and because of this it assigns any underlying linear trend in period effects to cohorts and ages. ^[5] It is also unlikely that the combined linear trends of the underlying age, period, and cohort effects are orthogonal to the null vector, which is the condition that the perpendicular constraint imposes. Note the difference the two constraints make. The zero linear trend constraint for periods produces age effects that correspond to those suggested by Frost (1939) when he examined the age distribution within cohorts: a large age effect for the youngest age group, then a steep drop, a second peak for those aged 20–29, and then a monotonic decrease. This does not mean that the zero linear trend for periods specification is correct, but it does result in a plausible distribution of age effects. Frost thought that the cohorts drove the changes in the age distribution across periods, and that is certainly the case in the zero linear trend results in Figure 2. Again, that does not make this specification correct. ^[6]

Using the perpendicular constraint results in a very slight increase in the age effects after a peak for age 20–29, no substantial drop in the cohort effects until the cohort born in 1870–1879, and a monotonic decrease in the cohort effects from that point forward. There is also a monotonic drop in the period effects. It is certainly possible to construct a plausible explanation for this pattern of results. One colleague of mine has noted that interpreting different just identified APC models is like a Rorschach test: the test taker can always find something in the pattern.

6 What the perpendicular solution does not do

6.1 Provide an unbiased solution for the “generating parameters”

The perpendicular solution does not solve the identification problem in the sense of producing unbiased estimates of the parameter values that generated the outcome data. There is probably near universal agreement on this point among statisticians, but it is far too easy for substantive researchers to think that the perpendicular solution somehow solves the identification problem. Given that the X-matrix is rank deficient by one, such a solution does not exist without the researcher imposing a constraint that selects one of the infinite number of best-fitting solutions on the line of solutions. The perpendicular solution imposes a constraint that is more complex than the constraints imposed by most researchers, but it is a constraint nonetheless, and that constraint determines a particular solution on the line of solutions. Only one of the solutions on the line of solutions is an unbiased estimator of the parameters that generated the outcome data. Even so, in the published literature some researchers proceed as if the perpendicular solution should be considered an appropriate estimator of the substantive effects of ages, periods, and cohorts (see the citations in this paper and Luo, 2013).

6.2 Provide age, period, and cohort effects as estimable functions

The perpendicular solution does not provide estimable functions of the individual age, period, and cohort effects in the classical sense of Searle (1971), Scheffé (1950), and Bose (1944). ^[7] This statement is easy to verify, since the perpendicular solution does not meet the necessary and sufficient condition for an estimable function of the individual age, period, and cohort coefficients derived by Searle (1971), which is equivalent to the condition stated in Scheffé (1950) and based upon Bose (1944). Searle (1971:185) states that a given function “q′b is estimable if and only if q′H=q′,” where q′ is a vector and H = GX′X. Here, G is any one of the generalized inverses of X′X, each of which produces a solution on the line of solutions. ^[8] For example, the age 1 coefficient is not estimable. If age 1 corresponds to the second column in the X-matrix, then the q vector is (0, 1, 0, 0, 0, …, 0)′ with 2(I+J)−3 elements, and q′H ≠ q′. Using this necessary and sufficient condition, the age 1 coefficient of the APC model, whether estimated from the perpendicular solution or from any other solution, is shown not to be estimable (O’Brien, 2014b).
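Searle's condition can be checked numerically for Frost's 7 × 6 design. In this NumPy sketch the helpers `apc_design` and `searle_estimable` are hypothetical, and two different generalized inverses of X′X are used: the Moore–Penrose inverse and one obtained by inverting X′X with its last row and column removed (valid here because dropping the last cohort column removes the single linear dependency). The verdict should not depend on which generalized inverse is chosen.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design (last categories are references)."""
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0
                else:
                    block[:] = -1.0
                row.extend(block)
            rows.append(row)
    return np.array(rows)


X = apc_design(7, 6)
XtX = X.T @ X
p = XtX.shape[0]

# Two different generalized inverses G of X'X
G_mp = np.linalg.pinv(XtX, rcond=1e-10)          # Moore-Penrose inverse
G_sub = np.zeros((p, p))                         # invert X'X without its last row/column
G_sub[:-1, :-1] = np.linalg.inv(XtX[:-1, :-1])


def searle_estimable(q, G, XtX, tol=1e-8):
    """Searle's condition: q'b is estimable iff q'H = q', with H = G X'X."""
    H = G @ XtX
    return np.allclose(q @ H, q, atol=tol)


q_age1 = np.zeros(p)
q_age1[1] = 1.0                                  # age 1 (ages 0-9) coefficient
q_int = np.zeros(p)
q_int[0] = 1.0                                   # intercept

for name, G in (("Moore-Penrose", G_mp), ("submatrix inverse", G_sub)):
    print(name, "age 1 estimable:", searle_estimable(q_age1, G, XtX),
          "intercept estimable:", searle_estimable(q_int, G, XtX))
```

Both generalized inverses give the same result: the age 1 coefficient fails the condition, while the intercept, whose null-vector element is zero, passes.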

A different criterion for estimable functions was introduced by Kupper et al. (1985): l′v = 0, where l is a vector that produces a linear combination of the elements of v. If this linear combination is zero, then the same linear combination of the elements of any of the solution vectors is estimable: it produces the same value no matter which constrained solution is used (O’Brien, 2014b). This criterion (as it should) agrees with that of Searle (1971). Yang and colleagues (2004, 2008) have modified the Kupper et al. (1985) criterion by allowing l to be a matrix rather than a vector and 0 to be a zero vector. With these changes in hand, they assert that the perpendicular solution is an estimable function. It is true that each of the rows of their matrix times v is zero, but most of the rows are linear combinations of several elements and not a linear combination consisting of a single age, period, or cohort effect (O’Brien, 2014b). In addition, we can use Kupper et al.’s criterion to show that the age 1 coefficient is not estimable by setting l′ = (0, 1, 0, 0, …, 0) and multiplying it by the null vector: the product is the age 1 element of v, which is not zero. ^[9] Kupper et al. (1985), on page 830 of their paper, demonstrate using their method that not all of the age, period, and cohort coefficients are estimable. For the individual elements of the solution vector, only those with zeros in the corresponding element of the null vector are estimable (O’Brien, 2014a, 2014b).
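The agreement between the two criteria can be sketched for every single coefficient of Frost's 7 × 6 design. This NumPy example uses the hypothetical helpers `apc_design`, `null_vector` (eq. [8]), and `kupper_estimable`, and, for Searle's condition, H = (X′X)⁺X′X.

```python
import numpy as np


def apc_design(I, J):
    """Hypothetical helper: effect-coded APC design (last categories are references)."""
    K = I + J - 1
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            row = [1.0]
            for idx, n in ((i, I), (j, J), (I - i + j, K)):
                block = np.zeros(n - 1)
                if idx < n:
                    block[idx - 1] = 1.0
                else:
                    block[:] = -1.0
                row.extend(block)
            rows.append(row)
    return np.array(rows)


def null_vector(I, J):
    """Hypothetical helper: null vector of eq. [8]."""
    age = np.arange(1, I) - (I + 1) / 2
    period = -(np.arange(1, J) - (J + 1) / 2)
    cohort = np.arange(1, I + J - 1) - (I + J) / 2
    return np.concatenate(([0.0], age, period, cohort))


def kupper_estimable(l, v, tol=1e-10):
    """Kupper et al. (1985): l'b is estimable iff l'v = 0."""
    return abs(l @ v) < tol


I, J = 7, 6
X = apc_design(I, J)
v = null_vector(I, J)
XtX = X.T @ X
H = np.linalg.pinv(XtX, rcond=1e-10) @ XtX       # Searle's H with G = (X'X)^+
p = len(v)

unit = np.eye(p)
kupper = np.array([kupper_estimable(unit[m], v) for m in range(p)])
searle = np.array([np.allclose(unit[m] @ H, unit[m], atol=1e-8) for m in range(p)])
print("estimable single coefficients (column indices):", np.flatnonzero(kupper))
print("criteria agree:", np.array_equal(kupper, searle))

# A combination of coefficients can be estimable even when its parts are not
l_curv = np.zeros(p)
l_curv[1:4] = [1.0, -2.0, 1.0]                   # age1 - 2*age2 + age3
print("age1 - 2 age2 + age3 estimable:", kupper_estimable(l_curv, v))
```

Only column 0 (the intercept) and column 4 (age 30–39) are estimable on their own, exactly the elements with zeros in the null vector. The last line shows that a second difference of the first three age coefficients satisfies l′v = 0 even though none of its parts does.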

6.3 Provide the most representative solution?

There is a claim in the literature that the perpendicular solution is some sort of average or most representative solution in the rank deficient by one situation. Smith (2004) states: “There is also a sense in which the IE [intrinsic estimator] is an average of CGLIM estimates” (p. 116). Press, Teukolsky, Vetterling, and Flannery (1992) note that in the rank deficient by one situation: “If we want to single out one particular member of the solution-set of vectors as representative, we might want to pick the one with the smallest length” (p. 62).
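Why the smallest-length member is the perpendicular one follows from a one-line expansion. In this sketch, b⊥ is the perpendicular solution, v the null vector, and t a real number indexing the points on the line of solutions (notation introduced here for the derivation):

```latex
\begin{aligned}
b(t) &= b_\perp + t\,v, \qquad b_\perp' v = 0,\\
\|b(t)\|^2 &= b_\perp' b_\perp + 2t\,b_\perp' v + t^2\,v'v
           = \|b_\perp\|^2 + t^2\,\|v\|^2 \;\ge\; \|b_\perp\|^2 .
\end{aligned}
```

Equality holds only at t = 0. The expansion establishes minimum length; by itself it says nothing about whether that member is representative of the line.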

I have pondered this claim at length and wish that the authors of the two sources cited above had laid out a convincing rationale for their conclusions. I am convinced by the comments of two external reviewers of this article that the claim of representativeness is difficult to establish. One reviewer noted that an infinitely long line, like the line of solutions, has no middle. In what sense, then, can the perpendicular solution be representative of the infinite number of potential solutions? The other reviewer asked: why should we care whether the perpendicular solution is representative or not, if it is not an unbiased estimate of the parameters that generated the dependent variable values?

7 Conclusions

The perpendicular solution is one of an infinite number of best fitting solutions to the APC multiple classification model that impose a single constraint on the solution. The perpendicular solution has been derived in a variety of ways; for example, as the minimum norm solution, the principal component regression solution, the Moore–Penrose solution, the partial least squares solution, the intrinsic estimator solution, the maximum entropy solution, and the singular value decomposition solution. I label this solution the perpendicular solution, because it is perpendicular to the null vector.
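Several of these guises can be compared numerically. The sketch below assumes the same hypothetical 3-age by 3-period effect-coded design and a random outcome vector y, and computes the perpendicular solution four ways with NumPy: `numpy.linalg.pinv` (Moore–Penrose), `numpy.linalg.lstsq` (which returns the minimum-norm least-squares solution when the design is rank deficient), a truncated singular value decomposition, and a constrained solution under a different, hypothetical constraint (age 1 effect = age 2 effect) with its component along the null vector removed. The principal component, partial least squares, and maximum entropy routes are not reproduced here.

```python
import numpy as np


def effect_code(m, M):
    """Effect-coded columns for category m (1-based) of M; category M is the reference."""
    row = np.zeros(M - 1)
    if m < M:
        row[m - 1] = 1.0
    else:
        row[:] = -1.0
    return row


def apc_design(I, J):
    """Hypothetical design: intercept, ages, periods, cohorts (k = j - i + I)."""
    K = I + J - 1
    return np.array([
        np.concatenate(([1.0], effect_code(i, I), effect_code(j, J),
                        effect_code(j - i + I, K)))
        for i in range(1, I + 1) for j in range(1, J + 1)
    ])


X = apc_design(3, 3)
y = np.random.default_rng(0).normal(size=X.shape[0])   # hypothetical outcomes
TOL = 1e-10            # singular values below TOL * largest are treated as zero

# 1. Moore-Penrose solution
b_mp = np.linalg.pinv(X, rcond=TOL) @ y

# 2. Minimum-norm least-squares solution
b_lstsq = np.linalg.lstsq(X, y, rcond=TOL)[0]

# 3. Truncated SVD: drop the zero singular value and back-substitute
U, s, Vt = np.linalg.svd(X)
r = int(np.sum(s > TOL * s[0]))                        # numerical rank
b_svd = Vt[:r].T @ ((U[:, :r].T @ y) / s[:r])

# 4. A different constrained solution (c'b = 0 with c = e_age1 - e_age2),
#    then remove its component along the unit-length null vector v
v = Vt[-1]
c = np.zeros(X.shape[1])
c[1], c[2] = 1.0, -1.0
A = np.vstack([X, c])                                  # append the constraint row
b_c = np.linalg.lstsq(A, np.append(y, 0.0), rcond=None)[0]
b_proj = b_c - (b_c @ v) * v

for b in (b_lstsq, b_svd, b_proj):
    print(np.allclose(b, b_mp))                        # True
print(abs(b_mp @ v) < 1e-10)                           # True: perpendicular to v
```

Appending c′ as an extra row works here because c′v ≠ 0, so the augmented matrix has full column rank and its least-squares solution fits X as well as any solution while satisfying c′b = 0 exactly.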

Statistically, it is a minimum norm solution and has a smaller variance than any other constrained solution. These features might be used in an attempt to justify the use of this solution as a preferred solution to the APC model when there is no other information that would favor using some other constraint. The perpendicular solution, however, is not an unbiased estimate of the age, period, and cohort parameters that generated the outcome values on the dependent variable (there is no reason to think that it would be). The perpendicular solution does not provide the individual age, period, and cohort coefficients as estimable functions. ^[10] It is a mechanical solution that does not take substantive knowledge of the particular research setting into account in determining a plausible just-identifying constraint.
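Both statistical points can be illustrated on the same hypothetical design. The sketch below reads "smaller variance" as smaller total variance (the trace of the coefficient covariance matrix), which is an assumption about the intended sense, and takes Var(y) = I. Any other just-identifying constraint c′b = 0 moves b⊥ along v through the linear map P = I − vc′/(c′v), which lets the two covariance matrices be compared exactly. The generating parameters `b_true` are made up for illustration.

```python
import numpy as np


def effect_code(m, M):
    """Effect-coded columns for category m (1-based) of M; category M is the reference."""
    row = np.zeros(M - 1)
    if m < M:
        row[m - 1] = 1.0
    else:
        row[:] = -1.0
    return row


def apc_design(I, J):
    """Hypothetical design: intercept, ages, periods, cohorts (k = j - i + I)."""
    K = I + J - 1
    return np.array([
        np.concatenate(([1.0], effect_code(i, I), effect_code(j, J),
                        effect_code(j - i + I, K)))
        for i in range(1, I + 1) for j in range(1, J + 1)
    ])


X = apc_design(3, 3)
p = X.shape[1]
Xp = np.linalg.pinv(X, rcond=1e-10)       # b_perp = Xp @ y
v = np.linalg.svd(X)[2][-1]               # unit-length null vector

# Covariance of b_perp when Var(y) = sigma^2 * I, with sigma^2 = 1
C_perp = Xp @ Xp.T

# Hypothetical alternative constraint: age 1 effect = age 2 effect
c = np.zeros(p)
c[1], c[2] = 1.0, -1.0
P = np.eye(p) - np.outer(v, c) / (c @ v)  # maps b_perp to the constrained solution
C_c = P @ C_perp @ P.T

# Because C_perp @ v = 0: tr(C_c) = tr(C_perp) + ||v||^2 (c' C_perp c) / (c'v)^2
print(np.trace(C_perp) < np.trace(C_c))   # True

# Bias: E[b_perp] = Xp @ X @ b_true, the part of b_true orthogonal to v
b_true = np.array([0.5, -0.3, 0.1, 0.2, -0.2, 0.4, 0.1, -0.1, -0.3])  # hypothetical
expected_b_perp = Xp @ X @ b_true
print(np.allclose(expected_b_perp, b_true))                        # False
print(np.allclose(expected_b_perp, b_true - (b_true @ v) * v))     # True
```

The extra trace term vanishes only when C⊥c = 0, that is, when c is proportional to v, which is the perpendicular constraint itself; and the expected value misses b_true by exactly its component along the null vector.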

Figure 3 helps to demonstrate and summarize many of the points made in this paper using a very simple illustration. Imagine a situation in which there are two equations: $6 = b_1 + 2b_2$ and $3 = 0.5b_1 + b_2$. These two equations are linearly dependent (rank deficient by one). The null vector is $v = (2, -1)$: the matrix of independent variable values has two rows, (1, 2) and (0.5, 1), and the dot product of each of these with (2, −1) is zero. The null vector is unique up to multiplication by a scalar. The solution space is two-dimensional, with $b_1$ as one axis and $b_2$ as the other. We can draw the line of solutions in this space (Figure 3) as the line that crosses the $b_1$ axis at (6, 0) and the $b_2$ axis at (0, 3). These two points are on the line of solutions because they solve the two-equation system: when $b_2$ is zero, $b_1$ must be 6, and when $b_1$ is zero, $b_2$ must be 3. The perpendicular solution can be determined by constructing a line through the origin perpendicular to the null vector and determining where this line intersects the line of solutions. The perpendicular solution is $b_1 = 1.2$ and $b_2 = 2.4$, as depicted in Figure 3. This solution lies on the line of solutions (it solves the two-equation system), and it is perpendicular to the null vector: $b_\perp' v = 0$. This solution is the minimum norm solution, the Moore–Penrose solution, the solution with the smallest variance, etc. But it is clearly not necessarily, or even likely to be, an unbiased estimate of the parameters that generated the outcome data (which could be any of the solutions on the line of solutions).
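The two-equation example is small enough to check by hand or in a few lines of NumPy. The sketch below recovers the perpendicular solution both with `numpy.linalg.pinv` and with the geometric construction (intersecting the line through the origin perpendicular to v with the line of solutions), then compares it with other points on the line: the two axis crossings and a hypothetical set of generating parameters (4, 1), which reproduces the same outcome values but is not what the perpendicular solution returns.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [0.5, 1.0]])          # rows of independent-variable values
y = np.array([6.0, 3.0])
v = np.array([2.0, -1.0])           # null vector: X @ v = 0

# Moore-Penrose / minimum-norm solution
b_perp = np.linalg.pinv(X, rcond=1e-10) @ y

# Geometric construction: the line through the origin perpendicular to v has
# direction d = (1, 2); it meets the line b1 + 2*b2 = 6 at t = 6 / 5.
d = np.array([-v[1], v[0]])
t = y[0] / (X[0] @ d)
b_geo = t * d

# Other members of the line of solutions; "b_gen" is a hypothetical set of
# generating parameters that reproduces y exactly.
others = {"b1 axis": np.array([6.0, 0.0]),
          "b2 axis": np.array([0.0, 3.0]),
          "b_gen": np.array([4.0, 1.0])}

print(b_geo, np.allclose(b_perp, b_geo))        # [1.2 2.4] True
for name, b in others.items():
    # every member fits the data exactly, but each is longer than b_perp
    print(name, np.allclose(X @ b, y), np.linalg.norm(b) > np.linalg.norm(b_perp))
print(abs(b_perp @ v) < 1e-12)                  # True: b_perp'v = 0
```

If the data had in fact been generated by (4, 1), every method that returns the perpendicular solution would still report (1.2, 2.4).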

Figure 3

The perpendicular solution to a set of two linearly dependent equations

This simple illustration provides some sense of how much confidence researchers should have in this particular solution. My judgment is: “not much.” For constrained solutions, substantive knowledge is the best basis for setting constraints. There are other approaches, however, that provide some insight into the effects of ages, periods, and cohorts and that do not depend on constrained regression; examples include estimable functions (O’Brien, 2014a) and factor-characteristic approaches (O’Brien, 2000, 2014b). This paper adds to our understanding of the algebraic, geometric, and statistical characteristics of the perpendicular solution in whatever form it appears. Doing so should help researchers evaluate how much confidence they should have in the coefficients it produces. From my study of this approach, I would not put much credence in the results of these perpendicular solutions.

References

Bose, R. C. (1944). The fundamental theorem of linear estimation. In: Proceedings of the 31st Indian Science Congress. Abstract A, 2–3.

Browning, M., Crawford, I., and Knoef, M. (2013). The age-period-cohort-problem: Set identification and point identification. Online: users.ox.ac.uk/~econ0237/papers/apc.pdf (accessed 12-03-2013).

Frost, W. H. (1939). The age selection of mortality from tuberculosis in successive decades. American Journal of Hygiene, 30:91–96. [Reprinted: American Journal of Epidemiology (1995) 141:4–9.]

Fu, W. J. (2000). Ridge estimator in singular design with applications to the age-period-cohort analysis of disease rates. Communications in Statistics – Theory and Methods, 29:263–278.

Fukuda, K. (2014). Age-period-cohort decompositions using principal components and partial least squares. Journal of Statistical Computation and Simulation, 81:1871–1878. doi:10.1080/00949655.2010.507763.

Holford, T. R. (1983). The estimation of age, period, and cohort effects for vital rates. Biometrics, 39:311–324.

Kupper, L. L., and Janis, J. M. (1980). The multiple classification model in age-period-cohort analysis: theoretical considerations. Institute of Statistics Mimeo Series No. 1311. Chapel Hill: Department of Biostatistics, University of North Carolina.

Kupper, L. L., Janis, J. M., Karmous, A., and Greenberg, B. G. (1985). Statistical age-period-cohort analysis: a review and critique. Journal of Chronic Diseases, 38:811–830.

Kupper, L. L., Janis, J. M., Salama, I. A., Yoshizawa, C. N., and Greenberg, B. G. (1983). Age-period-cohort analysis: an illustration of the problems in assessing interactions in one observation per cell data. Communications in Statistics – Theory and Methods, 12:2779–2807.

Luo, L. (2013). Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort problem. Demography, 50:1945–1967.

Miech, R., and Koester, S. (2012). Trends in U.S. past-year marijuana use from 1985 to 2009: an age–period–cohort analysis. Drug and Alcohol Dependence, 124:259–267.

O’Brien, R. M. (2000). Age period cohort characteristic models. Social Science Research, 29:123–139. doi:10.1006/ssre.1999.0656.

O’Brien, R. M. (2011). Constrained estimators and age-period-cohort models. Sociological Methods & Research, 40:419–452.

O’Brien, R. M. (2012). Visualizing rank deficient models: a row equation geometry of rank deficient matrices and constrained-regression. PLoS ONE, 7(6). doi:10.1371/journal.pone.0038923.

O’Brien, R. M. (2013). Comment on Liying Luo’s article, ‘assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort problem’. Demography, 50:1973–1975.

O’Brien, R. M. (2014a). Estimable functions in age-period-cohort models: a unified approach. Quality and Quantity, 48:457–474. doi:10.1007/s11135-012-9780-6.

O’Brien, R. M. (2014b). Age-Period-Cohort Models: Approaches and Analyses with Aggregate Level Data. New York: Chapman & Hall.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. (1992). Numerical Recipes: The Art of Scientific Computing. 1st Edition. Cambridge: Cambridge University Press.

Scheffé, H. (1950). The Analysis of Variance. New York: Wiley.

Schwadel, P. (2011). Age, period, and cohort effects on religious activities and beliefs. Social Science Research, 40:181–192.

Searle, S. R. (1971). Linear Models. New York: Wiley.

Smith, H. L. (2004). Response: cohort analysis redux. Sociological Methodology, 34:111–119.

Tao, J., Gilthorpe, M. S., Shiely, F., Harrington, J. M., Perry, I. J., Kelleher, C. C., and Tu, Y. K. (2013). Age-period-cohort analysis for trends in body mass index in Ireland. BMC Public Health, 13:889. doi:10.1186/1471-2458-13-889.

Tu, Y. K., Smith, G. D., and Gilthorpe, M. S. (2011). A new approach to age-period-cohort analysis using partial least squares regression: the trend in blood pressure in the Glasgow Alumni Cohort. PLoS ONE. doi:10.1371/journal.pone.0019401.

Tu, Y. K., Krämer, N., and Lee, W. (2013). Addressing the identification problem in age-period-cohort analysis: a tutorial on the use of partial least squares and principal components analysis. Epidemiology, 23:583–593.

Yang, Y., Fu, W. J., and Land, K. C. (2004). A methodological comparison of age-period-cohort models: the intrinsic estimator and conventional generalized linear models. Sociological Methodology, 34:75–110.

Yang, Y., Schulhofer-Wohl, S., Fu, W. J., and Land, K. C. (2008). The intrinsic estimator for age-period-cohort analysis: what it is and how to use it. American Journal of Sociology, 113:1697–1736.

Published Online: 2015-1-28

Published in Print: 2015-12-1


https://doi.org/10.1515/em-2014-0006
