On the Robustness of Coefficient Estimates to the Inclusion of Proxy Variables

Christopher R. Bollinger and Jenny Minier
Published/Copyright: March 13, 2014

Abstract

This paper considers the use of multiple proxy measures for an unobserved variable and contrasts the approach taken in the measurement error literature to that of the model specification literature. We find that including all available proxy variables in the regression minimizes the bias on coefficients of correctly measured variables in the regression. We derive a set of bounds for all parameters in the model, and compare these results to extreme bounds analysis. Monte Carlo simulations demonstrate the performance of our bounds relative to extreme bounds. We conclude with an empirical example from the cross-country growth literature in which human capital is measured through three proxy variables: literacy rates, and enrollment in primary and secondary school, and show that our approach yields results that contrast sharply with extreme bounds analysis.

JEL Codes: C4; C51; O47

Corresponding author: Christopher R. Bollinger, Department of Economics, University of Kentucky, Lexington, KY 40506, USA, E-mail:

Acknowledgment

We thank Helle Bunzel, Steven Durlauf, Josh Ederington, Per Hjerstrand, Brian Krauth, Brent Krieder, Mike McCracken, John Pepper, Shinichi Sakata, Justin Tobias, Ken Troske, Tom Wansbeek, Hendrik Wolff, Jim Ziliak, and participants in seminars at the Universities of California, Berkeley and Santa Cruz, University of Oregon, University of Washington, Iowa State University, IUPUI, the International Measurement Error Conference, Canadian Economics Association, and the Southern Economic Association meetings for helpful comments and discussion.

Appendix: Proofs

Let

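$$V\begin{pmatrix} Z_{1i} \\ Z_{2i} \end{pmatrix} = \begin{bmatrix} V_1 & C \\ C' & V_2 \end{bmatrix}.$$

The matrix $V_1$ is the $k \times k$ variance matrix for $Z_{1i}$, $C$ is the $k \times 1$ covariance between $Z_{1i}$ and $Z_{2i}$, and $V_2$ is the scalar variance of $Z_{2i}$. Let $\delta$ be an arbitrary $l \times 1$ vector such that $\delta'\rho = \gamma > 0$ for some given value $\gamma$. Let $\theta = \beta/\gamma$.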

The next three lemmas establish key results for Proposition 1.

Lemma 1. Expressions for $(a - \alpha)$ and $(t - \theta)$.

Then

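$$\begin{bmatrix} a \\ t \end{bmatrix} = \begin{bmatrix} V_1 & \gamma C \\ \gamma C' & \gamma^2 V_2 + (\delta'\Sigma\delta) \end{bmatrix}^{-1} \begin{bmatrix} V_1\alpha + \theta\gamma C \\ \gamma C'\alpha + \theta\gamma^2 V_2 \end{bmatrix}.$$

Rewriting yields

$$\begin{bmatrix} V_1 & \gamma C \\ \gamma C' & \gamma^2 V_2 + (\delta'\Sigma\delta) \end{bmatrix} \begin{bmatrix} a \\ t \end{bmatrix} = \begin{bmatrix} V_1\alpha + \theta\gamma C \\ \gamma C'\alpha + \theta\gamma^2 V_2 \end{bmatrix},$$

which is equivalent to

$$\begin{bmatrix} V_1 & \gamma C \\ \gamma C' & \gamma^2 V_2 \end{bmatrix}^{-1} \begin{bmatrix} V_1 & \gamma C \\ \gamma C' & \gamma^2 V_2 + (\delta'\Sigma\delta) \end{bmatrix} \begin{bmatrix} a \\ t \end{bmatrix} = \begin{bmatrix} V_1 & \gamma C \\ \gamma C' & \gamma^2 V_2 \end{bmatrix}^{-1} \begin{bmatrix} V_1\alpha + \theta\gamma C \\ \gamma C'\alpha + \theta\gamma^2 V_2 \end{bmatrix}.$$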

This yields

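$$\begin{bmatrix} I & \gamma (V_1 - C V_2^{-1} C')^{-1} C \left(1 - (\gamma^2 V_2)^{-1}(\gamma^2 V_2 + (\delta'\Sigma\delta))\right) \\ 0 & (\gamma^2 (V_2 - C' V_1^{-1} C))^{-1} (\gamma^2 V_2 + (\delta'\Sigma\delta) - \gamma^2 C' V_1^{-1} C) \end{bmatrix} \begin{bmatrix} a \\ t \end{bmatrix} = \begin{bmatrix} \alpha \\ \theta \end{bmatrix}.$$

Noting that $V_2$, $\gamma$, and $(\delta'\Sigma\delta)$ are all scalars, this can be written as

$$\begin{bmatrix} I & -\gamma (V_1 - C V_2^{-1} C')^{-1} C \left(\dfrac{(\delta'\Sigma\delta)}{\gamma^2 V_2}\right) \\ 0 & \left(1 + \dfrac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C))}\right) \end{bmatrix} \begin{bmatrix} a \\ t \end{bmatrix} = \begin{bmatrix} \alpha \\ \theta \end{bmatrix}.$$

Rearranging gives

$$\begin{bmatrix} a - \gamma (V_1 - C V_2^{-1} C')^{-1} C \left(\dfrac{(\delta'\Sigma\delta)}{\gamma^2 V_2}\right) t \\ \left(1 + \dfrac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C))}\right) t \end{bmatrix} = \begin{bmatrix} \alpha \\ \theta \end{bmatrix}.$$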

Thus

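$$\begin{aligned} (a - \alpha) &= \gamma (V_1 - C V_2^{-1} C')^{-1} C \, \frac{V_2 - C' V_1^{-1} C}{V_2} \, \frac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)} \, \theta \\ &= (V_1 - C V_2^{-1} C')^{-1} C \, \frac{V_2 - C' V_1^{-1} C}{V_2} \, \frac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)} \, \beta, \end{aligned}$$

and

$$(t - \theta) = \left(\frac{(\gamma^2 (V_2 - C' V_1^{-1} C))}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)}\right)\theta - \theta = -\left(\frac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)}\right)\theta.$$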

QED.
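
The closed forms in Lemma 1 can be checked against a direct solution of the moment equations. The sketch below is illustrative only: it assumes a scalar Z1 (k = 1), two proxies (l = 2), and made-up population moments, none of which come from the paper. Exact rational arithmetic is used so that the comparison is an equality rather than a tolerance check.

```python
from fractions import Fraction as F

def solve2(m, r):
    """Solve the 2x2 linear system m x = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    x0 = (r[0] * m[1][1] - m[0][1] * r[1]) / det
    x1 = (m[0][0] * r[1] - m[1][0] * r[0]) / det
    return x0, x1

# Hypothetical population moments (assumed values, not taken from the paper).
V1, C, V2 = F(2), F(1), F(3)          # V(Z1), Cov(Z1, Z2), V(Z2)
alpha, beta = F(1), F(2)              # true coefficients on Z1 and Z2
rho = [F(1), F(2)]                    # proxy loadings
Sigma = [[F(2), F(1)], [F(1), F(3)]]  # measurement-error variance matrix
delta = [F(1), F(1)]                  # arbitrary weights for the index

gamma = sum(d * r for d, r in zip(delta, rho))           # delta' rho
s = sum(delta[i] * Sigma[i][j] * delta[j]
        for i in range(2) for j in range(2))             # delta' Sigma delta
theta = beta / gamma
schur = V2 - C * C / V1                                  # V2 - C' V1^{-1} C

# Probability limits (a, t) of the regression of y on Z1 and the index.
a, t = solve2(
    [[V1, gamma * C], [gamma * C, gamma**2 * V2 + s]],
    [V1 * alpha + theta * gamma * C,
     gamma * C * alpha + theta * gamma**2 * V2],
)

# Closed forms from Lemma 1.
share = s / (gamma**2 * schur + s)
a_gap = (C / (V1 - C * C / V2)) * (schur / V2) * share * beta
t_gap = -share * theta

print(a - alpha, a_gap)   # both print 14/59
print(t - theta, t_gap)   # both print -28/177
```

With these values the coefficient on the correctly measured variable is inconsistent upward and the coefficient on the index is attenuated toward zero, as the signs in the lemma indicate.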

Lemma 2. The term $\left(\dfrac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)}\right)$ is positive and increasing in $(\delta'\Sigma\delta)$.

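The term $(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)$ is positive provided that $\gamma \neq 0$ and $\Sigma$ is positive semi-definite. The term $V_2 - C' V_1^{-1} C$ is the Schur complement of $V_1$ in $V(Z_{1i}, Z_{2i})$, that is, the determinant of $V(Z_{1i}, Z_{2i})$ divided by the determinant of $V_1$, and so is, by necessary assumption, positive. The term $(\delta'\Sigma\delta)$ will be non-negative provided $\Sigma$ is positive semi-definite. The derivative with respect to the term $(\delta'\Sigma\delta)$ is

$$\frac{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta) - (\delta'\Sigma\delta)}{\left((\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)\right)^2} = \frac{\gamma^2 (V_2 - C' V_1^{-1} C)}{\left((\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)\right)^2} > 0.$$

Hence the inconsistency in both $a$ and $t$ is increasing in $(\delta'\Sigma\delta)$. QED.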
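
As a small numerical illustration of Lemma 2, the sketch below evaluates the common factor on a grid of error variances. The values of γ and of the term V2 − C′V1⁻¹C (called `schur` here) are assumptions chosen for illustration, as is the function name.

```python
from fractions import Fraction as F

def inconsistency_share(s, gamma, schur):
    """Return s / (gamma^2 * schur + s), the factor studied in Lemma 2."""
    return s / (gamma**2 * schur + s)

gamma = F(3)        # hypothetical value of delta' rho
schur = F(5, 2)     # hypothetical value of V2 - C' V1^{-1} C

# Grid of strictly positive values for delta' Sigma delta: 1/4, 2/4, ..., 10.
grid = [F(n, 4) for n in range(1, 41)]
shares = [inconsistency_share(s, gamma, schur) for s in grid]

# Successive differences: all positive if the factor is increasing.
steps = [hi - lo for lo, hi in zip(shares, shares[1:])]

print(shares[0], shares[-1])
print(min(steps) > 0)
```

The factor stays strictly between zero and one on the grid, so a noisier index always moves both probability limits further from the true parameters without reversing the sign of t.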

Lemma 3. The solution to $\min_\delta (\delta'\Sigma\delta)$ s.t. $\delta'\rho = \gamma$ is $\delta = \gamma\Sigma^{-1}\rho(\rho'\Sigma^{-1}\rho)^{-1}$.

The Lagrangian is

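$$(\delta'\Sigma\delta) - \lambda(\delta'\rho - \gamma).$$

The first-order conditions are

$$\begin{aligned} 2\Sigma\delta - \lambda\rho &= 0 \\ \delta'\rho - \gamma &= 0. \end{aligned}$$

Solving:

$$\begin{aligned} \delta &= \tfrac{1}{2}\lambda\Sigma^{-1}\rho \\ \tfrac{1}{2}\rho'\Sigma^{-1}\rho\,\lambda &= \gamma. \end{aligned}$$

Substitution yields

$$\begin{aligned} \delta &= \gamma\Sigma^{-1}\rho(\rho'\Sigma^{-1}\rho)^{-1} \\ \lambda &= 2\gamma(\rho'\Sigma^{-1}\rho)^{-1}. \end{aligned}$$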

QED.
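
The minimizer in Lemma 3 can be illustrated with two proxies. In the sketch below the loadings, the error variance matrix, and γ are assumed values. Every feasible δ is written as the candidate minimizer plus a multiple of a direction orthogonal to ρ, so the constraint δ′ρ = γ continues to hold.

```python
from fractions import Fraction as F

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def quad(x, m, y):
    """Bilinear form x' m y for 2-vectors x, y and a 2x2 matrix m."""
    return sum(x[i] * m[i][j] * y[j] for i in range(2) for j in range(2))

# Hypothetical inputs (assumed values, not taken from the paper).
rho = [F(1), F(2)]
Sigma = [[F(2), F(1)], [F(1), F(3)]]
gamma = F(1)

Sigma_inv = inv2(Sigma)
m = quad(rho, Sigma_inv, rho)                       # rho' Sigma^{-1} rho
Sinv_rho = [sum(Sigma_inv[i][j] * rho[j] for j in range(2)) for i in range(2)]
delta_star = [gamma * v / m for v in Sinv_rho]      # Lemma 3 solution
min_value = quad(delta_star, Sigma, delta_star)     # delta*' Sigma delta*

# Feasible alternatives: delta_star + c * n, where n' rho = 0.
n = [rho[1], -rho[0]]
alternatives = [[d + c * ni for d, ni in zip(delta_star, n)]
                for c in (F(-1), F(1, 10), F(2))]
alt_values = [quad(d, Sigma, d) for d in alternatives]

print(delta_star)               # weights 1/7 and 3/7
print(min_value, gamma**2 / m)  # both print 5/7
print(alt_values)               # each exceeds 5/7
```

The minimized variance equals γ²(ρ′Σ⁻¹ρ)⁻¹, which is the quantity substituted into Lemma 1 in the proof of Corollary 1.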

Proof of Proposition 1. The proof follows from the details in the text combined with the lemmas above.   ■

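Proof of Corollary 1. Substitution of the results from Proposition 1 into the expressions in Lemma 1 yields

$$(\delta'\Sigma\delta) = \gamma^2 \frac{\rho'\Sigma^{-1}\Sigma\Sigma^{-1}\rho}{(\rho'\Sigma^{-1}\rho)^2} = \frac{\gamma^2}{(\rho'\Sigma^{-1}\rho)}.$$

From Lemma 1 we have that

$$(t - \theta) = -\theta\left(\frac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)}\right).$$

Alternatively,

$$t = \theta\left(1 - \left(\frac{(\delta'\Sigma\delta)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)}\right)\right) = \frac{\beta}{\gamma}\left(\frac{\gamma^2 (V_2 - C' V_1^{-1} C)}{(\gamma^2 (V_2 - C' V_1^{-1} C)) + (\delta'\Sigma\delta)}\right).$$

Substitute the optimal choice of $\delta$ from Proposition 1, which yields

$$t = \frac{\beta}{\gamma}\left(\frac{\gamma^2 (V_2 - C' V_1^{-1} C)}{\gamma^2 (V_2 - C' V_1^{-1} C) + \gamma^2 (\rho'\Sigma^{-1}\rho)^{-1}}\right) = \frac{\beta}{\gamma}\left(\frac{(V_2 - C' V_1^{-1} C)}{(V_2 - C' V_1^{-1} C) + (\rho'\Sigma^{-1}\rho)^{-1}}\right).$$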

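Hence, by choosing

$$\gamma = \frac{(V_2 - C' V_1^{-1} C)}{(V_2 - C' V_1^{-1} C) + (\rho'\Sigma^{-1}\rho)^{-1}} = \left(1 + \frac{1}{(\rho'\Sigma^{-1}\rho)(V_2 - C' V_1^{-1} C)}\right)^{-1},$$

we have $t = \beta$: no inconsistency in the coefficient on Xδ. QED   ■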
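
The role of γ in Corollary 1 can be traced numerically. The sketch below builds the index with the error-variance-minimizing δ for a given γ, solves the moment equations for the probability limit t, and compares it with (β/γ) times the attenuation factor from the proof. Solving t = β in that expression for γ gives the attenuation factor itself, which the code calls `gamma_star`. All moments are assumed values, and a scalar Z1 is assumed.

```python
from fractions import Fraction as F

def solve2(m, r):
    """Solve the 2x2 linear system m x = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det)

# Hypothetical population moments (assumed values, not taken from the paper).
V1, C, V2 = F(2), F(1), F(3)
alpha, beta = F(1), F(2)
rho = [F(1), F(2)]
Sigma = [[F(2), F(1)], [F(1), F(3)]]

schur = V2 - C * C / V1                        # V2 - C' V1^{-1} C
Sinv_rho = solve2(Sigma, rho)                  # Sigma^{-1} rho
m = sum(r * v for r, v in zip(rho, Sinv_rho))  # rho' Sigma^{-1} rho
attenuation = schur / (schur + 1 / m)

def plim_t(gamma):
    """plim of the index coefficient when delta = gamma Sigma^{-1} rho / m."""
    delta = [gamma * v / m for v in Sinv_rho]
    s = sum(delta[i] * Sigma[i][j] * delta[j]
            for i in range(2) for j in range(2))
    _, t = solve2([[V1, gamma * C], [gamma * C, gamma**2 * V2 + s]],
                  [V1 * alpha + beta * C, gamma * (C * alpha + V2 * beta)])
    return t

gamma_star = attenuation                       # the gamma that solves t = beta

print(plim_t(F(1)), beta * attenuation)        # both print 14/9
print(gamma_star, plim_t(gamma_star))          # prints 7/9 and 2
```

Because the index is scaled by γ, any attenuation in t can be offset by shrinking the index by the same factor; the normalization changes t but not the inconsistency in a.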

Lemma 4 (Sherman-Morrison-Woodbury Matrix Inversion Lemma). If $A$ and $B$ are non-singular matrices, and $X$ is conformable, then $(A + XBX')^{-1} = A^{-1} - A^{-1}X(B^{-1} + X'A^{-1}X)^{-1}X'A^{-1}$.
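
A quick exact check of the identity in Lemma 4, in the form used in the proof of Proposition 2, where B is a scalar and X is a column vector. The matrices below are assumed values; A is taken to be symmetric, as a variance matrix would be, which is what allows X′A⁻¹ to be written as the transpose of A⁻¹X in the code.

```python
from fractions import Fraction as F

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical inputs (assumed values, not taken from the paper).
A = [[F(2), F(1)], [F(1), F(3)]]   # symmetric, plays the role of Sigma
x = [F(1), F(2)]                   # 2x1 column, plays the role of rho
B = F(5, 2)                        # scalar, plays the role of V2 - C'V1^{-1}C

# Left-hand side: direct inverse of A + x B x'.
lhs = inv2([[A[i][j] + x[i] * B * x[j] for j in range(2)] for i in range(2)])

# Right-hand side: A^{-1} - A^{-1} x (B^{-1} + x' A^{-1} x)^{-1} x' A^{-1}.
A_inv = inv2(A)
Ainv_x = [sum(A_inv[i][j] * x[j] for j in range(2)) for i in range(2)]
core = 1 / (1 / B + sum(xi * v for xi, v in zip(x, Ainv_x)))
rhs = [[A_inv[i][j] - Ainv_x[i] * core * Ainv_x[j] for j in range(2)]
       for i in range(2)]

print(lhs == rhs)   # True
```

The practical value of the lemma in the proof is that the inner inverse is a scalar, so expressions in the full proxy vector collapse to functions of ρ′Σ⁻¹ρ.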

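Proof of Proposition 2.

The linear regression of $y_i$ on $Z_{1i}$ and $X_i$ yields slope coefficients consistent for

$$\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} V_1 & C\rho' \\ \rho C' & (\rho\rho' V_2 + \Sigma) \end{bmatrix}^{-1} \begin{bmatrix} V_1\alpha + C\beta \\ \rho C'\alpha + \rho V_2\beta \end{bmatrix}.$$

Rewriting yields

$$\begin{bmatrix} V_1 & C\rho' \\ \rho C' & (\rho\rho' V_2 + \Sigma) \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} V_1\alpha + C\beta \\ \rho C'\alpha + \rho V_2\beta \end{bmatrix},$$

which is equivalent to

$$\begin{bmatrix} V_1 & C\rho' \\ \rho C' & I\rho'\rho V_2 \end{bmatrix}^{-1} \begin{bmatrix} V_1 & C\rho' \\ \rho C' & (\rho\rho' V_2 + \Sigma) \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} V_1 & C\rho' \\ \rho C' & I\rho'\rho V_2 \end{bmatrix}^{-1} \begin{bmatrix} V_1\alpha + C\beta \\ \rho C'\alpha + \rho V_2\beta \end{bmatrix},$$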

where I is the identity matrix of appropriate dimensions. The inverse of the leading matrix (a partitioned matrix) can be written as

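$$\begin{bmatrix} (V_1 - C\rho'(I\rho'\rho V_2)^{-1}\rho C')^{-1} & -(V_1 - C\rho'(I\rho'\rho V_2)^{-1}\rho C')^{-1} C\rho'(I\rho'\rho V_2)^{-1} \\ -(I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1}\rho C' V_1^{-1} & (I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1} \end{bmatrix}.$$

Since $\rho'\rho V_2$ is a scalar, this reduces to

$$\begin{bmatrix} (V_1 - C V_2^{-1} C')^{-1} & -(V_1 - C V_2^{-1} C')^{-1} C\rho'(\rho'\rho V_2)^{-1} \\ -(I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1}\rho C' V_1^{-1} & (I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1} \end{bmatrix}.$$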

Substitution and simplification yields

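$$\begin{bmatrix} I & -(V_1 - C V_2^{-1} C')^{-1}(C\rho'(\rho'\rho V_2)^{-1}\Sigma) \\ 0 & (I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1}(\rho(V_2 - C' V_1^{-1} C)\rho' + \Sigma) \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \alpha \\ (I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1}\rho(V_2 - C' V_1^{-1} C)\beta \end{bmatrix},$$

or

$$\begin{bmatrix} a - (V_1 - C V_2^{-1} C')^{-1}(C\rho'(\rho'\rho V_2)^{-1}\Sigma)\, b \\ (I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1}(\rho(V_2 - C' V_1^{-1} C)\rho' + \Sigma)\, b \end{bmatrix} = \begin{bmatrix} \alpha \\ (I\rho'\rho V_2 - \rho C' V_1^{-1} C\rho')^{-1}\rho(V_2 - C' V_1^{-1} C)\beta \end{bmatrix}.$$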

We can write

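$$b = (\rho(V_2 - C' V_1^{-1} C)\rho' + \Sigma)^{-1}\rho(V_2 - C' V_1^{-1} C)\beta,$$

and

$$a = \alpha + (V_1 - C V_2^{-1} C')^{-1}(C\rho'(\rho'\rho V_2)^{-1}\Sigma) \times (\rho(V_2 - C' V_1^{-1} C)\rho' + \Sigma)^{-1}\rho(V_2 - C' V_1^{-1} C)\beta.$$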

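Turning first to the term $a$ and applying the Sherman-Morrison-Woodbury Matrix Inversion Lemma:

$$a = \alpha + (V_1 - C V_2^{-1} C')^{-1}(C\rho'(\rho'\rho V_2)^{-1}\Sigma) \times \left(\Sigma^{-1} - \Sigma^{-1}\rho((V_2 - C' V_1^{-1} C)^{-1} + \rho'\Sigma^{-1}\rho)^{-1}\rho'\Sigma^{-1}\right)\rho(V_2 - C' V_1^{-1} C)\beta.$$

Simplification yields

$$a = \alpha + (V_1 - C V_2^{-1} C')^{-1} C \, \frac{(V_2 - C' V_1^{-1} C)}{V_2} \times \left(1 - \frac{\rho'\Sigma^{-1}\rho}{(V_2 - C' V_1^{-1} C)^{-1} + \rho'\Sigma^{-1}\rho}\right)\beta$$

or

$$= \alpha + (V_1 - C V_2^{-1} C')^{-1} C \, \frac{(V_2 - C' V_1^{-1} C)}{V_2} \times \frac{(\rho'\Sigma^{-1}\rho)^{-1}}{(V_2 - C' V_1^{-1} C) + (\rho'\Sigma^{-1}\rho)^{-1}}\,\beta,$$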

which is the expression for a when the error-variance-minimizing choice of δ is used to construct Xδ (See Corollary 2).

Turning now to b, consider

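$$\rho' b = \rho'(\rho(V_2 - C' V_1^{-1} C)\rho' + \Sigma)^{-1}\rho(V_2 - C' V_1^{-1} C)\beta.$$

Again using the Sherman-Morrison-Woodbury Matrix Inversion Lemma,

$$\begin{aligned} \rho' b &= \rho'\left(\Sigma^{-1} - \Sigma^{-1}\rho((V_2 - C' V_1^{-1} C)^{-1} + \rho'\Sigma^{-1}\rho)^{-1}\rho'\Sigma^{-1}\right)\rho(V_2 - C' V_1^{-1} C)\beta \\ &= \left(\rho'\Sigma^{-1}\rho - \frac{\rho'\Sigma^{-1}\rho}{((V_2 - C' V_1^{-1} C)^{-1} + \rho'\Sigma^{-1}\rho)}\,\rho'\Sigma^{-1}\rho\right)(V_2 - C' V_1^{-1} C)\beta \\ &= \frac{(V_2 - C' V_1^{-1} C)^{-1}(\rho'\Sigma^{-1}\rho) + (\rho'\Sigma^{-1}\rho)^2 - (\rho'\Sigma^{-1}\rho)^2}{(V_2 - C' V_1^{-1} C)^{-1} + \rho'\Sigma^{-1}\rho}\,(V_2 - C' V_1^{-1} C)\beta \\ &= \frac{(V_2 - C' V_1^{-1} C)}{(V_2 - C' V_1^{-1} C) + (\rho'\Sigma^{-1}\rho)^{-1}}\,\beta. \end{aligned}$$

This is equal to the expression for $t$ in Corollary 1, obtained when the error-variance-minimizing choice of $\delta$ is used to construct Xδ, if $\gamma = 1$. QED   ■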
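
The two conclusions of this proof can be checked by solving the moment equations for the regression that includes every proxy. The sketch below assumes a scalar Z1, two proxies, and made-up population moments. It uses a small exact Gauss-Jordan solver, an illustrative helper rather than anything from the paper, and compares the resulting a and ρ′b with the closed forms derived above.

```python
from fractions import Fraction as F

def solve(m, r):
    """Solve m x = r by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(r)
    aug = [list(row) + [rhs] for row, rhs in zip(m, r)]
    for col in range(n):
        # Swap in a row with a non-zero pivot, then normalize it.
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate the pivot column from every other row.
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

# Hypothetical population moments (assumed values, not taken from the paper).
V1, C, V2 = F(2), F(1), F(3)
alpha, beta = F(1), F(2)
rho = [F(1), F(2)]
Sigma = [[F(2), F(1)], [F(1), F(3)]]
l = len(rho)

# Variance matrix of (Z1, X) and its covariance with y.
M = [[V1] + [C * rj for rj in rho]] + [
    [rho[i] * C] + [rho[i] * rho[j] * V2 + Sigma[i][j] for j in range(l)]
    for i in range(l)]
rhs = [V1 * alpha + C * beta] + [ri * (C * alpha + V2 * beta) for ri in rho]

coef = solve(M, rhs)
a, b = coef[0], coef[1:]
rho_b = sum(ri * bi for ri, bi in zip(rho, b))

# Closed forms from the proof.
schur = V2 - C * C / V1                                    # V2 - C'V1^{-1}C
r = 1 / sum(ri * vi for ri, vi in zip(rho, solve(Sigma, rho)))
rho_b_formula = schur / (schur + r) * beta
a_formula = alpha + (C / (V1 - C * C / V2)) * (schur / V2) * r / (schur + r) * beta

print(a, a_formula)          # both print 11/9
print(rho_b, rho_b_formula)  # both print 14/9
```

In this example the regression on all proxies reproduces the probability limit for a that the error-variance-minimizing index delivers, without requiring knowledge of ρ or Σ to construct that index.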

Published Online: 2014-3-13
Published in Print: 2015-1-1

©2015 by De Gruyter

Downloaded on 13.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/jem-2012-0008/html