Panel conditional and multinomial logit with time-varying parameters

  • Myoung-jae Lee
Published/Copyright: November 1, 2014

Abstract

Panel conditional logit estimators (PCLE) in the literature mostly use time-constant parameters. If the panel periods are volatile or long, however, the model parameters can change substantially. Hence this paper generalizes PCLE with time-constant parameters to PCLE with time-varying parameters; both static and dynamic PCLE are considered. The main finding is that time-varying parameters are fully allowed for static PCLE and for the dynamic “pseudo” PCLE of [Bartolucci, F. and V. Nigro. 2010. “A Dynamic Model for Binary Panel Data with Unobserved Heterogeneity Admitting a √n-Consistent Conditional Estimator.” Econometrica 78: 719–733], which are thus recommended to practitioners. As a further generalization, the static “panel conditional multinomial logit estimator (PML)” with time-varying parameters is also examined. As it turns out, time-varying parameters are also fully allowed for PML. Since PCLE allows no serial correlation in the error term and dynamic PCLE is restrictive in its assumptions, time-varying parameters provide an alternative avenue to inject dynamics and flexibility into PCLE and PML. Since PCLE and PML converge straightforwardly in computation, allowing time-varying parameters in PCLE and PML is “computationally free.” A simulation study is also provided.

JEL Classification: C14; C33; C35

Corresponding author: Myoung-jae Lee, Department of Economics, Korea University, Seoul 136-701, South Korea, Phone: +82-2-3290-2229, Fax: +82-2-926-3601, e-mail:

Acknowledgments

The author is grateful to Yoosoon Chang and two anonymous reviewers for their comments. The research for this paper has been supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2011-327-B00072).

Appendix

Likelihood function for static PCLE

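The joint probability for $(Y_i=\Lambda)\mid(\delta_i, X_i)$ is

$$
\begin{aligned}
P(Y_i=\Lambda\mid\delta_i,X_i)
&=\prod_t P(y_{it}=\lambda_t\mid\delta_i,X_i)=\prod_t P(y_{it}=\lambda_t\mid\delta_i,x_{it})\\
&=\prod_t\left\{\frac{1}{1+\exp(x_{it}'\beta_t+\delta_i)}\right\}^{1-\lambda_t}\left\{\frac{\exp(x_{it}'\beta_t+\delta_i)}{1+\exp(x_{it}'\beta_t+\delta_i)}\right\}^{\lambda_t}\\
&=\prod_t\frac{\exp\{\lambda_t(x_{it}'\beta_t+\delta_i)\}}{1+\exp(x_{it}'\beta_t+\delta_i)}
=\frac{\exp(\delta_i\sum_t\lambda_t)\,\exp(\sum_t\lambda_t x_{it}'\beta_t)}{\prod_t\{1+\exp(x_{it}'\beta_t+\delta_i)\}}.
\end{aligned}
\tag{A.1}
$$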

Write $P(Y=\Lambda\mid\delta, X)$ just as $P(\Lambda\mid\delta, X)$ and replace $\Lambda$ with $Y$ to get the joint likelihood:

$$
P(Y\mid\delta,X)=\frac{\exp(\delta\sum_t y_t)\,\exp(\sum_t y_t x_t'\beta_t)}{\prod_t\{1+\exp(x_t'\beta_t+\delta)\}}.
\tag{A.2}
$$

The probability of the sum of the random responses taking the sample value $\sum_t y_t$ is

$$
P\Big(\sum_t y_t\,\Big|\,\delta,X\Big)=\sum_{\bar\lambda=\bar y}P(\lambda_1,\ldots,\lambda_T\mid\delta,X).
\tag{A.3}
$$

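Substituting (A.1) into (A.3) and then using $\sum_t\lambda_t=\sum_t y_t$, (A.3) becomes

$$
\begin{aligned}
P\Big(\sum_t y_t\,\Big|\,\delta,X\Big)
&=\sum_{\bar\lambda=\bar y}\frac{\exp(\delta\sum_t\lambda_t)\,\exp(\sum_t\lambda_t x_t'\beta_t)}{\prod_t\{1+\exp(x_t'\beta_t+\delta)\}}\\
&=\frac{\exp(\delta\sum_t y_t)\,\sum_{\bar\lambda=\bar y}\exp(\sum_t\lambda_t x_t'\beta_t)}{\prod_t\{1+\exp(x_t'\beta_t+\delta)\}}.
\end{aligned}
\tag{A.4}
$$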

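Divide $P(Y\mid\delta, X)$ in (A.2) by $P(\sum_t y_t\mid\delta, X)$ in (A.4) to obtain $P(Y\mid\sum_t y_t, \delta, X)=P(Y\mid\sum_t y_t, X)$ in (2.2). The division removes two common terms, $\exp(\delta\sum_t y_t)$ and $\prod_t\{1+\exp(x_t'\beta_t+\delta)\}$, and as the ratio is free of $\delta$, $\sum_t y_t$ is a sufficient statistic for $\delta$ given $X$.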
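The cancellation can be checked numerically. The following is a minimal sketch, not the paper's code: the index values in `xb` (standing for $x_t'\beta_t$), the response sequence `y`, and the function names are all hypothetical. It evaluates (A.2) and (A.3) by brute-force enumeration for several values of δ and compares their ratio with the δ-free expression.

```python
import itertools
import math


def joint_prob(y, xb, delta):
    """(A.2): P(Y = y | delta, X); xb[t] stands for the index x_t' beta_t."""
    num = math.exp(delta * sum(y) + sum(yt * v for yt, v in zip(y, xb)))
    den = math.prod(1.0 + math.exp(v + delta) for v in xb)
    return num / den


def sum_prob(s, xb, delta):
    """(A.3): add (A.2) over every 0/1 sequence whose sum equals s."""
    return sum(
        joint_prob(lam, xb, delta)
        for lam in itertools.product((0, 1), repeat=len(xb))
        if sum(lam) == s
    )


def conditional_prob(y, xb):
    """Ratio (A.2)/(A.4) after cancelling the two common terms; no delta left."""
    num = math.exp(sum(yt * v for yt, v in zip(y, xb)))
    den = sum(
        math.exp(sum(l * v for l, v in zip(lam, xb)))
        for lam in itertools.product((0, 1), repeat=len(xb))
        if sum(lam) == sum(y)
    )
    return num / den


xb = [0.3, -0.5, 1.2, 0.1]  # hypothetical values of x_t' beta_t for T = 4
y = (1, 0, 1, 0)
for delta in (-2.0, 0.0, 3.5):
    ratio = joint_prob(y, xb, delta) / sum_prob(sum(y), xb, delta)
    print(delta, ratio, conditional_prob(y, xb))  # the last two columns agree
```

The enumeration over all $2^T$ sequences is only for illustration; it becomes impractical as $T$ grows, which is what the recursive algorithm in the next subsection addresses.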

Recursive algorithm for static PCLE

We show first how to obtain the “pre-normalization” denominator $\sum_{\bar\lambda=\bar y_i}\exp(\sum_t\lambda_t x_{it}'\beta_t)$, and then how to normalize this; $T$ can be allowed to vary across $i$ (i.e., $T_i$) for unbalanced panels or cross-section data with a group structure. The key point of the algorithm is that

$$
G\Big(T,\sum_t y_{it}\Big)\equiv\sum_{\bar\lambda=\bar y_i}\exp\Big(\sum_t\lambda_t x_{it}'\beta_t\Big)
$$

satisfies the recursive formula

$$
G\Big(T,\sum_t y_{it}\Big)=G\Big(T-1,\sum_t y_{it}\Big)+G\Big(T-1,\sum_t y_{it}-1\Big)\exp(x_{iT}'\beta_T)
$$

where $G(T,\sum_t y_{it})\equiv 0$ when $T<\sum_t y_{it}$ or $\sum_t y_{it}<0$, and $G(T,0)\equiv 1$.

Using this formula, $G(T,\sum_t y_{it})$ can be computed for each $i$ and for each value of $\beta$. As was stated in the main text, this algorithm is a modified version of Krailo and Pike (1984).
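A minimal sketch of the recursion follows, assuming the indices $x_{it}'\beta_t$ for one individual have already been computed and stored in a list `xb` (a hypothetical name, as are the function names). This is an illustration of the recursive formula only, not the Krailo and Pike (1984) routine itself; a brute-force sum over all allocations of the ones is included as a check.

```python
import itertools
import math


def G_recursive(xb, s):
    """G(T, s) via G(T, s) = G(T-1, s) + G(T-1, s-1) * exp(x_T' beta_T).

    xb[t] stands for x_it' beta_t; s is the number of ones, sum_t y_it.
    """
    T = len(xb)
    if s < 0 or s > T:
        return 0.0  # boundary convention: G = 0 outside 0 <= s <= T
    # g[k] holds G(t, k) for the periods processed so far; G(0, 0) = 1.
    g = [1.0] + [0.0] * s
    for t in range(T):
        e = math.exp(xb[t])
        # Go from high k to low k so that g[k - 1] is still the previous-period value.
        for k in range(s, 0, -1):
            g[k] += g[k - 1] * e
    return g[s]


def G_bruteforce(xb, s):
    """Direct definition: sum over every way to place s ones in T periods."""
    return sum(
        math.exp(sum(xb[t] for t in ones))
        for ones in itertools.combinations(range(len(xb)), s)
    )


xb = [0.3, -0.5, 1.2, 0.1]  # hypothetical values of x_it' beta_t
for s in range(len(xb) + 1):
    print(s, G_recursive(xb, s), G_bruteforce(xb, s))  # the two columns agree
```

The recursion needs on the order of $T\sum_t y_{it}$ multiplications, whereas the direct sum has $\binom{T}{\sum_t y_{it}}$ terms, which is the reason to prefer it when $T$ is not small.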

The logic for the formula is simple. Suppose there are two ones in four periods and consider allocating the two ones over four periods. Then the formula becomes

$$
G(4,2)=G(3,2)+G(3,1)\exp(x_{i4}'\beta_4):
$$

‘two ones over four’ = ‘two ones over the first three, and zero in the last’ + ‘one over the first three, and one in the last’;

$\exp(x_{i4}'\beta_4)$ is for “one in the last.” The following special cases with $T=2,3$ will be helpful.

To see that the recursive formula holds, suppose $T=2$. From the definition of $G(\cdot,\cdot)$,

$$
\begin{aligned}
&G(T-1,0)=G(1,0)=1 \quad\text{and}\quad G(T-1,1)=G(1,1)=\exp(x_{i1}'\beta_1);\\
&G(T,0)=G(2,0)=1,\\
&G(T,1)=G(2,1)=\exp(x_{i1}'\beta_1)+\exp(x_{i2}'\beta_2),\\
&G(T,2)=G(2,2)=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2).
\end{aligned}
$$

The recursive formula holds because (the left-hand side from the last display)

$$
\begin{aligned}
G(2,0)&=G(1,0)+G(1,-1)\exp(x_{i2}'\beta_2)=1+0=1;\\
G(2,1)&=G(1,1)+G(1,0)\exp(x_{i2}'\beta_2)=\exp(x_{i1}'\beta_1)+\exp(x_{i2}'\beta_2);\\
G(2,2)&=G(1,2)+G(1,1)\exp(x_{i2}'\beta_2)=0+\exp(x_{i1}'\beta_1+x_{i2}'\beta_2).
\end{aligned}
$$

Now suppose $T=3$. Then, from the definition of $G$,

$$
\begin{aligned}
&G(T-1,0)=G(2,0)=1,\\
&G(T-1,1)=G(2,1)=\exp(x_{i1}'\beta_1)+\exp(x_{i2}'\beta_2),\\
&G(T-1,2)=G(2,2)=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2),\\
&G(T,0)=G(3,0)=1,\\
&G(T,1)=G(3,1)=\exp(x_{i1}'\beta_1)+\exp(x_{i2}'\beta_2)+\exp(x_{i3}'\beta_3),\\
&G(T,2)=G(3,2)=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2)+\exp(x_{i1}'\beta_1+x_{i3}'\beta_3)+\exp(x_{i2}'\beta_2+x_{i3}'\beta_3),\\
&G(T,3)=G(3,3)=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2+x_{i3}'\beta_3).
\end{aligned}
$$

The recursive formula holds because (the left-hand side from the last display)

$$
\begin{aligned}
G(3,0)&=G(2,0)+G(2,-1)\exp(x_{i3}'\beta_3)=1+0=1;\\
G(3,1)&=G(2,1)+G(2,0)\exp(x_{i3}'\beta_3)=\exp(x_{i1}'\beta_1)+\exp(x_{i2}'\beta_2)+\exp(x_{i3}'\beta_3);\\
G(3,2)&=G(2,2)+G(2,1)\exp(x_{i3}'\beta_3)\\
&=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2)+\{\exp(x_{i1}'\beta_1)+\exp(x_{i2}'\beta_2)\}\exp(x_{i3}'\beta_3)\\
&=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2)+\exp(x_{i1}'\beta_1+x_{i3}'\beta_3)+\exp(x_{i2}'\beta_2+x_{i3}'\beta_3);\\
G(3,3)&=G(2,3)+G(2,2)\exp(x_{i3}'\beta_3)=\exp(x_{i1}'\beta_1+x_{i2}'\beta_2+x_{i3}'\beta_3).
\end{aligned}
$$
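Applying the formula once more to the $T=3$ values gives the four-period example used earlier to motivate the recursion. The display below is a worked sketch; the shorthand $e_t$ is introduced here only for brevity and is not the paper's notation.

```latex
% Shorthand (an assumption of this sketch): e_t = \exp(x_{it}'\beta_t)
\begin{aligned}
G(3,1) &= e_1+e_2+e_3, \qquad G(3,2)=e_1e_2+e_1e_3+e_2e_3,\\
G(4,2) &= G(3,2)+G(3,1)\,e_4\\
       &= e_1e_2+e_1e_3+e_2e_3+e_1e_4+e_2e_4+e_3e_4 .
\end{aligned}
```

The six terms correspond to the $\binom{4}{2}=6$ ways of placing two ones in four periods: the first three have a zero in the last period and the last three have a one there.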

With time-constant regressors $c_i$ in $x_{it}$, do the normalization with $x_{i1}'\beta_1$:

$$
\text{set } x_{i1}'\beta_1=0 \text{ and replace } x_{it}'\beta_t \text{ by } x_{it}'\beta_t-x_{i1}'\beta_1 \quad \forall\, t=2,\ldots,T
\tag{A.5}
$$

where $c_i$ appears as $c_i'(\beta_{tc}-\beta_{1c})$ in $x_{it}'\beta_t-x_{i1}'\beta_1$. With this modification for the denominator of the likelihood function, the normalized version $\exp\{\sum_{t\ge 2}y_{it}(x_{it}'\beta_t-x_{i1}'\beta_1)\}$ should be used accordingly for the numerator of the likelihood function.
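The normalization leaves the conditional probability unchanged, because subtracting $x_{i1}'\beta_1$ from every index multiplies the numerator and every term of the denominator by the same factor $\exp(-x_{i1}'\beta_1\sum_t y_{it})$. A minimal numerical sketch of this invariance, with hypothetical index values in `xb` and hypothetical function names:

```python
import itertools
import math


def conditional_prob(y, xb):
    """P(Y = y | sum_t y_t, X) with xb[t] standing for x_it' beta_t."""
    num = math.exp(sum(yt * v for yt, v in zip(y, xb)))
    den = sum(
        math.exp(sum(l * v for l, v in zip(lam, xb)))
        for lam in itertools.product((0, 1), repeat=len(xb))
        if sum(lam) == sum(y)
    )
    return num / den


def normalize(xb):
    """(A.5): subtract the first-period index from every period's index."""
    return [v - xb[0] for v in xb]


xb = [2.0, 2.3, 1.1, 2.8]  # hypothetical indices x_it' beta_t
xb_norm = normalize(xb)    # first entry becomes 0
y = (0, 1, 1, 0)
print(conditional_prob(y, xb), conditional_prob(y, xb_norm))  # same value twice
```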

Likelihood for dynamic PCLE with no regressors

With $T=3$ and $\beta_t=0$, the likelihood function for $(y_1, y_2, y_3)\mid(y_0, \delta)$ is

$$
P(y_1,y_2,y_3\mid y_0,\delta)=\frac{\exp\{(\alpha_1y_0y_1+\alpha_2y_1y_2+\alpha_3y_2y_3)+\delta(y_1+y_2+y_3)\}}{\{1+\exp(\alpha_1y_0+\delta)\}\{1+\exp(\alpha_2y_1+\delta)\}\{1+\exp(\alpha_3y_2+\delta)\}}.
\tag{A.6}
$$

Given $(y_0, \delta)$, consider $(y_1=0, y_2=1, y_3)$ and $(y_1=1, y_2=0, y_3)$:

$$
\begin{aligned}
P(y_1=0,y_2=1,y_3\mid y_0,\delta)&=\frac{\exp\{\alpha_3y_3+\delta(1+y_3)\}}{\{1+\exp(\alpha_1y_0+\delta)\}\{1+\exp(\delta)\}\{1+\exp(\alpha_3+\delta)\}};\\
P(y_1=1,y_2=0,y_3\mid y_0,\delta)&=\frac{\exp\{\alpha_1y_0+\delta(1+y_3)\}}{\{1+\exp(\alpha_1y_0+\delta)\}\{1+\exp(\alpha_2+\delta)\}\{1+\exp(\delta)\}}.
\end{aligned}
\tag{A.7}
$$

The two denominators are the same under $\alpha_2=\alpha_3$, and thus only the numerators matter in the ratio of the first probability to the sum of the two probabilities in (A.7):

$$
\begin{aligned}
&\frac{P(y_1=0,y_2=1,y_3\mid y_0,\delta)}{P(y_1=1,y_2=0,y_3\mid y_0,\delta)+P(y_1=0,y_2=1,y_3\mid y_0,\delta)}\\
&\quad=\frac{\exp\{\alpha_3y_3+\delta(1+y_3)\}}{\exp\{\alpha_1y_0+\delta(1+y_3)\}+\exp\{\alpha_3y_3+\delta(1+y_3)\}}
=\frac{\exp(\alpha_3y_3-\alpha_1y_0)}{1+\exp(\alpha_3y_3-\alpha_1y_0)}.
\end{aligned}
\tag{A.8}
$$

(A.8) can also be written as

$$
\frac{P(y_1=0,y_2=1,y_1+y_2=1,y_3\mid y_0,\delta)}{P(y_1+y_2=1,y_3\mid y_0,\delta)}=P(y_1=0,y_2=1\mid y_1+y_2=1,y_0,y_3,\delta).
$$

Hence

$$
P(y_1=0,y_2=1\mid y_1+y_2=1,y_0,y_3,\delta)=\frac{\exp(\alpha_3y_3-\alpha_1y_0)}{1+\exp(\alpha_3y_3-\alpha_1y_0)}.
$$

The conditional log-likelihood function for $(\alpha_1, \alpha_3)$ under $\alpha_2=\alpha_3$ with four waves is thus (Lc3); $1[y_{i1}+y_{i2}=1]$ is to use only the informative observations ($y_{i1}\neq y_{i2}$) as in (2.4).
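The role of the restriction $\alpha_2=\alpha_3$ can be illustrated numerically. The sketch below uses hypothetical parameter values and function names; it evaluates (A.6) directly, forms the ratio in (A.8), and compares it with the closed form. With $\alpha_2=\alpha_3$ the ratio does not move with δ; with $\alpha_2\neq\alpha_3$ it does.

```python
import math


def seq_prob(y, y0, alpha, delta):
    """(A.6): P(y1, y2, y3 | y0, delta) with alpha = (alpha1, alpha2, alpha3)."""
    lags = (y0,) + tuple(y[:-1])  # y_{t-1} for t = 1, 2, 3
    p = 1.0
    for a, lag, yt in zip(alpha, lags, y):
        idx = a * lag + delta
        p *= math.exp(yt * idx) / (1.0 + math.exp(idx))
    return p


def cond_prob(y0, y3, alpha, delta):
    """Left-hand side of (A.8): share of (0, 1, y3) among the two sequences."""
    p01 = seq_prob((0, 1, y3), y0, alpha, delta)
    p10 = seq_prob((1, 0, y3), y0, alpha, delta)
    return p01 / (p01 + p10)


def closed_form(y0, y3, a1, a3):
    """Right-hand side of (A.8)."""
    z = a3 * y3 - a1 * y0
    return math.exp(z) / (1.0 + math.exp(z))


alpha = (0.7, -0.4, -0.4)  # hypothetical values with alpha2 = alpha3
for delta in (-1.5, 0.0, 2.0):
    print(delta, cond_prob(1, 1, alpha, delta), closed_form(1, 1, alpha[0], alpha[2]))

alpha_bad = (0.7, 0.9, -0.4)  # alpha2 != alpha3: the ratio now varies with delta
print(cond_prob(1, 1, alpha_bad, -1.5), cond_prob(1, 1, alpha_bad, 2.0))
```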

Likelihood for dynamic PCLE with the same last two-period regressors

Analogous to (A.7) is the likelihood for $(y_1, y_2, y_3)\mid(y_0, \delta, X)$ with $y_{t-1}w_t'\zeta_t$ included:

$$
\begin{aligned}
P(y_1=0,y_2=1,y_3\mid y_0,\delta,X)&=\frac{\exp\{\alpha_3y_3+y_3w_3'\zeta_3+(x_2'\beta_2+y_3x_3'\beta_3)+\delta(1+y_3)\}}{\{1+\exp(\alpha_1y_0+y_0w_1'\zeta_1+x_1'\beta_1+\delta)\}\{1+\exp(x_2'\beta_2+\delta)\}\{1+\exp(\alpha_3+w_3'\zeta_3+x_3'\beta_3+\delta)\}};\\
P(y_1=1,y_2=0,y_3\mid y_0,\delta,X)&=\frac{\exp\{\alpha_1y_0+y_0w_1'\zeta_1+(x_1'\beta_1+y_3x_3'\beta_3)+\delta(1+y_3)\}}{\{1+\exp(\alpha_1y_0+y_0w_1'\zeta_1+x_1'\beta_1+\delta)\}\{1+\exp(\alpha_2+w_2'\zeta_2+x_2'\beta_2+\delta)\}\{1+\exp(x_3'\beta_3+\delta)\}}.
\end{aligned}
$$

The two denominators are equal if $\alpha_2=\alpha_3$, $\beta_2=\beta_3$, $\zeta_2=\zeta_3$ and $x_2=x_3$, under which we can proceed analogously to the preceding no-regressor case.

The ratio of the first probability to the sum of the two probabilities given $\alpha_2=\alpha_3$, $\beta_2=\beta_3$, $\zeta_2=\zeta_3$ and $x_2=x_3$ is

$$
\begin{aligned}
&\frac{P(y_1=0,y_2=1,y_3\mid y_0,\delta,X,x_2=x_3)}{P(y_1=1,y_2=0,y_3\mid y_0,\delta,X,x_2=x_3)+P(y_1=0,y_2=1,y_3\mid y_0,\delta,X,x_2=x_3)}\\
&\quad=P(y_1=0,y_2=1\mid y_0,y_3,y_1+y_2=1,\delta,X,x_2=x_3).
\end{aligned}
$$

With $A\equiv\exp\{\alpha_3y_3+y_3w_3'\zeta_3+x_2'\beta_2+y_3x_3'\beta_3+\delta(1+y_3)\}$, this equals

$$
\frac{A}{\exp\{\alpha_1y_0+y_0w_1'\zeta_1+x_1'\beta_1+y_3x_3'\beta_3+\delta(1+y_3)\}+A}
=\frac{\exp(\alpha_3y_3-\alpha_1y_0+y_3w_3'\zeta_3-y_0w_1'\zeta_1+x_2'\beta_2-x_1'\beta_1)}{1+\exp(\alpha_3y_3-\alpha_1y_0+y_3w_3'\zeta_3-y_0w_1'\zeta_1+x_2'\beta_2-x_1'\beta_1)}.
$$

Here, the parameters are $(\alpha_1, \alpha_3, \zeta_1, \zeta_3, \beta_1, \beta_2)$ for the regressors $(-y_0, y_3, -y_0w_1, y_3w_3, -x_1, x_2)$. This leads to (3.5), which becomes (Lhk3) when $\zeta_t=0\ \forall t$.
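The same kind of numerical check works with regressors. In the sketch below, `wz[t]` stands for $w_t'\zeta_t$ and `xb[t]` for $x_t'\beta_t$; all values and names are hypothetical. The sketch additionally assumes $w_2'\zeta_2=w_3'\zeta_3$, as would hold if $w_t$ is a subvector of $x_t$; that assumption is made here for the illustration and is not stated in this form in the text.

```python
import math


def seq_prob(y, y0, alpha, wz, xb, delta):
    """P(y1, y2, y3 | y0, delta, X) for a logit with period-t index
    alpha_t * y_{t-1} + y_{t-1} * wz_t + xb_t + delta."""
    lags = (y0,) + tuple(y[:-1])  # y_{t-1} for t = 1, 2, 3
    p = 1.0
    for a, z, v, lag, yt in zip(alpha, wz, xb, lags, y):
        idx = a * lag + lag * z + v + delta
        p *= math.exp(yt * idx) / (1.0 + math.exp(idx))
    return p


def cond_prob(y0, y3, alpha, wz, xb, delta):
    """Share of the sequence (0, 1, y3) among (0, 1, y3) and (1, 0, y3)."""
    p01 = seq_prob((0, 1, y3), y0, alpha, wz, xb, delta)
    p10 = seq_prob((1, 0, y3), y0, alpha, wz, xb, delta)
    return p01 / (p01 + p10)


def closed_form(y0, y3, alpha, wz, xb):
    """Logit form with regressors (-y0, y3, -y0*w1, y3*w3, -x1, x2)."""
    z = alpha[2] * y3 - alpha[0] * y0 + y3 * wz[2] - y0 * wz[0] + xb[1] - xb[0]
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical values; periods 2 and 3 share alpha, w'zeta and x'beta.
alpha = (0.7, -0.4, -0.4)
wz = (0.2, 0.5, 0.5)
xb = (-0.3, 0.6, 0.6)
for delta in (-1.5, 0.0, 2.0):
    print(delta, cond_prob(1, 1, alpha, wz, xb, delta), closed_form(1, 1, alpha, wz, xb))
```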

Likelihood for dynamic PCLE conditional on yT of Bartolucci and Nigro (2010)

Under (3.6),

$$
\begin{aligned}
P(Y\mid y_0,\delta,X)
&=\prod_t P(y_t\mid y_{t-1},\delta,X)\\
&=\prod_t\frac{\exp\{\alpha_ty_{t-1}y_t+y_{t-1}y_tw_t'\zeta_t+y_tx_t'\beta_t+\delta y_t+y_te_t^*(\delta,X)\}}{1+\exp\{\alpha_ty_{t-1}+y_{t-1}w_t'\zeta_t+x_t'\beta_t+\delta+e_t^*(\delta,X)\}}\\
&=\frac{\prod_t\exp\{\alpha_ty_{t-1}y_t+y_{t-1}y_tw_t'\zeta_t+y_tx_t'\beta_t+\delta y_t\}\,\prod_t[\exp\{e_t^*(\delta,X)\}]^{y_t}}{\prod_t[1+\exp\{\alpha_ty_{t-1}+y_{t-1}w_t'\zeta_t+x_t'\beta_t+\delta+e_t^*(\delta,X)\}]}.
\end{aligned}
$$

$\prod_t[\exp\{e_t^*(\delta,X)\}]^{y_t}$ in the numerator equals

$$
\left[\frac{1+\exp\{\alpha_2+w_2'\zeta_2+x_2'\beta_2+\delta+e_2^*(\delta,X)\}}{1+\exp\{x_2'\beta_2+\delta+e_2^*(\delta,X)\}}\right]^{y_1}\cdots
\left[\frac{1+\exp\{\alpha_T+w_T'\zeta_T+x_T'\beta_T+\delta+e_T^*(\delta,X)\}}{1+\exp\{x_T'\beta_T+\delta+e_T^*(\delta,X)\}}\right]^{y_{T-1}}
[\exp\{e_T^*(\delta,X)\}]^{y_T}.
$$

The denominator of $P(Y\mid y_0, \delta, X)$ can be written as

$$
\begin{aligned}
&\prod_t[1+\exp\{\alpha_t+w_t'\zeta_t+x_t'\beta_t+\delta+e_t^*(\delta,X)\}]^{y_{t-1}}[1+\exp\{x_t'\beta_t+\delta+e_t^*(\delta,X)\}]^{1-y_{t-1}}\\
&\quad=\prod_t\left[\frac{1+\exp\{\alpha_t+w_t'\zeta_t+x_t'\beta_t+\delta+e_t^*(\delta,X)\}}{1+\exp\{x_t'\beta_t+\delta+e_t^*(\delta,X)\}}\right]^{y_{t-1}}[1+\exp\{x_t'\beta_t+\delta+e_t^*(\delta,X)\}].
\end{aligned}
$$

Substituting these two displays gives

$$
P(Y\mid y_0,\delta,X)=\mu(y_0,y_T,\delta,X)\prod_t\exp\{\alpha_ty_{t-1}y_t+y_{t-1}y_tw_t'\zeta_t+y_tx_t'\beta_t+\delta y_t\}
$$

where $\mu(y_0, y_T, \delta, X)$ equals

$$
\left[\frac{1+\exp\{\alpha_1+w_1'\zeta_1+x_1'\beta_1+\delta+e_1^*(\delta,X)\}}{1+\exp\{x_1'\beta_1+\delta+e_1^*(\delta,X)\}}\right]^{-y_0}
\frac{[\exp\{e_T^*(\delta,X)\}]^{y_T}}{\prod_t[1+\exp\{x_t'\beta_t+\delta+e_t^*(\delta,X)\}]}.
$$

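From this $P(Y\mid y_0, \delta, X)$, $P(\sum_t y_t, y_T\mid y_0, \delta, X)$ becomes

$$
\mu(y_0,y_T,\delta,X)\sum_{\bar\lambda_{1,T-1}=\bar y}\exp\Big\{\sum_t\alpha_t\lambda_{t-1}\lambda_t+\sum_t\lambda_{t-1}\lambda_tw_t'\zeta_t+\sum_t\lambda_tx_t'\beta_t+\delta\sum_t\lambda_t\Big\}.
$$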

Dividing $P(Y\mid y_0, \delta, X)$ by $P(\sum_t y_t, y_T\mid y_0, \delta, X)$ removes $\mu(y_0, y_T, \delta, X)$ and $\delta\sum_t y_t$ to result in (3.8) after the normalization with $\exp(\sum_t y_tx_1'\beta_1)=\exp(\sum_t\lambda_tx_1'\beta_1)$ and $\exp(y_Tx_T'\beta_T)=\exp(\lambda_Tx_T'\beta_T)$. When $T=3$, recalling (Lhk3) to use $y_1y_2=\lambda_1\lambda_2=0$ for the two useful sequences $(y_0, 0, 1, y_3)$ and $(y_0, 1, 0, y_3)$, (3.8) becomes (3.9).

Likelihood for PML with three alternatives and two waves

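The likelihood function for $Y=(y_{a1}, y_{b1}, y_{c1}, y_{a2}, y_{b2}, y_{c2})'$ is (note $\sum_j y_{jt}=1\ \forall t$)

$$
\begin{aligned}
P(Y\mid W,\delta_a,\delta_b,\delta_c)
&=\prod_j\left(\frac{\exp(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)}{\sum_j\exp(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)}\right)^{y_{j1}}\left(\frac{\exp(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)}{\sum_j\exp(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)}\right)^{y_{j2}}\\
&=\frac{\exp\{\sum_jy_{j1}(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)+\sum_jy_{j2}(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)\}}{\sum_j\exp(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)\,\sum_j\exp(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)}\\
&=\frac{\exp\{\sum_jy_{j1}w_{j1}'\gamma_{j1}+\sum_jy_{j2}w_{j2}'\gamma_{j2}+\sum_j(y_{j1}+y_{j2})(\delta_j-\delta_a)\}}{\sum_j\exp(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)\,\sum_j\exp(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)};
\end{aligned}
\tag{A.9}
$$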

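$y_{j1}+y_{j2}$, $j=a, b, c$, are candidates to condition on to remove $\delta_j-\delta_a$, $j=a, b, c$. Observe

$$
\begin{aligned}
&P(y_{j1}+y_{j2},\ j=a,b,c\mid W,\delta_a,\delta_b,\delta_c)\\
&\quad=\sum_{\bar\lambda_j=\bar y_j\,\forall j}\frac{\exp\{\sum_j\lambda_{j1}w_{j1}'\gamma_{j1}+\sum_j\lambda_{j2}w_{j2}'\gamma_{j2}+\sum_j(\lambda_{j1}+\lambda_{j2})(\delta_j-\delta_a)\}}{\sum_j\exp(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)\,\sum_j\exp(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)}\\
&\quad=\frac{\exp\{\sum_j(y_{j1}+y_{j2})(\delta_j-\delta_a)\}\,\sum_{\bar\lambda_j=\bar y_j\,\forall j}\exp(\sum_j\lambda_{j1}w_{j1}'\gamma_{j1}+\sum_j\lambda_{j2}w_{j2}'\gamma_{j2})}{\sum_j\exp(w_{j1}'\gamma_{j1}+\delta_j-\delta_a)\,\sum_j\exp(w_{j2}'\gamma_{j2}+\delta_j-\delta_a)}.
\end{aligned}
\tag{A.10}
$$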

Dividing (A.9) by (A.10) renders (4.4).
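A minimal numerical sketch of this division, with hypothetical values: `v[t][j]` stands for $w_{jt}'\gamma_{jt}$ and `delta[j]` for $\delta_j$. With two waves, the only sequences sharing the counts $y_{j1}+y_{j2}$ for every $j$ are the observed pair of choices and the same pair in reverse order, so the sum in (A.10) has at most two terms.

```python
import math

ALTS = ("a", "b", "c")


def choice_prob(c1, c2, v, delta):
    """(A.9) for the sequence 'choose c1 in wave 1 and c2 in wave 2'."""
    p = 1.0
    for t, c in enumerate((c1, c2)):
        util = {j: v[t][j] + delta[j] - delta["a"] for j in ALTS}
        p *= math.exp(util[c]) / sum(math.exp(u) for u in util.values())
    return p


def cond_prob(c1, c2, v, delta):
    """(A.9) divided by (A.10): condition on y_j1 + y_j2 for every j."""
    same_counts = {(c1, c2), (c2, c1)}  # a set, so c1 == c2 gives a single term
    return choice_prob(c1, c2, v, delta) / sum(
        choice_prob(p, q, v, delta) for p, q in same_counts
    )


# Hypothetical index values for waves 1 and 2.
v = [{"a": 0.0, "b": 0.4, "c": -0.3}, {"a": 0.0, "b": -0.2, "c": 0.9}]
delta_1 = {"a": 0.0, "b": 0.0, "c": 0.0}
delta_2 = {"a": 1.3, "b": -2.0, "c": 0.8}
print(cond_prob("b", "c", v, delta_1), cond_prob("b", "c", v, delta_2))  # equal
```

Sequences that choose the same alternative in both waves have conditional probability one and therefore carry no information about the parameters, in parallel with the binary case.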

References

Anderson, E. B. 1970. “Asymptotic Properties of Conditional Maximum Likelihood Estimators.” Journal of the Royal Statistical Society (Series B) 32: 283–301. doi:10.1111/j.2517-6161.1970.tb00842.x

Arellano, M., and B. Honoré. 2001. “Panel Data Models: Some Recent Developments.” In Handbook of Econometrics 5, edited by J. J. Heckman and E. Leamer. North-Holland: Elsevier. doi:10.1016/S1573-4412(01)05006-1

Baltagi, B. H. 2013. Econometric Analysis of Panel Data, 5th ed. Chichester, West Sussex, UK: Wiley. doi:10.1002/9781118445112.stat03160

Bartolucci, F., and V. Nigro. 2010. “A Dynamic Model for Binary Panel Data with Unobserved Heterogeneity Admitting a √n-Consistent Conditional Estimator.” Econometrica 78: 719–733. doi:10.3982/ECTA7531

Bartolucci, F., and V. Nigro. 2012. “Pseudo Conditional Maximum Likelihood Estimation of the Dynamic Logit Model for Binary Panel Data.” Journal of Econometrics 170: 102–116. doi:10.1016/j.jeconom.2012.03.004

Cai, Z. 2007. “Trending Time-varying Coefficient Time Series Models with Serially Correlated Errors.” Journal of Econometrics 136: 163–188. doi:10.1016/j.jeconom.2005.08.004

Chamberlain, G. 1980. “Analysis of Covariance with Qualitative Data.” Review of Economic Studies 47: 225–238. doi:10.2307/2297110

Chamberlain, G. 1984. “Panel Data.” In Handbook of Econometrics 2, edited by Z. Griliches and M. Intrilligator. North Holland: Amsterdam.

Chamberlain, G. 1985. “Heterogeneity, Omitted Variable Bias and Duration Dependence.” In Longitudinal Analyses of Labor Market Data, edited by J. J. Heckman and B. Singer. San Diego: Academic Press. doi:10.1017/CCOL0521304539.001

Honoré, B. E., and E. Kyriazidou. 2000. “Panel Data Discrete Choice Models with Lagged Dependent Variables.” Econometrica 68: 839–874. doi:10.1111/1468-0262.00139

Hoover, D. R., J. A. Rice, C. O. Wu, and L. P. Yang. 1998. “Nonparametric Smoothing Estimates of Time-varying Coefficient Models with Longitudinal Data.” Biometrika 85: 809–822. doi:10.1093/biomet/85.4.809

Hsiao, C. 2003. Analysis of Panel Data, 2nd ed. Cambridge, UK: Cambridge University Press.

Krailo, M. D., and M. C. Pike. 1984. “Algorithm AS 196: Conditional Multivariate Logistic Analysis of Stratified Case-control Studies.” Journal of the Royal Statistical Society (Series C) 33: 95–103. doi:10.2307/2347671

Lechner, M., S. Lollivier, and T. Magnac. 2008. “Parametric Binary Choice Models.” In The Econometrics of Panel Data, Chapter 7, edited by L. Mátyás and P. Sevestre. Heidelberg, Germany: Springer. doi:10.1007/978-3-540-75892-1_7

Lee, M. J. 2002. Panel Data Econometrics: Methods-of-moments and Limited Dependent Variables. San Diego, CA: Academic Press.

Lee, M. J. 2014. “Panel Conditional and Multinomial Logit Estimators.” In The Oxford Handbook of Panel Data Econometrics, edited by B. Baltagi. Oxford University Press, accepted for publication. doi:10.1093/oxfordhb/9780199940042.013.0007

Lee, M. J., and Y. S. Kim. 2007. “Multinomial Choice and Nonparametric Average Derivatives.” Transportation Research Part B: Methodological 41: 63–81. doi:10.1016/j.trb.2006.03.003

Rasch, G. 1961. “On General Law and the Meaning of Measurement in Psychology.” Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 4: 321–333.

Wu, C. O., and K. F. Yu. 2002. “Nonparametric Varying-coefficient Models for the Analysis of Longitudinal Data.” International Statistical Review 70: 373–393. doi:10.1111/j.1751-5823.2002.tb00176.x


Supplemental Material

The online version of this article (DOI: https://doi.org/10.1515/snde-2014-0003) offers supplementary material, available to authorized users.


Published Online: 2014-11-01
Published in Print: 2015-06-01

©2015 by De Gruyter
