Elsevier

Journal of Econometrics

Volume 122, Issue 2, October 2004, Pages 281-291
Regression systems for unbalanced panel data: a stepwise maximum likelihood procedure

https://doi.org/10.1016/j.jeconom.2003.10.023

Abstract

The estimation of systems of regression equations with random individual effects from unbalanced panel data, where the unbalance is due to random attrition or accretion, by generalized least squares (GLS) and maximum likelihood (ML) is considered. In order to utilize the previous results for the balanced case, it is convenient to arrange the individuals in groups according to the number of times they are observed. It is shown that the GLS estimator can be interpreted as a matrix weighted average of the group-specific GLS estimators with weights equal to the inverse of their respective covariance matrices. A stepwise algorithm for solving the ML problem, which can be interpreted as a compromise between the solutions to the group-specific ML problems, is presented.

Introduction

Systems of regression equations have a long history in econometrics. Notable examples are systems of demand equations for inputs or consumer goods derived from producers’ or consumers’ optimizing behaviour. The reduced form of a structural model also has this format. This history is substantially shorter in panel data econometrics. Starting from the textbook model for single equations with balanced panel data and random effects (see, e.g., Greene (2003, Chapter 13)), Avery (1977), Baltagi (1980), and Magnus (1982) generalized this framework and the estimation procedures to ‘seemingly unrelated’ regression systems, with Magnus (1982) considering maximum likelihood (ML). These generalizations also assumed balanced panel data, which is a very restrictive assumption from a practical point of view, since data sets in which the time series of the different units have unequal length are often the rule rather than the exception. Single equation extensions of the textbook model to models with unbalanced panel data and random individual effects were considered in Biørn (1981) and Baltagi (1985); see also Verbeek and Nijman (1996). Substantial efficiency may be lost by dropping observations from an unbalanced data set to make it balanced; see Mátyás and Lovrics (1991) and Baltagi and Chang (1994).

The purpose of this paper is to integrate, for random effects situations, the regression system ML approach to balanced panel data with the single equation approach to unbalanced panel data, when the attrition or accretion is random. As a preliminary to the ML problem, the generalized least-squares (GLS) problem is considered. Time specific random effects, which often are of secondary interest for micro data from households, firms, etc., are ignored.

Model and notation

Consider a system of $G$ regression equations, indexed by $g=1,\dots,G$, with observations from an unbalanced panel with $N$ individuals, indexed by $i=1,\dots,N$. The individuals are observed in at least one and at most $P$ periods. Let $N_p$ denote the number of individuals observed in $p$ periods (not necessarily the same and not necessarily consecutive), $p=1,\dots,P$, and let $n$ be the total number of observations, i.e., $N=\sum_{p=1}^{P}N_p$ and $n=\sum_{p=1}^{P}N_p\,p$. Assume that the individuals are ordered in $P$ groups such that the $N_1$
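The bookkeeping behind $N_p$, $N$, and $n$ can be illustrated with a small sketch. The function name `panel_counts` and the toy identifiers are hypothetical and not taken from the paper; the input is assumed to be one individual identifier per observation.

```python
from collections import Counter

def panel_counts(obs_ids):
    """Return (N_p as a dict, N, n) from one individual id per observation."""
    # id -> number of periods in which that individual is observed
    periods_per_individual = Counter(obs_ids)
    # p -> number of individuals observed exactly p times (this is N_p)
    N_p = Counter(periods_per_individual.values())
    N = sum(N_p.values())                           # N = sum_p N_p
    n = sum(p * count for p, count in N_p.items())  # n = sum_p N_p * p
    return dict(N_p), N, n

# Hypothetical toy panel: "a" observed 3 times, "b" and "c" twice, "d" once.
obs_ids = ["a", "a", "a", "b", "b", "c", "c", "d"]
N_p, N, n = panel_counts(obs_ids)
```

Here $n$ equals the length of the observation list, which gives a quick consistency check on the grouping.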

GLS estimation

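Before addressing the ML problem, we consider the GLS problem for $\beta$ when $\Sigma_u$ and $\Sigma_\alpha$ are known, i.e., the problem of minimizing
$$Q=\sum_{p=1}^{P}\sum_{i\in I_p}\varepsilon_{i(p)}'\,\Omega_{\varepsilon(p)}^{-1}\,\varepsilon_{i(p)}$$
with respect to $\beta$. Since $\Omega_{\varepsilon(p)}^{-1}=K_p\otimes\Sigma_u^{-1}+J_p\otimes(\Sigma_u+p\Sigma_\alpha)^{-1}$, we can rewrite $Q$ as
$$Q=\sum_{p=1}^{P}\sum_{i\in I_p}\varepsilon_{i(p)}'\,[K_p\otimes\Sigma_u^{-1}]\,\varepsilon_{i(p)}+\sum_{p=1}^{P}\sum_{i\in I_p}\varepsilon_{i(p)}'\,[J_p\otimes\Sigma_{(p)}^{-1}]\,\varepsilon_{i(p)},$$
where, comparing the two expressions, $\Sigma_{(p)}=\Sigma_u+p\Sigma_\alpha$.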
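The inverse formula can be checked numerically. The sketch below rests on assumptions not spelled out in this excerpt: that $\Omega_{\varepsilon(p)}=I_p\otimes\Sigma_u+(e_pe_p')\otimes\Sigma_\alpha$ with $e_p$ a $p$-vector of ones, that $J_p=e_pe_p'/p$, and that $K_p=I_p-J_p$ (the usual random effects decomposition). The function names and the covariance matrices are illustrative only.

```python
import numpy as np

def omega(p, sigma_u, sigma_alpha):
    """Assumed covariance matrix of the stacked disturbances for p observations."""
    return np.kron(np.eye(p), sigma_u) + np.kron(np.ones((p, p)), sigma_alpha)

def omega_inverse(p, sigma_u, sigma_alpha):
    """Closed form K_p (x) Sigma_u^{-1} + J_p (x) (Sigma_u + p Sigma_alpha)^{-1}."""
    J = np.full((p, p), 1.0 / p)   # averaging matrix (assumed definition of J_p)
    K = np.eye(p) - J              # deviation-from-mean matrix (assumed K_p)
    sigma_p = sigma_u + p * sigma_alpha
    return np.kron(K, np.linalg.inv(sigma_u)) + np.kron(J, np.linalg.inv(sigma_p))

# Illustrative positive definite covariance matrices for G = 2 equations.
rng = np.random.default_rng(0)
G, p = 2, 3
A = rng.normal(size=(G, G))
B = rng.normal(size=(G, G))
sigma_u = A @ A.T + np.eye(G)
sigma_alpha = B @ B.T + np.eye(G)

product = omega(p, sigma_u, sigma_alpha) @ omega_inverse(p, sigma_u, sigma_alpha)
identity_error = np.abs(product - np.eye(G * p)).max()
```

Under these assumptions the closed form only requires inverting two $G\times G$ matrices per group, rather than one $Gp\times Gp$ matrix.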

GLS estimation for the individuals observed p times: We may, as a preliminary to full GLS estimation, apply GLS on the observations for the individuals observed p times, denoted as group p, separately
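The abstract's statement that full GLS is a matrix weighted average of the group-specific GLS estimators, with inverse covariance matrices as weights, can be illustrated on simulated data. This is a minimal sketch under assumptions of my own: each individual's system is stacked as $y_{i(p)}=X_{i(p)}\beta+\varepsilon_{i(p)}$ with periods as the outer index and equations as the inner index, $J_p$ and $K_p$ are defined as averaging and deviation matrices, and all names, dimensions, and parameter values are hypothetical.

```python
import numpy as np

def omega_inverse(p, sigma_u, sigma_alpha):
    """K_p (x) Sigma_u^{-1} + J_p (x) (Sigma_u + p Sigma_alpha)^{-1}."""
    J = np.full((p, p), 1.0 / p)
    K = np.eye(p) - J
    return (np.kron(K, np.linalg.inv(sigma_u))
            + np.kron(J, np.linalg.inv(sigma_u + p * sigma_alpha)))

def group_gls(X_list, y_list, W):
    """GLS on one group; X'WX is the inverse covariance matrix of the estimate."""
    XtWX = sum(X.T @ W @ X for X in X_list)
    XtWy = sum(X.T @ W @ y for X, y in zip(X_list, y_list))
    return np.linalg.solve(XtWX, XtWy), XtWX, XtWy

rng = np.random.default_rng(1)
G, k = 2, 3                                   # equations, coefficients (hypothetical)
sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma_alpha = np.array([[0.5, 0.1], [0.1, 0.5]])
beta_true = np.array([1.0, -2.0, 0.5])
group_sizes = {1: 6, 2: 5, 3: 4}              # p -> N_p

results = []
for p, N_p in group_sizes.items():
    W = omega_inverse(p, sigma_u, sigma_alpha)
    X_list, y_list = [], []
    for _ in range(N_p):
        X = rng.normal(size=(G * p, k))
        alpha = rng.multivariate_normal(np.zeros(G), sigma_alpha)   # individual effect
        u = rng.multivariate_normal(np.zeros(G), sigma_u, size=p)   # shape (p, G)
        eps = np.tile(alpha, p) + u.reshape(-1)                     # periods outer
        X_list.append(X)
        y_list.append(X @ beta_true + eps)
    results.append(group_gls(X_list, y_list, W))

# Full GLS pools the normal equations of all groups.
precision_total = sum(prec for _, prec, _ in results)
beta_full = np.linalg.solve(precision_total, sum(xty for _, _, xty in results))
# Matrix weighted average of group estimates, weights = inverse covariance matrices.
beta_weighted = np.linalg.solve(precision_total,
                                sum(prec @ b for b, prec, _ in results))
```

The two estimates agree up to rounding because each group's weight times its estimate reproduces that group's contribution to the pooled normal equations.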

ML estimation

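We next consider ML estimation of $\beta$, $\Sigma_\alpha$, and $\Sigma_u$ when assuming normality of the individual effects and the disturbances, i.e., $\alpha_i\sim\mathrm{IIN}(0,\Sigma_\alpha)$, $u_{it}\sim\mathrm{IIN}(0,\Sigma_u)$. Then the $\varepsilon_{i(p)}|X_{i(p)}$'s are independent across $i(p)$ and distributed as $N(0_{Gp,1},\Omega_{\varepsilon(p)})$. The log-likelihood functions of all $y$'s conditional on all $X$'s for the individuals in group $p$ and for all individuals then become, respectively,
$$L_{(p)}=-\frac{GN_pp}{2}\ln(2\pi)-\frac{N_p}{2}\ln|\Omega_{\varepsilon(p)}|-\frac{1}{2}Q_{(p)}(\beta,\Sigma_\alpha,\Sigma_u),$$
$$L=\sum_{p=1}^{P}L_{(p)}=-\frac{Gn}{2}\ln(2\pi)-\frac{1}{2}\sum_{p=1}^{P}N_p\ln|\Omega_{\varepsilon(p)}|-\frac{1}{2}Q(\beta,\Sigma_u,\Sigma_\alpha),$$
where $Q_{(p)}=Q_{(p)}$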
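As an illustration of how $L_{(p)}$ might be evaluated, the sketch below computes the group-$p$ log-likelihood from an array of residuals. It is not the paper's stepwise algorithm. It assumes the decomposition $\Omega_{\varepsilon(p)}=K_p\otimes\Sigma_u+J_p\otimes(\Sigma_u+p\Sigma_\alpha)$ with $J_p$ the averaging matrix, which implies $\ln|\Omega_{\varepsilon(p)}|=(p-1)\ln|\Sigma_u|+\ln|\Sigma_u+p\Sigma_\alpha|$ and splits $Q_{(p)}$ into a within-individual and a between-individual part. The function name, array layout, and toy values are hypothetical.

```python
import numpy as np

def group_loglik(resid, sigma_u, sigma_alpha):
    """Log-likelihood L_(p) for residuals of shape (N_p, p, G)."""
    N_p, p, G = resid.shape
    sigma_p = sigma_u + p * sigma_alpha
    # ln|Omega| = (p - 1) ln|Sigma_u| + ln|Sigma_u + p Sigma_alpha|
    logdet = ((p - 1) * np.linalg.slogdet(sigma_u)[1]
              + np.linalg.slogdet(sigma_p)[1])
    means = resid.mean(axis=1)             # individual means, shape (N_p, G)
    within = resid - means[:, None, :]     # deviations from individual means
    # Quadratic form in K_p (x) Sigma_u^{-1}
    q_within = np.einsum("itg,gh,ith->", within, np.linalg.inv(sigma_u), within)
    # Quadratic form in J_p (x) (Sigma_u + p Sigma_alpha)^{-1}
    q_between = p * np.einsum("ig,gh,ih->", means, np.linalg.inv(sigma_p), means)
    Q_p = q_within + q_between
    return (-0.5 * G * N_p * p * np.log(2 * np.pi)
            - 0.5 * N_p * logdet - 0.5 * Q_p)

# Hypothetical residuals for N_p = 5 individuals, p = 3 periods, G = 2 equations.
rng = np.random.default_rng(2)
resid = rng.normal(size=(5, 3, 2))
sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma_alpha = np.array([[0.5, 0.1], [0.1, 0.5]])
loglik = group_loglik(resid, sigma_u, sigma_alpha)
```

Summing `group_loglik` over the groups $p=1,\dots,P$ would give the full log-likelihood $L$ for given parameter values.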

Acknowledgements

I thank an editor, four referees, Terje Skjerpen, and Knut R. Wangen for helpful comments.

Cited by (70)

  • Technological leadership and firm performance in Russian industries during crisis

    2021, Journal of Business Venturing Insights
    Citation Excerpt :

    We did this with the help of the seemingly unrelated regression equations (SURE) technique, which explicitly provides for this possibility. Given the panel nature of our data, we utilized the one-way random effect estimation of seemingly unrelated regressions based on the derivations by Biørn (2004) as explained in Nguyen and Nguyen (2010). As some of the variables (e.g., industry) were time-invariant, fixed effects estimation was not appropriate.

  • New evidence on international risk-sharing in the Economic Community of West African States (ECOWAS)

    2021, International Economics
    Citation Excerpt :

    For this purpose, we use the SUR (Seemingly Unrelated Regression) estimation method which takes into account heteroscedasticity and the contemporary correlation of residues between equations. The SUR method usually takes into account the individual correlation at a given period, while assuming zero correlation between two hazards as soon as the periods are different (Biørn, 2004). Since the study does not individually constrain the coefficients β, they can be greater than 1 or even negative.
