Longitudinal Data Analysis 
Using Stata

A 2-Day Seminar on Regression Analysis for Panel Data
Taught by Paul D. Allison, Ph.D. 

The most common type of longitudinal data is panel data, consisting of measurements of predictor and response variables at two or more points in time for many individuals. Such data have two major attractions: the ability to control for unobservables and the ability to determine causal ordering.

However, there is also a major difficulty with panel data: repeated observations on the same individual are typically correlated, which violates the usual assumption that observations are independent. There are four widely available methods for dealing with this dependence: robust standard errors, generalized estimating equations, random effects models, and fixed effects models. This course examines each of these methods in some detail, with an eye to discerning their relative advantages and disadvantages. Different methods are considered for quantitative outcomes, categorical outcomes, and count data outcomes.
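
As a rough sketch of how these four approaches might look in Stata for a quantitative outcome, the commands below fit the same linear model four ways. The dataset (Stata's nlswork example data) and the choice of predictors are assumptions made purely for illustration; they are not taken from the seminar materials.

```stata
* Illustrative sketch only: nlswork is a Stata example dataset,
* and the outcome and predictors are arbitrary choices.
webuse nlswork, clear

* Declare the panel structure: person identifier and time variable
xtset idcode year

* 1. Ordinary regression with cluster-robust standard errors
regress ln_wage age tenure, vce(cluster idcode)

* 2. Generalized estimating equations, exchangeable working correlation
xtgee ln_wage age tenure, family(gaussian) link(identity) corr(exchangeable) vce(robust)

* 3. Random effects model
xtreg ln_wage age tenure, re

* 4. Fixed effects model (uses only within-person variation)
xtreg ln_wage age tenure, fe
```

Comparing the coefficients and standard errors across the four sets of output is one simple way to see how the methods differ on the same data.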

This is a hands-on course with ample opportunity for participants to practice the different methods. 


If you need to analyze longitudinal data and have a basic statistical background, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. Some familiarity with logistic regression is also helpful. You do not need to know matrix algebra, calculus, or likelihood theory.


This seminar will use Stata for the many empirical examples and exercises. However, no previous experience with Stata is assumed. Lecture notes and exercises using SAS are also available on request.


Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer printout, and many other useful features.


1. Opportunities and challenges of panel data
   a. Data requirements
   b. Control for unobservables
   c. Determining causal order
   d. Problem of dependence
   e. Software considerations

2. Linear models
   a. Robust standard errors
   b. Generalized estimating equations
   c. Random effects models
   d. Fixed effects models
   e. Hybrid models

3. Logistic regression models
   a. Robust standard errors
   b. Generalized estimating equations
   c. Subject-specific vs. population-averaged methods
   d. Random effects models
   e. Fixed effects models
   f. Hybrid models

4. Count data models
   a. Poisson models
   b. Negative binomial models
   c. Fixed and random effects 

5. Linear structural equation models
   a. Fixed and random effects in the SEM context
   b. Models for reciprocal causation with lagged effects

