[Limdep Nlogit List] Deviance for unconditional fixed effects negative binomial regression model
William Greene
wgreene at stern.nyu.edu
Wed Aug 12 03:41:24 EST 2009
Richard. Three observations first:
(1) In your data, as we discussed, there are a quite small number
of huge observations. Unfortunately, the program cannot protect
itself against such values, because they are only huge in a specific
context. Fitting a nonlinear model such as a fixed effects negative
binomial model is not least squares. Barring collinearity, you can
regress anything on anything and get numbers. Would that it were so
for nonlinear models, but it is not. If the data are simply inconsistent
with the model, sometimes it is simply asking too much of an estimator
to fit the model you ask for to data that surely were not generated by the
process so described by the model. Trying to fit a Poisson or NB model
to data in which 99.99% of the data are in the range 0 to 10, say, and
a handful of observation are out in the range of 10,000 to 100,000 would
sound like such a case. I suspect that is what is happening in your
case. Strictly speaking, eliminating such data points is trimming, or
sample selection. On the other hand, it really is unlikely that such
outlying observations are generated by the same process that produces the
data near zero, which weakens the argument against the truncation.
(2) In computing a measure, such as the "deviance" that you seek in
this case, there is an ambiguity as to how many observations are being
used. The measure - correct me if I am wrong here - is
D = 2/(N-K) * Sum [y(i) * log(y(i)/lambda(i))]
where 0 * log(0) = 0. In a panel, you have N joint observations,
each composed of T(i) individual data points. But, given what is
being summed, it makes more sense to sum over the individual observations.
The problem, however, is that the observations in the panel are not
independent, so the meaning of the statistic, and its properties might
not be what the creators (McCullagh and Nelder?) thought they had for
the cross section case.
(3) In your discussion below, you mention needing the overdispersion
parameter to compute the deviance measure. I don't think this is the
case - see the formula above - but if I am incorrect, you can let me know
and there will be a small modification to the code below.
This said, it is easy to compute this measure for a model - assuming that
it does actually converge. Here is a sample based on a panel data set
that I use for illustrations. You should be able to modify this accordingly.
? Generate group count, _GROUPTI and ID number, _STRATUM
regr;lhs=one;rhs=one;panel;str=id$
? Defines RHS of model
namelist ; x = age,educ,hhninc,hhkids $
? Number of variables in model
calc ; nx=col(x)$
? Panel data version of NB model with fixed effects.
negbin;lhs=docvis;rhs=x,one;pds=_groupti;fem;par$
? Pick off slopes. This drops the dispersion parameter.
matrix;beta=b(1:nx)$
? Create the fitted values.
create ; lambdai=exp(alphafe(_stratum) + beta'x)$
? Satisfy curiosity about predictions
dstat;rhs=docvis,lambdai$
? Create deviance for each observation
create ; deviance = 0 $
create ; if(docvis > 0)deviance = docvis*(log(docvis)-log(lambdai))$
? Average to obtain the aggregate measrue.
calc ; list ; 2/(n-nx) * sum(deviance) $
Sincerely,
Bill Greene
----- Original Message -----
From: rjcal at u.washington.edu
To: limdep at limdep.itls.usyd.edu.au
Sent: Tuesday, August 11, 2009 10:23:38 AM GMT -05:00 US/Canada Eastern
Subject: [Limdep Nlogit List] Deviance for unconditional fixed effects negative binomial regression model
Dear LIMDEP users:
Hi, I'm a fairly new LIMDEP user. So here's my question: I'm estimating an
unconditional negative binomial fixed effects regression model, and would like
to know the deviance. I'd like to know how to do one of three things:
1. Get LIMDEP to display the point estimates for the fixed effects, in order to
calculate deviance manually using the LIMDEP output.
2. Use the point estimates from the LIMDEP output as starting values for SAS,
in order to get the SAS model to converge and get the deviance values that way.
3. Add "deviance;" or something else equally simple to the LIMDEP syntax...(Dr.
Greene, if that functionality doesn't exist yet, could you please consider
writing it for the next LIMDEP release?)
Details follow. Thanks!
Richard
My reason for doing this is that Allison and Waterman ("Negative Binomial
Regression Models," Sociological Methodology 32:247-265, 2003) argue that the
unconditional model is more like a true fixed effects model than the
conditional version, but the standard errors of estimated parameters are biased
downwards for their simulated data. They recommend adjusting the
standard errors for the unconditional model estimates upwards by dividing by
the square root of the ratio of the deviance to the degrees of freedom for the
model.
The benefit of running this model in LIMDEP (as opposed to manually adding the
fixed effects to a regular negative binomial regression model in another
package) is that Dr. Greene has done some wizardry to get these models to
converge (although it's still unstable and I've had to deal with outliers ahead
of time to avoid crashing LIMDEP). PROC GENMOD in SAS gives you the deviance,
SAS is failing to converge.
Here's some standard code for the model so we're all on the same page:
SKIP
NEGBIN; Lhs = Y1
; Rhs = one, age, age2, X1
; Pds = grpsize
; Fem
$
NOSKIP
So I think it's possible to calculate the deviance manually by first getting
the predicted values:
SKIP
NEGBIN; Lhs = RDLSM
; Rhs = one, age, age2, jbprsc
; Pds = grpsize
; fill; list
$
NOSKIP
That gives me a table with the following:
"Observation Observed Y Predicted Y Residual x(i)b Pr[Y*=y]"
So to get the deviance, I just need to know the overdispersion parameter
lambda, the observed Y values, and values for the parameter mu_it, such that
ln(mu_it) = delta_i + beta*X_it), where X is the vector of exogeneous variables
used in the model and delta is the estimated fixed effect. (Just to confirm:
it's correct to use the formula for mu from the Poisson distribution for the
Negative Binomial, right? The only difference should be the overdispersion
parameter...)
So I can see three options for getting the deviance:
1. Calculate it manually based on the output, but how do you get LIMDEP to
display the point estimates for the fixed effects?
2. Use the point estimates from the LIMDEP output as starting values for SAS,
in order to get the SAS model to converge and get the deviance values that way.
If anyone knows how to do this, I'd love to know!
3. Add "deviance;" or something else equally simple to the LIMDEP syntax...(Dr.
Greene, if that functionality doesn't exist yet, could you please consider
writing it for the next LIMDEP release?)
Finally I'm noticing that I can get the models in LIMDEP to converge in two
ways. My dependent variable is a scale, and I've gotten the model to converge
by using a dependent variable constructed based on censoring the individual
components of the scale and then adding them. I've just considered using a
"regular" unconditional negative binomial fixed effects model on the outcome,
but technically shouldn't I then be using a model for censored data?
Thanks everyone for your feedback!
Sincerely,
Richard Callahan
---
Richard Callahan
Graduate Student, Department of Sociology
University of Washington
(206) 769-9812
rjcal at u.washington.edu
http://students.washington.edu/rjcal/
_______________________________________________
Limdep site list
Limdep at limdep.itls.usyd.edu.au
http://limdep.itls.usyd.edu.au
More information about the Limdep
mailing list