Topic 5: Ordered logit and LIMDEP
Contents: Ordered categories as dependent variables; Introduction to LIMDEP;
Basic LIMDEP commands; Using LIMDEP to carry out ordered logit analysis;
Interpreting the results from ordered logit analysis; Further reading and
acknowledgement; References; Examples.
Ordered categories as dependent variables
If we have a dependent variable which is measured only on an ordinal
scale, strictly speaking we cannot use linear regression to examine it.
However, in practice, so long as the dependent variable has a reasonable
number of levels, regression will work perfectly adequately. If the dependent
variable is dichotomous, we can use discriminant analysis or logistic regression.
But what about the intermediate case, where the dependent variable has
3 to perhaps 6 different levels? In such cases, ordinary linear regression
may give misleading results. We need to use ordered logit analysis.
There are a number of types of ordered logit model; what is described here
is the one most commonly used, called the proportional odds model,
which uses cumulative logits.
Like logistic regression, ordered logit uses maximum likelihood methods,
and finds the best set of regression coefficients to predict values of
the logit-transformed probability that the dependent variable falls into
one category rather than another. Logistic regression assumes that if the
fitted probability, p, is greater than 0.5, the dependent variable
should have value 1 rather than 0. Ordered logit doesn't have such a fixed
assumption. Instead, it fits a set of cutoff points. If there are r
levels of the dependent variable (coded 0 to r-1), it will find r-1
cutoff values k_1 to k_{r-1} such that if the fitted value of logit(p)
is below k_1, the dependent variable is predicted to take value 0; if the
fitted value of logit(p) is between k_1 and k_2, the dependent variable
is predicted to take value 1; and so on. As with
logistic regression, we get an overall chi-square for the goodness of fit
of the entire fitted model, and we can also use a chi-squared test to assess
the improvement due to adding an extra independent variable or group of
independent variables. As with logistic regression, a crucial piece of
information for evaluating the fit of the model is a table of predicted
versus observed category membership.
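The cutoff rule can be illustrated with a short Python sketch. This is only an illustration of the proportional odds idea, not what LIMDEP does internally: the function names and the cutoff values are made up for the example, and the cumulative probability is assumed to be P(Y <= j) = logistic(k_j - fitted value), which is one common way of writing the model.

```python
import math
from bisect import bisect_right


def logistic(z):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_category(fitted, cutoffs):
    """Apply the cutoff rule: 0 below the first cutoff, 1 between the
    first and second, and so on. `cutoffs` must be in increasing order."""
    return bisect_right(cutoffs, fitted)


def category_probabilities(fitted, cutoffs):
    """Probability of each category under cumulative logits, assuming
    P(Y <= j) = logistic(cutoff_j - fitted)."""
    cumulative = [logistic(k - fitted) for k in cutoffs] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical cutoffs for a 3-level dependent variable coded 0, 1, 2.
cutoffs = [0.0, 1.5]
for fitted in (-1.0, 0.7, 2.0):
    print(fitted, predicted_category(fitted, cutoffs),
          [round(p, 3) for p in category_probabilities(fitted, cutoffs)])
```

With r = 3 levels there are r-1 = 2 cutoffs, and the category probabilities for any fitted value always sum to 1.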
Introduction to LIMDEP
Unfortunately ordered logit is not available in SPSS. It can be done
in SAS, but we have no licence for that at Exeter. However, ordered logit
can also be found in a package called LIMDEP, which specialises in fitting
models with LIMited DEPendent variables (though it also contains procedures
for doing ordinary regression). It was written with econometricians in
mind, and is most used in economics departments; however, we have a licence
for Version 6.0 on Singer. LIMDEP is not as comprehensive as SPSS, and
not as easy to use as Minitab, but it is not difficult to use it for a
restricted purpose such as doing an ordered logit analysis on data which
we have already prepared for SPSS.
Even if we only want to use LIMDEP to carry out this single type of
analysis, however, we need to know a little bit about it: its command syntax,
how to prepare a data file for the analysis, and how it deals with the
basic functions every statistics package must implement. These basic functions
include starting a session, reading in a text file, assigning names to
columns of data, transformations and other calculations based on columns
of data, dealing with missing values, creating and using dummy variables,
finishing a session, creating a file of commands that can be edited and
reused, and sending output to a file for editing or printing.
All these are described in the comprehensive LIMDEP manual (800+ pages).
The abridged version (200 pages) is fine if you have once known what to
do and just need reminding, and for some purposes is as good as the full
manual. The manuals refer to our version as the "mainframe" version;
the PC version, around which the manuals are based, contains some additional
facilities. For example, there is no usable HELP facility on the Singer
version.
Basic LIMDEP commands
This section aims to give you just enough information to enable you
to take data that you have prepared for SPSS (or produced from SPSS or
Minitab using a WRITE command) and get them into LIMDEP ready for
an ordered logit analysis. It does not cover all the facilities of LIMDEP,
which include some useful short cuts not available in SPSS. If you find
yourself using LIMDEP a lot, read through the abridged manual to find out
what is available.
In this section, LIMDEP commands are printed in bold type; these
should be typed in to the computer exactly as given here. The bits in italic
type are where you have to substitute in information that is specific to
your project.
- LIMDEP command syntax. Like SPSS and Minitab, LIMDEP has commands and
subcommands (though the LIMDEP manual does not use the word "subcommands"
for the latter). All commands and subcommands can be typed in upper or
lower case or any mixture. In these notes, commands will be put in UPPER
CASE and subcommands in lower case. Subcommands are separated by semi-colons
(same as Minitab; SPSS uses /). The entire command is terminated by $ (Singer
SPSS uses .). LIMDEP variable names consist of up to 8 letters or numbers,
starting with a letter (similar to SPSS).
- Preparing data files for LIMDEP. As for SPSS or Minitab, we want the
data to appear in orderly columns, with all the data for the first person
followed by all the data for the second person, etc. Text data files prepared
for use with SPSS or Minitab, or produced from either with its WRITE command,
should be suitable provided they are sensibly formatted. There is one catch,
however: the dependent variable for an ordered logit analysis MUST be coded
with values 0, 1, 2..., NOT 1, 2, 3..., -1, 0, 1, or any of the other logically
equivalent possibilities. If an alternative code has been used, the values
must be converted either before transferring the data to LIMDEP, or after
they have been read in but before an ordered logit analysis is attempted
(you would use the COMPUTE command for this, see below). Missing
values should preferably be coded with a digit or pattern of digits that
will never occur in real data (as in SPSS), though an asterisk (as in Minitab)
can be used (but see below).
- Starting a session. At the Singer prompt, type limdep. This
will produce an introductory screen. Type start. This will give
you the LIMDEP prompt; confusingly, you are taken to the line below to
type in your response.
- Reading in a text file. All data must be numerical. The command is
READ;nvar=number of variables;file=name of file;names=full list of names$
This version of the command assumes that the data are typed case by
case, as for the SPSS DATA LIST command, or Minitab READ.
The subcommands nvar, file, and names must all be
present. Note that the names in the list must be separated by commas,
but there must not be a comma after the last name.
- Assigning names to columns of data. This is done directly by the READ
command, or by other commands such as CREATE, see below (similar
to SPSS). There is no way of assigning names to levels of variables (i.e.
no equivalent of SPSS VALUE LABELS).
- Transformations and computations based on columns. The usual command
is
CREATE;if(logical expression) name=expression$
This sets the value of a variable called name equal to expression
for all cases in which logical expression is true; if name
already exists, its values are overwritten. For cases where logical
expression is false, name is set to 0 if it is a new variable,
or left unchanged if it already exists. if(logical expression)
can be omitted. If if is present, the subcommand else name=expression
may be used; the two names need not be the same. All the usual logical
and arithmetic operators (>, <, =, &, +, -, etc) are available,
as are standard mathematical functions. The procedure is very similar to
SPSS IF or COMPUTE, and in simple cases it is similar to
Minitab LET.
- Missing values. Numerical codes (from SPSS) or
alphabetical codes (in particular the Minitab * code) can both be used
in LIMDEP data files. However, alphabetical codes (including *) are changed
to -999 on input.
We have to tell LIMDEP explicitly to ignore cases including the specified
missing values. This can be done with the commands
SAMPLE;all$
REJECT;logical expression$
An example of a logical expression would be age=99+incomegp=9,
where 99 is the missing value code for age and 9 is the missing
value code for incomegp. Note the use of + to mean 'or',
and the absence of parentheses between expressions linked by +.
The SAMPLE;all command restores the full data set; successive REJECTs
without intervening SAMPLE commands would have the same effect as
linking in further logical expressions by +. Note that we have to
set up the correct REJECTs before doing our analysis; the analysis
commands do not themselves detect missing values.
- Creating and using dummy variables. This has to be done by brute force,
as in the following example, based on a 5-level categorical variable, worktype:
CREATE;if (worktype=1) fulltime=1$
CREATE;if (worktype=2) parttime=1$
CREATE;if (worktype=3) housespo=1$
CREATE;if (worktype=4) unemploy=1$
CREATE;if (worktype=5) retired=1$
Note that we don't have to worry about missing values in the dummy
variables, because we will deal with them by setting up a REJECT
based on worktype.
- Finishing a session. To leave LIMDEP, type STOP$ (same as Minitab;
SPSS uses FINISH).
- Setting up a file of commands to re-use. Strings of commands, using
exactly the same syntax as outlined here, can be prepared as text files
using any Singer text editor. Then if you enter LIMDEP as usual, and enter
the command
OPEN;input=filename$
your commands will be read in and executed. If you don't want to see
intermediate results, precede OPEN by the command FAST$. When all
the commands in the file have been executed, the LIMDEP prompt will appear,
and you can continue working interactively.
- Sending output to a file for editing or printing. This is done by the
command
OPEN;output=filename$
Like Minitab OUTFILE, this copies subsequent screen output (or
most of it) to filename. To stop copying, use the command CLOSE$
(compare Minitab NOOUTFILE).
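The data preparation steps in this section (rejecting cases with missing value codes, recoding the dependent variable so that it starts at 0, and creating dummy variables) can be sketched in Python to make the logic explicit. This is not LIMDEP code. The variable names, the missing value code 9 and the tiny data set are hypothetical, chosen to resemble the worktype example above.

```python
# Each case is a dict; 9 is the (hypothetical) missing value code.
cases = [
    {"rating": 1, "worktype": 1},
    {"rating": 3, "worktype": 2},
    {"rating": 2, "worktype": 9},   # missing worktype
    {"rating": 9, "worktype": 5},   # missing rating
    {"rating": 2, "worktype": 5},
]

MISSING = 9
WORKTYPE_DUMMIES = {1: "fulltime", 2: "parttime", 3: "housespo",
                    4: "unemploy", 5: "retired"}


def reject_missing(rows, names, code=MISSING):
    """Drop any case where one OR more of the named variables holds the
    missing code (the same logic as joining conditions with + in REJECT)."""
    return [r for r in rows if not any(r[n] == code for n in names)]


def recode_from_zero(rows, name):
    """Shift an integer-coded variable so that its lowest value is 0.
    Call this only after missing codes have been rejected."""
    lowest = min(r[name] for r in rows)
    return [{**r, name: r[name] - lowest} for r in rows]


def add_dummies(rows, name, labels):
    """Add one 0/1 variable per level of a categorical variable."""
    return [{**r, **{lab: int(r[name] == lev) for lev, lab in labels.items()}}
            for r in rows]


kept = reject_missing(cases, ["rating", "worktype"])
kept = recode_from_zero(kept, "rating")
kept = add_dummies(kept, "worktype", WORKTYPE_DUMMIES)
for row in kept:
    print(row)
```

Note that when the dummies are used in a model that also contains a constant, one of them is normally left out to serve as the reference category.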
Using LIMDEP for ordered logit
The command for ordered logit in LIMDEP is the following:
ORDERED PROBIT;lhs=name of DV;rhs=one,names of IVs;logit;output=5$
The command name can be abbreviated to ORDER. As for the READ
command, the names in the list of IVs must be separated by commas, and
there must not be a comma after the last one. The subcommand output=5
reduces the number of lines produced in the output file.
According to the manuals, a subcommand alg=S can be used to produce
a stepwise procedure, but the manual advises against it, presumably because
it is vulnerable to local maxima so that a more hand-guided approach to
finding the best model is likely to do better. In any case I have not been
able to get this subcommand to work.
Output from the program is pretty well self-explanatory, except for
three problems. The first problem is that the output is headed "Ordered
Probit Model", which is confusing since ordered probit and ordered
logit are two different kinds of analysis. The heading arises because LIMDEP
uses the same command for both (the subcommand logit tells it which
version we want). Second, before LIMDEP embarks on the ordered logit analysis,
it does an approximate linear regression, and reports the results. It's
easy to read these by mistake instead of the results we actually want.
It is probably sensible to delete them from the output file before printing
it out, to save paper and to avoid confusion. Finally, if you forget to
recode the dependent variable so it starts from level 0, you will get the
message "Insufficient variation in dependent variable", which
is unlikely to help you realise what has gone wrong.
The same command syntax is used for all LIMDEP's model-fitting commands.
For example, a linear regression would be carried out by
REGRESS;lhs=name of DV;rhs=one,names of IVs$
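To give a feel for what a maximum likelihood ordered logit fit is maximising, here is a minimal Python sketch of the log likelihood for a proportional odds model. It is a generic formulation, assuming P(Y <= j) = logistic(k_j - fitted value) and a dependent variable coded 0, 1, 2...; it is not taken from the LIMDEP manual, LIMDEP's own parameterisation may differ in detail, and the function and argument names are invented for the illustration.

```python
import math


def logistic(z):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_loglik(y, X, beta, cutoffs):
    """Log likelihood of an ordered logit model.

    y       : list of categories coded 0, 1, ..., r-1
    X       : list of rows of independent variables (no constant column)
    beta    : one coefficient per independent variable
    cutoffs : r-1 cutoff values in increasing order
    """
    total = 0.0
    for yi, row in zip(y, X):
        fitted = sum(b * x for b, x in zip(beta, row))
        # P(Y <= yi) and P(Y <= yi - 1); their difference is P(Y = yi).
        upper = 1.0 if yi == len(cutoffs) else logistic(cutoffs[yi] - fitted)
        lower = 0.0 if yi == 0 else logistic(cutoffs[yi - 1] - fitted)
        total += math.log(upper - lower)
    return total


# Hypothetical data: one independent variable, three ordered categories.
y = [0, 1, 2, 1, 2]
X = [[0.2], [1.0], [2.5], [0.4], [1.8]]
print(ordered_logit_loglik(y, X, beta=[0.0], cutoffs=[-0.5, 0.5]))
print(ordered_logit_loglik(y, X, beta=[1.0], cutoffs=[0.5, 1.5]))
```

A fitting routine searches for the coefficients and cutoffs that make this value as large (that is, as close to zero) as possible.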
Interpreting the output from ordered logit analysis
The same five questions can be addressed as with other regression-type
analyses that we have considered previously:
- How well does the model account for the data? Ordered logit
does not produce an adjusted R-squared statistic. It does
produce a chi-squared value for the model, and this could probably be converted
to the LRFC1 statistic recommended by Darlington (1990, chapter 18) for logistic
regression, but I have not found a statistical authority for doing
this. However the most useful process is probably to examine the classification
table produced at the end of the analysis, comparing actual group membership
with membership predicted on the basis of the model. As well as giving
us a measure of goodness of fit (what proportion of cases were correctly
predicted?) this may alert us to problems with the analysis - for example
if the model does not predict any cases in one or more of the categories.
- Is the overall relationship between the IVs and the DV significant?
This is addressed by a likelihood ratio test. The
LIMDEP output will include the log likelihood for the null model,
in which the coefficients for all regressors are taken as zero, and also
for the fitted model. The difference between these two log likelihoods,
multiplied by two, is distributed approximately as chi-squared with degrees of freedom equal to
the number of IVs, and so can be used to test the overall significance
of the model. In the same way, we can test the significance of adding a
group of regressors to a model.
- What is the effect of an individual IV on the DV in the presence
of all the other IVs? LIMDEP gives regression coefficients which can
be interpreted in the usual way, though note that as in logistic regression
they give the effects of a unit increase of the IV on the log odds of the
DV taking a higher value, not on the DV itself.
- Is the effect of an individual IV significant in the presence of
all the other IVs? This is tested by a t value associated with
each IV, exactly as for linear regression. Note that unlike SPSS's LOGISTIC
REGRESSION command, it is a t rather than a chi-squared test
statistic that is produced. The mathematics of this mean that marginally
significant values should be regarded with caution where sample sizes are
small.
- What are the relative importances of IVs in predicting the DV value?
As in logistic regression, we cannot calculate
beta-weights from the regression coefficients, but we can get a
measure of the relative importance of different IVs by multiplying each
coefficient by the standard deviation of the corresponding independent variable.
(LIMDEP does not, unfortunately, do this for you, though it does give the standard
deviation of each IV in the regression table). Note, though, that these
products do not have the interpretation they have in linear regression,
of being the regression coefficients you would get if you reran the regression
after standardising the variables: because of the categorical nature
of the DV, it would not be meaningful to standardise it.
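The arithmetic behind several of these questions is simple enough to sketch in Python. Every number below is made up for illustration and none of it is LIMDEP output; the function names are hypothetical labels for the calculations described above.

```python
import math


def lr_chi_square(loglik_null, loglik_model):
    """Twice the difference between the fitted and null log likelihoods."""
    return 2.0 * (loglik_model - loglik_null)


def proportion_correct(table):
    """Share of cases on the diagonal of an observed-by-predicted table."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / total


def odds_multiplier(coefficient):
    """Factor by which the odds of a higher category change for a one-unit
    increase in the IV (the coefficient is on the log-odds scale)."""
    return math.exp(coefficient)


def scaled_importance(coefficient, iv_standard_deviation):
    """Coefficient multiplied by the standard deviation of its IV."""
    return coefficient * iv_standard_deviation


# Hypothetical values, for illustration only.
# The result would be compared with chi-squared tables, df = number of IVs.
print(lr_chi_square(loglik_null=-105.0, loglik_model=-95.0))

# Rows are observed categories, columns are predicted categories.
table = [[10, 5, 0],
         [4, 20, 6],
         [1, 4, 50]]
print(proportion_correct(table))

print(odds_multiplier(0.4), scaled_importance(0.4, 2.5))
```

A column of zeros in such a table would be the warning sign mentioned under the first question: a category that the model never predicts.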
Further reading & acknowledgement
If you can cope with the maths, much information is to be found in the
books by Agresti (1984, 1990). For further details about LIMDEP, see Greene
(1992). For examples of the use, interpretation, and presentation of the
results of ordered logit analysis, see Lea, Webley
and Levine (1993) and Lea, Webley and Walker
(1995).
My knowledge of both ordered logit and LIMDEP is largely owed to help
from Dr Nichola Crichton, formerly of the MSOR department. I have a set
of notes she wrote for me which are more help than any of the books, and
I will lend them to anyone who has to get into this analysis.
References
- Agresti, A. (1984). Analysis of ordinal categorical data. New
York: Wiley.
- Agresti, A. (1990). Categorical data analysis. New York: Wiley.
- Darlington, R. B. (1990). Regression and linear models. New
York: McGraw-Hill.
- Greene, W. H. (1992). LIMDEP user's manual and reference guide.
Bellport, NY: Econometric Software.
- Lea, S. E. G., Webley, P. & Levine, R. M. (1993). The economic
psychology of consumer debt. Journal of Economic Psychology, 14,
85-119.
- Lea, S. E. G., Webley, P. & Walker, C. M. (1995). Psychological
factors in consumer debt: money management, economic socialization, and
credit use. Journal of Economic Psychology, 16, 681-701.
Examples
The file /singer1/eps/psybin/stats/hefce.txt contains (hypothetical)
data from a study in which 100 recent graduates rated the teaching in their
departments as "excellent", "satisfactory" or "unsatisfactory".
It could be read into SPSS with the following command file, which is
available as /singer1/eps/psybin/stats/hefce.in on Singer; it is also available
on the fileserver PSYCHO.
DATA LIST file='/singer1/eps/psybin/stats/hefce.txt' /
gender 1 dept 3 origin 5 examclas 7 rating 9.
VALUE LABELS gender 1 'male' 2 'female' /
dept 1 'chemistry' 2 'law' 3 'history' /
origin 1 'UK' 2 'EU' 3 'overseas' /
examclas 1 'first' 2 'two1' 3 'two2' 4 'third' /
rating 1 'unsatisfactory' 2 'satisfactory' 3 'excellent'.
MISSING VALUES gender dept origin examclas rating (9).
DO REPEAT x=UKorigin,EUorigin,OSorigin / i=1 to 3.
COMPUTE x=0.
IF origin=i x=1.
IF missing(origin) x=9.
END REPEAT.
DO REPEAT x=deptchem,deptlaw,depthist / i=1 to 3.
COMPUTE x=0.
IF dept=i x=1.
IF missing(dept) x=9.
END REPEAT.
MISSING VALUES UKorigin,EUorigin,OSorigin,deptlaw,deptchem,depthist
(9).
- Write out the corresponding command file for LIMDEP.
- Prepare the LIMDEP version of the command file, using the Singer editor.
Use each version to read the data into the package concerned.
- Use linear regression to predict ratings from the other variables in
both packages, print your output, and compare their results.
- Use LIMDEP to analyse the same data using ordered logit (Hint: don't
forget to think about the coding of the dependent variable). Print
your output, and compare the results with those you obtained using linear
regression.
Stephen Lea
University of Exeter
Department of Psychology
Washington Singer Laboratories
Exeter EX4 4QG
United Kingdom
Tel +44 1392 264626
Fax +44 1392 264623
Document revised 11th March 1997