University of Exeter

DEPARTMENT OF PSYCHOLOGY


PSY6003 Advanced statistics: Multivariate analysis II: Manifest variables analyses

Topic 5: Ordered logit and LIMDEP


Contents: Ordered categories as dependent variables; Introduction to LIMDEP; Basic LIMDEP commands; Using LIMDEP to carry out ordered logit analysis; Interpreting the results from ordered logit analysis; Further reading and acknowledgement; References; Examples.

Ordered categories as dependent variables

If we have a dependent variable which is measured only on an ordinal scale, strictly speaking we cannot use linear regression to examine it. However, in practice, so long as the dependent variable has a reasonable number of levels, regression will usually work adequately. If the dependent variable is dichotomous, we can use discriminant analysis or logistic regression. But what about the intermediate case, where the dependent variable has between 3 and perhaps 6 different levels? In such cases, ordinary linear regression may give misleading results, and we need to use ordered logit analysis instead. There are several types of ordered logit model; the one described here is the most commonly used, called the proportional odds model, which uses cumulative logits.

Like logistic regression, ordered logit uses maximum likelihood methods, and finds the best set of regression coefficients to predict values of the logit-transformed probability that the dependent variable falls into one category rather than another. Logistic regression predicts that the dependent variable has value 1 rather than 0 whenever the fitted probability, p, is greater than 0.5. Ordered logit has no such fixed threshold. Instead, it estimates a set of cutoff points. If there are r levels of the dependent variable (coded 0 to r-1), it will find r-1 cutoff values, k1 to k(r-1), such that if the fitted value of logit(p) is below k1, the dependent variable is predicted to take value 0; if the fitted value is between k1 and k2, it is predicted to take value 1; and so on, up to value r-1 for fitted values above k(r-1). As in logistic regression, we get an overall chi-squared for the goodness of fit of the entire fitted model, and we can use a chi-squared test to assess the improvement due to adding an extra independent variable or group of independent variables. Also as in logistic regression, a crucial piece of information for evaluating the fit of the model is a table of predicted versus observed category membership.
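
The cutoff idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LIMDEP's own code: it assumes the common way of writing the proportional odds model, in which the cumulative probability P(DV <= j) is the inverse logit of (cutoff minus fitted value), so that the same coefficients apply at every cutoff. The function names and the cutoff values are hypothetical.

```python
import math

def logistic(z):
    """Inverse of the logit transformation: turns a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(eta, cutoffs):
    """Probabilities of categories 0..r-1 under the proportional odds model.

    eta: fitted linear predictor (sum of coefficients times IV values)
    cutoffs: increasing cutoff values k1 < k2 < ... < k(r-1)
    Assumed form: P(DV <= j) = logistic(k(j+1) - eta) for j = 0..r-2.
    """
    cumulative = [logistic(k - eta) for k in cutoffs] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(DV = j) = P(DV <= j) - P(DV <= j-1)
        previous = c
    return probs

def predicted_category(eta, cutoffs):
    """Cutoff rule from the text: below k1 -> 0, between k1 and k2 -> 1, and so on."""
    return sum(1 for k in cutoffs if eta > k)

cutoffs = [-1.0, 1.5]  # hypothetical cutoffs for a 3-level DV coded 0, 1, 2
for eta in (-2.0, 0.0, 2.0):
    print(eta, predicted_category(eta, cutoffs),
          [round(p, 2) for p in category_probabilities(eta, cutoffs)])
```

With these hypothetical cutoffs, a case whose fitted value is 0.0 lies between -1.0 and 1.5 and is predicted to fall in category 1; its three category probabilities are roughly 0.27, 0.55 and 0.18, so category 1 is also the most probable one here.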


Introduction to LIMDEP

Unfortunately ordered logit is not available in SPSS. It can be done in SAS, but we have no licence for that at Exeter. However, ordered logit is also available in a package called LIMDEP, which specialises in fitting models with LIMited DEPendent variables (though it also contains procedures for ordinary regression). LIMDEP was written with econometricians in mind and is mostly used in economics departments, but we have a licence for Version 6.0 on Singer. It is not as comprehensive as SPSS, nor as easy to use as Minitab, but it is not difficult to use for a restricted purpose such as running an ordered logit analysis on data which we have already prepared for SPSS.

Even if we only want to use LIMDEP to carry out this single type of analysis, however, we need to know a little bit about it: its command syntax, how to prepare a data file for the analysis, and how it deals with the basic functions every statistics package must implement. These basic functions include starting a session, reading in a text file, assigning names to columns of data, transformations and other calculations based on columns of data, dealing with missing values, creating and using dummy variables, finishing a session, creating a file of commands that can be edited and reused, and sending output to a file for editing or printing.

All these are described in the comprehensive LIMDEP manual (800+ pages). The abridged version (200 pages) is fine if you already know what to do and just need reminding, and for some purposes it is as good as the full manual. The manuals refer to our version as the "mainframe" version; the PC version, on which the manuals are based, contains some additional facilities. For example, there is no usable HELP facility in the Singer version.


Basic LIMDEP commands

This section aims to give you just enough information to enable you to take data that you have prepared for SPSS (or produced from SPSS or Minitab using a WRITE command) and get them into LIMDEP ready for an ordered logit analysis. It does not cover all the facilities of LIMDEP, which include some useful short cuts not available in SPSS. If you find yourself using LIMDEP a lot, read through the abridged manual to find out what is available.

In this section, LIMDEP keywords and punctuation should be typed into the computer exactly as given here. Descriptive phrases such as number of variables or name of file mark the places where you have to substitute information that is specific to your project.

  1. LIMDEP command syntax. Like SPSS and Minitab, LIMDEP has commands and subcommands (though the LIMDEP manual does not use the word "subcommands" for the latter). All commands and subcommands can be typed in upper or lower case or any mixture. In these notes, commands are given in UPPER CASE and subcommands in lower case. Subcommands are separated by semicolons (as in Minitab; SPSS uses /), and the entire command is terminated by $ (Singer SPSS uses a full stop). LIMDEP variable names consist of up to 8 letters or digits, starting with a letter (similar to SPSS).
  2. Preparing data files for LIMDEP. As for SPSS or Minitab, we want the data to appear in orderly columns, with all the data for the first person followed by all the data for the second person, and so on. Text data files prepared for use with SPSS or Minitab, or produced from either with its WRITE command, should be suitable provided they are sensibly formatted. There is one catch, however: the dependent variable for an ordered logit analysis MUST be coded with values 0, 1, 2..., NOT 1, 2, 3..., -1, 0, 1, or any of the other logically equivalent possibilities. If an alternative code has been used, the values must be converted either before transferring the data to LIMDEP, or after they have been read in but before an ordered logit analysis is attempted (you would use the CREATE command for this, see below). Missing values should preferably be coded with a digit or pattern of digits that will never occur in real data (as in SPSS), though an asterisk (as in Minitab) can be used (but see below).
  3. Starting a session. At the Singer prompt, type limdep. This will produce an introductory screen. Type start. This will give you the LIMDEP prompt; confusingly, you are taken to the line below to type in your response.
  4. Reading in a text file. All data must be numerical. The command is
    READ;nvar=number of variables;file=name of file;names=full list of names$
    This version of the command assumes that the data are typed case by case, as for the SPSS DATA LIST command, or Minitab READ. The subcommands nvar, file, and names must all be present. Note that the names in the list must be separated by commas, but there must not be a comma after the last name.
  5. Assigning names to columns of data. This is done directly by the READ command, or by other commands such as CREATE (see below), much as in SPSS. There is no way of assigning names to levels of variables (i.e. no equivalent of SPSS VALUE LABELS).
  6. Transformations and computations based on columns. The usual command is
    CREATE;if(logical expression) name=expression$
    This sets the value of a variable called name equal to expression for all cases in which logical expression is true; if name already exists, its values for those cases are overwritten. For cases where logical expression is false, name is set to 0 if it is a new variable, or left unchanged if it already exists. The if(logical expression) part can be omitted, in which case every case is set. When it is present, the subcommand else name=expression may also be used to set values for the cases where logical expression is false; the name after else need not be the same as the first one. All the usual logical and arithmetic operators (>, <, =, &, +, -, etc.) are available, as are standard mathematical functions. The procedure is very similar to SPSS IF or COMPUTE, and in simple cases it is similar to Minitab LET.
  7. Missing values. Numerical codes (from SPSS) and non-numerical codes (in particular the Minitab * code) can both be used in LIMDEP data files. However, non-numerical codes (including *) are changed to -999 on input, so later commands must refer to them as -999.
    We have to tell LIMDEP explicitly to ignore cases including the specified missing values. This can be done with the commands
    SAMPLE;all$
    REJECT;logical expression$
    An example of a logical expression would be age=99+incomegp=9, where 99 is the missing value code for age and 9 is the missing value code for incomegp. Note the use of + to mean 'or', and that no parentheses are placed around the expressions linked by +. The SAMPLE;all command restores the full data set; successive REJECTs without intervening SAMPLE commands have the same effect as linking further logical expressions by +. Note that we have to set up the correct REJECTs before doing our analysis; the analysis commands do not themselves detect missing values.
  8. Creating and using dummy variables. This has to be done by brute force, as in the following example, based on a 5-level categorical variable, worktype:
    CREATE;if (worktype=1) fulltime=1$
    CREATE;if (worktype=2) parttime=1$
    CREATE;if (worktype=3) housespo=1$
    CREATE;if (worktype=4) unemploy=1$
    CREATE;if (worktype=5) retired=1$
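    Because each dummy is a new variable, CREATE sets it to 0 for every case that does not satisfy its if condition. Note that we don't have to worry about missing values in the dummy variables, because we will deal with them by setting up a REJECT based on worktype. When the dummies are used as IVs alongside the constant one, one of them must be left out (it becomes the reference category); otherwise the five dummies, which always sum to 1, would be perfectly collinear with the constant.
  9. Finishing a session. To leave LIMDEP, type STOP$ (as in Minitab; SPSS uses FINISH).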
  10. Setting up a file of commands to re-use. Strings of commands, using exactly the same syntax as outlined here, can be prepared as text files using any Singer text editor. Then if you enter LIMDEP as usual, and enter the command
    OPEN;input=filename$
    your commands will be read in and executed. If you don't want to see intermediate results, precede OPEN by the command FAST$. When all the commands in the file have been executed, the LIMDEP prompt will appear, and you can continue working interactively.
  11. Sending output to a file for editing or printing. This is done by the command
    OPEN;output=filename$
    Like Minitab OUTFILE, this copies subsequent screen output (or most of it) to filename. To stop copying, use the command CLOSE$ (compare Minitab NOOUTFILE).
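
The rules that items 2 and 6 to 8 describe for recoding, CREATE and REJECT can be mimicked in Python, which may make their behaviour easier to see. This is only a sketch: create and reject are hypothetical stand-ins written for illustration, not LIMDEP commands, the data are held as a dictionary of equal-length lists (one per variable), and the small data set is invented. Unlike the real REJECT, which only marks cases as excluded from the current sample, reject here returns a filtered copy.

```python
def create(data, name, value, condition=None):
    """Mimic CREATE;if(condition) name=value$ (item 6).

    value and condition are functions of the case index i.
    """
    n = len(next(iter(data.values())))
    # A new variable starts at 0; an existing one keeps its old values
    # wherever the condition is false.
    column = list(data.get(name, [0] * n))
    for i in range(n):
        if condition is None or condition(i):
            column[i] = value(i)
    data[name] = column

def reject(data, condition):
    """Mimic SAMPLE;all$ then REJECT;condition$ (item 7): drop cases where condition holds."""
    n = len(next(iter(data.values())))
    keep = [i for i in range(n) if not condition(i)]
    return {name: [column[i] for i in keep] for name, column in data.items()}

# Invented data; 9 is the missing value code for both variables
data = {"worktype": [1, 2, 5, 9, 3], "rating": [1, 3, 2, 2, 9]}

# Item 2: recode the DV from 1, 2, 3 to 0, 1, 2, leaving the missing code alone
create(data, "rating", lambda i: data["rating"][i] - 1,
       lambda i: data["rating"][i] != 9)

# Item 8: one dummy per level of worktype (new variables default to 0)
for code, dummy in enumerate(["fulltime", "parttime", "housespo",
                              "unemploy", "retired"], start=1):
    create(data, dummy, lambda i: 1, lambda i, c=code: data["worktype"][i] == c)

# Item 7: exclude cases with a missing value on either variable ('+' means 'or')
sample = reject(data, lambda i: data["worktype"][i] == 9 or data["rating"][i] == 9)
print(sample)
```

The case with worktype coded 9 still gets 0 on every dummy, which is harmless because the reject step removes it before any analysis, just as item 8 says.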


Using LIMDEP for ordered logit

The command for ordered logit in LIMDEP is the following:
ORDERED PROBIT;lhs=name of DV;rhs=one,names of IVs;logit;output=5$
The command name can be abbreviated to ORDER. As for the READ command, the names in the list of IVs must be separated by commas, and there must not be a comma after the last one. The subcommand output=5 reduces the number of lines produced in the output file.
According to the manuals, a subcommand alg=S can be used to produce a stepwise procedure, but the manuals advise against it, presumably because it is vulnerable to local maxima, so that a more hand-guided approach to finding the best model is likely to do better. In any case I have not been able to get this subcommand to work.

Output from the program is pretty well self-explanatory, except for three problems. The first problem is that the output is headed "Ordered Probit Model", which is confusing since ordered probit and ordered logit are two different kinds of analysis. The heading arises because LIMDEP uses the same command for both (the subcommand logit tells it which version we want). Second, before LIMDEP embarks on the ordered logit analysis, it does an approximate linear regression, and reports the results. It's easy to read these by mistake instead of the results we actually want. It is probably sensible to delete them from the output file before printing it out, to save paper and to avoid confusion. Finally, if you forget to recode the dependent variable so it starts from level 0, you will get the message "Insufficient variation in dependent variable", which is unlikely to help you realise what has gone wrong.

The same command syntax is used for all LIMDEP's model-fitting commands. For example, a linear regression would be carried out by
REGRESS;lhs=name of DV;rhs=one,names of IVs$


Interpreting the output from ordered logit analysis

The same five questions can be addressed as with other regression-type analyses that we have considered previously:

  1. How well does the model account for the data? Ordered logit does not produce an adjusted R-squared statistic. It does produce a chi-squared value for the model, and this could probably be converted to the LRFC1 statistic recommended by Darlington (1990, chapter 18) for logistic regression, but I have not found a statistical authority for doing this. The most useful step is probably to examine the classification table produced at the end of the analysis, which compares actual group membership with membership predicted on the basis of the model. As well as giving us a measure of goodness of fit (what proportion of cases were correctly predicted?), this may alert us to problems with the analysis, for example if the model does not predict any cases in one or more of the categories.
  2. Is the overall relationship between the IVs and the DV significant? This is addressed by comparing log likelihoods. The LIMDEP output includes the log likelihood for the null model, in which the coefficients for all regressors other than the constant are taken as zero, and also for the fitted model. The difference between these two log likelihoods, multiplied by two, is approximately distributed as chi-squared with degrees of freedom equal to the number of regressors (each dummy variable counts as one; the constant does not), and so can be used to test the overall significance of the model. In the same way, we can test the significance of adding a group of regressors to a model, using the number of regressors added as the degrees of freedom.
  3. What is the effect of an individual IV on the DV in the presence of all the other IVs? LIMDEP gives regression coefficients which can be interpreted in the usual way, though note that, as in logistic regression, they give the effect of a unit increase in the IV on the log odds of the DV taking a higher value, not on the DV itself. In the proportional odds model the same coefficient applies whichever cutoff is considered.
  4. Is the effect of an individual IV significant in the presence of all the other IVs? This is tested by a t value associated with each IV (the coefficient divided by its standard error), read in the same way as in linear regression. Note that, unlike SPSS's LOGISTIC REGRESSION command, LIMDEP produces a t rather than a chi-squared test statistic; the Wald chi-squared that SPSS reports is simply the square of this ratio. Because the test relies on large-sample approximations, marginally significant values should be regarded with caution where sample sizes are small.
  5. What are the relative importances of the IVs in predicting the DV value? As in logistic regression, we cannot calculate beta-weights from the regression coefficients, but we can get a measure of the relative importance of different IVs by multiplying each regression coefficient by the standard deviation of the corresponding independent variable. (LIMDEP does not, unfortunately, do this for you, though it does give the standard deviation of each IV in the regression table.) Note, though, that these products do not have the interpretation they have in linear regression, of being the regression coefficients you would get if you reran the regression after standardising the variables: because of the categorical nature of the DV, it would not be meaningful to standardise it.
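
The calculations behind questions 1, 2, 4 and 5 are simple enough to check by hand or with a few lines of Python. The sketch below assumes you have copied the relevant figures (the two log likelihoods, the classification table, a coefficient with its standard error, and the IV standard deviations) from the LIMDEP output; the function names are hypothetical and all numbers are invented for illustration. The chi-squared tail probability uses the standard closed-form expressions for integer degrees of freedom, so no statistical library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum over i = 0..df/2-1 of (x/2)^i / i!
        half, term, total = x / 2.0, 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd df: two-sided normal tail plus a finite series
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * root * math.exp(-x / 2.0)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def model_chi_squared(loglik_null, loglik_model, n_regressors):
    """Question 2: twice the gain in log likelihood; df = regressors, constant excluded."""
    chi_sq = 2.0 * (loglik_model - loglik_null)
    return chi_sq, n_regressors, chi2_sf(chi_sq, n_regressors)

def proportion_correct(table):
    """Question 1: table[i][j] counts cases observed in category i and predicted in j."""
    total = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / total

def never_predicted(table):
    """Question 1 warning sign: categories that no case is predicted to fall into."""
    return [j for j in range(len(table[0])) if sum(row[j] for row in table) == 0]

def t_and_wald(coef, se):
    """Question 4: t ratio, and its square (the Wald chi-squared SPSS would report)."""
    t = coef / se
    return t, t * t

def relative_importance(coefs, sds):
    """Question 5: each coefficient multiplied by the SD of its IV."""
    return {name: coefs[name] * sds[name] for name in coefs}

# Invented numbers, purely for illustration
chi_sq, df, p = model_chi_squared(-105.2, -96.4, 3)
table = [[20, 5, 1], [6, 30, 4], [2, 8, 24]]
print(round(chi_sq, 2), df, p)
print(proportion_correct(table), never_predicted(table))
print(t_and_wald(0.8, 0.32))
print(relative_importance({"age": 0.05, "income": 0.8}, {"age": 10.0, "income": 0.5}))
```

With these invented figures, the model chi-squared is 17.6 on 3 degrees of freedom (p roughly 0.0005), 74 of the 100 cases are correctly classified, and every category is predicted at least once. The t ratio of 2.5 corresponds to a Wald chi-squared of 6.25.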


Further reading & acknowledgement

If you can cope with the maths, much information is to be found in the books by Agresti (1984, 1990). For further details about LIMDEP, see Greene (1992). For examples of the use, interpretation, and presentation of the results of ordered logit analysis, see Lea, Webley and Levine (1993) and Lea, Webley and Walker (1995).

My knowledge of both ordered logit and LIMDEP is largely owed to help from Dr Nichola Crichton, formerly of the MSOR department. I have a set of notes she wrote for me which are more help than any of the books, and I will lend them to anyone who has to get into this analysis.



References



Examples

The file /singer1/eps/psybin/stats/hefce.txt contains (hypothetical) data from a study in which 100 recent graduates rated the teaching in their departments as "excellent", "satisfactory" or "unsatisfactory". It could be read into SPSS with the following command file, which is available as /singer1/eps/psybin/stats/hefce.in on Singer; it is also available on the fileserver PSYCHO.

   DATA LIST file='/singer1/eps/psybin/stats/hefce.txt' /
     gender 1 dept 3 origin 5 examclas 7 rating 9.
   VALUE LABELS gender 1 'male' 2 'female' /
     dept 1 'chemistry' 2 'law' 3 'history' /
     origin 1 'UK' 2 'EU' 3 'overseas' /
     examclas 1 'first' 2 'two1' 3 'two2' 4 'third' /
     rating 1 'unsatisfactory' 2 'satisfactory' 3 'excellent'.
   MISSING VALUES gender dept origin examclas rating (9).
   DO REPEAT x=UKorigin,EUorigin,OSorigin / i=1 to 3.
   COMPUTE x=0.
   IF origin=i x=1.
   IF missing(origin) x=9.
   END REPEAT.
   DO REPEAT x=deptchem,deptlaw,depthist / i=1 to 3.
   COMPUTE x=0.
   IF dept=i x=1.
   IF missing(dept) x=9.
   END REPEAT.
   MISSING VALUES UKorigin,EUorigin,OSorigin,deptlaw,deptchem,depthist 
     (9).
  1. Write out the corresponding command file for LIMDEP.
  2. Prepare the LIMDEP version of the command file, using the Singer editor. Use each version to read the data into the package concerned.
  3. Use linear regression to predict ratings from the other variables in both packages, print your output, and compare their results.
  4. Use LIMDEP to analyse the same data using ordered logit (Hint: don't forget to think about the coding of the dependent variable). Print your output, and compare the results with those you obtained using linear regression.



Stephen Lea

University of Exeter
Department of Psychology
Washington Singer Laboratories
Exeter EX4 4QG
United Kingdom
Tel +44 1392 264626
Fax +44 1392 264623


Document revised 11th March 1997