University of Exeter

DEPARTMENT OF PSYCHOLOGY


PSY2005 Statistics and Research Methods: Quantitative data analysis component

Examples on choosing the best regression model


All these examples use data held in the Singer file /singer1/eps/psybin/stats/debt.MTW. This is a Minitab worksheet containing some of the data from a large postal survey on the psychology of debt. The data in the file are, for each of 464 respondents,

All yes/no questions are coded 0=no, 1=yes. These are real data (Lea, Webley & Walker, 1995, Journal of Economic Psychology, 16, 681-701), though the published paper also deals with many other variables. Locus of control is a personality measure introduced by Rotter, which claims to differentiate people according to how far they feel that the things that happen to them result from processes within themselves (internal locus of control) or from outside events (external locus of control).

  1. Get the data into Minitab. Use INFO to find out what columns are in use; use PRINT on some of these columns to see how Minitab reports missing values; and use DESCRIBE on these columns to see what Minitab does when there are values missing in data on which it is doing calculations.
  2. Store this worksheet into your own filespace. Use the command SYSTEM ls to check that you have stored the worksheet correctly (note that ls is a Unix command so must be in lower case).
  3. Use simple t-tests to find out whether there are significant differences in debt attitudes between (a) smokers and non-smokers, and (b) those with and without bank accounts. Repeat these tests for locus of control.
  4. Use BREG to find what combination of all the other variables in the list above gives the best explanation of variations in attitude to debt.
  5. Use REGRESS to find out which of those variables are significantly associated with attitude, and to discover what the nature of the associations is. You may find that the R2adj value reported by REGRESS is not the same as the one you obtained from BREG; can you see why?
  6. Get a printout of the full results of your best regression model.
  7. Use BREG to find out what combination of variables gives the most efficient explanation of variations in attitude to debt.
  8. (Optional). Use STEPWISE to answer the previous question in a different way, and see whether you get the same results as you did before. HELP STEPWISE will tell you more about how STEPWISE works.


Sample of BREG output

This sample shows how BREG would be used to look for the best model to fit the teenage gambling data used in the introductory multiple regression examples. It assumes we have already read in the data and named the columns appropriately.

        MTB > BREG C6 C2-C5
        
        Best Subsets Regression of gambling
       
                                                  p v 
                                                  o e 
                                                s c r 
                                                t m b 
                                              m a o i 
                                              0 t n n 
                      Adj.                    f u e t 
        Vars   R-sq   R-sq    C-p         s   1 s y l 
           1   38.7   37.3   11.4    24.948       X   
           1   16.6   14.8   31.0    29.094   X       
           2   50.1   47.9    3.2    22.754   X   X   
           2   40.3   37.6   12.0    24.904     X X   
           3   52.6   49.3    3.0    22.434   X   X X 
           3   50.6   47.1    4.9    22.915   X X X   
           4   52.7   48.2    5.0    22.690   X X X X 
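
The Adj. R-sq and C-p columns can be rechecked from the R-sq and s columns. The Python sketch below is an illustration only; it is not part of Minitab or of the original exercise. It assumes n = 47 cases, a figure that is not stated in this handout but that reproduces the printed values. The predictor names are read from the vertical column headings, and the function names are hypothetical.

```python
# Rows copied from the BREG sample output above:
# (predictors in the model, R-sq, Adj. R-sq, C-p, s)
ROWS = [
    (("pocmoney",), 38.7, 37.3, 11.4, 24.948),
    (("m0f1",), 16.6, 14.8, 31.0, 29.094),
    (("m0f1", "pocmoney"), 50.1, 47.9, 3.2, 22.754),
    (("status", "pocmoney"), 40.3, 37.6, 12.0, 24.904),
    (("m0f1", "pocmoney", "verbintl"), 52.6, 49.3, 3.0, 22.434),
    (("m0f1", "status", "pocmoney"), 50.6, 47.1, 4.9, 22.915),
    (("m0f1", "status", "pocmoney", "verbintl"), 52.7, 48.2, 5.0, 22.690),
]

N = 47           # ASSUMPTION: number of cases; not stated in this handout
FULL_S = 22.690  # s for the model containing all four predictors


def adj_r_sq(r_sq, k, n=N):
    """Adjusted R-sq (per cent) from R-sq (per cent) and k predictors."""
    return 100 * (1 - (1 - r_sq / 100) * (n - 1) / (n - k - 1))


def mallows_cp(s, k, s_full=FULL_S, n=N):
    """Mallows' C-p from a model's residual standard deviation s."""
    params = k + 1                    # predictors plus the constant
    sse = s ** 2 * (n - params)       # residual sum of squares for this model
    return sse / s_full ** 2 - (n - 2 * params)


def best_by_adj_r_sq(rows=ROWS):
    """Return the row with the highest printed adjusted R-sq."""
    return max(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for preds, r_sq, adj, cp, s in ROWS:
        k = len(preds)
        print(f"{k} vars: Adj. R-sq {adj} (recomputed {adj_r_sq(r_sq, k):.1f}), "
              f"C-p {cp} (recomputed {mallows_cp(s, k):.1f})")
    print("Highest Adj. R-sq:", ", ".join(best_by_adj_r_sq()[0]))
```

Run as a script, this prints the recomputed values next to the printed ones; small differences in the last digit come from rounding in the table. On these figures the three-predictor model (m0f1, pocmoney, verbintl) has the highest adjusted R-sq, and its C-p of 3.0 is below its parameter count of 4.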
        

Note the following:


Stephen Lea

University of Exeter
Department of Psychology
Washington Singer Laboratories
Exeter EX4 4QG
United Kingdom
Tel +44 1392 264626
Fax +44 1392 264623




Document revised 10th January 1997