# Dummy test on quantitative data analysis: Multiple regression and factor analysis

This dummy test aims to have the same structure and length as the term test (which will next be held on Wednesday 19th March 1997). A separate answer sheet and marking guide is available. The rubric for the test is as follows:

Time allowed: 3 hours. Access to computers will be available for one and a half hours only. The test is in two parts. Each part of the test accounts for 50% of the marks. The pass mark is 40%; it is not necessary to pass both parts separately. Unless otherwise noted, all the data used in these questions are fictitious. They should not be taken as representing real psychological phenomena.

## Part A: Multiple regression

### Questions A1 and A2 are to be answered without using the computer: the files mentioned in these questions do not exist

A1. A lecturer, Dr A., gives an end of term statistics test to a class of students. Because of the large size of the class, he has to give the test in two groups, who receive different versions of the test paper. He is worried to notice that the second group seem to get better marks than the first. He shares his concern with a colleague, Professor B. She points out that the second group of students included more men than the first, more students from the science faculty, and more of those who had done well in the previous term's statistics test. She suggests that these factors account for any difference in the marks. Dr A. is not convinced.
The lecturers use a word processor to prepare a file containing 5 columns of data, as follows:

• The version of the paper taken by the student;
• Gender (women coded 1, men coded 2);
• Mark in last term's test (out of 100);
• Faculty (1 Arts, 2 Social Studies, 3 Science);
• Mark in the current test (out of 50).

Assuming this file has been transferred to the Singer computer with the title /singer1/eps/psybin/stats/tests/students.DAT, how would the lecturers go about answering the following questions, using Minitab?

(a) Is the difference Dr A. initially noticed statistically significant?

(b) Do the data support Dr A.'s or Professor B.'s explanation?

In both cases you should write down the Minitab commands that you would use, explain why you would use them and state what information you would extract from the output.

A2. In a certain university department, the Finance Officer is alarmed by the size of the photocopying bill. She obtains a report of the past month's use of the machine by each member of the department, and observes that the figures are highly variable. When she tackles colleagues about the situation, high users explain that they make a lot of copies because of the amount of teaching they do; junior colleagues say that the overspending is due to thoughtless behaviour by senior staff; and older members of the department explain that young academics never read anything nowadays but simply make photocopies. The Finance Officer herself is inclined to believe that the differences result from individual factors such as personality and gender, and decides to investigate the question using regression techniques. She sets up a Minitab worksheet file containing data on usage, seniority (1=Professor, 2=Senior Lecturer, 3=Lecturer, 4=Research staff, 5=Research student), teaching contact hours per week, gender, age, and the Finance Officer's estimate of colleagues' likely scores on the three Eysenck personality dimensions of Psychoticism, Extraversion and Neuroticism. The following is a transcript of part of her Minitab session:

```
MTB > correlate c1-c8

           copies seniorty hrsteach  f=0,m=1      age        p        e
seniorty   -0.716
hrsteach    0.835   -0.768
f=0,m=1     0.477   -0.363    0.461
age         0.528   -0.696    0.585    0.283
p           0.265   -0.226    0.057    0.111    0.165
e           0.098   -0.282    0.080    0.040    0.401    0.217
n          -0.105    0.113   -0.217   -0.089   -0.137   -0.103   -0.008

MTB > breg c1 c2-c8

Best Subsets Regression of copies

                                      s h
                                      e r f
                                      n s =
                                      i t 0
                                      o e ,
                                      r a m a
              Adj.                    t c = g
Vars   R-sq   R-sq    C-p         s   y h 1 e p e n

   1   69.7   68.9    5.1    261.13     X
   1   51.2   49.9   30.1    331.42   X
   2   74.4   73.1    0.7    243.11     X     X
   2   71.1   69.5    5.2    258.62   X X
   3   75.4   73.4    1.3    241.75     X     X   X
   3   75.1   73.1    1.7    243.10     X X   X
   4   76.1   73.3    2.4    241.86     X X   X   X
   4   75.6   72.8    3.1    244.12   X X     X   X
   5   76.3   72.8    4.1    244.25   X X X   X   X
   5   76.1   72.6    4.4    245.25     X X   X X X
   6   76.4   72.1    6.0    247.36   X X X   X X X
   6   76.3   72.0    6.1    247.87   X X X X X   X
   7   76.4   71.2    8.0    251.19   X X X X X X X
```

(a) How can photocopier usage best be predicted?
(b) How should the Finance Officer proceed to examine a model for usage based on the factors you identify in (a)?
(c) Comment on the relationship between seniority and usage.
(d) On the basis of these results, what policy recommendations should the Finance Officer make to the next department meeting?
In all cases, justify your answers by reference to the sample output.
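
As an aid to reading the transcript, the R-sq column for the one-variable models can be checked against the correlation matrix: with a single predictor, R-sq is the squared correlation between predictor and response, expressed as a percentage. The following Python sketch is illustrative only. Python is not used in the test, the function name is an assumption, and the two correlations are copied from the transcript.

```python
# Correlations of two predictors with copies, copied from the transcript.
r_with_copies = {"hrsteach": 0.835, "seniorty": -0.716}


def r_squared_percent(r):
    """Return 100 * r**2, the R-sq of a regression on a single predictor."""
    return 100 * r ** 2


for name, r in r_with_copies.items():
    print(f"{name}: R-sq = {r_squared_percent(r):.1f}%")
```

The results agree with the 69.7 and 51.2 printed for the two one-variable models; the small discrepancy for seniorty arises because the correlation is quoted to only three decimal places.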

### Questions A3 and A4 are to be answered by using the computers: the files mentioned in these questions can be found on singer

A3. The text file /singer1/eps/psybin/stats/dummy/death.DAT contains five columns of data for each of 45 respondents, all aged 21, from a study on fear of death. The columns contain, respectively:

• The respondent's score on a scale of fear of death, running from 0 (minimum fear) to 20 (maximum);
• The N scale of the Eysenck Personality Questionnaire;
• The number of grandparents of the respondent who have died in the respondent's lifetime;
• An index of whether any closer relatives (parent or sibling) have died in the respondent's lifetime (0=no deaths, 1=at least one death);
• The respondent's gender (1=female, 2=male).

Use Minitab (a) to find the mean of each kind of quantitative data and (b) to fit a regression equation which predicts respondents' fear of death scores from the other four variables. Report and comment on your results.

A4. The file /singer1/eps/psybin/stats/dummy/hamsters.MTW contains a Minitab worksheet giving the results of an experiment on hoarding behaviour in hamsters. Hamsters were kept individually for 2 weeks in a 3-metre square enclosure with a single food source. Equal numbers of male and female hamsters were used (sex is coded 1=female, 2=male). The data collected were the hamster's mean body-weight over the period of the experiment (as a percentage of the mean for the previous 2-week period); the distance from the food source at which the hamster established its nest, in metres; and the number of precision food pellets hoarded per day. After the experiment it was realised that the hamsters originated from three different suppliers, and these are coded 1 to 3 in the worksheet.
(a) Report the value of each of these variables for the hamster which did the median amount of hoarding.
(b) Investigate how hoarding can be predicted from the other variables.
(c) Comment on the relation between hoarding and the distance from food-supply to nest.
(d) What further investigations would you carry out?

## Part B: SPSS, Factor Analysis, Item Analysis and scale construction

### Questions B1 and B2 are to be answered without using the computer

B1. What is a 'factor' in Factor Analysis?

B2. What is the 'scree test' and what is it used for?

### Questions B3 to B6 are to be answered using the computer

Copy the published file 'dummy.sys' into your filespace and set up a command file to carry out the following jobs in SPSS and send the results to an output file. A short description of the variables is given at the end of the questions.

B3. After declaring the value '8' as missing for all variables from OFFCARE to REWSKILL, use the command frequencies with the subcommand /statistics to get the following information:
How many men and women are there in the sample? How many people are in the salariat category of the class variable (GOLD1)? What are the means, minimum and maximum values for GVTTRUST and GVTBENEF?

B4. Do a cross-tabulation of GETNEED by SEX_R. Do men and women differ in their views about whether or not people in Britain 'get what they need'? (Write down the Pearson chi-square significance level.)
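
The Pearson chi-square statistic reported for a cross-tabulation compares each observed cell count with the count expected if the row and column variables were independent. A minimal Python sketch of that calculation follows. The counts are invented for illustration and are not taken from 'dummy.sys', and the function name is an assumption.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Invented counts: rows are men and women, columns are the five response
# categories of an attitude item.
invented_counts = [
    [12, 30, 25, 20, 8],
    [10, 28, 27, 24, 11],
]
chi2, df = pearson_chi_square(invented_counts)
print(f"chi-square = {chi2:.2f} on {df} df")
```

The significance level is then obtained by referring the statistic to the chi-square distribution with the stated degrees of freedom, which SPSS does automatically.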

B5. Set up a command file to carry out a factor analysis in SPSS on the variables OFFCARE to REWSKILL. Remember to declare missing values (if you have not already done so). Set the 'criteria' subcommand to allow up to 200 iterations and extract 2 factors, using the PAF method of extraction and an orthogonal (varimax) rotation. After rotation, which variables have loadings above .3 on the first factor? Which load above .3 on factor 2? What interpretation could you give to factor 1?

B6. Using the reliability procedure, get a measure of Cronbach's alpha for a scale based on the 3 variables loading on Factor 1 which do not also load significantly on Factor 2 (GVTTRUST, GVTBENEF, TRIAL). Would this be a reliable scale? Although this scale already has rather few items, could it (for the purposes of this exercise) be improved by omitting any of the variables?
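
For reference, Cronbach's alpha for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below shows that calculation, together with alpha recomputed after omitting each item in turn. It is an illustration only: the responses are invented rather than taken from 'dummy.sys', the function name is an assumption, and any values coded '8' would have to be excluded beforehand.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for `items`, a list of equal-length score lists
    (one list per item, one position per respondent)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented responses from six hypothetical respondents.
names = ["GVTTRUST", "GVTBENEF", "TRIAL"]
items = [
    [4, 3, 5, 2, 4, 3],
    [4, 2, 5, 3, 4, 3],
    [5, 3, 4, 2, 3, 3],
]
print("alpha, all items:", round(cronbach_alpha(items), 3))

# Alpha if each item is omitted in turn.
for i, name in enumerate(names):
    remaining = items[:i] + items[i + 1:]
    print(f"alpha without {name}:", round(cronbach_alpha(remaining), 3))
```

If alpha rises when an item is omitted, that item is weakening the scale; if it falls, the item is contributing to reliability.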

### Variables found in 'dummy.sys'

• SEX_R: sex of respondent (male=1, female=2);
• GOLD1: social class;
• OFFCARE: 'officials don't care what people like me think';
• VOTERCH: 'voters in this country have a real choice';
• TRIAL: 'poor people have as much chance of a fair trial as rich people';
• GVTTRUST: 'you can always trust the government to do what is right';
• GVTBENEF: 'the government is run for the benefit of all the people';
• EQOPP: 'people in this country have equal opportunities';
• REWEFFRT: 'people get rewarded for their effort';
• GETNEED: 'people in this country get what they need';
• REWSKILL: 'people get rewarded for their skills and intelligence'.

For all variables from OFFCARE to REWSKILL, scores were recorded on a 5-point Likert scale from 'very often' to 'never' or from 'strongly agree' to 'strongly disagree'; 'don't know' was coded as '8'.

A separate sheet will be issued giving the answers to the questions and a marking guide.

Stephen Lea, Carole Burgoyne

University of Exeter
Department of Psychology
Washington Singer Laboratories
Exeter EX4 4QG
United Kingdom
Tel +44 1392 264626
Fax +44 1392 264623