In some questions, the marks available add up to more than the maximum allowed. This is to allow different ways of getting full credit, but you are not allowed to get more than the maximum mark for the question.
Reminder: all the data were entirely fictitious and should not be taken as giving information about real psychological phenomena.
Question               Maximum marks
A1                      7
A2                      8
A3                     15
A4                     20
Total for Section A    50
B1                      6
B2                      5
B3                      8
B4                      5
B5                     15
B6                     11
Total for Section B    50
Grand total           100
Question A1 

To make the data available for analysis  READ 'singer1/eps/psybin/stats/tests/students' C1-C5  1 mark  use READ because this is a text file 
To name the variables  NAME C1 'version' C2 'f1m2' C3 'lasttrm%'
NAME C4 'faculty' C5 'mark/50' 
1 mark  The naming could all be done on a single line 
To answer question (a)  TWOT C5 C1  1 mark  You must use TWOT, and get the right variables in the right order. You can then use the TWOT output to examine the difference in the two means, the value of the t statistic reported, and its significance level. If t is not significant, there is no real case for proceeding further. 
Assuming that in part (a) there is a difference between the means and it is significant, to answer question (b) use  INDICATOR C4 C11-C13  1 mark  to form dummy variables from the faculty codes 
NAME C11 'arts' C12 'socstuds' C13 'science'  1 mark  to name the dummy variables  
TALLY C4  1 mark  to examine which faculty code is modal, i.e. occurs most often  
Supposing faculty code 2 is found to be modal  REGRESS C5 5 C1-C3 C11 C13  1 mark  Note that neither C4, the categorical variable itself, nor C12, the dummy variable for the modal category, is included in the regression model. 
To see whether version has an effect when all the other variables are taken into account  we look at the regression coefficient for C1, and ask (i) is it comparable in size to the difference of means observed in (a), and (ii) is the corresponding t value significant?  2 marks  
An alternative approach  use BREG C5 C1-C3 C11 C13 and consider whether C1 is included in the best regression model.  2 marks  
Maximum available for question A1  7 marks 
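The dummy-variable step in part (b) can be illustrated outside Minitab. The following is a minimal Python sketch, not part of the marking scheme: the faculty codes are invented for illustration, and the helper names `indicator_columns` and `modal_level` are hypothetical. It builds one 0/1 column per faculty code (as INDICATOR does), finds the modal code (as TALLY would show), and leaves the modal category's column out of the predictor set.

```python
from collections import Counter


def indicator_columns(codes):
    """Return one 0/1 column per distinct code, similar in spirit to INDICATOR."""
    levels = sorted(set(codes))
    return {level: [1 if c == level else 0 for c in codes] for level in levels}


def modal_level(codes):
    """Return the most frequent code, i.e. the modal category."""
    return Counter(codes).most_common(1)[0][0]


# Hypothetical faculty codes for ten students (assumed coding:
# 1 = arts, 2 = social studies, 3 = science). Not the exam data.
faculty = [2, 1, 2, 3, 2, 1, 2, 3, 2, 1]

dummies = indicator_columns(faculty)
reference = modal_level(faculty)

# Keep every dummy column except the one for the modal (reference) category.
predictors = {level: col for level, col in dummies.items() if level != reference}

print("reference level:", reference)
print("dummy columns kept:", sorted(predictors))
```

Dropping one dummy is needed because the full set of dummies always sums to 1 for every case, which would make the regression model unidentifiable; choosing the modal category as the one to drop is the convention this marking scheme follows.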
Question A2 

(a) The most complete prediction of photocopier use that is reasonably efficient is obtained from the maximum R^{2}_{adj} model, which includes the variables teaching contact hours per week, psychoticism, and neuroticism.  1 mark 
We do not have the data available in this printout to find the model which is most efficient while being also reasonably complete, which would require F values for each model in the table.  1 mark 
(b) The appropriate Minitab command would be REGRESS C1 3 C3 C6 C8  1 mark 
(c) There is a high negative correlation (-0.72) between the seniority measure and the number of copies made. Since low seniority scores mean high seniority, more senior staff tend to make more copies.  1 mark 
However, the seniority variable is not included in the best regression model. Further examination of the correlation matrix suggests that this is because of a strong correlation between seniority and teaching contact hours (which is in the model), with more senior staff doing more teaching.  1 mark 
The fact that it is teaching hours rather than seniority that is included in the best regression model suggests that the apparent relation between seniority and usage is due only to the extra teaching done by senior staff (though because there is such a high correlation between seniority and teaching load, there is an identification problem here and the conclusion can only be reached tentatively).  1 mark 
(d) The only variables retained in the best regression model for photocopier usage are teaching hours and personality variables. It is unlikely that the department can do much about the personality of its staff. Therefore it needs to look at the relation between teaching hours and copies made, and consider whether the pattern of teaching could be changed so it was not so dependent on the production of photocopies. Perhaps it would be more efficient to rely more on textbooks and less on handouts, though the costs of buying extra copies of texts for the library would have to be taken into account.  2 marks 
Maximum for question A2  8 marks 
Question A3. 

(a) The means are as follows:

2 marks for getting all
lose 1 mark for spurious precision (more than 1 decimal place reported) 
The index of closer deaths, and respondent gender, are not quantitative data, and means should not be reported  lose 1 mark for giving their means 
(b) The regression equation is a fair fit,  1 mark 
since the R^{2}_{adj} value is 66.6%.  1 mark 
The regression equation accounts for a significant proportion of the variance in fear of death scores (F_{4,40} = 22.98, p < 0.0005).  2 marks 
The regression equation is
fear of death score = 3.9 + 0.53 * EysenckN - 0.35 * GPsdead + 3.5 * otherdeaths + 0.32 * m1f2 
2 marks 
With all other variables taken into account, the associations of fear of death with the Eysenck N score and with the occurrence of deaths among close relatives are significant: the t_{40} values are 8.51 (p < 0.0005) and 2.82 (p < 0.01) respectively.  2 marks 
Fear of death scores increase by about half a scale point for every one scale point increase in the Eysenck N score, and are about 3.5 units higher for respondents who have suffered a close bereavement than for other respondents.  2 marks 
The effect of the number of grandparents who have died approaches significance (t_{40} = 1.73, p<0.10): the more grandparents have died within the respondent's lifetime, the lower the fear of death score. It might be worth pursuing this question with a larger sample.  1 mark 
Examination of the Unusual Observations report suggests that there may be several outliers. Plotting fear of death scores against the Eysenck N scores suggests that the only one of these likely to be serious is observation 41. Rerunning the regression with this observation deleted slightly reduces the significance of the trends reported above, but does not change them qualitatively, so they can be accepted as reasonable.  2 marks 
Good reporting style  2 marks 
Maximum for question A3  15 marks 
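As a consistency check on the figures quoted in part (b), the overall F value can be recovered approximately from R^{2}_{adj} and the degrees of freedom. The sketch below is illustrative only and assumes an ordinary regression with an intercept, so that the residual degrees of freedom of 40 with 4 predictors imply n = 45. The function names are hypothetical.

```python
def r2_from_adjusted(r2_adj, n, p):
    """Invert R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2_adj) * (n - p - 1) / (n - 1)


def f_from_r2(r2, n, p):
    """Overall regression F with p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# F(4, 40) implies 4 predictors and, assuming an intercept, n = 45 cases.
n, p = 45, 4
r2 = r2_from_adjusted(0.666, n, p)
f_value = f_from_r2(r2, n, p)

print(round(r2, 3), round(f_value, 2))
```

With the rounded input of 66.6% this gives an F of about 22.9, which agrees with the reported 22.98 to within rounding error.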
Question A4. 

(a) The median value of 'hoard' is 173.5, which tells us that there is no unique median hamster on this variable. We can choose either hamster 5 (hoard = 167) or hamster 15 (hoard = 180) as a median animal. Since the mean of 'hoard' is higher than the median, it might be better to take the higher value, and use hamster 15.  1 mark for choosing either 5 or 15; 1 extra mark for a good rationale for preferring one of them 
For hamster 15, weight during the experiment was 105% of its pre-experimental value; the hamster came from supplier 2 and was female. She established her nest 1.31 metres from the food source.  1 mark 
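The reason there is no unique median hamster is that, with an even number of cases, the median is the mean of the two middle values. A minimal Python sketch follows; the list is invented for illustration, and only the two middle values (167 and 180) are taken from this answer.

```python
import statistics

# Hypothetical hoard sizes. Only 167 and 180 come from the marking scheme;
# the other values are made up so that the list has an even length.
hoard = [90, 120, 167, 180, 240, 310]

ordered = sorted(hoard)
half = len(ordered) // 2
lower_mid, upper_mid = ordered[half - 1], ordered[half]

median = statistics.median(hoard)  # mean of the two middle values
mean = statistics.mean(hoard)

print(lower_mid, upper_mid, median)
print("mean above median:", mean > median)
```

Because 173.5 is not an observed value, either of the two middle animals is a defensible choice of "median hamster".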
(b) Supplier is an unordered categorical variable. Therefore, before we can carry out regression using this variable, we need to produce dummy variables corresponding to the three suppliers.  1 mark for recognizing this; 1 mark for doing it correctly 
We will also have to decide which supplier to drop from the analysis. Since none of them is in any sense a control or normal group, we use TALLY or TABLE to find the mode (supplier 3) and drop that one.  1 mark 
To investigate how the other variables affect hoard size, we use Best or Stepwise regression.  1 mark 
Using BREG, we find that the regression model with the highest R^{2}_{adj} value (68.7%) is the three-variable model including sex, distance to nest site, and supplier 1. However, the 2-variable model using only sex and distance to nest site has a better F value (22.45 as against 17.84), so if we want the most economical model we would prefer that (the best one-variable model, using distance to nest site, does not have such a good F: 21.62). The wording of the question suggests using the 3-variable model, for a better description.  2 marks (for identifying either the best R^{2}_{adj} or the best F model) 
The best fitting regression equation is:
hoard size = 60 - 94 * sex + 269 * distance to nest + 61 * supplier 1. 
1 mark 
It is a fair fit, with R^{2}_{adj} equal to 68.7%  1 mark 
and accounts for a significant proportion of the variation in hoard size (F_{3,20} = 17.84, p<0.0005)  1 mark 
though it must be borne in mind that its significance will be inflated since it has been selected as the best model  1 mark 
With all other variables held constant, the effects of sex and distance to the nest site are significant (t_{20} values of 3.29 and 6.49, p < 0.01 for sex and p < 0.0005 for distance to nest).  1 mark 
though these significance levels will also be inflated  1 mark 
Females hoard about 93 more pellets per day than males, and mean hoard size rises by about 27 pellets per day for each 10 cm by which the nest is distant from the food source.  1 mark 
Hamsters from supplier 1 hoard about 61 pellets per day more than those from the other two suppliers, and this difference approaches significance (t_{20} = 1.85, p < 0.10).  1 mark 
investigating the supplier variable as a whole at any stage  2 marks 
good reporting style  2 marks 
(c) Plotting the relation between hoarding and the distance from food to nest shows a noticeable outlier (observation 9), which is also picked out by the Unusual Observations report on the 3-variable model presented above. It would be worth repeating the entire analysis with the observation dropped.  1 mark for spotting the outlier; 1 mark for carrying out a further analysis 
(d) It would be worth repeating the study with a larger group from supplier 1, and checking that the same gender and distance relationships held regardless of supplier  1 mark 
The relationship between nest site distance and hoard size is much more regular for males than it is for females (you would need to use a selective COPY command to find this out), so future studies should include enough of each gender for their data to be studied separately.  2 marks 
Maximum for question A4  20 marks 
Total available for Section A 50 marks
B1  A factor is a hypothetical construct, or 'latent variable' which is derived from other, directly observable variables, and helps to explain the correlations between a range of different responses or behaviours. A basic assumption of Factor Analysis is that the observed correlations between observed variables result from their sharing a smaller set of underlying variables  up to 6 marks for a full answer 
B2  A 'scree' test is a method of deciding how many factors are needed to capture the important dimensions in the data. It is done by looking at the plot of eigenvalues against their associated factors and looking for a sharp change, or 'elbow' in the plot that occurs when a steep drop gives way to a shallower slope, resembling the rubble that piles up at the bottom of a scree slope.  up to 5 marks for a full answer 
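The 'elbow' is normally judged by eye, but the idea can be sketched numerically. The Python below is only one possible heuristic, not a standard procedure: it locates the factor at which a large drop in eigenvalue is followed by a much smaller one. The eigenvalues are invented for illustration and the function name `scree_elbow` is hypothetical.

```python
def scree_elbow(eigenvalues):
    """Return the 1-based factor number at which the plot bends most sharply.

    Heuristic only: compares the drop into each factor with the drop out of it
    and picks the factor where that difference is largest.
    """
    drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
    bends = [d_in - d_out for d_in, d_out in zip(drops, drops[1:])]
    # bends[i] belongs to factor i + 2 (1-based): drop into it minus drop out of it.
    return bends.index(max(bends)) + 2


# Hypothetical eigenvalues in descending order (not from the exam data).
eigenvalues = [3.1, 1.9, 0.8, 0.7, 0.6, 0.5]

elbow = scree_elbow(eigenvalues)
factors_retained = elbow - 1  # factors on the steep part, before the scree begins

print("elbow at factor", elbow, "; retain", factors_retained)
```

Conventions differ on whether the factor at the elbow itself is retained; this sketch keeps only the factors before it, and any such rule should be checked against the plot.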
B3  There are 598 men and 721 women  4 marks 
and 344 people in the 'salariat' class.  2 marks  
GVTTRUST: mean: 3.157; minimum: 1.000; maximum: 5.000
GVTBENEF: mean: 3.308; minimum: 1.000; maximum: 5.000 
2 marks  
B4  Yes, there is a significant difference between men and women on GETNEED, with a Pearson chi-square probability level of .00001.  5 marks 
B5  Variables loading at .3 and above on Factor 1 are GVTTRUST, GVTBENEF, TRIAL, EQOPP, GETNEED, and VOTERCH. On factor 2, the variables with loadings above .3 are: REWEFFRT, EQOPP, GETNEED, REWSKILL, and VOTERCH.  12 marks 
You have to use your own judgement in interpreting factor 1, but if you look just at those which load highly on factor 1 and not on factor 2, it could mean something like 'Britain has a fair political and legal system'. Note that some of the variables are somewhat complex, with loadings above .3 on both factors. We would have to be cautious about using some of these as a scale.  3 marks  
B6  Cronbach's alpha value = .69;  6 marks 
Yes, it would be a reasonably reliable scale for a sample of this size  2 marks  
In principle, it could be improved (though this would leave only a 2-item 'scale') by dropping TRIAL, and the value of Cronbach's alpha would then go up to .80 (rounded).  3 marks  
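For reference, Cronbach's alpha can be computed directly from the item variances and the variance of the total score. The Python sketch below uses the standard formula on invented scores; the item names and values are hypothetical and are not the survey data, so the resulting numbers will not match the .69 and .80 quoted above.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (same respondents in each)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from five people to three items (not the exam data).
item_a = [1, 2, 3, 4, 5]
item_b = [2, 2, 3, 5, 5]
item_c = [3, 1, 4, 2, 5]  # deliberately less consistent with the other two

alpha_all = cronbach_alpha([item_a, item_b, item_c])
alpha_without_c = cronbach_alpha([item_a, item_b])

print(round(alpha_all, 2), round(alpha_without_c, 2))
```

In this invented example, removing the least consistent item raises alpha, which is the same pattern as dropping TRIAL in the answer above, at the cost of a shorter scale.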
Total for Section B  50 marks 