In some questions, the marks available add up to more than the maximum allowed. This is to allow different ways of getting full credit, but you are not allowed to get more than the maximum mark for the question.
Reminder: all the data were entirely fictitious and should not be taken as giving information about real psychological phenomena.
Question | Maximum marks |
A1 | 7 |
A2 | 8 |
A3 | 15 |
A4 | 20 |
Total for Section A | 50 |
B1 | 6 |
B2 | 5 |
B3 | 8 |
B4 | 5 |
B5 | 15 |
B6 | 11 |
Total for Section B | 50 |
Grand total | 100 |
Question A1 |
|||
To make the data available for analysis | READ 'singer1/eps/psybin/stats/tests/students' C1-C5 | 1 mark | use READ because this is a text file |
To name the variables | NAME C1 'version' C2 'f1m2' C3 'lasttrm%'
NAME C4 'faculty' C5 'mark/50' |
1 mark | The naming could all be done on a single line |
To answer question (a) | TWOT C5 C1 | 1 mark | You must use TWOT, and get the right variables in the right order. You can then use the TWOT output to examine the difference in the two means, the value of the t statistic reported, and its significance level. If t is not significant, there is no real case for proceeding further. |
Assuming that in part (a) there is a difference between the means and it is significant, to answer question (b) use | INDICATOR C4 C11-C13 | 1 mark | to form dummy variables from the faculty codes |
NAME C11 'arts' C12 'socstuds' C13 'science' | 1 mark | to name the dummy variables | |
TALLY C4 | 1 mark | to examine which faculty code is modal, i.e. occurs most often | |
supposing faculty code 2 is found to be modal | REGRESS C5 5 C1-C3 C11 C13 | 1 mark | note that neither C4, the categorical variable itself, nor C12, the dummy variable for the modal category, is included in the regression model. |
To see whether version has an effect when all the other variables are taken into account | we look at the regression coefficient for C1, and ask (i) is it comparable in size to the difference of means observed in (a), and (ii) is the corresponding t value significant? | 2 marks | |
An alternative approach | use BREG C5 C1-C3 C11 C13 and consider whether C1 is included in the best regression model. | 2 marks | |
Maximum available for question A1 | 7 marks |
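The dummy-variable steps in question A1 (TALLY to find the modal category, INDICATOR to build the 0/1 columns, then leaving the modal category's dummy out of the model) can be sketched in Python. This is only an illustration: the faculty codes below are invented, the code labels (1 = arts, 2 = social studies, 3 = science) are assumed from the order of the dummy names, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical faculty codes for eight students; invented for illustration,
# not the exam data. Assumed coding: 1 = arts, 2 = social studies, 3 = science.
faculty = [2, 1, 2, 3, 2, 1, 3, 2]


def modal_code(codes):
    """Return the most frequent code (what TALLY lets you read off)."""
    return Counter(codes).most_common(1)[0][0]


def indicator_columns(codes):
    """Build one 0/1 column per distinct code, in the spirit of INDICATOR."""
    levels = sorted(set(codes))
    return {level: [1 if c == level else 0 for c in codes] for level in levels}


dummies = indicator_columns(faculty)
reference = modal_code(faculty)

# Leave out the dummy for the modal category: the remaining dummies are then
# read as differences from that category, and the predictors are not
# perfectly collinear with the constant term.
predictors = {level: col for level, col in dummies.items() if level != reference}

print("modal faculty code:", reference)
print("dummy columns kept:", sorted(predictors))
```

With these invented codes the modal category is 2, so only the dummies for codes 1 and 3 would go into the regression, which mirrors using C11 and C13 but not C12.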
Question A2 |
|
(a) The most complete prediction of photocopier use that is reasonably efficient is obtained from the maximum R2adj model, which includes the variables teaching contact hours per week, psychoticism, and neuroticism. | 1 mark |
We do not have the data available in this printout to find the model which is most efficient while being also reasonably complete, which would require F values for each model in the table. | 1 mark |
(b) The appropriate Minitab command would be REGRESS C1 3 C3 C6 C8 | 1 mark |
(c) There is a high negative correlation (-0.72) between the seniority measure and the number of copies made. Since low seniority scores mean high seniority, more senior staff tend to make more copies. | 1 mark |
However, the seniority variable is not included in the best regression model. Further examination of the correlation matrix suggests that this is because of a strong correlation between seniority and teaching contact hours (which is in the model), with more senior staff doing more teaching. | 1 mark |
The fact that it is teaching hours rather than seniority that is included in the best regression model suggests that the apparent relation between seniority and usage is due only to the extra teaching done by senior staff (though because there is such a high correlation between seniority and teaching load, there is an identification problem here and the conclusion can only be reached tentatively). | 1 mark |
(d) The only variables retained in the best regression model for photocopier usage are teaching hours and personality variables. It is unlikely that the department can do much about the personality of its staff. Therefore it needs to look at the relation between teaching hours and copies made, and consider whether the pattern of teaching could be changed so it was not so dependent on the production of photocopies. Perhaps it would be more efficient to rely more on textbooks and less on handouts, though the costs of buying extra copies of texts for the library would have to be taken into account. | 2 marks |
Maximum for question A2 | 8 marks |
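To illustrate what "the maximum R2adj model" means in A2(a), here is a minimal sketch of choosing among candidate models by adjusted R-squared. The R-squared values and the sample size are invented (the exam printout is not reproduced here), and the variable labels are only shorthand for the predictors named in the answer.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical best-subsets summary: (predictors, R-squared).
# All numbers are invented for illustration; n_cases = 30 is an assumption.
n_cases = 30
candidates = [
    (("teaching",), 0.520),
    (("teaching", "psychoticism"), 0.600),
    (("teaching", "psychoticism", "neuroticism"), 0.650),
    (("teaching", "psychoticism", "neuroticism", "seniority"), 0.655),
]

# Score each candidate and keep the one with the highest adjusted R-squared.
scored = [(adjusted_r2(r2, n_cases, len(names)), names) for names, r2 in candidates]
best_adj, best_vars = max(scored)

for adj, names in scored:
    print(f"{len(names)} predictor(s): adjusted R-squared = {adj:.3f}")
print("chosen model:", best_vars)
```

In this made-up table the four-variable model has the highest raw R-squared, but the small gain from adding a fourth predictor does not offset the penalty for the extra term, so the three-variable model has the highest adjusted value.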
Question A3. |
|
(a) The means are as follows:
|
2 marks for getting all of them right
lose 1 mark for spurious precision (more than 1 decimal place reported) |
The index of closer deaths, and respondent gender, are not quantitative data, and means should not be reported | lose 1 mark for giving their means |
(b) The regression equation is a fair fit, | 1 mark |
since the R2adj value is 66.6%. | 1 mark |
The regression equation accounts for a significant proportion of the variance in fear of death scores (F4,40 = 22.98, p < 0.0005). | 2 marks |
The regression equation is
fear of death score = 3.9 + 0.53 * EysenckN - 0.35 * GPsdead + 3.5 * otherdeaths + 0.32 * m1f2 |
2 marks |
With all other variables taken into account, the associations of fear of death with the Eysenck N score and with the occurrence of deaths among close relatives are significant: the t40 values are 8.51 (p < 0.0005) and 2.82 (p < 0.01) respectively. | 2 marks |
Fear of death scores increase by about half a scale point for every one scale point increase in the Eysenck N score, and are about 3.5 units higher for respondents who have suffered a close bereavement than for other respondents. | 2 marks |
The effect of the number of grandparents who have died approaches significance (t40 = 1.73, p<0.10): the more grandparents have died within the respondent's lifetime, the lower the fear of death score. It might be worth pursuing this question with a larger sample. | 1 mark |
Examination of the Unusual Observations report suggests that there may be several outliers. Plotting fear of death scores against the Eysenck N scores suggests that the only one of these likely to be serious is observation 41. Rerunning the regression with this observation deleted slightly reduces the significance of the trends reported above, but does not change them qualitatively, so they can be accepted as reasonable. | 2 marks |
Good reporting style | 2 marks |
Maximum for question A3 | 15 marks |
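As a quick consistency check on the figures quoted in A3(b), the overall F statistic and its degrees of freedom determine R-squared, and hence the adjusted value. The sketch below uses only the reported F(4,40) = 22.98; the function names are hypothetical, and the relation used is the standard one for a regression with a constant term.

```python
def r2_from_f(f_value, df_model, df_resid):
    """Recover R-squared from the overall F statistic of a regression."""
    return f_value * df_model / (f_value * df_model + df_resid)


def adjusted_r2(r2, df_model, df_resid):
    """Adjusted R-squared; note that n - 1 equals df_model + df_resid."""
    return 1 - (1 - r2) * (df_model + df_resid) / df_resid


r2 = r2_from_f(22.98, 4, 40)
r2_adj = adjusted_r2(r2, 4, 40)
print(f"R-squared = {100 * r2:.1f}%, adjusted = {100 * r2_adj:.1f}%")
# prints: R-squared = 69.7%, adjusted = 66.6%
```

The adjusted value agrees with the 66.6% quoted in the answer. Applying the same two functions to the A4 figures, F(3,20) = 17.84, gives 68.7%, which also matches.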
Question A4. |
|
(a) The median value of 'hoard' is 173.5, which tells us that there is no unique median hamster on this variable. We can choose either hamster 5 (hoard = 167) or hamster 15 (hoard = 180) as a median animal. Since the mean of 'hoard' is higher than the median, it might be better to take the higher value, and use hamster 15. | 1 mark for choosing either 5 or 15; 1 extra for a good rationale for preferring one of them |
For hamster 15, weight during the experiment was 105% of its pre-experimental value; the hamster came from supplier 2 and was female. She established her nest 1.31 metres from the food source. | 1 mark |
(b) Supplier is an unordered categorical variable. Therefore, before we can carry out regression using this variable, we need to produce dummy variables corresponding to the three suppliers. | 1 mark for recognizing this; 1 mark for doing it correctly |
We will also have to decide which supplier to drop from the analysis. Since none of them is in any sense a control or normal group, we use TALLY or TABLE to find the mode (supplier 3) and drop that one. | 1 mark |
To investigate how the other variables affect hoard size, we use Best or Stepwise regression. | 1 mark |
Using BREG, we find that the regression model with the highest R2adj value (68.7%) is the three-variable model including sex, distance to nest site, and supplier 1. However, the 2-variable model using only sex and distance to nest site has a better F value (22.45 as against 17.84), so if we want the most economical model we would prefer that (the best one-variable model, using distance to nest site, does not have such a good F: 21.62). The wording of the question suggests using the 3-variable model, for a better description. | 2 marks (for identifying either the best R2adj or the best F model) |
The best fitting regression equation is:
hoard size = 60 - 94 * sex + 269 * distance to nest + 61 * supplier 1. |
1 mark |
It is a fair fit, with R2adj equal to 68.7% | 1 mark |
and accounts for a significant proportion of the variation in hoard size (F3,20 = 17.84, p<0.0005) | 1 mark |
though it must be borne in mind that its significance will be inflated since it has been selected as the best model | 1 mark |
With all other variables held constant, the effects of sex and distance to the nest site are significant (t20 values of 3.29 and 6.49, p < 0.01 for sex and p < 0.0005 for distance to nest). | 1 mark |
though these significance levels will also be inflated | 1 mark |
Females hoard about 93 more pellets per day than males, and mean hoard size rises by about 27 pellets per day for each 10cm by which the nest is distant from the food source. | 1 mark |
Hamsters from supplier 1 hoard about 61 pellets per day more than those from the other two suppliers, and this difference approaches significance (t20 = 1.85, p < 0.10). | 1 mark |
investigating the supplier variable as a whole at any stage | 2 marks |
good reporting style | 2 marks |
(c) Plotting the relation between hoarding and the distance from food to nest shows a noticeable outlier (observation 9), which is also picked out by the Unusual Observations report on the 3-variable model presented above. It would be worth repeating the entire analysis with the observation dropped. | 1 mark for spotting the outlier; 1 mark for carrying out a further analysis |
(d) It would be worth repeating the study with a larger group from supplier 1, and checking that the same gender and distance relationships held regardless of supplier | 1 mark |
The relationship between nest site distance and hoard size is much more regular for males than it is for females (you would need to use a selective COPY command to find this out), so future studies should include enough of each gender for their data to be studied separately. | 2 marks |
Maximum for question A4 | 20 marks |
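The way the A4 coefficients translate into the statements about sex, distance and supplier can be sketched by treating the fitted equation as a function. Two things here are assumptions rather than facts from the question: the sex coefficient is taken as negative, and sex is taken as coded 1 = female, 2 = male, which is the reading consistent with females hoarding more. Distance is taken to be in metres, as in part (a).

```python
def predicted_hoard(sex, distance_m, supplier1, b_sex=-94.0):
    """Predicted hoard size (pellets per day) from the three-variable model.

    b_sex = -94 with sex coded 1 = female, 2 = male is an ASSUMPTION;
    the other coefficients are those quoted in the marking scheme.
    """
    return 60.0 + b_sex * sex + 269.0 * distance_m + 61.0 * supplier1


# Baseline: a female from supplier 2 or 3 nesting 1 m from the food source.
base = predicted_hoard(sex=1, distance_m=1.0, supplier1=0)

# Effect of moving the nest 10 cm further away, everything else fixed.
per_10cm = predicted_hoard(sex=1, distance_m=1.1, supplier1=0) - base

# Female minus male, everything else fixed.
female_minus_male = base - predicted_hoard(sex=2, distance_m=1.0, supplier1=0)

# Supplier 1 versus the other two suppliers, everything else fixed.
supplier1_effect = predicted_hoard(sex=1, distance_m=1.0, supplier1=1) - base

print(f"per 10 cm: {per_10cm:.1f}")
print(f"female - male: {female_minus_male:.1f}")
print(f"supplier 1 effect: {supplier1_effect:.1f}")
```

The distance effect of about 27 pellets per 10 cm and the supplier 1 effect of 61 follow directly from the coefficients. The sex difference comes out at 94 from the rounded equation; the answer quotes about 93, presumably from unrounded output.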
Total available for Section A | 50 marks |
B1 | A factor is a hypothetical construct, or 'latent variable', which is derived from other, directly observable variables, and helps to explain the correlations between a range of different responses or behaviours. A basic assumption of Factor Analysis is that the observed correlations between observed variables result from their sharing a smaller set of underlying variables. | up to 6 marks for a full answer |
B2 | A 'scree' test is a method of deciding how many factors are needed to capture the important dimensions in the data. It is done by looking at the plot of eigenvalues against their associated factors and looking for a sharp change, or 'elbow' in the plot that occurs when a steep drop gives way to a shallower slope, resembling the rubble that piles up at the bottom of a scree slope. | up to 5 marks for a full answer |
B3 | There are 598 men and 721 women | 4 marks |
and 344 people in the 'salariat' class. | 2 marks | |
GVTTRUST: mean: 3.157; minimum: 1.000; maximum: 5.000
GVTBENEF: mean:3.308; minimum: 1.000; maximum: 5.000 |
2 marks | |
B4 | Yes, there is a significant difference between men and women on GETNEED, with a Pearson chi-square probability level of .00001. | 5 marks |
B5 | Variables loading at .3 and above on Factor 1 are GVTTRUST, GVTBENEF, TRIAL, EQOPP, GETNEED, and VOTERCH. On factor 2, the variables with loadings above .3 are: REWEFFRT, EQOPP, GETNEED, REWSKILL, and VOTERCH. | 12 marks |
You have to use your own judgement in interpreting factor 1, but if you look just at those which load highly on factor 1 and not on factor 2, it could mean something like 'Britain has a fair political and legal system'. Note that some of the variables are somewhat complex, with loadings above .3 on both factors. We would have to be cautious about using some of these as a scale. | 3 marks | |
B6 | Cronbach's alpha value = .69; | 6 marks |
Yes, it would be a reasonably reliable scale for a sample of this size | 2 marks | |
In principle, it could be improved (though this would leave only a 2-item 'scale') by dropping TRIAL, and the value of Cronbach's alpha would then go up to .80 (rounded). | 3 marks | |
Total for Section B | 50 marks |
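For B6, the point that dropping a weakly related item can raise Cronbach's alpha can be illustrated with a small sketch. The item responses below are invented (they are not the survey data behind the .69 and .80 figures), and the function is a plain implementation of the usual variance-based formula, not the output of any particular package.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists.

    Each inner list holds one item's scores for the same respondents,
    in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical 1-5 responses from six respondents; invented for illustration.
item_a = [1, 2, 3, 4, 5, 3]
item_b = [2, 2, 3, 5, 5, 3]
item_c = [4, 1, 3, 2, 5, 2]  # deliberately less consistent with the others

alpha_all = cronbach_alpha([item_a, item_b, item_c])
alpha_without_c = cronbach_alpha([item_a, item_b])

print(f"alpha, three items: {alpha_all:.2f}")
print(f"alpha, item_c dropped: {alpha_without_c:.2f}")
```

With these made-up responses alpha rises when the inconsistent item is removed, which is the same pattern as the improvement reported for dropping TRIAL, at the cost of leaving only two items.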