Contents of this handout: Principles; Path analysis in practice; Limitations of Path analysis; Further reading and References; Examples

Path analysis is a straightforward extension of multiple regression.
Its aim is to provide estimates of the magnitude and significance of hypothesised
causal connections between sets of variables. This is best explained by
considering a **path diagram**.

To construct a path diagram we simply write the names of the variables
and draw an arrow from each variable to any other variable we believe that
it affects. We can distinguish between input and output path diagrams.
An **input path diagram** is one that is drawn beforehand to help plan
the analysis and represents the causal connections that are predicted by
our hypothesis. An **output path diagram** represents the results of
a statistical analysis, and shows what was actually found.

So we might have an input path diagram like this:

*Figure 1: Idealised input path diagram*

And an output path diagram like this:

*Figure 2: Idealised output path diagram*

It is helpful to draw the arrows so that their widths are proportional
to the (hypothetical or actual) size of the **path coefficients**. Sometimes
it is helpful to eliminate negative relationships by **reflecting**
variables: for example, instead of drawing a negative relationship between age
and liberalism, we draw a positive relationship between age and conservatism.
If we do not want to specify the causal direction between two variables,
we use a double-headed arrow. Paths whose coefficients
fall below some absolute magnitude, or which do not reach some significance
level, are sometimes omitted from the output path diagram.

Some researchers will add an additional arrow pointing in to each node
of the path diagram which is being taken as a dependent variable, to signify
the **unexplained variance** - the variation in that variable that is
due to factors not included in the analysis.

Path diagrams can be much more complex than these simple examples: for a virtuoso case, see Wahlund (1992, Fig 1).

Although path analysis has become very popular, we should bear in mind a cautionary note from Everitt and Dunn (1991): "However convincing, respectable and reasonable a path diagram... may appear, any causal inferences extracted are rarely more than a form of statistical fantasy". Basically, correlational data are still correlational. Within a given path diagram, path analysis can tell us which are the more important (and significant) paths, and this may have implications for the plausibility of pre-specified causal hypotheses. But path analysis cannot tell us which of two distinct path diagrams is to be preferred, nor can it tell us whether the correlation between A and B represents a causal effect of A on B, a causal effect of B on A, mutual dependence on other variables C, D etc., or some mixture of these. No program can take into account variables that are not included in an analysis.

What, then, can a path analysis do? Most obviously, if two or more pre-specified causal hypotheses can be represented within a single input path diagram, the relative sizes of path coefficients in the output path diagram may tell us which of them is better supported by the data. For example, in Figure 4 below, the hypothesis that age affects job satisfaction indirectly, via its effects on income and working autonomy, is preferred over the hypothesis that age has a direct effect on job satisfaction. Slightly more subtly, if two or more pre-specified causal hypotheses are represented in different input path diagrams, and the corresponding output diagrams differ in complexity (so that in one there are many paths with moderate coefficients, while in another there are just a few paths with large, significant coefficients and all other paths have negligible coefficients), we might prefer the hypothesis that yielded the simpler diagram. Note that this latter argument would not really be statistical, though the statistical work is necessary to give us the basis from which to make it.

Bryman and Cramer give a clear example using four variables from a job
survey: age, income, autonomy and job satisfaction. They propose that age
has a **direct effect** on job satisfaction. However, **indirect effects**
of age on job satisfaction are also suggested: age affects income, which
in turn affects satisfaction; age affects autonomy, which in turn affects
satisfaction; and age affects autonomy, which affects income, which affects
satisfaction. Autonomy and income also have direct effects on satisfaction.

*Figure 3: Input diagram of causal relationships in the job
survey, after Bryman & Cramer (1990)*

To move from this input diagram to the output diagram, we need to compute
path coefficients. A path coefficient is a **standardized regression coefficient**
(**beta weight**). We compute these by setting up structural equations,
in this case:

satisfaction = *b*_{11}age + *b*_{12}autonomy + *b*_{13}income + *e*_{1}

income = *b*_{21}age + *b*_{22}autonomy + *e*_{2}

autonomy = *b*_{31}age + *e*_{3}

We have used a different notation for the coefficients from Bryman and
Cramer's, to make it clear that *b*_{11} in the first equation
is different from *b*_{21} in the second. The terms *e*_{1},
*e*_{2}, and *e*_{3} are the **error** or
unexplained variance terms. To obtain the path coefficients we simply run
three regression analyses, with satisfaction, income and autonomy being
the dependent variable in turn and using the independent variables specified
in the equations. Because we need beta values, if we are using Minitab
we must first standardise the variables (subtract the column mean from each
value and divide by the column standard deviation); SPSS will give us beta values
without this preliminary step. In either case, the betas are taken from the
output and inserted into the output path diagram. The constant (intercept)
terms that the regression output reports for each equation (*a*_{1},
*a*_{2}, and *a*_{3}) are not used. In Bryman and Cramer's example, we find
that *b*_{11}=-0.08, *b*_{12}=0.58, *b*_{13}=0.47,
*b*_{21}=0.57, *b*_{22}=0.22, and *b*_{31}=0.28. So the complete
output path diagram looks like this:

*Figure 4: Output diagram of causal relationships in the job
survey, after Bryman & Cramer (1990)*
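For readers without Minitab or SPSS, the three regressions can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic (invented here, not the job survey data), and the helper names `standardise` and `betas` are hypothetical. The variables are standardised first, so the fitted coefficients are beta weights and no constant term is needed.

```python
import numpy as np

def standardise(x):
    """Convert a column to z-scores: subtract the mean, divide by the SD."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def betas(y, *predictors):
    """Standardised regression coefficients of y on the given predictors."""
    zy = standardise(y)
    Z = np.column_stack([standardise(p) for p in predictors])
    # No intercept column: every standardised variable has mean zero.
    coef, *_ = np.linalg.lstsq(Z, zy, rcond=None)
    return coef

# Synthetic data, invented purely for illustration (NOT the job survey).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(40, 10, n)
autonomy = 0.03 * age + rng.normal(0, 1, n)
income = 0.5 * age + 2.0 * autonomy + rng.normal(0, 8, n)
satisfaction = 0.6 * autonomy + 0.05 * income + rng.normal(0, 1, n)

# One regression per structural equation, dependent variable first.
b3 = betas(autonomy, age)                        # b31
b2 = betas(income, age, autonomy)                # b21, b22
b1 = betas(satisfaction, age, autonomy, income)  # b11, b12, b13

print("autonomy     <- age:                  ", np.round(b3, 2))
print("income       <- age, autonomy:        ", np.round(b2, 2))
print("satisfaction <- age, autonomy, income:", np.round(b1, 2))
```

With a single predictor, as in the autonomy equation, the beta weight is simply the correlation between the two variables, which gives a quick check on the output.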

If the values of *e*_{1}, *e*_{2}, and *e*_{3}
are required, they are calculated as the square root of 1-*R*^{2}
(note *not* 1-*R*^{2}_{adj}) from the regression
equation for the corresponding dependent variable.
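As a small sketch of this calculation, the function below converts an unadjusted *R*^{2} into the corresponding *e* value. The function name `residual_path` and the *R*^{2} value used are hypothetical, chosen only to show the arithmetic.

```python
import math

def residual_path(r_squared):
    """Return e = sqrt(1 - R^2), using the unadjusted R^2."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return math.sqrt(1.0 - r_squared)

# Hypothetical R^2, not taken from the job survey.
example_e = residual_path(0.36)
print(round(example_e, 2))
```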

Many researchers like to calculate the overall **impact** of one
variable on another - e.g. of age on job satisfaction. This is done by
adding the indirect effects to the direct effect of age (-0.08).
Each indirect effect is calculated by multiplying together the coefficients
along one path from age to satisfaction, e.g.

age -> income -> satisfaction is 0.57 x 0.47 = 0.27,

age -> autonomy -> satisfaction is 0.28 x 0.58 = 0.16,

age -> autonomy -> income -> satisfaction is 0.28 x 0.22 x 0.47 = 0.03,

total indirect effect = 0.46.

The result tells us that the total indirect effect of age on satisfaction is positive and quite large whereas the direct effect is small and negative. The total effect is then -0.08 + 0.46 = 0.38.
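The same bookkeeping can be sketched in Python for any diagram without feedback loops. The coefficients below are the two-decimal values quoted in the text; the dictionary layout and the function name `routes` are assumptions made for this illustration. Because the products are not rounded before they are summed, results may differ in the second decimal place from hand-rounded figures.

```python
# Path coefficients quoted in the text, stored as {cause: {effect: beta}}.
paths = {
    "age": {"satisfaction": -0.08, "income": 0.57, "autonomy": 0.28},
    "autonomy": {"satisfaction": 0.58, "income": 0.22},
    "income": {"satisfaction": 0.47},
}

def routes(source, target, product=1.0, trail=()):
    """Yield (trail, product of betas) for every directed route to target."""
    trail = trail + (source,)
    if source == target:
        yield trail, product
        return
    for nxt, beta in paths.get(source, {}).items():
        yield from routes(nxt, target, product * beta, trail)

all_routes = list(routes("age", "satisfaction"))

# A route with two variables is the direct arrow; longer ones are indirect.
direct = sum(p for t, p in all_routes if len(t) == 2)
indirect = sum(p for t, p in all_routes if len(t) > 2)
total = direct + indirect

for t, p in all_routes:
    print(" -> ".join(t), round(p, 3))
print("direct:", round(direct, 2))
print("indirect:", round(indirect, 2))
print("total:", round(total, 2))
```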

To restate the obvious, path analysis can evaluate causal hypotheses, and in some (restricted) situations can test between two or more causal hypotheses, but it cannot establish the direction of causality.

As should also already be clear, path analysis is most likely to be useful when we already have a clear hypothesis to test, or a small number of hypotheses all of which can be represented within a single path diagram. It has little use at the exploratory stage of research.

We cannot use path analysis in situations where "feedback" loops are included in our hypotheses: there must be a steady causal progression across (or down) a path diagram.
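One way to check an input path diagram for feedback loops before running any regressions is to follow its arrows and see whether any variable can be reached from itself. The sketch below uses a depth-first search; the function name `has_feedback_loop` and the dictionary layout are hypothetical, and the first diagram is the job survey example.

```python
def has_feedback_loop(arrows):
    """Return True if following arrows can lead from a variable back to itself."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in arrows.get(node, ()):
            status = state.get(nxt, UNSEEN)
            if status == IN_PROGRESS:
                return True  # we have come back to a variable on the current route
            if status == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state.get(n, UNSEEN) == UNSEEN and visit(n) for n in list(arrows))

# The job survey diagram: each variable lists the variables it points to.
job_survey = {
    "age": ["autonomy", "income", "satisfaction"],
    "autonomy": ["income", "satisfaction"],
    "income": ["satisfaction"],
}
print(has_feedback_loop(job_survey))

# Hypothetical variant with an extra arrow from satisfaction back to autonomy.
with_loop = dict(job_survey, satisfaction=["autonomy"])
print(has_feedback_loop(with_loop))
```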

All the relationships in the path diagram must be capable of being tested by straightforward multiple regression. The intervening variables all have to serve as dependent variables in multiple regression analyses. Therefore each of them must be capable of being treated as being on an interval scale. Nominal measurement, or ordinal measurement with few categories (including dichotomies) will make path analysis impossible. Although there are types of analysis that will handle such dependent variables (as we shall see in the next two sessions), there are no accepted ways of mixing different kinds of analysis to produce the analogue of a path analysis.

- Bryman, A., & Cramer, D. (1990). *Quantitative data analysis for social scientists*, pp. 246-251.

- Everitt, B. S., & Dunn, G. (1991). *Applied multivariate data analysis*. London: Edward Arnold.

- Wahlund, R. (1992). Tax changes and economic behavior: The case of tax evasion. *Journal of Economic Psychology, 13*, 657-677.

These examples use the Singer file **/singer1/eps/psybin/stats/expect.MTW**,
which is a small (n=50) data file in Minitab worksheet format. A portable
version of this is available on the PSYCHO file server as **\scratch\segl\stats\expect.MTP**
(note that I expect to move this to a different directory soon), and
it should be possible to read this into either Macintosh or PC versions
of Minitab.

The study examined the factors that influenced inflationary expectations. There are measures of age (in years), income (in thousands of pounds), conservatism (factor scores based on items about privatisation, defence spending and influence on trade unions) and consumer optimism (7-point scale). We have standardized these variables and put them in columns 11-15, so that you can use Minitab to obtain the betas.

- Draw an input path diagram to indicate what you consider to be a reasonable causal sequence.
- Carry out the relevant regression analyses to obtain the path coefficients and draw the appropriate output path diagram. What would you conclude about the influence of age on inflationary expectations?
- Calculate the values of the unexplained variances and add them to the path diagram.

Paul Webley, Stephen Lea

University of Exeter Department of Psychology

Washington Singer Laboratories

Exeter EX4 4QG

United Kingdom

Tel +44 1392 264626

Fax +44 1392 264623
