SPM and Random Effects Analyses
(a [limited] summary of the e-mail help-list)
I think it's appropriate to start off with a quote by another Englishman
which for me applies to SPM in general and perhaps random effects
in particular.
"...It is a riddle wrapped in a mystery inside
an enigma."
- Sir Winston Churchill (1874-1965) [radio talk, 1 Oct 1939]
The use of Random Effects in SPM seems to be both a powerful
and a confusing feature. Just when I at least thought I understood
the process in SPM96... it has changed in SPM99, producing no shortage
of discussion on the subject. A quick search through the e-mail
database produces 258 references in 1999 alone.
I have compiled the document below on my readings and various references
from the SPM e-mail list. I am by no means a statistician (several
of my comments on the SPM list I'm sure attest to that!), and I
have compiled this reference as a way to help me in understanding
the technique. NOTE: In many places I have summarized the e-mail,
and hopefully I haven't made any errors in doing so. However, in
each case I've provided the original link for reference. Also, I've
included only a small number of e-mails overall that have referenced
random effects. If I've left out a particularly relevant one, let
me know, but hopefully no one will feel slighted. Finally, I have
not attempted to summarize the random effects e-mails prior to March
1999, so if you're interested in the application of random effects
to SPM96 you'll have to go through that on your own. Please e-mail
me (Darren Gitelman) if you have identified errors, or have any comments.
The primary references on the topic have been published in abstract
form:
http://www.fil.ion.ucl.ac.uk/spm/RFXabstract.pdf
http://www.fil.ion.ucl.ac.uk/spm/RFXposter.pdf
and more recently there have been several articles describing the
power of the technique in relation to other analysis methods:
Friston KJ, Holmes AP, Worsley KJ.
How many subjects constitute a study?
Neuroimage. 1999 Jul;10(1):1-5.
Andrew Holmes provides the following references in his e-mail from
March 1999:
The basic approach of using summary measures is pretty standard.
A good introductory paper (albeit for clinical trials) is:
Frison L & Pocock SJ (1992) "Repeated measures in clinical
trials: Analysis using mean summary statistics and its implications
for design" Statistics in Medicine 11:1685-1704
The basic concepts of random effects are expounded in all but the
most basic stats books on "Design of Experiments". Other
key phrases to look out for are "variance components",
"mixed effects models", "repeated measures",
"multi-level modeling" & "hierarchical modeling".
I've found the first chapter of Searle, Casella & McCulloch's
1992 book "Variance Components" (Wiley, London) a good
description accessible to non-statisticians.
The paper by Roger Woods (Modeling
for intergroup comparisons of imaging data. Neuroimage. 1996 Dec;4(3
Pt 3):S84-94) also provides a readable introduction to the concepts.
However, all this theory, while important, doesn't tell
the average user what random effects is all about or more importantly
how to use it.
Summarizing a response
to Kath Moores on March 12, Andrew Holmes notes that
SPM only considers a single source of error variance. This is the
scan to scan residual error. With repeated measurements on multiple
subjects scan to scan is within subject, and the
analysis is fixed effects.... With only one "scan" per
subject, scan to scan residual variability is between subject variance.
For balanced designs, by summarising the individual subject data
with an appropriate measure, and then assessing those measures across
subject, a random effects analysis is effected. Using the within-subject
data to compute the summary measure "scan" incorporates
the within-subject error into the summary scans, such that the
between-subject variability of these computed summary "scans"
incorporates both within and between subject variance. It's fairly
easy for a balanced design to show that these are in exactly the
right ratio for a random effects assessment of the overall (population)
effect.
Although this second level (between subjects) model is a fixed effects
model, by using summary "scans", the fixed effect is the
population effect (which *is* fixed), and the residual error variance
is that appropriate for assessing that population effect allowing
for random subject effects (& subject by condition interaction).
(Actually, the SPM96 random effects approach would summarise each
subject by two "scans" - adjusted mean condition images
for each condition. These would then be assessed using a paired
t-test, which is the same as testing the differences of the adjusted
means directly. This is for technical reasons - there are problems
(grey matter) thresholding negative images in SPM96.)
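The "right ratio" claim can be checked numerically. What follows is only a minimal sketch under assumptions of my own: a balanced design, one value per scan, Gaussian subject effects and Gaussian scan-to-scan noise, and made-up standard deviations. In that setting the variance of the per-subject summary (the mean over n scans) should be the between-subject variance plus the within-subject variance divided by n. The function names are hypothetical and nothing here is SPM code.

```python
import random
import statistics


def simulate_subject_summaries(n_subjects, n_scans, mu, sd_between, sd_within, seed=0):
    """Return one summary value (mean over scans) per simulated subject."""
    rng = random.Random(seed)
    summaries = []
    for _ in range(n_subjects):
        # Each subject has their own true effect (random subject effect).
        subject_effect = mu + rng.gauss(0.0, sd_between)
        # Scan-to-scan (within-subject) noise is added to every scan.
        scans = [subject_effect + rng.gauss(0.0, sd_within) for _ in range(n_scans)]
        summaries.append(statistics.mean(scans))
    return summaries


def expected_summary_variance(n_scans, sd_between, sd_within):
    """Between-subject variance plus within-subject variance over n scans."""
    return sd_between ** 2 + sd_within ** 2 / n_scans


summaries = simulate_subject_summaries(2000, 10, 1.0, 1.0, 2.0)
observed = statistics.variance(summaries)
expected = expected_summary_variance(10, 1.0, 2.0)
print(f"observed {observed:.2f}, expected {expected:.2f}")
```

The observed variance of the summaries should land close to the expected value, which is the error term a one-sample test across subjects then uses.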
----------------------
More to the point, Andrew provides this outline of how
to do random effects in a response
to Kent Kiehl
At 09:47 18/05/99 +0000, Kent Kiehl wrote:
| I read all the release notes, but I could not determine for certain if the
| new event-related fMRI analyses are now using a random effects model.
| Is this true or is it still fixed effects, as in SPM97?
Random effects models are effected within SPM using a multi-stage
approach. For a balanced design (i.e. where the individual subjects'
models are the same), each subject's data is summarised with an appropriate
single image. This "single scan per subject" data is then
entered into a second model, i.e. a model at the between subject
level. Usually this will be a simple one-sample or two-sample t-test.
The variance of these single images from subject to subject consists
of contributions from both the between and within subject components
of variance, in the correct proportions, and it can easily be shown
that the resulting analysis is mathematically identical to the appropriate
random effects (strictly called mixed effects) analysis of these
data.
.........
SPM99 now writes out contrast images in addition to SPM{t/F} images
by default in the results section. Further, these images are floating
point images, with out of brain voxels set to NaN - not a number.
Thus, these images are implicitly masked. Since SPM statistics ignores
voxels where any scan is NaN, these contrast images can be put back
into the statistics section.
Therefore, in SPM99, the basic procedure for a random effects analysis
is as follows:
* First, fit the model for each subject
- you can model all the subjects together, provided you use a subject
separable model
- strictly speaking, the individual subject models should be
identical (i.e. this is a balanced design)
* Define the effect of interest for each subject with a t-contrast.
- Each contrast will write a con_????.{img,hdr} analyze
image containing the contrast of the parameter estimates
* Armed with one contrast image per subject, proceed to the second
(between subject) level, feeding the contrast images into one
of the "Basic stats" models as appropriate.
( This is the gist of it for a simple case. There are additional
caveats for multiple conditions, unbalanced designs, and non-orthogonal
contrasts, which we will discuss when necessary. )
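To make the outline above concrete, here is a toy sketch of the two stages at a single voxel. It is not SPM code: the data are invented, the "first level" is just a task-minus-rest difference of means (what a [-1 1] contrast would estimate in a simple two-condition model), and the "second level" is a plain one-sample t-test. All names are hypothetical.

```python
import math
import statistics

# Hypothetical first-level data: a few "rest" and "task" values per subject
# at one voxel.
subjects = {
    "s1": {"rest": [100, 102, 98], "task": [103, 105, 104]},
    "s2": {"rest": [99, 101, 100], "task": [102, 102, 102]},
    "s3": {"rest": [100, 100, 100], "task": [103, 103, 103]},
    "s4": {"rest": [98, 100, 102], "task": [101, 103, 105]},
}


def first_level_contrast(scans):
    """Stage 1: summarise one subject by a single contrast value."""
    return statistics.mean(scans["task"]) - statistics.mean(scans["rest"])


def one_sample_t(values):
    """Stage 2: one-sample t-test across subjects; returns (t, df)."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values) / standard_error, n - 1


contrasts = [first_level_contrast(scans) for scans in subjects.values()]
t_stat, df = one_sample_t(contrasts)
print(contrasts, round(t_stat, 3), df)
```

Note that the second stage has only as many observations as there are subjects, so the degrees of freedom are the number of subjects minus one, however many scans each subject contributed.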
Unbalanced designs generated several additional e-mails:
http://www.mailbase.ac.uk/lists/spm/1999-08/0016.html
http://www.mailbase.ac.uk/lists/spm/1999-08/0022.html
Since you can do this with any one dimensional contrast that would
produce an SPM{t}, you can look at regression slopes, interactions
and so on, in addition to simple categorical comparisons.
For er-fMRI, you would probably need to fix the slice timing of
2D multi-slice acquisitions by temporal interpolation. Your model
would have to use a canonical response (rather than a set of basis
functions), such that the effect of interest can be extracted with
a single contrast. You then put these contrast images into the between
subject level analysis...
Note that this only works for one-dimensional contrasts, i.e. a
t-contrast, with contrast weights a single vector. For F-contrasts
(which you would use for a two-sided t-test, or to test any overall
effect for a set of basis functions) you need a multivariate second
level analysis. This should be possible with the Multivariate toolbox...
[which is not yet available as far as I know- drg]
The AdjMean functions of the SPM96 random effects kit are included
in SPM99, but we recommend using the new main SPM stats routines
to define an appropriate model and extract an appropriate contrast
image as summary image.
A final point of interest: Keith Worsley's seminal 1992 paper (probably
this one: Worsley, K.J., Marrett, S., Neelin, P., and Evans, A.C. (1992).
A three-dimensional statistical analysis for CBF activation studies in
human brain. Journal of Cerebral Blood Flow and Metabolism, 12:900-918)
for addressing the multiple comparisons problem also proposed the
use (for PET), of a two-sample t-statistic computed from average
condition images! (Although they used a variance estimate pooled
over the entire intracerebral volume rather than the voxel level
variance estimate used in SPM.) This is a repeated measures paired
t-test approach, identical to the current SPM approach, although
(possibly) primarily motivated by the substantial data reduction
the use of summary images confers.
----------------------------------
This seems clear enough... however, several dozen e-mails
follow trying to understand these points further. I should also
note that several e-mails go into detail about the comparison of
random effects and conjunction analyses, but I shall only provide
a reference to a recent article here:
Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ. Multisubject fMRI
studies and conjunction analyses. Neuroimage. 1999 Oct;10(4):385-96.
----------------------------------
In general, a fairly large number of subjects is required to perform
robust random effects analyses. In several e-mails the general advice
given is that at least 10 subjects are required for a random effects
analysis, unless the activations are unusually robust. Andrew Holmes
notes that:
...For sample sizes up to 12 adding additional subjects gives considerable
gain in terms of detectable normalised change. After 20 subjects
there is a diminishing return in terms of power. Our anecdotal evidence
so far (mainly from functional fMRI & PET data) suggests that
reasonable population effects require about 12 subjects, whilst
population differences need at least nine per group. (All the above
being random effects analyses.) In these situations it is usually
inter-subject variability that dominates, such that the various
imaging modalities are comparable in terms of subject numbers required.
----------------------------------
It is important to note that the contrast images generated by SPM99
are already masked (not thresholded in the statistical sense), based
either on the absolute or proportional voxel value (PET) or on the
voxels that are present in all the images (fMRI).
Therefore these images SHOULD NOT be thresholded further in the
second level analysis. This is noted by Andrew Holmes here:
At 16:09 15/06/99 -0700, Russ Poldrack wrote:
| Hi, I have another issue to raise about fMRI analysis in spm99b. My
| understanding of the random effects approach in spm99 is that one first
| creates contrast images (conXXX.img) and then these are entered into a
| second-level PET analysis. My question relates to how these contrast
| images are masked according to the specified height and extent thresholds
| when the statistics are performed.
You might prefer to use the new "Basic Stats" models, rather than the PET
models. The options of the "Basic Stats" models are geared
towards multi-stage random effects analyses.
NOTE: One should not be using the activation images generated
in the results window as the basis for a second level analysis.
-drg
The contrast and SPM{t} images (or ess & SPM{F} images for
an F-contrast) are written by the results section once a new contrast
is defined, but before any height or extent thresholding is applied.
(Only the write-filtered button in the results interface writes
out height and extent filtered SPM{t} & SPM{F}'s). The voxels
containing NaN are precisely the voxels that were outside the volume
of analysis when the statistics were estimated. Usually this is
determined by the masking options chosen in the stats design setup
interfaces. For fMRI this is hardcoded: only voxels where all images
have voxel values greater than the global mean are considered (which
is usually all intracerebral voxels). For PET you are given a choice
of proportional or absolute threshold masking. For basic designs
you are given the further options of not doing any "analysis
threshold" (previously called "grey matter threshold"),
and of using explicit mask images.
For a given statistical analysis, all the output images (parameter
images - beta_????.{img,hdr}, variance image - ResMS.{img,hdr},
contrast images - con_????.{img,hdr}, ESS images - ess_????.{img,hdr},
and SPM images - spm{T,F}_????.{img,hdr}),
being derived from the same analysis, will have exactly the same
voxels tagged NaN, and these will match the mask image (mask.{img,hdr}).
SPM will regard any voxel with value NaN in any input image as outside
the volume to be analysed. This is why you can simply put contrast
images back into SPM for a second level analysis without worrying
about masking out non-brain areas.
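The implicit masking rule (a voxel is analysed only if no input image is NaN there) is easy to picture with a toy example. This is a sketch of the rule as described in the e-mail, not of SPM's implementation; the "images" are invented five-voxel lists and the function names are hypothetical.

```python
import math

NAN = float("nan")

# Three hypothetical contrast "images", flattened to five voxels each.
# NaN marks voxels that were outside a subject's first-level analysis volume.
contrast_images = [
    [NAN, 1.2, 0.8, 2.0, NAN],
    [NAN, 0.9, NAN, 1.7, NAN],
    [NAN, 1.1, 0.5, 2.2, NAN],
]


def analysis_mask(images):
    """True where every image has a non-NaN value at that voxel."""
    n_voxels = len(images[0])
    return [
        not any(math.isnan(image[v]) for image in images)
        for v in range(n_voxels)
    ]


def voxelwise_mean(images):
    """Mean across images inside the mask; NaN outside it."""
    mask = analysis_mask(images)
    return [
        sum(image[v] for image in images) / len(images) if mask[v] else NAN
        for v in range(len(mask))
    ]


mask = analysis_mask(contrast_images)
means = voxelwise_mean(contrast_images)
print(mask)
```

The third voxel drops out even though only one subject is missing it, which is why the second-level volume is the intersection of the individual subjects' volumes.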
-----------------------------------
The issue of which images to use comes up several times
and is addressed more explicitly in this note:
The statistic (SPM) images (SPMt_????.img & SPMF_????.img) should
*not* be entered into a second level analysis if you want to effect
a random effects analysis. This would basically be assessing the
significance (across subjects) of the individual subjects' significance
(rather than the significance (across subjects) of the response)!
It's possible that the confusion has arisen because while you enter
contrast weights to get a SPM{t}, contrast itself refers only to
the weighted sum of the parameters, whose estimates given in the
contrast images only form the numerator of the SPM{t}. (The SPM{t}
is formed by dividing the contrast image by a suitable estimate
of the standard error.)
The contrast summarises the effect, the SPM{t} the evidence for the
effect.
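A small numeric sketch of that distinction, with made-up numbers: the contrast value is the weighted sum of the parameter estimates, and the t value is that contrast divided by its standard error. Two subjects with exactly the same effect can therefore have very different t values if their noise levels differ. The function names and values are hypothetical.

```python
def contrast_value(weights, betas):
    """Weighted sum of parameter estimates (what a con image stores)."""
    return sum(w * b for w, b in zip(weights, betas))


def t_value(weights, betas, standard_error):
    """Contrast divided by its standard error (what an SPM{t} stores)."""
    return contrast_value(weights, betas) / standard_error


weights = [1.0, -1.0]   # task minus rest
betas = [2.0, 0.5]      # same parameter estimates for both subjects

effect = contrast_value(weights, betas)
t_quiet = t_value(weights, betas, 0.25)  # subject with little residual noise
t_noisy = t_value(weights, betas, 0.5)   # subject with more residual noise
print(effect, t_quiet, t_noisy)
```

Both subjects contribute the same value to a second-level analysis of contrasts, whereas a second-level analysis of t values would treat the quieter subject as having twice the "response".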
To summarize: Finally, note that we have now discussed
three different scenarios:
1) Two level SPM RFX analysis using contrast images
- which effects a random effects analysis for appropriate
balanced designs, results generalise to the "population"
- "significance of response across subjects"
2) Two level SPM analysis using SPM{t} as second level input
- "significance of significances": a rather bizarre notion
3) Averaging of individual Z-score images across subjects
- combining the evidence for individual subject effects
across subjects
- a simple meta-analysis which doesn't generalise to the
population
- the topic of the thread Stefan referred to:
http://www.mailbase.ac.uk/lists/spm/1999-01/0122.html
-----------------------------------
Populations can also be compared using a random effects
analysis when implemented as described in this note
from Andrew:
At 15:41 07/06/99 -0400, Ziad Nahas wrote:
| 1) how to compare differences between 2 conditions in 2 different groups.
| ie. 2 groups of subjects received SPECT scans under 2 different
| conditions. We would like to be able to compare, for example, the
| differences in activation between the 2 groups.
|
| group 1 : differences between condition A and B
| VERSUS
| group 2 : differences between condition A and B
This can be implemented with the "Multi-group: conditions and
covariates" design in SPM99 (in PET/SPECT mode). With multiple
scans per subject this is a fixed effects analysis, such that inference
only extends to the subjects studied.
To extend inference to the populations from which the two groups
were drawn, a mixed effects analysis is required. This is achieved
in SPM via a two stage procedure: Analyse each subject individually,
with a contrast for the individual's activation effect. Each analysis
will give a contrast image con_????.{img,hdr}, summarising the activation
effect for each subject. Enter these summary images into a two-sample
t-test (under basic models) to compare the populations.
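For a single voxel, the second stage of that group comparison amounts to an ordinary two-sample t-test on the per-subject contrast values. The sketch below assumes equal variances in the two groups (a pooled variance estimate) and uses invented contrast values; it is an illustration of the statistic, not of SPM's code.

```python
import math
import statistics


def two_sample_t(group_a, group_b):
    """Pooled-variance two-sample t-test; returns (t, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical A-minus-B contrast values, one per subject, at one voxel.
group1_contrasts = [4.0, 2.0, 3.0, 3.0]
group2_contrasts = [1.0, 0.0, 2.0, 1.0]

t_stat, df = two_sample_t(group1_contrasts, group2_contrasts)
print(round(t_stat, 3), df)
```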
----------------------------------
As noted above the contrast images may be entered into
regression analyses as detailed in Karl's response
to Frode Willoch, on May 28.
Looking for a condition x 'level' interaction .... could be tested
for with a fixed effect analysis but is more simply implemented
at a second level giving a random effect analysis and generalization
to the population at large. You would simply create contrast images
for each subject (using SPM99) reflecting the condition effect and
use these as new dependent variables at the second level with your
'level' as a regressor.
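At one voxel, this second-level regression is a simple least-squares fit of the subjects' contrast values on the 'level' covariate, with a t-test on the slope. The numbers below are invented and the function name is hypothetical; it is only meant to show what "contrast images as dependent variables" means.

```python
import math
import statistics


def slope_t(x, y):
    """Least-squares slope of y on x with its t value and degrees of freedom."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    standard_error = math.sqrt(sse / df / sxx)
    return slope, slope / standard_error, df


level = [1.0, 2.0, 3.0, 4.0, 5.0]            # one 'level' value per subject
condition_effect = [1.1, 1.9, 3.2, 3.8, 5.0]  # one contrast value per subject

slope, t_stat, df = slope_t(level, condition_effect)
print(round(slope, 3), round(t_stat, 2), df)
```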
-----------------------------------
Questions about scaling of the images were addressed in
this response
by Andrew:
Grand mean scaling refers to the scaling of a set of images by
a common factor such that their grand mean (the mean of the global
means) is a specified value. It arose with qualitative PET data
as a way to informally put the measured "counts" data
into the range of rCBF, exploiting the near linearity of the counts->rCBF
function for normal ranges of both and a tightly controlled input
dose.
Clearly with proportional scaling global normalisation, grand mean
scaling is redundant, since if each image is scaled to have pre-specified
global mean, the grand mean (mean of the globals) will also be the
pre-specified value.
Grand mean scaling of an entire data set does not affect the statistical
results. However, grand mean scaling a subject's data in a single
subject analysis (such that the subject's grand mean is a set value),
may well give different results to grand mean scaling that subject's
data in the context of a group analysis, where the data are scaled
such that grand mean across all scans on all subjects has the desired
value. To avoid this predicament, in SPM99, by default, grand mean
scaling is applied in a session/subject specific manner.
Clearly when preparing contrast images from a first level model
for a second level random effects analysis you want the contrast
images to be on the same scale. Hence the instruction that the grand
mean target value (or equivalently ) chosen be the same for all
data in the model. The AdjMean programs and SPM99's PET &
fMRI designs will by default do the right thing. (Grand mean scaling
is hardcoded to 100 in the fMRI interface, and is always applied in a
session specific manner.)
From a separate e-mail: (NB: With proportional scaling the grand mean
is implicitly also scaled.)
Important: Note that in the second level model, you should
not need to do any global normalisation or grand mean scaling, since
that is taken care of in the first level analyses.
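The difference between subject-specific and pooled grand mean scaling is easiest to see with numbers. In this sketch (invented global means, hypothetical function name, target of 100 as mentioned above for fMRI) the scaling factor is simply the target divided by the mean of the global means of whichever scans are being scaled together.

```python
import statistics


def grand_mean_scale_factor(global_means, target=100.0):
    """Common factor that brings the mean of the global means to the target."""
    return target / statistics.mean(global_means)


# Hypothetical global means for two scans from each of two subjects.
globals_by_subject = {
    "subject_a": [50.0, 50.0],
    "subject_b": [150.0, 150.0],
}

# Subject-specific scaling: each subject's own grand mean becomes 100.
specific_factors = {
    name: grand_mean_scale_factor(values)
    for name, values in globals_by_subject.items()
}

# Pooled scaling: one factor for all scans from all subjects.
all_globals = [g for values in globals_by_subject.values() for g in values]
pooled_factor = grand_mean_scale_factor(all_globals)

print(specific_factors, pooled_factor)
```

With pooled scaling subject_a's data stay at half the scale of subject_b's, so their contrast values would not be comparable; with subject-specific scaling both end up on the same scale before the second level.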
-------------------------------------
Further questions about degrees of freedom were discussed
by Geoff Aguirre and Karl Friston. The bottom line appears to be
that height tests should generally be used with low degrees of freedom
and smoother data.
Dear Geoff,
> > For SPM{T} with large d.f. the approximations for SPM{Z} are sufficient
> > and are therefore used.
> Are you advocating some concrete threshold of df? and is this
> hard-coded into SPM? I ask only because of the increasing prevalence of
> random effects analyses that have much reduced degrees of freedom.
There is no hard coding in SPM99. Your point about second-level
analyses is a good one (I had overlooked this and was assuming that
all analyses would now have large d.f., ideally more than 32 but
certainly more than 16). Spatial extent tests are generally more
powerful when the resolution of the data is small in relation to
the size of the activations (this is generally the case for fixed-effect,
subject-separable analyses with, say, 4mm smoothing). Height-based
tests are more powerful for low resolution data (e.g. PET or random-effect
analyses that usually employ higher degrees of smoothing to accommodate
inter-subject variability in functional anatomy, say 6-8mm). I think
therefore that most people will use corrected inferences based on
height for second-level analyses and this should certainly be the
case when the d.f. are small (less than 16).
----------------------------------------
One user wants to know where the "truth" lies
between random and fixed effects models in terms of activation location.
I myself have struggled with this question as well (-drg). Jesper
Andersson responds:
>
>The problem comes when I compare the results obtained with a first level
>analysis and this second level analysis (same data, same group of
>subjects). Of course, the statistical level of significance is different but if I
>try to adjust the threshold artificially so that I see approximately the
>same extent of activation, there are some similarities but THE FOCI OF
>ACTIVATION ARE NOT AT THE SAME PLACE (it is really different!!). Some
>large foci appear with one analysis and not with the other and
>inversely. Where is the bug ? Where is the "truth" ?
>
The purpose of the random effects analysis is to find the areas
that are activated in much the same way in all subjects, as opposed
to the fixed effects model which gives you the areas that are activated
"on the average" across the subjects. This is really a
crucial difference since a fixed effects analysis may yield "significant"
results when one or a couple of subjects activate "a lot"
even though the other subjects do not activate at all.
This distinction is achieved by including the subject-by-condition
variance into the error variance of the random effects model, weighted
in an appropriate way, whereas the "effect" is the same
between the models. The spm(t) is basically the "effect"
map (subtraction image) divided by the error map, the latter being
different for the random effects analysis. The important thing to
realize is that not only will the general magnitude of the error
maps be different, but also the spatial variation. This comes from
two effects, one being that there are actual subject by condition
interactions that will introduce "true" patches of high
variance, and the other being that the spatial variability on the
whole will be greater due to fewer degrees of freedom.
So, in the fixed effects and random effects models we have two identical
effect maps, being divided by two different error maps. Hence, it
is not surprising that the resulting spm(t)s will be different,
not only quantitatively but also qualitatively. What you do by lowering
the threshold to see roughly the same amount of activated voxels
is to show the peaks of the random effects spm(t), which has a lot
poorer sensitivity than the fixed effects spm. These peaks will
in part be "true" activations that are now subthreshold
due to the poor sensitivity, and in part false positives that start
to appear because your threshold no longer protects you from those.
Hence, although I can of course not exclude the possibility of an
error of some kind, what you describe is not really surprising.
"Where is the bug?", I don't think there necessarily is
one. "Where is the truth?", perhaps one could say that
the random effects is a very uncertain truth, in that it should
be unbiased but very insensitive.
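The "same effect map, different error map" point can be illustrated at one voxel with a deliberately extreme toy data set: four subjects, of whom only one activates (by a lot). This is my own sketch, not anything from the list. The data are constructed rather than measured, each subject has four scans per condition with identical scan-to-scan noise, the fixed effects error is the pooled within-subject residual variance, and the random effects error is the variance of the subjects' contrasts.

```python
import math
import statistics


def subject_scans(effect, base=100.0):
    """Constructed rest/task scans with fixed scan-to-scan noise of +/-1."""
    noise = [-1.0, 1.0, -1.0, 1.0]
    rest = [base + e for e in noise]
    task = [base + effect + e for e in noise]
    return rest, task


true_effects = [8.0, 0.0, 0.0, 0.0]  # only the first subject activates
data = [subject_scans(effect) for effect in true_effects]

contrasts, sse, n_scans = [], 0.0, 0
for rest, task in data:
    mean_rest, mean_task = statistics.mean(rest), statistics.mean(task)
    contrasts.append(mean_task - mean_rest)
    # Within-subject residuals around each condition mean.
    sse += sum((y - mean_rest) ** 2 for y in rest)
    sse += sum((y - mean_task) ** 2 for y in task)
    n_scans += len(rest) + len(task)

n_subjects = len(data)
n_per_condition = len(data[0][0])
mean_effect = statistics.mean(contrasts)  # identical numerator in both tests

# Fixed effects: error from scan-to-scan (within-subject) variance only.
df_ffx = n_scans - 2 * n_subjects
within_variance = sse / df_ffx
se_ffx = math.sqrt(within_variance * (2.0 / n_per_condition) / n_subjects)
t_ffx = mean_effect / se_ffx

# Random effects: error from the subject-to-subject variance of the contrasts.
df_rfx = n_subjects - 1
se_rfx = statistics.stdev(contrasts) / math.sqrt(n_subjects)
t_rfx = mean_effect / se_rfx

print(f"FFX t({df_ffx}) = {t_ffx:.2f}, RFX t({df_rfx}) = {t_rfx:.2f}")
```

The average effect is the same in both tests, but the fixed effects t is large because one subject drives the mean, while the random effects t is small because the subjects disagree.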
--------------------------------------
I have not included the most recent discussion (12/9-12/13), while
I await its resolution...
As always on SPM
Hope this helps,
Darren