Northwestern University
Feinberg School of Medicine

SPM and Random Effects Analyses

(a [limited] summary of the e-mail help-list)

I think it's appropriate to start off with a quote by another Englishman which for me applies to SPM in general and perhaps random effects in particular.
"...It is a riddle wrapped in a mystery inside an enigma."
- Sir Winston Churchill (1874-1965) [radio talk, 1 Oct 1939]

The use of Random Effects in SPM seems to be both a powerful and a confusing feature. Just when I at least thought I understood the process in SPM96... it has changed in SPM99, producing no shortage of discussion on the subject. A quick search through the e-mail database produces 258 references in 1999 alone.

I have compiled the document below on my readings and various references from the SPM e-mail list. I am by no means a statistician (several of my comments on the SPM list I'm sure attest to that!), and I have compiled this reference as a way to help me in understanding the technique. NOTE: In many places I have summarized the e-mail, and hopefully I haven't made any errors in doing so. However, in each case I've provided the original link for reference. Also, I've included only a small number of e-mails overall that have referenced random effects. If I've left out a particularly relevant one, let me know, but hopefully no one will feel slighted. Finally, I have not attempted to summarize the random effects e-mails prior to March 1999, so if you're interested in the application of random effects to SPM96 you'll have to go through that on your own. Please e-mail me (Darren Gitelman) if you have identified errors, or have any comments.

The primary references on the topic have been published in abstract form:

http://www.fil.ion.ucl.ac.uk/spm/RFXabstract.pdf

http://www.fil.ion.ucl.ac.uk/spm/RFXposter.pdf

and more recently there have been several articles describing the power of the technique in relation to other analysis methods

Friston KJ, Holmes AP, Worsley KJ.
How many subjects constitute a study?
Neuroimage. 1999 Jul;10(1):1-5.

Andrew Holmes provides the following references in his e-mail from March 1999

The basic approach of using summary measures is pretty standard. A good introductory paper (albeit for clinical trials) is:
Frison L & Pocock SJ (1992) "Repeated measures in clinical trials: Analysis using mean summary statistics and its implications for design" Statistics in Medicine 11:1685-1704

The basic concepts of random effects are expounded in all but the most basic stats books on "Design of Experiments". Other key phrases to look out for are "variance components", "mixed effects models", "repeated measures", "multi-level modeling" & "hierarchical modeling". I've found the first chapter of Searle, Casella & McCulloch's 1992 book "Variance Components" (Wiley, London) a good description accessible to non-statisticians.

The paper by Roger Woods (Modeling for intergroup comparisons of imaging data. Neuroimage. 1996 Dec;4(3 Pt 3):S84-94) also provides a readable introduction to the concepts.

However, all this theory, while important, doesn't tell the average user what random effects is all about or more importantly how to use it.

Summarizing a response to Kath Moores on March 12, Andrew Holmes notes that

SPM only considers a single source of error variance. This is the scan to scan residual error. With repeated measurements on multiple subjects scan to scan is within subject, and the analysis is fixed effects.... With only one "scan" per subject, scan to scan residual variability is between subject variance. For balanced designs, by summarising the individual subject data with an appropriate measure, and then assessing those measures across subjects, a random effects analysis is effected. Using the within-subject data to compute the summary measure "scan" incorporates the within-subject error into the summary scans, such that the variability between subjects of these computed summary "scans" incorporates both within and between subject variance. It's fairly easy for a balanced design to show that these are in exactly the right ratio for a random effects assessment of the overall (population) effect.

Although this second level (between subjects) model is a fixed effects model, by using summary "scans", the fixed effect is the population effect (which *is* fixed), and the residual error variance is that appropriate for assessing that population effect allowing for random subject effects (& subject by condition interaction).
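The "right ratio" argument can be written out for the simplest case. This is only a sketch under assumptions that are not in the original e-mail: a balanced design with N subjects and n scans each, a subject effect b_i with variance \sigma_b^2, scan-to-scan error with variance \sigma_w^2, and the subject mean used as the summary measure. The symbols are illustrative.

```latex
% Assumed model: y_{ij} = \mu + b_i + \varepsilon_{ij},
%   i = 1..N subjects, j = 1..n scans per subject
% Summary "scan" for subject i:
\bar{y}_i = \mu + b_i + \bar{\varepsilon}_i,
\qquad
\operatorname{Var}(\bar{y}_i) = \sigma_b^2 + \frac{\sigma_w^2}{n}

% Variance of the population effect estimate at the second level:
\operatorname{Var}\Big(\frac{1}{N}\sum_{i=1}^{N}\bar{y}_i\Big)
  = \frac{1}{N}\Big(\sigma_b^2 + \frac{\sigma_w^2}{n}\Big)

% A fixed effects analysis of the same data uses only the within-subject term:
\operatorname{Var}_{\text{fixed}} = \frac{\sigma_w^2}{N n}
```

Under these assumptions the sample variance of the summary scans estimates \sigma_b^2 + \sigma_w^2/n directly, which is why a simple second-level test on one image per subject carries both variance components.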

(Actually, the SPM96 random effects approach would summarise each subject by two "scans" - adjusted mean condition images for each condition. These would then be assessed using a paired t-test, which is the same as testing the differences of the adjusted means directly. This is for technical reasons - there are problems (grey matter) thresholding negative images in SPM96.)

----------------------

More to the point, Andrew provides this outline of how to do random effects in a response to Kent Kiehl

At 09:47 18/05/99 +0000, Kent Kiehl wrote:
| I read all the release notes, but I could not determine for certain if the
| new event-related fMRI analyses are now using a random effects model.
| Is this true or is it still fixed effects, as in SPM97?

Random effects models are effected within SPM using a multi-stage approach. For a balanced design (i.e. where the individual subject models are the same), each subject's data is summarised with an appropriate single image. This "single scan per subject" data is then entered into a second model, i.e. a model at the between subject level. Usually this will be a simple one-sample or two-sample t-test. The variance of these single images from subject to subject consists of contributions from both the between and within subject components of variance, in the correct proportions, and it can easily be shown that the resulting analysis is mathematically identical to the appropriate random effects (strictly called mixed effects) analysis of these data.
.........

SPM99 now writes out contrast images in addition to SPM{t/F} images by default in the results section. Further, these images are floating point images, with out of brain voxels set to NaN - not a number. Thus, these images are implicitly masked. Since SPM statistics ignores voxels where any scan is NaN, these contrast images can be put back into the statistics section.

Therefore, in SPM99, the basic procedure for a random effects analysis is as follows:
* First, fit the model for each subject
- you can model all the subjects together, provided you use a subject separable model
- strictly speaking, the individual subject models should be identical (i.e. this is a balanced design)

* Define the effect of interest for each subject with a t-contrast.
- Each contrast will write a con_????.{img,hdr} Analyze image containing the contrast of the parameter estimates

* Armed with one contrast image per subject, proceed to the second (between subject) level, feeding the contrast images into one of the "Basic stats" models as appropriate.

( This is the gist of it for a simple case. There are additional caveats for multiple conditions, unbalanced designs, and non-orthogonal contrasts, which we will discuss when necessary. )
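The second-level step of the procedure above can be sketched in a few lines. This is not SPM code: the contrast "images" are plain Python lists of hypothetical voxel values, and `one_sample_t` and `second_level` are illustrative names. It assumes the simplest case, a one-sample t-test across subjects, and treats any voxel that is NaN in at least one subject as outside the analysis, as described for SPM99.

```python
import math

NAN = float("nan")

def one_sample_t(values):
    """Return (mean, t, df) for a one-sample t-test of the mean against zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(var / n)
    return mean, mean / se, n - 1

def second_level(con_images):
    """Voxelwise one-sample t-test over one contrast image per subject.

    con_images: list of equal-length lists of voxel values, one per subject.
    A voxel that is NaN in any subject is returned as NaN (implicit masking).
    """
    t_map = []
    for column in zip(*con_images):
        if any(math.isnan(x) for x in column):
            t_map.append(NAN)
        else:
            t_map.append(one_sample_t(list(column))[1])
    return t_map

# Hypothetical contrast values: 4 subjects x 3 voxels
con_images = [
    [1.0, 0.2, NAN],
    [2.0, -0.1, 0.5],
    [3.0, 0.3, 0.4],
    [2.0, -0.4, 0.6],
]
t_map = second_level(con_images)
```

With four subjects the test has only 3 degrees of freedom, which illustrates why the number of subjects matters far more than the number of scans per subject at this level.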

Unbalanced designs generated several additional e-mails:
http://www.mailbase.ac.uk/lists/spm/1999-08/0016.html


http://www.mailbase.ac.uk/lists/spm/1999-08/0022.html

Since you can do this with any one dimensional contrast that would produce an SPM{t}, you can look at regression slopes, interactions and so on, in addition to simple categorical comparisons.

For er-fMRI, you would probably need to fix the slice timing of 2D multi-slice acquisitions by temporal interpolation. Your model would have to use a canonical response (rather than a set of basis functions), such that the effect of interest can be extracted with a single contrast. You then put these contrast images into the between subject level analysis...

Note that this only works for one-dimensional contrasts, i.e. a t-contrast, with contrast weights a single vector. For F-contrasts (which you would use for a two-sided t-test, or to test any overall effect for a set of basis functions) you need a multivariate second level analysis. This should be possible with the Multivariate toolbox... [which is not yet available as far as I know- drg]

The AdjMean functions of the SPM96 random effects kit are included in SPM99, but we recommend using the new main SPM stats routines to define an appropriate model and extract an appropriate contrast image as summary image.

A final point of interest: Keith Worsley's seminal 1992 paper (probably this one: Worsley, K.J., Marrett, S., Neelin, P., and Evans, A.C. (1992). A three-dimensional statistical analysis for CBF activation studies in human brain. Journal of Cerebral Blood Flow and Metabolism, 12:900-918) for addressing the multiple comparisons problem also proposed the use (for PET), of a two-sample t-statistic computed from average condition images! (Although they used a variance estimate pooled over the entire intracerebral volume rather than the voxel level variance estimate used in SPM.) This is a repeated measures paired t-test approach, identical to the current SPM approach, although (possibly) primarily motivated by the substantial data reduction the use of summary images confers.

----------------------------------

This seems clear enough... however, several dozen e-mails follow trying to understand these points further. I should also note that several e-mails go into detail about the comparison of random effects and conjunction analyses, but I shall only provide a reference to a recent article here:

Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ. Multisubject fMRI studies and conjunction analyses. Neuroimage. 1999 Oct;10(4):385-96.

----------------------------------

In general, a fairly large number of subjects are required to perform robust random effects analyses. In several emails the general advice given is that at least 10 subjects are required for a random effects analysis, unless the activations are unusually robust. Andrew Holmes notes that:

...For sample sizes up to 12 adding additional subjects gives considerable gain in terms of detectable normalised change. After 20 subjects there is a diminishing return in terms of power. Our anecdotal evidence so far (mainly from functional fMRI & PET data) suggests that reasonable population effects require about 12 subjects, whilst population differences need at least nine per group. (All the above being random effects analyses.) In these situations it is usually inter-subject variability that dominates, such that the various imaging modalities are comparable in terms of subject numbers required.

----------------------------------

It is important to note that the contrast images generated by SPM99 are already masked (though not thresholded in the statistical sense): the mask is based either on an absolute or proportional threshold applied to the voxel values (PET) or on the voxels that are present in all the images (fMRI). Therefore these images SHOULD NOT be thresholded further in the second level analysis. This is noted by Andrew Holmes here:

At 16:09 15/06/99 -0700, Russ Poldrack wrote:
| Hi, I have another issue to raise about fMRI analysis in spm99b. My
| understanding of the random effects approach in spm99 is that one first
| creates contrast images (conXXX.img) and then these are entered into a
| second-level PET analysis. My question related to how these contrast
| images are masked according to the specified height and extent thresholds
| when the statistics are performed.

You might prefer to use the new "Basic Stats" models, rather than the PET
models. The options of the "Basic Stats" models are geared towards multi-stage random effects analyses.

NOTE: One should not be using the activation images generated in the results window as the basis for a second level analysis. -drg

The contrast and SPM{t} images (or ess & SPM{F} images for an F-contrast) are written by the results section once a new contrast is defined, but before any height or extent thresholding is applied. (Only the write-filtered button in the results interface writes out height and extent filtered SPM{t} & SPM{F}'s). The voxels containing NaN are precisely the voxels that were outside the volume of analysis when the statistics were estimated. Usually this is determined by the masking options chosen in the stats design setup interfaces. For fMRI this is hardcoded: only voxels where all images have voxel values greater than the global mean are considered (which is usually all intracerebral voxels). For PET you are given a choice of proportional or absolute threshold masking. For basic designs you are given the further options of not doing any "analysis threshold" (previously called "grey matter threshold"), and of using explicit mask images.

For a given statistical analysis, all the output images (parameter images - beta_????.{img,hdr}, variance image - ResMS.{img,hdr}, contrast images - con_????.{img,hdr}, ESS images - ess_????.{img,hdr}, and SPM images - spm{T,F}_????.{img,hdr}), being derived from the same analysis, will have exactly the same voxels tagged NaN, and these will match the mask image (mask.{img,hdr}). SPM will regard any voxel with value NaN in any input image as outside the volume to be analysed. This is why you can simply put contrast images back into SPM for a second level analysis without worrying about masking out non-brain areas.
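The implicit masking rule described here (a voxel is analysed only if no input image is NaN there) amounts to intersecting the per-image masks. A minimal sketch, with images represented as plain lists of hypothetical voxel values and `implicit_mask` as an illustrative name rather than an SPM function:

```python
import math

def implicit_mask(images):
    """Return one boolean per voxel: True where every image has a non-NaN value."""
    return [all(not math.isnan(x) for x in voxel) for voxel in zip(*images)]

nan = float("nan")

# Two hypothetical contrast images of three voxels each
images = [
    [1.0, nan, 0.5],
    [0.2, 0.3, nan],
]
mask = implicit_mask(images)
n_in_mask = sum(mask)
```

Only the first voxel survives, because each of the other two is NaN in one of the inputs. The second-level volume can therefore never be larger than the smallest first-level volume.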

-----------------------------------

The issue of which images to use comes up several times and is addressed more explicitly in this note:

The statistic (SPM) images (SPMt_????.img & SPMF_????.img) should *not* be entered into a second level analysis if you want to effect a random effects analysis. This would basically be assessing the significance (across subjects) of the individual subjects' significance! (Rather than the significance (across subjects) of the response.)

It's possible that the confusion has arisen because while you enter contrast weights to get a SPM{t}, contrast itself refers only to the weighted sum of the parameters, whose estimates given in the contrast images only form the numerator of the SPM{t}. (The SPM{t} is formed by dividing the contrast image by a suitable estimate of the standard error.)

The contrast summarises the effect, the SPM{t} the evidence for the effect.
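The difference between a contrast value and a t value can be seen with a toy calculation at a single voxel. The sketch below assumes a simple two-condition comparison with a pooled variance estimate; the data and the function name `contrast_and_t` are hypothetical and are not taken from SPM.

```python
import math

def contrast_and_t(cond_a, cond_b):
    """Return (contrast, t) for condition A minus condition B at one voxel."""
    na, nb = len(cond_a), len(cond_b)
    mean_a = sum(cond_a) / na
    mean_b = sum(cond_b) / nb
    contrast = mean_a - mean_b  # numerator: the size of the effect
    ss = (sum((x - mean_a) ** 2 for x in cond_a)
          + sum((x - mean_b) ** 2 for x in cond_b))
    pooled_var = ss / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))  # denominator: standard error
    return contrast, contrast / se

# Two hypothetical subjects with the same effect size but different noise levels
quiet = contrast_and_t([10.9, 11.0, 11.1], [9.9, 10.0, 10.1])
noisy = contrast_and_t([9.0, 11.0, 13.0], [8.0, 10.0, 12.0])
```

Both subjects have a contrast of 1, but their t values differ by a factor of twenty. Entering the t values at the second level would treat these two subjects as having very different responses, whereas entering the contrasts treats them as equal, which is what a test of the response across subjects requires.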

To summarize: Finally, note that we have now discussed three different scenarios:

1) Two level SPM RFX analysis using contrast images
- which effects a random effects analysis for appropriate balanced designs; results generalise to the "population"
- "significance of response across subjects"
2) Two level SPM analysis using SPM{t} as second level input
- "significance of significances": a rather bizarre notion
3) Averaging of individual Z-score images across subjects
- combining the evidence for individual subject effects across subjects
- a simple meta-analysis which doesn't generalise to the population
- the topic of the thread Stefan referred to: http://www.mailbase.ac.uk/lists/spm/1999-01/0122.html

-----------------------------------

Populations can also be compared using a random effects analysis when implemented as described in this note from Andrew:

At 15:41 07/06/99 -0400, Ziad Nahas wrote:
| 1) how to compare differences between 2 conditions in 2 different groups.
| ie. 2 groups of subject received SPECT scans under 2 different
| conditions. We would like to be able to compare, for example, the
| differences in activation between the 2 groups.
|
| group 1 : differences between condition A and B
| VERSUS
| group 2 : differences between condition A and B

This can be implemented with the "Multi-group: conditions and covariates" design in SPM99 (in PET/SPECT mode). With multiple scans per subject this is a fixed effects analysis, such that inference only extends to the subjects studied.

To extend inference to the populations from which the two groups were drawn, a mixed effects analysis is required. This is achieved in SPM via a two stage procedure: Analyse each subject individually, with a contrast for the individual's activation effect. Each analysis will give a contrast image con_????.{img,hdr}, summarising the activation effect for each subject. Enter these summary images into a two-sample t-test (under basic models) to compare the populations.
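At a single voxel, the second stage of this two-group comparison is an ordinary two-sample t-test on the per-subject contrast values. The sketch below assumes equal variances in the two groups (the classical pooled test); the values and the name `two_sample_t` are hypothetical.

```python
import math

def two_sample_t(group1, group2):
    """Pooled-variance two-sample t-test; returns (t, df) for group1 minus group2."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    df = n1 + n2 - 2
    se = math.sqrt(ss / df * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Hypothetical A-minus-B contrast values at one voxel, one value per subject
group1 = [1.0, 2.0, 3.0]
group2 = [3.0, 4.0, 5.0]
t, df = two_sample_t(group1, group2)
```

The degrees of freedom depend only on the number of subjects (here 3 + 3 - 2 = 4), not on how many scans went into each subject's contrast.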

----------------------------------

As noted above the contrast images may be entered into regression analyses as detailed in Karl's response to Frode Willoch, on May 28.

Looking for a condition x 'level' interaction .... could be tested for with a fixed effect analysis but is more simply implemented at a second level giving a random effect analysis and generalization to the population at large. You would simply create contrast images for each subject (using SPM99) reflecting the condition effect and use these as new dependent variables at the second level with your 'level' as a regressor.
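A second-level regression of this kind reduces, at each voxel, to a simple linear regression of the per-subject contrast values on the covariate. The following sketch assumes a single regressor plus a constant term; the 'level' values, the contrast values, and the function name `slope_t` are made up for illustration.

```python
import math

def slope_t(x, y):
    """Simple linear regression of y on x; returns (slope, t, df) for the slope."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(rss / df / sxx)
    return slope, slope / se, df

# One hypothetical 'level' score and one contrast value per subject
level = [1.0, 2.0, 3.0, 4.0, 5.0]
con = [0.9, 2.1, 2.9, 4.2, 4.9]
slope, t, df = slope_t(level, con)
```

A positive slope with a large t would indicate that the condition effect grows with 'level' across subjects, which is the condition x 'level' interaction assessed as a random effect.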

-----------------------------------

Questions about scaling of the images were addressed in this response by Andrew:

Grand mean scaling refers to the scaling of a set of images by a common factor such that their grand mean (the mean of the global means) is a specified value. It arose with qualitative PET data as a way to informally put the measured "counts" data into the range of rCBF, exploiting the near linearity of the counts->rCBF function for normal ranges of both and a tightly controlled input dose.

Clearly with proportional scaling global normalisation, grand mean scaling is redundant, since if each image is scaled to have pre-specified global mean, the grand mean (mean of the globals) will also be the pre-specified value.

Grand mean scaling of an entire data set does not affect the statistical results. However, grand mean scaling a subject's data in a single subject analysis (such that the subject's grand mean is a set value), may well give different results to grand mean scaling that subject's data in the context of a group analysis, where the data are scaled such that grand mean across all scans on all subjects has the desired value. To avoid this predicament, in SPM99, by default, grand mean scaling is applied in a session/subject specific manner.

Clearly when preparing contrast images from a first level model for a second level random effects analysis you want the contrast images to be on the same scale. Hence the instruction that the grand mean target value (or equivalently ) chosen be the same for all data in the model. The AdjMean programs and SPM99's PET & fMRI designs will by default do the right thing. (Grand mean scaling is hardcoded to 100 in the fMRI interface, and is always applied in a session specific manner.) FROM A SEPARATE EMAIL: (NB: With proportional scaling the grand mean is implicitly also scaled.)

Important: Note that in the second level model, you should not need to do any global normalisation or grand mean scaling, since that is taken care of in the first level analyses.
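Session-specific grand mean scaling can be illustrated as follows. This is a simplified sketch, not SPM's implementation: the "global" of a scan is taken here to be the plain mean of its voxel values (SPM computes globals differently), the target of 100 follows the fMRI default mentioned above, and the function names are illustrative.

```python
def grand_mean(scans):
    """Mean of the per-scan global means (here: plain voxel means)."""
    globals_ = [sum(s) / len(s) for s in scans]
    return sum(globals_) / len(globals_)

def scale_session(scans, target=100.0):
    """Scale all scans of one session by a single common factor."""
    factor = target / grand_mean(scans)
    return [[v * factor for v in s] for s in scans]

# Two hypothetical sessions recorded on very different intensity scales
session_a = [[40.0, 60.0], [45.0, 55.0]]
session_b = [[150.0, 250.0], [180.0, 220.0]]

scaled_a = scale_session(session_a)
scaled_b = scale_session(session_b)
```

After scaling, both sessions have a grand mean of 100, so contrasts computed from them are expressed in comparable units before they reach the second level.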

-------------------------------------

Further questions about degrees of freedom were discussed by Geoff Aguirre and Karl Friston. The bottom line appears to be that height tests should generally be used with low degrees of freedom and smoother data.

Dear Geoff,

> > For SPM{T} with large d.f. the approximations for SPM{Z} are sufficient
> > and are therefore used.
> Are you advocating some concrete threshold of df? and is this
> hard-coded into SPM? I ask only because of the increasing prevalence of
> random effects analyses that have much reduced degrees of freedom.

There is no hard coding in SPM99. Your point about second-level analyses is a good one (I had overlooked this and was assuming that all analyses would now have large d.f., ideally more than 32 but certainly more than 16). Spatial extent tests are generally more powerful when the resolution of the data is small in relation to the size of the activations (this is generally the case for fixed-effect, subject-separable analyses with, say, 4mm smoothing). Height-based tests are more powerful for low resolution data (e.g. PET or random-effect analyses that usually employ higher degrees of smoothing to accommodate inter-subject variability in functional anatomy, say 6-8mm). I think therefore that most people will use corrected inferences based on height for second-level analyses and this should certainly be the case when the d.f. are small (less than 16).

----------------------------------------

One user wants to know where the "truth" lies between random and fixed effects models in terms of activation location. I myself have struggled with this question as well (-drg). Jesper Andersson responds:
>
>The problem comes when I compare the results obtained with a first level
>analysis and this second level analysis (same data, same group of
>subjects), Of course, the statistical level of significancy is different but if I
>try to adjust the threshold artificially so that I see approximately the
>same extend of activation, there are some similarities but THE FOCI OF
>ACTIVATION ARE NOT AT THE SAME PLACE (it is really different!!). Some
>large foci appear with one analysis and not with the other and
>inversely. Where is the bug ? Where is the "truth" ?
>

The purpose of the random effects analysis is to find the areas that are activated in much the same way in all subjects, as opposed to the fixed effects model which gives you the areas that are activated "on the average" across the subjects. This is really a crucial difference since a fixed effects analysis may yield "significant" results when one or a couple of subjects activate "a lot" even though the other subjects do not activate at all.

This distinction is achieved by including the subject-by-condition variance into the error variance of the random effects model, weighted in an appropriate way, whereas the "effect" is the same between the models. The spm(t) is basically the "effect" map (subtraction image) divided by the error map, the latter being different for the random effects analysis. The important thing to realize is that not only will the general magnitude of the error maps be different, but also the spatial variation. This comes from two effects, one being that there are actual subject by condition interactions that will introduce "true" patches of high variance, and the other being that the spatial variability on the whole will be greater due to fewer degrees of freedom.

So, in the fixed effects and random effects models we have two identical effect maps, being divided by two different error maps. Hence, it is not surprising that the resulting spm(t)s will be different, not only quantitatively but also qualitatively. What you do by lowering the threshold to see roughly the same amount of activated voxels is to show the peaks of the random effects spm(t), which has a lot poorer sensitivity than the fixed effects spm. These peaks will in part be "true" activations that are now subthreshold due to the poor sensitivity, and in part false positives that start to appear because your threshold no longer protects you from those.

Hence, although I can of course not exclude the possibility of an error of some kind, what you describe is not really surprising. "Where is the bug?", I don't think there necessarily is one. "Where is the truth?", perhaps one could say that the random effects is a very uncertain truth, in that it should be unbiased but very insensitive.
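The point that the two analyses share the same effect but divide it by different error terms can be checked on a toy voxel. The sketch below is an assumption-laden illustration, not SPM's model: each subject contributes four hypothetical activation-minus-baseline values, one subject activates strongly and the other five not at all, the fixed effects error is the pooled within-subject variance, and the random effects error is the variance of the subject means.

```python
import math

def subject_means(data):
    """Mean activation-minus-baseline value for each subject."""
    return [sum(s) / len(s) for s in data]

def fixed_effects_t(data):
    """t for the average effect using only within-subject (scan-to-scan) variance."""
    means = subject_means(data)
    n_scans = sum(len(s) for s in data)
    grand = sum(sum(s) for s in data) / n_scans
    ss_within = sum((x - m) ** 2 for s, m in zip(data, means) for x in s)
    df = n_scans - len(data)
    se = math.sqrt(ss_within / df / n_scans)
    return grand / se, df

def random_effects_t(data):
    """One-sample t across the subject means (summary-statistic approach)."""
    means = subject_means(data)
    n = len(means)
    grand = sum(means) / n
    var_between = sum((m - grand) ** 2 for m in means) / (n - 1)
    return grand / math.sqrt(var_between / n), n - 1

# Hypothetical voxel: subject 1 activates strongly, subjects 2-6 do not
data = [[5.5, 6.5, 5.5, 6.5]] + [[-0.5, 0.5, -0.5, 0.5] for _ in range(5)]

t_ffx, df_ffx = fixed_effects_t(data)
t_rfx, df_rfx = random_effects_t(data)
```

Both analyses see the same mean effect of 1, but the fixed effects t is about 8.5 on 18 degrees of freedom while the random effects t is 1.0 on 5 degrees of freedom. A single strongly activating subject is enough to drive the fixed effects result, which matches the behaviour described in the reply above.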

--------------------------------------

I have not included the most recent discussion (12/9-12/13), while I await its resolution...

As always on SPM

Hope this helps,

Darren
