If you could read my mind —

Duke scientist questions his own research with new study faulting task fMRI

“This sub-branch of fMRI could go extinct if we can't address this critical limitation.”

A new review study has bad news for scientists keen on using task-oriented fMRI to draw conclusions about any one person's brain.

It all started with a rejected grant proposal. Ahmad Hariri, a neuroscientist at Duke University, was interested in using so-called "task fMRI"—in which subjects perform specially designed cognitive tasks while having their brains scanned—combined with genetic testing and psychological evaluations. The goal was to identify specific biomarkers for differences in how people process thoughts and emotions that might determine whether a given subject would be more or less likely to experience depression, anxiety, or age-related cognitive decline like dementia in the future.

"The idea was to collect this data once, then collect it again and again and again and be able to track changes in an individual's brain over time to help us understand what changes over the course of their lives," Hariri told Ars. So he submitted a funding proposal outlining his plans for a longitudinal study along those lines. The proposal hypothesized that an individual's history of trauma, for instance, would map onto how their amygdala reacted to threat-related stimuli. And that would, in turn, enable the researchers to say something about the future mental well-being of the individual.

Hariri and his team designed four core, task-related measures to that end: one targeting the amygdala's threat response, one targeting the hippocampus and memory, one targeting the striatum and reward, and the fourth targeting the prefrontal cortex and executive control. He thought he was on solid scientific ground. So he was shocked when the proposal wasn't even scored by reviewers, based on skepticism regarding the reliability of fMRI to collect that kind of data.

"That was the real kick in the pants that I needed to think more seriously about the reliability of task fMRI," Hariri said. Those concerns led him to undertake an extensive review of published studies claiming it is possible to predict a person's patterns of thoughts or feelings using task fMRI. He looked specifically at what's known as "test-retest reliability": how closely the results correlate when a person takes, and then retakes, the same cognitive test while being scanned. The results, described in a recent paper in Psychological Science, overwhelmingly showed that task fMRI was not a reliable indicator: the correlation between one scan and a later scan for the same person was only fair to poor.
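To make the idea of test-retest reliability concrete, here is a minimal Python sketch. It is a toy simulation, not the method used in the review: each simulated subject has a stable underlying "trait" value, each scan measures that trait plus random noise, and reliability is taken as the plain Pearson correlation between the two sessions. The function names, the noise levels, and the choice of Pearson correlation are all assumptions made for illustration; published reliability analyses may use other statistics, such as an intraclass correlation.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_test_retest(n_subjects, noise_sd, seed=0):
    """Correlate two simulated scan sessions of the same subjects.

    Hypothetical model: every subject has a stable trait (variance 1),
    and each scan adds independent measurement noise with the given SD.
    """
    rng = random.Random(seed)
    trait = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    scan_1 = [t + rng.gauss(0.0, noise_sd) for t in trait]
    scan_2 = [t + rng.gauss(0.0, noise_sd) for t in trait]
    return pearson(scan_1, scan_2)


# Same subjects, same underlying traits; only the measurement noise differs.
low_noise = simulate_test_retest(n_subjects=2000, noise_sd=0.3)
high_noise = simulate_test_retest(n_subjects=2000, noise_sd=2.0)
print(f"test-retest r with low noise:  {low_noise:.2f}")
print(f"test-retest r with high noise: {high_noise:.2f}")
```

In this toy model, the noisier the individual measurement, the weaker the agreement between the first scan and the second, even though nothing about the subjects themselves has changed.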

The findings instigated a bit of a professional crisis for Hariri. "This is more relevant to my work than just about anyone else's," he told Duke Today with remarkable frankness. "This is my fault. I'm going to throw myself under the bus. This whole sub-branch of fMRI could go extinct if we can't address this critical limitation."

Granted, he's not saying it's impossible to measure brain activation function reliably. "You just can't do it the way we've been doing it, using the tasks we've been using," he told Ars.

"It's not as if we haven't known these issues of reliability, but this paper brings them together more sharply," Russell Poldrack told Duke Today. Poldrack is a psychologist at Stanford University who was not involved in the review study, although one of his fMRI papers from 15 years ago was included in the analysis. "This is a good wakeup call, and it's a marker of Ahmad's integrity that he's taking this on," he said.

A bit of background

fMRI is one of the most popular brain-imaging techniques in use today, in part because it produces stunning full-color images—striking visualizations of statistical data—showing bright spots of brain activity in response to different tasks. Conventional medical MRI produces a static image of the brain, similar to an X-ray, but functional MRI (fMRI) monitors increases in blood flow produced by groups of neurons firing together in response to a given stimulus. Specifically, it detects slight increases in blood oxygenation levels, known as the BOLD response.

The imaging process produces a lot of raw data—as many as 50,000 data points per scan. So neuroscientists rely on computer algorithms to sift through it all, averaging out the results from the scans of many different study participants all engaged in the same tasks (typically one control task, and one designed to measure a specific target). The larger the difference between the control task and directed task, the stronger the BOLD response. Only those signals that exceed a certain statistical threshold are considered as demonstrating a correlation between the directed task and any affected brain regions.

This salmon has ceased to be! An infamous 2010 paper reported brain activity in an fMRI scan of a dead fish.
C. Bennett et al.

There are inevitably false positives (the same area "lighting up" in two different scans by random chance), but neuroscientists work very hard to factor potential false positives into their statistical analyses. The importance of this was famously illustrated in a 2010 paper that found a measurable BOLD response from an fMRI scan of a dead salmon. Neuroscientist Craig Bennett of the University of California, Santa Barbara, was one of the co-authors; at the time, he was a grad student at Dartmouth. He was in charge of calibrating the MRI machine, which is typically done by scanning a balloon filled with mineral oil. He and his lab partner decided to have some fun and tried scanning a Cornish game hen, a pumpkin, and finally, the infamous salmon.

Bennett and his lab partner placed the salmon inside the head coil and then ran the calibration test, which involved "presenting" the fish with pictures of human faces and "asking" it to determine the emotions on display in each image. Lo and behold, a signal appeared in the data when he analyzed it—even though there was no way the dead salmon would have shown any brain activity at all. Bennett et al. won the 2012 Ig Nobel Prize for Neuroscience for their illuminating work.
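The statistical lesson of the salmon can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not a reconstruction of the Bennett analysis: it treats every one of 50,000 data points as pure noise with a standard normal test statistic, then counts how many cross a lenient uncorrected threshold (p < 0.001, picked here for illustration) versus a Bonferroni-corrected one. The function name and both thresholds are hypothetical choices, and Bonferroni is only one simple correction among several.

```python
import random
from statistics import NormalDist

N_VOXELS = 50_000  # upper figure for data points per scan cited in the article


def count_false_positives(z_scores, p_threshold):
    """Count z-scores beyond the one-sided cutoff implied by p_threshold."""
    z_cutoff = NormalDist().inv_cdf(1.0 - p_threshold)
    return sum(1 for z in z_scores if z > z_cutoff)


# Assumption: with no real signal (a dead fish), each data point's test
# statistic is just a draw from a standard normal distribution.
rng = random.Random(42)
noise_z = [rng.gauss(0.0, 1.0) for _ in range(N_VOXELS)]

uncorrected = count_false_positives(noise_z, 0.001)
bonferroni = count_false_positives(noise_z, 0.05 / N_VOXELS)

print(f"'active' points, uncorrected p < 0.001: {uncorrected}")
print(f"'active' points, Bonferroni-corrected:  {bonferroni}")
```

With tens of thousands of tests, a threshold that sounds strict still lets dozens of noise-only points through on average, which is why the correction step matters so much.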

The point is not that fMRI is an unreliable technique. On the contrary, it has proven to be quite robust for studies of groups of participants performing the same task, since this produces a broad, general sample that enables scientists to pinpoint commonalities across populations. Things get a bit stickier when we're talking about studies trying to measure a BOLD response in just one person: say, to determine whether the subject is lying, whether they believe in god, or how empathetic they are. For example, if you put 100 people in a scanner and tried to figure out which of them were lying, the best you could say is that one subgroup will likely lie more often than another subgroup. You have gained a statistically significant snapshot of the group as a whole, but that is not the same as definitively determining that a given person within that group is lying.

That's why fMRI studies of individuals typically have the subject participate in multiple scanning sessions to compensate for the small sample size (N=1) and reach the required statistical threshold. But it is much more difficult to tease out strong correlations from the data, and it's easy to convince yourself that you are seeing patterns and correlations in the data that aren't really there.
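The gap between group-level findings and individual-level conclusions can also be shown with a small simulation. The Python sketch below rests entirely on made-up numbers: two hypothetical groups of 2,000 people whose measured signal differs by 0.3 standard deviations on average. It computes a z-statistic for the group difference, then checks how often a single person can be assigned to the right group from their own measurement alone. The function name, group sizes, and effect size are assumptions for illustration, not figures from any study.

```python
import random


def group_vs_individual(n_per_group=2000, effect=0.3, seed=1):
    """Return (z-statistic for the group difference, individual accuracy).

    Hypothetical setup: group B's signal is shifted by `effect` standard
    deviations relative to group A, with heavy overlap between the two.
    """
    rng = random.Random(seed)
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]

    mean_a = sum(group_a) / n_per_group
    mean_b = sum(group_b) / n_per_group
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_per_group - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_per_group - 1)

    # Two-sample z-statistic for the difference between group means.
    z = (mean_b - mean_a) / ((var_a + var_b) / n_per_group) ** 0.5

    # Classify each person by which side of the midpoint their value falls.
    cutoff = (mean_a + mean_b) / 2
    correct = sum(1 for x in group_a if x <= cutoff)
    correct += sum(1 for x in group_b if x > cutoff)
    accuracy = correct / (2 * n_per_group)
    return z, accuracy


z_stat, accuracy = group_vs_individual()
print(f"group difference z-statistic: {z_stat:.1f}")
print(f"individual classification accuracy: {accuracy:.0%}")
```

Under these assumptions the group difference is overwhelmingly significant, yet guessing which group any one person belongs to is only modestly better than a coin flip.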
