I was just reading Eric's post on the frail methodological grounds for evidence-based medicine, and was reminded of an article which appeared in last week's New Yorker, 'The Truth Wears Off: Is there something wrong with the scientific method?', by Jonah Lehrer. I have a lot of appreciation for the New Yorker, but it's not the kind of place where one expects to find serious discussions of scientific methodology (notwithstanding its good track record for scientifically relevant pieces, such as the famous Piraha piece by Colapinto). If even venues such as the New Yorker are now bringing to the fore the methodological problems underlying much of our best statistics-based science, this tells me that the problem is serious enough to be spilling over into the popular press.
The bulk of the article focuses on what it describes as 'the decline effect': the fact that, when a new, compelling hypothesis is first tested, it may yield through-the-roof results which then slowly but surely start to decrease as the experiment is repeated. This phenomenon has been observed in many cases, according to the article; in particular, so-called second-generation antipsychotics, which appeared to have fantastic results when they were first investigated in the early nineties, have since seen their success decline steadily. According to the article, "a recent study showed an effect that was less than half of that documented in the first trials, in the early nineties."
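(A side note of my own, not something the post or the article spells out: one commonly offered partial explanation for the decline effect is selective publication combined with regression to the mean, and a toy simulation makes the point vividly. Everything below is hypothetical: the 'true effect', the noise level, and the publication threshold are made-up numbers, and the sketch only assumes Python's standard library.)

```python
import random

# Toy sketch (my own illustration, not from the article): each study measures
# a modest true effect with noise. If only studies whose first estimate clears
# a "striking result" threshold get noticed, the published initial estimates
# overshoot the true effect, and later replications -- which face no such
# selection -- look like a decline, even though the effect itself is unchanged.

random.seed(0)

TRUE_EFFECT = 0.2   # hypothetical real effect size
NOISE = 0.3         # sampling noise in any single study
THRESHOLD = 0.5     # estimate needed to count as a striking first result
N_STUDIES = 100_000

published_first, replications = [], []
for _ in range(N_STUDIES):
    first = random.gauss(TRUE_EFFECT, NOISE)
    if first > THRESHOLD:                      # only striking results get noticed
        published_first.append(first)
        # a replication is a fresh, unselected measurement of the same effect
        replications.append(random.gauss(TRUE_EFFECT, NOISE))

print("mean of published first results:", sum(published_first) / len(published_first))
print("mean of their replications:     ", sum(replications) / len(replications))
# Typical output: first results cluster around 0.65, replications around 0.2,
# i.e. the effect appears to "wear off" purely through selection plus noise.
```

This is only one of the mechanisms the article canvasses, of course, and it hardly accounts for everything reported there; but it shows how an apparent decline can arise without anything about the world changing at all.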
Besides the decline effect, the article also reported on John Crabbe, a neuroscientist in Oregon, who performed a series of experiments on mouse behavior in three different labs: Albany (NY), Edmonton and Portland. He controlled for every variable he could think of: same strain of mice, shipped from the same supplier on the same day, same conditions, same food, same amount of light, even the same kind of surgical glove for when the mice were handled. The expectation was, of course, that any experiment on the mice's behavior should generate the same or very similar results; not so. Very extreme deviations in the results were systematically observed. In one experiment, mice were injected with cocaine (let's not get into the ethical issues here...): in Portland, the mice moved only 600 cm more than they usually would; in Albany, they moved 701 additional cm; in Edmonton, they moved more than 5,000 additional cm! What's up with the Edmonton air? Should we all consider relocating?
As any scientist knows, these days replicability is (almost) everything in science; one of the main accusations against Marc Hauser in the recent scandal was that many of his experiments could not be replicated. But if Crabbe's mice in different cities behaved so differently despite being given nearly identical experimental conditions (or, in any case, the same values for every variable that current scientific standards could possibly deem relevant), are our expectations concerning replicability really reasonable? And if we can't count on replicability for scientific justification, what on earth can we count on? In summary, the picture looks pretty grim, at least as painted by the article. I'd be interested to hear what readers may think of all this: is there something that the article is missing or misconstruing? (Thanks to Eric, here is a link to a pdf version of the article.)
Another way of looking at the whole thing: Hume was spot-on after all!
(Surprising that in the PhilPapers survey the Humean take on laws of nature came out as not that popular, only 24.7% of votes -- mine among them.)
UPDATE: I realize now that Eric had already mentioned the New Yorker article in another post of his! Apparently I haven't been a very assiduous reader of New APPS... (Busy times!)