Testing and confirmation are way too easy in economics.
A lovely paper that documents this claim is: Robert S. Goldfarb (1997), "Now You See It, Now You Don't: Emerging Contrary Results in Economics," Journal of Economic Methodology, 4(2): 221-44. (See also another paper by Goldfarb.)
This is due, I think, to the over-reliance on statistical technique absent fruitful/constraining background theory (that helps pin down the uncertainty in and variance of major parameters). In economics there are few very enduring quantitative results (ones that also pin down major parameters). Too much economic research is sensitive either to changes in our understanding of the relevant data-set and ‘improved’ statistical technique or to changes in social circumstances that have made old claims outdated (because, as I suggested in my comment on Boulding last week, economics may have a reflexive relationship with the 'objects' it studies). Below I quote (without accompanying footnotes and bibliography) from my first published paper in the philosophy of economics, which targets the obsession over testing/confirmation (and the accompanying *abuse* of statistical technique, but for the record: I am not critical of statistics or econometrics as such!). When I wrote the passage below I was ignorant of much of 20th century economics, although, as the larger paper reveals, I was struck that the economists I was reading were very worried about underdetermination in the context of research (without its being motivated by holism or confirmation).
The paper was also a consequence of my disenchantment with my fellow philosophers of science (Bayesians and anti-Bayesians), who seemed to think that confirmation and explanation are the crucial issues in science. By contrast, I think that concept formation (in the context of theory development) is the crucial issue for philosophy of (immature/social) science--hence my interest in proxies (indirect measurement; see also here) and in coining concepts.
My main reason for the extensive self-quotation below (other than my usual self-promotion, and the need to improve my citation-metrics) is a recent encounter with a working paper, "Intervention, underdetermination, and theory generation," by two hot Bayesian philosophers of science, Jan-Willem Romeijn and Jon Williamson. For in this paper Romeijn/Williamson rigorously formulate a version of the crucial problem (how to use experiments to develop theory), and promise (but do not yet deliver) solutions with which "extensions of the statistical model can be guided by intervention data." (I think their work is especially useful in thinking about data from so-called natural experiments.) If they can deliver on their promise, we are heading for exciting philosophic times! (I do worry that the kind of "algorithms" they have in mind may still be vulnerable to heroic assumptions about underlying distributions, but the jury is out.) While I won't drop my general kvetch [that's a technical term] about Bayesianism (one can manipulate priors way too easily, which is why grant-writers adore it; for more technical criticism, see Norton), I welcome reader responses calling attention to the fruitful use of (Bayesian or not) statistics in (scientific) concept formation. (For a priceless exchange that illustrates my kvetch about Bayesianism, see this piece by J. B. Kadane (2006), "Misuse of Bayesian Statistics in Court," CHANCE, 19(2): 38-40; also read the accompanying piece (in the previous link) by Elizabeth Becker, an economist who is a leading expert for hire in these matters!)
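To make the prior-manipulation kvetch concrete, here is a minimal sketch (my own toy numbers in Python, nothing to do with the Kadane/Becker case): the same hypothetical data are run through a textbook conjugate Beta-Binomial update under two conveniently chosen priors, and the reported posterior verdict on the very same hypothesis comes out quite differently.

```python
# A minimal illustration of prior sensitivity in a Beta-Binomial model.
# The data (7 "successes" in 20 trials) and both priors are hypothetical;
# the only point is that the posterior verdict can hinge on a prior the
# analyst is free to choose.
from scipy.stats import beta

k, n = 7, 20          # hypothetical data: 7 successes in 20 trials
threshold = 0.5       # hypothesis of interest: true rate exceeds 0.5

priors = {
    "skeptical prior Beta(2, 8)": (2, 8),    # mass concentrated below 0.5
    "enthusiast prior Beta(8, 2)": (8, 2),   # mass concentrated above 0.5
}

for label, (a, b) in priors.items():
    # Conjugate update: the posterior is Beta(a + k, b + n - k).
    post = beta(a + k, b + n - k)
    p_above = 1 - post.cdf(threshold)
    print(f"{label}: P(rate > {threshold} | data) = {p_above:.2f}")
```

With only twenty hypothetical observations the two analysts report very different posterior probabilities for the same claim; nothing here indicts Bayesianism as such, it merely shows where the wiggle room lives.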
"The complexity of the social world is often a (tacit) presupposition for explaining away data. This mindset is still very prevalent. It shows up in two, related ways of the common practice of many contemporary social scientists.
First, curve-fitting theory and data is often used to “confirm” a theory or call a “test” a success. Let me give a representative example, chosen because of the author’s fame (Nobel laureate), the clarity of prose, and the genuinely interesting effort at using an instance of general equilibrium theory to conduct a comparative study. In a lovely article, Robert Lucas claims to show that “inflation stimulates real output if, and only if, it succeeds in ‘fooling’ suppliers of labor and goods into thinking relative prices are moving in their favor” (Lucas 1973, 333). The study contains a test between two hypotheses based on a comparative analysis, and it makes an admirable effort to articulate both the working background and enabling assumptions. Without going into details of the argument, I call attention to a feature of Lucas’s method. He writes it is difficult to “make a direct comparison of variances . . . in short time-series. Accordingly, it has been necessary to impose a specific, simple structure on the data. . . . This structure accounts for the output and inflation rate movements only moderately well, but well enough to capture the main phenomenon predicted by the natural rate theory: the higher the variance of demand, the more unfavorable are the terms of the Phillips tradeoff” (Lucas 1973, 334). This is the last sentence of the article. There is no attempt to even outline how one could explain or investigate any remaining discrepancies. Leaving aside the problem that “moderately well” turns out to be a statistical (and not an economic) notion (see more on this in the next paragraph), the theory and test are designed in such a way that the remaining, acknowledged discrepancies between theory and data cannot become meaningful to improve the theory. For the structure that is imposed on the data makes it the case that “the deviations from a fitted trend line must average to zero” (Lucas 1973, 330). Systematic deviations have become obscured. What makes Lucas’s “test” possible makes the data uninformative to generate improvements of the theory! There is no feedback mechanism that can enable theorists to learn from systematic failure.
Second, the mindset is evident in how statistics are used in economics (and other social and medical sciences). As Deirdre McCloskey has been arguing, in economics a theory is often said to fit the data by an arbitrary criterion that has nothing to do with the theory that is being tested. It then is said to have “statistical significance.” Leaving aside the fact that many scientists often misapply P-values (see, e.g., García-Berthou and Alcaraz 2004), in the absence of a well-confirmed background theory that informs our choice of parameters, normal distribution, and our conception of the proper notion of causation applicable to the case at hand, the use of “statistical significance” creates the illusion of scientific rigor. Statistics alone cannot decide the goodness or badness of a hypothesis (see McCloskey 1985, chaps. 8 and 9). This is not the place to get into the technical details because the main problem can be stated without such an analysis. Even when the “fit” between data and curve is remarkable, or has so-called “statistical significance,” this should not be the end of the matter. Newton’s methodological innovation is that exceptions can become evidentially meaningful. When one assumes the world is too complex, then one is likely to be tempted to find statistical techniques to impose order and ignore failure. Exceptions should become invitations to further research. But this requires the use of theory in which systematic deviations from expected regularities are a source of possible evidence about the world, not something to be explained away. The design of theory and test must take this into account. This insight is nicely captured by Vernon Smith: “Establishing the anatomy of failure is essential to any research program concerned with modifying the theory” (Smith 1994, 114). Of course, the world must cooperate; the Newtonians tried to explain the empirical violation in the expected orbit of Mercury by positing another planet, Vulcan, between the sun and Mercury to no avail (Smith 1994, 126-27). The systematic deviation was seen as an opportunity to enable refinement and improvement of the theory. The anomaly could only be accounted for by Einstein, who changed the most fundamental working background assumptions. (Schliesser, 2005: 67-9)
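Since the Lucas point in the passage above can look abstract, here is a toy sketch (my own construction in Python, not Lucas's model or data) of how imposing a simple structure guarantees that deviations from the fitted trend average to zero even when those deviations are thoroughly systematic:

```python
# Toy illustration (my construction, not Lucas's model or data): once a
# trend line is fitted by least squares with an intercept, the residuals
# average to zero by construction, so "the deviations average out" can
# never register a systematic mis-specification -- here, a trend whose
# slope doubles halfway through the sample.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100.0)

# "True" process: the slope doubles after t = 50, plus noise.
y = 0.5 * t + 0.5 * np.clip(t - 50, 0, None) + rng.normal(0, 2, size=t.size)

# Impose a single straight-line "structure" on the whole sample.
slope, intercept = np.polyfit(t, y, 1)
residuals = y - (intercept + slope * t)

print(f"mean residual: {residuals.mean():+.3f}")  # ~0, by construction
# ...yet the deviations are anything but unsystematic:
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print(f"lag-1 autocorrelation of residuals: {lag1:+.2f}")  # strongly positive
```

The average residual is (near) zero however badly the single trend line misdescribes the two regimes; only by interrogating the pattern of the residuals, which the imposed structure gives one no incentive to do, does the systematic failure show up.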
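And the point about "statistical significance" being an arbitrary criterion unconnected to the theory under test can be made just as mechanically. In this toy sketch (again my own hypothetical numbers, not McCloskey's examples) a substantively negligible effect sails past the conventional 5 percent hurdle once the sample is large enough:

```python
# Toy illustration of "statistical" versus substantive significance (my own
# hypothetical numbers, not McCloskey's examples): a negligible effect
# becomes highly "significant" in a large enough sample.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
# The true effect of x on y is 0.005 -- trivial next to noise with sd 1.
y = 0.005 * x + rng.normal(size=n)

fit = linregress(x, y)
print(f"estimated effect: {fit.slope:.4f}")
print(f"p-value:          {fit.pvalue:.1e}")        # well below 0.05
print(f"R-squared:        {fit.rvalue ** 2:.8f}")   # a vanishing share of the variance
```

Nothing in the p-value tells us whether a coefficient of 0.005 matters for any economic question; that judgment has to come from precisely the kind of background theory whose absence the passage complains about.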