Is Freakonomics Economics for those who cannot understand Statistics?

by Ken Houghton

Via ESPN came the claim that some Penn professors "refute[d]" the report arguing that Roger Clemens's performance over the last decade of his career was not unique.

This gets us to the Freakonomics blog, with a guest post from Justin Wolfers, which gives credence to Daniel Davies's description of the book's central conceit:
So, for example, sumo wrestlers are "cheating" because wrestlers who need to win a bout to stay in the top league do statistically significantly better when fighting wrestlers for whom the match is a dead rubber. Not "taking sensible steps to minimise the risk of injury". Not "following unwritten social conventions of the sport". Not anything else, but "cheating", and most likely doing so because of bribery and corruption from betting syndicates.

Why do I have such a reaction? Because of what Wolfers says in his guest post:
There’s a pretty neat trick at work [in the report from Clemens's attorneys]: if you compare Clemens only to those who had a terrific last decade of their careers, then the last decade of Clemens’ career doesn’t look that unusual....

So we put together data on all 31 other pitchers since 1968 who started at least 10 games in at least 15 seasons and have pitched at least 3,000 innings...

To be clear, we don’t know whether Roger Clemens took steroids or not. But to argue that somehow the statistical record proves that he didn’t is simply dishonest, incompetent, or both. If anything, the very same data presented in the report — if analyzed properly — tends to suggest an unusual reversal of fortune for Clemens at around age 36 or 37, which is when the Mitchell Report suggests that, well, something funny was going on. [emphases mine]

and, from the NYT tarring article:
Other measures suggest Clemens performed similarly to his contemporaries. But these comparisons do not provide evidence of his innocence; they simply fail to provide evidence of his guilt.

and ends by giving the lie to their own lie:
Statistics provide powerful tools for understanding the world around us, but the value of any analysis invariably comes down to choosing a useful statistic and an appropriate comparison group. Statisticians-for-hire have a tendency to choose comparison groups that support their clients. A careful analysis, and a better informed public, are the best defense against such smoke and mirrors. [again, emphases mine]

You would think, from this, that the evidence presented by the Penn professors and Mr. Wolfers would be somehow sacrosanct, a Holy Grail, a definitive "smackdown" to make Brad DeLong proud.

Instead, we get "all 31 other pitchers since 1968 who started at least 10 games in at least 15 seasons and have pitched at least 3,000 innings." Now, some of these are Hall of Famers: Dennis Eckersley, though maybe not Steve Carlton (whose career began in 1965). And if Carlton counts, then so do Doyle Alexander, Charlie Hough, Milt Pappas, Joe Niekro, Chuck Finley, Danny Darwin, Mike Mussina, and Jamie Moyer, to name a few from this site.

In short, most of those 31 pitchers aren't people whose careers, up until the age of 33, compare to Clemens's. Which is why, not to put too fine a point on it, the Wolfers graphic for walks and hits per inning pitched looks more like an Edgeworth Box than a statistical analysis.

The authors are careful not to name the pitchers in their sample. But they are perfectly willing to cast aspersions on the sample that did name its pitchers, even though the pitchers named (Ryan, Randy Johnson) are generally presumed above suspicion and had early careers rather closer to Clemens's than the Charlie Houghs and Danny Darwins and Jamie Moyers ever did.

So the "statisticians" managed to show that Clemens's final decade is an outlier. But their own graphics show that Clemens's entire career is an outlier from their smoothed data. You might think from this that they would know that their comparison was rather suspect.
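
To make the comparison-group point concrete, here is a toy sketch in Python. Every number in it is invented for illustration; none of it is real pitching data, and the two groups and the `z_score` helper are my own assumptions, not anything taken from either report. It only shows that the same late-career figure can look wildly unusual against one group and entirely ordinary against another.

```python
from statistics import mean, pstdev


def z_score(value, group):
    """Distance of `value` from the mean of `group`, in standard deviations."""
    return (value - mean(group)) / pstdev(group)


# Hypothetical late-career run-prevention figures (lower is better).
# These are made-up numbers, NOT real statistics for any pitcher.
subject = 3.0

# Assumed "broad" group: durable pitchers with ordinary careers.
broad_group = [4.2, 4.5, 4.8, 4.4, 4.6, 4.3, 4.7, 4.5]

# Assumed "matched" group: pitchers whose early careers resembled the subject's.
matched_group = [2.9, 3.2, 3.1, 3.3, 2.8, 3.0]

z_broad = z_score(subject, broad_group)
z_matched = z_score(subject, matched_group)

print(f"vs. broad group:   z = {z_broad:.2f}")
print(f"vs. matched group: z = {z_matched:.2f}")
```

Against the broad group the subject sits many standard deviations from the mean; against the matched group he is well within one. Neither number says anything about guilt or innocence. It says only that the answer depends on who is in the room.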

From their text, they clearly knew it. Also from their text, they didn't and don't care. They got to cast aspersions, got published in the NYT, got pushed on ESPN, and got a nice writeup on the Freakonomics blog.

Let's look at their last sentence again:
A careful analysis, and a better informed public, are the best defense against such smoke and mirrors.

That would be nice. I wonder: do the authors intend to do one?

Nicely done KH.

Medium-term reader; first-time commenter. ;)
Welcome, V.

Agreed that it's a good post. Though I think it's a symptom of Generalized Economist Syndrome to use "tends to suggest" or similar double-qualifiers.
