Thursday, March 27, 2014

"Data" the buzzword vs. data the actual thing

by Noah Smith, Noahpinion, March 25, 2014


I'm a big Nate Silver fan, but let me join the chorus of people looking at his new "data-driven" blog site and saying "WTF?" As far as I can tell, it's barely data-driven at all!

For example, take this post about how climate change is not increasing the cost of natural disasters. The blogger, Roger Pielke, notes that natural disaster losses have slightly decreased since 1989 as a percentage of world GDP, and concludes that climate change is not causing (and will never cause!) increased losses. The post has been much maligned by professional climate scientists for having crappy and misleading data, but put that aside for now. Let's focus on the idea that this post represents "data-driven" journalism at all. It doesn't. 

The "data" in the post consists of one annual time series with a sample size of 23. That's too small to do any sort of statistical analysis on, but then again, the post doesn't do any statistical analysis. It shows a trendline, and from that trendline it draws broad, sweeping conclusions about the effects of climate change. How is that any more "data-driven" than what any blog does? Every newbie blogger and his dog draws a trendline and extrapolates it - and if the blogger is worth his salt, he'll at least have the common decency to qualify his extrapolation with "if this trend continues," which Pielke does not.

Furthermore, Pielke's analysis is just sloppy. What happens if you strip out earthquakes, tsunamis, volcanoes, etc., from the data? What if you extend the trend back 40 years instead of just 23? How does the recent trend compare with the trend from before climate change started significantly affecting global temperatures? And so on.
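To make the sample-size worry concrete, here is a minimal sketch of what a trendline fitted to 23 annual points does and does not tell you. Everything in it is an assumption for illustration: the data are synthetic (not Pielke's series), the function name slope_with_ci is made up, and the interval uses the textbook ordinary-least-squares formula, which assumes independent errors, something annual time series often violate.

```python
import numpy as np

# Two-sided 95% t critical value for 21 degrees of freedom (n = 23, minus 2 fitted parameters).
T_CRIT_DF21 = 2.080


def slope_with_ci(x, y, t_crit=T_CRIT_DF21):
    """Fit y = a + b*x by ordinary least squares; return (b, lower, upper).

    The default t_crit is only appropriate for n = 23 observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc = x - x.mean()                      # centered x values
    sxx = np.sum(xc ** 2)
    b = np.sum(xc * (y - y.mean())) / sxx  # OLS slope
    a = y.mean() - b * x.mean()            # OLS intercept
    resid = y - (a + b * x)
    s2 = np.sum(resid ** 2) / (n - 2)      # residual variance estimate
    se = np.sqrt(s2 / sxx)                 # standard error of the slope
    return b, b - t_crit * se, b + t_crit * se


# Synthetic series: 23 annual values with a small downward drift plus noise.
# These numbers are invented and are NOT the disaster-loss data from the post.
rng = np.random.default_rng(0)
years = np.arange(1990, 2013)  # 1990..2012 inclusive, 23 observations
losses = 0.25 - 0.002 * (years - 1990) + rng.normal(0.0, 0.08, size=years.size)

b, lo, hi = slope_with_ci(years, losses)
print(f"slope per year: {b:.4f}, 95% interval: [{lo:.4f}, {hi:.4f}]")
```

If the interval comfortably includes zero, the trendline alone cannot distinguish a decline from no change, let alone support an extrapolation. Rerunning the same function on a longer window, or on a series with earthquakes and tsunamis stripped out, is the kind of check the questions above are asking for.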

And the economic theory behind the conclusion is even sloppier. What about the costs of mitigation - levees, reinforced buildings, relocation of crops, and the like? That won't show up in loss measurements, but it represents a real cost to the economy. And what about variance? Aren't people risk-averse? As Paul Krugman pointed out the other day, if you try to pretend you're just looking at data without any theory, you've just ignored your hidden theoretical assumptions.

OK, the Pielke post sucks, but that's just one post. Let's look at a few others. 

How about this post by Ben Casselman? This one has barely more data in it than the Pielke post! It shows a single monthly aggregate time series (of voluntary job quits), observes that quits are lower now than in 2008, and concludes that (A) the economy may be becoming less dynamic, and (B) wage gains may be suppressed going forward. This is less sophisticated than the average econ blog post. Well, at least Casselman used the word "may."

And how does this post extract any information from the data? Where is the data analysis connecting quits to dynamism, or to wages? There is none. Instead, Casselman links to a bunch of Wall Street Journal articles, and one speech about "dynamism" by Dennis Lockhart of the Atlanta Fed. The linked articles are all taken to support Casselman's central thesis. This is pure hedgehoggery, not foxiness.

Or take this Casselman post on long-term unemployment. Compare the amount and detail of the data, the sophistication of the analysis, and the care taken in thinking through the data's implications to any of the following Matt O'Brien posts on the same topic: Post 1, Post 2, Post 3, Post 4. Casselman and O'Brien reach the same conclusion, but in terms of "data-driven journalism," Casselman's post is nowhere near O'Brien's league.

Or take this post by Andrew Flowers on whether the labor market is slack or tight. Compare this to the average econ blog post on the topic, in terms of data quantity, data quality, data analysis, and data interpretation. Again, no contest.

Looking at a bunch of other posts, you can see that this is par for the course. And not one of the posts attempts a single quantitative prediction, which is what Silver has famously thrilled the world by doing in the past.

In sum, this so-called "data-driven" website is significantly less data-driven (and less sophisticated) than Business Insider or Bloomberg View or The Atlantic. It consists nearly entirely of hedgehoggy posts supporting simplistic theories with sparse data and zero statistical analysis, making no quantitative predictions at all. It has no relationship whatsoever to the sophisticated analysis of rich data sets for which Nate Silver himself has become famous.

The problem with the new FiveThirtyEight is not one of data vs. theory. It is one of "data" the buzzword vs. data the actual thing. Nate Silver is a hero of mine, but this site is not living up to its billing at all.
