I was stunned to see a recent paper on fiscal multipliers use a 90% confidence interval, which seemed far too lenient. After all, economics and many other sciences suffer from problems such as data mining, publication bias, and an inability to replicate findings. I'd like to see the standard statistical significance cut-off raised from 95% to something stronger, maybe 98%. When I wrote that recent post, I wondered whether I was making some elementary error, as econometrics is not my strong suit.
It turns out the problem is even worse than I assumed. Indeed, Ryan Murphy recently published a study of fiscal multiplier research (in Econ Journal Watch) and found that many studies use 68%!!
In recent decades, vector autoregression, especially structural vector autoregression, has been used to study the size of the government spending multiplier (Blanchard and Perotti 2002; Fatás and Mihov 2001; Mountford and Uhlig 2009). Such methods are used in a significant proportion of empirical research designed to estimate the multiplier (see Ramey 2011a). Despite being published in respected journals and cited by prominent members of the profession, much of this literature does not use the conventional standard of statistical significance that economists are accustomed to in empirical research.
Results in the literature on the fiscal multiplier are typically communicated using a graph of the estimated impulse-response functions. For instance, the effect of government spending on output may be reported by reproducing a graph of an impulse-response function for a one-unit (generally, one percentage point or one standard deviation) change in government spending. The graph would show the percent change in output over time following the change in government spending. To report statistical significance, authors of these studies may then draw confidence bands around the impulse-response function. Ostensibly, if zero lies outside the confidence band, the estimate is statistically distinguishable from zero.

But very frequently in this literature the confidence bands correspond to only one standard error. In other words, instead of representing what corresponds to rejecting the null hypothesis at a 90% or 95% level, the confidence bands correspond to rejecting the null hypothesis at a 68% level. By conventional standards, this confidence band is insufficient for hypothesis testing.

Not every useful empirical study must achieve significance at the 95% level to be considered meaningful, of course, but a pattern of studies which do not use and reach the conventional benchmark is a cause for attention and perhaps concern. Statistical significance is not the only standard by which we should judge empirical research (Ziliak and McCloskey 2008). It is, however, a useful standard, and still an important one. Here I examine papers in the fiscal multiplier literature which apply vector autoregression methods. Sixteen of the thirty-one papers identified use narrow, one-standard-error confidence bands to the exclusion of confidence bands corresponding to the conventional standard of 90% or 95% confidence. This practice will often not be clear to the reader of a paper unless its text is read rather carefully.
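To make the arithmetic behind Murphy's point concrete: under a normality assumption on the sampling distribution (an assumption I'm adding for illustration, not something the study requires), a band of plus-or-minus k standard errors covers the true value with probability Φ(k) − Φ(−k). A quick sketch shows why a one-standard-error band equals only about 68% confidence, while the conventional 90% and 95% standards demand roughly 1.645 and 1.96 standard errors:

```python
from math import erf, sqrt

def normal_coverage(k):
    """Two-sided coverage of a +/- k standard-error band under a
    normal sampling distribution: Phi(k) - Phi(-k) = erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

# A one-SE band: the bands Murphy flags in much of the VAR literature.
print(f"+/- 1.000 SE covers {normal_coverage(1.0):.1%}")
# The conventional benchmarks the excerpt mentions.
print(f"+/- 1.645 SE covers {normal_coverage(1.645):.1%}")
print(f"+/- 1.960 SE covers {normal_coverage(1.96):.1%}")
```

The first line prints roughly 68.3%, the second roughly 90.0%, and the third roughly 95.0%, so reading a one-standard-error band as if it were a conventional confidence band nearly halves the implied evidentiary standard.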
I can’t even fathom what people are thinking when they use 68%. It seems like something you’d see in The Onion, and yet apparently this stuff gets published. Can someone help me here: what am I missing?