A Microfounded Monster
I just read a very interesting new paper (via Mark Thoma) from the Center for Financial Studies at Goethe University, titled “Complexity and monetary policy”. The paper probably filled a “critical gap” in someone’s knowledge toolbox, but it failed to consider certain “meta level” deficiencies in its methodology, and it ignored some implicit assumptions about modeling philosophy. The authors certainly acknowledged limitations of the DSGE mindset, but did not consider the rich and interesting consequences thereof. I will try to do that, but first, some context.
A summary
The authors, Athanasios Orphanides and Volker Wieland, set out to test a general policy rule, along with its model-specific optimal parameterizations, across 11 models. The models examined fall broadly under four categories:
- Traditional Keynesian (as formalized by Dieppe, Kuester, and McAdam in 2005)
- New Keynesian (less rigorous without household budget optimization)
- New Keynesian (more rigorous monetary business cycle models)
- DSGEs built post-crisis
So mostly DSGEs, and DSGE-lites.
The general monetary policy rule considered is stated as,
i_t = ρi_t−1 + α(p_t+h − p_t+h−4) + βy_t+h + β'(y_t+h − y_t+h−4)
where i is the short-term nominal interest rate and ρ is the interest-rate smoothing parameter. p is the log of the price level at time t, so p_t+h − p_t+h−4 captures the (continuously compounded) four-quarter inflation rate and α is the policy's sensitivity to inflation. y is the deviation of output from its level under flexible wage conditions, so β represents the policy's sensitivity to the output gap and β' its sensitivity to the gap's growth rate. h is the forecast horizon under consideration (limited to multiples of 2).
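To keep the moving parts straight, here is a minimal sketch of the rule in Python. The function and argument names are mine, and it assumes the (possibly forecast) series for p and y are already available as indexable arrays, which is a simplification of how the paper generates them.

```python
def policy_rate(i_prev, p, y, t, h, rho, alpha, beta, beta_prime):
    """Short-term nominal rate implied by the general rule
    i_t = rho*i_{t-1} + alpha*(p_{t+h} - p_{t+h-4})
          + beta*y_{t+h} + beta_prime*(y_{t+h} - y_{t+h-4})."""
    inflation = p[t + h] - p[t + h - 4]      # four-quarter inflation (log difference)
    output_gap = y[t + h]                    # output gap at the chosen horizon
    gap_growth = y[t + h] - y[t + h - 4]     # four-quarter change in the gap
    return (rho * i_prev
            + alpha * inflation
            + beta * output_gap
            + beta_prime * gap_growth)
```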
The model-specific optimal parameters are compared against three benchmark rules (expressed as parameter restrictions in the sketch after this list):
- The well-known Taylor rule (where policy responds only to current inflation and the output gap, i.e. ρ = β' = h = 0)
- A simple differences rule (ρ = 1, α = β’ = 0.5, β = 0)
- Gerdesmeier and Roffia (GR: ρ = α = 0.66, β = 0.1, β' = h = 0)
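For concreteness, each benchmark is just a restriction on the parameters of the general rule above. Here is a sketch reusing the hypothetical policy_rate function; the Taylor-rule coefficients α = 1.5 and β = 0.5 are the standard Taylor (1993) values and reflect my reading, not a quotation from the paper.

```python
# Benchmark rules expressed as parameter restrictions on the general rule.
BENCHMARKS = {
    "taylor":      dict(rho=0.0,  alpha=1.5,  beta=0.5, beta_prime=0.0, h=0),
    "differences": dict(rho=1.0,  alpha=0.5,  beta=0.0, beta_prime=0.5, h=0),
    "GR":          dict(rho=0.66, alpha=0.66, beta=0.1, beta_prime=0.0, h=0),
}

# Usage: rate = policy_rate(i_prev, p, y, t, **BENCHMARKS["taylor"])
```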
The “fitness” of each policy rule for a given DSGE is measured by a loss function defined as,
L_m = Var(π) + Var(y) + Var(∆i)
or the (here equally) weighted sum of the unconditional variances of the deviation of inflation from target, the output gap, and the change in the interest rate. The best parameters by L for each model, and how each model-specific optimum compares against the standard policy rules, are tabulated in the paper.
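A minimal sketch of this loss, with my own simplification of computing sample variances from simulated series instead of the unconditional variances implied by each model's solution:

```python
import numpy as np

def model_loss(pi, y, i, w_pi=1.0, w_y=1.0, w_di=1.0):
    """L_m = w_pi*Var(pi) + w_y*Var(y) + w_di*Var(delta_i).

    pi: deviations of inflation from target
    y:  output gaps
    i:  short-term nominal interest rates
    The default (equal) weights reproduce the loss written above.
    """
    delta_i = np.diff(i)   # change in the policy rate
    return w_pi * np.var(pi) + w_y * np.var(y) + w_di * np.var(delta_i)
```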
However, as the authors note (and as the paper's table 3, discussed below, shows), the best parameter set for one model can be far from optimal in another, producing explosive and erratic behavior.
To “overcome” this obstacle, Orphanides and Wieland use Bayesian model averaging, starting with flat priors, to minimize L across all models. That is, they look for the policy parameters that minimize
(1/M) ∑_{m=1}^{M} L_m, where M is the total number of models.
Under this procedure, the optimal average policy is:
i_t = 0.96i_t−1 + 0.30π_t + 0.19y_t + 0.31(y_t − y_t−4)
Indeed, as we would expect, it performs fairly well against the model-specific optima, exceeding the optimal L by more than 50% in no more than two of the eleven models.
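Mechanically, the averaging step is simple. Below is a hedged sketch assuming each of the M models is represented by a callable mapping the rule parameters (ρ, α, β, β′) to its implied loss L_m; the callables and scipy's Nelder–Mead optimizer are my stand-ins, not the authors' actual solution method.

```python
import numpy as np
from scipy.optimize import minimize

def average_loss(params, model_losses):
    """(1/M) * sum of L_m(params) over the M models (flat priors)."""
    return np.mean([loss(params) for loss in model_losses])

def optimal_average_rule(model_losses, start=(0.9, 0.5, 0.2, 0.2)):
    """Search for (rho, alpha, beta, beta_prime) minimizing the average loss."""
    result = minimize(average_loss, x0=np.array(start),
                      args=(model_losses,), method="Nelder-Mead")
    # With the paper's 11 models, the authors report a rule near
    # (rho, alpha, beta, beta_prime) = (0.96, 0.30, 0.19, 0.31).
    return result.x
```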
The authors then similarly derive optimal averages within subsets of models and perform out-of-sample tests on the excluded ones. They further consider how output gap mismeasurement affects L and the optimal parameters. The paper describes this in further detail, and it is – in my opinion – irrelevant to the paper's ultimate conclusion: that the simple first-differences rule giving equal weight to inflation and the growth rate of the output gap is fairly robust, as long as it is targeted on outcomes rather than forecasts.
A critique
More than anything, this paper silently reveals the limitations of model-based policy decisions in the first place. Here’s the silent-but-deadly assertion in the paper:
The robustness exhibited by the model-averaging rule is in a sense, in sample. It performs robustly across the sample of 11 models that is included in the average loss that the rule minimizes. An open question, however, is how well such a procedure for deriving robust policies performs out of sample. For example, what if only a subset of models is used in averaging? How robust is such a rule in models not considered in the average loss?
The operative “what if” propels the authors to test subsets within their arsenal of just eleven models. They never even mention that their total set is a minuscule part of an infinite set S of all possible models. Of course, a whopping majority of this infinite set will be junk models with idiotic assumptions like Calvo pricing, or perfectly rational utility monsters, or intertemporal minimization of welfare, which aggregate into nonsense.
Those who, like me, are uncomfortable with predictive modeling may reject the notion of such a set altogether; however, the implicit assumption of this paper (and of modelers in general) is that there is some near-optimal model that perfectly captures all economic dynamics. To the extent that none of the models in the considered set C meets this criterion (hint: they don't), S must exist and C is ipso facto a subset of it.
The next unconsidered assumption, then, is that C is a representative sample of S. Think of each model as occupying a point in an n-dimensional space; the assumption is that C is a random selection of points from S. But C is actually just a corner of S, for three reasons:
- They all by-and-large assume perfect rationality, long-run neutrality, immutable preferences, and sticky wages.
- Economics as a discipline is path dependent. That is, each model builds on the ones before it. Therefore, there may be an unobserved dynamic that a near-ideal model has to capture, which all the designed ones miss.
- S exists independently of mathematical constraints. That is, since all considered models are by definition tractable, they may all miss certain aspects necessary for an optimal model.
But if the eleven considered models are just a corner of all possible models, the Bayesian average means nothing. Moreover, I think it's fundamentally wrong to calculate this average from equal priors on each model. There are four classes of models involved, within which many of the assumptions and modes of aggregation are very similar. Therefore, to the extent that there is correlation within classes (and the authors go on to show that there is), the traditional Keynesian model is unfairly underweighted because it is the only one in its class. There are many more than 11 formalized models: what if we used 5 traditional models? What if we used 6? What if one of them was crappy? This chain of thought illustrates the fundamental flaw with “Bayesian” model averaging.
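To see the underweighting concretely, here is a toy calculation. The split of the 11 models across the four classes is my guess for illustration; only the fact that the traditional Keynesian class contains a single model comes from the setup described above.

```python
# Hypothetical class sizes summing to the paper's 11 models.
class_sizes = {"traditional_keynesian": 1, "nk_lite": 4,
               "nk_monetary_bc": 4, "post_crisis_dsge": 2}
M = sum(class_sizes.values())  # 11

# Weight each class carries under flat priors over models...
flat_class_weight = {c: n / M for c, n in class_sizes.items()}
# ...versus flat priors over classes (then split equally within a class).
equal_class_weight = {c: 1 / len(class_sizes) for c in class_sizes}

print(flat_class_weight)   # traditional Keynesian class: ~0.09
print(equal_class_weight)  # traditional Keynesian class: 0.25
```

The particular numbers don't matter; the point is that the averaged rule moves with an essentially arbitrary choice of prior.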
And by the way, Bayesian thinking requires that we have some good way of forming priors (a heuristic, say) and some good way of knowing when they need to be updated. As far as models are concerned, we have neither. If I give you two models, both with crappy out-of-sample performance, can you seriously form a relative prior on their efficacy? That's what I thought. So the intuitive premise of Bayesian updating is simply not met here.
I did notice one thing the authors ignored. It might be my confirmation bias, but the best performing model was also the one completely ignored in the further analysis: not surprisingly, the traditional Keynesian formalization. Go back to table 3, which shows how each model-specific policy rule performs in each model. You see a lot of explosive behavior or equilibrium indeterminacy (indicated by ∞). But see how well the Keynesian-specific policy rule does (column 1): it has the best worst case across all of the considered models. Its robustness does not end there; consider how all the other model-specific rules do on the Keynesian model (row 1), where it again “wins” in terms of average loss and best worst case.
The optimal policy parameters across the models look almost random. There is no reason to believe there is something “optimal” about figures like 1.099 in an economy; human systems don't work that way. What we do know is that thirty years of microfoundations have perhaps enriched economic thinking and refinement, but have done nothing at all for policy confidence. Right now, it would be beyond ridiculous for the ECB to debate which model, or which set of models to conduct Bayesian averages over, should be used to decide interest rate policy.
Why do I say this? As late as 2011, the ECB was raising rates even though nominal income had flatlined for years and heroin addiction was afflicting the unemployed kids. The market monetarists told us austerity would be offset, while the ECB asphyxiated the continent's growth. We know that lower interest rates increase employment. We know that quantitative easing does at least a little good.
And yet, it does not look like the serious economists care. Instead we're debating nonsense like the best way to average a bunch of absolutely unfounded models in an arbitrary context. This is intimately connected with the debate over the AS-AD model all over the blogosphere in recent weeks. This is why we teach AS-AD and IS-LM as ends in themselves rather than merely as means to something else. AS-AD does not pretend to possess any predictive power, but it maps the thematic movements of an economy, which, to a policymaker, is far more important. IS-LM tells you that when the equilibrium interest rate is below zero, markets cannot clear and fiscal policy will not increase yields. It also tells you that the only way for monetary policy to reach out of a liquidity trap is to “credibly commit to staying irresponsible”. Do we have good microfoundations for the way inflationary expectations are formed?
It was a good, well-written, and readable paper. It also ignored its most interesting implicit assumptions without which we cannot ascribe a prior for its policy relevance.
Ashok, In other words: “A load of crock (i.e. BS)!”
Yep, you should have told me that before I wrote 1,000 words 😉