Commit 9092422

Tom's July 17 edits of calvo_ML lecture
1 parent fed912e commit 9092422

File tree

1 file changed (+77, -42 lines changed)

lectures/calvo_gradient.md

Lines changed: 77 additions & 42 deletions
@@ -14,55 +14,68 @@ kernelspec:
# Machine Learning a Ramsey Plan


-In this lecture, we'll study the same Ramsey problem that we also study in this quantecon lecture
+## Introduction
+
+This lecture studies a problem that we also study in another quantecon lecture
{doc}`calvo`.

-In that lecture, we an analytic approach based on ``dynamic programming squared`` to guide computation of a Ramsey plan.
+That lecture used an analytic approach based on ``dynamic programming squared`` to guide computation of a Ramsey plan in a version of a model of Calvo {cite}`Calvo1978`.
+
+Dynamic programming squared guided the calculations in that lecture by providing much useful information about mathematical objects in terms of which the Ramsey plan can be represented recursively.

-Dynamic programming squared provided us with much useful information about mathematical objects that represent a Ramsey plan recursively and how to compute it efficiently.
+That paved the way to computing the Ramsey plan efficiently.

-Included in that information are descriptions of
+Included in the structural information that dynamic programming squared provided in quantecon lecture {doc}`calvo` are descriptions of

-* the **state** variable confronting a continuation Ramsey planner
+* a **state** variable confronting a continuation Ramsey planner, and
* two Bellman equations
  * one that describes the behavior of the representative agent
  * another that describes the decision problems of a Ramsey planner and of a continuation Ramsey planner


-In this lecture, we approach the Ramsey planner in a less sophisticated way that proceeds not knowing any of the structure imparted by dynamic programming squared.
+In this lecture, we approach the Ramsey planner in a much less sophisticated way that proceeds without knowing the structure imparted by dynamic programming squared.

-Instead, we use a brute force **machine learning** approach that naively states the Ramsey problem
+Instead, we use a brute force approach that naively states the Ramsey problem
in terms of a pair of infinite sequences of real numbers that the Ramsey planner chooses
* a sequence $\vec \theta$ of inflation rates
* a sequence $\vec \mu$ of money growth rates

+We take the liberty of calling this a **machine learning** approach because it fails to take advantage of the structure exploited by dynamic programming squared, at the cost of proliferating parameters.
+
+This is what many machine learning algorithms do.
+
+Comparing the calculations in this lecture with those in our sister lecture {doc}`calvo` provides us
+with a good laboratory to help appreciate the promises and limits of machine learning approaches
+more generally.
+
We'll actually deploy two machine learning approaches, one more naive than the other.


-* the first is really lazy.
-  * it just hands a Python function that computes the Ramsey planner's objective over to a gradient descent algorithm
-* the second is less lazy.
-  * it exerts the effort required to express the Ramsey planner's criterion as an affine quadratic form in $\vec \mu$, computes first-order conditions for an optimum, and solves the resulting system of simultaneous linear equations for $\vec \mu$ and then $\vec \theta$.
+* the first is really lazy
+  * it just writes a Python function that computes the Ramsey planner's objective as a function of a money growth rate sequence and then hands it over to a gradient descent optimizer
+* the second is less lazy
+  * it exerts the effort required to express the Ramsey planner's objective as an affine quadratic form in $\vec \mu$, computes first-order conditions for an optimum, and solves the resulting system of simultaneous linear equations for $\vec \mu$ and then $\vec \theta$.
+
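An editorial sketch of the "really lazy" first approach described in the bullets above: build the objective as a plain function of a truncated money growth path and hand it to a generic gradient-ascent loop. Everything here is illustrative, not the lecture's code: the parameter values, the quadratic stand-in for $U(-\alpha\theta_t)$, the truncation at $T$, and the finite-difference gradient (the lecture itself uses JAX's `grad`).

```python
import numpy as np

# Illustrative parameters (not the lecture's defaults).
alpha, beta, c, T = 1.0, 0.85, 2.0, 20
h0, h1, h2 = 1.0, 0.5, -0.5        # stand-in for U(-alpha*theta) = h0 + h1*theta + h2*theta**2
lam = alpha / (1 + alpha)

def V(mu):
    """Ramsey criterion for a mu-path held constant at mu[-1] from T-1 onward."""
    theta = np.empty(T)
    theta[-1] = mu[-1]                          # constant tail: geometric sum collapses
    for t in range(T - 2, -1, -1):              # theta_t = (1-lam)*mu_t + lam*theta_{t+1}
        theta[t] = (1 - lam) * mu[t] + lam * theta[t + 1]
    s = h0 + h1 * theta + h2 * theta**2 - (c / 2) * mu**2
    disc = beta ** np.arange(T)
    # discounted sum, with the terminal period treated as a perpetuity
    return disc[:-1] @ s[:-1] + beta ** (T - 1) / (1 - beta) * s[-1]

def gradient_ascent(f, x0, lr=0.1, steps=3000, eps=1e-6):
    """The 'really lazy' step: climb f using a forward-difference gradient."""
    x = x0.copy()
    for _ in range(steps):
        fx = f(x)
        g = np.array([(f(x + eps * np.eye(len(x))[i]) - fx) / eps
                      for i in range(len(x))])
        x = x + lr * g
    return x

mu_star = gradient_ascent(V, np.zeros(T))
```

Because the criterion is concave in $\vec \mu$ here, the ascent converges from any starting point; the point of the exercise is that the optimizer never needs to know anything about the model's recursive structure.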
+While both of these machine learning (ML) approaches succeed in recovering the Ramsey plan computed via dynamic programming squared in quantecon lecture {doc}`calvo`, they don't reveal the structure that is exploited in that lecture.
+
+That structure lies hidden within the answers provided by our ML approach.

-While these machine learning (ML) approaches succeed in recovering the same Ramsey plan computed in
-this quantecon lecture {doc}`calvo`, they don't reveal the structure that is exploited in that
-lecture's application of dynamic programming squared.
+We can ferret out that structure if only we ask the right questions.

-But that structure is lurking in the answers provided by our ML approach, if only we ask exactly the right questions.
+At the end of this lecture we show what those questions are and how they can be answered by running particular linear regressions on components of
+$\vec \mu, \vec \theta$.

-Those questions can be answered by running particular linear regressions on components of
-$\vec \mu, \vec \theta$, as we show at the end of this lecture.
+Application of human intelligence, not the artificial intelligence exhibited in our machine learning approaches, is a key input into figuring out what regressions to run.



## The Model

-The basic model is linear-quadratic version of a model that Guillermo Calvo {cite}`Calvo1978` used to illustrate the **time inconsistency** of optimal government
-plans.
+We study a linear-quadratic version of a model that Guillermo Calvo {cite}`Calvo1978` used to illustrate the **time inconsistency** of optimal government plans.


The model focuses attention on intertemporal tradeoffs between

-- welfare benefits that a representative agent's anticipations of future deflation generate by decreasing costs of holding real money balances and thereby increasing a representative agent's *liquidity*, as measured by holdings of real money balances, and
-- costs associated with the distorting taxes that the government levies to acquire the paper money that it destroys in order to generate anticipated deflation
+- utility that a representative agent's anticipations of future deflation generate by decreasing costs of holding real money balances and thereby increasing the agent's *liquidity*, as measured by holdings of real money balances, and
+- social costs associated with the distorting taxes that the government levies to acquire the paper money that it destroys in order to generate anticipated deflation

The model features


@@ -136,7 +149,7 @@ the linear difference equation {eq}`eq_grad_old2` can be solved forward to get:
```{math}
:label: eq_grad_old3

-\theta_t = \frac{1}{1+\alpha} \sum_{j=0}^\infty \left(\frac{\alpha}{1+\alpha}\right)^j \mu_{t+j}
+\theta_t = \frac{1}{1+\alpha} \sum_{j=0}^\infty \left(\frac{\alpha}{1+\alpha}\right)^j \mu_{t+j}, \quad t \geq 0
```
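An editorial aside on this forward solution: it implies the backward recursion $\theta_t = \frac{1}{1+\alpha}\mu_t + \frac{\alpha}{1+\alpha}\theta_{t+1}$, which is how a truncated $\vec \theta$ can be evaluated in practice. A minimal sketch (the value of $\alpha$ and the function name are illustrative, not the lecture's code):

```python
import numpy as np

alpha = 1.0                          # illustrative value, not the lecture's default
lam = alpha / (1 + alpha)            # the geometric weight alpha/(1+alpha)

def theta_path(mu):
    """Evaluate theta_t = (1/(1+alpha)) * sum_{j>=0} lam**j * mu_{t+j},
    treating mu as constant at its last entry from then on."""
    theta = np.empty(len(mu))
    theta[-1] = mu[-1]               # a constant tail makes the geometric sum collapse
    for t in range(len(mu) - 2, -1, -1):
        theta[t] = (1 - lam) * mu[t] + lam * theta[t + 1]
    return theta

# Sanity check: constant money growth implies theta_t = mu for every t.
print(theta_path(np.full(5, 0.03)))   # [0.03 0.03 0.03 0.03 0.03]
```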

```{note}
@@ -180,29 +193,33 @@ $t$ when it changes the stock of nominal money
balances at rate $\mu_t$.

Therefore, the one-period welfare function of a benevolent government
-is:
+is
+

$$
-v_0 = \sum_{t=0}^\infty \beta^t s(\theta_t, \mu_t)
+s(\theta_t,\mu_t) = U(-\alpha \theta_t) - \frac{c}{2} \mu_t^2 .
$$

-where $\beta \in (0,1)$ is a discount factor and the goverment's one-period welfare function is
+The Ramsey planner's criterion is

$$
-s(\theta_t,\mu_t) = U(-\alpha \theta_t) - \frac{c}{2} \mu_t^2 .
-$$
+V = \sum_{t=0}^\infty \beta^t s(\theta_t, \mu_t)
+$$ (eq:RamseyV)

+where $\beta \in (0,1)$ is a discount factor.

+The Ramsey planner chooses
+a vector of money growth rates $\vec \mu$
+to maximize criterion {eq}`eq:RamseyV` subject to equation {eq}`eq_grad_old3`.




-## Parameters and variables

-We want to compute a vector of money growth rates $(\mu_0, \mu_1, \ldots, \mu_{T-1}, \bar \mu)$
-to maximize the function $\tilde V$ below.

-We'll start by setting them at the default values from {doc}`calvo`.
+
+## Parameters and variables
+

**Parameters** are

@@ -229,7 +246,7 @@ We'll start by setting them at the default values from {doc}`calvo`.

### Basic objects

-To prepare the way for our calculations, we'll remind ourselves of the key mathematical objects
+To prepare the way for our calculations, we'll remind ourselves of the mathematical objects
in play.

* sequences of inflation rates and money creation rates:
@@ -311,9 +328,9 @@ $$
\theta_t = \bar \theta \quad \forall t \geq T
$$

-**Formula for truncated $\vec \theta$ **
+**Formula for truncated $\vec \theta$**

-In light of our approximation, we now seek a function that takes
+In light of our approximation that $\mu_t = \bar \mu$ for all $t \geq T$, we now seek a function that takes

$$
\tilde \mu = \begin{bmatrix}\mu_0 & \mu_1 & \cdots & \mu_{T-1} & \bar \mu
@@ -652,7 +669,7 @@ First, recall that a Ramsey planner chooses $\vec \mu$ to maximize the governmen
We now define a distinct problem in which the planner chooses $\vec \mu$ to maximize the government's value function {eq}`eq:Ramseyvalue` subject to equation {eq}`eq:inflation101` and
the additional restriction that $\mu_t = \bar \mu$ for all $t$.

-The solution of this problem is a single $\mu$ that this quantecon lecture {doc}`calvo` calls $\mu^{CR}$.
+The solution of this problem is a time-invariant $\mu_t$ that this quantecon lecture {doc}`calvo` calls $\mu^{CR}$.

```{code-cell} ipython3
# Initial guess for single μ
@@ -790,7 +807,7 @@ V = \sum_{t=0}^\infty \beta^t (h_0 + h_1 \theta_t + h_2 \theta_t^2 -
V = \sum_{t=0}^\infty \beta^t (h_0 + h_1 \theta_t + h_2 \theta_t^2 -
\frac{c}{2} \mu_t^2 )
$$

-With out assumption above, criterion $V$ can be rewritten as
+With our assumption above, criterion $V$ can be rewritten as

$$
\begin{align*}
@@ -916,7 +933,7 @@ print(f'deviation = {np.linalg.norm(optimized_μ - clq.μ_series)}')
compute_V(optimized_μ, β=0.85, c=2)
```

-We find that, with a simple understanding of the structure of the problem, we can significantly speed up our computation.
+We find that by exploiting more knowledge about the structure of the problem, we can significantly speed up our computation.

We can also derive a closed-form solution for $\vec \mu$

@@ -989,10 +1006,29 @@ closed_grad
print(f'deviation = {np.linalg.norm(closed_grad - (- grad_J(jnp.ones(T))))}')
```

-### Some regressions
+## Informative regressions

In the interest of looking for some parameters that might help us learn about the structure of
-the Ramsey plan, we shall some least squares linear regressions of various components of $\vec \theta$ and $\vec \mu$ on others.
+the Ramsey plan, we shall compute some least squares linear regressions of particular components of $\vec \theta$ and $\vec \mu$ on others.
+
+These regressions will reveal structure that is hidden within the $\vec \mu^R, \vec \theta^R$ sequences associated with the Ramsey plan.
+
+It is worth pausing here to note the roles played by human intelligence and artificial intelligence (ML).
+
+AI (a.k.a. ML) is running the regressions for us.
+
+But you can regress anything on anything else.
+
+Human intelligence is telling us which regressions to run.
+
+And when we have those regressions in hand, considerably more human intelligence is required to fully
+appreciate what they reveal about the structure of the Ramsey plan.
+
+```{note}
+At this point, an advanced reader might want to read Chang {cite}`chang1998credible` and think about why Chang takes
+$\theta_t$ as a key state variable.
+```
+
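As a concrete editorial sketch of what "running the right regression" looks like: regress one component on a constant and another and inspect the $R^2$. The data below are synthetic stand-ins generated from an exact linear rule (the coefficients `b0`, `b1` are hypothetical, not the lecture's Ramsey output), so the regression recovers the rule with an $R^2$ of 1, the same perfect-fit signature discussed at the end of the lecture.

```python
import numpy as np

# Synthetic stand-in data: mu_t generated from an exact linear rule in theta_t,
# mimicking the kind of hidden structure the regressions are meant to uncover.
b0, b1 = 0.05, -0.92                 # hypothetical coefficients
theta = np.linspace(-0.10, 0.10, 50)
mu = b0 + b1 * theta

# Least squares regression of mu_t on a constant and theta_t.
X = np.column_stack([np.ones_like(theta), theta])
coef, *_ = np.linalg.lstsq(X, mu, rcond=None)

resid = mu - X @ coef
r_squared = 1 - resid @ resid / ((mu - mu.mean()) @ (mu - mu.mean()))
print(coef, r_squared)   # recovers (b0, b1); r_squared is numerically 1
```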


```{code-cell} ipython3
# Compute θ using optimized_μ
@@ -1124,6 +1160,5 @@ where $b_0, b_1, g_0, g_1, g_2$ were positive parameters that the lecture comput

By running regressions on the outcomes $\vec \mu^R, \vec \theta^R$ that we have computed with the brute force gradient descent method in this lecture, we have recovered the same representation.

-However, in this lecture we have more or less discovered the representation by brute force -- i.e.,
-just by running some regressions and staring at the result, noticing that the $R^2$ of unity tell us
-that the fits are perfect.
+However, in this lecture we have discovered the representation partly by brute force -- i.e.,
+just by running some well chosen regressions and staring at the results, noticing that $R^2$'s of unity tell us that the fits are perfect.
