6. Principle of Optimality and Dynamic Programming
本章主题:最优性原理与动态规划。 Bellman 的最优性原理给出序列问题等价于递归规划问题(即贝尔曼方程)的条件。§6.1 离散时间:序列问题 (6.1) 写成递归的贝尔曼方程 \(V(x)=\max_{y\in\Gamma(x)}[F(x,y)+\beta V(y)]\)(6.2,泛函方程),\(g(x)\) 为最优决策规则;最优性原理 \(V^\star(x)=V(x)\);由 FOC (6.3) + 包络定理 \(V'(x)=F_x(x,g(x))\)(6.4)代回得到 EE。§6.2 连续时间:贝尔曼方程 \(\rho V(x)=\max_u[h(x,u)+V'(x)g(x,u)]\)(6.5)、FOC (6.6)、两个泛函方程 (6.7) 解出 \(V\) 与 \(u^\star\)。§6.3 贝尔曼方程与最大值原理:令 \(\lambda=V'(x)\) 即得哈密顿量的最优控制条件,对 (6.7) 求导经 (6.8) 推出协态运动规律 \(\dot\lambda=\rho\lambda-H_x(x,u,\lambda)\)。
Chapter theme: principle of optimality and dynamic programming. Bellman's principle of optimality gives the conditions for a sequence problem to be equivalent to a recursive programming problem (the Bellman equation). §6.1 Discrete time: the sequence problem (6.1) written recursively as the Bellman equation \(V(x)=\max_{y\in\Gamma(x)}[F(x,y)+\beta V(y)]\) (6.2, a functional equation), with \(g(x)\) the optimal decision rule; the principle of optimality \(V^\star(x)=V(x)\); from FOC (6.3) + the envelope theorem \(V'(x)=F_x(x,g(x))\) (6.4), substituting back recovers EE. §6.2 Continuous time: the Bellman equation \(\rho V(x)=\max_u[h(x,u)+V'(x)g(x,u)]\) (6.5), FOC (6.6), two functional equations (6.7) solving for \(V\) and \(u^\star\). §6.3 Bellman equation and the maximum principle: set \(\lambda=V'(x)\) to recover the Hamiltonian's optimal-control condition; differentiating (6.7) through (6.8) yields the co-state law of motion \(\dot\lambda=\rho\lambda-H_x(x,u,\lambda)\).
Bellman 的最优性原理(Principle of Optimality) 给出了一个序列问题等价于一个递归规划问题的条件,该递归规划问题称为贝尔曼方程(Bellman equation)。
6.1 Discrete Time
此前已得到离散状态序列问题: $$V^\star(x_0)=\max_{\{x_{t+1}\}_{t=0}^\infty}\sum_{t=0}^\infty\beta^t F(x_t,x_{t+1})\quad\text{s.t.}\quad x_{t+1}\in\Gamma(x_t),\ \text{for }\forall t\ge0,\quad x_0\text{ given}\tag{6.1}$$ 现在以递归方式重写该问题,使其看起来像一个两期问题。
6.1.1 贝尔曼方程
贝尔曼方程是一个递归问题,其解是满足如下条件的函数 \(V:X\to\mathbb R\): $$V(x)=\max_{y\in\Gamma(x)}[F(x,y)+\beta V(y)],\quad\text{for }\forall x\in X\tag{6.2}$$ 由于解是一个函数,这是一个泛函方程(functional equation)。若我们知道 \(V(x)\) 的函数形式,则 (6.2) 变成一个两期问题。沿用此前同样的逻辑,记 \(g(x)\) 为最优决策规则——它也是 (6.2) 右边的最大化者、以及两期问题的解: $$V(x)=F(x,g(x))+\beta V(g(x))$$
6.1.2 最优性原理
最优性原理是 $$V^\star(x)=V(x),\quad\text{for }\forall x\in X$$ 这意味着:求解 (6.2) 中关于 \(V\) 与 \(g\) 的两期问题,等价于求解 (6.1) 中关于 \(V^\star\) 的无穷维问题。一旦得到 \(V\) 与 \(g\) 的函数形式,就得到 (6.1) 对任意初值 \(x_0\in X\) 的解。
为看清最优性原理为何成立,考虑如下论证(把最优性原理与 EE 联系起来,非严格证明)。
由于 \(g(x)\) 求解 \(V(x)=\max_{y\in\Gamma(x)}[F(x,y)+\beta V(y)]\) 对 \(\forall x\in X\),\(g(x)\) 必满足一阶条件: $$0=\frac{\partial V(x)}{\partial y}=\frac{\partial F(x,g(x))}{\partial y}+\frac{\partial\beta V(g(x))}{\partial y}=F_y(x,g(x))+\beta V'(g(x))\tag{6.3}$$ 再对 \(V(x)=F(x,g(x))+\beta V(g(x))\) 关于 \(x\) 求导(这不是一阶条件,但应恒成立): $$\begin{aligned}V'(x)&=F_x(x,g(x))+F_y(x,g(x))g'(x)+\beta V'(g(x))g'(x)\\&=F_x(x,g(x))+g'(x)\underbrace{(F_y(x,g(x))+\beta V'(g(x)))}_{=0,\ \text{envelope thm}}\\\Rightarrow V'(x)&=F_x(x,g(x))\end{aligned}\tag{6.4}$$ 将 (6.4) 代入 (6.3) 得 $$F_y(x,g(x))+\beta F_x(g(x),g(g(x)))=0$$ 这正是 EE。
The Principle of Optimality by Bellman gives us the conditions for a sequence problem to be equivalent to a recursive programming problem, which is called the Bellman equation.
6.1 Discrete Time
Previously, we have obtained that the discrete state sequence problem is $$V^\star(x_0)=\max_{\{x_{t+1}\}_{t=0}^\infty}\sum_{t=0}^\infty\beta^t F(x_t,x_{t+1})\quad\text{s.t.}\quad x_{t+1}\in\Gamma(x_t),\ \text{for }\forall t\ge0,\quad x_0\text{ given}\tag{6.1}$$ Now, we can write the problem in a recursive way to make it seem like a two-period problem.
6.1.1 Bellman equation
The Bellman equation is a recursive problem, and the solution to that problem is the function \(V:X\to\mathbb R\) such that $$V(x)=\max_{y\in\Gamma(x)}[F(x,y)+\beta V(y)],\quad\text{for }\forall x\in X\tag{6.2}$$ Since the solution is a function, this equation is a functional equation. If we know the functional form of \(V(x)\), then (6.2) becomes a two-period problem. Following the same logic as before, we denote \(g(x)\) as the optimal decision rule, which is also the maximizer of the RHS of equation (6.2) and a solution to the two-period problem: $$V(x)=F(x,g(x))+\beta V(g(x))$$
6.1.2 Principle of optimality
The principle of optimality is that $$V^\star(x)=V(x),\quad\text{for }\forall x\in X$$ which means that solving the two-period problem in (6.2) for \(V\) and \(g\) is equivalent to solving the infinite-dimension problem in (6.1) for \(V^\star\). Once we obtain the functional form of \(V\) and \(g\), we also obtain the solution to (6.1) for any initial value \(x_0\in X\).
To see why the principle of optimality is true, consider the following argument, which relates the principle of optimality to EE. This is not a rigorous proof.
Since \(g(x)\) solves \(V(x)=\max_{y\in\Gamma(x)}[F(x,y)+\beta V(y)]\) for \(\forall x\in X\), \(g(x)\) must satisfy the f.o.c., i.e. $$0=\frac{\partial V(x)}{\partial y}=\frac{\partial F(x,g(x))}{\partial y}+\frac{\partial\beta V(g(x))}{\partial y}=F_y(x,g(x))+\beta V'(g(x))\tag{6.3}$$ Also take derivative of \(V(x)=F(x,g(x))+\beta V(g(x))\) w.r.t. \(x\) (which is not f.o.c. but should be always true): $$\begin{aligned}V'(x)&=F_x(x,g(x))+F_y(x,g(x))g'(x)+\beta V'(g(x))g'(x)\\&=F_x(x,g(x))+g'(x)\underbrace{(F_y(x,g(x))+\beta V'(g(x)))}_{=0,\ \text{envelope theorem}}\\\Rightarrow V'(x)&=F_x(x,g(x))\end{aligned}\tag{6.4}$$ Substitute equation (6.4) into equation (6.3) to obtain $$F_y(x,g(x))+\beta F_x(g(x),g(g(x)))=0$$ which is exactly the EE.
6.2 Continuous Time
连续时间贝尔曼方程定义为 $$\rho V(x)=\max_{u\in U}[h(x,u)+V'(x)g(x,u)]\tag{6.5}$$ 其中 \(\dot x=g(x,u)\) 与此前相同。
在正则条件下,(6.5) 右边的解 \(u^\star\) 应满足关于 \(u\) 的一阶条件,即 $$h_u(x,u^\star)+V'(x)g_u(x,u^\star)=0\tag{6.6}$$ 故可用如下两个方程汇总 (6.5) 中连续动态规划问题的解: $$\rho V(x)=h(x,u^\star)+V'(x)g(x,u^\star)\tag{6.7}$$ $$h_u(x,u^\star)+V'(x)g_u(x,u^\star)=0$$ 一言以蔽之,问题变成两个泛函方程的问题,故解是两个函数 \(V\) 与 \(u^\star\),二者都是 \(x\) 的函数。
6.3 Bellman Equation and the Maximum Principle
这里讨论为何贝尔曼方程等价于哈密顿量的最大值原理方程。
由 (6.6),\(u^\star\) 最大化 $$h(x,u)+V'(x)g(x,u)$$ 回忆哈密顿量中 \(u^\star\) 最大化 $$h(x,u)+\lambda g(x,u)$$ 故若令 $$\lambda=V'(x)$$ 二者等价——这确实成立,因为由定义,\(\lambda\) 是 \(x\) 边际增加的现值增量,而 \(V'(x)\) 含义恰好相同。
然后对 (6.7) 关于 \(t\) 求导: $$\begin{aligned}\rho V'\dot x&=h_x\dot x+h_u u^{\star\prime}\dot x+(V''g+V'g_x+V'g_u u^{\star\prime})\dot x\\&=h_x\dot x+(V''g+V'g_x)\dot x+\underbrace{(h_u+V'g_u)}_{=0,\ \text{f.o.c.}}u^{\star\prime}\dot x\\&=h_x\dot x+(V''g+V'g_x)\dot x\end{aligned}$$ 当 \(\dot x\ne0\) 时,两边除以 \(\dot x\): $$\rho V'=h_x+V''g+V'g_x\tag{6.8}$$ 由于已讨论 \(\lambda=V'(x)\),对其关于 \(t\) 求导并用 \(\dot x=g(x,u)\) 重排: $$\dot\lambda=V''\dot x=V''g$$ 将此 \(\dot\lambda\) 表达式代入 (6.8): $$\begin{aligned}\dot\lambda&=\rho\lambda-(h_x+V'g_x)\\&=\rho\lambda-(h_x+\lambda g_x)\\&=\rho\lambda-H_x(x,u,\lambda)\end{aligned}$$ 这正是我们为最大值原理导出的协态变量运动规律。
The continuous time Bellman equation is defined as $$\rho V(x)=\max_{u\in U}[h(x,u)+V'(x)g(x,u)]\tag{6.5}$$ where \(\dot x=g(x,u)\) is the same as before.
Under the regularity conditions, the solution \(u^\star\) of the RHS of equation (6.5) should satisfy the f.o.c. of \(u\), i.e. $$h_u(x,u^\star)+V'(x)g_u(x,u^\star)=0\tag{6.6}$$ So we can use the following two equations to summarize the solution to the continuous dynamic programming problem in equation (6.5): $$\rho V(x)=h(x,u^\star)+V'(x)g(x,u^\star)\tag{6.7}$$ $$h_u(x,u^\star)+V'(x)g_u(x,u^\star)=0$$ In a word, the problem becomes a two functional equations problem, so the solutions are two functions \(V\) and \(u^\star\), which are both functions of \(x\).
6.3 Bellman Equation and the Maximum Principle
Here we will discuss why the Bellman equation is equivalent to the Maximum Principle equations for Hamiltonian.
From equation (6.6), we know that \(u^\star\) maximizes $$h(x,u)+V'(x)g(x,u)$$ Recall that for Hamiltonian, \(u^\star\) maximizes $$h(x,u)+\lambda g(x,u)$$ So they are equivalent if we set $$\lambda=V'(x)$$ which is indeed true because by definition, \(\lambda\) is the current value increase of marginal increase in \(x\) and \(V'(x)\) means exactly the same.
Then let's differentiate equation (6.7) w.r.t. \(t\): $$\begin{aligned}\rho V'\dot x&=h_x\dot x+h_u u^{\star\prime}\dot x+(V''g+V'g_x+V'g_u u^{\star\prime})\dot x\\&=h_x\dot x+(V''g+V'g_x)\dot x+\underbrace{(h_u+V'g_u)}_{=0,\ \text{f.o.c.}}u^{\star\prime}\dot x\\&=h_x\dot x+(V''g+V'g_x)\dot x\end{aligned}$$ When \(\dot x\ne0\), we can divide both sides by \(\dot x\): $$\rho V'=h_x+V''g+V'g_x\tag{6.8}$$ Now, since we discussed \(\lambda=V'(x)\), we can differentiate this equation w.r.t. \(t\) and rearrange using \(\dot x=g(x,u)\): $$\dot\lambda=V''\dot x=V''g$$ Plug in this expression for \(\dot\lambda\) to equation (6.8): $$\begin{aligned}\dot\lambda&=\rho\lambda-(h_x+V'g_x)\\&=\rho\lambda-(h_x+\lambda g_x)\\&=\rho\lambda-H_x(x,u,\lambda)\end{aligned}$$ which is the law of motion of the co-state variable we derived for the Maximum Principle.