37. Convexity and Concavity of Multivariate Function
第四部分导读:数学附录(Math Appendix) 第四部分(第 37–42 章)是贯穿全书所用数学工具的参考附录:多元函数的凹凸性、Leibniz 积分法则、最大值理论、Blackwell 充分条件、压缩映射定理、马尔可夫过程等。
37. 多元函数的凹凸性
对有 \(n\) 个自变量的多元函数 \(F(x_1,x_2,\ldots,x_n)\),记其二阶导数 \(F_{ij}=\dfrac{\partial^2 F(x_1,x_2,\ldots,x_n)}{\partial x_i\partial x_j}\)(\(i,j=1,2,\ldots,n\))。则相联的 Hessian 矩阵 \(\mathbf{H}\) 定义为
$$ \mathbf{H}=\begin{bmatrix}F_{11} & F_{12} & \ldots & F_{1n}\\ F_{21} & F_{22} & \ldots & F_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ F_{n1} & F_{n2} & \ldots & F_{nn}\end{bmatrix} $$
定义 37.1(多元凹凸性) 函数 \(F(x_1,x_2,\ldots,x_n)\) 在 \((x_1,\ldots,x_n)\) 上凹,当且仅当相联 Hessian 矩阵 \(\mathbf{H}\) 负半定;凹性严格当且仅当 \(\mathbf{H}\) 负定。函数 \(F\) 凸当且仅当 \(\mathbf{H}\) 正半定;凸性严格当且仅当 \(\mathbf{H}\) 正定。
注记 37.1 定义 37.1 的多元凹凸性也可以用类比一元凹凸性的等价形式定义:对任意 \(\alpha>0\)、任意 \(\mathbf{x}=(x_1,\ldots,x_n)\)、\(\mathbf{y}=(y_1,\ldots,y_n)\),\(F(\cdot)\) 凹当且仅当 \(F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\ge(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})\);\(F(\cdot)\) 凸当且仅当 \(F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\le(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})\)。
命题 37.1 \(F(x_1,\ldots,x_n)\) (严格)凹当且仅当 \(\mathbf{H}\) 的所有特征值(严格)为负;(严格)凸当且仅当 \(\mathbf{H}\) 的所有特征值(严格)为正。
证明 先证 \(\mathbf{H}\) 负半定当且仅当其所有特征值非正、正半定当且仅当所有特征值非负。对角化 $$ > \mathbf{H}=\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}}_{=\mathbf{P}^{-1}}\underbrace{\begin{bmatrix}\lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \lambda_n\end{bmatrix}}_{=\mathbf{D}}\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}^{-1}}_{=\mathbf{P}} \tag{37.1} > $$ \(\lambda_i\) 为特征值、\(\boldsymbol\alpha_i\) 为特征向量。可归一化 \(\boldsymbol\alpha_i'\boldsymbol\alpha_i=1\)、\(\boldsymbol\alpha_i'\boldsymbol\alpha_j=0\)(\(i\ne j\)),故 \((\mathbf{P}^{-1})'\mathbf{P}^{-1}=\mathbf{I}_n\Rightarrow(\mathbf{P}^{-1})'=\mathbf{P}\),于是 \(\mathbf{H}=\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'\)。记 \(\mathbf{s}=(s_1,\ldots,s_n)'\equiv(\mathbf{P}^{-1})'\mathbf{z}\)。对任意非零 \(\mathbf{z}\in\mathbb{R}^n\), $$ > \mathbf{z}'\mathbf{H}\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{z}'\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{s}'\mathbf{D}\mathbf{s}\le0\ \Leftrightarrow\ \sum_{i=1}^n s_i^2\lambda_i\le0 > $$ 仅当 \(\lambda_i\le0\) \(\forall i\) 时成立(负定情形把 \(\le\) 换为 $<$)。同理正半定 \(\mathbf{H}\) 要求 \(\lambda_i\ge0\) \(\forall i\)。由定义 37.1,证毕。\(\blacksquare\)
命题 37.2(快速检验凹凸性的方法) \(\mathbf{H}\) 负半定当且仅当所有奇数阶左上子矩阵的行列式都非正、所有偶数阶左上子矩阵的行列式都非负。\(\mathbf{H}\) 正半定当且仅当所有阶的左上子矩阵行列式都非负。对负定与正定,把"非正"换为"严格负"、"非负"换为"严格正"。
证明与注记 37.2 负半定 \(\mathbf{H}\) 要求对任意非零 \(\mathbf{z}\) 有 \(\mathbf{z}'\mathbf{H}\mathbf{z}\le0\);由 \(\mathbf{z}\) 任意,任一 \(j\) 阶左上子矩阵 \(\mathbf{H}_j\)(\(j=1,\ldots,n\))须负半定,其特征值 \(\lambda_{jk}\le0\) \(\forall k\)。由特征分解 \(\mathbf{H}_j=\mathbf{P}_j^{-1}\mathbf{D}_j\mathbf{P}_j\): $$ > \det(\mathbf{H}_j)=\det(\mathbf{P}_j^{-1})\det(\mathbf{D}_j)\det(\mathbf{P}_j)=\det(\mathbf{D}_j)=\lambda_{j1}\times\lambda_{j2}\times\cdots\times\lambda_{jj} > $$ 故奇数 \(j\):\(\det(\mathbf{H}_j)\le0\);偶数 \(j\):\(\det(\mathbf{H}_j)\ge0\)。正半定同理 \(\lambda_{jk}\ge0\) \(\forall k\Rightarrow\det(\mathbf{H}_j)\ge0\) \(\forall j\)。负定与正定严格。\(\blacksquare\)
注记 37.2:此命题中的条件必要但不充分,故它提出一个检验凹凸性的方法、而非证明的方法。要严格证明凹凸性,须显式计算 Hessian 矩阵 \(\mathbf{H}\) 的所有特征值并检验其符号。
例 37.1 考虑 \(F(x,y)\),记二阶导数 \(F_{11}=\dfrac{\partial^2 F}{\partial x\partial x}\)、\(F_{22}=\dfrac{\partial^2 F}{\partial y\partial y}\)、\(F_{12}=F_{21}=\dfrac{\partial^2 F}{\partial x\partial y}\)。则 \(F(x,y)\) 在 \((x,y)\) 上凹当且仅当(二维特殊情形下此条件充要,因 \(F_{11},F_{22}\) 构成所有方形子矩阵之情形) $$ > \det(F_{11})=F_{11}\le0,\quad \det(F_{22})=F_{22}\le0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ \(F(x,y)\) 凸当且仅当 $$ > \det(F_{11})=F_{11}\ge0,\quad \det(F_{22})=F_{22}\ge0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ (负定与正定情形不等式皆严格。)
Part IV overview: Math Appendix Part IV (Chapters 37–42) is a reference appendix for the mathematical tools used throughout the book: convexity/concavity of multivariate functions, the Leibniz integral rule, the theory of maximum, Blackwell's sufficient conditions, the contraction mapping theorem, and Markov processes.
37. Convexity and concavity of multivariate function
For a multivariate function with \(n\) arguments, \(F(x_1,x_2,\ldots,x_n)\), denote its second-order derivatives by \(F_{ij}=\dfrac{\partial^2 F(x_1,x_2,\ldots,x_n)}{\partial x_i\partial x_j}\) for \(i,j=1,2,\ldots,n\). Then, the associated Hessian matrix \(\mathbf{H}\) is defined by
$$ \mathbf{H}=\begin{bmatrix}F_{11} & F_{12} & \ldots & F_{1n}\\ F_{21} & F_{22} & \ldots & F_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ F_{n1} & F_{n2} & \ldots & F_{nn}\end{bmatrix} $$
and we have the following definition and proposition.
Definition 37.1 (Multivariate concavity and convexity) Function \(F(x_1,x_2,\ldots,x_n)\) is concave in \((x_1,x_2,\ldots,x_n)\) if and only if the associated Hessian matrix \(\mathbf{H}\) is negative semi-definite. The concavity is strict if \(\mathbf{H}\) is negative definite. And function \(F(x_1,x_2,\ldots,x_n)\) is convex in \((x_1,x_2,\ldots,x_n)\) if and only if the associated Hessian matrix \(\mathbf{H}\) is positive semi-definite. The convexity is strict if \(\mathbf{H}\) is positive definite.
Remark 37.1 The multivariate concavity and convexity in definition 37.1 can also be equivalently defined in an analogous form of univariate concavity and convexity: for any \(\alpha>0\), and any \(\mathbf{x}=(x_1,x_2,\ldots,x_n)\) and \(\mathbf{y}=(y_1,y_2,\ldots,y_n)\), \(F(\cdot)\) is concave if \(F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\ge(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})\), and \(F(\cdot)\) is convex if \(F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\le(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})\).
Proposition 37.1 \(F(x_1,x_2,\ldots,x_n)\) is (strictly) concave if all eigenvalues of \(\mathbf{H}\) are (strictly) negative. \(F(x_1,x_2,\ldots,x_n)\) is (strictly) convex if all eigenvalues of \(\mathbf{H}\) are (strictly) positive.
Proof We first prove that \(\mathbf{H}\) is negative semi-definite if all of its eigenvalues are non-positive, and is positive semi-definite if all of its eigenvalues are non-negative. Diagonalize \(\mathbf{H}\) by $$ > \mathbf{H}=\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}}_{=\mathbf{P}^{-1}}\underbrace{\begin{bmatrix}\lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \lambda_n\end{bmatrix}}_{=\mathbf{D}}\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}^{-1}}_{=\mathbf{P}} \tag{37.1} > $$ where \(\lambda_1,\ldots,\lambda_n\) are the eigenvalues and \(\boldsymbol\alpha_1,\ldots,\boldsymbol\alpha_n\) are the eigenvectors. We can always normalize \(\boldsymbol\alpha_i\) such that \(\boldsymbol\alpha_i'\boldsymbol\alpha_i=1\) and \(\boldsymbol\alpha_i'\boldsymbol\alpha_j=0\) for \(i\ne j\), so we can always decompose \(\mathbf{H}\) such that \((\mathbf{P}^{-1})'\mathbf{P}^{-1}=\mathbf{I}_n\Rightarrow(\mathbf{P}^{-1})'=\mathbf{P}\), which implies \(\mathbf{H}=\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'\). Denote \(\mathbf{s}=(s_1,s_2,\ldots,s_n)'\equiv(\mathbf{P}^{-1})'\mathbf{z}\). For any non-zero \(\mathbf{z}\in\mathbb{R}^n\), negative semi-definite \(\mathbf{H}\) implies $$ > \mathbf{z}'\mathbf{H}\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{z}'\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{s}'\mathbf{D}\mathbf{s}\le0\ \Leftrightarrow\ \sum_{i=1}^n s_i^2\lambda_i\le0 > $$ which is only true when \(\lambda_i\le0\) \(\forall i\) (for negative definite case, replace all \(\le\) by $<\(). Similarly, positive semi-definite \)\mathbf{H}$ requires \(\lambda_i\ge0\) \(\forall i\). Then, by definition 37.1, the proof is complete. \(\blacksquare\)
Proposition 37.2 (A quick way to check concavity and convexity) \(\mathbf{H}\) is negative semi-definite only if the determinants of odd-order upper-left sub-matrices are all non-positive, and the determinants of even-order upper-left sub-matrices are all non-negative. And \(\mathbf{H}\) is positive semi-definite only if the determinants of upper-left sub-matrices of all orders are non-negative. For negative definite and positive definite, replace non-positive by strictly negative, and replace non-negative by strictly positive in the statement.
Proof and Remark 37.2 Negative semi-definite \(\mathbf{H}\) requires that for any \(\mathbf{z}\in\mathbb{R}^n\), \(\mathbf{z}'\mathbf{H}\mathbf{z}\le0\). Since non-zero \(\mathbf{z}\) is arbitrary, any upper-left sub-matrix of \(\mathbf{H}\) must be negative semi-definite. Denote the \(j\)th order upper-left sub-matrix of \(\mathbf{H}\) by \(\mathbf{H}_j\) for \(j=1,2,\ldots,n\), and denote its eigenvalues by \(\lambda_{j1},\ldots,\lambda_{jj}\); then \(\lambda_{jk}\le0\) for \(k=1,2,\ldots,j\). By the eigen-decomposition \(\mathbf{H}_j=\mathbf{P}_j^{-1}\mathbf{D}_j\mathbf{P}_j\): $$ > \det(\mathbf{H}_j)=\det(\mathbf{P}_j^{-1})\det(\mathbf{D}_j)\det(\mathbf{P}_j)=\det(\mathbf{D}_j)=\lambda_{j1}\times\lambda_{j2}\times\cdots\times\lambda_{jj} > $$ So, for odd number \(j\), \(\det(\mathbf{H}_j)\le0\); and for even number \(j\), \(\det(\mathbf{H}_j)\ge0\). Similarly, for positive semi-definite \(\mathbf{H}\), \(\lambda_{jk}\ge0\) \(\forall k\Rightarrow\det(\mathbf{H}_j)\ge0\) for \(j=1,2,\ldots,n\). For negative definite and positive definite \(\mathbf{H}\), the inequalities are all strict. \(\blacksquare\)
Remark 37.2: Notice that the condition in this proposition is necessary but not sufficient. So, it proposes a way to check concavity and convexity, not a way to prove them. To rigorously prove concavity and convexity, we need to explicitly calculate all the eigenvalues of the Hessian matrix \(\mathbf{H}\) and then check their signs.
Example 37.1 Consider the function \(F(x,y)\). Denote its second-order derivatives by \(F_{11}=\dfrac{\partial^2 F(x,y)}{\partial x\partial x}\), \(F_{22}=\dfrac{\partial^2 F(x,y)}{\partial y\partial y}\), \(F_{12}=F_{21}=\dfrac{\partial^2 F(x,y)}{\partial x\partial y}\). Then, \(F(x,y)\) is concave in \((x,y)\) if and only if (here the condition is sufficient and necessary because in the special case of two dimensions, \(F_{11}\) and \(F_{22}\) constitute all the cases of square sub-matrices) $$ > \det(F_{11})=F_{11}\le0,\quad \det(F_{22})=F_{22}\le0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ and \(F(x,y)\) is convex in \((x,y)\) if and only if $$ > \det(F_{11})=F_{11}\ge0,\quad \det(F_{22})=F_{22}\ge0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ (For negative definite and positive definite cases, the inequalities are all strict.)