37. Convexity and Concavity of Multivariate Function

Jun He June 01, 2026

中文English 凹凸性Convexity Concavity Hessian 矩阵Hessian Matrix 半正定Semi-definite 特征值Eigenvalue 数学附录Math Appendix

Note

第四部分导读：数学附录（Math Appendix）第四部分（第 37–42 章）是贯穿全书所用数学工具的参考附录：多元函数的凹凸性、Leibniz 积分法则、最大值理论、Blackwell 充分条件、压缩映射定理、马尔可夫过程等。

37. 多元函数的凹凸性

对有 $n$ 个自变量的多元函数 $F(x_1,x_2,\ldots,x_n)$，记其二阶导数 $F_{ij}=\dfrac{\partial^2 F(x_1,x_2,\ldots,x_n)}{\partial x_i\partial x_j}$（$i,j=1,2,\ldots,n$）。则相联的 Hessian 矩阵 $\mathbf{H}$ 定义为

$$ \mathbf{H}=\begin{bmatrix}F_{11} & F_{12} & \ldots & F_{1n}\\ F_{21} & F_{22} & \ldots & F_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ F_{n1} & F_{n2} & \ldots & F_{nn}\end{bmatrix} $$

Important

定义 37.1（多元凹凸性）函数 $F(x_1,x_2,\ldots,x_n)$ 在 $(x_1,\ldots,x_n)$ 上凹，当且仅当相联 Hessian 矩阵 $\mathbf{H}$ 负半定；凹性严格当且仅当 $\mathbf{H}$ 负定。函数 $F$ 凸当且仅当 $\mathbf{H}$ 正半定；凸性严格当且仅当 $\mathbf{H}$ 正定。

Tip

注记 37.1 定义 37.1 的多元凹凸性也可以用类比一元凹凸性的等价形式定义：对任意 $\alpha>0$、任意 $\mathbf{x}=(x_1,\ldots,x_n)$、$\mathbf{y}=(y_1,\ldots,y_n)$，$F(\cdot)$ 凹当且仅当 $F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\ge(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})$；$F(\cdot)$ 凸当且仅当 $F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\le(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})$。

Important

命题 37.1 $F(x_1,\ldots,x_n)$ （严格）凹当且仅当 $\mathbf{H}$ 的所有特征值（严格）为负；（严格）凸当且仅当 $\mathbf{H}$ 的所有特征值（严格）为正。

Note

证明先证 $\mathbf{H}$ 负半定当且仅当其所有特征值非正、正半定当且仅当所有特征值非负。对角化 $$ > \mathbf{H}=\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}}_{=\mathbf{P}^{-1}}\underbrace{\begin{bmatrix}\lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \lambda_n\end{bmatrix}}_{=\mathbf{D}}\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}^{-1}}_{=\mathbf{P}} \tag{37.1} > $$ $\lambda_i$ 为特征值、$\boldsymbol\alpha_i$ 为特征向量。可归一化 $\boldsymbol\alpha_i'\boldsymbol\alpha_i=1$、$\boldsymbol\alpha_i'\boldsymbol\alpha_j=0$（$i\ne j$），故 $(\mathbf{P}^{-1})'\mathbf{P}^{-1}=\mathbf{I}_n\Rightarrow(\mathbf{P}^{-1})'=\mathbf{P}$，于是 $\mathbf{H}=\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'$。记 $\mathbf{s}=(s_1,\ldots,s_n)'\equiv(\mathbf{P}^{-1})'\mathbf{z}$。对任意非零 $\mathbf{z}\in\mathbb{R}^n$， $$ > \mathbf{z}'\mathbf{H}\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{z}'\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{s}'\mathbf{D}\mathbf{s}\le0\ \Leftrightarrow\ \sum_{i=1}^n s_i^2\lambda_i\le0 > $$ 仅当 $\lambda_i\le0$ $\forall i$ 时成立（负定情形把 $\le$ 换为 $<$）。同理正半定 $\mathbf{H}$ 要求 $\lambda_i\ge0$ $\forall i$。由定义 37.1，证毕。$\blacksquare$

Important

命题 37.2（快速检验凹凸性的方法） $\mathbf{H}$ 负半定当且仅当所有奇数阶左上子矩阵的行列式都非正、所有偶数阶左上子矩阵的行列式都非负。$\mathbf{H}$ 正半定当且仅当所有阶的左上子矩阵行列式都非负。对负定与正定，把"非正"换为"严格负"、"非负"换为"严格正"。

Note

证明与注记 37.2 负半定 $\mathbf{H}$ 要求对任意非零 $\mathbf{z}$ 有 $\mathbf{z}'\mathbf{H}\mathbf{z}\le0$；由 $\mathbf{z}$ 任意，任一 $j$ 阶左上子矩阵 $\mathbf{H}_j$（$j=1,\ldots,n$）须负半定，其特征值 $\lambda_{jk}\le0$ $\forall k$。由特征分解 $\mathbf{H}_j=\mathbf{P}_j^{-1}\mathbf{D}_j\mathbf{P}_j$： $$ > \det(\mathbf{H}_j)=\det(\mathbf{P}_j^{-1})\det(\mathbf{D}_j)\det(\mathbf{P}_j)=\det(\mathbf{D}_j)=\lambda_{j1}\times\lambda_{j2}\times\cdots\times\lambda_{jj} > $$ 故奇数 $j$：$\det(\mathbf{H}_j)\le0$；偶数 $j$：$\det(\mathbf{H}_j)\ge0$。正半定同理 $\lambda_{jk}\ge0$ $\forall k\Rightarrow\det(\mathbf{H}_j)\ge0$ $\forall j$。负定与正定严格。$\blacksquare$

注记 37.2：此命题中的条件必要但不充分，故它提出一个检验凹凸性的方法、而非证明的方法。要严格证明凹凸性，须显式计算 Hessian 矩阵 $\mathbf{H}$ 的所有特征值并检验其符号。

Important

例 37.1 考虑 $F(x,y)$，记二阶导数 $F_{11}=\dfrac{\partial^2 F}{\partial x\partial x}$、$F_{22}=\dfrac{\partial^2 F}{\partial y\partial y}$、$F_{12}=F_{21}=\dfrac{\partial^2 F}{\partial x\partial y}$。则 $F(x,y)$ 在 $(x,y)$ 上凹当且仅当（二维特殊情形下此条件充要，因 $F_{11},F_{22}$ 构成所有方形子矩阵之情形） $$ > \det(F_{11})=F_{11}\le0,\quad \det(F_{22})=F_{22}\le0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ $F(x,y)$ 凸当且仅当 $$ > \det(F_{11})=F_{11}\ge0,\quad \det(F_{22})=F_{22}\ge0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ （负定与正定情形不等式皆严格。）

Note

Part IV overview: Math Appendix Part IV (Chapters 37–42) is a reference appendix for the mathematical tools used throughout the book: convexity/concavity of multivariate functions, the Leibniz integral rule, the theory of maximum, Blackwell's sufficient conditions, the contraction mapping theorem, and Markov processes.

37. Convexity and concavity of multivariate function

For a multivariate function with $n$ arguments, $F(x_1,x_2,\ldots,x_n)$, denote its second-order derivatives by $F_{ij}=\dfrac{\partial^2 F(x_1,x_2,\ldots,x_n)}{\partial x_i\partial x_j}$ for $i,j=1,2,\ldots,n$. Then, the associated Hessian matrix $\mathbf{H}$ is defined by

$$ \mathbf{H}=\begin{bmatrix}F_{11} & F_{12} & \ldots & F_{1n}\\ F_{21} & F_{22} & \ldots & F_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ F_{n1} & F_{n2} & \ldots & F_{nn}\end{bmatrix} $$

and we have the following definition and proposition.

Important

Definition 37.1 (Multivariate concavity and convexity) Function $F(x_1,x_2,\ldots,x_n)$ is concave in $(x_1,x_2,\ldots,x_n)$ if and only if the associated Hessian matrix $\mathbf{H}$ is negative semi-definite. The concavity is strict if $\mathbf{H}$ is negative definite. And function $F(x_1,x_2,\ldots,x_n)$ is convex in $(x_1,x_2,\ldots,x_n)$ if and only if the associated Hessian matrix $\mathbf{H}$ is positive semi-definite. The convexity is strict if $\mathbf{H}$ is positive definite.

Tip

Remark 37.1 The multivariate concavity and convexity in definition 37.1 can also be equivalently defined in an analogous form of univariate concavity and convexity: for any $\alpha>0$, and any $\mathbf{x}=(x_1,x_2,\ldots,x_n)$ and $\mathbf{y}=(y_1,y_2,\ldots,y_n)$, $F(\cdot)$ is concave if $F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\ge(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})$, and $F(\cdot)$ is convex if $F((1-\alpha)\mathbf{x}+\alpha\mathbf{y})\le(1-\alpha)F(\mathbf{x})+\alpha F(\mathbf{y})$.

Important

Proposition 37.1 $F(x_1,x_2,\ldots,x_n)$ is (strictly) concave if all eigenvalues of $\mathbf{H}$ are (strictly) negative. $F(x_1,x_2,\ldots,x_n)$ is (strictly) convex if all eigenvalues of $\mathbf{H}$ are (strictly) positive.

Note

Proof We first prove that $\mathbf{H}$ is negative semi-definite if all of its eigenvalues are non-positive, and is positive semi-definite if all of its eigenvalues are non-negative. Diagonalize $\mathbf{H}$ by $$ > \mathbf{H}=\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}}_{=\mathbf{P}^{-1}}\underbrace{\begin{bmatrix}\lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \lambda_n\end{bmatrix}}_{=\mathbf{D}}\underbrace{\begin{bmatrix}\boldsymbol\alpha_1 & \boldsymbol\alpha_2 & \ldots & \boldsymbol\alpha_n\end{bmatrix}^{-1}}_{=\mathbf{P}} \tag{37.1} > $$ where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues and $\boldsymbol\alpha_1,\ldots,\boldsymbol\alpha_n$ are the eigenvectors. We can always normalize $\boldsymbol\alpha_i$ such that $\boldsymbol\alpha_i'\boldsymbol\alpha_i=1$ and $\boldsymbol\alpha_i'\boldsymbol\alpha_j=0$ for $i\ne j$, so we can always decompose $\mathbf{H}$ such that $(\mathbf{P}^{-1})'\mathbf{P}^{-1}=\mathbf{I}_n\Rightarrow(\mathbf{P}^{-1})'=\mathbf{P}$, which implies $\mathbf{H}=\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'$. Denote $\mathbf{s}=(s_1,s_2,\ldots,s_n)'\equiv(\mathbf{P}^{-1})'\mathbf{z}$. For any non-zero $\mathbf{z}\in\mathbb{R}^n$, negative semi-definite $\mathbf{H}$ implies $$ > \mathbf{z}'\mathbf{H}\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{z}'\mathbf{P}^{-1}\mathbf{D}(\mathbf{P}^{-1})'\mathbf{z}\le0\ \Leftrightarrow\ \mathbf{s}'\mathbf{D}\mathbf{s}\le0\ \Leftrightarrow\ \sum_{i=1}^n s_i^2\lambda_i\le0 > $$ which is only true when $\lambda_i\le0$ $\forall i$ (for negative definite case, replace all $\le$ by $<$). Similarly, positive semi-definite $\mathbf{H}$ requires $\lambda_i\ge0$ $\forall i$. Then, by definition 37.1, the proof is complete. $\blacksquare$

Important

Proposition 37.2 (A quick way to check concavity and convexity) $\mathbf{H}$ is negative semi-definite only if the determinants of odd-order upper-left sub-matrices are all non-positive, and the determinants of even-order upper-left sub-matrices are all non-negative. And $\mathbf{H}$ is positive semi-definite only if the determinants of upper-left sub-matrices of all orders are non-negative. For negative definite and positive definite, replace non-positive by strictly negative, and replace non-negative by strictly positive in the statement.

Note

Proof and Remark 37.2 Negative semi-definite $\mathbf{H}$ requires that for any $\mathbf{z}\in\mathbb{R}^n$, $\mathbf{z}'\mathbf{H}\mathbf{z}\le0$. Since non-zero $\mathbf{z}$ is arbitrary, any upper-left sub-matrix of $\mathbf{H}$ must be negative semi-definite. Denote the $j$th order upper-left sub-matrix of $\mathbf{H}$ by $\mathbf{H}_j$ for $j=1,2,\ldots,n$, and denote its eigenvalues by $\lambda_{j1},\ldots,\lambda_{jj}$; then $\lambda_{jk}\le0$ for $k=1,2,\ldots,j$. By the eigen-decomposition $\mathbf{H}_j=\mathbf{P}_j^{-1}\mathbf{D}_j\mathbf{P}_j$: $$ > \det(\mathbf{H}_j)=\det(\mathbf{P}_j^{-1})\det(\mathbf{D}_j)\det(\mathbf{P}_j)=\det(\mathbf{D}_j)=\lambda_{j1}\times\lambda_{j2}\times\cdots\times\lambda_{jj} > $$ So, for odd number $j$, $\det(\mathbf{H}_j)\le0$; and for even number $j$, $\det(\mathbf{H}_j)\ge0$. Similarly, for positive semi-definite $\mathbf{H}$, $\lambda_{jk}\ge0$ $\forall k\Rightarrow\det(\mathbf{H}_j)\ge0$ for $j=1,2,\ldots,n$. For negative definite and positive definite $\mathbf{H}$, the inequalities are all strict. $\blacksquare$

Remark 37.2: Notice that the condition in this proposition is necessary but not sufficient. So, it proposes a way to check concavity and convexity, not a way to prove them. To rigorously prove concavity and convexity, we need to explicitly calculate all the eigenvalues of the Hessian matrix $\mathbf{H}$ and then check their signs.

Important

Example 37.1 Consider the function $F(x,y)$. Denote its second-order derivatives by $F_{11}=\dfrac{\partial^2 F(x,y)}{\partial x\partial x}$, $F_{22}=\dfrac{\partial^2 F(x,y)}{\partial y\partial y}$, $F_{12}=F_{21}=\dfrac{\partial^2 F(x,y)}{\partial x\partial y}$. Then, $F(x,y)$ is concave in $(x,y)$ if and only if (here the condition is sufficient and necessary because in the special case of two dimensions, $F_{11}$ and $F_{22}$ constitute all the cases of square sub-matrices) $$ > \det(F_{11})=F_{11}\le0,\quad \det(F_{22})=F_{22}\le0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ and $F(x,y)$ is convex in $(x,y)$ if and only if $$ > \det(F_{11})=F_{11}\ge0,\quad \det(F_{22})=F_{22}\ge0,\quad \det(\mathbf{H})=F_{11}F_{22}-F_{12}^2\ge0 > $$ (For negative definite and positive definite cases, the inequalities are all strict.)