21. Multivariate Normal Distribution

Note

Chapter theme: the multivariate normal distribution. The multivariate (joint) normal distribution generalizes the univariate normal to higher dimensions. Its key feature is that every linear combination of its components is univariate normal, which is crucial to multivariate statistical inference.

§21.1 Definition: Definition 21.1 standard normal random vector (components i.i.d. standard normal); Definition 1 (Definition 21.2) \(\mathbf X=\mathbf A\mathbf Z+\mu\) (there exist a standard normal vector \(\mathbf Z\) and a matrix \(\mathbf A\)); Definition 2 (Definition 21.3) for any \(\mathbf b\), \(\mathbf b'\mathbf X\) is univariate normal.

§21.2 Equivalence of the two definitions: Definition 21.4 convolution \((f*g)(x)=\int f(t)g(x-t)\,dt\); Lemma 21.1 any linear combination of independent normals is normal (proved by induction via convolution); hence the two definitions are equivalent.

§21.3 Density of the multivariate normal: \(f_{\mathbf X}(\mathbf x)=\frac{1}{\sqrt{(2\pi)^k|\Omega|}}e^{-\frac12(\mathbf x-\mu)'\Omega^{-1}(\mathbf x-\mu)}\) for nonsingular \(\Omega\) (the quadratic form \((\mathbf X-\mu)'\Omega^{-1}(\mathbf X-\mu)\) follows \(\chi_k^2\)).

The multivariate normal distribution, or joint normal distribution, is a generalization of the univariate normal distribution to higher dimensions. One of its key features is that every linear combination of the components of a multivariate normal random vector is univariate normal, which is crucial to multivariate statistical inference.

21.1 Definition

Before defining the multivariate normal distribution, we introduce a useful term.

Important

Definition 21.1 (Standard normal random vector) A real random vector \(\mathbf Z=(Z_1,Z_2,\dots,Z_l)'\) is called a standard normal random vector if its components \(Z_n\), \(n=1,2,\dots,l\), are i.i.d. univariate standard normal random variables.

The following are two equivalent definitions of the multivariate normal distribution.

21.1.1 Definition 1

Important

Definition 21.2 (Multivariate normal distribution-1) A real random vector \(\mathbf X=(X_1,X_2,\dots,X_k)'\) is called a normal random vector and its components are multivariate normally distributed if and only if there exist a standard normal random vector \(\mathbf Z_{l\times1}\) and a matrix \(\mathbf A_{k\times l}\) such that $$\mathbf X=\mathbf A\mathbf Z+\mu$$ where \(\mu\) is the mean vector of \(\mathbf X\).

In other words, every component of a multivariate normally distributed random vector can be written as its mean plus a linear combination of i.i.d. standard normal random variables, i.e. for every \(i=1,2,\dots,k\), $$X_i=\mu_i+a_{i1}Z_1+a_{i2}Z_2+\cdots+a_{il}Z_l$$ where \((a_{i1},\dots,a_{il})\) is the \(i\)-th row of \(\mathbf A\).
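Definition 21.2 can be illustrated numerically. The sketch below is only an illustration under assumed values: the matrix `A`, the mean `mu`, the sample size `n`, and the seed are hypothetical choices, not taken from the text. It builds draws of \(\mathbf X=\mathbf A\mathbf Z+\mu\) from standard normal draws and compares the sample mean and covariance with \(\mu\) and \(\mathbf A\mathbf A'\), the covariance implied by the construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example values: k = 2 components built from l = 3 standard normals.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 2.0, -1.0]])   # k x l matrix
mu = np.array([1.0, -2.0])         # mean vector of X

n = 200_000
Z = rng.standard_normal((3, n))    # each column is one draw of the standard normal vector Z
X = A @ Z + mu[:, None]            # X = A Z + mu, applied to every column

sample_mean = X.mean(axis=1)
sample_cov = np.cov(X)             # rows are variables, columns are observations

print(sample_mean)                 # should be close to mu
print(sample_cov)                  # should be close to A @ A.T
```

Note that \(l\) need not equal \(k\): here three standard normals drive two components.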

21.1.2 Definition 2

Important

Definition 21.3 (Multivariate normal distribution-2) A real random vector \(\mathbf X=(X_1,X_2,\dots,X_k)'\) is called a normal random vector and its components are multivariate normally distributed if and only if every linear combination of its components is (univariate) normally distributed, i.e. for every \(\mathbf b=(b_1,\dots,b_k)'\in\mathbb R^k\), $$Y=\mathbf b'\mathbf X=b_1X_1+b_2X_2+\cdots+b_kX_k\sim\mathcal N(\mu_Y,\sigma_Y^2)$$ where a constant (the case \(\sigma_Y^2=0\), e.g. \(\mathbf b=\mathbf 0\)) is treated as a degenerate normal.
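A rough numerical check of Definition 21.3 is sketched below. It assumes draws generated as in Definition 21.2, with hypothetical values for `A`, `mu`, and the weight vector `b`. For such draws, \(Y=\mathbf b'\mathbf X\) should have mean \(\mathbf b'\mu\) and variance \(\mathbf b'\mathbf A\mathbf A'\mathbf b\), and about 68.27% of its standardized values should fall within one standard deviation, as for any normal variable. Agreement of these summaries is consistent with normality but does not prove it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example values.
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
mu = np.array([1.0, -2.0])
b = np.array([3.0, -1.0])          # weights of the linear combination

X = A @ rng.standard_normal((2, 200_000)) + mu[:, None]
Y = b @ X                          # Y = b'X for every draw

mu_Y = b @ mu                      # theoretical mean b'mu
var_Y = b @ A @ A.T @ b            # theoretical variance b'AA'b

# Share of standardized draws within one standard deviation of the mean.
U = (Y - mu_Y) / np.sqrt(var_Y)
share_within_1sd = np.mean(np.abs(U) < 1.0)

print(Y.mean(), mu_Y)
print(Y.var(), var_Y)
print(share_within_1sd)            # a normal variable gives about 0.6827
```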

21.2 Equivalence of Two Definitions

To show the equivalence of the two definitions, we first need one more term.

Important

Definition 21.4 (Convolution) The convolution is the operator \(*\) that maps two functions \(f\) and \(g\) on \(\mathbb R\) to the function \(f*g\) given by $$(f*g)(x)\equiv\int_{-\infty}^{\infty}f(t)g(x-t)\,dt$$

Next, we establish the following lemma.

Important

Lemma 21.1 Any linear combination of a set of independent normally distributed random variables \((X_1,X_2,\dots,X_k)\) is normally distributed.

Note

Proof The proof is by induction. We first consider the simple sum of two independent normally distributed variables \(X_i\sim\mathcal N(\mu_i,\sigma_i^2)\) and \(X_j\sim\mathcal N(\mu_j,\sigma_j^2)\). Let \(f_i(x_i)\) be the density of \(X_i\) and \(f_j(x_j)\) the density of \(X_j\), and write \(Y=X_i+X_j\) with density \(g(y)\). Since \(X_i\) and \(X_j\) are independent, the density of their sum is the convolution of their densities: $$\begin{aligned}g(y)&=(f_i*f_j)(y)\\&=\int_{-\infty}^{\infty}f_i(t)f_j(y-t)\,dt\\&=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_i}e^{-\frac12\left(\frac{t-\mu_i}{\sigma_i}\right)^2}\frac{1}{\sqrt{2\pi}\sigma_j}e^{-\frac12\left(\frac{y-t-\mu_j}{\sigma_j}\right)^2}\,dt\end{aligned}$$ Completing the square in the exponent separates the integrand into a Gaussian kernel in \(t\), centered at a term that is linear in \(y\) (written \(\cdots\) below) and with scale \(\sigma_i\sigma_j/\sqrt{\sigma_i^2+\sigma_j^2}\), times a factor \(R\) that does not depend on \(t\), where $$R=e^{-\frac12\left(\frac{y-\mu_i-\mu_j}{\sqrt{\sigma_i^2+\sigma_j^2}}\right)^2}$$ So, $$\begin{aligned}g(y)&=\frac{1}{\sqrt{2\pi}}R\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_i\sigma_j}e^{-\frac12\left(\frac{t-\cdots}{\sigma_i\sigma_j/\sqrt{\sigma_i^2+\sigma_j^2}}\right)^2}\,dt\\&=\frac{1}{\sqrt{2\pi}\sqrt{\sigma_i^2+\sigma_j^2}}R\underbrace{\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_i\sigma_j/\sqrt{\sigma_i^2+\sigma_j^2}}e^{-\frac12\left(\frac{t-\cdots}{\sigma_i\sigma_j/\sqrt{\sigma_i^2+\sigma_j^2}}\right)^2}\,dt}_{=1}\\&=\frac{1}{\sqrt{2\pi}\sqrt{\sigma_i^2+\sigma_j^2}}R\\&=\frac{1}{\sqrt{2\pi}\sqrt{\sigma_i^2+\sigma_j^2}}e^{-\frac12\left(\frac{y-\mu_i-\mu_j}{\sqrt{\sigma_i^2+\sigma_j^2}}\right)^2}\end{aligned}$$ The underbraced integral equals 1 because its integrand is itself a normal density in \(t\). Hence \(g(y)\) is the density of the normal distribution \(\mathcal N(\mu_i+\mu_j,\sigma_i^2+\sigma_j^2)\), and the simple sum of any two independent normally distributed random variables is again normally distributed.
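The completing-the-square step can be written out explicitly. The symbols \(m(y)\) and \(s\) below are introduced here only for illustration (they name the center and scale of the Gaussian kernel in \(t\)) and are not part of the original notation: $$\frac{(t-\mu_i)^2}{\sigma_i^2}+\frac{(y-t-\mu_j)^2}{\sigma_j^2}=\frac{\bigl(t-m(y)\bigr)^2}{s^2}+\frac{(y-\mu_i-\mu_j)^2}{\sigma_i^2+\sigma_j^2},\qquad m(y)=\frac{\sigma_j^2\mu_i+\sigma_i^2(y-\mu_j)}{\sigma_i^2+\sigma_j^2},\quad s=\frac{\sigma_i\sigma_j}{\sqrt{\sigma_i^2+\sigma_j^2}}$$ Matching the coefficients of \(t^2\), of \(t\), and the constant term on both sides confirms the identity. The first term on the right yields the Gaussian kernel in \(t\), and the second term, which is free of \(t\), yields \(R\).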

Next, based on this result, we show that any linear combination \(V=aX_i+bX_j\) with \(a,b\in\mathbb R\) is univariate normal. For nonzero \(a\) and \(b\) (a zero coefficient simply removes that term), write $$W_i=aX_i\sim\mathcal N(\tilde\mu_i,\tilde\sigma_i^2)\quad\text{and}\quad W_j=bX_j\sim\mathcal N(\tilde\mu_j,\tilde\sigma_j^2)$$ where \(\tilde\mu_i=a\mu_i\), \(\tilde\mu_j=b\mu_j\), \(\tilde\sigma_i^2=a^2\sigma_i^2\), \(\tilde\sigma_j^2=b^2\sigma_j^2\). Since \(W_i\) and \(W_j\) are again independent of each other, the argument above for the simple sum applies to \(V=W_i+W_j\), so any linear combination of two independent normally distributed random variables is univariate normally distributed.

Finally, since \(V\) is normal and independent of every remaining variable, the same argument applies to the sum of \(V\) and a further term \(cX_m\) (\(c\in\mathbb R\), \(m\neq i,j\)). Repeating this step shows that any linear combination of the elements of \((X_1,X_2,\dots,X_k)\) is a univariate normally distributed random variable. \(\blacksquare\)
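The first step of the proof can be checked numerically. The sketch below uses hypothetical parameter values and a plain Riemann sum in place of the exact integral. It approximates \((f_i*f_j)(y)\) on a grid and compares it with the \(\mathcal N(\mu_i+\mu_j,\sigma_i^2+\sigma_j^2)\) density.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# Hypothetical parameters for X_i and X_j.
mu_i, sigma_i = 1.0, 0.8
mu_j, sigma_j = -0.5, 1.5

# Integration grid for t, wide enough that both densities are negligible at the ends.
t = np.linspace(-30.0, 30.0, 60_001)
dt = t[1] - t[0]

def convolved_density(y):
    """Riemann-sum approximation of (f_i * f_j)(y), the integral of f_i(t) f_j(y - t) dt."""
    return np.sum(normal_pdf(t, mu_i, sigma_i) * normal_pdf(y - t, mu_j, sigma_j)) * dt

ys = np.linspace(-4.0, 5.0, 10)
numeric = np.array([convolved_density(y) for y in ys])
closed_form = normal_pdf(ys, mu_i + mu_j, np.sqrt(sigma_i**2 + sigma_j**2))

max_abs_error = np.max(np.abs(numeric - closed_form))
print(max_abs_error)   # expected to be tiny if the lemma's closed form is right
```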

Tip

Remark 21.1 The intuition for the convolution is that the probability density of \(Y\) at \(y\) accumulates the likelihood of every pair of values of \(X_i\) and \(X_j\) that sums to \(y\): the pair \((t,\,y-t)\) contributes \(f_i(t)f_j(y-t)\), and the integral runs over all \(t\).
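A discrete analogue may make this intuition concrete. The example below is an illustration not drawn from the text: it takes two independent fair six-sided dice and obtains the distribution of their total with `numpy.convolve`, which sums the probabilities of all face pairs that give the same total.

```python
import numpy as np

# Probability mass function of one fair six-sided die on the faces 1..6.
die = np.full(6, 1.0 / 6.0)

# Discrete convolution: each entry sums the probabilities of all pairs of faces
# that share the same total.
pmf_sum = np.convolve(die, die)    # 11 entries, for the totals 2..12
totals = np.arange(2, 13)

for total, p in zip(totals, pmf_sum):
    print(total, round(float(p), 4))
```

The total 7 can be reached by six pairs, so it receives probability \(6/36\), while the totals 2 and 12 can each be reached by a single pair.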

We can now prove the equivalence of the two definitions.

Note

Proof (Equivalence of two definitions) By Definition 21.2, each element of \(\mathbf X\) is a linear combination of the elements of \(\mathbf Z\) plus a constant, so any linear combination of the elements of \(\mathbf X\) is again a linear combination of the elements of \(\mathbf Z\) plus a constant and thus, by Lemma 21.1, is univariate normally distributed, which gives Definition 21.3.

Conversely, note that the proof of Lemma 21.1 depends heavily on the independence of \(X_i\) and \(X_j\) and on their normality. So, for the property in Definition 21.3 to hold, we need each element of \(\mathbf X\) to be a linear combination of independent normally distributed random variables. Standardizing those variables turns it into a linear combination of i.i.d. standard normal random variables plus a constant, which gives Definition 21.2. \(\blacksquare\)
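The first direction can be spelled out in one line. The vector \(\mathbf c\) is a symbol introduced here for illustration only. For any \(\mathbf b\in\mathbb R^k\), $$\mathbf b'\mathbf X=\mathbf b'(\mathbf A\mathbf Z+\mu)=\mathbf c'\mathbf Z+\mathbf b'\mu=\sum_{n=1}^{l}c_nZ_n+\mathbf b'\mu,\qquad\mathbf c=\mathbf A'\mathbf b\in\mathbb R^l$$ Since the \(Z_n\) are independent standard normal variables, Lemma 21.1 gives \(\mathbf b'\mathbf X\sim\mathcal N(\mathbf b'\mu,\ \mathbf c'\mathbf c)=\mathcal N(\mathbf b'\mu,\ \mathbf b'\mathbf A\mathbf A'\mathbf b)\), with the degenerate case \(\mathbf c=\mathbf 0\) corresponding to a constant.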

21.3 Density of Multivariate Normal Distribution

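The joint density of a multivariate normally distributed vector \(\mathbf X=(X_1,X_2,\dots,X_k)'\sim\mathcal N(\mu,\Omega)\), where \(\Omega\) is the covariance matrix of \(\mathbf X\) and is assumed nonsingular, is $$f_{\mathbf X}(x_1,x_2,\dots,x_k)=\frac{1}{\sqrt{(2\pi)^k|\Omega|}}e^{-\frac12(\mathbf x-\mu)'\Omega^{-1}(\mathbf x-\mu)}$$ where \(\mathbf x=(x_1,x_2,\dots,x_k)'\).

This form is intuitive: if \(\mathbf X=\mathbf A\mathbf Z+\mu\) with \(\mathbf A\) a nonsingular \(k\times k\) matrix, then \(\Omega=\mathbf A\mathbf A'\) and \((\mathbf X-\mu)'\Omega^{-1}(\mathbf X-\mu)=\mathbf Z'\mathbf Z=Z_1^2+Z_2^2+\cdots+Z_k^2\sim\chi_k^2\), where the \(Z_i\) are i.i.d. standard normal random variables. The exponent is therefore the multivariate analogue of \(-\frac12z^2\) in the univariate standard normal density.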
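The density formula can be sanity-checked with a short sketch. Everything named below (`mvn_pdf`, `normal_pdf`, the matrices, the evaluation point, the seed) is a hypothetical choice for illustration, and \(\Omega\) is assumed positive definite. The first check uses a diagonal \(\Omega\), for which the joint density should factor into univariate normal densities. The second uses \(\mathbf X=\mathbf A\mathbf Z+\mu\) with a nonsingular square \(\mathbf A\) and \(\Omega=\mathbf A\mathbf A'\), for which the quadratic form should equal \(Z_1^2+\cdots+Z_k^2\) and average about \(k\).

```python
import numpy as np

def mvn_pdf(x, mu, omega):
    """Evaluate the N(mu, omega) density at x, assuming omega is positive definite."""
    k = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)' omega^{-1} (x - mu), without forming the inverse.
    quad = diff @ np.linalg.solve(omega, diff)
    _, logdet = np.linalg.slogdet(omega)
    return np.exp(-0.5 * (k * np.log(2.0 * np.pi) + logdet + quad))

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# Check 1: with a diagonal covariance the joint density is a product of marginals.
mu = np.array([1.0, -2.0])
omega_diag = np.diag([0.5**2, 2.0**2])
x = np.array([0.3, 0.7])
joint = mvn_pdf(x, mu, omega_diag)
product = normal_pdf(x[0], mu[0], 0.5) * normal_pdf(x[1], mu[1], 2.0)
print(joint, product)

# Check 2: the quadratic form of X = A Z + mu equals Z'Z when omega = A A'.
rng = np.random.default_rng(2)
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
omega = A @ A.T
Z = rng.standard_normal((2, 100_000))
D = A @ Z                                   # X - mu for every draw
quad = np.sum(D * np.linalg.solve(omega, D), axis=0)
print(quad.mean())                          # a chi-square with k = 2 has mean 2
```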