2. Conditional Expectation

Note

本章主题:条件期望。 把条件期望 \(\mathbb E[Y\mid\mathbf X]\) 定义为 \(L^2\) 意义下的最优预测:在所有 \(\mathbf X\) 的(平方可积)函数中,最小化与 \(Y\) 的均方距离者。§2.1 定义:当 \(\mathbb E[Y^2]<\infty\) 时,\(\mathbb E[Y\mid\mathbf X]=\arg\min_{m\in\mathbb M}\mathbb E[(Y-m(\mathbf X))^2]\) (2.1);定理 2.1(正交条件 I)给出等价刻画 \(\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]=0,\forall m\in\mathbb M\) (2.2) 并证唯一性。当仅 \(\mathbb E[|Y|]<\infty\) 时,定理 2.2(正交条件 II):\(\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0\) 对任意 Borel 集 \(\mathbf B\subseteq\mathbb R^k\) 成立 (2.3)。§2.2 性质(命题 2.1,七条):\(\mathbb E[f(\mathbf X)\mid\mathbf X]=f(\mathbf X)\)、线性、\(f(\mathbf X)\) 可提出、保号、重期望定律 LIE \(\mathbb E[Y]=\mathbb E[\mathbb E[Y\mid\mathbf X]]\)、独立 ⇒ \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\)、均值独立。蕴含链:独立 ⇒ 均值独立 ⇒ 不相关,且两个箭头都不可逆(反例:\(X\sim N(0,1)\) 且 \(Y=X^2\);\(Y\mid X\sim N(0,\sigma^2X)\) 且 \(X>0\))。

Note

Chapter theme: conditional expectation. The conditional expectation \(\mathbb E[Y\mid\mathbf X]\) is defined as the best predictor in the \(L^2\) sense: among all square-integrable functions of \(\mathbf X\), it is the one that minimizes the mean-squared distance to \(Y\). §2.1 Definition: when \(\mathbb E[Y^2]<\infty\), \(\mathbb E[Y\mid\mathbf X]=\arg\min_{m\in\mathbb M}\mathbb E[(Y-m(\mathbf X))^2]\) (2.1); Theorem 2.1 (orthogonality condition I) gives the equivalent characterization \(\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]=0,\forall m\in\mathbb M\) (2.2) and establishes uniqueness. When only \(\mathbb E[|Y|]<\infty\), Theorem 2.2 (orthogonality condition II): \(\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0\) for every Borel set \(\mathbf B\subseteq\mathbb R^k\) (2.3). §2.2 Properties (Proposition 2.1, seven items): \(\mathbb E[f(\mathbf X)\mid\mathbf X]=f(\mathbf X)\), linearity, pulling out \(f(\mathbf X)\), sign preservation, the law of iterated expectations (LIE) \(\mathbb E[Y]=\mathbb E[\mathbb E[Y\mid\mathbf X]]\), independence ⇒ \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\), and mean independence. The implication chain: independence ⇒ mean independence ⇒ uncorrelatedness, with neither arrow reversible (counterexamples: \(Y=X^2\) with \(X\sim N(0,1)\), and \(Y\mid X\sim N(0,\sigma^2X)\) with \(X>0\)).

2.1 Definition of Conditional Expectation

设随机向量 \((Y,\mathbf X)\),\(Y\in\mathbb R\)、\(\mathbf X\in\mathbb R^k\)、\(\mathbb E[Y^2]<\infty\)。定义函数空间

$$\mathbb M=\{m(\mathbf X):m:\mathbb R^k\to\mathbb R,\ \mathbb E[m^2(\mathbf X)]<\infty\}$$

即所有「以 \(\mathbf X\) 为自变量、二阶原点矩有限」的函数。

2.1.1 假设 \(\mathbb E[Y^2]<\infty\) 时的定义.

Important

定义 2.1(条件期望) 给定 \(\mathbf X\) 时 \(Y\) 的条件期望 \(\mathbb E[Y\mid\mathbf X]\) 是下列问题的任一解: $$\inf_{m\in\mathbb M}\mathbb E[(Y-m(\mathbf X))^2] \tag{2.1}$$

即 \(\mathbb E[Y\mid\mathbf X]\) 是在均方误差意义下 \(Y\) 的最优预测——它是 \(\mathbf X\) 的函数中与 \(Y\) 平方距离最小者。

Important

定理 2.1(正交条件 I) 对随机向量 \((Y,\mathbf X)\),\(Y\in\mathbb R\)、\(\mathbf X\in\mathbb R^k\)、\(\mathbb E[Y^2]<\infty\),则 \(m^*(\mathbf X)\in\mathbb M\) 求解 (2.1) 当且仅当 $$\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]=0,\quad\forall m(\mathbf X)\in\mathbb M \tag{2.2}$$ 且 \(m^*\) 在以下意义下唯一:若 \(\tilde m(\mathbf X)\) 也解 (2.1),则 \(\mathbb P(\tilde m(\mathbf X)=m^*(\mathbf X))=1\)。

Let \((Y,\mathbf X)\) be a random vector, \(Y\in\mathbb R\), \(\mathbf X\in\mathbb R^k\), \(\mathbb E[Y^2]<\infty\). Define the function space

$$\mathbb M=\{m(\mathbf X):m:\mathbb R^k\to\mathbb R,\ \mathbb E[m^2(\mathbf X)]<\infty\}$$

i.e. all functions of \(\mathbf X\) with finite second raw moment.

2.1.1 Definition assuming \(\mathbb E[Y^2]<\infty\).

Important

Definition 2.1 (Conditional expectation) The conditional expectation of \(Y\) given \(\mathbf X\), \(\mathbb E[Y\mid\mathbf X]\), is any solution to: $$\inf_{m\in\mathbb M}\mathbb E[(Y-m(\mathbf X))^2] \tag{2.1}$$

So \(\mathbb E[Y\mid\mathbf X]\) is the best predictor of \(Y\) in the mean-squared-error sense — the function of \(\mathbf X\) with the smallest squared distance to \(Y\).
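As a small illustration of Definition 2.1, the following Python sketch computes the conditional mean from a finite joint distribution and checks by brute force that no competing predictor on a coarse grid attains a lower mean-squared error. The joint pmf `pmf` and the helpers `cond_mean` and `mse` are hypothetical names and values chosen for this example only; with finite support every function of \(X\) belongs to \(\mathbb M\).

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint pmf of (X, Y); the numbers are for illustration only.
pmf = {
    (0, 1): F(1, 8), (0, 3): F(1, 8),
    (1, 0): F(1, 4), (1, 4): F(1, 4),
    (2, 5): F(1, 4),
}

def cond_mean(pmf):
    """Return {x: E[Y | X = x]} computed from the joint pmf."""
    px, sxy = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0) + p
        sxy[x] = sxy.get(x, 0) + p * y
    return {x: sxy[x] / px[x] for x in px}

def mse(pmf, m):
    """Return E[(Y - m(X))^2] for a predictor m given as a dict {x: m(x)}."""
    return sum(p * (y - m[x]) ** 2 for (x, y), p in pmf.items())

m_star = cond_mean(pmf)          # m*(0) = 2, m*(1) = 2, m*(2) = 5
best = mse(pmf, m_star)          # 9/4

# Compare with every predictor taking values in {0, ..., 5} at each x.
support = sorted(m_star)
candidates = [dict(zip(support, vals)) for vals in product(range(6), repeat=3)]
is_minimal = all(mse(pmf, m) >= best for m in candidates)
print(m_star, best, is_minimal)
```

The grid search is only a sanity check over finitely many competitors; the general statement is the content of Theorem 2.1.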

Important

Theorem 2.1 (Orthogonality condition I) Let \((Y,\mathbf X)\) be a random vector with \(Y\in\mathbb R\), \(\mathbf X\in\mathbb R^k\), and \(\mathbb E[Y^2]<\infty\). Then \(m^*(\mathbf X)\in\mathbb M\) solves (2.1) if and only if $$\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]=0,\quad\forall m(\mathbf X)\in\mathbb M \tag{2.2}$$ Moreover, \(m^*\) is unique in the sense that if \(\tilde m(\mathbf X)\) also solves (2.1), then \(\mathbb P(\tilde m(\mathbf X)=m^*(\mathbf X))=1\).

Note

证明(定理 2.1,正交条件 ⟺ 最小化) (⇐) 正交条件 ⇒ 最小化. 设 \(m^*\) 满足 (2.2)。任取 \(m\in\mathbb M\),加减 \(m^*(\mathbf X)\): $$\mathbb E[(Y-m(\mathbf X))^2]=\mathbb E[(Y-m^*(\mathbf X))^2]+2\mathbb E[(Y-m^*(\mathbf X))(m^*(\mathbf X)-m(\mathbf X))]+\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]$$ 由 Cauchy-Schwarz,\(\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]\le\mathbb E[m^*(\mathbf X)^2]+2\sqrt{\mathbb E[m^*(\mathbf X)^2]\,\mathbb E[m(\mathbf X)^2]}+\mathbb E[m(\mathbf X)^2]<\infty\),故 \(m_0(\mathbf X)\equiv m^*(\mathbf X)-m(\mathbf X)\in\mathbb M\),对 \(m_0\) 应用 (2.2) 知中间项为 \(0\)。于是 $$\mathbb E[(Y-m(\mathbf X))^2]=\mathbb E[(Y-m^*(\mathbf X))^2]+\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]\ge\mathbb E[(Y-m^*(\mathbf X))^2]$$ 故 \(m^*\) 求解 (2.1);不等式严格,除非 \(\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]=0\),即 \(\mathbb P(m(\mathbf X)=m^*(\mathbf X))=1\),由此得唯一性。 (⇒) 最小化 ⇒ 正交条件. 设 \(m^*\) 解 (2.1),任取 \(m\in\mathbb M\)。对任意 \(\alpha\in\mathbb R\) 有 \(m^*+\alpha m\in\mathbb M\),故 $$\mathbb E[(Y-m^*(\mathbf X))^2]\le\mathbb E[(Y-m^*(\mathbf X)-\alpha m(\mathbf X))^2]=\mathbb E[(Y-m^*(\mathbf X))^2]+\alpha^2\mathbb E[(m(\mathbf X))^2]-2\alpha\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]$$ 即 \(\alpha^2\mathbb E[(m(\mathbf X))^2]-2\alpha\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]\ge0\) 对 \(\forall\alpha\) 成立。记 \(c=\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]\)。若 \(\mathbb E[m(\mathbf X)^2]=0\),则 \(m(\mathbf X)=0\) 几乎必然成立,从而 \(c=0\);否则取 \(\alpha=c/\mathbb E[m(\mathbf X)^2]\) 得 \(-c^2/\mathbb E[m(\mathbf X)^2]\ge0\),故 \(c=0\),即 (2.2)(正交条件 I)。\(\blacksquare\)

可见 \(\mathbb E[Y\mid\mathbf X]\) 是把 \(Y\) 投影到「\(\mathbf X\) 的函数」张成的空间 \(\mathbb M\) 上——残差 \(Y-\mathbb E[Y\mid\mathbf X]\) 与一切 \(m(\mathbf X)\) 正交,这正是它作为 \(Y\mid\mathbf X\) 最优预测的几何含义。

Note

Proof (Theorem 2.1, orthogonality ⟺ minimization) (⇐) Orthogonality ⇒ minimization. Suppose \(m^*\) satisfies (2.2). Take any \(m\in\mathbb M\) and add and subtract \(m^*(\mathbf X)\): $$\mathbb E[(Y-m(\mathbf X))^2]=\mathbb E[(Y-m^*(\mathbf X))^2]+2\mathbb E[(Y-m^*(\mathbf X))(m^*(\mathbf X)-m(\mathbf X))]+\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]$$ By Cauchy-Schwarz, \(\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]\le\mathbb E[m^*(\mathbf X)^2]+2\sqrt{\mathbb E[m^*(\mathbf X)^2]\,\mathbb E[m(\mathbf X)^2]}+\mathbb E[m(\mathbf X)^2]<\infty\), so \(m_0(\mathbf X)\equiv m^*(\mathbf X)-m(\mathbf X)\in\mathbb M\), and the cross term is \(0\) by (2.2) applied to \(m_0\). Hence $$\mathbb E[(Y-m(\mathbf X))^2]=\mathbb E[(Y-m^*(\mathbf X))^2]+\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]\ge\mathbb E[(Y-m^*(\mathbf X))^2]$$ so \(m^*\) solves (2.1). The inequality is strict unless \(\mathbb E[(m^*(\mathbf X)-m(\mathbf X))^2]=0\), i.e. unless \(\mathbb P(m(\mathbf X)=m^*(\mathbf X))=1\), which gives uniqueness. (⇒) Minimization ⇒ orthogonality. Suppose \(m^*\) solves (2.1) and fix any \(m\in\mathbb M\). Since \(m^*+\alpha m\in\mathbb M\) for every \(\alpha\in\mathbb R\), $$\mathbb E[(Y-m^*(\mathbf X))^2]\le\mathbb E[(Y-m^*(\mathbf X)-\alpha m(\mathbf X))^2]=\mathbb E[(Y-m^*(\mathbf X))^2]+\alpha^2\mathbb E[(m(\mathbf X))^2]-2\alpha\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]$$ i.e. \(\alpha^2\mathbb E[(m(\mathbf X))^2]-2\alpha\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]\ge0\) for all \(\alpha\). Write \(c=\mathbb E[(Y-m^*(\mathbf X))m(\mathbf X)]\). If \(\mathbb E[m(\mathbf X)^2]=0\), then \(m(\mathbf X)=0\) almost surely and \(c=0\). Otherwise, choosing \(\alpha=c/\mathbb E[m(\mathbf X)^2]\) gives \(-c^2/\mathbb E[m(\mathbf X)^2]\ge0\), which forces \(c=0\). This is (2.2) (orthogonality condition I). \(\blacksquare\)

So \(\mathbb E[Y\mid\mathbf X]\) is the projection of \(Y\) onto the space \(\mathbb M\) spanned by functions of \(\mathbf X\) — the residual \(Y-\mathbb E[Y\mid\mathbf X]\) is orthogonal to every \(m(\mathbf X)\), which is the geometric meaning of its being the best predictor of \(Y\) given \(\mathbf X\).
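The projection picture can be checked exactly on a finite example. The sketch below, under the assumption of a hypothetical joint pmf `pmf` (values chosen only for illustration), verifies condition (2.2) for a few test functions and the Pythagorean decomposition used in the proof of Theorem 2.1. The helper names `E` and `m_star` are likewise introduced just for this example.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y); illustration only.
pmf = {
    (-1, 0): F(1, 6), (-1, 2): F(1, 6),
    (0, 1): F(1, 3),
    (1, 0): F(1, 6), (1, 6): F(1, 6),
}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def m_star(x):
    """Conditional mean E[Y | X = x]."""
    num = sum(p * y for (xx, y), p in pmf.items() if xx == x)
    den = sum(p for (xx, _), p in pmf.items() if xx == x)
    return num / den

# Orthogonality (2.2): the residual Y - m*(X) is orthogonal to each test function m(X).
test_functions = [lambda x: 1, lambda x: x, lambda x: x * x, lambda x: 2 * x - 3 * x * x]
inner_products = [E(lambda x, y, m=m: (y - m_star(x)) * m(x)) for m in test_functions]

# Pythagorean decomposition for the competitor m(x) = x:
# E[(Y - m)^2] = E[(Y - m*)^2] + E[(m* - m)^2].
lhs = E(lambda x, y: (y - x) ** 2)
rhs = E(lambda x, y: (y - m_star(x)) ** 2) + E(lambda x, y: (m_star(x) - x) ** 2)
print(inner_products, lhs, rhs)
```

Exact rational arithmetic is used so that the inner products are exactly zero rather than zero up to rounding.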

2.1.2 仅假设 \(\mathbb E[|Y|]<\infty\) 时的定义. 去掉 \(\mathbb E[Y^2]<\infty\)、仅设 \(\mathbb E[|Y|]<\infty\),则用如下方式定义。

Important

定理 2.2(正交条件 II) 对随机向量 \((Y,\mathbf X)\),\(Y\in\mathbb R\)、\(\mathbf X\in\mathbb R^k\)、\(\mathbb E[|Y|]<\infty\),给定 \(\mathbf X\) 时 \(Y\) 的条件期望 \(\mathbb E[Y\mid\mathbf X]\) 是任一满足 \(\mathbb E[|m^*(\mathbf X)|]<\infty\) 且对 \(\mathbb R^k\) 中任意 Borel 集 \(\mathbf B\) $$\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0 \tag{2.3}$$ 的 \(m^*(\mathbf X)\)。

正交条件 II 用指示函数 \(\mathbf 1\{\mathbf X\in\mathbf B\}\) 替代一般的 \(m(\mathbf X)\),从而只需一阶矩有限。下面 §2.2 的性质均可由 (2.3) 证明。

2.1.2 Definition assuming only \(\mathbb E[|Y|]<\infty\). When the assumption \(\mathbb E[Y^2]<\infty\) is dropped and only \(\mathbb E[|Y|]<\infty\) is assumed, the conditional expectation is defined as follows.

Important

Theorem 2.2 (Orthogonality condition II) For a random vector \((Y,\mathbf X)\), \(Y\in\mathbb R\), \(\mathbf X\in\mathbb R^k\), \(\mathbb E[|Y|]<\infty\), the conditional expectation of \(Y\) given \(\mathbf X\), \(\mathbb E[Y\mid\mathbf X]\), is any \(m^*(\mathbf X)\) with \(\mathbb E[|m^*(\mathbf X)|]<\infty\) such that for any Borel set \(\mathbf B\) in \(\mathbb R^k\), $$\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0 \tag{2.3}$$

Orthogonality condition II replaces a general \(m(\mathbf X)\) with the indicator \(\mathbf 1\{\mathbf X\in\mathbf B\}\), so only a finite first moment is needed. All the properties in §2.2 can be proved from (2.3).
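A sample analogue of (2.3) can be sketched in Python. The example below assumes a discrete \(X\) with three values and Pareto noise with shape 1.5 (finite mean, infinite variance), both chosen purely for illustration. For the empirical distribution, the cell averages of \(Y\) play the role of \(m^*\), and the sample version of (2.3) holds for every subset \(B\) of the support up to floating-point rounding.

```python
import random
from itertools import combinations

random.seed(0)
n = 2000
xs = [random.choice([0, 1, 2]) for _ in range(n)]
# Illustrative assumption: Pareto(1.5) noise has a finite mean but infinite variance.
ys = [x + random.paretovariate(1.5) for x in xs]

# Sample analogue of m*(x): the average of Y within each X-cell.
support = sorted(set(xs))
m_hat = {x: sum(y for xi, y in zip(xs, ys) if xi == x) / xs.count(x) for x in support}

def moment(B):
    """Sample analogue of E[(Y - m*(X)) 1{X in B}]."""
    return sum(y - m_hat[x] for x, y in zip(xs, ys) if x in B) / n

# Every subset B of the support, including the empty set and the full support.
subsets = [set(c) for r in range(len(support) + 1) for c in combinations(support, r)]
moments = [moment(B) for B in subsets]
print(max(abs(v) for v in moments))  # zero up to rounding error
```

This only illustrates the identity for the empirical distribution; it is not a proof that the population \(m^*\) exists.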

2.2 Properties of Conditional Expectation

Important

命题 2.1(条件期望的性质)

1. 若 \(Y=f(\mathbf X)\),则 \(\mathbb E[Y\mid\mathbf X]=f(\mathbf X)\)。
2. (线性)\(\mathbb E[Y+Z\mid\mathbf X]=\mathbb E[Y\mid\mathbf X]+\mathbb E[Z\mid\mathbf X]\)。
3. (提出已知函数)\(\mathbb E[f(\mathbf X)Y\mid\mathbf X]=f(\mathbf X)\mathbb E[Y\mid\mathbf X]\)。
4. (保号)若 \(\mathbb P(Y\ge0)=1\),则 \(\mathbb P(\mathbb E[Y\mid\mathbf X]\ge0)=1\)。
5. (重期望定律 LIE)取 \(\mathbf B=\mathbb R^k\),则 (2.3) 蕴含 \(\mathbb E[Y-\mathbb E[Y\mid\mathbf X]]=0\),即 \(\mathbb E[Y]=\mathbb E[\mathbb E[Y\mid\mathbf X]]\)。更一般地 \(\mathbb E[\mathbb E[Y\mid\mathbf X_1,\mathbf X_2]\mid\mathbf X_1]=\mathbb E[Y\mid\mathbf X_1]\)。
6. (独立)若 \(\mathbf X\perp Y\),则 \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\)。
7. (均值独立)若 \(\mathbb E[Y\mid\mathbf X]=c\)(常数),则 \(c=\mathbb E[Y]\):由 LIE,\(\mathbb E[Y]=\mathbb E[\mathbb E[Y\mid\mathbf X]]=\mathbb E[c]=c\),故 \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\)。

Note

证明(性质 1、2) 性质 1:对这类问题先「猜」一个好的 \(m^*(\mathbf X)\) 再代入正交条件验证。试 \(m^*(\mathbf X)=f(\mathbf X)\): $$\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=\mathbb E[(f(\mathbf X)-f(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0$$ 对 \(\forall\mathbf B\) 成立。由 \(\mathbb E[Y\mid\mathbf X]\) 唯一,\(\mathbb E[Y\mid\mathbf X]=m^*(\mathbf X)=f(\mathbf X)\)。 性质 2:设 \(\mathbb E[Y\mid\mathbf X]=m^*(\mathbf X)\) 满足 (2.3)、\(\mathbb E[Z\mid\mathbf X]=\tilde m^*(\mathbf X)\) 满足 (2.3)。由期望线性,\(\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]+\mathbb E[(Z-\tilde m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=\mathbb E[(Y+Z-m^*(\mathbf X)-\tilde m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0\) 对 \(\forall\mathbf B\),故 \(\mathbb E[Y+Z\mid\mathbf X]=m^*(\mathbf X)+\tilde m^*(\mathbf X)=\mathbb E[Y\mid\mathbf X]+\mathbb E[Z\mid\mathbf X]\)。\(\blacksquare\)

Important

Proposition 2.1 (Properties of conditional expectation)

1. If \(Y=f(\mathbf X)\), then \(\mathbb E[Y\mid\mathbf X]=f(\mathbf X)\).
2. (Linearity) \(\mathbb E[Y+Z\mid\mathbf X]=\mathbb E[Y\mid\mathbf X]+\mathbb E[Z\mid\mathbf X]\).
3. (Pulling out a known function) \(\mathbb E[f(\mathbf X)Y\mid\mathbf X]=f(\mathbf X)\mathbb E[Y\mid\mathbf X]\).
4. (Sign preservation) If \(\mathbb P(Y\ge0)=1\), then \(\mathbb P(\mathbb E[Y\mid\mathbf X]\ge0)=1\).
5. (Law of iterated expectations, LIE) Taking \(\mathbf B=\mathbb R^k\), (2.3) implies \(\mathbb E[Y-\mathbb E[Y\mid\mathbf X]]=0\), i.e. \(\mathbb E[Y]=\mathbb E[\mathbb E[Y\mid\mathbf X]]\). More generally, \(\mathbb E[\mathbb E[Y\mid\mathbf X_1,\mathbf X_2]\mid\mathbf X_1]=\mathbb E[Y\mid\mathbf X_1]\).
6. (Independence) If \(\mathbf X\perp Y\), then \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\).
7. (Mean independence) If \(\mathbb E[Y\mid\mathbf X]=c\) for a constant \(c\), then \(c=\mathbb E[Y]\): by LIE, \(\mathbb E[Y]=\mathbb E[\mathbb E[Y\mid\mathbf X]]=\mathbb E[c]=c\), so \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\).

Note

Proof (Properties 1, 2) Property 1: for this type of problem, first "guess" a good \(m^*(\mathbf X)\) and plug it into the orthogonality condition. Try \(m^*(\mathbf X)=f(\mathbf X)\): $$\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=\mathbb E[(f(\mathbf X)-f(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0$$ for all \(\mathbf B\). By uniqueness of \(\mathbb E[Y\mid\mathbf X]\), \(\mathbb E[Y\mid\mathbf X]=m^*(\mathbf X)=f(\mathbf X)\). Property 2: let \(\mathbb E[Y\mid\mathbf X]=m^*(\mathbf X)\) satisfy (2.3) and \(\mathbb E[Z\mid\mathbf X]=\tilde m^*(\mathbf X)\) satisfy (2.3). By linearity of expectation, \(\mathbb E[(Y-m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]+\mathbb E[(Z-\tilde m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=\mathbb E[(Y+Z-m^*(\mathbf X)-\tilde m^*(\mathbf X))\mathbf 1\{\mathbf X\in\mathbf B\}]=0\) for all \(\mathbf B\), so \(\mathbb E[Y+Z\mid\mathbf X]=m^*(\mathbf X)+\tilde m^*(\mathbf X)=\mathbb E[Y\mid\mathbf X]+\mathbb E[Z\mid\mathbf X]\). \(\blacksquare\)
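Properties 3 and 5 can be verified exactly on a finite example. The following sketch assumes a hypothetical joint pmf over \((X_1,X_2,Y)\) and a hypothetical function \(f(x_1)=2x_1+1\); the helpers `cond_exp` and `expect` are names introduced here for illustration only.

```python
from fractions import Fraction as F

# Hypothetical joint pmf over (x1, x2, y); illustration only.
pmf = {
    (0, 0, 1): F(1, 8), (0, 0, 3): F(1, 8), (0, 1, 2): F(1, 4),
    (1, 0, 0): F(1, 8), (1, 1, 4): F(1, 4), (1, 1, 8): F(1, 8),
}

def cond_exp(g, key):
    """Return {k: E[g(X1, X2, Y) | key(X1, X2) = k]}."""
    num, den = {}, {}
    for (x1, x2, y), p in pmf.items():
        k = key(x1, x2)
        num[k] = num.get(k, 0) + p * g(x1, x2, y)
        den[k] = den.get(k, 0) + p
    return {k: num[k] / den[k] for k in den}

def expect(g):
    """Unconditional expectation of g(X1, X2, Y)."""
    return sum(p * g(x1, x2, y) for (x1, x2, y), p in pmf.items())

by_both = lambda x1, x2: (x1, x2)
by_x1 = lambda x1, x2: x1

e_y_both = cond_exp(lambda x1, x2, y: y, by_both)             # E[Y | X1, X2]
e_y_x1 = cond_exp(lambda x1, x2, y: y, by_x1)                 # E[Y | X1]
tower = cond_exp(lambda x1, x2, y: e_y_both[(x1, x2)], by_x1)  # E[E[Y | X1, X2] | X1]

# Property 5: tower property and E[Y] = E[E[Y | X1]].
mean_y = expect(lambda x1, x2, y: y)
mean_of_cond = expect(lambda x1, x2, y: e_y_x1[x1])

# Property 3 with the hypothetical f(x1) = 2*x1 + 1.
f = lambda x1: 2 * x1 + 1
pulled = cond_exp(lambda x1, x2, y: f(x1) * y, by_x1)
print(tower == e_y_x1, mean_y, mean_of_cond, pulled)
```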

蕴含关系.

$$\text{Independence}\Rightarrow\text{Mean independence}\Rightarrow\text{Uncorrelatedness}$$

(均值独立 ⇒ 不相关用到性质 3 和 5。)但两个箭头都不可逆:

$$\text{Uncorrelatedness}\not\Rightarrow\text{Mean independence}\not\Rightarrow\text{Independence}$$

Note

证明(蕴含链与反例) 独立 ⇒ 均值独立:验证候选 \(m^*(\mathbf X)=\mathbb E[Y]\) 满足 (2.3): $$\mathbb E[(Y-\mathbb E[Y])\mathbf 1\{\mathbf X\in\mathbf B\}]\overset{\text{indep}}=\mathbb E[(Y-\mathbb E[Y])]\,\mathbb E[\mathbf 1\{\mathbf X\in\mathbf B\}]=\underbrace{(\mathbb E[Y]-\mathbb E[Y])}_{=0}\mathbb E[\mathbf 1\{\mathbf X\in\mathbf B\}]=0$$ 对 \(\forall\mathbf B\),故 \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\)。 均值独立 ⇒ 不相关:设 \(\mathbb E[Y\mid X]=\mathbb E[Y]\),则 $$\mathrm{Cov}(X,Y)=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]\overset{\text{LIE}}=\mathbb E[\mathbb E[XY\mid X]]-\mathbb E[X]\mathbb E[Y]\overset{\text{Prop 3}}=\mathbb E[X\mathbb E[Y\mid X]]-\mathbb E[X]\mathbb E[Y]=\mathbb E[X]\mathbb E[Y]-\mathbb E[X]\mathbb E[Y]=0$$ (第二个等号用 LIE,第三个等号用性质 3。) 不相关 ⇏ 均值独立(反例):设 \(X\sim N(0,1)\)、\(Y=X^2\)。则 \(\mathrm{Cov}(X,Y)=\mathbb E[X^3]-\mathbb E[X]\mathbb E[X^2]=0-0=0\)(不相关),但 \(\mathbb E[Y\mid X]=\mathbb E[X^2\mid X]=X^2\ne\mathbb E[Y]=1\)(非均值独立)。 均值独立 ⇏ 独立(反例):设 \(Y\mid X\sim N(0,\sigma^2X)\),其中 \(X>0\) 且非退化,则 \(\mathbb E[Y\mid X]=0\) 为常数(均值独立),但条件方差 \(\sigma^2X\) 依赖 \(X\),故 \(Y\) 不独立于 \(X\)。\(\blacksquare\)

Implications.

$$\text{Independence}\Rightarrow\text{Mean independence}\Rightarrow\text{Uncorrelatedness}$$

(Mean independence ⇒ uncorrelatedness uses Properties 3 and 5.) But neither arrow is reversible:

$$\text{Uncorrelatedness}\not\Rightarrow\text{Mean independence}\not\Rightarrow\text{Independence}$$

Note

Proof (implication chain and counterexamples) Independence ⇒ mean independence: verify that the candidate \(m^*(\mathbf X)=\mathbb E[Y]\) satisfies (2.3): $$\mathbb E[(Y-\mathbb E[Y])\mathbf 1\{\mathbf X\in\mathbf B\}]\overset{\text{indep}}=\mathbb E[(Y-\mathbb E[Y])]\,\mathbb E[\mathbf 1\{\mathbf X\in\mathbf B\}]=\underbrace{(\mathbb E[Y]-\mathbb E[Y])}_{=0}\mathbb E[\mathbf 1\{\mathbf X\in\mathbf B\}]=0$$ for all \(\mathbf B\), so \(\mathbb E[Y\mid\mathbf X]=\mathbb E[Y]\). Mean independence ⇒ uncorrelatedness: suppose \(\mathbb E[Y\mid X]=\mathbb E[Y]\); then $$\mathrm{Cov}(X,Y)=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]\overset{\text{LIE}}=\mathbb E[\mathbb E[XY\mid X]]-\mathbb E[X]\mathbb E[Y]\overset{\text{Prop 3}}=\mathbb E[X\mathbb E[Y\mid X]]-\mathbb E[X]\mathbb E[Y]=\mathbb E[X]\mathbb E[Y]-\mathbb E[X]\mathbb E[Y]=0$$ Uncorrelatedness ⇏ mean independence (counterexample): let \(X\sim N(0,1)\) and \(Y=X^2\). Then \(\mathrm{Cov}(X,Y)=\mathbb E[X^3]-\mathbb E[X]\mathbb E[X^2]=0-0=0\) (uncorrelated), but \(\mathbb E[Y\mid X]=\mathbb E[X^2\mid X]=X^2\ne\mathbb E[Y]=1\) (not mean independent). Mean independence ⇏ independence (counterexample): let \(Y\mid X\sim N(0,\sigma^2X)\) with \(X>0\) and \(X\) non-degenerate. Then \(\mathbb E[Y\mid X]=0\) is constant (mean independent), but the conditional variance \(\sigma^2X\) depends on \(X\), so \(Y\) is not independent of \(X\). \(\blacksquare\)
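Both counterexamples can be illustrated by simulation. The sketch below is a rough numerical check, not a proof. The sample size, the seed, the choice \(\sigma=1\), and the distribution \(X\sim\mathrm{Uniform}(0.5,2.5)\) in the second example are assumptions made here so that the variance \(\sigma^2X\) is positive.

```python
import random
from statistics import fmean

random.seed(0)
n = 100_000

# Counterexample 1: X ~ N(0, 1), Y = X^2. Uncorrelated, yet E[Y | X] = X^2 varies with X.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [v * v for v in x]
cov_xy = fmean(a * b for a, b in zip(x, y)) - fmean(x) * fmean(y)
mean_y_small = fmean(b for a, b in zip(x, y) if abs(a) < 1)    # estimates E[Y | |X| < 1]
mean_y_large = fmean(b for a, b in zip(x, y) if abs(a) >= 1)   # estimates E[Y | |X| >= 1]

# Counterexample 2: Y | X ~ N(0, sigma^2 X) with X > 0 (assumed Uniform(0.5, 2.5)).
sigma = 1.0
x2 = [random.uniform(0.5, 2.5) for _ in range(n)]
y2 = [random.gauss(0.0, sigma * v ** 0.5) for v in x2]   # standard deviation sigma * sqrt(X)
low = [b for a, b in zip(x2, y2) if a < 1.5]
high = [b for a, b in zip(x2, y2) if a >= 1.5]

def variance(values):
    """Population-style sample variance: mean of squares minus squared mean."""
    return fmean(v * v for v in values) - fmean(values) ** 2

mean_low, mean_high = fmean(low), fmean(high)     # both close to 0: mean independence
var_low, var_high = variance(low), variance(high)  # clearly different: dependence

print(f"cov(X, Y) = {cov_xy:.3f}; E[Y | small |X|] = {mean_y_small:.2f}, large = {mean_y_large:.2f}")
print(f"means: {mean_low:.3f}, {mean_high:.3f}; variances: {var_low:.2f}, {var_high:.2f}")
```

In the first example the sample covariance is near zero while the two conditional averages differ sharply; in the second the conditional averages are both near zero while the conditional variances differ.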

Tip

小结 条件期望的核心是「投影 / 最优预测」这一几何视角:\(\mathbb E[Y\mid\mathbf X]\) 是 \(Y\) 在 \(\mathbf X\)-可测函数空间上的 \(L^2\) 投影,残差与所有 \(m(\mathbf X)\) 正交(正交条件 I/II)。下一章对线性回归的三种解读,正是把这一投影从「任意函数空间 \(\mathbb M\)」缩小到「\(\mathbf X\) 的线性函数空间」的特例。蕴含链 独立 ⇒ 均值独立 ⇒ 不相关 及其不可逆性,是后续判别「外生性」强弱的基础。

Tip

Summary The core of conditional expectation is the geometric view of "projection / best prediction": \(\mathbb E[Y\mid\mathbf X]\) is the \(L^2\) projection of \(Y\) onto the space of \(\mathbf X\)-measurable functions, with the residual orthogonal to all \(m(\mathbf X)\) (orthogonality conditions I/II). The next chapter's three interpretations of linear regression are exactly the special case of shrinking this projection from "the space of arbitrary functions \(\mathbb M\)" to "the space of linear functions of \(\mathbf X\)." The chain independence ⇒ mean independence ⇒ uncorrelatedness and its irreversibility are the basis for later distinguishing strong vs weak "exogeneity."