1. Asymptotic Theory
本章主题:渐近(大样本)理论。 有限样本性质依赖对分布 \(P\) 的强假设,故转而研究 \(n\to\infty\) 的大样本性质。§1.1 三个原型问题:估计、检验、置信区间。§1.2 依概率收敛(Markov 不等式 → 弱大数定律 WLLN)。§1.3 矩的存在性(Jensen 不等式;高阶矩存在 ⇒ 低阶矩存在;边际依概率收敛 ⇒ 联合依概率收敛;连续映射定理 CMT;一致性 vs 无偏性;经验分布 Glivenko-Cantelli)。§1.4 依矩收敛(\(L^q\) 收敛 ⇒ 依概率收敛,反之不然)。§1.5 依分布收敛(Portmanteau 引理 7 个等价刻画;与依概率收敛的关系;Slutsky 引理;分布版 CMT)。§1.6 收敛概念的比较(几乎必然 a.s.、逐点;收敛强弱次序 p.w.⇒a.s.⇒p⇒d,\(L^q\)⇒p)。§1.7 中心极限定理(一元 + Cramér-Wold + 多元 CLT)。§1.8 假设检验(两类错误;水平一致性;\(p\) 值;置信域;多维检验用 \(\chi^2\);Delta 方法;相关系数与 Cauchy-Schwarz;中位数的极限分布 Berry-Esseen)。§1.9 紧性(\(\to_d\) ⇒ 紧;Prokhorov 定理;\(\tau_n\)-一致性)。§1.10 随机阶(\(o_P(1)\)、\(O_P(1)\) 及其运算法则)。
Chapter theme: asymptotic (large-sample) theory. Finite-sample properties require strong assumptions on the distribution \(P\), so we instead study large-sample properties as \(n\to\infty\). §1.1 Three prototypical problems: estimation, testing, confidence regions. §1.2 Convergence in probability (Markov's inequality → weak law of large numbers, WLLN). §1.3 Existence of moments (Jensen's inequality; higher-order moment exists ⇒ lower-order exists; marginal convergence in probability ⇒ joint convergence; the Continuous Mapping Theorem, CMT; consistency vs unbiasedness; empirical distribution & Glivenko-Cantelli). §1.4 Convergence in moments (\(L^q\) convergence ⇒ convergence in probability, but not conversely). §1.5 Convergence in distribution (Portmanteau lemma's 7 equivalent characterizations; its relation to convergence in probability; Slutsky's lemma; the distributional CMT). §1.6 Comparison of convergence notions (almost sure, point-wise; the ordering p.w.⇒a.s.⇒p⇒d, and \(L^q\)⇒p). §1.7 Central limit theorem (univariate + Cramér-Wold + multivariate CLT). §1.8 Hypothesis testing (two types of error; consistency in level; \(p\)-value; confidence region; multidimensional testing via \(\chi^2\); the delta method; correlation & Cauchy-Schwarz; the limiting distribution of the median via Berry-Esseen). §1.9 Tightness (\(\to_d\) ⇒ tight; Prokhorov's theorem; \(\tau_n\)-consistency). §1.10 Stochastic order (\(o_P(1)\), \(O_P(1)\) and their rules of calculus).
1.1 Three Prototypical Problems
设 \(X_1,X_2,\dots,X_n\) 独立同分布(i.i.d.),其累积分布函数(c.d.f.)为 \(P\)。计量经济学希望从样本中「学到」 \(P\) 的某些特征 \(\theta(P)\)。三个原型问题:
- 估计 \(\theta(P)\):估计量是样本的函数 \(\hat\theta_n=\hat\theta_n(X_1,\dots,X_n)\),给出 \(\theta(P)\) 的「最佳猜测」。
- 检验关于 \(\theta(P)\) 的假设:检验 \(\theta(P)\) 是否落在某预设子集;设计检验 \(\phi_n=\phi_n(X_1,\dots,X_n)\in[0,1]\) 表示拒绝原假设的概率(惯例上 \(\phi_n\) 只取 \(0\) 或 \(1\),\(\phi_n=1\) 拒绝、\(\phi_n=0\) 不拒绝)。
- 构造置信域:构造随机集 \(C_n=C_n(X_1,\dots,X_n)\) 满足
$$\mathbb P\{\theta(P)\in C_n\}=1-\alpha$$
对给定的 \(\alpha\in(0,1)\)。
由于有限样本性质难以在缺乏对 \(P\) 的强假设下研究,本章余下专注大样本性质。
Let \(X_1,X_2,\dots,X_n\) be independent and identically distributed (i.i.d.) with cumulative distribution function (c.d.f.) \(P\). Econometrics hopes to "learn" some feature \(\theta(P)\) of \(P\) from the sample. Three prototypical problems:
- Estimate \(\theta(P)\): an estimator is a function of the sample \(\hat\theta_n=\hat\theta_n(X_1,\dots,X_n)\) that gives a "best guess" of \(\theta(P)\).
- Test a hypothesis about \(\theta(P)\): test whether \(\theta(P)\) lies in some pre-specified subset; design a test \(\phi_n=\phi_n(X_1,\dots,X_n)\in[0,1]\), the probability with which we reject the null (conventionally \(\phi_n\) takes only \(0\) or \(1\), rejecting if \(\phi_n=1\), not rejecting if \(\phi_n=0\)).
- Construct a confidence region: construct a random set \(C_n=C_n(X_1,\dots,X_n)\) satisfying
$$\mathbb P\{\theta(P)\in C_n\}=1-\alpha$$
for a given \(\alpha\in(0,1)\).
Because finite-sample properties are hard to study without strong assumptions on \(P\), the rest of the chapter focuses on large-sample properties.
1.2 Convergence in Probability
定义 1.1(依概率收敛) 随机向量序列 \(\{X_n\in\mathbb R^k:n\ge1\}\) 依概率收敛到随机变量 \(X\in\mathbb R^k\)(记 \(X_n\xrightarrow{p}X\)),若对所有 \(\varepsilon>0\), $$\mathbb P(|X_n-X|>\varepsilon)\to0\quad\text{as }n\to\infty$$ 其中 \(|X_n-X|\) 是欧氏距离。两个等价表述:(i) \(\lim_{n\to\infty}\mathbb P(|X_n-X|>\varepsilon)=0\);(ii) 对任意 \(\delta,\varepsilon>0\),\(\exists N\) 使 \(\mathbb P(|X_n-X|>\varepsilon)\le\delta\) 对 \(\forall n\ge N\)。
一维时还可定义 \(X_n\xrightarrow{p}+\infty\)(即 \(\forall c>0,\mathbb P\{X_n>c\}\to1\)),\(-\infty\) 类似。
1.2.1 Markov 不等式.
定理 1.1(Markov 不等式) 对任意随机变量(向量)\(X\)、\(\forall q>0\)、\(\forall\varepsilon>0\), $$\mathbb P(|X|>\varepsilon)\le\frac{\mathbb E[|X|^q]}{\varepsilon^q}$$ 其中 \(|\cdot|\) 为欧氏范数。
证明(定理 1.1) 利用「\(g(x)\le f(x)\Rightarrow\mathbb E[g(x)]\le\mathbb E[f(x)]\)」。由指示函数定义, $$\mathbf 1\{|X|>\varepsilon\}\le\frac{|X|^q}{\varepsilon^q} \tag{1.1}$$ (当 \(|X|>\varepsilon\) 时左边 $=1$、右边 $>1$;当 \(|X|\le\varepsilon\) 时左边 $=0$、右边 \(\ge0\),故恒成立。)两边取期望: $$\mathbb E[\mathbf 1\{|X|>\varepsilon\}]=\mathbb P(|X|>\varepsilon)\le\frac{\mathbb E[|X|^q]}{\varepsilon^q}\quad\blacksquare$$
Markov 不等式不仅用于证明 WLLN,也可用来构造(较「松」的)置信区间。
Definition 1.1 (Convergence in Probability) A sequence of random vectors \(\{X_n\in\mathbb R^k:n\ge1\}\) converges in probability to a random variable \(X\in\mathbb R^k\) (denoted \(X_n\xrightarrow{p}X\)) if for all \(\varepsilon>0\), $$\mathbb P(|X_n-X|>\varepsilon)\to0\quad\text{as }n\to\infty$$ where \(|X_n-X|\) is the Euclidean distance. Two equivalent statements: (i) \(\lim_{n\to\infty}\mathbb P(|X_n-X|>\varepsilon)=0\); (ii) for any \(\delta,\varepsilon>0\), \(\exists N\) with \(\mathbb P(|X_n-X|>\varepsilon)\le\delta\) for all \(n\ge N\).
In one dimension one can also define \(X_n\xrightarrow{p}+\infty\) (i.e. \(\forall c>0,\mathbb P\{X_n>c\}\to1\)), with \(-\infty\) analogous.
1.2.1 Markov's inequality.
Theorem 1.1 (Markov's inequality) For any random variable (vector) \(X\), \(\forall q>0\), \(\forall\varepsilon>0\), $$\mathbb P(|X|>\varepsilon)\le\frac{\mathbb E[|X|^q]}{\varepsilon^q}$$ where \(|\cdot|\) is the Euclidean norm.
Proof (Theorem 1.1) Use "\(g(x)\le f(x)\Rightarrow\mathbb E[g(x)]\le\mathbb E[f(x)]\)". By the definition of the indicator function, $$\mathbf 1\{|X|>\varepsilon\}\le\frac{|X|^q}{\varepsilon^q} \tag{1.1}$$ (when \(|X|>\varepsilon\) the LHS $=1$ and the RHS $>1$; when \(|X|\le\varepsilon\) the LHS $=0$ and the RHS \(\ge0\), so it always holds.) Taking expectations on both sides: $$\mathbb E[\mathbf 1\{|X|>\varepsilon\}]=\mathbb P(|X|>\varepsilon)\le\frac{\mathbb E[|X|^q]}{\varepsilon^q}\quad\blacksquare$$
Markov's inequality is used not only to prove the WLLN but also to construct ("loose") confidence intervals.
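As a quick numerical sanity check of Theorem 1.1, the sketch below compares the empirical tail frequency of \(|X|>\varepsilon\) with the empirical Markov bound for several exponents \(q\). The standard normal sample, the seed, and the helper name `markov_check` are illustrative assumptions, not part of the notes. Because (1.1) holds pointwise, the inequality also holds exactly under the empirical distribution of any sample.

```python
import random

def markov_check(sample, eps, q):
    """Return (empirical frequency of |x| > eps, empirical Markov bound E|x|^q / eps^q)."""
    n = len(sample)
    tail = sum(1 for x in sample if abs(x) > eps) / n
    bound = sum(abs(x) ** q for x in sample) / n / eps ** q
    return tail, bound

rng = random.Random(0)  # fixed seed (arbitrary choice) for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # assumed N(0,1) data

for q in (1, 2, 4):
    tail, bound = markov_check(sample, eps=2.0, q=q)
    print(f"q={q}: tail frequency {tail:.4f} <= bound {bound:.4f}")
```

The bound is typically far from tight, which is why confidence intervals built from it are "loose".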
1.2.2 弱大数定律.
定义 1.2(i.i.d.) \(\{X_n:n\ge1\}\) 独立分布若 \(\mathbb P(X_1\le x_1,\dots,X_n\le x_n)=\prod_{i=1}^n\mathbb P(X_i\le x_i)\);同分布若 \(\mathbb P(X_i\le x)=\mathbb P(X_j\le x),\forall i,j\)。二者皆满足即 i.i.d.。
定理 1.2(弱大数定律 WLLN) 设 \(\{X_n:n\ge1\}\) 为 \(\mathbb R\) 上 i.i.d. 序列、c.d.f. \(P\),且 \(\mu(P)\) 存在(\(\mathbb E[|X_i|]<\infty\)),则 $$\bar X_n\xrightarrow{p}\mu(P)$$ 其中样本均值 \(\bar X_n=\frac1n\sum_{i=1}^nX_i\)。(强大数定律即 \(\bar X_n\xrightarrow{a.s.}\mu(P)\)。)
证明(定理 1.2,附加 \(\sigma^2(P)<\infty\)) 进一步假设 \(\sigma^2(P)<\infty\)。由 Markov 不等式(\(q=2\)): $$\mathbb P(|\bar X_n-\mu(P)|>\varepsilon)\le\frac{\mathbb E[(\bar X_n-\mu(P))^2]}{\varepsilon^2}=\frac{\mathrm{Var}(\bar X_n)}{\varepsilon^2}=\frac{1}{n\varepsilon^2}\mathrm{Var}(X_i)\to0 \tag{1.2}$$ 故 \(\bar X_n\xrightarrow{p}\mu(P)\)。\(\blacksquare\)
1.2.2 Weak law of large numbers.
Definition 1.2 (i.i.d.) \(\{X_n:n\ge1\}\) are independently distributed if \(\mathbb P(X_1\le x_1,\dots,X_n\le x_n)=\prod_{i=1}^n\mathbb P(X_i\le x_i)\); identically distributed if \(\mathbb P(X_i\le x)=\mathbb P(X_j\le x),\forall i,j\). Satisfying both is i.i.d.
Theorem 1.2 (Weak Law of Large Numbers, WLLN) Let \(\{X_n:n\ge1\}\) be an i.i.d. sequence on \(\mathbb R\) with c.d.f. \(P\), and suppose \(\mu(P)\) exists (\(\mathbb E[|X_i|]<\infty\)). Then $$\bar X_n\xrightarrow{p}\mu(P)$$ where the sample mean is \(\bar X_n=\frac1n\sum_{i=1}^nX_i\). (The strong law is \(\bar X_n\xrightarrow{a.s.}\mu(P)\).)
Proof (Theorem 1.2, assuming additionally \(\sigma^2(P)<\infty\)) Further assume \(\sigma^2(P)<\infty\). By Markov's inequality (\(q=2\)): $$\mathbb P(|\bar X_n-\mu(P)|>\varepsilon)\le\frac{\mathbb E[(\bar X_n-\mu(P))^2]}{\varepsilon^2}=\frac{\mathrm{Var}(\bar X_n)}{\varepsilon^2}=\frac{1}{n\varepsilon^2}\mathrm{Var}(X_i)\to0 \tag{1.2}$$ so \(\bar X_n\xrightarrow{p}\mu(P)\). \(\blacksquare\)
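The bound (1.2) can be compared with a Monte Carlo estimate of \(\mathbb P(|\bar X_n-\mu(P)|>\varepsilon)\). The sketch below assumes Uniform(0,1) data (so \(\mu(P)=\frac12\), \(\mathrm{Var}(X_i)=\frac1{12}\)); the distribution, seed, replication count, and the helper name `tail_frequency` are illustrative choices only.

```python
import random

def tail_frequency(n, eps, reps, rng):
    """Monte Carlo estimate of P(|mean of n Uniform(0,1) draws - 0.5| > eps)."""
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            hits += 1
    return hits / reps

rng = random.Random(1)       # arbitrary fixed seed
eps, var = 0.1, 1.0 / 12.0   # Var of Uniform(0,1) is 1/12
results = {}
for n in (10, 100, 1000):
    freq = tail_frequency(n, eps, reps=500, rng=rng)
    bound = var / (n * eps ** 2)  # right-hand side of (1.2)
    results[n] = (freq, bound)
    print(f"n={n:5d}  simulated tail={freq:.3f}  bound (1.2)={bound:.3f}")
```

Both columns shrink as \(n\) grows; the simulated frequency sits well below the bound, since the bound only uses the second moment.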
1.3 Existence of Moments
定义 1.3(均值存在) 随机向量 \(X\) 的均值 \(\mathbb E[X]\) 存在 若 \(\mathbb E[|X|]<\infty\)。对 \(\mathbb R\) 上随机变量,分解正负部 \(X^+\equiv\max\{X,0\}\)、\(X^-\equiv\max\{-X,0\}\),则 \(\mathbb E[X]=\mathbb E[X^+]-\mathbb E[X^-]\),且 \(\mathbb E[|X|]=\mathbb E[X^+]+\mathbb E[X^-]\)。故 \(\mathbb E[|X|]<\infty\iff\mathbb E[X^+]<\infty\) 且 \(\mathbb E[X^-]<\infty\)。
1.3.1 Jensen 不等式.
定理 1.3(Jensen 不等式) 设 \(I\subseteq\mathbb R\) 为凸集、\(f:I\to\mathbb R\) 为凸函数。则对任意满足 \(\mathbb P(X\in I)=1\)、\(\mathbb E[|X|]<\infty\)、\(\mathbb E[|f(X)|]<\infty\) 的随机变量 \(X\), $$f(\mathbb E[X])\le\mathbb E[f(X)]$$
证明(定理 1.3)
令 \(c=\mathbb E[X]\)(设 \(c\in\mathrm{int}(I)\),端点情形显然)。由凸性,左右差商 \(\Delta_{+,h}(c)=\frac{f(c+h)-f(c)}{h}\)、\(\Delta_{-,h}(c)=\frac{f(c)-f(c-h)}{h}\) 当 \(h\downarrow0\) 单调有界,极限 \(D_+(c)\ge D_-(c)\)。任取 \(m\in[D_-(c),D_+(c)]\),定义支撑线 \(L(x)\equiv f(c)+m(x-c)\)。可证 \(L(x)\le f(x)\) 恒成立(分 \(x=c\)、\(x>c\)、\(x<c\) 三种情形)。两边取期望:\(\mathbb E[f(X)]\ge\mathbb E[L(X)]=f(c)+m(\mathbb E[X]-c)=f(\mathbb E[X])\)。\(\blacksquare\)
Definition 1.3 (Existence of the mean) The mean \(\mathbb E[X]\) of a random vector \(X\) exists if \(\mathbb E[|X|]<\infty\). For a random variable on \(\mathbb R\), decompose into positive/negative parts \(X^+\equiv\max\{X,0\}\), \(X^-\equiv\max\{-X,0\}\); then \(\mathbb E[X]=\mathbb E[X^+]-\mathbb E[X^-]\) and \(\mathbb E[|X|]=\mathbb E[X^+]+\mathbb E[X^-]\). So \(\mathbb E[|X|]<\infty\iff\mathbb E[X^+]<\infty\) and \(\mathbb E[X^-]<\infty\).
1.3.1 Jensen's inequality.
Theorem 1.3 (Jensen's inequality) Let \(I\subseteq\mathbb R\) be a convex set and \(f:I\to\mathbb R\) a convex function. Then for any random variable \(X\) with \(\mathbb P(X\in I)=1\), \(\mathbb E[|X|]<\infty\), \(\mathbb E[|f(X)|]<\infty\), $$f(\mathbb E[X])\le\mathbb E[f(X)]$$
Proof (Theorem 1.3)
Let \(c=\mathbb E[X]\) (assume \(c\in\mathrm{int}(I)\); the boundary case is obvious). By convexity, the difference quotients \(\Delta_{+,h}(c)=\frac{f(c+h)-f(c)}{h}\), \(\Delta_{-,h}(c)=\frac{f(c)-f(c-h)}{h}\) are monotone and bounded as \(h\downarrow0\), with limits \(D_+(c)\ge D_-(c)\). Pick any \(m\in[D_-(c),D_+(c)]\) and define the supporting line \(L(x)\equiv f(c)+m(x-c)\). One shows \(L(x)\le f(x)\) for every \(x\in I\) (split into the cases \(x=c\), \(x>c\), \(x<c\)). Taking expectations on both sides: \(\mathbb E[f(X)]\ge\mathbb E[L(X)]=f(c)+m(\mathbb E[X]-c)=f(\mathbb E[X])\). \(\blacksquare\)
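Jensen's inequality applies to any distribution, including the empirical distribution of a sample, so it can be checked numerically. In the sketch below the Uniform(-1,3) sample, the three convex functions, and the helper name `jensen_gap` are illustrative assumptions; the gap \(\mathbb E[f(X)]-f(\mathbb E[X])\) should come out nonnegative for each convex \(f\).

```python
import math
import random

def jensen_gap(sample, f):
    """Return E_n[f(X)] - f(E_n[X]) under the empirical distribution of `sample`."""
    n = len(sample)
    mean = sum(sample) / n
    return sum(f(x) for x in sample) / n - f(mean)

rng = random.Random(2)  # arbitrary fixed seed
sample = [rng.uniform(-1.0, 3.0) for _ in range(1000)]

convex_functions = (("x^2", lambda x: x * x), ("exp(x)", math.exp), ("|x|", abs))
gaps = {name: jensen_gap(sample, f) for name, f in convex_functions}
for name, gap in gaps.items():
    print(f"f(x) = {name}: E[f(X)] - f(E[X]) = {gap:.4f}")
```

For \(f(x)=x^2\) the gap is exactly the (biased) sample variance, which gives a concrete reading of the inequality.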
1.3.2 高阶矩 ⇒ 低阶矩.
定义 1.4 / 命题 1.1 原点矩 \(\mathbb E[X^k]\) 为 \(X\) 的 \(k\) 阶原点矩;中心矩 \(\mathbb E[(X-\mathbb E[X])^k]\) 为 \(k\) 阶中心矩。命题 1.1:高阶矩存在 ⇒ 低阶矩存在,即对 \(\forall1\le j\le k\), $$\mathbb E[|X|^k]<\infty\Rightarrow\mathbb E[|X|^j]<\infty$$
证明(命题 1.1,截断技巧) 设 \(1\le j\le k\),\(f(x)=x^{k/j}\)(在 \(x\ge0\) 上凸且递增)。定义 \(Y=|X|^j\)、截断 \(Y_n=\min\{|X|^j,n\}\)。先证 \(\mathbb E[|f(Y_n)|]<\infty\) 且 \(\mathbb E[|Y_n|]<\infty\)。由 \(Y_n\le Y\)、\(f\) 递增、\(f(Y_n)\ge0\): $$\mathbb E[|f(Y_n)|]\le\mathbb E[|f(Y)|]=\mathbb E[(|X|^j)^{k/j}]=\mathbb E[|X|^k]<\infty \tag{1.3}$$ 再由 \(f(x)\ge x\)(\(x\ge1\))、\(f(x)\le x\)(\(0\le x\le1\))把 \(\mathbb E[|Y_n|]\) 拆两部分:\(\mathbb E[|Y_n|]\le1+\mathbb E[f(Y_n)]\le1+\mathbb E[|X|^k]<\infty\),此上界与 \(n\) 无关。由于 \(Y_n\uparrow Y\),单调收敛定理给出 \(\mathbb E[|X|^j]=\mathbb E[Y]=\lim_{n\to\infty}\mathbb E[Y_n]\le1+\mathbb E[|X|^k]<\infty\)。\(\blacksquare\)
1.3.2 Higher-order ⇒ lower-order moments.
Definition 1.4 / Proposition 1.1 The raw moment \(\mathbb E[X^k]\) is the \(k\)th raw moment of \(X\); the centered moment \(\mathbb E[(X-\mathbb E[X])^k]\) is the \(k\)th centered moment. Proposition 1.1: existence of a higher-order moment implies existence of a lower-order moment, i.e. for \(\forall1\le j\le k\), $$\mathbb E[|X|^k]<\infty\Rightarrow\mathbb E[|X|^j]<\infty$$
Proof (Proposition 1.1, truncation trick) Let \(1\le j\le k\) and \(f(x)=x^{k/j}\) (convex and increasing on \(x\ge0\)). Define \(Y=|X|^j\) and the truncation \(Y_n=\min\{|X|^j,n\}\). First show \(\mathbb E[|f(Y_n)|]<\infty\) and \(\mathbb E[|Y_n|]<\infty\). Since \(Y_n\le Y\), \(f\) is increasing, and \(f(Y_n)\ge0\): $$\mathbb E[|f(Y_n)|]\le\mathbb E[|f(Y)|]=\mathbb E[(|X|^j)^{k/j}]=\mathbb E[|X|^k]<\infty \tag{1.3}$$ Then, using \(f(x)\ge x\) (\(x\ge1\)) and \(f(x)\le x\) (\(0\le x\le1\)), split \(\mathbb E[|Y_n|]\) into two parts: \(\mathbb E[|Y_n|]\le1+\mathbb E[f(Y_n)]\le1+\mathbb E[|X|^k]<\infty\), a bound that does not depend on \(n\). Since \(Y_n\uparrow Y\), the monotone convergence theorem gives \(\mathbb E[|X|^j]=\mathbb E[Y]=\lim_{n\to\infty}\mathbb E[Y_n]\le1+\mathbb E[|X|^k]<\infty\). \(\blacksquare\)
1.3.3 边际依概率收敛 ⇒ 联合依概率收敛.
定理 1.4 设随机向量序列 \(X_n=(X_{n,1},\dots,X_{n,k})\) 与 \(X=(X_1,\dots,X_k)\) 同在 \(\mathbb R^k\)。若 \(X_{n,j}\xrightarrow{p}X_j\) 对 \(\forall1\le j\le k\),则 \(X_n\xrightarrow{p}X\)。(此命题把 WLLN 从一维推广到多维。)
证明(定理 1.4) 由欧氏距离定义及并集上界: $$\mathbb P(|X_n-X|>\varepsilon)=\mathbb P\Big(\sum_{i=1}^k(X_{n,i}-X_i)^2>\varepsilon^2\Big)\le\mathbb P\Big(\bigcup_{i=1}^k\{(X_{n,i}-X_i)^2>\tfrac{\varepsilon^2}{k}\}\Big)\le\sum_{i=1}^k\mathbb P\Big(|X_{n,i}-X_i|>\tfrac{\varepsilon}{\sqrt k}\Big)\to0$$ 故 \(X_n\xrightarrow{p}X\)。\(\blacksquare\)
1.3.4 连续映射定理(依概率).
定理 1.5(连续映射定理 CMT,依概率) 设 \(\{X_n:n\ge1\}\) 与 \(X\) 为 \(\mathbb R^k\) 上随机向量,\(g:\mathbb R^k\to\mathbb R^d\) 在集合 \(C\subseteq\mathbb R^k\) 上每点连续且 \(\mathbb P(X\in C)=1\)。则 \(X_n\xrightarrow{p}X\Rightarrow g(X_n)\xrightarrow{p}g(X)\)。
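证明(定理 1.5) 固定 \(\varepsilon>0\)。对 \(\delta>0\),定义坏点集 \(B_\delta\equiv\{x:\exists y,|x-y|<\delta\text{ s.t. }|g(x)-g(y)|>\varepsilon\}\)(存在 \(\delta\) 邻域内的点,其像与 \(g(x)\) 相距超过 \(\varepsilon\))。则 $$\mathbb P(|g(X_n)-g(X)|>\varepsilon)=\mathbb P(\{\dots\}\cap\{X\notin B_\delta\})+\mathbb P(\{\dots\}\cap\{X\in B_\delta\}) \tag{1.4}$$ 其中 \(\{\dots\}\) 表示 \(\{|g(X_n)-g(X)|>\varepsilon\}\)。在 \(\{X\notin B_\delta\}\) 上,\(|g(X_n)-g(X)|>\varepsilon\) 迫使 \(|X_n-X|\ge\delta\),故第一项 \(\le\mathbb P(|X_n-X|\ge\delta)\to0\)(固定 \(\delta\),\(n\to\infty\))。第二项 \(\le\mathbb P(X\in B_\delta)=\mathbb P(X\in B_\delta\cap C)\);由 \(g\) 在 \(C\) 上连续,\(B_\delta\cap C\downarrow\emptyset\)(\(\delta\downarrow0\)),故该项趋于 \(0\)。先令 \(n\to\infty\) 再令 \(\delta\downarrow0\),得 \(g(X_n)\xrightarrow{p}g(X)\)。\(\blacksquare\)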
Remark 1.1 用 CMT 须检查所关注点的连续性。反例:\(g(x)=\mathbf 1\{x>0\}\),取 \(X_n=\frac1n,X=0\),虽 \(X_n\xrightarrow{p}X\),但 \(g(X_n)=1\not\xrightarrow{p}0=g(X)\)(\(g\) 在 \(0\) 不连续)。
1.3.3 Marginal convergence in probability ⇒ joint convergence.
Theorem 1.4 Let the random vectors \(X_n=(X_{n,1},\dots,X_{n,k})\) and \(X=(X_1,\dots,X_k)\) take values in \(\mathbb R^k\). If \(X_{n,j}\xrightarrow{p}X_j\) for all \(1\le j\le k\), then \(X_n\xrightarrow{p}X\). (This extends the WLLN from one dimension to several.)
Proof (Theorem 1.4) By the definition of Euclidean distance and the union bound: $$\mathbb P(|X_n-X|>\varepsilon)=\mathbb P\Big(\sum_{i=1}^k(X_{n,i}-X_i)^2>\varepsilon^2\Big)\le\mathbb P\Big(\bigcup_{i=1}^k\{(X_{n,i}-X_i)^2>\tfrac{\varepsilon^2}{k}\}\Big)\le\sum_{i=1}^k\mathbb P\Big(|X_{n,i}-X_i|>\tfrac{\varepsilon}{\sqrt k}\Big)\to0$$ so \(X_n\xrightarrow{p}X\). \(\blacksquare\)
1.3.4 Continuous Mapping Theorem (in probability).
Theorem 1.5 (Continuous Mapping Theorem, CMT, in probability) Let \(\{X_n:n\ge1\}\) and \(X\) be random vectors on \(\mathbb R^k\), and \(g:\mathbb R^k\to\mathbb R^d\) be continuous at each point in a set \(C\subseteq\mathbb R^k\) with \(\mathbb P(X\in C)=1\). Then \(X_n\xrightarrow{p}X\Rightarrow g(X_n)\xrightarrow{p}g(X)\).
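Proof (Theorem 1.5) Fix \(\varepsilon>0\). For \(\delta>0\), let \(B_\delta\equiv\{x:\exists y,|x-y|<\delta\text{ s.t. }|g(x)-g(y)|>\varepsilon\}\) be the bad set of points that have a \(\delta\)-close neighbour whose image under \(g\) is more than \(\varepsilon\) away. Then $$\mathbb P(|g(X_n)-g(X)|>\varepsilon)=\mathbb P(\{\dots\}\cap\{X\notin B_\delta\})+\mathbb P(\{\dots\}\cap\{X\in B_\delta\}) \tag{1.4}$$ where \(\{\dots\}\) abbreviates \(\{|g(X_n)-g(X)|>\varepsilon\}\). On \(\{X\notin B_\delta\}\), \(|g(X_n)-g(X)|>\varepsilon\) forces \(|X_n-X|\ge\delta\), so the first term is \(\le\mathbb P(|X_n-X|\ge\delta)\to0\) as \(n\to\infty\) for each fixed \(\delta\). The second term is \(\le\mathbb P(X\in B_\delta)=\mathbb P(X\in B_\delta\cap C)\), which tends to \(0\) as \(\delta\downarrow0\) because continuity of \(g\) on \(C\) gives \(B_\delta\cap C\downarrow\emptyset\). Letting \(n\to\infty\) and then \(\delta\downarrow0\) yields \(g(X_n)\xrightarrow{p}g(X)\). \(\blacksquare\)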
Remark 1.1 Using the CMT requires checking continuity at the point of interest. Counterexample: \(g(x)=\mathbf 1\{x>0\}\), take \(X_n=\frac1n,X=0\); though \(X_n\xrightarrow{p}X\), we have \(g(X_n)=1\not\xrightarrow{p}0=g(X)\) (\(g\) is discontinuous at \(0\)).
1.3.5 一致性.
定义 1.5 / 1.6 估计量 \(\hat\mu(P)\) 是 \(\mu(P)\) 的一致估计量若 \(\hat\mu(P)\xrightarrow{p}\mu(P)\);是无偏估计量若 \(\mathbb E[\hat\mu(P)]=\mu(P)\)。
Remark 1.2 无偏与一致密切但不等价:无偏非渐近性质、不蕴含一致;一致也不蕴含无偏。
例 1.1(无偏且一致):由 WLLN,\(\bar X_n\xrightarrow{p}\mathbb E[X_i]\)(一致),且 \(\mathbb E[\bar X_n]=\frac1n\sum\mathbb E[X_i]=\mathbb E[X_i]\)(无偏)。有偏但一致:\(\bar X_n+\frac1n\),\(\mathbb E[\bar X_n+\frac1n]=\mathbb E[X_i]+\frac1n\ne\mathbb E[X_i]\)(有偏),但由边际收敛 ⇒ 联合收敛 + CMT,\(\bar X_n+\frac1n\xrightarrow{p}\mathbb E[X_i]\)(一致)。无偏但不一致:取 \(X_1\) 作估计量,\(\mathbb E[X_1]=\mathbb E[X_i]\) 故无偏,但 \(X_1\) 不随 \(n\) 变化,\(\mathbb P(|X_1-\mathbb E[X_i]|>\varepsilon)\) 与 \(n\) 无关;除非 \(X_1\) 退化,它不趋于 \(0\),故不一致。
例 1.2(\(\sigma^2\) 的一致估计):\(s_n^2=\frac1{n-1}\sum(X_i-\bar X_n)^2\)。代数化简 \(s_n^2=\frac{n}{n-1}[\frac1n\sum X_i^2-\bar X_n^2]\),令 \(g(a,b,c)=a(b-c^2)\),由 \(\frac1n\sum X_i^2\xrightarrow{p}\mathbb E[X^2]\)、\(\bar X_n\xrightarrow{p}\mathbb E[X]\)、CMT:\(s_n^2\xrightarrow{p}\mathbb E[X^2]-\mathbb E[X]^2=\sigma^2(P)\)。
例 1.3(经验分布):经验分布 \(\hat F_n(x)=\frac1n\sum_{i=1}^n\mathbf 1\{X_i\le x\}\) 是 \(P\) 的自然估计量。对每个 \(x\),由 WLLN \(\hat F_n(x)\xrightarrow{p}\mathbb E[\mathbf 1\{X\le x\}]=F(x)\)。更强地,Glivenko-Cantelli 定理:\(\sup_{x\in\mathbb R}|\hat F_n(x)-F(x)|\xrightarrow{p}0\)(一致收敛)。
1.3.5 Consistency.
Definitions 1.5 / 1.6 An estimator \(\hat\mu(P)\) is a consistent estimator of \(\mu(P)\) if \(\hat\mu(P)\xrightarrow{p}\mu(P)\); an unbiased estimator if \(\mathbb E[\hat\mu(P)]=\mu(P)\).
Remark 1.2 Unbiasedness and consistency are closely related but not equivalent: unbiasedness is not asymptotic and does not imply consistency; consistency does not imply unbiasedness.
Example 1.1 (unbiased and consistent): by the WLLN, \(\bar X_n\xrightarrow{p}\mathbb E[X_i]\) (consistent), and \(\mathbb E[\bar X_n]=\frac1n\sum\mathbb E[X_i]=\mathbb E[X_i]\) (unbiased). Biased but consistent: \(\bar X_n+\frac1n\) has \(\mathbb E[\bar X_n+\frac1n]=\mathbb E[X_i]+\frac1n\ne\mathbb E[X_i]\) (biased), but by marginal ⇒ joint convergence + CMT, \(\bar X_n+\frac1n\xrightarrow{p}\mathbb E[X_i]\) (consistent). Unbiased but not consistent: take \(X_1\) as the estimator; \(\mathbb E[X_1]=\mathbb E[X_i]\), so it is unbiased, but \(X_1\) does not change with \(n\), so \(\mathbb P(|X_1-\mathbb E[X_i]|>\varepsilon)\) is constant in \(n\) and does not tend to \(0\) unless \(X_1\) is degenerate.
Example 1.2 (consistent estimator of \(\sigma^2\)): \(s_n^2=\frac1{n-1}\sum(X_i-\bar X_n)^2\). Algebraically \(s_n^2=\frac{n}{n-1}[\frac1n\sum X_i^2-\bar X_n^2]\); with \(g(a,b,c)=a(b-c^2)\), and \(\frac1n\sum X_i^2\xrightarrow{p}\mathbb E[X^2]\), \(\bar X_n\xrightarrow{p}\mathbb E[X]\), CMT gives \(s_n^2\xrightarrow{p}\mathbb E[X^2]-\mathbb E[X]^2=\sigma^2(P)\).
Example 1.3 (empirical distribution): the empirical distribution \(\hat F_n(x)=\frac1n\sum_{i=1}^n\mathbf 1\{X_i\le x\}\) is a natural estimator of \(P\). For each \(x\), the WLLN gives \(\hat F_n(x)\xrightarrow{p}\mathbb E[\mathbf 1\{X\le x\}]=F(x)\). More strongly, the Glivenko-Cantelli theorem: \(\sup_{x\in\mathbb R}|\hat F_n(x)-F(x)|\xrightarrow{p}0\) (uniform convergence).
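The uniform convergence in Example 1.3 can be illustrated by computing \(\sup_x|\hat F_n(x)-F(x)|\) for growing \(n\). The sketch assumes Uniform(0,1) data, so that \(F(x)=x\) on \([0,1]\); the distribution, seed, and the helper name `sup_distance_uniform` are illustrative choices.

```python
import random

def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| when F is the Uniform(0,1) c.d.f., F(x) = x on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic, so the
    # supremum is attained just before or at one of these jump points.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(3)  # arbitrary fixed seed
distances = {}
for n in (100, 1000, 10000):
    distances[n] = sup_distance_uniform([rng.random() for _ in range(n)])
    print(f"n={n:6d}  sup|F_n - F| = {distances[n]:.4f}")
```

The distance typically shrinks at roughly the \(1/\sqrt n\) rate, which is faster than what the pointwise WLLN statement alone reveals.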
1.4 Convergence in Moments
定义 1.7 / 命题 1.2 / 命题 1.3 定义 1.7(依 \(q\) 阶矩收敛):\(X_n\xrightarrow{L^q}X\)(\(q\ge1\))若 \(\mathbb E[|X_n-X|^q]\to0\)。命题 1.2:依矩收敛 ⇒ 依概率收敛(\(X_n\xrightarrow{L^q}X\Rightarrow X_n\xrightarrow{p}X\))。命题 1.3:高阶矩收敛 ⇒ 低阶矩收敛,即对 \(1\le j\le k\),\(\mathbb E[|X_n-X|^k]\to0\Rightarrow\mathbb E[|X_n-X|^j]\to0\)。
证明(命题 1.2 与 1.3) 命题 1.2:由 Markov,\(\mathbb P(|X_n-X|>\varepsilon)\le\frac{\mathbb E[|X_n-X|^q]}{\varepsilon^q}\to0\)。\(\blacksquare\) 命题 1.3:设 \(\mathbb E[|X_n-X|^k]\to0\),则大 \(n\) 时 \(\mathbb E[|X_n-X|^k]<\infty\)。取凹函数 \(f(x)=x^{j/k}\),由命题 1.1 知 \(\mathbb E[|X_n-X|^j]<\infty\)。由 Jensen(凹号反向):\(0\le\mathbb E[|X_n-X|^j]=\mathbb E[f(|X_n-X|^k)]\le f(\mathbb E[|X_n-X|^k])=(\mathbb E[|X_n-X|^k])^{j/k}\to0\)。\(\blacksquare\)
例 1.4(依概率收敛 不 蕴含依矩收敛):取 \(q=1,X=0\),\(X_n=n\) 概率 \(\frac1n\)、\(X_n=0\) 概率 \(1-\frac1n\)。则对 \(n>\varepsilon\),\(\mathbb P(|X_n-0|>\varepsilon)=\frac1n\to0\)(\(X_n\xrightarrow{p}X\)),但 \(\mathbb E[|X_n-X|]=n\cdot\frac1n=1\not\to0\)。(若 \(X_n\) 一致有界,则依概率收敛可以推出依矩收敛。)
Definition 1.7 / Proposition 1.2 / Proposition 1.3 Definition 1.7 (convergence in \(q\)th moment): \(X_n\xrightarrow{L^q}X\) (\(q\ge1\)) if \(\mathbb E[|X_n-X|^q]\to0\). Proposition 1.2: convergence in moments ⇒ convergence in probability (\(X_n\xrightarrow{L^q}X\Rightarrow X_n\xrightarrow{p}X\)). Proposition 1.3: convergence in higher moments ⇒ in lower moments, i.e. for \(1\le j\le k\), \(\mathbb E[|X_n-X|^k]\to0\Rightarrow\mathbb E[|X_n-X|^j]\to0\).
Proof (Propositions 1.2 and 1.3) Prop 1.2: by Markov, \(\mathbb P(|X_n-X|>\varepsilon)\le\frac{\mathbb E[|X_n-X|^q]}{\varepsilon^q}\to0\). \(\blacksquare\) Prop 1.3: suppose \(\mathbb E[|X_n-X|^k]\to0\), so for large \(n\), \(\mathbb E[|X_n-X|^k]<\infty\). Take the concave \(f(x)=x^{j/k}\); Proposition 1.1 gives \(\mathbb E[|X_n-X|^j]<\infty\). By Jensen (reversed for concave): \(0\le\mathbb E[|X_n-X|^j]=\mathbb E[f(|X_n-X|^k)]\le f(\mathbb E[|X_n-X|^k])=(\mathbb E[|X_n-X|^k])^{j/k}\to0\). \(\blacksquare\)
Example 1.4 (convergence in probability does not imply convergence in moments): take \(q=1,X=0\), and let \(X_n=n\) with probability \(\frac1n\) and \(X_n=0\) with probability \(1-\frac1n\). Then for \(n>\varepsilon\), \(\mathbb P(|X_n-0|>\varepsilon)=\frac1n\to0\) (\(X_n\xrightarrow{p}X\)), but \(\mathbb E[|X_n-X|]=n\cdot\frac1n=1\not\to0\). (If the \(X_n\) are uniformly bounded, convergence in probability does imply convergence in moments.)
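The two quantities in Example 1.4 can be computed exactly rather than simulated. The sketch below uses exact rational arithmetic; the function name `example_1_4` and the default \(\varepsilon=\frac12\) are illustrative assumptions.

```python
from fractions import Fraction

def example_1_4(n, eps=Fraction(1, 2)):
    """For X_n = n w.p. 1/n and 0 otherwise, return (P(|X_n| > eps), E|X_n|) exactly."""
    p = Fraction(1, n)                     # P(X_n = n)
    tail = p if n > eps else Fraction(0)   # only the value n can exceed eps
    first_moment = n * p                   # n * (1/n) + 0 * (1 - 1/n)
    return tail, first_moment

for n in (1, 10, 100, 1000):
    tail, first_moment = example_1_4(n)
    print(f"n={n:4d}  P(|X_n|>eps)={tail}  E|X_n|={first_moment}")
```

The tail probability shrinks like \(1/n\) while the first moment stays at 1: the rare value \(n\) is just large enough to keep the mean from vanishing.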
1.5 Convergence in Distribution
定义 1.8(依分布收敛) \(X_n\xrightarrow{d}X\) 若 \(\mathbb P(X_n\le x)\to\mathbb P(X\le x)\) 对所有使 \(\mathbb P(X\le x)\) 连续的 \(x\)。(也称依律收敛、弱收敛。)
定理 1.6(Portmanteau 引理) 下列对依分布收敛的刻画等价:(1) \(X_n\xrightarrow{d}X\);(2) \(\mathbb E[f(X_n)]\to\mathbb E[f(X)]\) 对所有有界连续 \(f\);(3) 对所有有界 Lipschitz \(f\);(4) 对所有有界一致连续 \(f\);(5) \(\liminf\mathbb P(X_n\in G)\ge\mathbb P(X\in G)\) 对开集 \(G\);(6) \(\limsup\mathbb P(X_n\in F)\le\mathbb P(X\in F)\) 对闭集 \(F\);(7) \(\mathbb P(X_n\in B)\to\mathbb P(X\in B)\) 对所有满足 \(\mathbb P(X\in\partial B)=0\) 的 Borel 集 \(B\)。
例 1.5(退化):\(X_n=\frac1n,X=0\)。\(x\ne0\) 时 \(\mathbb P(X_n\le x)\to\mathbb P(X\le x)\);在 \(x=0\) 处 \(\mathbb P(X_n\le0)=0\,\forall n\) 而 \(\mathbb P(X\le0)=1\),c.d.f. 在该点不收敛。由于 \(x=0\) 是 \(x\mapsto\mathbb P(X\le x)\) 的间断点,定义 1.8 不要求在此处收敛,故仍有 \(X_n\xrightarrow{d}X\)。
Definition 1.8 (Convergence in distribution) \(X_n\xrightarrow{d}X\) if \(\mathbb P(X_n\le x)\to\mathbb P(X\le x)\) for all \(x\) at which \(\mathbb P(X\le x)\) is continuous. (Also called convergence in law, or weak convergence.)
Theorem 1.6 (Portmanteau lemma) The following characterizations of convergence in distribution are equivalent: (1) \(X_n\xrightarrow{d}X\); (2) \(\mathbb E[f(X_n)]\to\mathbb E[f(X)]\) for all bounded continuous \(f\); (3) for all bounded Lipschitz \(f\); (4) for all bounded uniformly continuous \(f\); (5) \(\liminf\mathbb P(X_n\in G)\ge\mathbb P(X\in G)\) for open \(G\); (6) \(\limsup\mathbb P(X_n\in F)\le\mathbb P(X\in F)\) for closed \(F\); (7) \(\mathbb P(X_n\in B)\to\mathbb P(X\in B)\) for all Borel sets \(B\) with \(\mathbb P(X\in\partial B)=0\).
Example 1.5 (degenerate): \(X_n=\frac1n,X=0\). For \(x\ne0\), \(\mathbb P(X_n\le x)\to\mathbb P(X\le x)\); at \(x=0\), \(\mathbb P(X_n\le0)=0\) for all \(n\) while \(\mathbb P(X\le0)=1\), so the c.d.f.s do not converge there. Since \(x=0\) is a discontinuity point of \(x\mapsto\mathbb P(X\le x)\), Definition 1.8 does not require convergence at \(0\), and \(X_n\xrightarrow{d}X\) still holds.
1.5.1 依概率收敛与依分布收敛的关系.
- 引理 1.1:若 \(X_n\xrightarrow{d}X\) 且 \(Y_n-X_n\xrightarrow{p}0\),则 \(Y_n\xrightarrow{d}X\)。
- 命题 1.4:\(X_n\xrightarrow{p}X\Rightarrow X_n\xrightarrow{d}X\)(依概率收敛强于依分布收敛)。
- 例 1.6:\(X_n\xrightarrow{d}X\) 不 蕴含 \(X_n\xrightarrow{p}X\)。取 \(X\sim N(0,1)\)、\(X_n=-X\)。由对称性 \(X_n\overset d=X\),故 \(X_n\xrightarrow{d}X\),但 \(\mathbb P(|X_n-X|>\varepsilon)=\mathbb P(|2X|>\varepsilon)\not\to0\)。
- 引理 1.2(特例):若 \(X_n\xrightarrow{d}c\)(常数),则 \(X_n\xrightarrow{p}c\)。
- 命题 1.5:边际依分布收敛 不 蕴含联合依分布收敛。反例:\((X_n,Y_n)'\sim N\big(\mathbf 0,\bigl(\begin{smallmatrix}1&(-1)^n\\(-1)^n&1\end{smallmatrix}\bigr)\big)\),边际 \(X_n\xrightarrow{d}N(0,1)\)、\(Y_n\xrightarrow{d}N(0,1)\),但联合分布在完全正相关(\(Y_n=X_n\))与完全负相关(\(Y_n=-X_n\))之间来回跳动,不收敛。
- 引理 1.3(特例):若 \(X_n\xrightarrow{d}X\) 且 \(Y_n\xrightarrow{d}c\)(常数),则 \((X_n,Y_n)\xrightarrow{d}(X,c)\)。
Remark 1.3 依概率收敛关心实现值(两随机变量的实现可以差很远),依分布收敛只关心分布、与实现无关。
1.5.1 Relationship between convergence in probability and in distribution.
- Lemma 1.1: if \(X_n\xrightarrow{d}X\) and \(Y_n-X_n\xrightarrow{p}0\), then \(Y_n\xrightarrow{d}X\).
- Proposition 1.4: \(X_n\xrightarrow{p}X\Rightarrow X_n\xrightarrow{d}X\) (convergence in probability is stronger).
- Example 1.6: \(X_n\xrightarrow{d}X\) does not imply \(X_n\xrightarrow{p}X\). Take \(X\sim N(0,1)\) and \(X_n=-X\). By symmetry \(X_n\overset d=X\), so \(X_n\xrightarrow{d}X\), but \(\mathbb P(|X_n-X|>\varepsilon)=\mathbb P(|2X|>\varepsilon)\not\to0\).
- Lemma 1.2 (special case): if \(X_n\xrightarrow{d}c\) (a constant), then \(X_n\xrightarrow{p}c\).
- Proposition 1.5: marginal convergence in distribution does not imply joint convergence. Counterexample: \((X_n,Y_n)'\sim N\big(\mathbf 0,\bigl(\begin{smallmatrix}1&(-1)^n\\(-1)^n&1\end{smallmatrix}\bigr)\big)\); marginally \(X_n\xrightarrow{d}N(0,1)\), \(Y_n\xrightarrow{d}N(0,1)\), but the joint distribution alternates between perfect positive correlation (\(Y_n=X_n\)) and perfect negative correlation (\(Y_n=-X_n\)), so it does not converge.
- Lemma 1.3 (special case): if \(X_n\xrightarrow{d}X\) and \(Y_n\xrightarrow{d}c\) (a constant), then \((X_n,Y_n)\xrightarrow{d}(X,c)\).
Remark 1.3 Convergence in probability is about realizations (two random variables' realizations can be far apart), while convergence in distribution is only about the distribution, with nothing to do with realizations.
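Example 1.6 and Remark 1.3 can be made concrete with a small simulation: \(X\) and \(-X\) have (nearly) identical empirical c.d.f.s, yet their realizations are far apart. The sample size, seed, evaluation points, and \(\varepsilon=1\) below are illustrative assumptions.

```python
import random

rng = random.Random(4)  # arbitrary fixed seed
x = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # draws of X ~ N(0,1)
x_n = [-v for v in x]                             # X_n = -X for every n

def ecdf(sample, t):
    """Empirical c.d.f. of `sample` evaluated at t."""
    return sum(1 for v in sample if v <= t) / len(sample)

# Same distribution: the two empirical c.d.f.s nearly coincide.
cdf_gap = max(abs(ecdf(x, t) - ecdf(x_n, t)) for t in (-1.0, 0.0, 1.0))

# Different realizations: |X_n - X| = |2X| exceeds eps with a frequency
# that does not depend on n, so it cannot vanish.
eps = 1.0
far_freq = sum(1 for a, b in zip(x_n, x) if abs(a - b) > eps) / len(x)

print(f"max c.d.f. gap at test points: {cdf_gap:.4f}")
print(f"frequency of |X_n - X| > {eps}: {far_freq:.4f}")
```

The first number is close to zero (convergence in distribution), while the second stays near \(\mathbb P(|X|>\frac12)\), well away from zero (no convergence in probability).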
1.5.2 连续映射定理(依分布)与 Slutsky.
定理 1.7 / 引理 1.4 定理 1.7(CMT,依分布):\(g\) 在 \(C\) 上连续、\(\mathbb P(X\in C)=1\),则 \(X_n\xrightarrow{d}X\Rightarrow g(X_n)\xrightarrow{d}g(X)\)。引理 1.4(Slutsky 引理):若 \(X_n\xrightarrow{d}X\)、\(Y_n\xrightarrow{d}c\)(常数),则 \(X_n'Y_n\xrightarrow{d}X'c\) 且 \(X_n+Y_n\xrightarrow{d}X+c\)。
Slutsky 是引理 1.3(联合收敛)+ CMT 的直接推论:加法/乘法是连续函数。
1.5.2 Continuous Mapping Theorem (in distribution) and Slutsky.
Theorem 1.7 / Lemma 1.4 Theorem 1.7 (CMT, in distribution): if \(g\) is continuous on \(C\) and \(\mathbb P(X\in C)=1\), then \(X_n\xrightarrow{d}X\Rightarrow g(X_n)\xrightarrow{d}g(X)\). Lemma 1.4 (Slutsky's lemma): if \(X_n\xrightarrow{d}X\) and \(Y_n\xrightarrow{d}c\) (a constant), then \(X_n'Y_n\xrightarrow{d}X'c\) and \(X_n+Y_n\xrightarrow{d}X+c\).
Slutsky follows directly from Lemma 1.3 (joint convergence) + CMT: addition/multiplication are continuous.
1.6 Comparison of Convergence Notions
定义 1.9 / 1.10 几乎必然收敛(a.s.):\(X_n\xrightarrow{a.s.}X\) 若 \(\mathbb P(\lim_{n\to\infty}X_n=X)=1\),等价 \(\mathbb P(\{\omega\in\Omega:\lim X_n(\omega)=X(\omega)\})=1\)。逐点(surely)收敛:\(\lim X_n(\omega)=X(\omega)\) 对 \(\forall\omega\in\Omega\),等价 \(\{\omega:\lim X_n(\omega)=X(\omega)\}=\Omega\)。
命题 1.6(收敛次序) $$X_n\xrightarrow{p.w.}X\;\Rightarrow\;X_n\xrightarrow{a.s.}X\;\Rightarrow\;X_n\xrightarrow{p}X\;\Rightarrow\;X_n\xrightarrow{d}X$$ 且 $$X_n\xrightarrow{L^q}X\;\Rightarrow\;X_n\xrightarrow{p}X\;\Rightarrow\;X_n\xrightarrow{d}X$$
证明(命题 1.6 中 a.s. ⇒ p) 逐点⇒a.s. 显然,a.s.⇒p 关键。设 \(X_n\xrightarrow{a.s.}X\),即 \(\mathbb P(\omega\in\mathcal A)=0\),\(\mathcal A\equiv\{\omega:\lim X_n(\omega)\ne X(\omega)\}\)。对固定 \(\varepsilon>0\) 定义 \(M_n=\bigcup_{m\ge n}\{|X_m-X|>\varepsilon\}\),则 \(M_n\supseteq M_{n+1}\supseteq\dots\),\(\mathbb P(\omega\in M_n)\) 递减;记 \(M_\infty=\bigcap_{j=1}^\infty M_j\)。由 \(\Omega\backslash\mathcal A\subseteq M_\infty^c\) 得 \(M_\infty\subseteq\mathcal A\),故 \(\mathbb P(\omega\in M_\infty)=0\)。于是 $$\lim_{n\to\infty}\mathbb P(|X_n-X|>\varepsilon)\le\lim_{n\to\infty}\mathbb P(\omega\in M_n)=\mathbb P(\omega\in M_\infty)=0 \tag{1.6}$$ 即由几乎必然收敛得到依概率收敛。\(\blacksquare\)
Definitions 1.9 / 1.10 Almost sure convergence (a.s.): \(X_n\xrightarrow{a.s.}X\) if \(\mathbb P(\lim_{n\to\infty}X_n=X)=1\), equivalently \(\mathbb P(\{\omega\in\Omega:\lim X_n(\omega)=X(\omega)\})=1\). Point-wise (surely) convergence: \(\lim X_n(\omega)=X(\omega)\) for all \(\omega\in\Omega\), equivalently \(\{\omega:\lim X_n(\omega)=X(\omega)\}=\Omega\).
Proposition 1.6 (Order of convergence) $$X_n\xrightarrow{p.w.}X\;\Rightarrow\;X_n\xrightarrow{a.s.}X\;\Rightarrow\;X_n\xrightarrow{p}X\;\Rightarrow\;X_n\xrightarrow{d}X$$ and $$X_n\xrightarrow{L^q}X\;\Rightarrow\;X_n\xrightarrow{p}X\;\Rightarrow\;X_n\xrightarrow{d}X$$
Proof (Proposition 1.6, the a.s. ⇒ p step) Point-wise ⇒ a.s. is trivial; a.s. ⇒ p is the key step. Suppose \(X_n\xrightarrow{a.s.}X\), i.e. \(\mathbb P(\omega\in\mathcal A)=0\) with \(\mathcal A\equiv\{\omega:\lim X_n(\omega)\ne X(\omega)\}\). For fixed \(\varepsilon>0\) define \(M_n=\bigcup_{m\ge n}\{|X_m-X|>\varepsilon\}\), so \(M_n\supseteq M_{n+1}\supseteq\dots\) and \(\mathbb P(\omega\in M_n)\) decreases; let \(M_\infty=\bigcap_{j=1}^\infty M_j\). Since \(\Omega\backslash\mathcal A\subseteq M_\infty^c\), we get \(M_\infty\subseteq\mathcal A\), so \(\mathbb P(\omega\in M_\infty)=0\). Hence $$\lim_{n\to\infty}\mathbb P(|X_n-X|>\varepsilon)\le\lim_{n\to\infty}\mathbb P(\omega\in M_n)=\mathbb P(\omega\in M_\infty)=0 \tag{1.6}$$ i.e. almost sure convergence yields convergence in probability. \(\blacksquare\)
1.7 Central Limit Theorem
定理 1.8 / 引理 1.5 / 定理 1.9 定理 1.8(一元 CLT):\(X_1,\dots,X_n\) i.i.d. 于 \(\mathbb R\)、\(\sigma^2(P)<\infty\),则 $$\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\sigma^2(P))$$ 引理 1.5(Cramér-Wold):\(X_n\xrightarrow{d}X\) 等价于 \(t'X_n\xrightarrow{d}t'X\) 对 \(\forall t\in\mathbb R^k\)。定理 1.9(多元 CLT):\(X_1,\dots,X_n\) i.i.d. 于 \(\mathbb R^k\)、\(\Sigma(P)<\infty\),则 $$\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\Sigma(P))$$
证明(定理 1.9,由 Cramér-Wold 化约到一元) 记 \(Z\sim N(0,\Sigma(P))\)。只需证对每个 \(t\in\mathbb R^k\),投影 \(t'\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}t'Z\),其中 \(t'Z\sim N(0,t'\Sigma(P)t)\)。\(\mathbb E[t'X_i]=t'\mu(P)\)、\(\mathrm{Var}[t'X_i]=t'\Sigma(P)t<\infty\),\(t'X_i\) 仍 i.i.d.,对其用一元 CLT: $$\sqrt n\Big(\tfrac1n\sum t'X_i-t'\mu(P)\Big)\xrightarrow{d}N(0,\mathrm{Var}(t'X_i))=N(0,t'\Sigma(P)t)$$ 即 \(t'\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}t'Z\)。由 Cramér-Wold 得 \(\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\Sigma(P))\)。\(\blacksquare\)
Theorem 1.8 / Lemma 1.5 / Theorem 1.9 Theorem 1.8 (Univariate CLT): \(X_1,\dots,X_n\) i.i.d. on \(\mathbb R\) with \(\sigma^2(P)<\infty\). Then $$\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\sigma^2(P))$$ Lemma 1.5 (Cramér-Wold): \(X_n\xrightarrow{d}X\) iff \(t'X_n\xrightarrow{d}t'X\) for all \(t\in\mathbb R^k\). Theorem 1.9 (Multivariate CLT): \(X_1,\dots,X_n\) i.i.d. on \(\mathbb R^k\) with \(\Sigma(P)<\infty\). Then $$\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\Sigma(P))$$
Proof (Theorem 1.9, reducing to the univariate case via Cramér-Wold) Let \(Z\sim N(0,\Sigma(P))\). It suffices to show that for every \(t\in\mathbb R^k\) the projection satisfies \(t'\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}t'Z\), where \(t'Z\sim N(0,t'\Sigma(P)t)\). We have \(\mathbb E[t'X_i]=t'\mu(P)\), \(\mathrm{Var}[t'X_i]=t'\Sigma(P)t<\infty\), and the \(t'X_i\) are still i.i.d.; apply the univariate CLT: $$\sqrt n\Big(\tfrac1n\sum t'X_i-t'\mu(P)\Big)\xrightarrow{d}N(0,\mathrm{Var}(t'X_i))=N(0,t'\Sigma(P)t)$$ i.e. \(t'\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}t'Z\). By Cramér-Wold, \(\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\Sigma(P))\). \(\blacksquare\)
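The univariate CLT (Theorem 1.8) can be checked by simulating the standardized sample mean and comparing its empirical c.d.f. with \(\Phi\). The sketch assumes Exponential(1) data (so \(\mu(P)=\sigma(P)=1\)), a deliberately skewed distribution; the sample size, replication count, seed, and evaluation points are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def standardized_mean(n, rng):
    """sqrt(n) * (Xbar_n - mu) / sigma for n Exponential(1) draws (mu = sigma = 1)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (xbar - 1.0)

rng = random.Random(5)  # arbitrary fixed seed
n, reps = 200, 2000
draws = [standardized_mean(n, rng) for _ in range(reps)]

phi = NormalDist()  # standard normal, mean 0 and sd 1
max_gap = 0.0
for t in (-1.5, -0.5, 0.0, 0.5, 1.5):
    empirical = sum(1 for z in draws if z <= t) / reps
    max_gap = max(max_gap, abs(empirical - phi.cdf(t)))
    print(f"t={t:+.1f}  empirical={empirical:.3f}  Phi(t)={phi.cdf(t):.3f}")
print(f"largest gap over the test points: {max_gap:.3f}")
```

The remaining gap mixes Monte Carlo noise with the finite-\(n\) approximation error, so it should be read as a rough illustration rather than a measurement of the CLT error.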
1.8 Hypothesis Testing
1.8.1 两类错误. 第一类错误(Type I):\(H_0\) 为真时拒绝(弃真);第二类错误(Type II):\(H_0\) 为假时不拒绝(取伪)。无法同时最小化两者,惯例控制第一类错误。
1.8.2 水平一致性. 检验的功效函数为 \(\mathbb E[\phi_n]=\mathbb E_P[\phi_n]\):对满足 \(H_0\) 的 \(P\),它是犯第一类错误的概率;对满足 \(H_1\) 的 \(P\),\(1-\mathbb E_P[\phi_n]\) 是犯第二类错误的概率。
定义 1.11(水平一致性) 检验 \(\phi_n\in[0,1]\) 在水平 \(\alpha\) 上一致 若 $$\limsup_{n\to\infty}\mathbb E_P[\phi_n]\le\alpha$$ 对满足 \(H_0\) 的 \(P\) 成立,\(\alpha\in(0,1)\) 为显著性水平。常取 \(\phi_n=\mathbf 1\{T_n>c_n\}\),\(T_n\) 检验统计量、\(c_n\) 临界值。
例 1.8(双侧检验,\(H_0:\mu(P)=0\) vs \(H_1:\mu(P)\ne0\)):由 CLT + CMT,\(H_0\) 下 \(\frac{\sqrt n|\bar X_n|}{s_n}\xrightarrow{d}|N(0,1)|\)。取统计量 \(T_n=\frac{\sqrt n|\bar X_n|}{s_n}\)、临界值 \(c_n=\Phi^{-1}(1-\frac\alpha2)=z_{1-\frac\alpha2}\)(\(\Phi\) 为标准正态 c.d.f.),\(\phi_n=\mathbf 1\{T_n>c_n\}\)。可证 \(\limsup\mathbb E_P[\phi_n]\le\alpha\)(一致)。
例 1.9(单侧检验,\(H_0:\mu(P)\le0\) vs \(H_1:\mu(P)>0\)):\(c_n=\Phi^{-1}(1-\alpha)=z_{1-\alpha}\)、\(T_n=\frac{\sqrt n\bar X_n}{s_n}\)。加减 \(\mu(P)\) 凑分子,\(H_0\)(\(\mu\le0\))下 \(\frac{\sqrt n\mu(P)}{s_n}\le0\),由 Portmanteau 得 \(\limsup\mathbb E_P[\phi_n]\le\mathbb P(Z\ge z_{1-\alpha})=\alpha\)。
1.8.1 Two types of error. Type I error: rejecting \(H_0\) when it is true (false rejection); Type II error: not rejecting \(H_0\) when it is false (false accepting). One cannot minimize both at once, so conventionally we control the Type I error.
1.8.2 Consistency in level. The power function of a test is \(\mathbb E[\phi_n]=\mathbb E_P[\phi_n]\): for \(P\) satisfying \(H_0\) it is the probability of a Type I error; for \(P\) satisfying \(H_1\), \(1-\mathbb E_P[\phi_n]\) is the probability of a Type II error.
Definition 1.11 (Consistency in level) A test \(\phi_n\in[0,1]\) is consistent in level \(\alpha\) if $$\limsup_{n\to\infty}\mathbb E_P[\phi_n]\le\alpha$$ for \(P\) satisfying \(H_0\), where \(\alpha\in(0,1)\) is the significance level. Often \(\phi_n=\mathbf 1\{T_n>c_n\}\) with test statistic \(T_n\) and critical value \(c_n\).
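Example 1.8 (two-sided test, \(H_0:\mu(P)=0\) vs \(H_1:\mu(P)\ne0\)): by the CLT, Slutsky's lemma (with \(s_n\xrightarrow{p}\sigma(P)\)) and the CMT, under \(H_0\), \(\frac{\sqrt n|\bar X_n|}{s_n}\xrightarrow{d}|N(0,1)|\). Take statistic \(T_n=\frac{\sqrt n|\bar X_n|}{s_n}\), critical value \(c_n=\Phi^{-1}(1-\frac\alpha2)=z_{1-\frac\alpha2}\) (\(\Phi\) the standard normal c.d.f.), \(\phi_n=\mathbf 1\{T_n>c_n\}\). Since \(|N(0,1)|\) has a continuous c.d.f., \(\mathbb E_P[\phi_n]=\mathbb P(T_n>c_n)\to\mathbb P(|Z|>z_{1-\frac\alpha2})=\alpha\) with \(Z\sim N(0,1)\), so \(\limsup\mathbb E_P[\phi_n]\le\alpha\) (consistent in level).
Example 1.9 (one-sided test, \(H_0:\mu(P)\le0\) vs \(H_1:\mu(P)>0\)): \(c_n=\Phi^{-1}(1-\alpha)=z_{1-\alpha}\), \(T_n=\frac{\sqrt n\bar X_n}{s_n}\), \(\phi_n=\mathbf 1\{T_n>c_n\}\). Adding and subtracting \(\mu(P)\) in the numerator, \(T_n=\frac{\sqrt n(\bar X_n-\mu(P))}{s_n}+\frac{\sqrt n\mu(P)}{s_n}\). Under \(H_0\) (\(\mu\le0\)) the second term satisfies \(\frac{\sqrt n\mu(P)}{s_n}\le0\), so \(T_n\le\frac{\sqrt n(\bar X_n-\mu(P))}{s_n}\xrightarrow{d}Z\sim N(0,1)\), and by Portmanteau \(\limsup\mathbb E_P[\phi_n]\le\mathbb P(Z\ge z_{1-\alpha})=\alpha\).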
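The two tests in Examples 1.8 and 1.9 can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names (`t_statistic`, `two_sided_test`, `one_sided_test`, `rejection_rate`) are hypothetical, \(s_n\) is assumed to be the \(1/n\) sample standard deviation, and the Monte Carlo check assumes \(N(\mu,1)\) data.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def t_statistic(xs):
    """Return sqrt(n) * mean / s_n, with s_n the 1/n sample standard deviation (assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    s_n = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return math.sqrt(n) * mean / s_n


def two_sided_test(xs, alpha=0.05):
    """phi_n = 1{ |sqrt(n) * mean / s_n| > z_{1 - alpha/2} } for H0: mu = 0."""
    return abs(t_statistic(xs)) > STD_NORMAL.inv_cdf(1 - alpha / 2)


def one_sided_test(xs, alpha=0.05):
    """phi_n = 1{ sqrt(n) * mean / s_n > z_{1 - alpha} } for H0: mu <= 0."""
    return t_statistic(xs) > STD_NORMAL.inv_cdf(1 - alpha)


def rejection_rate(test, mu, n=200, reps=2000, seed=0):
    """Monte Carlo estimate of E_P[phi_n] when the data are i.i.d. N(mu, 1) (assumption)."""
    rng = random.Random(seed)
    hits = sum(test([rng.gauss(mu, 1.0) for _ in range(n)]) for _ in range(reps))
    return hits / reps
```

Under these assumptions, `rejection_rate(two_sided_test, mu=0.0)` should land near the nominal level 0.05, while `rejection_rate(one_sided_test, mu=-0.5)` should be close to 0, reflecting that the one-sided test is conservative in the interior of \(H_0\).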
1.8.3 \(p\)-value.
Definition 1.12 (\(p\)-value) For a test \(\phi_n=\mathbf 1\{T_n>c_n(\alpha)\}\) whose critical value \(c_n=c_n(\alpha)\) depends on the level, the \(p\)-value is the smallest level \(\alpha\) at which \(H_0\) is rejected: $$\hat p_n\equiv\inf\{\alpha\in(0,1):T_n>c_n(\alpha)\}$$
For the one-sided test (Example 1.9): \(\hat p_n=\inf\{\alpha:\frac{\sqrt n\bar X_n}{s_n}>z_{1-\alpha}\}=1-\Phi(\frac{\sqrt n\bar X_n}{s_n})\). For the two-sided test (Example 1.8): \(\hat p_n=2(1-\Phi(\frac{\sqrt n|\bar X_n|}{s_n}))\).
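The two closed-form \(p\)-values above can be checked against the direct decision rule \(\mathbf 1\{T_n>c_n(\alpha)\}\). The sketch below is illustrative only: the function names are hypothetical, and the argument `t` stands for the signed ratio \(\frac{\sqrt n\bar X_n}{s_n}\) computed elsewhere.

```python
from statistics import NormalDist

PHI = NormalDist().cdf      # standard normal c.d.f.
Z = NormalDist().inv_cdf    # standard normal quantile function


def p_value_one_sided(t):
    """p-value in Example 1.9: 1 - Phi(t), with t = sqrt(n) * mean / s_n."""
    return 1.0 - PHI(t)


def p_value_two_sided(t):
    """p-value in Example 1.8: 2 * (1 - Phi(|t|))."""
    return 2.0 * (1.0 - PHI(abs(t)))


def rejects(t, alpha, two_sided):
    """Direct decision rule 1{T_n > c_n(alpha)}, for comparison with the p-value."""
    if two_sided:
        return abs(t) > Z(1 - alpha / 2)
    return t > Z(1 - alpha)
```

Away from the boundary case \(T_n=c_n(\alpha)\), rejecting at level \(\alpha\) is the same event as \(\hat p_n<\alpha\), which is what makes the \(p\)-value a convenient summary across levels.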
1.8.4 Confidence region.
Definition 1.13 (Confidence region) A confidence region \(C_n=C_n(X_1,\dots,X_n)\) of level \(1-\alpha\) satisfies that the probability the true value lies in the set is no smaller than \(1-\alpha\): $$\mathbb P(\mu(P)\in C_n)\ge1-\alpha$$
Example 1.10 (constructed via CLT, \(X_i\sim\) Bernoulli\((q)\)): \(\bar X_n\xrightarrow{p}q\), \(\sigma^2(P)=q(1-q)\); take \(s_n^2=\bar X_n(1-\bar X_n)\), and by CMT + Slutsky, \(\frac{\sqrt n(\bar X_n-\mu(P))}{\sqrt{\bar X_n(1-\bar X_n)}}\xrightarrow{d}N(0,1)\). So $$C_n=[\bar X_n-c_n,\bar X_n+c_n],\qquad c_n=z_{1-\frac\alpha2}\frac{\sqrt{\bar X_n(1-\bar X_n)}}{\sqrt n} \tag{1.8}$$ satisfies \(\mathbb P(\mu(P)\in C_n)\to1-\alpha\) (an asymptotic property).
Example 1.11 (constructed via Markov's inequality): continuing with \(X_i\sim\) Bernoulli\((q)\), Markov's inequality applied to the second moment (exponent 2, i.e. Chebyshev's inequality) gives \(\mathbb P(|\bar X_n-\mu(P)|>\varepsilon)\le\frac{\mathrm{Var}(\bar X_n)}{\varepsilon^2}=\frac{q(1-q)}{n\varepsilon^2}\le\frac1{4n\varepsilon^2}\) (\(q(1-q)\) is maximized at \(q=\frac12\)). Setting \(\frac1{4n\varepsilon^2}=\alpha\) gives \(\varepsilon=\frac1{\sqrt{4\alpha n}}\), so \(C_n=[\bar X_n-\frac1{\sqrt{4\alpha n}},\bar X_n+\frac1{\sqrt{4\alpha n}}]\) satisfies \(\mathbb P(\mu(P)\in C_n)\ge1-\alpha\) for every \(n\). Coverage probability: the CLT-based set can have poor coverage in finite samples. For example, when all realizations equal 1 (an event of probability \(q^n\)), \(\bar X_n=1\) and \(c_n=0\) in (1.8), so \(C_n=\{1\}\) misses every \(q<1\). The Markov-based set does not suffer from this: its guarantee \(\mathbb P(\mu(P)\notin C_n)\le\alpha\) is non-asymptotic, conservative but robust.
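Because the number of successes is binomial, the coverage of both constructions can be computed exactly by summing the binomial p.m.f. over all outcomes, with no simulation. The sketch below does this under the Bernoulli setup of Examples 1.10 and 1.11; the function names and the `method` labels `"clt"` and `"chebyshev"` are hypothetical.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, q):
    """P(sum of n Bernoulli(q) draws = k)."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


def clt_halfwidth(xbar, n, alpha):
    """c_n from (1.8): z_{1-alpha/2} * sqrt(xbar * (1 - xbar) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(xbar * (1 - xbar) / n)


def chebyshev_halfwidth(n, alpha):
    """epsilon = 1 / sqrt(4 * alpha * n) from Example 1.11."""
    return 1.0 / math.sqrt(4 * alpha * n)


def exact_coverage(n, q, alpha, method):
    """Exact P(q in C_n): add up the pmf of every k whose interval contains q."""
    total = 0.0
    for k in range(n + 1):
        xbar = k / n
        if method == "clt":
            half = clt_halfwidth(xbar, n, alpha)
        else:
            half = chebyshev_halfwidth(n, alpha)
        if xbar - half <= q <= xbar + half:
            total += binom_pmf(k, n, q)
    return total
```

For instance, with \(n=20\), \(q=0.99\) and \(\alpha=0.05\), the all-ones sample has probability \(0.99^{20}\approx0.82\) and its CLT interval collapses to \(\{1\}\), so the CLT coverage is at most about 0.18, far below the nominal 0.95. The Chebyshev interval has half-width 0.5 in that configuration and keeps its coverage above \(1-\alpha\).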
1.8.5 Multidimensional hypothesis testing. \(X_1,\dots,X_n\) i.i.d. on \(\mathbb R^k\) with \(k\times k\) variance-covariance matrix \(\Sigma(P)<\infty\); test \(H_0:\mu(P)=\mathbf 0\) vs \(H_1:\mu(P)\ne\mathbf 0\). By CLT, \(\sqrt n(\bar X_n-\mu(P))\xrightarrow{d}N(0,\Sigma(P))\).
Remark 1.4 (a useful fact) \(x\sim N(\mu_x,\Sigma(P))\Rightarrow(x-\mu_x)'\Sigma^{-1}(P)(x-\mu_x)\sim\chi^2_k\) (\(\Sigma(P)\) invertible, \(x\) a \(k\times1\) vector). Proof: by the definition of the multivariate normal, \(x-\mu_x=Az\) with \(z\sim N(0,\mathbf I)\) and \(\Sigma(P)=AA'\), where the \(k\times k\) matrix \(A\) is invertible because \(\Sigma(P)\) is; then \((x-\mu_x)'\Sigma^{-1}(P)(x-\mu_x)=z'A'(AA')^{-1}Az=z'z\sim\chi^2_k\).
In practice estimate via the sample \(\hat\Sigma_n=\frac1n\sum(X_i-\bar X_n)(X_i-\bar X_n)'\xrightarrow{p}\Sigma(P)\) (consistent), with \(\hat\Sigma_n^{-1}\xrightarrow{p}\Sigma^{-1}(P)\) by CMT, so $$n(\bar X_n-\mu(P))'\hat\Sigma_n^{-1}(\bar X_n-\mu(P))\xrightarrow{d}\chi^2_k \tag{1.9}$$ Take statistic \(T_n=n\bar X_n'\hat\Sigma_n^{-1}\bar X_n\), critical value \(c_n=c_{k,1-\alpha}\) (the \(1-\alpha\) quantile of \(\chi^2_k\)); under \(H_0\), \(\mu(P)=0\) and \(T_n\xrightarrow{d}\chi^2_k\), so by Portmanteau the test is consistent in level.
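A minimal sketch of the statistic \(T_n=n\bar X_n'\hat\Sigma_n^{-1}\bar X_n\) for the special case \(k=2\), written with the standard library only. The restriction to \(k=2\) is an assumption made for illustration: it allows a closed-form \(2\times2\) inverse, and the \(\chi^2_2\) quantile is available in closed form because the \(\chi^2_2\) c.d.f. is \(1-e^{-x/2}\). The function names are hypothetical.

```python
import math


def wald_statistic_2d(points):
    """T_n = n * xbar' * Sigma_hat^{-1} * xbar for points in R^2 (Sigma_hat uses 1/n)."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample variance-covariance matrix.
    s11 = sum((p[0] - m1) ** 2 for p in points) / n
    s22 = sum((p[1] - m2) ** 2 for p in points) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    det = s11 * s22 - s12 * s12  # must be nonzero for Sigma_hat to be invertible
    # Quadratic form xbar' * inverse(Sigma_hat) * xbar via the closed-form 2x2 inverse.
    quad = (s22 * m1 * m1 - 2 * s12 * m1 * m2 + s11 * m2 * m2) / det
    return n * quad


def chi2_2_quantile(prob):
    """Quantile of chi^2_2, whose c.d.f. is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - prob)


def wald_test_2d(points, alpha=0.05):
    """phi_n = 1{T_n > c_{2, 1-alpha}} for H0: mu = 0."""
    return wald_statistic_2d(points) > chi2_2_quantile(1 - alpha)
```

As a hand-checkable case, the four points \((2,1),(0,1),(1,2),(1,0)\) have mean \((1,1)\) and \(\hat\Sigma_n=\frac12\mathbf I\), so \(T_n=4\cdot(1,1)\,2\mathbf I\,(1,1)'=16\), which exceeds \(c_{2,0.95}=-2\ln0.05\approx5.99\). For general \(k\) one would need a matrix inverse and a \(\chi^2_k\) quantile from a numerical library.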
1.8.6 Delta method.
Theorem 1.10 (Delta method) Let \(\{X_n\}\), \(X\) be random vectors on \(\mathbb R^k\), \(c\in\mathbb R^k\) a constant vector, and \(\tau_n\to\infty\) a non-random sequence with \(\tau_n(X_n-c)\xrightarrow{d}X\); let \(g:\mathbb R^k\to\mathbb R^d\) be continuously differentiable at \(c\) with Jacobian \(D_g(c)\equiv\frac{\partial g(x)}{\partial x'}\big|_{x=c}\) (\(d\times k\)). Then $$\tau_n(g(X_n)-g(c))\xrightarrow{d}D_g(c)X$$ In particular, when \(X\sim N(0,\Sigma)\), \(\tau_n(g(X_n)-g(c))\xrightarrow{d}N(0,D_g(c)\Sigma D_g(c)')\).
Proof (Theorem 1.10, first-order Taylor expansion) First-order Taylor expansion of \(g\) around \(c\), evaluated at \(X_n\): $$g(X_n)=g(c)+D_g(c)(X_n-c)+R(X_n-c) \tag{1.10}$$ with remainder \(R(0)=0\) and \(\frac{R(h)}{|h|}\to0\) as \(h\to0\) (1.11). Since \(\tau_n(X_n-c)\) converges in distribution and \(\tau_n\to\infty\), we have \(X_n-c\xrightarrow{p}0\), hence \(\frac{R(X_n-c)}{|X_n-c|}\xrightarrow{p}0\) by the CMT. Multiply by \(\tau_n\) and rearrange: \(\tau_n(g(X_n)-g(c))=D_g(c)\tau_n(X_n-c)+\tau_n R(X_n-c)\), where \(\tau_n R(X_n-c)=\tau_n|X_n-c|\cdot\frac{R(X_n-c)}{|X_n-c|}\xrightarrow{d}|X|\times0=0\) (Slutsky). Applying Slutsky once more to the sum, \(\tau_n(g(X_n)-g(c))\xrightarrow{d}D_g(c)X\). \(\blacksquare\)
Example 1.12 (first-order degeneracy → second-order delta method): \(X_i\sim\) Bernoulli\((q)\), \(\sqrt n(\bar X_n-q)\xrightarrow{d}N(0,q(1-q))\), estimating \(g(q)=q(1-q)\). \(D_g(q)=1-2q\), so generically \(\sqrt n(g(\bar X_n)-g(q))\xrightarrow{d}N(0,(1-2q)^2q(1-q))\). But at \(q=\frac12\), \(D_g=0\) and the limit degenerates to a point mass at \(0\). In that case use a second-order Taylor expansion: with \(D_g^2(q)=-2\) and the first-order term vanishing, $$n(g(\bar X_n)-g(q))=\tfrac12D_g^2(q)\,n(\bar X_n-q)^2+nR(\bar X_n-q)=-[\sqrt n(\bar X_n-q)]^2+o_P(1)\xrightarrow{d}-[\tfrac12N(0,1)]^2=-\tfrac14\chi^2_1$$ (here the remainder is in fact identically zero because \(g\) is quadratic), i.e. at \(q=\frac12\), \(n(g(\bar X_n)-g(q))\xrightarrow{d}-\frac14\chi^2_1\).
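Example 1.12 can be checked numerically without simulation, since the exact distribution of \(\bar X_n\) is binomial. The sketch below computes the exact mean and variance of \(\text{rate}\cdot(g(\bar X_n)-g(q))\) and compares them with the first-order delta-method variance (for \(q\ne\frac12\), rate \(\sqrt n\)) and with the moments of \(-\frac14\chi^2_1\) (for \(q=\frac12\), rate \(n\)). The function names and the parameter `rate` are hypothetical.

```python
import math


def binom_pmf(k, n, q):
    """P(sum of n Bernoulli(q) draws = k)."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


def g(x):
    """The transformation from Example 1.12."""
    return x * (1 - x)


def scaled_moments(n, q, rate):
    """Exact mean and variance of rate * (g(xbar) - g(q)) when n * xbar ~ Binomial(n, q)."""
    values = [rate * (g(k / n) - g(q)) for k in range(n + 1)]
    probs = [binom_pmf(k, n, q) for k in range(n + 1)]
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var


def delta_variance(q):
    """First-order delta-method variance D_g(q)^2 * q * (1 - q), with D_g(q) = 1 - 2q."""
    return (1 - 2 * q) ** 2 * q * (1 - q)
```

At \(q=\frac12\), `scaled_moments(n, 0.5, n)` returns a mean of \(-\frac14\) up to floating-point error, because \(n\,\mathbb E[(\bar X_n-\frac12)^2]=\frac14\) exactly; this matches the mean of \(-\frac14\chi^2_1\), whose variance is \(\frac18\). At \(q=0.3\), the variance from `scaled_moments(n, 0.3, math.sqrt(n))` should approach `delta_variance(0.3)` as \(n\) grows.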
1.8.7 Correlation: definition and properties.
Definition 1.14 / Theorem 1.11 / Proposition 1.7 Definition 1.14 (correlation): when \(\mathbb E[X_i^2]<\infty,\mathbb E[Y_i^2]<\infty\), \(\mathrm{Var}(X_i)>0,\mathrm{Var}(Y_i)>0\), \(\rho_{X_i,Y_i}(P)\equiv\frac{\mathrm{Cov}(X_i,Y_i)}{\sqrt{\mathrm{Var}(X_i)}\sqrt{\mathrm{Var}(Y_i)}}\) (measures the linear relationship). Theorem 1.11 (Cauchy-Schwarz): for random variables \(u,v\) (\(\mathbb E[u^2],\mathbb E[v^2]<\infty\)), \(\mathbb E[uv]^2\le\mathbb E[u^2]\mathbb E[v^2]\) (1.12), strict unless \(\exists\alpha,\mathbb P(u=\alpha v)=1\). Proposition 1.7: \(|\rho_{X_i,Y_i}(P)|\le1\), with equality iff \(\exists a,b,\mathbb P(Y_i=a+bX_i)=1\).
Proof (Theorem 1.11, Cauchy-Schwarz) The trivial case \(\mathbb E[u^2]\) or \(\mathbb E[v^2]=0\) is obvious. Otherwise consider \(\mathbb E[(u-\alpha v)^2]=\mathbb E[u^2]-2\alpha\mathbb E[uv]+\alpha^2\mathbb E[v^2]\ge0\), a quadratic in \(\alpha\) minimized at \(\alpha=\frac{\mathbb E[uv]}{\mathbb E[v^2]}\); substituting gives \(\mathbb E[u^2]-\frac{\mathbb E[uv]^2}{\mathbb E[v^2]}\ge0\), i.e. \(\mathbb E[u^2]\mathbb E[v^2]\ge\mathbb E[uv]^2\). Strict unless \(\mathbb P(u-\alpha v=0)=1\). \(\blacksquare\)
Estimation: the sample correlation \(\hat\rho_{X_i,Y_i,n}=\frac{\hat\sigma_{X_i,Y_i,n}}{s_{X_i,n}s_{Y_i,n}}\); by the delta method (if \(\mathbb E[X_i^4],\mathbb E[Y_i^4]<\infty\)), \(\sqrt n(\hat\rho_{X_i,Y_i,n}-\rho_{X_i,Y_i}(P))\xrightarrow{d}N(0,\Sigma)\), where the asymptotic variance \(\Sigma\) (a scalar here) depends on the moments of \((X_i,Y_i)\) up to fourth order. The delta method applies because \(\hat\rho\) is a continuously differentiable function of the sample averages of \((X_i,Y_i,X_iY_i,X_i^2,Y_i^2)\).
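To make the last remark concrete, the sample correlation can be written explicitly as a function of the five sample averages. The sketch below is a hypothetical helper (`sample_correlation` is not a name from the notes) and assumes the \(1/n\) versions of the sample covariance and variances.

```python
import math


def sample_correlation(xs, ys):
    """rho_hat as a function of the sample means of (X, Y, XY, X^2, Y^2)."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    m_xy = sum(x * y for x, y in zip(xs, ys)) / n
    m_xx = sum(x * x for x in xs) / n
    m_yy = sum(y * y for y in ys) / n
    cov = m_xy - m_x * m_y      # sample covariance (1/n version)
    var_x = m_xx - m_x * m_x    # sample variance of X (1/n version)
    var_y = m_yy - m_y * m_y    # sample variance of Y (1/n version)
    return cov / math.sqrt(var_x * var_y)
```

For exactly linear data such as `xs = [1, 2, 3, 4]` and `ys = [3, 5, 7, 9]` (so \(Y_i=1+2X_i\)), the five averages are \(2.5, 6, 17.5, 7.5, 41\) and the function returns 1, the equality case of Proposition 1.7.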
1.8.8 Test statistic for the median. The population median \(\theta\equiv\inf\{x\in\mathbb R:F(x)\ge\frac12\}\), the sample median \(\hat\theta_n\equiv\inf\{x:\hat F_n(x)\ge0.5\}\).
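Theorem 1.12 / Proposition 1.8 Theorem 1.12 (Berry-Esseen CLT): \(X_1,\dots,X_n\) i.i.d. with \(\sigma^2(P)>0\) and \(\mathbb E[|X_i|^3]<\infty\); then there is a universal constant \(c\) such that \(\sup_{x}|\mathbb P(\frac{\sqrt n(\bar X_n-\mu(P))}{\sigma(P)}\le x)-\Phi(x)|\le\frac c{\sqrt n}\frac{\mathbb E[|X_i-\mu(P)|^3]}{\sigma^3(P)}\). Proposition 1.8 (limiting distribution of the median): if \(F\) is continuously differentiable at \(\theta\) with p.d.f. \(f\) and \(f(\theta)>0\), then $$\sqrt n(\hat\theta_n-\theta)\xrightarrow{d}N\Big(0,\frac1{4f^2(\theta)}\Big)$$
Proof idea (Proposition 1.8) The delta method does not apply (the sample median is not a smooth function of sample averages). Assume \(n\) even, \(\hat\theta_n\) the \(\frac n2\)th highest \(X_i\). Define \(Z_i\equiv\mathbf 1\{X_i>\theta+\frac x{\sqrt n}\}\), \(\mu_n=\mathbb E[Z_i]=1-F(\theta+\frac x{\sqrt n})\to\frac12\), \(\sigma_n^2=\mu_n(1-\mu_n)\to\frac14\). By the definition of \(\hat\theta_n\), \(\hat\theta_n\le\theta+\frac x{\sqrt n}\) iff \(\hat F_n(\theta+\frac x{\sqrt n})\ge\frac12\), i.e. iff \(\bar Z_n\le\frac12\). Hence \(\mathbb P(\sqrt n(\hat\theta_n-\theta)\le x)=\mathbb P(\frac{\sqrt n(\bar Z_n-\mu_n)}{\sigma_n}\le z_n)\), where \(z_n=\frac{\sqrt n(\frac12-\mu_n)}{\sigma_n}\to\frac{xf(\theta)}{\frac12}=2xf(\theta)\) (1.14), using \(F(\theta)=\frac12\) and the differentiability of \(F\) at \(\theta\). Because the distribution of \(Z_i\) changes with \(n\), the ordinary CLT does not apply directly; Berry-Esseen gives \(\mathbb P(\frac{\sqrt n(\bar Z_n-\mu_n)}{\sigma_n}\le z_n)-\Phi(z_n)\to0\), since the third-moment ratio stays bounded (\(|Z_i-\mu_n|\le1\) and \(\sigma_n\to\frac12\)). So \(\mathbb P(\sqrt n(\hat\theta_n-\theta)\le x)\to\Phi(2xf(\theta))\), i.e. \(\sqrt n(\hat\theta_n-\theta)\xrightarrow{d}N(0,\frac1{4f^2(\theta)})\). \(\blacksquare\)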
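A small Monte Carlo sketch of Proposition 1.8, assuming \(N(0,1)\) data so that \(\theta=0\), \(f(\theta)=\frac1{\sqrt{2\pi}}\) and the limiting variance is \(\frac\pi2\). The function names are hypothetical; `statistics.median_low` is used because it returns the lower middle order statistic, which coincides with \(\inf\{x:\hat F_n(x)\ge\frac12\}\).

```python
import math
import random
from statistics import NormalDist, median_low


def sample_median(xs):
    """inf{x : F_hat_n(x) >= 1/2}, i.e. the lower middle order statistic."""
    return median_low(xs)


def asymptotic_variance(density_at_median):
    """Limiting variance 1 / (4 f(theta)^2) from Proposition 1.8."""
    return 1.0 / (4.0 * density_at_median**2)


def simulated_variance(n=400, reps=2000, seed=0):
    """Monte Carlo variance of sqrt(n) * (median - theta) for N(0, 1) data, theta = 0."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(math.sqrt(n) * sample_median(xs))
    mean = sum(draws) / reps
    return sum((d - mean) ** 2 for d in draws) / reps
```

Here `asymptotic_variance(NormalDist().pdf(0.0))` equals \(\frac\pi2\approx1.571\), and `simulated_variance()` should be close to it. The comparison with the sample mean, whose asymptotic variance is 1 for this distribution, shows that the median is less efficient under normality.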
1.9 Tightness
Definition 1.15 / Proposition 1.9 / Theorem 1.13 Definition 1.15 (tightness): a sequence \(\{X_n:n\ge1\}\) is tight if for any \(\varepsilon>0\) there exists a finite constant \(B>0\) with \(\inf_n\mathbb P(|X_n|\le B)\ge1-\varepsilon\). Proposition 1.9: convergence in distribution ⇒ tight. Theorem 1.13 (Prokhorov's theorem): if \(\{X_n:n\ge1\}\) is tight, there exist a subsequence \(\{X_{n_j}\}\) and a random vector \(X\) with \(X_{n_j}\xrightarrow{d}X\).
Example 1.13 (tight but not converging): \(X_{2n}\sim\mathrm{Unif}[0,1]\), \(X_{2n+1}\sim\mathrm{Unif}[2,3]\) (the even- and odd-indexed elements have different distributions). The sequence does not converge in distribution, but it is still tight: taking \(B=3\) gives \(\inf_n\mathbb P(|X_n|\le3)=1\ge1-\varepsilon\) for every \(\varepsilon>0\).
1.9.2 \(\tau_n\)-consistency.
Definition 1.16 / Proposition 1.10 Definition 1.16 (\(\tau_n\)-consistency): if \(\tau_n(\hat\theta_n-\theta(P))\) is tight (with \(\tau_n\to\infty\)), then \(\hat\theta_n\) is a \(\tau_n\)-consistent estimator of \(\theta(P)\). Proposition 1.10: \(\tau_n\)-consistent ⇒ consistent.
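Proof (Proposition 1.10) Intuitively, if the estimation error blown up by \(\tau_n\) (which goes to \(\infty\)) is still bounded in probability (tight), then the un-blown-up error must converge to \(0\). Formally, fix \(\delta,\varepsilon>0\) and choose \(B\) with \(\mathbb P(|\tau_n(\hat\theta_n-\theta(P))|\le B)\ge1-\varepsilon\) for all \(n\). For \(n\) large enough that \(\frac B{\tau_n}<\delta\), \(\mathbb P(|\hat\theta_n-\theta(P)|>\delta)\le\mathbb P(|\tau_n(\hat\theta_n-\theta(P))|>B)\le\varepsilon\). \(\blacksquare\)
For the sample mean, \(\sqrt n(\bar X_n-\mu(P))\) is tight because it converges in distribution (CLT, then Proposition 1.9), so \(\bar X_n\) is a \(\sqrt n\)-consistent estimator of \(\mu(P)\).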
1.10 Stochastic Order
Stochastic order notation.
- If \(X_n\xrightarrow{p}0\), write \(X_n=o_P(1)\); more generally \(X_n=o_P(R_n)\) if \(X_n=Y_nR_n\) and \(Y_n=o_P(1)\).
- If \(X_n\) is tight, write \(X_n=O_P(1)\); more generally \(X_n=O_P(R_n)\) if \(X_n=Y_nR_n\) and \(Y_n=O_P(1)\).
Remark 1.7 \(o_P(1)\Rightarrow O_P(1)\) (convergence in probability to 0 ⇒ tight), but not conversely (\(O_P(1)\not\Rightarrow o_P(1)\)).
Proposition 1.11 (Rules of calculus with stochastic order) 1. \(o_P(1)+o_P(1)=o_P(1)\); 2. \(o_P(1)+O_P(1)=O_P(1)\); 3. \(o_P(1)O_P(1)=o_P(1)\); 4. \((1+o_P(1))^{-1}=O_P(1)\); 5. \(o_P(O_P(1))=o_P(1)\).
Proof highlights (Proposition 1.11) (1) \(X_n,Y_n\xrightarrow{p}0\) implies joint convergence in probability, so by CMT \(X_n+Y_n\xrightarrow{p}0\). (2) \(X_n=o_P(1)\), \(Y_n=O_P(1)\) tight; use the union bound to show \(X_n+Y_n\) tight. (3) By contradiction + Prokhorov + Slutsky. (4) \(f(x)=\frac1{1+x}\) is continuous on \(x\ne-1\), in particular at \(x=0\), so by CMT \(f(X_n)\xrightarrow{p}f(0)=1\) (convergence in probability ⇒ tight), hence \(f(X_n)=O_P(1)\). (5) By definition, \(o_P(O_P(1))\) means \(X_n=Y_nR_n\) with \(Y_n=o_P(1)\) and \(R_n=O_P(1)\), which is rule 3. \(\blacksquare\)
Chapter arc From "notions of convergence" to "statistical inference." §1.2–1.6 build the four types of convergence (in probability, in moments, in distribution, almost sure) and their hierarchy (p.w.⇒a.s.⇒p⇒d, \(L^q\)⇒p), with the core tools being Markov's inequality, Jensen's inequality, the Continuous Mapping Theorem, Slutsky's lemma, and the Portmanteau lemma. The CLT of §1.7 delivers asymptotic normality after \(\sqrt n\)-standardization. §1.8 applies these tools to inference: testing (consistency in level, \(p\)-value, multidimensional \(\chi^2\) test, delta method) and confidence regions (CLT vs Markov constructions). Tightness and the stochastic-order notation \(o_P/O_P\) from §1.9–1.10 provide the language for the asymptotic expansions of estimators in later chapters.