8. The Kalman Filter
Chapter theme: the Kalman filter. §8.1 Two lemmas on conditional distributions under joint normality: by Lemma 8.1, \((X,Y)'\sim N(0,\cdot)\) implies \(X\mid Y\sim N(AY,S_{XX'\mid Y})\) with \(A=S_{XY'}S_{YY'}^{-1}\); by Lemma 8.2, \(Y\mid\xi\sim N(H\xi,\Sigma)\) and \(\xi\sim N(\hat\xi,\Omega)\) give the posterior \(\xi\mid Y\sim N(\hat\xi+G(Y-H\hat\xi),\Omega-GS_{YY'}G')\) (the normal version of Bayesian recursive updating). §8.2 State-space system: observation equation \(Y_t=H_t\xi_t+\varepsilon_t\), state equation \(\xi_{t+1}=F_{t+1}\xi_t+\eta_{t+1}\), with the state \(\xi_t\) unobservable. Recursive updating in three steps: (1) forecast \(Y_t\) using information up to \(t-1\); (2) after observing \(Y_t\), update the inference about \(\xi_t\) using the Kalman gain \(G_t\) and the Kalman filter equation \(\hat\xi_{t\mid t}=\hat\xi_{t\mid t-1}+G_t(Y_t-H_t\hat\xi_{t\mid t-1})\); (3) forecast \(\xi_{t+1}\) from the state equation with the Kalman predictor \(\hat\xi_{t+1\mid t}=F_{t+1}\hat\xi_{t\mid t}\). The Kalman filter does not require normality; without it, the same formulas are linear least-squares projections.
8.1 Two Useful Lemmas
Lemma 8.1 (Conditional distribution under joint normality) Let \(\begin{pmatrix}X\\Y\end{pmatrix}\sim N\Big(\begin{pmatrix}0\\0\end{pmatrix},\begin{bmatrix}S_{XX'}&S_{XY'}\\S_{YX'}&S_{YY'}\end{bmatrix}\Big)\). Then $$X\mid Y\sim N(AY,S_{XX'\mid Y})$$ with \(A=S_{XY'}S_{YY'}^{-1}\) and \(S_{XX'\mid Y}=S_{XX'}-S_{XY'}S_{YY'}^{-1}S_{YX'}=S_{XX'}-AS_{YY'}A'\).
Proof (Lemma 8.1) Consider the linear regression \(X=AY+\varepsilon\) with \(\mathbb E(\varepsilon Y')=0\). Because \((X,Y)\) is jointly normal, \(\varepsilon=X-AY\) and \(Y\) are jointly normal and uncorrelated, hence independent, so \(\mathbb E(\varepsilon\mid Y)=0\) and \(\mathbb E(X\mid Y)=\mathbb E(AY+\varepsilon\mid Y)=AY\). To solve for \(A\), post-multiply by \(Y'\) and take expectations: \(\mathbb E(XY')=A\mathbb E(YY')\Rightarrow S_{XY'}=AS_{YY'}\Rightarrow A=S_{XY'}S_{YY'}^{-1}\). For the conditional variance, independence of \(\varepsilon\) and \(Y\) gives \(\mathrm{Var}(X\mid Y)=\mathrm{Var}(\varepsilon)=\mathbb E((X-AY)(X-AY)')=S_{XX'}-AS_{YY'}A'=S_{XX'}-S_{XY'}S_{YY'}^{-1}S_{YX'}\), where the cross terms collapse because \(S_{XY'}=AS_{YY'}\). \(\blacksquare\)
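As a quick numerical sanity check of Lemma 8.1, the sketch below builds an arbitrary joint covariance matrix, computes \(A\) and both expressions for the conditional variance, and compares them with the residual covariance from simulated draws. The dimensions, the seed, and the variable names are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary positive-definite joint covariance for (X, Y), X in R^2, Y in R^3.
M = rng.standard_normal((5, 5))
S = M @ M.T + 5 * np.eye(5)
S_xx, S_xy = S[:2, :2], S[:2, 2:]
S_yx, S_yy = S[2:, :2], S[2:, 2:]

# A = S_xy S_yy^{-1}; since S_yy is symmetric, A' = S_yy^{-1} S_yx.
A = np.linalg.solve(S_yy, S_yx).T

# Two equivalent expressions for the conditional variance of X given Y.
cond_cov_1 = S_xx - S_xy @ np.linalg.solve(S_yy, S_yx)
cond_cov_2 = S_xx - A @ S_yy @ A.T

# Monte Carlo check: residuals X - AY should be uncorrelated with Y
# and have covariance close to the conditional variance.
n_draws = 200_000
draws = rng.multivariate_normal(np.zeros(5), S, size=n_draws)
X, Y = draws[:, :2], draws[:, 2:]
resid = X - Y @ A.T
emp_resid_cov = resid.T @ resid / n_draws
emp_cross = resid.T @ Y / n_draws

print(np.round(cond_cov_1, 3))
print(np.round(emp_resid_cov, 3))
```

The two analytic expressions agree to rounding error, while the simulated quantities match only up to sampling noise.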
Lemma 8.2 Let \(Y\mid H,\xi\sim N(H\xi,\Sigma)\) and \(\xi\mid H\sim N(\hat\xi,\Omega)\). Then \(\begin{pmatrix}\xi\\Y\end{pmatrix}\mid H\sim N\Big(\begin{pmatrix}\hat\xi\\H\hat\xi\end{pmatrix},\begin{bmatrix}S_{\xi\xi'}&S_{\xi Y'}\\S_{Y\xi'}&S_{YY'}\end{bmatrix}\Big)\), and $$\xi\mid Y,H\sim N\big(\hat\xi+S_{\xi Y'}S_{YY'}^{-1}(Y-H\hat\xi),\;\Omega-S_{\xi Y'}S_{YY'}^{-1}S_{Y\xi'}\big)=N(\hat\xi+G\hat\varepsilon,\;\Omega-GS_{YY'}G')$$ where \(S_{\xi\xi'}=\Omega\), \(S_{\xi Y'}=\Omega H'\), \(S_{YY'}=H\Omega H'+\Sigma\), \(G=S_{\xi Y'}S_{YY'}^{-1}\), \(\hat\varepsilon=Y-H\hat\xi\).
Proof (Lemma 8.2) Write \(Y=H\xi+\varepsilon\) and \(\xi=\hat\xi+\nu\), where \(\varepsilon\sim N(0,\Sigma)\) and \(\nu\sim N(0,\Omega)\) are independent, so \(Y=H\hat\xi+H\nu+\varepsilon\). The means are \(\mathbb E(\xi\mid H)=\hat\xi\) and \(\mathbb E(Y\mid H)=H\hat\xi\). The covariance blocks are \(S_{\xi\xi'}=\mathrm{Var}(\nu\mid H)=\Omega\), \(S_{\xi Y'}=\mathrm{Cov}(\nu,(H\nu+\varepsilon)'\mid H)=\Omega H'\), and \(S_{YY'}=\mathrm{Var}(H\nu+\varepsilon\mid H)=H\Omega H'+\Sigma\). Applying Lemma 8.1 to the demeaned vector \((\xi-\hat\xi;Y-H\hat\xi)\) gives the conditional mean \(\hat\xi+G(Y-H\hat\xi)\) and the conditional variance \(\Omega-GS_{YY'}G'\). \(\blacksquare\)
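The update in Lemma 8.2 can be sketched as a small function. This is a minimal illustration under the lemma's assumptions; the function name `gaussian_update` and the scalar numbers are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_update(xi_hat, Omega, H, Sigma, y):
    """Posterior of xi given Y = y when Y | xi ~ N(H xi, Sigma), xi ~ N(xi_hat, Omega)."""
    S_xiy = Omega @ H.T                    # S_{xi Y'} = Omega H'
    S_yy = H @ Omega @ H.T + Sigma         # S_{YY'} = H Omega H' + Sigma
    # G = S_{xi Y'} S_{YY'}^{-1}; solve a linear system instead of inverting.
    G = np.linalg.solve(S_yy, S_xiy.T).T
    eps_hat = y - H @ xi_hat               # forecast error
    post_mean = xi_hat + G @ eps_hat
    post_cov = Omega - G @ S_yy @ G.T
    return post_mean, post_cov, G

# Scalar illustration: prior N(0, 1), observation noise variance 1, observed y = 2.
mean, cov, G = gaussian_update(
    xi_hat=np.array([0.0]),
    Omega=np.array([[1.0]]),
    H=np.array([[1.0]]),
    Sigma=np.array([[1.0]]),
    y=np.array([2.0]),
)
print(mean, cov, G)  # posterior mean 1.0, variance 0.5, gain 0.5
```

With equal prior and noise variances the gain is one half, so the posterior mean lands midway between the prior mean and the observation.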
8.2 The State Space System
8.2.1 Ingredients. Data \(Y_t\in\mathbb R^n\) (\(t=1,\dots,T\), observable); state variable \(\xi_t\in\mathbb R^r\) (unobservable); parameters \(H_t,F_t,\Sigma_t,\Phi_t\) (may vary over time, sometimes known and sometimes to be estimated; treated as known in this chapter).
8.2.2 The state-space system. Observation equation:
$$Y_t=H_t\xi_t+\varepsilon_t,\quad\varepsilon_t\sim N(0,\Sigma_t)$$
State equation:
$$\xi_{t+1}=F_{t+1}\xi_t+\eta_{t+1},\quad\eta_{t+1}\sim N(0,\Phi_{t+1})$$
Here \(\varepsilon_t\) and \(\eta_t\) are assumed independent. Normality is not essential to the Kalman filter: without it, every formula below remains valid as a linear least-squares projection rather than as an exact conditional distribution.
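To make the two equations concrete, the following sketch simulates a state-space system. It assumes time-invariant parameters (\(H_t=H\), \(F_t=F\), \(\Sigma_t=\Sigma\), \(\Phi_t=\Phi\)) purely for brevity; the function name `simulate_state_space` and the parameter values are hypothetical.

```python
import numpy as np

def simulate_state_space(F, H, Phi, Sigma, xi1, T, rng):
    """Simulate Y_t = H xi_t + eps_t and xi_{t+1} = F xi_t + eta_{t+1} for t = 1..T."""
    r, n = F.shape[0], H.shape[0]
    # Draw all shocks up front; eps and eta are independent of each other.
    eps = rng.multivariate_normal(np.zeros(n), Sigma, size=T)
    eta = rng.multivariate_normal(np.zeros(r), Phi, size=T)
    xi = np.zeros((T, r))
    Y = np.zeros((T, n))
    xi[0] = xi1                                   # row 0 holds xi_1
    for t in range(T):
        Y[t] = H @ xi[t] + eps[t]                 # observation equation
        if t + 1 < T:
            xi[t + 1] = F @ xi[t] + eta[t + 1]    # state equation
    return xi, Y

# One unobserved state driving two observed series (illustrative values).
F = np.array([[0.8]])
H = np.array([[1.0], [0.5]])
Phi = np.array([[0.2]])
Sigma = np.diag([0.5, 0.3])
xi, Y = simulate_state_space(F, H, Phi, Sigma, xi1=np.array([0.0]), T=500,
                             rng=np.random.default_rng(0))
print(xi.shape, Y.shape)  # (500, 1) (500, 2)
```

Only `Y` would be available to an analyst; `xi` is returned here so that filtered estimates can be compared against the true path.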
8.2.3 Recursive updating. Suppose that, given information up to and including \(t-1\), the forecast of the state is \(\xi_t\sim N(\hat\xi_{t\mid t-1},\Omega_{t\mid t-1})\), where the subscript \(t\mid t-1\) denotes the estimate for date \(t\) based on information up to \(t-1\). The goal is to obtain the Kalman predictor \(\hat\xi_{t+1\mid t}\) and the prediction error covariance matrix \(\Omega_{t+1\mid t}\) of \(\xi_{t+1}\). This takes three steps:
(1) Forecast \(Y_t\) using information up to \(t-1\). From \(\xi_t\sim N(\hat\xi_{t\mid t-1},\Omega_{t\mid t-1})\) and \(Y_t=H_t\xi_t+\varepsilon_t\):
$$Y_t\sim N(\hat Y_t,S_{YY'\mid t}),\quad\hat Y_t=H_t\hat\xi_{t\mid t-1},\;S_{YY'\mid t}=H_t\Omega_{t\mid t-1}H_t'+\Sigma_t$$
(2) Observe \(Y_t\), update the inference about \(\xi_t\). The forecast error is \(\hat\varepsilon_t=Y_t-\hat Y_t=Y_t-H_t\hat\xi_{t\mid t-1}\). Consider the regression
$$\xi_t-\hat\xi_{t\mid t-1}=G_t(Y_t-H_t\hat\xi_{t\mid t-1})+e_t \tag{8.1}$$
Here the Kalman gain is \(G_t=S_{\xi Y'\mid t}S_{YY'\mid t}^{-1}\), with \(S_{\xi Y'\mid t}=\mathrm{Cov}(\xi_t,Y_t')=\Omega_{t\mid t-1}H_t'\). Lemma 8.2 then gives the Kalman filter equation, that is, the updated distribution of \(\xi_t\) once \(Y_t\) is known:
$$\xi_t\sim N(\hat\xi_{t\mid t},\Omega_{t\mid t}),\quad\hat\xi_{t\mid t}=\hat\xi_{t\mid t-1}+G_t(Y_t-H_t\hat\xi_{t\mid t-1})$$
$$\Omega_{t\mid t}=\Omega_{t\mid t-1}-G_tS_{YY'\mid t}G_t'=\Omega_{t\mid t-1}-S_{\xi Y'\mid t}S_{YY'\mid t}^{-1}S_{Y\xi'\mid t}$$
Remark 8.1 (intuition for updating) Before \(Y_t\) is observed, the prior \(\xi_t\sim N(\hat\xi_{t\mid t-1},\Omega_{t\mid t-1})\) corresponds to the regression (8.1), in which \(G_t(Y_t-H_t\hat\xi_{t\mid t-1})\) is a mean-zero random quantity with variance \(G_tS_{YY'\mid t}G_t'\). Once \(Y_t\) is observed, this term becomes a realized constant: the new mean \(\hat\xi_{t\mid t}\) absorbs it, and the new variance \(\Omega_{t\mid t}\) falls by \(G_tS_{YY'\mid t}G_t'\), leaving only the variance of \(e_t\). This is exactly the sense in which observation reduces uncertainty.
(3) Forecast \(\xi_{t+1}\). From the state equation \(\xi_{t+1}=F_{t+1}\xi_t+\eta_{t+1}\) and the updated distribution \(\xi_t\sim N(\hat\xi_{t\mid t},\Omega_{t\mid t})\):
$$\xi_{t+1}\sim N(\hat\xi_{t+1\mid t},\Omega_{t+1\mid t}),\quad\hat\xi_{t+1\mid t}=F_{t+1}\hat\xi_{t\mid t},\;\Omega_{t+1\mid t}=F_{t+1}\Omega_{t\mid t}F_{t+1}'+\Phi_{t+1}$$
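The three steps above can be collected into a single function that maps \((\hat\xi_{t\mid t-1},\Omega_{t\mid t-1})\) and the observation \(Y_t\) into \((\hat\xi_{t+1\mid t},\Omega_{t+1\mid t})\). This is a minimal sketch; the name `kalman_step`, its argument names, and the scalar numbers are assumptions made for illustration.

```python
import numpy as np

def kalman_step(xi_pred, Omega_pred, y, H, Sigma, F_next, Phi_next):
    """One pass of steps (1)-(3), starting from xi_{t|t-1} and Omega_{t|t-1}."""
    # (1) Forecast Y_t from information up to t-1.
    y_hat = H @ xi_pred
    S_yy = H @ Omega_pred @ H.T + Sigma
    # (2) Update the inference about xi_t after observing Y_t.
    S_xiy = Omega_pred @ H.T
    G = np.linalg.solve(S_yy, S_xiy.T).T          # Kalman gain S_xiy S_yy^{-1}
    xi_filt = xi_pred + G @ (y - y_hat)
    Omega_filt = Omega_pred - G @ S_yy @ G.T
    # (3) Forecast xi_{t+1} through the state equation.
    xi_next = F_next @ xi_filt
    Omega_next = F_next @ Omega_filt @ F_next.T + Phi_next
    return xi_filt, Omega_filt, xi_next, Omega_next

# Scalar illustration: prior N(0, 1), H = 1, Sigma = 1, observed y = 2, F = 0.5, Phi = 0.1.
xi_filt, Omega_filt, xi_next, Omega_next = kalman_step(
    xi_pred=np.array([0.0]), Omega_pred=np.array([[1.0]]),
    y=np.array([2.0]), H=np.array([[1.0]]), Sigma=np.array([[1.0]]),
    F_next=np.array([[0.5]]), Phi_next=np.array([[0.1]]),
)
print(xi_filt, Omega_filt)   # filtered mean 1.0, variance 0.5
print(xi_next, Omega_next)   # predicted mean 0.5, variance 0.5**2 * 0.5 + 0.1 = 0.225
```

In this example the update halves the state variance from 1 to 0.5, and the prediction step then scales it by \(F^2\) and adds \(\Phi\).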
8.2.4 Initializing. When the starting state \(\xi_1\) is known, set \(\hat\xi_{1\mid0}=\xi_1\) and \(\Omega_{1\mid0}=\mathbf 0\) (so \(\hat\xi_{1\mid0}=\mathbf 0\) when the known starting value is zero); the recursion then advances period by period.
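A minimal sketch of the full recursion with a known starting state follows. It assumes time-invariant parameters for brevity and initializes with the known \(\xi_1\) as the prior mean and a zero prior covariance; the function name `kalman_filter` and all numerical values are hypothetical.

```python
import numpy as np

def kalman_filter(Y, H, F, Sigma, Phi, xi1):
    """Run the recursion over t = 1..T with a known starting state xi1."""
    T, r = Y.shape[0], xi1.shape[0]
    xi_pred = xi1.astype(float)            # prior mean: the known starting state
    Omega_pred = np.zeros((r, r))          # prior covariance: zero, no uncertainty
    xi_filt = np.zeros((T, r))
    Omega_filt = np.zeros((T, r, r))
    Omega_next = np.zeros((T, r, r))       # stores Omega_{t+1|t}
    for t in range(T):
        S_yy = H @ Omega_pred @ H.T + Sigma
        G = np.linalg.solve(S_yy, H @ Omega_pred).T   # Omega H' S_yy^{-1}
        xi_filt[t] = xi_pred + G @ (Y[t] - H @ xi_pred)
        Omega_filt[t] = Omega_pred - G @ S_yy @ G.T
        xi_pred = F @ xi_filt[t]
        Omega_pred = F @ Omega_filt[t] @ F.T + Phi
        Omega_next[t] = Omega_pred
    return xi_filt, Omega_filt, Omega_next

# Scalar illustration with placeholder data (the covariance path does not depend on Y).
F = np.array([[0.9]]); H = np.array([[1.0]])
Sigma = np.array([[1.0]]); Phi = np.array([[0.5]])
xi1 = np.array([1.0])
Y = np.random.default_rng(0).standard_normal((200, 1))
xi_filt, Omega_filt, Omega_next = kalman_filter(Y, H, F, Sigma, Phi, xi1)
print(xi_filt[0], Omega_filt[0])   # first filtered state equals xi1 with zero variance
print(Omega_next[-1])              # prediction variance after many periods
```

Because the prior covariance is zero, the first gain is zero and the first observation leaves the known state unchanged; uncertainty enters only through \(\Phi\) from the second period on.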
Chapter arc The Kalman filter = Bayesian recursive updating on a state-space model. §8.1's two normal-conditional-distribution lemmas are the engine: Lemma 8.1 gives the conditional mean/variance under joint normality, and Lemma 8.2 packages it into the updating formula "prior \(N(\hat\xi,\Omega)\) + observation equation \(N(H\xi,\Sigma)\) → posterior." §8.2 applies this update recursively over time: forecast \(Y_t\) → update \(\xi_t\) via the Kalman gain \(G_t\) using the forecast error (the filter equation, variance falls) → extrapolate to \(\xi_{t+1}\) via the state equation (the prediction equation, variance rises through \(\Phi\)), and repeat. This is exactly the posterior-updating idea of [[bayesian-inference]] realized in a dynamic state space. The next chapter turns to (univariate) time-series analysis itself.