17. Philosophy and Method
Part VI theme: reduced-form empirical analysis. This chapter (Ch 17): philosophy and method. §17.1 Why reduced form: corporate finance has no single dominating framework; MM (1958) is only a frictionless benchmark (every study poses a question that breaks MM's assumptions in some way), so corporate finance is driven by questions rather than models, and reduced-form tools are useful as long as they answer important questions (theory and empirics inspire each other). §17.2 The causal-analysis framework (Angrist-Pischke 2014): treatment \(D_i\in\{0,1\}\), potential outcomes \(Y_{1i}/Y_{0i}\); treatment effects ATE/ATT/ATU; since we cannot observe both \(Y_{1i}\) and \(Y_{0i}\), we use \(\text{Avg}[Y_{1i}\mid D_i{=}1]-\text{Avg}[Y_{0i}\mid D_i{=}0]=\kappa+\text{selection bias}\) (17.1), the bias coming from common variation that drives both treatment and outcome. §17.3 Overcoming selection bias: randomization (rarely practical in corporate finance); regression with controls \(\beta_1^{Reg}=\frac{\text{Cov}(Y,\tilde X_1)}{\text{Var}(\tilde X_1)}\) (17.2) (orthogonalizing away the \(X_2\) channel), the fixed-effects within estimator (17.5), omitted variable bias (OVB) \(\beta^s=\beta^l+\pi_1\gamma\) (17.7); heteroskedasticity- and cluster-robust variances (17.8)/(17.12) (different concepts; cluster-robust is already heteroskedasticity-robust); instrumental variables (IV) addressing classical measurement error (attenuation bias (17.13)) and OVB, \(\beta^{IV}=\frac{\text{Cov}(Y,Z)}{\text{Cov}(X,Z)}\) (17.15), requiring relevance and exclusion (exclusion is untestable, Remark 17.2; numerator = reduced-form coefficient, denominator = first-stage coefficient, Remark 17.3); 2SLS (more flexible than IV: controls in both stages, multiple instruments; Shue-Townsend 2017 option-cycle instrument; drawback: LATE and external validity); regression discontinuity (RD) (the treatment jumps at a threshold of a continuous running variable, \(\lim_{R\downarrow r}Y-\lim_{R\uparrow r}Y\); Remark 17.4: conceptually different from DiD); difference-in-differences (DiD) (treatment vs control + common trend, the interaction coefficient \(\delta\) is the parameter of interest; Richardson-Troost 2009, St. Louis vs Atlanta Fed).
17.1 Philosophy: Why Reduced Form
Unlike asset pricing and macroeconomics, corporate finance does not have a single dominating framework. Modigliani-Miller (1958) is the frictionless benchmark to compare against, not a framework that is actually used; it organizes the corporate finance literature in the sense that every study poses a question that breaks the MM assumptions in some practical way, which is what makes corporate finance research meaningful.
Corporate finance research is therefore driven more by questions than by economic models. Many techniques can be used to answer an interesting question, and some of them are reduced-form empirical tools; such tools are useful as long as they answer important questions, even without a structural model. Typical corporate finance research estimates parameters that are not directly tied to an economic model, such as:
- the effect of an exogenous cash flow shock on investment decisions;
- the effect of higher leverage on enterprise value;
- the effect of management compensation on a firm's risk-taking behavior;
- the effect of credit availability on consumption.
Reduced-form empirical research does not require the economic assumptions of a model, and it interprets a parameter or effect in a statistical rather than an economic sense. A good empirical corporate finance paper, however, is often motivated by a structural theoretical model rather than by the identification of a causal effect alone. Empirical and theoretical work inspire each other: theory draws on cleanly identified causal effects when writing a model, and empirical work draws on theory to learn which causal effects are worth testing.
17.2 The Causal Analysis Framework
Although corporate finance has no dominating framework, reduced-form empirical studies do. The causal-analysis framework of Angrist and Pischke (2014) has become increasingly common in empirical corporate finance.
17.2.1 Notation. Treatment of individual \(i\): treated \(D_i=1\), not treated \(D_i=0\). Potential outcomes: if treated \(Y_{1i}\equiv Y_i\mid(D_i=1)\), if not \(Y_{0i}\equiv Y_i\mid(D_i=0)\). Sample average \(\text{Avg}_n[X_i]\), population average \(\mathbb{E}[X_i]\).
17.2.2 Treatment effect and selection bias.
- Average treatment effect (ATE): population \(\mathbb{E}[Y_{1i}]-\mathbb{E}[Y_{0i}]\).
- ATT (on the treated): population \(\mathbb{E}[Y_{1i}\mid D_i=1]-\mathbb{E}[Y_{0i}\mid D_i=1]\).
- ATU (on the untreated): population \(\mathbb{E}[Y_{1i}\mid D_i=0]-\mathbb{E}[Y_{0i}\mid D_i=0]\).
But for individual \(i\) we cannot observe both \(Y_{1i}\) and \(Y_{0i}\), so none of these is directly observable. Instead we use the observable \(\text{Avg}_n[Y_{1i}\mid D_i=1]-\text{Avg}_n[Y_{0i}\mid D_i=0]\). Suppose we want ATT \(=\kappa\):
$$\text{Avg}_n[Y_{1i}\mid D_i{=}1]-\text{Avg}_n[Y_{0i}\mid D_i{=}0]=\kappa+\underbrace{\text{Avg}_n[Y_{0i}\mid D_i{=}1]-\text{Avg}_n[Y_{0i}\mid D_i{=}0]}_{\text{selection bias}} \tag{17.1}$$
The selection bias is unobserved (\(\text{Avg}_n[Y_{0i}\mid D_i{=}1]\) is counterfactual). It could be non-zero, so the researcher must think about its sign, the key being what underlying variation drives both treatment and outcome. Example: \(D_i=1\) means a firm recently changed its CEO, \(D_i=0\) no change, outcome \(Y_i\) = ROE; the selection bias could be negative since bad performance causes both lower ROE and a management change.
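The decomposition (17.1) can be illustrated with a small simulation of the CEO-turnover example. This is only a sketch under made-up assumptions: the data-generating process, the constant effect `KAPPA` and all numeric values are hypothetical, and the counterfactual \(Y_{0i}\) of treated firms is visible only because the data are simulated.

```python
import random

rng = random.Random(0)
KAPPA = 1.0  # assumed constant treatment effect (illustrative ROE units)

y1_treated, y0_treated, y0_control = [], [], []
for _ in range(20_000):
    q = rng.gauss(0.0, 1.0)                        # latent firm performance
    y0 = 10.0 + 2.0 * q + rng.gauss(0.0, 1.0)      # ROE without a CEO change
    y1 = y0 + KAPPA                                # ROE with a CEO change
    changed_ceo = q + rng.gauss(0.0, 1.0) < -0.5   # weak firms replace CEOs more often
    if changed_ceo:
        y1_treated.append(y1)
        y0_treated.append(y0)  # counterfactual, observable only in a simulation
    else:
        y0_control.append(y0)


def avg(values):
    return sum(values) / len(values)


naive_diff = avg(y1_treated) - avg(y0_control)      # left-hand side of (17.1)
att = avg(y1_treated) - avg(y0_treated)             # kappa
selection_bias = avg(y0_treated) - avg(y0_control)  # negative in this design

print(f"naive={naive_diff:.3f}  ATT={att:.3f}  selection bias={selection_bias:.3f}")
```

Because poor performance drives both turnover and low ROE in this design, the naive comparison understates the effect, in line with the sign argument above.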
17.3.1 Randomization & 17.3.2 Regression with Controls
Randomization. If treatment is randomly assigned, then \(\mathbb{E}[Y_{0i}\mid D_i{=}1]=\mathbb{E}[Y_{0i}\mid D_i{=}0]\), the selection bias in (17.1) vanishes in the population, and \(\kappa\) is consistently estimated by the observable difference \(\text{Avg}_n[Y_{1i}\mid D_i{=}1]-\text{Avg}_n[Y_{0i}\mid D_i{=}0]\). Randomized treatment, however, is rarely practical in corporate finance.
Regression with controls. Suppose the true model is \(Y_i=\alpha+\beta_1 X_{1i}+\beta_2 X_{2i}+\varepsilon_i\), \(\text{Cov}(X_{1i},\varepsilon_i)=\text{Cov}(X_{2i},\varepsilon_i)=0\), and we want \(\beta_1\). It is identified by
$$\beta_1^{\text{Reg}}=\frac{\text{Cov}(Y_i,\tilde X_{1i})}{\text{Var}(\tilde X_{1i})} \tag{17.2}$$
where \(\tilde X_{1i}\) is the residual of regressing \(X_{1i}\) on \(X_{2i}\) (\(X_{1i}=\pi_0+\pi_1 X_{2i}+\tilde X_{1i}\), \(\mathbb{E}[\tilde X_{1i}]=0\), \(\text{Cov}(\tilde X_{1i},X_{2i})=0\)). The point: identifying \(\beta_1\) means orthogonalizing \(X_1\)'s effect from \(X_2\)'s. If we only care about \(\beta_1\), this is legitimate; we can add controls to "shut down" unwanted channels of causality. But if we care about the synthesized effect of \(X_1\) on \(Y\) through all channels, no controls should be added. The core question: what underlying sources of variation remain to drive the variable of interest after taking out the sources added as controls.
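The orthogonalization in (17.2) can be checked numerically. The sketch below uses simulated data with hypothetical coefficients (2.0 on \(X_1\), 3.0 on \(X_2\)); it residualizes \(X_1\) on \(X_2\), applies (17.2), and compares the result with the \(X_1\) slope from the two-regressor normal equations.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
n = 5_000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.7 * v + rng.gauss(0, 1) for v in x2]  # X1 is correlated with X2
y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Step 1: residualize X1 on X2 to obtain X1-tilde
pi1 = cov(x1, x2) / cov(x2, x2)
pi0 = mean(x1) - pi1 * mean(x2)
x1_tilde = [a - pi0 - pi1 * b for a, b in zip(x1, x2)]

# Step 2: equation (17.2)
beta1_reg = cov(y, x1_tilde) / cov(x1_tilde, x1_tilde)

# Slope on X1 from the two-regressor OLS normal equations
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
beta1_ols = (s22 * cov(y, x1) - s12 * cov(y, x2)) / (s11 * s22 - s12 ** 2)

# Regression without the control picks up the X2 channel as well
beta1_short = cov(y, x1) / s11

print(f"(17.2): {beta1_reg:.3f}  OLS with control: {beta1_ols:.3f}  no control: {beta1_short:.3f}")
```

The first two numbers coincide up to floating-point error, while the regression without the control mixes in the effect that runs through \(X_2\).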
Fixed effects regression. The true model
$$Y_i=\alpha+\beta X_{ij}+\sum_{j=1}^J\gamma_j\cdot\text{Group}_{ji}+\varepsilon_{ij} \tag{17.3}$$
(\(\text{Group}_{ji}=1\) if \(i\) belongs to group \(j\), controlling for unobserved fixed effects). \(\beta\) is identified by \(\beta^{\text{FE}}=\frac{\text{Cov}(Y_i,\tilde X_{ij})}{\text{Var}(\tilde X_{ij})}\) (17.4) with \(\tilde X_{ij}=X_{ij}-\bar X_j\), so
$$\beta^{\text{FE}}=\frac{\text{Cov}(Y_i,X_{ij}-\bar X_j)}{\text{Var}(X_{ij}-\bar X_j)} \tag{17.5}$$
the within estimator. Fixed effects restore consistency only if we believe the true model is (17.3); when several sets of fixed effects are plausible, present results both with and without FE, and with different FE, to uncover the economic story and show robustness. For example, the table below (Mian-Sufi 2009) shows that the estimated effect of income growth on mortgage credit growth varies with the level of fixed effects (region / division / state / county):
| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| IRS income growth 02–05 | 0.089 | −0.449** | −0.488** | −0.768** | −0.662** |
| | (0.098) | (0.092) | (0.092) | (0.090) | (0.089) |
| Constant | 0.189** | 0.215** | 0.217** | 0.230** | 0.225** |
| | (0.006) | (0.005) | (0.005) | (0.005) | (0.005) |
| Fixed effects | None | Region | Division | State | County |
| Observations | 3014 | 3014 | 3014 | 3014 | 3014 |
| \(R^2\) | 0.000 | 0.166 | 0.175 | 0.247 | 0.380 |
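The within estimator (17.5) can be sketched on simulated grouped data. Everything in the block is an illustrative assumption (50 groups, a true slope `BETA` of 1, group effects correlated with \(X\)); it is unrelated to the Mian-Sufi data.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(2)
BETA = 1.0  # assumed true slope
xs, ys, groups = [], [], []
for j in range(50):
    gamma_j = 2.0 * rng.gauss(0, 1)    # unobserved group effect
    for _ in range(40):
        x = gamma_j + rng.gauss(0, 1)  # X is correlated with the group effect
        xs.append(x)
        ys.append(BETA * x + gamma_j + rng.gauss(0, 1))
        groups.append(j)

# Group means of X, then the within-group deviation X_ij - Xbar_j
sums, counts = {}, {}
for x, g in zip(xs, groups):
    sums[g] = sums.get(g, 0.0) + x
    counts[g] = counts.get(g, 0) + 1
x_within = [x - sums[g] / counts[g] for x, g in zip(xs, groups)]

beta_fe = cov(ys, x_within) / cov(x_within, x_within)  # equation (17.5)
beta_pooled = cov(ys, xs) / cov(xs, xs)                # ignores group effects

print(f"within estimator: {beta_fe:.3f}  pooled OLS: {beta_pooled:.3f}")
```

In this design pooled OLS is pushed upward by the group effects, while the within estimator uses only variation inside each group.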
Omitted variable bias (OVB). The "long" model \(Y_i=\alpha^l+\beta^l X_{1i}+\gamma X_{2i}+\varepsilon_i^l\) (17.6) and the "short" model \(Y_i=\alpha^s+\beta^s X_{1i}+\varepsilon_i^s\). Then
$$\beta^s=\beta^l+\pi_1\gamma \tag{17.7}$$
where \(\pi_1\) comes from the auxiliary regression \(X_{2i}=\pi_0+\pi_1 X_{1i}+u_i\), and OVB \(=\pi_1\gamma\), which is generally non-zero. Adding controls removes OVB, but the controls may also absorb most of the variation in \(X_{1i}\) (in the extreme, near-perfect collinearity).
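The OVB formula (17.7) holds exactly for OLS estimates in any sample, which a short check makes concrete. The simulated coefficients below are hypothetical; `ovb` is the product \(\pi_1\gamma\) computed from the auxiliary and long regressions.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(3)
n = 2_000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]  # omitted variable, correlated with X1
y = [1.0 + 1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 ** 2

beta_long = (s22 * s1y - s12 * s2y) / det   # coefficient on X1 in the long model
gamma_long = (s11 * s2y - s12 * s1y) / det  # coefficient on X2 in the long model
beta_short = s1y / s11                      # coefficient on X1 in the short model
pi1 = s12 / s11                             # slope of X2 on X1
ovb = pi1 * gamma_long

print(f"short={beta_short:.3f}  long={beta_long:.3f}  OVB={ovb:.3f}")
```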
Heteroskedasticity and Clustering
For the model \(Y=\mathbf X'\boldsymbol\beta+u\) (\(\mathbb{E}[\mathbf X u]=0\), \(\mathbb{E}[\mathbf{XX}']<\infty\) invertible), the OLS estimator's limiting distribution is \(\sqrt n(\hat{\boldsymbol\beta}_n-\boldsymbol\beta)\xrightarrow{d}\mathcal N(\mathbf 0,\boldsymbol\Omega)\), with \(\boldsymbol\Omega=\mathbb{E}[\mathbf{XX}']^{-1}\text{Var}(\mathbf X u)\mathbb{E}[\mathbf{XX}']^{-1}\).
- Homoskedasticity \(\text{Var}(u\mid\mathbf X)=\sigma^2\): \(\boldsymbol\Omega=\sigma^2\mathbb{E}[\mathbf{XX}']^{-1}\), consistently estimated by
$$\left(\frac1n\sum_{i=1}^n\mathbf X_i\mathbf X_i'\right)^{-1}\hat\sigma_n^2 \tag{17.8}$$
- Heteroskedasticity \(\text{Var}(u\mid\mathbf X)=\sigma^2(\mathbf X)\): estimated by
$$\left(\frac1n\sum\mathbf X_i\mathbf X_i'\right)^{-1}\left(\frac1n\sum\mathbf X_i\mathbf X_i'\hat u_i^2\right)\left(\frac1n\sum\mathbf X_i\mathbf X_i'\right)^{-1}$$
the (unclustered) heteroskedasticity-robust variance-covariance estimator.
- Clustering (within-group correlation):
$$\begin{cases}\text{Cov}(u_m,u_n\mid\mathbf X_m,\mathbf X_n)=0 & m,n\text{ different groups}\\[2pt] \text{Cov}(u_m,u_n\mid\mathbf X_m,\mathbf X_n)\ne0 & m,n\text{ same group}\end{cases} \tag{17.9}$$
With \(J\) groups, the OLS estimator \(\hat{\boldsymbol\beta}_n=\boldsymbol\beta+\left(\frac1n\sum\mathbf X_i\mathbf X_i'\right)^{-1}\left[\frac1n\sum\mathbf X_i u_i\right]\) (17.10). By (17.9) cross-group covariances are zero,
$$\text{Var}\left(\frac1n\sum\mathbf X_i u_i\right)=\frac1{n^2}\sum_{j=1}^J\text{Var}\left(\sum_{i\in\text{Group }j}\mathbf X_{ij}u_{ij}\right) \tag{17.11}$$
so the variance-covariance matrix of the OLS estimator is estimated by the cluster-robust estimator
$$\left(\sum\mathbf X_i\mathbf X_i'\right)^{-1}\left(\sum_{j=1}^J\hat{\boldsymbol\mu}_j\hat{\boldsymbol\mu}_j'\right)\left(\sum\mathbf X_i\mathbf X_i'\right)^{-1},\qquad \hat{\boldsymbol\mu}_j=\sum_{i\in\text{Group }j}\mathbf X_{ij}\hat u_{ij} \tag{17.12}$$
Heteroskedasticity and clustering are two different things: heteroskedasticity is about the structure of the distribution of \(\mathbf X\) and \(u\) (nothing about sample correlation); clustering is about the sample correlation of conditional errors (nothing about the variance of \(u\) as a function of \(\mathbf X\)). The cluster-robust estimator is already heteroskedasticity-robust (it doesn't make the homoskedasticity simplification of (17.8)).
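The three variance estimators can be compared in a deliberately stripped-down setting. The sketch below assumes a single regressor with no intercept, so every matrix in (17.8) and (17.12) becomes a scalar, and it reports variances of \(\hat\beta\) itself (the asymptotic expressions divided by \(n\)). The design with shared within-group components in both \(X\) and \(u\) is hypothetical.

```python
import random

rng = random.Random(4)
BETA, J, M = 1.0, 100, 20  # assumed slope, number of groups, group size
x, y, g = [], [], []
for j in range(J):
    a_j, c_j = rng.gauss(0, 1), rng.gauss(0, 1)  # components shared within group j
    for _ in range(M):
        xi = a_j + rng.gauss(0, 1)
        ui = c_j + rng.gauss(0, 1)               # errors correlated within group
        x.append(xi)
        y.append(BETA * xi + ui)
        g.append(j)

n = len(x)
sxx = sum(v * v for v in x)
beta_hat = sum(a * b for a, b in zip(x, y)) / sxx
u_hat = [b - beta_hat * a for a, b in zip(x, y)]

# Homoskedastic formula, scalar version of (17.8) divided by n
var_homo = (sum(u * u for u in u_hat) / n) / sxx
# Heteroskedasticity-robust (unclustered) sandwich
var_hc0 = sum((a * u) ** 2 for a, u in zip(x, u_hat)) / sxx ** 2


def cluster_var(x, u_hat, groups):
    """Scalar version of (17.12): sum group scores, square, sandwich."""
    scores = {}
    for a, u, j in zip(x, u_hat, groups):
        scores[j] = scores.get(j, 0.0) + a * u
    return sum(s * s for s in scores.values()) / sum(a * a for a in x) ** 2


var_cluster = cluster_var(x, u_hat, g)
var_singletons = cluster_var(x, u_hat, range(n))  # every observation its own cluster

print(f"homoskedastic={var_homo:.6f}  robust={var_hc0:.6f}  clustered={var_cluster:.6f}")
```

With one observation per cluster, (17.12) collapses to the heteroskedasticity-robust formula, which is one way to see that the cluster-robust estimator is already heteroskedasticity-robust.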
17.3.3 Instrumental Variable
Classical measurement error. True model \(Y_i=\alpha+\beta X_i^*+\varepsilon_i\), \(X_i^*\) unobservable, \(\text{Cov}(\varepsilon_i,X_i^*)=0\). Replace with a mismeasured proxy \(X_i=X_i^*+e_i\) (\(\text{Cov}(X_i^*,e_i)=0\), \(\text{Cov}(\varepsilon_i,e_i)=0\), \(e_i\) the classical measurement error). The OLS regression \(Y_i=\alpha+\beta^{\text{OLS}}X_i+\epsilon_i\) gives
$$\beta^{\text{OLS}}=\underbrace{\frac{\text{Var}(X_i^*)}{\text{Var}(X_i^*)+\text{Var}(e_i)}}_{\text{attenuation bias}}\beta \tag{17.13}$$
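biased toward 0 (attenuation bias); the proof (17.14) follows from \(\text{Cov}(Y_i,X_i)=\beta\,\text{Var}(X_i^*)\) and \(\text{Var}(X_i)=\text{Var}(X_i^*)+\text{Var}(e_i)\).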
IV solves classical measurement error. An instrument \(Z_i\) with relevance \(\text{Cov}(X_i^*,Z_i)\ne0\) and exclusion \(\text{Cov}(\varepsilon_i,Z_i)=\text{Cov}(e_i,Z_i)=0\). The IV estimator
$$\beta^{\text{IV}}=\frac{\text{Cov}(Y_i,Z_i)}{\text{Cov}(X_i,Z_i)} \tag{17.15}$$
identifies \(\beta\).
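Attenuation bias (17.13) and the IV fix (17.15) can be seen in a simulation. As an assumption for illustration, the instrument here is a second noisy measurement of \(X_i^*\) with an independent error, and the variances are chosen so that the attenuation factor is 0.5; none of these values come from the text.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(5)
n = 20_000
BETA = 2.0  # assumed true coefficient
x_star = [rng.gauss(0, 1) for _ in range(n)]   # unobservable true regressor
x = [v + rng.gauss(0, 1) for v in x_star]      # mismeasured proxy, Var(e) = 1
z = [v + rng.gauss(0, 1) for v in x_star]      # instrument: independent second measurement
y = [0.5 + BETA * v + rng.gauss(0, 1) for v in x_star]

beta_ols = cov(y, x) / cov(x, x)  # attenuated toward 0, about 0.5 * BETA here
beta_iv = cov(y, z) / cov(x, z)   # equation (17.15)

print(f"OLS={beta_ols:.3f}  IV={beta_iv:.3f}  true={BETA}")
```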
Remark 17.1 It could be a big problem even when classical measurement error only happens to a control variable. Suppose the true model is \(Y_i=\alpha+\beta_1 X_1+\beta_2 X_2^*+\varepsilon_i\) with \(X_2^*\) mismeasured as \(X_2=X_2^*+e\) (\(X_2\) just a control). By (17.14) \(\beta_2^{\text{OLS}}\) is biased toward 0; if \(\text{Cov}(X_1,X_2^*)>0\) and \(\beta_1\cdot\beta_2>0\), then \(\beta_1^{\text{OLS}}\) is biased away from 0, making \(X_1\)'s significance questionable.
IV solves OVB. True \(Y_i=\alpha+\beta_1 X_{1i}+\gamma X_{2i}+\varepsilon_i\), but we only estimate \(Y_i=\alpha+\beta_1^{\text{OVB}}X_{1i}+\eta_i\) (\(\eta_i=\gamma X_{2i}+\varepsilon_i\)), with \(\beta_1^{\text{OVB}}\ne\beta_1\). With an instrument \(Z_i\) satisfying relevance \(\text{Cov}(X_{1i},Z_i)\ne0\) and exclusion \(\text{Cov}(\eta_i,Z_i)=0\) (\(Z\) affects \(Y\) only through \(X_1\)), then
$$\beta_1^{\text{IV}}=\frac{\text{Cov}(Y_i,Z_i)}{\text{Cov}(X_{1i},Z_i)} \tag{17.16}$$
identifies \(\beta_1\) (proven in (17.17)).
Remark 17.2 The exclusion restriction cannot be tested and must be argued in words; a valid instrument is usually one that is switched on exogenously and has no way to affect \(Y\) other than through \(X_1\).
Remark 17.3 In (17.15)/(17.16), the numerator corresponds to the reduced-form coefficient and the denominator to the first-stage coefficient (each after dividing by \(\text{Var}(Z_i)\), which cancels in the ratio).
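The reading of the IV estimator as a ratio of two regressions can be sketched as follows. The data-generating process is hypothetical: \(X_2\) is omitted, and \(Z\) is drawn independently of \(X_2\) so that exclusion holds by construction.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(6)
n = 20_000
BETA1, GAMMA = 1.5, 2.0  # assumed true coefficients
z = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable, independent of Z
x1 = [0.8 * a + 0.6 * b + rng.gauss(0, 1) for a, b in zip(z, x2)]
y = [1.0 + BETA1 * a + GAMMA * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

reduced_form = cov(y, z) / cov(z, z)   # slope of Y on Z
first_stage = cov(x1, z) / cov(z, z)   # slope of X1 on Z
beta_iv = reduced_form / first_stage   # same number as (17.16)
beta_ovb = cov(y, x1) / cov(x1, x1)    # short regression, biased by the omitted X2

print(f"reduced form={reduced_form:.3f}  first stage={first_stage:.3f}")
print(f"IV={beta_iv:.3f}  short OLS={beta_ovb:.3f}  true={BETA1}")
```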
17.3.4 2SLS
Suppose the true model is \(Y_i=\alpha+\beta X_i+\boldsymbol\gamma\cdot\mathbf V_i+\varepsilon_i\) (\(\boldsymbol\gamma,\mathbf V_i\) vectors), and we want \(\beta\). If \(Z_i\) satisfies relevance and exclusion, two steps obtain the true \(\beta\):
- Stage 1: regress \(X\) on \(Z\), \(X_i=\mu+\phi Z_i+\epsilon_i\), obtain \(\mu^{2SLS},\phi^{2SLS}\), construct \(\hat X_i=\mu^{2SLS}+\phi^{2SLS}Z_i\).
- Stage 2: regress \(Y\) on \(\hat X\), \(Y_i=\xi+\lambda\hat X_i+e_i\), then \(\lambda^{2SLS}=\beta\).
Benefits of 2SLS over IV: more flexibility. It accommodates additional controls in both stages (which can differ across stages); the first stage allows multiple instruments with optimal weights to optimally construct \(\hat X\) for the second stage.
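A hand-rolled two-stage procedure with two instruments illustrates the extra flexibility. This is a sketch on simulated data with hypothetical coefficients and an omitted variable `w`; the helper `ols2` is defined here for illustration, and the first stage simply uses OLS fitted values.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols2(y, a, b):
    """Intercept and slopes from regressing y on a constant, a and b."""
    saa, sbb, sab = cov(a, a), cov(b, b), cov(a, b)
    say, sby = cov(a, y), cov(b, y)
    det = saa * sbb - sab ** 2
    slope_a = (sbb * say - sab * sby) / det
    slope_b = (saa * sby - sab * say) / det
    return mean(y) - slope_a * mean(a) - slope_b * mean(b), slope_a, slope_b


rng = random.Random(7)
n = 20_000
BETA = 1.5  # assumed true coefficient of interest
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable
x = [0.7 * a + 0.5 * b + 0.6 * c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, w)]
y = [1.0 + BETA * a + 2.0 * c + rng.gauss(0, 1) for a, c in zip(x, w)]

# Stage 1: regress X on both instruments and build fitted values
mu, phi1, phi2 = ols2(x, z1, z2)
x_hat = [mu + phi1 * a + phi2 * b for a, b in zip(z1, z2)]

# Stage 2: regress Y on the fitted values
lam = cov(y, x_hat) / cov(x_hat, x_hat)
beta_ols = cov(y, x) / cov(x, x)  # plain OLS for comparison

print(f"2SLS={lam:.3f}  OLS={beta_ols:.3f}  true={BETA}")
```

Running the second stage by hand reproduces the point estimate only; the standard errors printed by a naive second-stage regression are not valid, so dedicated 2SLS routines are normally used in practice.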
Example: Shue-Townsend (2017) study the effect of executive option grants on risk-taking. Difficulty: omitted variables may affect both option granting and risk-taking (riskier firms tend to use more equity compensation). They exploit the fact that executive compensation follows fixed cycles: the instrument is a dummy \(I_{ijt}^{\text{PredictedFY}}\) identifying the predicted first year of each cycle for firm \(i\) (industry \(j\), year \(t\)), where the prediction is based on the length of the previous cycle. The predicted-first-year dummy is pinned down by the fixed cycle length and does not co-move with omitted variables, so exclusion is satisfied; compensation is adjusted in the first year of each cycle and is therefore correlated with the option grant, so relevance is satisfied. First stage \(\Delta O_{ijt}=\beta_0+\beta_1 I_{ijt}^{\text{PredictedFY}}+\gamma_t+\psi_j+\mu_{ijt}\) (\(\gamma_t\) year FE, \(\psi_j\) industry FE); second stage \(\Delta y_{ijt}=\delta_0+\delta_1\Delta\hat O_{ijt}+\gamma_t+\psi_j+e_{ijt}\). Result: more option grants increase the firm's stock volatility (managers raise risk to maximize their own value after receiving more options).
Limitation of 2SLS: it identifies a local average treatment effect (LATE), which holds only for the subgroup of compliers, so external validity is hard to justify.
17.3.5 Regression Discontinuity & 17.3.6 Difference in Differences
Regression discontinuity (RD). We want the effect of \(X\) on \(Y\), but omitted variables drive both. Suppose a continuous running variable \(R\) relates to \(X\) discontinuously at some point \(R=r\), i.e. \(\lim_{R\downarrow r}X(R)-\lim_{R\uparrow r}X(R)>0\). Just left and right of \(R=r\), \(R\) moves up infinitesimally, the other omitted variables don't move but \(X\) does, so computing
$$\lim_{R\downarrow r}Y(R)-\lim_{R\uparrow r}Y(R)$$
identifies the variation in \(Y\) contributed purely by \(X\).
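The limit difference above can be approximated by fitting a separate line on each side of the threshold and comparing the two fitted values at \(R=r\). The sketch assumes a sharp design in which \(X\) switches from 0 to 1 at the threshold `R0`, a hypothetical jump `TAU`, a smooth function of \(R\) standing in for everything else that moves with the running variable, and an arbitrary bandwidth `H`.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def fit_line(rs, ys):
    """Intercept and slope of a simple linear fit of ys on rs."""
    slope = cov(rs, ys) / cov(rs, rs)
    return mean(ys) - slope * mean(rs), slope


rng = random.Random(8)
R0, H, TAU = 0.0, 0.25, 2.0  # threshold, bandwidth, assumed true jump
left_r, left_y, right_r, right_y = [], [], [], []
for _ in range(20_000):
    r = rng.uniform(-1.0, 1.0)      # continuous running variable
    x = 1.0 if r >= R0 else 0.0     # X jumps at the threshold
    y = 1.0 + 0.8 * r + 0.5 * r * r + TAU * x + rng.gauss(0, 0.5)
    if R0 - H <= r < R0:
        left_r.append(r)
        left_y.append(y)
    elif R0 <= r <= R0 + H:
        right_r.append(r)
        right_y.append(y)

a_left, b_left = fit_line(left_r, left_y)
a_right, b_right = fit_line(right_r, right_y)
# Difference of the two fitted values at the threshold
jump = (a_right + b_right * R0) - (a_left + b_left * R0)

print(f"estimated jump={jump:.3f}  true={TAU}")
```

The bandwidth trades bias against variance: a narrower window stays closer to the limit argument but uses fewer observations.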
Remark 17.4 RD is an empirical strategy orthogonal to parametric/non-parametric estimation methods, applicable to both. The linear-regression version of RD only requires a dummy indicating passing the threshold of the running variable, which looks like DiD. But conceptually they differ: DiD, if interpreted as RD, implicitly uses time as the running variable (which needs justification); DiD requires a control group + parallel-trend assumption, while RD doesn't rely on a control group; RD requires continuity in the running variable, while DiD needs nothing to be continuous. So conceptually they are two separate strategies even though they look similar in linear regressions.
Difference-in-differences (DiD). There are two groups: treatment and control. The treatment group receives a single shock at \(t=\tau\). The common-trend assumption states that, absent the treatment, both groups would follow the same (parallel) time trend before and after \(\tau\). The DiD regression is
$$Y_{it}=\alpha+\beta\,\text{TREAT}_i+\gamma\,\text{POST}_t+\delta\,\text{TREAT}_i\times\text{POST}_t+\varepsilon_{it}$$
\(\text{TREAT}_i\) identifies the treatment group, \(\text{POST}_t\) the post-treatment periods (\(\text{POST}_t=1\) for \(t\ge\tau\) and 0 otherwise), and the parameter of interest is the interaction coefficient \(\delta\). Example: Richardson-Troost (2009) study the effect of central-bank liquidity provision on bank failures. They use the collapse of Caldwell and Company on Nov 7, 1930, after which part of the affected region (Mississippi) fell under the St. Louis Fed (no bailout) and part under the Atlanta Fed (bailout). Assuming the two parts share the same time trend, the difference between the two across-time differences estimates the effect of the bailout on bank failures (e.g. suspension, liquidation).
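With two groups and two periods, the interaction coefficient \(\delta\) in the saturated regression equals the difference of the two across-time differences in cell means, so it can be computed without running a regression. The sketch below uses simulated data with hypothetical parameter values and is unrelated to the Richardson-Troost sample.

```python
import random

rng = random.Random(9)
ALPHA, BETA, GAMMA, DELTA = 1.0, 0.5, -0.3, 0.8  # assumed true parameters

# Outcomes grouped by (TREAT, POST)
cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
for _ in range(20_000):
    treat = rng.randint(0, 1)
    post = rng.randint(0, 1)
    y = ALPHA + BETA * treat + GAMMA * post + DELTA * treat * post + rng.gauss(0, 1)
    cells[(treat, post)].append(y)

m = {key: sum(vals) / len(vals) for key, vals in cells.items()}

# (treated: post minus pre) minus (control: post minus pre)
did = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
# Comparing the groups only after the shock also picks up the level gap BETA
naive_post = m[(1, 1)] - m[(0, 1)]

print(f"DiD={did:.3f} (true {DELTA})  post-only comparison={naive_post:.3f}")
```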
References
- Angrist, J. D. and J.-S. Pischke (2014). Mastering 'Metrics: The Path from Cause to Effect. Princeton University Press.
- He, X. (2019a). Econometrics Notes by Xindi He.
- Mian, A. and A. Sufi (2009). The consequences of mortgage credit expansion: Evidence from the US mortgage default crisis. The Quarterly Journal of Economics 124(4), 1449–1496.
- Modigliani, F. and M. H. Miller (1958). The cost of capital, corporation finance and the theory of investment. The American Economic Review 48(3), 261–297.
- Richardson, G. and W. Troost (2009). Monetary intervention mitigated banking panics during the Great Depression: Quasi-experimental evidence from a Federal Reserve district border, 1929–1933. Journal of Political Economy 117(6), 1031–1073.
- Shue, K. and R. R. Townsend (2017). How do quasi-random option grants affect CEO risk-taking? The Journal of Finance 72(6), 2551–2588.