16. Which Method?

Note

Overview. This chapter (Cochrane Ch. 16, the capstone of Part II) is almost entirely methodological reflection, with hardly any formulas or figures. The core stance: in the classic setup (linear factors, i.i.d. normal returns) the GMM/discount factor approach is neither more efficient nor simpler, and regression methods are hard to beat there. GMM's real value is its ability to handle nonlinear models and models rich in conditioning information transparently, and to work around inevitable model misspecification, keeping the econometrics focused on interesting issues. Empirical asset pricing faces an enduring tension between two philosophies: statistical efficiency on one side, and robustness to (economic and statistical) specification error together with the economic interpretability of results on the other. The author argues repeatedly that in many practical cases a first-stage or otherwise simple GMM estimate that focuses on economically interpretable moments is efficient enough, more robust, and ultimately more persuasive; a full maximum-likelihood estimate is not mandatory.

Important

"ML vs. GMM" is a false dichotomy. Framing the debate as "maximum likelihood vs. GMM" is a bad way to put it. ML is a special case of GMM: it suggests a particular set of moments that are statistically optimal in a well-defined sense. It is all GMM; the real issue is the choice of moments: moments selected by an auxiliary statistical model (even if economically uninterpretable) vs. moments selected for their economic or data-summary interpretation (even if not statistically efficient). Nor is there any such thing as "the" GMM estimate: GMM is a flexible tool, and you are free to choose the sample moments \(g_T\) and the matrix \(a_T\) that selects which linear combinations of them are set to zero. Both ML and GMM are tools a thoughtful researcher uses to learn what the data say, not stone tablets whose directions lead to truth if followed literally; followed thoughtlessly, both give horrendous results. (Nor must GMM be paired with the discount factor and ML with expected return-beta models: one can do ML on a discount factor model, or GMM on a beta model.)

ML Is Often Ignored

ML plus i.i.d. normal disturbances leads to easily interpretable time-series/cross-sectional regressions close to the model's economic content. But returns are not normal or i.i.d.: fatter tails, heteroskedastic (high- and low-volatility periods), autocorrelated, predictable from many variables. Taking the ML quest for efficiency seriously, one should model these features, yielding a different likelihood whose scores prescribe moment conditions different from the familiar regressions. Interestingly, few do this — ML is fine when it suggests interpretable regressions, and when it suggests otherwise people run the regressions anyway. For example: ML prescribes estimating β without a constant, yet β's are almost universally estimated with one; ML prescribes a GLS cross-sectional regression, yet many use OLS (distrusting the GLS weighting matrix); true ML requires iterating among betas, the covariance matrix, and the cross-sectional regression, yet applications use the unconstrained estimates of each. The regressions came first, the ML formalization later — if we had to assume returns were gamma-distributed to justify the regressions, we'd surely make that the ML "assumption" instead of i.i.d. normal! This shows researchers do not really believe their null hypotheses (statistical and economic) are exactly correct; they want estimates robust to reasonable misspecification, interpretable, and close to the model's economic concepts — persuasive because the reader can see they are robust. ML was not designed for robustness or interpretability; its selling point is efficiency: it does the "right" efficient thing if the model is true, not the "reasonable" thing for "approximate" models.

OLS vs. GLS Cross-Sectional Regressions

Important

OLS vs. GLS: where the trade-off crystallizes. GLS and second-stage GMM gain their asymptotic efficiency once the covariance and spectral density matrices have converged to their population values; they use these matrices to find well-measured portfolios (small residual variance for GLS, small variance of discounted return for GMM). The danger is that in finite samples these matrices are poorly estimated, so sample minimum-variance portfolios may bear little relation to population ones. For a perfect model this is not a big problem. But a model that is very good yet imperfect may price a basic set of portfolios well while pricing strange long-short combinations badly (positions that are really outside the payoff space once transactions costs, margin requirements, and short-sale constraints are taken into account). So the real danger is the interaction of spurious sample minimum-variance portfolios with the model's specification errors. Interestingly, Kandel-Stambaugh (1995) and Roll-Ross (1995) argue for GLS, also on misspecification grounds: so long as there is any misspecification (pricing errors not exactly zero, the market proxy not exactly on the frontier), there exist portfolios that produce arbitrarily good or arbitrarily bad expected-return-beta plots (going long positive-α and short negative-α securities produces a huge α).
GLS's key feature is invariance to repackaging of portfolios. If the test assets \(R^e\) are repackaged as \(AR^e\) for a nonsingular matrix \(A\), the OLS estimate \(\hat\lambda=(\beta'\beta)^{-1}\beta'E(R^e)\) changes with \(A\), whereas the GLS estimate \(\hat\lambda=(\beta'\Sigma^{-1}\beta)^{-1}\beta'\Sigma^{-1}E(R^e)\) does not; the HJ second-moment weighting matrix shares this property. But this is only a fact, not proof that OLS picks good or bad portfolios: if you don't think the portfolios GLS chooses are informative, you can use OLS precisely to focus on economically interesting portfolios. The choice depends subtly on the purpose of the test. To prove the model wrong, GLS focuses on the most informative portfolios (as an efficient test should). But many models are wrong yet "pretty darn good," and it is a shame to discard the information that the model prices an interesting set of portfolios well. The sensible compromise is to report the OLS estimate on "interesting" portfolios, and also report the GLS test statistic showing that the model is rejected, which is the typical collection of facts.
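
The repackaging invariance can be checked numerically. The following is a minimal sketch under assumed, simulated inputs: the betas, the pricing errors `alpha`, the residual covariance matrix `sigma`, and the repackaging matrix `A` are all made up for illustration and do not come from the chapter. It uses the fact that repackaged assets \(AR^e\) have betas \(A\beta\), mean excess returns \(AE(R^e)\), and residual covariance \(A\Sigma A'\), and compares the two estimates of \(\hat\lambda\) before and after repackaging.

```python
import numpy as np

def ols_lambda(beta, mean_excess):
    # OLS cross-sectional estimate: (beta' beta)^{-1} beta' E(R^e)
    return np.linalg.solve(beta.T @ beta, beta.T @ mean_excess)

def gls_lambda(beta, mean_excess, sigma):
    # GLS estimate: (beta' Sigma^{-1} beta)^{-1} beta' Sigma^{-1} E(R^e)
    si_beta = np.linalg.solve(sigma, beta)          # Sigma^{-1} beta
    return np.linalg.solve(si_beta.T @ beta, si_beta.T @ mean_excess)

rng = np.random.default_rng(0)
n_assets = 6

# Hypothetical single-factor model with small nonzero pricing errors
beta = rng.normal(1.0, 0.5, size=(n_assets, 1))
alpha = rng.normal(0.0, 0.01, size=n_assets)
mean_excess = 0.06 * beta[:, 0] + alpha

# Hypothetical positive-definite residual covariance matrix
B = rng.normal(size=(n_assets, n_assets))
sigma = B @ B.T + 0.1 * np.eye(n_assets)

# Hypothetical repackaging matrix (a perturbation of the identity)
A = np.eye(n_assets) + 0.3 * rng.normal(size=(n_assets, n_assets))

lam_ols = ols_lambda(beta, mean_excess)
lam_ols_rep = ols_lambda(A @ beta, A @ mean_excess)
lam_gls = gls_lambda(beta, mean_excess, sigma)
lam_gls_rep = gls_lambda(A @ beta, A @ mean_excess, A @ sigma @ A.T)

print("OLS original / repackaged:", lam_ols, lam_ols_rep)
print("GLS original / repackaged:", lam_gls, lam_gls_rep)
```

In this sketch the two GLS estimates should agree up to rounding error, while the OLS estimates differ because the simulated pricing errors are nonzero. With pricing errors of exactly zero, both estimators would return the same \(\lambda\) for any nonsingular `A`.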

More Examples of Trading Efficiency for Robustness

  • Low-frequency time-series models. When estimating an AR(1) \(y_t=\rho y_{t-1}+\varepsilon_t\), ML minimizes the one-step-ahead forecast error variance \(E(\varepsilon_t^2)\). But any time-series model is only an approximation, and the researcher's objective may not be one-step forecasting. In studying long-term bond yields or long-horizon mean reversion, we care only about the sum of autocorrelations or moving-average coefficients (writing \(p_t=a(L)\varepsilon_t\), we want \(a(1)\)); the model that minimizes one-step error may differ greatly from the one that best matches long-run behavior.
  • Lucas's money demand. Regressing money on income in log-levels gives a sensible elasticity \(b\approx1\) but a strongly autocorrelated error; following standard advice, GLS (≈ first-differencing) passes Durbin-Watson but yields a much lower \(b\) that makes no economic sense and is unstable. Lucas realized that differencing threw out most of the information, which was in the trend, and focused on high-frequency noise; so the "inefficient" levels regression (with correlation-corrected standard errors) is the right one. ML/GLS did not know there was "noise" in the data, so they threw out the baby and kept the bathwater.
  • Stochastic singularities and calibration. Term-structure and RBC models generate many series from a few shocks, so they predict combinations of the series with no error term (stochastic singularity). Even with rich interesting implications, ML seizes on this economically uninteresting singularity, refuses to estimate parameters, and rejects any such model with a \(-\infty\) log-likelihood (e.g. the linear-quadratic permanent-income model predicts a deterministic relation between \(c\) and \(y\); affine term-structure models predict all yields are deterministic functions of K state variables — but actual N yields always require N shocks).
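
The levels-versus-differences point in the money-demand example can be illustrated with a small simulation. This is a stylized sketch, not Lucas's data or his exact mechanism: it assumes (hypothetically) that the regressor is a slow trend observed with high-frequency noise, that the true elasticity is 1, and that the regression error follows an AR(1) with coefficient 0.9. All parameter values are made up for illustration.

```python
import numpy as np

def slope(x, y):
    # OLS slope of y on x, with an intercept
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(1)
T = 2000

# Slow-moving "income": a random walk with drift (the information is in the trend)
trend = np.cumsum(0.02 + 0.01 * rng.standard_normal(T))
# Observed regressor: trend plus assumed high-frequency measurement noise
x_obs = trend + 0.05 * rng.standard_normal(T)

# Strongly autocorrelated regression error, AR(1) with coefficient 0.9
shocks = 0.05 * rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.9 * u[t - 1] + shocks[t]

y = 1.0 * trend + u   # true elasticity of 1 by construction

b_levels = slope(x_obs, y)                    # "inefficient" levels regression
b_diff = slope(np.diff(x_obs), np.diff(y))    # first-differenced regression

print("levels slope:     ", b_levels)
print("differenced slope:", b_diff)
```

Under these assumptions the levels regression recovers a slope near 1, while differencing removes the trend and leaves mostly noise in the regressor, so the differenced slope is pulled toward zero. The levels standard errors would still need a correction for the autocorrelated error.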

Addressing Model Misspecification

Important

GMM lets you evaluate misspecified models. ML's answer to misspecification is "specify the right model, then do ML": if errors are correlated, model the covariance matrix and do GLS; if you worry about proxy errors, transaction costs, time aggregation, nonnormality, or time-varying β, write them all down and then do ML. But this is ultimately infeasible. Economics studies quantitative parables, not fully specified models, and we cannot quantitatively describe all possible specification errors (if you knew how to model them, they wouldn't be there). Adding "measurement errors" to RBC models to break the stochastic singularity is an example: the assumed error structure then drives which moments ML attends to, and modeling the errors seriously takes us further from the economically interesting parts of the model (since this often ends up specifying sensible moments anyway, why not specify the sensible moments in the first place?).
GMM used judiciously lets us direct the statistical effort at the "interesting" predictions while ignoring the fact that the world does not match the "uninteresting" simplifications. ML offers only a choice between OLS (with wrong standard errors) and GLS (untrustworthy in small samples, or focused on uninteresting parts of the data), whereas GMM lets you keep an OLS estimate yet correct the standard errors for non-i.i.d. disturbances. More generally, it lets you specify a set of moments that is economically interesting, or that you think is robust to misspecification, without spelling out exactly what misspecification makes those moments "optimal." GMM also flexibly incorporates statistical misspecification into the distribution theory. For example, knowing that returns are not i.i.d. normal, one may use the time-series regression anyway: the estimate remains consistent, but the standard errors from the ML formulas do not, and GMM corrects them. RBC "calibration" is often just a GMM parameter estimate using economically sensible moments (output growth, consumption/output ratios) to avoid the stochastic singularity that would doom ML. "Judiciously" is an important qualifier: many GMM estimations suffer from a thoughtless choice of moments, assets, or instruments (industry portfolios have almost no spread in average returns; the seventh lag predicts little).
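
One way to read "keep the OLS estimate but correct the standard errors" is as GMM on the moment conditions \(E(x_t\varepsilon_t)=0\), with the spectral density matrix of \(x_t\varepsilon_t\) estimated from the data. The sketch below is an illustration under assumptions: the simulated AR(1) regressor and AR(1) error, the Bartlett (Newey-West) weights, and the lag length of 12 are choices made here, not prescriptions from the chapter, and the function name `ols_with_gmm_se` is hypothetical.

```python
import numpy as np

def ols_with_gmm_se(X, y, lags):
    """OLS point estimate with (a) GMM/HAC and (b) classical i.i.d. standard errors."""
    T = X.shape[0]
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    u = X * e[:, None]                      # moment contributions x_t * e_t
    # Spectral density at frequency zero with Bartlett (Newey-West) weights
    S = u.T @ u / T
    for j in range(1, lags + 1):
        gamma = u[j:].T @ u[:-j] / T        # j-th autocovariance of x_t * e_t
        S += (1 - j / (lags + 1)) * (gamma + gamma.T)
    d_inv = np.linalg.inv(X.T @ X / T)
    V_gmm = d_inv @ S @ d_inv / T           # sandwich covariance of b
    V_iid = np.linalg.inv(X.T @ X) * (e @ e / T)
    return b, np.sqrt(np.diag(V_gmm)), np.sqrt(np.diag(V_iid))

def ar1(rho, shocks):
    # Simple AR(1) recursion started at zero
    out = np.zeros_like(shocks)
    for t in range(1, len(shocks)):
        out[t] = rho * out[t - 1] + shocks[t]
    return out

rng = np.random.default_rng(2)
T = 2000
x = ar1(0.8, rng.standard_normal(T))        # persistent regressor
e = ar1(0.8, rng.standard_normal(T))        # persistent (non-i.i.d.) error
y = 0.5 + 1.0 * x + e
X = np.column_stack([np.ones(T), x])

b, se_gmm, se_iid = ols_with_gmm_se(X, y, lags=12)
print("OLS estimates:       ", b)
print("GMM/HAC std. errors: ", se_gmm)
print("i.i.d. std. errors:  ", se_iid)
```

The point estimate is the same in both cases; only the standard errors change. With a persistent regressor and a persistent error, as simulated here, the i.i.d. formula understates the sampling uncertainty of the slope.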

Auxiliary Model and Finite-Sample Distributions

ML requires an auxiliary statistical model (e.g. returns and factors jointly i.i.d. normal); the more complex and realistic it becomes, the more effort goes into estimating it, and ML has no way of knowing which parameters (\(a,b;\beta,\lambda\), risk aversion \(\gamma\)) are more "important" than others (those describing time-varying conditional moments). GMM's convenience is that it needs no such auxiliary model — going straight from \(p=\mathbb E(mx)\) to moment conditions, estimates, and distribution theory. As for "GRS has a finite-sample distribution, distrust GMM's asymptotics," this has little force: the finite-sample distribution only holds if returns really are i.i.d. normal and the factor is perfectly measured; since these fail, a finite-sample distribution ignoring non-i.i.d. returns is not obviously better than an asymptotic one that corrects for them. And once you've picked an estimation method, finding its finite-sample distribution given an auxiliary model is simple — just run a Monte Carlo or bootstrap. When "nonparametric" autocorrelation/heteroskedasticity corrections are large, or the number of moments is large relative to the sample, or the chosen moments are very inefficient, GMM's asymptotic distribution can approximate the finite sample poorly; then don't resist using a parametric time-series model to estimate the terms of the GMM distribution theory too (e.g. compute \(\sum\rho^j\) from an AR(1) rather than a long sum of autocorrelations).
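
The last two suggestions (simulate the finite-sample distribution under an auxiliary model, and use a parametric AR(1) to fill in the long-run variance) can be combined in a short Monte Carlo. This is a sketch under assumed values: the AR(1) coefficient of 0.5, the sample size of 100, and the 2000 replications are arbitrary choices for illustration. For an AR(1), the sum of autocorrelations over all leads and lags is \(\sum_{j=-\infty}^{\infty}\rho^{|j|}=(1+\rho)/(1-\rho)\), so the variance of the sample mean is approximately \(\sigma_y^2(1+\rho)/[(1-\rho)T]\).

```python
import numpy as np

rng = np.random.default_rng(3)
n_sims, T, rho = 2000, 100, 0.5     # assumed auxiliary model: AR(1), true mean zero

# Simulate n_sims independent AR(1) samples, started from the stationary distribution
shocks = rng.standard_normal((n_sims, T))
y = np.empty((n_sims, T))
y[:, 0] = shocks[:, 0] / np.sqrt(1 - rho**2)
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + shocks[:, t]

mean = y.mean(axis=1)
dev = y - mean[:, None]
var = (dev**2).mean(axis=1)

# Estimated first-order autocorrelation in each sample (clipped for safety)
rho_hat = (dev[:, 1:] * dev[:, :-1]).mean(axis=1) / var
rho_hat = np.clip(rho_hat, -0.99, 0.99)

se_iid = np.sqrt(var / T)                                   # ignores autocorrelation
se_ar1 = se_iid * np.sqrt((1 + rho_hat) / (1 - rho_hat))    # parametric AR(1) correction

# Finite-sample rejection rates of a nominal 5% test of the (true) null mean = 0
naive_rate = np.mean(np.abs(mean / se_iid) > 1.96)
ar1_rate = np.mean(np.abs(mean / se_ar1) > 1.96)
print("rejection rate, i.i.d. standard errors:", naive_rate)
print("rejection rate, AR(1)-corrected:       ", ar1_rate)
```

The same loop works for any estimator and any auxiliary model: replace the simulated process and the statistic, and read the finite-sample distribution off the simulated draws.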

The Case for ML

In the classic setup, ML's efficiency gain over GMM on the pricing errors is tiny, but in some cases statistically motivated moments have important efficiency advantages. For example, in Jacquier-Polson-Rossi's (1994) stochastic-volatility model \(dS/S=\mu dt+V dZ_1,\ dV=\mu_V dt+\sigma dZ_2\) (\(S\) observed, \(V\) not), the intuitive moments (autocorrelations of squared or absolute returns) are far less efficient than the ML scores, presuming, of course, that this model (equation 16.1 in Cochrane) is exactly true. Even in the canonical OLS-vs-GLS case, a wildly heteroskedastic error covariance matrix can make OLS spend all its effort fitting unimportant data points, so a "judicious" application of GMM (OLS) needs at least a transformation of units to avoid being wildly inefficient.

Statistical Philosophy

Important

"It takes a model to beat a model, not a rejection." Empirical work that has been persuasive, in the sense that it changed people's understanding of the data and of which models explain it, looks very different from the statistical theory preached in textbooks. The CAPM was taught, believed, and used for years after it was formally rejected. It fell not because of a rejection but because the multifactor models offered a coherent alternative worldview (and the multifactor models are also rejected!). Even in evaluating a single model, the most interesting calculations come from examining specific alternatives rather than overall pricing-error tests: the CAPM fell when size and book/market were found to enter cross-sectional regressions, not when a generic pricing-error test rejected it. Influential empirical work tells a story; the most efficient procedure does not convince if the reader cannot see what stylized facts drive the result. A test focused on a model's ability to explain the cross-section of average returns of interesting portfolios is ultimately more persuasive than one focused on its ability to explain the fifth moment of the second portfolio, even if ML finds the latter far more informative.
Fama-French (1988b, 1993) are exemplars: long-horizon predictability is on the edge of significance and the multifactor model is rejected by GRS, but they made clear, with tables of average returns and betas, what robust facts drive the results. Statistical theory has no place for such a table, yet it is far more persuasive than a table of χ² values. Changing a t-statistic from 1.5 to 2.5 has never substantially changed how people think about an issue. Statistical testing is one of many questions in evaluating theories, and usually not the most important one; this is a positive description of how the profession moves from theory to theory, not a normative claim. The "rejectable hypotheses" philosophy of Popper, classical statistical decision theory, and Friedman contains an inconsistency: it says to let the data decide, yet writers on methodology do not look at how actual theories evolved. Kuhn (1970) and McCloskey (1983, 1998) found that the actual process has little to do with formal methodology: the largest t-statistic did not win. A researcher who wants ideas to be convincing as well as right would do well to study how ideas have convinced people in the past.

Summary

The bottom line is simple: it is OK to do a first-stage or simple GMM estimate rather than an explicit maximum-likelihood estimate and test. Many people (and, unfortunately, many referees) think nothing less than full ML is acceptable. The chapter goes on at length to counter that impression, arguing that in many cases of practical importance a first-stage GMM estimate focusing on economically interpretable moments can be efficient enough, robust to model misspecification, and ultimately more persuasive. The substance of the debate is not ML vs. GMM (ML is a special case of GMM) but the choice of moments: statistical efficiency vs. economic interpretability and robustness. Given that our data are nonexperimental, our theories are quantitative parables, and the same data have been "fished" by countless researchers, it makes good sense that the profession settles on new theories by "making clear what robust facts drive the results" rather than by the largest t-statistic. The next part (Part III) turns to bonds and options, also in a discount factor framework.