10. Strategic Analysis
Overview. In previous chapters agents decided in isolation; this chapter brings in strategy, where each agent's decision depends on what the others do. We first consider preferences under uncertainty: §10.1 derives the VNM expected-utility representation from six axioms (G1–G6) and defines risk aversion, the Arrow–Pratt coefficients, the insurance premium, and certainty equivalence. §10.2 covers complete-information strategic-form games (pure- and mixed-strategy Nash equilibrium, VNM utility). §10.3 covers incomplete information (Bayesian games, Bayesian Nash equilibrium, equivalence of behavioral and mixed strategies). §10.4 covers extensive-form games (perfect information, subgames, subgame-perfect Nash equilibrium, systems of beliefs, sequential rationality, consistency, and sequential equilibrium). The chapter is long and is developed section by section.
10.1 Preferences over gambles
10.1.1 Set of notations
Definitions 10.1–10.4 (gambles)
Def 10.1 (Final outcome): \(A=\{a_1,a_2,\dots,a_n\}\) is a finite set of final outcomes (footnote 10.1: "outcome" is a very inclusive notion; it can be a market outcome, a bundle of goods, etc.). Once some \(a_i\) is realized there is no further uncertainty.
Def 10.2 (Simple gamble): a simple gamble over \(A\) assigns a probability \(p_i\) to each \(a_i\), with \(p_1,\dots,p_n\ge0\) and \(\sum p_i=1\). It is denoted \((p_1\circ a_1,\,p_2\circ a_2,\,\dots,\,p_n\circ a_n)\); if \(p_3=\dots=p_n=0\) we write \((p_1\circ a_1,\,p_2\circ a_2)\).
Def 10.3 (Set of simple gambles): \(\mathcal{G}_s=\left\{(p_1\circ a_1,\dots,p_n\circ a_n):p_1,\dots,p_n\ge0,\sum_{i=1}^n p_i=1\right\}\).
Def 10.4 (Compound gamble): let \(\mathcal{G}\) denote the set of all compound gambles. A compound gamble of order \(n\) takes at most \(n\) rounds to resolve all uncertainty and deliver a final outcome. Set \(\mathcal{G}^1\equiv\mathcal{G}_s\) and, for \(n\in\mathbb{N}_{++}\), define by induction \(\mathcal{G}^{n+1}=\left\{g=(\alpha_1\circ g^1,\dots,\alpha_k\circ g^k):\alpha_1,\dots,\alpha_k\ge0,\sum_{i=1}^k\alpha_i=1,\ g^1,\dots,g^k\in\mathcal{G}^n\right\}\), so that \(\mathcal{G}=\bigcup_{n=1}^\infty\mathcal{G}^n\).
10.1.2 Axioms
We impose six axioms on the decision maker's binary relation \(\succsim\) on \(\mathcal{G}\).
Axioms G1–G6
G1 (Completeness): \(\forall g,g'\in\mathcal{G}\), \(g\succsim g'\) or \(g'\succsim g\).
G2 (Transitivity): \(\forall g,g',g''\in\mathcal{G}\), \(g\succsim g'\) and \(g'\succsim g''\Rightarrow g\succsim g''\). (A relation satisfying G1 and G2 is a preference relation.)
WLOG rank the outcomes \(a_1\succsim a_2\succsim\dots\succsim a_n\) and impose \(a_1\succ a_n\) (otherwise the problem is trivial). The degenerate gamble \((1\circ a_i)\) is written \(a_i\), so \(a_1\) is the best outcome and \(a_n\) the worst.
G3 (Continuity): \(\forall g\in\mathcal{G}\), \(\exists\alpha\in[0,1]\) s.t. \(g\sim(\alpha\circ a_1,\,(1-\alpha)\circ a_n)\).
G4 (Monotonicity): \(\forall\alpha,\beta\in[0,1]\), \((\alpha\circ a_1,(1-\alpha)\circ a_n)\succsim(\beta\circ a_1,(1-\beta)\circ a_n)\Leftrightarrow\alpha\ge\beta\) (hence \(\sim\Leftrightarrow\alpha=\beta\) and \(\succ\Leftrightarrow\alpha>\beta\)).
G5 (Substitution): \(\forall g=(p_1\circ g^1,\dots,p_k\circ g^k)\), \(h=(p_1\circ h^1,\dots,p_k\circ h^k)\in\mathcal{G}\), if \(g^i\sim h^i\) (\(\forall i\)) then \(g\sim h\).
G6 (Reduction to simple gamble): \(\forall g\in\mathcal{G}\), the decision maker is indifferent between \(g\) and the simple gamble it induces.
Definition 10.5 (Utility representation) & Proposition 10.1
Def 10.5: \(u:\mathcal{G}\to\mathbb{R}\) represents \(\succsim\) iff \(\forall g,g'\in\mathcal{G}\), \(g\succsim g'\Leftrightarrow u(g)\ge u(g')\).
Prop 10.1: \(\forall g\in\mathcal{G}\), there is a unique distribution \(p=(p_1,\dots,p_n)\) (\(p_i\ge0\), \(\sum p_i=1\)) over final outcomes such that \(p_i\) is the probability that \(a_i\) is the final outcome under \(g\).
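Proof (Proposition 10.1) With \(m_i\) the weight on the \(i\)-th component gamble and \(p_{i,j}\) the probability that component assigns to \(a_j\) (so that \(p_j=\sum_i m_i p_{i,j}\)), the induced probabilities sum to one: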
$$\sum_{j=1}^n p_j=\sum_{j=1}^n\sum_{i=1}^n m_i p_{i,j}=\sum_{i=1}^n\sum_{j=1}^n m_i p_{i,j}=\sum_{i=1}^n m_i\left(\sum_{j=1}^n p_{i,j}\right)=\sum_{i=1}^n m_i=1$$
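The reduction behind Prop 10.1 and G6 can be sketched in a few lines. This is only an illustration: the nested-list encoding of gambles and the name `reduce_gamble` are assumptions of the sketch, not notation from the text.

```python
from fractions import Fraction

def reduce_gamble(gamble):
    """Return the induced simple gamble as {outcome: probability}.

    A gamble is either a final outcome (any non-list value) or a list of
    (probability, sub_gamble) pairs. This encoding is hypothetical.
    """
    if not isinstance(gamble, list):
        return {gamble: Fraction(1)}  # degenerate gamble (1 o a_i)
    induced = {}
    for prob, sub in gamble:
        # Multiply the branch probability through the reduced sub-gamble.
        for outcome, q in reduce_gamble(sub).items():
            induced[outcome] = induced.get(outcome, Fraction(0)) + Fraction(prob) * q
    return induced

half, third = Fraction(1, 2), Fraction(1, 3)
g1 = [(half, "a1"), (half, "a2")]          # simple gamble
g2 = [(third, "a2"), (1 - third, "a3")]    # simple gamble
compound = [(half, g1), (half, g2)]        # order-2 compound gamble
simple = reduce_gamble(compound)
```

Here `simple` assigns probabilities 1/4, 5/12 and 1/3 to `a1`, `a2`, `a3`, which sum to one as the proof requires.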
Proposition 10.2 (Independence Axiom, from G5 & G6) By Prop 10.1, every \(g\in\mathcal{G}\) induces a unique simple gamble \((p_1\circ a_1,\dots,p_n\circ a_n)\). Suppose \(\succsim\) satisfies G5 and G6. If \((p_1\circ a_1,\dots,p_n\circ a_n)\sim(q_1\circ a_1,\dots,q_n\circ a_n)\), then for every \(\alpha\in[0,1]\) and every simple gamble \((r_1\circ a_1,\dots,r_n\circ a_n)\), \(((\alpha p_1+(1-\alpha)r_1)\circ a_1,\dots,(\alpha p_n+(1-\alpha)r_n)\circ a_n)\sim((\alpha q_1+(1-\alpha)r_1)\circ a_1,\dots,(\alpha q_n+(1-\alpha)r_n)\circ a_n)\). That is, when each of two gambles is mixed with a third gamble in the same way, the individual's ranking of the two mixtures does not depend on which third gamble is used.
Proof (Proposition 10.2)
10.1.3 Expected utility property
Definition 10.6 (Expected utility property) \(u:\mathcal{G}\to\mathbb{R}\) has the expected utility property if \(\forall(p_1\circ a_1,\dots,p_n\circ a_n)\in\mathcal{G}_s\), \(u((p_1\circ a_1,\dots,p_n\circ a_n))=\sum_{i=1}^n p_i u(a_i)\). The property is very useful: together with G6, it means that knowing \(u(a_1),\dots,u(a_n)\) is enough to know the preferences over all gambles. Not every utility representing \(\succsim\) has it, however: a strictly increasing transformation such as \(v=f(u)=u^3\) still represents \(\succsim\) but in general loses the property.
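A small numerical check of the last point, as a sketch only. The utility values and the gamble below are hypothetical numbers chosen for illustration.

```python
def expected_utility(gamble, u):
    """Sum of p_i * u(a_i) over a simple gamble given as {outcome: probability}."""
    return sum(p * u[a] for a, p in gamble.items())

u = {"a1": 1.0, "a2": 0.6, "a3": 0.0}   # hypothetical values u(a_i)
g = {"a1": 0.5, "a3": 0.5}              # the simple gamble (0.5 o a1, 0.5 o a3)

u_g = expected_utility(g, u)            # u(g) under the expected utility property
v = {a: x ** 3 for a, x in u.items()}   # v = u^3 evaluated on final outcomes
v_g = u_g ** 3                          # v(g) = u(g)^3
v_eu = expected_utility(g, v)           # sum of p_i * v(a_i)
```

With these numbers `v_g` is 0.125 while `v_eu` is 0.5, so `v` ranks `g` below `a2` exactly as `u` does yet fails the expected utility property.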
Theorem 10.1 (VNM utility) If \(\succsim\) satisfies G1–G6, the agent is called Von Neumann–Morgenstern (VNM) rational, and there exists \(u:\mathcal{G}\to\mathbb{R}\) that represents \(\succsim\) and has the expected utility property. Such a \(u\) is called a VNM utility.
Proof (Theorem 10.1)
Theorem 10.2 & Theorem 10.3 (affine uniqueness)
Thm 10.2: if \(u,v\) both represent \(\succsim\) and have the expected utility property, then \(v\) is a positive affine transformation of \(u\): \(\exists\alpha\in\mathbb{R},\beta>0\) s.t. \(v(a_i)=\beta u(a_i)+\alpha\) (\(\forall i\)), hence \(\forall g\in\mathcal{G}\), \(v(g)=\beta u(g)+\alpha\).
Thm 10.3: if \(u\) represents \(\succsim\) and has the expected utility property, and \(\exists\alpha\in\mathbb{R},\beta>0\) s.t. \(v(g)=\beta u(g)+\alpha\) (\(\forall g\)), then \(v\) also represents \(\succsim\) and has the expected utility property.
Proof (Theorems 10.2 & 10.3)
$$u(a_i)=(1-\alpha_i)u(a_1)+\alpha_i u(a_n)\tag{10.1}$$
$$a_i\sim((1-\alpha_i)\circ a_1,\,\alpha_i\circ a_n)\tag{10.2}$$
$$ \begin{aligned} \alpha_i=\frac{u(a_1)-u(a_i)}{u(a_1)-u(a_n)}&=\frac{v(a_1)-v(a_i)}{v(a_1)-v(a_n)}\\ \Rightarrow v(a_i)&=\underbrace{\frac{v(a_1)-v(a_n)}{u(a_1)-u(a_n)}}_{\beta>0}u(a_i)+\underbrace{\left(v(a_1)-\frac{v(a_1)-v(a_n)}{u(a_1)-u(a_n)}u(a_1)\right)}_{\alpha} \end{aligned} $$
Thm 10.3: \(\beta>0\) so \(v\) is a strictly increasing transformation of \(u\), hence represents \(\succsim\). For \(\forall g=(p_1\circ a_1,\dots)\in\mathcal{G}_s\), \(v(g)=\beta u(g)+\alpha=\beta\sum p_i u(a_i)+\alpha=\sum p_i(\beta u(a_i)+\alpha)=\sum p_i v(a_i)\) (using \(\sum p_i=1\), \(v(a_i)=\beta u(a_i)+\alpha\)), so \(v\) has the expected utility property. \(\blacksquare\)
Remark 10.1 Even though utility is ordinal (it only ranks gambles and its level has no meaning of its own), the \(\alpha_i\) in (10.1) does have an economic meaning: it gives the weights on the best outcome \(a_1\) and the worst outcome \(a_n\) in the mixture that the agent finds indifferent to \(a_i\), as in (10.2): \(a_i\sim((1-\alpha_i)\circ a_1,\alpha_i\circ a_n)\).
10.1.4 Risk aversion
Extend the domain of \(u\) to include \(\bar g=(1\circ\sum_{i=1}^n p_i a_i)\) (\(p_i\ge0\), \(\sum p_i=1\)).
Definition 10.7 (Risk averse) & Proposition 10.3
Def 10.7: suppose \(\succsim\) satisfies G1–G6. For the simple gamble \(g_s=(p_1\circ a_1,\dots,p_n\circ a_n)\) induced by \(g\), define \(\mathbb{E}[g]\equiv\bar g=(1\circ\sum p_i a_i)\). The agent is risk averse if his utility \(u\) with the expected utility property satisfies, for nondegenerate gambles, \(u(\mathbb{E}[g])=u(\sum p_i a_i)>\sum p_i u(a_i)=\mathbb{E}[u(g)]\); risk neutral if \(u(\mathbb{E}[g])=\mathbb{E}[u(g)]\); risk loving if \(u(\mathbb{E}[g])<\mathbb{E}[u(g)]\).
Prop 10.3: the agent is risk averse iff \(u\) is concave (strictly concave, given the strict inequality in Def 10.7).
Proof (Proposition 10.3)
10.1.5 Coefficients of risk aversion
Definitions 10.8–10.9 (risk-aversion coefficients)
Def 10.8 (Arrow–Pratt Coefficient of Absolute Risk Aversion): measures the curvature of \(u\) around \(x\), \(ra(x)=-\dfrac{u''(x)}{u'(x)}\); a higher coefficient means more curvature and more risk aversion.
Def 10.9 (Coefficient of Relative Risk Aversion): \(rra(x)=ra(x)\cdot x=-\dfrac{u''(x)}{u'(x)}x\). Risk tolerance is the reciprocal of the absolute (or relative) risk aversion coefficient.
Example 10.1 (specific \(ra(x)\), \(rra(x)\)) The coefficients for linear, log, CRRA and CARA utility are computed below. The CRRA utility \(u(x)=\tfrac{x^{1-\gamma}-1}{1-\gamma}\) generalizes \(\ln x\): as \(\gamma\to1\) it converges to \(\ln x\) by L'Hôpital's rule.
$$ \begin{aligned} \text{Linear: }&u(x)=ax+b\Rightarrow ra(x)=-\tfrac{0}{a}=0\Rightarrow rra(x)=0\\ \text{Log: }&u(x)=\ln x\Rightarrow ra(x)=-\tfrac{-1/x^2}{1/x}=\tfrac{1}{x}\Rightarrow rra(x)=1\\ \text{CRRA: }&u(x)=\tfrac{x^{1-\gamma}-1}{1-\gamma}\Rightarrow ra(x)=-\tfrac{-\gamma x^{-\gamma-1}}{x^{-\gamma}}=\tfrac{\gamma}{x}\Rightarrow rra(x)=\gamma\\ \text{CARA: }&u(x)=-\tfrac{1}{a}e^{-ax}\Rightarrow ra(x)=-\tfrac{-ae^{-ax}}{e^{-ax}}=a\Rightarrow rra(x)=ax \end{aligned} $$
$$\lim_{\gamma\to1}\frac{x^{1-\gamma}-1}{1-\gamma}\overset{\text{L'Hôpital}}{=}\lim_{\gamma\to1}\frac{-x^{1-\gamma}\ln x}{-1}=\ln x$$
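The closed forms above can be checked numerically. The following is a rough sketch using central finite differences; the step size `h`, the evaluation point and the parameter values `gamma` and `a` are arbitrary choices for illustration.

```python
import math

def ra(u, x, h=1e-4):
    """Approximate -u''(x)/u'(x) with central finite differences."""
    d1 = (u(x + h) - u(x - h)) / (2 * h)
    d2 = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
    return -d2 / d1

def rra(u, x, h=1e-4):
    """Relative risk aversion: ra(x) * x."""
    return ra(u, x, h) * x

gamma, a = 2.0, 0.5   # hypothetical parameter values

def crra(x):
    return (x ** (1 - gamma) - 1) / (1 - gamma)

def cara(x):
    return -math.exp(-a * x) / a

x0 = 3.0
rra_crra = rra(crra, x0)      # should be close to gamma
ra_cara = ra(cara, x0)        # should be close to a
rra_log = rra(math.log, x0)   # should be close to 1
```

Because the derivatives are approximated, the results match the formulas only up to a small numerical error.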
10.1.6 (Absolute) Insurance premium: p
Definition 10.10 & Proposition 10.4 (absolute risk premium)
Def 10.10: the absolute insurance premium \(p\) is defined by \(u(\mathbb{E}[\tilde x]-p)=\mathbb{E}[u(\tilde x)]\) (10.4); in the gamble set-up, \(u(\sum p_i a_i-p)=\sum p_i u(a_i)\).
Prop 10.4: if the risk is small, \(p\approx\dfrac{1}{2}\left(-\dfrac{u''(\bar x)}{u'(\bar x)}\right)\sigma^2\) (10.5), where \(\bar x=\mathbb{E}[\tilde x]\) and \(\sigma^2=\text{Var}(\tilde x)\).
Proof (Proposition 10.4)
$$ \begin{aligned} u(\tilde x)&\approx u(\bar x)+u'(\bar x)(\tilde x-\bar x)+\tfrac{u''(\bar x)}{2}(\tilde x-\bar x)^2\\ \Rightarrow\mathbb{E}[u(\tilde x)]&\approx u(\bar x)+u'(\bar x)\mathbb{E}[\tilde x-\bar x]+\tfrac{u''(\bar x)}{2}\mathbb{E}[(\tilde x-\bar x)^2]\\ &\approx u(\bar x)+\tfrac{u''(\bar x)}{2}\sigma^2 \end{aligned} $$
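One standard way to finish the argument, sketched here under the assumption that \(p\) is small enough for a first-order expansion of the left-hand side of (10.4) around \(\bar x\): equate that expansion with the second-order expansion of \(\mathbb{E}[u(\tilde x)]\) above and solve for \(p\).

$$ \begin{aligned} u(\bar x-p)&\approx u(\bar x)-u'(\bar x)\,p\\ \Rightarrow u(\bar x)-u'(\bar x)\,p&\approx u(\bar x)+\tfrac{u''(\bar x)}{2}\sigma^2\\ \Rightarrow p&\approx\tfrac12\left(-\tfrac{u''(\bar x)}{u'(\bar x)}\right)\sigma^2 \end{aligned} $$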
Definition 10.11 (Certainty equivalence) The certainty equivalence of \(\tilde x\) is the sure amount \(c_e(\tilde x)\) defined by \(u(c_e(\tilde x))=\mathbb{E}[u(\tilde x)]\); comparing with (10.4), \(c_e(\tilde x)=\bar x-p\). In the gamble set-up, \(u(c_e)=\sum p_i u(a_i)\), i.e. \(c_e((p_1\circ a_1,\dots,p_n\circ a_n))=\sum p_i a_i-p\).
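To see how close the small-risk approximation (10.5) is, here is a minimal sketch for log utility. The two-point risk (90 or 110 with equal probability) is a made-up example, and `premium_and_ce` is a hypothetical helper name.

```python
import math

def premium_and_ce(outcomes, probs, u, u_inv):
    """Exact premium p and certainty equivalent c_e from u(c_e) = E[u(x)]."""
    mean = sum(q * x for x, q in zip(outcomes, probs))
    eu = sum(q * u(x) for x, q in zip(outcomes, probs))
    ce = u_inv(eu)
    return mean - ce, ce

outcomes, probs = [90.0, 110.0], [0.5, 0.5]   # hypothetical small risk
mean = sum(q * x for x, q in zip(outcomes, probs))
var = sum(q * (x - mean) ** 2 for x, q in zip(outcomes, probs))

p_exact, ce = premium_and_ce(outcomes, probs, math.log, math.exp)
p_approx = 0.5 * (1.0 / mean) * var           # ra(x) = 1/x for log utility
```

For this risk the approximation gives 0.5, and the exact premium differs from it by less than 0.01.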
10.2 Strategic form game with complete information
The basic idea of strategic behavior is that your best choice may depend on what others choose, and vice versa. We study rational behavior in such a strategic set-up, with the aim of predicting what rational people will do.
10.2.1 Strategic form games
First, the simple scenario: you and your opponents choose actions simultaneously, and once actions are chosen, the story ends and all agents get payoffs.
Definition 10.12 (Strategic form games) A strategic form game is a tuple \(G=(S_i,u_i)_{i=1}^N\), where \(\{1,\dots,N\}\) is the set of players, \(S_i\) is player \(i\)'s non-empty strategy set, and \(u_i:S\equiv\prod_{i=1}^N S_i\to\mathbb{R}\) is \(i\)'s payoff function. All players choose their strategies simultaneously; if the strategy profile is \(s=(s_1,\dots,s_N)\in S\) (\(s_i\in S_i\)), then player \(i\)'s payoff is \(u_i(s_1,\dots,s_N)\).
Example 10.2 (Strictly dominant strategy) \(N=2\), \(S_1=\{U,M,D\}\), \(S_2=\{L,R\}\), with the payoff matrix below (typical element \((u_1(s_1,s_2),u_2(s_1,s_2))\)). Player 1 always plays \(U\): whatever player 2 chooses, \(U\) gives a strictly higher payoff than both \(M\) and \(D\). Given that, player 2 plays \(L\) (payoff \(0\) rather than \(-4\)). The rational choice is \(s_1=U,s_2=L\), with payoffs \((3,0)\).
| \(1\backslash2\) | \(L\) | \(R\) |
|---|---|---|
| \(U\) | $(3,0)$ | $(0,-4)$ |
| \(M\) | $(1,-1)$ | $(-2,2)$ |
| \(D\) | $(2,4)$ | $(-1,8)$ |
Example 10.3 (No overall dominance but partial dominance) \(N=2\), \(S_1=\{U,C,D\}\), \(S_2=\{L,M,R\}\), with the payoff matrix below. Neither player has a strategy that is strictly better than all of his other strategies. However, player 1's \(D\) is strictly better than \(C\) regardless of what player 2 does, so player 1 never plays \(C\); likewise player 2's \(R\) is strictly better than \(M\) regardless, so player 2 never plays \(M\). In the reduced \(2\times2\) game (\(\{U,D\}\times\{L,R\}\)), \(U\) is strictly better than \(D\) for player 1, and then player 2 plays \(L\). The result is \(s_1=U,s_2=L\), with payoffs \((3,0)\).
| \(1\backslash2\) | \(L\) | \(M\) | \(R\) |
|---|---|---|---|
| \(U\) | $(3,0)$ | $(0,-5)$ | $(0,-4)$ |
| \(C\) | $(1,-1)$ | $(3,3)$ | $(-2,4)$ |
| \(D\) | $(2,4)$ | $(4,1)$ | $(-1,8)$ |
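The elimination argument of Example 10.3 can be mechanized. Below is a minimal sketch of iterated elimination of strictly dominated strategies for a two-player game; it only checks domination by other pure strategies, and the function name `iesds` and the dictionary encoding are assumptions of the sketch.

```python
def iesds(rows, cols, payoff):
    """Iteratively delete pure strategies strictly dominated by another pure one.

    payoff[(r, c)] = (u1, u2). Returns the surviving (rows, cols).
    """
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            # r is dominated if some other row beats it against every column.
            if any(all(payoff[(r2, c)][0] > payoff[(r, c)][0] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            # c is dominated if some other column beats it against every row.
            if any(all(payoff[(r, c2)][1] > payoff[(r, c)][1] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

ex_10_3 = {
    ("U", "L"): (3, 0),  ("U", "M"): (0, -5), ("U", "R"): (0, -4),
    ("C", "L"): (1, -1), ("C", "M"): (3, 3),  ("C", "R"): (-2, 4),
    ("D", "L"): (2, 4),  ("D", "M"): (4, 1),  ("D", "R"): (-1, 8),
}
survivors = iesds(["U", "C", "D"], ["L", "M", "R"], ex_10_3)
```

The order in which strategies are removed may differ from the verbal argument, but the surviving profile is the same, \((U,L)\).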
Remark 10.2 The result \((3,0)\) is Pareto inferior to some other outcomes, such as \((4,1)\). So the equilibrium notion used here differs from the general equilibrium studied earlier. In a general equilibrium model everyone is a price taker, no strategy is involved, and the outcome is Pareto efficient (First Welfare Theorem); once decisions are no longer made in isolation, the resulting equilibria need not be efficient.
10.2.2 Pure strategy Nash equilibrium
Definition 10.13 (Pure strategy Nash equilibrium) A strategy profile \(\hat s=(\hat s_1,\dots,\hat s_N)\) is a pure strategy Nash equilibrium of \(G=(S_i,u_i)_{i=1}^N\) iff \(\forall i\), \(u_i(\hat s)\ge u_i(s_i,\hat s_{-i})\) (\(\forall s_i\in S_i\)), where \(\hat s_{-i}=(\hat s_1,\dots,\hat s_{i-1},\hat s_{i+1},\dots,\hat s_N)\). Intuition: if you are confident that the others play \(\hat s_{-i}\), you cannot do better than \(\hat s_i\), so you do not deviate; if this holds for every player, no one moves.
Example 10.4 (Nash equilibrium) \(N=2\), \(S_1=S_2=\{L,H\}\), with the payoff matrix below. Neither player has a dominant strategy and no strategy is strictly dominated, so dominance arguments give no prediction. However, at \((H,L)\) and at \((L,H)\) no player has an incentive to deviate, so both are pure strategy Nash equilibria.
| \(1\backslash2\) | \(L\) | \(H\) |
|---|---|---|
| \(L\) | $(0,0)$ | $(1,2)$ |
| \(H\) | $(2,1)$ | $(0,0)$ |
Example 10.5 (Being unpredictable is good) \(N=2\), \(S_1=S_2=\{L,R\}\), with the payoff matrix below (\(\delta\in(0,1)\)). This is a soccer penalty: \(K\) is the kicker, \(G\) the goalie, and \(L,R\) are the directions. A smaller \(\delta\) means a stronger right leg (even if the goalie reads the direction, a goal is still possible). At every pure strategy profile at least one player has an incentive to deviate, so there is no pure strategy Nash equilibrium. Neither player should be predictable; otherwise the penalty game would have a fixed rational outcome, which is not what we observe in practice. Takeaway: being unpredictable is sometimes good.
| \(K\backslash G\) | \(L\) | \(R\) |
|---|---|---|
| \(L\) | $(-1,1)$ | $(1,-1)$ |
| \(R\) | $(1,-1)$ | \((-\delta,\delta)\) |
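Definition 10.13 can be checked by brute force in small games. The sketch below enumerates all pure profiles of Examples 10.4 and 10.5; the value `delta = 0.5` is an arbitrary point in \((0,1)\) and `pure_nash` is a hypothetical helper name.

```python
from itertools import product

def pure_nash(rows, cols, payoff):
    """All (r, c) at which neither player gains by a unilateral deviation."""
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoff[(r, c)]
        row_ok = all(payoff[(r2, c)][0] <= u1 for r2 in rows)
        col_ok = all(payoff[(r, c2)][1] <= u2 for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

ex_10_4 = {("L", "L"): (0, 0), ("L", "H"): (1, 2),
           ("H", "L"): (2, 1), ("H", "H"): (0, 0)}
delta = 0.5  # any value in (0, 1)
ex_10_5 = {("L", "L"): (-1, 1), ("L", "R"): (1, -1),
           ("R", "L"): (1, -1), ("R", "R"): (-delta, delta)}

ne_10_4 = pure_nash(["L", "H"], ["L", "H"], ex_10_4)
ne_10_5 = pure_nash(["L", "R"], ["L", "R"], ex_10_5)
```

The first game yields the two profiles \((L,H)\) and \((H,L)\); the penalty game yields an empty list.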
10.2.3 Mixed strategy
Definition 10.14 (Mixed strategy) A mixed strategy for player \(i\) is a probability distribution over the pure strategies \(s_i\in S_i\). Assume each \(S_i\) is finite, and let \(i\)'s set of mixed strategies be \(M_i=\left\{m_i:S_i\to[0,1]\text{ s.t. }\sum_{s_i\in S_i}m_i(s_i)=1\right\}\supseteq S_i\). The inclusion \(S_i\subseteq M_i\) holds because each pure strategy \(s_i\) can be identified with the degenerate mixed strategy that puts probability 1 on it, \(m_i(s_i)=1\).
10.2.4 Von Neumann Morgenstern (VNM) utility
In a strategic form game, if your opponent randomizes (chooses a mixed strategy), then when you consider your strategy you are actually choosing over different gambles. Suppose all players are VNM rational over the outcome resulting under \(S\) (VNM utility with expected utility property). If players choose the joint mixed strategy \(m=(m_1,\dots,m_N)\in M\equiv\prod_{i=1}^N M_i\), then
$$u_i(m)=\sum_{(s_1,s_2,\dots,s_N)\in S}m_1(s_1)\cdot m_2(s_2)\cdots m_N(s_N)\cdot u_i(s_1,s_2,\dots,s_N)\tag{10.7}$$
(10.7) holds because individuals randomize over their strategy sets and the randomizations are independent.
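Formula (10.7) translates directly into a sum over pure profiles weighted by the product of the individual probabilities. A minimal sketch, using the game of Example 10.4 and a hypothetical uniform mix for both players:

```python
from fractions import Fraction
from itertools import product

def mixed_payoff(i, mixed, payoff):
    """Expected payoff of player i under independent mixing, as in (10.7).

    mixed: list of dicts {pure strategy: probability}, one per player.
    payoff: dict mapping a pure profile (tuple) to a tuple of payoffs.
    """
    total = Fraction(0)
    for profile in product(*(m.keys() for m in mixed)):
        prob = Fraction(1)
        for m, s in zip(mixed, profile):
            prob *= m[s]                  # independence: probabilities multiply
        total += prob * payoff[profile][i]
    return total

game = {("L", "L"): (0, 0), ("L", "H"): (1, 2),
        ("H", "L"): (2, 1), ("H", "H"): (0, 0)}
uniform = {"L": Fraction(1, 2), "H": Fraction(1, 2)}   # hypothetical mix
u1 = mixed_payoff(0, [uniform, uniform], game)
u2 = mixed_payoff(1, [uniform, uniform], game)
```

Both players get 3/4 under this particular mix; the function works for any number of players as long as `payoff` covers every pure profile.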
10.2.5 Mixed strategy Nash equilibrium
Definition 10.15 (Mixed strategy Nash equilibrium) & Theorem 10.4
Def 10.15: a joint mixed strategy \(\hat m\in M\) is a mixed strategy Nash equilibrium of \(G\) iff \(\forall i\), \(\forall m_i\in M_i\), \(u_i(\hat m)\ge u_i(m_i,\hat m_{-i})\). (Rmk 10.3: given that the others choose \(\hat m_{-i}\), you cannot do better by deviating from \(\hat m_i\).)
Thm 10.4: suppose all players have VNM utility; then the following are equivalent: (a) \(\hat m\) is a mixed strategy Nash equilibrium; (b) \(\forall i\), \(\forall s_i\in S_i\) with \(\hat m_i(s_i)>0\), \(u_i(\hat m)=u_i(s_i,\hat m_{-i})\), and \(\forall s_i\in S_i\), \(u_i(\hat m)\ge u_i(s_i,\hat m_{-i})\); (c) \(\forall i\), \(\forall s_i\in S_i\), \(u_i(\hat m)\ge u_i(s_i,\hat m_{-i})\). (Rmk 10.4: when mixing at equilibrium, you must be indifferent between any two pure strategies played with strictly positive probability, and between any such pure strategy and the mixed strategy itself.)
Proof (Theorem 10.4)
$$u_i(m)=\sum_{s_i\in S_i}m_i(s_i)\cdot\underbrace{\sum_{s_{-i}\in S_{-i}}m_{-i}(s_{-i})u_i(s_i,s_{-i})}_{=u_i(s_i,m_{-i})}=\sum_{s_i\in S_i}m_i(s_i)\cdot u_i(s_i,m_{-i})$$
$$u_i(\tilde m_i,\hat m_{-i})-u_i(\hat m)=\frac{\hat m_i(s_i)}{1-\hat m_i(s_i)}\left(u_i(\hat m)-u_i(s_i,\hat m_{-i})\right)>0$$
Example 10.6 (Revisit 10.4) Player 1 chooses \(L\) with probability \(q\) and player 2 chooses \(L\) with probability \(p\). In a non-degenerate mixed equilibrium \(p,q\in(0,1)\), so by Thm 10.4 both players are indifferent between \(L\) and \(H\). Player 1: \(p\cdot0+(1-p)\cdot1=p\cdot2+(1-p)\cdot0\Rightarrow p=\tfrac13\); player 2: \(q\cdot0+(1-q)\cdot1=q\cdot2+(1-q)\cdot0\Rightarrow q=\tfrac13\). So \(m_1=m_2=(\tfrac13,\tfrac23)\) and the expected utility is \(u_1=u_2=\tfrac23<1\), where \(1\) is the lowest payoff either player receives in a pure strategy Nash equilibrium. Unpredictability can therefore make both players worse off.
$$u_1(m_1,m_2)=\tfrac13\cdot\tfrac13\cdot0+\tfrac13\cdot\tfrac23\cdot1+\tfrac23\cdot\tfrac13\cdot2+\tfrac23\cdot\tfrac23\cdot0=\tfrac23=u_2(m_1,m_2)$$
Example 10.7 (Revisit 10.5) The kicker chooses \(L\) with probability \(q\) and the goalie chooses \(L\) with probability \(p\). Kicker indifferent between \(L\) and \(R\): \(p(-1)+(1-p)1=p\cdot1+(1-p)(-\delta)\Rightarrow p=\tfrac{1+\delta}{3+\delta}\); goalie indifferent: \(q\cdot1+(1-q)(-1)=q(-1)+(1-q)\delta\Rightarrow q=\tfrac{1+\delta}{3+\delta}\). Expected utilities: \(u_K=\tfrac{-(1+\delta)^2+4}{(3+\delta)^2}=\tfrac{1-\delta}{3+\delta}\) (decreasing in \(\delta\), \(\in(0,\tfrac13)\)) and \(u_G=\tfrac{(1+\delta)^2-4}{(3+\delta)^2}=-\tfrac{1-\delta}{3+\delta}\) (increasing in \(\delta\), \(\in(-\tfrac13,0)\)). Two observations. First, unpredictability is good for both: each player's expected utility stays above \(-1\), the worst payoff in the matrix. Second, a smaller \(\delta\) means a stronger right leg, so the kicker is strictly better off and the goalie strictly worse off, and the kicker puts a higher probability on \(R\) (using his right leg).
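The indifference conditions of Thm 10.4 give closed-form mixing probabilities in any \(2\times2\) game with an interior equilibrium. The sketch below solves them with exact fractions; `mix_2x2` is a hypothetical helper, `q` denotes the row player's probability on the first row, `p` the column player's probability on the first column, and \(\delta=\tfrac12\) is an arbitrary choice.

```python
from fractions import Fraction

def mix_2x2(A, B):
    """Interior mixed equilibrium of a 2x2 game via indifference conditions.

    A[r][c], B[r][c]: payoffs of the row and the column player.
    Returns (q, p). Assumes an interior equilibrium exists, so that the
    denominators below are non-zero.
    """
    # The column player's p makes the row player indifferent between rows.
    p = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # The row player's q makes the column player indifferent between columns.
    q = Fraction(B[1][1] - B[1][0], B[0][0] - B[1][0] - B[0][1] + B[1][1])
    return q, p

# Game of Example 10.4 (rows and columns ordered L, H).
q6, p6 = mix_2x2([[0, 1], [2, 0]], [[0, 2], [1, 0]])

# Penalty game of Example 10.5 (rows: kicker, columns: goalie).
d = Fraction(1, 2)
q7, p7 = mix_2x2([[-1, 1], [1, -d]], [[1, -1], [-1, d]])
u_kicker = p7 * (-1) + (1 - p7) * 1   # kicker's payoff from L at equilibrium
```

For \(\delta=\tfrac12\) both probabilities equal \(\tfrac{1+\delta}{3+\delta}=\tfrac37\) and the kicker's payoff is \(\tfrac17\).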
These examples show that in equilibrium you must mix so as to make your opponent indifferent between any two pure strategies he plays with strictly positive probability. This looks odd: you mix to keep your opponent indifferent, and your opponent mixes to keep you indifferent, which is a kind of circularity. We now prove existence.
Theorem 10.5 (Existence of mixed strategy Nash equilibrium) Suppose all players have VNM utility; then every finite strategic form game has at least one mixed strategy Nash equilibrium.
Proof (Theorem 10.5)
$$f_{ij}(m)=\frac{m_{ij}+\max\{0,(u_i(j,m_{-i})-u_i(m))\}}{1+\sum_{j'=1}^n\max\{0,(u_i(j',m_{-i})-u_i(m))\}}$$
$$\hat m_{ij}\underbrace{\sum_{j'=1}^n\max\{0,(u_i(j',\hat m_{-i})-u_i(\hat m))\}}_{\equiv c}=\max\{0,(u_i(j,\hat m_{-i})-u_i(\hat m))\}\tag{10.9}$$
$$\sum_{j=1}^n(u_i(j,\hat m_{-i})-u_i(\hat m))\max\{0,(u_i(j,\hat m_{-i})-u_i(\hat m))\}=0\tag{10.10}$$
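To get a feel for the map \(f\) used in the proof, here is a sketch that applies it once to a two-player game. The restriction to two players, the nested-list encoding and the name `nash_map` are assumptions of the sketch; the game is that of Example 10.4.

```python
from fractions import Fraction

def nash_map(m, payoffs):
    """One application of f for a two-player game.

    m = [m1, m2]: lists of probabilities over each player's pure strategies.
    payoffs[i][j][k] = u_i when player 1 plays j and player 2 plays k.
    """
    def u_pure(i, j):
        # u_i(j, m_{-i}): payoff of pure strategy j against the opponent's mix.
        if i == 0:
            return sum(m[1][k] * payoffs[0][j][k] for k in range(len(m[1])))
        return sum(m[0][k] * payoffs[1][k][j] for k in range(len(m[0])))

    new = []
    for i in (0, 1):
        n = len(m[i])
        u_m = sum(m[i][j] * u_pure(i, j) for j in range(n))
        gain = [max(0, u_pure(i, j) - u_m) for j in range(n)]
        new.append([(m[i][j] + gain[j]) / (1 + sum(gain)) for j in range(n)])
    return new

payoffs = [[[0, 1], [2, 0]],    # player 1
           [[0, 2], [1, 0]]]    # player 2
third, half = Fraction(1, 3), Fraction(1, 2)
eq = [[third, 1 - third], [third, 1 - third]]
uniform = [[half, half], [half, half]]
image_eq = nash_map(eq, payoffs)
image_uniform = nash_map(uniform, payoffs)
```

The equilibrium mix \((\tfrac13,\tfrac23)\) is left unchanged, while the uniform mix is moved to \((\tfrac25,\tfrac35)\), shifting weight toward the pure strategy with a positive gain.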
10.3 Strategic form game with incomplete information
A player may have incomplete information about others' strategy sets, available strategies, etc. But since a player really cares about the payoff for each chosen strategy (pure or mixed), we confine attention to incomplete information about payoffs.
10.3.1 Motivating example
Example 10.8 (Motivating example): two players, \(S_1=\{L,R\}\), \(S_2=\{L,R\}\), both uncertain about each other's payoffs. Player 1's payoff matrix could be one of (a),(b); player 2's one of (c),(d) (the single number in each cell is that player's own payoff).
| (a) \(t_1=1\) | \(L\) | \(R\) | (b) \(t_1=2\) | \(L\) | \(R\) |
|---|---|---|---|---|---|
| \(L\) | \(3\) | \(0\) | \(L\) | \(1\) | \(-1\) |
| \(R\) | \(4\) | \(1\) | \(R\) | \(-1\) | \(1\) |

| (c) \(t_2=1\) | \(L\) | \(R\) | (d) \(t_2=2\) | \(L\) | \(R\) |
|---|---|---|---|---|---|
| \(L\) | \(1\) | \(0\) | \(L\) | \(-1\) | \(1\) |
| \(R\) | \(4\) | \(3\) | \(R\) | \(1\) | \(-1\) |
Define types: player 1 with (a)/(b) is \(t_1=1\) / \(t_1=2\); player 2 with (c)/(d) is \(t_2=1\) / \(t_2=2\). Type sets \(T_1=\{1,2\}\), \(T_2=\{1,2\}\). Types give information about payoffs (here knowing your own type pins down your matrix; in general you may also need others' types). Add a common prior (footnote 10.6: even without a common prior, as long as each player has a subjective belief over others' types) \(\mathbb{P}(\cdot)\) on the product \(T=T_1\times T_2\), \(\mathbb{P}(t_1,t_2)\ge0\), \(\sum\mathbb{P}(t_1,t_2)=1\). Here suppose types independent and \(\mathbb{P}(t_1=1)=\mathbb{P}(t_2=1)=\tfrac13\), \(\mathbb{P}(t_1=2)=\mathbb{P}(t_2=2)=\tfrac23\).
Note \(t_1=1\) and \(t_2=1\) have strictly dominant strategies: (e) for \(t_1=1\), \(R\) strictly dominates \(L\), so \(t_1=1\) always chooses \(R\); (f) for \(t_2=1\), \(L\) strictly dominates \(R\), so \(t_2=1\) always chooses \(L\). Now consider \(t_1=2\) (prefers matched strategy) and \(t_2=2\) (prefers different strategy). Suppose \(t_1=2\) mixes \(L,R\) with \((p,1-p)\), \(t_2=2\) with \((q,1-q)\). We can rule out \(p\in\{0,1\}\), \(q\in\{0,1\}\) (else a contradiction — if \(t_1=2\) is pure, then \(t_2=2\), facing it with probability \(\tfrac23\), has a strict best response and won't mix, so \(t_1=2\) also has a strict best response, contradicting the assumption), so \(t_1=2\) and \(t_2=2\) strictly mix and must be indifferent.
Consider \(t_1=2\): expected utility from \(R\) is \(\tfrac13(-1)+\tfrac23[q(-1)+(1-q)\cdot1]=\tfrac{1-4q}{3}\); from \(L\) is \(\tfrac13\cdot1+\tfrac23[q\cdot1+(1-q)(-1)]=\tfrac{4q-1}{3}\); indifference \(\Rightarrow q=\tfrac14\). Consider \(t_2=2\): from \(R\) is \(\tfrac13(-1)+\tfrac23[p\cdot1+(1-p)(-1)]=\tfrac{4p-3}{3}\); from \(L\) is \(\tfrac13\cdot1+\tfrac23[p(-1)+(1-p)\cdot1]=\tfrac{3-4p}{3}\); indifference \(\Rightarrow p=\tfrac34\). So equilibrium: player 1: \(t_1=1\) chooses \(R\), \(t_1=2\) mixes \((L,R)\) by \((\tfrac34,\tfrac14)\); player 2: \(t_2=1\) chooses \(L\), \(t_2=2\) mixes \((L,R)\) by \((\tfrac14,\tfrac34)\).
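The two indifference conditions above can be checked with exact arithmetic. The sketch below is only an illustration: the helper names (`eu_match`, `eu_mismatch`, `affine_root`) are made up for this note, and it assumes the \(\pm1\) payoffs used in the expressions above for the matching type \(t_1=2\) and the mismatching type \(t_2=2\).

```python
from fractions import Fraction as F

P1, P2 = F(1, 3), F(2, 3)  # prior that the opponent is type 1 / type 2

def eu_match(action, q):
    """Type t1=2 (wants to match): t2=1 plays L for sure, t2=2 plays L w.p. q."""
    eu_L = P1 * 1 + P2 * (q * 1 + (1 - q) * (-1))
    # With +1/-1 payoffs, playing R earns exactly the negative of playing L.
    return eu_L if action == "L" else -eu_L

def eu_mismatch(action, p):
    """Type t2=2 (wants to differ): t1=1 plays R for sure, t1=2 plays L w.p. p."""
    eu_L = P1 * 1 + P2 * (p * (-1) + (1 - p) * 1)
    return eu_L if action == "L" else -eu_L

def affine_root(f):
    """Root of an affine function of one probability, from its values at 0 and 1."""
    f0, f1 = f(F(0)), f(F(1))
    return -f0 / (f1 - f0)

q_star = affine_root(lambda q: eu_match("L", q) - eu_match("R", q))
p_star = affine_root(lambda p: eu_mismatch("L", p) - eu_mismatch("R", p))
print(q_star, p_star)  # 1/4 3/4
```

Because each expected utility is affine in the opponent's mixing probability, two evaluations are enough to locate the indifference point; no numerical solver is needed.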
10.3.2 Bayesian game
Definition 10.16 (Bayesian game) & Definition 10.17 (Bayes' rule)
Def 10.16: a Bayesian game \(BG=(A_i,T_i,u_i,\mathbb{P})_{i=1}^N\) consists of: an action set \(A_i\) for each player; a type set \(T_i\) for each player; a VNM utility \(u_i:A\times T\to\mathbb{R}\) (\(A=\prod_i A_i\), \(T=\prod_i T_i\), \(t=(t_1,\dots,t_N)\in T\)); and a common probability distribution \(\mathbb{P}\in\Delta(T)\) over type profiles (\(\mathbb{P}(t)>0\), \(\sum_t\mathbb{P}(t)=1\) when \(T\) is finite).
Def 10.17 (Bayes' rule): \(T_i\) finite, player \(i\) updates the prior by Bayes' rule: \(\mathbb{P}(t_{-i}|t_i)\equiv\dfrac{\mathbb{P}(t_i,t_{-i})}{\sum_{t'_{-i}\in T_{-i}}\mathbb{P}(t_i,t'_{-i})}\), the conditional distribution of others' types once his own type is \(t_i\).
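As a small illustration of Def 10.17, the sketch below conditions a two-player prior on player \(i\)'s own type. The function name and the correlated prior are hypothetical, chosen only so that the conditional differs from the marginal.

```python
from fractions import Fraction as F

def conditional_on_own_type(prior, t_i):
    """Return P(t_other | t_i) from a two-player prior {(t_i, t_other): prob}."""
    marginal = sum(p for (ti, _), p in prior.items() if ti == t_i)
    return {tj: p / marginal for (ti, tj), p in prior.items() if ti == t_i}

# Hypothetical correlated prior over (t_1, t_2); it sums to 1.
prior = {(1, 1): F(1, 6), (1, 2): F(1, 6), (2, 1): F(1, 6), (2, 2): F(1, 2)}
post = conditional_on_own_type(prior, 2)  # belief of player 1 when t_1 = 2
```

With an independent prior, such as the one in the motivating example, the same function simply returns the opponent's marginal distribution for every own type.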
Remark 10.5 We call \(A_i\) action sets rather than strategy sets because we reserve "strategy" for a type-contingent plan of actions: a strategy tells the player which (pure or mixed) action to take for each type. In complete-information games payoffs were known and every action profile had a definite payoff, so (pure/mixed) actions themselves were strategies. Now the payoff scheme is uncertain, so a player needs an instruction for each scenario (type) he may find himself in. The plan, not the individual action, is then the top-level object that determines payoffs, so type-contingent plans are the strategies.
Definitions 10.18–10.21 (pure / mixed / behavioral strategy & expected utility)
Def 10.18 (Pure strategy): \(s_i:T_i\to A_i\), so \(S_i\equiv\{s_i:T_i\to A_i\}\). (Rmk 10.6: a pure strategy is a deterministic mapping that fixes one action for each realization of own type; it need not be one-to-one, since two types may take the same action. Rmk 10.7: \(\text{card}(S_i)=\text{card}(A_i)^{\text{card}(T_i)}\) when \(A_i,T_i\) are finite.)
Def 10.19 (Mixed strategy): a distribution over \(s_i\in S_i\), \(M_i=\{m_i:S_i\to[0,1]\text{ s.t. }\sum_{s_i\in S_i}m_i(s_i)=1\}\supseteq S_i\). (Rmk 10.8: this mixed strategy \(\ne\) "mixing actions within a realized type" in the motivating example — that is a behavioral strategy; a mixed strategy randomizes over plans for all types before knowing the type.)
Def 10.20 (Behavioral strategy): \(b_i:A_i\times T_i\to[0,1]\) s.t. \(\sum_{a_i\in A_i}b_i(a_i|t_i)=1\), \(B_i\) its set. (Rmk 10.9: \(b_i\) assigns probability to each action \(a_i\) for a given type \(t_i\); \(m_i\) assigns probabilities to plans for all types.)
Def 10.21 (Expected utility): expected utility of taking action \(a_i\) in type \(t_i\):
$$V(a_i,b_{-i}|t_i)=\sum_{t_{-i}\in T_{-i}}\mathbb{P}(t_{-i}|t_i)\sum_{a_{-i}\in A_{-i}}b_{-i}(a_{-i}|t_{-i})\cdot u_i(a_i,a_{-i},t_i,t_{-i}),\quad\forall a_i\in A_i$$
where \(b_{-i}(a_{-i}|t_{-i})\equiv\prod_{j\ne i}b_j(a_j|t_j)\).
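Def 10.21 can be sketched directly in code for the two-player case. The function `V` below and its argument layout are illustrative choices for this note, not notation from the text; the data are those of the motivating example for player 1 of type \(t_1=2\), assuming the \(\pm1\) matching payoffs used there.

```python
from fractions import Fraction as F

def V(a_i, t_i, prior_cond, b_other, u_i, actions_other):
    """Def 10.21 with one opponent: expected utility of action a_i for type t_i.

    prior_cond[t_j]   = P(t_j | t_i)
    b_other[t_j][a_j] = opponent's behavioral probability of a_j when of type t_j
    """
    return sum(
        prior_cond[t_j] * b_other[t_j][a_j] * u_i(a_i, a_j, t_i, t_j)
        for t_j in prior_cond
        for a_j in actions_other
    )

# Player 1 of type 2 wants to match: +1 if actions coincide, -1 otherwise.
u1_type2 = lambda a1, a2, t1, t2: 1 if a1 == a2 else -1
prior_cond = {1: F(1, 3), 2: F(2, 3)}  # independent types
b2 = {1: {"L": F(1), "R": F(0)}, 2: {"L": F(1, 4), "R": F(3, 4)}}

v_L = V("L", 2, prior_cond, b2, u1_type2, ["L", "R"])
v_R = V("R", 2, prior_cond, b2, u1_type2, ["L", "R"])
```

Both values come out equal, which is the indifference that lets type \(t_1=2\) mix in the motivating example.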
10.3.3 Bayes Nash equilibrium
Definition 10.22 (Bayes Nash equilibrium) \((b^\star_1,\dots,b^\star_N)\in B\equiv\prod_{i=1}^N B_i\) is a BNE iff \(\forall i\), \(\forall t_i\in T_i\), and \(\forall a_i\in A_i\) with \(b^\star_i(a_i|t_i)>0\), we have \(V(a_i,b^\star_{-i}|t_i)\ge V(a'_i,b^\star_{-i}|t_i)\) for all \(a'_i\in A_i\). (Rmk 10.10: at a BNE, each player of each type is indifferent among all actions played with positive probability, and no other action yields higher utility, so no type wants to deviate; this holds for all players and all types, i.e. there is no profitable deviation given everything else fixed.)
10.3.4 Equivalence of behavioral strategy and mixed strategy
First establish the expected utility \(U_i(m)\) for a mixed strategy \(m\in M=\prod M_i\).
Definitions 10.23–10.25 (expected utility & equivalence)
Def 10.23 (Pure-profile expected utility): \(u_i(s)=\sum_{(t_1,\dots,t_N)\in\prod T_i}\mathbb{P}(t_1,\dots,t_N)\cdot\tilde u_i(s_1(t_1),\dots,s_N(t_N),t_1,\dots,t_N)\) (\(\tilde u_i\) is the payoff given the type profile \(t\) and the actions; Rmk 10.11: \(G=(S_i,u_i)\) is a strategic form game; Rmk 10.12: \(u_i(s)\) is an expectation because the type profile is random).
Def 10.24 (Mixed-profile expected utility): \(U_i(m)=\sum_{s\in\prod S_i}m(s)\cdot u_i(s)\), \(m(s)\equiv\prod_{j=1}^N m_j(s_j)\).
Def 10.25 (Behavioral \(\equiv\) mixed): \(b_i\) equivalent to \(m_i\) iff \(\forall a_i\in A_i,\forall t_i\in T_i\), \(b_i(a_i|t_i)=\sum_{\{s_i\in S_i:s_i(t_i)=a_i\}}m_i(s_i)\); \(b\equiv m\) if \(b_i\equiv m_i\) for \(\forall i\). (Rmk 10.13: \(m_i\) uniquely pins down \(b_i\).)
Fact 10.1 & Theorem 10.6
Fact 10.1: if \(b\in B\) and \(m\in M\) are equivalent, they induce the same probability distribution over \(A\times T\) (in particular over each \((a_i,t_i)\) pair).
Thm 10.6: if \(b,m\) equivalent, then for every player \(i\), \(U_i(m)=\sum_{t_i\in T_i,a_i\in A_i}p_i(t_i)\cdot b_i(a_i|t_i)\cdot V(a_i,b_{-i}|t_i)\) (10.11), where \(p_i(t_i)=\sum_{t_{-i}\in T_{-i}}\mathbb{P}(t_i,t_{-i})\).
Proof (Fact 10.1 & Theorem 10.6)
Thm 10.6: start from the definitions \(U_i(m)=\sum_{s\in\prod S_i}m(s)u_i(s)=\sum_s m(s)\sum_{t\in T}\mathbb{P}(t)\,\tilde u_i(s_1(t_1),\dots,s_N(t_N),t)\) and \(V\). Expand the RHS of (10.11) and regroup the sums: use \(p_i(t_i)\,\mathbb{P}(t_{-i}|t_i)=\mathbb{P}(t_i,t_{-i})\) (Def 10.17), replace each \(b_j\) by the corresponding sum of \(m_j\) via Def 10.25, use \(m(s)\equiv\prod_j m_j(s_j)\), and finally group by \(s(t)=a\), where \(s(t)\equiv(s_1(t_1),\dots,s_N(t_N))\). Writing \(\tilde u_i(a,t)\) for the payoff \(u_i(a_i,a_{-i},t_i,t_{-i})\) of Def 10.21:
$$\begin{aligned}&\sum_{t_i\in T_i,a_i\in A_i}p_i(t_i)b_i(a_i|t_i)V(a_i,b_{-i}|t_i)\\&=\sum_{t_i,a_i}b_i(a_i|t_i)\sum_{t_{-i}\in T_{-i}}\mathbb{P}(t_i,t_{-i})\sum_{a_{-i}\in A_{-i}}b_{-i}(a_{-i}|t_{-i})\,\tilde u_i(a,t)\\&=\sum_{t\in T}\mathbb{P}(t)\sum_{a\in A}\left(\prod_{j=1}^N\sum_{\{s_j\in S_j:s_j(t_j)=a_j\}}m_j(s_j)\right)\tilde u_i(a,t)\\&=\sum_{t\in T}\mathbb{P}(t)\sum_{a\in A}\left(\sum_{\{s\in S:s(t)=a\}}m(s)\right)\tilde u_i(a,t)\\&=\sum_{s\in S}m(s)\sum_{t\in T}\mathbb{P}(t)\,\tilde u_i(s(t),t)=\sum_{s\in S}m(s)\cdot u_i(s)=U_i(m).\quad\blacksquare\end{aligned}$$
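Identity (10.11) can also be sanity-checked numerically. The sketch below uses a small two-player game with binary types and actions; the prior, the payoff function `payoff1`, and the behavioral strategies are all hypothetical numbers picked for illustration, and the mixed strategies are built with the product formula \(m_i(s_i)=\prod_{t_i}b_i(s_i(t_i)|t_i)\).

```python
from fractions import Fraction as F
from itertools import product

T = A = [0, 1]
# Hypothetical correlated prior over (t1, t2) and hypothetical payoff for player 1.
prior = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 4), (1, 1): F(1, 4)}
payoff1 = lambda a1, a2, t1, t2: F(3 * a1 - 2 * a2 + t1 * a2 + 2 * a1 * t2)

# Behavioral strategies, stored as b[t][a].
b1 = {0: [F(1, 3), F(2, 3)], 1: [F(3, 4), F(1, 4)]}
b2 = {0: [F(1, 2), F(1, 2)], 1: [F(1, 5), F(4, 5)]}

PURE = list(product(A, repeat=len(T)))  # a pure strategy is s = (s(0), s(1))

def product_mixture(b):
    """Mixed strategy equivalent to b: probability of plan s is b(s(0)|0) * b(s(1)|1)."""
    return {s: b[0][s[0]] * b[1][s[1]] for s in PURE}

m1, m2 = product_mixture(b1), product_mixture(b2)

# Left side: U_1(m) computed from Defs 10.23 and 10.24.
lhs = sum(m1[s1] * m2[s2] * prior[t1, t2] * payoff1(s1[t1], s2[t2], t1, t2)
          for s1 in PURE for s2 in PURE for (t1, t2) in prior)

def p1(t1):
    """Marginal probability of player 1's type."""
    return prior[t1, 0] + prior[t1, 1]

def V1(a1, t1):
    """Def 10.21 for player 1."""
    return sum(prior[t1, t2] / p1(t1) * b2[t2][a2] * payoff1(a1, a2, t1, t2)
               for t2 in T for a2 in A)

# Right side of (10.11).
rhs = sum(p1(t1) * b1[t1][a1] * V1(a1, t1) for t1 in T for a1 in A)
```

The prior here is deliberately correlated, so the check also illustrates that the regrouping in the proof does not rely on independent types.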
Theorem 10.7 (Mixed \(\equiv\) behavioral) For any player \(i\) and any \(b_i\in B_i\), the mixed strategy \(m_i\in M_i\) defined by \(m_i(s_i)=\prod_{t_i\in T_i}b_i(s_i(t_i)|t_i)\) (10.12) for all \(s_i\in S_i\) is equivalent to \(b_i\). (Rmk 10.14: (10.12) pins down one such \(m_i\) uniquely, but other mixed strategies may also be equivalent to \(b_i\); see Example 10.9.)
Proof (Theorem 10.7)
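A short derivation, offered as a sketch for finite \(T_i\) and \(A_i\) in the notation of Def 10.25 and (10.12): fix \(a_i\) and \(t_i\) and sum \(m_i\) over all plans that play \(a_i\) at \(t_i\).

$$\begin{aligned}\sum_{\{s_i\in S_i:s_i(t_i)=a_i\}}m_i(s_i)&=\sum_{\{s_i\in S_i:s_i(t_i)=a_i\}}\ \prod_{t'_i\in T_i}b_i(s_i(t'_i)|t'_i)\\&=b_i(a_i|t_i)\prod_{t'_i\ne t_i}\left(\sum_{a'_i\in A_i}b_i(a'_i|t'_i)\right)=b_i(a_i|t_i).\end{aligned}$$

The second equality holds because a plan with \(s_i(t_i)=a_i\) is free to pick any action at every other type, so the sum over such plans factors into a product of sums, each equal to 1. Summing the result over \(a_i\in A_i\) also shows that \(m_i\) is a probability distribution on \(S_i\).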
Example 10.9 (Multiple \(m_i\) equivalent to one \(b_i\)) \(T_i=\{t^L_i,t^R_i\}\), \(A_i=\{L,R\}\), \(b_i(k|t^k_i)=\tfrac23\) (\(k\in\{L,R\}\)), so each type plays its own label with probability \(\tfrac23\) and the other action with probability \(\tfrac13\). \(b_i\) is equivalent to both \(m_1=(\tfrac29,\tfrac49,\tfrac19,\tfrac29)\) and \(m_2=(\tfrac13,\tfrac13,0,\tfrac13)\) (the four entries are the probabilities of the four pure strategies \(\{t^L_i\Rightarrow L,t^R_i\Rightarrow L\}\), \(\{t^L_i\Rightarrow L,t^R_i\Rightarrow R\}\), \(\{t^L_i\Rightarrow R,t^R_i\Rightarrow L\}\), \(\{t^L_i\Rightarrow R,t^R_i\Rightarrow R\}\)), and any convex combination of \(m_1,m_2\) is also equivalent to \(b_i\).
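The claims in Example 10.9 can be verified mechanically. In the sketch below the helper names (`induced_behavioral`, `product_mixture`) and the string labels for types are illustrative; the first helper applies Def 10.25 and the second applies formula (10.12).

```python
from fractions import Fraction as F
from itertools import product

TYPES, ACTIONS = ["tL", "tR"], ["L", "R"]
# Pure strategies as (action at tL, action at tR), in the order LL, LR, RL, RR.
PURE = list(product(ACTIONS, repeat=len(TYPES)))

b = {"tL": {"L": F(2, 3), "R": F(1, 3)}, "tR": {"L": F(1, 3), "R": F(2, 3)}}

def induced_behavioral(m):
    """Def 10.25: b(a|t) is the total probability of plans that play a at type t."""
    return {t: {a: sum(m[s] for s in PURE if s[k] == a) for a in ACTIONS}
            for k, t in enumerate(TYPES)}

def product_mixture(b):
    """Formula (10.12): m(s) is the product over types of b(s(t)|t)."""
    m = {}
    for s in PURE:
        prob = F(1)
        for k, t in enumerate(TYPES):
            prob *= b[t][s[k]]
        m[s] = prob
    return m

m1 = product_mixture(b)
m2 = dict(zip(PURE, [F(1, 3), F(1, 3), F(0), F(1, 3)]))
# One particular convex combination (weights 1/4 and 3/4 chosen arbitrarily).
mix = {s: F(1, 4) * m1[s] + F(3, 4) * m2[s] for s in PURE}
```

Equivalence is a set of linear constraints on \(m_i\), which is why every convex combination of two equivalent mixed strategies is again equivalent.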
10.3.5 Relationship between Bayes Nash equilibrium and Nash equilibrium
How does BNE (in behavioral strategies of \(BG\)) compare with NE (in mixed strategies of \(G\))? They should be the same if BNE behavioral strategies are equivalent to NE mixed strategies. In fact this relationship proves the existence of BNE (since a finite strategic form game always has a mixed NE); in reality BNE is easier to compute and apply.
Theorem 10.8 (NE of G \(\Leftrightarrow\) BNE of BG) If \(m^\star\in M\) is a NE of \(G\), then the unique \(b^\star\in B\) equivalent to \(m^\star\) is a BNE of \(BG\); conversely, if \(b^\star\in B\) is a BNE of \(BG\), then any \(m^\star\in M\) equivalent to \(b^\star\) is a NE of \(G\).
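Proof (Theorem 10.8)
First half: suppose \(m^\star\) is a NE of \(G\) and let \(b^\star\) be the behavioral profile equivalent to it. For any \(b_i\in B_i\), take an \(m_i\) equivalent to \(b_i\) (Theorem 10.7). The NE condition \(U_i(m^\star)\ge U_i(m_i,m^\star_{-i})\) becomes, by Theorem 10.6, inequality (10.13). Since \(b_i\) is arbitrary and every \(p_i(t_i)>0\), \(b^\star_i(\cdot|t_i)\) can put positive probability only on actions maximizing \(V(\cdot,b^\star_{-i}|t_i)\), which is the BNE condition.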
$$\sum_{t_i\in T_i,a_i\in A_i}p_i(t_i)\left[b^\star_i(a_i|t_i)-b_i(a_i|t_i)\right]V(a_i,b^\star_{-i}|t_i)\ge0\tag{10.13}$$
Second half: suppose \(b^\star\) is a BNE, so \(\forall i,t_i,a_i\) with \(b^\star_i(a_i|t_i)>0\), \(V(a_i,b^\star_{-i}|t_i)\ge V(a'_i,b^\star_{-i}|t_i)\) for all \(a'_i\in A_i\). Take any \(m^\star\) equivalent to \(b^\star\), e.g. \(m^\star_i(s_i)=\prod_{t_i\in T_i}b^\star_i(s_i(t_i)|t_i)\). For any pure strategy \(s_i\in S_i\), Theorem 10.6 gives \(U_i(m^\star)=\sum_{t_i,a_i}p_i(t_i)b^\star_i(a_i|t_i)V(a_i,b^\star_{-i}|t_i)\ge\sum_{t_i}p_i(t_i)V(s_i(t_i),b^\star_{-i}|t_i)=U_i(s_i,m^\star_{-i})\), where the inequality holds because every action in the support of \(b^\star_i(\cdot|t_i)\) is at least as good as \(s_i(t_i)\). No pure deviation is profitable, hence no mixed one either, i.e. \(m^\star\) is a NE. \(\blacksquare\)
So a NE of \(G\) always maps to a unique BNE of \(BG\); with the existence of at least one NE for finite strategic form games, this gives existence of BNE for finite type/action spaces. For infinite (continuous) type spaces, a BNE can sometimes still exist, see below.
Example 10.10 (Continuous type space): two players. Player 1's type space \(T_1=[0,1]\), \(t_1\sim\text{unif}[0,1]\); player 2's \(T_2=[1,2]\), \(t_2\sim\text{unif}[1,2]\), independent. \(A_1=\{U,D\}\), \(A_2=\{L,R\}\), payoff matrix below. Types independent, so player 1's type gives no information about player 2's type distribution.
| \(1\backslash2\) | \(L\ (q)\) | \(R\ (1-q)\) |
|---|---|---|
| \(U\ (p)\) | \((2+t_1,9+t_2)\) | \((4+t_1,3)\) |
| \(D\ (1-p)\) | \((5,t_2)\) | \((1,3)\) |
Given player 2's behavioral strategy \(b_2\), for any player 1 type the overall probability that player 2 chooses \(L\) is \(q=\int_1^2 b_2(L|t_2)\,dt_2\); similarly \(p=\int_0^1 b_1(U|t_1)\,dt_1\). For player 1 of type \(t_1\): \(U\) gives \(V(U,b_2|t_1)=(2+t_1)q+(4+t_1)(1-q)=t_1+4-2q\), \(D\) gives \(V(D,b_2|t_1)=5q+1\cdot(1-q)=4q+1\), so indifference holds at \(t_1=t^\star_1=6q-3\) (10.14) (\(t_1>t^\star_1\) plays \(U\), \(t_1<t^\star_1\) plays \(D\)). For player 2 of type \(t_2\): \(L\) gives \(V(L,b_1|t_2)=(9+t_2)p+t_2(1-p)=t_2+9p\), \(R\) gives \(V(R,b_1|t_2)=3\), so indifference holds at \(t_2=t^\star_2=3-9p\) (10.15) (\(t_2>t^\star_2\) plays \(L\), \(t_2<t^\star_2\) plays \(R\)). These cutoff rules give:
$$p=\int_0^1 b_1(U|t_1)\,dt_1=\int_0^{t^\star_1}0\,dt_1+\int_{t^\star_1}^1 1\,dt_1=1-t^\star_1\tag{10.16}$$
$$q=\int_1^2 b_2(L|t_2)\,dt_2=\int_1^{t^\star_2}0\,dt_2+\int_{t^\star_2}^2 1\,dt_2=2-t^\star_2\tag{10.17}$$
At equilibrium (10.14)–(10.17) hold simultaneously: \(t^\star_1=6(2-t^\star_2)-3\Rightarrow t^\star_1+6t^\star_2=9\), \(t^\star_2=3-9(1-t^\star_1)\Rightarrow 9t^\star_1-t^\star_2=6\), giving \(t^\star_1=\tfrac{9}{11}\), \(t^\star_2=\tfrac{15}{11}\). So the BNE: \(b_1\): if \(t_1<\tfrac{9}{11}\), \(b_1(U|t_1)=0,b_1(D|t_1)=1\); if \(t_1>\tfrac{9}{11}\), \(b_1(U|t_1)=1,b_1(D|t_1)=0\); \(b_2\): if \(t_2<\tfrac{15}{11}\), \(b_2(L|t_2)=0,b_2(R|t_2)=1\); if \(t_2>\tfrac{15}{11}\), \(b_2(L|t_2)=1,b_2(R|t_2)=0\).
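The two cutoff equations \(t^\star_1+6t^\star_2=9\) and \(9t^\star_1-t^\star_2=6\) form a \(2\times2\) linear system, so the cutoffs can be reproduced with exact fractions. The sketch below uses Cramer's rule and then re-checks both indifference conditions against the payoff table; the variable names are illustrative.

```python
from fractions import Fraction as F

# System: 1*t1 + 6*t2 = 9  and  9*t1 - 1*t2 = 6.
a11, a12, c1 = F(1), F(6), F(9)
a21, a22, c2 = F(9), F(-1), F(6)

det = a11 * a22 - a12 * a21
t1_star = (c1 * a22 - a12 * c2) / det
t2_star = (a11 * c2 - a21 * c1) / det

# Aggregate probabilities implied by the cutoff strategies, (10.16) and (10.17).
p = 1 - t1_star   # probability that player 1 plays U
q = 2 - t2_star   # probability that player 2 plays L

# Indifference of the cutoff types, computed straight from the payoff table.
u1_U = (2 + t1_star) * q + (4 + t1_star) * (1 - q)
u1_D = 5 * q + 1 * (1 - q)
u2_L = (9 + t2_star) * p + t2_star * (1 - p)
u2_R = F(3)
print(t1_star, t2_star)  # 9/11 15/11
```

Both cutoffs fall inside their type intervals (\(\tfrac{9}{11}\in[0,1]\), \(\tfrac{15}{11}\in[1,2]\)), so the cutoff strategies are well defined.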
10.4 Extensive form game
So far two types of game: strategic form \(G\) (simultaneous, complete info, one round) and Bayesian \(BG\) (incomplete info, types, simultaneous before type realization, one round). Now a more general model with multiple rounds, dynamics and information: players observe and use information to affect later decisions.
10.4.1 Motivating example
Example 10.11 (simple Poker): two players. The deck has 2 deuces (\(D\)) and 1 Jack (\(J\)); nature draws the card at random. Player 1 draws \(\theta\in\{D,J\}\) and claims \(D\) or \(J\) (he may lie), action \(a_1\in\{D,J\}\); player 2 then guesses the card by saying \(D\) or \(J\), action \(a_2\in\{D,J\}\). Payoffs \((u_1,u_2)\): player 1 truthful and 2 wrong, \((1,-1)\); truthful and 2 right, \((-1,1)\); untruthful and 2 wrong, \((3,-3)\); untruthful and 2 right, \((-3,3)\).
Figure 1 (game tree, paraphrased): nature draws \(J\) with \(\tfrac13\), \(D\) with \(\tfrac23\); player 2 cannot distinguish nodes within each dotted ellipse. Define mixing probabilities: \(r_{JJ}\) = prob of claiming \(J\) after observing \(J\) (\(1-r_{JJ}\) claim \(D\)); \(r_{JD}\) = prob of claiming \(J\) after \(D\); \(q_{JJ}\) = prob of saying \(J\) after hearing claim \(J\); \(q_{JD}\) = prob of saying \(J\) after hearing claim \(D\). We seek the equilibrium where neither changes mixing. One can show \(r_{JJ},r_{JD},q_{JJ},q_{JD}\in(0,1)\) at equilibrium (Claim 10.1: check corners; e.g. if \(q_{JJ}=1\), player 2 always says \(J\) on hearing \(J\), but a claim of \(J\) is \(\tfrac23\) likely an untruthful claim from \(D\), so player 2 has an incentive to deviate, contradiction).
By \(r_{JJ}\in(0,1)\), player 1 after observing \(J\) is indifferent between claiming \(J/D\): claim \(J\) gives \((-1)q_{JJ}+1(1-q_{JJ})=1-2q_{JJ}\), claim \(D\) gives \((-3)q_{JD}+3(1-q_{JD})=3-6q_{JD}\), so \(1-2q_{JJ}=3-6q_{JD}\) (10.18). By \(r_{JD}\in(0,1)\), after observing \(D\): claim \(J\) gives \(3q_{JJ}-3(1-q_{JJ})=6q_{JJ}-3\), claim \(D\) gives \(q_{JD}-(1-q_{JD})=2q_{JD}-1\), so \(6q_{JJ}-3=2q_{JD}-1\) (10.19). Solving, \(q_{JJ}=q_{JD}=\tfrac12\).
Player 2 updates by Bayes' rule: \(\mathbb{P}(\theta=J|a_1=J)=\dfrac{\tfrac13 r_{JJ}}{\tfrac13 r_{JJ}+\tfrac23 r_{JD}}=\dfrac{r_{JJ}}{r_{JJ}+2r_{JD}}\), \(\mathbb{P}(\theta=D|a_1=J)=\dfrac{2r_{JD}}{r_{JJ}+2r_{JD}}\); \(\mathbb{P}(\theta=J|a_1=D)=\dfrac{1-r_{JJ}}{3-r_{JJ}-2r_{JD}}\), \(\mathbb{P}(\theta=D|a_1=D)=\dfrac{2-2r_{JD}}{3-r_{JJ}-2r_{JD}}\). By \(q_{JJ}\in(0,1)\), player 2 after hearing \(J\) is indifferent between guessing \(J/D\) \(\Rightarrow r_{JJ}=6r_{JD}\) (10.20); by \(q_{JD}\in(0,1)\), after hearing \(D\) \(\Rightarrow 3r_{JJ}-2r_{JD}=1\) (10.21). Solving, \(r_{JJ}=\tfrac38\), \(r_{JD}=\tfrac1{16}\). So the equilibrium is \(q_{JJ}=q_{JD}=\tfrac12\), \(r_{JJ}=\tfrac38\), \(r_{JD}=\tfrac1{16}\).
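The four mixing probabilities solve two small linear systems, (10.18)–(10.19) for player 2 and (10.20)–(10.21) for player 1. The sketch below solves them exactly and then confirms player 2's indifference from the posteriors; `solve2` is a hypothetical helper implementing Cramer's rule.

```python
from fractions import Fraction as F

def solve2(a11, a12, c1, a21, a22, c2):
    """Solve a11*x + a12*y = c1, a21*x + a22*y = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a21 * c1) / det

# (10.18): 1 - 2*qJJ = 3 - 6*qJD   ->  -2*qJJ + 6*qJD = 2
# (10.19): 6*qJJ - 3 = 2*qJD - 1   ->   6*qJJ - 2*qJD = 2
q_JJ, q_JD = solve2(F(-2), F(6), F(2), F(6), F(-2), F(2))

# (10.20): rJJ - 6*rJD = 0 ;  (10.21): 3*rJJ - 2*rJD = 1
r_JJ, r_JD = solve2(F(1), F(-6), F(0), F(3), F(-2), F(1))

# Player 2's posterior that the card is J after each claim (Bayes' rule).
post_J_claimJ = r_JJ / (r_JJ + 2 * r_JD)
post_J_claimD = (1 - r_JJ) / (3 - r_JJ - 2 * r_JD)

# Player 2's expected payoff from each guess, using the payoffs of Example 10.11.
guessJ_after_J = post_J_claimJ * 1 + (1 - post_J_claimJ) * (-3)
guessD_after_J = post_J_claimJ * (-1) + (1 - post_J_claimJ) * 3
guessJ_after_D = post_J_claimD * 3 + (1 - post_J_claimD) * (-1)
guessD_after_D = post_J_claimD * (-3) + (1 - post_J_claimD) * 1
```

At these values player 2 earns the same from either guess after either claim, which is what allows \(q_{JJ}\) and \(q_{JD}\) to be interior.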
10.4.2 Formal definition of extensive form game
Definition 10.26 (Extensive form game) & Definition 10.27 (Partition)
Def 10.26: \(\Gamma\) consists of: 1. a finite number \(N\) of players; 2. an action set \(A\) (all actions that can occur in the game); 3. a set of nodes (histories) \(X\): (a) an initial node \(x_0\in X\), (b) \(X\setminus\{x_0\}\) is a collection of finite sequences of actions in \(A\), (c) if a sequence is in \(X\setminus\{x_0\}\), all its truncations are too; 4. nature's action set \(A(x_0)\subseteq A\) at \(x_0\) with distribution \(\pi\in\Delta(A(x_0))\) (nature moves once, at \(x_0\)); 5. \(\forall x\in X\setminus\{x_0\}\), the available actions \(A(x)=\{a\in A:(x,a)\in X\}\); (a) end nodes \(E\equiv\{x\in X:A(x)=\emptyset\}\), (b) decision nodes \(D\equiv X\setminus(E\cup\{x_0\})\); 6. \(\iota:D\to\{1,\dots,N\}\) indicates the player who moves at each decision node; (a) \(X_i\equiv\{x\in D:\iota(x)=i\}\); 7. for each player \(i\), \(\mathcal{I}_i=\{I_{i,1},\dots,I_{i,k}\}\) is a partition of \(X_i\) such that any \(x,x'\in I_{i,j}\) are indistinguishable to player \(i\) (an information set); 8. \(u_i:E\to\mathbb{R}\) is a VNM utility.
Def 10.27: partition properties (a) \(\forall I\in\mathcal{I}_i,\forall x,x'\in I\), \(A(x)=A(x')\), so write \(A(I)\); (b) \(\mathcal{I}=\cup_{i=1}^N\mathcal{I}_i\) partitions \(D\); (c) \(\mathcal{I}(x)\) the element containing \(x\).
Remarks 10.15–10.16 10.15: the nature of the game pins down each player's unique partition \(\mathcal{I}_i\); nodes in the same information set \(I_{i,j}\) carry exactly the same information for \(i\) and are indistinguishable, so \(i\) uses the same plan at every node within it. E.g. in the game tree of the motivating example \(\mathcal{I}_1=\{\{1a\},\{1b\}\}\), \(\mathcal{I}_2=\{\{2c,2d\},\{2e,2f\}\}\). 10.16: the strategic form game encompasses extensive form and Bayesian games; conversely, extensive form and Bayesian games generate a strategic form game. In any complex game a player can fix a contingent plan before the start and carry it out throughout, so the game can be viewed as all players simultaneously choosing a plan (or a probability mixture of plans) before the start, which is a strategic form game.
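Definitions 10.28–10.34 (strategies & behavioral NE)
Def 10.28 (Pure strategy): \(s_i:\mathcal{I}_i\to A\) such that \(s_i(I)\in A(I)\). Def 10.29: \(S=\prod S_i\) is the set of joint pure strategy profiles; given \(s\) and nature's action \(a\in A(x_0)\), a unique end node \(\eta(s,a)\) is reached, and \(U_i(s)=\sum_{a\in A(x_0)}\pi(a)u_i(\eta(s,a))\). Def 10.30 (Mixed strategy): \(m_i:S_i\to[0,1]\) with \(\sum_{s_i\in S_i}m_i(s_i)=1\). Def 10.31 (Behavioral strategy): \(\forall I\in\mathcal{I}_i,\forall a\in A(I)\), \(b_i(a,I)\in[0,1]\) with \(\sum_{a'\in A(I)}b_i(a',I)=1\). Def 10.32 (Distribution over end nodes): for \(e=(a_0,a_1,\dots,a_K)\), \(\mathbb{P}(e|b)\) is \(\pi(a_0)\) times the product, over \(k=1,\dots,K\), of the behavioral probability that the mover at step \(k\) assigns to \(a_k\) …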
10.4.8 Perfect information game
Focus on the setting where a rational player can look ahead (with perfect information) to predict what rational opponents would do, then act rationally backward to maximize expected payoff.
Definition 10.35 (Perfect information) & Definition 10.36 (Backward induction strategy profile) & Theorem 10.9 Def 10.35: \(\Gamma\) has perfect information iff \(\mathcal{I}(x)=\{x\}\) for all \(x\in D\) (Rmk 10.18: no player is ever unsure which node he is at, so look-ahead is possible). Def 10.36 (Backward induction strategy profile): in a perfect-information game, \(s\in S\) is a backward induction strategy profile iff \(\forall x\in D\), player \(i=\iota(x)\) cannot do better than choosing \(s_i(x)\) at \(x\), given that all players (including himself) follow \(s\) after \(x\) (Rmk 10.19: such a profile is a pure strategy Nash equilibrium and always exists in finite perfect-information games). Thm 10.9: in any finite perfect-information extensive form game, every backward induction strategy profile is a Nash equilibrium, but not every Nash equilibrium is a backward induction strategy profile (Rmk 10.20: backward induction profiles are a subset, i.e. a "refinement", of the Nash equilibria).
Example 10.12 (Looking ahead, paraphrased): two players. Player 1 at \(1a\) chooses Stop (S) or Continue (C), player 2 at \(2a\) chooses S/C, player 1 at \(1b\) chooses S/C; continuing throughout ends at \((4,4)\), while Stop at \(1a\to(1,0)\), at \(2a\to(0,2)\), at \(1b\to(3,0)\). Solving backward: \(1b\) chooses C, \(2a\) chooses C, \(1a\) chooses C, payoff \((4,4)\).
Example 10.13 (Backward induction, paraphrased): two players. Player 1 at \(1a\) chooses L/R, player 2 at \(2a/2b\) chooses L/R, player 1 at \(1b/1c\) chooses L/R. Backward induction (red arrows): \(1b\) chooses L, \(1c\) chooses R, player 2 at \(2a/2b\) chooses L; at \(1a\), L gives \((1,3)\) and R gives \((2,1)\), so player 1 chooses R, with result \((2,1)\). The blue arrows are also a Nash equilibrium but not a backward induction profile, and they do not capture looking ahead: if player 1 can make player 2 believe his (incredible) threat to choose R at \(1b\), the blue arrows hold; without look-ahead, any strategy player 1 announces is taken as given by player 2 and must be trusted.
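Backward induction on a perfect-information tree is a short recursion. The sketch below runs it on the tree of Example 10.12; the nested-dictionary encoding and the function name `solve` are illustrative choices, and ties (none occur here) would be broken in favor of the first action listed.

```python
def solve(node, plan):
    """Return the payoff vector reached from `node` under backward induction.

    Each decision node's chosen action is recorded in `plan` under the node's name.
    """
    if "payoff" in node:
        return node["payoff"]
    player = node["player"]  # 0-based index into the payoff tuples
    best_action, best_payoff = None, None
    for action, child in node["moves"].items():
        payoff = solve(child, plan)
        if best_payoff is None or payoff[player] > best_payoff[player]:
            best_action, best_payoff = action, payoff
    plan[node["name"]] = best_action
    return best_payoff

def leaf(*u):
    """End node with payoff vector u."""
    return {"payoff": u}

# Example 10.12: S = Stop, C = Continue.
game = {"name": "1a", "player": 0, "moves": {
    "S": leaf(1, 0),
    "C": {"name": "2a", "player": 1, "moves": {
        "S": leaf(0, 2),
        "C": {"name": "1b", "player": 0, "moves": {"S": leaf(3, 0), "C": leaf(4, 4)}},
    }},
}}

plan = {}
outcome = solve(game, plan)
print(outcome, plan)  # (4, 4) {'1b': 'C', '2a': 'C', '1a': 'C'}
```

The recursion visits the deepest node first, so the recorded plan lists \(1b\), then \(2a\), then \(1a\), mirroring the order in which the example is solved by hand.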
10.4.9 Subgame
Under perfect information, looking ahead yields the backward induction profile; without perfect information, the same idea still works through subgames.
Example 10.14 (Backward induction without perfect information, paraphrased): two players. Player 1 at \(1a\) Out (\(\to(1,2)\)) or In; after In comes a continuation game (a simultaneous-move strategic game in the red box): player 2 at \(2a\) L/R, player 1 chooses L/R without knowing \(1b/1c\). The continuation payoff matrix is below; no pure NE, both strictly mix.
| \(1\backslash2\) | \(L\ (q)\) | \(R\ (1-q)\) |
|---|---|---|
| \(L\ (p)\) | \((0,0)\) | \((10,1)\) |
| \(R\ (1-p)\) | \((-2,1)\) | \((20,0)\) |
For player 1, L gives \(10(1-q)\) and R gives \(-2q+20(1-q)=20-22q\), so indifference \(\Rightarrow q=\tfrac56\); for player 2, L gives \(1-p\) and R gives \(p\), so indifference \(\Rightarrow p=\tfrac12\). Player 1's expected continuation payoff is \(\tfrac12\tfrac56\cdot0+\tfrac12\tfrac16\cdot10+\tfrac12\tfrac56(-2)+\tfrac12\tfrac16\cdot20=\tfrac53>1\), so a player 1 who maximizes expected (VNM) utility chooses In at \(1a\). The continuation game thus serves as one step of backward induction, which generalizes look-ahead to games with imperfect information; formally it is a subgame.
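The mixed equilibrium of the continuation game follows from the standard \(2\times2\) indifference formulas. The sketch below is illustrative (the matrix names `U1`, `U2` are not from the text) and assumes, as stated above, that the game has no pure equilibrium, so both players mix strictly.

```python
from fractions import Fraction as F

# Continuation game: rows = player 1 (L, R), columns = player 2 (L, R).
U1 = [[F(0), F(10)], [F(-2), F(20)]]
U2 = [[F(0), F(1)], [F(1), F(0)]]

# q = P(player 2 plays L) that makes player 1 indifferent between his rows.
q = (U1[1][1] - U1[0][1]) / (U1[0][0] - U1[0][1] - U1[1][0] + U1[1][1])
# p = P(player 1 plays L) that makes player 2 indifferent between her columns.
p = (U2[1][1] - U2[1][0]) / (U2[0][0] - U2[1][0] - U2[0][1] + U2[1][1])

row_probs, col_probs = [p, 1 - p], [q, 1 - q]
value1 = sum(row_probs[r] * col_probs[c] * U1[r][c] for r in range(2) for c in range(2))

OUT_PAYOFF_1 = F(1)                # player 1's payoff from Out at node 1a
enters = value1 > OUT_PAYOFF_1     # In is chosen when the continuation value is higher
print(p, q, value1, enters)        # 1/2 5/6 5/3 True
```

Replacing the subgame by its equilibrium value (here \(\tfrac53\) for player 1) and then comparing with the outside payoff is exactly the substitution step used later when solving for subgame perfect equilibria.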
Definition 10.37 (Subgame) & Definition 10.38 (Expected utility of subgame)
Def 10.37: a node \(x\in X\setminus E\) defines a subgame iff: either \(x=x_0\) (the trivial subgame); or \(x\in D\), \(\mathcal{I}(x)=\{x\}\) (the information set of \(x\) is a singleton), and \(\forall y,z\in D\), if \(z\in\mathcal{I}(y)\) and \(y\ge x\) then \(z\ge x\). \(\Gamma_x\) is the game starting at \(x\) (all nodes after \(x\) with their information structure and payoffs). (Graphically: \(x\) stands "alone": once it is reached all players know it, and the nodes after it cannot be reached without passing through \(x\), so \(x\) starts an independent subgame. Rmk 10.21: in a perfect-information game every decision node defines a subgame.)
Def 10.38: rewrite \(V_i(b|x_0)\equiv V_i(b)\). For \(x\) defining a subgame \(\Gamma_x\), a whole-game \(b\in B\) induces a behavioral profile for \(\Gamma_x\), and \(V_i(b|x)\) is \(i\)'s expected payoff in \(\Gamma_x\).
10.4.10 Subgame perfect Nash equilibrium
Definition 10.39 (Subgame perfect Nash equilibrium) \(b^\star\in B\) is a subgame perfect Nash equilibrium (SPNE) iff \(b^\star\) induces a Nash equilibrium in every subgame of \(\Gamma\). Method: 1. identify all subgames; 2. find the "deepest" subgames, those containing no other subgame; 3. compute (some or all) Nash equilibria of the deepest subgames (there may be several; each is carried forward separately and may lead to a different whole-game equilibrium); 4. treat the players' expected payoffs in a deepest subgame as end payoffs and plug them into the next higher-level subgame; 5. iterate up to the whole game. (Rmk 10.22: the game is solved from bottom to top, which is backward induction in a sense.)
Example 10.15 (Solve for SPNE, paraphrased): three players. Player 3 In/Out (Out \(\to(0,0,0)\)); after In, player 1 at \(1a\) L/M/R; if L, player 2 knows it's \(2a\), if M/R player 2 doesn't know \(2b/2c\); player 2's decision at \(2b/2c\) ends the game, after \(2a\) player 1 chooses L/R without knowing \(1b/1c\). Nodes 3, \(1a\), \(2a\) each define a subgame, \(\Gamma_{2a}\) deepest. \(\Gamma_{2a}\) has multiple NE (red arrows give player 1 payoff 1; blue arrows give 3). Take red: for \(\Gamma_{1a}\), player 2 mixes \((\tfrac12,\tfrac12)\) at \(2b/2c\) to make player 1 indifferent between M,R (both 2>1), so player 1 at \(1a\) mixes L,M,R by \((0,\tfrac12,\tfrac12)\); for \(\Gamma_3\), player 3's payoff 0.5>0 so chooses In. This gives a SPNE. With blue arrows player 1's payoff 3>2, so player 1 at \(1a\) mixes by $(1,0,0)$ — different NE at the deepest level give different whole-game results.
Then: does a SPNE always exist? Not necessarily for non-perfect-recall games, yes for perfect-recall. Example 10.16 (forgetful player, no SPNE, paraphrased): player 1 is forgetful — after choosing L/R at \(1a\) he forgets and doesn't know \(1b/1c\). The pure strategies \((L,R),(R,L)\) are strictly dominated by \((L,L)\) or \((R,R)\), so a mixed strategy can assign them zero probability and exist; but any strictly mixing behavioral strategy at \(\{1b,1c\}\) assigns positive probability to \((L,R),(R,L)\), so there is no behavioral NE and hence no SPNE. Thm 10.10: any finite extensive form game with perfect recall has at least one behavioral-strategy SPNE.
Example 10.17 (SPNE ⇏ backward induction, paraphrased): player 1 at the top chooses L/M/R: L\(\to(0,5)\), M\(\to2a\), R\(\to2b\) (player 2's information set is \(\{2a,2b\}\)). The red arrows are a pure-strategy NE, and the only subgame is the whole game, so they form an SPNE; but they do not correspond to a backward induction profile. For player 2 at \(\{2a,2b\}\), mixing L,R by \((\tfrac12,\tfrac12)\) strictly dominates M (payoff 2 vs 1) and makes player 1 indifferent between M and R; if player 1 mixes M,R by \((\tfrac12,\tfrac12)\) he gets 2>0, so he will not choose L. A forward-looking player 2 would therefore not choose M with probability 1, and the SPNE does not correspond to a backward induction profile.
10.4.11 System of beliefs and sequentially rational
To define sequential equilibrium, we need more definitions.
Definitions 10.40–10.45 (beliefs, assessment, sequential rationality)
Def 10.40 (System of beliefs): \(p:D\to[0,1]\) such that \(\forall I\in\mathcal{I}_i\), \(\sum_{x\in I}p(x)=1\). Def 10.41 (Assessment): \((p,b)\) is an assessment. Def 10.42 (Probability of reaching \(e\) from \(x\)): \(\mathbb{P}(e|b,x)=b_{\iota(x)}(a_0,\mathcal{I}(x))\prod_{k=1}^K b_{\iota(a_{\dots})}(\dots)\) …
10.4.12 Bayes' rule for an assessment
Definitions 10.46–10.49 (completely mixed & Bayes' rule)
Def 10.46 (Completely mixed): \(b\) is completely mixed iff \(\forall i,\forall I\in\mathcal{I}_i,\forall a\in A(I)\), \(b_i(a,I)>0\) (so every node is reached with positive probability). Def 10.47 (Probability of reaching \(x\)): for \(x=(a_0,\dots,a_K)\), \(\mathbb{P}(x|b)=\pi(a_0)\prod_{k=1}^K b_{\iota(a_{\dots})}(\dots)\) …
Remarks 10.24–10.25. Rmk 10.24: given \(b\), the beliefs \(p\) of an assessment \((p,b)\) satisfying Bayes' rule are uniquely pinned down by (10.24). Rmk 10.25: if \(b\) is completely mixed, Bayes' rule constrains the beliefs at every node and information set; if it is not, some information set may be reached with probability 0, where Bayes' rule imposes no restriction and beliefs can be assigned arbitrarily. This is the source of the issue in Example 10.18.
10.4.13 Consistency and sequential equilibrium
Definitions 10.50–10.52 (convergence, consistency, sequential equilibrium) Def 10.50 (Point-wise convergence): \(b^n\to b\) iff \(\forall i,I,a\), \(b^n_i(a,I)\to b_i(a,I)\); \(p^n\to p\) iff \(\forall x\in D\), \(p^n(x)\to p(x)\). Def 10.51 (Consistency): \((p,b)\) is consistent iff \(\exists\{(p^n,b^n)\}\) with \(p^n\to p\), \(b^n\to b\), every \(b^n\) completely mixed, and every \((p^n,b^n)\) satisfying Bayes' rule. Def 10.52 (Sequential equilibrium): \((p,b)\) is a sequential equilibrium iff it is consistent and sequentially rational.
A sequential equilibrium \((p,b)\) also satisfies Bayes' rule (footnote 10.22: when \(\mathbb{P}(I|b)>0\), \(b^n\to b\) gives \(p^n(x)=\tfrac{\mathbb{P}(x|b^n)}{\mathbb{P}(I|b^n)}\to\tfrac{\mathbb{P}(x|b)}{\mathbb{P}(I|b)}\)), but consistency additionally restricts beliefs at information sets with \(\mathbb{P}(I|b)=0\), so that \((p,b)\) assigns the "right" beliefs to 0-probability nodes too. So Def 10.52 indirectly incorporates Bayes' rule but is stronger than "sequentially rational + Bayes' rule", since Bayes' rule alone is silent on information sets reached with probability 0.
Example 10.18 (Sequentially rational + Bayes' rule ⇏ SPNE, paraphrased): a three-player game. The red arrows are a pure-strategy NE under which the information sets \(\{2\}\) and \(\{3a,3b\}\) are both reached with probability 0, so Bayes' rule is trivially satisfied. The assessment \((p,b)\) with \(p(3a)=1,p(3b)=0\) is sequentially rational: since the relevant end nodes have probability 0, the value of \(p\) does not affect expected payoffs, so no player has a profitable unilateral deviation at any information set. But it is not an SPNE: in the subgame defined by node 2, player 3 can get 0 instead of \(-1\) by mixing L,R by \((\tfrac12,\tfrac12)\). So \(p\) is too "arbitrary" on 0-probability information sets. Consistency fixes this: a sequence \(b^n\) (with \(\varepsilon^n\to0\)) gives, by Bayes' rule, \(p^n(3a)=\varepsilon^n_2\) and \(p^n(3b)=1-\varepsilon^n_2\), with limit \(p(3a)=0,p(3b)=1\) as the "right" belief. Under this belief the red arrows are no longer sequentially rational, so with consistent beliefs sequential rationality implies SPNE.
Three relations & Theorems 10.11–10.12
1. Sequentially rational + Bayes' rule ⇏ look-ahead capability (backward induction profile); sometimes ⇒ SPNE (Ex 10.17), sometimes ⇏ SPNE (Ex 10.18). 2. Sequentially rational + Bayes' rule + completely mixed ⇒ look-ahead capability ⇒ SPNE. 3. Sequentially rational + consistency ⇒ sequential equilibrium ⇒ look-ahead capability ⇒ SPNE.
Thm 10.11: if \((p,b)\) is a sequential equilibrium of \(\Gamma\), then \(b\) is an SPNE of \(\Gamma\); if in addition \(\Gamma\) has perfect information and \(b\) contains only pure strategies, \(b\) is a backward induction profile. Thm 10.12: a finite game \(\Gamma\) with perfect recall has at least one sequential equilibrium.
10.4.14 Example of solving for a sequential equilibrium
Example 10.19 (paraphrased): game tree (player 1→2→{4,3}). Player 1 at node 1 chooses Out (\(\to(1,1,1,1)\)) or L (belief \(\alpha\))/R (belief \(1-\alpha\)), leading to player 2's information set \(\{2a,2b\}\); player 2 plays \(l\) or \(r\), leading respectively to player 4's information set \(\{4a,4b\}\) (beliefs \(\gamma,1-\gamma\), choice C/D) and player 3's information set \(\{3a,3b\}\) (beliefs \(\beta,1-\beta\), choice A/B). Basic analysis (bottom-up): player 3 would choose A at \(3b\) and B at \(3a\), so he chooses B if \(\beta\) is high, A if it is low, and is indifferent at \(\beta=\tfrac23\); player 4 would choose C at \(4a\) and D at \(4b\), so he chooses C if \(\gamma\) is high, D if it is low, and is indifferent at \(\gamma=\tfrac13\).
(1) SPNE: the only subgame is the whole game. The simplest case has player 1 choosing Out: this requires player 2 to choose \(r\) (then player 1 prefers Out), which gives the SPNE (10.25): player 1 Out, player 2 \(r\), player 3 anything, player 4 anything.
(2) Sequentially rational assessment \((p,b)\): choose \(p\) (which need not satisfy Bayes' rule here) so that player 2 chooses \(r\). If player 4 chooses C, player 2's \(l\) gives him 3, and if D it gives 1; if player 3 chooses A, player 2's \(r\) gives him 0, and if B it gives 2. So we need player 4 to choose D (\(\gamma\le\tfrac13\)) and player 3 to choose B (\(\beta\ge\tfrac23\)). The sequentially rational assessment (10.26): \(p\): player 2 any \(\alpha\in[0,1]\), player 3 \(\beta\ge\tfrac23\), player 4 \(\gamma\le\tfrac13\); \(b\): player 1 Out, player 2 \(r\), player 3 B, player 4 D.
(3) Sequential equilibrium \((p,b)\): consistency is required. Along a completely mixed sequence \(b^n\), Bayes' rule gives \(\alpha^n=\tfrac{p^n_L}{p^n_L+p^n_R}\), \(\beta^n=\tfrac{p^n_L p^n_r}{p^n_L p^n_r+p^n_R p^n_r}=\tfrac{p^n_L}{p^n_L+p^n_R}\), \(\gamma^n=\tfrac{p^n_L p^n_l}{p^n_L p^n_l+p^n_R p^n_l}=\tfrac{p^n_L}{p^n_L+p^n_R}\), so \(\alpha^n=\beta^n=\gamma^n\) (intuition: players 2, 3, 4 all face the same single source of confusion, player 1's choice, so their conjectures about it agree: there is one objective world). In the limit \(\alpha=\beta=\gamma\). But (10.26) requires \(\beta\ge\tfrac23\) and \(\gamma\le\tfrac13\), which is impossible, so (10.26) is not consistent and is not a sequential equilibrium.
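The cancellation in step (3) can be checked numerically. The following is a small sketch, not part of the original example: the function name `beliefs` and the particular sequence \(p^n_L=\varepsilon\), \(p^n_R=2\varepsilon\), \(p^n_l=\varepsilon\) are assumptions chosen for illustration, with the sequence converging to the profile in (10.26) (player 1 Out, player 2 \(r\)). Exact rational arithmetic is used so the equality is not blurred by rounding.

```python
from fractions import Fraction


def beliefs(p_L, p_R, p_l):
    """Bayes' rule beliefs (alpha, beta, gamma) under completely mixed play.

    p_L, p_R: player 1's probabilities of L and R (the rest goes to Out).
    p_l:      player 2's probability of l at {2a, 2b}; r gets 1 - p_l.
    """
    p_r = 1 - p_l
    alpha = p_L / (p_L + p_R)                         # belief on 2a
    beta = (p_L * p_r) / (p_L * p_r + p_R * p_r)      # belief on 3a
    gamma = (p_L * p_l) / (p_L * p_l + p_R * p_l)     # belief on 4a
    return alpha, beta, gamma


# Hypothetical trembles converging to "player 1 Out, player 2 r".
for n in range(1, 6):
    eps = Fraction(1, 10 ** n)
    alpha, beta, gamma = beliefs(p_L=eps, p_R=2 * eps, p_l=eps)
    assert alpha == beta == gamma
    print(n, alpha, beta, gamma)   # every row shows 1/3 1/3 1/3
```

Whatever sequence is chosen, player 2's mixing probabilities cancel, so no sequence can deliver \(\beta\ge\tfrac23\) together with \(\gamma\le\tfrac13\) in the limit.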
Under \(\alpha=\beta=\gamma\), consider players 3 and 4: whatever the common value of \(\alpha=\beta=\gamma\) (\(\le\tfrac13\): 3 chooses A, 4 chooses D; in \((\tfrac13,\tfrac23]\): 3 chooses A, 4 chooses C; \(>\tfrac23\): 3 chooses B, 4 chooses C), player 2 always chooses \(l\). Consider player 1: Out gives 1, while entering (L or R) gives at least 2 (player 1 can mix L,R by \((\tfrac12,\tfrac12)\); player 2 always plays \(l\); player 4 then chooses C, since C gives him 1 and D gives 0.5; so player 1 expects 2). This rules out Out. Player 1 chooses between L and R, player 2 always plays \(l\), and the game reduces to a strategic form game between players 1 and 4:
| \(1\backslash4\) | \(C\ (p)\) | \(D\ (1-p)\) |
|---|---|---|
| \(L\ (q)\) | \((0,2)\) | \((4,0)\) |
| \(R\ (1-q)\) | \((4,0)\) | \((0,1)\) |
There is no pure-strategy NE. Mixed NE: player 1 mixes L,R by \((q,1-q)\) to make player 4 indifferent. Player 4's C gives \(2q\) and D gives \(1-q\), so \(2q=1-q\Rightarrow q=\tfrac13\). Player 4 mixes C,D by \((p,1-p)\) to make player 1 indifferent. Player 1's L gives \(4(1-p)\) and R gives \(4p\), so \(4-4p=4p\Rightarrow p=\tfrac12\). So player 1 mixes \((\tfrac13,\tfrac23)\) and player 4 mixes \((\tfrac12,\tfrac12)\), which pins down \(\alpha=\beta=\gamma=\tfrac13\) (\(\beta=\tfrac13<\tfrac23\), so player 3 chooses A).
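The two indifference conditions can be reproduced with a short script. This is a minimal sketch assuming a 2×2 game with an interior mixed equilibrium; the helper `mixed_ne_2x2` and the matrix names `A` (row player's payoffs) and `B` (column player's payoffs) are illustrative and not from the text.

```python
from fractions import Fraction


def mixed_ne_2x2(A, B):
    """Interior mixed NE of a 2x2 game via the indifference conditions.

    A[i][j], B[i][j]: row and column player's payoffs when the row player
    plays row i and the column player plays column j.
    Returns (q, p): q = P(row 0), p = P(column 0).
    """
    # q makes the column player indifferent between the two columns.
    q_den = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    # p makes the row player indifferent between the two rows.
    p_den = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if q_den == 0 or p_den == 0:
        raise ValueError("indifference conditions are degenerate")
    q = Fraction(B[1][1] - B[1][0], q_den)
    p = Fraction(A[1][1] - A[0][1], p_den)
    if not (0 < q < 1 and 0 < p < 1):
        raise ValueError("no interior mixed equilibrium")
    return q, p


# Rows: player 1's L, R. Columns: player 4's C, D (payoffs from the table).
A = [[0, 4], [4, 0]]   # player 1
B = [[2, 0], [0, 1]]   # player 4
q, p = mixed_ne_2x2(A, B)
print(q, p)            # 1/3 1/2
```

At these probabilities player 1's expected payoff is \(4(1-p)=2\), which matches the earlier lower bound of 2 from entering.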
The unique sequential equilibrium: \(p\): player 2 \(\alpha=\tfrac13\), player 3 \(\beta=\tfrac13\), player 4 \(\gamma=\tfrac13\); \(b\): player 1 mixes L,R by \((\tfrac13,\tfrac23)\), player 2 plays \(l\), player 3 plays A, player 4 mixes C,D by \((\tfrac12,\tfrac12)\). (\(\alpha=\gamma=\tfrac13\) comes from Bayes' rule at \((p,b)\); \(\beta=\tfrac13\) comes from consistency, since under \((p,b)\) the information set \(\{3a,3b\}\) is reached with probability 0 and Bayes' rule says nothing about \(\beta\).) Rmk 10.28: the solution proceeds bottom-up throughout, from SPNE to sequentially rational assessment to imposing consistency, and reduces the complicated game to a two-player strategic form game; the solution path has no branches, so the sequential equilibrium is unique. This completes Part II (Ch 5–10).