5. Behavioral Incentives

Note

Overview. This chapter uses experimental evidence to compare the relative importance of various behavioral incentives.

§5.1 covers the relative-importance experiment of DellaVigna & Pope (2018b): an MTurk button-pressing task (alternately press a and b for 10 minutes) with 18 compensation statements grouped into 9 categories (standard monetary incentives, other piece rates, charitable giving, probability weighting, reference dependence, present bias, gift exchange, task significance, social norms). The conclusions (Figure 5.1) are that standard monetary incentives work, small extrinsic incentives do not crowd out intrinsic motivation, there is no evidence of small-probability overweighting, altruism looks more like "warm glow", there is (weak) evidence of present bias and loss aversion, and social norms matter a lot. A companion prediction experiment with 208 experts shows that many experts predict wrongly, and DellaVigna & Pope (2018a) document a wisdom-of-crowds effect.

§5.2 covers Ariely et al. (2008) on the effect of meaning on effort: a find-the-letter-"s" experiment (Acknowledged / Ignored / Shredded groups, decreasing pay), in which subjects with a sense of meaning supply more labor (Figure 5.2), and a Bionicle-building experiment (Meaningful vs Sisyphus).

§5.3 covers behavioral incentives in development economics: §5.3.1 a field experiment on incentives for public service delivery, Ashraf et al. (2014) (Zambian hairstylists selling condoms, non-financial incentives stronger than financial ones, Figure 5.3); §5.3.2 poverty and cognitive ability, Mani et al. (2013) (cognitive poverty traps, hypothetical scenarios in a mall plus farmers before and after harvest, financial scarcity harms cognitive performance, Figures 5.4–5.5). Figures 5.1–5.5 are all paraphrased.

5.1 Lab Experiment on the Relative Importance of Behavioral Incentives: DellaVigna and Pope (2018)

DellaVigna & Pope (2018b) use a pre-registered MTurk experiment (footnote: MTurk stands for Amazon Mechanical Turk, a crowdsourcing website where businesses can pay remote crowdworkers to complete certain "human intelligence tasks") to study the relative importance of several behavioral factors, each implemented as a treatment.

Important

Experiment design. The experiment ran for three weeks in May 2015 and was pre-registered on the AEA RCT Registry with a stopping rule based on time and sample size: stop after two weeks or once 10,000 participants have been recruited, whichever comes first. The authors mistakenly believed they had registered a three-week data-collection period, so they ran the experiment for three weeks; they state that this had only a minor effect on the results and that they did not stop data collection selectively.

The task is to press the a and b buttons alternately for ten minutes; each successful alternation between a and b counts as one point.

The design is justified on three grounds:
(1) The outcome should be divisible: the average outcomes of different treatment groups should lie fairly far apart, and thousands of button presses generate clear differences.
(2) The outcome should be elastic (responsive) to different incentives: the 10-minute interval is long enough to be tiring, so different incentives can lead agents to behave differently.
(3) The online experiment is easy to implement.
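The registered stopping rule can be written as a one-line check. The following is only a minimal sketch of the rule as described above; the function name `should_stop` and its parameters are illustrative assumptions, not taken from the pre-registration itself.

```python
def should_stop(days_elapsed: int, participants: int,
                max_days: int = 14, max_participants: int = 10_000) -> bool:
    """Return True once either registered limit is reached, whichever comes first.

    Hypothetical helper: the two limits (two weeks, 10,000 participants)
    come from the text; everything else is an illustrative assumption.
    """
    return days_elapsed >= max_days or participants >= max_participants


# Under the registered rule, data collection would already have ended at day 14,
# even if fewer than 10,000 participants had been recruited by then.
print(should_stop(days_elapsed=14, participants=5_000))
print(should_stop(days_elapsed=10, participants=9_999))
```

Because the rule is a disjunction, running for three weeks overshoots it regardless of how many participants had been recruited by day 14.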

Note

The 18 compensation statements (9 categories). The treatment and benchmark groups are given 18 different statements of the compensation rule, which fall into the following 9 categories (quoted from DellaVigna & Pope 2018b):
1. Benchmark treatment (standard incentives): (a) your score will not affect your payment; (b) as a bonus, you will be paid an extra 1 cent for every 100 points; (c) as a bonus, you will be paid an extra 10 cents for every 100 points.
2. Other piece rates: (a) an extra 4 cents for every 100 points; (b) an extra 1 cent for every 1,000 points.
3. Charitable giving: (a) the Red Cross will be given 1 cent for every 100 points; (b) the Red Cross will be given 10 cents for every 100 points.
4. Probability weighting: (a) a 1% chance of being paid an extra USD 1 for every 100 points (one out of every 100 participants is randomly chosen to be paid this reward); (b) a 50% chance of being paid an extra 2 cents for every 100 points (one out of every two participants is randomly chosen).
5. Reference dependence: (a) an extra 40 cents if you score at least 2,000 points; (b) an extra 40 cents, but you will lose this bonus (it will not be placed in your account) unless you score at least 2,000 points; (c) an extra 80 cents if you score at least 2,000 points.
6. Present bias: (a) an extra 1 cent for every 100 points, paid to your account two weeks from today; (b) an extra 1 cent for every 100 points, paid to your account four weeks from today.
7. Gift exchange: (a) in appreciation for performing this task, you will be paid a bonus of 40 cents; your score will not affect your payment.
8. Task significance: (a) your score will not affect your payment; we are interested in how fast people press digits and would like you to do your best.
9. Social norms and comparisons: (a) your score will not affect your payment; after you play, we will show you how well you did relative to other participants; (b) your score will not affect your payment; in a previous version, many participants scored more than 2,000 points.
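The probability-weighting treatments 4(a) and 4(b) have the same expected bonus per 100 points as the certain piece rate 1(b), which is what lets a difference in effort be read as evidence about probability weighting. The sketch below checks that arithmetic using only the rates quoted above. The function name `expected_bonus_cents`, the example score of 2,000 points, and the assumption that only completed blocks of 100 points are paid are illustrative choices, not details from the study.

```python
from fractions import Fraction


def expected_bonus_cents(score: int, prob: Fraction, cents_per_100: int) -> Fraction:
    """Expected bonus in cents when each block of 100 points pays
    `cents_per_100` with probability `prob`.

    Assumption (not from the source): only completed blocks of 100 points count.
    """
    blocks = score // 100
    return prob * cents_per_100 * blocks


# (probability of being paid, cents per 100 points), as quoted in the text.
treatments = {
    "1(b) certain 1 cent": (Fraction(1), 1),
    "4(a) 1% chance of USD 1": (Fraction(1, 100), 100),
    "4(b) 50% chance of 2 cents": (Fraction(1, 2), 2),
}

score = 2000  # hypothetical score, chosen only for illustration
for name, (prob, cents) in treatments.items():
    print(name, float(expected_bonus_cents(score, prob, cents)), "cents expected")
```

All three rules give the same expected bonus at any score, so a risk-neutral agent who weights probabilities linearly has no reason to work harder under 4(a) than under 1(b).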

Important

Results (Figure 5.1). Comparing average button presses (with confidence intervals) across the 18 treatment groups:
  • 1(a), 1(b), 1(c): standard monetary incentives work as expected.
  • 1(a), 1(b): small extrinsic monetary compensation does not crowd out intrinsic motivation.
  • 4(a), 4(b): there is no evidence of small-probability overweighting.
  • 1(b), 1(c) versus 3(a), 3(b): the own-pay treatments show a much larger difference and higher effort than the charity treatments at the same rates. Agents care more about themselves, and for charitable causes they focus more on how much effort they contribute than on the final outcome for the charity, which supports the warm-glow model (4.2) as opposed to impure altruism (4.1).
  • 6(a), 6(b) compared with 1(b): agents are present biased (although the evidence is not very strong), i.e. a bonus paid today has a stronger effect than the same bonus paid in the future.
  • 5(a), 5(b): people are loss averse (again, the evidence is not very strong).
  • 1(a), 9(a), 9(b): social norms play a big role in affecting behavior even without monetary incentives.

Tip

Expert prediction experiment. The authors also invite 208 academic experts to predict the results above. Many experts predict wrongly, expecting a crowd-out effect (1(b) smaller than 1(a)), small-probability overweighting (4(a) greater than 1(b)), and pure altruism (3(b) greater than 3(a)). DellaVigna & Pope (2018a) follow up on the expert-prediction idea and find a wisdom-of-crowds effect: the average forecast is more accurate than 96% of the individual forecasts; citation count, academic rank, field of expertise, and contextual experience do not affect forecast accuracy; and experts as a group outperform non-experts in forecasting experimental results.
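The wisdom-of-crowds comparison can be illustrated with a small calculation: take the absolute error of the average forecast and count how many individual forecasts do worse. The sketch below uses a single outcome and made-up numbers. The function name `share_beaten_by_mean`, the forecasts, and the "true" value are hypothetical, and the original study's accuracy measure may be defined differently.

```python
def share_beaten_by_mean(forecasts: list[float], truth: float) -> float:
    """Share of individual forecasts whose absolute error is strictly larger
    than the absolute error of the average forecast."""
    mean_forecast = sum(forecasts) / len(forecasts)
    mean_error = abs(mean_forecast - truth)
    beaten = sum(1 for f in forecasts if abs(f - truth) > mean_error)
    return beaten / len(forecasts)


# Hypothetical forecasts of average button presses in one treatment,
# and a hypothetical realized value; none of these numbers are from the paper.
toy_forecasts = [1500, 1700, 1900, 2100, 2300]
toy_truth = 1950

print(share_beaten_by_mean(toy_forecasts, toy_truth))
```

In this toy example, individual errors in opposite directions partly cancel in the average, which is the mechanism behind the reported result.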

5.2 Lab Experiment on the Effect of Meaning on Effort: Ariely et al. (2008)

Important

Ariely et al. (2008): meaning as an incentive. Ariely et al. (2008) use lab experiments to study the search for meaning as an incentive for effort.

Find-the-letter experiment: 104 MIT students are randomly assigned to three treatment groups and asked to find instances of consecutive letters "s" on a sheet of paper. The three groups are:
  • Acknowledged: subjects write their name on the sheet before starting the first one and are told the sheet will be put into a folder.
  • Ignored: subjects are not asked to write their names (and none did) and are told the sheet will be put onto a high stack of papers.
  • Shredded: subjects are told their completed sheets are shredded immediately upon completion without being examined.

Pay is decreasing: the first sheet earns USD 0.55, the second USD 0.50, the third USD 0.45, and so on. Results (Figure 5.2): the Acknowledged group has an average reservation wage of 14.85 cents (9.03 sheets), the Ignored group 26.14 cents (6.77 sheets), and the Shredded group 28.29 cents (6.34 sheets). Agents who feel they are doing a meaningful job therefore supply more labor.

Bionicle experiment: Ariely et al. (2008) also run an experiment on building Bionicles (Lego construction toys), which follows the same idea and gives similar results. The subjects are 40 male undergraduates at Harvard, paid USD 2.00 for the first completed Bionicle and 11 cents less for each additional one. There are two groups: in the Meaningful group, each completed Bionicle is placed on the subject's desk; in the Sisyphus group, each completed Bionicle is disassembled for future tasks. The Meaningful group completes 10.6 Bionicles on average, while the Sisyphus group completes 7.2.
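Both pay schedules are linear in the number of units completed, so they can be checked with a few lines of arithmetic. The sketch below is a reading of the schedules described above. It assumes the reported reservation wage is the pay on the last sheet completed, evaluated at the group's mean number of sheets; that interpretation is an assumption, not stated in the text, and all function names are illustrative.

```python
def sheet_pay_cents(n: float) -> float:
    """Pay in cents for the n-th sheet: 55 for the first, 5 cents less for each later sheet."""
    return 55 - 5 * (n - 1)


def bionicle_pay_cents(n: int) -> int:
    """Pay in cents for the n-th Bionicle: 200 for the first, 11 cents less for each later one."""
    return 200 - 11 * (n - 1)


def bionicle_total_cents(k: int) -> int:
    """Total earnings in cents from building k Bionicles."""
    return sum(bionicle_pay_cents(n) for n in range(1, k + 1))


# Assumption: reservation wage = pay on the last sheet completed,
# evaluated at each group's mean number of sheets from the text.
for group, mean_sheets in [("Acknowledged", 9.03), ("Ignored", 6.77), ("Shredded", 6.34)]:
    print(group, round(sheet_pay_cents(mean_sheets), 2), "cents")

# Illustrative whole-number cases close to the two group means (10.6 and 7.2).
print(bionicle_total_cents(10), bionicle_total_cents(7))
```

Under this reading the schedule gives 14.85 cents for the Acknowledged group and about 26.15 and 28.30 cents for the Ignored and Shredded groups, which is within a cent of the reported 26.14 and 28.29. The small gaps are presumably due to rounding of the mean sheet counts.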

Note

Figure 5.2 (sheets completed by group, paraphrased). Three side-by-side histograms, with "number of sheets" (1–12) on the horizontal axis and "number of subjects" on the vertical axis, for the (a) Acknowledged, (b) Ignored, and (c) Shredded groups. The Acknowledged group's distribution is clearly shifted to the right (more subjects complete more sheets, averaging about 9), while the Ignored and Shredded distributions sit further left (averaging about 6–7). The figure shows visually that a sense of meaning raises labor supply.

5.3 Behavioral Incentives in Development Economics

Development economics studies how to improve economic conditions in developing countries, and in particular looks for the most cost-effective policies that facilitate development. Behavioral development economics, which adds a behavioral perspective, draws on a broader set of theories to explain certain puzzles in people's behavior in developing countries.

Important

5.3.1 Field experiment on incentives for public service delivery: Ashraf et al. (2014). Ashraf et al. (2014) conduct a field experiment in Zambia from December 2009 to December 2010 to study the effect of financial and non-financial extrinsic incentives on public service delivery.

Design: they collaborate with a local public health organization, the Society for Family Health (SFH), and embed the experiment in SFH's HIV prevention program. The subjects are 771 hairstylists who take part in a training program and are unaware that they are in an experiment. The hairstylists are asked to inform customers about HIV prevention and to sell female condoms. There are three randomized treatments plus a control:
  • Large financial margin: receive a margin of 90% of the condom's retail price.
  • Small financial margin: receive a margin of 10%.
  • Non-financial: receive no margin, but earn one more star on a thermometer display showing their contribution to community health.
  • Control: volunteers who receive neither financial nor non-financial incentives.

Results (Figure 5.3): the non-financial incentive is stronger than the financial incentives. This implies that public service delivery can be motivated by non-financial rewards if they are designed properly.

Note

Figure 5.3 (average yearly sales by group, paraphrased). A bar chart (with 95% confidence intervals) whose horizontal axis shows the four groups "Large financial / Small financial / Volunteer / Stars (non-financial)" and whose vertical axis shows "average yearly condom sales". Sales in the "Stars" (non-financial) group are clearly the highest (about 15), while the other three groups, including both financial incentives, are similar and lower (about 6–7). The non-financial incentive far outperforms the financial incentives.

Important

5.3.2 Poverty and cognitive ability: Mani et al. (2013). Mani et al. (2013) study cognitive poverty traps (footnote: a poverty trap is a situation in which agents become even poorer because of the scarcity created by their initial poverty) by testing the hypothesis that poverty impedes cognitive function. They run two experiments.

First experiment: 101 shoppers of different financial backgrounds in a New Jersey mall are given four hypothetical scenarios, and participants are randomly assigned to "easy" or "hard" versions. For example, the hard scenario is "your car needs USD 1,500 to be fixed", while the easy scenario is "your car needs USD 150 to be fixed". The easy scenario creates no pressure for either poor or rich participants, while the hard scenario is assumed to create cognitive scarcity for the poor but not for the rich. Participants are then asked to complete Raven's matrices (often used in IQ tests) and cognitive control tasks. Results (Figure 5.4): under the easy scenario the performances of rich and poor participants are close to each other; under the hard scenario the difference between rich and poor is significant. The authors also run further experiments in the mall setting to rule out other explanations.

Second experiment: with farmers as subjects, farmers are randomly asked to complete Raven's matrices and cognitive control tasks before harvest (poorer) and after harvest (richer), and the authors measure Raven's-matrices accuracy, cognitive-control response time (RT), and error rates. Results (Figure 5.5): farmers' cognitive performance is better after harvest (richer) than before harvest (poorer).

These results imply that agents under cognitive scarcity (arising from financial scarcity) tend to perform worse. The paper is relevant to behavioral incentives because agents have limited cognitive capacity that varies across situations, which can affect how incentives are formed and acted upon.

Note

Figures 5.4 and 5.5 (performance on cognitive tasks, paraphrased). Figure 5.4 (mall experiment): two side-by-side groups of bars (Raven's matrices accuracy, cognitive control accuracy), each with bars by "Poor / Rich" × "Hard / Easy". Under the easy scenario the accuracies of poor and rich participants are close, while under the hard scenario the accuracy of poor participants is clearly lower than that of rich participants (the difference is marked with significance stars). Figure 5.5 (farmer experiment): three panels (Raven's matrices accuracy, cognitive control response time RT, cognitive control errors), each with bars by "Pre-harvest / Post-harvest". Post-harvest farmers show higher accuracy, shorter RT, and fewer errors, i.e. better cognitive performance when richer (all differences significant).

References

  • Ariely, D., Kamenica, E., & Prelec, D. (2008). Man's Search for Meaning: The Case of Legos. Journal of Economic Behavior & Organization, 67(3-4), 671–677.
  • Ashraf, N., Bandiera, O., & Jack, B. K. (2014). No Margin, No Mission? A Field Experiment on Incentives for Public Service Delivery. Journal of Public Economics, 120, 1–17.
  • DellaVigna, S., & Pope, D. (2018a). Predicting Experimental Results: Who Knows What? Journal of Political Economy, 126(6), 2410–2456.
  • DellaVigna, S., & Pope, D. (2018b). What Motivates Effort? Evidence and Expert Forecasts. Review of Economic Studies, 85(2), 1029–1069.
  • Mani, A., Mullainathan, S., Shafir, E., & Zhao, J. (2013). Poverty Impedes Cognitive Function. Science, 341(6149), 976–980.
