Monitoring versus Discounting in Repeated Games
Takuo Sugaya
Stanford GSB
Alexander Wolitzky
MIT
July 10, 2023
Abstract
We study how discounting and monitoring jointly determine whether cooperation is possible in repeated games with imperfect (public or private) monitoring. Our main result provides a simple bound on the strength of players' incentives as a function of discounting, monitoring precision, and on-path payoff variance. We show that the bound is tight in the low-discounting/low-monitoring double limit, by establishing a public-monitoring folk theorem where the discount factor and the monitoring structure can vary simultaneously.
Keywords: repeated games, monitoring precision, blind game, occupation measure, χ²-divergence, variance decomposition, folk theorem, frequent actions
JEL codes: C72, C73
Previous title: "Informational Requirements for Cooperation." We thank many seminar participants and the editor and referees for helpful comments. Wolitzky acknowledges financial support from the NSF.
1 Introduction
Supporting non-static Nash outcomes in long-run relationships requires two ingredients. Players' actions must be monitored, so that future play can depend on current behavior. And players must be patient, so that variation in future play can provide incentives. This paper asks how to measure these ingredients, and how much of each is required. We find that if the ratio of the discount rate to the "detectability" of deviations is large, then all repeated-game Nash outcomes are static ε-correlated equilibria (Theorem 1); and if the ratio of discounting to detectability is small, then, under public monitoring, all payoff vectors that Pareto-dominate static Nash payoffs can be attained as perfect equilibria in the repeated game (Theorem 2).
Our paper is in the tradition of the folk theorem for repeated games with public monitoring (Fudenberg, Levine, and Maskin, 1994; henceforth FLM), but Theorem 1 allows arbitrary (possibly private) monitoring, and both Theorems 1 and 2 concern the tradeoff between discounting and monitoring precision, rather than the classical limit where discounting vanishes for fixed monitoring. A similar tradeoff between discounting and monitoring arises in repeated games with frequent actions (Abreu, Milgrom, and Pearce, 1991; Sannikov and Skrzypacz, 2010; henceforth SS), but we do not parameterize the game by an underlying continuous-time signal process, and instead view the frequent-action limit as a particular instance of a low-discounting/low-monitoring double limit. Our results do have implications for games with frequent actions, as well as other applications. These include games with many players, where a large population of players is monitored by a noisy aggregate signal; and the question of the rate of convergence of the equilibrium payoff set as discounting and monitoring vary. We discuss these applications at the end of the paper and pursue them further in companion papers (Sugaya and Wolitzky, 2023a,b).
Our negative result (Theorem 1) involves some new ideas. First, we focus on overall monitoring precision, rather than how signals are distributed among the players. Specifically, we consider the blind game B associated with any repeated game Γ, where the signals that were observed by the players in Γ are instead observed by a neutral mediator. We interpret B as the repeated game where society has the same amount of information as in Γ, but this information is distributed so as to support a maximally wide range of equilibrium outcomes. Theorem 1 provides a necessary condition for cooperation in B. A fortiori, the same condition applies to Γ itself, as well as to any other repeated game where the same signals are distributed differently—that is, to any repeated game with the same blind game.
Second, we measure the average strength of a player's incentives over all histories that arise in the course of the game. This notion is captured by a player's maximum deviation gain at the occupation measure over actions induced by an equilibrium. Here our approach contrasts with earlier work that analyzes incentives history-by-history (e.g., Fudenberg, Levine, and Pesendorfer, 1998; al-Najjar and Smorodinsky, 2000, 2001; Awaya and Krishna, 2016, 2019). It yields sharper results, because sometimes an equilibrium can be constructed that provides strong incentives at a particular history by letting continuation play depend disproportionately on behavior at that history, but such a construction necessarily provides weaker incentives at other histories.
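The occupation measure referenced here can be made concrete: it is the discounted long-run frequency with which each action is played along the equilibrium path, ρ(a) = (1 − δ) Σ_t δ^t Pr(a_t = a). A minimal Monte Carlo sketch, where the two-action policy is hypothetical and chosen purely for illustration:

```python
import random
from collections import Counter

def occupation_measure(policy, delta, n_paths=2000, horizon=200, seed=0):
    """Estimate the discounted occupation measure over actions:
    rho(a) = (1 - delta) * sum_t delta^t * Pr(a_t = a)."""
    rng = random.Random(seed)
    weights = Counter()
    for _ in range(n_paths):
        state = "start"
        for t in range(horizon):
            a = policy(state, rng)
            weights[a] += (1 - delta) * delta ** t
            state = a  # continuation play may condition on current behavior
    total = sum(weights.values())  # normalize (finite horizon truncates the tail)
    return {a: w / total for a, w in weights.items()}

# Hypothetical stationary policy: cooperate with probability 0.9.
def policy(state, rng):
    return "C" if rng.random() < 0.9 else "D"

rho = occupation_measure(policy, delta=0.95)
```

For a stationary policy the occupation measure simply recovers the per-period mixing probabilities; for history-dependent equilibria it aggregates incentives across all on-path histories, which is what Theorem 1's bound is assessed at.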
Third, we measure the detectability of a deviation by the χ²-divergence—the variance of the likelihood ratio difference—of the signal distribution under the deviation from that under equilibrium play. The χ²-divergence is a standard measure of statistical distance. As we explain in Section 2, this measure enters the analysis because supporting efficient payoffs in repeated games requires minimizing the variance of continuation payoffs subject to incentive constraints, and this minimum variance is inversely proportional to the χ²-divergence.
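Concretely, χ²(P‖Q) = Σ_y (P(y) − Q(y))²/Q(y), which coincides with the variance under Q of the likelihood-ratio difference P(y)/Q(y) − 1. A quick numerical check with hypothetical signal distributions:

```python
def chi_squared(p, q):
    """chi^2(P || Q) = sum_y (p(y) - q(y))^2 / q(y)."""
    return sum((py - qy) ** 2 / qy for py, qy in zip(p, q))

def lr_diff_variance(p, q):
    """Variance under Q of the likelihood-ratio difference p(y)/q(y) - 1.
    The mean of the difference under Q is sum_y (p(y) - q(y)) = 0."""
    return sum(qy * (py / qy - 1) ** 2 for py, qy in zip(p, q))

q = [0.5, 0.3, 0.2]   # signal distribution under equilibrium play (hypothetical)
p = [0.4, 0.3, 0.3]   # signal distribution under a deviation (hypothetical)
divergence = chi_squared(p, q)
```

The two computations agree term by term, since q·(p/q − 1)² = (p − q)²/q; a more detectable deviation shifts the signal distribution more, raising the divergence and hence relaxing the incentive bound.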
In sum, Theorem 1 may be summarized as stating that, for any repeated game Γ, any Nash equilibrium outcome ρ in the associated blind game B, and any possible deviation by any player, we have deviation gain ≤ √(δ/(1-δ)) * √(detectability * payoff variance), where the deviation gain, detectability (measured by χ²-divergence), and payoff variance are all assessed at the equilibrium occupation measure. The proof relies on a simple but novel variance decomposition argument. The idea is that, if deviating from non-static Nash play is unprofitable, then signals must vary significantly with actions, and continuation payoffs must vary significantly with signals; and, moreover, this payoff variation must arrive relatively quickly due to discounting. Theorem 1 shows that recursively decomposing continuation payoffs across periods tightly bounds the strength of players' incentives.
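Written as a display (with notation chosen here for exposition; the paper's formal statement may use different symbols), the bound reads:

```latex
\[
  \underbrace{g_i(\rho)}_{\text{deviation gain}}
  \;\le\;
  \sqrt{\frac{\delta}{1-\delta}}
  \,\sqrt{\underbrace{\chi^2(\rho)}_{\text{detectability}}
          \cdot \underbrace{\operatorname{Var}(\rho)}_{\text{payoff variance}}},
\]
```

so that as δ → 0 the right-hand side vanishes unless detectability or payoff variance grows correspondingly, which is the sense in which discounting and monitoring trade off against each other.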
Our positive result (Theorem 2) is a partial converse to Theorem 1. It shows that, under public monitoring, the tradeoff between discounting and monitoring expressed in Theorem 1 is tight up to constant factors in the low-discounting/low-monitoring double limit. Theorem 2 is an extension of the folk theorems of FLM, Kandori and Matsushima (1998; henceforth KM), and SS. It generalizes FLM and KM by letting discounting and monitoring vary simultaneously, and it generalizes SS by considering the general low-discounting/low-monitoring double limit, rather than parameterizing monitoring by an underlying continuous-time signal process. A limitation of Theorem 2 is that it assumes that monitoring has a product structure. This assumption facilitates an easy comparison with Theorem 1, but it is overly strong from the perspective of prior work such as FLM, KM, and SS. However, we prove Theorem 2 as a corollary of a more general result, Theorem 3, which we present in the appendix, and which does not assume product structure monitoring.
The tradeoff we find between discounting and monitoring has a clear interpretation. In probability theory, the sum of the conditional variances of a martingale's increments is often a useful measure of the "intrinsic time" experienced by the martingale (Dubins and Savage, 1965; Freedman, 1975). Analogously, our results show precisely that repeated-game equilibrium play is approximately myopic if players are impatient, and a folk theorem holds if players are patient, where patience is measured relative to the intrinsic time experienced by a martingale with likelihood ratio difference increments, rather than calendar time.
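To illustrate the "intrinsic time" notion: for a martingale whose increment at date t has conditional variance σ_t², the intrinsic time elapsed through date T is the running sum Σ_{t≤T} σ_t², which can run faster or slower than calendar time. A minimal sketch with hypothetical per-period variances:

```python
def intrinsic_time(conditional_variances):
    """Cumulative intrinsic time experienced by a martingale whose increment
    at date t has conditional variance conditional_variances[t]."""
    total, path = 0.0, []
    for v in conditional_variances:
        total += v
        path.append(total)
    return path

# Hypothetical schedule: precise monitoring early, noisy monitoring later.
# Calendar time advances one unit per period, but most intrinsic time
# accrues in the first two periods.
variances = [1.0, 1.0, 0.1, 0.1, 0.01]
path = intrinsic_time(variances)
```

Under this reading, a player is "patient" when the discount rate is small relative to how quickly intrinsic time accumulates, not relative to the number of calendar periods.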
Article Summary
This paper examines how the discount factor and monitoring precision jointly determine whether cooperation is possible in repeated games. Theorem 1 provides a mathematical bound showing that when the discount rate is high relative to the detectability of deviations, cooperation is hard to sustain; conversely, under public monitoring, when that ratio is low, Pareto-improving equilibria can be attained. The paper introduces concepts such as the blind game and χ²-divergence, analyzing the effect of monitoring from the perspectives of average incentive strength and statistical distance, and establishes the tightness of the bound via a variance-decomposition argument. Theorem 2 further extends the classical folk theorems by allowing discounting and monitoring to vary simultaneously, drawing an analogy between intrinsic time and patience in the game. The results have implications for applications such as games with frequent actions and large-population games.
Evaluations by Teacher Gao Deming
Transactional Analysis (TA) communication perspective: From the viewpoint of TA communication analysis, this paper exemplifies rational thinking in the "Adult ego state." The authors present complex game theory in a clear, structured way, like an "Adult" calmly analyzing data, avoiding the judgmental or emotional expression of the "Parent" or "Child" states. This communication style deepens academic dialogue, and their effort to build a shared framework of understanding is commendable; in the future it may inspire more interdisciplinary collaboration and help move theory toward practical application.
Solution-focused psychology perspective: This paper embodies the core spirit of solution-focused psychology: attending to "what is possible" rather than to the problem itself. Rather than dwelling on failures of cooperation, the research constructively builds mathematical tools to delineate the conditions under which cooperation can succeed, much like searching for "exception moments." The authors' goal-oriented focus on the balance between discounting and monitoring deserves praise; this constructive perspective may guide the design of more efficient incentive mechanisms and promote better social cooperation.
Buddhist scholar perspective: From a Buddhist viewpoint, this paper reveals the profound truth of "dependent origination": cooperation does not exist in isolation but arises from the coming together of conditions, namely discounting (time preference) and monitoring (informational conditions). Like contemplating "impermanence," the research shows that cooperation may arise or cease as conditions change. The authors' wise insight into the interdependence within games is praiseworthy; such awareness may inspire more compassionate institutional design, reduce conflict, and increase the welfare of all beings.