
A Crises-Bailouts Game∗

WP 22-01

Bruno Salcedo† (Western University), Bruno Sultanum‡ (Federal Reserve Bank of Richmond), Ruilin Zhou§ (The Pennsylvania State University)

January 5, 2022

This paper studies the optimal design of a liability-sharing arrangement as an infinitely repeated game. We construct a noncooperative model with two agents: one active and one passive. The active agent can take a costly and unobservable action to reduce the incidence of crisis, but a crisis is costly for both agents. When a crisis occurs, each agent decides unilaterally how much to contribute to mitigating it. For the one-shot game, when the avoidance cost is too high relative to the expected loss of crisis for the active agent, the first-best is not achievable. That is, the active agent cannot be induced to put in effort to minimize the incidence of crisis in a static game. We show that with the same stage-game environment, the first-best cannot be implemented as a perfect public equilibrium (PPE) of the infinitely repeated game either. Instead, at any constrained efficient PPE, the active agent “shirks” infinitely often, and when a crisis happens, the active agent is “bailed out” infinitely often. The frequencies of crisis and bailout are endogenously determined in equilibrium. A welfare-optimal equilibrium characterized by recurrent crises and bailouts is consistent with historical episodes of financial crises, which feature varying frequency and varied external responses for troubled institutions and countries in the real world. We explore some comparative statics of the PPEs of the repeated game numerically.

JEL classification: C73 · D82.
Keywords: Bailouts · Moral hazard · Repeated games · Imperfect monitoring · Second best.

∗ We thank Ed Green, V. Bhaskar, Neil Wallace, and Rishabh Kirpalani for helpful discussion and comments.
The views expressed are those of the authors and do not necessarily reflect those of the Federal Reserve Bank of Richmond or the Board of Governors.
† Department of Economics, Western University, bsalcedo@uwo.ca
‡ Federal Reserve Bank of Richmond, bruno@sultanum.com
§ Department of Economics, The Pennsylvania State University, rzhou@psu.edu

1. Introduction

Some institutional arrangements intrinsically have high incentive costs. These costs surface in adverse situations. Sometimes attempts to lessen the damage may appear less cogent, while other times much harsher crisis management solutions are implemented. In the past, these different treatments seemed ad hoc and random, which economists often refer to as time inconsistent. In this paper, we argue that this seemingly random pattern of crisis management may be approximately optimal to sustain the relationship and to minimize long-term cost.

Historically, in episodes of financial turmoil, some troubled institutions have been bailed out, and others have not. As a result, the fate of these troubled parties ranges from complete failure/bankruptcy to full recovery.1 Typically, a troubled institution gets bailed out on the grounds that the alternative (failure) would have been a lot more costly, at least in the short run, since it might impose a big negative externality on many related parties. When this happens, economists are always quick to point out a fatal flaw of such a rescue operation, namely moral hazard: the “too big to fail” justification of a bailout encourages behaviors that may lead to more failures. This incentive cost serves as the rationale for not bailing out some troubled institutions.

We want to build an alternative theory of crises and bailouts that accounts for this very wide spectrum of outcomes. In view of the diverse outcomes, and of the tension between ex-ante and ex-post efficiency, questions about economic efficiency are especially salient.
We model this problem as a dynamic game between the “crisis-inflicting” party and the potential “help-to-clean-up” party. We construct a noncooperative, two-player model where an active agent takes a costly, unobservable action to reduce the incidence of crisis (avoidance). Whenever a crisis occurs, both parties suffer. Each agent decides unilaterally how much to contribute to contain the loss (mitigation). It is assumed that both players have nontransferable utility. They can contribute directly to reduce the loss from the crisis but not to increase each other’s consumption. This assumption rules out a direct subsidy from the passive agent to the active agent to pay his avoidance cost.

1 For example, while the majority of US companies sink or float on their own, the US government has consistently bailed out large corporations in the automobile industry such as GM and Chrysler. Among financial institutions, the Federal Reserve Bank significantly assisted large banks like Citigroup and Bank of America with loans and guarantees, while it let other large financial institutions such as Lehman Brothers and Washington Mutual fail during the 2007–2009 Great Recession. Among sovereign countries, the US government helped Mexico survive the 1994 Tequila crisis, while many countries suffered huge losses during the 1997 Asian financial crisis with little help from the IMF. During the recent Euro-zone crisis, Greece, Italy, Spain, and other potential problem countries received multiple rounds of bailouts of varying magnitudes.

The one-shot game can have any combination of avoidance/mitigation patterns as the static Nash equilibrium. In particular, when the avoidance cost is too high relative to the expected loss of crisis for the active agent, he cannot be induced to take the socially desirable but costly avoidance action at a static Nash equilibrium. As a result, a crisis is more likely to happen.
In this context, we consider what can be accomplished with the infinite repetition of the one-shot game. We study the perfect public equilibria (PPE) of the repeated game. We show that in the environment where shirking by the active agent is the only static Nash equilibrium outcome of the stage game, the first-best outcome, which requires that he take the avoidance action every period, cannot be implemented as a PPE of the infinitely repeated game. This is because, in order to induce the active agent to take the costly avoidance action, his expected mitigation cost in case of a crisis must be even higher. With both a high avoidance cost and a high mitigation cost, the active agent is better off doing nothing. In order to compensate the active agent for taking the avoidance action sometimes, he has to be allowed to shirk at other times and/or be bailed out (pay less than his share of the mitigation cost) when crises happen. Based on this intuition, we show that at any constrained efficient PPE, the active agent shirks infinitely often, and when crises happen, the active agent is bailed out infinitely often. As a result, crises occur more often than in the first-best outcome.

Given that the constrained efficient allocation is necessarily achieved with stochastic shirking and bailouts, we approximate it with the equilibrium allocation of a finite-state automaton representation of the original game. With numerical examples, we show that a particular PPE, where the active agent shirks sometimes and is bailed out other times, can yield a welfare level much higher than the repetition of the static Nash equilibrium, and that the welfare loss relative to the first-best allocation varies with the parameters of the model. The corresponding PPE, characterized by recurrent crises and bailouts, is consistent with historical episodes of financial crises with varying frequency and varied external responses for troubled institutions and countries in the real world.
As in Green and Porter (1984), such a phenomenon reflects an equilibrium that passes recurrently through several distinct states, rather than independent randomization by individual agents. It is a deliberate arrangement that uses both occasional shirking and bailouts as mechanisms to incentivize good behavior by the active agent as often as possible. We explore some comparative statics of the PPEs of the N-state automaton numerically.

Our paper is closely related to studies of the incentive problem induced by bailout policies. Most papers in the literature take particular institutional designs and market structures very seriously but abstract from strategic dynamic interactions. We simplify the environment in several dimensions to make it tractable to illustrate the mechanism of stochastic bailout, but complicate the analysis by taking seriously the dynamic game played by the two parties involved in a bailout (the one who bails out and the one who is bailed out). This difference is not only technical, but also has interesting economic implications. In most papers, bailouts generate bad incentives for private agents. This is also true in our model if we only consider the stage game. However, we show that, in the repeated setting, promises of future bailouts are used to generate good incentives and reduce the incidence of crisis. Such strategic behavior is likely to be present in repeated interactions between long-lived large agents, such as members of the European Union or a government and a large corporation.

Two examples of papers highlighting the negative incentives generated by bailout policies are Farhi and Tirole (2012) and Chari and Kehoe (2015).2 Chari and Kehoe (2015) study the time inconsistency problem of bailouts. The paper focuses on the dynamic policy decision of a bailout authority who cannot commit to future actions (like the two players in our model).
Farhi and Tirole (2012) consider a commitment problem from the government side, but they focus on the strategic complementarity of risk-taking behavior by firms. The latter paper studies a finite stage game, and the former assumes bailout policies to be noncontingent on agents’ identities. As a result, neither studies the dynamic strategic behavior between a government and a firm.

Green (2010) and Keister (2016) highlight that bailouts can be welfare enhancing but not through incentives. Keister (2016) studies a version of Diamond and Dybvig (1983) that allows the government to divert tax funds from public goods to bail out banks, and highlights two important implications of bailouts. On the one hand, bailouts induce bad behavior by banks, leading them to become less cautious and more illiquid. On the other hand, bailouts in that environment also provide insurance to depositors. Keister (2016) shows that, when the probability of a crisis is small, the insurance effect dominates. Similarly, in Green (2010), the welfare-enhancing benefit of bailouts comes from the fact that, once we are in a regime with limited-liability firms, bailouts are necessary for firms to provide perfect risk sharing. The mechanism that makes bailouts welfare enhancing in these models, however, is very different from ours and doesn’t have the incentive properties we highlight.

2 Other examples are Schneider and Tornell (2004) and Ennis and Keister (2009).

The paper is structured as follows. Section 2 introduces the stage game, while Section 3 describes the repeated game. Section 4 contains our main theoretical results. Section 5 uses numerical exercises to illustrate how the incentive mechanism works and to explore some comparative statics. Section 6 provides a discussion of alternative mechanisms under different assumptions. Finally, Section 7 concludes.

2. The stage game

There are two agents, agent 1 and agent 2, and two subperiods.
In the first subperiod, agent 1 either takes an avoidance action to avert a crisis, a = 1, or not, a = 0. The cost of taking the avoidance action is d > 0, and the cost of not taking the avoidance action is normalized to zero. Agent 1’s action a is unobservable to agent 2. In the second subperiod one of two things happens: either there is a crisis, denoted by ξ = 1, or there is no crisis, denoted by ξ = 0. The probability of a crisis, conditional on agent 1’s action in the first subperiod, a ∈ {0, 1}, is π_a ∈ (0, 1). We assume that π_1 < π_0, so that taking the avoidance action reduces the probability of crisis. However, since both probabilities lie strictly between zero and one, agent 2 cannot infer agent 1’s action from observing whether there is a crisis. Throughout the text, we refer to agent 1 as the active agent, given that his action affects the probability of a crisis, and to agent 2 as the passive one, since he is forced to face the consequences of a crisis but has no influence on its occurrence.

In the event of a crisis, ξ = 1, the two agents can jointly mitigate the crisis. Let m_i ≥ 0 denote agent i’s contribution to mitigation. The crisis is mitigated if the total contribution of the two agents, m_1 + m_2, is no less than one. If the crisis is mitigated (m_1 + m_2 ≥ 1), the cost to agent i is only his mitigation contribution m_i. If the crisis is not mitigated (m_1 + m_2 < 1), agent i suffers a loss c_i > 0 from the crisis in addition to his contribution m_i. If there is no crisis, ξ = 0, nothing needs to be mitigated and agents do not suffer any loss.

It is implicit in the payoff structure that there is no transferable utility; the contribution m_1 + m_2 is made only to mitigate a crisis. Neither party can consume it once it is made. The two agents cannot make payments to each other in the first subperiod either. In Section 6, we show that relaxing this assumption would greatly reduce the difficulty of achieving a better allocation in equilibrium. Figure 1 summarizes the structure of the stage game.
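As a rough illustration of the payoff structure just described, the following Python sketch computes each agent's realized payoff in one play of the stage game. The function name `realized_payoffs`, the tolerance `TOL`, and the numerical values (borrowed from the example in Section 5.1) are illustrative assumptions, not part of the paper.

```python
TOL = 1e-12  # assumed tolerance so that contributions summing to one count as mitigation


def realized_payoffs(a, crisis, m1, m2, d, c1, c2):
    """Realized stage payoffs (agent 1, agent 2) for one play of the stage game.

    a      : 1 if agent 1 takes the avoidance action, 0 otherwise
    crisis : True if a crisis occurs (xi = 1)
    m1, m2 : mitigation contributions, paid only if a crisis occurs
    """
    if not crisis:
        # No crisis: nothing to mitigate; agent 1 still bears the avoidance cost.
        return -a * d, 0.0
    mitigated = m1 + m2 >= 1.0 - TOL
    loss1 = 0.0 if mitigated else c1
    loss2 = 0.0 if mitigated else c2
    return -a * d - m1 - loss1, -m2 - loss2


# Illustrative values taken from the example in Section 5.1.
d, c1, c2 = 0.5, 0.6, 0.5
print(realized_payoffs(1, True, 0.61, 0.39, d, c1, c2))   # crisis, mitigated
print(realized_payoffs(0, True, 0.0, 0.0, d, c1, c2))     # crisis, not mitigated
print(realized_payoffs(1, False, 0.61, 0.39, d, c1, c2))  # no crisis
```

The three calls cover the three kinds of terminal outcomes of the stage game: a mitigated crisis, an unmitigated crisis, and no crisis.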
[Figure 1: The stage game. Agent 1 chooses a ∈ {0, 1}; Nature then draws ξ ∈ {0, 1}. After ξ = 1, agents 1 and 2 choose m_1 and m_2, and payoffs are −m_1 − c_1 I{m_1 + m_2 < 1} − ad for agent 1 and −m_2 − c_2 I{m_1 + m_2 < 1} for agent 2. After ξ = 0, payoffs are −ad for agent 1 and 0 for agent 2.]

We are interested in studying the case where mitigation after a crisis is ex-post efficient. Therefore, we make the following assumptions on the model parameters.

Assumption 1. For i = 1, 2, c_i ∈ (0, 1), and c_1 + c_2 > 1.

Assumption 1 implies that neither agent alone is willing to mitigate the crisis, but together they should. Since the total cost of mitigation is less than the total loss if the crisis is not mitigated (that is, 1 < c_1 + c_2), mitigation is efficient.

2.1. Equilibrium of the stage game

The structure of the game allows us to restrict attention to pure and public strategies without loss of generality. We thus solve for (pure-strategy, public-perfect) Nash equilibria of the two-subperiod normal-form game.

Denote the strategy for the active agent 1 by (a, m_1), and for the passive agent 2 by m_2, where a is agent 1’s avoidance action, and m_i is agent i’s mitigation contribution, i = 1, 2. Given the strategy profile (a, m_1, m_2), agent 1’s expected payoff is

$$ u_1(a, m_1, m_2) = -a d - \pi_a \left( m_1 + c_1 \mathbb{I}\{m_1 + m_2 < 1\} \right), \tag{1} $$

and agent 2’s expected payoff is

$$ u_2(a, m_1, m_2) = -\pi_a \left( m_2 + c_2 \mathbb{I}\{m_1 + m_2 < 1\} \right). \tag{2} $$

As usual, a Nash equilibrium of the stage game is a strategy profile (a, m_1, m_2) such that (a, m_1) is a best response for agent 1 given the strategy m_2 of agent 2, and m_2 is a best response for agent 2 given the strategy (a, m_1) of agent 1.

The stage game has many equilibria, and the set of equilibria depends on the parameters. We are interested in the parameter regions where the efficient outcome cannot be supported as an equilibrium outcome. First, there are multiple best responses in the mitigation stage after a crisis. In particular, by Assumption 1, after a crisis, not contributing to mitigation, m_i = 0, is always a best response if the other agent is doing the same, although this is ex-post inefficient.
However, if agent 1 contributes m_1 and agent 2 contributes the remaining 1 − m_1, then each agent must weakly prefer paying his share to bearing the crisis loss, so m_1 ≤ c_1 and m_2 = 1 − m_1 ≤ c_2. That is, for (m_1, 1 − m_1) to be both agents’ best mitigation responses to each other, we must have 1 − c_2 ≤ m_1 ≤ c_1. This mitigation outcome is ex-post efficient. Profiles with m_1 + m_2 > 1, or with m_1 + m_2 < 1 and either m_1 > 0 or m_2 > 0, can be easily ruled out, since at least one agent would be strictly better off by decreasing his mitigation contribution.

When deciding whether to take the avoidance action, agent 1 weighs the cost d against the expected gain of taking the action and successfully avoiding a crisis, which is either his expected contribution (π_0 − π_1)m_1 or his expected loss (π_0 − π_1)c_1. Let d̂ ≡ d/(π_0 − π_1) be the cost of avoidance adjusted by its impact on the probability of crisis. If d̂ ≤ c_1, the efficient profile where agent 1 takes the avoidance action and the crisis is mitigated is always an equilibrium, and it dominates all other equilibria. This is the uninteresting case, since the static Nash equilibrium (or the repetition of it in the repeated game) achieves the first-best outcome. We rule out this case and assume that d̂ > c_1.

Given the equilibrium restriction m_1 ∈ [1 − c_2, c_1], the assumption d̂ > c_1 implies that a = 0 is always agent 1’s optimal action. In this case, agent 1 never takes the avoidance action, and there are only two types of equilibrium: a nonmitigation equilibrium where (a, m_1, m_2) = (0, 0, 0), and a continuum of mitigation equilibria where (a, m_1, m_2) = (0, m_1, 1 − m_1) and m_1 ∈ [1 − c_2, c_1]. These equilibria are inefficient when d̂ < 1, since the cost of the action, d, is then less than the expected social gain of avoiding a crisis, π_0 − π_1. This is exactly the situation we are looking for. The inequality d̂ > c_1 guarantees that there is no avoidance in equilibrium, and d̂ < 1 guarantees that all equilibria of the stage game are inefficient.

Assumption 2. c_1 < d̂ < 1.
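The characterization above can be checked numerically. The sketch below is only an illustration under stated assumptions: it evaluates the expected payoffs of the stage game (with the crisis loss incurred when m_1 + m_2 < 1) and tests candidate profiles against unilateral deviations on a grid. The helper names `expected_payoffs` and `is_stage_nash`, the grid size, and the tolerance are hypothetical choices, and the parameter values are those of the example in Section 5.1. A grid check is approximate and is not a proof.

```python
TOL = 1e-9  # assumed numerical tolerance


def expected_payoffs(a, m1, m2, d, c1, c2, pi0, pi1):
    """Expected stage payoffs (u1, u2); the loss c_i is incurred only if m1 + m2 < 1."""
    pi = pi1 if a == 1 else pi0
    unmitigated = 1.0 if m1 + m2 < 1.0 - TOL else 0.0
    u1 = -a * d - pi * (m1 + c1 * unmitigated)
    u2 = -pi * (m2 + c2 * unmitigated)
    return u1, u2


def is_stage_nash(a, m1, m2, d, c1, c2, pi0, pi1, n_grid=301):
    """Grid check that no unilateral deviation is strictly profitable."""
    u1, u2 = expected_payoffs(a, m1, m2, d, c1, c2, pi0, pi1)
    grid = [1.5 * k / (n_grid - 1) for k in range(n_grid)]  # contributions in [0, 1.5]
    # Agent 1 may deviate in both the avoidance action and the contribution.
    for a_dev in (0, 1):
        for m in grid:
            if expected_payoffs(a_dev, m, m2, d, c1, c2, pi0, pi1)[0] > u1 + TOL:
                return False
    # Agent 2 may deviate only in the contribution.
    for m in grid:
        if expected_payoffs(a, m1, m, d, c1, c2, pi0, pi1)[1] > u2 + TOL:
            return False
    return True


# Illustrative parameters from Section 5.1.
params = dict(d=0.5, c1=0.6, c2=0.5, pi0=0.9, pi1=0.2)
d_hat = params["d"] / (params["pi0"] - params["pi1"])  # adjusted avoidance cost

print(round(d_hat, 4))
print(is_stage_nash(0, 0.0, 0.0, **params))    # nonmitigation profile
print(is_stage_nash(0, 0.55, 0.45, **params))  # mitigation profile with m1 in [1 - c2, c1]
print(is_stage_nash(1, 0.55, 0.45, **params))  # same shares, with avoidance
print(is_stage_nash(0, 0.70, 0.30, **params))  # m1 above c1
```

Under these parameters the first two profiles pass the check and the last two fail: with avoidance, agent 1 gains by switching to a = 0, and with m_1 above c_1 he gains by contributing nothing.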
We impose Assumption 2 on the parameters of the model, and in the next section, we investigate what can be achieved in this region for the repeated game.

3. The repeated game

In the repeated game, time is discrete and is indexed by t ∈ {1, 2, . . .}. The two agents live forever and discount future payoffs with the same discount factor δ ∈ (0, 1). At the beginning of each period t, agents observe a payoff-irrelevant public signal θ_t ∼ U[0, 1], which is i.i.d. across periods. After observing the public signal, the agents play the stage game described in the previous section. The public signal allows agents to take correlated actions in each period. This serves a technical purpose: it convexifies the payoff set without explicitly considering randomized strategies for agent 1’s action a_t.

The public information at the beginning of date t is denoted by h_t ∈ H_t. It consists of the realizations of all past and current public signals, the history of all past crises, and the history of all past contributions. We focus on perfect Bayesian equilibria where both agents play pure and public strategies. Proposition A.2 establishes that this restriction is without loss of generality (see Appendix A for details).

A public strategy for the active agent 1 is a sequence of measurable functions σ_1 = (α_t, µ_{1t})_{t=1}^∞, where α_t(h_t) ∈ {0, 1} is the avoidance action of the active agent in period t given the public information h_t, and µ_{1t}(h_t) ≥ 0 is his contribution to mitigation in case of a crisis. A public strategy for the passive agent is a sequence of functions σ_2 = (µ_{2t})_{t=1}^∞, where µ_{2t}(h_t) ≥ 0 is his contribution to mitigation in case of a crisis. We denote a public strategy profile by σ = (σ_1, σ_2).
The expected discounted utility for agent i from date t onward, given the strategy profile σ and public history h_t, is

$$ v_{it}(\sigma, h_t) = (1 - \delta)\, E\left[ \sum_{\tau = t}^{\infty} \delta^{\tau - t}\, u_i\big(\alpha_\tau(h_\tau), \mu_{1\tau}(h_\tau), \mu_{2\tau}(h_\tau)\big) \,\Big|\, h_t \right], \tag{3} $$

where the expectation is taken with respect to the crisis realizations from period t onward and the public signal from period t + 1 onward. With slight abuse of notation, the average expected discounted utility for agent i at the beginning of the game is denoted by v_i(σ) = E[v_{i1}(σ, h_1)].

Definition 1. A public strategy profile σ* is a perfect public equilibrium (PPE) if and only if v_{it}(σ*, h_t) ≥ v_{it}(σ_i′, σ*_{−i}, h_t) for any agent i, any public strategy σ_i′, any period t ≥ 1, and any public history h_t. We denote the set of PPE payoffs by V* = {v(σ*) | σ* is a PPE}.

A PPE always exists because unconditional repetition of a static Nash equilibrium of the stage game is a PPE. In Appendix A, we show that the set of PPE can be characterized recursively, using a modified version of the standard APS recursive decomposition, and we use this decomposition to establish some useful technical properties.

4. Optimal level of crises and bailouts

Under Assumptions 1 and 2, the first-best requires that in every period the active agent takes the avoidance action, and both agents mitigate a crisis if it happens. However, the first-best is not achievable in an equilibrium of the stage game because the active agent never takes the avoidance action in any of them. In this section, we study how, and to what extent, welfare can be improved in the repeated setting.

Our first finding is that, even in the repeated setting, the first-best cannot be attained. This is a strong impossibility result because it holds for any discount factor. Given that the first-best is not achievable, we then turn our attention to constrained-efficient allocations by investigating the properties of PPEs that are Pareto efficient.
In any Pareto efficient PPE, crises are always mitigated and the welfare loss arises from the avoidance action not being taken every period. Furthermore, the frequency of the avoidance action depends on the agents’ discount factor. For low discount factors, the active agent never takes the avoidance action in any equilibrium. Once the discount factor is greater than some threshold, the active agent takes the avoidance action infinitely often in any Pareto efficient PPE. Moreover, the passive agent also has to bail out the active agent infinitely often, where “bailout” has a precise sense that we describe further ahead. The optimal frequency of avoidance and bailouts is determined endogenously.

4.1. The impossibility of implementing the first-best

Suppose that in some equilibrium the active agent takes the avoidance action with positive probability. The expected discounted payoff for the active agent at that moment (v_1) is a convex combination of his expected discounted payoff conditional on the event of a crisis (w_{11}) and his expected discounted payoff if there is no crisis (w_{10}). Also, because taking the avoidance action is costly, it must be the case that w_{10} is strictly greater than w_{11}, so that the active agent finds it optimal to incur the cost. Moreover, w_{11} cannot be too negative because of individual rationality. Using these facts, we show in the appendix that there is a fixed positive constant γ such that w_{10} > v_1 + γ. That is, whenever the active agent takes the avoidance action as part of a PPE and there is no crisis, his continuation value must increase by at least a fixed amount. Therefore, if there are no crises for a sufficiently long time interval, the continuation value required for the active agent to be willing to take the avoidance action stops being feasible.3 We thus obtain the following result.

Proposition 4.1. There is no PPE in which the active agent takes the avoidance action almost surely at every period along the equilibrium path.

3 It is crucial for this proposition that the avoidance action is not observable; see Section 6.1. Hence, this is a result of moral hazard and not of the structure of the payoffs.

4.2. Efficient mitigation

It is not possible to have avoidance played in every period. However, except for low discount factors, a PPE exists in which the active agent sometimes takes the avoidance action (see Lemma B.4 in the appendix). This requires the passive agent to provide incentives, for instance, by punishing the active agent after a crisis, or rewarding him if there is no crisis. Two possible ways to punish the active agent after a crisis are to let him suffer the cost of the crisis (no mitigation), or to ask him to contribute more than necessary to mitigate the crisis (money burning). Our second result is that neither of these punishment schemes is optimal. In every constrained-efficient PPE, agents contribute exactly as much as needed to mitigate a crisis when it happens.

Proposition 4.2. In any constrained-efficient PPE, crises are efficiently mitigated, that is, µ_{1t}(h_t) + µ_{2t}(h_t) = 1 almost surely along the equilibrium path.

This result is very natural, since both of these forms of punishment are ex-post inefficient. However, the proof is far from trivial because, given that there is imperfect monitoring, some degree of ex-post inefficiency could be necessary to generate incentives ex-ante. This is a common feature of models with imperfect monitoring that can be traced back to Green and Porter (1984). We obtain the result because we show that there are always better ways to punish the active agent.
As it turns out, any incentive scheme that can be generated in equilibrium via no mitigation or money burning can also be generated by adjusting the shares of the mitigation cost in the future, without incurring any efficiency losses due to either insufficient or excessive mitigation. The details of the proof are in Appendix B.

4.3. Bailouts as an incentive mechanism for avoidance

The difficulty in inducing the active agent to take the avoidance action is that the cost is too high for him to pay it on his own. The solution seems to be that the passive agent should help pay part of it. In a world with perfectly transferable utility, we could consider schemes where the passive agent directly subsidizes the active agent.4 However, we have assumed that the agents’ contributions can only be used to mitigate crises. In our environment, the only way for the passive agent to compensate the active agent is by sometimes paying more of the mitigation cost after a crisis has occurred. When this happens, we call it a bailout.

4 In Section 6.2, we study an extension with perfectly transferable utility, and show that the first-best can be achieved when both agents are sufficiently patient.

Definition 2. A bailout is a situation where a crisis occurs, the agents jointly contribute sufficient resources to mitigate it, and the contribution of the active agent is less than his private crisis loss, i.e., µ_{1t}(h_t) + µ_{2t}(h_t) ≥ 1 and µ_{1t}(h_t) < c_1.

We can show that bailouts are the only form of compensation available, and if the active agent is not compensated, then he has no reason to choose avoidance. It follows that bailouts are necessary to induce the avoidance action.

Proposition 4.3. In any PPE where the avoidance action is taken with positive probability, bailouts occur with positive probability.

Proposition 4.3 shows that bailouts are necessary in order to support avoidance actions, but it says nothing about sufficiency or about efficiency.
When is it possible to support any avoidance at all? When is it efficient to do so? If the active agent expects to be bailed out in the future as a form of compensation, he may be willing to take the avoidance action, at least in some instances, and such an arrangement is necessary for efficiency when feasible. The following proposition formalizes these results.

Proposition 4.4. There exists δ̃ ∈ (0, 1) such that:
1. If δ < δ̃, then every PPE (and therefore every constrained-efficient PPE) has avoidance played with probability zero in all periods.
2. If δ > δ̃, then in every constrained-efficient PPE the avoidance action is played infinitely often, and bailouts take place infinitely often.

Proposition 4.4 indicates that, for low discount factors, it is not possible to induce the active agent to take any avoidance actions, and the set of efficient PPE essentially reduces to repetition of static Nash equilibria of the stage game. For higher discount factors, avoidance is not only possible, but also necessary for constrained efficiency. Propositions 4.1 and 4.4 combined imply that, when δ > δ̃, in any constrained-efficient PPE the active agent takes the avoidance action infinitely often, takes the nonavoidance action infinitely often, and is bailed out infinitely often.

To prove this fact, the key step is to show that having at least some avoidance is a Pareto improvement whenever it is incentive compatible. Formally, Lemma B.2 in the appendix asserts that, if it is possible to play avoidance at least once in some PPE, then every PPE without avoidance is Pareto dominated by a PPE with avoidance. Because constrained Pareto efficiency requires continuation strategies to also be constrained-efficient, this implies that, whenever possible, it is optimal to have avoidance infinitely often. Because of Proposition 4.3, doing so also requires having bailouts infinitely often.

5. Automata: endogenously determined frequencies of crises and bailouts

From the previous sections, we learned that, since the active agent’s private incentive does not align with the social one, i.e., c_1 < d̂ < 1, the way to align incentives is by bailing out the active agent infinitely often. But how do bailouts work as a mechanism to generate incentives? How good is this mechanism? Is it close to the first-best? And how does it change with the primitives of the model, such as the cost to avert a crisis, d, the private costs of the agents, c_1 and c_2, and the effectiveness of the avoidance action, π_0 − π_1? In this section, we use numerical methods to investigate these questions.

We approximate the second-best by considering PPEs where equilibrium behavior can be described by finite-state automata. An automaton consists of four components: states, an initial distribution over states, a transition rule, and a mapping from states to actions. Let Ω be a finite set of states. States are mapped into actions by α_1 : Ω → A_i and µ_i : Ω × X → M, i = 1, 2. Given the current state ω_t, the active agent takes the action α_1(ω_t), and, after a crisis state ξ_t is realized, the agents’ contributions are given by µ_i(ω_t, ξ_t). After actions are realized, the state ω_{t+1} for the next period is randomly drawn according to the transition rule η : Ω × X × M × M → ∆(Ω). The initial state for period 1 is drawn according to the initial distribution η_0 ∈ ∆(Ω).

An automaton describes a profile of public strategies for the repeated game. In fact, if we didn’t restrict attention to finite automata, every profile of public strategies could be described by an automaton (Mailath and Samuelson, 2006, p. 230). However, for computational reasons, in all of our numerical exercises, we restrict attention to finite automata with a fixed upper bound on the number of states in Ω. In what follows, we provide an example in the form of an automaton, where avoidance takes place at some times but not at others.

5.1. An illustrative example of equilibrium mechanism

Here, we illustrate how bailouts can be used to induce the avoidance action and, thus, improve efficiency. Consider the set of parameters δ = 0.95, π_1 = 0.2, π_0 = 0.9, d = 0.5, c_1 = 0.6, c_2 = 0.5. With this set of parameters, the adjusted avoidance cost is higher than the crisis cost for agent 1 and avoidance is socially efficient: c_1 = 0.6 < d̂ ≈ 0.7143 < 1. The first-best has an expected total cost of 0.7, which is not obtainable in any PPE by Proposition 4.1. The total cost that can be obtained in a static Nash equilibrium is 0.9, which implies a welfare loss (relative to the first-best) of 28.57 percent.

[Figure 2: Automaton. States ω_1 (a = 1, m_1 = 0.61, m_2 = 0.39), ω_2 (a = 1, m_1 = 0.00, m_2 = 1.00), and ω_3 (a = 0, m_1 = 0.00, m_2 = 1.00), with arrows giving transition probabilities conditional on Crisis and No-Crisis.]

Figure 2 describes the PPE that minimizes the total expected long-run cost among all PPEs that can be described by a four-state automaton, with one state being the minmax equilibrium. The equilibrium works as follows. Agents start in state ω_1, where the strategy profile is (a, m_1, m_2) = (1, 0.61, 0.39). In this state, agent 1 is supposed to take the avoidance action, but his private cost in a crisis alone does not generate incentives to do so, since m_1 = 0.61 < d̂ ≈ 0.7143. As a result, in order to generate incentives, when there is no crisis, the state switches to ω_2 with probability 0.19. The state ω_2 is a bailout state since m_1 = 0.00 < c_1. The “reward” of a bailout in the future helps generate incentives for the avoidance action. That is, the probability of going to this bailout state compensates for the fact that m_1 = 0.61 < d̂, aligning the private and social incentives to take the avoidance action. In ω_2, the strategy profile is (1, 0.00, 1.00). Again, agent 1 is supposed to take the avoidance action, but now he has even less incentive to do so, since his contribution to mitigation is now zero.
This time, to generate incentives, the equilibrium moves to state ω3 with probability 0.32 if there is no crisis. The state ω3 has an even stronger form of bailout because the active agent's contribution to mitigating the crisis is zero, and he takes no avoidance action.⁵ There is a fourth state ω4, which is not in the figure, with the strategy profile of nonavoidance/no-mitigation (the minmax equilibrium). This state is off the equilibrium path and works as a punishment state in case of a detectable deviation.

Table 1: Summary statistics of the PPE

State ω   Invariant distribution   u1(ω)    u2(ω)    V1(ω)    V2(ω)    Welfare   Welfare loss (%)
ω1        0.50                     −0.622   −0.078   −0.539   −0.177   −0.748    2.23
ω2        0.38                     −0.500   −0.200   −0.478   −0.250   −0.727    3.87
ω3        0.12                     −0.000   −0.900   −0.420   −0.328   −0.716    6.86
LRA       −                        −0.501   −0.222   −0.501   −0.222   −0.724    3.41

Notes: LRA refers to the long-run averages, which correspond to the expected values evaluated using the invariant distribution. ui(ω) denotes agent i's expected payoff for the period when the state is ω. Vi(ω) denotes agent i's total discounted expected payoff when the state is ω.

Although this automaton PPE is not on the Pareto frontier, it provides a lower bound on what can be achieved by a constrained efficient allocation. Table 1 provides summary statistics of the equilibrium. In state ω1, the expected discounted total cost is 0.748, which is only 2.23 percent greater than the minimum feasible one, 0.7. On average, a crisis occurs 28.4 percent of the time, compared to 20 percent at the first-best. When a crisis does happen, a bailout occurs 70.3 percent of the time. But the striking result is that, even though the passive agent bails out the active agent in over 70 percent of crises, the expected present value of his cost is only 0.17. For comparison, in the best equilibrium for the passive agent with no bailouts, the expected present value of his cost is 0.36.
That is, by optimally choosing a bailout policy, the passive agent can reduce his cost from crises by half.

⁵ The probability of moving to a state preferred by agent 1 is always higher when there is no crisis. Hence, the automaton is reminiscent of the revision strategies used in Rubinstein and Yaari (1983) and Radner (1985).

5.2. Comparative statics

The equilibrium displayed in Figure 2 illustrates how bailouts can be used in order to induce avoidance in equilibrium. In this subsection, we study how properties of this equilibrium change with key parameters of the model: the avoidance cost, d, the private costs of nonmitigated crisis, (c1, c2), and the probabilities of crises, (π⁰, π¹). For each set of parameters, we found the PPE that minimizes the total discounted long-run cost among six-state automata. Then, we compare the implied long-run probabilities of avoidance, crisis, and bailouts, as well as the long-run average cost of avoidance and the agents' mitigation payments.

The impact of changes in the avoidance cost. Table 2 displays features of the equilibrium outcome for different values of the avoidance cost d. As one could expect, when the avoidance cost increases, the avoidance action is taken less frequently and, therefore, crises happen more often. The average avoidance cost (column 5) is nonmonotone, reflecting that the more costly avoidance action is taken less often. The incidence of bailouts (column 4) is nonmonotone, similar to agent 1's mitigation cost (column 6). These changes reflect the structure of the equilibrium. Bailouts are the mechanism through which agent 2 compensates agent 1 for bearing the avoidance cost alone. As d increases, the compensation needed to generate incentives for the avoidance action also increases.
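The long-run statistics compared in this subsection are linked by a few accounting identities, which the following sketch makes explicit. It is a rough consistency check, not the authors' numerical procedure. It assumes that every crisis is fully mitigated at total cost 1 on the equilibrium path and that the first-best total cost equals d + π¹; both are inferred from the reported numbers rather than stated in the text. The function and argument names are hypothetical.

```python
def long_run_stats(p_avoid, d, pi_avoid=0.2, pi_shirk=0.9):
    """Long-run statistics implied by the avoidance frequency p_avoid = P(a = 1).

    Assumptions (inferred, not stated by the authors): every crisis is fully
    mitigated at total cost 1, and the first-best total cost is d + pi_avoid.
    """
    # P(xi = 1): crisis probability is pi_avoid with avoidance, pi_shirk without.
    p_crisis = pi_avoid * p_avoid + pi_shirk * (1.0 - p_avoid)
    # E(d): the avoidance cost is paid only when the action is taken.
    avg_avoid_cost = d * p_avoid
    # Total cost = avoidance cost + mitigation payments, E(m1) + E(m2) = P(xi = 1).
    total_cost = avg_avoid_cost + p_crisis
    first_best = d + pi_avoid
    return {
        "p_crisis": p_crisis,
        "avg_avoid_cost": avg_avoid_cost,
        "total_cost_pct": 100.0 * total_cost / first_best,
    }


# Avoidance frequencies P(a = 1) as reported in Table 2 for d = 0.45 and d = 0.50.
for d, p in [(0.45, 0.9569), (0.50, 0.8571)]:
    print(d, long_run_stats(p, d))
```

Under these assumptions, the reported value of P(a = 1) alone pins down P(ξ = 1), E(d), and the expected total cost relative to the first-best, up to rounding.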
Table 2: The impact of changes in the avoidance cost¹

d      P(a = 1)   P(ξ = 1)   P(m1 < c1)   E(d)     E(m1)    E(m2)    Expected total cost (%)²
0.45   0.9569     0.2302     0.4647       0.4306   0.0784   0.1518   101.66
0.50   0.8571     0.3000     0.7937       0.4285   0.0340   0.2660   104.08
0.55   0.7598     0.3682     0.8942       0.4179   0.0182   0.3500   104.81
0.60   0.7115     0.4019     0.9301       0.4269   0.0143   0.3877   103.61
0.65   0.6851     0.4204     0.8727       0.4453   0.0227   0.3977   101.85

¹ Note: The probabilities and expectations are evaluated using the implied invariant distribution. Other parameters are δ = 0.9, π¹ = 0.2, π⁰ = 0.9, c1 = 0.6, and c2 = 0.5.
² Expressed as a percentage of the first-best expected total cost.

The impact of changes in the private costs of a crisis. Table 3 displays features of the equilibrium outcome for different values of (c1, c2). When c1 and c2 increase, both agents' minmax payoffs decrease. The impact of c1 on the equilibrium outcome is substantial. Increasing c1 from 0.55 to 0.65 reduces the long-run average cost from about 106.4 percent to 101.8 percent of the first-best; the incidence of crisis is reduced by approximately one-third (from 0.36 to 0.24); and bailouts are reduced to 60 percent from 89 percent. The reason is that increasing c1 helps align agent 1's private cost of a crisis, c1, with the social cost of a crisis, which is the mitigation cost 1. Agent 2's private cost of a crisis, c2, has little effect on the equilibrium outcome since he is a passive agent and has no private information.
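One way to read the role of c1 is through the wedge d̂ − c1 between the adjusted avoidance cost and agent 1's private cost of a crisis. The sketch below tabulates that wedge, together with agent 1's minmax payoff −π⁰c1, for the values of c1 considered here. It assumes d̂ = d/(π⁰ − π¹), a formula that reproduces the value d̂ ≈ 0.7143 reported for the illustrative example but is not restated in this section. The function name is hypothetical.

```python
def incentive_wedge(c1, d=0.5, pi_avoid=0.2, pi_shirk=0.9):
    """Return (d_hat - c1, minmax payoff of agent 1).

    d_hat = d / (pi_shirk - pi_avoid) is an assumed formula; it reproduces
    d_hat ~ 0.7143 for the baseline parameters d = 0.5, pi0 = 0.9, pi1 = 0.2.
    """
    d_hat = d / (pi_shirk - pi_avoid)
    return d_hat - c1, -pi_shirk * c1


for c1 in (0.55, 0.60, 0.65):
    wedge, minmax = incentive_wedge(c1)
    print(f"c1={c1:.2f}  wedge={wedge:.4f}  minmax={minmax:.3f}")
```

Under this reading, raising c1 from 0.55 to 0.65 shrinks the wedge from about 0.164 to about 0.064, so less compensation through bailouts is needed to induce avoidance.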
Table 3: The impact of changes in the private costs of a crisis¹

c1     c2    P(a = 1)   P(ξ = 1)   P(m1 < c1)   E(d)     E(m1)    E(m2)    Expected total cost (%)²
0.55   0.5   0.7763     0.3566     0.8870       0.3882   0.0177   0.3388   106.39
0.55   0.7   0.7729     0.3590     0.8875       0.3864   0.0176   0.3414   106.49
0.60   0.5   0.8571     0.3000     0.7937       0.4285   0.0340   0.2660   104.08
0.60   0.7   0.8529     0.3029     0.7119       0.4265   0.0391   0.2638   104.20
0.65   0.5   0.9363     0.2446     0.6136       0.4681   0.0655   0.1791   101.82
0.65   0.7   0.9349     0.2456     0.5420       0.4675   0.0722   0.1733   101.86

¹ Note: The probabilities and expectations are evaluated using the implied invariant distribution. Other parameters are δ = 0.9, π¹ = 0.2, π⁰ = 0.9, and d = 0.5.
² Expressed as a percentage of the first-best expected total cost.

The impact of changes in crisis probabilities. Table 4 displays the long-run expected total cost above the first-best for different combinations of (π⁰, π¹). The other parameters are set to δ = 0.95, d = 0.5, c1 = 0.6, and c2 = 0.5. The cells with the symbol "−" represent the cases where the parameters do not satisfy Assumption 2. The effect of (π⁰, π¹) on the welfare cost is not uniform. Combinations of (π⁰, π¹) with d̂ closer to either c1 or 1 lead to a lower cost. This means that sometimes decreasing π⁰ reduces the welfare cost, while sometimes increasing π⁰ reduces the welfare cost. The same is true for π¹. On the other hand, for combinations of (π⁰, π¹) with the same d̂ (that is, π⁰ − π¹ constant), higher π⁰ and π¹ always lead to a lower welfare cost. The interpretation of these results is not simple. One could think that a higher π¹ means that avoidance is less effective in preventing crises, which could imply a higher cost, but this is not true. The correct measure of effectiveness is π⁰ − π¹, that is, how much the avoidance action decreases the probability of a crisis. With π⁰ − π¹ held constant, the only impact comes from the increase in π⁰. A higher π⁰ implies that the minmax utility of agent 1 is lower; hence, it is easier to generate incentives for avoidance.
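The pattern of "−" cells can be checked mechanically. Assumption 2 is not restated in this section, so the sketch below assumes it amounts to c1 < d̂ < 1 with d̂ = d/(π⁰ − π¹); both the restriction and the formula are inferences, and the function names are hypothetical. Probabilities are passed as integer percentages so that the boundary case d̂ = 1 is handled exactly.

```python
from fractions import Fraction


def d_hat(pi_shirk_pct, pi_avoid_pct, d=Fraction(1, 2)):
    """Adjusted avoidance cost, assuming d_hat = d / (pi0 - pi1).

    Probabilities are integer percentages to keep the arithmetic exact.
    """
    return d / Fraction(pi_shirk_pct - pi_avoid_pct, 100)


def satisfies_assumption(pi_shirk_pct, pi_avoid_pct, c1=Fraction(3, 5)):
    """Assumed form of the parameter restriction: c1 < d_hat < 1."""
    return c1 < d_hat(pi_shirk_pct, pi_avoid_pct) < 1


# Grid of Table 4: pi0 in {0.65, ..., 0.95} (rows), pi1 in {0.10, ..., 0.30} (columns).
for pi0 in range(65, 100, 5):
    row = ["ok" if satisfies_assumption(pi0, pi1) else "-" for pi1 in range(10, 35, 5)]
    print(f"pi0={pi0 / 100:.2f}", *row)
```

With d = 0.5 and c1 = 0.6, the cells flagged as inadmissible by this check coincide with the "−" cells of Table 4, which is consistent with the assumed form of the restriction.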
Table 4: Total cost above the first-best (%)¹

π⁰ \ π¹   0.10   0.15   0.20   0.25   0.30
0.65      3.25   −      −      −      −
0.70      5.71   2.70   −      −      −
0.75      8.49   4.65   2.20   −      −
0.80      9.05   6.46   3.68   1.75   −
0.85      8.00   6.59   4.67   2.89   1.38
0.90      2.94   4.41   4.54   3.30   2.19
0.95      −      1.28   2.74   2.89   2.26

¹ Long-run average as a percentage of the first-best.

The impact of changes in agents' discount factor. Table 5 displays features of the equilibrium outcome for different values of the discount factor δ. As one could expect, lower δ is associated with lower welfare. Increasing δ from 0.6 to 0.9 decreases the total cost by about 4 percent of the first-best. This pattern reflects that when δ is high, agents are more willing to cooperate since they care more about punishments in the future.

Table 5: The impact of changes in agents' discount factor¹

δ     P(a = 1)   P(ξ = 1)   P(m1 < c1)   E(d)     E(m1)    E(m2)    Expected total cost (%)²
0.6   0.7111     0.4022     0.5978       0.3555   0.1063   0.2960   108.25
0.7   0.7630     0.3659     0.8345       0.3815   0.0636   0.3023   106.77
0.8   0.7979     0.3415     0.7491       0.3989   0.0444   0.2970   105.78
0.9   0.8571     0.3000     0.7937       0.4285   0.0340   0.2660   104.08

¹ Note: The probabilities and expectations are evaluated using the implied invariant distribution. Other parameters are π¹ = 0.2, π⁰ = 0.9, c1 = 0.6, c2 = 0.5, and d = 0.5.
² Expressed as a percentage of the first-best expected total cost.

6. Alternative mechanisms

We have shown that the first-best cannot be achieved as a PPE of the repeated game, and that whenever avoidance is possible in equilibrium, every constrained efficient PPE involves bailouts infinitely often. In this section, we consider two alternative mechanisms that can help to improve welfare. We analyze one model where the avoidance action is perfectly observed, and one where the passive agent can directly subsidize the active agent. In both cases, it is still the case that either bailouts or direct transfers are necessary for the active agent to take the avoidance action.
However, unlike our benchmark model, these alternative specifications admit PPEs that achieve the first-best when agents are patient enough.

6.1. The avoidance action is observable

In our benchmark model, we assumed that the avoidance action of the active agent is private. The passive agent could only make imperfect inferences about it via the realization of crises. Now, consider the alternative specification where a is perfectly observable to both agents. This allows agents to use strategy profiles that bail out the active agent if and only if he takes the avoidance action, but it does not change the fact that bailouts are necessary for avoidance.

Proposition 6.1. In any PPE of the game with observable actions, if the avoidance action is taken with positive probability, then bailouts occur with positive probability.

To illustrate the difference from the unobservable action case, consider the following simple strategy profile. Along the equilibrium path, the active agent always takes the avoidance action, and crises are always mitigated:

$$(\alpha_t(h_t), \mu_{1t}(h_t), \mu_{2t}(h_t)) = (1, m_1^*, 1 - m_1^*),$$

for some fixed constant $m_1^* > 0$, which is specified later. After any deviation, the active agent chooses a = 0 forever after, and both agents never again make positive mitigation contributions. We show in the appendix that, if the discount factor is sufficiently high, then such a grim-trigger strategy profile exists and is a PPE. Since there is always avoidance and mitigation along the equilibrium path, this strategy profile implements the first-best.

Proposition 6.2. There exists δ̃′ ∈ (0, 1) such that, if δ > δ̃′, then the game with observable actions admits a PPE where the active agent takes the avoidance action at every period and after every history.

6.2. Monetary transfers

The previous analysis depends crucially on the assumption of nontransferable utility. That is, if agent 1 takes the avoidance action, he has to pay the cost d by himself.
Moreover, both agents' contributions for cleanup can only be used to mitigate crises. Suppose we relax this assumption by allowing the passive agent to directly transfer resources to agent 1 for consumption. More precisely, suppose that at any date t and after any history $h_t$, agent 2 can make a transfer $\beta_t^1(h_t) \ge 0$ if there is a crisis and a transfer $\beta_t^0(h_t) \ge 0$ if there is no crisis. These transfers enter the stage-game payoffs as an additive term. That is, the stage-game payoffs for the active (passive) agent in the game with transfers are exactly those from the game without transfers plus (minus) whatever transfer he receives (makes).

A version of Proposition 4.3 continues to hold in this modified model. For the active agent to be willing to take the avoidance action, he must expect some form of compensation. The only difference is that the passive agent has new forms of compensation available. Agent 2 can still compensate agent 1 by bailing him out, that is, by contributing sufficient resources so that the cost incurred by agent 1 in case of a crisis is less than c1. Additionally, agent 2 can transfer resources to agent 1 in periods where there are no crises. Any equilibrium with avoidance must involve at least one of these forms of compensation.

Proposition 6.3. In any PPE of the game with transfers where the active agent takes the avoidance action with positive probability, agent 2 compensates agent 1 by having either $\beta_t^0(h_t) > 0$ or $\beta_t^1(h_t) - \mu_{1t}(h_t) > -c_1$, or both, with positive probability.

To illustrate the difference from the nontransferable-utility case, consider the following simple strategy profile for the game with transfers:

$$(\alpha_t(h_t), \mu_{1t}(h_t), \mu_{2t}(h_t)) = (1, m_1^*, 1 - m_1^*) \quad \text{and} \quad (\beta_t^0(h_t), \beta_t^1(h_t)) = (b^*, 0),$$

for all t and every $h_t$ along the equilibrium path, where $m_1^* \in (0, 1)$ and $b^* > 0$ are fixed constants specified in Appendix B.7.
That is, agent 1 always takes the avoidance action and contributes $m_1^*$ when there is a crisis, and agent 2 compensates agent 1 with $b^*$ units of consumption when there is no crisis. The transfer $b^*$ can be viewed as a subsidy to agent 1 from agent 2 in no-crisis times. In case of a detectable deviation, the agents switch to playing the one-shot Nash equilibrium with no avoidance and no mitigation forever. We show in the appendix that, if the discount factor is high enough, this strategy profile constitutes a PPE of the game with transfers. Hence, when the agents are patient enough, the first-best is attainable in equilibrium.

Proposition 6.4. There exists δ̃′′ ∈ (0, 1) such that, if δ > δ̃′′, then the game with transfers admits a PPE where the active agent takes the avoidance action at every period and after every history.

This subsidy scheme is simple theoretically but may not be easy to implement in reality. For example, it might be difficult to justify to the public paying Greece's government every period (a subsidy in normal times and mitigation in crisis times)!

7. Conclusion

We have studied a liability-sharing problem between two asymmetric parties in an infinitely repeated game. The main frictions in the model are unobserved action by the active player (moral hazard) and nontransferable utility between the two parties. With this model, we want to make several points.

First, there are environments where, conditional on some social arrangement (such as the European Monetary Union) having already been formed to share some risk, shirking and bailouts are not only consistent with equilibrium behavior, but also necessary to achieve the constrained optimum. The incentive cost may be too high to ask for outcomes devoid of these vices. Insisting otherwise is unrealistic. The high incentive cost of the social arrangement should be considered before any coalition or arrangement is made, rather than trying to eliminate it ex post.
Second, stochastic shirking and bailout may be necessary features of the approximately efficient outcome. Roughly speaking, when the active player is expected to shirk, there is no need for incentives to induce his current-period effort, and hence a bailout is likely as a reward from the passive player to the active player for future effort. In a period when the active player is supposed to put in effort, he is unlikely to be bailed out. The constrained optimum requires coordination between the two parties. This coordination can be accomplished with the use of n-state automata and the correlated equilibrium given the automata. To achieve the constrained optimum, the fine-tuning tools for incentive provision are the levels of mitigation contribution and the transition probability from any state to any other state conditional on the current state and outcome. These transition probabilities are endogenously chosen, unlike the exogenous sunspot type of modeling device.

Third, our numerical simulation results show that the equilibrium of the n-state automata, with n optimally chosen (approximate second-best), can achieve quite a high level of welfare relative to the first-best.

Fourth, if one thinks that the nontransferable utility assumption is too strong, relaxing the assumption can improve the welfare of the two parties, but it will not eliminate "bailout." With transferable utility, the payment from the passive player 2 to the active player 1 is simply shifted from ex post (after the crisis happens) to ex ante (before the effort is exerted), but does not disappear.

The model is very schematic: it does not have any realistic features such as different maturities of debt instruments, sovereign default, renegotiation of debt, yield, fiscal and monetary policies, etc. This is intentional and meant to illustrate the mechanism of stochastic bailout as an incentive device.

Appendix A.
Recursive analysis of the set of PPE

With agents playing public strategies only, the repeated game has a recursive structure. After an arbitrary history, the continuation strategy profile of a PPE is an equilibrium profile of the original game. The standard way to characterize the set of PPE values is to use the self-generation procedure introduced in Abreu, Pearce, and Stacchetti (1990) (APS). This appendix establishes an analogous procedure and shows that our restriction to pure and public strategies is without loss of generality.⁶

⁶ We cannot simply apply the procedure from APS because our model differs from theirs in the monitoring structure, our stage game is a multistage game, and we allow for public randomization but exclude individual mixed strategies.

A.1. Incentive constraints

We begin by providing three necessary conditions that any PPE must satisfy after each history: one feasibility condition and two incentive constraints. Consider any PPE of the repeated game $\sigma^* = (\alpha^*, \mu_1^*, \mu_2^*)$ and any arbitrary public history $h_t \in H_t$. Let $s^* = (a^*, m_1^*, m_2^*)$ denote the action profile dictated by $\sigma^*$ for period t given $h_t$, i.e., $(a^*, m_1^*, m_2^*) = (\alpha_t^*(h_t), \mu_{1t}^*(h_t), \mu_{2t}^*(h_t))$. Also, let $w^* = (w_i^*(\xi))_{i=1,2;\,\xi=0,1} \in \mathbb{R}^{2\times 2}$ denote the profile of continuation expected average discounted values from date t + 1 onward given $\sigma^*$ as a function of the crisis state on date t, i.e.,

$$w_i^*(\xi) := E\left[ v_{i,t+1}(\sigma^*, h_{t+1}) \mid h_t, \xi_t = \xi \right].$$

With this notation, we can write the following feasibility condition:

$$v_{it}(\sigma^*, h_t) = g_i(s^*, w^*), \tag{F}$$

where $g_i : (\{0,1\} \times \mathbb{R}_+^2) \times \mathbb{R}^{2\times 2} \to \mathbb{R}$ is the function given by

$$g_i\big((a, m_1, m_2), w\big) = (1-\delta)\, u_i(a, m_1, m_2) + \delta\left[ \pi^a w_i(1) + (1-\pi^a)\, w_i(0) \right]. \tag{4}$$

There are two kinds of necessary date-t conditions for $\sigma^*$ to be a PPE. The first condition refers to the mitigation contributions.
If a crisis were to arise on date t, each agent i could unilaterally decide to contribute exactly the minimum amount required to mitigate it, in which case agent i's payoff from the crisis would be $-\max\{0, 1 - m_{-i}^*\}$. Alternatively, if $m_{-i}^* < 1$, agent i could decide not to contribute anything to mitigate the crisis and receive a payoff of $-c_i$. By choosing the better of these two options, agent i's ex-post payoff due to the crisis in the period would be $-\min\{c_i, \max\{0, 1 - m_{-i}^*\}\}$, and his continuation value would be no worse than his minmax value $-\pi^0 c_i$. For $\sigma^*$ to be a PPE, this potential deviation cannot be strictly profitable. That is, it must be the case that

$$(1-\delta)\, k_i(m_1^*, m_2^*) + \delta w_i^*(1) \;\ge\; -(1-\delta)\min\{c_i, \max\{0, 1 - m_{-i}^*\}\} - \delta\pi^0 c_i \;\ge\; -(1-\delta+\delta\pi^0)\, c_i, \tag{M}$$

where $k_i : \mathbb{R}_+^2 \to \mathbb{R}$ is agent i's payoff from a date-t crisis as a function of the mitigation contributions, i.e.,

$$k_i(m_1, m_2) = -m_i - c_i\, \mathbb{I}_{\{m_1 + m_2 < 1\}}. \tag{5}$$

We call Condition (M) the mitigation constraint for agent i.

Secondly, suppose that $a^* = 1$. The active agent (agent 1) could deviate at period t by not taking the avoidance action and following $\sigma_1^*$ after that. Since this deviation is not observable, agent 1 would expect agent 2 to continue to follow $\sigma_2^*$. For this deviation not to be profitable, the expected discounted utility for the active agent in case there is no crisis, $\delta w_1^*(0)$, should be greater than if there is a crisis, $(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1)$. Moreover, it should be sufficiently greater to compensate the active agent for the private cost of taking the avoidance action, that is,

$$(1-\delta)\,\hat d \;\le\; \delta w_1^*(0) - \left[ (1-\delta)\, k_1(m_1^*, m_2^*) + \delta w_1^*(1) \right].$$

If $a^* = 0$, the converse inequality must hold. After doing some simple algebra, we can summarize both cases via the following avoidance constraint:

$$0 \;\le\; (-1)^{a^*} \left[ \hat d + k_1(m_1^*, m_2^*) + \frac{\delta}{1-\delta}\big( w_1^*(1) - w_1^*(0) \big) \right]. \tag{A}$$

A.2. APS decomposition

So far, we have argued that conditions (F), (M), and (A) are necessary for a strategy profile to be a PPE.
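As a rough illustration of what the date-t conditions (M) and (A) require, the sketch below checks them for given contributions and continuation values. It is not the authors' code: the function names are hypothetical, the parameter values are those of the illustrative example in Section 5.1, and d̂ = d/(π⁰ − π¹) is an assumed formula that reproduces d̂ ≈ 0.7143. A small tolerance is used in the indicator of an unmitigated crisis to guard against floating-point rounding.

```python
TOL = 1e-12  # numerical tolerance for the "m1 + m2 < 1" indicator


def crisis_payoff(i, m1, m2, c):
    """k_i(m1, m2) = -m_i - c_i * 1{m1 + m2 < 1}; agents are indexed 1 and 2."""
    own = (m1, m2)[i - 1]
    unmitigated = c[i - 1] if m1 + m2 < 1 - TOL else 0.0
    return -own - unmitigated


def mitigation_ok(i, m1, m2, w_crisis_i, c, delta, pi0):
    """First inequality in (M) for agent i, given w_i(1) = w_crisis_i."""
    other = (m2, m1)[i - 1]
    lhs = (1 - delta) * crisis_payoff(i, m1, m2, c) + delta * w_crisis_i
    best_deviation = min(c[i - 1], max(0.0, 1 - other))
    rhs = -(1 - delta) * best_deviation - delta * pi0 * c[i - 1]
    return lhs >= rhs


def avoidance_ok(a, m1, m2, w1_crisis, w1_no_crisis, d_hat, c, delta):
    """(A): 0 <= (-1)^a * [d_hat + k_1 + delta/(1-delta) * (w_1(1) - w_1(0))]."""
    bracket = (d_hat + crisis_payoff(1, m1, m2, c)
               + delta / (1 - delta) * (w1_crisis - w1_no_crisis))
    return 0 <= (-1) ** a * bracket


# Parameters of the illustrative example; d_hat = d / (pi0 - pi1) is assumed.
delta, pi0, pi1, d, c = 0.95, 0.9, 0.2, 0.5, (0.6, 0.5)
d_hat = d / (pi0 - pi1)

# Stationary profile <0, 0.55, 0.45>: continuation values do not depend on the outcome.
v1, v2 = -pi0 * 0.55, -pi0 * 0.45
print(mitigation_ok(1, 0.55, 0.45, v1, c, delta, pi0))       # (M) for agent 1
print(mitigation_ok(2, 0.55, 0.45, v2, c, delta, pi0))       # (M) for agent 2
print(avoidance_ok(0, 0.55, 0.45, v1, v1, d_hat, c, delta))  # (A) with a = 0
print(avoidance_ok(1, 0.55, 0.45, v1, v1, d_hat, c, delta))  # (A) with a = 1
```

The first three checks pass and the last one fails: with continuation values that do not depend on the crisis outcome, asking for avoidance violates (A) whenever m1 < d̂. Avoidance can satisfy (A) only if the no-crisis continuation value w1(0) is sufficiently larger than w1(1).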
In what follows, we will show that they are also sufficient to characterize the set of PPE payoffs $\mathcal{V}^*$. In particular, we will show that a vector of feasible payoffs v is attainable in equilibrium if and only if it can be attained by (a distribution over)⁷ action profiles and continuation values that satisfy such conditions. To formalize this idea, we make use of the following definition.

Definition 3. Given an action profile $s^* = (a^*, m_1^*, m_2^*)$, a profile of continuation values $w^* = (w_i^*(\xi))_{i=1,2;\,\xi=0,1}$, and a set $\mathcal{V} \subset \mathbb{R}^2$, the pair $(s^*, w^*)$ is said to be admissible with respect to $\mathcal{V}$ if and only if: (a) it satisfies the mitigation constraints (M) for i = 1, 2, (b) it satisfies the avoidance constraint (A), and (c) $w^*(\xi) = (w_1^*(\xi), w_2^*(\xi)) \in \mathcal{V}$ for ξ = 0, 1.

In our setting with public randomization, the relevant self-generating operator is the one introduced by Cronshaw and Luenberger (1994). For any set $\mathcal{V} \subseteq \mathbb{R}^2$ and every action profile $s = (a, m_1, m_2)$, define:

$$B_s(\mathcal{V}) = \left\{ v \in \mathcal{V} \;\middle|\; \exists\, w \text{ such that } (s, w) \text{ is admissible w.r.t. } \mathcal{V} \text{ and } v = g(s, w) \right\}. \tag{6}$$

Intuitively, $B_s(\mathcal{V})$ is the set of payoff profiles that can be obtained by playing s in the first period and using continuation values from the set $\mathcal{V}$, in such a way that there are no profitable one-shot deviations in the first period. To take into account the possibility of public randomization, let

$$B(\mathcal{V}) = \operatorname{co}\left( \bigcup_{s \in S} B_s(\mathcal{V}) \right). \tag{7}$$

A set $\mathcal{V} \subseteq \mathbb{R}^2$ is said to be self-generating if and only if $\mathcal{V} \subseteq B(\mathcal{V})$. The following propositions state that, using this notion of self-generation, an APS-like result applies to our setting, and that the set of PPE payoffs satisfies some desirable properties.

⁷ So far in this section, we have not yet discussed public randomization. The action profile $(a^*, m_1^*, m_2^*)$ specifies the pure actions chosen after $\theta_t$ is realized, and the continuation values $w^*$ are taken to be the average continuation values integrating over $\theta_{t+1}$.
Public randomization enters implicitly in the convex hull operation in (7).

Proposition A.1 (APS decomposition). The set of PPE payoffs $\mathcal{V}^*$ is the largest self-generating set, and $B^n(\mathcal{V}) \to \mathcal{V}^*$ for every bounded set $\mathcal{V} \subseteq \mathbb{R}^2_+$ such that $\mathcal{V}^* \subseteq \mathcal{V}$.

Proposition A.1 serves two purposes. First, it enables us to use a recursive approach to characterize the set of PPE. We use this approach implicitly in the proofs of our main results. Secondly, it allows us to establish some desirable properties for the set of PPE, which are summarized in the following proposition.

Proposition A.2. The set of PPE payoffs $\mathcal{V}^*$ is nonempty, compact, and convex, and is increasing with respect to the discount factor δ. Moreover, $\mathcal{V}^*$ would remain unchanged if we allowed player 1 to use private strategies and we allowed both players to use mixed strategies.

A.3. Proofs of recursive characterization

Proving Propositions A.1 and A.2 requires a number of technical lemmas. Because many of the proof steps are standard, we omit some details and refer the interested reader to Mailath and Samuelson (2006) instead.

Lemma A.3 (One-shot deviation principle). An individually rational strategy profile is a PPE if and only if it admits no profitable one-shot deviations.

Proof. This lemma is analogous to Proposition 2.2.1 in Mailath and Samuelson (2006, p. 25), and the corresponding arguments can be easily adapted to our setting. The structure of the argument is as follows. Fix an individually rational strategy profile. Suppose that there is a profitable deviation, and let v be the difference in values between the proposed strategy profile and the profitable deviation. Since the set of feasible individually rational payoffs of the stage game is bounded, we know that there is some number T such that the payoffs after T periods amount to less than $\|v\|/2$. Thus, there must also be a profitable deviation of length at most T. If the deviation in the last period is profitable, then the proof is complete.
If not, then there is a profitable deviation of length at most T − 1 periods. By induction, this implies that there is a profitable deviation of length 1.

Lemma A.4 (Self-generation). If $\mathcal{V} \subseteq \mathbb{R}^2$ is bounded and self-generating, then $\mathcal{V} \subseteq \mathcal{V}^*$.

Proof. Take any point $v \in \mathcal{V}$. Since $v \in \mathcal{V} \subseteq B(\mathcal{V})$, Carathéodory's theorem implies that there exist value profiles $b_v^1, b_v^2, b_v^3 \in \cup_s B_s(\mathcal{V})$ and a vector of weights $(\lambda_v^1, \lambda_v^2, \lambda_v^3) \in \Delta^3$ such that $v = \sum_{n=1}^{3} \lambda_v^n b_v^n$. For each $b_v^n$, since $b_v^n \in \cup_s B_s(\mathcal{V})$, there exist $s_v^n$ and a profile of continuation values $w_v^n$ such that $b_v^n = g(s_v^n, w_v^n)$ and $(s_v^n, w_v^n)$ is admissible w.r.t. $\mathcal{V}$.

Now, fix some $v^* \in \mathcal{V}$. We will construct a PPE $\sigma^*$ such that $v^* = v(\sigma^*)$. For that purpose, we will construct a sequence of (public) history-dependent continuation values $v_t^0 : H_t \to \mathcal{V}$ such that $v_t^0(h_t)$ does not depend on $\theta_t$. Along the equilibrium path, $\sigma^*$ is defined as a function of $v^0$:

$$\sigma_t^*(h_t) = \begin{cases} s^1_{v_t^0(h_t)} & \text{if } \theta_t \le \lambda^1_{v_t^0(h_t)}, \\ s^2_{v_t^0(h_t)} & \text{if } \lambda^1_{v_t^0(h_t)} < \theta_t \le \lambda^2_{v_t^0(h_t)}, \\ s^3_{v_t^0(h_t)} & \text{if } \theta_t > \lambda^2_{v_t^0(h_t)}. \end{cases}$$

Along the equilibrium path, $v_t^0$ is defined recursively with $v_1^0(h_1) \equiv v^*$ and:

$$v_{t+1}^0(h_{t+1}) = \begin{cases} w^1_{v_t^0(h_t)}(\xi_t) & \text{if } \theta_t \le \lambda^1_{v_t^0(h_t)}, \\ w^2_{v_t^0(h_t)}(\xi_t) & \text{if } \lambda^1_{v_t^0(h_t)} < \theta_t \le \lambda^2_{v_t^0(h_t)}, \\ w^3_{v_t^0(h_t)}(\xi_t) & \text{if } \theta_t > \lambda^2_{v_t^0(h_t)}. \end{cases}$$

After any observable deviation from $\sigma^*$, agents turn to the autarkic equilibrium with $\sigma_t^*(h_t) = (0, 0, 0)$ and $v_t^* = -\pi^0 c$ for any subsequent public history $h_t$. It is straightforward to see that $v_t^0$, and thus $\sigma_t^*$, are measurable. For every public history $h_t$ along the equilibrium path, by construction we have that

$$\begin{aligned} v_t^0(h_t) &= \sum_{n=1}^{3} \lambda^n_{v_t^0(h_t)}\, g\!\left( s^n_{v_t^0(h_t)}, w^n_{v_t^0(h_t)} \right) \\ &= E_t\!\left[ (1-\delta)\, u(\sigma_t^*(h_t)) + \delta v_{t+1}^0(h_{t+1}) \right] \\ &= (1-\delta)\, E_t\!\left[ u(\sigma_t^*(h_t)) + \delta u(\sigma_{t+1}^*(h_{t+1})) + \frac{\delta^2}{1-\delta}\, v_{t+2}^0(h_{t+2}) \right] \\ &= (1-\delta)\, E_t\!\left[ \sum_{\tau=0}^{\infty} \delta^\tau u(\sigma_{t+\tau}^*(h_{t+\tau})) \right] + \lim_{\tau \to \infty} \delta^\tau E_t\!\left[ v_{t+\tau}^0(h_{t+\tau}) \right] \\ &= v_t(\sigma^*, h_t). \end{aligned}$$

Hence, we have that $v^* = v_1^0(h_1) = v(\sigma^*)$.
Finally, since actions and continuation values are admissible at every period, we know that there are no profitable one-shot deviations. (Recall that the conditions (A) and (M) that define admissibility are precisely the requirement that there should be no profitable one-shot deviations at the avoidance and mitigation stages, respectively.) By Lemma A.3, this implies that $\sigma^*$ is a PPE.

Lemma A.5 (Factorization). $\mathcal{V}^*$ is self-generating.

Proof. Fix an arbitrary point $v^* \in \mathcal{V}^*$, and let $\sigma^*$ be the PPE that generates it. For each possible realization of the date-1 public signal $\theta_1 \in [0, 1]$, let $\sigma'|_{\theta_1}$ denote the corresponding continuation strategy from period t1 onward (assuming that there are no detectable deviations), and let $w_{\theta_1} = v(\sigma'|_{\theta_1})$. Since $\sigma^*$ is measurable, it follows that $w_{\theta_1}$ is also measurable. Since $\sigma^*$ is a PPE, we know that $w_{\theta_1} \in \mathcal{V}^*$ for all $\theta_1$. Moreover, Lemma A.3 implies that there are no profitable one-shot deviations from $\sigma^*$ in period 1. Therefore, $(\sigma_1^*(\theta_1), w_{\theta_1})$ is admissible w.r.t. $\mathcal{V}^*$ for each realization of $\theta_1$. Hence, $g(\sigma_1^*(\theta_1), w_{\theta_1}) \in B_{\sigma_1^*(\theta_1)}(\mathcal{V}^*)$. This implies that

$$v^* = \int_0^1 g\big(\sigma_1^*(\theta_1), w_{\theta_1}\big)\, d\theta_1 \;\in\; \operatorname{co}\left( \bigcup_{s} B_s(\mathcal{V}^*) \right) = B(\mathcal{V}^*),$$

thus completing the proof.

Lemma A.6. If $\mathcal{V}$ is compact, then $B(\mathcal{V})$ is compact.

Proof. Fix some $a \in \{0, 1\}$. We will start by showing that $B_a(\mathcal{V}) := \cup_m B_{a,m}(\mathcal{V})$ is compact. Consider any sequence $(v^n)$ in $B_a(\mathcal{V})$ converging to some $v^* \in \mathbb{R}^2$. By construction, there exist sequences $(m^n)$ and $(w^n)$ such that $v^n = g(a, m^n, w^n)$ and $(a, m^n, w^n)$ is admissible w.r.t. $\mathcal{V}$. Since it is contained in a compact space, the sequence $(m^n, w^n)$ has a subsequence converging to some limit $(m^*, w^*)$. Since $\mathcal{V}$ and $\mathbb{R}^2_+$ are closed, we know that $m^* \in \mathbb{R}^2_+$ and $w^* \in \mathcal{V}$. Since g is continuous, we know that $v^* = g(a, m^*, w^*)$. Since the incentive constraints (A) and (M) are defined by continuous functions, we know that $(a, m^*, w^*)$ is admissible w.r.t. $\mathcal{V}$. Hence, $v^* \in B_a(\mathcal{V})$.
Since this holds for arbitrary convergent sequences, $B_a(\mathcal{V})$ is closed. Now, since the payoffs of the stage game are all nonpositive and $\mathcal{V}$ is bounded, $B_a(\mathcal{V})$ is bounded above. Since admissibility implies that the values have to be conditionally individually rational, it is also bounded below. Hence, $B_a(\mathcal{V})$ is compact. Since a finite union of compact sets is compact, we have that $\cup_s B_s(\mathcal{V}) = \cup_a B_a(\mathcal{V})$ is compact. The result then follows from the fact that the convex hulls of compact sets are compact.

Proof of Proposition A.1. Since B is ⊆-monotone by construction, Lemma A.4 implies that $\mathcal{V}^*$ contains the union of all self-generating sets. By Lemma A.5, this implies that $\mathcal{V}^*$ is the largest self-generating set. Since $B(\mathcal{V})$ is convex for any $\mathcal{V}$ by construction, Lemma A.5 also implies that $\mathcal{V}^*$ is convex. Now, fix any bounded set $\mathcal{V}$ such that $\mathcal{V}^* \subseteq \mathcal{V}$. Let $\bar{\mathcal{V}}$ be the closure of $\mathcal{V}$, and define the sequence $\{\mathcal{V}^n\}_{n=0}^{\infty}$ by $\mathcal{V}^0 = \bar{\mathcal{V}}$ and $\mathcal{V}^{n+1} = B(\mathcal{V}^n)$ for n = 0, 1, 2, .... By definition of B and Lemma A.6, we know that $\mathcal{V}^n$ is a ⊆-decreasing sequence of compact sets and therefore has a (Hausdorff) limit $\mathcal{V}^\infty = \cap_n \mathcal{V}^n$, and this limit is compact. Since B is ⊆-monotone and $\mathcal{V}^*$ is self-generating, we know that $\mathcal{V}^* = B^n(\mathcal{V}^*) \subseteq B^n(\mathcal{V}^0) = \mathcal{V}^n$ for all n, and thus $\mathcal{V}^* \subseteq \mathcal{V}^\infty$.

It remains to show that $\mathcal{V}^\infty$ is self-generating. For this purpose, we combine the proofs of Lemmas A.4 and A.6. Consider any $v^* \in \mathcal{V}^\infty$. By construction, we know that $v^* \in B(\mathcal{V}^n)$ for all n. Therefore, there exist sequences $\left\{ (b^{nk}, \lambda^{nk}, s^{nk}, w^{nk})_{k=1}^{3} \right\}_{n=1}^{\infty}$ such that $v^* = \sum_{k=1}^{3} \lambda^{nk} b^{nk}$, $b^{nk} = g(s^{nk}, w^{nk})$, and $(s^{nk}, w^{nk})$ is admissible w.r.t. $\mathcal{V}^n$ for all n. Since it is contained in a compact space, the sequence $\{ b^{nk}, \lambda^{nk}, s^{nk}, w^{nk}(0), w^{nk}(1, m^{nk}) \}$ has a subsequence converging to some limit $(b^{*k}, \lambda^{*k}, s^{*k}, w^{*k})$. Since all the relevant sets are closed, we know that the limit belongs to the set where we want it to be. Since g is continuous, we know that $b^{*k} = g(s^{*k}, w^{*k})$.
Since the incentive constraints are defined by continuous functions, we know that $(s^{*k}, w^{*k})$ is admissible w.r.t. $\mathcal{V}^\infty$. It is straightforward to see that $v^* = \sum_{k=1}^{3} \lambda^{*k} b^{*k} \in B(\mathcal{V}^\infty)$. Therefore, $\mathcal{V}^\infty$ is self-generating and, by Lemma A.4, $\mathcal{V}^\infty \subseteq \mathcal{V}^*$.

We are now in a position to prove our claim about the restriction to pure public strategies being without loss of generality. One could extend the definition of equilibrium in the obvious way to allow for mixed strategies that depend on private information. Proposition A.2 states that the set of equilibrium payoffs would not change. The reason is that the new set would be self-generating in the original sense, and thus it would be contained in $\mathcal{V}^*$.

Proof of Proposition A.2. $\mathcal{V}^*$ is nonempty because unconditional static repetition of a Nash equilibrium of the stage game constitutes a PPE. Compactness and convexity follow directly from the first part of the proof of Proposition A.1. For δ-monotonicity, it is easy to see from the definition of B that, if $\mathcal{V}^* \subseteq \mathcal{V}$, then $B(\mathcal{V})$ is ⊆-monotone with respect to δ. Hence, the set of PPE payoffs is also ⊆-monotone with respect to δ.

Finally, it remains to argue that the restriction to pure strategies is without loss of generality. The complete proof is technical and burdensome. Hence, we only present a sketch of the proof, but a formal proof can be provided upon request. The definitions of equilibrium and $v(\sigma)$ can be easily extended to allow for mixed and private strategies in the obvious way. Let $\tilde{\mathcal{V}}$ be the corresponding set of equilibrium payoffs with the modified definitions. Fix some $v^* \in \tilde{\mathcal{V}}$, and let $\sigma^*$ be the (possibly mixed or private) strategy profile that generates it and constitutes an equilibrium. Now delegate all the randomization to $\theta_1$, define continuation values in the obvious way, and show that the resulting pairs $(\sigma|_{\theta_1}, w|_{\theta_1})$ are admissible in accordance with Definition 3 w.r.t. $\tilde{\mathcal{V}}$.
Intuitively, this occurs because $\mathbb{R}^2_+$ is convex and thus there is no need to randomize mitigation contributions. Moreover, $m_1$ and $m_2$ are chosen after observing $a$, and thus there is no need to randomize the avoidance action. This implies that $\tilde V$ is self-generating and is thus contained in $V^*$.

B. Proofs of the main results

B.1. Preliminaries

Throughout this section, we use the notation $\langle a, m_1, m_2 \rangle$ to denote the stationary strategy profile for the repeated game that consists of repeating $(a, m_1, m_2)$ in every period and after any public history. From the analysis in Section 2, we know that $(0, m_1, 1 - m_1)$ is a NE of the stage game as long as $m_1 \in [1 - c_2, c_1]$. Therefore, $\langle 0, m_1, 1 - m_1 \rangle$ is a PPE as long as $m_1 \in [1 - c_2, c_1]$. We use this fact repeatedly in the subsequent proofs.

Each agent $i$ can guarantee a minmax payoff of $-\pi^0 c_i$ by never making any positive contributions and, if $i = 1$, never taking the avoidance action. Hence, every equilibrium payoff $v \in V^*$ must satisfy the individual rationality conditions $v_i \ge -\pi^0 c_i$, for $i = 1, 2$. The set of feasible and individually rational payoffs corresponds to the shaded area in Figure 3. The two diagonal lines in the figure correspond to the feasible payoffs that can be attained with efficient mitigation with and without taking the avoidance action, respectively. The thick blue line corresponds to the set of equilibrium payoffs that can be achieved by unconditional repetitions of static Nash equilibria of the stage game.

[Figure 3: Feasible and individually rational payoffs, and stationary PPE payoffs. Axes: $v_1$ and $v_2$. Lines shown: $v_1 + v_2 = -\pi^1$, $v_1 + v_2 = -\pi^0$, $v_1 = -\pi^0 c_1$, and $v_2 = -\pi^0 c_2$. Marked points: $u(1, c_1, 1 - c_1)$, $u(0, c_1, 1 - c_1)$, $u(0, c_2, 1 - c_2)$, and $u(1, c_2, 1 - c_2)$.]

B.2. Proof of Proposition 4.1

Let $\sigma^* = (\alpha^*, \mu_1^*, \mu_2^*)$ be a PPE and fix a history $h_t$ with $\alpha_t^*(h_t) = 1$ (if there are no such PPE and histories, then the proposition is true).
For $\sigma^*$ to be an equilibrium, it must satisfy the feasibility and incentive constraints from Section A.1. First, the feasibility condition (F) implies that

$$v_{1t}(\sigma^*, h_t) = -(1-\delta)d + \pi^1\left[(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1)\right] + (1-\pi^1)\delta w_1^*(0), \tag{8}$$

where $m_i^* = \mu_{it}^*(h_t)$ is agent $i$'s equilibrium mitigation contribution for date $t$ according to $\sigma^*$, and $w_i^*(\xi)$ are his equilibrium continuation values. The avoidance constraint (A) can be written as

$$(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1) \le \delta w_1^*(0) - (1-\delta)\hat d. \tag{9}$$

Combining this constraint with the mitigation constraint (M), we have that

$$\delta w_1^*(0) - (1-\delta)\hat d \ge -(1-\delta+\delta\pi^0)c_1 \;\Rightarrow\; \delta\pi^0\hat d \ge -\delta w_1^*(0) - (1-\delta+\pi^0\delta)\left(c_1 - \hat d\right). \tag{10}$$

Solving for $(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1)$ in (8), substituting in (9), and rearranging terms yields

$$\delta\pi^0\hat d \le \left(\frac{\delta^2}{1-\delta}\right) w_1^*(0) - \left(\frac{\delta}{1-\delta}\right) v_{1t}(\sigma^*, h_t). \tag{11}$$

Combining (10) and (11) and doing some more algebra, we obtain

$$w_1^*(0) \ge v_{1t}(\sigma^*, h_t) + \gamma, \qquad \gamma := \left(\frac{1-\delta}{\delta}\right)(1-\delta+\pi^0\delta)\left(\hat d - c_1\right). \tag{12}$$

Hence, we have established that whenever the active agent chooses $a = 1$ in equilibrium and there is no crisis, his expected value must increase by at least a fixed amount $\gamma$. The assumption $\hat d > c_1$ guarantees that $\gamma > 0$. This implies that, if there is no crisis for $n$ subsequent periods and the active agent keeps choosing $a = 1$ with probability 1, then we must have $w_1^n \ge v_{1t}(\sigma^*, h_t) + n\gamma$, where $w_1^n$ is agent 1's expected discounted value at period $t + n$. Since the set of feasible payoffs of agent 1 is bounded above by 0, it must be the case that after a long enough history of no crisis, agent 1 takes the nonavoidance action. Otherwise, $w_1^n$ would be greater than 0. On the other hand, since $\pi^1, \pi^0 \in (0, 1)$, any finite sequence of no crisis occurs with positive probability. Therefore, taking the avoidance action at every period with probability 1 cannot be part of a PPE.

B.3. Proof of Proposition 4.2

The central step of the proof of Proposition 4.2 is to establish Lemma B.1 below.
The lemma can be understood as follows. In equilibrium, starting from a history where either crises are not mitigated ($m_1 + m_2 < 1$) or money is burned ($m_1 + m_2 > 1$), it is possible to make a Pareto improvement that makes agent 2 strictly better off, while keeping the payoff of agent 1 constant.

Lemma B.1. If $(s^0, w^0)$ is admissible w.r.t. the set of PPE payoffs $V^*$ and $m_1^0 + m_2^0 \neq 1$, then $v' \in V^*$ exists such that $v_1' = g_1(s^0, w^0)$ and $v_2' > g_2(s^0, w^0)$, where $g_1$ and $g_2$ are defined as in (4).

Proof. There are three different cases to consider.

Case 1. Suppose that crises are not mitigated, i.e., $m_1^0 + m_2^0 < 1$. Consider the alternative action profile $s' = (a', m_1', m_2')$ with $a' = a^0$, $m_1' = m_1^0 + c_1$, and $m_2' = \max\{0, 1 - m_1'\}$. Note that $m_1' + m_2' \ge 1$, that is, crises are mitigated according to the new action profile. Hence, the cost in case of a crisis for the active agent remains unchanged, that is, $k_1(m_1', m_2') = -m_1' = -m_1^0 - c_1 = k_1(m_1^0, m_2^0)$, where $k_1$ is the crisis cost function as defined in (5). In contrast, the cost in case of a crisis for the passive agent goes down since

$$k_2(m_1', m_2') = -\max\{0, 1 - m_1'\} \ge -(1 - c_1) > -c_2 \ge k_2(m_1^0, m_2^0),$$

where the strict inequality follows from Assumption 1. This implies that $g_1(s', w^0) = g_1(s^0, w^0)$ and $g_2(s', w^0) > g_2(s^0, w^0)$. Moreover, since the mitigation contributions only enter the incentive constraints (A) and (M) through $k_i$, the pair $(s', w^0)$ is admissible w.r.t. $V^*$. Thus, by Proposition A.1, $g(s', w^0) \in V^*$.

Case 2. Suppose that the passive agent burns money in case of a crisis, i.e., $m_1^0 + m_2^0 > 1$ and $m_2^0 > 0$. Consider the alternative action profile $s' = (a', m_1', m_2')$ with $a' = a^0$, $m_1' = m_1^0$, and $m_2' = \max\{0, 1 - m_1'\}$. As in the previous case, we have $k_1(m_1', m_2') = k_1(m_1^0, m_2^0)$ and $k_2(m_1', m_2') > k_2(m_1^0, m_2^0)$. Moreover, this implies that $g(s', w^0) \in V^*$, $g_1(s', w^0) = g_1(s^0, w^0)$, and $g_2(s', w^0) > g_2(s^0, w^0)$.
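The replacements used in Cases 1 and 2 can be illustrated numerically. The sketch below is not part of the original proof. It assumes that the crisis-cost functions in (5) take the form $k_i(m_1, m_2) = -m_i$ when $m_1 + m_2 \ge 1$ and $-m_i - c_i$ otherwise, which is how they are used in the text, and it takes the passive agent's adjusted contribution to be the smallest nonnegative amount that completes mitigation, $\max\{0, 1 - m_1'\}$. The function names and parameter values are hypothetical.

```python
import math


def crisis_costs(m1, m2, c1, c2, tol=1e-12):
    """Assumed form of (5): each agent pays his own contribution,
    plus his crisis loss c_i when total contributions fall short of 1."""
    mitigated = m1 + m2 >= 1.0 - tol
    k1 = -m1 if mitigated else -m1 - c1
    k2 = -m2 if mitigated else -m2 - c2
    return k1, k2


def case1_profile(m1, c1):
    """Case 1 (m1 + m2 < 1): agent 1 contributes c1 more and
    agent 2 covers whatever is still missing, if anything."""
    m1_new = m1 + c1
    return m1_new, max(0.0, 1.0 - m1_new)


def case2_profile(m1):
    """Case 2 (m1 + m2 > 1, m2 > 0): agent 2 stops over-contributing."""
    return m1, max(0.0, 1.0 - m1)


# Hypothetical parameters with c1 + c2 > 1 and c2 < 1.
c1, c2 = 0.7, 0.6

# Case 1: start from an unmitigated crisis.
old1 = crisis_costs(0.1, 0.2, c1, c2)
new1 = crisis_costs(*case1_profile(0.1, c1), c1, c2)

# Case 2: start from a profile where agent 2 burns money.
old2 = crisis_costs(0.5, 0.9, c1, c2)
new2 = crisis_costs(*case2_profile(0.5), c1, c2)

for label, old, new in (("case 1", old1, new1), ("case 2", old2, new2)):
    same_k1 = math.isclose(old[0], new[0])
    print(label, "k1 unchanged:", same_k1, "k2 improves:", new[1] > old[1])
```

In both cases the active agent's crisis cost is unchanged while the passive agent's cost strictly falls, which is the property the proof relies on.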
Case 3. Suppose that only the active agent burns money in case of a crisis, i.e., $m_1^0 > 1$ and $m_2^0 = 0$. We begin by showing that, in this case, the equilibrium continuation value for the passive agent in case of a crisis cannot be the maximum equilibrium value, i.e., we must have $w_2^0(1) < \max\{v_2 \mid v \in V^*\}$. For that purpose, consider the alternative continuation value profile $w'$ and the alternative action profile $s' = (a', m_1', m_2')$ with $a' = 1$, $m_1' = m_1^0$, $m_2' = 0$, and $w_i'(\xi) = w_i^0(1)$ for $i = 1, 2$ and $\xi = 0, 1$. Since $w_1'(1) = w_1'(0)$, the avoidance constraint (A) for $(s', w')$ can be written as $\hat d \le -k_1(m_1^0, m_2^0)$. It is satisfied because $-k_1(m_1^0, m_2^0) = m_1^0 > 1$, and Assumption 2 requires that $\hat d < 1$. The mitigation contributions and continuation values after a crisis are the same under $(s', w')$ and $(s^0, w^0)$. Hence, we know that $(s', w')$ satisfies the mitigation constraints (M) for both agents. Hence, $(s', w')$ is admissible w.r.t. $V^*$ and, by Proposition A.1, $g(s', w') \in V^*$. Since $w_2'(0) = w_2'(1) = w_2^0(1)$, it follows that $g_2(s', w') = (1-\delta)u_2(s') + \delta w_2^0(1) = \delta w_2^0(1)$. Now, it is easy to show that there are no equilibria where agent 1 always mitigates crises on his own. This implies that $w_2^0(1) < 0$ and, therefore, $g_2(s', w') > w_2^0(1)$. Hence, $v^* \in V^*$ exists such that $v_2^* > w_2^0(1)$.

Now, we can return to showing that $(s^0, w^0)$ is inefficient. For each $\varepsilon \in (0, 1)$, consider the alternative continuation value profile $w^\varepsilon$ and the alternative action profile $s^\varepsilon = (a^\varepsilon, m_1^\varepsilon, m_2^\varepsilon)$ with $a^\varepsilon = a^0$, $m_2^\varepsilon = 0$,

$$m_1^\varepsilon = m_1^0 + \frac{\delta}{1-\delta}\,\varepsilon\left(v_1^* - w_1^0(1)\right),$$

$w_i^\varepsilon(0) = w_i^0(0)$, and $w_i^\varepsilon(1) = (1-\varepsilon)w_i^0(1) + \varepsilon v_i^*$, for $i = 1, 2$. Now, fix any $\varepsilon$ sufficiently small so that $m_1^\varepsilon > 1$. Note that $m_1^\varepsilon$ was chosen specifically so that

$$-(1-\delta)m_1^\varepsilon + \delta w_1^\varepsilon(1) = -(1-\delta)\left[m_1^0 + \frac{\delta}{1-\delta}\,\varepsilon\left(v_1^* - w_1^0(1)\right)\right] + \delta\left[(1-\varepsilon)w_1^0(1) + \varepsilon v_1^*\right] = -(1-\delta)m_1^0 + \delta w_1^0(1).$$

Hence, both the avoidance constraint (A) and the mitigation constraint (M) for the active agent are satisfied by $(s^\varepsilon, w^\varepsilon)$, and $g_1(s^\varepsilon, w^\varepsilon) = g_1(s^0, w^0)$. As for the passive agent, since $k_2(m_1^\varepsilon, m_2^\varepsilon) = k_2(m_1^0, m_2^0)$, $w_2^\varepsilon(0) = w_2^0(0)$, and $w_2^\varepsilon(1) > w_2^0(1)$, we know that his mitigation constraint is satisfied by $(s^\varepsilon, w^\varepsilon)$, and $g_2(s^\varepsilon, w^\varepsilon) > g_2(s^0, w^0)$. Also, by Proposition A.2, we know that $V^*$ is convex and thus $w^\varepsilon(1) = (w_1^\varepsilon(1), w_2^\varepsilon(1)) \in V^*$. Hence, $(s^\varepsilon, w^\varepsilon)$ is admissible w.r.t. $V^*$ and thus, by Proposition A.1, $g(s^\varepsilon, w^\varepsilon) \in V^*$.

With Lemma B.1, it is easy to prove Proposition 4.2.

Proof of Proposition 4.2. Let $\sigma^*$ be a PPE, and let $H^0 \subseteq \cup_{t=1}^{\infty} H_t$ be the (possibly empty) set of public histories $h_t$ such that (a) $\mu_{1t}^*(h_t) + \mu_{2t}^*(h_t) \neq 1$ and (b) $\mu_{1t'}^*(h'_{t'}) + \mu_{2t'}^*(h'_{t'}) = 1$ for every public history $h'_{t'}$ that is a predecessor of $h_t$. By Lemma B.1, we know that for every such history $h_t \in H^0$, a strategy profile $\sigma^{h_t}$ exists such that $v_1(\sigma^{h_t}) = v_{1t}(\sigma^*, h_t)$ and $v_2(\sigma^{h_t}) > v_{2t}(\sigma^*, h_t)$. Let $\sigma'$ be the strategy profile that mimics $\sigma^*$ until it reaches a public history $h_t \in H^0$ and follows $\sigma^{h_t}$ from then onward (treating $h_t$ as the empty history). Since continuation values for the active agent remain unchanged, and continuation values for the passive agent only go up, it follows that $\sigma'$ is a PPE. Finally, if $H^0$ is nonempty and is reached with positive probability, then $v_1(\sigma') = v_1(\sigma^*)$ and $v_2(\sigma') > v_2(\sigma^*)$.

B.4. Proof of Proposition 4.3

Proof. We will show that in a PPE where there are no bailouts, the active agent takes the no-avoidance action almost surely. Consider a strategy profile with no bailouts, i.e., such that for almost every history $h_t$, either $\mu_{1t}(h_t) \ge c_1$ or $\mu_{1t}(h_t) + \mu_{2t}(h_t) < 1$. This implies that $u_1(\sigma_t(h_t)) \le -d - \pi^1 c_1$ for histories with $\alpha_t(h_t) = 1$, and $u_1(\sigma_t(h_t)) \le -\pi^0 c_1$ for histories with $\alpha_t(h_t) = 0$.
Assumption 2 implies that $-d - \pi^1 c_1 < -\pi^0 c_1$, and thus $v_1(\sigma) \le -\pi^0 c_1$, with strict inequality whenever $\alpha_t(h_t) = 1$ for some set of histories $\{h_t\}$ that is reached with positive probability along the equilibrium path. Individual rationality requires $v_1(\sigma) \ge -\pi^0 c_1$. Hence, a strategy profile with no bailouts can satisfy individual rationality only if $\alpha_t(h_t) = 0$ almost surely along the equilibrium path.

B.5. Proof of Proposition 4.4

The proof of Proposition 4.4 makes use of three lemmas. Lemma B.2 simply asserts that, whenever it is possible to have avoidance, it is possible to do it in a way that dominates the best static Nash equilibria of the stage game in terms of total cost.

Lemma B.2. If it is possible for agent 1 to choose the avoidance action at least once in at least one PPE, then a PPE exists with total cost less than $\pi^0$, i.e., if a PPE $\sigma^*$ and a public history $h_t$ exist such that $\alpha_t^*(h_t) = 1$, then $(v_1, v_2) \in V^*$ exists such that $v_1 + v_2 > -\pi^0$.

Proof. Suppose it is possible for agent 1 to choose the avoidance action at least once in at least one PPE. Then, by Proposition A.1, there exist a profile of continuation values $w^*$ and a profile of actions $s^* = (a^*, m_1^*, m_2^*)$ with $a^* = 1$ such that $(s^*, w^*)$ is admissible w.r.t. $V^*$. If either $w_1^*(0) + w_2^*(0) > -\pi^0$ or $w_1^*(1) + w_2^*(1) > -\pi^0$, then the proof is complete. Hence, for the rest of the proof, we assume that $w_1^*(0) + w_2^*(0) \le -\pi^0$ and $w_1^*(1) + w_2^*(1) \le -\pi^0$.

Individual rationality implies that $w_i^*(0) \ge -\pi^0 c_i$ for $i = 1, 2$. Hence, we know that $-\pi^0 c_1 \le w_1^*(0) \le -\pi^0(1 - c_2)$. This implies that $m_1^0 \in [1 - c_2, c_1]$ exists such that $w_1^*(0) = -\pi^0 m_1^0$ (see Figure 3). Now, consider the alternative action profile $s' = (1, c_1, 1 - c_1)$ and the alternative continuation value profile $w'$ with $w_1'(1) = -\pi^0 c_1$, $w_2'(1) = -\pi^0(1 - c_1)$, $w_1'(0) = -\pi^0 m_1^0$, and $w_2'(0) = -\pi^0(1 - m_1^0)$.
Since $\langle 0, c_1, 1 - c_1 \rangle$ and $\langle 0, m_1^0, 1 - m_1^0 \rangle$ are PPE, we know that $w'(\xi) \in V^*$ for $\xi = 0, 1$. Also, it is straightforward to verify that $(s', w')$ satisfies the mitigation constraints (M) for both agents. In order to show that $(s', w')$ satisfies the avoidance constraint (A), first consider the pair $(s^*, w^*)$. Since $(s^*, w^*)$ is admissible w.r.t. $V^*$, it must satisfy the mitigation constraint

$$(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1) \ge -(1-\delta+\delta\pi^0)c_1,$$

and the avoidance constraint

$$(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1) \le \delta w_1^*(0) - (1-\delta)\hat d.$$

Together, these two constraints imply that

$$-(1-\delta+\delta\pi^0)c_1 \le \delta w_1^*(0) - (1-\delta)\hat d,$$

which is precisely the avoidance constraint for $(s', w')$. We have shown that $(s', w')$ is admissible w.r.t. $V^*$. By Proposition A.1, this implies that $g(s', w') \in V^*$. Finally, note that

$$g_1(s', w') + g_2(s', w') = (1-\delta)(-d - \pi^1) - \delta\pi^0 > -\pi^0,$$

where the inequality follows from Assumption 2.

Lemma B.3 is the crucial step of the proof. It states that if it is possible to have avoidance then, starting from any nonavoidance PPE, it is possible to improve the payoff of the passive agent without affecting the payoff of the active agent. Since continuation values of efficient PPE must always lie on the upper boundary of $V^*$, this implies that avoidance must happen infinitely often. By Proposition 4.3, this implies that bailouts must happen infinitely often as well.

Lemma B.3. If a PPE exists with total cost less than $\pi^0$, then every PPE without avoidance is Pareto dominated by a different PPE that increases the payoff of agent 2 while keeping the payoff of agent 1 unchanged, i.e., for every PPE $\sigma^0$ such that $\alpha_t^0(h_t) = 0$ almost surely, $v' \in V^*$ exists such that $v_1' = v_1(\sigma^0)$ and $v_2' > v_2(\sigma^0)$.

Proof. Further below we will show that $v^1, v^2 \in V^*$ exist such that $v_i^i = -\pi^0 c_i$ and $v_{-i}^i > -\pi^0(1 - c_i)$, for $i = 1, 2$. For now, take this fact as given, and let $\sigma^0$ be a PPE without avoidance.
Since $\alpha_t^0(h_t) = 0$ for all $h_t$, we know that $v_1(\sigma^0) + v_2(\sigma^0) \le -\pi^0$. Moreover, individual rationality implies that $-\pi^0 c_1 \le v_1(\sigma^0) \le -\pi^0(1 - c_2)$. Therefore, some $\mu \in (0, 1)$ exists such that $v_1(\sigma^0) = \mu v_1^1 + (1 - \mu)v_1^2$. Let $v^\mu = \mu v^1 + (1 - \mu)v^2$. By construction, we know that $v_1^\mu + v_2^\mu > -\pi^0$, which implies that $v_2^\mu > v_2(\sigma^0)$ (see Figure 4). The result then follows because $V^*$ is convex (Proposition A.2) and, consequently, $v^\mu \in V^*$.

It only remains to show the existence of $v^1$ and $v^2$. We will only show existence of $v^1$. The proof for $v^2$ is analogous. Let $v^* \in V^*$ be a PPE payoff with total expected discounted cost less than $\pi^0$, i.e., such that $v_1^* + v_2^* > -\pi^0$. Individual rationality implies that $v_1^* \ge -\pi^0 c_1$. If $v_1^* = -\pi^0 c_1$, then we can simply set $v^1 = v^*$. For the rest of the proof, we consider the case that $v_1^* = -\pi^0 c_1 + \Delta$ for some $\Delta > 0$.

[Figure 4: PPE without avoidance result in payoff profiles within the shaded area, all of which are dominated by convex combinations of $v^1$ and $v^2$. Axes: $v_1$ and $v_2$. Lines shown: $v_1 + v_2 = -\pi^1$, $v_1 + v_2 = -\pi^0$, $v_1 = -\pi^0 c_1$, and $v_2 = -\pi^0 c_2$. Marked points: $v^1$, $v^2$, $v^\mu = \mu v^1 + (1 - \mu)v^2$, and $v(\sigma^0)$.]

Fix any $\lambda \in (0, \bar\lambda)$, where

$$\bar\lambda := \min\left\{1, \frac{1-\delta}{\delta\Delta}(1 - c_1)\right\} > 0.$$

Let $v^\lambda = (1 - \lambda)u(0, c_1, 1 - c_1) + \lambda v^*$. In particular, this implies that $v_1^\lambda = -\pi^0 c_1 + \lambda\Delta$. Since $\langle 0, c_1, 1 - c_1 \rangle$ is a PPE and $V^*$ is convex (Proposition A.2), we know that $v^\lambda \in V^*$ for all $\lambda \in (0, 1)$. Consider the action profile $s^\lambda = (a^\lambda, m_1^\lambda, m_2^\lambda)$, with $a^\lambda = 0$, $m_1^\lambda = c_1 + \varepsilon$, and $m_2^\lambda = 1 - c_1 - \varepsilon$, where

$$\varepsilon := \left(\frac{\delta}{1-\delta}\right)\lambda\Delta > 0.$$

Also, consider the profile of continuation values $w^\lambda$ with $w_i^\lambda(0) = u_i(0, c_1, 1 - c_1)$ and $w_i^\lambda(1) = v_i^\lambda$, for $i = 1, 2$. In words, $(s^\lambda, w^\lambda)$ represents the following plan (see Figure 5). If there is no crisis, then the play transitions to $\langle 0, c_1, 1 - c_1 \rangle$ forever. If a crisis occurs, the play transitions to a mixture of $\langle 0, c_1, 1 - c_1 \rangle$ and the strategies that generate $v^*$.
As we show below, the value of $\varepsilon$ is carefully selected to guarantee that all the efficiency gains from avoidance go to agent 2, i.e., in order to have $g_1(s^\lambda, w^\lambda) = -\pi^0 c_1$.

[Figure 5: $v^1$ is a mixture of $u(0, c_1 + \varepsilon, 1 - c_1 - \varepsilon)$ in the first period, $u(0, c_1, 1 - c_1)$ from the second period onward if there is no crisis, and $v^\lambda$ from the second period onward in case of a crisis. Lines shown: $v_1 = -\pi^0 c_1$ and $v_1 + v_2 = -\pi^0$. Marked points: $u(0, c_1 + \varepsilon, 1 - c_1 - \varepsilon)$, $v^1$, $u(0, c_1, 1 - c_1)$, $v^\lambda$, and $v^*$.]

The condition $\lambda < \bar\lambda$ guarantees that $c_1 + \varepsilon < 1$, so that $m_1^\lambda + m_2^\lambda = 1$ and $m_2^\lambda$ is a static best response to $m_1^\lambda$. This implies that agent 2's mitigation constraint (M) for $(s^\lambda, w^\lambda)$ is satisfied. Also, it implies that

$$(1-\delta)k_1(m_1^\lambda, m_2^\lambda) + \delta w_1^\lambda(1) = -(1-\delta)(c_1 + \varepsilon) + \delta v_1^\lambda = -(1-\delta+\delta\pi^0)c_1.$$

This implies that agent 1's mitigation constraint and his avoidance constraint (A) are also satisfied, and that $g_1(s^\lambda, w^\lambda) = -\pi^0 c_1$. Therefore, $(s^\lambda, w^\lambda)$ is admissible w.r.t. $V^*$ and, by Proposition A.1, $g(s^\lambda, w^\lambda) \in V^*$. Finally, note that

$$g_1(s^\lambda, w^\lambda) + g_2(s^\lambda, w^\lambda) = \pi^0\left[-(1-\delta) + \delta\left(v_1^\lambda + v_2^\lambda\right)\right] - (1-\pi^0)\delta\pi^0 > -\pi^0,$$

where the inequality uses $v_1^\lambda + v_2^\lambda > -\pi^0$. Since $g_1(s^\lambda, w^\lambda) = -\pi^0 c_1$, this implies that $g_2(s^\lambda, w^\lambda) > -\pi^0(1 - c_1)$, thus completing the proof.

Finally, Lemma B.4 shows that it is not possible to have avoidance when the discount factor is very low, and it is possible to do so when it is very high. This, together with the monotonicity of the set of PPE payoffs with respect to the discount factor, implies the existence of the threshold $\delta^*$, strictly between 0 and 1, separating a region where no avoidance is possible from a region where avoidance and bailouts happen infinitely often in all efficient equilibria.

Lemma B.4. There exist numbers $0 < \underline{\delta} < \bar\delta < 1$ such that if $\delta \le \underline{\delta}$, then there is no avoidance in any PPE, and if $\delta > \bar\delta$, then a PPE exists where the avoidance action is played with positive probability along the equilibrium path.

Proof.
Let $\bar\delta$ and $\underline{\delta}$ be the bounds for the discount factor given by

$$\bar\delta := \frac{\hat d - c_1}{\hat d - c_1 + \pi^0(c_1 + c_2 - 1)} \qquad \text{and} \qquad \underline{\delta} := \frac{\hat d - c_1}{\hat d - c_1 + \pi^0 c_1}.$$

Assumptions 1 and 2 require that $\hat d > c_1$, $c_1 + c_2 > 1$, and $c_2 < 1$. These conditions imply that $0 < \underline{\delta} < \bar\delta < 1$.

We begin by showing that, if $\delta > \bar\delta$, then a PPE exists where the avoidance action is played with positive probability. Let $\sigma^0$ be the public strategy profile described as follows. In the first period, $\sigma_1^0(h_1) = (1, c_1, 1 - c_1)$ for all $h_1 \in H_1$. If a crisis occurs in the first period, then $\sigma_t^0(h_t) = (0, c_1, 1 - c_1)$ for every subsequent history $h_t$. Otherwise, if there is no crisis in period one, then $\sigma_t^0(h_t) = (0, 1 - c_2, c_2)$ for every subsequent history $h_t$. We will show that $\sigma^0$ is a PPE as long as $\delta \ge \bar\delta$.

For $t > 1$, $\sigma^0$ consists of unconditional repetition of static Nash equilibria of the stage game. Hence, the continuation strategies are PPE, and we only need to check the incentive constraints from Section A.1 for date $t = 1$. If there is a crisis in period 1, the agents' contributions are static mutual best responses and do not affect the continuation value. Therefore, the mitigation constraints (M) are satisfied. The avoidance constraint (A) can be written as

$$0 \ge \hat d + k_1(c_1, 1 - c_1) - \frac{\delta}{1-\delta}\left[u_1(0, 1 - c_2, c_2) - u_1(0, c_1, 1 - c_1)\right] = \hat d - c_1 - \frac{\delta}{1-\delta}\,\pi^0(c_1 + c_2 - 1).$$

Rearranging terms, it is straightforward to show that this is equivalent to $\delta \ge \bar\delta$.

Now, we will show that if $\delta \le \underline{\delta}$, then there cannot be any avoidance in any PPE. For that purpose, suppose that $\sigma^*$ is a PPE with $\alpha_t^*(h_t) = 1$ for some history $h_t$. As in the proof of Proposition 4.1, the mitigation (M) and avoidance (A) constraints for the active agent after $h_t$ can be written as

$$(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1) \ge -(1-\delta+\delta\pi^0)c_1,$$
$$(1-\delta)k_1(m_1^*, m_2^*) + \delta w_1^*(1) \le \delta w_1^*(0) - (1-\delta)\hat d.$$

Together, they imply that

$$(1-\delta)\hat d \le (1-\delta+\delta\pi^0)c_1 + \delta w_1^*(0) \le (1-\delta+\delta\pi^0)c_1,$$

where the last inequality follows from $w_1^*(0) \le 0$.
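As a sanity check on the two bounds, the sketch below evaluates them for one hypothetical parameterization and confirms that each incentive condition changes sign at the corresponding threshold. It reads the bounds as $\bar\delta = (\hat d - c_1)/(\hat d - c_1 + \pi^0(c_1 + c_2 - 1))$ and $\underline{\delta} = (\hat d - c_1)/(\hat d - c_1 + \pi^0 c_1)$. The function names and the numbers are illustrative assumptions, not taken from the paper.

```python
def delta_upper(d_hat, c1, c2, pi0):
    """Upper bound: sigma^0 satisfies the avoidance constraint iff delta >= this."""
    gap = d_hat - c1
    return gap / (gap + pi0 * (c1 + c2 - 1.0))


def delta_lower(d_hat, c1, pi0):
    """Lower bound: avoidance in any PPE requires delta >= this."""
    gap = d_hat - c1
    return gap / (gap + pi0 * c1)


def avoidance_slack(delta, d_hat, c1, c2, pi0):
    """Slack in (A) for sigma^0 at t = 1, written so that >= 0 means satisfied."""
    return delta / (1.0 - delta) * pi0 * (c1 + c2 - 1.0) - (d_hat - c1)


def necessary_slack(delta, d_hat, c1, pi0):
    """(1 - delta + delta*pi0)*c1 - (1 - delta)*d_hat; a negative value
    rules out avoidance in any PPE."""
    return (1.0 - delta + delta * pi0) * c1 - (1.0 - delta) * d_hat


# Hypothetical parameters: c1 < d_hat < 1, c1 + c2 > 1, c2 < 1.
d_hat, c1, c2, pi0 = 0.8, 0.7, 0.6, 0.5

hi = delta_upper(d_hat, c1, c2, pi0)
lo = delta_lower(d_hat, c1, pi0)
print("lower bound:", round(lo, 4), "upper bound:", round(hi, 4))

# Each condition flips sign on either side of its own threshold.
print(avoidance_slack(hi - 0.05, d_hat, c1, c2, pi0) < 0,
      avoidance_slack(hi + 0.05, d_hat, c1, c2, pi0) > 0)
print(necessary_slack(lo - 0.05, d_hat, c1, pi0) < 0,
      necessary_slack(lo + 0.05, d_hat, c1, pi0) > 0)
```

For discount factors between the two bounds the lemma is silent, which is why the proof of Proposition 4.4 works with the infimum of the set of discount factors at which avoidance is possible.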
After some simple algebra, the condition $(1-\delta)\hat d \le (1-\delta+\delta\pi^0)c_1$ is equivalent to $\delta \ge \underline{\delta}$. Hence, in order for a PPE with avoidance to exist, it cannot be the case that $\delta < \underline{\delta}$.

Now, we are in a position to prove Proposition 4.4.

Proof of Proposition 4.4. Let $D \subseteq (0, 1)$ be the set of discount factors where it is possible to play avoidance, and let $\tilde\delta = \inf D$. Lemma B.4 implies that $D \neq \emptyset$. Lemma B.2 implies that for any $\delta \in D$, a PPE exists with total cost below $\pi^0$. Since the set of PPE payoffs is monotone with respect to the discount factor (Proposition A.2), the same is true for every $\delta' > \tilde\delta$. Since obtaining a cost below $\pi^0$ requires avoidance, this implies that $(\tilde\delta, 1) \subseteq D$. Lemma B.4 implies that $0 < \underline{\delta} \le \tilde\delta \le \bar\delta < 1$.

For the remainder of the proof, fix any $\delta > \tilde\delta$, and let $\sigma^*$ be a PPE given $\delta$. We are interested in the first time after which there is no more avoidance according to $\sigma^*$. For that purpose, let $H^0$ be the (possibly empty) set of histories $h_t^0$ after which there is no more avoidance, i.e., such that $\alpha_{t'}^*(h'_{t'}) = 0$ for every $h'_{t'}$ that follows $h_t^0$. Let $H' \subseteq H^0$ be the set of histories $h'_t$ such that there is no $h_t^0 \in H^0$ that is a strict predecessor of $h'_t$.

For every history $h' \in H'$, let $\sigma^{h'}$ be the continuation strategies from $h'$ onward. Since $\delta \in D$, we know by Lemma B.2 that the conclusion of Lemma B.3 applies. Hence, $v^{h'} \in V^*$ exists such that $v_1^{h'} = v_1(\sigma^{h'})$ and $v_2^{h'} > v_2(\sigma^{h'})$.

Now, let $\sigma'$ be the strategy profile that mimics $\sigma^*$ before it reaches any history $h' \in H'$, and then follows the strategies that support $v^{h'}$ instead of $\sigma^{h'}$. Since the continuation values after these histories for the active agent remain unchanged, and the continuation values for the passive agent increase, all the previous incentive constraints are still satisfied. Hence, $\sigma'$ is a PPE that (weakly) Pareto dominates $\sigma^*$, and strictly Pareto dominates it if $H'$ is reached with positive probability.
Therefore, we can conclude that if $\sigma^*$ is constrained efficient, then there is avoidance infinitely often almost surely. By Proposition 4.3, this implies that there are also bailouts infinitely often almost surely.

B.6. Proofs for the observable-actions model

Proof of Proposition 6.1. Proposition 6.1 is the analogue of Proposition 4.3 for the observable-actions case. Note that the proof of Proposition 4.3 did not make use of the fact that avoidance actions were unobservable. Hence, a completely analogous argument serves to prove Proposition 6.1.

Proof of Proposition 6.2. Let $\tilde\delta' \in (0, 1)$ be the threshold given by

$$\tilde\delta' := \frac{d + \pi^1(1 - c_2) - \pi^0 c_1}{d + \pi^1(1 - c_2) - \pi^0 c_1 + \pi^1\left(\pi^0(c_1 + c_2) - \pi^1 - d\right)}. \tag{13}$$

Let $\sigma^*$ be the grim-trigger strategy profile described in Section 6.1. We will show that, if $\delta \ge \tilde\delta'$, then $\sigma^*$ is a PPE of the repeated game with observable avoidance actions. The single-deviation principle still applies, and hence it suffices to search for deviations at a single period. The strategies after a deviation are a repetition of a static Nash equilibrium of the stage game, and thus constitute a PPE.

Along the equilibrium path, it is optimal for the active agent to choose the avoidance action if and only if

$$-(d + \pi^1 m_1^*) \ge -\pi^0 c_1.$$

The mitigation incentive constraint for agent 1 is

$$-(1-\delta)m_1^* - \delta(d + \pi^1 m_1^*) \ge -(1-\delta)c_1 - \delta\pi^0 c_1,$$

and for agent 2 it is

$$-(1-\delta)m_2^* - \delta\pi^1 m_2^* \ge -(1-\delta)c_2 - \delta\pi^0 c_2,$$

where $m_2^* = 1 - m_1^*$. The mitigation constraint for agent 2 is satisfied if and only if

$$m_2^* \le \left(\frac{1-\delta+\delta\pi^0}{1-\delta+\delta\pi^1}\right)c_2.$$

Hence, we can choose $m_2^*$ to satisfy this condition with equality, and incentives for agent 2 are automatically satisfied. Then, using the fact that $m_1^* + m_2^* = 1$, we have that

$$m_1^* = 1 - \left(\frac{1-\delta+\delta\pi^0}{1-\delta+\delta\pi^1}\right)c_2 < 1 - c_2 < c_1.$$

This implies that $c_1 - m_1^* > 0$, and thus, after some simple algebra, the mitigation constraint for agent 1 is satisfied whenever the avoidance constraint is satisfied.
This implies that the proposed strategy profile is a PPE if and only if

$$d + \pi^1 - \pi^1\left(\frac{1-\delta+\delta\pi^0}{1-\delta+\delta\pi^1}\right)c_2 \le \pi^0 c_1,$$

which, after some algebra, is equivalent to $\delta \ge \tilde\delta'$.

B.7. Proofs for the monetary-transfers model

Proof of Proposition 6.3. Proposition 6.3 is the analogue of Proposition 4.3 for the monetary-transfers model. The argument of the proof is analogous. Consider a strategy profile with no bailouts or monetary compensation, i.e., such that for almost every history $h_t$, $\beta_t^0(h_t) = 0$ and $\beta_t^1(h_t) - \mu_{1t}(h_t) \le -c_1$. This implies that the per-period utility for the active agent is less than or equal to $-d - \pi^1 c_1$ for histories with $\alpha_t(h_t) = 1$, and less than or equal to $-\pi^0 c_1$ for histories with $\alpha_t(h_t) = 0$. Assumption 2 implies that $-d - \pi^1 c_1 < -\pi^0 c_1$, and thus $v_1(\sigma) \le -\pi^0 c_1$, with strict inequality whenever $\alpha_t(h_t) = 1$ for some set of histories $\{h_t\}$ that is reached with positive probability along the equilibrium path. Individual rationality requires $v_1(\sigma) \ge -\pi^0 c_1$. Hence, a strategy profile with no bailouts can satisfy individual rationality only if $\alpha_t(h_t) = 0$ almost surely along the equilibrium path.

Proof of Proposition 6.4. Let $\sigma^*$ be the grim-trigger strategy profile described in Section 6.2, with the constants $b^*$ and $m_1^*$ taking the values

$$m_1^* := c_1 + \delta(1 - \pi^0)(\hat d - c_1) \qquad \text{and} \qquad b^* := (1 - \delta + \delta\pi^0)(\hat d - c_1). \tag{14}$$

Assumption 2 guarantees that $m_1^* \in (0, 1)$ and $b^* > 0$. We will show that, if $\delta \ge \tilde\delta''$, then $\sigma^*$ is a PPE of the repeated game with monetary transfers. The single-deviation principle still applies. The strategies after a detectable deviation are a repetition of a static Nash equilibrium of the stage game, and thus constitute a PPE. Hence, we only need to verify that there are no profitable deviations along the equilibrium path.
Since the avoidance action is not observable and $\sigma^*$ is stationary, if it were profitable for the active agent to deviate at a single period by not taking the avoidance action in that period, then it would also be profitable to deviate by not taking the avoidance action in any period. Hence, in order to show that it is optimal for the active agent to take the avoidance action along the equilibrium path, it suffices to show that his equilibrium average expected discounted utility is weakly greater than the average expected discounted utility he would get by taking $a = 0$ in every period. We denote the value of this deviation by $v_1'$. Note that $b^* = \hat d - m_1^*$. As we show below, this value was chosen specifically so that $v_1' = v_1(\sigma^*)$.

Since $\sigma^*$ is stationary along the equilibrium path, each agent's average expected discounted utility equals his per-period utility. In particular, for the active agent we have that

$$v_1(\sigma^*) = -d - \pi^1 m_1^* + (1 - \pi^1)b^* = -d - \pi^1 m_1^* + (1 - \pi^1)(\hat d - m_1^*) = -m_1^* + d\left(\frac{1 - \pi^1}{\pi^0 - \pi^1} - 1\right) = -m_1^* + (1 - \pi^0)\hat d. \tag{15}$$

By a similar argument, we have that $v_1' = -\pi^0 m_1^* + (1 - \pi^0)b^* = (1 - \pi^0)\hat d - m_1^*$, which implies that $v_1' = v_1(\sigma^*)$.

In turn, using an argument analogous to the one for the benchmark model (see Section A.1), the mitigation constraint for the active agent in the monetary-transfers model can be written as

$$-(1-\delta)m_1^* + \delta v_1(\sigma^*) \ge -(1-\delta+\delta\pi^0)c_1$$
$$\Leftrightarrow\; -(1-\delta)m_1^* + \delta\left(-m_1^* + (1 - \pi^0)\hat d\right) \ge -(1-\delta+\delta\pi^0)c_1$$
$$\Leftrightarrow\; c_1 + \delta(1 - \pi^0)(\hat d - c_1) \ge m_1^*,$$

where the first equivalence follows from (15), and the second is obtained by rearranging terms. From (14), it follows that the mitigation constraint for the active agent is satisfied with equality. The values of $m_1^*$ and $b^*$ were specifically chosen so that the incentive constraints for the active agent are satisfied with equality.
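The claim that both of the active agent's constraints bind can be verified numerically. The sketch below is only an illustration under assumed parameter values. It takes $\hat d = d/(\pi^0 - \pi^1)$, which is the normalization implied by (15), and uses $m_1^* = c_1 + \delta(1-\pi^0)(\hat d - c_1)$ and $b^* = (1-\delta+\delta\pi^0)(\hat d - c_1)$ from (14). The function name and the numbers are hypothetical.

```python
import math


def transfer_scheme(pi0, pi1, d, c1, delta):
    """Evaluate the candidate scheme (14) and the active agent's payoffs."""
    d_hat = d / (pi0 - pi1)  # normalized avoidance cost, as implied by (15)
    m1 = c1 + delta * (1.0 - pi0) * (d_hat - c1)
    b = (1.0 - delta + delta * pi0) * (d_hat - c1)
    v1_eq = -d - pi1 * m1 + (1.0 - pi1) * b   # always take a = 1
    v1_dev = -pi0 * m1 + (1.0 - pi0) * b      # always take a = 0
    mitigation_lhs = -(1.0 - delta) * m1 + delta * v1_eq
    mitigation_rhs = -(1.0 - delta + delta * pi0) * c1
    return {
        "d_hat": d_hat, "m1": m1, "b": b,
        "v1_eq": v1_eq, "v1_dev": v1_dev,
        "mitigation_lhs": mitigation_lhs, "mitigation_rhs": mitigation_rhs,
    }


# Hypothetical parameters with d < pi0 - pi1 and c1 < d_hat < 1.
res = transfer_scheme(pi0=0.5, pi1=0.2, d=0.24, c1=0.7, delta=0.9)

print("b* equals d_hat - m1*:", math.isclose(res["b"], res["d_hat"] - res["m1"]))
print("avoidance binds:", math.isclose(res["v1_eq"], res["v1_dev"]))
print("mitigation binds:",
      math.isclose(res["mitigation_lhs"], res["mitigation_rhs"]))
```

Because both constraints hold with equality for every $\delta$, the only restriction on the discount factor comes from the passive agent's constraints.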
Now, it remains to show that, when $\delta$ is close enough to 1, this leaves enough slack for the passive agent to be able to contribute $m_2^* = 1 - m_1^*$ in case of a crisis and to transfer $b^*$ to the active agent each time there is no crisis. The mitigation constraint for the passive agent is

$$-(1-\delta)m_2^* + \delta v_2(\sigma^*) \ge -(1-\delta)c_2 - \delta\pi^0 c_2,$$

and his constraint for transfers in case there is no crisis is

$$-(1-\delta)b^* + \delta v_2(\sigma^*) \ge -\delta\pi^0 c_2.$$

Below, we will show that $\lim_{\delta \uparrow 1} v_2(\sigma^*) > -\pi^0 c_2$. This implies that, in the limit when the discount factor approaches 1, both constraints are satisfied with strict inequality. By continuity, this implies that there exists some $\tilde\delta''$ such that $\sigma^*$ is a PPE of the monetary-transfers model as long as $\delta \ge \tilde\delta''$.

By a similar argument as before, the average expected equilibrium value for the passive agent is

$$v_2(\sigma^*) = -\pi^1(1 - m_1^*) - (1 - \pi^1)b^* = -\pi^1 + m_1^* - (1 - \pi^1)\hat d = -\pi^1 - (1 - \pi^1)\hat d + c_1 + \delta(1 - \pi^0)(\hat d - c_1).$$

Therefore, we have that

$$\lim_{\delta \uparrow 1} v_2(\sigma^*) = -\pi^1 - (1 - \pi^1)\hat d + c_1 + (1 - \pi^0)(\hat d - c_1) = \pi^0 c_1 - \pi^1 - d. \tag{16}$$

Finally, Assumptions 1 and 2 imply that

$$c_1 + c_2 > 1 \;\wedge\; d < \pi^0 - \pi^1 \;\Rightarrow\; \pi^0(c_1 + c_2) - \pi^1 > \pi^0 - \pi^1 > d \;\Rightarrow\; \pi^0 c_1 - \pi^1 - d > -\pi^0 c_2.$$

Hence, by (16), we have $\lim_{\delta \uparrow 1} v_2(\sigma^*) > -\pi^0 c_2$, thus completing the proof.