View original document

The full text on this page is automatically extracted from the file linked above and may contain errors and inconsistencies.

Efficient Computation with Taste Shocks  WP 19-15  Grey Gordon Federal Reserve Bank of Richmond  Efficient computation with taste shocks∗ Grey Gordon Federal Reserve Bank of Richmond September 11, 2019 First draft: January, 2018. Working Paper No. 19-15 Abstract Taste shocks result in nondegenerate choice probabilities, smooth policy functions, continuous demand correspondences, and reduced computational errors. They also cause significant computational cost when the number of choices is large. However, I show that, in many economic models, a numerically equivalent approximation may be obtained extremely efficiently. If the objective function has increasing differences (a condition closely tied to policy function monotonicity) or is concave in a discrete sense, the proposed algorithms are O(n log n) for n states and n choices—a drastic improvement over the naive algorithm’s O(n2 ) cost. If both hold, the cost can be further reduced to O(n). Additionally, with increasing differences in two state variables, I propose an algorithm that in some cases is O(n2 ) even without concavity (in contrast to the O(n3 ) naive algorithm). I illustrate the usefulness of the proposed approach in an incomplete markets economy and a long-term sovereign debt model, the latter requiring taste shocks for convergence. For grid sizes of 500 points, the algorithms are up to 200 times faster than the naive approach. Keywords: Computation, Monotonicity, Discrete Choice, Taste Shocks, Sovereign Default, Curse of Dimensionality JEL Codes: C61, C63, E32, F34, F41, F44  ∗ Contact: greygordon@gmail.com. This paper builds on previous work with Shi Qiu, whom I thank. I also thank Amanda Michaud, Gabriel Mihalache, David Wiczer, and participants at LACEA/LAMES (Guayaquil), EEA/ESEM (Manchester), and Midwest Macro (Madison) for helpful comments. The views expressed are those of the author and do not necessarily reflect those of the Federal Reserve Bank of Richmond or the Board of Governors.  1  1  Introduction  Taste shocks have been widely used in economics. They give nondegenerate choice probabilities and likelihood functions (Luce, 1959; McFadden, 1974; Rust, 1987); facilitate indirect inference estimation (Bruins, Duffy, Keane, and Smith, 2015); smooth nonconvexities (Iskhakov, Jørgensen, Rust, and Schjerning, 2017); and make computation possible in consumer (Chatterjee, Corbae, Dempsey, and R´ıos-Rull, 2015) and sovereign default models (Dvorkin, S´anchez, Sapriza, and Yurdagul, 2018; Gordon and Querron-Quintana, 2018).1 But for all their advantages, taste shocks come with a significant computational burden. Specifically, to construct the choice probabilities (the “policy function” one usually uses when using taste shocks), one must generally know the value associated with every possible choice for every possible state. Consequently, taste shocks imply that computational cost grows quickly in the number of states and choices. In this paper, I show there is a way to reduce this cost in many economic models. Specifically, the cost can be reduced whenever the taste shocks are not very large and the objective function exhibits increasing differences and/or concavity in a discrete sense. (While the term increasing differences is uncommon, it is intimately connected with policy function monotonicity and commonly holds, as I will show.) If taste shocks are not very large, some choices will occur with negligible probability, and increasing differences and/or concavity lets one determine the location of those choices without having to evaluate them. The algorithms I propose, which build on Gordon and Qiu (2018a), have a cost that grows linearly or almost linearly for small taste shocks. With n states and n choices, the algorithms are O(n log n) for sufficiently small taste shocks if one has monotonicity or concavity or O(n) if one has both. Compared to the naive case, which has an O(n2 ) cost, the improvement is dramatic: a doubling of the points leads to doubles or nearly doubles the improvement over the naive approach. Consequently, the algorithms I propose can be arbitrarily more efficient than the standard approach. With a two-dimensional state space, the proposed algorithm (for a restricted class of problems) is O(n2 ) even when only exploiting monotonicity, another drastic improvement over the naive O(n3 ) algorithm. I demonstrate the numeric efficiency of the algorithm for quantitatively relevant grid sizes in two models. The first is a standard incomplete markets (SIM) model developed by Laitner (1979), Bewley (1983), Huggett (1993), Aiyagari (1994), and others. This model exhibits monotonicity in two-state variables and concavity, allowing all the algorithms to be evaluated within one model. I show that for grids of moderate size (500 asset states and choices), the algorithms can be up to 200 times faster than the naive approach. Moreover, to highlight their usefulness, I show taste shocks are useful in this model for reducing the numerical errors arising from a discrete choice space and for making excess demand continuous. Further, I show that the optimal taste shock size—in the 1 The idea of using taste shocks for convergence in sovereign debt models was developed independently by Dvorkin et al. (2018) and this paper and then followed subsequently by Gordon and Querron-Quintana (2018), Mihalache (2019), Arellano, Bai, and Mihalache (2019), and others. Additional applications include use in marriage markets (Santos and Weiss, 2016); dynamic migration models (Childers, 2018; Gordon and Guerron-Quintana, 2018); and quantal response equilibria (McKelvey and Palfrey, 1995).  2  sense of minimizing average Euler equation errors—is decreasing in grid sizes. This suggests the theoretical cost bounds, which require small taste shocks, are the relevant ones. In the SIM model, taste shocks are useful but not necessary for computing a solution. The second model I use is a long-term sovereign default model that requires taste shocks, or something like them, for convergence. The model further serves as a useful benchmark because the policy function is monotone—which implies increasing differences on the graph of the choice correspondence—but increasing differences does not hold globally. Adopting a guess-and-verify approach, I show the algorithm works without flaw and is extremely quick. There are few algorithms for speeding computation with taste shocks. Partly this is because taste shocks require a discrete number of choices, which precludes the use of derivative-based approaches. One clever approach is that of Chiong, Galichon, and Shum (2016), which shows there is a dual representation for discrete choice models. By exploiting it, they can estimate the model they consider five times faster. In contrast, the approach here can be hundreds of times faster. Chatterjee and Eyigungor (2016) use lotteries to solve models having quasigeometric discounting. While the algorithm they use to implement lotteries is efficient, it is also problem specific and does not apply to the long-term debt model considered here. In contrast, taste shocks provide an alternative and fast way of doing lotteries. The rest of the paper is organized as follows. Section 2 describes taste shocks, shows the usual structure of choice probabilities by means of an example, and derives the properties that will be exploited by the algorithms. Section 3 gives the algorithms for exploiting increasing differences and/or concavity. Section 4 characterizes the algorithms’ efficiency theoretically (characterizing the worst-case behavior) and numerically in the SIM and sovereign default applications. Section 5 extends the algorithms to exploiting increasing differences in two state variables, characterizing the performance theoretically and numerically. Section 6 concludes.  2  Taste shock properties  This section establishes key taste shock properties, which will be exploited later by the algorithms.  2.1  The problem and a numerical example  Fix a state i ∈ {1, . . . , n} and let U (i, i0 ) denote the utility associated with choice i0 ∈ {1, . . . , n0 }. Consider the maximization problem max  i0 ∈{1,...,n0 }  U (i, i0 ) + σi0 .  3  (1)  where i0 is a random variable for each i0 . McFadden (1974) pointed out that if the {i0 } are i.i.d. and all distributed Type-I extreme value,2 then the choice probabilities have a closed form expression3 exp(U (i, i0 )/σ) P(i0 |i) = Pn0 0 j 0 =1 exp(U (i, j )/σ)  (2)  From (2), it appears that computing P(i0 |i) requires evaluating U at each (i, i0 ) combination. However, this is not necessarily the case computationally, and an easy way to see this is with a numerical example. Consider a grid of wealth B = {b1 , . . . , bn } where we take n = 100 and bi = i (making B = {1, . . . , 100}) for simplicity. Let U (i, i0 ) = log(bi − bi0 /2) + log(bi0 ), corresponding to a two-period consumption-savings problem, where first-period (second-period) wealth is bi (bi0 ). Figure 1 plots contours of the log10 choice probabilities—i.e., log10 P(i0 |i)—for σ = 0.01 and σ = 0.0001 for this problem. 100 -1 -8 6 -4  100  80  6-8-1  80  4  -4  60  -8  60 6 -1  40  40  -16  20  -81- 6 4  20  -4 -8  6-8 -1  4  -16  20  40  60  80  100  20  40  60  80  100  Figure 1: Choice probabilities for different taste shock sizes The white regions in each graph represent (i, i0 ) combinations where—if one knew where they were—one would not need to evaluate U in order to construct the choice probabilities P. To see this, one must first note that, on a computer, 1 + ε = 1 if ε > 0 is small enough. The smallest ε for which 1 + ε 6= 1 is called machine epsilon, and, when computation is carried out in double precision (the 2  The Gumbel distribution, or generalized extreme value distribution Type-I, has two parameters, µ and β with −z a pdf β −1 e−z+e where z = (x − µ)/β. This has a mean µ + βγ for γ the Euler-Mascheroni constant. Rust (1987) focuses on the special case where µ = γ and β = 1. The formulas in this paper also assume this special case. 3 The expected value of the maximum P 0 is also given in closed form by the “log-sum” formula,   n 0 0 E maxi0 ∈{1,...,n0 } U (i, i ) + σi0 = σ log (Rust, 1987, p. 1012). These results follow from i0 =1 exp(U (i, i )/σ) the maximum of Gumbel-distributed random variables also being Gumbel distributed. Note that as the number of choices grow, this expectation tends to infinity. To eliminate this behavior, one can make the mean of the Gumbel distribution depend on the number of choices. With a suitably chosen mean, the log-sum formula can be replaced with a log-mean formula. See Gordon and Querron-Quintana (2018) for details.  4  standard on modern machines), machine epsilon is approximately 10−16 .4 Consequently, the white region characterizes (i, i0 ) combinations where, as far as the computer is concerned, P(i0 |i) is zero in the sense that 1 + P(i0 |i) = 1. Hence, if one knew where these state/choice combinations were, one could just assign U (i, i0 ) = U for a sufficiently negative U , calculate the choice probabilities P(i0 |i) using (2), and arrive at the same choice probabilities as if one had calculated U (i, i0 ) everywhere. How can one determine where these low-choice-probability (white) regions are without evaluating U everywhere? There are two ways. First, as evident in the figure, each contour line is monotonically increasing. (As will be shown, this is a consequence of U exhibiting increasing differences.) To see the importance of this, suppose one knew the choice probabilities P(·|i = 50) and knew that the lowest 10−16 contour occurs at i0 = 22 (like in the left panel of figure 1). Then, for i > 50, one knows that i0 < 22 must have a choice probability less than 10−16 . Consequently, one knows the white region, at a minimum, must include all (i, i0 ) having i > 50 and i0 < 22. Consequently, from just this one piece of information (i.e., P(·, i = 50)) one can immediately eliminate 1,050 evaluations, a savings of 10.5% compared to the naive approach.5 For the smaller taste shock giving rise to the right panel of figure 1, the savings are close to 25%, again just from one piece of information. The algorithms I propose will exploit much more than just this information and produce correspondingly larger gains. The second way to determine the location of the low-choice-probability region is to note that, for each i, the set connecting two equal contour probabilities is convex. (This is a consequence of concavity, or, more accurately, a version of quasi-concavity where the upper contour sets are convex in a discrete sense.) To see why this is useful, suppose for i = 50 one could find the maximum i0 = 50 of the no taste shock version quickly (O(log n0 ) evaluations) using Heer and Maußner’s (2005) binary concavity algorithm. Then one could move sequentially up (down) from i0 = 50 until reaching the upper (lower) contour—75 (22) for σ = 0.01 and 53 (47) for σ = 0.0001—and then stop; by concavity, any larger (smaller) i0 must have even lower probability and therefore be in the white region. Excluding the cost of finding the maximum (i0 = 50 in this case), which is small, this is extremely efficient because it is evaluating U almost only in the colored region. The algorithms I propose will use this concavity property by itself or jointly (when applicable) with monotonicity.  2.2  Key concepts and formalization of choice probability properties  To formally establish the choice probability properties seen in figure 1, a few definitions are necessary. Define the log odds ratio as 0  0  Lσ (j , i |i) := σ log    P(i0 |i) P(j 0 |i)  4    = U (i, i0 ) − U (i, j 0 ),  (3)  For single precision, which is the usual case for graphics card computation (such as with CUDA), machine epsilon is close to 10−8 . 5 The number 1,050 is the cardinality of the set {(i, i0 )|i > 50, i0 < 22, (i, i0 ) ∈ {1, . . . , 100}2 }, and there are 1002 combinations of (i, i0 ), which gives a savings of 1050/1002 = 10.5%.  5  which uses (2). Denote the optimal value and choices absent taste shocks as U ∗ (i) =  max  i0 ∈{1,...,n0 }  U (i, i0 )  G(i) = arg max U (i, i0 ).  and  (4)  i0 ∈{1,...,n0 }  Additionally, denote the log odds ratio of i0 relative to any choice in G(i) as L∗σ (i0 |i)  0    := Lσ (g(i), i |i) = σ log  P(i0 |i) P(g(i)|i)    = U (i, i0 ) − U ∗ (i),  (5)  where g(i) is any value in G(i). Note L∗σ (i0 |i) ≤ 0. As argued above, P(i0 |i), while strictly positive in theory for every choice, is numerically equivalent to zero if it is small enough. To capture this, I will condition the algorithms on an ε > 0 and treat any choice probability P(i0 |i) < ε as zero. I will say a choice i0 is numerically relevant at i if P(i0 |i) ≥ ε. A necessary condition for i0 to be numerically relevant at i is given in the following proposition: Proposition 1. For i0 to be numerically relevant at i, one must have L∗σ (i0 |i) ≥ Lσ := σ log(ε). The proofs for this and the other propositions are given in the appendix. 2.2.1  Monotonicity of relative choice probabilities  I will now establish the monotonicity properties of L∗ , which hinge on U having increasing differences. By definition, U has (strictly) increasing differences on S if and only if U (i, j 0 ) − U (i, i0 ) is (strictly) increasing in i for j 0 > i0 whenever (i, j 0 ), (i, i0 ) ∈ S.6 (Unless explicitly stated otherwise, the set S is {1, . . . , n} × {1, . . . , n0 }.) In the differentiable case, increasing differences requires, essentially, that the cross-derivative Ui,i0 be nonnegative. However, differentiability is not required, and Gordon and Qiu (2018a,b) collect many sufficient conditions for increasing differences. The main result the algorithms will exploit is summarized in proposition 2: Proposition 2. Fix an (i, i0 ), let g(i) ∈ G(i), and suppose L∗σ (i0 |i) < Lσ . Then, the following hold: If i0 < g(i), then L∗σ (i0 |j) < Lσ for all j > i; and if i0 > g(i), then L∗σ (i0 |j) < Lσ for all j < i. Hence, if one knows i0 < g(i) (i0 > g(i)) is not numerically relevant at i, then one knows it cannot be numerically relevant at j > i (j < i) either. In terms of figure 1, and speaking loosely, it says that the white region below (above) the bottom (top) 10−16 contour expands when moving to the right (left).7 The order of arguments does not matter, and one could equivalently require U (j, i0 ) − U (i, i0 ) to be increasing in i for j > i. As one might guess, increasing differences is a very general property, which I have simplified to the present context. The concept was introduced by Topkis (1978), and, in general, it is a property of a function of two partially ordered sets (which here are totally ordered). Supermodularity is a closely related property that implies increasing differences (Topkis, 1978, Theorem 3.1). 7 If this connection is not clear, note the contours describe {(i, i0 )|P(i0 |i) = ε}. If the necessary condition in proposition 1 is “close” to sufficient, then the contours are roughly {(i, i0 )|L∗σ (i0 |i) = Lσ }. 6  0  6  2.2.2  Concavity of relative choice probabilities  In addition to exhibiting monotonicity properties, L∗σ can potentially have concavity properties as well, and these hinge on U having concavity properties. Specifically, say a function f : {1, . . . , n} × {1, . . . , n0 } → R is concave if, for any i and any y, {i0 ∈ {1, . . . , n0 }|f (i, i0 ) ≥ y} is a list of integers with no gaps (e.g., {2,3,4}). Then, we have the following result: Proposition 3. If U is concave, then the following hold: If L∗σ (i0 |i) ≥ Lσ and L∗σ (i0 + 1|i) < Lσ , then L∗σ (j 0 |i) < Lσ for all j 0 > i0 ; and if L∗σ (i0 |i) ≥ Lσ and L∗σ (i0 − 1|i) < Lσ , then L∗σ (j 0 |i) < Lσ for all j 0 < i0 . In other words, if one knows that i0 could be numerically relevant but that i0 + 1 (i0 − 1) is not, then all j 0 > i0 (j 0 < i0 ) must be irrelevant. Because U is concave in the example used in figure 1, the white region above (below) the top (bottom) 10−16 contour is convex, as is the colorful region between the two contours.8 The algorithms will exploit this to avoid evaluating U in the white region.  2.3  The relationship between monotone policies and increasing differences  In the introduction, I claimed increasing differences is closely connected to policy function monotonicity and holds in many economic models. To back up the first claim, I first note that, under mild conditions, increasing differences implies monotonicity.9 The reason is not difficult to see. Consider j > i and g(j) ∈ G(j), g(i) ∈ G(i) (not necessarily with g(j) ≥ g(i)). Then by optimality U (i, g(j)) − U (i, g(i)) ≤ 0 ≤ U (j, g(j)) − U (j, g(i))  (6)  With increasing differences, this is only possible if g(j) ≥ g(i), which gives monotonicity. Moreover, there is partial converse of this result. Specifically, suppose g is known to be monotone (or in the general case, where g is not unique, that G is strongly ascending).10 Then taking g(j) ≥ g(i) and (6) gives that U exhibits increasing differences on {(i, i0 )|i ∈ I, i0 ∈ G(i)}. That is, monotonicity implies the objective function exhibits increasing differences on the graph of the optimal choice correspondence. I also claimed increasing differences holds in many economic models. In the weaker sense of having monotone policies (and thereby having increasing differences on a subset of the choice and state space), this is obvious: the real business cycle model (RBC), the SIM model, and the benchmark sovereign default models have monotone policies, along with many others. However, it The convexity of the colorful region follows as a corollary of proposition 3. Specifically, if L∗σ (i0 |i) ≥ Lσ for i ∈ {a, b}, then L∗σ (i0 |i) ≥ Lσ for all i0 ∈ {a, . . . , b} (if not, there would be a contradiction of proposition 3). 9 In particular, the feasible choice correspondence must be ascending (which it is here). In that case, increasing differences implies the optimal choice correspondence is ascending. This ensures a monotone optimal policy exists. Moreover, strictly increasing differences implies the optimal choice correspondence is strongly ascending. This ensures every optimal policy is monotone. See Gordon and Qiu (2018a) for details. 10 For the general definition, see Topkis (1978); for a simplified one sufficient for the purposes of this paper, see Gordon and Qiu (2018a). 8  0  7  is also true in the stronger sense of U having increasing differences. E.g., in the RBC and SIM models, increasing differences holds globally.11 While it does not hold globally in the sovereign default model presented here, it will be seen that the monotonicity algorithms deliver the correct result. Hence, even if one cannot prove increasing differences holds globally, a guess-and-verify approach is attractive.  3  Algorithms for exploiting choice probability structures  I now lay out the algorithms: first, the algorithm exploiting monotonicity; second, exploiting concavity; and finally, exploiting both simultaneously. The algorithms’ efficiency is explored in Section 4, and exploiting monotonicity in two states is deferred until Section 5.  3.1  Exploiting monotonicity  The algorithm for exploiting the monotonicity of L∗ outlined in proposition 2 is as follows: Algorithm 1: Binary monotonicity with taste shocks Parameters: a σ > 0, an ε > 0 such that P(i0 |i) < ε will be treated as zero, and Lσ := σ log(ε). Input: none. Output: U (i, i0 ) for all (i, i0 ) having P(i0 |i) ≥ ε. 1. Let i = 1 and i := n 2. Solve for U (i, i0 ), and then L∗σ (i0 |i), at each i0 . Find the smallest and largest i0 such that L∗σ (i0 |i) ≥ Lσ and save them as l(i) and h(i), respectively. 3. Solve for U (i, i0 ) for i0 = l(i), . . . , n0 . Then, solve for L∗σ (i0 |i) for each i0 = l(i), . . . , n0 assuming that U (i, i0 ) = −∞ for i0 = 1, . . . , l(i) − 1. Find the smallest and largest i0 in {l(i), . . . , n0 } such that L∗σ (i0 |i) ≥ Lσ and save them as l(i) and h(i), respectively. 4. Main loop: (a) If i < i + 1, STOP. Otherwise, define m = b(i + i)/2c. (b) Solve for U (m, i0 ) for i0 = l(i), . . . , h(i). Solve for L∗σ (i0 |m) for each i0 = l(i), . . . , h(i) assuming that U (m, i0 ) = −∞ for i0 = 1, . . . , l(i) − 1, h(i) + 1, . . . , n0 . Find the smallest and largest i0 in {l(i), . . . , h(i)} such that L∗σ (i0 |m) ≥ Lσ , and save them as l(m) and h(m), respectively. (c) Go to step 4(a) twice, once redefining (i, i) := (i, m) and once redefining (i, i) := (m, i). 11  Gordon and Qiu (2018b) explicitly prove this for the RBC model, but the proof for the SIM would be virtually identical.  8  Note that at step 4 of the algorithm, it is always the case (provided U has increasing differences) that L∗σ (i0 |m) < Lσ for all i0 < l(i) or i0 > h(i), which justifies the algorithm’s treatment of U (m, i0 ) = −∞ for these choices. The algorithm as applied to the U (i, i0 ) = log(bi −bi0 /2)+log(bi0 ) example is depicted graphically in figure 2. The top panel shows the algorithm when n = n0 = 35. The blue dots give the computed l(·) and h(·) bounds on numerical relevance. The gray, empty circles show where U ends up being evaluated. For i = 1, nothing is known, and so one must—absent an assumption on concavity— evaluate U (1, ·) at all i0 . This corresponds to step 2. In step 3, the algorithm moves to i = 35. There, U (35, ·) is evaluated at {l(1), . . . , n}, but since l(1) = 1 for this example, U (35, ·) is evaluated everywhere. In step 4, the algorithm goes to 18, the midpoint of 1 and 35. Because l(1) = 1 and h(35) = 35, U (18, ·) must be evaluated at all i0 again. So far, the algorithm has gained nothing. However, as the divide-and-conquer process continues, the gains grow progressively larger. After i = 18, the algorithm goes to i = 9 (b(1 + 18)/2c) and i = 26 (b(18 + 35)/2c). At i = 9, U must be evaluated at {l(1), . . . , h(18)} = {1, . . . , 21}, a 40% improvement over the 35 from the naive approach. Each subsequent step reduces the number of evaluations, and, in the final iterations, the number of evaluations becomes extremely small. For instance, at i = 4, U (4, ·) is evaluated only at {l(3), . . . , h(5)} = {3, 4, 5}. This is an order of magnitude fewer evaluations than in the naive approach. Increasing the number of points to 250, as is done in the bottom panel of figure 2, shows the algorithm wastes very few evaluations. In particular, everywhere in between the blue lines must be evaluated to construct the choice probabilities, since every choice in that region is numerically relevant. The gray area outside the blue lines are inefficient in the sense that knowledge of U (i, i0 ) there is not necessary for constructing the choice probabilities, but evidently these wasted evaluations make up only a small percentage of the overall space.  3.2  Exploiting concavity  Using the concavity property established in proposition 3, algorithm 2 identifies the set of relevant i0 values for a given i. Algorithm 2: Binary concavity with taste shocks Parameters: a σ > 0, an ε > 0 such that P(i0 |i) < ε will be treated as zero, and Lσ := σ log(ε). Input: an i ∈ {1, . . . , n} and a, b such that L∗σ (i0 |i) < Lσ for all i0 < a and i0 > b. Note that this implies G(i) = arg maxi0 ∈{a,...,b} U (i, i0 ), because i0 ∈ G(i) has L∗σ (i0 |i) = 0 ≥ Lσ . Output: U (i, i0 ) for all i0 having P(i0 |i) ≥ ε, and l(i), h(i) such that L∗σ (i0 |i) ≥ Lσ if and only if i0 ∈ {l(i), . . . , h(i)}. 1. Solve for any element g(i) ∈ arg maxi0 ∈{a,...,b} U (i, i0 ) = G(i) using Heer and Maußner’s (2005) binary concavity algorithm as described in Gordon and Qiu (2018a). Note: U ∗ (i) = U (i, g(i)), so knowing U (i, i0 ) gives L∗ (i0 |i) = U (i, i0 ) − U ∗ (i).  9  35  30  25  20  15  10  5  1  3  5  7  9  1  16  32  47  63  11  13  15  18  20  22  24  26  28  30  32  35  250  200  150  100  50  78  94 109 125 140 156 171 187 202 218 234 250  Figure 2: Illustration of algorithm 1 evaluations for two grid sizes  10  2. Define i0 = g(i) (a) If i0 = a, STOP. Store i0 as l(i). (b) Evaluate U (i, i0 − 1). If L∗σ (i0 − 1|i) < Lσ , store i0 as l(i) and STOP. (c) Decrement i0 by 1 and go to step 2(a). 3. Define i0 = g(i) (a) If i0 = b, STOP. Store i0 as h(i). (b) Evaluate U (i, i0 + 1). If L∗σ (i0 + 1|i) < Lσ , store i0 as h(i) and STOP. (c) Increment i0 by 1 and go to step 3(a). For an illustration of how algorithm 2 works, consider figure 3. In particular, focus on the middle column of dots in the top panel, which corresponds to algorithm 2 inputs (i, a, b) = (18, 1, 35). Heer and Maußner’s (2005) algorithm locates the optimal policy by comparing two adjacent values in the middle of the {a, . . . , b} range, which in this case are 18 and 19. Since U (18, 18) ≥ U (18, 19), the algorithm eliminates the range {19, . . . , 35} because there must be a maximum in {1, . . . , 18}.12 The algorithm then compares U (18, i0 ) at i0 = 9 and i0 = 10, which are adjacent points in the middle of {1, . . . , 18}, and compares them. Since g(18) = 18, this binary up-or-down step proceeds up, evaluating i0 = 14, 15; then 17, 18; and stops. Having located g(18) = 18, step 2 then moves sequentially through i0 = 19, 20, . . . and stops when it reaches the not numerically relevant i0 = 22. Then, step 3 moves sequentially through i0 = 17, 16, . . . and stops when it reaches i0 = 13. In this case, only four evaluations are wasted—U (18, ·) is evaluated eleven times and it is necessary to evaluate it seven times—, which represents a savings of 69% (1 − 11/35) over the naive algorithm. For n = n0 = 250, as in the bottom panel, again only four evaluations are wasted, and the savings, consequently, are even larger.  3.3  Exploiting monotonicity and concavity  I now combine algorithms 1 and 2 to simultaneously exploit monotonicity and concavity. As one may have already guessed from examining figure 3, doing so will be extremely efficient. Algorithm 3: Binary monotonicity and binary concavity with taste shocks Parameters: a σ > 0, a ε > 0 such that P(i0 |i) < ε will be treated as zero, and Lσ := σ log(ε). Input: none. Output: U (i, i0 ) for all (i, i0 ) having P(i0 |i) ≥ ε. 12 There is a subtlety here related to feasibility. The canonical problem in Gordon and Qiu (2018a,b) is (1) with σ = 0, which assumes that every choice is feasible. However, they prove that, under mild conditions, a more general problem with nonfeasible choices can be mapped into it. The mapping consists of replacing U (i, i0 ) at non-feasible (i, i0 ) pairs with a sufficiently negative number (and if a state has no feasible choice at all, replacing it with 1[i0 = 1]), which produces indifference when both choices are not feasible. They prove the algorithms will deliver correct solutions if the choice set is increasing in i and, if using concavity, has the form {1, . . . , n ¯ (i)} for some n ¯ (·) (see Section D and especially proposition 6 in Gordon and Qiu, 2018b). Binary concavity works there, in part, because at points of indifference (which could correspond to two non-feasible choices), the algorithm eliminates the higher range, thereby moving towards feasible choices (if they exist).  11  35  30  25  20  15  10  5  1  3  5  7  9  1  16  32  47  63  11  13  15  18  20  22  24  26  28  30  32  35  250  200  150  100  50  78  94 109 125 140 156 171 187 202 218 234 250  Figure 3: Illustration of algorithm 3 evaluations for two grid sizes  12  1. Let i = 1 and i := n 2. Use algorithm 2 with (i, a, b) = (i, 1, n0 ) to solve for l(i), h(i) and U (i, i0 ) at numerically relevant i0 . 3. Use algorithm 2 with (i, a, b) = (i, l(i), n0 ) to solve for l(i), h(i) and U (i, i0 ) at numerically relevant i0 . 4. Main loop: (a) If i < i + 1, STOP. Otherwise, define m = b(i + i)/2c. (b) Use algorithm 2 with (i, a, b) = (m, l(i), h(i)) to solve for l(m), h(m), and U (m, i0 ) at numerically relevant i0 . (c) Go to step 4(a) twice, once redefining (i, i) := (i, m) and once redefining (i, i) := (m, i).  4  Algorithm efficiency  This section first establishes theoretical efficiency bounds, and then examines the empirical performance in two common models.  4.1  Theoretical worst-case bounds  For all the algorithms, the worst-case behavior can be very bad for two reasons. First, if U (i, ·) is a constant, then L∗σ (i0 |i) = 0 for all i, i0 . Consequently, l(i) = 1 and h(i) = n0 for all i and U must be evaluated everywhere. Second, if σ is large enough, every choice will be chosen with virtually equal probability, which again requires evaluating U everywhere. However, if σ is small and U (i, ·) has a unique maximizer for each i, then one can obtain the following theoretical cost bounds as stated in propositions 4, 5, and 6. Proposition 4. Consider algorithm 1. Suppose U (i, ·) has a unique maximizer for each i. Then for any ε > 0 and any (n, n0 ), there is a sufficiently small σ(n, n0 ) > 0 such that U is evaluated at most n0 log2 (n) + 3n0 + 2n times and, fixing n0 = n, the algorithm is O(n log2 n) with a hidden constant of 1. Proposition 5. Consider algorithm 2. Suppose U (i, ·) has a unique maximizer for each i. Then for any ε > 0 and any (n, n0 ), there is a sufficiently small σ(n, n0 ) > 0 such that U is evaluated at most 2n log2 (n0 ) + 3n times if n0 ≥ 3 and, fixing n0 = n, the algorithm is O(n log2 n) with a hidden constant of 2. Proposition 6. Consider algorithm 3. Suppose U (i, ·) has a unique maximizer for each i. Then for any ε > 0 and any (n, n0 ), there is a sufficiently small σ(n, n0 ) > 0 such that U is evaluated fewer than 8n + 8n0 + 2 log2 (n0 ) times and, fixing n0 = n, the algorithm is O(n) with a hidden constant of 16. 13  4.2  Numerical example #1: Bewley-Huggett-Aiyagari  The previous subsection established very efficient worst-case bounds for the case of small taste shocks. I now explore the algorithm’s efficiency properties numerically for moderately sized taste shocks, quantifying the trade-off between taste shock size and speed. 4.2.1  Model description  The incomplete market model I consider is from Aiyagari (1994) with a zero borrowing limit. The household problem can be written V (a, e) = max u(c) + βEe0 |e V (a0 , e0 ) 0 a ∈A  .  (7)  s.t. c + a0 = we + (1 + r)a where A is a the asset choice and state space. The equilibrium prices w and r are determined by optimality conditions of a competitive firm and given by r = FK (K, N ) − δ and w = FN (K, N ). R R Aggregate labor and capital are given by N = edµ and K = adµ, respectively, where µ is the invariant distribution of households. To map this into a discrete problem with taste shocks, take A = {a1 , . . . , an } and assume the support of e is finite also. Then one can write   W (i, e) = E  max    u(we + (1 + r)ai − ai0 ) + βEe0 |e W (i0 , e0 ) +σi0  . {z }  i0 ∈{1,...,n} |  =:U (i,i0 ;e)  When a choice i0 is not feasible for some e, Gordon and Qiu (2018a) prove the algorithms will still work provided a large negative number is assigned to U (i, i0 ; e) there.13 4.2.2  The trade-off between speed and taste shock size  Figure 4 shows the algorithm performance for commonly used grid sizes at differing levels of taste shock sizes. The performance is measured in two ways. The primary measure is evaluation counts, which is used in the left panels. This is a programming-free and system-independent way to measure performance. The secondary way is run times, which is used in the right panels. This metric depends on programming and the processing system used. First consider the top left panel. This gives the evaluation count speedup—i.e., the ratio of n2 (the naive algorithm’s evaluation counts) to the proposed algorithm’s evaluation count—for using monotonicity and concavity together (algorithm 3). Proposition 6 shows that this speedup at worst grows linearly when taste shocks are sufficiently small. And as seen in the blue line, where taste shocks are very small, the speedup in fact grows linearly. Consequently, algorithm 3 is around 140 In the case that all the i0 are not feasible, one should assign a large negative number to each choice but an even more negative number to i0 = 1. 13  14  times faster than the naive algorithm by the time n = n0 = 500. As the size of taste shocks grow, which increases size of the numerically relevant region, performance necessarily worsens and speedup no longer grows linearly. However, as will be shown later, the optimal taste shock size—in the sense of minimizing Euler equation errors—goes to zero as the number of gridpoints grows. Consequently, if one is using taste shocks for this reason, one can in fact expect very good performance as one progresses upward, from the orange (circled), to green (dotted), to red (dashed), and to blue (solid) lines. For algorithms 1 and 2, which exploit only monotonicity (middle left panel) and concavity (bottom left panel), respectively, the speedup is almost linear for small taste shocks. This is guaranteed by the theory, as the speedup must grow linearly up to a log factor. Again, larger taste shocks diminish performance. None of the algorithms attain linear speedup when measuring speedups in run times (in the right panels). The reason, ironically, is that all three of the algorithms are quite efficient. Consequently, the time spent on maximization of the objective function, which is where the algorithms help, becomes small in comparison to the other necessary parts of the solution (such as computing expectations). Nevertheless, one can still expect an order of magnitude gain for the moderate grid sizes considered here. 4.2.3  Optimal taste shock levels  The previous subsection showed the performance of the algorithms is much greater for small taste shocks. I now show that (1) there is an optimal, in the sense of minimizing Euler equation errors, taste shock size; and (2) that this value tends to zero as grid sizes increase. Consequently, the algorithms’ performance for smaller taste shocks is the most relevant metric, and, as seen, the performance is very good. First, consider how taste shocks change the optimal policy as illustrated in figure 5. Absent taste shocks, consumption (in the top left panel) exhibits a saw-tooth pattern that is familiar to anyone who has worked with these models. Using taste shocks, as in the top right panel, smooths out “the” consumption policy—i.e., the expected consumption associated with a given asset and income level (integrating out the taste shock values), E c(a, e, ). The effect on the Euler equation errors can be large. At medium levels of asset holdings, the errors drop by as much as two orders of magnitude. However, at the upper bound and lower bound of the grid, there is not necessarily any improvement. The reason is that the consumption policy is being treated as E c(a, e, ), and so the calculation does not account for the points where the borrowing constraint (or saving constraint) are binding for only some . This can be seen for the high-earnings policy in which the borrowing constraint is not binding and the Euler error is small. Figure 6 shows a clear trade-off between shock size and the average Euler errors. At small values of taste shocks, the comparatively large errors are driven by the discrete grid. At large taste shock sizes, the errors are instead driven by “mistakes” coming from agents purposely choosing positions that have comparatively low utility U (i, i0 ) but large taste value i0 . The optimal value for this 15  60 100 40  50  20  0  0 100  200  300  400  500  40  25  30  20  100  200  300  400  500  100  200  300  400  500  100  200  300  400  500  15 20 10 10  5  0  0 100  200  300  400  500  30 15 20 10 10  5  0  0 100  200  300  400  500  Figure 4: Algorithm speedup by taste shock magnitude and grid size  16  5  5  4  4  3  3  2  2  1  1  0  0 0  10  20  30  40  50  -1  -1  -2  -2  -3  -3  -4  -4  -5  -5  -6  0  10  20  30  40  50  0  10  20  30  40  50  -6 0  10  20  30  40  50  Figure 5: Policy and Euler error comparison  17  calibration and grid size is around 10−2.6 ≈ 0.002, which is the value used in figure 5.  Figure 6: Taste shocks’ impact on Euler errors Clearly, as grid sizes increase, the first type of error falls. The second type of error, however, does not tend to change. The import of this is that the optimal taste shock size—in the sense of minimizing average Euler equation errors—is diminishing in grid size. This is revealed in figure 7, which plots average Euler equation errors for differing grid and taste shock sizes. For grids of 100, σ = 0.1 is optimal. This decreases to σ = 0.01 for grid sizes of 250. For grids of 1,000, the optimal size falls by another order of magnitude to σ = 0.001. For the grid sizes here, the difference between an optimal taste shock size and an arbitrary size is one to two orders of magnitude. Additionally, a 250-point grid with an optimally chosen σ is far better (with average errors around 4 ≈ 10−.6 times smaller) than a 1,000-point grid with no taste shocks. Consequently, taste shocks are a cost-effective way—when using the proposed algorithms— of reducing computational error. Moreover, the fact that the optimal σ is decreasing in grid size implies that the algorithms’ speedups grow linearly or almost linearly when using optimal taste shock magnitudes.  18  -1  -1.5  -2  -2.5  -3  -3.5  -4 0  100  200  300  400  500  600  700  800  Figure 7: Euler errors by taste shock magnitude and grid size  19  900  1000  4.2.4  Two more advantages of taste shocks: market clearing and calibration  We also briefly point out two more advantages of having preference shocks. First, provided the utility function U moves continuously in the underlying object, the policy functions (averaged across choice probabilities) do as well. This has many benefits, one of which can be seen in figure 8, a reproduction of the classic market clearing figure in Aiyagari (1994). In the top panels, it seems that as the interest rate r increases, the implied aggregate asset holdings labeled Ea(r) increases continuously. However, upon further inspection in the bottom panels, aggregate assets actually move discontinuously for σ = 0. For some parameterizations, this is more of an issue than for others. Here, K(r), firms’ demand for capital, falls close to Ea(r) for r ≈ 0.014. However, the model rules out the possibility that capital would be between 7.7 and 8, which could be an issue in calibration. In contrast, for σ > 0.01, demand moves continuously. 0.03  0.03  0.02  0.02  0.01  0.01  0  0  -0.01  -0.01  -0.02  -0.02  -0.03  -0.03 5  10  15  20  25  5  0.018  0.018  0.017  0.017  0.016  0.016  0.015  0.015  0.014  0.014  0.013  0.013  0.012  10  15  20  25  0.012 7  8  9  10  7  8  9  10  Figure 8: Market clearing in Aiyagari (1994) A second advantage is that taste shocks aid in calibration and estimation, a point made, e.g., in Bruins et al. (2015). Consider, for simplicity, trying to choose the depreciation rate δ, to match some target interest rate. Increasing δ lowers firm capital demand, shifting K(r) down in a parallel 20  fashion.14 In the bottom left panel, one can see this shift down in K(r) (the red dashed line) would at first lower the equilibrium interest rate r∗ and equilibrium capital K ∗ in a fairly smooth fashion. But, after reaching the gap in the supply of capital that occurs between K ∈ [7.7, 8], an equilibrium does not even approximately exist.15 Consequently, whatever “equilibrium” moment the code returned for r∗ or K ∗ would be arbitrary and not necessarily vary smoothly in δ. In contrast, taste shocks typically make the equilibrium moments vary smoothly in δ and other parameters, facilitating moment-based estimation techniques.  4.3  Numerical example #2: Sovereign default  While taste shocks are very useful in the SIM model above, for decades the profession has made due without them because they are, strictly speaking, not necessary. A class of models where taste shocks, or something like them, are necessary is sovereign default models with long-term bonds (which were first considered by Hatchondo and Martinez, 2009, and Chatterjee and Eyigungor, 2012). I will now show that, in that class of models, convergence fails absent taste shocks (but can hold with them). I will then characterize the algorithm speedup, which turns out to be very similar to the speedup in the SIM model. 4.3.1  Model description  The model has sovereign with a total stock of debt −b and output y. If the sovereign decides to default on its debt, the country moves to autarky with output reduced by φ(y). The country returns from autarky with probability ξ, in which case all its outstanding debt has gone away. When not defaulting and not in autarky, the sovereign has to pay back (λ + (1 − λ)z)(−b), reflecting that λ fraction of the debt matures and, for the fraction not maturing, a coupon z must be paid. Additionally, the sovereign issues −b0 + (1 − λ)b units of debt at price q(b0 , y)—the price depends only on the next period total stock of debt and current output because they are sufficient statistics for determining repayment rates. The sovereign’s problem may be written V (b, y) = max (1 − d)V r (b, y) + dV d (y) d∈{0,1}  where the value of repaying is V r (b, y) = max u(c) + βEy0 |y V (b0 , y 0 ) 0 b ∈B  0  s.t. c = y − q(b , y)(b0 − (1 − λ)b) + (λ + (1 − λ)z)b 14  In particular, K(r) is given implicitly by r = FK (K(r), N ) − δ. Of course, an equilibrium does exist, but constructing it involves using demand correspondences and exploiting points of indifference, not the simple policy functions here. 15  21  and the value of default is h i V d (y) = u(y − φ(y)) + Ey0 |y (1 − ξ)V d (y 0 ) + ξV (0, y 0 ) Let the optimal bond policy for the V r problem be denoted a(b, y). The price schedule q is a solution to q = T ◦ q where  (T ◦ q)(b0 , y) = (1 + r∗ )−1 Ey0 |y (1 − d(b0 , y 0 )) λ + (1 − λ)(z + q(a(b0 , y 0 ), y 0 ) ,  (8)  with r∗ an exogenous risk-free rate. The price schedule reveals a fundamental problem with using a discrete grid B in this model: very small changes to q can cause discrete changes in a, which then cause discrete changes in T ◦ q that inhibit or prevent convergence. The literature has found ways to get around this issue. E.g., Hatchondo and Martinez (2009) use continuous choice spaces with splines and seem to not have convergence troubles, and Chatterjee and Eyigungor (2012) incorporate a continuously distributed i.i.d. shock to facilitate convergence. Taste shocks are, however, a simpler option that accomplishes the same task. Taste shocks can be added to either the default decision, the debt issuance decision, or both. For expositional simplicity, I add them only to the debt issuance decision, which will be sufficient to generate convergence.16 Taking B = {b1 , . . . , bn }, I define   r  V (i, y) = E  0  max  i0 ∈{1,...,n}    0  u(c) + βEy0 |y V (i , y ) + σi0  s.t. c = y − q(bi0 , y)(bi0 − (1 − λ)bi ) + (λ + (1 − λ)z)bi (while also changing V and V d to use indices instead of bond values). The price schedule update then becomes " 0  ∗ −1  (T ◦ q)(i , y) = (1 + r )  !# 0  0  Ey0 |y (1 − d(i , y )) λ + (1 − λ)(z +  X  00 0  0  00  0  P(i |i , y )q(i , y )  .  (9)  i00  The advantage of using taste shocks is that P(i00 |i0 , y 0 ) moves continuously in the q guess. Consequently, T ◦ q changes little when q changes little, except when the default decision d changes. In contrast, in (8), small changes in q can result in discontinuous jumps in a, which then create large jumps in T ◦ q as Chatterjee and Eyigungor (2012) discuss.17 16  A previous version of the paper, available by request, combined the repayment and default decision into one choice, which can be considerably faster. Another issue here is that, as the number of choices grow, V r will tend to infinity because the support of the taste shocks is infinite. As discussed in Gordon and Querron-Quintana (2018), a good way to correct this is to make the mean of the taste shocks dependent on the grid size. This converts the “log-sum formula” from Rust (1987) into a “log-mean” formula, which eliminates this systematic growth in V r . 17 Why is it important to smooth jumps in a more than jumps in d? Consider the case of V r and V d being close to their fixed points with V r not almost identical to V d . Then, small changes in q will not change the default decision but may change the policy a in a substantial way, thereby affecting T ◦ q.  22  4.3.2  Convergence only with preference shocks  Figure 9 shows the lack of convergence for σ = 0, even after 2,500 iterations with a relaxation parameter above 0.95 and with 300 income states.18 In contrast with σ = 10−4 , convergence happens without any relaxation. For the intermediate case of σ = 10−6 , convergence does not occur without relaxation, but as soon as the relaxation parameter begins increasing (which happens at the 500th iteration, as seen in the bottom panel), the sup norms for V and q trend downward.  -2 -4 -6 500  1000  1500  2000  2500  500  1000  1500  2000  2500  500  1000  1500  2000  2500  -2 -4 -6  1  0.95  0.9  Figure 9: Convergence and lack of it with long-term debt  4.3.3  Monotone policies, but not necessarily increasing differences  Using results from Gordon and Qiu (2018b), increasing differences of the objective function will hold globally if c(b, b0 ) := y − q(b0 , y)(b0 − (1 − λ)b) + (λ + (1 − λ)z)b is increasing in b, decreasing in b0 , and if c has increasing differences. All of these hold except for possibly c decreasing in b0 . With short-term debt λ = 1, c may not be decreasing in b0 because q is non-monotone; with long-term debt λ < 1, the situation is even worse because of debt dilution (see Hatchondo, Martinez, and 18 The relaxation parameter is the constant θ used when updating q in the fixed point iteration algorithm. Specifically, at iteration t, one has a guess qt and computes an update T ◦ qt . The guess at iteration t + 1 is then qt+1 = θqt + (1 − θ)(T ◦ qt ). See Judd (1998) for more details.  23  Sosa-Padilla, 2014, 2016 for details). However, one can also show, using results from Gordon and Qiu (2018a), that the policy function absent taste shocks is nevertheless monotone, which implies increasing differences holds on the graph of the optimal choice correspondence. Can the monotonicity algorithms be used without a proof of increasing differences? Yes. Analogously to figure 2, figure 10 plots choice probability contours for an intermediate level of output computed without any monotonicity assumptions. Both the double precision cutoff of P(i0 |i) ≥ 10−16 and the single precision cutoff of P(i0 |i) ≥ 10−8 are monotone. This means the monotonicity algorithm applied to either of these cutoff levels will work correctly. However, this behavior is not guaranteed, so one should use caution. A good way to be cautious is to use a guess-and-verify approach, computing the optimal policy assuming the algorithms work and lastly checking, using the naive algorithm, whether the computed V and q functions constitute an equilibrium. Note that as the final verification step is only one iteration out of several hundred or thousands, its overall cost is small. 250  200  150  100  50  0 0  50  100  150  Figure 10: Choice probabilities  24  200  250  4.3.4  Empirical speedup  As can be seen in figure 11, the algorithm’s speedups are essentially identical to those in the incomplete markets model. In particular, the speedups grow almost linearly as measured by evaluation counts provided σ is close to zero. However, convergence cannot be obtained for very small σ. The best performance—while still obtaining convergence—occurs for σ between 10−5 and 10−6 . There, the algorithm is twelve to twenty-four times more efficient than the naive algorithm, a large improvement.  40 30 20 10 0 50  100  150  200  250  300  350  400  450  500  100  150  200  250  300  350  400  450  500  7.5 7 6.5 6  50  Figure 11: Default model speedups and iterations to convergence  5  Exploiting monotonicity in two states  So far, I have focused on the one-dimensional case. But in many cases there can be monotonicity in more than one state variable, such as in the incomplete markets model where the savings policy is monotone bonds and earnings. I now show how the ideas from the one-state-variable algorithms can be extended to having two state variables.  25  5.1  The algorithm  Consider a problem with a two-dimensional state space (i1 , i2 ) ∈ {1, . . . , n1 } × {1, . . . , n2 }. Defining everything analogously to section 2, e.g., max  i0 ∈{1,...,n0 }  and  U (i1 , i2 , i0 ) + σi0  (10)  exp(U (i1 , i2 , i0 )/σ) , P(i0 |i1 , i2 ) = PN 0 j 0 =1 exp(U (i1 , i2 , j )/σ)  (11)  all the preceding results for the one-dimensional case go through holding the other state fixed. In particular, propositions 1 and 3 are unchanged (except replacing i with i1 , i2 ), while the monotonicity result in proposition 2 becomes ( Lσ (j 0 , i0 |i1 , i2 ) is  decreasing in i1 and i2 if j 0 > i0 increasing in i1 and i2 if j 0 < i0  .  (12)  The below algorithm exploits monotonicity (and, if applicable, concavity) in an efficient way. Algorithm 4: Binary monotonicity in two states with taste shocks Parameters: a σ > 0, an ε > 0 such that P(i0 |i1 , i2 ) < ε will be treated as zero, and Lσ := σ log(ε). Input: none. Output: U (i1 , i2 , i0 ) for all (i1 , i2 , i0 ) having P(i0 |i1 , i2 ) ≥ ε. 1. Solve for U (·, 1, i0 ) using algorithm 1 or 3. Save the l function from it as l(·, 1). 2. Solve for U (·, n2 , i0 ) using algorithm 1 or 3 while only checking i0 ≥ l(·, 1). Save the h function from it as h(·, n2 ). Set (i2 , i2 ) := (1, n2 ). 3. Main loop: (a) If i2 < i2 + 1, STOP. Otherwise, define m2 = b(i2 + i2 )/2c. (b) Solve for U (·, m2 , i0 ) using algorithm 1 or 3 but restricting the search space additionally to l(·, i2 ) and h(·, i2 ). Save the l and h coming from the one-state algorithm as l(·, m2 ) and h(·, m2 ), respectively. (c) Go to step 3(a) twice, once redefining (i2 , i2 ) := (i2 , m2 ) and once redefining (i2 , i2 ) := (m2 , i2 ).  5.2  Theoretical cost bounds  The theoretical cost bounds in the two-state case are more difficult to establish. The following proposition shows, for a subset of problems, that the algorithm performs very efficiently. Proposition 7. Consider algorithm 4 for exploiting monotonicity only (i.e., using algorithm 1 instead of 3). Suppose that, for each i1 , i2 , U (i1 , i2 , ·) has a unique maximizer g(i1 , i2 )—monotone 26  in both arguments—and that n1 , n2 , n0 ≥ 4. Let λ ∈ (0, 1] be such that for every j ∈ {2, . . . , n2 − 1}, one has g(n1 , j) − g(1, j) + 1 ≤ λ(g(n1 , j + 1) − g(1, j − 1) + 1). Then for any ε > 0 and any (n1 , n2 , n0 ), there is a σ(n1 , n2 , n0 ) > 0 such that U is evaluated fewer than (1+λ−1 ) log2 (n1 )n0 nκ2 +3(1+λ−1 )n0 nκ2 +4n1 n2 +2n0 log2 (n1 )+6n0 times where κ = log2 (1+λ). Moreover, for n1 = n0 =: n and n1 /n2 = ρ with ρ a constant, the cost is O(n1 n2 ) with a hidden constant of 4 if (g(n1 , j) − g(1, j) + 1)/(g(n1 , j + 1) − g(1, j − 1) + 1) is bounded away from 1 for large n. To see the large potential benefit of the algorithm, note that in the special case of n1 = n2 = n0  =: n, the O(n1 n2 ) behavior is O(n2 ), which grows increasingly efficient compared to the naive  algorithm’s n3 cost. I now turn to how the algorithm fairs in the incomplete markets model.  5.3  Empirical speedup  Figure 12 reveals the evaluation counts per state σ ≈ 0 (in the top panel) and σ = 10−3 (in the bottom panel). As suggested (though not guaranteed for all U ) by proposition 7, for σ ≈ 0 the two-state monotonicity algorithm’s evaluations counts fall and seem to level off at slightly more than 3.0 by the time #A = 500 and #E = 50 are reached. This is around a 167-fold improvement on brute force and is obtained without an assumption on concavity. When also exploiting concavity, the evaluation counts fall to around 2.4, a 208-fold improvement on brute force. As it is always necessary to have at least an evaluation count of 1 (which is required simply to evaluate U at a known optimal policy), the algorithm is extremely efficient in this case. With larger σ, the evaluation counts continue to rise in grid sizes, resulting in less dramatic speedups. For instance, at the largest grid size, the two-state algorithms and monotonicity with concavity all have evaluation counts of around seventeen. While this is not the dramatic 200 times better than brute force, it is still twenty-nine times better.  6  Conclusion  Taste shocks have a large number of advantages, but they also imply a large computational cost when using naive algorithms. By using the proposed algorithms, this cost can be drastically reduced if the objective function exhibits increasing differences or concavity. This behavior was proven theoretically for very small taste shocks and demonstrated empirically for larger taste shocks. Consequently, the algorithms proposed in this paper should enable the computation of many new and challenging models.  27  16 14 12 10 8 6 4 2 50  100  150  200  250  300  350  400  450  500  100  150  200  250  300  350  400  450  500  30 25 20 15 10 5 50  Figure 12: Evaluations per state  28  References S. R. Aiyagari. Uninsured idiosyncratic risk and aggregate savings. Quarterly Journal of Economics, 109(3):659–684, 1994. C. Arellano, Y. Bai, and G. Mihalache. Monetary policy and sovereign risk in emerging economies (NK-Default). Working Paper 19-02, Stony Brook University, 2019. T. Bewley. A difficulty with the optimum quantity of money. Econometrica, 51(5):145–1504, 1983. M. Bruins, J. A. Duffy, M. P. Keane, and A. A. Smith, Jr. Generalized indirect inference for discrete choice models. Mimeo, July 2015. S. Chatterjee and B. Eyigungor. Maturity, indebtedness, and default risk. American Economic Review, 102(6):2674–2699, 2012. S. Chatterjee and B. Eyigungor. Continuous Markov equilibria with quasi-geometric discounting. Journal of Economic Theory, 163:467–494, 2016. S. Chatterjee, D. Corbae, K. Dempsey, and J.-V. R´ıos-Rull. A theory of credit scoring and competitive pricing of default risk. Mimeo, August 2015. D. Childers. Solution of rational expectations models with function valued states. Mimeo, 2018. K. X. Chiong, A. Galichon, and M. Shum. Duality in dynamic discrete-choice models. Quantitative Economics, 1:83–115, 2016. M. Dvorkin, J. M. S´ anchez, H. Sapriza, and E. Yurdagul. Sovereign debt restructurings: A dynamic discrete choice approach. Working paper, Federal Reserve Bank of St. Louis, 2018. G. Gordon and P. A. Guerron-Quintana. On regional migration, borrowing, and default. Mimeo, 2018. G. Gordon and S. Qiu. A divide and conquer algorithm for exploiting policy function monotonicity. Quantitative Economics, 9(2):521–540, 2018a. G. Gordon and S. Qiu. Supplement to “A divide and conquer algorithm for exploiting policy function monotonicity”. Mimeo, 2018b. G. Gordon and P. A. Querron-Quintana. A quantitative theory of hard and soft sovereign defaults. Mimeo, 2018. J. C. Hatchondo and L. Martinez. Long-duration bonds and sovereign defaults. Journal of International Economics, 79(1):117–125, 2009. J. C. Hatchondo, L. Martinez, and C. Sosa-Padilla. Voluntary sovereign debt exchanges. Journal of Monetary Economics, 61:32–50, 2014. 29  J. C. Hatchondo, L. Martinez, and C. Sosa-Padilla. Debt dilution and sovereign default risk. Journal of Political Economy, 124(5):1383–1422, 2016. B. Heer and A. Maußner. Dynamic General Equilibrium Modeling: Computational Methods and Applications. Springer, Berlin, Germany, 2005. M. Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, 1993. F. Iskhakov, T. H. Jørgensen, J. Rust, and B. Schjerning. The endogenous grid method for discretecontinuous dynamic choice models with (or without) taste shocks. Quantitative Economics, 8(2): 317–365, 2017. K. L. Judd. Numerical Methods in Economics. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1998. J. Laitner. Household bequest behavior and the national distribution of wealth. The Review of Economic Studies, 46(3):467–483, 1979. R. D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, New York, 1959. D. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, editor, Frontiers in Econometrics, chapter 4. Academic Press, 1974. R. D. McKelvey and T. R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38, 1995. G. Mihalache. Sovereign default resolution through maturity extension. Mimeo, 2019. J. Rust. Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55(5):999–1033, 1987. C. Santos and D. Weiss. “Why not settle down already?” A quantitative analysis of the delay in marriage. International Economic Review, 57(2):425–452, 2016. G. Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986. D. M. Topkis. Minimizing a submodular function on a lattice. Operations Research, 26(2):305–321, 1978.  A  Calibration [Not for publication]  For the SIM model, I follow the calibration in Aiyagari (1994) fairly closely, using constant relative risk aversion (CRRA) of 3, a discount factor of β = 0.96, depreciation of δ = 0.08, and a capital 30  share of α = 0.36. The earnings process follows an AR(1) with a persistence parameter of 0.9 and innovation standard deviation of 0.4. The asset grid is linearly spaced from zero to δ 1/(α−1) , where the maximum is essentially that used by Aiyagari. I discretize the income process using Tauchen’s (1986) method with a varying number of points (depending on whether the one- or twostate algorithm is being used) and a linearly spaced grid covering three unconditional standard deviations. For the sovereign default model, I follow closely the calibration in Chatterjee and Eyigungor (2012). The default cost φ(y) = max(0, yφ0 + y 2 φ1 ) with (φ0 , φ1 ) = (−0.18819, 0.24558). The discount factor is β = 0.95402, and flow utility has CRRA of 2. The income process is an AR(1) with a persistence parameter of 0.9485 and innovation standard deviation of 0.27092. The maturity rate λ = 0.05, the coupon rate is z = .03, and the probability of escaping autarky is ξ = 0.0385. I discretize the income process using Tauchen’s method with 300 points and a linearly spaced grid covering three unconditional standard deviations. (A large number of discretization points smooths out jumps in T ◦ q, allowing a smaller taste-shock size to be used while still obtaining convergence.) The bond grid is linearly spaced from -0.7 to 0.  B  Proofs [Not for publication]  0 0 Proof of proposition 1. L∗σ (i0 |i) =  Since i is numerically relevant at  i, 0P(i|i) ≥ ε. By  definition,  0 P(i |i) P(i |i) ε Lσ (g(i), i0 |i) = σ log P(g(i)|i) . Since P(i0 |i) ≥ ε, σ log P(g(i)|i) ≥ σ log P(g(i)|i) . Lastly, since   ε P(g(i)|i) ≤ 1, σ log P(g(i)|i) ≥ σ log (ε) =: Lσ . Hence, P(i0 |i) ≥ ε implies L∗σ (i0 |i) ≥ Lσ .  Proof of proposition 2. Recall Lσ (j 0 , i0 |i) = U (i, i0 )−U (i, j 0 ). Increasing differences implies U (i, i0 )− U (i, j 0 ) < U (j, j 0 ) − U (j, j 0 ) if i0 > j 0 and j > i. Consequently, Lσ (j 0 , i0 |·) is increasing if i0 > j 0 . Similarly, increasing differences implies U (i, i0 ) − U (i, j 0 ) > U (j, j 0 ) − U (j, j 0 ) if j 0 > i0 and j > i. Hence, Lσ (j 0 , i0 |·) is decreasing if j 0 > i0 . Summarizing, ( Lσ (j 0 , i0 |·) is  decreasing if j 0 > i0 increasing if j 0 < i0  .  (13)  Now suppose L∗σ (i0 |i) = Lσ (g(i), i0 |i) < Lσ , i0 < g(i) and j > i. Then Lσ (g(i), i0 |·) is decreasing. Since j > i, one has Lσ (g(i), i0 |i) ≤ Lσ (g(i), i0 |j) < Lσ as well. Therefore, Lσ > Lσ (g(i), i0 |j) = U (j, i0 ) − U (j, g(i)) ≥ U (j, i0 ) − U (j, g(j)) = Lσ (g(j), i0 |j) = L∗σ (i0 |j), where the second inequality follows from the optimality of g(j). 31  Symmetrically, suppose L∗σ (i0 |i) < Lσ , i0 > g(i), and j < i. Then Lσ (g(i), i0 |·) < Lσ is increasing. Since j < i, one has Lσ (g(i), i0 |j) ≤ Lσ (g(i), i0 |i) < Lσ as well. Therefore, Lσ > Lσ (g(i), i0 |j) = U (j, i0 ) − U (j, g(i)) ≥ U (j, i0 ) − U (j, g(j)) = L∗σ (i0 |j), where the second inequality follows from the optimality of g(j). Proof of proposition 3. Since U is concave, L∗σ (i0 |i) = U (i, i0 ) − U (i, g(i)) is concave (thinking of L∗ as a function taking {1, . . . , n} × {1, . . . , n0 } to R, which is denoted L∗σ (i0 |i) rather than L∗σ (i, i0 )). To prove L∗σ (i0 |i) ≥ Lσ and L∗σ (i0 + 1|i) < Lσ implies L∗σ (j 0 |i) < Lσ for all j 0 > i0 , suppose not. Then there is some j 0 > i0 having L∗σ (j 0 |i) ≥ Lσ (and clearly j 0 > i0 + 1). Consequently, the upper contour set {˜i0 |L∗σ (˜i0 |i) ≥ Lσ } has i0 and j 0 > i0 in it but not i0 + 1. Therefore, L∗σ is not concave, which is a contradiction. The fact that L∗σ (i0 |i) ≥ Lσ and L∗σ (i0 − 1|i) < Lσ implies L∗σ (j 0 |i) < Lσ for all j 0 < i0 can be proved in the same way. Proof of proposition 4. Note that L∗σ (i0 |i) = U (i, i0 ) − U (i, g(i)) is invariant to σ. Additionally, L∗σ (i0 |i) ≤ 0 with equality if and only if i0 = g(i). The cutoff Lσ = σ log(ε) < 0 can be made arbitrarily close to zero by decreasing σ. Consequently, for a small enough σ, the condition L∗σ (i0 |i) ≥ Lσ is satisfied if and only if (i0 , i) = (g(i), i). In this case, l(i) and h(i) equal g(i). For l and h equal to g, step 4 evaluates U (m, i0 ) only at {g(i), . . . , g(i)}. This is the same as in Step 3 of Gordon and Qiu (2018a). Similarly, step 2 (3) evaluate U (1, i0 ) (U (n, i0 )) only at {1, . . . , n0 } ({g(1), . . . , n0 }). This is the same as in step 1 (2) of Gordon and Qiu (2018a). Since the same divide and conquer step is used, the total number of evaluations of U does not exceed (n0 − 1) log2 (n − 1) + 3n0 + 2n − 4 (proposition 1 in Gordon and Qiu, 2018a). Simplifying this expression by making it slightly looser gives the desired result. Proof of proposition 5. As in the proof of proposition 4, we note that for small enough σ one has l(i) and h(i) equal to g(i). In this case, the algorithm reduces to binary concavity as stated in Gordon and Qiu (2018a) except for steps 3 and 4 of algorithm 2. For small enough preference shocks, each of these requires 1 additional evaluation—specifically U (i, g(i) − 1) and U (i, g(i) + 1), respectively—, which results in 2n extra evaluations overall. Gordon and Qiu (2018a) show binary concavity for a fixed i requires at most 2dlog2 (n0 )e − 1 evaluations if n0 ≥ 3 and n evaluations for n0 ≤ 2. Consequently, for n0 ≥ 3 there are at most n·(2+2dlog2 (n0 )e−1) evaluations for proposition 5 if n0 ≥ 3. Simplifying, there are fewer than 2n log2 (n0 ) + 3n evaluations required. Proof of proposition 6. As in the proof of proposition 4, we note that for small enough σ one has l(i) and h(i) equal to g(i). In this case, the algorithm reduces to binary monotonicity with binary concavity as stated in Gordon and Qiu (2018a) except for steps 3 and 4 of algorithm 2. For small 32  enough preference shocks, each of these requires 1 additional evaluation—specifically U (i, g(i) − 1) and U (i, g(i) + 1), respectively—, which results in 2n extra evaluations overall. Consequently, the 6n + 8n0 + 2 log2 (n0 − 1) − 15 bound in proposition 1 of Gordon and Qiu (2018a) becomes 8n + 8n0 + 2 log2 (n0 − 1) − 15. Simplifying slightly by making the bounds looser gives the desired result. Proof of proposition 7. As in the proof of proposition 4, we note that for small enough σ, one has l and h equal g. Consequently, when using algorithm 1 at each step, the algorithm reduces to the two-state binary monotonicity algorithm in Gordon and Qiu (2018a). Proposition 2 of Gordon and Qiu (2018a) then gives the desired upper bound and O(n1 n2 ) behavior.  33
Federal Reserve Bank of St. Louis, One Federal Reserve Bank Plaza, St. Louis, MO 63102