The full text on this page is automatically extracted from the file linked above and may contain errors and inconsistencies.
Federal Reserve Bank of Chicago Internal Immigrant Mobility in the Early 20th Century: Experimental Evidence from Galveston Immigrants Daniel Aaronson, Jonathan Davis, and Karl Schulze February 2018 WP 2018-04 Working papers are not edited, and all opinions and errors are the responsibility of the author(s). The views expressed do not necessarily reflect the views of the Federal Reserve Bank of Chicago or the Federal Reserve System. * Internal Immigrant Mobility in the Early 20th Century: Experimental Evidence from Galveston Immigrants Daniel Aaronson Federal Reserve Bank of Chicago Jonathan Davis University of Chicago Karl Schulze Federal Reserve Bank of Chicago February 2018 Abstract Between 1907 and 1914, the “Galveston Movement,” a philanthropic effort spearheaded by Jacob Schiff, fostered the immigration of approximately 10,000 Russian Jews through the Port of Galveston, Texas. Upon arrival, households were given train tickets to pre-selected locations west of the Mississippi River where a job awaited. Despite the program’s stated purpose to locate new Russian Jewish immigrants to the Western part of the U.S., we find that almost 90 percent of the prime age male participants ultimately moved east of the Mississippi, typically to large Northeastern and Midwestern cities. We use a standard framework for modeling location decisions to show destination assignments made cities more desirable, but this effect was overwhelmed by the attraction of religious and country of origin enclaves. By contrast, there is no economically or statistically significant effect of a place having a larger base of immigrants from other areas of the world and economic conditions appear to be of secondary importance, especially for participants near the bottom of the skill distribution. Our paper also introduces two novel adjustments for matching historical data – using an objective measure of match quality to fine tune our match scores and a deferred acceptance algorithm to avoid multiple matching. Aaronson: daaronson@frbchi.org; Davis: jonmvdavis@gmail.com; Schulze: kschulze@frbchi.org. We thank participants at the Social Science History Association Conference and especially Phyllis Aaronson for spreading the story of her father, a Galveston immigrant who wound up in NYC. The views expressed in this paper are not necessarily those of the Federal Reserve Bank of Chicago or the Federal Reserve System. 1 I. Introduction One of the more compelling but challenging questions in the social sciences is the extent to which place determines socioeconomic success. The best evidence to date stems from two major U.S. programs -- the Gautreaux assisted housing project in Chicago and its national successor, Moving to Opportunity (MTO). Each assigned low-income residents to more affluent and racially mixed neighborhoods based on quasi-random and random designs, respectively. Our paper is the first that we are aware of to study an immigration program that in many ways is a more ambitious, early precursor to these influential experiments. Starting in the mid-19th century, a mass migration brought tens of millions of European immigrants to the U.S. (Hatton and Williamson 1994). Many disembarked at Ellis Island and then proceeded to a variety of Eastern and Midwestern urban areas. East European Jewish immigrants that were part of this wave clustered heavily in New York City. By 1907, the large, unprecedented influx into already dense parts of NYC prompted a prominent local Jewish philanthropist, Jacob Schiff, to partly fund a program that ultimately steered roughly 10,000 Russian Jews from more than 7,000 families to a new U.S. gateway – the port of Galveston, Texas. Upon arrival in Galveston, each family was given train tickets to destinations west of the Mississippi River that were determined by matching the male head’s occupation with local demand for their skill. Since Schiff’s immigrants had little input on their initial assignment and indeed much of the decision was based primarily on a single observable characteristic, this setting in many ways mirrors later mobility experiments.1 However, we cannot study the role of place in shaping life 1 Similar natural experiments in Sweden and Denmark are described in Edin, Fredriksson, and Aslund (2003) and Damm and Dustmann (2014). 2 outcomes because, despite the stated purpose of the program, roughly 88 percent of prime-age Galveston men sent to the western U.S. wound up east of the Mississippi River, and especially in NYC and Chicago. To explain this lack of compliance, we use a multinomial logit framework inspired by a Roy model for location decisions. Each individual chooses from the 48 possible U.S. states, with demographic and economic characteristics determining relative attractiveness. The program had a clear impact. Immigrants were more likely to reside in their assignment state a decade and more later; in the model, the increase in their latent utility for their assignment state is equivalent to increasing the number of local Russian immigrants, the strongest determinant of locational preference, by a factor of 20, holding all other features of the state fixed. However, this seemingly large effect resulted in only a minor shift in the ultimate location decisions of the Galveston immigrants since other Russian and Jewish immigrants were overwhelmingly concentrated in Northeastern and Midwestern states. For example, in 1910, the state of New York had 17 times more Russian immigrants than North Dakota, the state west of the Mississippi with the largest Russian immigrant population, and differed on a number of other dimensions that may have made it more attractive as well. As a result, the Galveston assignment could not overcome the appeal of living in areas with a high density of Russians or Jews. For a typical state, assignment corresponds to a highly significant 1.9 percent increase in the probability of choosing that state. While the number of Russian immigrants is especially powerful in predicting where Galveston immigrants ultimately resided, by contrast there is no economically or statistically significant effect from a place having a larger base of immigrants from other areas of the world. Moreover, while we find an association between observable economic conditions and location, 3 these appear to be of secondary importance to our ethnic measures, especially for participants near the bottom of the skill distribution. In total, we interpret this evidence as consistent with ethnic and religious networks playing a vital role in location decisions for early 20th century Russian Jewish immigrants, paralleling the experiences of many other immigrant waves to the U.S. over the remainder of the 20th century (e.g. Robinson 1998; Haines 1989; Bartel 1989; Cutler, Glaeser, and Vigdor 2008; Damn 2009; and Beaman 2012). Indeed, this result has some parallel to MTO, where participants ultimately sorted into neighborhoods with less poverty but similar racial composition to their original communities (Orr et al. 2003). The main empirical challenge we face is successfully matching administrative records from the Port of Galveston to the 100 percent population counts from the 1910 to 1940 decennial censuses. Bailey et al (2017) provide compelling evidence that hand matching administrative records is by far the gold standard methodology. In contrast, standard matching algorithms that principally rely on phonetic codes, such as Soundex or the New York State Identification and Intelligence System (NYSIIS), can have high rates of both type I and type II errors. However, in practice, hand matching is far more costly than the algorithmic approach. We build on existing algorithmic methods (e.g. Ferrie 1996 and Abramitzky et al. 2012 but especially Feigenbaum 2016) in two novel ways. First, like Feigenbaum (2016), we tune our scoring of potential matches based on an objective measure of match quality. However, rather than relying on a training sample of hand-matches, we measure match quality using the share of Galveston immigrants who arrived after 1910 who were improperly (because they were not in the U.S.) matched to the 1910 census. Using this calibrated scoring procedure, the pre-1910 arrival match rate with the 1910 census is 73 percent whereas the post-1910 match rate is 11 percent. Moreover, we get similar match rates if instead we calibrate our procedure using half 4 our data and evaluate the matches using a hold-out sample. This method is a scalable approach to improving match quality between immigration records and decennial censuses. Our second nonstandard technique is to select best matches using a deferred acceptance algorithm (DAA) (Gale and Shapley 1962). The DAA allows us to keep Galveston immigrants who are matched to multiple census observations but avoid the real possibility that multiple Galveston immigrants are matched to the same individual in a Census. Nix and Qian (2015) show that observations with multiple potential census matches substantially increase the match rate. To the best of our knowledge, the potential for multiple matches on the same census record is not typically addressed in the literature (an exception is Feigenbaum 2016). Using the diagnostics developed in Bailey et al (2017), we show that our approach yields a match rate nearly 50 percentage points higher than Ferrie (1996) while increasing the rate of false positives by only about 10 percentage points. Our paper is organized as follows. We begin by providing background on the Galveston program. Section III describes the data and our approach to matching individuals over time, including an assessment of our matching algorithm. Section IV illustrates the assignment and eventual location of Galveston immigrants and their demographic and economic characteristics. A simple model of geographic choice is described and implemented in Section V. Section VI briefly concludes. II. A Brief History of the Galveston Movement The second half of the 19th and beginning of the 20th century is often referred to as the age of mass migration (Hatton and Williamson 1994; Abramitzky, Boustan, and Eriksson 2012, 2014). Tens of millions of Europeans entered the U.S., with many establishing residency in a 5 handful of large Northeastern and Midwestern cities. Starting in the 1880s, Jews from Russia began to emigrate in mass as well and likewise generally clustered in those same large cities, most prominently NYC.2 Increasing congestion worried prominent American Jewish philanthropists Jacob Schiff and Baron de Hirsch, who feared the sweeping new waves of Jewish immigrants to NYC would cause nativist backlash and ultimately stricter immigration laws. Moreover, they believed housing and labor market conditions would be better in places that had not experienced the same rapid spike in population. De Hirsch and his philanthropic fund, of which Schiff was a member, initially created the Industrial Removal Office (IRO), tasked to find jobs for Jewish immigrants outside of NYC. The trustees “tried out almost every possible solution - agricultural colonization, suburbanization (on a small scale), the removal of industries to outlying districts, the transportation of families to smaller towns and industrial centers, and so on” (Best 1978, p. 44). Eventually, Schiff pursued an alternative. Pledging $500,000 (just under $13 million in 2016 dollars), Schiff’s plan involved diverting new Russian Jewish immigrants away from NYC.3 The Jewish Territorial Organization (JTO) was charged with recruiting “only young, sturdy immigrants, ready to do whatever work was available” (Best 1978, p. 49) and ensuring those recruits were transported to Bremen, Germany to embark on ships headed to the Port of Galveston. Best (1978) termed the resulting program “Jacob Schiff’s Galveston Movement.” 2 See Spitzer (2016) for a discussion of the causes of Russian Jewish emigration. To be clear, the money was not used for trans-Atlantic transportation. Schiff insisted that immigration laws be strictly followed and reprimanded a member of Hilfsverein, a European Jewish aid organization, “for assuring some immigrants that part of their traveling expenses would be paid if necessary. He was willing only that they should be assured that, if the situation required it, the JIIB [Jewish Immigrants’ Information Bureau] would contribute to the expense of their transportation from Galveston to their ultimate destination” (Best, p. 52). 3 6 Galveston was chosen for several reasons. First and foremost, Galveston was the farthest port from NYC that serviced Europeans. Logically, destinations farther from NYC were considered more appealing. Moreover, the southeastern states were rejected because of fear of labor market competition from cheap black laborers. Second, Bremen-to-Galveston was already an established shipping route operated by the North German Lloyd line. Finally, Galveston was linked to the railroad network servicing the Midwest and West. The railroads played a key role in Schiff’s plan. Upon arrival, immigrants were almost immediately dispersed via the rail lines to pre-determined towns west of the Mississippi River. Destination locations were scouted by the IRO and chosen based on demand for specific jobs (e.g. tailor versus shoemaker) and the size of their Russian Jewish population. Over time, the towns would hopefully build a network of Russian Jews that would make them more appealing to future cohorts. The first group of immigrants arrived on the SS Cassel on July 1, 1907. The 54 program participants were aged 18 to 42 and included “locksmiths, bakers, bookkeepers, noodle and macaroni makers, bookbinders, electricians and shoemakers, as well as many others.” The group was sent to “Colorado, Iowa, Nebraska, Minnesota, Missouri, Illinois, Oklahoma, Texas, and Wisconsin; to sizable cities like Minneapolis, Kansas City, and Milwaukee, and to smaller ones like Davenport, Quincy, and Dubuque” (Best 1978, p. 55). The Galveston Movement grew from there. Figure 1 provides a time-series of the number of “Hebrew” immigrants that landed in Galveston from 1900 to 1920. Up until 1906, a Hebrew immigrant was rarely routed to Galveston. That changed markedly between 1907 and 1914 (the two vertical lines on the chart), when over 10,000 Jewish immigrants, from more than 7,000 families, arrived. While 10,000 immigrants is a large number relative to the trivial flows prior to 1907, even at its peak, the number of Jewish immigrants arriving in Galveston never 7 exceeded 4 percent of the annual flow of Russian Jews to the U.S.4 The program faced several challenges that restricted its size, including uncooperativeness from the JTO and quality issues with the North German Lloyd ship line. Schiff’s program was abruptly terminated in 1914, at which point, he had dispersed roughly half of his original half-million dollar pledge. III. Data and Matching Methods A. Data The Port of Galveston provides a complete list of names, arrival dates, and basic demographic characteristics of the 132,155 passengers that debarked between 1844 and 1949.5 We extract 10,076 individuals who departed from the Port of Bremen, Germany,6 arrived in Galveston between July 1907 and December 1914, and reported their ethnicity as Hebrew and country of origin as Russia. Our analysis restricts attention to the 5,911 men who were aged 15 to 44 in 1910. We match this sample of Galveston arrivals to the 100 percent population counts from the 1910 to 1940 decennial censuses.7 In the next subsection, we detail our matching procedure and assess the quality of matches relative to other reasonable benchmarks. In the meantime, Table 1 provides descriptive statistics about the prime age male Galveston immigrants 4 Working-age men were overrepresented among Galveston immigrants. We calculate Galveston represented the port destination of about 7 percent of all Russian Jewish working-age male immigrants during the Movement era. These figures are calculated from a count of the number of Russian Hebrew immigrants arriving in Galveston between 1908 and 1914 divided by the1920 census count of the number of Russian immigrants who arrived between 1908 and 1914 and report their mother tongue as Hebrew. We think of this as an upper bound since some Russian Jewish immigrants may report their mother tongue as Russian. But it is also possible that Galveston immigrants and other Russian Jewish immigrants may return to Russia at a differential rate. 5 See http://ghf.destinationnext.com/immigration/Search.aspx. Ethnicity is not included in the version of the data maintained by the Port of Galveston. We collected this information from a version of the data maintained by Ancestry.com. 6 Nearly 90 percent of all arrivals to the Port of Galveston during this period departed from Bremen. Most other arrivals departed from Mexican ports but none of these were recorded as Hebrew. 7 The 100 percent census files were generously provided to us by the University of Minnesota Population Center via the data collection efforts of ancestry.com. For information on the IPUMS samples, see Steven Ruggles, J. Trent Alexander, Katie Genadek, Ronald Goeken, Matthew B. Schroeder, and Matthew Sobek, Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database], Minneapolis: University of Minnesota, 2010. 8 and, for context, samples of comparable immigrants that match the same demographic profile and arrival time as our Galveston sample (males, aged 15 to 44 in 1910 and arriving between 19088 and 1914). This includes 10,560 non-Russian, non-Hebrew immigrants who likewise departed from the Port of Bremen and immigrated through Galveston (Column 2), and for completeness, all Russian and non-Russian men who immigrated between 1908 and 1914 in the 1920 census (Columns 3 and 4). Among our main sample of Galveston immigrants, 92 percent declared their U.S. destination as a place west of the Mississippi River, consistent with Schiff’s plan.9 The most common destination was Texas itself (37 percent) and another 46 percent were assigned to the Midwest, especially Missouri, Iowa, and Minnesota. The remaining 17 percent were spread throughout the Southwest and Pacific regions. Relative to the non-Hebrew Galveston arrivals, the Galveston Hebrew immigrants were slightly younger (25.6 versus 27.6 years), less likely to list their final destination as Texas (37.4 versus 70.3 percent), and appear to be more skilled. Almost half the Hebrew immigrants reported their occupation as craftsman (46.3 percent), whereas 48.4 percent of non-Hebrew arrivals to Galveston were farmers.10 B. Matching Procedure Historical record linkage typically relies on phonetic codes, such as the New York State Identification and Intelligence System (NYSIIS), to resolve minor variations in spelling (e.g. Ferrie 1996; Abramitzky et al 2012; Long and Ferrie 2013). Given the ethnic heritage of our 8 While the Galveston program began in July 1907, we can only observe year of arrival in the census. Consequently, we only include those arriving between 1908 and 1914. 9 While Schiff’s wish was to settle families west of the Mississippi River, if sufficient cooperation from a Western location could not be obtained, a family could be sent to states contiguous to the Mississippi River. Of the 8 percent recording destinations east of the Mississippi River, most headed to Illinois (3.6 percent) or Tennessee (1.7 percent). 10 Other common occupations include operatives (15 percent), farm laborer (8 percent), and clerks (8 percent). 9 sample, we further address phonetic name coding with the Daitch-Mokotoff (DM) Soundex system (JewishGen). This variant of the Soundex system allows for common variants of Yiddish and Slavic name modifications by matching names according to their pronunciation rather than their spelling.11 Bailey et al (2017) convincingly show that an over-reliance on phonetic codes increases rates of both type I and type II error. Consequently, it is becoming more standard to use phonetic codes solely as a “blocking” mechanism to eliminate non-matches, as in Herzog, Scheuren, and Winkler (2007). That is, we define the universe of potential matches for each Galveston participant as individuals in a census having a DM Soundex match or an NYSIIS match on last name. With our potential set of matches, we then calculate a score for each Galveston-census pair based on Russian origin, age, year of immigration, Russian or Hebrew as native language, and the Jaro-Winkler string similarity scores for first and last names and their interaction.12 Importantly, the Jaro-Winkler metric allows us to incorporate meaningful variation in name spellings that go beyond simple phonetic matches. An important decision in matching is how to combine all of this information to select the best possible set of matches. We move beyond the prior literature in calibrating the weight put on our various indicators by exploiting a key feature of the data. In particular, there cannot be matches to the 1910 census for Galveston participants that arrive after 1910.13 This observation 11 Our implementation of DM is available upon request. For more information on specific rules, see http://www.jewishgen.org/InfoFiles/soundex.html#DM. 12 Unfortunately, the 1940 census did not ask year of immigration or, universally, native language. Therefore, we expect lower quality matches in 1940 relative to earlier years where those two variables are available. Indeed, as we discuss below, we get higher match rates, as well as higher false-positives, in 1940. Consequently, we put less emphasis on the 1940 match. 13 We assume that post-1910 immigrants were not in the U.S. at the time of the 1910 Census (April 1910). We checked for evidence of an earlier migration with the Galveston records that date back to the mid-19th century and were unable to find any examples. That, of course, does not eliminate the possibility of an earlier migration to another port. Our synthetic data tests, described below, are also inconsistent with this concern. 10 implies that parameters can be chosen to maximize the match rate between the 1907 to 1909 Port records and the 1910 census and minimize the match rate between the 1911 to 1914 Port records and the 1910 census. In addition, the short period from arrival to the 1910 census gives us some further contextual detail that we use to identify potentially false positive matches.14 We incorporate these insights into the tuning of our matching algorithm, the details of which can be found in Appendix A. Our approach is similar to that of Feigenbaum (2016) who sets parameters in his matching algorithm by training the scoring on a hand-matched subsample of his data. We use this calibrated scoring rule to assign a score to every potential match between a Galveston and a census record. A second unique feature of our procedure is the use of a deferred acceptance algorithm (DAA) to select the best match for each Galveston record. The DAA matches a Galveston record to the unmatched census record with the highest match score. Consequently, the procedure addresses the possibility that multiple Galveston individuals can simultaneously match to the same census individual and, therefore, by construction create at least one incorrect match. Therefore, the DAA can improve match quality relative to current algorithmic methods that do not address the potential for multiple matches on the same census record.15 Appendix A describes the DAA algorithm in more detail. Matches with non-positive scores or extreme differences in ages are discarded.16 14 This additional detail involves flagging matches if a) he reports citizenship in the 1910 census (not possible for 1907-09 immigrants), b) he report no children in the 1910 census if the individual arrived in Galveston with children, or c) the potential match results in a difference of more than 5 years in age or arrival in the U.S. 15 To be concrete within our context, we believe the current literature focuses on multiple census records being linked to the same Galveston record and ignores multiple Galveston records being linked to the same census record. To our knowledge, the only matching paper to consider this feature in the specification of their algorithm is Feigenbaum (2016). 16 The cutoff for differences in ages is a calibrated parameter. We fix the cutoff for overall score; the inclusion of a constant parameter in the creation of the score means this is in effect a flexible cutoff. Intuitively, we expect a higher constant to reflect general uncertainty in the correct match conditional on being a phonetic match. 11 C. Alternative Procedure We assess the performance of our preferred match algorithm relative to the widely-used procedure described in Ferrie (1996) and Long and Ferrie (2013).17 This method has the advantage of minimizing false positives but at the cost of a smaller and less representative sample. We implement the Ferrie matching method as follows: 1. Consider anyone who has an NYSIIS match on last name in the census. 2. Restrict to men older than 10 at arrival in the Galveston data and older than 10 in the census. 3. Drop if the difference in ages is greater than 5. 4. Drop if the difference in arrival year is greater than 5. 5. For 1910 and 1920 only, drop if the individual has children in the Galveston records but does not have children in the census. 6. Drop any Galveston individual who has more than 10 potential census matches at this point. 7. Take the individual for whom there is the smallest difference in ages and then the smallest difference in year of arrival. If there are multiple candidates for this match, drop the Galveston individual and any potential census matches. We depart slightly from the restrictions imposed by Ferrie (1996) in order to adapt his algorithm to our immigrant setting. These differences are: a) excluding restrictions on state of birth, b) swapping household head status with presence of children (step 5), c) setting the maximum age differences to 5, which lies between the restrictions in Ferrie (1996) and Long and Ferrie (2013), and d) moving step 6 to after steps 3 to 5. Dropping those with many potential matches and with 17 We discuss how our results vary by matching procedure in section V.B. 12 duplicate final matches (steps 6 and 7) decreases sample sizes, often considerably, and may decrease representativeness of the final sample since common names are those that typically get removed. D. Assessing the Matches We assess the quality of our matches in two ways. First, we compare the match rates to the 1910 Census for those arriving prior to 1910 with those arriving after 1910. Using the full set of data, the pre-1910 arrival match rate is 73 percent and the post-1910 rate is 11 percent. These match rates imply the probability of a falsepositive conditional on matching, i.e. the type I error rate, is 11.2/73.4 = 15.2 percent. As an alternative exercise, we calibrate our scoring procedure using half our data and evaluate the matches using a hold-out sample. In that case, the pre-1910 match rate in the hold-out sample is 76.0 percent and the post-1910 match rate is 13.4 percent. Together, these imply a type I error rate of 17.7 percent. By comparison, Bailey et al (2017) estimate type I error rates of between 18 and 70 percent for some of the prominent algorithmic methods in the literature. In our main analysis, we drop these false-positive post-1910 matches. Our second test matches the Galveston data to a constructed synthetic dataset consisting of randomly perturbed Galveston records combined with known non-matches, a method used in Bailey et al (2017).18 Since Galveston data are included in both the master and synthetic datasets, we know if the final match is “true,” allowing us to assess the rate of type I errors. See Appendix B for details on how this exercise is constructed. 18 The known non-matches consist of Russians arriving between 1900 and 1905 and between 1916 and 1920 with arrival year imputed for added robustness. Our perturbation to the Galveston data closely follows Bailey et al (2017) with a few amendments (see Appendix B). 13 The 1910 through 1930 match rates for both the final data used in our analysis (Column 1) and the synthetic data (Column 2) are shown in Table 2. Match rates are, unsurprisingly, significantly higher for our preferred method compared to the benchmark Ferrie method. The synthetic test implies that up to 35.3 percent of our preferred matches to the 1920 census could be false.19 However, we believe that this number likely represents an upper limit on the false positive rate due to the great deal of noise, such as randomly assigning arrival years to the nonmatched Russian sample, introduced into the records. Indeed, the synthetic test’s error rate is at least twice that of the implied 15 to 18 percent false positive rate based on the 1910 census match tests described earlier in this section. Moreover, we have no way to assess whether the level of measurement error introduced to the synthetic data reflect similar noise in the real census data. Therefore, of more interest is the performance of our algorithm relative to existing methods rather than the absolute error and match rate. Taken together, the synthetic tests in Table 2 imply that our method drastically improves match rates relative to the Ferrie method – 70 to 73 percent compared to 23 to 25 percent -- while somewhat increasing the rate of falsepositive matches from about 23 percent using the Ferrie method to 35 percent with our preferred method. Our preferred method uses the single best match for each Galveston participant. We also tried incorporating match uncertainty by including up to five census matches with the highest scores for each person. Each match is weighted by the size of the match score, with the weights for each participant summing to unity. In Table 2, we call this method the Five Best Weighted 19 We compute a somewhat smaller match rate but a very similar false positive rate if we drop the small share of Galveston residents that initially were assigned east of the Mississippi River. 14 Matches. We find that the Five Best match rates are similar to our preferred method but with a substantially higher rate of false-positives. IV. Location and Characteristics of Matches We start with an overview of where the Galveston participants were initially assigned when they arrived in the U.S. and then show where they are located as of the 1920 census. Figure 2 displays the assignment location of Galveston immigrants both in the full sample (Panel a) and conditional on being matched to the 1920 census using our preferred match procedure (Panel b). The states of Texas, Iowa, Minnesota, and Missouri have the largest shares at respective rates of 37.4, 11.8, 10.9, and 10.1 percent. That these patterns do not change when we condition on matching (comparing Panels a to b) shows that there is little selection on assignment location. Figure 3 shows the 1920 locations of Galveston immigrants matched to the 1920 census using the Preferred, Five Best, and Ferrie methods, respectively (Panels a to c), and of a comparison sample of Russian-born men who also arrived in the U.S. between 1908 and 1914 drawn from the 1920 census (Panel d).20 The maps highlight three striking features. First, rather than staying west of the Mississippi River, Galveston immigrants largely moved east, and especially to the Northeast and Illinois (we quantify this observation below). Second, the locations chosen are highly correlated with those of the Russian-born/non-Galveston men plotted 20 Appendix Figures A1, A2, and A3 show analogous maps for 1910, 1930, and 1940. 15 in Figure 3d. Indeed, the state-level correlation of the locations of Galveston immigrants and the Russian men is 0.97.21 Third, these patterns are robust to matching criteria. While Galveston participants gravitate to states east of the Mississippi River with high Russian immigrant populations, this does not mean that assignment to a state lacks empirical content. Figure 4 combines the data embedded in Figures 2 and 3 to show the percentage of those remaining in their initially assigned state as of the 1920 census.22 While assignment compliance is overall low, it also varies substantially across states, and again this observation is irrespective of matching method. For example, California has a compliance rate of 12.4 percent using our preferred method while Texas’s rate is 2.4 percent. Table 3 complements the maps by showing summary statistics for our preferred matches at the time of their arrival in Galveston (Column 1) and in the 1910 to 1940 censuses (Columns 2 to 5). Validating our sample and match procedure, we find a) age increases by roughly 10 years between each decennial census23, b) the percent of the sample arriving prior to 1910 remains constant at 18 percent and corresponds to the unconditional rate at arrival displayed in Column 1, and c) certain variables such as marriage, naturalization, literacy in English, home ownership, self-employment, and managerial occupation display monotonic trends consistent with aging and assimilation with longer tenures since arrival.24 Interestingly, higher skilled jobs are much more common among the Galveston immigrants than the cohort of non-Galveston Russian immigrant that arrived during the same period (see Table 1). 21 See Appendix Figure A4. The correlation remains that high even if we discard New York. These figures ignore the small number of immigrants initially assigned east of the Mississippi River as well as states receiving five or fewer arrivals. 23 This pattern does not hold for the 1910 matches, since the sample is partly selected on pre-1910 arrivals, who tend to be older than their post-1910 counterparts. 24 For matches to the 1910 census, rather than 100 percent, 54.1 percent arrive prior to 1910 since this calculation is not inclusive of the year 1910 itself. 22 16 Table 3 also quantifies where Galveston’s immigrants ultimately were at each decennial census through 1940. Of the 88.4 percent that moved east by 1920,25 over 40 percent (37.9/88.4 percent) lived in New York and about 60 percent (52.4/88.4 percent) in the NY-NJ-PA region. Those NY rates exceed that of Russian-born/non-Galveston men who arrived between 1908 and 1914, a discrepancy possibly explained by our inability to calculate them specifically by Hebrew status among the non-Galveston Russian men. While only 11.6 percent live west of the Mississippi River in 1920, this number steadily increases over time to 18.5 percent by 1940, a trend driven by movement to California. These descriptive results mirror trends highlighted in Bartel (1989), who finds that post-1964 immigrants initially move into areas with high concentrations of their own ethnicity but become less sensitive to ethnic enclaves over time. V. Explaining Mobility Choices A. Conceptual Framework and Empirical Strategy To understand how demographic and economic characteristics influence residential mobility choices, we use a model pioneered by McFadden (1974) and previously applied to migration decisions by Dahl (2002). We assume individual 𝑖’s net utility from living in location 𝑘 is additively separable in the individual’s earnings y in location 𝑘, cost of moving c to location 𝑘, and tastes t over location 𝑘: (1) 𝑈𝑖𝑘 = 𝑦𝑖𝑘 − 𝑐𝑖𝑘 + 𝑡𝑖𝑘 25 For clarity, we define Louisiana and Minnesota as west of the Mississippi River, even though the river partially cuts through these states. 17 where 𝑡𝑖𝑘 encompasses non-wage location-specific factors such as demographic makeup, public amenities, geography, and weather. We can rewrite equation (1) as the sum of the average utility from moving to location 𝑘, 𝑉𝑘 , and an individual specific utility, 𝜈𝑖𝑘 : (2) 𝑈𝑖𝑘 = 𝑉𝑘 + 𝜈𝑖𝑘 . where 𝑉𝑘 is a hedonic function of the location’s characteristics: (3) 𝑉𝑘 = 𝐸[𝑦𝑖𝑘 − 𝑐𝑖𝑘 + 𝑡𝑖𝑘 |𝑥𝑖 , 𝑧𝑘 , 𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 ] = 𝛽(𝑥𝑖 ) ⋅ 𝑧𝑘 . 𝑥𝑖 is a vector of individual characteristics and 𝑧𝑘 is a vector of location characteristics. The individual specific component, 𝜈𝑖𝑘 , can be decomposed as follows: 𝜈𝑖𝑘 = 𝛿𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 + 𝜀𝑖𝑘 , where 𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 indicates whether individual 𝑖 is assigned to location 𝑘. 𝛿 can be interpreted as an individual’s net value of assignment to state k. 𝜀𝑖𝑘 represents all other idiosyncratic factors affecting individual 𝑖’s latent utility over state 𝑘. Individuals select whichever location yields the greatest utility. Let 𝑀𝑖𝑘 be an indicator for individual 𝑖 choosing location 𝑘. Specifically, 𝑀𝑖𝑘 equals 1 if: (4) 𝑉𝑘 + 𝜀𝑖𝑘 ≥ 𝑉𝑘 ′ + 𝜀𝑖𝑘 ′ 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑘 ′ ≠ 𝑘. Otherwise, 𝑀𝑖𝑘 equals 0. This simple model implies that a migration program like Galveston will shift an individual’s location choice from 𝑘 to 𝑘′ when: (5) 𝛿𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘′ ≥ 𝛽(𝑥𝑖 ) ⋅ (𝑧𝑘 − 𝑧𝑘′ ) + 𝜀𝑖𝑘 − 𝜀𝑖𝑘 ′ . 18 Equation (5) has three implications about the importance of initial assignment on an individual’s ultimate location choice. First, location decisions are likely altered when 𝛿 is large. Second, when an individual is assigned to a location 𝑘 that differs from 𝑘′ along feature 𝑗, individuals (𝑗) (𝑗) will only change their location if 𝑗 is relatively unimportant, or specifically 𝛽 (𝑗) (𝑧𝑘 − 𝑧𝑘 ′ ) is small. Finally, assignment is more likely to affect behavior when an individual’s preference for a particular location is less idiosyncratic, or 𝜀𝑖𝑘 − 𝜀𝑖𝑘′ is small. Our aim is to estimate Galveston participants’ average latent utility to choosing any of the 48 U.S. states.26 We observe a vector of individual and state characteristics, 𝑥𝑖 and 𝑧𝑘 respectively, and where each individual chooses to locate, 𝑀𝑖𝑘 . Under the strong assumption that all relevant state characteristics 𝑧𝑘 are observed, the model implies: (6) 𝛽(𝑥𝑖 ) ⋅ (𝑧𝑘 − 𝑧𝑘 ′ ) + 𝛿(𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 − 𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 ′ ) ≥ 𝜀𝑖𝑘 − 𝜀𝑖𝑘 ′ ∀𝑘′. Point identification of the coefficients 𝛽(𝑥𝑖 ) and 𝛿 is challenging without parametric assumptions about the distribution of the idiosyncratic component of utility, 𝜀𝑖𝑘 . Therefore, like the much of the literature since McFadden (1974), we assume 𝜀𝑖𝑘 are independent and identically distributed according to the Type I Extreme Value Distribution. With this assumption, we can express the probability individual 𝑖 chooses state 𝑘 as: (7) exp 𝛽(𝑥𝑖 )⋅𝑧𝑘 +𝛿𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 . 𝑘′ 𝛽(𝑥𝑖 )⋅𝑧𝑘′ +𝛿𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘′ 𝑃(𝑀𝑖𝑘 = 1|𝑥𝑖 , 𝑧𝑖 , 𝐴𝑠𝑠𝑖𝑔𝑛𝑒𝑑𝑖𝑘 ) = ∑ Equation (7) is McFadden’s alternative-specific conditional logit model which allows covariates to vary by individual and state jointly. 26 Of course, Alaska and Hawaii were not states at this time. Since we do not have some key data for DC, we exclude it is a destination location. Practically, when we include DC, it has no empirical impact as only 17 Galveston participants migrated there by 1920. 19 B. Results Table 4 displays estimates of Equation (7) where the outcome is a vector of indicator variables for residing in state 𝑘 in the 1920 census. The parameters reported in the table represent the impact of a variable on the expected latent utility of living in a particular state. Given our parametric assumptions, this latent utility is interpretable as the log-odds of residing in a particular state in 1920. Point estimates are in the top row and standard errors are in parentheses. For interpretability, we stress the average marginal effects of changing a variable on the propensity to choose that state, which are shown in brackets.27 We also display likelihood ratio tests for the following groups of covariates: (1) total population demographics (labeled “Population”), (2) demographics specific to cohorts (“Cohort”), (3) occupation, and (4) other Galveston immigrants. The first row quantifies the main treatment effect of the Galveston program: the extent to which assignment to a particular state increases the odds of staying in that state. Controlling for state-specific intercepts only (Column 1), we estimate that the effect of being assigned to a state increases the log odds of selecting a state by a factor of 1.101. This point estimate fluctuates narrowly between 0.724 and 1.233 using different specifications (Columns 2 onwards) and alternative matching procedures (Appendix Table A2).28 This is a big effect. The point estimate 27 Marginal effects are calculated as a matrix of cross-alternative parameters. Specifically, marginal effects for a single covariate are represented as a 48-by-48 symmetric matrix, where each i,j cell is the effect of a change in a covariate in state i on the probability of choosing state j. For interpretability, we take the average of the diagonal in the calculated marginal effects matrix. This makes the marginal effects in brackets interpretable as the average effect of changing a parameter on the probability of choosing a typical state. 28 In addition, we find that our results are robust to restricting the potential census match universe and other minor alterations to the baseline matching procedure. Out of concern that our sample reflects the locations of Russians in general rather than the location of true matches, we exclude potential matches to New York. We also extend this exercise by excluding New York, New Jersey, and Pennsylvania, as well as dropping Galveston arrivals who were assigned east of the Mississippi river. Second, out of concern that the DAA method induces low-quality matches on the second pass, we stop the matching algorithm on its first iteration (so each census record is uniquely matched to 20 in our most saturated model in Column (11), 1.086, implies that being assigned to a particular state increased the latent utility of that state by the same amount as increasing the number of Russian immigrants, which we will show is the strongest driver of this latent utility, by a factor of 20.29 However, this effect was only a small victory for Schiff, as the vast majority of Galveston families still moved east. The corresponding average marginal effect of assignment on the probability of living in the assigned state in 1920 fluctuates narrowly between 1.3 and 2.2 percent. How did they choose where to go? Conditional on assigned state, we find that Galveston participants flow to states with larger Jewish and Russian-born immigrant populations.30 Using a sparse specification in column 3, we find that a 10 percent increase in the Russian immigrant or Jewish population increases the propensity to move to that state by 10.7 and 3.1 percent, respectively, and both of these effects are highly statistically significant. Even in our most saturated model shown in column (11), which conditions on a host of other demographic and occupational covariates and the locational choices of other Galveston participants, we still find the same 10 percent increase in the Russian immigrant or Jewish population increases the propensity to move to that state by 6.2 and 1.2 percent, respectively. However, these latter estimates likely understate the degree to which location decisions are associated with ethnic enclaves since we cannot directly measure the most pertinent population – Russian Jews. Indeed, much of the decline in the Russian and Jewish point estimates between the sparse column 3 and the more saturated specification in column 11 occurs the Galveston record with its highest score) as well as take the highest score and drop duplicate census matches. None of these exercises have an impact on the results. 𝐵 𝐵 29 In particular, 1.086 = 0.361log( ) which implies =20.1. 𝐴 𝐴 30 State estimates of religion are taken from the Census of Religious Bodies. See Department of Commerce and Labor (1910). 21 when we add a reasonable proxy for Russian Jewish descent -- the location of Russian Jews that immigrated on the Galveston ships. We define log(Num. Non-Fam, Located 1920)k as the log level of non-family Hebrew arrivals to Galveston between 1907 and 1914 that live in a particular state k at the time of the 1920 census. It is important to stress that these results are not interpretable as peer effects in the causal sense but simply quantify the degree to which Galveston individuals (and perhaps Russian Jews arriving in the Western US at the beginning of the 20th century) move to similar destinations (Manski 1993).31 That said, we find a 10 percent increase in the population of Galveston immigrants in 1920 is associated with an 11.1 percent increase in the propensity to move to that state.32 Controlling for the number of other Galveston immigrants in state k also eliminates the effect of the total immigrant population on location choice, further indicating that location decisions are driven by well-defined ethnic enclaves. These results are robust to other matching methods (Appendix Table A2). 33 We find mixed evidence that location decisions were associated with the human capital of locals, as proxied by the separate male and female 1910 literacy rates for both native-born and immigrant populations from approximately the same ten year birth cohorts (Columns 6, 10, and 11). There is some indication that Galveston participants were more likely to move to states with higher literacy rates of male immigrants although this result disappears when we control for nonfamily Hebrew arrivals from the Galveston program. In our more saturated specifications 31 Our use of the multinomial logit model induces a non-linearity into the agent’s decision process that allows the Galveston immigrant effects to be estimated (Brock and Durlauf 2002). We include a dummy indicating that a state has no non-family Galveston immigrants. The results are similar whether we look at immigrants arriving in the same year or over the full 1907 to 1914 period. 32 We also tried using the 1920 location of shipmates, defined by the variable Located 1920, Ship (%) as the percent of non-family passengers on the same ship as participant i that moved to a particular state k. While the location of shipmates is somewhat associated with location choice (Column 4), the effect is economically small and insignificant when we condition on other covariates (Column 11). 33 We also looked at 1930 location decisions. Aside some from some expected attenuation in the effect of assignment, the results are nearly identical for our more complete specifications. 22 (Columns 10 and 11), we find that Galveston immigrants were significantly less likely to reside in areas with native residents with higher literacy rates. There is also some, but not especially robust, evidence that prime age male Galveston participants are more likely to move to states with more similarly aged women. While we find that locations were driven by clusters of ethnic enclaves, we do not want to dismiss the role of economic opportunity. However, the importance of income and occupation are mixed and of secondary economic importance to ethnicity. Adding the log of the state’s mean occupational income34 to our statistical model, we find that states with better opportunities detract (Column 7) or at least have no impact (Column 11) on the location choices of Galveston immigrants. Similarly, there is no evidence that migration decisions are affected by the share of jobs in their specific two-digit occupation at arrival (Columns 8 and 11). The best evidence that we have uncovered that economic opportunities affected decisions is that Galveston men tend to move to states with a higher percentage of higher-paid professional workers and craftsmen and away from states with a higher fraction of lower-paid farmers and clerical/sales/service workers (Columns 9 to 11).35 After controlling for other factors, a 10 percent increase in the share of professionals and craftsmen is associated with a 1.0 and 2.4 percent increase, on average, in the probability of moving to a state. Finally, we split the sample into thirds based on occupational earnings scores at the time of arrival in Galveston (Table 5). State assignments increased the probability of living in a state by about 2 percentage points across all three groups. Similarly, there are small, typically statistically insignificant differences in the response to the number Russian immigrants and Occupational income is IPUMS’ OCCSCORE variable, which calculates median earnings by occupation in 1950. These shares roughly correspond to 1-digit occupation codes. We exclude shares for general laborers and nonoccupational responses in the estimates. 34 35 23 Galveston peers across the three occupational earnings terciles. The main difference between the groups is that the occupation effects are economically more important among the higher skilled workers, suggesting some heterogeneity in the relevance of economic opportunity. In particular, individuals in the top tercile are much more attracted to states with a large share of professionals and craftsmen, even after controlling for the location of other Galveston arrivals. Nevertheless, the appeal of being in an ethnic enclave is present and dominant across the skill distribution. VI. Conclusion This paper studies an unusual natural experiment that assigned initial U.S. destination to roughly 10,000 immigrants just prior to World War I. To alleviate congestion in New York City, philanthropists recruited Russian Jewish migrants to board ships in Bremen, German destined for Galveston, Texas. Upon arrival, families were given train tickets to locations west of the Mississippi River where a job awaited. Since participants had no influence on their initial assignment, the Galveston Movement had many features of later mobility programs such as Moving to Opportunity. Yet despite the stated purpose of the program to locate Russian Jewish immigrants to ethnically sparse areas of the U.S., we find that almost 90 percent of the prime age male participants in this program ultimately moved east of the Mississippi, and typically to large Northeastern and Midwestern cities. To explain this lack of compliance, we use a multinomial logit framework inspired by a Roy model for location decisions. We show that Galveston immigrants were ultimately attracted to enclaves with similar religious and country of origin background. By contrast, there is no economically or statistically significant effect from a place having a larger base of immigrants from other areas of the world. Moreover, while we find an association between observable 24 economic conditions and location, these appear to be of secondary importance to our ethnic measures, especially for participants in the bottom part of the skill distribution. We interpret this evidence as consistent with ethnic and religious networks playing a vital role in location decisions for early 20th century Russian Jewish immigrants, paralleling the experiences of many other immigrant waves to the U.S. over the remainder of the 20th century and perhaps to the experiences of the Moving to Opportunity program much later (Orr et al. 2003). Finally, our paper introduces two novel adjustments for matching census data. First, we tune our scoring of potential matches based on an objective measure of match quality: the share of Galveston immigrants who arrived after 1910 who were matched to the 1910 census. Second, we use a deferred acceptance algorithm (DAA) to avoid the real possibility that multiple Galveston immigrants are matched to the same individual in a Census. Using the diagnostics developed in Bailey et al (2017), we show that our approach performs favorably relative to standard algorithmic approaches popular in the literature. 25 References Abramitzky, Ran, Leah Platt Boustan, and Katherine Eriksson, 2014, “A Nation of Immigrants: Assimilation and Economic Outcomes in the Age of Mass Migration,” Journal of Political Economy, 122(3): 467-506. Abramitzky, Ran, Leah Platt Boustan, and Katherine Eriksson, 2012, “Europe’s Tired, Poor, Huddled Masses: Self-Selection and Economic Outcomes in the Age of Mass Migration,” American Economic Review, 102(5): 1832-1856. Bailey, Martha, Connor Cole, Morgan Henderson, and Catherine Massey, 2017, “How Well do Automated Linking Methods Perform in Historical Samples? Evidence from New Ground Truth,” NBER Working Paper no. 24019. Bartel Anne, 1989, “Where Do the New U.S. Immigrants Live?” Journal of Labor Economics, 7(4): 371-391. Beaman, Lori, 2012, “Social Networks and the Dynamics of Labour Market Outcomes: Evidence from Refugees Resettles in the U.S.,” Review of Economic Studies 79: 128-161. Best, Gary, 1978, “Jacob H. Schiff’s Galveston Movement: An Experiment in Immigrant Deflection, 1907-1914,” American Jewish Archives, April: 43-79. Brock, William, and Steven Durlauf, 2002, “A Multinomial Choice Model with Neighborhood Interactions,” Review of Economic Studies, 68: 235-260. Cutler, David M., Edward Glaeser, Jacob Vigdor, 2008, “When are Ghettos Bad? Lessons from Immigrant Segregation in the United States,” Journal of Urban Economics, 63: 759-774. Dahl, Gordon, 2002, “Mobility and the Return to Education: Testing a Roy Model with Multiple Markets,” Econometrica, 70(6): 2367-2420. Damm, Anna Piil, 2009, “Ethnic Enclaves and Immigration Labor Market Outcomes: QuasiExperimental Evidence,” Journal of Labor Economics, 27(2): 281-314. Damm, Anna Piil and Christian Dustmann, 2014, “Does Growing Up in a High Crime Neighborhood Affect Youth Criminal Behavior,” American Economic Review, 104(6): 18061832. Department of Commerce and Labor, Bureau of the Census, 1910, Census of Religious Bodies, County File, 1906, Washington, D.C.: Government Printing Office, retrieved from: http://www.thearda.com/Archive/Files/Descriptions/1906CENSCT.asp Edin, Per-Anders, Peter Fredriksson, and Olof Aslund, 2003, “Ethnic Enclaves and the Economic Success of Immigrants – Evidence from a Natural Experiment,” Quarterly Journal of Economics 118(1), 329-357. Feigenbaum, James, 2016, “A Machine Learning Approach to Census Record Linking,” Working Paper, Harvard University. 26 Ferrie, Joseph, 1996, “A New Sample of Males Linked from the Public Use Micro Sample of the 1850 U.S. Federal Census of Population to the 1860 U.S. Federal Census Manuscript Schedules,” Historical Methods, 29(4): 141-156. Gale, David, and Lloyd Shapley, 1962, “College Admissions and the Stability of Marriage,” The American Mathematical Monthly, 69(1): 9-15. Haines, David, , 1989, Refugees as Immigrants: Cambodians, Laotians, and Vietnamese in America, Totowa, N.J.: Rowman and Littlefield. Hatton, Timothy, and Jeffrey Williamson, 1994, “What drove the mass migration from Europe in the Late Nineteenth Century?” Population and Development Review, 20(3): 533-559. Herzog, Thomas N., Fritz Scheuren, and William Winkler, 2007, Data Quality and Record Linkage Techniques. JewishGen. “Soundex Coding”. Accessed July 16th, 2014. http://www.jewishgen.org/infofiles/soundex.html Long, Jason, and Joseph Ferrie, 2013, “Intergenerational Occupational Mobility in Great Britain and the United States since 1850,” American Economic Review, 103(4): 1109-1137. McFadden, Daniel, 1974, “Conditional logit analysis of qualitative choice behavior,” Frontiers in Econometrics, ed. P. Zarembka, 105-142. Manski, Charles, 1993, “Identification of Endogenous Social Effects: The Reflection Problem,” Review of Economic Studies, 60: 531-542. Nix, Emily and Nancy Qian, 2015. “The Fluidity of Race `Passing’ in the United States, 18801940,” NBER Working Paper no. 20828. Orr, Larry, Judith Feins, Robin Jacob, Erick Beecroft, Lisa Sanbonmatsu, Lawrence Katz, Jeffrey Liebman, and Jeffrey Kling, 2003, Moving to Opportunity for Fair Housing Demonstration Program: Interim Impacts Evaluation, US Department of Housing and Urban Development, Office of Policy Development and Research. Robinson, W. Courtland, 1998, Terms of Refuge: The Indochinese Exodus and the International Response, London: Zed Books. Ruggles, Steven, J. Trent Alexander, Katie Genadek, Ronald Goeken, Matthew Schroeder, and Matthew Sobek, 2010, Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database], Minneapolis: University of Minnesota. Spitzer, Yannay, 2016, “Pogroms, Networks, and Migration: The Jewish Migration from the Russian Empire to the United States, 1881-1914,” Working Paper, Hebrew University. 27 Appendix A: Additional Detail on the Matching Algorithm and Calibration Once we obtain the universe of potential matches, defined as having either a DM Soundex or NYSIIS phonetic match, we construct a match score for each record pair. Let g denote a Galveston record and c denote a census record. The match score is: 𝑆𝐶𝑂𝑅𝐸𝑐𝑔 = 𝛽𝐶𝑜𝑛𝑠𝑡𝑎𝑛𝑡 + 𝛽𝐵𝑜𝑟𝑛 𝑅𝑢𝑠𝑠𝑖𝑎 𝐷[𝐵𝑜𝑟𝑛 𝑅𝑢𝑠𝑠𝑖𝑎]𝑐 + 𝛽𝐴𝑔𝑒 |𝐴𝑔𝑒𝑐 − 𝐴𝑔𝑒𝑔 | + 𝛽𝑌𝑒𝑎𝑟𝐴𝑟𝑟𝑖𝑣𝑒 |𝑌𝑒𝑎𝑟𝐴𝑟𝑟𝑖𝑣𝑒𝑐 − 𝑌𝑒𝑎𝑟𝐴𝑟𝑟𝑖𝑣𝑒𝑔 | 𝛾𝐴𝑔𝑒 𝛾𝑌𝑒𝑎𝑟𝐴𝑟𝑟𝑖𝑣𝑒 + 𝛽𝐿𝑎𝑠𝑡 𝐽𝑊𝐿𝑎𝑠𝑡 + 𝛽𝐹𝑖𝑟𝑠𝑡 𝐽𝑊𝐹𝑖𝑟𝑠𝑡 + 𝛽𝐿𝑎𝑠𝑡𝑋𝐹𝑖𝑟𝑠𝑡 𝐽𝑊𝐿𝑎𝑠𝑡 𝐽𝑊𝐹𝑖𝑟𝑠𝑡 + 𝛽𝑆𝑝𝑒𝑎𝑘 𝑅𝑢𝑠𝑠𝑖𝑎𝑛 𝐷[𝑆𝑝𝑒𝑎𝑘 𝑅𝑢𝑠𝑠𝑖𝑎𝑛]𝑐 + 𝛽𝑆𝑝𝑒𝑎𝑘 𝐻𝑒𝑏𝑟𝑒𝑤 𝐷[𝑆𝑝𝑒𝑎𝑘 𝐻𝑒𝑏𝑟𝑒𝑤] 𝑐 where 𝐷[−]𝑐 indicates a dummy for having a characteristic in the census data and 𝐽𝑊− indicates Jaro-Winkler string similarity score for names. 𝛾𝑌𝑒𝑎𝑟𝐴𝑟𝑟𝑖𝑣𝑒 and 𝛾𝐴𝑔𝑒 are parameters that penalize difference in age or year of arrival by changing the convexity of the difference. We tune the parameters used in the algorithm by optimizing an objective function that maximizes the pre-1910 match rate and minimizes the post-1910 match rate. To avoid overfitting, for example on year of arrival, we also minimize the rate of “bad matching” based on matches having implausible characteristics in the 1910 census. Specifically, this is defined as being a U.S. citizen, having children in the Galveston data but not the census, and having differences in age or year of arrival of more than 5 years. The specific objective function we maximize is: 𝑓(𝑒, 𝑙, 𝑏) = 𝑒(𝑒 − 𝑙1.5 )(1 − 𝑏1.5 ) 28 This is optimized by random search over the fourteen parameters the algorithm depends on. Specifically, we run the algorithm for 250 iterations, with each iteration consisting of 200 random draws. The first 60 random draws sample values for all parameters jointly. The latter 140 draws iteratively sample 10 draws for each for the parameters. Each parameter is given a bandwidth to search in; if none of the draws in an iteration of the algorithm improve the performance of the objective function, the bandwidth for the parameters is decreased by a factor of 0.99. We calibrate the algorithm over the sample of all white foreign-born males who are aged 10 or older upon arrival in the Galveston data and 10 or older in the census data. While we calibrate these parameters for all white immigrants, our baseline matches are restricted to Russian-born, white immigrants aged 25 to 54 in 1920. We run the optimization for several starting values and random seeds. Unfortunately, the non-concave nature of the matching function means that convergence is not unique. We choose the final values based on the parameter set which corresponds best to the optimization where we only use Russian-born individuals from the census. Table A1 shows the final values and upper and lower limits of the search area for each parameter used to create our matches. Once match scores are created, we drop non-positive scores as well as records where the absolute value of the difference in ages is more than 𝛿𝐴𝑔𝑒 , an additional parameter for which we solve. While dropping pairs with a non-positive score is somewhat arbitrary, the inclusion of a constant, 𝛽𝐶𝑜𝑛𝑠𝑡𝑎𝑛𝑡 , means that the cutoff is in reality flexible. Next, we implement a deferred acceptance algorithm to find the final match pair using the match score as the numerical strength of the potential matching. The algorithm iterates until 29 all Galveston individuals have a match or no longer have a potential match. The procedure is as follows: 1. On iteration S, match each Galveston record to its highest score, 𝑆𝐶𝑂𝑅𝐸𝑐𝑔 . 2. If a census record is matched to multiple Galveston records, take the Galveston record with the highest 𝑆𝐶𝑂𝑅𝐸𝑐𝑔 . Score ties are broken by choosing the record pair with the smallest age difference and the Galveston record for which the census record represents the largest increase in score relative to the next best census record. Delink the Galveston records with lower scores for this census record. 3. Save these unique matches and remove any Galveston or census record that is included in this set. 4. Repeat the algorithm with the remaining Galveston-census combinations that have thus far not been included in the matches. These rules will create a set of unique matches representing the best combination for all records. 30 Appendix B: Synthetic Validation Our synthetic validation procedure follows Bailey et al (2017). We create a synthetic dataset of perturbed true matches and known non-matches which we then match to the original, unperturbed data. This allows us to compare match rates as well as error rates in both the original and synthetic dataset. This test is performed over the sample of males aged 10 and older. Creating the synthetic dataset involves two steps. First, for a subset of the data, we randomly introduce measurement error to the Galveston data consistent with Bailey et al (2017). This takes a number of forms: 1. Names: Randomly switching first and last names and introducing duplication, deletion, and transposition of characters in the first and last name. 2. Age: Randomly round 25 percent of ages to the closest multiple of 5 to simulate age clumping. 3. Year of arrival: Randomly change 10 percent of arrival years to within 5 years of the original year listed, randomly shift 5 percent to the previous decade, and randomly “clump” 5 percent by rounding to the nearest year multiple of 5. 4. Mother tongue: Mother tongue is not a variable in the Galveston data. We randomly assign Russian, Hebrew, or neither as mother tongue based on the empirical proportion of Russian-born migrants from 1907-1914 reporting these languages (conditional on reporting Russian, Hebrew, English, or no response). To this number, we randomly change 20 percent to neither Russian nor Hebrew. 31 5. Non-matches: Following Bailey et al (2017), individuals who die, move out of the U.S., or are unable to be matched for other reasons are simulated by dropping 15 percent of the true matches. In the second step, we construct a set of census individuals who should be non-matches. This group needs to be both comparable enough that they have the potential to be falsely matched and different enough that we know a priori that they should not be matched. Consequently, we select Russian-born men who arrive in Galveston between 1900 to 1905 or 1916 to 1920. Since our algorithm depends heavily on matching year of arrival, the synthetic test shows very low error rates when we do not alter the non-match sample’s year of arrival. To create additional difficulty for our algorithm, we randomly assign a year of arrival between 1907 and 1914 for all known non-matches. This addition drastically increases error rates in the synthetic test.36 The two sets of perturbed true matches and known non-matches are appended together to create synthetic census data. We then match the original Galveston data to this synthetic census dataset using the methods described already. These results are displayed in Table 2 and discussed in the main text. 36 Without allocating year of arrival, all error rates are below 5 percent for our preferred algorithm. 32 Figure 1: Number of Hebrew Arrivals to the Port of Galveston: 1900-1920 3250 3000 Number of Hebrew Arrivals 2750 2500 2250 2000 1750 1500 1250 1000 750 500 250 0 00 19 02 19 04 19 06 19 8 0 19 10 12 19 19 Year 14 19 16 19 18 19 20 19 Notes: This figure plots the number of arrivals tagged as “Hebrew” (Russian and non-Russian) arriving to the port of Galveston between 1900 and 1920. The red lines are at 1907 and 1914, indicating the start and end of the Galveston program, respectively. Source: Ancestry.com. Figure 2: Assignment State After Galveston Arrival, Men Born Between 1866 and 1895 (Aged 15 to 44 in 1910) (a) Total Sample N = 5,911 Percent of Matches: (b) Matched to 1920 N = 4,304 +10% 8-10% 6-8% 4-6% 2-4% 0-2% Notes: Panel (a) shows the share of male Galveston participants, who would have been 15 to 44 in 1910, assigned to each state upon their arrival in Galveston between 1908 and 1914. Panel (b) shows the share assigned to each state upon arrival but uses the sample that can be matched to the 1920 census using our preferred method. 33 Figure 3: State Location as of 1920 Census (a) Preferred Matches (b) Five Best Matches Weighted N = 4,304 N = 4,352 (c) Ferrie Matches N = 1,442 (d) Location of All Non-Galveston Russian Immigrants Who Arrived Between 1908 and 1914) Percent of Matches +10% 8-10% 6-8% 4-6% 2-4% None or 0-2% Notes: Panels (a) to (c) show the share of prime-age Galveston immigrants located in each state as of the 1920 census, by matching method. See the text for more information on our matching procedures. Panel (d) shows the share of non-Galveston Russian-born men, aged 25 to 54 in 1920 and arriving in the U.S. between 1908 and 1914, that are located in each state as of the 1920 census. 34 Figure 4: Percent of Galveston Men that Remain in Their Assigned State as of 1920 (a) Preferred Matches (b) Five Best Matches Weighted N = 4,304 N = 4,352 (c) Ferrie Matches N = 1,442 Percent of Assigned Staying +10% 8-10% 6-8% 4-6% 2-4% None or 0-2% Notes: This figure shows the share of Galveston immigrants who remain in their assigned state as of the 1920 census. We do not compute shares for states with 5 or fewer assignments or east of the Mississippi. 35 Table 1: Summary Statistics: Full Data Age Arrive Prior to 1910 (%) West of Mississippi (%) Texas (%) California (%) New York (%) Pennsylvania-New Jersey (%) Midwest (East of Mississippi) (%) New England (%) South (East of Mississippi) (%) Occupational Income (1940 dollars) Professional (%) Farmer, Farm Laborer (%) Managers, Proprietors (%) Clerical Workers (%) Sales Workers (%) Craftsmen (%) Operative Workers (%) Service Workers (%) General Laborer (%) No Occupation (%) N Port of Galveston (1) (2) Russian Hebrews Other Arrivals 25.6 27.6 (0.1) (0.1) 18.3 43.5 (0.5) (0.5) 91.8 99.0 (0.4) (0.1) 37.4 70.3 (0.6) (0.4) 2.3 8.8 (0.2) (0.3) 0.0 0.0 (0.0) (0.0) 0.0 0.1 (0.0) (0.0) 4.7 0.4 (0.3) (0.1) 0.0 0.0 (0.0) (0.0) 3.4 0.4 (0.2) (0.1) 1355.3 957.9 (6.0) (5.9) 0.0 0.0 (0.0) (0.0) 7.5 48.4 (0.3) (0.5) 7.7 3.5 (0.3) (0.2) 8.5 3.7 (0.4) (0.2) 4.3 0.6 (0.3) (0.1) 46.3 19.9 (0.6) (0.4) 14.8 9.7 (0.5) (0.3) 2.2 2.6 (0.2) (0.2) 5.0 5.6 (0.3) (0.2) 3.7 6.1 (0.2) (0.2) 5,911 10,560 Comparable 1920 Immigrants (3) (4) Russians Non-Russians 33.5 34.0 (0.0) (0.0) 21.7 23.4 (0.1) (0.0) 10.1 18.1 (0.1) (0.0) 0.4 0.5 (0.0) (0.0) 1.7 4.7 (0.0) (0.0) 28.6 18.5 (0.1) (0.0) 18.5 20.6 (0.1) (0.0) 25.0 28.5 (0.1) (0.0) 14.5 11.2 (0.1) (0.0) 3.3 3.0 (0.0) (0.0) 1129.1 1076.7 (1.5) (0.6) 0.0 0.1 (0.0) (0.0) 4.2 6.2 (0.0) (0.0) 7.8 3.3 (0.1) (0.0) 0.8 1.1 (0.0) (0.0) 3.4 1.6 (0.0) (0.0) 19.3 16.8 (0.1) (0.0) 21.1 20.1 (0.1) (0.0) 2.6 4.4 (0.0) (0.0) 19.4 26.2 (0.1) (0.0) 21.5 20.1 (0.1) (0.0) 234,618 1,147,043 Notes: Columns (1) shows our unmatched sample of prime-age male Galveston participants. Columns (2) shows all prime-age male non-Russian arrivals to the port of Galveston over the same period. Columns (3) and (4) compare these samples to the total Russian-born and non-Russian immigrant populations arriving from 1908 to 1914 and meetiing the same sample restrictions. 36 Table 2: Comparison of Match Rates and Synthetic Validation Preferred Ferrie Five Best Weighted Year 1910 1920 1930 1910 1920 1930 1910 1920 1930 Actual Match Rate (%) 71.3 72.8 69.5 25.1 24.4 23.3 71.9 73.4 70.2 Synthetic Match Rate (%) 88.4 87.7 87.0 48.0 46.8 47.8 89.2 88.8 88.1 Synthetic Type I Error (%) 34.5 35.3 34.6 23.4 22.2 23.5 63.3 64.0 63.0 Notes: See the text for details on our matching procedures. See Appendix B for information on the synthetic tests. 37 Table 3: Summary Statistics: Matched Data Age Arrive Prior to 1910 (%) Mother Tongue: Hebrew (%) Mother Tongue: Russian (%) Stay Assigned State (%) Stay Assigned Division (%) West of Mississippi (%) Texas (%) California (%) New York (%) Pennsylvania-New Jersey (%) Midwest (East of Mississippi) (%) New England (%) South (East of Mississippi) (%) Married (%) Naturalized (%) Literate (%) Owns Home (%) Occupational Income (1940 dollars) Self-Employed (%) Works for Wages (%) Farm Status (%) Professional (%) Farmer, Farm Laborer (%) Managers, Proprietors (%) Clerical Workers (%) Sales Workers (%) Craftsmen (%) Operative Workers (%) Service Workers (%) General Laborer (%) No Occupation (%) N (1) Matched Sample at Arrival 25.3 (0.1) 18.2 (0.6) . (.) . (.) . (.) . (.) 91.6 (0.4) 36.8 (0.7) 2.4 (0.2) 0.0 (0.0) 0.0 (0.0) 4.6 (0.3) 0.0 (0.0) 3.7 (0.3) . (.) . (.) . (.) . (.) 1357.8 (7.2) . (.) . (.) . (.) 0.0 (0.0) 7.3 (0.4) 8.1 (0.4) 8.6 (0.4) 4.3 (0.3) 46.1 (0.8) 14.9 (0.5) 2.1 (0.2) 4.8 (0.3) 3.9 (0.3) 4,304 (2) (3) (4) (5) 1910 Matches 27.2 (0.2) 54.1 (1.3) 51.8 (1.3) 8.1 (0.7) 3.2 (0.5) 4.6 (0.5) 9.5 (0.8) 1.4 (0.3) 0.9 (0.3) 45.5 (1.3) 16.5 (1.0) 17.3 (1.0) 8.7 (0.7) 2.5 (0.4) 47.7 (1.3) 0.6 (0.2) 76.9 (1.1) 8.4 (0.7) 1245.5 (12.0) 9.0 (0.7) 86.1 (0.9) 1.9 (0.4) 0.0 (0.0) 1.7 (0.3) 2.4 (0.4) 1.4 (0.3) 4.5 (0.5) 23.9 (1.1) 29.5 (1.2) 1.9 (0.4) 27.4 (1.2) 7.5 (0.7) 1,487 1920 Matches 34.7 (0.1) 18.2 (0.6) 64.3 (0.7) 23.9 (0.6) 3.5 (0.3) 6.3 (0.4) 11.6 (0.5) 1.3 (0.2) 1.6 (0.2) 37.9 (0.7) 14.5 (0.5) 22.6 (0.6) 9.6 (0.4) 3.8 (0.3) 78.0 (0.6) 49.1 (0.8) 83.8 (0.6) 17.1 (0.6) 1171.1 (11.9) 20.8 (0.6) 55.9 (0.8) 1.8 (0.2) 0.1 (0.0) 2.1 (0.2) 11.8 (0.5) 0.8 (0.1) 5.3 (0.3) 21.6 (0.6) 20.8 (0.6) 2.1 (0.2) 11.7 (0.5) 23.7 (0.6) 4,304 1930 Matches 44.5 (0.1) 17.9 (0.6) 68.8 (0.7) 20.6 (0.6) 3.0 (0.3) 5.3 (0.3) 12.4 (0.5) 1.1 (0.2) 3.6 (0.3) 37.7 (0.8) 14.8 (0.6) 22.0 (0.6) 10.1 (0.5) 3.0 (0.3) 88.6 (0.5) 78.4 (0.6) 89.0 (0.5) 35.1 (0.7) 1266.4 (13.1) 28.2 (0.7) 49.5 (0.8) 2.9 (0.3) 0.1 (0.0) 3.0 (0.3) 19.2 (0.6) 0.8 (0.1) 6.1 (0.4) 21.2 (0.6) 16.1 (0.6) 3.3 (0.3) 7.4 (0.4) 22.9 (0.7) 4,109 1940 Matches 55.3 (0.1) 18.2 (0.6) . (.) . (.) 2.0 (0.2) 4.5 (0.3) 18.5 (0.6) 0.4 (0.1) 9.3 (0.4) 33.7 (0.7) 12.6 (0.5) 20.2 (0.6) 11.0 (0.5) 4.1 (0.3) 83.9 (0.5) 86.1 (0.5) . (.) 35.9 (0.7) 1396.1 (12.2) 34.0 (0.7) 46.1 (0.7) 3.7 (0.3) 0.2 (0.1) 3.5 (0.3) 21.1 (0.6) 2.3 (0.2) 9.8 (0.4) 18.5 (0.6) 15.3 (0.5) 5.1 (0.3) 6.6 (0.4) 17.5 (0.6) 4,699 Notes: Column (1) shows the arrival characteristics of prime-age male Galveston participants for those who were matched to the 1920 census. Columns (2) to (5) show census variables for this sample conditional on being matched to each census. 38 Table 4: Determinants of 1920 State Location (1) ∗∗∗ Assigned 1.101 (0.104) [0.0188] Log(Pop.) 1910 Log(Immig.) 1910 (2) ∗∗∗ 0.724 (0.089) [0.0128] 0.419∗∗∗ (0.049) [0.0074] 1.117∗∗∗ (0.039) [0.0198] Log(Russian Immig.) 1910 Log(Jewish+1) 1906 (3) ∗∗∗ 1.233 (0.093) [0.0213] 0.002 (0.051) [0.0000] 0.287∗∗∗ (0.053) [0.0050] 0.615∗∗∗ (0.037) [0.0107] 0.177∗∗∗ (0.025) [0.0031] Located 1920, Ship (%) (4) ∗∗∗ 1.209 (0.092) [0.0206] 0.016 (0.050) [0.0003] 0.292∗∗∗ (0.050) [0.0050] 0.558∗∗∗ (0.036) [0.0095] 0.141∗∗∗ (0.023) [0.0024] 0.010∗∗∗ (0.002) [0.0002] D(No Non-Fam. Located) 1920 (5) ∗∗∗ 0.945 (0.097) [0.0162] -0.041 (0.053) [-0.0007] 0.011 (0.052) [0.0002] 0.217∗∗∗ (0.043) [0.0037] 0.006 (0.031) [0.0001] (6) ∗∗∗ 1.137 (0.097) [0.0194] -1.480∗∗∗ (0.326) [-0.0253] 0.125 (0.177) [0.0021] 0.529∗∗∗ (0.045) [0.0090] 0.178∗∗∗ (0.030) [0.0030] (7) ∗∗∗ 1.178 (0.094) [0.0204] -0.213∗∗∗ (0.078) [-0.0037] 0.377∗∗∗ (0.059) [0.0065] 0.626∗∗∗ (0.036) [0.0108] 0.256∗∗∗ (0.034) [0.0044] (8) ∗∗∗ 1.234 (0.093) [0.0213] 0.002 (0.051) [0.0000] 0.286∗∗∗ (0.053) [0.0049] 0.616∗∗∗ (0.037) [0.0107] 0.177∗∗∗ (0.025) [0.0031] ∗∗∗ 1.161 (0.097) [0.0198] 0.012 (0.091) [0.0002] 0.178∗∗∗ (0.065) [0.0030] 0.715∗∗∗ (0.048) [0.0122] 0.141∗∗∗ (0.043) [0.0024] (10) ∗∗∗ 1.132 (0.102) [0.0193] -0.453 (0.429) [-0.0077] 0.045 (0.210) [0.0008] 0.649∗∗∗ (0.058) [0.0111] 0.107∗∗ (0.050) [0.0018] 0.327 (0.507) [0.0056] 0.810∗∗∗ (0.055) [0.0139] Log(Num. Non-Fam. Located 1920) Log(Male Immig.), Cohort (10 Yr.) 0.008 (0.215) [0.0001] -0.271 (0.562) [-0.0046] 0.092∗∗∗ (0.015) [0.0016] -0.049 (0.039) [-0.0008] 0.332∗∗ (0.151) [0.0057] 1.634∗∗∗ (0.597) [0.0279] -0.053∗∗∗ (0.013) [-0.0009] 0.050 (0.037) [0.0009] Log(Male US Born), Cohort (10 Yr.) Literate (%), Male Immig, Cohort (10 Yr.) Literate (%), Male US Born, Cohort (10 Yr.) Log(Female Immig.), Cohort (10 Yr.) Log(Female US Born), Cohort (10 Yr.) Literate (%), Female Immig, Cohort (10 Yr.) Literate (%), Female US Born, Cohort (10 Yr.) -1.127∗∗∗ (0.303) [-0.0195] Log(State Mean Occscore), 1910 Workers with own 2-Dig. Occ. (%), 1910 -0.011 (0.018) [-0.0002] 8825.0∗∗∗ 3133.2∗∗∗ 31.3∗∗∗ 174.4∗∗∗ 140.2∗∗∗ 5367.5∗∗∗ 8823.4∗∗∗ -0.126∗∗∗ (0.036) [-0.0022] 0.078∗∗∗ (0.026) [0.0013] 1.087∗∗∗ (0.194) [0.0185] -1.251∗∗∗ (0.153) [-0.0213] 0.031 (0.035) [0.0005] -0.192∗∗∗ (0.022) [-0.0033] 0.276∗∗∗ (0.049) [0.0047] 2411.9∗∗∗ 0.372 4,304 0.373 4,304 0.379 4,304 0.376 4,304 0.372 4,304 0.372 4,304 0.377 4,304 Farmers (%), 1910 Farm Laborers (%), 1910 Professionals (%), 1910 Managers (%), 1910 Clerical, Sales, and Service (%), 1910 Operatives (%), 1910 Craftsmen (%), 1910 χ2 (LR test): χ2 (LR test): χ2 (LR test): χ2 (LR test): Pseudo-R2 N (9) Population Cohort Occupation Galveston Immigrants 0.382 4,304 0.359 4,304 -0.545 (0.336) [-0.0093] -0.654 (0.747) [-0.0111] 0.057∗∗∗ (0.019) [0.0010] -0.071∗ (0.042) [-0.0012] 0.722∗∗∗ (0.198) [0.0123] 1.193 (0.741) [0.0203] -0.045∗∗∗ (0.016) [-0.0008] 0.058 (0.039) [0.0010] 2.440 (1.583) [0.0416] -0.008 (0.018) [-0.0001] -0.123∗∗∗ (0.043) [-0.0021] -0.021 (0.043) [-0.0004] 1.144∗∗∗ (0.228) [0.0195] -0.921∗∗∗ (0.242) [-0.0157] -0.142∗∗ (0.057) [-0.0024] -0.212∗∗∗ (0.040) [-0.0036] 0.116 (0.081) [0.0020] 133.2∗∗∗ 36.5∗∗∗ 58.0∗∗∗ 0.378 4,304 (11) 1.086∗∗∗ (0.103) [0.0185] -0.566 (0.435) [-0.0097] -0.006 (0.217) [-0.0001] 0.362∗∗∗ (0.074) [0.0062] 0.068 (0.056) [0.0012] 0.001 (0.003) [0.0000] 0.504 (0.543) [0.0086] 0.653∗∗∗ (0.084) [0.0111] -0.185 (0.326) [-0.0031] 0.653 (0.785) [0.0111] 0.020 (0.021) [0.0003] -0.086∗∗ (0.043) [-0.0015] 0.267 (0.194) [0.0046] -0.192 (0.802) [-0.0033] -0.003 (0.018) [-0.0001] 0.042 (0.038) [0.0007] -0.115 (1.579) [-0.0020] -0.006 (0.018) [-0.0001] -0.056 (0.047) [-0.0010] 0.001 (0.043) [0.0000] 0.568∗∗ (0.242) [0.0097] -0.259 (0.239) [-0.0044] -0.143∗∗∗ (0.054) [-0.0024] -0.060 (0.043) [-0.0010] 0.138∗ (0.081) [0.0024] 26.0∗∗∗ 13.4∗ 20.8∗∗ 61.1∗∗∗ 0.380 4,304 Notes: This table implements McFadden’s (1974) alternative-specific conditional logit model. Column (1) includes state intercepts to absorb state-specific variation. Located 1920, Ship (%) is the share of non-family shipmates who lived in a state in 1920. Log(Num. Non-Fam. Located 1920) is the log-number of non-family members residing in a state in 1920. In order to take logs, this variable is imputed as zero when a participant has no non-family members in a state, and we include the dummy, D(No Non-Fam. Located 1920), for these cases. Cohort variables are measured using data on members of a participant’s 10-year birth cohort who reside in a state. Marginal effects are calculated as the mean of each state-specific marginal effect (see footnote 27). Standard errors are in parentheses and average marginal effects are in brackets. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01. 39 Table 5: Determinants of 1920 State Location, by Occupational Income Tercile Tercile 1 (1) Assigned Log(Pop.) 1910 Log(Immig.) 1910 Log(Russian Immig.) 1910 Log(Jewish+1) 1906 (2) Tercile 2 (3) ∗∗∗ 1.047 (0.157) [0.0179] -1.552∗∗∗ (0.512) [-0.0265] 0.047 (0.281) [0.0008] 0.499∗∗∗ (0.069) [0.0085] 0.212∗∗∗ (0.060) [0.0036] ∗∗∗ ∗∗∗ 1.068 (0.157) [0.0182] -1.479∗∗∗ (0.512) [-0.0253] 0.183 (0.269) [0.0031] 0.498∗∗∗ (0.070) [0.0085] 0.144∗∗∗ (0.047) [0.0025] 1.035 (0.164) [0.0177] -0.660 (0.618) [-0.0112] -0.050 (0.295) [-0.0009] 0.570∗∗∗ (0.088) [0.0097] 0.106 (0.077) [0.0018] 0.346 (0.355) [0.0059] -2.325∗∗ (0.946) [-0.0397] 0.105∗∗∗ (0.022) [0.0018] -0.016 (0.060) [-0.0003] 0.173 (0.239) [0.0030] 3.619∗∗∗ (1.000) [0.0618] -0.061∗∗∗ (0.020) [-0.0010] 0.027 (0.056) [0.0005] -1.053∗ (0.567) [-0.0180] 0.136 (0.334) [0.0023] -2.097∗∗ (0.911) [-0.0358] 0.106∗∗∗ (0.022) [0.0018] -0.011 (0.060) [-0.0002] 0.171 (0.233) [0.0029] 3.487∗∗∗ (0.992) [0.0595] -0.065∗∗∗ (0.020) [-0.0011] 0.023 (0.056) [0.0004] 0.070 (0.405) [0.0012] -2.615∗∗ (1.202) [-0.0446] 0.072∗∗ (0.030) [0.0012] -0.013 (0.063) [-0.0002] 0.331 (0.256) [0.0056] 3.261∗∗ (1.266) [0.0556] -0.052∗∗ (0.024) [-0.0009] 0.025 (0.057) [0.0004] Located 1920, Ship (%) D(No Non-Fam. Located) 1920 Log(Num. Non-Fam. Located 1920) Log(Male Immig.), Cohort (10 Yr.) Log(Male US Born), Cohort (10 Yr.) Literate (%), Male Immig, Cohort (10 Yr.) Literate (%), Male US Born, Cohort (10 Yr.) Log(Female Immig.), Cohort (10 Yr.) Log(Female US Born), Cohort (10 Yr.) Literate (%), Female Immig, Cohort (10 Yr.) Literate (%), Female US Born, Cohort (10 Yr.) Log(State Mean Occscore), 1910 Workers with own 2-Dig. Occ. (%), 1910 -0.015 (0.019) [-0.0003] 67.9∗∗∗ 59.7∗∗∗ 61.3∗∗∗ 65.2∗∗∗ -0.134∗∗ (0.066) [-0.0023] 0.010 (0.055) [0.0002] 0.960∗∗∗ (0.332) [0.0164] -0.840∗∗ (0.366) [-0.0143] -0.051 (0.086) [-0.0009] -0.172∗∗∗ (0.046) [-0.0029] 0.121 (0.101) [0.0021] 46.0∗∗∗ 17.2∗∗ 0.379 1,806 0.379 1,806 0.380 1,806 Farmers (%), 1910 Farm Laborers (%), 1910 Professionals (%), 1910 Managers (%), 1910 Clerical, Sales, and Service (%), 1910 Operatives (%), 1910 Craftsmen (%), 1910 χ2 (LR test): χ2 (LR test): χ2 (LR test): χ2 (LR test): Pseudo-R2 N Population Cohort Occupation Galveston Immigrants (4) (5) ∗∗∗ 0.977 (0.166) [0.0167] -0.379 (0.675) [-0.0065] -0.032 (0.337) [-0.0005] 0.312∗∗∗ (0.115) [0.0053] 0.084 (0.090) [0.0014] -0.005 (0.005) [-0.0001] 1.174∗ (0.678) [0.0201] 0.666∗∗∗ (0.145) [0.0114] 0.249 (0.497) [0.0043] -1.949 (1.502) [-0.0333] 0.034 (0.033) [0.0006] -0.031 (0.064) [-0.0005] -0.053 (0.303) [-0.0009] 2.139 (1.494) [0.0366] -0.015 (0.026) [-0.0003] 0.015 (0.054) [0.0003] 0.060 (2.564) [0.0010] -0.009 (0.019) [-0.0001] -0.019 (0.072) [-0.0003] -0.008 (0.067) [-0.0001] 0.342 (0.374) [0.0058] -0.359 (0.380) [-0.0061] -0.033 (0.085) [-0.0006] -0.066 (0.071) [-0.0011] 0.084 (0.137) [0.0014] 8.0∗ 4.7 5.7 22.4∗∗∗ 0.383 1,806 (6) Tercile 3 (7) ∗∗∗ 1.254 (0.154) [0.0213] -1.509∗∗∗ (0.534) [-0.0256] -0.355 (0.312) [-0.0060] 0.481∗∗∗ (0.072) [0.0082] 0.210∗∗∗ (0.064) [0.0036] ∗∗∗ ∗∗∗ 1.266 (0.155) [0.0215] -1.425∗∗∗ (0.549) [-0.0242] -0.272 (0.305) [-0.0046] 0.480∗∗∗ (0.073) [0.0082] 0.164∗∗∗ (0.052) [0.0028] 1.332 (0.161) [0.0226] -0.885 (0.681) [-0.0150] -0.259 (0.323) [-0.0044] 0.628∗∗∗ (0.098) [0.0107] 0.152∗ (0.084) [0.0026] 0.276 (0.377) [0.0047] 0.335 (0.928) [0.0057] 0.096∗∗∗ (0.025) [0.0016] -0.067 (0.062) [-0.0011] 0.569∗∗ (0.257) [0.0097] 1.063 (0.942) [0.0180] -0.061∗∗∗ (0.022) [-0.0010] 0.086 (0.059) [0.0015] -0.631 (0.549) [-0.0107] 0.166 (0.363) [0.0028] 0.364 (0.942) [0.0062] 0.096∗∗∗ (0.025) [0.0016] -0.064 (0.062) [-0.0011] 0.546∗∗ (0.259) [0.0093] 1.062 (0.951) [0.0180] -0.063∗∗∗ (0.022) [-0.0011] 0.083 (0.059) [0.0014] -0.515 (0.459) [-0.0087] 1.109 (1.054) [0.0188] 0.103∗∗∗ (0.033) [0.0017] -0.148∗∗ (0.069) [-0.0025] 0.988∗∗∗ (0.289) [0.0168] -0.064 (1.065) [-0.0011] -0.077∗∗∗ (0.028) [-0.0013] 0.124∗ (0.065) [0.0021] 0.063 (0.136) [0.0011] 59.0∗∗∗ 51.3∗∗∗ 55.9∗∗∗ 46.8∗∗∗ -0.118∗ (0.069) [-0.0020] -0.038 (0.060) [-0.0006] 1.094∗∗∗ (0.361) [0.0186] -0.232 (0.379) [-0.0039] -0.283∗∗∗ (0.095) [-0.0048] -0.107∗∗ (0.050) [-0.0018] 0.112 (0.095) [0.0019] 50.3∗∗∗ 34.3∗∗∗ 0.383 1,564 0.383 1,564 0.385 1,564 (8) ∗∗∗ 1.292 (0.164) [0.0219] -0.890 (0.746) [-0.0151] -0.261 (0.350) [-0.0044] 0.312∗∗ (0.127) [0.0053] 0.075 (0.090) [0.0013] 0.008 (0.005) [0.0001] -0.213 (1.074) [-0.0036] 0.572∗∗∗ (0.129) [0.0097] -0.365 (0.529) [-0.0062] 1.910∗ (1.144) [0.0324] 0.075∗∗ (0.036) [0.0013] -0.144∗∗ (0.071) [-0.0024] 0.694∗∗ (0.321) [0.0118] -0.951 (1.151) [-0.0161] -0.041 (0.030) [-0.0007] 0.104 (0.064) [0.0018] -0.430 (2.456) [-0.0073] 0.084 (0.133) [0.0014] -0.083 (0.074) [-0.0014] -0.040 (0.071) [-0.0007] 0.416 (0.389) [0.0071] 0.345 (0.407) [0.0058] -0.311∗∗∗ (0.093) [-0.0053] 0.004 (0.072) [0.0001] 0.039 (0.129) [0.0007] 8.7∗ 19.1∗∗ 17.8∗∗ 24.0∗∗∗ 0.387 1,564 (9) (10) (11) ∗∗∗ 1.025 (0.213) [0.0176] -1.463∗∗ (0.692) [-0.0252] 0.535 (0.391) [0.0092] 0.665∗∗∗ (0.100) [0.0115] 0.261∗∗∗ (0.082) [0.0045] ∗∗∗ ∗∗∗ 1.029 (0.213) [0.0177] -1.468∗∗ (0.696) [-0.0253] 0.569 (0.373) [0.0098] 0.667∗∗∗ (0.100) [0.0115] 0.250∗∗∗ (0.062) [0.0043] 1.009 (0.222) [0.0172] -0.547 (0.847) [-0.0093] 0.341 (0.425) [0.0058] 0.858∗∗∗ (0.117) [0.0146] 0.093 (0.109) [0.0016] -0.289 (0.485) [-0.0050] 1.152 (1.132) [0.0198] 0.060∗ (0.032) [0.0010] -0.085 (0.088) [-0.0015] 0.204 (0.324) [0.0035] -0.047 (1.196) [-0.0008] -0.012 (0.031) [-0.0002] 0.037 (0.083) [0.0006] -0.181 (0.730) [-0.0031] -0.328 (0.451) [-0.0056] 1.200 (1.112) [0.0207] 0.060∗ (0.032) [0.0010] -0.083 (0.087) [-0.0014] 0.198 (0.323) [0.0034] -0.065 (1.192) [-0.0011] -0.012 (0.031) [-0.0002] 0.036 (0.082) [0.0006] -0.660 (0.601) [-0.0113] 1.424 (1.343) [0.0243] -0.027 (0.039) [-0.0005] -0.102 (0.086) [-0.0017] 0.483 (0.372) [0.0082] -0.955 (1.461) [-0.0163] 0.019 (0.036) [0.0003] 0.035 (0.076) [0.0006] -0.058 (0.268) [-0.0010] 56.1∗∗∗ 41.5∗∗∗ 57.6∗∗∗ 43.0∗∗∗ -0.182∗ (0.096) [-0.0031] 0.094 (0.074) [0.0016] 1.986∗∗∗ (0.491) [0.0339] -1.578∗∗∗ (0.494) [-0.0269] -0.090 (0.118) [-0.0015] -0.268∗∗∗ (0.065) [-0.0046] 0.425∗∗∗ (0.130) [0.0073] 55.4∗∗∗ 6.2 0.363 934 0.363 934 0.367 934 (12) 0.940∗∗∗ (0.223) [0.0161] -0.303 (0.913) [-0.0052] 0.416 (0.482) [0.0071] 0.527∗∗∗ (0.150) [0.0090] 0.044 (0.118) [0.0007] -0.000 (0.007) [-0.0000] -12.272∗∗∗ (0.428) [-0.2096] 0.745∗∗∗ (0.159) [0.0127] -0.620 (0.737) [-0.0106] 2.213 (1.533) [0.0378] -0.074∗ (0.042) [-0.0013] -0.117 (0.090) [-0.0020] 0.166 (0.417) [0.0028] -2.145 (1.627) [-0.0366] 0.067∗ (0.037) [0.0011] 0.017 (0.074) [0.0003] -0.027 (3.524) [-0.0005] -0.080 (0.275) [-0.0014] -0.084 (0.109) [-0.0014] 0.065 (0.085) [0.0011] 1.295∗∗ (0.556) [0.0221] -0.915∗ (0.499) [-0.0156] -0.116 (0.110) [-0.0020] -0.151∗ (0.090) [-0.0026] 0.350∗∗ (0.164) [0.0060] 12.5∗∗ 7.3 12.0 1959.7∗∗∗ 0.371 934 Notes: This table splits the sample from Table 4 into terciles of occupational income at arrival. See notes to Table 4 for variable definitions. Standard errors are in parentheses and average marginal effects are in brackets. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01. 40 Appendix Figure A1: State Location as of 1910 Census (a) Preferred Matches (b) Five Best Matches Weighted N = 1,487 N = 1,501 (c) Ferrie Matches N = 523 Percent of Matches +10% 8-10% 6-8% 4-6% 2-4% None or 0-2% Notes: See notes to Figure 3. 41 Figure A2: State Location as of 1930 Census (a) Preferred Matches (b) Five Best Matches Weighted N = 4,109 N = 4,159 (c) Ferrie Matches N = 1,379 Percent of Matches +10% 8-10% 6-8% 4-6% 2-4% None or 0-2% Notes: See notes to Figure 3. 42 Figure A3: State Location as of 1940 Census (a) Preferred Matches (b) Five Best Matches Weighted N = 4,699 N = 4,756 (c) Ferrie Matches N = 955 Percent of Matches +10% 8-10% 6-8% 4-6% 2-4% None or 0-2% Notes: See notes to Figure 3. 43 Figure A4: Comparison of Locations of Galveston Matches with Comparable Russian Immigrants in 1920 (a) Preferred Matches (b) Five Best Matches Weighted Correlation: 0.973; Correlation (drop NY): 0.975 Correlation: 0.981; Correlation (drop NY): 0.986 40.0 NY Share of Galveston Match Sample Share of Galveston Match Sample 40.0 20.0 IL 10.0 PA 5.0 OH CT 2.5 MA MD MN CA WI MO TX NE IACO 1.0 0.0 NJ MI INND WV KS RI LA OR WA GA DE ME OK VA NH SD AL SC TN WY MS KY VT MT AR UT AZ NM FL ID NV NC 0.0 1.0 NY 20.0 IL PA 10.0 MI NJ 5.0 WI 2.5 5.0 10.0 Percent of Russians 20.0 1.0 40.0 OH CT MD TX CA CO MO NE MN IAIN OR WAND ME RI WV KS NH GA LA WY KY DE AZ FL AL TN VA SD MS AR SC ID OK VT MT NV NM UT NC 0.0 2.5 MA 0.0 1.0 2.5 5.0 10.0 Percent of Russians 20.0 40.0 (c) Ferrie Matches Correlation: 0.976; Correlation (drop NY): 0.992 Share of Galveston Match Sample 40.0 NY 20.0 IL PA 10.0 MA 5.0 OH CT 2.5 1.0 0.0 NJ MI WI MD MN MO CA NE IACO INND RI WA WV KS ME OR GA NH VA MT DE OK SD KY TN LA WY SC AL MS VT UT FL AR ID AZ NM NC NV TX 0.0 1.0 2.5 5.0 10.0 Percent of Russians 20.0 40.0 Notes: This figure plots the share of Galveston men in the 1920 census located in a state against the share of non-Galveston Russian-born men of the same age and arrival cohort in a state, as of the 1920 census. Axes are on log-scales. Correlations between state-level shares are displayed above the figure, including and excluding New York state. Table A1: Match Parameters βConstant βBornRussia βAge βY earArrive βLast βF irst βLast×F irst βSpeakRussian βSpeakHebrew γArr γAge δAge Values 1111.49 644.13 -110.00 -390.18 807.12 295.07 847.11 226.73 329.22 3.26 1.10 12 Lower Limit 0 0 -1000 -1000 0 0 0 0 0 1 1 0 Notes: See Appendix A. 44 Upper Limit 3000 1000 0 0 1000 1000 1000 1000 1000 5 5 15 Table A2: Comparison of 1920 Location Results across Matching Methods Assigned Log(Pop.) 1910 Log(Immig.) 1910 Log(Russian Immig.) 1910 Log(Jewish+1) 1906 D(No Non-Fam. Located) 1920 Located 1920, Ship (%) Log(Num. Non-Fam. Located 1920) χ2 (LR test): Population χ2 (LR test): Galveston Immigrants Pseudo-R2 N (1) (2) Preferred Ferrie 0.946∗∗∗ (0.097) [0.0162] -0.039 (0.053) [-0.0007] 0.017 (0.052) [0.0003] 0.217∗∗∗ (0.042) [0.0037] 0.004 (0.030) [0.0001] 0.264 (0.515) [0.0045] 0.001 (0.002) [0.0000] 0.795∗∗∗ (0.061) [0.0136] 31.8∗∗∗ 219.8∗∗∗ 0.379 4,304 1.003∗∗∗ (0.171) [0.0176] -0.093 (0.085) [-0.0016] 0.144 (0.090) [0.0025] 0.193∗∗∗ (0.075) [0.0034] 0.036 (0.042) [0.0006] 0.986∗∗∗ (0.280) [0.0173] -0.004 (0.003) [-0.0001] 0.753∗∗∗ (0.100) [0.0132] 13.0∗∗ 75.3∗∗∗ 0.350 1,442 (3) Five Best Matches Weighted 0.655∗∗∗ (0.068) [0.0111] -0.026 (0.031) [-0.0004] -0.007 (0.034) [-0.0001] 0.140∗∗∗ (0.030) [0.0024] -0.005 (0.018) [-0.0001] -0.000 (0.001) [-0.0000] 0.895∗∗∗ (0.046) [0.0152] 24.8∗∗∗ 503.8∗∗∗ 0.383 4,352 Notes: See notes to Table 4. Standard errors are in parentheses and average marginal effects are in brackets. 45 ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01.