Kasey Buckles is more of an economist than a family genealogist. Most of her past work explores the economics of the family, demography, and child health.
But she decided to try the genealogy website FamilySearch because she was working with Brigham Young University economist Joseph Price on a study of intergenerational mobility. Buckles knew how difficult it can be to track and link the historical records of one person over time, especially women who change names when they marry.
She decided to look up her great-grandmother, and was surprised to see that some of her U.S. census records were already attached to her profile on FamilySearch. In 1910, the 2-year-old was listed as Mary L. Gaddie. A decade later, she went by her middle name of Lettie. And by 1940, she was a married woman: M. Lettie Caswell.
Buckles knew traditional research methods that attempt to trace a person by following the same name over time would have failed to make the connections.
“I had my aha moment when I looked at my great-grandmother and saw all the work that other people had already done,” Buckles said. “And then I did get into it, because it is a little addicting.”
The Notre Dame professor in the Department of Economics was able to use the research to revisit family memories with her grandmother, who died in 2019. “We had this really great afternoon,” Buckles said, “where I was able to tell her things about her past that she had forgotten or had never known.”
Other people, likely relatives Buckles doesn’t know, had used their knowledge of family history to connect her great-grandmother’s changing names. Working with Price, she realized that this goldmine of crowdsourced family knowledge could be used to build a powerful tool for all kinds of long-term research.
With funding from the National Science Foundation and the Russell Sage Foundation, Buckles and Price created the Census Tree, a digitized database that uses genealogy research and machine learning to improve census linking from 1850 to 1940. The Census Tree website went live in late July 2023.
The same month, Buckles and Price presented the findings of their study of intergenerational mobility, the first working paper to use the data, at two sessions of the National Bureau of Economic Research’s Summer Institute. Notre Dame doctoral student Haley Wilbert is also a co-author on the paper, along with Zach Ward of Baylor University.
Buckles said creating the Census Tree required a huge team, including dozens of undergraduate students from both Notre Dame and the BYU Record Linking Lab, multiple economics doctoral students from Notre Dame, and Cornell doctoral student Adrian Haws.
“This effort will link people across the censuses in a way that allows you to see them through the course of their life, and to see how their experiences—their early life, world events, public policies—have shaped them in a way we haven’t been able to do before,” Buckles said. “Our innovation is our connection to people doing their own genealogy research. I think this is an exciting symbiotic relationship between the public and academic researchers.”
Inclusive research
The challenge in long-term research using census records is that each census is a snapshot in time, taken a decade after the last. That limits the ability to track people over time and across families. One method to link people over time uses Social Security numbers or tax records, both of which raise privacy issues and are not available in most historical records.
To solve this problem, researchers track people by identifying matches on first name, last name, birthplace, and birth year. This approach works reasonably well for white men, but not as well for minorities and barely at all for women. The underrepresentation of these groups can affect a study’s conclusions.
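The name-based matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any particular study uses: the function name, the record fields, the birthplace values, and the "John Smith" rows are all assumptions made for the example. Only the name change from Mary Gaddie to Lettie Caswell comes from the story above.

```python
from collections import defaultdict

def link_by_exact_match(earlier, later):
    """Link records whose (first, last, birthplace, birth_year) key
    appears exactly once in each census. Hypothetical helper."""
    def index(records):
        by_key = defaultdict(list)
        for rec in records:
            key = (rec["first"].lower(), rec["last"].lower(),
                   rec["birthplace"].lower(), rec["birth_year"])
            by_key[key].append(rec["id"])
        return by_key

    early_idx, late_idx = index(earlier), index(later)
    links = []
    for key, early_ids in early_idx.items():
        late_ids = late_idx.get(key, [])
        # Keep only unambiguous one-to-one matches.
        if len(early_ids) == 1 and len(late_ids) == 1:
            links.append((early_ids[0], late_ids[0]))
    return links

# Illustrative records; birthplaces and the Smith rows are invented.
census_1910 = [
    {"id": "1910-a", "first": "Mary", "last": "Gaddie",
     "birthplace": "Kentucky", "birth_year": 1908},
    {"id": "1910-b", "first": "John", "last": "Smith",
     "birthplace": "Ohio", "birth_year": 1905},
]
census_1940 = [
    {"id": "1940-a", "first": "Lettie", "last": "Caswell",
     "birthplace": "Kentucky", "birth_year": 1908},
    {"id": "1940-b", "first": "John", "last": "Smith",
     "birthplace": "Ohio", "birth_year": 1905},
]

links = link_by_exact_match(census_1910, census_1940)
print(links)
```

In this toy data, the man whose name never changes is linked, while the woman who switched to her middle name and took a married surname is missed, which is the gap the paragraph above describes.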
For example, Buckles, Price, Wilbert, and Ward looked at intergenerational mobility to understand to what extent the circumstances of your birth determine how you turn out in life. Buckles put it this way: “How likely is it that people who are born into low-income or low-status families reach higher levels of achievement? Does the American Dream exist—when, and for whom?”
One measure used to answer that question compares the status of a person’s occupation with that of their parents. But Wilbert said there can be a big socioeconomic difference between a white farmer in the North and a Black farmer in the South.
“So we use a measure that incorporates not only occupation, but also literacy measures, immigration status, gender, and region that you live in,” Wilbert said.
The Census Tree links allowed the sample size for this historical research to grow to hundreds of millions of links, and to include women for the first time.
“When people would estimate these kinds of correlations between parents and children in the past, they would do it only for white men,” Buckles said. “And it actually looks like economic mobility hasn’t changed much over the time or, if anything, it’s gotten worse.”
But Black Americans had very little mobility until more recent times. And women weren’t included at all, even though their presence in the workforce has increased drastically in the last 50 years. Including these previously ignored groups flipped the results.
“With data that includes Black Americans, but also women, and other immigrant groups, we can see that mobility better, and see that it actually has increased over time,” Buckles said. “Relative to the late 1800s and early 1900s period that we’re studying, today it is actually easier to have an outcome that is different from your parents.”
Crowdsourced input
FamilySearch, one of the largest genealogy websites in the world, began as an effort of the Church of Jesus Christ of Latter-day Saints. It’s free and not for profit, which is why it may be less familiar than advertised sites like Ancestry.com.
But FamilySearch has more than 12 million users and nearly 1.4 billion profiles of deceased people. Users can build their family tree, search the trees built by others and merge them with their own, and attach scans of actual historical records. The site prompts users to confirm data and avoid duplications.
BYU’s Price is a natural partner, as he and Buckles have collaborated on research projects for about 15 years due to their common interest in understanding families. They realized that highly motivated amateur genealogists had already made links between decennial censuses that could be used to track individuals. There were 133 million pairs among the 1850-1940 censuses for men, and 121 million for women.
While politicians, social scientists and others use census information for many reasons, the released data does not include personal identifiers like names until 72 years after it is collected. That means that the 1950 census is the most recent one with information that can be linked to individuals.
Wilbert, who knew Price at BYU as an undergraduate, started on the project as Buckles’ research assistant and has worked on data creation and analysis. The genealogy data can be used to train an algorithm to identify additional matches, and to recognize variations like nicknames or place misspellings. She said using hand-linked data to inform machine learning makes her confident that the resulting links are accurate and representative of the population.
“I had already worked with census data before, and I really liked the idea of bringing voices to those that historically haven’t been seen,” Wilbert said. “This is a project which highlights women in that time period as well as multiple minority groups.”
The complete Census Tree dataset has about 330 million census pairs for men and 270 million for women—a huge treasure of data. “We are able to take the wisdom of the crowds and learn a lot and make links that people have not been able to make before,” Buckles said.
The Census Tree can identify a possible link, for example, between records in the 1920 and 1930 censuses that may both belong to someone’s grandmother. If a user confirms it’s the right person, a new link or “crosswalk” is created.
“It has this really nice application for the public, for people who want to learn about their own family histories,” Buckles said. “And then in turn, researchers learn from all the people in the public who are doing this kind of work on their own.”
Untapped potential
The Census Tree website will make the links public, which Buckles hopes will unlock a wave of new research.
Her next project will look at the long-term effects of alcohol prohibition on kids. She will use the data to identify kids in the 1910 census, some of whom lived in a dry state or county before the 1920 Prohibition amendment, and see if their circumstance had an effect on their educational attainment, occupation or other measurable outcomes.
“To do this, we can download the 1910 and 1940 censuses, and then download our Census Tree crosswalks,” Buckles said. “And then we’d be able to take all of these people and know both what their childhood environment was like in terms of exposure to prohibition, and also know important things about how they turned out.”
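The workflow Buckles describes, two census extracts joined through a crosswalk, might look roughly like the sketch below. The article does not describe the actual file layout, so every name here is an assumption: the function `join_with_crosswalk`, the fields `id`, `dry_area`, and `years_school`, and the crosswalk stored as pairs of record IDs. The rows are placeholders invented for illustration, not data or results.

```python
def join_with_crosswalk(census_1910, census_1940, crosswalk):
    """Attach each linked person's 1940 outcome to their 1910
    childhood record. Hypothetical helper and field names."""
    child_by_id = {rec["id"]: rec for rec in census_1910}
    adult_by_id = {rec["id"]: rec for rec in census_1940}
    linked = []
    for id_1910, id_1940 in crosswalk:
        child = child_by_id.get(id_1910)
        adult = adult_by_id.get(id_1940)
        if child is None or adult is None:
            continue  # skip links that point outside the extracts
        linked.append({
            "id_1910": id_1910,
            "id_1940": id_1940,
            "dry_area_1910": child["dry_area"],
            "years_school_1940": adult["years_school"],
        })
    return linked

# Placeholder rows, invented for illustration only.
census_1910 = [
    {"id": "c1", "dry_area": True},
    {"id": "c2", "dry_area": False},
    {"id": "c3", "dry_area": True},
]
census_1940 = [
    {"id": "a1", "years_school": 10},
    {"id": "a2", "years_school": 8},
]
# Each pair links one 1910 record to one 1940 record.
crosswalk = [("c1", "a1"), ("c2", "a2"), ("c3", "a9")]

linked = join_with_crosswalk(census_1910, census_1940, crosswalk)
for row in linked:
    print(row)
```

Each output row then carries both the childhood exposure and the adult outcome for the same person, which is the shape of data a study like the one described would need. The third crosswalk pair is dropped because its 1940 record is not in the extract.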
Other researchers could use the data to study the impact of major public works projects, such as the improvement of water quality. Others might look at natural disasters and how they affected people’s life course.
“Honestly, it’s up to the user’s imagination,” Wilbert said. “The most obvious is looking at policy changes or major events. You can use it to look at how the Great Depression impacted personal migrations across the states.”
Social scientists and historians could study major policy initiatives like health interventions or the introduction of welfare initiatives. Demographers could study data about birth order or family size and how these factors affect outcomes in life.
“Historians, sociologists, anthropologists, politicians, all the social sciences . . . we hope that researchers across those communities find these useful,” Buckles said. “The fact that we’ve already had some success in economic history is encouraging.”
Buckles said she and her collaborators never considered not making the data public. She feels lucky to be at a stage in her career where she can benefit others and leave the field better off.
“There is so much that can be done here and we couldn’t possibly do it all,” she said. “I am as excited to see what other people do with the data as I am to work with it myself. This really feels like a contribution that is going to be far beyond any one paper that I might write.”