Socioeconomic and Demographic Covariate Data for Mortality Research - A Curated Dataset with Consistent, Multi-Source Integration

September 2025

Authors

Linus Denny
Jianxi Su, PhD, FSA
Mengyi Xu, PhD, FSA, FIAA

Executive Summary

This report documents the construction of a merged dataset that compiles a wide range of socioeconomic and demographic variables from the American Community Survey (ACS) and the Decennial Census. The dataset is organized at the county, state, and national levels and is designed to support future research on mortality heterogeneity by providing consistent, well-documented covariates commonly used in mortality modeling.

Mortality plays a foundational role in actuarial practice, underpinning the design and management of life-contingent financial products. Understanding the relationship between mortality and socioeconomic factors is essential for promoting equity, improving risk assessment, and supporting data-driven product and policy development. Although this dataset does not itself include mortality data, it provides a validated set of covariates that can be linked with external mortality outcomes to support a wide range of actuarial and public health analyses.

The construction process included consistent schema design, geographic identifier integration, and metadata documentation. A three-pronged validation framework was used to verify consistency across geographic hierarchies, internal coherence within aggregated variables, and alignment between ACS and Census sources. The results demonstrate strong data reliability, with most discrepancies traceable to known limitations of survey data such as sampling variation or non-additive fields.
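
As an illustration of the first validation check (consistency across geographic hierarchies), the following minimal Python sketch sums county-level counts by state and flags states whose county total differs from the reported state figure. The function name, record layout, sample values, and tolerance are hypothetical and are not taken from the report. The sketch assumes only that the first two digits of a five-digit county FIPS code identify the state.

```python
from collections import defaultdict

def find_hierarchy_mismatches(county_counts, state_counts, rel_tol=0.0):
    """Return {state_fips: (county_sum, state_value)} for states that disagree.

    county_counts maps 5-digit county FIPS strings to counts;
    state_counts maps 2-digit state FIPS strings to counts.
    Both layouts are assumptions made for this example.
    """
    sums = defaultdict(int)
    for county_fips, value in county_counts.items():
        # The first two digits of a county FIPS code are the state code.
        sums[county_fips[:2]] += value

    mismatches = {}
    for state_fips, state_value in state_counts.items():
        county_sum = sums.get(state_fips, 0)
        # Flag the state when the gap exceeds the relative tolerance.
        if abs(county_sum - state_value) > rel_tol * abs(state_value):
            mismatches[state_fips] = (county_sum, state_value)
    return mismatches

# Made-up values for illustration only.
county_population = {"01001": 100, "01003": 250, "02013": 40, "02016": 60}
state_population = {"01": 350, "02": 105}

mismatches = find_hierarchy_mismatches(county_population, state_population)
print(mismatches)  # {'02': (100, 105)}
```

A check of this kind applies only to additive counts. Non-additive fields, such as medians, cannot be validated by summation, and survey estimates may need a nonzero tolerance to allow for sampling variation.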

Nonetheless, several limitations apply. These include potential changes in county boundaries, shifts in binning or variable definitions over time, and estimation error in reported margins of error, particularly when aggregating variables with unknown dependence structures. As with all survey-based data, common issues such as sampling error, nonresponse bias, and measurement inaccuracies should be considered when using the dataset for statistical analysis.
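
To make the margin-of-error concern concrete: a common approximation for the margin of error (MOE) of a sum of survey estimates, described in Census Bureau guidance for ACS data, is the root sum of squares of the component MOEs. That approximation treats the component estimates as roughly independent. When the dependence structure is unknown, the plain sum of the component MOEs gives a conservative upper bound. The Python sketch below compares the two. The function names and values are hypothetical, and this is not necessarily the method used to build the dataset.

```python
import math

def moe_root_sum_squares(moes):
    """Approximate MOE of a sum, assuming independent component estimates."""
    return math.sqrt(sum(m * m for m in moes))

def moe_upper_bound(moes):
    """Conservative MOE of a sum: holds under any dependence structure."""
    return sum(abs(m) for m in moes)

# Made-up component MOEs for two estimates being added together.
component_moes = [30.0, 40.0]

approx = moe_root_sum_squares(component_moes)
bound = moe_upper_bound(component_moes)
print(approx, bound)  # 50.0 70.0
```

The gap between the two figures indicates how much the aggregated MOE could be understated if the components are positively correlated.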

Despite these limitations, the dataset offers a carefully constructed resource for supporting future academic and industry research on mortality and its socioeconomic determinants. It is intended as a foundation for linking with external mortality data, enabling more robust and equitable analyses across populations and regions.

Material

Socioeconomic Demographic Covariate Data - Report

Socioeconomic Demographic Datasets

Acknowledgements

The researchers extend their deepest gratitude to those without whose efforts this project could not have come to fruition: the Project Oversight Group and others, for their diligent work in overseeing the dataset development and in reviewing and editing this report for accuracy and relevance. Project Oversight Group members:

Magali Barbieri, PhD
Carolyn C. Covington, FSA, CERA, MAAA
Jean-Marc Fix, FSA, MAAA
Robert M. Gomez, FSA, CERA, MAAA
Norman Niami, FCAS, MAAA
Murali Niverthi, FSA, MAAA
John W. Robinson, FSA, MAAA
Mark Spong, FSA, CERA, MAAA

At the Society of Actuaries Research Institute:

Stefanie J. Porta, ASA, MAAA, Consultant
Barbara Scott, Senior Research Administrator
Lisa A. Schilling, FSA, EA, FCA, MAAA, Director of Practice Research

Questions or Comments?

Give us your feedback! Take a short survey on this report.
If you have comments or questions, please send an email to Research@soa.org