Data Analysis Contest

The Individual Life Experience Committee of the Society of Actuaries sponsored a Data Analysis Contest for the first time during the last quarter of 2018. The purpose of this Contest was to encourage SOA members and students, as well as the general public, to apply their data analysis and predictive analytics skills to a large, public dataset to test the dataset for issues, gaps, inconsistencies, outliers and problems.


  1. $5,000 ‐ Tommy Steed, FSA, Alyssa Columbus and Simon Hua
  2. $3,000 ‐ Winnie Liu, FCIA
  3. $2,000 ‐ Annie Wang, ASA


Listen to a podcast that highlights the Data Analysis Contest winning team.


Some Data Areas Requiring Investigation

Submissions used a variety of techniques to highlight parts of the dataset that need additional validation:

  • One submission noted that the Actual to Expected ratio for Universal Life, on a counts basis, appeared out of line with other plan types
  • Nonsmokers who were in preferred class 2 (for those policies where there were only 2 potential preferred classes) had a significantly higher Actual to Expected ratio than expected
  • A comparison of records on an ANB versus ALB basis showed a difference in results that needs additional explanation
  • One of the submissions employed a tree model to show that a certain group of term policies has a very high Actual to Expected ratio
  • There are some records with a duration of 1, yet the Select/Ultimate indicator is set to Ultimate
  • Some records have zero exposures yet show death claims. (The ILEC was aware of this, but had not yet spent the resources to resolve it.)
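
Several of the checks above are simple enough to sketch in code. The following is a minimal illustration in Python, not the method used by any submission. The field names (`plan`, `duration`, `select_ultimate`, `exposure_count`, `expected_deaths`, `actual_deaths`) and the sample records are hypothetical and are not the actual column names or values of the ILEC dataset.

```python
from collections import defaultdict

# Hypothetical records; field names and values are assumptions for
# illustration only, not the real ILEC dataset layout.
records = [
    {"plan": "Term", "duration": 1, "select_ultimate": "Select",
     "exposure_count": 100.0, "expected_deaths": 2.0, "actual_deaths": 2},
    {"plan": "UL", "duration": 1, "select_ultimate": "Ultimate",
     "exposure_count": 50.0, "expected_deaths": 1.0, "actual_deaths": 3},
    {"plan": "UL", "duration": 30, "select_ultimate": "Ultimate",
     "exposure_count": 0.0, "expected_deaths": 0.0, "actual_deaths": 1},
    {"plan": "Term", "duration": 5, "select_ultimate": "Select",
     "exposure_count": 200.0, "expected_deaths": 2.0, "actual_deaths": 1},
]


def ae_by_plan(rows):
    """Actual to Expected ratio on a counts basis, grouped by plan type."""
    actual = defaultdict(float)
    expected = defaultdict(float)
    for r in rows:
        actual[r["plan"]] += r["actual_deaths"]
        expected[r["plan"]] += r["expected_deaths"]
    # Skip plan types with no expected deaths to avoid dividing by zero.
    return {p: actual[p] / expected[p] for p in expected if expected[p] > 0}


def duration_one_marked_ultimate(rows):
    """Records in duration 1 whose Select/Ultimate indicator is Ultimate."""
    return [r for r in rows
            if r["duration"] == 1 and r["select_ultimate"] == "Ultimate"]


def claims_without_exposure(rows):
    """Records showing death claims despite zero exposure."""
    return [r for r in rows
            if r["exposure_count"] == 0 and r["actual_deaths"] > 0]


if __name__ == "__main__":
    print(ae_by_plan(records))
    print(len(duration_one_marked_ultimate(records)))
    print(len(claims_without_exposure(records)))
```

On a real extract, the same three functions would run over rows read from the data file. A plan type whose ratio sits far from the others, or any non-empty result from the two flagging functions, would be a candidate for further validation.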

About the Data Analysis Contest

The contest was designed to give students and members of the public the opportunity to apply predictive analytics and data mining techniques to test a very large dataset for inconsistencies and other potential data problems. While this dataset had gone through a rigorous deterministic validation process, it had not been analyzed from a statistical and data science perspective to identify potential issues. The contest succeeded in generating out-of-the-box thinking that brought different approaches to testing the data.