April 2014

Tackling the “Enormous” Data Problem

By Albert Moore

Data Analytics, Big Data, Predictive Modeling and other buzzwords all center on the fact that the computer revolution spawned a digital and data revolution. That data revolution has in turn given rise to a growing science seeking to put all this data to its most effective use. After all, data is no good if it cannot be used to generate investigation, answer questions or provide insight.

I admit I am a data junkie! I cannot read enough. I take every course possible. I recently completed an online course offered by my alma mater, “Tackling the Challenge of Big Data.” None of the data scientists were actuaries, but I saw great applicability to the data challenges we face.

I was first turned on to data analysis in 1992. I was six years into my actuarial career when I was approached by a professor from the University of Cincinnati needing someone to analyze data for him. Students in three inner-city public school districts, in two different states, were going to be tracked to evaluate the effectiveness of a program targeting at-risk youth. The study was conducted over five years.

There were five different questionnaires offered to the students. Each school system was further broken into a participant group (those impacted by the intervention) and a control group (those students not part of the program). This longitudinal study helped shape meaningful changes in policy and also helped determine where limited monetary resources could have the most impact.

I received an advance payment to purchase my first PC. It was a screamer: a Pentium 60 with more storage than anyone could ever use (40 megabytes)! Minitab and SAS, along with Excel, provided all the tools I needed.

Fast forward about two decades and my Pentium 60 screamer is still used by my kids, but 40 megabytes could not hold one of my Excel files. Storage for the casual PC user is now measured in terabytes. Google, Amazon, Facebook and the federal bureaucracy, along with countless other entities, are the collectors of previously unimagined personal, business, behavioral, and scientific data.

In addition to the rapid growth of the volume of the data, the nature of the data has changed. Web browsing behavior is tracked and stored to target marketing. Google, Yahoo, and other search engines must maintain libraries of Web content. Facebook and LinkedIn need to track the degrees of separation and relational data. Geo-tracking has generated the need for monitoring GPS signals every second of the day. Digital data is stored so that it can be referenced and related to traditional demographic and tabular data.

One cannot overlook the contribution of various social media to the proliferation of data. We “tweet” and “blog” and “chat” and “pin” and “post” and “surf.” Each action generates more data!

Did I forget all the online financial data and processing? Oh my … “Big Data” is a misnomer. We have an “Enormous Data” problem to tackle and solve.

How does all this relate to actuaries? I believe that actuaries are uniquely educated and prepared to analyze these vast stores of data. Actuaries possess both the theoretical statistical knowledge and the practical skills required to make sense of the information. Most importantly, because the life insurance industry in many ways has been slow to embrace the data revolution, actuaries at pension, health and life organizations and firms are the best equipped to provide leadership in the adoption of data analytics to answer questions and gain insight from business information.

There are a growing number of seminars and sessions designed to introduce actuaries to data analytics. I do not mind shamelessly plugging the “Beyond Excel: Advanced Data Analytics” seminar planned for May 21, 2014, the day following the Life and Annuity Seminar. This full-day seminar, sponsored by the Technology Section, will be less theoretical and more “hands-on,” reviewing specific case studies actuaries face in their day-to-day duties. Because actuaries perform mortality and lapse studies, Excel, Microsoft Access, R, SAS, SQL and other tools will be used to demonstrate additional approaches to data analysis.
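To give a flavor of the kind of analysis such tools support, here is a minimal sketch of a lapse-rate tabulation. The study cells and counts below are invented for illustration and are not from any seminar material; a real study would draw exposures and lapse counts from policy administration data.

```python
# Hypothetical lapse study: aggregate exposures and lapses by issue-age band,
# then compute a crude annual lapse rate per band.
from collections import defaultdict

# (issue_age_band, exposed_count, lapsed_count) -- invented study cells;
# two cells may share a band (e.g., different study years).
records = [
    ("20-29", 1200, 96),
    ("30-39", 1500, 90),
    ("40-49", 1100, 44),
    ("20-29", 800, 72),
]

exposed = defaultdict(int)
lapsed = defaultdict(int)
for band, n_exposed, n_lapsed in records:
    exposed[band] += n_exposed
    lapsed[band] += n_lapsed

# Crude annual lapse rate = total lapses / total exposures within each band.
lapse_rates = {band: lapsed[band] / exposed[band] for band in exposed}
for band in sorted(lapse_rates):
    print(f"{band}: {lapse_rates[band]:.1%}")
```

The same grouping-and-ratio pattern carries over directly to SQL (a `GROUP BY` with summed counts) or to R and SAS procedures, which is part of why several tools can attack the same actuarial question.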

The newest versions of Excel have introduced tools that are under-utilized but very powerful. Yet many find Excel’s advanced statistical tools lacking and need either more robustness or more flexibility. This seminar will attempt to balance both the how and the why of various approaches to analyzing data commonly tackled by actuaries.

In addition to the seminar, I would like to begin a regular column which takes common data challenges and explores approaches that may be applied. Please email me with any data challenges you may be facing or have faced. In addition, if you have expertise in a particular area and would like to present how you solved a particular data challenge, I would like to give you space to present your approach.

Albert J. Moore, ASA, MAAA, is second vice president, Actuarial Systems at Ohio National Financial Services in Cincinnati, Ohio. He can be contacted at albert_moore@ohionational.com.