My PA Certificate Experience
By Justin Serebro
Actuary of the Future, March 2021
This article is about my experience taking the SOA’s Predictive Analytics Certificate (PA Certificate) and contains advice for those considering enrolling in this program. If you don’t have interest in Predictive Analytics, then this article is not for you. If you are still reading this then kudos! I’ll assume you want to hear about my Predictive Analytics journey. Let’s begin with why I signed up for the PA Certificate in the first place.
Why Did I Do This?
There are many reasons why I signed up for the PA Certificate, but I will limit the discussion to the following three:
- Learn statistical techniques I could apply directly to my job,
- Improve my visualization skills when presenting information to others, and
- Refine my technical skills.
Learn Statistical Techniques I Could Apply Directly to My Job
Prior to the PA Certificate, my exposure to building statistical models was limited to time series models. In college I took an economic forecasting course where I built AR, MA, ARMA, and ARIMA models using Stata. This time series modeling experience is helpful when forecasting time-dependent variables such as inflation, stock prices, or interest rates. However, time series modeling isn’t helpful for many insurance applications, such as assessing which characteristics make policyholders more likely to die. Additionally, the PA Certificate was developed by actuaries and is focused on insurance applications (we are actuaries after all, and actuaries primarily work in insurance). Learning relevant statistical techniques that I could apply to my job was a key draw.
Improve My Visualization Skills When Presenting Information to Others
Data visualization provides a clear, succinct way for people to understand data. It takes the raw data and presents it so that trends, patterns, and outliers are easier to see. Prior to the PA Certificate, I usually used tables to display information to key stakeholders. Tables helped me get my points across clearly and explain the drivers and intuition behind a result. However, tables can get large and are not as easy to understand as a simple visual like a bar chart. A former boss suggested leveraging data visualization to enhance my storytelling. From reading the syllabus, I knew that the PA Certificate had a significant focus on data visualization techniques, so enrolling was an ideal way for me to enhance my presentation skills.
Refine My Technical Skills
The spectrum of programming skills is very large. There are those who can code in VBA and others who can write programs in C++, Python, and R (no offense to VBA). My coding experience was limited to VBA and some Java prior to working full time as an actuary. A colleague opened my eyes to the power of using Python and R instead of VBA. In response, I began teaching myself Python in order to automate and enhance several of my work processes. This mainly consisted of Googling and trying out various snippets of code until the automation worked. I realized that taking a formal programming course in Python or R would provide a better foundation, and the PA Certificate provided that training.
What Data Science Applications Does it Cover?
The PA Certificate is quite an undertaking. There are hundreds of slides per module, and some of the content takes multiple read-throughs to really understand. I will focus on the following three topics at a high level:
- Advanced Data Exploration Techniques,
- Feature Generation and Selection, and
- Building Models.
Advanced Data Exploration Techniques
Many machine learning and advanced predictive modeling techniques can automatically find patterns in data. However, machines can’t distinguish between patterns that are meaningless and those that are meaningful. Therefore, we need to perform data exploration. Here are some of the key benefits of data exploration:
- Understand basic relationships in the data.
- Check relationships in the data against common knowledge and intuition to identify potential data errors.
- Identify outliers and understand their potential effects on the model.
- Inform variable transformation and modeling choices that can improve model performance.
The PA Certificate explores many data exploration techniques in detail:
- Univariate exploration: examining one variable at a time (its distribution, typical values, and outliers).
- Bivariate exploration: examining how two variables relate to each other, such as a predictor and the variable being predicted.
- Principal Component Analysis: a way to summarize data containing many variables into fewer variables while retaining a high level of information.
- K-Means Clustering: a method used to assign each observation to one of k groups (where k is specified by the model developer).
Data visualization techniques are also covered in depth, as they are useful in data exploration.
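To make the k-means idea above concrete, here is a minimal sketch in plain Python. It is not taken from the PA Certificate materials; the function name `k_means`, the `policyholders` data, and the choice of k = 2 are all assumptions made for illustration.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a group ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical policyholders: (age, annual claims in $000s).
policyholders = [(25, 1.0), (27, 1.2), (30, 0.9), (62, 6.5), (65, 7.0), (68, 7.4)]
labels, centroids = k_means(policyholders, k=2)
print(labels)
```

On this toy data the younger, low-claim policyholders end up in one group and the older, high-claim policyholders in the other. In practice the variables would usually be scaled first so that no single variable dominates the distance calculation.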
Feature Generation and Selection
Feature Generation is about finding ways to make predictive models perform better than they would if built from the original data. It relies on creating features based on the underlying variables where the features serve as the final input into the model. Feature generation can be done via transformations.
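As a rough illustration of feature generation via transformations, the sketch below derives two new features from raw fields. The field names (`face_amount`, `annual_income`) and the specific transformations are assumptions chosen for the example, not ones prescribed by the course.

```python
import math

# Hypothetical raw records; the field names are illustrative only.
policies = [
    {"face_amount": 100_000, "annual_income": 50_000},
    {"face_amount": 1_000_000, "annual_income": 125_000},
]

def add_features(record):
    """Return a copy of the record with two generated features."""
    enriched = dict(record)
    # A log transform compresses a right-skewed variable.
    enriched["log_face_amount"] = math.log(record["face_amount"])
    # A ratio feature relates coverage to income.
    enriched["face_to_income"] = record["face_amount"] / record["annual_income"]
    return enriched

features = [add_features(p) for p in policies]
print(features[0]["face_to_income"])
```

The generated columns, rather than the raw ones, would then serve as inputs to the model.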
Feature Selection is about selecting specific features or variables that can improve model performance rather than using all the features and variables. There are two main types:
- Filter-based feature selection: uses statistical measures (such as correlation and mutual information) to determine a subset of features that has high predictive power.
- Algorithmic methods: models perform feature selection as part of the model fitting process (for example, regularized regression and decision trees).
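The filter-based approach can be sketched with a simple correlation screen. This is a simplified illustration only: the helper names, the made-up data, and the 0.5 cutoff are assumptions, and correlation picks up only linear relationships.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def filter_features(columns, target, threshold=0.5):
    """Keep features whose absolute correlation with the target exceeds
    the threshold (the threshold is an arbitrary, illustrative choice)."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) > threshold]
    return selected, scores

# Hypothetical data: one informative feature and one that is just noise.
columns = {
    "age": [30, 40, 50, 60, 70],
    "policy_id_last_digit": [3, 9, 1, 9, 3],
}
claim_cost = [1.0, 1.5, 2.1, 2.4, 3.2]

selected, scores = filter_features(columns, claim_cost)
print(selected)
```

Here only `age` survives the screen, because the last digit of a policy number has almost no linear relationship with claim cost.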
Building Models
The most critical part of developing a model is making sure that it is suitable in the context of the business problem. There are different categories of models depending on the type of problem you are trying to solve.
- Supervised learning: occurs when there is a specific variable you want to predict. If that variable is a category, you use a classification model; if it is a numeric value, you use a regression model.
- Unsupervised learning: occurs when you want to understand some structural element of the data.
The PA Certificate then goes through different types of models that data scientists use for each type of business problem, such as Generalized Linear Models and Decision Trees. Further, it shows how we can enhance our predictive models by using ensemble methods (bagging and boosting), in which we build many models and combine their answers. Bagging, for example, fits each model on a random subset of the data and takes the answer in aggregate.
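The bagging idea can be sketched as follows. This is a deliberately simplified illustration, not the course's implementation: each "model" is a one-split decision stump on a single variable that predicts 1 above its threshold, and the data and function names are hypothetical. Boosting, which builds models sequentially rather than independently, is not shown.

```python
import random

def fit_stump(sample):
    """Pick the threshold on x that minimizes misclassifications,
    predicting 1 when x is above the threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_stumps(data, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]

def predict(thresholds, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = sum(x > t for t in thresholds)
    return int(votes > len(thresholds) / 2)

# Hypothetical data: (age, 1 if a claim occurred else 0).
data = [(25, 0), (30, 0), (35, 0), (40, 0), (45, 0),
        (55, 1), (60, 1), (65, 1), (70, 1), (75, 1)]

model = bagged_stumps(data)
print(predict(model, 20), predict(model, 80))
```

Each stump sees a slightly different resample of the data, so the thresholds vary from model to model; the majority vote smooths out that variation.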
I highly recommend the PA Certificate to anyone interested in advancing their data science skillset. This course in isolation does not prepare an actuary to be a data scientist; however, it does teach common data science models and how to build them. The content covered in the PA Certificate is foundational knowledge that any actuary (valuation, pricing, risk, or non-traditional) can use to enhance their work product.
Statements of fact and opinions expressed herein are those of the individual authors and are not necessarily those of the Society of Actuaries, the editors, or the respective authors’ employers.
Justin Serebro, ASA, is a valuation actuary at Pacific Life as well as an active volunteer with the SOA. He can be reached at email@example.com.