February 2013

R Corner – R and Excel

By Steve Craighead

I was asked by the editors of CompAct to describe why I use R in my actuarial models and to compare that with my use of Excel. I use both software tools extensively, and the choice between them depends on the modeling need.

I use Office products when I create models that will be reused by other actuaries. For instance, last year I designed an asset extract-transform-and-load (ETL) tool that was taken over by another actuary, so that my department could easily create the asset components for our ALM models. That model was built using class modules inside Excel.

However, when I need to analyze large stochastic simulations, I turn to R. R is very flexible in how it accepts data, and it has such a large library of packages (over 3,000) that I can examine large datasets with ease. A wide variety of graphics packages is available, yet I naturally gravitate to R for complex graphics, given its great flexibility.

Using R, I can easily customize my applications by starting with one or more packages. For instance, if I needed to build an application to compute efficient frontiers for portfolios, I would just do a Google search on the terms “Efficient Frontier R CRAN.” This search leads me to the “fPortfolio” and “frontiles” packages, which I would examine to determine which is most relevant. The collection of Rmetrics packages, of which “fPortfolio” is a member, is an excellent set to build upon for financial models.
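
To make this concrete, here is a minimal sketch of an efficient frontier calculation, assuming the fPortfolio package and a hypothetical matrix of periodic asset returns called assetReturns:

    library(fPortfolio)                    # Rmetrics portfolio optimization
    library(timeSeries)

    # assetReturns is a hypothetical matrix of periodic asset returns
    returns  <- as.timeSeries(assetReturns)

    spec     <- portfolioSpec()            # default mean-variance specification
    frontier <- portfolioFrontier(returns, spec, constraints = "LongOnly")

    frontierPlot(frontier)                 # plot risk/return of the frontier portfolios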

I prefer R when I need to simulate various statistical models. For instance, several years ago I created an entire ERM simulation system for a financial subsidiary in less than a day using R. The model had about eight major risks that were well modeled with basic statistical models, and the risks were aggregated with a normal copula. If I had built the same model in Excel, it would have taken me several days to accomplish the same task. I have since improved my copula models so that I can use Excel directly, but R has the nice feature that I can access various Archimedean and non-Archimedean copulas from different packages.
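
As an illustration of the copula approach (not the actual subsidiary model), here is a hedged sketch using the copula package, with two made-up marginal distributions aggregated through a normal copula:

    library(copula)

    set.seed(1)
    nSim <- 10000
    nc   <- normalCopula(param = 0.5, dim = 2)   # pairwise correlation of 0.5
    u    <- rCopula(nSim, nc)                    # correlated uniform draws

    # Illustrative marginals for the two risks
    loss1 <- qlnorm(u[, 1], meanlog = 0, sdlog = 1)
    loss2 <- qgamma(u[, 2], shape = 2, rate = 1)

    totalLoss <- loss1 + loss2
    quantile(totalLoss, 0.995)                   # an aggregate tail measure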

The packages are open source and can be modified if necessary. Even if the compiled source is in FORTRAN or C, many packages still include the original R scripts that were later converted to speed up the package, which lets you examine how the process works in R. Also, if you want to share your development, there are directions on how to build distributable packages.
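
For example, you can often inspect a function's R source directly from the console:

    # Typing a function name without parentheses prints its R source
    sd

    # For unexported functions or S3 methods, these also help
    getAnywhere("plot.lm")
    methods("summary")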

I also prefer R when I need to do data mining and cluster analysis. I do use filters and pivot tables in Excel to analyze data, but once I have the data in my R workspace, I can easily ask very general questions about it and create a variety of models on it, without having to build special spreadsheets to examine the data. I can easily model multivariate time series, fit various multivariate regression models, and create many advanced predictive models on that data.
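
Here is a minimal clustering sketch, assuming a hypothetical data frame policyData already loaded in the workspace:

    # Cluster policies on a few numeric fields (policyData is hypothetical)
    fields <- scale(policyData[, c("faceAmount", "issueAge", "reserve")])

    set.seed(2)
    km <- kmeans(fields, centers = 5)   # k-means with five clusters
    table(km$cluster)                   # cluster sizes

    # Hierarchical clustering gives an alternative view
    hc <- hclust(dist(fields))
    plot(hc)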

When I create models, I prefer to use object-oriented programming (OOP), creating classes that mimic the reality of the underlying data. I can do this in Visual Studio and in Excel, as well as in R. Everything in R is an object, because OOP underpins the entire modeling environment. VBA in Excel does not allow inheritance of classes, whereas R does.
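
A minimal S4 sketch of inheritance in R (the classes here are illustrative, not taken from my production models):

    # A Bond class inheriting from a generic Asset class
    setClass("Asset", representation(cusip = "character", parValue = "numeric"))
    setClass("Bond",  contains = "Asset",
             representation(coupon = "numeric", maturity = "numeric"))

    setGeneric("annualIncome", function(x) standardGeneric("annualIncome"))
    setMethod("annualIncome", "Bond", function(x) x@parValue * x@coupon)

    b <- new("Bond", cusip = "XYZ123", parValue = 1000, coupon = 0.05, maturity = 10)
    annualIncome(b)   # 50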

VBA in Excel does not support multi-core or multi-threaded execution, whereas special packages in R, one of which is “foreach”, allow parallel processing.
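
A hedged sketch of the foreach approach, assuming the doParallel backend and a hypothetical scenario function runScenario:

    library(foreach)
    library(doParallel)

    cl <- makeCluster(4)            # assume four cores are available
    registerDoParallel(cl)

    # Run 1,000 independent scenarios in parallel (runScenario is hypothetical)
    results <- foreach(i = 1:1000, .combine = rbind) %dopar% runScenario(i)

    stopCluster(cl)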

I find it more difficult to debug Excel models, since I have to move from cell to cell within the workbook to determine what is being done, whereas I can edit and revise functions and methods more easily in classes. Though R does not have the same debugging or performance-tuning tools as Visual Studio, you can capture and time intermediate results to find bugs and improve runtime.
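
For instance, base R's timing and profiling tools go a long way (simulateBlock below is a hypothetical function):

    # Time a candidate calculation
    system.time(result <- simulateBlock(nScenarios = 1000))

    # Profile a longer run to see where the time is spent
    Rprof("run.out")
    result <- simulateBlock(nScenarios = 10000)
    Rprof(NULL)
    summaryRprof("run.out")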

When I study large datasets, I find that I have to either move them into Access or SQL Server databases or import them directly into R. R allows me to query against Access and SQL Server databases if the data is too large to hold in memory, and I can examine some relatively large datasets directly in R, especially if I use 64-bit R on my 64-bit laptop.
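
A minimal sketch of the database route, assuming the RODBC package and a hypothetical ODBC data source named "ALMData":

    library(RODBC)

    ch <- odbcConnect("ALMData")   # "ALMData" is a hypothetical ODBC data source

    # Pull only the columns and rows needed, leaving the heavy lifting to the database
    assets <- sqlQuery(ch, "SELECT cusip, parValue, coupon
                            FROM AssetExtract WHERE planCode = 'A1'")

    close(ch)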

Some of my problems cannot be solved easily with either R or Office products. For instance, if I need to solve stochastic differential equations (SDEs), I can use R (with the Rmetrics packages) or Excel to simulate the equations, but to actually carry out the calculus and algebraic manipulations for the solution, I must turn to a computer algebra system (CAS) such as Maxima, so I can easily manipulate my formulas. I have found that as I have aged, I make more and more mistakes when I do algebra, so my dependence on CAS software has increased. Once I have solved the SDE, if I need to create high-speed models based on it, I may write the simulations in C++ or C#. Some CAS packages will write this code for you.
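
As an example of the simulation side (not the CAS side), here is a minimal Euler scheme in base R for the geometric Brownian motion SDE dS = mu*S*dt + sigma*S*dW, with illustrative parameters:

    mu    <- 0.05
    sigma <- 0.20
    nStep <- 252
    dt    <- 1 / nStep

    set.seed(3)
    S    <- numeric(nStep + 1)
    S[1] <- 100
    for (i in 1:nStep) {
      S[i + 1] <- S[i] + mu * S[i] * dt + sigma * S[i] * sqrt(dt) * rnorm(1)
    }
    plot(S, type = "l", main = "One simulated GBM path")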

Most of my actuarial problems fall into one of four major categories, and I turn to a separate tool for each. If I need to communicate and create a tool for others to use, I turn to Office solutions such as Excel and Access, or to Visual Studio to create .NET applications. If I need to conduct a study or create a statistical model, I turn to R. If I need to do extensive algebraic manipulations, I turn to a CAS. And finally, if I need to write a document containing a large number of mathematical formulae, I use a mathematical typesetting language such as LaTeX as my word processor. If the document has little to no mathematical content, I use Word.

Though I am devoted to explaining the use and capability of the R language in my R Corner columns in CompAct, I find it important to have a broad knowledge of many software tools and modeling languages, so that I can quickly pick the most efficient and useful tool for the specific problem at hand.

Steven Craighead, CERA, ASA, MAAA, is an actuarial consultant at Pacific Life Insurance. He can be reached at steven.craighead@pacificlife.com.