By Andrew Chan
Excel is probably the most popular business software application. As actuaries, we have used Excel to develop applications that handle data analysis, reporting, modeling, and many other business activities. However, many Excel applications are not properly designed; they suffer from poor performance, are difficult to enhance, and are error prone. I have migrated many Excel applications to enterprise systems and have seen the same problems again and again. This article focuses on Excel application design and how to build a better Excel application.
Reinventing the Square Wheel
Before developing any application, we should always conduct due diligence to make sure that building an application is cost effective and that it is the best solution. We can use VBA to import data from a PDF report, develop an optimization model, a predictive model, a Monte Carlo simulation, etc. But most business users do not have the necessary skills or training to develop a high-performing, well-designed, robust application. The development cycle could take months or even years, and they may not have the infrastructure in place to follow a proper software development life cycle. Most Excel applications are developed, maintained, and operated by a single person. What if the application owner leaves the company?
Are we sure that there is no commercial software available?
Don't Jump to Coding
“Without requirements or design, programming is the art of adding bugs to an empty text file.” - Louis Srygley
This is probably the most common mistake: end-user computing (EUC) developers believe they know the requirements and start coding right away. The truth is, most EUC developers only remember a portion of the requirements. The result is a development cycle that takes entirely too long; the project schedule may double or triple, and we still don't see the end of development. It is always a good idea to prepare a peer-reviewed requirements document.
Once we confirm the requirements, we can start the design. A seasoned system consultant may consider evolutionary design, i.e., designing the system while developing it. However, most EUC developers are part-time programmers without proper programming training, and for them evolutionary design can be a disaster: the application becomes more and more difficult and time consuming to develop, and without a proper design we will, over time, add more bugs than new features.
If we do a planned design before programming, we can discover and avoid many issues, and the application will be much more structured and easier to enhance in the future.
Simple is Beautiful
“Programs must be written for people to read, and only incidentally for machines to execute.” - Abelson/Sussman
Most Excel users have difficulty understanding their own Excel formulas if they have not looked at them for a few months.
Have you ever seen an Excel formula with 10 nested IF functions? Or one with more than 100 different operations in a single formula, e.g., lookups, comparisons, calculations, etc.? That is not an Excel formula but a mini program inside an Excel cell, and it could take another person a whole day to understand what it does.
Perhaps some of that logic was copied and pasted into another part of the workbook. Sooner or later we will need to enhance or fix the logic; good luck finding every copy.
Each Excel formula should only have one simple responsibility. We should divide a complex formula into multiple simple formulas, e.g.:
- A set of Excel columns that lookup all the required data;
- A set of Excel columns to handle comparisons;
- A set of Excel columns to handle calculations; and
- A few more Excel columns to consolidate all the above columns.
This approach offers better transparency: we can review the model more easily, we do not need to copy and paste the same logic multiple times, and it is straightforward to reuse a column in another Excel formula.
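As a hedged illustration, a complex formula might be broken into single-responsibility columns of an Excel Table like this (all table and column names here are hypothetical):

```
Rate:       =VLOOKUP([@PlanCode], RateTable, 2, FALSE)              ' lookup
Eligible:   =[@IssueAge] <= 65                                      ' comparison
GrossPrem:  =[@Units] * [@Rate]                                     ' calculation
NetPrem:    =IF([@Eligible], [@GrossPrem] * (1 - [@Discount]), 0)   ' consolidation
```

Each column can be reviewed on its own, and any column can be referenced by other formulas without copying its logic.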
We should also design our Excel formulas in a way that they are open for extension but closed for modification. If we have 10 different reserve methods, adding a new one should only require adding a new Excel formula and all existing formulas should not be modified.
If we do not want our users to be overloaded with so many columns, then we can always hide some columns.
How someone can manage a worksheet with several hundred columns is a mystery to me; I always have a hard time working out what the columns mean when I try to understand a formula. I prefer multiple smaller worksheets, with each worksheet holding a few logically related Excel Tables.
Instead of a single, gigantic Excel worksheet calculating sophisticated financial figures (e.g., actuarial reserves), we would create multiple worksheets that handle the calculations for:
- Policy Illustration;
- Cashflow; and
- Reserves.
Inside each worksheet, we may have different Excel Tables. For example, the Cashflow worksheet can contain separate Excel Tables calculating premium, expense, lapse benefits, death benefits, etc. Instead of using =D2-AC2-BH2-CD2, we can use structured references, e.g., =[@Premium]-[@Expense]-[@LapseBenefit]-[@DeathBenefit] (the column names here are illustrative).
This Excel formula is definitely easier to understand and we would be able to navigate to the data faster as well.
VBA should follow the same design principles as Excel formulas.
My rule of thumb is that a VBA function should not have more than 20 lines, i.e., just enough to fit on one screen. A VBA function with a few hundred to a few thousand lines is bound to be difficult to understand, reuse, and enhance.
“Copy and paste is a design error.” - David Parnas
Once, when I optimized a function that had 10,000 lines, I discovered more than 200 bugs, most of them due to copy-and-paste problems. The original developer had copied and pasted the logic into many different places, so whenever the logic needed modification he had to visit each instance and make sure the appropriate change was made. Obviously, this was an error-prone exercise and many bugs were created. It took me two months to refactor this function and fix all the bugs.
One major difference is that VBA can create objects, which are more useful for describing business logic; however, I hardly ever see a VBA program adopt an object-oriented approach.
Separate Data from the Application
Many Excel applications contain a lot of data, i.e., parameters, input, results, etc. Some data is even embedded in Excel formulas. Storing data inside an Excel application may be the worst hazard for change management: an application should be locked down to avoid accidental changes, but if data is stored in the application, then every time we modify the data we are forced to change the workbook itself. The ideal design is an Excel application that does not contain any data. As a matter of fact, this is how most enterprise applications behave; they are decoupled from their data, which is stored in parameter files, input files, databases, or a data feed. In most cases we will need to change the data, but not necessarily the application.
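A minimal sketch of this decoupling (the file path, sheet names, and ranges are all illustrative assumptions): a VBA routine can pull assumptions from a separate workbook at run time, so the model workbook itself ships with no embedded data.

```vba
' Load assumptions from an external workbook at run time, so the
' model workbook contains no embedded data.
' The path, sheet names, and range are illustrative.
Sub LoadAssumptions()
    Dim wb As Workbook
    Set wb = Workbooks.Open("C:\Models\Assumptions.xlsx", ReadOnly:=True)
    ThisWorkbook.Worksheets("Assumptions").Range("A1:D100").Value = _
        wb.Worksheets("Lapse").Range("A1:D100").Value
    wb.Close SaveChanges:=False
End Sub
```

When the assumptions change, only Assumptions.xlsx is updated; the locked-down model workbook never has to be touched.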
Peer Review
Whether it is the requirements document, the design, or the Excel application itself, I always find it easier to spot other people's errors than my own. This is human nature, so it is crucial to peer review every Excel application in detail, i.e., requirements, design, Excel formulas, VBA functions, etc. Deploying a bug can be very expensive, not just in the effort to fix it but also in reputation damage.
Documentation
Documentation should be part of the application and well integrated into it. We should have at least two sets of manuals: a user guide and a reference manual.
All user controls, e.g., forms, buttons, and input fields, must have a hyperlink that points to the user guide. All Excel formulas and where the formulas are used must be documented in the reference manual.
Creating a reference manual used to be a very time-consuming job, which is why most Excel applications do not have one. The good news is that Excel 2013 comes with Spreadsheet Inquire, a utility for analyzing an Excel workbook, which drastically speeds up creating and maintaining a reference manual.
Use a Database
We need data before we can use Excel to analyze it, prepare reports, or run our models. Unfortunately, data is not always available, and we have to gather, validate, and transform it first. Excel is not the best tool for massaging data, especially a lot of data. If you have ever tried to VLOOKUP across multiple Excel tables, each with a few hundred thousand records, then you may appreciate why we should use a database.
Databases are designed to manipulate large volumes of data: they let us create indices to improve performance and use referential integrity to validate data, and most database software includes tools to extract, transform, and load data from many data sources or feeds. It is always a good idea to store all our data on an enterprise database platform and let Excel handle the analysis.
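As a hedged sketch (the connection string, table, and sheet names are assumptions, not a real system), VBA can query such a database through ADO and land the whole result set in Excel in one step:

```vba
' Query an enterprise database with ADO and write the result to Excel
' in a single command. Connection string and SQL are illustrative.
Sub LoadFromDatabase()
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
            "Initial Catalog=Actuarial;Integrated Security=SSPI;"
    Set rs = cn.Execute("SELECT PolicyID, Premium FROM Policy")
    ' one CopyFromRecordset call instead of one write per cell
    Worksheets("Data").Range("A2").CopyFromRecordset rs
    rs.Close
    cn.Close
End Sub
```

The database does the heavy joining and filtering; Excel only receives the finished result.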
Automation
Many Excel applications still require heavy manual intervention, i.e., copy and paste. We are in the 21st century; there is no reason why our applications should not be automated. We should always design our applications so that they can be automated.
Automation can improve productivity, reduce operational risk, and even boost team morale.
Performance by Design
Most Excel applications were not designed for performance; they can run for hours or even days. Some may argue that they only run the application once a month, so runtime does not really matter. But there may come a time when senior management wants more scenarios, and they want the results tomorrow. This happens all the time in today's dynamic business world, so when we design an application we must always consider how to optimize its performance.
Reorganize the Process
This is always my first target: understand the process and determine whether there is any opportunity for performance improvement. For example, a valuation process I once looked at looped through every policy, and each policy included multiple benefits. The program had to load and initialize assumptions before it could calculate the financial results for each benefit. I noticed the assumptions were identical for each benefit, so I changed the order of the process: it would first loop through benefits and then loop through each policy. The assumptions were loaded and initialized only once rather than a few million times. The final output was identical, but we realized a significant performance gain.
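A minimal sketch of that reordering (all names are illustrative): the benefit loop moves outside the policy loop, so the expensive assumption load runs once per benefit type instead of once per policy-benefit pair.

```vba
' Benefit loop outermost: LoadBenefitAssumptions now runs once per
' benefit type instead of once per (policy, benefit) pair.
Sub ValueBlock()
    Dim benefitTypes As Variant, policies As Variant
    benefitTypes = Array("Death", "Surrender", "Maturity")
    policies = Array("P0001", "P0002", "P0003")

    Dim b As Variant, p As Variant
    For Each b In benefitTypes
        LoadBenefitAssumptions CStr(b)   ' expensive, identical per policy
        For Each p In policies
            ' calculate financial results for this policy and benefit
        Next p
    Next b
End Sub

Sub LoadBenefitAssumptions(benefitType As String)
    ' load and initialize assumptions for one benefit type
End Sub
```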
Minimize Interaction between Excel and VBA
There is overhead involved every time there is an interaction between Excel and VBA so we should always minimize such interactions.
I have no idea why someone would loop through cells in VBA to perform a calculation when a simple Excel formula provides the same functionality. Both produce the same results, except the VBA version executes much more slowly.
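As a hedged illustration (a hypothetical function of the kind I mean), a VBA loop like the following duplicates what =SUM(A1:A100) already provides, only far more slowly, because every cell read is a separate Excel/VBA interaction:

```vba
' Each c.Value read crosses the Excel/VBA boundary;
' the built-in =SUM() does the same work in one native call.
Function SumRangeVBA(r As Range) As Double
    Dim c As Range
    For Each c In r
        SumRangeVBA = SumRangeVBA + c.Value
    Next c
End Function
```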
Write Blocks of Data in a Single Command
Many Excel VBA programs write data to Excel cell by cell. If we have 500,000 records and each record has 10 columns, that is 5,000,000 writes. There are VBA functions that can dump all of the data to Excel in a single command, e.g., CopyFromRecordset. Writing or reading data in a single command can be a huge performance improvement.
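A hedged sketch of the same idea with a Variant array: build the 500,000 × 10 result in memory, then hand it to Excel in a single Range assignment (the sheet name and values are illustrative).

```vba
' One Range assignment replaces 5,000,000 cell-by-cell writes.
Sub WriteBlock()
    Dim data As Variant
    Dim i As Long, j As Long
    ReDim data(1 To 500000, 1 To 10)
    For i = 1 To 500000
        For j = 1 To 10
            data(i, j) = i * j          ' illustrative values
        Next j
    Next i
    Worksheets("Output").Range("A1").Resize(500000, 10).Value = data
End Sub
```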
Turn Off Notifications Before Running VBA Program
'Turn off some Excel functionality so your code runs faster
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
'Note: this is a sheet-level setting
ActiveSheet.DisplayPageBreaks = False
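The settings above should be restored when the program finishes, ideally in an error handler as well, so a crash does not leave Excel stuck in manual-calculation mode. A matching epilogue might look like:

```vba
'Restore Excel functionality after the program completes
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
'note this is a sheet-level setting
ActiveSheet.DisplayPageBreaks = True
```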
Eliminate Redundant Calculations
Let's say cell A1 contains the annual premium and we are doing a 100-year monthly projection. We definitely do not want to divide the same value by 12 in 1,200 cells. Instead, place the monthly premium in another cell and link the projection to that cell. Examine the formulas in your application and you might find many redundant calculations! Another, less obvious example is a calculation over an entire column, e.g., =SUM(A:A); this sums one million rows even if we only have a couple thousand values.
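As an illustration of both points (the cell addresses are hypothetical):

```
B1:  =A1/12          ' monthly premium, computed once
C2:  =C1 + $B$1      ' each projection cell links to B1, not =A1/12

=SUM(A2:A2001)       ' instead of =SUM(A:A), which scans a million rows
```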
If you Google Excel performance, you will find many more tips. The key is that when we design an Excel application, we must consider performance!
PivotTable
Virtually all Excel users claim they know PivotTables, but you may be surprised how few of them actually use PivotTables in their day-to-day analysis. This is probably the most underutilized Excel feature.
PivotTable allows us to analyze, present, summarize, slice, and dice data; we can discover business insights at lightning speed. PowerPivot is a free Microsoft add-in introduced with Excel 2010; with it we can create our own analytic functions, load multiple tables, and handle as many records as our computer memory allows. Why build your own filters and aggregations when Excel already delivers PivotTable and PowerPivot?
Excel Table
If PivotTable is the most underutilized Excel feature, then Excel Table is probably the most misleadingly named. An Excel Table is not a Data Table; it is a feature introduced with Excel 2007. Many Excel users believe an Excel Table is just a table with rows and columns of data, but it is a powerful feature that supports:
- Header rows—we can filter, sort or refer by header name;
- Calculated columns—instantly apply the same formula to the whole column;
- Total row—provides summary functions, e.g., Average, Count, Sum, etc.;
- Data validation—allows only numbers or dates; and
- Structured reference—instead of using A1 or R1C1, we can specify the column header name in the formula.
Class Module
A lot of Excel objects are used in VBA functions, e.g., Application, Worksheet, Range, etc.; they are all very handy. We can also create our own business objects by using the Class Module. A custom object can contain variables and functions, just like the built-in Excel objects.
Custom objects let us group logically related variables and functions together, which makes complex logic much easier to understand and maintain. This also helps with debugging and code reuse.
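A minimal sketch of a custom object (the Policy class and its members are hypothetical): the first part lives in a class module named Policy, the second in a standard module.

```vba
' === Class module "Policy" ===
Private mAnnualPremium As Double

Public Property Let AnnualPremium(value As Double)
    mAnnualPremium = value
End Property

Public Property Get AnnualPremium() As Double
    AnnualPremium = mAnnualPremium
End Property

Public Function MonthlyPremium() As Double
    MonthlyPremium = mAnnualPremium / 12
End Function

' === Standard module ===
Sub Demo()
    Dim p As Policy
    Set p = New Policy
    p.AnnualPremium = 1200
    Debug.Print p.MonthlyPremium    ' prints 100
End Sub
```

The premium logic now lives in one place, attached to the object it describes, instead of being scattered across worksheets and modules.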
There are surely many more tips and guidelines that I did not cover and I look forward to hearing other suggestions and feedback from you.
Andrew Chan, ASA, is a financial model engineer with ALG Consulting. He can be contacted at firstname.lastname@example.org.