July 2013

The Importance of Data Flow Automation—Garbage In, Garbage Out

By Andrew Chan

Excel is a powerful analytical tool that we all use to build management reports, develop financial models and even massage data. However, some analysts still do not understand the importance of data flow automation and how to avoid “garbage in, garbage out.” An outstanding example was the JP Morgan CIO office’s multi-billion-dollar loss in 2012. One of the findings in their Report of JPMorgan Chase & Co. Management Task Force (page 124) is that “the model operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” I don’t think JP Morgan is the only company whose analysts manually enter data into Excel.

Excel has several tools we can use to import data:

We can also find many third-party add-ins that we can use to import data from major software vendors, e.g., SAS, SAP, MATLAB … etc. I guess the problem is that they are too easy to use and we never bother to automate the data flow. However, the data flow automation is also straightforward from each vendor; most of them involve only one VBA function.

Import Text Files Using VBA
Workbooks.OpenText is the VBA method to load data into a new workbook:

Workbooks.OpenText filename:="DATA.TXT", dataType:=xlDelimited, tab:=True

Import XML Data Using VBA
To import using VBA is a two-step process. First we need to add an XML Map, and then use XmlImport to load XML data into workbook:

Dim Reservedetails As XmlMap

ActiveWorkbook.XmlMaps.Add "\\Reserves\Details.xsd", Reservedetails

ActiveWorkbook.XmlImport URL:="\\Reserves\Details_20130430.xml", ImportMap:= Reservedetails, Overwrite:=True, Destination:=Range("$A$1")

Import from Database Using VBA
There are a few ways to import data from a database using VBA. If the data is already loaded using the data connection wizard, then a simple Refresh command is all we need.


There may be a situation where we cannot directly connect to a database and we can only work with Recordset. In that case we can use the CopyFromRecordset function to load data into an Excel worksheet.

ActiveSheet.Cells.CopyFromRecordset rstReserves

Using recordset, we can also loop through every row and column in the recordset. We can do seriatim valuation, apply custom formatting to each individual cell, and even create another recordset/subreport.

Import from Web Page Using VBA
Both web queries and data queries output data to QueryTable in Excel; so we use the same function Refresh to reload data from a Web page.

It is quite straightforward to automatically load data into Excel. However, these VBA functions will not eliminate garbage data. For example, when we load data from a text file, how do we know the file is current? It may well be the same text file from last month! How can we avoid missing a new block of business? We face endless challenges to ensure clean data.

What is your biggest data flow automation challenge?

Andrew Chan, ASA, is a financial model engineer with ALG Consulting. He can be contacted at chan_a@algconsultings.com.