In this series we have explored Microsoft Excel from a data governance perspective, learning why Excel is not a great database. Now we turn our attention to Excel as an analytics platform and go into some of the reasons why Excel is not a great analytics platform.
Analytics in Excel are shared by copying the spreadsheet and re-sending it - When someone creates a spreadsheet with fancy charts and graphs and wants to show it to someone else there are two main ways of doing it - show it to the person on your screen, or send the spreadsheet and have that person open it and take a look. This creates several problems:
You might not want to give someone access to all the data in the spreadsheet, just to the charts and graphs. It is pretty difficult to block access to the underlying data while sharing just the charts and graphs unless you take screenshots of the charts and graphs themselves and send only those to another person.
The other party might ask for changes in the underlying data, assumptions, or visualizations. This means you have to take the base spreadsheet, make changes, and re-send it. This can get cumbersome if a lot of changes are happening and it might be difficult for the person on the other end to keep versions of their spreadsheets straight.
If your spreadsheet analytics assumptions are being pulled from several different departments you might need to send the base spreadsheet to someone else to make changes to the data, then copy it and make changes to the analytics, and then send it to the third party. Every step where the spreadsheet is copied by another person multiplies the number of copies and introduces possibility of problems.
It is difficult to expand greatly on analytics without creating a "monster spreadsheet" - The progression of doing analytics in spreadsheets starts with someone looking at a chart with a few data points in a spreadsheet. That person invariably says, "hey, can we cross-correlate this data with our website data?" or "hey, can I break this down into regions and view the detail from each region?" What then happens is the original spreadsheet is now stuffed with additional data, which prompts more requests for even MORE data until the spreadsheet becomes a monstrosity that takes forever to load and analyze.
Chains of functions and cell dependencies make it hard to know what's going on in a complex spreadsheet - An analytical spreadsheet might have chains of functions that span across dozens of workbooks. This complex web makes it quite difficult to follow the chain of reasoning and how the data is transformed from one place to another. This makes spreadsheets quite fragile - changing one value or one formula can have massive repercussions for the data in the rest of the spreadsheet. This leads to people being very unwilling to change spreadsheets for fear that they will "break" them, or it leads to a general mistrust of the data in Worksheet 27 because who knows how many miscalculations went into creating the data.
It is hard to "version control" a spreadsheet - As an analytics system becomes more complex it becomes very important to be able to see exactly how the data flows from one place to another and what calculations are performed upon it along the way to transform it into its final form. Traditional programming allows greater transparency into that process. The data is acted upon by well defined programs which can be tested and debugged and incrementally improved. This "version control" allows you to see quickly exactly HOW the analytics calculations have changed over time and even to revert to a previous version.
Visualization tools limited to what is provided by Excel - Excel has some nice looking charts and graphs, but sometimes you might want a different set of visualizations, maybe for aesthetic reasons or functionality reasons. To do so you have to export the data you want out of Excel and move it into a CSV or other format that can be imported into the new visualization tool.
This is not to say that one cannot use Excel for data analysis, and indeed we explore in our next article some good use cases for Excel. But one of the biggest barriers to organizational business intelligence growth is learning to set aside monster Excel spreadsheets in favor of better data management practices.