Comparing Excel Spreadsheets - Why?
Business and Data analysts, Developers and QA testers often have to verify their data is correct. Microsoft Excel is used to create a snapshot of "good data". Reports are captured and copied into Excel, data is exported and distributed as the model to use.
SelectCompare allows you to compare Excel spreadsheets in the same way as other data sources. This post is going to show you the process of creating data connections to Excel workbooks and comparing data between two spreadsheets.
Where do I start?
First, you have to install appropriate driver to connect to the Excel workbooks. It is an easy step to get you up and running.
First thing is to install the drivers. Microsoft provides 2007 Office System Driver Data Connectivity Components which you should download to your machine if you don't have it installed yet. These components allow for connection to the Access database as well.
Check out the Install Instructions section on the link above - the installation is really straightforward!
When you have the drivers installed, you will see them in the connection setup dialog in SelectCompare. Read on!
Create new data connection to Excel workbook
Go to Connections page and Click the icon. You will be brought to the data connection creator page.
To create a connection to an Excel file, carry out the following steps:
Click the Create button on the right-hand side of the Connection string edit box.
Select <other> Data Source and select .NET Framework Data Provider for OLEDB as shown on the illustration.
Select the "Microsoft Office 12.0 Access Database Engine OLE DB Provider" from the drop down list that appears on the screen. Don't worry, it works with Excel too.
Enter name of the Excel Workbook you want to use.
You can import Northwind sample database to Excel or use your own spreadsheet. I converted the downloaded xls file to the newest xslx format.
Now, there is a trick: if you click "Test Connection" button on the dialog, you will see this error message:
In such case, close the error message and click "Advanced" button at the bottom of the dialog.
Enter "Excel 12.0" to the Extended Properties line:
Close the Advanced Properties dialog and test the connection again. You should see the confirmation of the successful connection.
Note: The installation instructions section on the driver download page lists what values are appropriate for certain formats of Excel spreadsheets:
When you close the connection wizard, the connection string will be populated in SelectCompare's Connection string field. You can click Test Connection button to verify that the connection string works as expected in SelectCompare.
The last step is to name the connection and assign it a color of your preference.
OK - your first connection to Excel is created!
Now it is time to create connection to the second Excel spreadsheet.
Clone data source connection
The good news is that you don't have to repeat all the steps manually. You can clone the connection by clicking the icon. A new connection definition will be created, with name "Copy of " + your connection definition name.
Open the cloned connection definition and change its name and color if you like, and then change the connection details.
You don't have to open the connection configuration dialog windows. Just copy the connection string from the first connection and paste it to the cloned one!
In my case the connection string looks as follows:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\Projects\DataFiles\Northwind1.xlsx;Extended Properties="Excel 12.0"
Paste the connection string to the appropriate place on the Edit connection page and modify the path of the file. Press Test Connection button to verify that you have the information correct. That's it!
Create data comparison project
The next step is to create the actual comparison project. Navigate from Connections to Compare Projects page. Click icon.
The full comparison project process has been described in the Data comparison project manual blog post. In short, we need to select data connection definitions (which we created above) and write select statements for each of the connections.
I created a copy of the Northwind1.xslx file named Northwind2.xlsx and made some changes to the order details, and customers list. I also made some changes in the Northwind1.xslx data to make the comparison more interesting.
The query I used is the same on both sides of the comparison - but obviously it doesn't have to be.
Note: The only condition is that column names in both queries have the same names.
Here's the query.
select c.CustomerID, c.CompanyName, a.OrderID, round(sum(UnitPrice * Quantity * (1 - Discount)), 2) as Subtotal from [Orders_details$] as a inner join (select o.OrderID, c.CustomerID, c.CompanyName from [Orders$] as o inner join [Customers$] as c on o.CustomerID = c.CustomerID) as b on a.OrderID = b.OrderID group by a.OrderID, c.CustomerID, c.CompanyName
The Spreadsheet comparison definition page looks as follows:
The comparison project uses Northwind1 and Northwind2 connections, and defines OrderID column as comparison key.
And here's screenshot of data comparison of two Excel sheets:
The icons on the top of the results grid allow you to select visibility of data, so you can focus on the differences of specific kind. By default, rows with no differences between source and target are invisible. You can also quickly check the numbers of rows belonging to each difference category under the filter icon.
You can also export the data to Excel for further analysis - visit the post Create Exel Baseline For Your Data
For some examples of queries against Excel spreadsheets, please read the post Write Select Queries for Excel Spreadsheets