Comparing Excel Spreadsheets - Why?
Business and Data analysts, Developers, and QA testers often have to verify their data are correct. Microsoft Excel is used to create a snapshot of "good data". Reports are captured and copied into Excel, data is exported and distributed as the model to use.
SelectCompare allows you to compare Excel spreadsheets in the same way as other data sources. This post is going to show you the process of creating data connections to Excel workbooks and comparing data between two spreadsheets. Have a look at the Excel data comparison use cases - there are plenty more!
Where do I start?
First, you have to install an appropriate driver to connect to the Excel workbooks. It is an easy step to get you up and running.
Install the drivers
First thing is to install the drivers. Microsoft provides Data Connectivity Components which you should download to your machine if you don't have it installed yet. These components allow for connection to the Access database as well.
Check out the Install Instructions section on the link above - the installation is straightforward!
When you have the drivers installed, you will see them in the connection setup dialog in SelectCompare. Read on!
Create a new data connection to the Excel workbook
Note - the process below uses the built-in system dialogs for managing the database connections. These dialog windows are very dated. It is easier and faster to paste the following connection string to the connection edit box:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=path_to_the_xlsx_file;Extended Properties="Excel 12.0"
Go to the Connections page and Click the icon. You will be brought to the data connection creator page.
To create a connection to an Excel file, carry out the following steps:
Click the Create button on the right-hand side of the Connection string edit box.
Select <other> Data Source and select .NET Framework Data Provider for OLEDB as shown in the illustration.
Select the "Microsoft Office 12.0 Access Database Engine OLE DB Provider" from the drop-down list that appears on the screen. Don't worry, it works with Excel too.
Enter the name of the Excel Workbook you want to use.
You can import the Northwind sample database to Excel or use your spreadsheet. I converted the downloaded xls file to the newest xslx format.
Now, there is a trick: if you click the "Test Connection" button on the dialog, you will see this error message:
In such a case, close the error message and click the "Advanced" button at the bottom of the dialog.
Enter "Excel 12.0" to the Extended Properties line:
Close the Advanced Properties dialog and test the connection again. You should see the confirmation of the successful connection.
Note: The installation instructions section on the driver download page lists what values are appropriate for certain formats of Excel spreadsheets:
When you close the connection wizard, the connection string will be populated in SelectCompare's Connection string field. You can click the "Test Connection" button to verify that the connection string works as expected in SelectCompare.
The last step is to name the connection and assign it a color of your preference.
OK - your first connection to Excel is created!
Now it is time to create a connection to the second Excel spreadsheet.
Clone data source connection
The good news is that you don't have to repeat all the steps manually. You can clone the connection by clicking the icon. A new connection definition will be created, with the name "Copy of " + your connection definition name.
Open the cloned connection definition and change its name and color if you like, and then change the connection details.
You don't have to open the connection configuration dialog windows. Just copy the connection string from the first connection and paste it into the cloned one!
In my case the connection string looks as follows:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\Projects\DataFiles\Northwind1.xlsx;Extended Properties="Excel 12.0"
Paste the connection string to the appropriate place on the Edit connection page and modify the path of the file. Press the "Test Connection" button to verify that you have the information correct. That's it!
Create a data comparison project
The next step is to create the actual comparison project. Navigate from the Connections to Compare Projects page. Click icon.
The full comparison project process has been described in the Data comparison project manual blog post. In short, we need to select data connection definitions (which we created above) and write select statements for each of the connections.
I created a copy of the Northwind1.xslx file named Northwind2.xlsx and made some changes to the order details, and customers list. I also made some changes in the Northwind1.xslx data to make the comparison more interesting.
The query I used is the same on both sides of the comparison - but it doesn't have to be.
Note: The only condition is that column names in both queries have the same names.
Here's the query.
select c.CustomerID, c.CompanyName, a.OrderID, round(sum(UnitPrice * Quantity * (1 - Discount)), 2) as Subtotal from [Orders_details$] as a inner join (select o.OrderID, c.CustomerID, c.CompanyName from [Orders$] as o inner join [Customers$] as c on o.CustomerID = c.CustomerID) as b on a.OrderID = b.OrderID group by a.OrderID, c.CustomerID, c.CompanyName
The Spreadsheet comparison definition page looks as follows:
The comparison project uses Northwind1 and Northwind2 connections and defines the OrderID column as the comparison key.
And here's a screenshot of the data comparison of two Excel sheets:
The icons on the top of the results grid allow you to select the visibility of data, so you can focus on the differences of a specific kind. By default, rows with no differences between the source and target are invisible. You can also quickly check the number of rows belonging to each different category under the filter icon.
You can also export the data to Excel for further analysis - visit the post Create Exel Baseline For Your Data
For some examples of queries against Excel spreadsheets, please read the post-WriteWrite Select Queries for Excel Spreadsheets