Compare Excel Spreadsheets

Comparing Excel Spreadsheets - Why?

Business and Data analysts, Developers, and QA testers often have to verify their data are correct. Microsoft Excel is used to create a snapshot of "good data". Reports are captured and copied into Excel, data is exported and distributed as the model to use.

SelectCompare allows you to compare Excel spreadsheets in the same way as other data sources. This post is going to show you the process of creating data connections to Excel workbooks and comparing data between two spreadsheets. Have a look at the Excel data comparison use cases - there are plenty more!

Where do I start?

First, you have to install an appropriate driver to connect to the Excel workbooks. It is an easy step to get you up and running.

Install the drivers

First thing is to install the drivers. Microsoft provides Data Connectivity Components which you should download to your machine if you don't have it installed yet. These components allow for connection to the Access database as well.

Check out the Install Instructions section on the link above - the installation is straightforward!

When you have the drivers installed, you will see them in the connection setup dialog in SelectCompare. Read on!

Create a new data connection to the Excel workbook

Note - the process below uses the built-in system dialogs for managing the database connections. These dialog windows are very dated. It is easier and faster to paste the following connection string to the connection edit box:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=path_to_the_xlsx_file;Extended Properties="Excel 12.0"

 

Go to the Connections page and Click the  icon. You will be brought to the data connection creator page.

SelectCompare - create new data connection to compare Excel spreadsheet with other data

To create a connection to an Excel file, carry out the following steps:

Click the Create button on the right-hand side of the Connection string edit box.

Select <other> Data Source and select .NET Framework Data Provider for OLEDB as shown in the illustration. 

Comparing Excel spreadsheets requires creation of a data source. This screen allows to create it.

Click OK.

Select the "Microsoft Office 12.0 Access Database Engine OLE DB Provider" from the drop-down list that appears on the screen. Don't worry, it works with Excel too.

Microsoft.ACE.OLEDB.12.0 OLEDB driver selection for comparing Excel spreadsheets

 Enter the name of the Excel Workbook you want to use.

Data connection properties - Excel spreadheet file name specification for data comparison

Note: I downloaded a sample Northwind spreadsheet from Excel Dashboard Software for the needs of this article.

You can import the Northwind sample database to Excel or use your spreadsheet. I  converted the downloaded xls file to the newest xslx format.

Now, there is a trick: if you click the "Test Connection" button on the dialog, you will see this error message:

Cannot open database ''. It may not be a database that your application recognizes, or the file may be corrupt.

In such a case, close the error message and click the "Advanced" button at the bottom of the dialog.

Create Excel connection for data comparison - Advanced settings configuration screen

 Enter "Excel 12.0" to the Extended Properties line:

Advenced settings, Extended Properties: Excel 12.0

Close the Advanced Properties dialog and test the connection again. You should see the confirmation of the successful connection.

Note: The installation instructions section on the driver download page lists what values are appropriate for certain formats of Excel spreadsheets:

Screenshot with documentation of the Excel data connection parameters

When you close the connection wizard, the connection string will be populated in SelectCompare's Connection string field. You can click the "Test Connection" button to verify that the connection string works as expected in SelectCompare.

Test connection to Excel spreadsheet for data comparison

 The last step is to name the connection and assign it a color of your preference.

 Excel data comparison connection setup complete

OK  - your first connection to Excel is created!

Now it is time to create a connection to the second Excel spreadsheet. 

Clone data source connection

The good news is that you don't have to repeat all the steps manually. You can clone the connection by clicking the Clone data connection definition icon. A new connection definition will be created, with the name "Copy of " + your connection definition name.

Open the cloned connection definition and change its name and color if you like, and then change the connection details.

You don't have to open the connection configuration dialog windows. Just copy the connection string from the first connection and paste it into the cloned one!

In my case the connection string looks as follows:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\Projects\DataFiles\Northwind1.xlsx;Extended Properties="Excel 12.0"

Paste the connection string to the appropriate place on the Edit connection page and modify the path of the file. Press the "Test Connection" button to verify that you have the information correct. That's it!

Create a data comparison project

The next step is to create the actual comparison project.  Navigate from the Connections to Compare Projects page. Click Add new comparison project icon.

The full comparison project process has been described in the Data comparison project manual blog post. In short, we need to select data connection definitions (which we created above) and write select statements for each of the connections.

I created a copy of the Northwind1.xslx file named Northwind2.xlsx and made some changes to the order details, and customers list. I also made some changes in the Northwind1.xslx data to make the comparison more interesting.

The query I used is the same on both sides of the comparison - but it doesn't have to be.

Note: The only condition is that column names in both queries have the same names.

Here's the query.

select c.CustomerID, c.CompanyName, a.OrderID, 
round(sum(UnitPrice * Quantity * (1 - Discount)), 2) as Subtotal from [Orders_details$] as a inner join
(select o.OrderID, c.CustomerID, c.CompanyName from [Orders$] as o inner join
[Customers$] as c on o.CustomerID = c.CustomerID) as b on a.OrderID = b.OrderID
group by a.OrderID, c.CustomerID, c.CompanyName
Note: For more examples of  SQL queries visit the post "Write Select Queries for Excel Spreadsheets"
 

The Spreadsheet comparison definition page looks as follows:

Data comparison project - comparing Excel spreadsheets

The comparison project uses Northwind1 and Northwind2 connections and defines the OrderID column as the comparison key.

 And here's a screenshot of the data comparison of two Excel sheets:

Excel spreadsheet comparison results

 

The icons on the top of the results grid allow you to select the visibility of data, so you can focus on the differences of a specific kind. By default, rows with no differences between the source and target are invisible. You can also quickly check the number of rows belonging to each different category under the filter icon.

You can also export the data to Excel for further analysis - visit the post Create Exel Baseline For Your Data

For some examples of queries against Excel spreadsheets, please read the post-WriteWrite Select Queries for Excel Spreadsheets

 

I see that you read the post until this point. Why don't you have a look at different editions of SelectCompare to find the one right for you?

 

Leave a comment

Please note, comments must be approved before they are published