SelectCompare allows for data comparison of Excel spreadsheets, as well as other data sources with simple, unified interface.
You may be initially anxious about writing an SQL statement to pull data from your Excel, but it is actually pretty easy!
This post will show you a few examples of writing queries against your spreadsheet that you will be able to use right away for your Excel data comparison.
If you want to find out how to connect to an Excel workbook, please read another post on this blog - Compare Excel Spreadsheets
You will see below how to query Excel spreadsheets, specify ranges of columns and rows, and write simple filters that make SQL so powerful.
In a general sense, the data that is stored in the Excel spreadsheet can be represented in two ways - it either has a header with column names, or (more rarely) it doesn't.
Excel spreadsheet with headers
Let's assume we have a spreadsheet with headers in the first row as on the following picture.
The simplest query you can issue is
Select * from [Customers$]
The query just selects all columns and all rows from the spreadsheet. Excel automatically returns only populated rows and columns.
If you need to limit the amount of rows you select from the spreadsheet, use TOP clause in the select statement:
Select top 3 * from [Customers$]
If you are interested only in specific columns, you can select only those that you that you need in the following way:
select CustomerID, CompanyName, ContactName, ContactTitle, Country
order by ContactName
What if my data doesn't begin in the first row?
It often happens that your spreadsheet has a title and other information at the top. The actual data begin below the title:
In such case you can specify the range of data you want to select from:
select * from [Customers$A5:K92] order by CompanyName desc
Or just simply
select * from [Customers$A5:K] order by CompanyName desc
Filtering data in the spreadsheet
You can select only data that you need for comparison by adding a WHERE clause to the query:
select CustomerID, CompanyName, ContactName from [Customers$]
where ContactTitle = 'Owner'
order by ContactName
Excel spreadsheet without headers
Excel, by default, assumes that column names of your table are stored in the first row of the table you select from. In some cases you may come across spreadsheets that do not have header information, like on the following illustration:
In this case, if you select data from the spreadsheet, first row will be treated as column names, which is not what you want.
You have to modify connection string for your Excel Workbook. Just add parameter "HDR=NO" to the connection string as displayed below:
Now, when you select from the spreadsheet, you will notice that Excel provided automatic names for the columns:
Usually you want to have meaningful names for your columns. You can easily apply aliases to your column names, like on the following illustration.
select F1 as CustomerID, F2 as CompanyName, F3 as CustomerName, F4 as CustomerTitle
where F4 ='Owner'
These are only a few simple examples of queries you can use to select data for your comparison of spreadsheets. You can write more complicated queries, like joining multiple spreadsheets or calling Excel functions in the query to perform certain operations on the attributes, but this is a topic for another occasion.
Check also other articles describing various aspects of usage of SelectCompare.
I described two interesting use cases that use the Excel data templates to facilitate data comparison of data produced by two distinct business processes.
There are also other articles that help to understand how to use SelectCompare.Some of them are listed below: