Write Select Statements for Excel Spreadsheets
SelectCompare allows for data comparison of Excel spreadsheets, as well as other data sources with a simple, unified interface.
You may be initially anxious about writing an SQL statement to pull data from your Excel, but it is pretty easy!
This post will show you a few examples of writing queries against your spreadsheet that you will be able to use right away for your Excel data comparison.
If you want to find out how to connect to an Excel workbook, please read another post on this blog - Compare Excel Spreadsheets. You may also want to have a look at the blog post DSN-less ODBC Data Sources which shows a few cool connection strings to other data technologies, allowing you to compare data between Excel and CSV for example.
You will see below how to query Excel spreadsheets, specify ranges of columns and rows, and write simple filters that make SQL so powerful.
In a general sense, the data that is stored in the Excel spreadsheet can be represented in two ways - it either has a header with column names, or (more rarely) it doesn't.
Excel spreadsheet with headers
Let's assume we have a spreadsheet with headers in the first row as in the following picture.
The simplest query you can issue is
Select * from [Customers$]
The query just selects all columns and all rows from the spreadsheet. Excel automatically returns only populated rows and columns.
If you need to limit the number of rows you select from the spreadsheet, use the TOP clause in the select statement:
Select top 3 * from [Customers$]
If you are interested only in specific columns, you can select only those that you need in the following way:
select CustomerID, CompanyName, ContactName, ContactTitle, Country
from [Customers$]
order by ContactName
What if my data doesn't begin in the first row?
It often happens that your spreadsheet has a title and other information at the top. The actual data begin below the title:
In such case you can specify the range of data you want to select from:
select * from [Customers$A5:K92] order by CompanyName desc
Or just simply
select * from [Customers$A5:K] order by CompanyName desc
Filtering data in the spreadsheet
You can select only data that you need for comparison by adding a WHERE clause to the query:
select CustomerID, CompanyName, ContactName from [Customers$]
where ContactTitle = 'Owner'
order by ContactName
Excel spreadsheet without headers
Excel, by default, assumes that the column names of your table are stored in the first row of the table you select from. In some cases you may come across spreadsheets that do not have header information, like in the following illustration:
In this case, if you select data from the spreadsheet, the first row will be treated as column names, which is not what you want.
You have to modify the connection string for your Excel Workbook. Just add the parameter "HDR=NO" to the connection string as displayed below:
Now, when you select from the spreadsheet, you will notice that Excel provided automatic names for the columns:
Usually, you want to have meaningful names for your columns. You can easily apply aliases to your column names, like in the following illustration.
select F1 as CustomerID, F2 as CompanyName, F3 as CustomerName, F4 as CustomerTitle
from [CustomersNoHDR$A1:K]
where F4 ='Owner'
These are only a few simple examples of queries you can use to select data for your comparison of spreadsheets. You can write more complicated queries, like joining multiple spreadsheets or calling Excel functions in the query to perform certain operations on the attributes, but this is a topic for another occasion.
Other reading
Check also other articles describing various aspects of the usage of SelectCompare.
I described two interesting use cases that use Excel data templates to facilitate data comparison of data produced by two distinct business processes.
Other articles help to understand how to use SelectCompare.Some of them are listed below:
Leave a comment
Please note, comments must be approved before they are published