Write Select Statements for Excel Spreadsheets

SelectCompare allows for data comparison of Excel spreadsheets, as well as other data sources with a simple, unified interface.

You may be initially anxious about writing an SQL statement to pull data from your Excel, but it is pretty easy!

This post will show you a few examples of writing queries against your spreadsheet that you will be able to use right away for your Excel data comparison.

If you want to find out how to connect to an Excel workbook, please read another post on this blog - Compare Excel Spreadsheets. You may also want to have a look at the blog post DSN-less ODBC Data Sources which shows a few cool connection strings to other data technologies, allowing you to compare data between Excel and CSV for example.

You will see below how to query Excel spreadsheets, specify ranges of columns and rows, and write simple filters that make SQL so powerful.

In a general sense, the data that is stored in the Excel spreadsheet can be represented in two ways - it either has a header with column names, or (more rarely) it doesn't.

Excel spreadsheet with headers

Let's assume we have a spreadsheet with headers in the first row as in the following picture.

Excel spreadsheet containing headers

The simplest query you can issue is 

Select * from [Customers$]

Select statement for Excel spreadsheet showing the data results

The query just selects all columns and all rows from the spreadsheet. Excel automatically returns only populated rows and columns.

If you need to limit the number of rows you select from the spreadsheet, use the TOP clause in the select statement:

Select top 3 * from [Customers$]

Select statement for top 3 rows from Excel spreadsheet

If you are interested only in specific columns, you can select only those that you need in the following way:

select CustomerID, CompanyName, ContactName, ContactTitle, Country
from [Customers$]
order by ContactName

Select statement with named columns from Excel spreadsheet

 

What if my data doesn't begin in the first row?

It often happens that your spreadsheet has a title and other information at the top. The actual data begin below the title:

Excel report with title starting in the second row

In such case you can specify the range of data you want to select from:

select * from [Customers$A5:K92] order by CompanyName desc

Or just simply

select * from [Customers$A5:K] order by CompanyName desc

 

Select statement for Excel to read from the range of the spreadsheet

Filtering data in the spreadsheet

You can select only data that you need for comparison by adding a WHERE clause to the query:

select CustomerID, CompanyName, ContactName from [Customers$]
where ContactTitle = 'Owner'
order by ContactName

 Select statement for Excel with WHERE filter clause

Excel spreadsheet without headers

Excel, by default, assumes that the column names of your table are stored in the first row of the table you select from. In some cases you may come across spreadsheets that do not have header information, like in the following illustration:

Excel spreadsheet containing data without header

In this case, if you select data from the spreadsheet, the first row will be treated as column names, which is not what you want.

You have to modify the connection string for your Excel Workbook. Just add the parameter "HDR=NO" to the connection string as displayed below:

Disable header interpretation in Excel connection string

Now, when you select from the spreadsheet, you will notice that Excel provided automatic names for the columns:

Automatic column names for SQL statement for Excel spreadsheet without headers

Usually, you want to have meaningful names for your columns. You can easily apply aliases to your column names, like in the following illustration.

select F1 as CustomerID, F2 as CompanyName, F3 as CustomerName, F4 as CustomerTitle
from [CustomersNoHDR$A1:K]
where F4 ='Owner'

 

Column aliases for SQL satatement for Excel spreadsheet with data without headers

 

These are only a few simple examples of queries you can use to select data for your comparison of spreadsheets. You can write more complicated queries, like joining multiple spreadsheets or calling Excel functions in the query to perform certain operations on the attributes, but this is a topic for another occasion.

Other reading

Check also other articles describing various aspects of the usage of SelectCompare.

I described two interesting use cases that use Excel data templates to facilitate data comparison of data produced by two distinct business processes.

Other articles help to understand how to use SelectCompare.Some of them are listed below:

Leave a comment

Please note, comments must be approved before they are published