Connection String Parameters for Excel Data Sources

 

In the previous article, I discussed how we can treat Excel and text files as if they were a database using DAO, and how we can open them without linking. Because they do not use ODBC drivers, their connection string will be formatted quite differently from what you might be accustomed to seeing for an ODBC connection string. There’s a dearth of documentation on the Excel connection string parameters. This is a best-effort to cover some of the gaps and discuss the ramifications of parameters.

Excel connection string parameters

Even though we have 3 different data source “types”:

  1. Excel 8.0: 97-2003 xls files
  2. Excel 12.0: xlsb files
  3. Excel 12.0 Xml: xlsx files

They all use the same parameters.

Here are the list of parameters:

HDR parameter: Header row

YES: The first row is the header and should become the column names for the “table”/”recordset”
NO: The first row is not treated differently and is just a data. All column names will be named “FN” where “N” is a number starting with 1

IMEX parameter: Import/Export Behavior

This governs how the column data types should be defined, based on the content:
1: If the column contain different data types, treat it as a string. Otherwise, match the column to the best data type.
2: Always match the column to a certain data type based on the sample. That may cause an error in reading when we read a row that contains data that does not match the expected data type.

ACCDB parameter: Indicates that Access is using ACCDB file format?

By default, this is always set ACCDB=YES in an accdb file format. However, omitting it or setting it to NO seems to do nothing. It’s a bit of a mystery. If anyone can share what effect this parameter, post in comment and I’ll update the blog.

DATABASE: Path to the Excel workbook

The parameter should contain a fully qualified path, including the workbook’s name.

Minimum working connection string

Note that the DATABASE is the only mandatory parameter in addition to the data type source keyword. Therefore a minimum working connection string can be:

Excel 8.0;DATABASE=C:\Links\Products.xls

Specifying sheet or range in connection string

In the previous sample, you saw that a sheet represented a “DAO.TableDef“. However, worksheets aren’t the only thing that can be a “Tabledef“. If the Excel spreadsheet contains a named range, the named range will be reported as a “Tabledef” as well. Additionally, we can “query” an arbitrary block in the sheet using cell address. For example:

Dim db As DAO.Database
Dim rs As DAO.Recordset

Set db = DBEngine.OpenDatabase(vbNullString, False, False, "Excel 12.0 Xml;HDR=YES;IMEX=2;ACCDB=YES;DATABASE=C:\Links\Products.xlsx")
Set rs = db.OpenRecordsset("Sheet$1A1:A3")

Debug.Print rs.Name, rs.Fields.Count

It’s important to note that the cell addresses cannot exceed the sheet’s used range. For example, the Products.xlsx only actually has contents in A1:B3, that means if you open a recordset using Sheet1$A1:D5, you still get only 2 for fields count and 3 for record count. The extra blank columns/rows are simply ignored. On the flip side, if you dirtied a cell somewhere outside the A1:B3, the sheet’s UsedRange will be now as bigger and querying would then include blank columns and rows.

Therefore those are valid names to use in a query on a Excel “database”:

  1. Sheet1$ – Entire used range of a worksheet.
  2. Sheet1$A1:B4 – Only 2 columns and 3 rows (not counting header), providing that the contents are filled. Otherwise columns or rows may be less than requested.
  3. ProductsRange – the named range with that name.

I find it much nicer to use named ranges where practical as this ensure that you are not hard-coding the addresses in your code especially if the range gets moved around due to user inserting new columns or rows but not altering the contents of the named range. However it is not always practical, especially if you are receiving spreadsheets from a 3rd party and therefore have no control over its contents or formats. In this case, writing a SQL query can work, too.

Querying Excel data source

Suppose we can’t control the format and we don’t want to rely on absolute address even though we are confident that certain columns and rows will be in fact present. In that situation, the best thing to do is to query. Here’s an example that select only one row:

  Dim db As DAO.Database
  Set db = DBEngine.OpenDatabase(vbNullString, False, False, "Excel 12.0 Xml;HDR=YES;IMEX=2;ACCDB=YES;DATABASE=C:\Links\Products.xlsx")

  Dim rs As DAO.Recordset
  Set rs = db.OpenRecordset("SELECT d.[Count] FROM [Sheet1$] AS d WHERE d.[Products] = 'Bananas';")
  Debug.Print rs.Fields(0).Value

Hopefully you can see this is much easier than iterating over each rows to find which one has “Bananas” and then reading the column to right to get the count. In this case, querying beats out automating Excel.

Conclusion

You’ve seen that DAO makes it very easy for us to work with Excel data source and pretend as if it was a relational data source and use our favorite querying language and familiar DAO objects instead of writing bunch of VBA code automating Excel to find the data we want. The connection string parameters are fairly straightforward and as long you have the path, you’re good for linking or opening an Excel spreadsheet.

In the next article we will look at text file connection parameters.