In the previous article I covered the connection string parameters for Excel data sources. We will now focus on the text files. There are different methods for describing text files’ schema and using the information during opening or linking in Access. While Excel spreadsheets had some resemblance of structure, this is not true for the text files. We have to answer several questions about the structure of a text file including:
- Is it delimited or fixed width?
- How do we tell when one column ends and other starts?
- Are text quoted or not?
- How should we parse dates and times?
- What about the currency amounts? How should they be formatted?
and possibly more. While CSV might seems to be well-defined at the glance but when you dig at it, it’s actually very loosely defined. There is no universal agreement on whether text ought to be quoted, how the dates should be formatted. For all those reasons, using text files usually require use of some kind of schema information to describe the structure of the text file. There are three ways to store schema information:
schema.inifile stored in a directory
To complicate the matters, there are several different methods we can use to work with text files but not all methods can use all the 3 different way of getting the schema information. For example,
DoCmd.TransferText works with the system tables but not saved imports/exports. On the other hand,
DoCmd.RunSavedImportExport works with the
ImportExportSpecification object. However, the
ImportExportSpecification is not used as part of the linking. So for our discussion, we actually have only 2 methods available in the context of opening or linking to a text file. It’s important to note the distinction between saving a specification to the
MSysIMEXColumns tables vs. saving an import/export which becomes a
ImportExportSpecification object. We will explore those 2 methods in next articles.
Connection string for text file
We should consider how Access will perceive a text file. In the previous article, we saw that each sheet or a named range was represented as a “table” in an Excel spreadsheet “database”. But a text file has no such construct. What makes a “database” then? The answer is that the folder represents a “database” and therefore any text files within a folder are “table”s. For that reason, it is possible to have multiple schema information for same folder if that folder contains more than one possible format for any text files stored within the folder. You will see later that when we construct the connection string, we link to the folder, then access the individual file as a table.
Therefore, using this setup as shown:
We can then open a text file using this VBA code:
Dim db As DAO.Database Set db = DBEngine.OpenDatabase(vbNullString, False, False, "Text;DATABASE=C:\Links") Dim tdf As DAO.TableDef For Each tdf In db.TableDefs Debug.Print tdf.Name Dim fld As DAO.Field For Each fld In tdf.Fields Debug.Print fld.Name, Next Debug.Print Dim rs As DAO.Recordset Set rs = tdf.OpenRecordset Do Until rs.EOF For Each fld In rs.Fields Debug.Print fld.Value, Next Debug.Print rs.MoveNext Loop Debug.Print Next
This should yield the output:
Products#csv Products Count PackDate Amount ShipDate Apples 3 12/4/2020 $ 1.02 4/12/2020 Bananas 5 4/12/2020 $ 1,234.56 12/4/2020 Figs 8 5/7/2020 $ 0.01 7/5/2020 Grapes 11 10/10/2020 $12.30 10/10/2020
Take note of the following things:
- We did not specify the text file in our connection string. We used folder instead.
- The names of “tables” were altered because a dot in the name is not a valid character. Ergo,
- Compared to Excel, there are no mandatory parameters beyond specifying the text file driver and the path to the folder.
In the next article, you will learn more about the describing the schema of a text files. However, for the connection string itself, the following keywords are recognized.
FMT parameter: Indicate the format of the text file.
Delimited: The file is delimited by a character. The character used is specified by the schema information.
Fixed: The file has fixed width for column(s). Again, the specific column width(s) are specified by the schema information.
HDR parameter: Header row
YES: The first row is the header and should become the column names for the “table”/”recordset”
NO: The first row is not treated differently and is just a data. All column names will be named “FN” where “N” is a number starting with 1
IMEX parameter: Import/Export Behavior
This governs how the column data types should be defined, based on the content:
1: If the column contain different data types, treat it as a string. Otherwise, match the column to the best data type.
2: Always match the column to a certain data type based on the sample. That may cause an error in reading when we read a row that contains data that does not match the expected data type.
ACCDB parameter: Indicates that Access is using ACCDB file format?
By default, this is always set ACCDB=YES in an accdb file format. However, omitting it or setting it to NO seems to do nothing. It’s a bit of a mystery. If anyone can share what effect this parameter, post in comment and I’ll update the blog.
DATABASE parameter: Path to the folder that contains the text files
The parameter should contain a fully qualified path. It should not include the text files’ names.
CharacterSet Parameter: Identifies the character encoding to use for reading the text files.
This will be discussed in more details in next article. This also can be described in the schema information.
DSN Parameter: Identifies the schema information to use with the text file.
The name must correspond to the
MSysIMEXSpec, which will be analyzed in later article. This only works with the
MSysIMEX*** tables. If you wish to use
schema.ini, you simply just don’t include any
DSN in your connection strings.
It’s important to note that text file driver will only consider the parameters listed above. It is not possible to put in other keywords and have it be parsed by the text file driver. For that reason, you will not be able to specify all the details about a text file from the connection string alone.
Default schema for text files
In theory, you can open or link a text file without any schema information but this will rarely work. In this situation, Access will simply assume the defaults for various options. If the text file conforms to all the current defaults then Access will succeed in reading the file. More importantly, absence of errors in opening or linking to the text file does not mean the data are represented meaningfully. For example, specially formatted currency amounts might be interpreted as text rather than currency, and non-delimited text with commas within the text might be mis-parsed, adding unwanted columns. The defaults are determined in two possible places:
- Access will look at the registry settings. For Office 365 installation, the registry can be located at:
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Text. We will refer to this location as “Text registry keys” in next articles.
- The settings you would see in the
Regionapplet in Windows’ control panel. We will refer to this location as “Windows settings”.
Note on encoding for the text files
It is mandatory to have the correct encoding regardless of which methods you may use to access your text files. When your text file’s content is restricted to only characters within lower half of the ASCII points (e.g. 0-127), it hardly matters what encoding you choose for your text files. The default’s usually good enough. However, if your text files can contain any Unicode or any characters greater than 127, then it’s on you, the developer, to know the encoding. If wrong encoding is specified, the text may not import as expected and won’t throw any errors. For messy details, I’ll refer you to Joel Spolsky on this matter.
As you will see in the next articles, both methods has a fairly large overlap in the capabilities. Therefore, you may find yourself having a choice of using either. The main difference boils down to whether you want the schema to be stored within your application or within a folder where text files are expected to be. When you use the
schema.ini file, you are assuming that the text files will be present in a certain folder and will have a certain name.
MSysIMEX***, you can process any text files from anywhere simply by referencing the defined specification. However, it is not easy to edit the specification outside Access. Even within Access, it’s not easy to tweak specifications using UI. The
schema.ini has some additional features that are not directly available with
Nonetheless, the question of where to store the specification will most likely be your most important factor in deciding which to use.
You are strongly encouraged to have schema information defined for any text files that contains dates or currency amounts. The dates and currency amounts are sensitive to regional setting which can interfere with correct parsing of the data. Because we have two different systems with different available set of options, we must consider each in the next articles. You have the choice to use either (or even both among different text files). We will now turn to the
schema.ini in the next article. We will later look at
MSysIMEX*** tables in the following article.