Filtering the recordset

In part 5 of our series, we will learn how Microsoft Access handles implemented filters and integrates them into ODBC queries. In the prior article, we saw how Access will formulate the SELECT statements in the ODBC SQL commands. We also saw in the previous article how Access will try to update a row by applying a WHERE clause based on the key and if applicable, rowversion. However, we need to learn how Access will handle the filters that are provided to the Access queries and translate them in the ODBC layer. There are different approaches Access may use depending on how the Access queries is formulated and you will learn how to predict how Access will translate an Access query into an ODBC query for different filter predicate given.

Regardless of how you actually apply filter — be it interactively via form or datasheet’s ribbon commands or right-menu clicks, or programmatically using VBA or running saved queries — Access will issue a corresponding ODBC SQL query to perform the filtering. In general, Access will try to remote as much as filtering as is possible. However, it will not tell you if it cannot do so. Instead, if Access cannot express the filter using ODBC SQL syntax, it will instead attempt to perform filtering itself by downloading the entire contents of table and evaluate the condition locally. That can explain why sometime you may encounter a query that runs quickly but with one small change, slows down to a crawl. This section hopefully will help you understand when this may happen and how to handle it so that you can help Access remote as much as possible to the data sources for applying the filter.

For this article, we will use saved queries but the information discussed here should still apply to other methods of applying filters.

Static filters

We will start off easy and create a saved query with a filter hard-coded.

SELECT 
   c.CityID
  ,c.CityName
  ,c.StateProvinceID
FROM Cities AS c
WHERE c.CityName="Boston";
SQL definition for an Access saved query with a static filter.
If we open the query, we will see this ODBC SQL in the trace:

SQLExecDirect: 
SELECT "c"."CityID" 
FROM "Application"."Cities" "c" 
WHERE ("CityName" = 'Boston' ) 
Traced ODBC SQL for the Access query with a static filter.
Aside from the changes in the syntax, the semantics of the query has not changed; the same filter is passed in as-is. Note that only the CityID was selected because by default a query uses a dynaset-type recordset which we discussed already in the prior section.

Simple parameterized filters

Let’s change the SQL to use a parameter instead:

PARAMETERS SelectedCityName Text ( 255 );
SELECT 
  c.CityID
 ,c.CityName
 ,c.StateProvinceID
FROM Cities AS c
WHERE c.CityName=[SelectedCityName];
An Access saved query using a parameter for the filter.
If we run the query and input “Boston” in the parameter prompt value as shown, we should see the following ODBC trace SQL:
How does Access talk to ODBC data sources? Part 5
Entering “Boston” as the input to the parameter to the Access saved query.
SQLExecDirect: 
SELECT "c"."CityID" 
FROM "Application"."Cities" "c" 
WHERE ("CityName" =  ? ) 
Traced ODBC SQL for the parameterized Access saved query.
Note that we will observe the same behavior with control references or subform linking. If we used this instead:

SELECT 
   c.CityID
  ,c.CityName
  ,c.StateProvinceID
FROM Cities AS c
WHERE c.CityName=[Forms]![frmSomeForm]![txtSomeText];
An Access saved query with a control reference
We would still get the same traced ODBC SQL we saw with the original parameterized query. That is still the case even though our modified query did not have an PARAMETERS statement. This shows that Access is able to recognize that such control references which can have its value changed from time to time, are best treated as a parameter when formulating the ODBC SQL.

That also works for the VBA function. We can add a new VBA function:

Public Function GetSelectedCity() As String
    GetSelectedCity = "Boston"
End Function
A VBA function that returns “Boston” as the selected city.
We adjust the saved query to use the new VBA function:

WHERE c.CityName=GetSelectedCity();
Modified WHERE clause of the saved Access query invoking the VBA function.
If you trace this, you will see that it’s still the same. Thus, we’ve demonstrated that regardless of whether the input is an explicit parameter, a reference to a control, or a result of a VBA function, Access will treat them all as a parameter of the ODBC SQL query that it will execute on our behalf. That is a good thing because we get better performance generally when we can re-use a query and simply change the parameter.

However, there is one more common scenario that Access developers typically set up and that is creating dynamic SQL with VBA code, usually by concatenating a string and then executing the concatenated string. Let’s use the following VBA code:

Public Sub GetSelectedCities()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim fld As DAO.Field
    
    Dim SelectedCity As String
    Dim SQLStatement As String
    
    SelectedCity = InputBox("Enter a city name")
    SQLStatement = _
        "SELECT c.CityID, c.CityName, c.StateProvinceID " & _
        "FROM Cities AS c " & _
        "WHERE c.CityName = '" & SelectedCity & "';"
    
    Set db = CurrentDb
    Set rs = db.OpenRecordset(SQLStatement)
    Do Until rs.EOF
        For Each fld In rs.Fields
            Debug.Print fld.Value;
        Next
        Debug.Print
        rs.MoveNext
    Loop
End Sub
VBA code to open a recordset based on a dynamically-constructed SQL statement.
The traced ODBC SQL for the OpenRecordset is as follow:

SQLExecDirect: 
SELECT "c"."CityID" 
FROM "Application"."Cities" "c" 
WHERE ("CityName" = 'Boston' ) 
Traced ODBC SQL from the dynamically-constructed SQL generated in VBA.
Unlike previous examples, the ODBC SQL was not parameterized. Access has no way of knowing that the ‘Boston’ was dynamically populated at runtime by a VBA.InputBox. We simply handed it the constructed SQL which from Access’ POV, is just a static SQL statement. In this case, we defeat the parameterization of the query. It’s important to recognize that a popular advice given out to Access developers has been that dynamically constructed SQL is better than using parameter queries because it avoids the problem where Access engine may generate an execution plan based on one parameter value which may be actually suboptimal for another parameter value. For more details about that phenomenon, I encourage you to read up on the problem of “parameter-sniffing”. Note that this is a general problem for any database engines, not just Access. However, in Access’ case, dynamic SQL worked better because it is much cheaper just to generate a new execution plan. In contrast, a RDBMS engine may have additional strategies for handling the problem and may be more sensitive to having too many one-off execution plans as that can negatively impact its caching.

For that reason, parameterized queries from Access against ODBC sources may be preferable over dynamic SQL. Because Access will treat the references controls on a form or VBA functions that does not require column references as parameters, you do not need explicit parameters in your recordsources or rowsources. However, if you are using VBA to execute SQL, it’s usually better to use ADO which also has much better support for parameterization. In the case of building a dynamic recordsource or rowsource, using a hidden control on the form/report can be an easy way to parameterize the query. However, if the query is markedly different, building the dynamic SQL in VBA and assigning it to the recordsource/rowsource property effectively forces a full recompilation and therefore avoid using bad execution plans that will not perform well for the current set of inputs. You may find recommendations in the article discussing SQL Server’s WITH RECOMPILE helpful in deciding whether to force a recompilation versus using a parameterized query.

Using functions in SQL filtering

In previous section, we saw that a SQL statement containing a VBA function got parameterized so that Access could execute the VBA function and use the output as the input to the parameterized query. However, not all built-in functions behave this way. Let’s use UCase() as an example to filter the query. Furthermore, we will apply the function on a column.

SELECT 
   c.CityID
  ,c.CityName
  ,c.StateProvinceID
FROM Cities AS c
WHERE UCase([c].[CityName])="BOSTON";
An Access saved query with an UCase() function applied on a column.
If we look at the traced ODBC SQL, we will see this:

SQLExecDirect: 
SELECT "c"."CityID" 
FROM "Application"."Cities" "c" 
WHERE ({fn ucase("CityName" )}= 'BOSTON' )
Traced ODBC SQL for filtering with the UCase() function.
In previous example, Access was able to completely parameterize away the GetSelectedCity() since it required no inputs from columns referenced in the query. However, the UCase() requires an input. Had we provided UCase("Boston"), Access would have parameterized this away, too. However, the input is a column reference, which Access cannot easily parameterize away. However, Access can detect that the UCase() is one of supported ODBC scalar functions. Since we prefer remoting as much as possible to the data source, Access does just that by invoking ODBC’s version of ucase.

If we then create a custom VBA function that emulates UCase() function:

Public Function MyUCase(InputValue As Variant) As String
    MyUCase = UCase(InputValue)
End Function
A custom uppercasing VBA function.
and changed the filtering in the query to:

WHERE MyUCase([c].[CityName])="BOSTON";
Modified WHERE clause invoking a custom VBA function.
This is what we get:

SQLExecDirect: 
SELECT 
   "CityName"
  ,"c"."CityID" 
FROM "Application"."Cities" "c" 
Traced ODBC SQL for evaluating the query with the custom VBA function using a column as the input.
Access is unable to remote the custom VBA function MyUCase back to the data source. However, the saved query’s SQL is legal so Access has to satisfy it somehow. To do this, it ends up downloading the full set of the CityName and its corresponding CityID in order to pass into the VBA function MyUCase() and evaluate the result. Consequently the query now performs much more slowly because Access is now requesting more data and doing more work.

Though we used UCase() in this example, we clearly can see that it’s generally better to remote as much work as possible to the data source. But what if we have a complex VBA function that cannot be rewritten into the data source’s native SQL dialect? Though I think this scenario is quite rare, it’s worth considering. Let’s suppose that we can add a filter to narrow the set of cities returned.

SELECT 
   c.CityID
  ,c.CityName
  ,c.StateProvinceID
FROM Cities AS c
WHERE c.CityName LIKE "Bos*"
  AND MyUCase([c].[CityName])="BOSTON";
An Access saved query with compound filters.
The traced ODBC SQL will come out like thus:

SQLExecDirect: 
SELECT 
   "CityName"
  ,"c"."CityID" 
FROM "Application"."Cities" "c" 
WHERE ("CityName" LIKE 'Bos%' ) 
Traced ODBC SQL for the saved query with compound filter.
Access is able to remote the LIKE back to the data source, which results in getting back much smaller dataset. It will still perform local evaluation of the MyUCase() on the resulting dataset. The query runs much faster simply because of the smaller dataset returned.

This tells us if we face the undesirable scenario where we cannot easily refactor out a complex VBA function from a query, we still can mitigate the bad effects by adding filters that can be remoted to reduce the initial set of records for Access to work with.

A note on sargability

In the preceding examples, we applied a scalar function on a column. That has the potential to render the query as “non-sargable” which means that the database engine is unable to optimize the query using index to search and find matches. The “sarg” portion of the word “sargability” refers to “Search ARGument”. Suppose we have the index defined at the data source on the table:

CREATE INDEX IX_Cities_CityName
ON Application.Cities (CityName);
Transact-SQL CREATE INDEX definition for the Cities table in the SQL Server database.
Expressions such as UCASE(CityName) prevents the database engine from being able to use the index IX_Cities_CityName because the engine is forced to evaluate each row one by one to find match, just like Access did with a custom VBA function. Some database engines such as recent versions of SQL Server supports creating indices based on an expression. If we wanted to optimize the queries using UCASE() transact-SQL function, we could adjust the index definition:

CREATE INDEX IX_Cities_Boston_Uppercase
ON Application.Cities (CityName)
WHERE UCASE(CityName) = 'BOSTON';
Transact-SQL CREATE INDEX definition for the Cities table in the SQL Server database with an expression.
This enables SQL Server to treat the query with WHERE UCase(CityName) = 'BOSTON' as a sargable query because it now can use the index IX_Cities_Boston_Uppercase to return the matching records. However, if the query matched on 'CLEVELAND' instead of 'BOSTON', the sargability is lost.

Regardless of which database engine you are actually working with, it’s always preferable to design and use sargable queries wherever possible to avoid performance problems. Crucial queries ought to have covering indices to provide best performance. I encourage you to study more about the sargability and covering indices to help you avoid designing queries that are in fact non-sargable.

Conclusions

We reviewed how Access handles applying filters from Access SQL into the ODBC queries. We also explored different cases where Access will convert different types of references into a parameter, allowing Access to perform the evaluation outside the ODBC layer and passing them as inputs into the prepared ODBC statement. We also looked at what happens when it cannot be parameterized, typically due to containing column references as inputs. That can have consequences on the performance during a migration to SQL server.

For certain functions, Access may be able to convert the expression to use ODBC scalar functions instead, which allows Access to remote the expression to the ODBC data source. One ramification of this is that if the implementation of the scalar function is different, it may cause the query to behave differently or may perform more faster/slowly. We saw how a VBA function, even a simple one that wraps a otherwise remotable scalar function can defeat the efforts to remote the expression. We also learn that if we have a situation where we cannot refactor out a complex VBA function from an Access query/recordsource/rowsource, we can at least mitigate the expensive download by adding additional filters on the query that can be remoted to reduce the amount of data returned.

In the next article we will look at how joins are handled by Access.

Looking for help with Microsoft Access? Call our experts today on 773-809-5456 or email us at sales@itimpact.com.