Filtering the recordset
In part 5 of our series, we will learn how Microsoft Access handles implemented filters and integrates them into ODBC queries. In the prior article, we saw how Access will formulate the SELECT
statements in the ODBC SQL commands. We also saw in the previous article how Access will try to update a row by applying a WHERE
clause based on the key and if applicable, rowversion. However, we need to learn how Access will handle the filters that are provided to the Access queries and translate them in the ODBC layer. There are different approaches Access may use depending on how the Access queries is formulated and you will learn how to predict how Access will translate an Access query into an ODBC query for different filter predicate given.
Regardless of how you actually apply filter — be it interactively via form or datasheet’s ribbon commands or right-menu clicks, or programmatically using VBA or running saved queries — Access will issue a corresponding ODBC SQL query to perform the filtering. In general, Access will try to remote as much as filtering as is possible. However, it will not tell you if it cannot do so. Instead, if Access cannot express the filter using ODBC SQL syntax, it will instead attempt to perform filtering itself by downloading the entire contents of table and evaluate the condition locally. That can explain why sometime you may encounter a query that runs quickly but with one small change, slows down to a crawl. This section hopefully will help you understand when this may happen and how to handle it so that you can help Access remote as much as possible to the data sources for applying the filter.
For this article, we will use saved queries but the information discussed here should still apply to other methods of applying filters.
Static filters
We will start off easy and create a saved query with a filter hard-coded.
SELECT c.CityID ,c.CityName ,c.StateProvinceID FROM Cities AS c WHERE c.CityName="Boston";
SQLExecDirect: SELECT "c"."CityID" FROM "Application"."Cities" "c" WHERE ("CityName" = 'Boston' )
CityID
was selected because by default a query uses a dynaset-type recordset which we discussed already in the prior section.
Simple parameterized filters
Let’s change the SQL to use a parameter instead:
PARAMETERS SelectedCityName Text ( 255 ); SELECT c.CityID ,c.CityName ,c.StateProvinceID FROM Cities AS c WHERE c.CityName=[SelectedCityName];
SQLExecDirect: SELECT "c"."CityID" FROM "Application"."Cities" "c" WHERE ("CityName" = ? )
SELECT c.CityID ,c.CityName ,c.StateProvinceID FROM Cities AS c WHERE c.CityName=[Forms]![frmSomeForm]![txtSomeText];
PARAMETERS
statement. This shows that Access is able to recognize that such control references which can have its value changed from time to time, are best treated as a parameter when formulating the ODBC SQL.
That also works for the VBA function. We can add a new VBA function:
Public Function GetSelectedCity() As String GetSelectedCity = "Boston" End Function
WHERE c.CityName=GetSelectedCity();
However, there is one more common scenario that Access developers typically set up and that is creating dynamic SQL with VBA code, usually by concatenating a string and then executing the concatenated string. Let’s use the following VBA code:
Public Sub GetSelectedCities() Dim db As DAO.Database Dim rs As DAO.Recordset Dim fld As DAO.Field Dim SelectedCity As String Dim SQLStatement As String SelectedCity = InputBox("Enter a city name") SQLStatement = _ "SELECT c.CityID, c.CityName, c.StateProvinceID " & _ "FROM Cities AS c " & _ "WHERE c.CityName = '" & SelectedCity & "';" Set db = CurrentDb Set rs = db.OpenRecordset(SQLStatement) Do Until rs.EOF For Each fld In rs.Fields Debug.Print fld.Value; Next Debug.Print rs.MoveNext Loop End Sub
OpenRecordset
is as follow:
SQLExecDirect: SELECT "c"."CityID" FROM "Application"."Cities" "c" WHERE ("CityName" = 'Boston' )
VBA.InputBox
. We simply handed it the constructed SQL which from Access’ POV, is just a static SQL statement. In this case, we defeat the parameterization of the query. It’s important to recognize that a popular advice given out to Access developers has been that dynamically constructed SQL is better than using parameter queries because it avoids the problem where Access engine may generate an execution plan based on one parameter value which may be actually suboptimal for another parameter value. For more details about that phenomenon, I encourage you to read up on the problem of “parameter-sniffing”. Note that this is a general problem for any database engines, not just Access. However, in Access’ case, dynamic SQL worked better because it is much cheaper just to generate a new execution plan. In contrast, a RDBMS engine may have additional strategies for handling the problem and may be more sensitive to having too many one-off execution plans as that can negatively impact its caching.
For that reason, parameterized queries from Access against ODBC sources may be preferable over dynamic SQL. Because Access will treat the references controls on a form or VBA functions that does not require column references as parameters, you do not need explicit parameters in your recordsources or rowsources. However, if you are using VBA to execute SQL, it’s usually better to use ADO which also has much better support for parameterization. In the case of building a dynamic recordsource or rowsource, using a hidden control on the form/report can be an easy way to parameterize the query. However, if the query is markedly different, building the dynamic SQL in VBA and assigning it to the recordsource/rowsource property effectively forces a full recompilation and therefore avoid using bad execution plans that will not perform well for the current set of inputs. You may find recommendations in the article discussing SQL Server’s WITH RECOMPILE
helpful in deciding whether to force a recompilation versus using a parameterized query.
Using functions in SQL filtering
In previous section, we saw that a SQL statement containing a VBA function got parameterized so that Access could execute the VBA function and use the output as the input to the parameterized query. However, not all built-in functions behave this way. Let’s use UCase()
as an example to filter the query. Furthermore, we will apply the function on a column.
SELECT c.CityID ,c.CityName ,c.StateProvinceID FROM Cities AS c WHERE UCase([c].[CityName])="BOSTON";
UCase()
function applied on a column.SQLExecDirect: SELECT "c"."CityID" FROM "Application"."Cities" "c" WHERE ({fn ucase("CityName" )}= 'BOSTON' )
UCase()
function.GetSelectedCity()
since it required no inputs from columns referenced in the query. However, the UCase()
requires an input. Had we provided UCase("Boston")
, Access would have parameterized this away, too. However, the input is a column reference, which Access cannot easily parameterize away. However, Access can detect that the UCase()
is one of supported ODBC scalar functions. Since we prefer remoting as much as possible to the data source, Access does just that by invoking ODBC’s version of ucase
.
If we then create a custom VBA function that emulates UCase()
function:
Public Function MyUCase(InputValue As Variant) As String MyUCase = UCase(InputValue) End Function
WHERE MyUCase([c].[CityName])="BOSTON";
WHERE
clause invoking a custom VBA function.SQLExecDirect: SELECT "CityName" ,"c"."CityID" FROM "Application"."Cities" "c"
MyUCase
back to the data source. However, the saved query’s SQL is legal so Access has to satisfy it somehow. To do this, it ends up downloading the full set of the CityName
and its corresponding CityID
in order to pass into the VBA function MyUCase()
and evaluate the result. Consequently the query now performs much more slowly because Access is now requesting more data and doing more work.
Though we used UCase()
in this example, we clearly can see that it’s generally better to remote as much work as possible to the data source. But what if we have a complex VBA function that cannot be rewritten into the data source’s native SQL dialect? Though I think this scenario is quite rare, it’s worth considering. Let’s suppose that we can add a filter to narrow the set of cities returned.
SELECT c.CityID ,c.CityName ,c.StateProvinceID FROM Cities AS c WHERE c.CityName LIKE "Bos*" AND MyUCase([c].[CityName])="BOSTON";
SQLExecDirect: SELECT "CityName" ,"c"."CityID" FROM "Application"."Cities" "c" WHERE ("CityName" LIKE 'Bos%' )
LIKE
back to the data source, which results in getting back much smaller dataset. It will still perform local evaluation of the MyUCase()
on the resulting dataset. The query runs much faster simply because of the smaller dataset returned.
This tells us if we face the undesirable scenario where we cannot easily refactor out a complex VBA function from a query, we still can mitigate the bad effects by adding filters that can be remoted to reduce the initial set of records for Access to work with.
A note on sargability
In the preceding examples, we applied a scalar function on a column. That has the potential to render the query as “non-sargable” which means that the database engine is unable to optimize the query using index to search and find matches. The “sarg” portion of the word “sargability” refers to “Search ARGument”. Suppose we have the index defined at the data source on the table:
CREATE INDEX IX_Cities_CityName ON Application.Cities (CityName);
CREATE INDEX
definition for the Cities table in the SQL Server database.UCASE(CityName)
prevents the database engine from being able to use the index IX_Cities_CityName
because the engine is forced to evaluate each row one by one to find match, just like Access did with a custom VBA function. Some database engines such as recent versions of SQL Server supports creating indices based on an expression. If we wanted to optimize the queries using UCASE()
transact-SQL function, we could adjust the index definition:
CREATE INDEX IX_Cities_Boston_Uppercase ON Application.Cities (CityName) WHERE UCASE(CityName) = 'BOSTON';
CREATE INDEX
definition for the Cities table in the SQL Server database with an expression.WHERE UCase(CityName) = 'BOSTON'
as a sargable query because it now can use the index IX_Cities_Boston_Uppercase
to return the matching records. However, if the query matched on 'CLEVELAND'
instead of 'BOSTON'
, the sargability is lost.
Regardless of which database engine you are actually working with, it’s always preferable to design and use sargable queries wherever possible to avoid performance problems. Crucial queries ought to have covering indices to provide best performance. I encourage you to study more about the sargability and covering indices to help you avoid designing queries that are in fact non-sargable.
Conclusions
We reviewed how Access handles applying filters from Access SQL into the ODBC queries. We also explored different cases where Access will convert different types of references into a parameter, allowing Access to perform the evaluation outside the ODBC layer and passing them as inputs into the prepared ODBC statement. We also looked at what happens when it cannot be parameterized, typically due to containing column references as inputs. That can have consequences on the performance during a migration to SQL server.
For certain functions, Access may be able to convert the expression to use ODBC scalar functions instead, which allows Access to remote the expression to the ODBC data source. One ramification of this is that if the implementation of the scalar function is different, it may cause the query to behave differently or may perform more faster/slowly. We saw how a VBA function, even a simple one that wraps a otherwise remotable scalar function can defeat the efforts to remote the expression. We also learn that if we have a situation where we cannot refactor out a complex VBA function from an Access query/recordsource/rowsource, we can at least mitigate the expensive download by adding additional filters on the query that can be remoted to reduce the amount of data returned.
In the next article we will look at how joins are handled by Access.
Looking for help with Microsoft Access? Call our experts today on 773-809-5456 or email us at sales@itimpact.com.