Effect of joins in a recordset

In our sixth and final article of the ODBC tracing series, we are going to take a look at how Access will handle joins in Access queries. In the previous article, you saw how filters are handled by Access. Depending on the expression, Access may choose to parameterize it away or may be forced to evaluate it itself by downloading all the input data then performing the evaluations locally. In this article, we will focus on joins. When you think about it, joins are actually a special kind of filter. Therefore, in theory, Access ought to remote as much as possible even with joins. You might typically see joins written in the following pseudo-SQL:

FROM a INNER JOIN b ON a.ID = b.ID
A typical JOIN syntax.
However, it can be considered equivalent to the following syntax:

FROM a, b WHERE a.ID = b.ID
A theta-join syntax (old sytle, not broadly used).
That illustrates that even though we may use the more readable and familiar JOIN..ON, Access is free to treat it as a WHERE which is helpful in situations where Access cannot fully remote the query. But here’s the rub… when does Access decide to remote the joins? Let’s try a simple join query:

SELECT 
   c.CityID
  ,c.StateProvinceID
  ,c.CityName
  ,s.StateProvinceName
FROM Cities AS c 
INNER JOIN StateProvinces AS s 
  ON c.StateProvinceID = s.StateProvinceID;
A simple Access query with a join on 2 linked tables.
If we trace that query, we will see the following output:

SQLExecDirect: 
SELECT 
   "c"."CityID"
  ,"s"."StateProvinceID" 
FROM "Application"."Cities" "c",
     "Application"."StateProvinces" "s" 
WHERE ("c"."StateProvinceID" = "s"."StateProvinceID" ) 

SQLPrepare: 
SELECT 
  "CityID"
 ,"CityName"
 ,"StateProvinceID"  
FROM "Application"."Cities"  
WHERE "CityID" = ?

SQLExecute: (GOTO BOOKMARK)

SQLPrepare: 
SELECT 
   "StateProvinceID"
  ,"StateProvinceName"  
FROM "Application"."StateProvinces"  
WHERE "StateProvinceID" = ?

SQLExecute: (GOTO BOOKMARK)

SQLPrepare: 
SELECT 
   "StateProvinceID"
  ,"StateProvinceName"  
FROM "Application"."StateProvinces"  
WHERE "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ? 
   OR "StateProvinceID" = ?

SQLExecute: (MULTI-ROW FETCH)

SQLPrepare: 
SELECT 
   "CityID"
  ,"CityName"
  ,"StateProvinceID"  
FROM "Application"."Cities"  
WHERE "CityID" = ? 
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?  
   OR "CityID" = ?

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)
Traced ODBC SQL syntax for the query with a join.
Access decided to not remote the join, even though the original Access query is perfectly capable of being executed at SQL Server. Instead, it got the IDs from each table in a theta-join, then set up 2 separate chains of queries as if we had opened 2 dynaset-type recordsets. The two different prepared queries are then fed the keys for the respective tables from the first query. Predictably, that can be a lot of chatter to go over the network.

If we change the same Access query to be a snapshot-type instead of the default dynaset-type, we get:

SQLExecDirect: 
SELECT 
   "c"."CityID"
  ,"c"."CityName"
  ,"c"."StateProvinceID"
  ,"s"."StateProvinceName"  
FROM "Application"."Cities" "c",
     "Application"."StateProvinces" "s" 
WHERE ("c"."StateProvinceID" = "s"."StateProvinceID" )
Traced ODBC SQL for the snapshot-type query.
So Access does remote the joins just fine in the case of snapshot-type query. Why did not Access do that with the original dynaset-type query? The clue is in the following screenshot where we attempt to edit both tables’ columns in the following screenshot:

How does Access talk to ODBC data sources? Part 6

Such query allows updating to both columns. That is not actually expressible in SQL but such action is legal for user to perform. Therefore, to execute that update, Access would submit the following ODBC SQL:

SQLExecDirect: 
UPDATE "Application"."StateProvinces" 
SET "StateProvinceName"=?  
WHERE "StateProvinceID" = ? 
  AND "StateProvinceName" = ?

SQLExecDirect: 
UPDATE "Application"."Cities" 
SET "CityName"=?  
WHERE "CityID" = ? 
  AND "CityName" = ? 
  AND "StateProvinceID" = ?
Traced ODBC SQL syntax for updating the Access query with a join.
That would not be possible if Access did not have the information required to update each table, which explains why Access chose to not remote the join when resolving the original dynaset-type query. The lesson here is that if you do not need a query to be updatable, and the resulting data is small enough, it might be better to convert the query into a snapshot type. In the case where you need to formulate a complex recordsource, you usually will get much better performance using a SQL view as the base than doing the joins on Access side.

To prove this, we will create a SQL view and link it to Access:

CREATE VIEW dbo.vwCitiesAndStates AS
SELECT 
  c.CityID
  ,c.StateProvinceID
  ,c.CityName
  ,s.StateProvinceName
FROM Application.Cities AS c 
INNER JOIN Application.StateProvinces AS s 
  ON c.StateProvinceID = s.StateProvinceID;
Transact-SQL to create a view joining the cities and the states table.
We then adjust the Access query as follows:

SELECT 
   c.CityID
  ,c.StateProvinceID
  ,c.CityName
  ,c.StateProvinceName
FROM vwCitiesAndStates AS c;
Adjusted Access query using a SQL view instead of joining linked tables.
If we then repeat the update we tried originally, we should see the following traced ODBC SQL:

SQLExecDirect: 
SELECT "c"."CityID" 
FROM "dbo"."vwCitiesAndStates" "c" 

SQLPrepare: 
SELECT 
   "CityID"
  ,"StateProvinceID"
  ,"CityName"
  ,"StateProvinceName"  
FROM "dbo"."vwCitiesAndStates"  
WHERE "CityID" = ?

SQLExecute: (GOTO BOOKMARK)

SQLPrepare: 
SELECT 
   "CityID"
  ,"StateProvinceID"
  ,"CityName"
  ,"StateProvinceName"  
FROM "dbo"."vwCitiesAndStates"  
WHERE "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ? 
  OR "CityID" = ?

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (MULTI-ROW FETCH)

SQLExecute: (GOTO BOOKMARK)

SQLExecDirect: 
UPDATE "dbo"."vwCitiesAndStates" 
SET "CityName"=?,
    "StateProvinceName"=?  
WHERE "CityID" = ?
  AND "StateProvinceID" = ?
  AND "CityName" = ? 
  AND "StateProvinceName" = ?
Traced ODBC SQL for selecting & updating on the linked SQL view
This demonstrates that using SQL views to “remote” the joins, Access will only work with a single source, rather than with 2 tables and remote the update on the view fully to SQL Server. One side effect is that this update will now fail with the error message:

How does Access talk to ODBC data sources? Part 6

The error message shown when updating columns from more than one table on a SQL view.
That shouldn’t come as a surprise since we were doing a UPDATE on a single source whereas in the original example, Access was actually secretly issuing two separate UPDATE statements on each individual table. Hopefully that helps make the case that you should avoid doing joins in Access queries/recordsources/rowsources especially when they need to be updatable. If they don’t, use snapshot where feasible.

A quick note regarding heterogeneous joins

We need to comment regarding joins between two linked tables that comes from two different ODBC data sources. Such joins are “heterogeneous” because Access must handle the joins locally since each data sources are assumed to not know of each other. Regardless of whether you specify a dynaset-type or snapshot-type recordsets, Access must fetch the full set of keys from each data source and resolve the joins by sending separate parameterized queries to each data source. If updating is allowed, Access will formulate a separate UPDATE query to each data source that needs to be updated. It is also important to note that a join between two linked tables that comes from two different databases each is still considered by Access as heterogeneous. That is still true even if the two databases are on the same server and you have no problem doing cross database queries. In this scenario, a SQL view can help cut down on the additional chatter by hiding the cross database joins from Access similar to what we saw already in this article.

Outer join syntax difference

As long as the outer joins do not affect the updatability of the Access query, Access will handle it similarly to how it handled the inner join version. If we modify the same query we used to be an left join, the traced ODBC SQL will output the key population query like so:

SQLExecDirect: 
SELECT 
   "c"."CityID"
  ,"s"."StateProvinceID" 
FROM {oj 
	"Application"."Cities" "c" 
	LEFT OUTER JOIN "Application"."StateProvinces" "s" 
		ON ("c"."StateProvinceID" = "s"."StateProvinceID" ) 
}
The first statement from the traced ODBC SQL for an Access query with an outer join.
The syntax looks quite different from what you might expect in other SQL dialects. That is because ODBC SQL grammar requires that any outer joins be wrapped in an {oj ...} expression. For more details on that syntax, consult the documentation. For our purpose, we can just disregard the {oj and the closing } as noise.

Conclusions

We saw that joins are treated as if they are a kind of filter and Access will try to remote the joins where it is allowed to. One particular area to pay close attention is the fact that by default we use dynaset type recordsets and Access will make no assumptions about whether we want to allow modifying so and so columns in the recordset and goes out of it way to make it possible for us to update to two tables which actually is not easily expressed in standard SQL. As a consequence, Access will do much more work to support updatability for a query that contains joins which can negatively impact the performance.

We can help avoid the penalty by using SQL views in place of joins expressed in an Access query. The tradeoff is that we are then subject to the updatability rules of a SQL view; we might not be allowed to update two tables at the same time. Usually because a well-designed Access form will only represent a single table to update, that is not much of a restriction and is a good discipline to follow.

With that, the current series is done. However, the learning that the series hopefully sparks should not be done. I sincerely hope you found the series useful and look forward to hearing about new insights you gained from using tools to help analyze and address performance issues with Access applications using ODBC data sources. Feel free to leave comments or request for more information and thanks for reading together!

For further assistance with anything Microsoft Access related, call our experts on 773-809-5456 or email us at sales@itimpact.com.