Go to Top

Query Pattern: Finding the maximum/minimum and getting the full row

Authored by Ben Clothier

The other day Juan needed my help with a query that required a special technique commonly known as the Strawberry query. He needed a way to match one group of records with the first occurrence of a item in another table. From that I decided to write this post to share the technique.

ADVERTISING
ACCESS SAFETY AND TRAINING DATABASE

Access can work great in large companies, call us to discuss

There are times where we want to not just know that a maximum or minimum value in a table but also know which row contains that maximum value. This can be a challenge as SQL does not make it easy to express such a question, but strawberry query is a wonderful answer to that question. To help appreciate the technique, we’ll be using a running example. Suppose we have a company that collects and sells rare & highly prized collector items and we want to know who is the best performing salesperson. Here’s our Sales data for that hypotehical company.

 

 

Typically, When we want to know what was the largest sales amount in a given month, we can ask:

 

This query would give us something like this:

But let’s ask – which salesman had the biggest sales each month? This wouldn’t work:

The query is actually equivalent to “give me all sales staff’s largest sales per month”. That’s a different question from “who was the salesperson that made the biggest sales in this month?”.

Instead of getting only a row per month, we got a row for numbers of month times numbers of salesperson and thus 6 rows too many for this example. Fortunately, we can express this question using the ‘strawberry query.’

Strawberry Query to the rescue

A strawberry query has these essential components:

1) it is usually a self-join within one table
2) the join criteria has more than one column being compared
3) one of those join criteria is an inequality operator.
4) the outer table’s column must evaluate to NULL.

So, to get the SalesPerson, (Full Name field below), from tblSales where the sales amount was largest for the month, we can use this query:

The query will produce this result:

 

Because the query did not use any aggregate function neither did it have a GROUP BY clause, we could then join the SalesPerson table, include the full name for the display as well as the actual Sales Date from the Sales table which would have been invalid in an aggregate query.

How does the strawberry query work?

The expression “Month(s.SalesDate) = Month(c.SalesDate)” is functionally equivalent to doing a “GROUP BY Month(SalesDate)”, and that is how we’d define our “grouping” when we use a strawberry query. The second expression, “s.Amount < c.Amount” in conjunction with the “WHERE c.Amount IS NULL;” is equivalent to doing a “MAX(Amount)”. If you’re wondering how the “c.Amount IS NULL” expression could help us extract the minimum or maximum amount, it may help to look at the query if we removed the WHERE clause.

First, we’ll reduce the query to essential skeleton of the strawberry, removing the tblSalesPerson table and other columns then include the outer columns.

 

Here is the listing of the result query would return without the WHERE clause: (lower sales were omitted to reduce the number of rows shown)

A row from left table will be matched with any other rows in right table where the amount is greater. You can see how all sales that weren’t the largest sales get matches with each other’s sales that had a larger sales amount. The only row that can’t match any other sales is when it is the largest amount, and we would get a NULL from the outer sales table’s column. Hence the WHERE clause filters the set down to only those rows and we get our maximum row.

Closing Notes

Use SQL View for Strawberry Queries

Because the strawberry query uses an inequality in the join criteria, Access Query Designer cannot represent so if you attempt to view the query in design, you’ll get an error. For those who love QBE, you can add all the tables you need, drop in fields, then give the tables an alias so that it’s a simple editing when you switch to SQL and add that inequality criteria.

Optimization

In the example, I used “Month(SalesDate)” which is not sargable and therefore the query isn’t as optimized as it could be. A sargable criteria would be among the lines of “SalesDate BETWEEN #yyyy-mm-dd# AND #yyyy-mm-dd#” but that wasn’t appropriate in this situation because that’d have limited us to only a single month when we want to query for each month for all possible sales. If one wanted to optimize the performance further using “group by months”, it may be necessary to consider approaches such as creating denormalized column to store the month number & indexing upon it or break the SalesDate column into SalesYear, SalesMonth, and SalesDay column. If we’re using different backends such as SQL Server, we can then use additional features such as persisted computed column or similar tools.

Having index on the column where you apply the criteria in the strawberry query will ensure that the query runs fast, handy when you start to deal with larger datasets.

If there’s any other query patterns you have noticed or used, we’d love it if you can share with us.

8 Responses to "Query Pattern: Finding the maximum/minimum and getting the full row"

  • Giorgio
    January 18, 2017 - 4:29 pm Reply

    “s.Amount < c.Amount” in conjunction with the “WHERE c.Amount IS NULL

    I don't see either expression in the example while andypluck's comment uses them

  • andiepluck
    March 9, 2012 - 9:01 pm Reply

    How could you get a list of each Salesmans best sale ever (global maximum)?

    • Ben Clothier
      March 9, 2012 - 9:10 pm Reply

      By grouping on the sales person, you can get the maximum for each salesperson ever.

      SELECT sp.FullName AS SalesPerson,
      Max(s.AgreedPrice) AS SalesAmount
      FROM tblSales AS s
      INNER JOIN tblSalesPeople AS sp
      ON s.SalePersonID = sp.SalePesonID
      GROUP BY sp.FullName

      However, if you wanted to also see what date they made the biggest sale ever, then you need strawberry query again, but replace the Month(s.SaleDate) from the ON clause of the non-equi join with s.SalePersonID to achieve grouping by salesperson instead of months.

      I hope that helps! Thanks for reading & commenting!

    • andiepluck
      March 9, 2012 - 9:10 pm Reply

      SELECT s.SalesDate,
      p.FullName,
      s.SalesAmount
      Or I can answer my own Question

      FROM (tblSalesPeople AS p
      INNER JOIN tblSales AS s
      ON p.SalesPersonID = s.SalesPersonID)
      LEFT JOIN tblSales AS c
      ON (s.SalesAmount < c.SalesAmount)
      AND (Month(s.SalesDate) = Month(c.SalesDate))
      WHERE c.SalesAmount IS NULL;

  • Wanderlei Santos
    January 28, 2012 - 12:45 am Reply

    Another approach, if your query/view resides in SQL Server (and in my opinion easier to read) is:

    With SalesRanked as (
    SELECT Month(s.SalesDate) AS SalesMonth,
    s.SalesDate,
    p.FullName,
    s.AgreedPrice AS SalesAmount,
    ROW_NUMBER() over(partition by month(s.salesdate) order by s.agreedprice desc) as ‘Rnum’
    FROM tblSalesPeople AS p
    INNER JOIN tblSales AS s
    ON p.SalesPersonID = s.SalesPersonID
    )
    select *
    from SalesRanked
    where Rnum = 1;

    No fruit required 😉

    • Ben Clothier
      January 28, 2012 - 12:56 am Reply

      That’s a great use of windowing function, Wanderlei. I agree that it is more easier to read & conceptualize but when you’re on a RDBMS that doesn’t have windowing functions, well, it’s good to have fruits handy! 🙂

      Thanks for sharing!

  • Giorgio
    November 18, 2011 - 4:20 pm Reply

    Hi Ben
    “First, we’ll reduce the query to essential skeleton of the strawberry, removing the SalesPerson table and other columns then include the outer columns.”
    should be:
    “First, we’ll reduce the query to essential skeleton of the strawberry, removing the tblSalesPeople table and other columns then include the outer columns.”
    Why the fruity name and who invented it?

    • Ben Clothier
      November 18, 2011 - 10:14 pm Reply

      Thanks for pointing this out, Giorgio.

      As for the origin of the fruity name, I picked it up from MySQL newsgroup which as far as I understand it, that kind of question is very common and the user named ‘strawberry’ would always answer this question with the above SQL that the denizens there came to call it ‘strawberry query’. Because it was so unusual name, it made it more memorable so I’ve been calling it since then. Hopefully, it’s sufficiently memorable for you when the time comes for you to need it, you’ll remember. 🙂

      Thanks for reading & commenting!

Leave a Reply

Your email address will not be published. Required fields are marked *

 

Contact Us
[gravityform id="16" title="false" description="false"]