Authored by Ben Clothier
The other day Juan needed my help with a query that required a special technique commonly known as the Strawberry query. He needed a way to match one group of records with the first occurrence of a item in another table. From that I decided to write this post to share the technique.
There are times where we want to not just know that a maximum or minimum value in a table but also know which row contains that maximum value. This can be a challenge as SQL does not make it easy to express such a question, but strawberry query is a wonderful answer to that question. To help appreciate the technique, we’ll be using a running example. Suppose we have a company that collects and sells rare & highly prized collector items and we want to know who is the best performing salesperson. Here’s our Sales data for that hypotehical company.
Full Name Product Date Sales Amount Ed William 1957 Nourve de Champenge 7/12/11 $ 3,820 Ed William Prince Edward's Spare Tire 7/21/11 $ 8,120 Ed William 12th Centrury Ming Vase 7/31/11 $21,100 Ed William Bruce Springsteen's wristwatch 8/13/11 $ 790 Ed William Cracked Darth Vader Helmet 8/20/11 $15,210 Ed William Tamara de Lempicka's Amythyst 9/3/11 $ 8,500 Zooey Yoko 1855 S & W Blue Colt Revolver 7/20/11 $ 4,250 Zooey Yoko Grizzly Bear Taxidermy 8/1/11 $18,300 Zooey Yoko Ringo Starr's hair lock 8/20/11 $28,100 Zooey Yoko Portrait of unknown woman 9/7/11 $ 230 Zooey Yoko 1790 American Plow 9/12/11 $ 7,000 Rolf Smith Giuseppe's Nutcracker 7/20/11 $12,730 Rolf Smith Mother Goddess Figurine 7/27/11 $19,100 Rolf Smith Fragment of Dead Sea Scrolls 8/3/11 $11,500 Rolf Smith Shroud of Turin 8/13/11 $ 160 Rolf Smith 1962 Corvette 9/2/11 $42,200
Typically, When we want to know what was the largest sales amount in a given month, we can ask:
SELECT Month(s.SalesDate) AS SalesMonth, Max(s.AgreedPrice) AS SalesAmount FROM tblSales AS s GROUP BY Month(s.SalesDate);
This query would give us something like this:
Sales Month Sales Amount 7 $21,100.00 8 $28,100.00 9 $42,200.00
But let’s ask – which salesman had the biggest sales each month? This wouldn’t work:
SELECT Month(s.SalesDate) AS SalesMonth, sp.FullName AS SalesPerson, Max(s.AgreedPrice) AS SalesAmount FROM tblSales AS s INNER JOIN tblSalesPeople AS sp ON s.SalesPersonID = sp.SalesPersonID GROUP BY Month(s.SalesDate), sp.FullName ;
The query is actually equivalent to “give me all sales staff’s largest sales per month”. That’s a different question from “who was the salesperson that made the biggest sales in this month?”.
Sales Month Sales Person Sales Amount 7 Ed William $21,100 7 Rolf Smith $19,100 7 Zooey Yoko $ 4,250 8 Ed William $15,210 8 Rolf Smith $11,500 8 Zooey Yoko $28,100 9 Ed William $ 8,500 9 Rolf Smith $42,200 9 Zooey Yoko $ 7,000
Instead of getting only a row per month, we got a row for numbers of month times numbers of salesperson and thus 6 rows too many for this example. Fortunately, we can express this question using the ‘strawberry query.’
Strawberry Query to the rescue
A strawberry query has these essential components:
1) it is usually a self-join within one table
2) the join criteria has more than one column being compared
3) one of those join criteria is an inequality operator.
4) the outer table’s column must evaluate to NULL
.
So, to get the SalesPerson, (FullName
field below), from tblSales
where the sales amount was largest for the month, we can use this query:
SELECT Month(s.SalesDate) AS SalesMonth, s.SalesDate, p.FullName, s.AgreedPrice AS SalesAmount FROM (tblSalesPeople AS p INNER JOIN tblSales AS s ON p.SalesPersonID = s.SalesPersonID) LEFT JOIN tblSales AS c ON (s.AgreedPrice < c.AgreedPrice) AND (Month(s.SalesDate) = Month(c.SalesDate)) WHERE c.AgreedPrice IS NULL;
The query will produce this result:
Sales Month Sales Date Full Name Sales Amount 7 7/31/2011 Ed William $21,100 8 8/20/2011 Zooey Yoko $28,100 9 9/2/2011 Rolf Smith $42,200
Because the query did not use any aggregate function neither did it have a GROUP BY
clause, we could then join the SalesPerson
table, include the full name for the display as well as the actual Sales Date from the Sales table which would have been invalid in an aggregate query.
How does the strawberry query work?
The expression Month(s.SalesDate) = Month(c.SalesDate)
is functionally equivalent to doing a GROUP BY Month(SalesDate)
, and that is how we’d define our “grouping” when we use a strawberry query. The second expression, s.Amount < c.Amount
in conjunction with the WHERE c.Amount IS NULL
is equivalent to doing a MAX(Amount)
. If you’re wondering how the c.Amount IS NULL
expression could help us extract the minimum or maximum amount, it may help to look at the query if we removed the WHERE
clause.
First, we’ll reduce the query to essential skeleton of the strawberry, removing the tblSalesPerson
table and other columns then include the outer columns.
SELECT Month(s.SalesDate) AS SalesMonth, s.AgreedPrice AS SalesAmount, c.AgreedPrice AS OtherSalesAmount FROM tblSales AS s LEFT JOIN tblSales AS c ON (s.AgreedPrice < c.AgreedPrice) AND (Month(s.SalesDate) = Month(c.SalesDate)) ;
Here is the listing of the result query would return without the WHERE clause: (lower sales were omitted to reduce the number of rows shown)
SalesMonth SalesAmount OtherSalesAmount ... ... ... 7 $8,120.00 $19,100.00 7 $21,100.00 NULL ... ... ... 7 $12,730.00 $21,100.00 7 $12,730.00 $19,100.00 7 $19,100.00 $21,100.00 8 $15,210.00 $18,300.00 ... ... ... 8 $11,500.00 $28,100.00 8 $11,500.00 $18,300.00 8 $11,500.00 $15,210.00 ... ... ... 8 $18,300.00 $28,100.00 8 $15,210.00 $28,100.00 8 $28,100.00 NULL 9 $7,000.00 $8,500.00 9 $8,500.00 $42,200.00 9 $7,000.00 $42,200.00 9 $42,200.00 NULL ... ... ...
A row from left table will be matched with any other rows in right table where the amount is greater. You can see how all sales that weren’t the largest sales get matches with each other’s sales that had a larger sales amount. The only row that can’t match any other sales is when it is the largest amount, and we would get a NULL
from the outer sales table’s column. Hence the WHERE
clause filters the set down to only those rows and we get our maximum row.
Closing Notes
Use SQL View for Strawberry Queries
Because the strawberry query uses an inequality in the join criteria, Access Query Designer cannot represent so if you attempt to view the query in design, you’ll get an error. For those who love QBE, you can add all the tables you need, drop in fields, then give the tables an alias so that it’s a simple editing when you switch to SQL and add that inequality criteria.
Optimization
In the example, I used Month(SalesDate)
which is not sargable and therefore the query isn’t as optimized as it could be. A sargable criteria would be among the lines of SalesDate BETWEEN #yyyy-mm-dd# AND #yyyy-mm-dd#
but that wasn’t appropriate in this situation because that’d have limited us to only a single month when we want to query for each month for all possible sales. If one wanted to optimize the performance further using “group by months”, it may be necessary to consider approaches such as creating denormalized column to store the month number & indexing upon it or break the SalesDate
column into SalesYear
, SalesMonth
, and SalesDay column
. If we’re using different backends such as SQL Server, we can then use additional features such as persisted computed column or similar tools.
Having index on the column where you apply the criteria in the strawberry query will ensure that the query runs fast, handy when you start to deal with larger datasets.
If there’s any other query patterns you have noticed or used, we’d love it if you can share with us.
“s.Amount < c.Amount” in conjunction with the “WHERE c.Amount IS NULL
I don't see either expression in the example while andypluck's comment uses them
How could you get a list of each Salesmans best sale ever (global maximum)?
By grouping on the sales person, you can get the maximum for each salesperson ever.
SELECT sp.FullName AS SalesPerson,
Max(s.AgreedPrice) AS SalesAmount
FROM tblSales AS s
INNER JOIN tblSalesPeople AS sp
ON s.SalePersonID = sp.SalePesonID
GROUP BY sp.FullName
However, if you wanted to also see what date they made the biggest sale ever, then you need strawberry query again, but replace the Month(s.SaleDate) from the ON clause of the non-equi join with s.SalePersonID to achieve grouping by salesperson instead of months.
I hope that helps! Thanks for reading & commenting!
SELECT s.SalesDate,
p.FullName,
s.SalesAmount
Or I can answer my own Question
FROM (tblSalesPeople AS p
INNER JOIN tblSales AS s
ON p.SalesPersonID = s.SalesPersonID)
LEFT JOIN tblSales AS c
ON (s.SalesAmount < c.SalesAmount)
AND (Month(s.SalesDate) = Month(c.SalesDate))
WHERE c.SalesAmount IS NULL;
Another approach, if your query/view resides in SQL Server (and in my opinion easier to read) is:
With SalesRanked as (
SELECT Month(s.SalesDate) AS SalesMonth,
s.SalesDate,
p.FullName,
s.AgreedPrice AS SalesAmount,
ROW_NUMBER() over(partition by month(s.salesdate) order by s.agreedprice desc) as ‘Rnum’
FROM tblSalesPeople AS p
INNER JOIN tblSales AS s
ON p.SalesPersonID = s.SalesPersonID
)
select *
from SalesRanked
where Rnum = 1;
No fruit required 😉
That’s a great use of windowing function, Wanderlei. I agree that it is more easier to read & conceptualize but when you’re on a RDBMS that doesn’t have windowing functions, well, it’s good to have fruits handy! 🙂
Thanks for sharing!
Hi Ben
“First, we’ll reduce the query to essential skeleton of the strawberry, removing the SalesPerson table and other columns then include the outer columns.”
should be:
“First, we’ll reduce the query to essential skeleton of the strawberry, removing the tblSalesPeople table and other columns then include the outer columns.”
Why the fruity name and who invented it?
Thanks for pointing this out, Giorgio.
As for the origin of the fruity name, I picked it up from MySQL newsgroup which as far as I understand it, that kind of question is very common and the user named ‘strawberry’ would always answer this question with the above SQL that the denizens there came to call it ‘strawberry query’. Because it was so unusual name, it made it more memorable so I’ve been calling it since then. Hopefully, it’s sufficiently memorable for you when the time comes for you to need it, you’ll remember. 🙂
Thanks for reading & commenting!