Typically, reporting is done at a higher granularity. For example, clients commonly want a monthly report of sales. The database stores each individual sale as a single record, so it's no problem to sum the figures up to the month. Ditto for the year, or for rolling a sub-category up into its category.
But suppose they need to go down to a lower granularity? More likely than not, the answer will be "the database design is no good; scrap it and start over!" After all, having the right granularity for your data is essential to a solid database. But this wasn't a case of poor normalization. Let's consider the need to account for inventory and revenues, handling them in a FIFO manner. I will quickly step aside to point out that I am not a CPA, and any accounting claims I make are to be treated with utmost suspicion. When in doubt, call your accountant.
With the disclaimer out of the way, let's look at how we currently store the data. In this example, we need to record the purchases of products, and then record the sales of the products we bought.
Suppose that for a single product we have 3 purchases:
Date | Qty | Per-Cost
9/03 | 3 | $45
9/08 | 6 | $40
9/09 | 8 | $50
We then later sell those products on different occasions at different prices:
Date | Qty | Per-Price
9/05 | 2 | $60
9/07 | 1 | $55
9/10 | 4 | $50
9/12 | 3 | $60
9/15 | 3 | $65
9/19 | 4 | $55
Note that the granularity is at a transaction level — we create a single record for each purchase and for each order. This is very common and makes logical sense – we only need to enter the quantity of products we sold, at a specified price for a particular transaction.
OK, where’s the accounting stuff you disclaimed?
For the reports, we must calculate the revenue we made on each unit of product. They tell me that the products must be processed in a FIFO manner… that is, the first unit of product that was purchased should be the first unit of product to be sold. To then calculate the margin we made on that unit of product, we must look up the cost of that particular unit and deduct it from the price it was sold for.
Gross margin = revenue of product – cost of product
Nothing earth-shattering, but wait, look at the purchases and orders! We had only 3 purchases, with 3 different cost points, and then we had 6 orders with 4 distinct price points. Which cost point goes with which price point, then?
This simple formula for calculating gross margin in a FIFO manner now requires us to go down to the granularity of an individual unit of product, which we don't have anywhere in our database. I imagine that if I suggested that the users enter one record per unit of product, there'd be a fairly loud protest and maybe some name-calling. So, what to do?
Breaking it up
Let’s say that for the accounting purpose, we will use the purchase date to sort each individual unit of the product. This is how it should come out:
Line # | Purch Date | Order Date | Per-Cost | Per-Price
1 | 9/03 | 9/05 | $45 | $60
2 | 9/03 | 9/05 | $45 | $60
3 | 9/03 | 9/07 | $45 | $55
4 | 9/08 | 9/10 | $40 | $50
5 | 9/08 | 9/10 | $40 | $50
6 | 9/08 | 9/10 | $40 | $50
7 | 9/08 | 9/10 | $40 | $50
8 | 9/08 | 9/12 | $40 | $60
9 | 9/08 | 9/12 | $40 | $60
10 | 9/09 | 9/12 | $50 | $60
11 | 9/09 | 9/15 | $50 | $65
12 | 9/09 | 9/15 | $50 | $65
13 | 9/09 | 9/15 | $50 | $65
14 | 9/09 | 9/19 | $50 | $55
15 | 9/09 | 9/19 | $50 | $55
16 | 9/09 | 9/19 | $50 | $55
17 | 9/09 | 9/19 | $50 | $55
If you study the breakdown, you can see that there are overlaps: sometimes we consume units from one purchase across several orders, while at other times a single order is fulfilled from multiple purchases.
As noted earlier, we don't actually have those 17 rows anywhere in the database. We only have the 3 rows of purchases and the 6 rows of orders. How do we get 17 rows out of either table?
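As a sanity check on the figure above, the unit-level breakdown can be reproduced with a short Python sketch. This is only an illustration of the FIFO pairing logic using the sample figures, not the SQL technique this post is about:

```python
# Transaction-level data from the example: (date, qty, per-unit amount)
purchases = [("9/03", 3, 45), ("9/08", 6, 40), ("9/09", 8, 50)]
orders    = [("9/05", 2, 60), ("9/07", 1, 55), ("9/10", 4, 50),
             ("9/12", 3, 60), ("9/15", 3, 65), ("9/19", 4, 55)]

def per_unit(rows):
    """Expand each transaction into one (date, amount) tuple per unit."""
    return [(date, amount) for date, qty, amount in rows for _ in range(qty)]

# FIFO pairing: the i-th unit purchased is matched with the i-th unit sold.
lines = [
    (i, p_date, o_date, cost, price)
    for i, ((p_date, cost), (o_date, price))
    in enumerate(zip(per_unit(purchases), per_unit(orders)), start=1)
]

assert len(lines) == 17
# Line 4 of the figure: first unit drawn from the 9/08 purchase, sold 9/10.
assert lines[3] == (4, "9/08", "9/10", 40, 50)

gross_margin = sum(price - cost for *_, cost, price in lines)
print(gross_margin)  # 970 revenue - 775 cost = 195
```

The `zip` does the FIFO matching for free because both expansions are already in date order; the SQL below has to manufacture the same unit-level rows explicitly.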
Adding more mud
But we're not done. I just gave you an idealized example where we happened to have a perfect balance: 17 units purchased matched by 17 units ordered of the same product. In real life, it's not that pretty. Sometimes we are left with excess product. Depending on the business model, it might also be possible to take more orders than what is available in the inventory. Those who play the stock market will recognize this as short-selling.
The possibility of an imbalance is also the reason why we can't take the shortcut of simply summing all costs and prices, then subtracting to get the margin. If we are left with X units, we need to know which cost points they carry in order to value the inventory. Similarly, we can't assume that an unfulfilled order will be neatly fulfilled by a single purchase with one cost point. So the calculations we come up with must work not only for the ideal example but also for cases where we have excess inventory or unfulfilled orders.
Let's first deal with the matter of figuring out how many units of product we need to consider. It's obvious that a simple SUM() of only the quantities ordered, or of only the quantities purchased, will not suffice. Rather, we must SUM() both the quantity of products purchased and the quantity of products ordered, compare the two SUM()s, and pick the higher one. We could start with this query:
WITH ProductPurchaseCount AS (
    SELECT p.ProductID,
           SUM(p.QtyBought) AS TotalPurchases
      FROM dbo.tblProductPurchase AS p
     GROUP BY p.ProductID
), ProductOrderCount AS (
    SELECT o.ProductID,
           SUM(o.QtySold) AS TotalOrders
      FROM dbo.tblProductOrder AS o
     GROUP BY o.ProductID
)
SELECT p.ProductID,
       IIF(ISNULL(pc.TotalPurchases, 0) > ISNULL(oc.TotalOrders, 0), pc.TotalPurchases, oc.TotalOrders) AS ProductTransactionCount
  FROM dbo.tblProduct AS p
  LEFT JOIN ProductPurchaseCount AS pc
    ON p.ProductID = pc.ProductID
  LEFT JOIN ProductOrderCount AS oc
    ON p.ProductID = oc.ProductID
 WHERE NOT (pc.TotalPurchases IS NULL AND oc.TotalOrders IS NULL);
What we are doing here is breaking the problem into 3 logical steps:
a) get the SUM() of the quantities purchased by products
b) get the SUM() of the quantities ordered by products
Because we don't know whether we might have a product with some purchases but no orders, or a product with orders placed but none purchased, we can't simply left join the two tables to each other. For that reason, we use the product table as the authoritative source of all the ProductIDs we want to know about, which brings us to the 3rd step:
c) match the sums to their products, determine if the product has any transaction (e.g. either purchases or orders ever made) and if so, pick the higher number of the pair. That is our count of total transactions that a product has had.
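The three steps can be sketched in Python as well. The product names and totals below are made-up illustrations, not data from the article's tables:

```python
# Hypothetical per-product totals. A product absent from a dict is one the
# LEFT JOIN would return NULL for; .get(product, 0) plays the ISNULL(..., 0).
products = ["Widget", "Gadget", "Doodad"]
total_purchased = {"Widget": 17, "Gadget": 5}   # no Doodad purchases
total_ordered   = {"Widget": 17, "Gadget": 8}   # no Doodad orders

transaction_count = {}
for product in products:                 # tblProduct is the authoritative list
    bought = total_purchased.get(product, 0)
    sold = total_ordered.get(product, 0)
    if bought == 0 and sold == 0:
        continue                         # the WHERE clause: no activity at all
    transaction_count[product] = max(bought, sold)   # the IIF(...) pick

print(transaction_count)  # {'Widget': 17, 'Gadget': 8}
```

Picking `max(bought, sold)` is what lets the same row count cover excess inventory (more bought than sold) and short-sold products (more sold than bought) alike.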
But why the transaction count?
The objective here is to figure out how many rows we need to generate per product to adequately represent each individual unit of a product that participated in either a purchase or an order. Remember, in our first ideal example we had 3 purchases and 6 orders, both of which balanced out to a total of 17 units of product purchased and then ordered. For that particular product, we need to be able to create 17 rows to generate the data we had in the figure above.
So how do we transform the single value of 17 in a row into 17 rows? That's where the magic of the tally table enters.
Assuming we revise the above query so that the last part is now a CTE named ProductTransactionCount, we can write the query thus:
<the 3 CTEs from the previous example>
INSERT INTO dbo.tblProductTransactionStaging (ProductID, TransactionNumber)
SELECT c.ProductID,
       t.Num AS TransactionNumber
  FROM ProductTransactionCount AS c
 INNER JOIN dbo.tblTally AS t
    ON c.TransactionCount >= t.Num;
And presto! We now have exactly as many rows as we will need for each product we must account for. Note the expression in the ON clause: we are doing a triangular join, not using the usual equality operator, because we want to generate 17 rows out of thin air. The same thing can be achieved with a CROSS JOIN and a WHERE clause; experiment with both to find which works better.
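If the triangular join feels like sleight of hand, here is the same row-manufacturing idea in Python, with illustrative counts (a tally table is nothing more than the integers 1..N stored as rows):

```python
# dbo.tblTally stand-in: the integers 1..1000.
tally = range(1, 1001)

# Per-product transaction counts, as the previous CTE would produce them.
transaction_counts = {"Widget": 17, "Gadget": 8}

# The triangular join: keep every tally number up to the product's count,
# which manufactures exactly that many (ProductID, TransactionNumber) rows.
staging = [
    (product, num)
    for product, count in transaction_counts.items()
    for num in tally
    if count >= num          # the ON c.TransactionCount >= t.Num condition
]

assert sum(1 for p, _ in staging if p == "Widget") == 17
assert sum(1 for p, _ in staging if p == "Gadget") == 8
```

The inequality is doing all the work: each product row matches not one tally row but a whole prefix of them.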
Making our transaction count
So we have our temporary table set up with the right number of rows. Now we need to populate the table with data about purchases and orders. As you saw in the figure, we need to be able to order the purchases and orders by the date they were purchased or ordered, respectively. And that's where ROW_NUMBER() and the tally table come to the rescue.
SELECT p.ProductID,
       p.PurchaseID,
       p.PurchaseDate,
       p.CostPer,
       ROW_NUMBER() OVER (PARTITION BY p.ProductID ORDER BY p.PurchaseDate, p.PurchaseID) AS TransactionNumber
  FROM dbo.tblProductPurchase AS p
 INNER JOIN dbo.tblTally AS t
    ON p.QtyBought >= t.Num;
You may wonder why we need ROW_NUMBER() when we could use the tally's Num column. The answer is that if there are multiple purchases, Num will only go as high as each purchase's quantity, but we need to go as high as 17, the total of 3 separate purchases of 3, 6 and 8 units. Thus, we partition by ProductID, whereas the tally's Num can be said to be partitioned by PurchaseID, which is not what we want.
If you run the SQL, you will get a nice breakout: a row returned for each unit of product purchased, ordered by purchase date. Note that we also sort by PurchaseID to handle the case where there are multiple purchases of the same product on the same day; we have to break the tie somehow to ensure that the Per-Cost figures are calculated consistently. We can then update the temporary table with the purchase data:
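The difference between the tally's Num and ROW_NUMBER() is easy to see in a Python sketch of the same expansion, using the example's three purchases (the tuple layout here is just for illustration):

```python
# Each purchase: (PurchaseID, PurchaseDate, QtyBought, CostPer)
purchases = [(1, "9/03", 3, 45), (2, "9/08", 6, 40), (3, "9/09", 8, 50)]

rows = []
# ORDER BY PurchaseDate, PurchaseID
for purchase_id, date, qty, cost in sorted(purchases, key=lambda p: (p[1], p[0])):
    for num in range(1, qty + 1):      # tally join: Num restarts per purchase
        rows.append({
            "TransactionNumber": len(rows) + 1,  # ROW_NUMBER(): keeps counting
            "PurchaseID": purchase_id,
            "PurchaseDate": date,
            "TallyNum": num,
            "CostPer": cost,
        })

assert rows[-1]["TransactionNumber"] == 17   # runs 1..17 across all purchases
assert rows[3] == {"TransactionNumber": 4, "PurchaseID": 2,
                   "PurchaseDate": "9/08", "TallyNum": 1, "CostPer": 40}
```

Note how TallyNum resets to 1 at each new purchase while TransactionNumber keeps climbing; the latter is what lines up against the staging table's 1..17.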
WITH PurchaseData AS (
    SELECT p.ProductID,
           p.PurchaseID,
           p.PurchaseDate,
           p.CostPer,
           ROW_NUMBER() OVER (PARTITION BY p.ProductID ORDER BY p.PurchaseDate, p.PurchaseID) AS TransactionNumber
      FROM dbo.tblProductPurchase AS p
     INNER JOIN dbo.tblTally AS t
        ON p.QtyBought >= t.Num
)
MERGE INTO dbo.tblProductTransactionStaging AS t
USING PurchaseData AS p
   ON t.ProductID = p.ProductID
  AND t.TransactionNumber = p.TransactionNumber
 WHEN MATCHED THEN UPDATE SET
      t.PurchaseID = p.PurchaseID,
      t.PurchaseDate = p.PurchaseDate,
      t.CostPer = p.CostPer;
The orders part is basically the same thing – simply replace “Purchase” with “Order”, and you’d get the table filled up just like we had in the original figure at the start of the post.
And at this point, you are all set to do all sorts of accounting goodness, now that you've broken the products down from the transaction level to the unit level, which is what you need to accurately map the cost of goods to the revenue for each particular unit of product using FIFO or LIFO as required by your accountant. The calculations are now elementary.
Granularity in an OLTP world
The concept of granularity is more common in data warehousing than in OLTP applications, but I think the scenario discussed highlights the need to step back and clearly identify the current granularity of the OLTP schema. As we saw, we had the wrong granularity at the start and needed to rework the data to get the granularity required for our reporting. It was a happy accident that in this case we could accurately lower the granularity, since all the component data was already present and we simply had to transform it. That is not always the case; more likely, if the schema is not granular enough, it will warrant a redesign of the schema. Nonetheless, identifying the granularity required to satisfy the requirements helps clearly define the logical steps you must undertake to arrive at that goal.
The complete SQL script demonstrating the point can be found in DemoLowGranularity.sql.