SET-BASED Multiple Simultaneous Running Subtotals in SQL
Have you ever wanted to perform a set-based running-subtotals and/or running-totals operation in SQL-Server (i.e., have row-level accumulators store a running total or running subtotal?) Want the solution to not use cursors? Don't have access to SQL-Server 2012 or newer yet?Here I present a cursorless method of implementing such functionality in a SQL-Server Query, Stored-Procedure, or User Defined Function (UDF) that works with all versions of SQL-Server (SQL-Server 2005, 2008, 2008r2, 2012, 2014, 2016...) without requiring SQL2012-specific OVER / PARTITION BY functionality. I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code that appears below does both running totals and running subtotals. It is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.
This example builds off of Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 1, which introduces the techniques shown here (and more), and includes more discussion of the solution(s) — see the comments in the example source code.
SQL-Server 2012 (or newer) Note: once SQL-Server 2012 became widely available, this example was updated to show how to implement the running totals and running subtotals using SQL-Server 2012 OVER and PARTITION BY (windowing functionality) clauses that are ANSI standard SQL available to Microsoft Transact-SQL query writers now. See the lower portion of the source code for a solution using these modern SQL features. Included are performance (run-time) comparison values for both solution methods.
Note: for this example, I used the SQL Server 2012 Adventureworks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and averaged just under 3 seconds to execute withing a VM (virtual machine) on my single-CPU Intel Core-i7 development desktop machine.
SQL-Server Source Code
--********************************************************************************
--This source code is Copyright (c) 2007-2016
-- Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: SET-BASED Running-Subtotals Technique #2 - Multiple Simultaneous Running-Totals
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE WITH NO_INFOMSGS; -- Clears the procedure (plan) cache
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DECLARE @starttime DATETIME = GetDate();
--Accumulator variables
DECLARE @RunningTotal MONEY = 0.0;
DECLARE @RunningTranDtSubtotal MONEY = 0.0;
DECLARE @RunningProductSubtotal MONEY = 0.0;
--Our break-level-value variables (Data-Types match those of the columns we are comparing to)
--Initialize these to some values that will NOT exist in each break-column's actual data.
DECLARE @SubTotalBreakValue1 CHAR(8) = '18010101';
DECLARE @SubTotalBreakValue2 INT = -1;
--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @ResultsTable TABLE
(
UniqueID INT IDENTITY NOT NULL PRIMARY KEY,
SubtotalBreakColumn1 CHAR(8),
SubtotalBreakColumn2 INT,
ActualCost SMALLMONEY,
RunningTranDtSubtotal MONEY,
RunningProductSubtotal MONEY,
RunningTotal MONEY
);
--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @ResultsTable
(
SubtotalBreakColumn1,
SubtotalBreakColumn2,
ActualCost,
RunningTranDtSubtotal,
RunningProductSubtotal,
RunningTotal
)
SELECT
dbo.[udfConvertDateToString](TransactionDate),
ProductID,
ActualCost,
0,
0,
0
FROM
Production.TransactionHistory
ORDER BY
TransactionDate, ProductID --Insert into table in our subtotal-break order (IMPORTANT!)
;
--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- See SET-BASED RUNNING SUBTOTALS Example 1, which this builds upon as discussed
-- in the comments therein.
--**********************************************************************************************
UPDATE
@ResultsTable
SET
@RunningTotal = RunningTotal = @RunningTotal + ActualCost,
@RunningTranDtSubtotal = RunningTranDtSubtotal =
CASE
WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1
THEN @RunningTranDtSubtotal + ActualCost
ELSE ActualCost
END,
@RunningProductSubtotal = RunningProductSubtotal =
CASE
WHEN ( @SubTotalBreakValue1 = SubtotalBreakColumn1 AND
@SubTotalBreakValue2 = SubtotalBreakColumn2 )
THEN @RunningProductSubtotal + ActualCost
ELSE ActualCost
END,
@SubTotalBreakValue1 = SubtotalBreakColumn1,
@SubTotalBreakValue2 = SubtotalBreakColumn2
;
--Output the results... format to fit printout better.
SELECT
UniqueID,
SubtotalBreakColumn1 AS 'DateBreak',
CONVERT(CHAR(6),SubtotalBreakColumn2) AS 'ProdID',
ActualCost,
RunningTranDtSubtotal AS 'RunningDtSub',
RunningProductSubtotal AS 'RunningProdSub',
RunningTotal
FROM
@ResultsTable;
PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('Custom Multiple-Running-Subtotals / Total Example', @starttime, DEFAULT);
--**********************************************************************************************
--Using the AdventureWorks DB (with SQL2005), you'll see a large set with data like this:
--**********************************************************************************************
--UniqueID DateBreak ProdID ActualCost RunningDtSub RunningProdSub RunningTotal
---------- --------- ------ ------------ ----------------- ----------------- ---------------------
--1 20030901 3 0.00 0.00 0.00 0.00
--2 20030901 316 0.00 0.00 0.00 0.00
--...
--...
--25 20030901 534 0.00 0.00 0.00 0.00
--26 20030901 707 20.994 20.994 20.994 20.994
--27 20030901 707 20.994 41.988 41.988 41.988
--28 20030901 707 20.994 62.982 62.982 62.982
--...
--...
--75 20030901 707 34.99 1105.684 1105.684 1105.684
--76 20030901 707 34.99 1140.674 1140.674 1140.674
--77 20030901 708 34.99 1175.664 34.99 1175.664
--78 20030901 708 34.99 1210.654 69.98 1210.654
--79 20030901 708 34.99 1245.644 104.97 1245.644
--...
--...
--3942 20030901 999 388.7928 1674280.9813 7775.856 1674280.9813
--3943 20030901 999 388.7928 1674669.7741 8164.6488 1674669.7741
--3944 20030901 999 388.7928 1675058.5669 8553.4416 1675058.5669
--3945 20030902 3 0.00 0.00 0.00 1675058.5669
--3946 20030902 316 0.00 0.00 0.00 1675058.5669
--...
--...
--113437 20040903 929 37.086 1086.456 37.086 27307057.0617
--113438 20040903 930 43.0395 1129.4955 43.0395 27307100.1012
--113439 20040903 931 34.3455 1163.841 34.3455 27307134.4467
--113440 20040903 931 34.8705 1198.7115 69.216 27307169.3172
--113441 20040903 932 39.7635 1238.475 39.7635 27307209.0807
--113442 20040903 932 39.2385 1277.7135 79.002 27307248.3192
--113443 20040903 948 82.8345 1360.548 82.8345 27307331.1537
--**********************************************************************************************
-- END: SET-BASED Running-Subtotals Technique #2 - Multiple Simultaneous Running-Totals
--
-- Average run-time against source data count of 113K rows in AdventureWorks2012 are decent:
-- 2980ms (3.0 seconds)
--**********************************************************************************************
--**********************************************************************************************
--**********************************************************************************************
-- BEGIN: alternative SET-BASED Running-Subtotal and Running-Totals Technique using
-- "OVER" (windowing) functionality introduced in SQL-Server 2012.
--**********************************************************************************************
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE WITH NO_INFOMSGS; -- Clears the procedure (plan) cache
--DECLARE @starttime DATETIME = GetDate();
SET @starttime = GetDate();
SELECT
dbo.[udfConvertDateToString](TransactionDate) AS DateBreak, --show without zero'd HHMMSS info
ProductID AS ProdID,
ActualCost,
SUM(ActualCost) OVER
(
PARTITION BY TransactionDate
ORDER BY TransactionDate, ProductID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningDtSub,
SUM(ActualCost) OVER
(
PARTITION BY ProductID
ORDER BY TransactionDate, ProductID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningProdSub,
SUM(ActualCost) OVER
(
PARTITION BY @@SERVERNAME
ORDER BY TransactionDate, ProductID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningTotal
FROM
Production.TransactionHistory
ORDER BY
--Use @@SERVERNAME as first column in order by so we can create a "partition" of the ENTIRE set for Running-TOTAL.
--ORDER BY requires non-constant-values for columns, and @@SERVERNAME yields a "constant" (without being flagged as such)
@@SERVERNAME,
TransactionDate,
ProductID;
PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('OVER / PARTITION BY Multiple-Running-Subtotals / Total Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds
--**********************************************************************************************
-- END: "OVER" (WINDOWING FUNCTINO) BASED Runing-Total/Running-Subtotals Technique,
-- which produces same result (NOTE: requires SQL-Server 2012+)
--
-- 12200ms (1.2 sec) average : FAST!
-- Being "native" functionality in SQL-Server 2012, it is not surprising this approach produces
-- the SPEEDIEST EXECUTION TIME.
--
-- But, for a non-native and/or pre-SQL2012 solution, our original approach we laid out (above)
-- is perhaps as performant as possible while being relatively easy to implement.
--**********************************************************************************************
Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.
No comments:
Post a Comment