SET-BASED Running Subtotals in SQL
Have you ever wanted to perform a set-based running-subtotals and/or running-totals operation in SQL-Server (i.e., have row-level accumulators store a running total or running subtotal?) Want the solution to not use cursors? Don't have access to SQL-Server 2012 or newer yet?Here I present a cursorless method of implementing such functionality in a SQL-Server Query, Stored-Procedure, or User Defined Function (UDF) that works with all versions of SQL-Server (SQL-Server 2005, 2008, 2008r2, 2012, 2014, 2016...) without requiring SQL2012-specific OVER / PARTITION BY functionality. I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code that appears below does both running totals and running subtotals. It is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.
SQL-Server 2012 (or newer) Note: once SQL-Server 2012 became widely available, this example was updated to show how to implement the running totals and running subtotals using SQL-Server 2012 OVER and PARTITION BY (windowing functionality) clauses that are ANSI standard SQL available to Microsoft Transact-SQL query writers now. See the lower portion of the source code for a solution using these modern SQL features. Included are performance (run-time) comparison values for both solution methods.
Note: for this example, I used the SQL Server 2012 Adventureworks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and took just 2 seconds to execute withing a VM (virtual machine) on my single-CPU Intel Core-i7 development desktop machine.
For a more complex running-totals / running-subtotals example that builds off this one, see Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 2 — Multiple Simultaneous Running-SubTotals, which builds upon the techniques shown here.
SQL-Server Source Code
--********************************************************************************
--This source code is Copyright (c) 2007-2016
-- Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: SET-BASED Running-Subtotals Technique #1
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE WITH NO_INFOMSGS; -- Clears the procedure (plan) cache
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DECLARE @starttime DATETIME = GetDate();
--Accumulator variables
DECLARE @RunningTotal MONEY = 0.0;
DECLARE @RunningSubtotal MONEY = 0.0;
--Our break-level-value variable (Data-Type should match that of the column we are comparing)
--Initialize this to some value that will NOT exist in the break-column's actual data.
DECLARE @SubTotalBreakValue1 INT = -1;
--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @ResultsTable TABLE
(
UniqueID INT IDENTITY NOT NULL PRIMARY KEY,
SubtotalBreakColumn1 INT,
ActualCost MONEY,
RunningSubtotal MONEY,
RunningTotal MONEY
);
--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @ResultsTable
(
SubtotalBreakColumn1,
ActualCost,
RunningSubtotal,
RunningTotal
)
SELECT
ProductID,
ActualCost,
0,
0
FROM
Production.TransactionHistory
ORDER BY
ProductID --Insert into table in our subtotal-breaking order (IMPORTANT!)
;
--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- We will be updating the row-level accumulators now through an UPDATE statement.
--
-- The subtotal logic relies on the fact that our break-value-detection variable(s)
-- (@SubTotalBreakValue1,[...]) have their value(s) updated AFTER the current row's running
-- total and subtotal variable values have had their values calculated and updated.
-- The sequence of variable and column updates here is important, since the variable(s) for
-- break-detection comparisons must have access to "prior row" values essentially, during the
-- CASE statement that determines whether to "reset" a running-subtotal variable at a break.
--
-- Additionally, you will notice the running-total and running-subtotal variables are SET at
-- the same time the table-column value is assigned, taking advantage of the order of operation
-- within the statement, whereby we can update both the variable(s) and the column(s) with any
-- newly calculated running-total/subtotal(s).
--
-- This logic can easily be extended to support subtotal-breaks that rely on multiple-column
-- values simply by:
-- 1) declaring additional break-value variables
-- 2) adding additional column(s) to the work-table for those break-levels and/or any
-- additional subtotal-accumulator column(s) you desire
-- 3) altering the INSERT (into work-table) to also ORDER BY the additional break-column(s)
-- 4) altering the CASE statement used in the UPDATE to compare more than one column prior to
-- resetting the Subtotal accumulator. E.g.,:
-- CASE
-- WHEN ( @SubTotalBreakValue1 = SubtotalBreakColumn1 AND
-- @SubTotalBreakValue2 = SubtotalBreakColumn2 AND...)
-- THEN @RunningSubtotal + ActualCost
-- ELSE 0
--**********************************************************************************************
UPDATE
@ResultsTable
SET
@RunningTotal = RunningTotal = @RunningTotal + ActualCost,
@RunningSubtotal = RunningSubtotal =
CASE
WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1
THEN @RunningSubtotal + ActualCost
ELSE ActualCost
END,
@SubTotalBreakValue1= SubtotalBreakColumn1
;
--Output the results...
SELECT
SubtotalBreakColumn1 AS ProductID, ActualCost, RunningSubtotal, RunningTotal
FROM
@ResultsTable
ORDER BY
UniqueID
;
--**********************************************************************************************
--Using the AdventureWorks2012 DB, you'll see a large result set with data like this:
--**********************************************************************************************
--ProductID ActualCost RunningSubtotal RunningTotal
------------- ---------- --------------- ------------
--1 50.2635 50.2635 50.2635
--1 50.2635 100.527 100.527
--1 50.2635 150.7905 150.7905
--1 50.2635 201.054 201.054
--1 50.2635 251.3175 251.3175
--...
--... skip forward to first product break now...
--...
--1 50.2635 2211.594 2211.594
--1 50.2635 2261.8575 2261.8575
--2 41.916 41.916 2303.7735
--2 41.916 83.832 2345.6895
--...
--... and take a look at the end of the entire result set now...
--...
--999 388.7928 286518.694 27306402.3709
--999 388.7928 286907.4868 27306791.1637
--999 0.00 286907.4868 27306791.1637
--999 539.99 287447.4768 27307331.1537
PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('Custom Running-Subtotal Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds
--**********************************************************************************************
-- END: SET-BASED Running-Subtotals Technique #1
--
-- Average runtime against source data count of 113K rows in AdventureWorks2012 are decent:
-- 2100ms (2.1sec) when selecting the results for output
-- 1560ms (1.55sec) when calculating but not selecting results for output
--**********************************************************************************************
--**********************************************************************************************
--**********************************************************************************************
-- BEGIN: alternative SET-BASED Running-Subtotal and Running-Totals Technique using
-- "OVER" (windowing) functionality introduced in SQL-Server 2012.
--**********************************************************************************************
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE WITH NO_INFOMSGS; -- Clears the procedure (plan) cache
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
--DECLARE @starttime DATETIME = GetDate();
SET @starttime = GetDate();
SELECT
ProductID,
ActualCost,
SUM(ActualCost) OVER
(
PARTITION BY ProductID
ORDER BY ProductID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningSubtotal,
SUM(ActualCost) OVER
(
PARTITION BY @@SERVERNAME
ORDER BY ProductID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningTotal
FROM
Production.TransactionHistory
ORDER BY
--Use @@SERVERNAME as first column in order by so we can create a "partition" of the ENTIRE set.
--ORDER BY requires non-constant-values for columns, and this function produces a "constant" (without being flagged as such)
@@SERVERNAME,
ProductID;
PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('OVER / PARTITION BY Running-Subtotal Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds
--**********************************************************************************************
-- END: "OVER" (WINDOWING FUNCTION) BASED Running-Total/Running-Subtotal Technique,
-- which produces same result (NOTE: requires SQL-Server 2012+)
--
-- 610ms average : FAST!
-- Being "native" functionality in SQL-Server 2012, it is not surprising this approach produces
-- the SPEEDIEST EXECUTION TIME.
--
-- But, for a non-native and/or pre-SQL2012 solution, our original approach we laid out (above)
-- is perhaps as performant as possible while being relatively easy to implement.
--**********************************************************************************************
Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.
No comments:
Post a Comment