Tuesday, September 20, 2016

SQL-Server Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 1

SET-BASED Running Subtotals in SQL

Have you ever wanted to perform a set-based running-subtotals and/or running-totals operation in SQL-Server (i.e., have row-level accumulators store a running total or running subtotal?) Want the solution to not use cursors? Don't have access to SQL-Server 2012 or newer yet?

Here I present a cursorless method of implementing such functionality in a SQL-Server Query, Stored-Procedure, or User Defined Function (UDF) that works with all versions of SQL-Server (SQL-Server 2005, 2008, 2008r2, 2012, 2014, 2016...) without requiring SQL2012-specific OVER / PARTITION BY functionality. I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code that appears below does both running totals and running subtotals. It is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.

SQL-Server 2012 (or newer) Note: once SQL-Server 2012 became widely available, this example was updated to show how to implement the running totals and running subtotals using SQL-Server 2012 OVER and PARTITION BY (windowing functionality) clauses that are ANSI standard SQL available to Microsoft Transact-SQL query writers now. See the lower portion of the source code for a solution using these modern SQL features. Included are performance (run-time) comparison values for both solution methods.

Note: for this example, I used the SQL Server 2012 Adventureworks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and took just 2 seconds to execute withing a VM (virtual machine) on my single-CPU Intel Core-i7 development desktop machine.

For a more complex running-totals / running-subtotals example that builds off this one, see Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 2 — Multiple Simultaneous Running-SubTotals, which builds upon the techniques shown here.

SQL-Server Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2016
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: SET-BASED Running-Subtotals Technique #1
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @starttime DATETIME = GetDate();

--Accumulator variables
DECLARE @RunningTotal    MONEY = 0.0;
DECLARE @RunningSubtotal MONEY = 0.0;

--Our break-level-value variable (Data-Type should match that of the column we are comparing)
--Initialize this to some value that will NOT exist in the break-column's actual data.
DECLARE @SubTotalBreakValue1    INT = -1;

--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @ResultsTable TABLE
(
    UniqueID                INT IDENTITY NOT NULL PRIMARY KEY,
    SubtotalBreakColumn1    INT,
    ActualCost              MONEY,
    RunningSubtotal         MONEY,
    RunningTotal            MONEY
);

--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @ResultsTable
(
    SubtotalBreakColumn1,
    ActualCost,
    RunningSubtotal,
    RunningTotal
)
SELECT
    ProductID,
    ActualCost,
    0,
    0
FROM
    Production.TransactionHistory
ORDER BY
    ProductID    --Insert into table in our subtotal-breaking order (IMPORTANT!)
;

--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- We will be updating the row-level accumulators now through an UPDATE statement.
--
-- The subtotal logic relies on the fact that our break-value-detection variable(s) 
-- (@SubTotalBreakValue1,[...]) have their value(s) updated AFTER the current row's running 
-- total and subtotal variable values have had their values calculated and updated.
-- The sequence of variable and column updates here is important, since the variable(s) for
-- break-detection comparisons must have access to "prior row" values essentially, during the
-- CASE statement that determines whether to "reset" a running-subtotal variable at a break.
--
-- Additionally, you will notice the running-total and running-subtotal variables are SET at
-- the same time the table-column value is assigned, taking advantage of the order of operation
-- within the statement, whereby we can update both the variable(s) and the column(s) with any 
-- newly calculated running-total/subtotal(s).
--
-- This logic can easily be extended to support subtotal-breaks that rely on multiple-column 
-- values simply by:
-- 1) declaring additional break-value variables
-- 2) adding additional column(s) to the work-table for those break-levels and/or any
--     additional subtotal-accumulator column(s) you desire
-- 3) altering the INSERT (into work-table) to also ORDER BY the additional break-column(s)
-- 4) altering the CASE statement used in the UPDATE to compare more than one column prior to 
--    resetting the Subtotal accumulator.  E.g.,:
-- CASE
--  WHEN ( @SubTotalBreakValue1 = SubtotalBreakColumn1 AND 
--    @SubTotalBreakValue2 = SubtotalBreakColumn2 AND...) 
--   THEN @RunningSubtotal + ActualCost 
-- ELSE 0
--**********************************************************************************************
UPDATE
    @ResultsTable
SET
    @RunningTotal       = RunningTotal      = @RunningTotal + ActualCost,
    @RunningSubtotal    = RunningSubtotal   =
        CASE 
            WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1 
                THEN @RunningSubtotal + ActualCost
            ELSE ActualCost 
        END,
    @SubTotalBreakValue1= SubtotalBreakColumn1
;

--Output the results...
SELECT 
    SubtotalBreakColumn1 AS ProductID, ActualCost, RunningSubtotal, RunningTotal 
FROM 
    @ResultsTable
ORDER BY
    UniqueID
;

--**********************************************************************************************
--Using the AdventureWorks2012 DB, you'll see a large result set with data like this:
--**********************************************************************************************
--ProductID ActualCost   RunningSubtotal RunningTotal
------------- ---------- --------------- ------------
--1         50.2635      50.2635         50.2635
--1         50.2635      100.527         100.527
--1         50.2635      150.7905        150.7905
--1         50.2635      201.054         201.054
--1         50.2635      251.3175        251.3175
--...
--... skip forward to first product break now...
--... 
--1         50.2635      2211.594        2211.594
--1         50.2635      2261.8575       2261.8575
--2         41.916       41.916          2303.7735
--2         41.916       83.832          2345.6895
--...
--... and take a look at the end of the entire result set now...
--...
--999       388.7928     286518.694      27306402.3709
--999       388.7928     286907.4868     27306791.1637
--999       0.00         286907.4868     27306791.1637
--999       539.99       287447.4768     27307331.1537


PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('Custom Running-Subtotal Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
-- END: SET-BASED Running-Subtotals Technique #1
--
-- Average runtime against source data count of 113K rows in AdventureWorks2012 are decent: 
-- 2100ms (2.1sec) when selecting the results for output
-- 1560ms (1.55sec) when calculating but not selecting results for output
--**********************************************************************************************



--**********************************************************************************************
--**********************************************************************************************
-- BEGIN: alternative SET-BASED Running-Subtotal and Running-Totals Technique using 
-- "OVER" (windowing) functionality introduced in SQL-Server 2012.
--**********************************************************************************************
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

--DECLARE @starttime DATETIME = GetDate();
SET  @starttime = GetDate();

SELECT
    ProductID, 
    ActualCost,
    SUM(ActualCost) OVER
        (
        PARTITION BY ProductID
        ORDER BY ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningSubtotal,
    SUM(ActualCost) OVER
        (
        PARTITION BY @@SERVERNAME
        ORDER BY ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningTotal
FROM
    Production.TransactionHistory
ORDER BY
    --Use @@SERVERNAME as first column in order by so we can create a "partition" of the ENTIRE set.
    --ORDER BY requires non-constant-values for columns, and this function produces a "constant" (without being flagged as such) 
    @@SERVERNAME, 
    ProductID;


PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('OVER / PARTITION BY Running-Subtotal Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
-- END: "OVER" (WINDOWING FUNCTION) BASED Running-Total/Running-Subtotal Technique, 
-- which produces same result (NOTE: requires SQL-Server 2012+)
--
-- 610ms average : FAST!
-- Being "native" functionality in SQL-Server 2012, it is not surprising this approach produces
-- the SPEEDIEST EXECUTION TIME. 
--
-- But, for a non-native and/or pre-SQL2012 solution, our original approach we laid out (above)
-- is perhaps as performant as possible while being relatively easy to implement. 
--**********************************************************************************************


Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

No comments: