Microsoft SQL-Server Query / Procedure: Set-Based Moving Averages (without Cursors, self-joins, or sub-selects) — Example 2, with break-level average-resetting abilities
This blog builds on my prior entry where I provided the source code for a Microsoft SQL-Server Transact-SQL (T-SQL) query example (easily adapted to a stored procedure) that demonstrates an innovative way of efficiently calculating moving-averages, with variable window-widths / queue-depths, without using cursors or self-joins or any of the usual techniques.NOTE: for a more in-depth discussion, read my prior blog which compares this approach to APPLY and OVER approaches as well. This code introduces the new feature of being able to reset the moving-average accumulator(s) value at your chosen break level(s).
This is purely a set-based method, and is very efficient — the only solution that I have found to be more efficient is the native implementation of windowing (OVER clause) functionality in Microsoft SQL-Server 2012, but that has limitations since the native approach does not support using variables to define the number of data-points (i.e., the "window width") to include in the moving-average value like my custom solution is capable of.
This example used the Microsoft AdventureWorks sample database tables and values from SQL-Server 2012 release.
SQL-Server Procedure / Query Source Code
--********************************************************************************
--This source code is Copyright (c) 2007-2017
-- Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: Mike's SET-BASED Moving-Average Technique #2 (includes break-level resets of averages)
-- Demonstrate how, without any self-JOIN operation, without any CURSOR, without subselects,
-- we can still calculate a moving average, including the ability to reset that moving average
-- at particular break-values, and easily alter the number of values included in average.
--
-- Queue-Depth (i.e., "moving average values-count") size changes have nearly no impact on
-- execution speed, meaning there is essentially no performance penalty for larger moving-avg
-- "windows" than small ones -- so, a 30-day moving average calculates at nearly same speed
-- as a 120-day moving average due to efficient algorithm.
--
-- Excellent for financial-type applications, like moving-average stock-price calculations.
-- This example even "resets" the moving-average at a break-value, and can easily be extended
-- to do so at multi-column breaks, etc (see SET-BASED RUNNING TOTALS EXAMPLE 2 for technique).
--**********************************************************************************************
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS -- Clears the data cache
DBCC FREEPROCCACHE WITH NO_INFOMSGS -- Clears the procedure cache
GO
SET NOCOUNT ON
DECLARE @starttime DATETIME
SELECT @starttime = GetDate()
--**********************************************************************************************
-- BEGIN: SET-BASED Moving-Average Technique
-- Demonstrate how, without any self-JOIN operation, without any CURSOR, without subselects,
-- we can still calculate a moving average, including the ability to reset that moving average
-- at particular break-values.
-- Excellent for financial-type applications, like moving-average stock-price calculations.
-- This example even "resets" the moving-average at a break-value, and can easily be extended
-- to do so at multi-column breaks, etc (see SET-BASED RUNNING TOTALS EXAMPLE 2 for technique).
--**********************************************************************************************
--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @Results TABLE
(
UniqueID INT IDENTITY NOT NULL PRIMARY KEY,
AverageResetBreakColumn1 INT,
OrderID INT,
ValueBeingAveraged MONEY,
TotalSaleMovingAvg MONEY
)
--Insert all values to be totaled, into our work table in the REQUIRED ORDER by which
--we will be calculating moving-average for. In this example, we will look at moving
--average of Order SubTotals by OrderDate within each Teritory.
INSERT INTO @Results(
AverageResetBreakColumn1,
OrderID,
ValueBeingAveraged)
SELECT
Detail.ProductID,
Detail.SalesOrderID,
ISNULL(Detail.LineTotal, 0) --Handle NULL values
FROM
Sales.SalesOrderDetail AS Detail
ORDER BY
Detail.ProductID,
Detail.SalesOrderID
--Whether we call it "moving window width", or "Queue Depth", or whatever, this indicates how
--many elements are to be included in our moving average.
--E.g., a common moving average in finance situations could be a 30 day moving average, you
--would set "depth" to 30. For this example, keep queue small for easy validation of calcs.
DECLARE @QueueDepth INT
SET @QueueDepth = 2
--Space we'll use to store each value in our queue.
--In this example, allow for up to 9 leading digits, the decimal, and 4 trailing digits each.
DECLARE @SingleValueCharLength INT
SET @SingleValueCharLength = 14
--Variable to accumulate our delimited string of values-per-break-level used to calc moving avg.
--This is, essentially, our values-queue. Initialize it, so always predictable fixed length.
--New values (i.e., current-row value) are prepended to this String, with oldest (in queue)
--value appearing in the rightmost position of string... i.e., add new value to front of string
--and old values fall off the "end" (right side).
--
--NOTE: SET SIZE of this var to @QueueDepth * @SingleValueCharLength, OR leave as 8000 or MAX,
--keeping in mind that MAX will add slight performance penalty.
DECLARE @MovingSubtotalValuesString VARCHAR(8000)
SET @MovingSubtotalValuesString = REPLICATE('0', @SingleValueCharLength * @QueueDepth)
--Our break-level-value variables (Data-Types match those of the columns we are comparing to)
--Initialize these to some values that will NOT exist in each break-column's actual data.
DECLARE @AverageResetBreakVal INT
SET @AverageResetBreakVal = -999
--We will need to track the moving (or "windowed") subtotal during the update process
DECLARE @MovingSubtotal MONEY
SET @MovingSubtotal = 0
DECLARE @RowNumberThisGroup INT
SET @RowNumberThisGroup = 1
SELECT DateDiff(ms, @starttime, GetDate()) --Display elapsed Milliseconds
--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- See SET-BASED RUNNING SUBTOTALS and CONCATENTATION examples for background info, and
-- the remainder of specific moving-average logic is described (intracode comments) below.
--**********************************************************************************************
UPDATE
@Results
SET
--Keep track of what row# within a break-grouping we are on
@RowNumberThisGroup =
CASE
WHEN @AverageResetBreakVal = AverageResetBreakColumn1
THEN @RowNumberThisGroup + 1
ELSE 1
END,
--If at break, reset moving-subtotal (first value in group is current value); otherwise we
--add the most recent value (current row value) to be included in the subtotal, and then
--subtract the last value falling outside of queue-depth -- this is the secret to getting
--the "moving subtotal window" to work, and the entire reason we need the values-queue!
@MovingSubtotal =
CASE
WHEN @AverageResetBreakVal = AverageResetBreakColumn1
THEN @MovingSubtotal + ValueBeingAveraged --ADD NEW VALUE
- CONVERT(MONEY, RIGHT(@MovingSubtotalValuesString, @SingleValueCharLength)) --POP A VALUE OFF THE QUEUE TO ADJUST SUBTOTAL DOWN BY (AS IT MOVES OUTSIDE OUR "WINDOW" OR QUEUE-DEPTH)
ELSE ValueBeingAveraged
END,
--If at break, reset moving-values-list-string to contain current row value, with the rest
--of our "queue" (this string) holding just zero/empty values.
--Otherwise, we will be adding new value to front of the queue-string and dropping the
--last value from the end of the queue.
@MovingSubtotalValuesString =
CASE
WHEN @AverageResetBreakVal = AverageResetBreakColumn1
THEN
LEFT(CONVERT(CHAR(14), ValueBeingAveraged, 2), @SingleValueCharLength) + --ADD NEW VAL TO LEFT OF QUEUE-STRING
LEFT(@MovingSubtotalValuesString, @SingleValueCharLength * (@QueueDepth -1)) --DROP THE LAST RIGHTMOST VALUE
ELSE
CONVERT(CHAR(14), ValueBeingAveraged, 2) + REPLICATE('0', @SingleValueCharLength * (@QueueDepth -1)) --RESET AT BREAK!
END ,
--If at break, reset moving-avg (first value in group is current value); otherwise if
--less than our queue-depth into a group, moving average is our moving-subtotal divided by
--how many rows into group we are; otherwise, simply divide moving subtotal by queue-depth
TotalSaleMovingAvg =
CASE
WHEN (@AverageResetBreakVal = AverageResetBreakColumn1)
AND (@RowNumberThisGroup <= @QueueDepth)
THEN @MovingSubtotal / @RowNumberThisGroup
WHEN (@AverageResetBreakVal = AverageResetBreakColumn1)
THEN @MovingSubtotal / @QueueDepth
ELSE ValueBeingAveraged
END,
--And finally, keep track of whether we hit a new break-value.
@AverageResetBreakVal= AverageResetBreakColumn1
--**********************************************************************************************
--Output the results, showing all rows to demonstrate the accumulation...
--**********************************************************************************************
SELECT * FROM @results
--**********************************************************************************************
-- END: SET-BASED Moving-Average Technique
--**********************************************************************************************
SELECT DateDiff(ms, @starttime, GetDate()) --Display elapsed Milliseconds
Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.
No comments:
Post a Comment