SQL-Server Solutions to Moving-Average Query Requirements
Moving-Averages Overview
This article references future blog posts which present the source-code for several additional SQL-Server queries, functions, and stored procedures — these used to be hosted on my external Free SQL code library, which was migrated here in 2016/17. Among this set of SQL queries are a few pieces of code that address the topic of moving averages and approaches to calculating moving average values using only SQL.Moving averages — and/or rolling averages, moving means, sliding averages, running averages, temporal averages, etc. — are a common data requirement in these problem domains:
- financial instrument pricing, and especially the analysis of financial time-series values and information: e.g., stock prices, commodity prices, interest-rates, etc. where it is very common to look at indicators like a 30-day or 90-day moving-average-price
- operational indicators at a business: production and sales volume trends, error/defect rates trends, and so forth
- economic indicators like unemployment and jobless claims, new home construction rates
- environmental indicators like seasonal temperature trends, rainfall, snowfall, and so forth
- and countless other applications where "smoothing" or "trending" or "momentum" is to be included in some analysis or reporting or charting.
SQL-only Solutions Compared
See my Set-based Moving-Averages SQL Example 1 page, as I will reference the techniques on that page in my discussion here.That page and example includes execution-speed / performance comparisons between techniques also (run against the AdventureWorks sample database). That page demonstrates three different approaches to solving (using only SQL-Server T-SQL) the moving average challenge using:
- my own custom approach to calculating a rolling average that takes advantage of some interesting features of Transact-SQL that, although may not work "forever" with every future release of SQL-Server, have worked fine from SQL-Server 2005 through SQL-Server 2008 and 2008r2 and SQL-Server 2012.
The technique relies on CASE statements and a (so far, so good) predictable order-of-processing the values-assignments within the UPDATE statement where I use local-variables to maintain a "queue" of values as well as break-level-reset tracking information. This code is quite speedy and flexible and easily adapted to related problem domains. - an example of solving the same problem using a CTE (Common Table Expression) coupled with APPLY (CROSS APPLY / OUTER APPLY that are Microsoft SQL-Server specific extensions to ANSI SQL). This approach is SLOW!
- A newer ANSI-SQL compatible solution that is available to you if you have SQL-Server 2012 or newer that implements the windowing functions using OVER and PARTITION BY. This method is the simplest solution, is the fastest running, but requires SQL2012+ and also lacks the ability to easily change the "window size" (i.e., number of data-points the moving-average is based upon) without dynamic SQL.
Performance Results / Limitations Examined
I have been very pleased with the performance of my custom solution to this problem and have used this approach (#1 above) for years now. Sure, I could simplify my code using SQL-Server 2012's new OVER / PARTITION BY abilities while also gaining a performance boost. But, until SQL-Server makes the PARTITION window-size (in the "ROWS BETWEEN integer_value_here PRECEDING AND CURRENT ROW" portion of the command) able to utilize a variable instead of a hard-coded integer value, this newest SQL method has limited value in my opinion.My custom approach comes amazingly close to the native speed of the newer SQL-2012 windowing features, while also having the advantage of being able to handle a variable-window-size (which is perfect for requirements where you want to pass a moving-average-window-size value as a parameter to the procedure).
Notice that I hardly give the APPLY / CTE "solution" much thought. I presented it as an "option" for a set-based solution to this problem, but its performance is awful and the APPLY is Microsoft-specific as well. So, I might as well use my own custom approach that may rely on the specifics of MS T-SQL processing-order while gaining the advantage of speed and a rather simple-to-understand approach (as compared to the CTE/APPLY which I find not-readable or easily understandable by comparison).
Conclusion
There are certainly multiple ways to solve the rolling-average / moving-average problem using only SQL, and the choice is yours as to which approach best fits with your needs and situation. I'd say that if you have a static-window-size and are using SQL-Server 2012 or newer, then the obvious choice is the newest OVER / PARTITION approach. But, I personally have not run into rolling-average requirements that did not include a variable "window" (e.g., allowing a user to choose how many data-points, periods, etc to include in the average, as specified in a run-time query parameter).Check out the other SQL queries, functions, and stored procedures I have in provided here on my blog for other related problems and solutions like running subtotals and much more. I have made it all available as open-source-software (OSS) under a very permissive MIT License. If you want to learn how to compute moving averages using only SQL, and solve other interesting problems within Microsoft Transact-SQL, this resource should provide some very useful information and ideas. Enjoy.
Continue to read this Software Development and Technology Blog for computer programming articles (including useful free / OSS source-code and algorithms), software development insights, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Delphi, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, plus my varied political and economic opinions.
No comments:
Post a Comment