Monday, September 26, 2016

Tuning Very Large SQL-Server Databases - Part 3 (Index Optimization - Clustered / Non-Clustered)

This is the third in a series of three SQL-Server best practice tuning guides to help you achieve optimal performance, speed, and efficiency from your Very Large SQL-Server Databases.  Although these techniques will apply to Microsoft SQL Server, they will generally also apply to other relational database systems (RDMBS) like MySQL, Oracle, DB2, Postgres and others.

The series now continues with a focus on optimizing physical data-storage requirements (i.e., space reduction) by making smart use of Clustered Indexes when possible.

Tuning Very Large SQL-Server Databases — Part 3

In Part 1 of SQL-Server Large Database Performance Optimization Tuning and Part 2 of SQL-Server Large Database Performance Optimization Tuning, I discussed the need to Reduce Physical Storage Requirements and use proper data-normalization as the first foundation-step in achieving optimal SQL-Server processing Speed, and explored the reasons WHY a reduction in physical storage requirements was so important — whether the physical data-storage is hard drive storage, RAM device storage or SSD (Solid State Disks), RAID arrays, SAN / NAS networked storage... the need for speed implies a need to reduce storage requirements regardless.

Note: the database performance tuning concepts discussed in this series thus far, apply nearly universally to all mainstream RDBMS (Relational Database Management Systems) products — not just Microsoft SQL-Server — but also to the likes of MySQL, PostgreSQL, Oracle, IBM DB/2 (DB2), and others. And some of the concepts discussed herein (clustered indexes and such) will also apply to many database solutions too, since this concept is not unique to SQL-Server.

Now I will discuss some specific database-tuning strategies with respect to indexes in SQL Server relational databases.

Clustered vs. non-Clustered Indexes in SQL-Server

One of the most important decisions to make when defining an indexing structure and strategy throughout your SQL-Server databases is how to best make use of Clustered and/or non-Clustered indexes on individual tables.

Knowing how to best use, and when to use, a clustered index can make a serious difference in performance tuning large SQL-Server databases. Basically, if a key-column is ALWAYS increasing (in sort order) as data is added, and that column is relatively "narrow" (in bytes), make it CLUSTERED and you'll save a LOT space (since, the data for the Clustered version of an index is already PART of the same file). This is why applying clustered indexing to pre-ordered data is a great fit, and why applying clustered indexes to monotonically increasing values (like those in IDENTITY columns) makes a lot of sense — the data is physically ordered by the values on the table, and adding values to the end of a table can be done quite efficiently in most cases.

Certainly you can still use clustered indexed for data that is not pre-sorted or otherwise ordered already, but depending on the situation, this can lead to some significant inefficiencies as SQL-Server will need to physically re-order the table's data as values are added in order to arrange the data in the table by the clustered-index column(s) data values — which can make disks thrash like crazy. Keep this in mind, especially in high-volume and large database scenarios.

Here are some further points to consider regarding Clustered vs. non-Clustered Indexes and performance in SQL-Server:

  • As mentioned before, if there is less data (actual bytes) to move, regardless of what speed your system can move this data (whether in RAM or to and from disk), SQL-Server will still perform the same operation in less time — and, there is a good chance the elapsed processing time will decrease by a percentage greater than the percentage of storage-space you have saved...
  • ... so, choose an indexing strategy that prevents unnecessary data-movement operations, and/or minimizes data-movement. This will involve understanding what percentage of time your application spends performing INSERTS as opposed to SELECTS, UPDATES, and DELETES. Why does this matter? Well...
  • beware that UPDATES to Clustered-Index Columns can be painfully slow operations... perhaps the reasons are obvious; if not, let me know and perhaps I can expand upon this article in the future. Basically, just think about how, if a table is physically ordered by the column(s) that you are altering, the entire table is subject to physical re-ordering when those column values(s) change.


Now I will quickly discuss a real-world (i.e., production-environment) time-savings and performance comparison I have witnessed before and after optimizing database indexing strategies, so you can get a feel for the magnitude of performance increases that are possible:

Before Tuning

I encountered a rather substantial SQL-Server database at a client site that had an Order-Lines table with millions of rows of data residing in it. The client had a custom application that performed inserts, updates, and deletions on this OrderLines table on a regular basis, and with a rather high volume of transactions.

And, though the custom application's GUI was responding within the threshold of "acceptable response time" at a few seconds (up to as much as 10-15 seconds under the heaviest loads throughout the day), the delay was bothersome (and costly) as hourly workers had to basically slow down and wait for the application to "catch up to them".

Tuning Performed

A quick examination of the database stored procedures that were used to perform Order-Line additions, updates, and deletes, revealed which database tables were involved in the order-processing maintenance business process (and related application GUI forms) and what tables and columns and indexes these procedures interacted with. Further digging revealed that within the primary Order-Lines table, there was an already-existing clustered index on that table which had been applied to a group of columns that, together, uniquely identified each OrderLine row.

I also noticed that the table in question had another OrderLineID column (a unique value, IDENTITY value, monotonically increasing auto-increment value) on it already. Perhaps it was added at a later point in the software development and database design process — who knows — but, this jumped out as an easy, immediate, on-the-spot fix opportunity that would deliver serious performance improvements.

I simply altered the database table index definitions to make the table use the IDENTITY value as the Clustered index, and the multi-column-identifying-value column-group a non-clustered index (i.e., the new post-tuning index setup was using a Clustered Index on the Identity values vs. a cluster on a combination of other columns' values — or, that is, what used to be the clustered index was now a multi-column unclustered index).

After Tuning

The results were dramatic and GUI operation responsiveness became nearly instant (as compared to multi-second response times). I also had to verify (as you always should do) that my SQL Server Database Index-tuning changes did not impact the performance of OTHER application functions — like reporting operations, ad-hoc queries, web-application access to this same data, and so on — all of which needed to be verified as having not been negatively impacted.

This testing proved out OK, as no regressions were encountered; though, thanks to the opportunity to do this regression testing, a direct outcome was that I was even able to make a few other index-tuning changes that delivered additional speed gains and performance improvements to the various portions of the database applications. This is just one simple case of looking for some "low hanging fruit" to pick when performance tuning with indexing strategy considerations in focus.

I hope you have found this a useful series of Microsoft SQL-Server database tuning and optimization information. Feel free to leave a comment to contact me if your business has a particular SQL performance-tuning challenge, or if you wish to leave any suggestions, point out errata, etc. Enjoy!

by Mike Eberhart (originally published: Jun-2007; updated Sep-2016)

Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

No comments: