The series continues with a focus on optimizing physical data-storage requirements (i.e., space reduction), since it is perhaps the most important overall tuning consideration for large databases.
Tuning Very Large SQL-Server Databases — Part 2

In Part 1 of SQL-Server Large Database Performance Optimization Tuning, I discussed the need to reduce physical storage requirements and use proper data normalization as the first foundational step in achieving optimal SQL-Server processing speed. I am now going to go deeper into why this is so important, and how it can make considerable speed and performance improvements possible in SQL Server relational databases.
Your SQL Database Server Has Finite Resources

I do not care how modern or powerful your database server is, how many processors it has, or how much physical memory (RAM) or hard-drive space it has. The fact is, your SQL-Server machine has finite resources, bandwidth, and processing power, and it is utterly foolish to waste that power pushing around extra data when it is not necessary.
In my years of dealing with large databases for various clients, I have repeatedly seen management more willing to throw huge sums of money at a "hardware solution" to a performance problem than to apply some common-sense database tuning and optimization techniques. Attempting to fix a performance problem by simply throwing ever larger and more powerful computers at it may work in some cases, but in many situations it just postpones the inevitable performance implosion that is lurking in a poorly optimized database. Certainly your database-server equipment needs to be up to the task, but the "task" should not include performing completely avoidable disk I/O and extraneous processing.
I don't want to get too deep into how to configure an optimal SQL-Server hardware solution here, but the general things you will need to be aware of when tuning and optimizing for very large SQL databases include:
- First and foremost, you will need to understand the various (interacting) limitations of Microsoft operating systems and Microsoft SQL Server versions. Both the Windows Server operating system and the database product impose limits on the physical resources that can be utilized by your database application. Each comes in "Standard," "Enterprise," and other editions, and each edition caps the number of processors and the amount of RAM that will be made available to your database, not to mention the availability (or lack) of other substantial features of the database or OS software itself.
- Hardware: When I originally wrote this how-to article back in 2007, CPUs were nowhere near as advanced as they are in 2016, RAM was considerably more expensive, as were disk drives, SSDs were seen as ridiculously expensive, and things like virtual servers and cloud-based SQL-Server solutions were in their infancy. Regardless, some basics about how to choose appropriate hardware for your SQL-Server requirements remain the same.
Processors (CPUs) — SQL Server can really benefit from multiple CPUs and processor cores. At a minimum, you should have a dual-processor server if you really want to improve performance.
Memory (RAM) — RAM is your friend, and SQL-Server's friend too. Get as much as you can afford, and be sure to match the editions of SQL-Server and Windows Server to the amount of RAM in your system.
Disk Space — you had better have a RAID array of some sort, or an ultra-high-speed SAN or the like, for your physical storage (if you are not simply using a hosted cloud solution that takes care of all this for you). And though you will need a large amount of space, increasing the physical number of drives (whether old-school spindles or modern SSDs) is how to really start boosting performance. That is, if you put your database on one or two huge disks, instead of a dozen or more (probably smaller) drives in an array that offers the space you need in aggregate, chances are you will quickly reach the throughput limits of the disk subsystem.
Note: our main focus is to keep physical disk I/O to a minimum, and use RAM wherever possible since physical disk I/O, even on the fastest drive arrays, is many orders of magnitude slower than going to RAM for data.
Network — if you'll be moving large quantities of data to and from the database server machine(s), you had best have a modern ultra-high-speed network in place between those machines.
Moving Data Around Is Costly, and Occurs Within the Confines of Your Hardware

I have focused on minimizing the data-storage requirements of your massive SQL-Server database for the sole purpose of increasing performance. The primary way this strategy improves performance is by reducing the amount of data that is moved around, especially to and from (slow) disk drives.
Here are some explanations for how minimizing data-storage requirements increases performance and speed within your SQL Server database solution:
- Just think about it: having less data (fewer actual bytes) to move, regardless of the speed at which your system can move that data (whether in RAM or to and from disk), means your machine will be able to perform the same operation in less time — and there is a good chance the elapsed processing time will decrease by a percentage greater than the percentage of storage space you have saved...
- ...this is especially true if the reduced data-size causes operations that otherwise were unable to be performed in RAM to now fit within the confines of your server's RAM. If disk-swapping has been eliminated, performance improvements will be stellar.
- Every SQL-Server operation, whether a SELECT, INSERT, UPDATE, or DELETE, must move data between RAM and physical storage. Smaller data equals faster movement and higher row-throughput per time-slice. Period. I am not ignoring caching here either: keep in mind that even data retrieved from a cache comes from some type of physical storage, even if that is another location in RAM, and the data is still going to move around.
- If an operation is logged, as are any operations that modify data, the amount of data that must be written to the log will also be reduced, and thus speed up your database operation. Optimizing your database design so as to reduce physical storage requirements can therefore also help keep your transaction-log size under control.
- Index widths (in bytes) will be smaller, reducing storage requirements and, even more importantly, allowing much more index information to be cached in RAM at any given time — this is crucially important for the speed of JOIN operations, sorting, and much more.
- Any comparison operation will be faster. And all sorts of operations are comparison operations: JOIN conditions, ORDER BY sorts, and WHERE clauses all require comparisons to be performed. The less data there is to compare, the fewer resources (RAM / disk) are required to perform the comparison. If you have done a good job of reducing physical data-storage requirements, these operations will all run much faster.
- Backup jobs, and even replication, will run much faster, since there is much less data to write out to tape or copy to another storage device. Likewise, data-recovery and transfer is much more efficient.
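To make the index-width point above concrete, here is a hypothetical sketch (the table and column names are my own invention, not from any particular schema) contrasting a wide text-based index key with a narrow integer-based one:

```sql
-- Hypothetical wide index key built on descriptive text columns.
-- Every key entry carries the full strings, so far fewer entries
-- fit per 8 KB index page (and per megabyte of RAM).
CREATE NONCLUSTERED INDEX IX_SalesFact_Wide
    ON dbo.SalesFact (DivisionName, OfficeName, SaleDate);

-- The same logical index after normalization: two small integer
-- foreign keys plus a DATE. Key width is roughly 1 + 2 + 3 = 6 bytes
-- (TINYINT + SMALLINT + DATE), so many times more index rows fit in
-- each page, and far more of the index stays cached in RAM.
CREATE NONCLUSTERED INDEX IX_SalesFact_Narrow
    ON dbo.SalesFact (DivisionID, OfficeID, SaleDate);
```

The byte counts shown are the standard SQL Server storage sizes for those fixed-width types; the real-world savings depend on how long the replaced strings were.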
REAL-WORLD SQL-SERVER TUNING EXAMPLE(S)

Now I will quickly discuss a real-world (i.e., production-environment) time-savings and performance comparison I witnessed before and after optimizing data-storage requirements, so you can get a feel for the magnitude of the performance increases that are possible:
Before Tuning

I was confronted with a client's SQL-Server database where they had basically "hit a wall" with their internally designed database and stored procedures. Their (relatively new) database was growing at such a phenomenal pace, both in row counts and physical size, that their database-server hardware was being pushed to its limits, and some reporting processes were unable to finish in less than a couple of hours.
There were a few core tables, each in excess of 50 million rows, that were involved in the longest-running queries. The total database size, with indexes, was pushing past 45 Gigabytes and growing by nearly 10 Gigabytes a month (indexes were about ⅓ of the total DB size). This thing was a serious time-bomb ready to explode, and it was (of course) a data warehouse critical to management reporting in a Fortune-100 firm.
Tuning Performed

This was one of the largest databases I had ever seen built by someone with absolutely no clue how to properly normalize data and use the minimum amount of storage space possible. Though the few core tables had relatively "narrow" (byte-width) rows (just a couple hundred characters per row), a large part of each row was used in a non-clustered index, and the bulk of the columns in this index were VARCHAR fields that were mostly filled with data on each row. There were columns like "DivisionName" and "OfficeName" in this index, containing data like "Midwest US" and "Houston, Texas Branch". I could not believe it when I saw it.
So, I began by ripping those VARCHARs out and replacing them with a TINYINT foreign key to a Divisions table I made (there were only like eight total values!), and a SMALLINT foreign key to an Offices table I made (with fewer than a thousand offices). Needless to say, the database size took a large step downward. I continued with this type of work...
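The kind of change described above looks roughly like the following sketch (all table and column names here are illustrative, not the client's actual schema):

```sql
-- Small lookup tables replace the repeated text values.
CREATE TABLE dbo.Divisions (
    DivisionID   TINYINT     NOT NULL PRIMARY KEY,  -- only ~8 values existed
    DivisionName VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Offices (
    OfficeID   SMALLINT     NOT NULL PRIMARY KEY,   -- fewer than 1,000 offices
    OfficeName VARCHAR(100) NOT NULL
);

-- The wide VARCHAR columns on the 50-million-row core table become
-- a 1-byte and a 2-byte foreign key:
ALTER TABLE dbo.SalesFact ADD DivisionID TINYINT NULL
    CONSTRAINT FK_SalesFact_Divisions REFERENCES dbo.Divisions (DivisionID);

ALTER TABLE dbo.SalesFact ADD OfficeID SMALLINT NULL
    CONSTRAINT FK_SalesFact_Offices REFERENCES dbo.Offices (OfficeID);

-- (After back-filling the new key columns, the old DivisionName and
-- OfficeName columns, and any indexes built on them, can be dropped.)
```

With a couple hundred text bytes per row traded for 3 bytes of integer keys, the space savings multiply across every row, every index entry, every log record, and every backup.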
After Tuning

By the time I was done, the database was only a third of its original size. And I rewrote the reporting process that had been taking hours to run so that it joined on the various integer columns (vs. all the VARCHARs) and only brought in the "lookup data" (office names, etc.) when needed at the final output stage.
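The reshaped reporting process followed this general pattern (again, the names are illustrative, not the actual production code):

```sql
-- Do the heavy aggregation on the narrow integer keys first; the
-- expensive GROUP BY work touches only small fixed-width columns.
SELECT  s.DivisionID,
        s.OfficeID,
        SUM(s.SaleAmount) AS TotalSales
INTO    #Summary
FROM    dbo.SalesFact AS s
WHERE   s.SaleDate >= '2007-01-01'
GROUP BY s.DivisionID, s.OfficeID;

-- Only at the final output stage are the text descriptions joined in,
-- against the tiny lookup tables rather than the 50-million-row facts.
SELECT  d.DivisionName,
        o.OfficeName,
        t.TotalSales
FROM    #Summary      AS t
JOIN    dbo.Divisions AS d ON d.DivisionID = t.DivisionID
JOIN    dbo.Offices   AS o ON o.OfficeID   = t.OfficeID
ORDER BY d.DivisionName, o.OfficeName;
```

The design choice is simply deferring the "pretty" text until the result set has shrunk from tens of millions of rows down to the handful that actually reach the report.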
As soon as I broke through the size threshold where much more of the information required to perform complex JOIN conditions could fit in the database server's 4 Gigabytes of RAM (a relatively large amount back in 2007), performance became incredible. Procedures that previously took hours to run were now completing in a minute or less.
This all required very few coding changes, just some logical application of normalization techniques. The biggest challenge was getting enough processing time to convert the existing production database to the new layout, since its poor design and massive size made alterations a very slow process — at first, the production server didn't even have enough disk space to allow the conversion to take place!
More such examples to come, as I move forward with SQL Server Performance tuning through proper database design, normalization, and even Transact-SQL coding...
Continue with Part 3 of SQL-Server Database Tuning now.
by Mike Eberhart (originally published: May-2007; updated Sep-2016)
Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.