Learning from Index Fragmentation Woes


The Two Sides of Index Maintenance

Sometimes good old re-index does make a difference…

Hello again! I’m back with another interesting story from my everyday job maintaining and troubleshooting SQL Server. I decided to write a blog post about this one because, as I was troubleshooting, I took screenshots, and by the time I had some idea of what was going on I had compiled a good half a dozen pictures or more. Although they have all been blurred (because this is not my data… sorry guys…), there is still enough information left for the sake of sharing the experience and the lessons learned.

I thought that if I sprinkled some text around those images, I could end up with a pretty interesting take on why, in my opinion, index maintenance plans, although not as critical as they used to be, are a bit under-appreciated and should never be left behind.

The irony of all this is that lately I had become more and more convinced that not maintaining your indexes was not such a big deal as long as you keep your statistics up to date. There were very good reasons for thinking that way: on one hand, I was reading posts from recognized leaders in the SQL Server community saying just that, and on the other, the realities of my everyday job were taking me in that direction too.

I was a convinced follower of the DBA rule that calls for “one index maintenance job a week”, but after seeing some databases doing pretty well even without that schedule as part of their regular maintenance activities, I started having more and more doubts about it. I mean… if SQL Server knows all about your data distribution via up-to-date histograms (statistics), and uses that knowledge to create the right execution plans, things should be good… right?

Yes and no…

As with everything in SQL Server, the answer is yes and no… the more I study this product and the more time I spend supporting it, the more I have to answer everything with “it depends”. Hold on tight… I’ll come back to the story and explain why I say this in a second.

So, back to my story: today I was looking at a seemingly harmless UPDATE statement (without a WHERE clause) that was performing terribly. If you had asked me before executing it, I would have said there was no way in the world this query would take more than a few seconds (even on my not-so-powerful 3.7 GHz 4-core Intel desktop with 16 GB of RAM). The update goes like this…

EXEC ('UPDATE dbo.[some_table] SET [some_column] = 1;');


This UPDATE was part of a larger script. As I was tracing the script in Profiler, I came to this.

fig1: slow, slow UPDATE

827 seconds for an update is almost 14 minutes (believe me… I have used the calculator and I know how to divide by 60)!

Fig2: the UPDATE in SQL Server Management Studio

So, armed with this information, I set off on a quest to find the reasons for the slowness.

Red Herrings

I will save you the pain of reading about each and every little thing that I thought would lead me to my GOTCHA moment but did nothing except create more frustration: things like statistics not being up to date, and so on.

All I knew at that moment was that my table had 2,851,229 rows, and that if I created a sample table and populated it with that same number of rows, the UPDATE on this test table would finish in seconds. At that point I was really grasping at straws, so I started looking at other options.
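For context, a repro along these lines is what I mean by a sample table (all object and column names here are invented for illustration; the real ones are blurred in the figures):

```sql
-- Build a test table with the same row count as the real one,
-- using a cross join of a system view to generate enough rows.
SELECT TOP (2851229)
       ISNULL(CONVERT(INT, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), 0) AS id,
       CONVERT(BIT, 0) AS some_flag
INTO   dbo.test_table
FROM   sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

ALTER TABLE dbo.test_table
    ADD CONSTRAINT PK_test_table PRIMARY KEY CLUSTERED (id);

-- The same shape of UPDATE that was so slow on the real table
-- completes in seconds here, because the fresh clustered index
-- has essentially no fragmentation.
UPDATE dbo.test_table SET some_flag = 1;
```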

Granted, like I said, the table was “long” (almost 3 million rows), but there was nothing unusual about how “wide” it was (how many columns or how many bytes per row), so that was not likely a factor either.

Fig3: not that many bytes per row

Not much info revealed so far… time to look at the execution plan.

Before I talk about the execution plan, there is one thing I want to stress. UPDATE statements without WHERE clauses on large tables are obviously heavy writers, but we should always remember that they also do a lot of reading. Granted, UPDATE statements write new data, but in order to do so they first have to locate that data, and to locate the data we need to change, we need to read it first. That’s why all UPDATE statements with no WHERE clause start with some form of index scan (clustered or non-clustered).

What’s the plan?

First, here is the graphical representation of the plan

Fig4: the plan diagram

Like I mentioned before, the first operator is an Index Scan (as expected), and then comes the Top. According to TechNet, the Top operator in update plans “is used to enforce row count limits”. In this case, no limits are enforced because the TOP clause was not used in the UPDATE query.

Then we have the Compute Scalar, which is in charge of converting (CONVERT_IMPLICIT) the value 1 to a BIT, because the column I’m updating is indeed a BIT column and I’m passing the literal value 1 (one) to the update. Finally, the values actually have to be changed, and that is the bulk of the task (99.8% of the cost); the operator in charge of doing that is the Clustered Index Update.

Here I thought I saw the first piece of evidence (other than the slow response time) that something was going wrong. The estimated data size was big (62-something gigabytes), and the estimated I/O cost was also very high.

Fig5: reading lots of data, with super costly IO

So, if my theory was right, reading a bit more than 62 GB of data was creating an excessive I/O cost. What caused that excessive cost?

Index maintenance is not only statistics

Index rebuilds and reorganizations are sometimes necessary, and I believe this was one of those cases. Even with up-to-date statistics, and even if you have enough free space in your B-tree indexes to allocate more data without causing page splits, you still have the issue of fragmentation. With high fragmentation, a table scan can take more time, even over the same amount of data as another table (one with low or no fragmentation).

First… I confirmed that fragmentation was indeed ailing the indexes on that table.

Fig6: Heavy fragmentation
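One way to get the numbers behind a figure like this is the sys.dm_db_index_physical_stats DMV (the table name below is hypothetical; LIMITED mode is the cheapest scan level):

```sql
-- Check fragmentation for every index on the table.
SELECT i.name                             AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,  -- logical fragmentation
       ips.page_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID('dbo.some_table'), NULL, NULL, 'LIMITED') AS ips
JOIN   sys.indexes AS i
       ON  i.object_id = ips.object_id
       AND i.index_id  = ips.index_id
ORDER  BY ips.avg_fragmentation_in_percent DESC;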

Here is how my story continued. When I suspected that fragmentation could be the culprit, I did what most human beings (most DBAs, at least) would do: try an ALTER INDEX .. REBUILD and see what I got back…
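The rebuild itself is a one-liner (again, the table and index names are placeholders, not the real ones):

```sql
-- Rebuild every index on the table; a rebuild also refreshes
-- the index statistics with the equivalent of a full scan.
ALTER INDEX ALL ON dbo.some_table REBUILD;

-- Or target a single index, online if your edition supports it:
-- ALTER INDEX IX_some_index ON dbo.some_table REBUILD WITH (ONLINE = ON);
```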

By the way… I did try UPDATE STATISTICS on its own first (just to rule out statistics affecting performance in any shape or form), but merely refreshing the statistics did nothing.
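That statistics-only attempt looked something like this (table name hypothetical):

```sql
-- Refresh all statistics on the table with a full scan;
-- this updates the histograms but does NOT touch fragmentation.
UPDATE STATISTICS dbo.some_table WITH FULLSCAN;
```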

Fig7: Bye-bye, fragmentation…

The response time was only a tad slower than 1/5th of the original UPDATE’s: a bit more than 3 minutes! (I know it is hard to read in the picture, but trust me…)

Fig8: Better execution response time

I tried this a few times, and each and every time the UPDATE before the REBUILD was five times slower. The REBUILD did have a major impact on this particular UPDATE statement (something I would never have anticipated when I started investigating).

Proving my theory by comparing plans did not yield what I anticipated

OK, so now the UPDATE runs five times faster, but being the curious human being that I am, I needed to confirm that I/O costs were the problem, so I looked at the plans both before and after, hoping to see a big difference in I/O costs between them. No luck here.

Fig9: Almost no difference!

Since the costs in the plans were no different, I continued by comparing the I/O using SET STATISTICS IO ON and looking at the number of page reads.
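The setup for that comparison is simple (names hypothetical, as before):

```sql
-- Report page reads for every statement in this session.
SET STATISTICS IO ON;

UPDATE dbo.some_table SET some_column = 1;

SET STATISTICS IO OFF;
```

The numbers land in the Messages tab in Management Studio. Before the rebuild: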

Table '[My_TABLE]'. Scan count 1, logical reads 24,112,733, physical reads 38, 
read-ahead reads 1132898, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And after performance improvements

Table '[My_TABLE]'. Scan count 1, logical reads 17,161,032, physical reads 18, 
read-ahead reads 1135669, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

A roughly 30% decrease (from 24 million logical reads to 17 million). Not bad at all… and certainly an improvement caused by the re-index.

In Conclusion…

I found no crystal-clear evidence in the plans or in the STATISTICS IO output that re-indexing improved my performance, although there is some data to suggest it (the decrease in logical reads). Having said that, I KNOW the re-index improved things, even without any additional proof, because in this case I could see how much time the statement took before and after, which is exactly what a customer would experience.

The main takeaway for me (and the action item coming out of this exercise) is to make sure I don’t underestimate the damage a fragmented index can do to large operations. I should make sure that, from time to time, index maintenance jobs doing index reorganizations and rebuilds run on my servers and target the tables and indexes with the highest degree of fragmentation. For that purpose I would never use the out-of-the-box SSIS Index Maintenance Task; that said, there are other very good automation options for index maintenance (most notably Ola Hallengren’s SQL Server Maintenance Solution) that we should all consider using to avoid these problems.
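As a rough sketch of what that looks like with Ola Hallengren’s solution, a call to its IndexOptimize procedure can reorganize lightly fragmented indexes and rebuild heavily fragmented ones (check the solution’s own documentation for the current parameter names and defaults; the thresholds below are just common starting points, not recommendations):

```sql
-- Reorganize indexes between 5% and 30% fragmentation,
-- rebuild those above 30%, and refresh statistics.
EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,
        @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
        @FragmentationLevel1 = 5,
        @FragmentationLevel2 = 30,
        @UpdateStatistics = 'ALL';
```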


About Martin Surasky

Full-time daddy, part-time DBA and guitar player (only if I'm not administering a database or with my daughter) and part-time Coffee Crisp addict (thanks, Nestlé)...
This entry was posted in Performance and Troubleshooting. Bookmark the permalink.
