Apr 14, 2012

Page Splits Part 4: Cost and Ways of Controlling

This is the final part of the series. I was not initially planned to write this series in a week interval, but due to various reasons it got dragged.

Cost of Page Splits

Many writers have written many times about this, so I like to summarize them

  • It needs to take the last page and delete half of the content
  • In order to do that this and maintain the consistency, the activity is logged.   
    • It is this operation when captured reading the log or by using extended events provides information about page splits.
    • Microsoft has done a performance improvement in page splits, by making them not transactional. It means that the page split operation could not be rolled back. Since the page is at the verge of split, we may have to perform a page split at a future time anyway. Microsoft keeps the splitted pages for a future need. But to make sure the backups and high availability features work well, it will be logged. Just enclose my code with BEGIN TRAN and ROLLBACK TRAN and check the space used by the table. I also checked with LOP_UNDO_DELETE_SPLIT operation in extended events as well as transaction log, and it does not fire when a transaction is rolled back.
  • If the data file already full (mean all pages are allocated; it does not necessarily mean that all pages are fully utilized) , this insert/update happens on a page which needs a page split and hat needs a new page.  Since there is no extra page available, it may lead to file growth. (Unless instance file initialization is set and SQL Server start-up account has right to perform volume maintenance tasks, this will be a time consuming operation.)
  • Your insert and update statements could become costly. This may lead to slowness in the OLTP operation.
  • A simple update could take longer time than expected. I remember reading an article about ripple effect of page splits. I believe it is from Kimberly Tripp (Sorry I am unable to find any links). Your update statement could cause the a page split and the page you intended to store the new/updated row is already (almost) half full. If the new cannot be inserted into the balance half, it may lead to another split..
  • Since the data is divided, each page will contain less rows and more empty space.  This is called internal fragmentation.  This leads to more pages to be read when scanning or seeking on a range. It also makes your database and backups bigger. Some high availability options like transactional replication, database mirroring and log shipping too will get higher load.   
  • Since the new page logically fits in the middle of the content, but physically located in the next available free page, they are not aligned together. (Unless you delete at least one data page in the appropriate page which may give you some space to place the new page.  This is an extremely rare scenario.) This is called external fragmentation.  It leads to unnecessary rotation of disk which means more time to read the data.

Controlling Page Splits

Page splits cannot be stopped completely but there are two important aspects on controlling page splits and its effect.
  • First of all periodically re-indexing the fragmented tables/indexes will remove fragmentation. It should be part of the job of the DBA
  • Another preventive action is to control shrink database or shrink file operation.  While this operation can reduce the size of the database, it increases fragmentation. (I have written about it before. Visit this link: http://preethiviraj.blogspot.com/2011/11/will-performance-improve-if-database.html In case you are in absolute need of shrinking the database, you may have to rebuild the clustered index again)

The other side of controlling page splits is part of the design of the database.
A couple of things could be done here:
  • By introducing the identity column, a numeric column getting sequence value or Sequential GUID as the clustered index. Since this will increase sequentially, it will prevent the inserts from causing page splits. Generally, user defined clustered keys have a tendency of not incrementing naturally, unless it is similar to serial number.
  • Minimize the use of variable length columns.  Fixed length column pad additional space so they could be used during updates. I have seen a similar approach in no-sql (or non-relational to correctly say) implementations where programs manually pad extra space for future use.
  • I having a proper fill factor:  Generally keeping the fill factor of the index near 80% -90% is a good idea, but the number should be verified in each environment.
Final thoughts
I need to warn about one thing about using some of the methods/scripts used in this series. Some of the scripts/|Methods are are resource intensive and may slow down the instance.For example, if you try to track operations in transaction logs to identify page splits, you may make the application to slow down. So it should be used as the last resource. Also be aware of using some of the undocumented feature on a longer run.  Microsoft may change them at any time.  Additionally, it could lead to losing the support of Microsoft on certain issues/problems too. Please check with Microsoft about what you are trying to use and the impact.

With this note, this series comes to an end. Hope it has provided a very good background to all my readers.

References and further reading:

I went through a lot of articles and my own tests before producing this series. I believe I have mentioned all the possible references at the appropriate places, But there could be certain citation I could have missed. It is because, at the time of reading I do not think about writing about any point mentioned there. It is later, I get the idea of writing about something, I have the memory about the content, but it is in my words. I may use my words to search and lose the actual article. I have not mentioned various parts of books online here, but it is one of my sources always.

Another good article but written some time ago.  I was under the impression that it was written by Brad McGehee (from Red-gate.com when he was handling sql-server-performance.com. I may be wrong. The current link simply says it was written by Admin.) The title says the purpose.

One of the greatest articles from a SQL Server Guru, Paul Randal, which explains about the cost of Page splits in terms Log operations. This is the article which provoked my curiosity which led to this series. I will call this a must read, if you are really interested in in depth knowledge.

Talks about page splits and transactions. nice article to read.

No comments:

Post a Comment