Feb 19, 2012

Page Splits


Introduction

I thought of writing a series about page splits. Initially I started writing about it and it slowly became a huge article. I was in two minds on whether I should write a series, of blogs or go for a big article. Finally, I have made up my mind for the first option and here is the first part of the five part series.


What is page Split?

In SQL Server data is stored into a page. A data page is 8 kb of size. It means when the rows are inserted they need to be inserted into a page. There is an exception when large blob data is inserted sql server can insert the data separately into another page and keep only the pointer (For more details refer: row overflow). Apart from that, a row should fit into the page and cannot split into multiple pages.

The decision on where the data should be inserted is decided by the clustered key. If the table does not have a clustered key, (which is called a "heap") the data inserted always be inserted at the end. On clustered tables the data is logically stored in clustered key order. (I'll discuss later on why the word "logically" is used here. But for the moment, it is safe to assume that the pages are in clustered key order) When a row need to be inserted or updated but it cannot fit into the balance space within the current page, The data is inserted into a new page and is called page split. There are two types of page splits occur in SQL Server. Firstly, the natural growth of data that periodically requires new pages, irrespective of whether it is a heap of clustered table. Secondly, a data insert or update operation happens within an existing page that the page needs to move some of the existing data in order to accommodate the inserted /updated data. Even though technically the first operation too is page split, as the content of the page dos not split in that type of operation. The second operation also called as mid-page split, which is a concern of DBAs.
There are primarily two reasons for mid-page split to occur:
  • Data inserts happen not in clustered key order
  • Update of variable length columns
Let's analyze each scenario: