Note: I thought of writing about page splits. Initially I started writing a blog and it slowly became a huge article. I was having two minds on whether I should write a series, of blogs or go for a big article. Finally, I have made up my mind for the first option and here is the first part of the article. My plan is to write different aspects of page splits in coming days…
In SQL Server data is stored into a page. A data page is 8 kb of size. It means when the rows are inserted they need to be inserted into a page. There is an exception when large blob data is inserted sql server can insert the data separately into another page and keep only the pointer (For more details refer: row overflow). Apart from that, a row should fit into the page and cannot split into multiple pages.
The decision on where the data should be inserted is decided by the clustered key. The data is logically stored in clustered key order. (I’ll discuss later on why the word “logically” is used here. But it is safe to assume the order of the pages is clustered key order) When a row need to be inserted or updated but it cannot fit into the empty space, the content of the row splits into two pages (without breaking the row) and it is called page split.
There are primarily two reasons for it to occur:
· Data inserts not in clustered key order
· Update of variable length columns
Let’s analyse each scenario:
Now let us assume we have a table which has a row size of 1300 bytes. It means a fully occupied page will have 6 rows. Let us assume that we have a table where a particular page has 5 rows having the clustered key values 2,4,6,8, 10 and 12. Now a new row is inserted with the clustered key of 5. It should be inserted into the same page as the value 5 falls between 4 and 6. But the page does not have space and it needs a new page. However, if the only the new row is copied into the new page, the clustered key order cannot be maintained.
Now SQL server will perform the operation of page split. It will create a new page and copy half of the rows into the new page. Now both pages will be half empty, so that the new page will be inserted into the page it needs to be inserted. In this case the first page will have rows 2,4,5 and 6 and the next page will have rows 8, 10 and 12.
The following code illustrates the point:
CREATE TABLE dbo.Note
(
NoteID int NOT NULL CONSTRAINT PK_Note PRIMARY KEY CLUSTERED (NoteID),
NoteText varchar(2000) NOT NULL
)
GO
INSERT INTO dbo.Note VALUES( 2,'Some notes here.')
INSERT INTO dbo.Note VALUES( 4,'Some notes here.')
INSERT INTO dbo.Note VALUES( 6,'Some notes here.')
INSERT INTO dbo.Note VALUES( 8,'Some notes here.')
INSERT INTO dbo.Note VALUES(10,'Some notes here.')
INSERT INTO dbo.Note VALUES(12,'Some notes here.')
As the data is ordered on clustered key (i.e. NoteID) this will not create page split. However, when the data for the next day is inserted it will cause page split.
INSERT INTO dbo.Note VALUES( 5,'New notes here.')
The reason for page split is it cannot hold the all the rows in the page. When a page split happens the row will be divided into two and the second half of the data will be moved into the new page.
There is another reason for page splits. Assume you have a variable length column in the table. Assume it is defined as varchar(2000). Since it is a variable length column, it will not occupy all 2000 bytes. Lets take this example:
CREATE TABLE dbo.Note
(
NoteID int NOT NULL CONSTRAINT PK_Note PRIMARY KEY CLUSTERED (NoteID),
NoteText varchar(2000) NOT NULL
)
GO
INSERT INTO dbo.Note VALUES ( 1,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 2,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 3,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 4,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 5,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 6,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 7,REPLICATE('a', 900))
INSERT INTO dbo.Note VALUES ( 8,REPLICATE('a', 900))
Even though the NoteText is specified to have 2000 bytes, they hold only 900 bytes. Now when a row is updated with a higher length string, it may need space more than what is available.
UPDATE dbo.Note SET NoteText = REPLICATE('c', 2000) WHERE NoteID = 3
Since there is no space to accommodate the additional 1100 bytes, it needs a page split. Again the page is divided where the original page will have only four rows ( NoteID <=4) and rest will be in new page.
There are couple of differences between a new page getting added and page split.
- Data growth in the order of clustered key leads to new pages getting added. But it ALWAYS happen at the end of the page. Page split refers to a page getting inserted in the middle of the list
- Adding new page starts with the blank page and only updates a pointer in the previous page. When page split occurs half of the rows are getting copied into the new page.
There are there any more reasons for page splits? Lets look it in another blog…