Introduction
I thought of writing a series about page splits. Initially I started writing about it and it slowly became a huge article. I was in two minds on whether I should write a series, of blogs or go for a big article. Finally, I have made up my mind for the first option and here is the first part of the five part series.
What is page Split?
In SQL Server data is stored into a page. A data page is 8 kb of size. It means when the rows are inserted they need to be inserted into a page. There is an exception when large blob data is inserted sql server can insert the data separately into another page and keep only the pointer (For more details refer: row overflow). Apart from that, a row should fit into the page and cannot split into multiple pages.
The decision on where the data should be inserted is decided by the clustered key. If the table does not have a clustered key, (which is called a "heap") the data inserted always be inserted at the end. On clustered tables the data is logically stored in clustered key order. (I'll discuss later on why the word "logically" is used here. But for the moment, it is safe to assume that the pages are in clustered key order) When a row need to be inserted or updated but it cannot fit into the balance space within the current page, The data is inserted into a new page and is called page split. There are two types of page splits occur in SQL Server. Firstly, the natural growth of data that periodically requires new pages, irrespective of whether it is a heap of clustered table. Secondly, a data insert or update operation happens within an existing page that the page needs to move some of the existing data in order to accommodate the inserted /updated data. Even though technically the first operation too is page split, as the content of the page dos not split in that type of operation. The second operation also called as mid-page split, which is a concern of DBAs.
There are primarily two reasons for mid-page split to occur:
- Data inserts happen not in clustered key order
- Update of variable length columns
Scenario 1: Inserts which cause mid page splits
Now let us assume we have a table which has a row size of 1300 bytes. (The example given below takes a few bytes more than 1300, but for this example, it does not make any change) It means a fully occupied page will have 6 rows. Let us assume that we have a table where a particular page has 5 rows having the clustered key values 1,2,3,5 and 6. Note that I purposefully skipped the row with clustered key value 4.
CREATE TABLE dbo.Note
(
NoteID int NOT NULL CONSTRAINT
PK_Note PRIMARY KEY
CLUSTERED (NoteID),
NoteText char(1300) NOT NULL
)
Insert into dbo.Note Values (1,REPLICATE('a', 1300))
Insert into dbo.Note Values (2,REPLICATE('b', 1300))
Insert into dbo.Note Values (3,REPLICATE('c', 1300))
-- The row with
NoteID = 4 is missing here
Insert into dbo.Note Values (5,REPLICATE('e', 1300))
Insert into dbo.Note Values (6,REPLICATE('f', 1300))
As the data can fit into one page the total number of pages
needed for data will be one.
EXEC sp_SpaceUsed 'dbo.Note'
Now a new row is inserted with the clustered key of 4. It
should be inserted into the same page as the value 4 falls between 3 and 5.
Insert into dbo.Note Values (4,REPLICATE('5', 1300))
EXEC sp_SpaceUsed 'dbo.Note'
Since the page already has space within the page there is no
need for page split. Even though the inserted order does not match the clustered
key order (NoteID =5 and NoteID =6 were inserted before NoteID =4) they all
fall into the same page without a page split. The reason behind this is the
insert mechanism employed by SQL Server. During any DML query (select, insert,
update or delete) all the required pages will be loaded into memory, not just
the rows required. During inserts, SQL
Server will identify that the data should be inserted into a particular page
and load the page into memory and inserts are done into the page.
Now assume that another table has a row size of 1900 bytes.
CREATE TABLE dbo.Note2
(
NoteID int NOT NULL CONSTRAINT
PK_Note2 PRIMARY KEY
CLUSTERED (NoteID),
NoteText char(1900) NOT NULL
)
When the above code is executed, SQL Server will create the table definition no data page will
be allocated until the first row is inserted.
Insert into dbo.Note2 Values (1,REPLICATE('a', 1900))
Insert into dbo.Note2 Values (2,REPLICATE('b', 1900))
Insert into dbo.Note2 Values (3,REPLICATE('c', 1900))
-- The row with
Note2ID = 4 is missing here
Insert into dbo.Note2 Values (5,REPLICATE('e', 1900))
Insert into dbo.Note2 Values (6,REPLICATE('f', 1900))
EXEC sp_SpaceUsed 'dbo.Note2'
The first four rows will take slightly more than 7616 bytes
and there is no room for the fifth row.
But since the last row inserted does fall as the last entry in the
clustered key order, it will get into a new page without making considerable
changes into the previous page. Only change happens in the previous page is the
pointer to next row is updated.
As mentioned before, page split occurs even when the data is
inserted into a new page. However, this is inevitable and does not cause
fragmentation or heavy writes.
As usual SQL Server will find the first free page and insert
the first row (PK = 1). Since the table
is small the free page will be allocated from a mixed extend SGAM (Shared
Global Allocation map)
The subsequent three rows too will be inserted into the same
page. As a page cannot be shared across
multiple tables, any concurrent inserts to other tables will not make any
difference here. Let us assume the data was inserted into
Page ID 159.
When the row with PK = 6 was inserted it will get into
another page, which will be after 159, unless another page before 159 became
free. (Suppose you removed a large table
or deleted large amount of data or rebuilt the clustered index of a table, this
situation can occur) Let us assume that the new row is 165.
Now, let us insert another row into the table
Insert into dbo.Note2 Values (4,REPLICATE('5', 1900))
EXEC sp_SpaceUsed 'dbo.Note2'
Now the data needs to be inserted into middle of first page
which is already full. (The word “Full” needs to be taken with a pinch of salt
here. Technically the page has some free space left. But the free space is not
enough to insert the new data.) It causes mid page split which also leads to
fragmentation and heavy writes.
Let us look into how SQL server performs the operation of
page split. It will create a new page and copy half of the rows from the
existing page into the new page. As
mentioned before, the new page usually get allocated after the last allocated
page. Let us assume it as 166. Now both pages will be half empty, so that the
new page will be inserted into the page it needs to be inserted. In this case the first page which had rows
with PK 1, 2, 3 and 5 will split into 2. The first page will now have rows 1
and 2 and the new page will have rows 3 and 5. As the row with PK =4 will
logically fit into the new page so that the new page will have rows 3, 4 and 5.
Scenario 2: Updates which cause mid page splits
The following code illustrates the point.
CREATE TABLE dbo.Note3
(
NoteID int NOT NULL CONSTRAINT
PK_Note3 PRIMARY KEY
CLUSTERED (NoteID),
NoteText varchar(2500) NOT NULL
)
Insert into dbo.Note3 Values (1,REPLICATE('a', 1900))
Insert into dbo.Note3 Values (2,REPLICATE('b', 1900))
Insert into dbo.Note3 Values (3,REPLICATE('c', 1900))
Insert into dbo.Note3 Values (4,REPLICATE('d', 1900))
Insert into dbo.Note3 Values (5,REPLICATE('e', 1900))
Insert into dbo.Note3 Values (6,REPLICATE('f', 1900))
As users may know, there is a difference between varchar and
char data types. Char data type always
hold the same amount of data on every row, irrespective of the amount of data
sent in the INSERT statement. If the statement
sends a smaller string, the balance will be padded with spaces. But varchar
data types holds variable length strings, which means the amount of data sent
will be saved and nothing more. In this example, even though the column can hold upto 4000
bytes, the column in each row will occupy only 1900 bytes. As seen before, the six rows can fit into 2
pages, where PK values from 1 to 4 will fit into the first page and 5 and 6
will fit into the second page. First page
will occupy 7616+ bytes (with overhead) and the second page will occupy 3808+
bytes. Now let us issue an update statement:
Update dbo.Note3 Set NoteText = REPLICATE('a', 2500) WHERE NoteID = 1
With this statement, the row updated resides in the first
page and the page does not have enough space, page split will occur. Here, even though the maximum amount of data
the page holds is around 2504 bytes and even if the other rows are updated to
hold the maximum amount of data, the first page can hold 3 rows, page split
will not consider it. The data will be divided into half and the second half
will be copied to new page.
These are the possible scenarios for page splits. As it is explained in detail how a page split occurs, let us move to the next section.
References
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/06/28/what-is-a-page-split-what-happens-why-does-it-happen-why-worry.aspx
This article was written around 5 years ago, but it gives a complete understanding of page splits.
References
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/06/28/what-is-a-page-split-what-happens-why-does-it-happen-why-worry.aspx
This article was written around 5 years ago, but it gives a complete understanding of page splits.
-- To be Continued--
No comments:
Post a Comment