Apr 15, 2012

New certification Paths

Interestingly,  I got an email on 11th from a friend of mine inquiring about the certifications Microsoft is offering.  I know that Microsoft was planning to bring new set of certifications.  So While writing the reply when I checked the site, I found out the Microsoft has just updated the info.

Microsoft has done a major restructuring of the certification program related to SQL Server and related technologies.  Its look like Microsoft believes that, the current program was developed couple of years ago and should be changed to meet the current market trend.  Microsoft is placing a lot of weight on cloud side.  According to Microsoft, this trend will eventually applied to other areas (such as Development and Windows Administration) as well. It is divided into more granular form.

From remarks I see many have worried about the name change: Earlier we had MCSE for Microsoft Certified Software Engineer,  and it moved to MCDBA, later it changed into MCITP and now we are back to MCSE (Microsoft certified Solutions Expert) (Yes, the world is round but not fully round).  Most of the HR firms will find it difficult to keep updating their staff and recruitment process.  I still see vacancies referring to MCDBA certification, which was the certification for SQL Server 2000.

For the DBAs the biggest change is narrowing tracks of database Development, database administration and Business intelligence.  Actually to pass the first level (MCSA) you need to pass three exams:  70-461 (Querying Microsoft SQL Server 2012), 70-462 (Administering a SQL Server database)  and 70-463 (Implementing a Data Warehouse with Microsoft SQL Server 2012) . It means every database professional who wishes to have a certification should know BI stuff and administration stuff as well.

Let me warn you about this exam:  These are not easy exams. 70-462 talks about many advanced stuff including using trace flags, DBCC commands, TDE, Clustering, extended events and always one. Similarly, 40-463 includes Data Quality Services, advanced stuff on SSIS (Fuzzy actions, Optimizing performance etc.), package configuration, Deployment, Using PowerShell, SCD type 2 and Master data Services.
I remember someone telling, that he would score more on the Business Intelligence exam if he is permitted to take dice to the exam hall.  :)  I am sure BI guys may have the same sentiments towards the administration exam too.

Ultimately, if the exams are harder, will many people take the exam?  There is fear that it may encourage many people to use brain dumps rather than studying for the exam.  Also, many people will try to go for classes, and memorize that is being said, rather than studying on their own.  It could be true.  Additionally, the cost of gaining a certification is also increased (the price for each exam was increased last year; and you need to pass more exams as well) which may create more reluctant DBAs (Some companies reimburse the exams if their employees pass the exam which may be an incentive.)

Apr 14, 2012

Page Splits Part 4: Cost and Ways of Controlling

This is the final part of the series. I was not initially planned to write this series in a week interval, but due to various reasons it got dragged.

Cost of Page Splits

Many writers have written many times about this, so I like to summarize them

  • It needs to take the last page and delete half of the content
  • In order to do that this and maintain the consistency, the activity is logged.   
    • It is this operation when captured reading the log or by using extended events provides information about page splits.
    • Microsoft has done a performance improvement in page splits, by making them not transactional. It means that the page split operation could not be rolled back. Since the page is at the verge of split, we may have to perform a page split at a future time anyway. Microsoft keeps the splitted pages for a future need. But to make sure the backups and high availability features work well, it will be logged. Just enclose my code with BEGIN TRAN and ROLLBACK TRAN and check the space used by the table. I also checked with LOP_UNDO_DELETE_SPLIT operation in extended events as well as transaction log, and it does not fire when a transaction is rolled back.
  • If the data file already full (mean all pages are allocated; it does not necessarily mean that all pages are fully utilized) , this insert/update happens on a page which needs a page split and hat needs a new page.  Since there is no extra page available, it may lead to file growth. (Unless instance file initialization is set and SQL Server start-up account has right to perform volume maintenance tasks, this will be a time consuming operation.)
  • Your insert and update statements could become costly. This may lead to slowness in the OLTP operation.
  • A simple update could take longer time than expected. I remember reading an article about ripple effect of page splits. I believe it is from Kimberly Tripp (Sorry I am unable to find any links). Your update statement could cause the a page split and the page you intended to store the new/updated row is already (almost) half full. If the new cannot be inserted into the balance half, it may lead to another split..
  • Since the data is divided, each page will contain less rows and more empty space.  This is called internal fragmentation.  This leads to more pages to be read when scanning or seeking on a range. It also makes your database and backups bigger. Some high availability options like transactional replication, database mirroring and log shipping too will get higher load.   
  • Since the new page logically fits in the middle of the content, but physically located in the next available free page, they are not aligned together. (Unless you delete at least one data page in the appropriate page which may give you some space to place the new page.  This is an extremely rare scenario.) This is called external fragmentation.  It leads to unnecessary rotation of disk which means more time to read the data.

Controlling Page Splits

Page splits cannot be stopped completely but there are two important aspects on controlling page splits and its effect.
  • First of all periodically re-indexing the fragmented tables/indexes will remove fragmentation. It should be part of the job of the DBA
  • Another preventive action is to control shrink database or shrink file operation.  While this operation can reduce the size of the database, it increases fragmentation. (I have written about it before. Visit this link: http://preethiviraj.blogspot.com/2011/11/will-performance-improve-if-database.html In case you are in absolute need of shrinking the database, you may have to rebuild the clustered index again)

The other side of controlling page splits is part of the design of the database.
A couple of things could be done here:
  • By introducing the identity column, a numeric column getting sequence value or Sequential GUID as the clustered index. Since this will increase sequentially, it will prevent the inserts from causing page splits. Generally, user defined clustered keys have a tendency of not incrementing naturally, unless it is similar to serial number.
  • Minimize the use of variable length columns.  Fixed length column pad additional space so they could be used during updates. I have seen a similar approach in no-sql (or non-relational to correctly say) implementations where programs manually pad extra space for future use.
  • I having a proper fill factor:  Generally keeping the fill factor of the index near 80% -90% is a good idea, but the number should be verified in each environment.
Final thoughts
I need to warn about one thing about using some of the methods/scripts used in this series. Some of the scripts/|Methods are are resource intensive and may slow down the instance.For example, if you try to track operations in transaction logs to identify page splits, you may make the application to slow down. So it should be used as the last resource. Also be aware of using some of the undocumented feature on a longer run.  Microsoft may change them at any time.  Additionally, it could lead to losing the support of Microsoft on certain issues/problems too. Please check with Microsoft about what you are trying to use and the impact.

With this note, this series comes to an end. Hope it has provided a very good background to all my readers.

References and further reading:

I went through a lot of articles and my own tests before producing this series. I believe I have mentioned all the possible references at the appropriate places, But there could be certain citation I could have missed. It is because, at the time of reading I do not think about writing about any point mentioned there. It is later, I get the idea of writing about something, I have the memory about the content, but it is in my words. I may use my words to search and lose the actual article. I have not mentioned various parts of books online here, but it is one of my sources always.

Another good article but written some time ago.  I was under the impression that it was written by Brad McGehee (from Red-gate.com when he was handling sql-server-performance.com. I may be wrong. The current link simply says it was written by Admin.) The title says the purpose.

One of the greatest articles from a SQL Server Guru, Paul Randal, which explains about the cost of Page splits in terms Log operations. This is the article which provoked my curiosity which led to this series. I will call this a must read, if you are really interested in in depth knowledge.

Talks about page splits and transactions. nice article to read.

Apr 12, 2012

Page Split 3.5: Advantage of Extended Events in SQL Server 2012

[Updated 13th April 2012 12:27 am] Note: Jonathan Kehayias (Blog)  pointed out that some of his work is being referenced but not cited. Since it is a series, I was under the impression that all references could be published at the end; but Jonathan updated me that it should be there with each post. So it is mentioned at the end of this post. Regret for the anxiety caused.  Thanks Jonathan. I will be updating other posts int his series.
Unlike previous versions, SQL Server 2012 provides more details on page splits. You you all would have seen in the first part of the series, that page split can be an event, where a new page added to the table.
The page split we should be worried is an existing page splits to accommodate either a new or an existing row.
In SQL Server 2012 extended events, we can track the database, the sql statement which caused the page split and even the type of page split.

Lets use this example: Lets select the page split test, extended event session and (if it is stopped , start and) open the watch live data. (There could be a delay in showing the events int he screen, but they are captured live)
Now execute the following script

IF DB_ID('page_split') IS NULL
       create database page_split;
use page_split
IF Object_Id('dbo.Note') IS NOT NULL
drop table Note

NoteText char(1300) NOT NULL

Insert into dbo.Note Values (1,REPLICATE('a', 1300));
Insert into dbo.Note Values (2,REPLICATE('b', 1300));
Insert into dbo.Note Values (3,REPLICATE('c', 1300));
-- The row with NoteID = 4 is missing here
Insert into dbo.Note Values (5,REPLICATE('e', 1300));
Insert into dbo.Note Values (6,REPLICATE('f', 1300));
Insert into dbo.Note Values (7,REPLICATE('g', 1300));
Insert into dbo.Note Values (4,REPLICATE('d', 1300));

When the code is executed, we can see three page split related information getting into our watch window.
(Lets add the following columns from details section to the table view: database_name, SplitOperation, new_page_file_id, new_page_page_id, page_id, sql_text)

The watch window will look like this:

Apr 11, 2012

Page Splits Part 3 – Identifying through Extended events

My sincere apologies to all in delaying producing part 3 of the series.  I got into a set of urgent tasks which prevented me from updating my blog.

SQL server has one more mechanism which allows us to identify page splits.  This feature called extended events introduced with SQL server 2008 and According to Microsoft sources,  eventually, it will replace SQL trace.

SQL Server 2012 extended the feature not only by introducing new events but also by introducing a new UI for capturing extended events
By using extended events we can identify the database where the page split has occurred.  As you may remember, by using performance monitor we couldn’t identify the database where the page split has occurred.

In addition, extended events provide more information, which we couldn’t gather before.

Let us have a walk-through of identifying page splits through extended events
In SQL Server 2012, extended events (commonly referred as XE) management is part of the management tab. Under management there is a section for extended events, and it has a folder named sessions.  There are two sessions already. System-health is the XE equivalent of default trace.
Let us right click and select new session, as shown in picture 1. (New session wizard too will take you to the same destination, but to understand the basics, let us bypass the wizard).