Nov 2, 2011

Will The Performance Improve If The Database Size is Smaller?

A few days back, when I was speaking at the SSLUG (SQL Server Sri Lanka User Group) meeting on "Managing VLDB" (VLDB stands for Very Large Data Bases), a question came up about improving the performance of a VLDB. One participant suggested that we could improve performance by shrinking the database. I noted it then and answered later, but the answer could be useful for many, so I am sharing it here.
Argument: Large databases may take a longer time to execute a query because scan operations may have to read more pages.
Answer: If you have an operation that scans the entire database (generally a maintenance or troubleshooting related operation), then yes, you may have to read more pages. However, when you run a query against tables, you touch only the pages allocated to those tables and their indexes. If the table is clustered, each page has a pointer to the next page, and read operations follow these pointers. Performance is better when reads move in the forward direction only. Free pages at the end of the database are never touched at all. But if you have free pages in between, a shrink operation moves pages from the end of the file into that free space. This makes the physical order different from the logical order and forces backward reads. That is the major performance concern.
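To see whether the logical page order and the physical page order have diverged, you can query the sys.dm_db_index_physical_stats DMV. This is only a sketch; run it in the database you want to inspect:

```sql
-- Logical fragmentation: how often the next logical page is not the
-- next physical page. High values mean many out-of-order page reads.
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),            -- current database
         NULL, NULL, NULL,   -- all objects, indexes, partitions
         'LIMITED')          AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Running this before and after a shrink operation is a simple way to observe the fragmentation the shrink introduces.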
Argument: A shrink operation reduces the number of pages occupied by an object.
Answer: NO. Please note this point: a shrink operation does not remove the free space available within a page, nor does it compress the data. The free space within a page can occur for many reasons:

  1. If the clustered index was created with a fill factor between 1 and 99, you expect some free space within each page when the index is created or rebuilt.
  2. When a page split happens due to updates, some space is left in the old page.
  3. When a row within a page is deleted, free space is created. (SQL Server does not initialize every bit of the deleted row to 0; rather, the entry is removed from the index and from the row-offset marker within the page, and SQL Server treats that area as free space.)
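You can also measure how full the pages of an object actually are, using the same DMV in SAMPLED mode (which reads more pages than LIMITED). A sketch, where dbo.YourTable is a placeholder for your own table:

```sql
-- avg_page_space_used_in_percent shows the in-page free space
-- described above; a shrink operation will not change this number.
SELECT index_id,
       avg_page_space_used_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),
         OBJECT_ID('dbo.YourTable'),  -- placeholder table name
         NULL, NULL,
         'SAMPLED');
```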
Shrink operations work at the page level, not at the object or row level. If there is a free page in the middle of the file, the last non-free page is moved into it and that last page is marked as free. The pointers in the pages are then changed accordingly. At the end, the free pages at the end of the file can be truncated if the options are set properly. The internal fragmentation (the fragmentation that happens within a page due to page splits, deletes, etc.) remains untouched.
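As a sketch of the options mentioned above: if you only want to release the free space at the end of the file, without moving any pages (and therefore without reordering anything), TRUNCATEONLY is the safer choice. The file and database names below are placeholders:

```sql
-- Releases only the unused space at the end of the data file;
-- no pages are moved, so no new fragmentation is introduced.
DBCC SHRINKFILE (N'YourDataFile', TRUNCATEONLY);

-- In contrast, this moves pages from the end of the file into free
-- space in the middle, the operation that reverses the page order:
-- DBCC SHRINKDATABASE (N'YourDatabase', 10);  -- target 10% free space
```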
Then what is the mechanism to remove this fragmentation? The answer is to rebuild the clustered index. As the data is ordered by the clustered index key, the rebuild can rewrite the data pages in clustered key order. This is why rebuilding a clustered index is a resource-consuming operation.
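A minimal sketch of such a rebuild, assuming a table dbo.YourTable with a clustered index named PK_YourTable (both names are placeholders):

```sql
-- Rewrites the data pages in clustered key order and re-applies the
-- fill factor, removing both internal and logical fragmentation.
ALTER INDEX PK_YourTable ON dbo.YourTable
REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON);
```

On editions that support online index operations you can add ONLINE = ON to keep the table available during the rebuild, at the cost of extra resources.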