Let us consider the table Sales.SalesOrderHeader in AdventureWorks. This table has two columns for us to consider: SalesOrderID and CustomerID
Assume that you want to retrieve the rows for a SalesOrderID or for a CustomerID. That means, if you know the SalesOrderID you will retrieve only one row of data, otherwise, multiple rows of data.
This table has 700 data pages (5600 KB) and 360 index pages (2880 kb) where IX_SalesOrderHeader_CustomrID uses 43 pages this has the depth of 2 (Root page and leaf only).
This information could be obtained using the following:
EXEC sp_SpaceUsed 'Sales.SalesOrderHeader'
select * from sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Sales.SalesOrderHeader'), INDEXPROPERTY(OBJECT_ID('Sales.SalesOrderHeader'),'IX_SalesOrderHeader_CustomerID','IndexID'),1,DEFAULT)
How assume that you have written a stored procedure to handle this.
CREATE PROCEDURE Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID
(
@SalesOrderID int = NULL,
@CustomerID int = NULL
)
AS
SELECT SalesOrderID,
CustomerID,
SubTotal,
TaxAmt,
Freight
FROM Sales.SalesOrderHeader
WHERE
(@SalesOrderID IS NOT NULL OR @CustomerID IS NOT NULL) AND
(@SalesOrderID IS NULL OR SalesOrderID = @SalesOrderID) AND
(@CustomerID IS NULL OR CustomerID = @CustomerID)
RETURN 0
GO
The first condition makes sure that we will not accept the parameters of the first case. Now we have the procedure ready and let us test with our test parameters. (Please click on pictures for enlarged view)
Now, as the procedure is ready let us test the stored procedure
These are the test cases: (each one is represented with a procedure call)
Case 1: When both parameters are null (either explicitly supplied as null or not supplied at all.
Case 2: First parameter has a value and the second Parameter is null
Case 3: First parameter is null and second parameter has a value
Case 4: Both parameters are supplied
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID 50000
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID NULL, 27116
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID 50000,27116
Based on the code we have the execution plan will be the same for all parameters. We also want to capture the IO cost involved in each execution.
Before checking the execution, let us have a look on the indexes of the table. We have the clustered index on SalesOrderID and non clustered index in addition to other, on CustomerID. That means I have a mechanism to select the CustomerID to get the data rows needed. However, as not all the columns are indexed, SQL Server has to look up the data for other columns. This method is formally called as bookmark lookup, but now it is replaced with key lookup/rid lookup or sometimes clustered index scan. But when we have the SalesOrderID we can retrieve the data directly using Clustered index seek.
Let us take a look on the execution plans…
All the queries give the same execution plan and their cost is same. Irrespective of what data you receive in the results, it does a clustered index scan. Now, just to caution you, scan does not always mean that SQL server reads all the pages from top to bottom (or vice versa), we need to check the number of pages read by using SET STATISTICS IO ON statement (just before the execution).
(0 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(1 row(s) affected)
Table 'SalesOrderHeader'. Scan count 1, logical reads 704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(2 row(s) affected)
Table 'SalesOrderHeader'. Scan count 1, logical reads 704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(1 row(s) affected)
Table 'SalesOrderHeader'. Scan count 1, logical reads 704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
The first query does not do anything on the code so, no table is scanned. But the execution plan created includes all the possible scenarios and work for all cases. The danger here is it uses the same plan and scans the entire table even when few rows are needed and they could be achieved through index seek.
Now let us remove the cached plan by using DBCC FREEPROCCACHE and execute the queries in different order:
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID 50000
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID 50000,27116
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID NULL, 27116
EXEC Sales.SalesOrderHeader_get_BySalesOrderIDORCustomerID
Now the execution plans are different
There is a difference in the number of pages scanned too:
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(1 row(s) affected)
Table 'SalesOrderHeader'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(1 row(s) affected)
Table 'SalesOrderHeader'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(2 row(s) affected)
Table 'SalesOrderHeader'. Scan count 1, logical reads 94425, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(0 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
/code>
Please note that the third query read 94,425 pages of data/index (each page is of 8 kb size;it makes the total reads to 755 MB) even though the same execution plan used for all queries.
The reason behind is the execution plans are made during their first execution and they are optimized for that query. Unless it recompiles, the same plan will be used. When we use a single select statement for all scenarios, we make SQL Server to use the same execution plan many times, even though it is not the optimized way always.
So what is the best way of handling optional parameters?
The best method is to handle them in code, create two procedures one for each scenario. In case that is not possible, (if you are trying to improve performance of a piece written some time back and you do not have the luxury of changing the application code),branch the code within stored procedure.
This is the method I use:
ALTER PROCEDURE [Sales].[SalesOrderHeader_get_BySalesPersonID_OR_CustomerID]
(
@SalesPersonID int = NULL,
@CustomerID int = NULL
)
AS
IF @SalesPersonID IS NOT NULL AND @CustomerID IS NULL
SELECT SalesPersonID,
CustomerID,
SubTotal,
TaxAmt,
Freight
FROM Sales.SalesOrderHeader
WHERE
SalesPersonID = @SalesPersonID
ELSE IF @SalesPersonID IS NULL AND @CustomerID IS NOT NULL
SELECT SalesPersonID,
CustomerID,
SubTotal,
TaxAmt,
Freight
FROM Sales.SalesOrderHeader
WHERE
CustomerID = @CustomerID
ELSE IF @SalesPersonID IS NOT NULL AND @CustomerID IS NOT NULL
SELECT SalesPersonID,
CustomerID,
SubTotal,
TaxAmt,
Freight
FROM Sales.SalesOrderHeader
WHERE
CustomerID = @CustomerID AND
SalesPersonID = @SalesPersonID
ELSE
SELECT SalesPersonID,
CustomerID,
SubTotal,
TaxAmt,
Freight
FROM Sales.SalesOrderHeader
WHERE
0 = 1
RETURN 0
GO
Now execute all the test cases in different orders. You will get the execution plan optimized for each case. SQL Server keeps the execution plan for each statement. When you have multiple statements in a procedure one for each option, you will force SQL Server to create a plan for each statement optimized for that statement.
Try and see whether you get the optimized query for each execution :)
No comments:
Post a Comment