Sep 29, 2009

Active portion of Transaction Log

A transaction is considered active in the following circumstances.
  1. In any recovery model, while a transaction is not completed (neither committed nor rolled back), it is considered active.
  2. In the full or bulk-logged recovery model, even committed transactions are considered active until a log backup is taken. In the simple recovery model, a transaction automatically becomes inactive when it is committed.
Active transactions keep their space in the log; there is no way to reclaim that space without data loss. It is always the inactive part of the log which gets cleared when BACKUP LOG ... WITH TRUNCATE_ONLY or DBCC SHRINKDATABASE is executed. (By the way, from SQL Server 2008 onwards, BACKUP LOG WITH TRUNCATE_ONLY is no longer available; Microsoft recommends changing the recovery model to simple instead.)
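On SQL Server 2008, where WITH TRUNCATE_ONLY is gone, the equivalent (with the same data-loss caveat) is a quick switch of the recovery model. A minimal sketch; MyDB and the backup path are placeholders:

```sql
-- WARNING: this throws away log records and breaks the log backup chain.
ALTER DATABASE MyDB SET RECOVERY SIMPLE;
-- Switch back and take a full backup to restart the chain.
ALTER DATABASE MyDB SET RECOVERY FULL;
BACKUP DATABASE MyDB TO DISK = N'D:\Backups\MyDB.bak';
```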

I have seen log files growing even in the simple recovery model, without many large transactions happening either. I couldn't figure out the reason, and I didn't bother much as it never became a big enough issue to investigate.

Recently I read an article which explained the reason: during the backup process, whatever is written to the transaction log is kept active until the end of the backup process. So if the backup job takes a long time, or if there is a large volume of transactions (they need not be large transactions; many small ones would do the job!), the entire volume of the transaction log is kept active until the backup job completes.
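You can watch this for yourself on SQL Server 2005 and later: sys.databases reports why the log cannot currently be cleared, and during a long-running backup the reason shows up as ACTIVE_BACKUP_OR_RESTORE. (The database name below is a placeholder.)

```sql
SELECT name,
       recovery_model_desc,
       log_reuse_wait_desc   -- e.g. ACTIVE_BACKUP_OR_RESTORE, LOG_BACKUP, NOTHING
FROM sys.databases
WHERE name = N'MyDB';        -- placeholder database name
```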

Wanna read more? Try this: Understanding SQL Backups
This article not only gives the reason for the growth of the transaction log, but also an in-depth understanding of how backups are taken in SQL Server. A good article for all those who need in-depth knowledge of SQL backups!

Sep 26, 2009

Database Layer Testing

Recently I faced a problem: we had one old, complex procedure consuming more server resources than anything else. It had a high number of executions, high duration and high reads, and it constantly appeared in our weekly reports. At one point, we decided to rewrite the procedure. Even though reading the code and understanding the logic was a big challenge, the biggest challenge was testing it. While we were fine-tuning the procedure and making it execute faster, the Dev and QA gurus jumped in and identified the areas where this procedure was called. As it is called by a few core components, which are in turn used by many systems, the QA team said it would take a tremendous effort to test all the subsystems.
We came up with a new strategy. I should admit that even though it is yours truly who came up with the idea, I am sure it came from above; I am not smart enough to get something like this just like that.
As the core components could not be tested directly, we decided to test the procedure directly: for every possible combination of parameter values represented in the table, we would execute both the old and new versions of the procedure and compare the results. Unfortunately, the combinations of parameters and their representation in the database ran into the thousands.
To overcome the time needed for testing we decided to automate it. This is how we did that:
  • Create two tables with the same set of columns. The column names, data types and order should match the output of the procedure.
  • Add additional columns to hold the parameter values.
  • Add an additional column to hold the return value.
  • Add an additional column for a row id (we used a bigint identity).
    The return value and row id are the only columns with NOT NULL set.
  • Create a script which will do the following:
    1. Execute the old and new procedures for all possible parameter values and store the results.
    2. Immediately update the table with the return value and parameter values.
    3. Finally, compare the tables for any differences. SQL Compare or TableDiff could be used for this purpose.
For this method to work two conditions should be met:
  1. The signatures of the procedures should be identical. The names, order and data types of the parameters (input) and results (output) of the new procedure should be identical to those of the old one.
  2. The results should be deterministic; that is, the definition of the output should be static. Whatever the values of the parameters, the column names, data types and ordinal positions should remain the same.
We wrote a script that generated the execute statements, which inserted the results into the respective tables. At the end we compared both tables and they were exactly the same.
So we concluded that the results were identical. We were able to present the results, and management was convinced that we did not need additional QA testing. This saved weeks of QA effort.

Do you think you need a script which can do that? Unfortunately I do not have a script that can do that magic for all procedures, but I am ready to help you create one to meet your needs. Please drop me a note :)

Sep 21, 2009

IIS Database Manager

Microsoft is trying its very best to make the life of developers easier, especially on the database side. Earlier it came out with the visual database tools, then with LINQ, and now it has released the Release Candidate (RC) of IIS Database Manager. This product allows developers and system administrators to manage local and remote database servers easily and quickly. It can run on any Windows Server 2008 machine with IIS 7 (and IIS Manager) installed. It is a product specific to SQL Server and uses SQL Server Native Client 2008. (If you don't have that right now, don't worry; during installation it will ask for your permission to download it.)

These are some of the links which may help you read further about the product:

Using IIS Database Manager:

Download IIS Manager Release Candidate (32 bit):

Download IIS Manager Release Candidate (64 bit):

The above links lead to many more pages worth reading.

I am planning to use it in the future if it seems productive for our environment. We are starting the evaluation…

Sep 18, 2009

Handling Multiple Jobs - Revisited

A few days back I wrote about organizing multiple backup jobs so that they will not overlap (to avoid disk and memory contention) and won't leave the server idle for long. Even though my post was about backup jobs, after talking to many people, I decided to generalize the issue.
A recap of the scenario here:

We have multiple long-running jobs on our servers. Even though they are scheduled to run at different times, often a job starts when another job is still executing. There could also be idle periods where no job is running. As these jobs take considerable resources, we need to complete all the tasks within the available time window.

After writing the post,  there were two solutions suggested:
  1. Remove the schedules from all your jobs. Manually plan the order of the jobs. Add a step to the first job to call the second job, to the second job to call the third job, and so on. Schedule the first job only.
  2. Remove the schedules from all your jobs. Manually plan the order of the jobs. Create another job which calls each job in turn, checks for its completion and starts the next one until all jobs are executed. Schedule this new job.
Each method has its pros and cons.

Method 1:
This is a great method if you have many jobs of different types. It can be implemented simply by removing the schedules and adding an additional step to each job (you also need to carry over the previous job's outcome). Either the code should be clever enough to find the next job using a clear methodology, or else, when a job is deleted or disabled, the following jobs may start to fail and nothing else will execute until someone changes the next-job link. Additionally, we may need a mechanism for adding new jobs to the chain.
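For example, the chaining in Method 1 boils down to one extra final step per job, something like this (the job names are hypothetical):

```sql
-- Added as the final step of 'Nightly Job 1'; sp_start_job returns
-- immediately (asynchronous), which is fine here since it is the last step.
EXEC msdb.dbo.sp_start_job @job_name = N'Nightly Job 2';
```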

Method 2:

This really works well for backup jobs. If you have many databases and you want to create backups, it is great to have one maintenance job that creates the backups one after the other. In fact, SQL Server's database maintenance plans do something similar. However, for jobs of multiple categories this is not the best option, as different jobs may have different requirements and different notification methods.
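A controller along the lines of Method 2 could poll the Agent tables between starts. A rough sketch (the job name is hypothetical; note that sysjobactivity keeps history rows, so production code should restrict this to the latest Agent session):

```sql
DECLARE @job sysname;
SET @job = N'Backup - DB1';                  -- hypothetical job name
EXEC msdb.dbo.sp_start_job @job_name = @job;

-- Wait until the Agent reports the job finished before starting the next.
WHILE EXISTS (SELECT *
              FROM msdb.dbo.sysjobactivity ja
              INNER JOIN msdb.dbo.sysjobs j ON j.job_id = ja.job_id
              WHERE j.name = @job
                AND ja.start_execution_date IS NOT NULL
                AND ja.stop_execution_date IS NULL)
BEGIN
    WAITFOR DELAY '00:00:30';                -- poll every 30 seconds
END;
-- ...then sp_start_job for the next job in the planned order.
```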

Users need to weigh these pros and cons before deciding on the best solution.

Sep 15, 2009

Handling Optional Parameters

I have spoken at different events and written in many forums about how optional parameters could be handled. As I have seen this issue again and again, I thought of writing a blog post on it. In this post I will talk about how mismanagement of optional parameters can hurt performance.
Let us consider the table Sales.SalesOrderHeader in AdventureWorks. This table has two columns for us to consider: SalesOrderID and CustomerID.

Assume that you want to retrieve rows either for a SalesOrderID or for a CustomerID. That means, if you know the SalesOrderID you will retrieve only one row of data; otherwise, multiple rows.
This table has 700 data pages (5,600 KB) and 360 index pages (2,880 KB), of which IX_SalesOrderHeader_CustomerID uses 43 pages with a depth of 2 (a root page and the leaf level only).
This information could be obtained using the following:

EXEC sp_SpaceUsed 'Sales.SalesOrderHeader'

SELECT *
FROM sys.dm_db_index_physical_stats(
    DB_ID(),
    OBJECT_ID('Sales.SalesOrderHeader'),
    INDEXPROPERTY(OBJECT_ID('Sales.SalesOrderHeader'), 'IX_SalesOrderHeader_CustomerID', 'IndexID'),
    1,
    DEFAULT)

Now assume that you have written a stored procedure to handle this.
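A typical first attempt looks something like this (the procedure name is mine, not from AdventureWorks): a single catch-all WHERE clause covering both optional parameters. The trouble is that one cached plan then has to serve both the single-row seek and the multi-row lookup.

```sql
CREATE PROCEDURE dbo.GetSalesOrders           -- hypothetical name
    @SalesOrderID int = NULL,
    @CustomerID   int = NULL
AS
    SELECT SalesOrderID, CustomerID, OrderDate
    FROM Sales.SalesOrderHeader
    WHERE (@SalesOrderID IS NULL OR SalesOrderID = @SalesOrderID)
      AND (@CustomerID   IS NULL OR CustomerID   = @CustomerID);
```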

Sep 14, 2009


NOT IN vs NOT EXISTS vs LEFT JOIN

This post analyzes the differences between NOT IN, NOT EXISTS and LEFT JOIN, and finds which suits where.
Let's consider this scenario (the example is taken from AdventureWorks): find all customers who do not have sales. It could be written in three ways:

-- Method 1: Using NOT IN
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE CustomerID NOT IN
    (SELECT CustomerID
     FROM Sales.SalesOrderHeader)

-- Method 2: Using NOT EXISTS
SELECT CustomerID, AccountNumber
FROM Sales.Customer C
WHERE NOT EXISTS
    (SELECT *
     FROM Sales.SalesOrderHeader S
     WHERE S.CustomerID = C.CustomerID)

-- Method 3: Using LEFT JOIN
SELECT c.CustomerID, c.AccountNumber
FROM Sales.Customer c
LEFT JOIN Sales.SalesOrderHeader soh
    ON soh.CustomerID = c.CustomerID
WHERE soh.CustomerID IS NULL

All of the above queries give the same results, show the same execution plan, take the same amount of IO and execute in almost the same time. However, this example is quite simple: the CustomerID column does not contain any NULL values. When NULL values come into play, the results may vary:
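A small repro of the NULL behaviour (table variables here, nothing from AdventureWorks): id 2 has no match on either side, yet NOT IN returns nothing once the subquery contains a NULL, because `2 <> NULL` evaluates to UNKNOWN:

```sql
DECLARE @customer TABLE (id int NOT NULL);
DECLARE @orders   TABLE (id int NULL);
INSERT INTO @customer VALUES (1);
INSERT INTO @customer VALUES (2);
INSERT INTO @orders VALUES (1);
INSERT INTO @orders VALUES (NULL);           -- one NULL changes everything

SELECT id FROM @customer
WHERE id NOT IN (SELECT id FROM @orders);    -- no rows: 2 <> NULL is UNKNOWN

SELECT id FROM @customer c
WHERE NOT EXISTS (SELECT *
                  FROM @orders o
                  WHERE o.id = c.id);        -- returns 2, as intended
```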

Sep 10, 2009

Update on Getting Table Row Count (For Replicated Data)

After seeing my last post, I got a few requests:
  • Can't we resolve the schema issue with sp_SpaceUsed?
  • Can we modify the query to handle multiple subscriptions a bit more efficiently?
Let me explain the second issue first. The query in the post calculates the row count each time for each subscription. It could be written more efficiently for multiple subscriptions.

Sep 9, 2009

Getting The Table Usage

Recently I had an issue with replication. Suddenly, for some reason, some of the replications failed but no error was reported in Replication Monitor. We found out only when developers complained that the data they entered was not reflected properly in all places. We identified that the issue was with replication and quickly checked the tables they mentioned. There was a difference in the number of rows between the source table and the destination table. We immediately wrote some scripts to transfer the remaining data, dropped the subscription and created it again.
When replication started working (okay, it didn't start working until we found out that there was a security issue and corrected it), I wanted to check which other tables were affected.

As I knew the publication database and the subscription database, I could get the article list from the sysarticles table in the publication database.

There were three methods before me to get the row counts.
  1. Selecting data using the SELECT COUNT(1) FROM <table> method. The result is accurate, but it consumes a lot of memory and time, as it reads the entire clustered index (a full clustered index scan) or does a table scan.
  2. Reading the rows column from sysindexes for the clustered index or heap. This is the fastest method, but there is a possibility of inaccuracy. (Even when I ran the test I got some slightly inaccurate results.) This method is fairly okay if you want an approximate row count, but if you need the exact count, it is not recommended.
  3. Using the sp_SpaceUsed system procedure. This method also does not give accurate numbers; additionally, as it is a stored procedure, it does not give the flexibility to add additional columns or remove unnecessary ones. Also, even though this stored procedure accepts the schema_name.table_name format as input, only the table name is mentioned in the output. This can cause confusion if your database has multiple tables with the same name (in different schemas).
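For reference, method 2 is a one-liner against sysindexes (using AdventureWorks as the example table); on SQL Server 2005 and later, sys.dm_db_partition_stats gives the same approximate count more cleanly:

```sql
SELECT rows
FROM sysindexes
WHERE id = OBJECT_ID('Sales.SalesOrderHeader')
  AND indid IN (0, 1);    -- 0 = heap, 1 = clustered index
```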
I went with the first method to get the row counts and resolved the issue, as I had the weekend to work on it. (Remember, last weekend was a long weekend.) I already had a script to work with and added a few things to complete it. Here is the script, with some modifications, for you all to use.

-- Run in the publication database: generates a row-count comparison query per article.
SELECT
'SELECT ''source'' AS Table_Location,
        ''' + OBJECT_SCHEMA_NAME(a.objid) + ''' AS SchemaName,
        ''' + OBJECT_NAME(a.objid) + ''' AS TableName,
        COUNT(*) AS row_count
 FROM ' + OBJECT_SCHEMA_NAME(a.objid) + '.' + OBJECT_NAME(a.objid) + '
 UNION ALL
 SELECT ''destination'' AS Table_Location,
        ''' + a.dest_owner + ''' AS SchemaName,
        ''' + a.dest_table + ''' AS TableName,
        COUNT(*) AS row_count
 FROM ' + s.dest_db + '.' + a.dest_owner + '.' + a.dest_table
FROM dbo.sysarticles a
INNER JOIN dbo.syssubscriptions s
    ON a.artid = s.artid
Is this useful?

Sep 8, 2009

MTI Technology Forum

Okay, now a slightly different entry here. MTI Consulting is organizing a technology forum for the IT sector. This has nothing to do with SQL Server specifically, but I thought of adding it, as some of the top guys (yes, guys only; I couldn't see a single female speaker) from IT firms in Sri Lanka are included as speakers at this event.
I got the invitation today, as my sister-in-law works there. From what I hear, and based on the speakers, this forum is about connecting IT to business, or looking at IT from the business perspective. So it may not be a tech session even though they call it a technology forum. However, if any of you think that you or your company should participate, please contact MTI Consulting. It is a good forum for management.

Sep 7, 2009

Looking For More Security Vulnerabilities in Database Code

A few days back I read a blog post from Kimberly L. Tripp, one of the great teachers on SQL Server. She opened the question of ways to identify vulnerable code written inside stored procedures. Even though she brought up both dynamic SQL and execution context, the discussion went in only one direction: dynamic SQL.
She proposed two main areas to search:


WHEN sm.definition LIKE '%EXEC (%' OR sm.definition LIKE '%EXEC(%' THEN
    'WARNING: code contains EXEC'

WHEN sm.definition LIKE '%EXECUTE (%' OR sm.definition LIKE '%EXECUTE(%' THEN
    'WARNING: code contains EXECUTE'

As the discussion proceeded, two major additions came up: one on handling white space and commented lines, and the other on sp_ExecuteSQL.

Does that mean that if we take care of the above vulnerabilities our code is secure? Unfortunately, we have many other possibilities to look into.

In fact, when I saw Kimberly's blog, I thought of adding some additional vulnerabilities there as a response, but a few factors prevented me: 1. My answer seemed slightly too large for a response. 2. The original post was made months ago. So, here are the things I found.
I suggest you all read her blog and the responses:
Looking for security vulnerabilities in database code

Well, SQL Server 2000 prior to SP3 had an undocumented extended stored procedure called xp_ExecResultset, which allowed a SQL statement to be constructed using another query. In a later service pack this was replaced with a T-SQL stored procedure. With SP3, another procedure named sp_ExecResultSet was added (which did nothing more than call the previous procedure).

Even though the extended procedure was removed, it is not impossible to take the same DLL used for xp_ExecResultset in SQL Server 2000 and create a new extended procedure from it.

Of course, no one can take the same DLL and attach it to SQL Server 2005 due to architectural changes, but that does not mean no one can attach a DLL and call dynamic SQL. Actually, calling dynamic SQL is simpler than before due to the introduction of CLR integration. That means someone can write code that simply runs an ad hoc query (or similar) statement based on the parameters passed.

In SQL Server 2005, there is a stored procedure named sp_sqlExec which accepts a SQL statement or variable as a parameter and simply executes it.

Additionally, there is another undocumented procedure, sp_execute, which accepts @handle (int) as a parameter. This refers to a cached plan in memory prepared by sp_prepare (another undocumented procedure). That means someone could call sp_execute and pass the handle of a dynamic statement (which would have been executed before, too).

Interested in finding more security vulnerabilities? I see a lot of other places to look, like xp_cmdshell, sqlcmd, jobs (which can run ActiveX and WMI scripts, OS commands and T-SQL statements) and alerts (where you have a chance to run a PowerShell script, too).

So, be wise and do a security audit on the system.


Sep 6, 2009

Free Online Training on SQL Server

How many of you would like to join a really technical deep-dive seminar? Well, if you are in the US you have a good chance of joining one after paying the registration fee. But those who are in other places now have a chance to attend one online, for free. Quest Software, a leading provider of tools for Windows administration, virtualization and, of course, SQL Server, is organizing one.

This training is mainly on the administration of SQL Server, where Resource Governor, Profiler and other Microsoft tools related to performance optimization will be discussed. The afternoon sessions will be too late for most Sri Lankans, but the morning session is at a good time to attend. (Do you sleep before 9.30 pm? Can you stay up till 12:30 am?)

Interested? Have a look at the agenda here:

Recently PASS had a 24-hour free online seminar on various subjects. I was able to attend a few sessions, but due to some issues with my online calendar, I couldn't attend them all. They were really good ones, and I was able to gain much from them.

For those who missed them, there is a chance to view them later. The sessions are being archived, and they'll be available for replay after the PASS Summit in early November.


Sep 5, 2009

What SSWUG.ORG Says About My Article

Hope you've read my blog post about (Mis) Understanding Scans and Seeks. This is about my article recently published at SQL Server Performance. As the title was "Understanding Scans and Seeks", many would have thought it was all about the basics of scans and seeks and simply skipped it. However, the SQL Server Worldwide User Group (SSWUG.ORG) recognized the value of the article and added it to its collection with a simple one-line comment.

Thinking about reading it again, or at least now?


Scheduling Database Backups: Need a Solution

Recently we had this issue.

We have a huge database server with 2.5 TB of database space. The server has 8 GB of RAM and two dual-core processors. It also has a not-so-good storage device. Due to the limitations in RAM and hard disk (the disks are really slow), we were facing a problem taking full backups.

Initially, the database server had a set of jobs, each starting on the 5th of every month at 2:00 am. This created a whole heap of issues, including too many (is 59 a big number?) backup jobs starting at the same time, each one blocking the others due to the memory crunch and hard disk bottleneck.

So we decided to schedule the tasks one after the other. We also scheduled them to happen on a weekend of the month (for example, the first Saturday). We did a calculation on hard disk speed and database size and scheduled the jobs to start at different times. For the first month everything worked nicely. But when the databases started growing, it forced us to reschedule the backup jobs. Can we reschedule the jobs every month? Currently we have 29 servers to maintain, and some have more databases. If we don't reschedule the jobs we may face one of the two scenarios below (actually, both):

  1. Before a database backup is completed, the second one starts and creates memory and disk contention.
  2. One backup completes earlier than expected but the next backup waits until its scheduled time comes.
We also had another issue: most of the databases are inter-connected, so the backup jobs need to be completed one after the other. As some of the databases are really huge, the backup jobs pass the 24-hour time frame. That created another issue for scheduling backups.

How do we solve this?

Please come up with your solutions.


Sep 4, 2009

Develop Database For Future

  • Do you write T-SQL statements which do not end with a semicolon?
  • Do you use system tables or system procedures in your code?
  • Does your code contain deprecated ANSI settings?
  • Do you have text, ntext or image columns in your databases?
Then your database is in danger. The list above is just a few items from the massive list of features marked for deprecation. The best option is to start changing them to the new standards now. Add them to your coding standards, and whenever you need to change the code to add functionality or remove a bug, change the things marked for deprecation as well. To my surprise, the vardecimal storage format, which came with SQL Server 2005 SP2, is also marked for deprecation.

Check the complete list here:

When talking about writing for the future, we also need to consider reserved keywords. Microsoft keeps adding new reserved keywords to the list; words like "LineNo" and "National" are reserved keywords already. The recommended practice is to enclose such identifiers in square brackets (for example, [Description]). Unfortunately, a word which is not a keyword now can become one in the next release. So how can we prevent this? Gogula gave a nice answer in his recent presentation: "Always enclose with square brackets".
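If you build identifiers dynamically, the built-in QUOTENAME function adds the brackets for you:

```sql
SELECT QUOTENAME('Description');   -- [Description]
SELECT QUOTENAME('LineNo');        -- [LineNo]
```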

Read more about it here:

Finally, beware of less-used features. They are the most likely candidates for deprecation. You all know what happened to Notification Services, right?

Sep 2, 2009

Discount Offer for Microsoft Certification

Prometric is offering a 15% - 25% discount on many certifications, including many from Microsoft.
The offer covers most, if not all, of the SQL Server related certificates.
If Prometric has your information, it will have sent you a mail about this.
If not, just visit their site and provide your basic information; they will mail the voucher to you.

The only catch is that you need to complete the exam before December 31. (Is that too far away? Set yourself an internal target of October 31.)
Good luck!
