Feb 23, 2009

Interesting Replication Issue

Recently I saw an interesting replication Issue: Replications started failing due to deadlocks.
In our environment we have multiple transactional replications setup with multiple articles published and consumed by different subscribers. Some time back, we made a decision to have different publication for each article, instead of combining multiple articles into one publication. This option gave us free hand in modifying any article. When you recreate the publication it is only one article which gets affected.

When this problem came up we checked the replication error log. We found out that the error is with distributor, where the replication failed due to deadlock when it wanted to update a system table in the subscriber. The table name is MSreplication_subscriptions, which has a row for each publication. We found that subscribers of two different publications tried to update the same table deadlocking each other.

Our first approach was to resolve the issues on each publisher. We restarted the distribution agent of each publication and the deadlock started occurring for another set of publications. We know that the deadlock can occur when two resources tried to lock the same page. The table does not have any other index than the clustered index, we wanted to check the error. When we went there we found the error: the clustered index was out of sync. While the table is having only 127 rows, the index shows as 4 billion rows. It caused a table scan for a single row update and resulted deadlock.

We recreated the index, and things started working well.