Monday, March 26, 2012

Replication Deadlocks

Hi All,

I need to get some assistance with some replication deadlocks im getting.

We have two servers, which push information between each other, which I have a hunch could be a cause.

Basically I keep getting the 'Agent Retry' Alert from SQL:

DESCRIPTION: Error: 14152, Severity: 10, State: 1

Replication-Replication Distribution Subsystem: agent SQ01-EMS-SQ02\UK-183 scheduled for retry. Transaction (Process ID 68) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

When doing some googleing, I found the Trace Flags and setup flags 1204 and 1205, but they dont give me any additional information.

I've looked in the error log, but all I get is this:

DESCRIPTION: Error: 14152, Severity: 10, State: 1

Replication-Replication Distribution Subsystem: agent SQ01-EMS-SQ02\UK-183 scheduled for retry. Transaction (Process ID 68) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Does anyone have any suggestions on what I can try to find more information on the cause?

Cheers

This may be a genuine deadlock. Did it work when you restart the replication agent affected?|||If you're using SQL 2005, then use trace flag 1222, it will print out more detailed information to the errorlog. The other alternative is to capture a profiler trace using the Deadlock events.|||Hi, I've restarted and recreated the replication, but every now and again it comes back.

It's not a mega issue, but I want to resolve it.

We are running SQL 2K, sadly wont be going to 2K5 until at least SP1 to allow plenty of time to watch what's happenning with it.|||I'm doing that now, but I've already had a couple of occurances this morning which have not shown up.

Will check it over again. Thanks for the tip.|||Ok, then you can use 1204 trace flag instead. If you're restarting your server frequently, then you can just add it to the server startup trace flag section.

When you find the issue, and need help parsing it, cut/paste the output here and we can help you figure it out.
|||Thats the problem. I use Traceflag 1204, 1205, but it doesent give me any more information, than what I've already posted.

If i look through the log file, I can see the time and date where I run a DBCC TraceOn(1204, 1205), but then afterwards, when I get a replication retry/deadlock, I only get the message in the initial post...

Tongue Tied|||

How are you enabling the trace flags? And are you enabling the trace flags on the correct machine? Try enabling them on both the publisher and subscriber machine.

You can enable them in one of two ways:

1. dbcc traceon(1204, 1205, 3605, -1). 3605 means write to errorlog, -1 means enable for all client connections.

2. In Enterprise Manager, right-click on your server, select properties. On tab General, click on "Startup Parameters". add -T 1204 and -T 1205. Then restart your server.

|||Thanks for your help so far...Much appreciated.

I was enabling them via Query Analyser. I've just re-entered your statement, so will see what I can find out on the next agent retry.

Thanks

Steve|||

Right,

I've ran the command as above, which I can see in the error log, but anything after that fails to show any more information.

This is all I get from the log:

2005-09-21 16:12:30.98 spid4 -

2005-09-21 16:16:07.60 spid188 Error: 14152, Severity: 10, State: 1

2005-09-21 16:16:07.60 spid188 Replication-Replication Distribution Subsystem: agent <SNIP>SQ01-EMS-<SNIP>SQ02\UK-183 scheduled for retry. Transaction (Process ID 87) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction..

2005-09-21 16:17:56.21 spid4 -
2005-09-21 16:17:56.21 spid4 Starting deadlock search 41806

2005-09-21 16:17:56.21 spid4 Target Resource Owner:

Do the traceflags expire after a certain period of time or do the remove themselves when you close Query Analyser?

Thanks

|||

I'm not sure why the traceflag isn't working. Please use profiler instead then, use the Lock:Deadlock and Lock:Deadlock Chain events.

|||

Bringing this one back up...

I'm still having this issue. Profiler and TraceFlags give me nothing, so I've done some digging into the replications themselves.

I have a hunch that the table is being locked when data is being pushed to it, it cant because it's locked.

Is there any way I can check the type of locking happenning ?

|||Hi Steve,
I hope you enabled the trace flags with the options that Greg mentioned.

1. dbcc traceon(1204, 1205, 3605, -1). 3605 means write to errorlog, -1 means enable for all client connections.

2. In Enterprise Manager, right-click on your server, select properties. On tab General, click on "Startup Parameters". add -T 1204 and -T 1205. Then restart your server.

If you used option 2, you may need to restart your server. Also add -T 3605 if you used option 2.

And as a sidenote, would it be a case the replication agent is trying to select/insert rows while an external process (user query) is also trying to update the same table? then it could be understandable. Otherwise, without the deadlock graph, it will hard to understand what could be going on.

|||

Hi Mahesh,

I have indeed put those flags in the startup parameters section.

I have thought about what you suggested with the agent accessing whilt another process is working...

I've done a diag. of our replication: http://www.aoqz41.dsl.pipex.com/Replication.jpg

The DB Circled is the one causing problems, with the replication from 'EMS Claims Payments'. I'm wondering if this is a 'Real Deadlock' situation?

Steve

|||But after adding the trace flags to the startup paramters, did you restart the SQL server?

And looking at your diagram, I assume the line from EMS (EMS Claims Payments) is the replication agent writing data. I see only one more arrow into the UKResManClock db (circled one). Does this mean that this also writing data (through replication or user DML)? If so, that can be another thing to look at. Also would the same tables be updated by these two lines?

Also just FYI: Merge replication does not recommend a central subscriber topology. It has its problems.

No comments:

Post a Comment