Friday, March 23, 2012

Replication Cleanup in high volume database

We're having intermittent performance problems on a SQL 2005 SP1 system that
we suspect is related to transactional replication. There may be overlap
with Jack Griffith's thread "Replication system disk performance severly
degrades after 1 month", but I'm not sure.....
While troubleshooting the problem, we attempted to make things better by
clearing out any transactions that had already been replicated to the
subscriber (EXEC dbo.sp_MSdistribution_cleanup @.min_distretention = 0,
@.max_distretention = 0). BAD, BAD, BAD!!!!! Setting @.max_distretention = 0
marked our subscription as inactive and we had to rebuild the subscriber
database to get it back in sync with the publisher.
I estimate add/update 2-4 million rows a day in our publication database.
The distribution database has grown to 9GB in < a week.
I have not altered any of the database cleanup jobs as of yet. The current
Distribution Cleanup job runs every 10 min and is taking 6-15 minutes to run.
We observe a tremendous number of disk reads on the Distribution database
disks. When we experience performance issues, we also see a HUGE number of
memory pages lasting ~ 15 seconds and occuring exactly every 2 minutes.
Restarting the SQLServer service seems to clear up the performance issues for
several hours.
* msrepl_commands rowcount approaches 35 million rows and
msrepl_transactions approaches 6 million rows. With a high volume in
transactional replication, should there be any indexes on the tables?
* While tracing through the MS SPs, I found that sp_MSdelete_dodelete only
deletes the top 5000 rows from MSrepl_transactions. In a high volume
transactional state, this might not keep up. Can this be changed safely?
* What other parameters can be changed to keep the transactions only as long
as they are needed but yet not mark the subscription as inactive?
Thanks!
If you have a few subscribers use raid 10, if you have a large number use
raid 5.
Set max_distretention to 3 days. This will only pool undistributed commands
for up to 3 days.
the setting for only deleting 5000 rows at a time is by design. If you
choose to modify it to something higher make sure you make the modification
to the end of
sp_MSdelete_publisherdb_trans as well.
The appropriate indexes are in place on msrepl_commands and
msrepl_transactions. You might want to defrag these indexes to see if that
helps or update statistics.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"RL" <RL@.discussions.microsoft.com> wrote in message
news:71F2956A-5B80-4918-993F-F8CF3AD2C0D5@.microsoft.com...
> We're having intermittent performance problems on a SQL 2005 SP1 system
> that
> we suspect is related to transactional replication. There may be overlap
> with Jack Griffith's thread "Replication system disk performance severly
> degrades after 1 month", but I'm not sure.....
> While troubleshooting the problem, we attempted to make things better by
> clearing out any transactions that had already been replicated to the
> subscriber (EXEC dbo.sp_MSdistribution_cleanup @.min_distretention = 0,
> @.max_distretention = 0). BAD, BAD, BAD!!!!! Setting @.max_distretention =
> 0
> marked our subscription as inactive and we had to rebuild the subscriber
> database to get it back in sync with the publisher.
> I estimate add/update 2-4 million rows a day in our publication database.
> The distribution database has grown to 9GB in < a week.
> I have not altered any of the database cleanup jobs as of yet. The
> current
> Distribution Cleanup job runs every 10 min and is taking 6-15 minutes to
> run.
> We observe a tremendous number of disk reads on the Distribution database
> disks. When we experience performance issues, we also see a HUGE number
> of
> memory pages lasting ~ 15 seconds and occuring exactly every 2 minutes.
> Restarting the SQLServer service seems to clear up the performance issues
> for
> several hours.
>
> * msrepl_commands rowcount approaches 35 million rows and
> msrepl_transactions approaches 6 million rows. With a high volume in
> transactional replication, should there be any indexes on the tables?
> * While tracing through the MS SPs, I found that sp_MSdelete_dodelete only
> deletes the top 5000 rows from MSrepl_transactions. In a high volume
> transactional state, this might not keep up. Can this be changed safely?
> * What other parameters can be changed to keep the transactions only as
> long
> as they are needed but yet not mark the subscription as inactive?
> Thanks!
|||Thanks Hilary. We have one subscriber and are using RAID 10 as recommended.
The max_distretention is set to 72 hours already. The distribution database
was just rebuilt a week ago, so it only has 4 days of deletes in it, so i'm
not sure an index defrag will help.
Why do we need to keep 3 days worth of commands that have already been
replicated? Any suggestions on how to make the distribution cleanup job run
more efficiently? It's running at least 50% of the time now. Is it possible
we're just outpacing what replication was designed for? In my stress testing
before moving to SQL 2005 x64, replication was the weak performance link.
Thanks,
Rob
"Hilary Cotter" wrote:

> If you have a few subscribers use raid 10, if you have a large number use
> raid 5.
> Set max_distretention to 3 days. This will only pool undistributed commands
> for up to 3 days.
> the setting for only deleting 5000 rows at a time is by design. If you
> choose to modify it to something higher make sure you make the modification
> to the end of
> sp_MSdelete_publisherdb_trans as well.
> The appropriate indexes are in place on msrepl_commands and
> msrepl_transactions. You might want to defrag these indexes to see if that
> helps or update statistics.
>
> --
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
> "RL" <RL@.discussions.microsoft.com> wrote in message
> news:71F2956A-5B80-4918-993F-F8CF3AD2C0D5@.microsoft.com...
>
>
|||I like to keep 3 days around in case I leave on a Friday night of a long
weekend when I come back on Tuesday I can still restart failed distribution
agents without them going expired on me.
The problem is that if you delete too many rows at a time the locks will
escalate to table level locks and your log reader agents will fail.
I have a friend who was in a similar situation as yours (Wachovia bank IIRC
in NYC) and he manually deleted rows in large batches. Reverse engineer to
the procs to see what to delete and stop your log reader agents while doing
this.
I also worked on a similar problem in Philly where the backlog was something
similar and we were able to get through everything in 2 hours.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"RL" <RL@.discussions.microsoft.com> wrote in message
news:1F28BAE1-FD52-470F-9B35-E596A2544D46@.microsoft.com...[vbcol=seagreen]
> Thanks Hilary. We have one subscriber and are using RAID 10 as
> recommended.
> The max_distretention is set to 72 hours already. The distribution
> database
> was just rebuilt a week ago, so it only has 4 days of deletes in it, so
> i'm
> not sure an index defrag will help.
> Why do we need to keep 3 days worth of commands that have already been
> replicated? Any suggestions on how to make the distribution cleanup job
> run
> more efficiently? It's running at least 50% of the time now. Is it
> possible
> we're just outpacing what replication was designed for? In my stress
> testing
> before moving to SQL 2005 x64, replication was the weak performance link.
> Thanks,
> Rob
> "Hilary Cotter" wrote:
sql

No comments:

Post a Comment