Replicating mirrored dbs: suspect

Showing posts with label suspect. Show all posts

Wednesday, March 28, 2012

replication error - agent is suspect

Hi,
I have set up a replication and it was doing the bulk copy from SrvA to
SrvB when it says "The agent is suspect as there's no response for the last
10 mins" or something like that ?
what could be the problem here ?
however, after i refresh it , it keeps running though but i am unsure if the
data has been successfully replicated over as the processing's running
appreciate any advise
tks & rdgs
Hi,
See the previous posts:-
http://groups.google.co.in/group/mic...36c788aeacadb7
Thanks
Hari
SQL Server MVP
"maxzsim" <maxzsim@.discussions.microsoft.com> wrote in message
news:13D1D8DF-E095-4F6C-9CAB-06D63015A95D@.microsoft.com...
> Hi,
> I have set up a replication and it was doing the bulk copy from SrvA to
> SrvB when it says "The agent is suspect as there's no response for the
> last
> 10 mins" or something like that ?
> what could be the problem here ?
> however, after i refresh it , it keeps running though but i am unsure if
> the
> data has been successfully replicated over as the processing's running
> appreciate any advise
> tks & rdgs
|||Hi ,
I have managed to resolve this issue by increasing the default 10 mins of
agent inactivity
tks & rdgs
"maxzsim" wrote:

> Hi,
> I have set up a replication and it was doing the bulk copy from SrvA to
> SrvB when it says "The agent is suspect as there's no response for the last
> 10 mins" or something like that ?
> what could be the problem here ?
> however, after i refresh it , it keeps running though but i am unsure if the
> data has been successfully replicated over as the processing's running
> appreciate any advise
> tks & rdgs

replication error - agent is suspect

Hi,
I have set up a replication and it was doing the bulk copy from SrvA to
SrvB when it says "The agent is suspect as there's no response for the last
10 mins" or something like that ?
what could be the problem here ?
however, after i refresh it , it keeps running though but i am unsure if the
data has been successfully replicated over as the processing's running
appreciate any advise
tks & rdgsHi,
See the previous posts:-
http://groups.google.co.in/group/mi...936c788aeacadb7
Thanks
Hari
SQL Server MVP
"maxzsim" <maxzsim@.discussions.microsoft.com> wrote in message
news:13D1D8DF-E095-4F6C-9CAB-06D63015A95D@.microsoft.com...
> Hi,
> I have set up a replication and it was doing the bulk copy from SrvA to
> SrvB when it says "The agent is suspect as there's no response for the
> last
> 10 mins" or something like that ?
> what could be the problem here ?
> however, after i refresh it , it keeps running though but i am unsure if
> the
> data has been successfully replicated over as the processing's running
> appreciate any advise
> tks & rdgs|||Hi ,
I have managed to resolve this issue by increasing the default 10 mins of
agent inactivity
tks & rdgs
"maxzsim" wrote:

> Hi,
> I have set up a replication and it was doing the bulk copy from SrvA to
> SrvB when it says "The agent is suspect as there's no response for the las
t
> 10 mins" or something like that ?
> what could be the problem here ?
> however, after i refresh it , it keeps running though but i am unsure if t
he
> data has been successfully replicated over as the processing's running
> appreciate any advise
> tks & rdgs

replication error - agent is suspect

Hi,
I have set up a replication and it was doing the bulk copy from SrvA to
SrvB when it says "The agent is suspect as there's no response for the last
10 mins" or something like that ?
what could be the problem here ?
however, after i refresh it , it keeps running though but i am unsure if the
data has been successfully replicated over as the processing's running
appreciate any advise
tks & rdgsHi,
See the previous posts:-
http://groups.google.co.in/group/microsoft.public.sqlserver.replication/browse_thread/thread/45cba0e7e1543879/7936c788aeacadb7?q=replication+agent+is+suspect&rnum=2&hl=en#7936c788aeacadb7
Thanks
Hari
SQL Server MVP
"maxzsim" <maxzsim@.discussions.microsoft.com> wrote in message
news:13D1D8DF-E095-4F6C-9CAB-06D63015A95D@.microsoft.com...
> Hi,
> I have set up a replication and it was doing the bulk copy from SrvA to
> SrvB when it says "The agent is suspect as there's no response for the
> last
> 10 mins" or something like that ?
> what could be the problem here ?
> however, after i refresh it , it keeps running though but i am unsure if
> the
> data has been successfully replicated over as the processing's running
> appreciate any advise
> tks & rdgs|||Hi ,
I have managed to resolve this issue by increasing the default 10 mins of
agent inactivity
tks & rdgs
"maxzsim" wrote:
> Hi,
> I have set up a replication and it was doing the bulk copy from SrvA to
> SrvB when it says "The agent is suspect as there's no response for the last
> 10 mins" or something like that ?
> what could be the problem here ?
> however, after i refresh it , it keeps running though but i am unsure if the
> data has been successfully replicated over as the processing's running
> appreciate any advise
> tks & rdgs

Friday, March 23, 2012

Replication Cleanup in high volume database

We're having intermittent performance problems on a SQL 2005 SP1 system that
we suspect is related to transactional replication. There may be overlap
with Jack Griffith's thread "Replication system disk performance severly
degrades after 1 month", but I'm not sure.....
While troubleshooting the problem, we attempted to make things better by
clearing out any transactions that had already been replicated to the
subscriber (EXEC dbo.sp_MSdistribution_cleanup @.min_distretention = 0,
@.max_distretention = 0). BAD, BAD, BAD!!!!! Setting @.max_distretention = 0
marked our subscription as inactive and we had to rebuild the subscriber
database to get it back in sync with the publisher.
I estimate add/update 2-4 million rows a day in our publication database.
The distribution database has grown to 9GB in < a week.
I have not altered any of the database cleanup jobs as of yet. The current
Distribution Cleanup job runs every 10 min and is taking 6-15 minutes to run.
We observe a tremendous number of disk reads on the Distribution database
disks. When we experience performance issues, we also see a HUGE number of
memory pages lasting ~ 15 seconds and occuring exactly every 2 minutes.
Restarting the SQLServer service seems to clear up the performance issues for
several hours.
* msrepl_commands rowcount approaches 35 million rows and
msrepl_transactions approaches 6 million rows. With a high volume in
transactional replication, should there be any indexes on the tables?
* While tracing through the MS SPs, I found that sp_MSdelete_dodelete only
deletes the top 5000 rows from MSrepl_transactions. In a high volume
transactional state, this might not keep up. Can this be changed safely?
* What other parameters can be changed to keep the transactions only as long
as they are needed but yet not mark the subscription as inactive?
Thanks!
If you have a few subscribers use raid 10, if you have a large number use
raid 5.
Set max_distretention to 3 days. This will only pool undistributed commands
for up to 3 days.
the setting for only deleting 5000 rows at a time is by design. If you
choose to modify it to something higher make sure you make the modification
to the end of
sp_MSdelete_publisherdb_trans as well.
The appropriate indexes are in place on msrepl_commands and
msrepl_transactions. You might want to defrag these indexes to see if that
helps or update statistics.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"RL" <RL@.discussions.microsoft.com> wrote in message
news:71F2956A-5B80-4918-993F-F8CF3AD2C0D5@.microsoft.com...
> We're having intermittent performance problems on a SQL 2005 SP1 system
> that
> we suspect is related to transactional replication. There may be overlap
> with Jack Griffith's thread "Replication system disk performance severly
> degrades after 1 month", but I'm not sure.....
> While troubleshooting the problem, we attempted to make things better by
> clearing out any transactions that had already been replicated to the
> subscriber (EXEC dbo.sp_MSdistribution_cleanup @.min_distretention = 0,
> @.max_distretention = 0). BAD, BAD, BAD!!!!! Setting @.max_distretention =
> 0
> marked our subscription as inactive and we had to rebuild the subscriber
> database to get it back in sync with the publisher.
> I estimate add/update 2-4 million rows a day in our publication database.
> The distribution database has grown to 9GB in < a week.
> I have not altered any of the database cleanup jobs as of yet. The
> current
> Distribution Cleanup job runs every 10 min and is taking 6-15 minutes to
> run.
> We observe a tremendous number of disk reads on the Distribution database
> disks. When we experience performance issues, we also see a HUGE number
> of
> memory pages lasting ~ 15 seconds and occuring exactly every 2 minutes.
> Restarting the SQLServer service seems to clear up the performance issues
> for
> several hours.
>
> * msrepl_commands rowcount approaches 35 million rows and
> msrepl_transactions approaches 6 million rows. With a high volume in
> transactional replication, should there be any indexes on the tables?
> * While tracing through the MS SPs, I found that sp_MSdelete_dodelete only
> deletes the top 5000 rows from MSrepl_transactions. In a high volume
> transactional state, this might not keep up. Can this be changed safely?
> * What other parameters can be changed to keep the transactions only as
> long
> as they are needed but yet not mark the subscription as inactive?
> Thanks!
|||Thanks Hilary. We have one subscriber and are using RAID 10 as recommended.
The max_distretention is set to 72 hours already. The distribution database
was just rebuilt a week ago, so it only has 4 days of deletes in it, so i'm
not sure an index defrag will help.
Why do we need to keep 3 days worth of commands that have already been
replicated? Any suggestions on how to make the distribution cleanup job run
more efficiently? It's running at least 50% of the time now. Is it possible
we're just outpacing what replication was designed for? In my stress testing
before moving to SQL 2005 x64, replication was the weak performance link.
Thanks,
Rob
"Hilary Cotter" wrote:

> If you have a few subscribers use raid 10, if you have a large number use
> raid 5.
> Set max_distretention to 3 days. This will only pool undistributed commands
> for up to 3 days.
> the setting for only deleting 5000 rows at a time is by design. If you
> choose to modify it to something higher make sure you make the modification
> to the end of
> sp_MSdelete_publisherdb_trans as well.
> The appropriate indexes are in place on msrepl_commands and
> msrepl_transactions. You might want to defrag these indexes to see if that
> helps or update statistics.
>
> --
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
> "RL" <RL@.discussions.microsoft.com> wrote in message
> news:71F2956A-5B80-4918-993F-F8CF3AD2C0D5@.microsoft.com...
>
>
|||I like to keep 3 days around in case I leave on a Friday night of a long
weekend when I come back on Tuesday I can still restart failed distribution
agents without them going expired on me.
The problem is that if you delete too many rows at a time the locks will
escalate to table level locks and your log reader agents will fail.
I have a friend who was in a similar situation as yours (Wachovia bank IIRC
in NYC) and he manually deleted rows in large batches. Reverse engineer to
the procs to see what to delete and stop your log reader agents while doing
this.
I also worked on a similar problem in Philly where the backlog was something
similar and we were able to get through everything in 2 hours.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"RL" <RL@.discussions.microsoft.com> wrote in message
news:1F28BAE1-FD52-470F-9B35-E596A2544D46@.microsoft.com...[vbcol=seagreen]
> Thanks Hilary. We have one subscriber and are using RAID 10 as
> recommended.
> The max_distretention is set to 72 hours already. The distribution
> database
> was just rebuilt a week ago, so it only has 4 days of deletes in it, so
> i'm
> not sure an index defrag will help.
> Why do we need to keep 3 days worth of commands that have already been
> replicated? Any suggestions on how to make the distribution cleanup job
> run
> more efficiently? It's running at least 50% of the time now. Is it
> possible
> we're just outpacing what replication was designed for? In my stress
> testing
> before moving to SQL 2005 x64, replication was the weak performance link.
> Thanks,
> Rob
> "Hilary Cotter" wrote:
sql

Wednesday, March 7, 2012

replication - agent is suspect

I setup a client with merge replication, the following error was
returned after working successfully for a day "The agent is suspect.
No response within the last 10 minutes". Most research I found on
the newsgroup is this error does not stop merge replication from
working, but in my case it did stop replication from working. I was
able to get to working again by running the following command.
sp_replication_agent_checkup @.heartbeat_interval = 6000
Can anybody explain why replication stopped working altogether with the
agent is suspect message, and what can I do to avoid this message to
begin with?
Thank you,
Pauly C
Suspect means that the agent hasn't communicated with the job subsystem in
the interval defined in the heartbeat_interval.
The job will be failed after the heartbeat_interval has passed, but I am not
exactly sure when.
The agent could be busy, dead, or trying to connect. Run profiler on the
subscriber to try to figure out what is going on. Profiler might not reveal
anything. There is a condition when the publisher or subscriber has
exhausted buffers and hangs on initialization. You can clear this by
rebooting.
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"paulysc" <paulysc@.optonline.net> wrote in message
news:1141228700.833217.245870@.t39g2000cwt.googlegr oups.com...
>I setup a client with merge replication, the following error was
> returned after working successfully for a day "The agent is suspect.
> No response within the last 10 minutes". Most research I found on
> the newsgroup is this error does not stop merge replication from
> working, but in my case it did stop replication from working. I was
> able to get to working again by running the following command.
> sp_replication_agent_checkup @.heartbeat_interval = 6000
> Can anybody explain why replication stopped working altogether with the
> agent is suspect message, and what can I do to avoid this message to
> begin with?
> Thank you,
> Pauly C
>
|||I saw this error message if the job runs against a wrong database. In my
case, the Queue Reader was running against the master database, not the
distribution, which resulted to the same error message as yours.
hope it helps,
FJY
"paulysc" wrote:

> I setup a client with merge replication, the following error was
> returned after working successfully for a day "The agent is suspect.
> No response within the last 10 minutes". Most research I found on
> the newsgroup is this error does not stop merge replication from
> working, but in my case it did stop replication from working. I was
> able to get to working again by running the following command.
> sp_replication_agent_checkup @.heartbeat_interval = 6000
> Can anybody explain why replication stopped working altogether with the
> agent is suspect message, and what can I do to avoid this message to
> begin with?
> Thank you,
> Pauly C
>

Wednesday, March 28, 2012

replication error - agent is suspect

replication error - agent is suspect

replication error - agent is suspect

Friday, March 23, 2012

Replication Cleanup in high volume database

Wednesday, March 7, 2012

replication - agent is suspect

Replicating mirrored dbs

Blog Archive

About Me