I'm trying to find out which type of replication takes the least network
bandwidth to transfer the same amount of data between two servers.
I need to transfer new data from one SQL Server 2005 server to another. I could
use merge or transactional replication, or even create my own scripts that would
select the right data from one server and import it on the other, but my real
concern is which method performs best on slow connections (in the 64 Kb/s range)?
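(As a rough sanity check on what a 64 Kb/s link can carry, here is a
back-of-envelope calculation in T-SQL. It assumes decimal kilobits and ignores
protocol and replication overhead, so real throughput will be somewhat lower.)

-- 64 kilobits/sec = 8,000 bytes/sec, i.e. roughly 27-28 MB per hour
SELECT
    64.0 * 1000 / 8                          AS bytes_per_second,   -- ~8,000
    64.0 * 1000 / 8 * 3600 / (1024 * 1024)   AS megabytes_per_hour; -- ~27.5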
I'm planning to test all methods, but I would still like to get some input from
people with more experience in that area.
Thanks.
Tom
That depends on your data modification patterns. If the same row gets
updated many times, like a NASDAQ stock table, with merge replication only
the final value will move across the wire. With transactional replication
each change will flow.
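One way to see that difference is to compare the change-tracking metadata on a
test publication. This is only a rough sketch that assumes the default system
table names: transactional replication queues one command per logged change in
the distribution database (dbo.MSrepl_commands), while merge replication tracks
one row per changed data row in the publication database (dbo.MSmerge_contents),
no matter how many times that row was modified. Counts are only indicative,
since the distribution cleanup job trims delivered commands.

-- Transactional: one command per replicated change awaiting delivery
USE distribution;
SELECT COUNT(*) AS pending_commands FROM dbo.MSrepl_commands;

-- Merge: one tracking row per changed data row, however often it changed
USE MyMergePubDB;   -- hypothetical publication database name
SELECT COUNT(*) AS changed_rows FROM dbo.MSmerge_contents;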
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Tom" <mcseman@.hotmail.com> wrote in message
news:OzH2p9iFGHA.2444@.TK2MSFTNGP11.phx.gbl...
|||Both. It has no bearing on inserts; you get essentially the same amount of
data. It also has no bearing on deletes, because the transaction simply points
at an ID and is also basically the same size. If you are using purely default
settings, then by default merge transfers less data. You can change the default
settings for transactional replication so that it transfers only those columns
which have changed.
Both replication methods will transfer data across a 64K link without any
problems. The issue is how much data (number of transactions) you plan on
sending between the two.
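For reference, here is a hedged sketch of the settings being described; the
publication, table, and procedure names are made up. On the merge side,
column-level tracking is the @column_tracking parameter of sp_addmergearticle.
On the transactional side, how much of each update is sent is governed by the
article's update command format (@upd_cmd on sp_addarticle); the CALL, SCALL,
and MCALL variants differ in how many column values they carry, so check Books
Online for the exact semantics in your build.

-- Merge article with column-level change tracking (hypothetical names)
EXEC sp_addmergearticle
    @publication     = N'SlowLinkMergePub',
    @article         = N'Orders',
    @source_object   = N'Orders',
    @source_owner    = N'dbo',
    @column_tracking = N'true';

-- Transactional article: the update command format controls what each update
-- sends (CALL passes values for all columns; SCALL/MCALL send less).
EXEC sp_addarticle
    @publication   = N'SlowLinkTranPub',
    @article       = N'Orders',
    @source_object = N'Orders',
    @source_owner  = N'dbo',
    @upd_cmd       = N'SCALL [sp_MSupd_dboOrders]';   -- hypothetical proc name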
"Tom" <mcseman@.hotmail.com> wrote in message
news:OzH2p9iFGHA.2444@.TK2MSFTNGP11.phx.gbl...
|||This is not quite accurate. With transactional replication all the logged
changes go across the wire and are decomposed into individual singletons, so an
update transaction affecting 100 rows would be decomposed into 100 separate
singleton updates.
Merge decomposes multi-row statements the same way, but with merge replication
it is the net change that goes across the wire. Suppose the same row is updated
100 times: with transactional replication 100 changes flow, while with merge
only the net change, i.e. the final image of the row, will flow if the sync
occurs after the 100 changes. This is why, by default, merge replication agents
run each hour.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Z" <z@.z.com> wrote in message news:u3BpwtuFGHA.2212@.TK2MSFTNGP15.phx.gbl...
|||Please review the Microsoft Resolver Descriptions topic in BOL, specifically
the entry quoted below, where it states:
Microsoft SQL Server DATETIME (Earlier Wins) Conflict Resolver
Resolver information: the name of the column to be used to determine the
conflict winner. It must have a DATETIME data type.
Description: the column with the earlier datetime value determines the conflict
winner. If one is set to NULL, the row containing the other is the winner.
Supports update conflicts, row tracking, and column tracking. The column values
are compared directly and no adjustment is made for different time zones.
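For completeness, a minimal sketch of attaching that resolver when the article
is created; the publication, table, and column names here are hypothetical, and
@resolver_info carries the DATETIME column the resolver compares.

EXEC sp_addmergearticle
    @publication      = N'SlowLinkMergePub',
    @article          = N'Orders',
    @source_object    = N'Orders',
    @source_owner     = N'dbo',
    @article_resolver = N'Microsoft SQL Server DATETIME (Earlier Wins) Conflict Resolver',
    @resolver_info    = N'ModifiedDate';   -- DATETIME column used to pick the winner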
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Z" <z@.z.com> wrote in message news:u3BpwtuFGHA.2212@.TK2MSFTNGP15.phx.gbl...
|||Sorry, wrong post.

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:%23H8kM5uFGHA.1312@.TK2MSFTNGP09.phx.gbl...
|||That is only the case if the updates happen between cycles of the replication
engine, which is an extremely specific situation and incredibly hard to actually
produce in a production environment. Merge will only send what it needs to send.
It does this by picking up the ID for a row that changed and then pulling the
data directly from the row. If you happen to be lucky enough to update a row
multiple times, quickly enough, merge will only pick up the most recent change
and send it. But that is a very unlikely scenario.
Mike
http://www.solidqualitylearning.com
Disclaimer: This communication is an original work and represents my sole
views on the subject. It does not represent the views of any other person
or entity either by inference or direct reference.
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:ujkqL0uFGHA.752@.TK2MSFTNGP12.phx.gbl...
|||It really depends on your DML. As far as I know there are no typical
workloads. Some workloads are insert heavy, some are update heavy, some are a
mix of insert and delete, and some are a mix of all three.
Have a look at the workload MS used here for benchmarking.
http://www.microsoft.com/technet/pro.../mergrepl.mspx
Looks pretty update heavy to me! Are you perhaps insinuating that MS is
cooking the books?
The merge agent by default runs hourly. The distribution agent by default
runs continuously. By default the merge agent only sends a delta per row during
a sync, no matter how many updates have occurred since the last sync.
With transactional there is no such consolidation.
Merge is also tuned for low bandwidth lines.
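One concrete piece of that tuning, offered as a hedged pointer: the Merge Agent
ships with predefined agent profiles at the Distributor, including one intended
for slow links; the exact profile name and settings are worth verifying in
Replication Monitor or Books Online.

-- List the predefined Merge Agent profiles at the Distributor
-- (agent_type 4 = Merge Agent); one of them targets slow/unreliable links
EXEC sp_help_agent_profile @agent_type = 4;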
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Michael Hotek" <mike@.solidqualitylearning.com> wrote in message
news:Oqpum4HGGHA.1388@.TK2MSFTNGP11.phx.gbl...
|||Now just where did THAT accusation come from? I really don't appreciate you
sticking words in my mouth.
You had also better go back and take a MUCH better look at that benchmark,
because your conclusions are fatally flawed.
1. The standard workload specified (definitely NOT predominantly update)
a. 251140 inserts, 640 deletes, and 1920 updates at the publisher
b. 400 inserts, 200 updates, and 50 deletes at each subscriber
2. The high volume workload specified (definitely NOT predominantly update)
a. 40,000 inserts, 10,000 deletes, and 50,000 updates downloaded to
each subscriber
b. 101,000 inserts uploaded from each subscriber
c. No numbers for the publisher
3. The tests measured snapshot generation and the number of changes being
processed, comparing 2000 to 2005
4. The benchmark was also explicitly focused on filtered subscriptions (an
area of particular performance issues in 2000)
5. Nowhere did it specify ANY network bandwidth numbers
6. Nowhere did it specify what the update workload actually was (in terms of
explicit statements)
7. Nowhere did it specify that multiple updates to the same row occurred
between sync cycles
8. The merge agent was DEFINITELY not configured with default settings
9. The benchmark was designed to show performance improvements related to
specific feature improvements
10. Nowhere did that benchmark discuss or provide any data whatsoever which
would be relevant to this discussion or to the question which was asked.
It doesn't matter if merge is configured to run continuously, every
microsecond, or once per millennium. The merge engine picks up everything it
needs to process during each cycle. The ONLY time that the merge engine
essentially ignores multiple updates to a row of data is when those updates
happen under a VERY explicit circumstance. Your application must update the
row more than once, and it must do so after the merge engine has executed a
synchronization cycle and before the next cycle has executed. Furthermore,
it only ignores an update when you have issued multiple updates to exactly
the same column within the same row of data more than once between
synchronization cycles. If you have updated more than one column in a row,
even across multiple update statements, between synchronization cycles, the
merge engine simply packages those into a single update statement. It is
extraordinarily rare in production applications to essentially rapid-fire
update the same row of data over and over again.
Benchmarks are benchmarks. They are designed to show the theoretical capacity
of whatever is being tested.
The only applications I have ever come across in a production environment
which even met the multi-update timing requirement had virtually all of
their data generated by a machine. Only one of those was running merge
replication. And in that one application, the merge engine issued fewer
updates to the subscriber because multiple updates to the same row occurred
between cycles of the replication engine during less than 10% of the total
synchronization cycles.
If you are going to quote numbers, provide benchmarks, and specify what
something does, you had better be VERY explicit about the exact configuration
you are basing it on and the exact options that are set.
The only thing which can differ between the transactional engine and the
merge engine is when you have an application which makes multiple updates to
the same row of data between synchronization cycles. If you do not cause
that very specific situation to exist, there is VERY little difference in
bandwidth impact. The merge engine, by default, will only pick up those
columns which have changed, whereas the transactional engine, by default,
will pick up all columns in the row. But by changing a simple setting, the
merge engine can be made to send all of the columns (even those not changed)
while the transactional engine will only pick up the columns which have
changed. Before you get into the multiple writes with the transactional
engine (pub->dist + dist->sub), you really need to spend a lot of time
running various scripts and testing the number of packets shuttled across
the network between the transactional engine sending changes and the merge
engine needing to calculate the data differential between
publisher/subscriber pairs, along with the sending of the actual changes plus
the conflict resolution process.
In terms of bandwidth consumed, both the transactional and the merge engine
can be configured to consume remarkably similar chunks of network bandwidth.
Mike
http://www.solidqualitylearning.com
Disclaimer: This communication is an original work and represents my sole
views on the subject. It does not represent the views of any other person
or entity either by inference or direct reference.
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:%23yEHqGLGGHA.3176@.TK2MSFTNGP12.phx.gbl...
|||I've been looking at the benchmark again, and while I still dispute some of
what you are saying, I notice that the benchmarks mention nothing about the
update patterns. For example, as you point out, we don't know how many times
an individual row was updated. So, you are correct that these benchmarks
prove nothing.
I don't have perfect knowledge of the ratio of inserts/updates/deletes in
the world; my observation is that the relative frequency is updates, then
inserts, then deletes - in other words, there are more updates than inserts,
and more inserts than deletes, in the workflow that I see in my particular
domain of influence. I concede that it could be wildly different, but my
particular experience is more updates than anything else. I have checked and
confirmed this with my peers.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Michael Hotek" <mike@.solidqualitylearning.com> wrote in message
news:uTP39rzGGHA.140@.TK2MSFTNGP12.phx.gbl...