Monday, February 20, 2012

Replicating to an archive database.

I want to replicate to an archive database. This means that the subscriber will have data that has been removed from the publisher. In my reading, I haven't seen any discussion of this specific scenario.

Here's what I imagine the solution might be:

EXEC sp_addpublication_snapshot
@publication = N'My_Publication',
@frequency_type = 1 -- only create the snapshot once
GO

...

EXEC sp_addarticle
@publication = N'My_Publication',
@article = N'My_Table',
@source_owner = N'dbo',
@source_object = N'My_Table',
@del_cmd = N'NONE' -- don't replicate deletes for this article
GO

I set the publication snapshot to execute only once, during the maintenance window when replication is initially installed. Then, on the tables that will contain archived data, I specify that deletes aren't replicated.
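If the publication were already in place, I imagine the delete command could also be switched off on an existing article rather than dropping and re-adding it. A rough, untested sketch, using the same My_Publication/My_Table names as above:

EXEC sp_changearticle
@publication = N'My_Publication',
@article = N'My_Table',
@property = N'del_cmd',
@value = N'NONE',
@force_invalidate_snapshot = 1 -- changing the command invalidates any existing snapshot
GO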

Here's my concern: aren't there times when you need to resync?

If you could push a new snapshot that dropped the tables on the subscriber and rebuilt everything from scratch, then things would sync up just fine. But in this scenario, if you drop the subscriber tables, you've just lost your archive.

What's the best way to handle this?

Thanks,

-=michael=-|||I'm really not certain that you want to do this. Replication is designed with some specific use-cases in mind and I do not believe that an archive database is one of those uses.

Instead, why not consider using DTS to move data from your production (live) database to your archive database? Alternatively, you could create some custom jobs to move data to the archive.
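If you go the custom-job route, the heart of it is usually just an insert/delete pair run on a schedule. A minimal sketch with made-up names (ArchiveDB and the Order_Date cutoff are placeholders; it assumes identical table schemas, no identity-column complications, and that rows old enough to archive are no longer being modified):

DECLARE @cutoff datetime
SET @cutoff = DATEADD(day, -90, GETDATE()) -- archive anything older than 90 days

BEGIN TRANSACTION

INSERT INTO ArchiveDB.dbo.My_Table
SELECT *
FROM dbo.My_Table
WHERE Order_Date < @cutoff

DELETE FROM dbo.My_Table
WHERE Order_Date < @cutoff

COMMIT TRANSACTION
GO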

Regards,

hmscott|||I'm really not certain that you want to do this. Replication is designed with some specific use-cases in mind and I do not believe that an archive database is one of those uses.

This is, indeed, one of my concerns. I'm hoping that someone out there has experience doing something similar. Perhaps a better question would be, "How have people solved the archive/reporting problem?"

-=michael=-|||You could probably write a book of several thousand pages and still not cover every method individuals and companies have used to create archiving and reporting solutions (many people have).

Look on MS' web site for white papers on archiving and reporting. I think that you will find that their "best practices" approach uses DTS for moving data off the OLTP (transactional, or live) database onto the OLAP (reporting) database. It's their baby; they're going to tout it.

Other vendors offer alternative packages, so of course you should look around. But DTS is free (with SQL Server), so it's probably a good starting point.

I myself have used it in several instances for moving data, and I find it to be a workable solution for all but the very largest data stores.

Regards,

hmscott|||I think that you will find that their "best practices" approach uses DTS for moving data off the OLTP (transactional, or live) database onto the OLAP (reporting) database.

Agreed. If all I cared about was moving inserts over to a reporting/archive database, then coding up an extract/transform/load/delete would be the right thing to do. Unfortunately, I've got a database with plenty of updates and deletes happening also. This means that I either have to find some way to identify and execute the update and delete transactions on the reporting/archive database (i.e. replicate them), or extract/transform/load the entire database with enough frequency to make the reports current enough to meet customer expectations (i.e. create snapshots). Given the size of our database, I'm hoping to replicate transactions.
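(For the ETL option, inserts and updates could at least be pulled incrementally; here's a hypothetical sketch using a rowversion column and a watermark table, both made-up names. Note that deletes are not captured this way, which is exactly the gap that pushes me toward replication.)

DECLARE @from_version binary(8), @to_version binary(8)

SELECT @from_version = Last_Version FROM ArchiveDB.dbo.Extract_Watermark
SET @to_version = @@DBTS -- highest rowversion handed out in this database so far; in-flight transactions make this an imperfect watermark

INSERT INTO ArchiveDB.dbo.My_Table_Staging
SELECT *
FROM dbo.My_Table
WHERE Row_Version > @from_version AND Row_Version <= @to_version -- Row_Version is a hypothetical rowversion column

UPDATE ArchiveDB.dbo.Extract_Watermark SET Last_Version = @to_version
GO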

-=michael=-|||Realize that when you are replicating transactions to a second database, each insert, update, and delete will require an exclusive lock on the affected row(s), page(s), or table(s), thereby preventing your reporting queries (the selects) from executing in a timely manner, unless you incorporate NOLOCK (and the possible inconsistency that comes with it) into the data retrieval operations.
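For reference, NOLOCK is just a table hint on the reporting queries. A trivial illustration (My_Table is the example table from earlier in the thread; Order_Date is a placeholder column):

-- Reads uncommitted data and takes no shared locks; results can include
-- dirty or inconsistent rows, which is the trade-off described above.
SELECT *
FROM dbo.My_Table WITH (NOLOCK)
WHERE Order_Date >= DATEADD(day, -7, GETDATE())

-- Roughly equivalent session-level setting:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO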

With that in mind, you could place "instead of" delete triggers on the tables for which you wish to retain historical data on the reporting server. But you had better have a great backup plan in place, because you will no longer be able to simply resynchronize the reporting database from the OLTP database at any time.
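A bare-bones illustration of that trigger idea, created on the subscriber's copy of the example table (sketch only; not tested against the auto-generated replication delete procedures, which may complain about rows they expected to remove):

-- Hypothetical subscriber-side trigger: swallow deletes so archived rows survive.
CREATE TRIGGER trg_My_Table_KeepDeletes
ON dbo.My_Table
INSTEAD OF DELETE
AS
BEGIN
    -- Intentionally do nothing: the row stays in the reporting/archive copy
    -- even though the publisher has deleted it.
    SET NOCOUNT ON
END
GO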

Another way would be to modify the replication stored procedures so that they do not apply deletes, and so that updates insert new rows instead of modifying existing ones.|||... Given the size of our database, I'm hoping to replicate transactions.

-=michael=-

What's the size of your database? Also, is there a "window" (or windows) to mark active versus closed transactions?

I used DTS to copy "open" status records from a (non-SQL) source, refreshing them every fifteen minutes. Closed records (of a specific status) were moved to a permanent repository every day.

If your environment is sufficiently large (and your budget sufficiently unlimited), you might consider a staging area using a log shipping pair and then a permanent data warehouse (fed from the log shipping target via DTS).

Regards,

hmscott
