Change data capture

It is possible that the source and target are the same system physically, but that would not change the design pattern logically.

Any row in any table whose last-changed timestamp column holds a value more recent than the last time data was captured is considered to have changed.
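The timestamp comparison can be sketched as follows. This is a minimal illustration rather than a reference implementation: the `customers` table, its `last_modified` column, and the helper `capture_changes_since` are hypothetical names, and SQLite stands in for whatever database the source system actually uses.

```python
import sqlite3


def capture_changes_since(conn, last_capture):
    """Return rows modified after the previous capture time."""
    # ISO-8601 strings sort chronologically, so a plain string
    # comparison is enough in this sketch.
    query = (
        "SELECT id, name, last_modified FROM customers "
        "WHERE last_modified > ? ORDER BY last_modified"
    )
    return conn.execute(query, (last_capture,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "alice", "2005-06-01 09:00:00"),
        (2, "bob", "2005-06-15 12:30:00"),
        (3, "carol", "2005-07-02 08:15:00"),
    ],
)

# Assumed time of the previous capture run.
last_capture = "2005-06-10 00:00:00"
changed = capture_changes_since(conn, last_capture)
changed_ids = [row[0] for row in changed]
```

After each run, the capture process would record the new capture time so that the next run picks up only later changes.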

Database designers give tables whose changes must be captured a column that contains a version number.

For optimistic locking, each row has an independent version number, typically a sequential counter.
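A per-row counter used for optimistic locking might look like the sketch below. The `items` table and the helper `update_if_unchanged` are hypothetical, and SQLite is used only to keep the example self-contained; the point is that a write succeeds only when the row still carries the version the writer originally read.

```python
import sqlite3


def update_if_unchanged(conn, row_id, new_payload, expected_version):
    """Apply the update only if the row still has the expected version."""
    cur = conn.execute(
        "UPDATE items SET payload = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_payload, row_id, expected_version),
    )
    # rowcount is 0 when another writer has already bumped the version.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, payload TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO items VALUES (1, 'draft', 1)")

# Two writers both read version 1; only the first write goes through.
first_ok = update_if_unchanged(conn, 1, "edit A", expected_version=1)
second_ok = update_if_unchanged(conn, 1, "edit B", expected_version=1)
row = conn.execute(
    "SELECT payload, version FROM items WHERE id = 1"
).fetchone()
```

The second writer would normally re-read the row, see the newer version, and decide whether to retry.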

Otherwise, a status indicator can complement the previous methods, signalling that a row, despite having a new version number or a later date, still should not be updated on the target (for example, because the data requires human validation).

As noted, it is not uncommon to see multiple CDC solutions at work in a single system. The combination of time, version, and status nevertheless provides a particularly powerful mechanism, and programmers should use the three together where possible.

Using them together allows for such logic as, "Capture all data for version 2.1 that changed between 2005-06-01 00:00 and 2005-07-01 00:00 where the status code indicates it is ready for production."
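The quoted logic translates fairly directly into a single query. The sketch below assumes a hypothetical `records` table with `version`, `last_modified`, and `status` columns, a status value of `READY` meaning "ready for production", and a half-open time window (start inclusive, end exclusive); none of these details come from the text above.

```python
import sqlite3


def capture_ready(conn, version, start, end, status="READY"):
    """Select rows matching version, time window, and status together."""
    query = (
        "SELECT id FROM records "
        "WHERE version = ? "
        "AND last_modified >= ? AND last_modified < ? "
        "AND status = ? "
        "ORDER BY id"
    )
    rows = conn.execute(query, (version, start, end, status)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, version TEXT, last_modified TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "2.1", "2005-06-10 10:00", "READY"),    # matches all three
        (2, "2.1", "2005-06-20 10:00", "PENDING"),  # wrong status
        (3, "2.0", "2005-06-15 10:00", "READY"),    # wrong version
        (4, "2.1", "2005-07-05 10:00", "READY"),    # outside the window
    ],
)

captured = capture_ready(conn, "2.1", "2005-06-01 00:00", "2005-07-01 00:00")
```

Each condition filters out a different row here, which is why the three attributes work better together than any one of them alone.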

The queue table might have a schema with the following fields: Id, TableName, RowId, Timestamp, Operation.

This queue table could then be "played back" to replicate the data from the source system to a target.
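One way to populate and play back such a queue is sketched below. It assumes the queue is filled by database triggers on a hypothetical `customers` table, that source and target are two separate SQLite databases, and that playback copies the current source row for inserts and updates; the names `change_queue` and `play_back` are illustrative only.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE change_queue (
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    TableName TEXT, RowId INTEGER, Timestamp TEXT, Operation TEXT);
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO change_queue (TableName, RowId, Timestamp, Operation)
    VALUES ('customers', NEW.id, datetime('now'), 'INSERT');
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO change_queue (TableName, RowId, Timestamp, Operation)
    VALUES ('customers', NEW.id, datetime('now'), 'UPDATE');
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO change_queue (TableName, RowId, Timestamp, Operation)
    VALUES ('customers', OLD.id, datetime('now'), 'DELETE');
END;
""")

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def play_back(source, target):
    """Apply queued changes to the target in the order they were logged."""
    entries = source.execute(
        "SELECT Id, RowId, Operation FROM change_queue "
        "WHERE TableName = 'customers' ORDER BY Id"
    ).fetchall()
    for _, row_id, operation in entries:
        if operation == "DELETE":
            target.execute("DELETE FROM customers WHERE id = ?", (row_id,))
            continue
        # INSERT and UPDATE both copy the row's current source state.
        row = source.execute(
            "SELECT id, name FROM customers WHERE id = ?", (row_id,)
        ).fetchone()
        if row is not None:  # the row may have been deleted later on
            target.execute(
                "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
                row,
            )
    source.execute("DELETE FROM change_queue")  # queue has been consumed
    source.commit()
    target.commit()
    return len(entries)


source.execute("INSERT INTO customers VALUES (1, 'alice')")
source.execute("INSERT INTO customers VALUES (2, 'bob')")
source.execute("UPDATE customers SET name = 'alicia' WHERE id = 1")
source.execute("DELETE FROM customers WHERE id = 2")

applied = play_back(source, target)
replicated = target.execute("SELECT id, name FROM customers").fetchall()
```

Because the queue stores only the row identifier and the operation, playback in this sketch reads the latest row contents from the source rather than the values at the time of each change.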

Data capture from transaction logs poses a challenge in that the structure, contents, and use of a transaction log are specific to each database management system.

Using transaction logs for change data capture involves other challenges as well, although CDC solutions based on transaction log files also have distinct advantages.

As often occurs in complex domains, the final solution to a CDC problem may have to balance many competing concerns.

Slowly changing dimension (SCD) model example