Most logical based replication solutions in Oracle are based on extracting (or mining) the transactions out of the transaction log (redo logs) and transforming this into SQL statements which are then run against the target database. The advantage of this type of replication are that it is done outside of the database causing minimum impact, is done in real-time and does not require any changes in the database such as trigger based replication.
These type of replication solutions are also called CDC or changed data capture solutions as they mine the redo logs for all the changed data as they are occurring in the source database.
All logical based replication solutions adhere to the same three fundamental principles:
- All set based SQL operations are converted to row-by-row SQL changes.
- For the data in the table to be replicated, each row in the table must be uniquely identified.
- Conflicts can cause data divergence and lost updates.
Lets discuss each of these principles in detail.
This first post will discuss the first principle and principle 2 and 3 will be covered in subsequent posts across the next weeks (subscribehere to ensure you don't miss out).
Note that Dbvisit Replicate is used as an example but these principles apply to all logical based change data capture solutions such as Golden Gate and Streams.
1. All set based SQL operations are converted to row-by-row SQL changes. The set based SQL operations on the source database are converted to row-by-row changes on the target database. SQL issued on the source database can change an arbitrary number of rows. This is mined by Dbvisit Replicate on the source database as row-by-row changes, and the changes are applied at the target database as row-by-row changes, each SQL only affecting one row at a time.
The consequence is that the SQL issued at the apply database is not the same issued against mine – instead, each SQL updates/deletes/inserts exactly one row, applying just one change.
Why is it like this? The reason is that this is the way that Oracle writes the transactions into the redo log. Oracle has already broken down the transaction to individual row-by-row changes so that is what the mining tools have to work with.
update PRICES set PRICE = PRICE - (PRICE * .10) where PRODUCT_CAT = 'OLD_STOCK';
This set based operation affecting 2 rows in our source database is split into 2 SQL statements before it is written into the redo logs. The formula (PRICE = PRICE - (PRICE * .10)) is also replaced by the actual hardcoded values of the new PRICE value.
The resulting SQL statements for the target database are:
update PRICES set PRICE = 10 where PRODUCT_CAT = 'OLD_STOCK' and PROD_ID = 101; update PRICES set PRICE = 23 where PRODUCT_CAT = 'OLD_STOCK' and PROD_ID = 102;
Notice that the target SQL has the primary key (PROD_ID) added to the WHERE statement. Adding the primary key will ensure that the target SQL only updates one row at a time. If the source SQL updates 100 records in a single SQL statement, then the target SQL will consist of 100 single SQL statements each with the unique primary key added.
It can be seen that having a primary key on the table to be replicated becomes very important. This allows the resulting target SQL to add the primary to the WHERE clause to ensure only one rows gets updated at a time.
What if the table to be replicated does not have a primary key? Does that mean we cannot replicate this table using logical based replication? The answer is no. In most cases we will still be able to replicate the table if there is no primary key, but this depends on principle number 2 which I will explain in the next post.