Many customers are interested in minimizing the impact of the masking process whilst maintaining up to date masked copies of production. Data masking is typically run as a batch process which operates on an entire dataset. This becomes increasingly inconvenient, time and resource consuming as the database grows and there is greater need to access up to date secure copies of production data.
There are two approaches to address this problem. One is to implement Continuous Masking using logical replication. I explain how to achieve this with GoldenGate and database triggers in my blog article "Instant, Secure, Up-to-date Test Data with Minimal Storage".
The alternative is to implement Incremental Masking using Change Data Capture (CDC) methods to identify the set of deltas since the current masked replica was last updated.
To generate the deltas you can apply CDC techniques, such as Oracle Data Integrator, between two point in time versions of the source database. You can do this using two database snapshots or flashback query if the retention period is adequate. The first snapshot needs to align to the last update on the masked target. You then apply the masking to the deltas and merge the results to the masked target.
Alternatively you can perform comparisons between the source table and the masked target table, as long as you are careful to avoid the obvious differences due to the masked columns in the target.
It turns out that the DBMS_COMPARISON package which performs differences between two tables or views, based on single tables, can be used to identify and apply changes between two copies of a table where the second copy has masked columns. The tables can of course be in different databases.
There are two approaches to address this problem. One is to implement Continuous Masking using logical replication. I explain how to achieve this with GoldenGate and database triggers in my blog article "Instant, Secure, Up-to-date Test Data with Minimal Storage".
The alternative is to implement Incremental Masking using Change Data Capture (CDC) methods to identify the set of deltas since the current masked replica was last updated.
To generate the deltas you can apply CDC techniques, such as Oracle Data Integrator, between two point in time versions of the source database. You can do this using two database snapshots or flashback query if the retention period is adequate. The first snapshot needs to align to the last update on the masked target. You then apply the masking to the deltas and merge the results to the masked target.
Alternatively you can perform comparisons between the source table and the masked target table, as long as you are careful to avoid the obvious differences due to the masked columns in the target.
It turns out that the DBMS_COMPARISON package which performs differences between two tables or views, based on single tables, can be used to identify and apply changes between two copies of a table where the second copy has masked columns. The tables can of course be in different databases.
The key is
to compare the source and target tables using a view on the source table that calls the masking
functions so that the values returned by the comparison match the actual masked values stored in
the target, otherwise we would get false differences. An interesting side
effect is that the deltas generated by the comparison function are based on the
values returned from the view and hence the desired masked values are good to
go if applying the changes using the dbms_comparison.converge() procedure. If necessary you can apply the changes by issuing DMLs based on the deltas identified by the dbms_comparison.compare() procedure.
The process
is
- Create a set of masking functions, you could use the functions referenced by Oracle Data Masking
- Create a view on the source table calling the relevant masking functions. This view can be used to populate the masked target.
- Create a comparison function using dbms_comparison.create_comparison().
- Execute comparison using dbms_comparison.compare().
- Review differences using the user_comparison_row_dif view.
- Apply changes found using dbms_comparison.converge().
This works
for all sensitive columns even if the primary or unique key contains a
sensitive column.
Caveats
- This is primarily intended masked targets where there are no independent DMLs occurring on the target (although this could be accommodated).
- There may be issues if we need to coordinate changes to multiple tables which have integrity constraints that could be violated unless all related DMLs are performed in the same transaction. In this circumstance we can issue DMLs based on the identified deltas.
- This can also work for the scenario where a sensitive column is part of a primary key and that primary key is updated in the source table. This is an unusual scenario but can be catered for with some extra coding.
- There are limitations on which type of masking functions can be used.
- Some coding is required to setup comparisons and the application of deltas.
- I am not sure how performant this is for very large tables.
Let me know if you want some example SQL which illustrates the process.
You can try also the data comparison and sync wizard in dbForge Studio for Oracle.
ReplyDelete