Building an Enterprise-Grade Oracle 19c RAC-to-RAC Data Guard with Broker
A practical, end-to-end walkthrough of standing up a 2-Node RAC physical standby for a 2-Node RAC primary — covering architecture, network and parameter prerequisites, RMAN active duplication, RAC conversion of the standby, redo apply, Data Guard Broker, and switchover/failover readiness.
1. Why RAC-to-RAC Data Guard?
Oracle Data Guard remains Oracle's reference-grade solution for database disaster recovery and high availability. It ships redo from a primary database to one or more standbys, applies it in real time, and orchestrates role transitions so that planned maintenance and unplanned outages do not translate into downtime for the business.
When the primary database itself already runs on Real Application Clusters (RAC), pairing it with a RAC physical standby produces a layered availability story: node-level resilience inside each cluster, and site-level resilience between the two clusters. This guide documents that pairing end to end on Oracle Database 19c (release 19.16), using a multitenant container database and a single pluggable database.
The deployed configuration delivers:
- Multi-node high availability on both primary and standby sites
- Enterprise disaster recovery with automatic redo transport and apply
- Zero or near-zero data loss with Maximum Availability or Protection modes
- Simplified role transitions via the Data Guard Broker
- PDB-level replication inside a CDB
2. Architecture Overview
The deployment uses two symmetric 2-node RAC clusters. The primary cluster hosts the read-write CDB; the standby cluster hosts a mounted physical standby that continuously applies redo received from the primary.
2.1 Primary RAC
| Component | Value |
|---|---|
| Cluster nodes | node1, node2 |
| Cluster software | Oracle Grid Infrastructure 19c (2-node RAC) |
| Container database (CDB) | test |
| Cluster instances | test1, test2 |
| Pluggable database | testpdb |
db_unique_name | test |
| ASM disk groups | +DATA, +RECO, +OCRVD |
| Role / version | Primary / Oracle Database 19.16 |
| SCAN | rac-scan.localdomain |
2.2 Physical Standby RAC
| Component | Value |
|---|---|
| Cluster nodes | DG-node1, DG-node2 |
| Cluster software | Oracle Grid Infrastructure 19c (2-node RAC) |
| Container database (CDB) | test |
| Cluster instances | test1, test2 |
| Pluggable database | testpdb |
db_unique_name | teststby |
| ASM disk groups | +DATA, +RECO |
| Role / version | Physical Standby / Oracle Database 19.16 |
| SCAN | DG-rac-scan.localdomain |
2.3 Logical Topology
3. Implementation Objectives
- Build a 2-node physical standby for an existing 2-node RAC primary.
- Enable real-time redo transport and managed real-time apply.
- Validate health, lag, and synchronization end to end.
- Place the configuration under Data Guard Broker management.
- Prepare the environment for switchover and failover.
- Replicate the PDB inside the CDB consistently with the primary.
4. Prerequisites Checklist
4.1 Operating system and database
- Identical Oracle binaries on both clusters (here, 19.16).
- Matching ASM disk groups (
+DATA,+RECO) on the primary and standby. - Network reachability between every node pair (TCP 1521 open in both directions).
4.2 Database configuration
- Primary in ARCHIVELOG mode and FORCE LOGGING.
- Oracle password file copied from the primary to every standby node.
tnsnames.oraandlistener.oraaligned on all four nodes.- Data Guard initialization parameters in place on the primary.
- Standby redo logs (SRLs) created on the primary, sized to match online redo logs.
5. Configuring the Primary RAC Database
5.1 Confirm ARCHIVELOG mode
Connect to test1 as SYSDBA and verify:
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Current log sequence 145.2 Enable FORCE LOGGING
FORCE LOGGING guarantees that every change generates redo, including operations that would normally bypass redo (NOLOGGING direct loads, certain DDL). This is non-negotiable for Data Guard correctness.
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
SQL> SELECT force_logging FROM v$database;
FORCE_LOGGING
-------------
YESFORCE LOGGING closes that gap.6. Initialization Parameters and Standby Redo Logs
6.1 Primary Data Guard parameters
Set the redo transport and recovery parameters on the primary. The SID='*' clause ensures both RAC instances pick up the same value from the SPFILE.
ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='DG_CONFIG=(TEST,TESTSTBY)' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_1=
'LOCATION=USE_DB_RECOVERY_FILE_DEST
VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=TEST' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2=
'SERVICE=TESTSTBY ASYNC
VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=TESTSTBY' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_1=ENABLE SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE SCOPE=BOTH SID='*';
ALTER SYSTEM SET FAL_SERVER='TESTSTBY' SCOPE=BOTH SID='*';
ALTER SYSTEM SET FAL_CLIENT='TEST' SCOPE=BOTH SID='*';
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO SCOPE=BOTH SID='*';
ALTER SYSTEM SET REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE SCOPE=SPFILE SID='*';What each parameter does
| Parameter | Role in this configuration |
|---|---|
LOG_ARCHIVE_CONFIG | Declares both DB unique names so each side accepts redo from the other. |
LOG_ARCHIVE_DEST_1 | Local archived redo log destination, valid in any role. |
LOG_ARCHIVE_DEST_2 | Remote ASYNC transport to TESTSTBY while in primary role. |
FAL_SERVER / FAL_CLIENT | Lets the standby fetch missing archives if a gap is detected. |
STANDBY_FILE_MANAGEMENT | Auto-creates standby datafiles when added on the primary. |
REMOTE_LOGIN_PASSWORDFILE | Required for redo transport authentication via SYSDBA password file. |
6.2 Add standby redo logs to the primary
RAC produces one redo thread per instance. SRLs must therefore be added per thread, with at least one extra group per thread, and sized identically to the online redo logs (200 MB here). Multiplexing across +DATA and +RECO protects against a single ASM disk-group failure.
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 5 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 6 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 7 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 8 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 9 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 10('+DATA','+RECO') SIZE 200M;Confirm the result with v$standby_log; you should see three groups per thread, each 200 MB.
7. Network Configuration
7.1 /etc/hosts on every node
Every node — both clusters — must resolve every public IP, private interconnect IP, VIP, and SCAN entry on the other side. A representative block from the primary nodes:
# PRIMARY
192.168.18.11 node1.localdomain node1
192.168.18.12 node2.localdomain node2
192.168.1.11 node1-priv.localdomain node1-priv
192.168.1.12 node2-priv.localdomain node2-priv
192.168.18.13 node1-vip.localdomain node1-vip
192.168.18.14 node2-vip.localdomain node2-vip
192.168.18.15 rac-scan.localdomain rac-scan
192.168.18.16 rac-scan.localdomain rac-scan
192.168.18.17 rac-scan.localdomain rac-scan
# DG STANDBY
192.168.18.31 DG-node1.localdomain DG-node1
192.168.18.32 DG-node2.localdomain DG-node2
192.168.2.31 DG-node1-priv.localdomain DG-node1-priv
192.168.2.32 DG-node2-priv.localdomain DG-node2-priv
192.168.18.33 DG-node1-vip.localdomain DG-node1-vip
192.168.18.34 DG-node2-vip.localdomain DG-node2-vip
192.168.18.35 DG-rac-scan.localdomain DG-rac-scan
192.168.18.36 DG-rac-scan.localdomain DG-rac-scan
192.168.18.37 DG-rac-scan.localdomain DG-rac-scan7.2 tnsnames.ora
Use one consistent tnsnames.ora across all four nodes. The service TESTSTBY resolves through both standby VIPs so that either node can accept redo transport connections.
TEST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL=TCP)(HOST=rac-scan)(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = test)))
TESTPRMY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL=TCP)(HOST=rac-scan.localdomain)(PORT=1521))
(CONNECT_DATA =
(SERVICE_NAME = test)))
TESTSTBY =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL=TCP)(HOST=DG-node1-vip.localdomain)(PORT=1521))
(ADDRESS = (PROTOCOL=TCP)(HOST=DG-node2-vip.localdomain)(PORT=1521)))
(CONNECT_DATA =
(SERVICE_NAME = teststby)))7.3 listener.ora with a static SID entry
RMAN active duplication needs to connect to a database that is in NOMOUNT. The cluster listener will not advertise such a service automatically, so register it statically on every relevant node:
# Primary node1 — Grid Home listener.ora
LISTENER_NODE1 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL=TCP)(HOST=node1-vip.localdomain)(PORT=1521))))
SID_LIST_LISTENER_NODE1 =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = test)
(ORACLE_HOME = /u01/app/oracle/product/19/db_1)
(SID_NAME = test1)))Mirror that pattern on node2, DG-node1, and DG-node2, swapping the VIP, GLOBAL_DBNAME (teststby on the standby) and SID_NAME as appropriate.
7.4 Smoke-test connectivity
Before going further, confirm bi-directional ping across public IPs and VIPs, then validate Oracle Net resolution:
$ tnsping TESTPRMY
OK (10 msec)
$ tnsping TESTSTBY
OK (10 msec)8. Password File and PFILE Distribution
Data Guard authenticates remote redo apply via the SYSDBA password file. Both sides must hold identical passwords. The procedure used here:
- Copy the active password file out of ASM to the local filesystem with
asmcmd pwcopy. - Distribute the file via
scpto every primary and standby node, placing it in$ORACLE_HOME/dbswith the correct name for each instance (orapwtest1,orapwtest2). - Generate a PFILE from the running primary SPFILE; that PFILE will seed the standby's parameter file in the next step.
-- On the primary, lift the password file out of ASM
ASMCMD> cd +DATA/TEST/PASSWORD
ASMCMD> pwcopy pwdtest.256.1220471505 /tmp/orapwtest1
ASMCMD> exit
-- Re-create with the desired SYS password and distribute
$ orapwd file=$ORACLE_HOME/dbs/orapwtest1 password='****' entries=5
$ scp $ORACLE_HOME/dbs/orapwtest1 oracle@DG-node1:$ORACLE_HOME/dbs/orapwtest1
$ scp $ORACLE_HOME/dbs/orapwtest1 oracle@DG-node2:$ORACLE_HOME/dbs/orapwtest2
-- Seed PFILE for the standby
SQL> CREATE PFILE FROM SPFILE;9. Creating the Standby with RMAN Active Duplication
9.1 Stage the standby PFILE and start NOMOUNT
On DG-node1, edit the seed PFILE so that key parameters are flipped for the standby identity, then start the auxiliary instance:
*.db_name='test'
*.db_unique_name='teststby'
*.cluster_database=true
*.fal_server='TEST'
*.fal_client='TESTSTBY'
*.log_archive_config='DG_CONFIG=(TESTSTBY,TEST)'
*.log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST
VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=TESTSTBY'
*.log_archive_dest_2='SERVICE=TEST ASYNC
VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=TEST'
*.standby_file_management='AUTO'
test1.instance_number=1
test2.instance_number=2
test1.thread=1
test2.thread=2
test1.undo_tablespace='UNDOTBS1'
test2.undo_tablespace='UNDOTBS2'$ mkdir -p /u01/app/oracle/admin/test/adump # both standby nodes
SQL> STARTUP NOMOUNT pfile='/u01/app/oracle/product/19/db_1/dbs/inittest.ora';9.2 Run RMAN active duplication
With the auxiliary instance up, launch RMAN on the standby host and duplicate from the live primary. The dorecover clause asks RMAN to apply archived logs as part of the duplication, leaving the standby mounted and ready for managed recovery.
$ rman target sys@TESTPRMY auxiliary sys@TESTSTBY
RMAN> DUPLICATE TARGET DATABASE FOR STANDBY
FROM ACTIVE DATABASE
DORECOVER
NOFILENAMECHECK;RMAN performs, in order:
- Backup-as-copy of the password file to the auxiliary
$ORACLE_HOME/dbs - SPFILE creation from memory and instance restart
- Restore of the standby control file from service
TESTPRMY - Restore of every datafile over the network (no staging area)
- Recovery using the primary's archived logs
- Final state: standby mounted as
PHYSICAL STANDBY
Verify on the standby:
SQL> SELECT name, db_unique_name, database_role, open_mode FROM v$database;
NAME DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE
------ ---------------- ------------------ ----------
TEST teststby PHYSICAL STANDBY MOUNTED10. Converting the Standby to RAC
RMAN active duplication produces a single-instance standby. The next phase promotes it to a 2-node RAC physical standby.
10.1 Move the SPFILE into ASM
SQL> CREATE PFILE FROM SPFILE;
SQL> CREATE SPFILE='+DATA/TESTSTBY/PARAMETERFILE/spfileteststby.ora'
FROM PFILE='/u01/app/oracle/product/19/db_1/dbs/inittest1.ora';
$ echo "SPFILE='+DATA/TESTSTBY/PARAMETERFILE/spfileteststby.ora'" \
> $ORACLE_HOME/dbs/inittest1.ora
$ scp $ORACLE_HOME/dbs/inittest1.ora oracle@DG-node2:$ORACLE_HOME/dbs/inittest2.ora10.2 Register the standby with Clusterware
$ srvctl add database \
-d teststby \
-o /u01/app/oracle/product/19/db_1 \
-p '+DATA/TESTSTBY/PARAMETERFILE/spfileteststby.ora' \
-r PHYSICAL_STANDBY \
-s MOUNT \
-a "DATA,RECO"
$ srvctl add instance -d teststby -i test1 -n DG-node1
$ srvctl add instance -d teststby -i test2 -n DG-node210.3 Set instance-specific parameters
Until each instance has its own instance_number, thread, and undo_tablespace, only one instance will start. Define them in the SPFILE:
SQL> ALTER SYSTEM SET instance_number=1 SID='test1' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET instance_number=2 SID='test2' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET thread=1 SID='test1' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET thread=2 SID='test2' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET undo_tablespace='UNDOTBS1' SID='test1' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET undo_tablespace='UNDOTBS2' SID='test2' SCOPE=SPFILE;10.4 Restart through Clusterware
$ srvctl start database -d teststby
$ srvctl status database -d teststby
Instance test1 is running on node dg-node1
Instance test2 is running on node dg-node211. Starting Redo Apply
With both standby instances open in MOUNT, start managed real-time apply on one of them. USING CURRENT LOGFILE instructs the apply process to read directly from the SRLs, minimising apply lag.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
DISCONNECT FROM SESSION;
Database altered.ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; first on the active apply instance.11.1 Optional — open as Active Data Guard
To allow read-only workloads on the standby while apply continues, cancel apply, open the database read-only, and restart apply:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE OPEN READ ONLY;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE DISCONNECT FROM SESSION;12. Verification
12.1 On the primary RAC
SQL> SELECT name, open_mode, database_role, db_unique_name FROM v$database;
SQL> SELECT instance_name, host_name, thread#, status FROM gv$instance ORDER BY inst_id;
INSTANCE_NAME HOST_NAME THREAD# STATUS
------------- ---------------------- ------- ------
test1 node1.localdomain 1 OPEN
test2 node2.localdomain 2 OPEN
SQL> SELECT thread#, MAX(sequence#) FROM v$archived_log GROUP BY thread#;
THREAD# MAX(SEQUENCE#)
---------- ---------------
1 19
2 1212.2 On the standby RAC
SQL> SELECT name, db_unique_name, database_role, open_mode FROM v$database;
NAME DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE
------ ---------------- ----------------- ---------
TEST teststby PHYSICAL STANDBY MOUNTED
SQL> SELECT process, status, thread#, sequence#
FROM v$managed_standby
WHERE process IN ('MRP0','RFS','LGWR');
SQL> SELECT thread#, MAX(sequence#) FROM v$archived_log
WHERE applied = 'YES' GROUP BY thread#;What to look for:
MRP0inAPPLYING_LOGagainst the current SRL.RFSprocesses receiving from each primary thread.- Latest applied sequence on the standby keeping pace with the primary.
- No gap rows in
v$archive_gap.
12.3 PDB visibility
Because the configuration replicates a CDB, the pluggable database testpdb follows the CDB's redo stream. Confirm with show pdbs on each side; on the standby, the PDB will be in MOUNTED until you open the standby read-only.
13. Configuring the Data Guard Broker
The Data Guard Broker (DGMGRL) wraps everything you have just built into a single configuration object, with health monitoring, observable apply lag, and one-line switchover/failover commands.
13.1 Enable the broker on both sides
SQL> ALTER SYSTEM SET dg_broker_start=TRUE SCOPE=BOTH SID='*';The default broker configuration files live under $ORACLE_HOME/dbs. For a RAC database, point them at ASM so all instances share the same metadata:
SQL> ALTER SYSTEM SET dg_broker_config_file1='+DATA/TEST/DATAGUARD/dr1test.dat' SCOPE=BOTH SID='*';
SQL> ALTER SYSTEM SET dg_broker_config_file2='+DATA/TEST/DATAGUARD/dr2test.dat' SCOPE=BOTH SID='*';Apply the equivalent on the standby, swapping TEST for TESTSTBY.
13.2 Create the configuration
$ dgmgrl sys/****@TESTPRMY
DGMGRL> CREATE CONFIGURATION dg_test AS
PRIMARY DATABASE IS test
CONNECT IDENTIFIER IS TESTPRMY;
DGMGRL> ADD DATABASE teststby
AS CONNECT IDENTIFIER IS TESTSTBY
MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;Healthy output looks like:
Configuration - dg_test
Protection Mode: MaxPerformance
Members:
test - Primary database
teststby - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated XX seconds ago)13.3 Validate each member
DGMGRL> VALIDATE DATABASE test;
DGMGRL> VALIDATE DATABASE teststby;
DGMGRL> SHOW DATABASE VERBOSE teststby;VALIDATE DATABASE checks that SRLs exist on both sides, that transport and apply credentials work, and that the standby has the redo thread coverage required for the primary it shadows.
14. Switchover and Failover Readiness
14.1 Planned switchover
A switchover is graceful and lossless. Verify that both sides are in sync, then issue:
DGMGRL> SHOW CONFIGURATION VERBOSE; -- expect SUCCESS, lag = 0
DGMGRL> SWITCHOVER TO teststby;The broker stops apply, archives the current log, transitions roles, and restarts the new primary and new standby through Clusterware. After the command returns, repeat SHOW CONFIGURATION; to confirm teststby is now Primary and test is the standby.
14.2 Unplanned failover
If the primary site is lost:
DGMGRL> FAILOVER TO teststby;The previous primary becomes invalid in the broker until reinstated. After the original site recovers, bring it back as a standby with:
DGMGRL> REINSTATE DATABASE test;15. Operational Tips and Lessons Learned
- Static listener entries are not optional. RMAN duplicate cannot reach a NOMOUNT instance through a dynamically registered service. Add
SID_LIST_LISTENER_*entries on every node that the duplicate or future role transitions might target. - Multiplex SRLs across two ASM disk groups. Putting members in
+DATAand+RECOprotects redo apply from a single disk-group outage. - Use VIP-based service entries for redo transport. Redo transport benefits from deterministic addresses; resolve
TESTSTBYvia standby VIPs rather than SCAN to avoid intermittent TNS failures during cluster events. - Drive MRP0 from one instance only. Running managed recovery on multiple standby instances simultaneously creates contention without improving apply throughput.
- Place broker config files in ASM so every instance reads a consistent broker state.
- Watch
v$dataguard_statsandv$archive_gapfor lag and gap detection, and consider integrating the broker's observer-style metrics into your monitoring stack. - Rehearse switchovers periodically. A standby that has never taken the primary role in anger is a liability, not an asset.
16. Conclusion
A 2-node RAC primary paired with a 2-node RAC physical standby is one of the strongest availability postures available on Oracle Database 19c. Each cluster absorbs node-level outages locally, and the Data Guard channel protects the database against the loss of an entire site. Bringing the Data Guard Broker into the picture turns a complex parameter-and-script stack into a single configuration object that DBAs can monitor, validate, and transition with one command.
The implementation walked through here — primary preparation, parameter and SRL setup, network alignment, RMAN active duplication, RAC conversion of the standby, redo apply, broker enablement, and switchover drills — is repeatable. With it as a baseline, you can extend the configuration with Maximum Availability mode, fast-start failover, an observer process, or an Active Data Guard read replica without revisiting the foundations.
thanks for reading :)
BR,
ZAHEER
Comments