Building an Enterprise-Grade Oracle 19c RAC-to-RAC Data Guard with Broker

A practical, end-to-end walkthrough of standing up a 2-Node RAC physical standby for a 2-Node RAC primary — covering architecture, network and parameter prerequisites, RMAN active duplication, RAC conversion of the standby, redo apply, Data Guard Broker, and switchover/failover readiness.

Database: Oracle 19c (19.16) Enterprise EditionTopology: 2-Node RAC → 2-Node RAC Physical StandbyStorage: ASM (+DATA, +RECO, +OCRVD)             Container: CDB test with PDB testpdb


1. Why RAC-to-RAC Data Guard?

Oracle Data Guard remains Oracle's reference-grade solution for database disaster recovery and high availability. It ships redo from a primary database to one or more standbys, applies it in real time, and orchestrates role transitions so that planned maintenance and unplanned outages do not translate into downtime for the business.

When the primary database itself already runs on Real Application Clusters (RAC), pairing it with a RAC physical standby produces a layered availability story: node-level resilience inside each cluster, and site-level resilience between the two clusters. This guide documents that pairing end to end on Oracle Database 19c (release 19.16), using a multitenant container database and a single pluggable database.

The deployed configuration delivers:

  • Multi-node high availability on both primary and standby sites
  • Enterprise disaster recovery with automatic redo transport and apply
  • Zero or near-zero data loss with Maximum Availability or Protection modes
  • Simplified role transitions via the Data Guard Broker
  • PDB-level replication inside a CDB

2. Architecture Overview

The deployment uses two symmetric 2-node RAC clusters. The primary cluster hosts the read-write CDB; the standby cluster hosts a mounted physical standby that continuously applies redo received from the primary.


2.1 Primary RAC

ComponentValue
Cluster nodesnode1, node2
Cluster softwareOracle Grid Infrastructure 19c (2-node RAC)
Container database (CDB)test
Cluster instancestest1, test2
Pluggable databasetestpdb
db_unique_nametest
ASM disk groups+DATA, +RECO, +OCRVD
Role / versionPrimary / Oracle Database 19.16
SCANrac-scan.localdomain

2.2 Physical Standby RAC

ComponentValue
Cluster nodesDG-node1, DG-node2
Cluster softwareOracle Grid Infrastructure 19c (2-node RAC)
Container database (CDB)test
Cluster instancestest1, test2
Pluggable databasetestpdb
db_unique_nameteststby
ASM disk groups+DATA, +RECO
Role / versionPhysical Standby / Oracle Database 19.16
SCANDG-rac-scan.localdomain

2.3 Logical Topology



3. Implementation Objectives

  • Build a 2-node physical standby for an existing 2-node RAC primary.
  • Enable real-time redo transport and managed real-time apply.
  • Validate health, lag, and synchronization end to end.
  • Place the configuration under Data Guard Broker management.
  • Prepare the environment for switchover and failover.
  • Replicate the PDB inside the CDB consistently with the primary.

4. Prerequisites Checklist

4.1 Operating system and database

  • Identical Oracle binaries on both clusters (here, 19.16).
  • Matching ASM disk groups (+DATA, +RECO) on the primary and standby.
  • Network reachability between every node pair (TCP 1521 open in both directions).

4.2 Database configuration

  • Primary in ARCHIVELOG mode and FORCE LOGGING.
  • Oracle password file copied from the primary to every standby node.
  • tnsnames.ora and listener.ora aligned on all four nodes.
  • Data Guard initialization parameters in place on the primary.
  • Standby redo logs (SRLs) created on the primary, sized to match online redo logs.

5. Configuring the Primary RAC Database

5.1 Confirm ARCHIVELOG mode

Connect to test1 as SYSDBA and verify:

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Current log sequence           14

5.2 Enable FORCE LOGGING

FORCE LOGGING guarantees that every change generates redo, including operations that would normally bypass redo (NOLOGGING direct loads, certain DDL). This is non-negotiable for Data Guard correctness.

SQL> ALTER DATABASE FORCE LOGGING;
Database altered.

SQL> SELECT force_logging FROM v$database;
FORCE_LOGGING
-------------
YES
Why this mattersA single NOLOGGING transaction on the primary can leave the standby with unrecoverable blocks. FORCE LOGGING closes that gap.

6. Initialization Parameters and Standby Redo Logs

6.1 Primary Data Guard parameters

Set the redo transport and recovery parameters on the primary. The SID='*' clause ensures both RAC instances pick up the same value from the SPFILE.

ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='DG_CONFIG=(TEST,TESTSTBY)' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_1=
  'LOCATION=USE_DB_RECOVERY_FILE_DEST
   VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
   DB_UNIQUE_NAME=TEST' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2=
  'SERVICE=TESTSTBY ASYNC
   VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
   DB_UNIQUE_NAME=TESTSTBY' SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_1=ENABLE  SCOPE=BOTH SID='*';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE  SCOPE=BOTH SID='*';
ALTER SYSTEM SET FAL_SERVER='TESTSTBY'            SCOPE=BOTH SID='*';
ALTER SYSTEM SET FAL_CLIENT='TEST'                SCOPE=BOTH SID='*';
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO     SCOPE=BOTH SID='*';
ALTER SYSTEM SET REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE SCOPE=SPFILE SID='*';

What each parameter does

ParameterRole in this configuration
LOG_ARCHIVE_CONFIGDeclares both DB unique names so each side accepts redo from the other.
LOG_ARCHIVE_DEST_1Local archived redo log destination, valid in any role.
LOG_ARCHIVE_DEST_2Remote ASYNC transport to TESTSTBY while in primary role.
FAL_SERVER / FAL_CLIENTLets the standby fetch missing archives if a gap is detected.
STANDBY_FILE_MANAGEMENTAuto-creates standby datafiles when added on the primary.
REMOTE_LOGIN_PASSWORDFILERequired for redo transport authentication via SYSDBA password file.

6.2 Add standby redo logs to the primary

RAC produces one redo thread per instance. SRLs must therefore be added per thread, with at least one extra group per thread, and sized identically to the online redo logs (200 MB here). Multiplexing across +DATA and +RECO protects against a single ASM disk-group failure.

ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 5 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 6 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 7 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 8 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 9 ('+DATA','+RECO') SIZE 200M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 10('+DATA','+RECO') SIZE 200M;

Confirm the result with v$standby_log; you should see three groups per thread, each 200 MB.

7. Network Configuration

7.1 /etc/hosts on every node

Every node — both clusters — must resolve every public IP, private interconnect IP, VIP, and SCAN entry on the other side. A representative block from the primary nodes:

# PRIMARY
192.168.18.11   node1.localdomain      node1
192.168.18.12   node2.localdomain      node2
192.168.1.11    node1-priv.localdomain node1-priv
192.168.1.12    node2-priv.localdomain node2-priv
192.168.18.13   node1-vip.localdomain  node1-vip
192.168.18.14   node2-vip.localdomain  node2-vip
192.168.18.15   rac-scan.localdomain   rac-scan
192.168.18.16   rac-scan.localdomain   rac-scan
192.168.18.17   rac-scan.localdomain   rac-scan

# DG STANDBY
192.168.18.31   DG-node1.localdomain      DG-node1
192.168.18.32   DG-node2.localdomain      DG-node2
192.168.2.31    DG-node1-priv.localdomain DG-node1-priv
192.168.2.32    DG-node2-priv.localdomain DG-node2-priv
192.168.18.33   DG-node1-vip.localdomain  DG-node1-vip
192.168.18.34   DG-node2-vip.localdomain  DG-node2-vip
192.168.18.35   DG-rac-scan.localdomain   DG-rac-scan
192.168.18.36   DG-rac-scan.localdomain   DG-rac-scan
192.168.18.37   DG-rac-scan.localdomain   DG-rac-scan

7.2 tnsnames.ora

Use one consistent tnsnames.ora across all four nodes. The service TESTSTBY resolves through both standby VIPs so that either node can accept redo transport connections.

TEST =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP)(HOST=rac-scan)(PORT=1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = test)))

TESTPRMY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP)(HOST=rac-scan.localdomain)(PORT=1521))
    (CONNECT_DATA =
      (SERVICE_NAME = test)))

TESTSTBY =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL=TCP)(HOST=DG-node1-vip.localdomain)(PORT=1521))
      (ADDRESS = (PROTOCOL=TCP)(HOST=DG-node2-vip.localdomain)(PORT=1521)))
    (CONNECT_DATA =
      (SERVICE_NAME = teststby)))

7.3 listener.ora with a static SID entry

RMAN active duplication needs to connect to a database that is in NOMOUNT. The cluster listener will not advertise such a service automatically, so register it statically on every relevant node:

# Primary node1 — Grid Home listener.ora
LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL=TCP)(HOST=node1-vip.localdomain)(PORT=1521))))

SID_LIST_LISTENER_NODE1 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = test)
      (ORACLE_HOME   = /u01/app/oracle/product/19/db_1)
      (SID_NAME      = test1)))

Mirror that pattern on node2, DG-node1, and DG-node2, swapping the VIP, GLOBAL_DBNAME (teststby on the standby) and SID_NAME as appropriate.

7.4 Smoke-test connectivity

Before going further, confirm bi-directional ping across public IPs and VIPs, then validate Oracle Net resolution:

$ tnsping TESTPRMY
OK (10 msec)
$ tnsping TESTSTBY
OK (10 msec)

8. Password File and PFILE Distribution

Data Guard authenticates remote redo apply via the SYSDBA password file. Both sides must hold identical passwords. The procedure used here:

  1. Copy the active password file out of ASM to the local filesystem with asmcmd pwcopy.
  2. Distribute the file via scp to every primary and standby node, placing it in $ORACLE_HOME/dbs with the correct name for each instance (orapwtest1, orapwtest2).
  3. Generate a PFILE from the running primary SPFILE; that PFILE will seed the standby's parameter file in the next step.
-- On the primary, lift the password file out of ASM
ASMCMD> cd +DATA/TEST/PASSWORD
ASMCMD> pwcopy pwdtest.256.1220471505 /tmp/orapwtest1
ASMCMD> exit

-- Re-create with the desired SYS password and distribute
$ orapwd file=$ORACLE_HOME/dbs/orapwtest1 password='****' entries=5
$ scp $ORACLE_HOME/dbs/orapwtest1 oracle@DG-node1:$ORACLE_HOME/dbs/orapwtest1
$ scp $ORACLE_HOME/dbs/orapwtest1 oracle@DG-node2:$ORACLE_HOME/dbs/orapwtest2

-- Seed PFILE for the standby
SQL> CREATE PFILE FROM SPFILE;

9. Creating the Standby with RMAN Active Duplication

9.1 Stage the standby PFILE and start NOMOUNT

On DG-node1, edit the seed PFILE so that key parameters are flipped for the standby identity, then start the auxiliary instance:

*.db_name='test'
*.db_unique_name='teststby'
*.cluster_database=true
*.fal_server='TEST'
*.fal_client='TESTSTBY'
*.log_archive_config='DG_CONFIG=(TESTSTBY,TEST)'
*.log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST
                      VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
                      DB_UNIQUE_NAME=TESTSTBY'
*.log_archive_dest_2='SERVICE=TEST ASYNC
                      VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
                      DB_UNIQUE_NAME=TEST'
*.standby_file_management='AUTO'
test1.instance_number=1
test2.instance_number=2
test1.thread=1
test2.thread=2
test1.undo_tablespace='UNDOTBS1'
test2.undo_tablespace='UNDOTBS2'
$ mkdir -p /u01/app/oracle/admin/test/adump          # both standby nodes
SQL> STARTUP NOMOUNT pfile='/u01/app/oracle/product/19/db_1/dbs/inittest.ora';

9.2 Run RMAN active duplication

With the auxiliary instance up, launch RMAN on the standby host and duplicate from the live primary. The dorecover clause asks RMAN to apply archived logs as part of the duplication, leaving the standby mounted and ready for managed recovery.

$ rman target sys@TESTPRMY auxiliary sys@TESTSTBY

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY
      FROM ACTIVE DATABASE
      DORECOVER
      NOFILENAMECHECK;

RMAN performs, in order:

  • Backup-as-copy of the password file to the auxiliary $ORACLE_HOME/dbs
  • SPFILE creation from memory and instance restart
  • Restore of the standby control file from service TESTPRMY
  • Restore of every datafile over the network (no staging area)
  • Recovery using the primary's archived logs
  • Final state: standby mounted as PHYSICAL STANDBY

Verify on the standby:

SQL> SELECT name, db_unique_name, database_role, open_mode FROM v$database;

NAME   DB_UNIQUE_NAME   DATABASE_ROLE      OPEN_MODE
------ ---------------- ------------------ ----------
TEST   teststby         PHYSICAL STANDBY   MOUNTED

10. Converting the Standby to RAC

RMAN active duplication produces a single-instance standby. The next phase promotes it to a 2-node RAC physical standby.

10.1 Move the SPFILE into ASM

SQL> CREATE PFILE FROM SPFILE;
SQL> CREATE SPFILE='+DATA/TESTSTBY/PARAMETERFILE/spfileteststby.ora'
     FROM PFILE='/u01/app/oracle/product/19/db_1/dbs/inittest1.ora';

$ echo "SPFILE='+DATA/TESTSTBY/PARAMETERFILE/spfileteststby.ora'" \
       > $ORACLE_HOME/dbs/inittest1.ora
$ scp $ORACLE_HOME/dbs/inittest1.ora oracle@DG-node2:$ORACLE_HOME/dbs/inittest2.ora
Why ASMA RAC database needs an SPFILE that every instance can read concurrently. Placing it in ASM (or on shared storage) is the simplest way to satisfy that requirement.

10.2 Register the standby with Clusterware

$ srvctl add database \
    -d teststby \
    -o /u01/app/oracle/product/19/db_1 \
    -p '+DATA/TESTSTBY/PARAMETERFILE/spfileteststby.ora' \
    -r PHYSICAL_STANDBY \
    -s MOUNT \
    -a "DATA,RECO"

$ srvctl add instance -d teststby -i test1 -n DG-node1
$ srvctl add instance -d teststby -i test2 -n DG-node2

10.3 Set instance-specific parameters

Until each instance has its own instance_number, thread, and undo_tablespace, only one instance will start. Define them in the SPFILE:

SQL> ALTER SYSTEM SET instance_number=1     SID='test1' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET instance_number=2     SID='test2' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET thread=1              SID='test1' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET thread=2              SID='test2' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET undo_tablespace='UNDOTBS1' SID='test1' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET undo_tablespace='UNDOTBS2' SID='test2' SCOPE=SPFILE;

10.4 Restart through Clusterware

$ srvctl start database -d teststby
$ srvctl status database -d teststby
Instance test1 is running on node dg-node1
Instance test2 is running on node dg-node2

11. Starting Redo Apply

With both standby instances open in MOUNT, start managed real-time apply on one of them. USING CURRENT LOGFILE instructs the apply process to read directly from the SRLs, minimising apply lag.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
     USING CURRENT LOGFILE
     DISCONNECT FROM SESSION;
Database altered.
Operational tipRun apply on a single standby instance at a time. If you ever need to stop and restart it on the other node, issue ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; first on the active apply instance.

11.1 Optional — open as Active Data Guard

To allow read-only workloads on the standby while apply continues, cancel apply, open the database read-only, and restart apply:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE OPEN READ ONLY;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
     USING CURRENT LOGFILE DISCONNECT FROM SESSION;

12. Verification

12.1 On the primary RAC

SQL> SELECT name, open_mode, database_role, db_unique_name FROM v$database;
SQL> SELECT instance_name, host_name, thread#, status FROM gv$instance ORDER BY inst_id;

INSTANCE_NAME HOST_NAME              THREAD# STATUS
------------- ---------------------- ------- ------
test1         node1.localdomain            1 OPEN
test2         node2.localdomain            2 OPEN

SQL> SELECT thread#, MAX(sequence#) FROM v$archived_log GROUP BY thread#;
   THREAD#  MAX(SEQUENCE#)
---------- ---------------
         1              19
         2              12

12.2 On the standby RAC

SQL> SELECT name, db_unique_name, database_role, open_mode FROM v$database;
NAME   DB_UNIQUE_NAME   DATABASE_ROLE     OPEN_MODE
------ ---------------- ----------------- ---------
TEST   teststby         PHYSICAL STANDBY  MOUNTED

SQL> SELECT process, status, thread#, sequence#
     FROM v$managed_standby
     WHERE process IN ('MRP0','RFS','LGWR');

SQL> SELECT thread#, MAX(sequence#) FROM v$archived_log
     WHERE applied = 'YES' GROUP BY thread#;

What to look for:

  • MRP0 in APPLYING_LOG against the current SRL.
  • RFS processes receiving from each primary thread.
  • Latest applied sequence on the standby keeping pace with the primary.
  • No gap rows in v$archive_gap.

12.3 PDB visibility

Because the configuration replicates a CDB, the pluggable database testpdb follows the CDB's redo stream. Confirm with show pdbs on each side; on the standby, the PDB will be in MOUNTED until you open the standby read-only.

13. Configuring the Data Guard Broker

The Data Guard Broker (DGMGRL) wraps everything you have just built into a single configuration object, with health monitoring, observable apply lag, and one-line switchover/failover commands.

13.1 Enable the broker on both sides

SQL> ALTER SYSTEM SET dg_broker_start=TRUE SCOPE=BOTH SID='*';

The default broker configuration files live under $ORACLE_HOME/dbs. For a RAC database, point them at ASM so all instances share the same metadata:

SQL> ALTER SYSTEM SET dg_broker_config_file1='+DATA/TEST/DATAGUARD/dr1test.dat' SCOPE=BOTH SID='*';
SQL> ALTER SYSTEM SET dg_broker_config_file2='+DATA/TEST/DATAGUARD/dr2test.dat' SCOPE=BOTH SID='*';

Apply the equivalent on the standby, swapping TEST for TESTSTBY.

13.2 Create the configuration

$ dgmgrl sys/****@TESTPRMY

DGMGRL> CREATE CONFIGURATION dg_test AS
        PRIMARY DATABASE IS test
        CONNECT IDENTIFIER IS TESTPRMY;

DGMGRL> ADD DATABASE teststby
        AS CONNECT IDENTIFIER IS TESTSTBY
        MAINTAINED AS PHYSICAL;

DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;

Healthy output looks like:

Configuration - dg_test
  Protection Mode: MaxPerformance
  Members:
  test     - Primary database
    teststby - Physical standby database

Fast-Start Failover:  Disabled
Configuration Status:
SUCCESS   (status updated XX seconds ago)

13.3 Validate each member

DGMGRL> VALIDATE DATABASE test;
DGMGRL> VALIDATE DATABASE teststby;
DGMGRL> SHOW DATABASE VERBOSE teststby;

VALIDATE DATABASE checks that SRLs exist on both sides, that transport and apply credentials work, and that the standby has the redo thread coverage required for the primary it shadows.

14. Switchover and Failover Readiness

14.1 Planned switchover

A switchover is graceful and lossless. Verify that both sides are in sync, then issue:

DGMGRL> SHOW CONFIGURATION VERBOSE;          -- expect SUCCESS, lag = 0
DGMGRL> SWITCHOVER TO teststby;

The broker stops apply, archives the current log, transitions roles, and restarts the new primary and new standby through Clusterware. After the command returns, repeat SHOW CONFIGURATION; to confirm teststby is now Primary and test is the standby.

14.2 Unplanned failover

If the primary site is lost:

DGMGRL> FAILOVER TO teststby;

The previous primary becomes invalid in the broker until reinstated. After the original site recovers, bring it back as a standby with:

DGMGRL> REINSTATE DATABASE test;
Test before you trustAlways rehearse switchover and failover in a non-production clone of the configuration. Document the role-transition runbook and the exact commands your operations team is expected to execute.

15. Operational Tips and Lessons Learned

  • Static listener entries are not optional. RMAN duplicate cannot reach a NOMOUNT instance through a dynamically registered service. Add SID_LIST_LISTENER_* entries on every node that the duplicate or future role transitions might target.
  • Multiplex SRLs across two ASM disk groups. Putting members in +DATA and +RECO protects redo apply from a single disk-group outage.
  • Use VIP-based service entries for redo transport. Redo transport benefits from deterministic addresses; resolve TESTSTBY via standby VIPs rather than SCAN to avoid intermittent TNS failures during cluster events.
  • Drive MRP0 from one instance only. Running managed recovery on multiple standby instances simultaneously creates contention without improving apply throughput.
  • Place broker config files in ASM so every instance reads a consistent broker state.
  • Watch v$dataguard_stats and v$archive_gap for lag and gap detection, and consider integrating the broker's observer-style metrics into your monitoring stack.
  • Rehearse switchovers periodically. A standby that has never taken the primary role in anger is a liability, not an asset.

16. Conclusion

A 2-node RAC primary paired with a 2-node RAC physical standby is one of the strongest availability postures available on Oracle Database 19c. Each cluster absorbs node-level outages locally, and the Data Guard channel protects the database against the loss of an entire site. Bringing the Data Guard Broker into the picture turns a complex parameter-and-script stack into a single configuration object that DBAs can monitor, validate, and transition with one command.

The implementation walked through here — primary preparation, parameter and SRL setup, network alignment, RMAN active duplication, RAC conversion of the standby, redo apply, broker enablement, and switchover drills — is repeatable. With it as a baseline, you can extend the configuration with Maximum Availability mode, fast-start failover, an observer process, or an Active Data Guard read replica without revisiting the foundations.

thanks for reading :)

BR,

ZAHEER

Comments

Popular posts from this blog

Installation of Oracle Applications R12.1.1 on Linux and vmware

Oracle AVDF Installation and Setup Document

Disable Firewall on Oracle Linux 8