Microsoft SQL Server Always On Best Practices on Oracle Cloud Infrastructure (OCI) Native

Microsoft SQL Server Always On Best Practices on Oracle Cloud Infrastructure (OCI) Native



πŸ“… July 01, 2026
🏷️ MS SQL Server ⏱ 25 min read
Article Overview:
In this blogpost we will see how we can architect, deploy and configure Microsoft SQL Server Always On Availability Groups (AOAG) on Oracle Cloud Infrastructure (OCI) Native — covering compute, storage, networking, the Windows cluster, the Availability Group itself, backups, monitoring and DR. This is ideal for mission-critical SQL Server workloads being migrated off VMware, Hyper-V or on-prem (e.g. with Bare Metal/VM shapes, Block Volumes, FastConnect, Private DNS).

Prerequisites:

System Requirements

  • Two or more Windows Server 2022 nodes (Bare Metal or VM.Standard.E5.Flex)
  • SQL Server 2019/2022 Enterprise Edition (required for Always On with readable secondaries)
  • OCI tenancy with a VCN, private subnet, and Block Volume quota for Ultra High Performance volumes
  • FastConnect (or equivalent private connectivity) if a DR replica spans a second region
  • Domain-joined nodes with a service account for SQL Engine and Agent

1. Provision the OCI Network and NSG:

bash#oci network vcn create --compartment-id $COMPARTMENT_ID --cidr-block "10.20.0.0/16" --display-name "sql-prod-vcn" --dns-label sqlprod

bash#oci network subnet create --compartment-id $COMPARTMENT_ID --vcn-id $VCN_ID --cidr-block "10.20.1.0/24" --display-name "sql-db-subnet" --dns-label sqldb --prohibit-public-ip-on-vnic true

bash#oci network nsg create --compartment-id $COMPARTMENT_ID --vcn-id $VCN_ID --display-name "sql-alwayson-nsg"

Add NSG rules to allow SQL Engine (1433), AG endpoint (5022) and WSFC RPC (135) only from within the DB subnet — never expose these publicly.

bash#oci network nsg rules add --nsg-id $NSG_ID --security-rules '[{"direction":"INGRESS","protocol":"6","source":"10.20.1.0/24","tcpOptions":{"destinationPortRange":{"min":1433,"max":1433}}}]'

bash#oci network nsg rules add --nsg-id $NSG_ID --security-rules '[{"direction":"INGRESS","protocol":"6","source":"10.20.1.0/24","tcpOptions":{"destinationPortRange":{"min":5022,"max":5024}}}]'

2. Provision the Compute Instances and Block Volumes:

Use Bare Metal or Flexible VM shapes placed in different Fault Domains so a single rack failure doesn't take out both replicas.

bash#oci compute instance launch --compartment-id $COMPARTMENT_ID --availability-domain $AD1 --fault-domain FAULT-DOMAIN-1 --shape "BM.Standard.E5.128" --display-name "sqlnode01" --subnet-id $SUBNET_ID --nsg-ids $NSG_ID --image-id $WIN2022_IMAGE_OCID

bash#oci compute instance launch --compartment-id $COMPARTMENT_ID --availability-domain $AD2 --fault-domain FAULT-DOMAIN-2 --shape "BM.Standard.E5.128" --display-name "sqlnode02" --subnet-id $SUBNET_ID --nsg-ids $NSG_ID --image-id $WIN2022_IMAGE_OCID

Create dedicated Ultra High Performance Block Volumes per node — separate volumes for Data, Logs and TempDB. Never combine these on one disk.

bash#oci bv volume create --compartment-id $COMPARTMENT_ID --availability-domain $AD1 --display-name "sqlnode01-data" --size-in-gbs 1024 --vpus-per-gb 60

bash#oci bv volume create --compartment-id $COMPARTMENT_ID --availability-domain $AD1 --display-name "sqlnode01-log" --size-in-gbs 256 --vpus-per-gb 60

bash#oci bv volume create --compartment-id $COMPARTMENT_ID --availability-domain $AD1 --display-name "sqlnode01-tempdb" --size-in-gbs 256 --vpus-per-gb 60

bash#oci compute volume-attachment attach --instance-id $NODE1_OCID --volume-id $DATA_VOL_OCID --type pv

3. Initialize and Format the Disks (run on each node):

PS C:\>Get-Disk | Where-Object PartitionStyle -eq 'RAW' | Initialize-Disk -PartitionStyle GPT -PassThru

PS C:\>New-Partition -DiskNumber 1 -DriveLetter D -UseMaximumSize | Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "SQLData" -Confirm:$false

PS C:\>New-Partition -DiskNumber 2 -DriveLetter L -UseMaximumSize | Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "SQLLogs" -Confirm:$false

PS C:\>New-Partition -DiskNumber 3 -DriveLetter T -UseMaximumSize | Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "TempDB" -Confirm:$false

64 KB allocation unit size is the Microsoft-recommended setting for SQL Server data/log volumes — don't leave this at the NTFS default of 4 KB.

Grant Instant File Initialization to the SQL service account (faster growth, faster restores):

PS C:\>secedit /export /cfg C:\secpol.cfg
PS C:\>(Get-Content C:\secpol.cfg) -replace 'SeManageVolumePrivilege = ', 'SeManageVolumePrivilege = DOMAIN\svc-sqlengine,' | Set-Content C:\secpol.cfg
PS C:\>secedit /configure /db secedit.sdb /cfg C:\secpol.cfg /areas USER_RIGHTS

4. Build the Windows Server Failover Cluster:

PS C:\>Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

PS C:\>Test-Cluster -Node sqlnode01, sqlnode02 -Include "Inventory", "Network", "System Configuration"

PS C:\>New-Cluster -Name SQLAOCLUSTER -Node sqlnode01, sqlnode02 -StaticAddress 10.20.1.50 -NoStorage

Don't skip Test-Cluster — fix every warning before creating the cluster. Configure a File Share Witness on a third node/Fault Domain so quorum stays odd-numbered:

PS C:\>Set-ClusterQuorum -FileShareWitness "\\sqlnode03\ClusterWitness"

5. Enable Always On and Tune the Instance:

PS C:\>Enable-SqlAlwaysOn -ServerInstance "sqlnode01" -Force

TempDB — one file per logical CPU, all equal size (example for 8 vCPU):

USE master;
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = temp2, FILENAME = 'T:\TempDB\temp2.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = temp3, FILENAME = 'T:\TempDB\temp3.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = temp4, FILENAME = 'T:\TempDB\temp4.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);

Memory and parallelism — reserve headroom for the OS, raise cost threshold off the default of 5:

EXEC sp_configure 'max server memory (MB)', 225280;  -- example: 256 GB host, cap SQL at 220 GB
EXEC sp_configure 'max degree of parallelism', 8;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;

6. Create the Availability Group:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Use-A-Vault-Sourced-Secret!';
CREATE CERTIFICATE aoag_cert WITH SUBJECT = 'AlwaysOn Endpoint Certificate';

CREATE ENDPOINT [Hadr_endpoint]
  AS TCP (LISTENER_PORT = 5022)
  FOR DATABASE_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;

CREATE AVAILABILITY GROUP [AG-ProdFinance]
  WITH (DB_FAILOVER = ON, DTC_SUPPORT = NONE)
  FOR DATABASE [FinanceDB]
  REPLICA ON
    'SQLNODE01' WITH (
      ENDPOINT_URL = 'TCP://sqlnode01.sqlprod.oraclevcn.com:5022',
      AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
      FAILOVER_MODE = AUTOMATIC,
      SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
    ),
    'SQLNODE02' WITH (
      ENDPOINT_URL = 'TCP://sqlnode02.sqlprod.oraclevcn.com:5022',
      AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
      FAILOVER_MODE = AUTOMATIC,
      SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY)
    );

Join the secondary (run on sqlnode02) and create the listener:

ALTER AVAILABILITY GROUP [AG-ProdFinance] JOIN;
ALTER DATABASE [FinanceDB] SET HADR AVAILABILITY GROUP = [AG-ProdFinance];

ALTER AVAILABILITY GROUP [AG-ProdFinance]
  ADD LISTENER 'AGLISTENER' (
    WITH IP ((N'10.20.1.60', N'255.255.255.0')),
    PORT = 1433
  );

Register the listener IP in OCI Private DNS so the FQDN always resolves correctly, regardless of which node is primary:

bash#oci dns rrset update --zone-name-or-id sqlprod.oraclevcn.com --domain "aglistener.sqlprod.oraclevcn.com" --rtype A --items '[{"domain":"aglistener.sqlprod.oraclevcn.com","rtype":"A","rdata":"10.20.1.60","ttl":30}]'

7. Configure Backups to OCI Object Storage:

Set backup preference so backups run on the secondary, not the primary:

ALTER AVAILABILITY GROUP [AG-ProdFinance]
  MODIFY REPLICA ON 'SQLNODE02' WITH (BACKUP_PRIORITY = 50);

BACKUP DATABASE [FinanceDB]
  TO URL = 'https://objectstorage.ap-singapore-1.oraclecloud.com/n/<namespace>/b/sql-backups/o/FinanceDB_Full.bak'
  WITH COMPRESSION, CHECKSUM, INIT;

BACKUP LOG [FinanceDB]
  TO URL = 'https://objectstorage.ap-singapore-1.oraclecloud.com/n/<namespace>/b/sql-backups/o/FinanceDB_Log.trn'
  WITH COMPRESSION, CHECKSUM;

Add a lifecycle policy to archive backups older than 30 days:

bash#oci os object-lifecycle-policy put --bucket-name sql-backups --namespace-name <namespace> --items '[{"name":"archive-after-30-days","action":"ARCHIVE","timeAmount":30,"timeUnit":"DAYS","isEnabled":true,"objectNameFilter":{"inclusionPrefixes":["FinanceDB_Full"]}}]'

8. Check Status — AG Health and Replication Lag:

SELECT
  ag.name AS ag_name,
  ar.replica_server_name,
  ars.role_desc,
  ars.synchronization_health_desc,
  drs.log_send_queue_size,
  drs.redo_queue_size
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
LEFT JOIN sys.dm_hadr_database_replica_states drs ON ar.replica_id = drs.replica_id
ORDER BY ag.name, ars.role_desc;

Wire an OCI Monitoring alarm on host-level metrics so you're paged before redo/send queues become a problem:

bash#oci monitoring alarm create --compartment-id $COMPARTMENT_ID --display-name "sqlnode-high-cpu" --namespace "oci_computeagent" --query-text 'CpuUtilization[5m]{resourceId="sqlnode01"}.mean() > 85' --severity CRITICAL --destinations '["'$ONS_TOPIC_OCID'"]' --is-enabled true

9. Quarterly DR Failover Drill:

-- Manual failover to the async DR replica
ALTER AVAILABILITY GROUP [AG-ProdFinance] FAILOVER;
-- Validate application connectivity via the listener, then fail back once confirmed
ALTER AVAILABILITY GROUP [AG-ProdFinance] FAILOVER;

Don't wait for a real outage to find out your DR replica doesn't actually work — run this drill and record RTO/RPO every quarter.


Conclusion:

A well-architected SQL Server Always On deployment on OCI Native combines Fault Domain–aware placement, separate Ultra High Performance Block Volumes for data/logs/TempDB, a properly validated Windows cluster with an odd-numbered quorum, and an Availability Group fronted by a Private DNS listener so applications never hardcode a node name. Layering in Object Storage-based backups with lifecycle policies, OCI Monitoring alarms on replication lag, and a real quarterly failover drill turns this from a one-time setup into an operationally sound HA/DR platform that can support enterprise-grade RTO/RPO targets.

Common mistakes to avoid: placing data and log files on the same volume, routing AG replication over a public VPN instead of FastConnect, single Fault Domain deployment for production, skipping the quorum witness, ignoring TempDB file count, and exposing port 1433 to the internet.

Syed Zaheer
Service Delivery Director · Techvisions · Cloud, AI & Managed Infrastructure
Syed Zaheer is Service Delivery Director at Techvisions, author, speaker, and technology enthusiast with deep expertise in Oracle landscape covering - databases, middleware, Applications, AI  and cloud infrastructure. He actively contributes to the Oracle community through technical articles, conference presentations, and knowledge-sharing initiatives, helping organizations modernize and optimize their enterprise technology platforms.

 

Comments

Popular posts from this blog

Installation of Oracle Applications R12.1.1 on Linux and vmware

ntp service in Maintenance mode Solaris 10

Oracle AVDF Installation and Setup Document