Scalability in the Cloud: Migrating over 200 TB SAP Oracle Database to Azure

anbugovi

Overview




In this blog, we cover the Azure solution and deployment approach for migrating very large (200 TB+) SAP on Oracle databases to Azure.



VM Solution




Azure virtual machines (VMs) offer a vCPU count that helps keep Oracle licensing under control, a high memory-to-vCPU ratio to accommodate a large Oracle SGA, and the IO and network bandwidth needed for transactional and batch workloads. We tested both the M192 and M176 SKUs with a 200 TB+ Oracle database.

As the comparison below shows, the M176 is based on the Intel Sapphire Rapids processor with DDR5 memory and offers higher SAPS and 1.5x faster memory access than the M192, which is based on Intel Cascade Lake. The M176 is also equipped with Azure Boost, which improves both IO and network throughput. In our testing, the M176 delivered higher SAPS, faster memory access, and more IO and network bandwidth.




| VM SKU | Intel Chipset / Memory | vCPU | Memory (GiB) | IOPS / MBps | Network Bandwidth (Mbps) |
|---|---|---|---|---|---|
| M192idms_v2 | Cascade Lake / DDR4 | 192 | 4096 | 80000 / 2000 | 30000 |
| M176ds_4_v3 | Sapphire Rapids / DDR5 | 176 | 3892 | 130000 / 4000 | 40000 |



System Global Area (SGA): Very large Oracle databases benefit greatly from a large SGA. Customers with Oracle workloads of this size should deploy an Azure M-series VM with at least 4 TB of RAM. Our specific parameter recommendations are:



  • Set Linux huge pages to 75-90% of physical RAM
  • Set the System Global Area (SGA) to 90% of the huge page allocation
  • Set the Oracle initialization parameter USE_LARGE_PAGES = ONLY
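The three settings above reduce to simple arithmetic, sketched below for the Standard_M176ds_4_v3 memory size from the VM table. This is a minimal sketch, not the authors' tooling: it assumes the default 2 MiB huge page size on x86_64 Linux and picks 80% from the 75-90% range, and the function name and output format are illustrative. `vm.nr_hugepages` is the standard Linux sysctl for the static huge page pool. Oracle also needs a matching memlock limit, which is not shown.

```python
"""Hypothetical sizing helper for Linux huge pages and the Oracle SGA.

Assumptions (not from the original post): 2 MiB huge pages (x86_64 default)
and an 80% huge page share chosen from the 75-90% guideline.
"""

HUGE_PAGE_MIB = 2  # default huge page size on x86_64 Linux


def size_huge_pages(ram_gib: int, huge_pct: int = 80, sga_pct: int = 90) -> dict:
    """Return the huge page count and an SGA size (MiB) that fits inside it."""
    if not 75 <= huge_pct <= 90:
        raise ValueError("huge_pct should stay within the 75-90% guideline")
    ram_mib = ram_gib * 1024
    huge_mib = ram_mib * huge_pct // 100          # memory reserved as huge pages
    nr_hugepages = huge_mib // HUGE_PAGE_MIB      # value for vm.nr_hugepages
    # SGA = 90% of the memory actually reserved as huge pages
    sga_mib = nr_hugepages * HUGE_PAGE_MIB * sga_pct // 100
    return {"nr_hugepages": nr_hugepages, "sga_mib": sga_mib}


if __name__ == "__main__":
    plan = size_huge_pages(3892)  # Standard_M176ds_4_v3 memory (GiB)
    print(f"vm.nr_hugepages = {plan['nr_hugepages']}")  # e.g. in /etc/sysctl.conf
    print(f"*.sga_target = {plan['sga_mib']}M")         # Oracle spfile / init.ora
    print("*.use_large_pages = ONLY")
```

With USE_LARGE_PAGES = ONLY, the instance refuses to start if the SGA does not fit in the huge page pool, which is why the SGA is derived from the pool size rather than from total RAM.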



Storage Solution and Configuration




Azure offers multiple storage options: Premium SSD, Premium SSD v2, Ultra Disk, and Azure NetApp Files (ANF). The table below summarizes the storage characteristics available to a Standard_M176ds_4_v3 VM.




| IO Metrics | Premium SSD | Premium SSD v2 (Pv2) | Azure NetApp Files (ANF) |
|---|---|---|---|
| IOPS | 130K | 130K | Millions |
| Throughput | 4 GB/s | 4 GB/s | > 5 GB/s |
| Latency | Low single-digit ms | < 1 ms | < 0.4 ms |
| High Availability | Oracle Data Guard | Oracle Data Guard | Oracle Data Guard |
| Disaster Recovery | Oracle Data Guard | Oracle Data Guard | Oracle Data Guard and/or ANF Cross Region Replication |
| Storage Snapshot | Yes | No | Yes |
| Storage Manager | Automatic Storage Management (ASM) | ASM | dNFS |



For the 200 TB+ Oracle database workload, we tested the storage configuration below. It uses both of the VM's throughput channels: ANF traffic travels over the VM's network bandwidth, while Premium SSD v2 (Pv2) traffic counts against the VM's disk IO limits. Combining ANF and Pv2 let us use more of the VM's available throughput and meet, or exceed, the IO requirements of such a large Oracle database.




| Component | Disk Type | Number of Volumes | Size (TiB) | Total Throughput (MB/s) | Volume Layout | Stripe Size |
|---|---|---|---|---|---|---|
| Oracle Home | Pv2 | 1 | 1 | 250 | LVM | n/a |
| sapdata1-6 | ANF | 6 | 40 per volume | 3000-4000 | Individual | n/a |
| Oracle redo1-4 | ANF | 4 | 0.5 per volume | 500-2000 | Individual | n/a |
| Oracle FRA | ANF | 1 | 5 | 500-1000 | Individual | n/a |
| Oracle Archive | Pv2 | 4 | 10 | 1500 | LVM | 64 KB |
| Oracle Temp | Pv2 or Ephemeral | 4 | 10 | 1500 | LVM | 64 KB |
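Why the ANF plus Pv2 split helps can be seen with a rough tally against the Standard_M176ds_4_v3 limits from the VM table. This sketch rests on several assumptions of ours: the throughput figures are read as MB/s, the lower end of each range is used, all ANF traffic is counted against the VM network limit (which in practice is also shared with application traffic), and Oracle Temp is counted as Pv2 even when it sits on the ephemeral disk.

```python
"""Rough tally of provisioned storage throughput per VM channel (illustrative).

Assumptions: table figures are MB/s (lower end of each range); ANF (NFS)
traffic uses the VM network limit, Pv2 uses the VM disk throughput limit;
Oracle Temp is treated as Pv2.
"""

# Standard_M176ds_4_v3 limits from the VM table
VM_DISK_MBPS = 4000
VM_NET_MBPS = 40000 / 8  # 40,000 Mbps of network bandwidth is about 5,000 MB/s

# (component, channel, provisioned MB/s)
LAYOUT = [
    ("Oracle Home", "pv2", 250),
    ("sapdata1-6", "anf", 3000),
    ("Oracle redo1-4", "anf", 500),
    ("Oracle FRA", "anf", 500),
    ("Oracle Archive", "pv2", 1500),
    ("Oracle Temp", "pv2", 1500),
]


def tally(layout):
    """Sum provisioned throughput per channel."""
    totals = {"anf": 0, "pv2": 0}
    for _name, channel, mbps in layout:
        totals[channel] += mbps
    return totals


if __name__ == "__main__":
    t = tally(LAYOUT)
    print(f"ANF over network: {t['anf']} of {VM_NET_MBPS:.0f} MB/s")
    print(f"Pv2 over disk:    {t['pv2']} of {VM_DISK_MBPS} MB/s")
    print(f"Combined ceiling: {VM_NET_MBPS + VM_DISK_MBPS:.0f} MB/s")
```

Neither channel alone could carry both totals, but together the two ceilings add up, which is the effect the mixed layout is after.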



Storage Deployment Approach




Both NFSv3 and NFSv4.1 are supported with Oracle Direct NFS (dNFS); we ultimately chose NFSv3 with dNFS. With dNFS, NFSv3 has proven more reliable and robust, and much less prone to bugs, than the newer NFSv4.1.

Application volume group for Oracle (AVG for Oracle) deploys, in a single step with an optimized workflow, all the volumes required to install and operate Oracle databases at enterprise scale, with optimal performance and according to best practices. AVG for Oracle shortens Oracle database deployment time and ensures volume performance and stability, including the use of multiple storage endpoints (multiple IPs).



Oracle Database with Azure NetApp Files - Azure Example Scenarios | Microsoft Learn

Understand Azure NetApp Files application volume group for Oracle | Microsoft Learn
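To make the NFSv3 plus dNFS choice concrete, here is a minimal sketch that renders an oranfstab file, which dNFS reads from $ORACLE_HOME/dbs/oranfstab or /etc/oranfstab. The SID, server names, IP addresses, and export and mount paths are all hypothetical placeholders. Because AVG for Oracle can spread volumes across several storage endpoints, the sketch gives each volume its own `path:` IP. Only the `server`, `path`, `export ... mount ...`, and `nfs_version` keys are used; check the dNFS documentation for your Oracle release before relying on it.

```python
"""Render a hypothetical oranfstab that pins dNFS to NFSv3.

All server names, IPs, export paths and the SID below are placeholders.
"""

SID = "PRD"  # hypothetical SAP system ID

# (volume name, ANF storage endpoint IP): placeholder values
VOLUMES = [
    ("sapdata1", "10.10.1.4"),
    ("sapdata2", "10.10.1.5"),
    ("oraredo1", "10.10.1.6"),
]


def render_oranfstab(sid, volumes):
    """Build one server stanza per ANF volume, separated by blank lines."""
    stanzas = []
    for name, ip in volumes:
        stanzas.append("\n".join([
            f"server: anf-{name}",                               # logical server label
            f"path: {ip}",                                       # storage endpoint IP
            f"export: /{sid}-{name} mount: /oracle/{sid}/{name}",
            "nfs_version: nfsv3",                                # pin dNFS to NFSv3
        ]))
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(render_oranfstab(SID, VOLUMES), end="")
```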






Oracle data files can be distributed across the sapdata volumes in round-robin fashion so that no single file system becomes an IO hotspot.
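A minimal sketch of the round-robin idea: datafile N goes to volume ((N - 1) mod 6) + 1, so consecutive files land on different ANF volumes. The SID `PRD`, the tablespace name `PSAPSR3`, the directory naming, and the 30 GB file size are illustrative assumptions. In SAP systems, tablespace extensions are often done with BR*Tools rather than raw SQL.

```python
"""Round-robin placement of new datafiles across sapdata1..sapdata6 (illustrative).

SID, tablespace name, directory layout and file size are assumptions.
The target directories must already exist; Oracle does not create them.
"""
from itertools import cycle

SID = "PRD"             # hypothetical SAP system ID
TABLESPACE = "PSAPSR3"  # example SAP tablespace name (assumption)


def assign_round_robin(n_files, n_volumes=6, sid=SID):
    """Return datafile paths spread over sapdata1..sapdataN in turn."""
    volumes = cycle(range(1, n_volumes + 1))
    return [f"/oracle/{sid}/sapdata{next(volumes)}/sr3_{i}/sr3.data{i}"
            for i in range(1, n_files + 1)]


def add_datafile_sql(path, size_gb=30, tablespace=TABLESPACE):
    """Build an ALTER TABLESPACE statement for one datafile."""
    return f"ALTER TABLESPACE {tablespace} ADD DATAFILE '{path}' SIZE {size_gb}G;"


if __name__ == "__main__":
    for p in assign_round_robin(8):
        print(add_datafile_sql(p))
```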



High Availability Architecture




Azure offers high availability through availability zones, with a 99.99% SLA. Most Azure regions offer the required VM SKUs and low enough latency between zones to deploy an active-active HA setup across zones. However, not every zone offers the required VM SKU, so it is important to check SKU availability by running SAP-on-Azure-Scripts-and-Utilities/Get-VM-by-Zones at main · Azure/SAP-on-Azure-Scripts-and-Utilities (github.com) from your subscription. You can identify low-latency zones by running SAP-on-Azure-Scripts-and-Utilities/AvZone-Latency-Test at main · Azure/SAP-on-Azure-Scripts-and-Utilities (github.com). Together, the SKU availability and latency results help you identify a zone pair suitable for an active-active HA deployment.
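How the two scripts' results combine can be sketched as a filter-and-sort: keep the zones that offer the SKU, then rank the remaining zone pairs by inter-zone latency. The function name, input shapes, and latency numbers below are made up for illustration; they are not output from Get-VM-by-Zones or AvZone-Latency-Test.

```python
"""Rank zone pairs for an HA deployment (illustrative sketch).

Inputs are hypothetical: a map of zone -> offered VM SKUs and a map of
(zone, zone) -> measured latency in ms.
"""
from itertools import combinations


def candidate_zone_pairs(sku, zone_skus, latency_ms, max_latency_ms=None):
    """Return (zone_a, zone_b, latency) for pairs that both offer `sku`, lowest latency first."""
    zones = sorted(z for z, skus in zone_skus.items() if sku in skus)
    ranked = []
    for a, b in combinations(zones, 2):
        lat = latency_ms.get((a, b), latency_ms.get((b, a)))
        if lat is None:
            continue  # no measurement for this pair
        if max_latency_ms is not None and lat > max_latency_ms:
            continue  # too slow for synchronous HA traffic
        ranked.append((lat, a, b))
    return [(a, b, lat) for lat, a, b in sorted(ranked)]


# Hypothetical inputs, not real script output or measurements
ZONE_SKUS = {
    "1": {"Standard_M176ds_4_v3", "Standard_M192idms_v2"},
    "2": {"Standard_M176ds_4_v3"},
    "3": {"Standard_M176ds_4_v3"},
}
LATENCY_MS = {("1", "2"): 0.9, ("1", "3"): 0.6, ("2", "3"): 1.4}

if __name__ == "__main__":
    for a, b, lat in candidate_zone_pairs("Standard_M176ds_4_v3", ZONE_SKUS, LATENCY_MS):
        print(f"zones {a}+{b}: {lat} ms")
```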

Note that the same logical zone number can map to different physical zones in different subscriptions. You can look up the physical zone mapping with the Azure REST API: Subscriptions - List Locations - REST API (Azure Resource Management) | Microsoft Learn.
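As a sketch, assuming the Subscriptions - List Locations response (api-version 2022-12-01 or later) carries an `availabilityZoneMappings` array of `logicalZone`/`physicalZone` pairs for each region, the code below checks whether two logical zones in two subscriptions are the same physical zone. The response could be fetched with, for example, `az rest --method get --url "https://management.azure.com/subscriptions/<subscription-id>/locations?api-version=2022-12-01"`. The JSON bodies here are trimmed, hypothetical samples.

```python
"""Compare logical-to-physical zone mappings across subscriptions (illustrative).

Assumes the List Locations response shape described above; the sample
bodies are hypothetical.
"""
import json

SUB_PROD = json.dumps({"value": [{
    "name": "eastus",
    "availabilityZoneMappings": [
        {"logicalZone": "1", "physicalZone": "eastus-az1"},
        {"logicalZone": "2", "physicalZone": "eastus-az3"},
        {"logicalZone": "3", "physicalZone": "eastus-az2"},
    ],
}]})
SUB_QA = json.dumps({"value": [{
    "name": "eastus",
    "availabilityZoneMappings": [
        {"logicalZone": "1", "physicalZone": "eastus-az3"},
        {"logicalZone": "2", "physicalZone": "eastus-az2"},
        {"logicalZone": "3", "physicalZone": "eastus-az1"},
    ],
}]})


def zone_mapping(body: str, region: str) -> dict:
    """Return {logical zone: physical zone} for one region."""
    for loc in json.loads(body).get("value", []):
        if loc.get("name") == region:
            return {m["logicalZone"]: m["physicalZone"]
                    for m in loc.get("availabilityZoneMappings", [])}
    return {}


def same_physical_zone(body_a, zone_a, body_b, zone_b, region):
    """True if the two logical zones resolve to the same physical zone."""
    pa = zone_mapping(body_a, region).get(zone_a)
    pb = zone_mapping(body_b, region).get(zone_b)
    return pa is not None and pa == pb


if __name__ == "__main__":
    # In these samples, logical zone 2 (prod) and logical zone 1 (QA) coincide
    print(same_physical_zone(SUB_PROD, "2", SUB_QA, "1", "eastus"))
```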



The picture below shows the HA architecture.



[Figure: HA architecture]



Data Protection Strategy




Customers can combine ANF snapshots on the primary VM with weekly Oracle streaming (RMAN) backups on the HA standby VM. For the snapshots, we recommend Microsoft's Azure Application Consistent Snapshot tool (AzAcSnap). Both snapshots and clones complete in minutes, regardless of database size. Cloned volumes can be used for system copies, but it is critical that the production and QA VMs be in the same physical zone to ensure low latency between them.



ANF does not technically prevent you from mounting NFS volumes across zones, so it is important to establish operational procedures that keep each VM and its ANF storage in the same zone.



Backup & Snapshot Approach





| Domain | Backup Component | Backup Option | Frequency | Runs Against | Load on DB VM |
|---|---|---|---|---|---|
| Primary Region | DB | Snapshot (azacsnap) | Every 4 hours | HA primary VM | Low |
| Primary Region | DB | RMAN backup | Daily incremental and weekly full | HA standby VM | Low |
| Primary Region | Log | Archive log backup | Every 15 minutes | HA primary VM | Low |
| DR Region | DB | Oracle Data Guard | Continuous | n/a | Low |
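The backup table translates naturally into a cron schedule per host. In this minimal sketch, the wrapper script paths are hypothetical (they might wrap azacsnap or RMAN commands such as incremental level 0 and level 1 backups), and running the daily incremental Monday to Saturday with the weekly full on Sunday at 01:00 is our assumption, not the authors' schedule.

```python
"""Express the backup table as crontab entries per host role (illustrative).

Script paths and times of day are hypothetical placeholders.
"""

# (cron expression, host role, hypothetical wrapper script)
SCHEDULE = [
    ("0 */4 * * *", "HA primary", "/usr/local/bin/db_snapshot.sh"),     # snapshot every 4 hours
    ("*/15 * * * *", "HA primary", "/usr/local/bin/archlog_backup.sh"), # archive logs every 15 min
    ("0 1 * * 1-6", "HA standby", "/usr/local/bin/rman_incr.sh"),       # daily incremental, Mon-Sat
    ("0 1 * * 0", "HA standby", "/usr/local/bin/rman_full.sh"),         # weekly full, Sunday
]


def crontab_for(role, schedule=SCHEDULE):
    """Return the crontab lines for one host role."""
    return "\n".join(f"{expr} {script}" for expr, r, script in schedule if r == role)


def runs_per_day(interval_minutes):
    """How many times a fixed-interval job runs in 24 hours."""
    return 24 * 60 // interval_minutes


if __name__ == "__main__":
    for role in ("HA primary", "HA standby"):
        print(f"# {role}\n{crontab_for(role)}\n")
    print("snapshots/day:", runs_per_day(240), "archive log backups/day:", runs_per_day(15))
```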



Database Restore





| Failure | Recovery Option | Recovery Time | Comment |
|---|---|---|---|
| DB level | Snapshot restore + log roll-forward | Minutes | 1st option |
| DB level | RMAN restore + log roll-forward | Hours | 2nd option |
| Region-wide | Oracle Data Guard | Minutes | 1st option |
| Region-wide | RMAN restore + log roll-forward | Hours | 2nd option |



Migration Approach




Depending on the on-premises hardware, OS/DB, and SAP software levels, a migration falls into either the homogeneous or the heterogeneous category.



We will cover the heterogeneous migration approach in a separate blog, including how to reduce downtime and the benefits for very large databases.



In a homogeneous migration, smaller databases can be migrated using backup and restore. Larger databases can be migrated by setting up Oracle Data Guard (ODG) replication to Azure, so that only the final switchover falls inside the downtime window.
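A back-of-the-envelope calculation shows why size drives this choice. Streaming a database at a sustained rate takes size divided by rate. If that alone overruns the downtime window, backup and restore is out and Data Guard becomes the practical route. The 4000 MB/s rate and 8-hour window below are hypothetical, and the estimate ignores transfer to Azure, recovery, and validation, so it is a lower bound.

```python
"""Lower-bound restore time vs. downtime window (illustrative numbers only)."""


def restore_hours(size_tb, throughput_mb_s):
    """Hours to stream size_tb (decimal TB) at a sustained MB/s rate."""
    return size_tb * 1_000_000 / throughput_mb_s / 3600


def pick_method(size_tb, throughput_mb_s, downtime_window_h):
    """Backup/restore only if the raw restore fits the downtime window."""
    if restore_hours(size_tb, throughput_mb_s) <= downtime_window_h:
        return "backup/restore"
    return "Oracle Data Guard"


if __name__ == "__main__":
    for size in (5, 200):  # TB, hypothetical
        hours = restore_hours(size, 4000)
        print(f"{size} TB: >= {hours:.1f} h -> {pick_method(size, 4000, 8)}")
```

At 4000 MB/s, 200 TB needs at least 50,000 seconds (about 13.9 hours) just to stream, before any log roll-forward.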

Customers should run Azure Quality Check against the deployed solution to identify and address any deviations from Azure best practices.




Testing Approaches




Customers have used the Oracle Real Application Testing (RAT) option to perform real-world testing of the Oracle database. Capturing the production workload during peak periods and replaying it on Azure helps identify the required VM SKU and storage solution. The customer used Azure monitoring dashboards and RAT-generated reports to analyze the test results and move forward confidently with migrating the SAP on Oracle system to Azure.
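For the analysis step, here is a minimal sketch that compares metrics from the production capture with the same metrics from the Azure replay and flags regressions. The metric names, values, and 5% threshold are hypothetical, and every metric is assumed to be lower-is-better (elapsed times and latencies). Real inputs would come from the RAT reports and the Azure monitoring dashboards.

```python
"""Compare capture vs. replay metrics and flag regressions (illustrative).

Metric names and numbers are hypothetical; all metrics are lower-is-better.
"""


def compare_runs(capture, replay, max_regression_pct=5.0):
    """Return {metric: (change %, regressed?)}; negative change = improvement."""
    report = {}
    for metric, before in capture.items():
        after = replay[metric]
        change_pct = (after - before) / before * 100.0
        report[metric] = (round(change_pct, 1), change_pct > max_regression_pct)
    return report


CAPTURE = {"db_time_s": 1000.0, "avg_read_ms": 2.0, "log_file_sync_ms": 1.0}
REPLAY = {"db_time_s": 650.0, "avg_read_ms": 0.5, "log_file_sync_ms": 1.2}

if __name__ == "__main__":
    for metric, (pct, bad) in compare_runs(CAPTURE, REPLAY).items():
        print(f"{metric:18} {pct:+6.1f}%  {'REGRESSION' if bad else 'ok'}")
```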



RAT covers only Oracle database performance. We highly recommend also running SAP-level volume and performance testing to ensure that end-to-end SAP processing meets or exceeds the performance KPIs.



System Performance




Azure innovations such as Mv3 VMs (Intel Sapphire Rapids / DDR5), Azure Boost for higher IO and network throughput, and ANF storage with sub-millisecond latency using dNFS, combined with Oracle Advanced Compression, resulted in a 30-50% improvement in SAP processing on Azure.



Conclusion




Over the years, Azure has continued to advance its SAP solutions with new VM SKUs, storage and network solutions, end-to-end architectures, and deployment approaches, making it possible to deploy the largest SAP on Oracle databases. Azure now successfully hosts a 200 TB+ SAP on Oracle database!







Co-Authors




Denny Koovakattu

Ralf Klahr

Sathish Thirunethiram
